Cortex · data-mesh memory engine for LLMs

One memory engine.
Any domain.

Cortex gives an LLM a memory that understands meaning, time, cause, and structure - not just similarity. Seven retrieval strategies run in parallel and fuse into a single ranked, fully traceable context bundle. Every result explains why it was retrieved.


query › “What caused Friday's outage?”

Same engine, same topology - only the vocabulary changes with the domain.

What it remembers

Four kinds of memory, one engine.

A cosine-similarity lookup covers the first kind and forgets the rest. Cortex covers all four - plus user preference - and fuses them by the query's intent.

Meaning

Semantic & hybrid

Dense vectors plus BM25 find what a query is about, even when it's phrased in words the documents never use.

Time

Temporal & episodic

Recency is weighed, and events are grouped into ordered episodes - so the engine can recall a story in the order it happened.

Cause

Causal

Declared and inferred CAUSED_BY / FIXED_BY edges answer “what caused this?” and “how was it fixed?” directly.

Structure

Graph

Typed relationships - SUPPORTS, CONTRADICTS, DEPENDS_ON - are walked one to two hops for context that vectors miss.
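
As a rough illustration of the one-to-two-hop walk over typed edges, here is a minimal in-memory sketch. The edge list, node ids, and the `walk` helper are hypothetical, edges are assumed to be directed, and nothing here reflects how Cortex actually queries its graph store.

```python
from collections import deque

# Hypothetical edge list: (source, edge_type, target). Ids are illustrative only.
EDGES = [
    ("incident-42", "CAUSED_BY", "deploy-17"),
    ("deploy-17", "DEPENDS_ON", "config-9"),
    ("config-9", "SUPPORTS", "runbook-3"),
    ("incident-42", "FIXED_BY", "rollback-5"),
]


def walk(start, edges, allowed_types, max_hops=2):
    """Breadth-first walk following only allowed edge types, up to max_hops.

    Returns {node: path}, where path is the list of (edge_type, node) steps
    that first reached that node from `start`.
    """
    adjacency = {}
    for src, etype, dst in edges:
        if etype in allowed_types:
            adjacency.setdefault(src, []).append((etype, dst))

    found = {}
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # hop budget spent on this branch
        for etype, nxt in adjacency.get(node, []):
            if nxt == start or nxt in found:
                continue  # keep the first (shortest) path only
            found[nxt] = path + [(etype, nxt)]
            queue.append((nxt, found[nxt]))
    return found


paths = walk("incident-42", EDGES, {"CAUSED_BY", "DEPENDS_ON", "FIXED_BY"})
for node, path in paths.items():
    print(node, path)
```

Keeping the path alongside each node is what would let a hit report the graph path that led to it.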

How a query runs

Six steps, every one of them visible.

01
Query
A question arrives via the API - from the showcase, an assistant, or your own application.
POST /api/retrieval
02
Intent
A lightweight classifier produces an intent distribution that re-scales the fusion weights. It never gates a strategy - causal queries lean causal, recall queries lean episodic.
re-scales, never gates
03
Retrieve ×7
Vector, hybrid, temporal, episodic, causal, graph, and preference strategies run in parallel. Every hit carries a why trace: strategy, score, graph path, recency.
asyncio.gather · timeout
04
Fuse
Weighted reciprocal-rank fusion merges the seven lists using the intent-scaled weights, deduplicates by id, and normalizes the fused scores.
weighted RRF
05
Rerank
Optionally, a reranker re-scores the fused pool - a cross-encoder or an LLM judging each query–candidate pair - and reorders hits by the reranker's relevance score. Best-effort: if it fails or times out, the fusion order stands.
optional · post-fusion
06
Trace
The full run - per-strategy hits, intent, fused ranking - is persisted as a RetrievalTrace, and the bundle returns with a trace_id: a permalink to its own reasoning. When the context feeds an LLM, a composer additionally packs hits into a token budget, using MMR to keep the selection diverse.
trace_id returned
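
Step 03 (parallel retrieval with a timeout) can be sketched with asyncio.gather and asyncio.wait_for. This is a minimal sketch under stated assumptions: the three strategy coroutines, their names, and the choice to turn a timeout or error into an empty hit list are all hypothetical, not the engine's actual behavior.

```python
import asyncio


async def run_strategy(name, strategy, query, timeout_s):
    """Run one strategy; a timeout or error yields an empty hit list (assumption)."""
    try:
        hits = await asyncio.wait_for(strategy(query), timeout=timeout_s)
        return name, hits
    except Exception:
        # Covers asyncio.TimeoutError as well as failures inside the strategy.
        return name, []


async def retrieve_all(query, strategies, timeout_s=0.5):
    """Launch every strategy concurrently and collect results by strategy name."""
    tasks = [run_strategy(n, fn, query, timeout_s) for n, fn in strategies.items()]
    return dict(await asyncio.gather(*tasks))


# Hypothetical strategies standing in for real retrievers.
async def fast_vector(query):
    return ["doc-1", "doc-2"]


async def slow_graph(query):
    await asyncio.sleep(1.0)  # exceeds the timeout used below
    return ["doc-3"]


async def broken_causal(query):
    raise RuntimeError("store unavailable")


results = asyncio.run(
    retrieve_all(
        "What caused Friday's outage?",
        {"vector": fast_vector, "graph": slow_graph, "causal": broken_causal},
        timeout_s=0.05,
    )
)
print(results)
```

Wrapping each strategy individually means one slow or failing retriever does not hold back the others.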

7 strategies · 5 memory layers · 4 stores · every hit traced
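
Steps 02 and 04 (intent re-scaling and weighted reciprocal-rank fusion) can be sketched together. Assumptions, all hypothetical: the intent distribution is keyed directly by strategy name, the multiplier is 1 + p so that no weight can reach zero, k = 60 is the conventional RRF constant rather than a documented Cortex setting, and scores are normalized by dividing by the top score.

```python
def scale_weights(base, intent):
    """Re-scale base weights by intent; a missing intent leaves the weight unboosted."""
    scaled = {s: w * (1.0 + intent.get(s, 0.0)) for s, w in base.items()}
    total = sum(scaled.values())
    return {s: w / total for s, w in scaled.items()}


def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse ranked id lists: score(id) = sum over strategies of w / (k + rank)."""
    scores = {}
    for strategy, ids in ranked_lists.items():
        w = weights.get(strategy, 0.0)
        seen = set()
        for rank, doc_id in enumerate(ids, start=1):
            if doc_id in seen:
                continue  # count each id once per strategy
            seen.add(doc_id)
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)

    top = max(scores.values(), default=0.0)
    if top > 0:
        scores = {d: s / top for d, s in scores.items()}  # best hit scores 1.0
    # Sort by score descending, then id for a stable order on ties.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


base = {"vector": 1.0, "causal": 1.0, "episodic": 1.0}
intent = {"causal": 0.8, "episodic": 0.2}  # a mostly causal query
weights = scale_weights(base, intent)

ranked = {
    "vector": ["a", "b", "c"],
    "causal": ["c", "a"],
    "episodic": ["b", "c"],
}
fused = weighted_rrf(ranked, weights)
print(weights)
print(fused)
```

Because the vector strategy keeps a non-zero weight even for a causal query, it still contributes to the fused ranking, which is the sense in which intent re-scales rather than gates.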

Architecture

Four stores, one responsibility each.

No blurred boundaries. FAISS is the only ANN path, and per-user isolation is structural - not a filter bolted on.

FAISS · Semantic search

Per-user, per-namespace ANN indexes over documents, memories, episodes, and events. Snapshotted to local/S3, lazily loaded, hot-reloaded across processes.

Postgres 16 · Source of truth

All structured data: users, documents, chunks, memories, events, episodes, entities, and retrieval traces. Never queried for vector search.

Neo4j 5 · Relationship graph

Documents, chunks, entities, episodes, and events linked by typed edges - SUPPORTS, CONTRADICTS, MENTIONS, DEPENDS_ON, CAUSED_BY, FIXED_BY.

Redis · Short-term memory

Session memory, embedding cache, and the pub/sub channel that hot-reloads FAISS indexes across processes.
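
To make "structural, not a filter" concrete, here is a minimal sketch of a registry that keeps one index per (user, namespace) pair. `TinyIndex` is a brute-force stand-in used so the example runs without FAISS, and `IndexRegistry` is a hypothetical name; neither is taken from the Cortex codebase.

```python
from dataclasses import dataclass, field


@dataclass
class TinyIndex:
    """Brute-force stand-in for one ANN index (dot-product scoring)."""
    ids: list = field(default_factory=list)
    vectors: list = field(default_factory=list)

    def add(self, item_id, vector):
        self.ids.append(item_id)
        self.vectors.append(vector)

    def search(self, query, top_k=3):
        scored = [
            (sum(q * v for q, v in zip(query, vec)), item_id)
            for item_id, vec in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: -pair[0])
        return [item_id for _, item_id in scored[:top_k]]


class IndexRegistry:
    """One index per (user_id, namespace); a search never touches another user's data."""

    def __init__(self):
        self._indexes = {}

    def get(self, user_id, namespace):
        return self._indexes.setdefault((user_id, namespace), TinyIndex())


registry = IndexRegistry()
registry.get("alice", "documents").add("doc-a", [1.0, 0.0])
registry.get("alice", "documents").add("doc-b", [0.0, 1.0])

alice_hits = registry.get("alice", "documents").search([0.9, 0.1], top_k=1)
bob_hits = registry.get("bob", "documents").search([0.9, 0.1], top_k=1)
print(alice_hits, bob_hits)
```

With a shared index and a post-hoc user filter, a missing filter leaks data; with separate indexes there is nothing to leak because the other user's vectors are never in the searched structure.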

FastAPI · Celery · SQLAlchemy · FAISS · Neo4j 5 · Postgres 16 · Redis · Next.js · OpenTelemetry

LLM-agnostic behind one adapter interface: OpenAI, Anthropic, Google Gemini, local Ollama - with a keyless local option for embeddings.
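
A single adapter interface might look roughly like the sketch below. The `LLMAdapter` protocol, its `complete` method, and the `EchoAdapter` stand-in are hypothetical names chosen for illustration; the real interface and its provider implementations may differ.

```python
from typing import Protocol


class LLMAdapter(Protocol):
    """Hypothetical provider-neutral interface: anything with complete() qualifies."""

    def complete(self, prompt: str) -> str: ...


class EchoAdapter:
    """Offline stand-in that needs no API key; a real adapter would call a provider."""

    def complete(self, prompt: str) -> str:
        return f"[echo] {prompt}"


def summarize(adapter: LLMAdapter, context: list[str]) -> str:
    """Caller code depends only on the interface, never on a specific provider."""
    prompt = "Summarize:\n" + "\n".join(context)
    return adapter.complete(prompt)


summary = summarize(EchoAdapter(), ["deploy-17 failed", "rollback-5 fixed it"])
print(summary)
```

Swapping providers then means passing a different adapter object; `summarize` and every other caller stay unchanged.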

Why not plain RAG?

Beyond similarity search.

Typical RAG | Cortex
One similarity search | Seven parallel strategies, fused by intent
Flat chunk list | Graph-aware, episode-ordered, causally linked
Black-box ranking | Every hit carries a why trace
No sense of time | Recency and order are first-class signals
Global index | Per-user, per-namespace isolation

FAQ

Common questions.

Is this a vector database?

No - FAISS handles ANN search, but the engine layers a graph, episodic and causal memory, and intent-aware fusion on top. Vector search is one of seven strategies.

Does it send my context to an LLM automatically?

No. Retrieved context is displayed for inspection. The LLM is only called when you explicitly ask for a summary, or in the separate Assistant view.

Can I see why a result was retrieved?

Yes - every hit carries a why trace (strategy, score, graph path, recency, weights), and the full pipeline is visualized as an interactive DAG.

Which LLM providers are supported?

OpenAI, Anthropic, Google Gemini, and local models via Ollama. Embeddings are configurable independently, including a keyless local option.

Is data isolated per user?

Yes. FAISS indexes and the knowledge graph are both scoped per user - structurally, not by filter.

Retrieval you can reason about.

Give your LLM a memory that remembers in graphs, episodes, and cause-and-effect - and shows you every step.
