Context retrieval that shows its work.
Most retrieval systems return a flat list of chunks and hope for the best. The Cortex runs seven retrieval strategies in parallel — semantic, hybrid, temporal, episodic, causal, graph, and preference — fuses them with intent-aware weighting, and hands back a fully traceable context bundle. Every result tells you why it was retrieved.
Illustrative why trace — each hit carries strategy · score · graph path · recency · weight.
Vector search alone forgets too much.
When retrieval is a black box, your LLM's mistakes are unfixable. You're left re-ranking blind and inflating the context window until cost and latency hurt.
- A cosine-similarity lookup can't tell you that Incident A caused Outage B.
- It doesn't know that three chunks belong to the same episode, told in order.
- It treats a six-month-old note and a fresh one as equally relevant.
- And it never explains why a chunk surfaced — so you can't debug a bad answer.
Cognition, not just similarity.
Cognitive, not just semantic
Seven strategies cover different ways of remembering: meaning (vector/hybrid), time (temporal/episodic), cause-and-effect (causal), structure (graph), and user intent (preference). Fusion blends them by detected intent.
Fully traceable
Every retrieved hit carries a why trace — strategy, score, graph path, recency, and weights. A retrieval-trace DAG visualizes the whole pipeline. No more guessing why the model saw what it saw.
Graph-aware memory
A Neo4j knowledge graph links documents, chunks, entities, episodes, and events with typed relationships — SUPPORTS, CONTRADICTS, CAUSED_BY, FIXED_BY, DEPENDS_ON, and more. Retrieval can walk these edges, not just match vectors.
7
Retrieval strategies
Run concurrently, fused into one ranked list.
5
Cognitive layers
Semantic · graph · temporal · episodic · causal.
4
Purpose-built stores
FAISS · Postgres · Neo4j · Redis — no blurred boundaries.
Retrieval, rebuilt as cognition.
Seven strategies, fused by intent, and a full reasoning trace for every result. Built for engineers who need to debug what their model sees.
Seven retrieval strategies, fused
Vector, hybrid (BM25 + dense), temporal, episodic, causal, graph, and preference — run concurrently and merged by intent-weighted reciprocal-rank fusion into a single ranked list. No strategy is ever silenced; intent only re-scales the weights.
Intent-weighted fusion
A lightweight classifier reads the query's intent and re-scales strategy weights — causal queries lean on the causal graph, recall queries lean on episodes. Then weighted RRF deduplicates and normalizes into one list.
Full retrieval traces
Every run persists a RetrievalTrace with per-strategy outputs and size-capped why payloads, visualized as an interactive React Flow DAG.
- per-strategy hits + timing
- intent + fusion weights
- graph path per hit
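As a rough illustration of what a size-capped why payload could look like, the sketch below models one hit's trace and trims it to fit a limit. The `WhyTrace` fields, the `capped_payload` helper, and the 2,000-character cap are assumptions made for this example, not the engine's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

MAX_WHY_CHARS = 2000  # assumed cap; the real limit is not stated on this page


@dataclass
class WhyTrace:
    strategy: str                        # e.g. "causal" or "vector"
    score: float                         # strategy-local score
    weight: float                        # fusion weight applied to the strategy
    recency_days: Optional[float] = None
    graph_path: List[str] = field(default_factory=list)


def capped_payload(why: WhyTrace, limit: int = MAX_WHY_CHARS) -> dict:
    """Serialize a why trace, trimming the graph path until it fits the cap."""
    payload = asdict(why)
    payload["truncated"] = False
    while len(json.dumps(payload)) > limit and payload["graph_path"]:
        payload["graph_path"].pop()      # drop the farthest hop first
        payload["truncated"] = True
    return payload
```

Trimming the graph path first keeps the strategy, score, and weight intact, which are the fields most useful when debugging a ranking.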
Per-user knowledge graph
Neo4j stores documents, chunks, entities, episodes, and events with typed, timestamped edges. Retrieval traverses 1–2 hops for context the vectors miss.
Episodic & causal memory
Events are grouped into ordered episodes; declared and inferred CAUSED_BY / FIXED_BY edges let you ask “what caused this?” and “how was it fixed?”
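To make the two questions concrete, here is a minimal in-memory sketch of walking typed edges for up to two hops. The node ids and the `walk` helper are hypothetical; in the real system these edges live in Neo4j and would be traversed there.

```python
# Hypothetical edges: (source, relationship, target).
EDGES = [
    ("outage-b", "CAUSED_BY", "incident-a"),
    ("incident-a", "CAUSED_BY", "bad-deploy"),
    ("outage-b", "FIXED_BY", "rollback"),
]


def walk(edges, start, rel, max_hops=2):
    """Follow edges of one type from `start`; return every path found."""
    paths, frontier = [], [[start]]
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            for src, r, dst in edges:
                # Extend the path along matching edges, avoiding cycles.
                if src == path[-1] and r == rel and dst not in path:
                    next_frontier.append(path + [dst])
        paths.extend(next_frontier)
        frontier = next_frontier
    return paths


what_caused = walk(EDGES, "outage-b", "CAUSED_BY")
how_fixed = walk(EDGES, "outage-b", "FIXED_BY")
```

Each returned path doubles as a graph path that a why trace could carry.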
Entity layer
Documents are mined for entities, which are linked to the chunks that mention them and to each other via DEPENDS_ON. This powers entity-aware graph expansion.
Per-user, per-namespace vectors
FAISS indexes are isolated per user and per namespace (documents, memories, episodes, events). Lazy-loaded, snapshotted to local/S3, hot-reloaded across processes.
Pluggable LLM adapters
OpenAI, Anthropic, Google Gemini, and local Ollama — behind one adapter interface. Embeddings are decoupled from chat and include a keyless local option.
Memory hierarchy
Short-term (Redis), long-term (FAISS-backed), semantic (graph), and episodic memory — with decay, reinforcement, and archival.
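The page does not say how decay, reinforcement, or archival are computed. Purely as an assumption, the sketch below uses exponential half-life decay, an additive capped boost on recall, and a strength threshold for archival; all function names and constants are illustrative.

```python
def strength(base: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: strength halves every `half_life_days` (assumed model)."""
    return base * 0.5 ** (age_days / half_life_days)


def reinforce(base: float, boost: float = 0.2, cap: float = 1.0) -> float:
    """Bump a memory's base strength when it is recalled, up to a cap."""
    return min(cap, base + boost)


def should_archive(base: float, age_days: float,
                   threshold: float = 0.05, half_life_days: float = 30.0) -> bool:
    """Archive once decayed strength falls below the threshold."""
    return strength(base, age_days, half_life_days) < threshold
```

Under these assumed constants, a memory that is never recalled would be archived after about five half-lives, while each recall pushes that date back.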
Adaptive context composer
Token-budgeted bin-filling + MMR diversity, producing structured context (causal chains, timelines, prior incidents, knowledge) ready to prompt.
Four stores, one responsibility each.
No blurred boundaries. Each store does exactly one job, and per-user isolation is structural — not a filter bolted on.
FAISS
ANN semantic search
The only ANN semantic-search path. Indexes are per-user, per-namespace, keyed by (namespace, user_id) over documents, memories, episodes, events. Snapshotted to local/S3, lazily loaded on first access, hot-reloaded across processes via a Redis version key.
Postgres 16
Source of truth
Source of truth for all structured data: users, documents, chunks (with raw embedding JSON as the FAISS rebuild source), memories, events, episodes, entities, and retrieval traces. No vector search.
Neo4j 5 (+APOC)
Relationship graph
The relationship / context graph. Nodes: :Document, :Chunk, :Entity, :Episode, :Event. Typed edges: :HAS_CHUNK, :NEXT, :SUPPORTS / :CONTRADICTS / :ELABORATES, :MENTIONS, :DEPENDS_ON, :HAS_EVENT, :CAUSED_BY / :FIXED_BY.
Redis
STM / cache / pub-sub
Session memory (STM), embedding cache, and FAISS version pub/sub (vector:{ns}:{user_id}:version).
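One way the version key could drive lazy loading and hot reload is sketched below. `IndexRegistry`, the `kv` stand-in (any object with a `.get(key)` method, here a plain dict in place of Redis), and the loader callable are hypothetical. The sketch checks the version on each access rather than subscribing to pub/sub, which is a simplification.

```python
class IndexRegistry:
    """Lazily loads one index per (namespace, user_id); reloads on version change."""

    def __init__(self, kv, load_snapshot):
        self._kv = kv                 # stand-in for Redis: needs only .get(key)
        self._load = load_snapshot    # callable(namespace, user_id) -> index
        self._cache = {}              # (namespace, user_id) -> (version, index)

    def get(self, namespace, user_id):
        key = f"vector:{namespace}:{user_id}:version"
        current = self._kv.get(key)
        cached = self._cache.get((namespace, user_id))
        # Load on first access, or when another process bumped the version.
        if cached is None or cached[0] != current:
            cached = (current, self._load(namespace, user_id))
            self._cache[(namespace, user_id)] = cached
        return cached[1]
```

Because the cache key includes the user id, one user's index is never returned for another user, which matches the structural isolation described above.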
FAISS is the only ANN path
Postgres embedding JSON columns are cold-storage rebuild sources — never queried for search. (pgvector was removed in migration 0008.)
Per-user isolation is structural
Not a filter bolted on — FAISS indexes and the knowledge graph are both scoped per user.
Architecture diagram, top to bottom:
- Frontend: Next.js · React · TypeScript
- Transport: HTTP / SSE
- API: FastAPI · REST & streaming
- Engine: multi-strategy retrieval, fusion & context composition
- Stores: FAISS (vector) · Postgres (relational) · Neo4j (graph) · Redis (cache)
- Async background processing
LLM / embeddings are provider-agnostic behind one adapter interface.
Two real flows, not one black box.
An asynchronous write path that makes data both vector-searchable and graph-traversable, and the signature read path that fuses seven strategies into a traceable bundle.
Query
POST /api/retrieval (showcase) or POST /api/chat (Assistant).
Classify intent
A rule-based classifier produces an intent distribution that re-scales (never gates) the fusion weights.
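A minimal sketch of how a rule-based classifier might turn a query into an intent distribution and then re-scale weights without ever zeroing one. The cue patterns, intent names, and boost table are assumptions for illustration; the real rule set is not shown on this page.

```python
import re

# Hypothetical cue patterns; a real rule set would be larger.
INTENT_RULES = {
    "causal": re.compile(r"\b(why|caused?|root cause|because|fix(ed)?)\b", re.I),
    "temporal": re.compile(r"\b(when|last week|yesterday|recent(ly)?)\b", re.I),
    "recall": re.compile(r"\b(remember|earlier|previous(ly)?|last time)\b", re.I),
}
STRATEGIES = ("vector", "hybrid", "temporal", "episodic", "causal", "graph", "preference")
# Which strategies each intent boosts (assumed mapping).
BOOSTS = {
    "causal": {"causal": 1.0, "graph": 0.5},
    "temporal": {"temporal": 1.0},
    "recall": {"episodic": 1.0},
}


def classify_intent(query: str) -> dict:
    """Count cue matches per intent and normalize into a distribution."""
    counts = {name: len(rule.findall(query)) for name, rule in INTENT_RULES.items()}
    counts["general"] = 1  # smoothing so the distribution is never empty
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def scale_weights(intent: dict) -> dict:
    """Multiply base weights by (1 + boost * p); no weight ever drops to zero."""
    weights = {s: 1.0 for s in STRATEGIES}
    for name, p in intent.items():
        for strategy, boost in BOOSTS.get(name, {}).items():
            weights[strategy] *= 1.0 + boost * p
    return weights


intent = classify_intent("What caused last week's outage, and how was it fixed?")
weights = scale_weights(intent)
```

Since every multiplier is at least 1, intent can only raise a strategy's weight relative to the others, which is one way to honor the "re-scales, never gates" rule.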
7 strategies (parallel)
asyncio.gather runs all enabled strategies with a timeout; each returns scored RetrievalHits carrying a why trace.
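The fan-out could look roughly like the sketch below. It assumes a per-strategy timeout (the page only says "a timeout", so an overall deadline is equally plausible) and treats a slow or failing strategy as returning no hits. The `run_strategy` and `retrieve_all` helpers and the two demo strategies are hypothetical.

```python
import asyncio


async def run_strategy(name, strategy_fn, query, timeout_s):
    """Run one strategy; a timeout or error yields no hits instead of failing the query."""
    try:
        hits = await asyncio.wait_for(strategy_fn(query), timeout=timeout_s)
        return name, hits
    except Exception:  # includes the timeout raised by wait_for
        return name, []


async def retrieve_all(strategies, query, timeout_s=2.0):
    """Run every enabled strategy concurrently and collect hits per strategy."""
    results = await asyncio.gather(
        *(run_strategy(name, fn, query, timeout_s) for name, fn in strategies.items())
    )
    return dict(results)


# Demo strategies standing in for real retrievers.
async def fast_vector(query):
    return [{"id": "c1", "score": 0.9, "why": {"strategy": "vector"}}]


async def slow_graph(query):
    await asyncio.sleep(1.0)
    return [{"id": "c2", "score": 0.5, "why": {"strategy": "graph"}}]
```

Calling `asyncio.run(retrieve_all({"vector": fast_vector, "graph": slow_graph}, "q", timeout_s=0.05))` returns the vector hit and an empty list for the slow graph strategy, so one laggard cannot stall the whole query.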
Weighted RRF fusion
WeightedRRF merges, deduplicates by id, and normalizes to 0–1 using the intent-scaled weights.
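For readers unfamiliar with reciprocal-rank fusion, here is a small sketch of a weighted variant. Each strategy contributes `weight / (k + rank)` per id, ids are merged across strategies, and scores are divided by the maximum so the top hit is 1.0. The function name, the conventional `k = 60`, and max-normalization are assumptions; the actual WeightedRRF may differ.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, k=60):
    """ranked_lists: {strategy: [ids, best first]} -> [(id, score in 0..1)]."""
    scores = defaultdict(float)
    for strategy, ids in ranked_lists.items():
        w = weights.get(strategy, 1.0)
        # dict.fromkeys drops repeated ids within one list, keeping first position.
        for rank, doc_id in enumerate(dict.fromkeys(ids), start=1):
            scores[doc_id] += w / (k + rank)
    if not scores:
        return []
    top = max(scores.values()) or 1.0  # guard against all-zero weights
    fused = [(doc_id, s / top) for doc_id, s in scores.items()]
    return sorted(fused, key=lambda item: (-item[1], item[0]))


fused = weighted_rrf(
    {"vector": ["a", "b", "c"], "causal": ["b", "d"]},
    {"vector": 1.0, "causal": 1.5},
)
```

In this example `b` ranks first because two strategies agree on it, and `d` outranks `a` because the causal strategy carries the larger weight.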
Composed context + trace_id
The ContextComposer bins hits, fills a token budget, applies MMR for diversity, and persists a RetrievalTrace whose trace_id is returned — a structured bundle plus a permalink to its full reasoning.
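The budget-plus-diversity step can be sketched as a greedy MMR loop. This version skips the binning into causal chains, timelines, and so on, estimates tokens by whitespace splitting, and uses `lam = 0.7`; those choices and the `compose` helper are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def compose(hits, token_budget, lam=0.7):
    """hits: dicts with 'id', 'text', 'score', 'vec'. Greedy MMR under a token budget."""
    selected, used = [], 0
    pool = list(hits)
    while pool:
        def mmr(hit):
            # Penalize similarity to anything already selected.
            redundancy = max((cosine(hit["vec"], s["vec"]) for s in selected), default=0.0)
            return lam * hit["score"] - (1 - lam) * redundancy

        best = max(pool, key=mmr)
        pool.remove(best)
        cost = len(best["text"].split())  # crude token estimate
        if used + cost <= token_budget:
            selected.append(best)
            used += cost
    return selected


hits = [
    {"id": "A", "text": "db pool exhausted", "score": 0.90, "vec": [1.0, 0.0]},
    {"id": "B", "text": "db pool exhausted", "score": 0.85, "vec": [1.0, 0.0]},
    {"id": "C", "text": "rollback fixed it", "score": 0.60, "vec": [0.0, 1.0]},
]
context = compose(hits, token_budget=6)
```

Here the near-duplicate `B` loses to the lower-scored but distinct `C`, and the budget then leaves no room for `B` at all.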
Beyond similarity search.
Where it earns its keep.
DevOps / incident retrieval
“What caused last week's outage, and how was it fixed?” The causal graph traverses CAUSED_BY / FIXED_BY edges across episodes.
Agent long-term memory
Persistent, decaying, reinforceable memory so agents recall prior sessions without re-stuffing the prompt.
Explainable KB search
Surface supporting and contradicting passages, with the reasoning visible for audit.
Research / document understanding
Entity-linked, cross-document relationships beyond keyword or vector matching.
This is a context-retrieval showcase, not a chatbot.
Type a query and watch all seven strategies light up — see the per-strategy hits, the fused ranking, and the reasoning behind each. Context is shown to you, not silently piped to an LLM. Want a synthesis? Hit Summarize context — the one place the LLM is intentionally invoked.
Try a query
Questions, answered.
This is not just vector search. FAISS handles ANN search, but the engine layers a graph, episodic/causal memory, and intent-aware fusion on top. Vector search is one of seven strategies.
Retrieval you can reason about.
Give your LLM a memory that remembers in graphs, episodes, and cause-and-effect — and shows you every step.