Cortex
Cognitive Retrieval Engine — live

Context retrieval that shows its work.

Most retrieval systems return a flat list of chunks and hope for the best. The Cortex runs seven retrieval strategies in parallel — semantic, hybrid, temporal, episodic, causal, graph, and preference — fuses them with intent-aware weighting, and hands back a fully traceable context bundle. Every result tells you why it was retrieved.

7 strategies, fused · every hit traced · per-user isolation
Illustrative why trace (cognitive-memory-engine / retrieval-trace, status: traced): each hit carries strategy · score · graph path · recency · weight.

The problem

Vector search alone forgets too much.

When retrieval is a black box, you can't trace your LLM's mistakes back to the context behind them. You're left re-ranking blind and inflating the context window until cost and latency hurt.

  • A cosine-similarity lookup can't tell you that Incident A caused Outage B.
  • It doesn't know that three chunks belong to the same episode, told in order.
  • It treats a six-month-old note and a fresh one as equally relevant.
  • And it never explains why a chunk surfaced — so you can't debug a bad answer.
The solution

Cognition, not just similarity.

Cognitive, not just semantic

Seven strategies cover different ways of remembering: meaning (vector/hybrid), time (temporal/episodic), cause-and-effect (causal), structure (graph), and user intent (preference). Fusion blends them by detected intent.

Fully traceable

Every retrieved hit carries a why trace — strategy, score, graph path, recency, and weights. A retrieval-trace DAG visualizes the whole pipeline. No more guessing why the model saw what it saw.

Graph-aware memory

A Neo4j knowledge graph links documents, chunks, entities, episodes, and events with typed relationships — SUPPORTS, CONTRADICTS, CAUSED_BY, FIXED_BY, DEPENDS_ON, and more. Retrieval can walk these edges, not just match vectors.

7 retrieval strategies: run concurrently, fused into one ranked list.

5 cognitive layers: semantic · graph · temporal · episodic · causal.

4 purpose-built stores: FAISS · Postgres · Neo4j · Redis, with no blurred boundaries.

Core Architecture

Retrieval, rebuilt as cognition.

Seven strategies, fused by intent, and a full reasoning trace for every result. Built for engineers who need to debug what their model sees.

Seven retrieval strategies, fused

Vector, hybrid (BM25 + dense), temporal, episodic, causal, graph, and preference — run concurrently and merged by intent-weighted reciprocal-rank fusion into a single ranked list. No strategy is ever silenced; intent only re-scales the weights.
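Weighted reciprocal-rank fusion fits in a few lines. This is a minimal illustration, not Cortex's WeightedRRF class: it assumes each strategy returns an ordered list of hit ids, uses the common RRF constant k = 60, and scales scores into 0–1 by dividing by the top score. The function name `weighted_rrf` and its signature are hypothetical.

```python
from collections import defaultdict


def weighted_rrf(
    ranked: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Hypothetical weighted RRF: fuse per-strategy rankings into one list.

    Each hit earns weight / (k + rank) from every strategy that returned it,
    so ids returned by several strategies are merged (deduplicated) and summed.
    Scores are divided by the top score, so the best hit scores 1.0.
    """
    scores: dict[str, float] = defaultdict(float)
    for strategy, hit_ids in ranked.items():
        weight = weights.get(strategy, 1.0)  # assumption: unlisted strategies keep 1.0
        seen: set[str] = set()
        for rank, hit_id in enumerate(hit_ids, start=1):
            if hit_id in seen:  # count each id once per strategy
                continue
            seen.add(hit_id)
            scores[hit_id] += weight / (k + rank)
    if not scores:
        return []
    top = max(scores.values())
    if top <= 0:  # all weights zero: nothing to normalize against
        return [(hit_id, 0.0) for hit_id in scores]
    fused = [(hit_id, score / top) for hit_id, score in scores.items()]
    fused.sort(key=lambda pair: pair[1], reverse=True)
    return fused
```

Because intent only re-scales `weights`, a strategy with a small weight still contributes, and a hit found by two strategies outranks one found by either alone at the same rank.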

vector · hybrid · temporal · episodic · causal · graph · preference
Intent-aware

Intent-weighted fusion

A lightweight classifier reads the query's intent and re-scales strategy weights — causal queries lean on the causal graph, recall queries lean on episodes. Then weighted RRF deduplicates and normalizes into one list.

Illustrative weight bars: causal · vector · preference
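A rule-based intent classifier that re-scales rather than gates the weights might look like the sketch below. The keyword rules, the intent names (`causal`, `recall`, `preference`, `general`) and the boost table are hypothetical placeholders. What matters is the shape: a distribution over intents, and weights that only ever grow from a shared base, so every strategy keeps a nonzero say.

```python
import re

# Hypothetical keyword rules; the real classifier's rules aren't published here.
INTENT_PATTERNS: dict[str, re.Pattern] = {
    "causal": re.compile(r"\b(?:why|root cause|causes?|caused|fix(?:ed|es)?)\b", re.I),
    "recall": re.compile(r"\b(?:last (?:week|time|session)|remember|earlier|previously)\b", re.I),
    "preference": re.compile(r"\b(?:i prefer|i like|my (?:style|preference)s?)\b", re.I),
}

# Hypothetical boosts: how strongly each intent favours each strategy.
INTENT_BOOSTS: dict[str, dict[str, float]] = {
    "causal": {"causal": 1.0, "graph": 0.5},
    "recall": {"episodic": 1.0, "temporal": 0.5},
    "preference": {"preference": 1.0},
}

STRATEGIES = ("vector", "hybrid", "temporal", "episodic", "causal", "graph", "preference")


def classify_intent(query: str) -> dict[str, float]:
    """Turn keyword hits into a distribution over intents ("general" if none match)."""
    counts = {intent: len(p.findall(query)) for intent, p in INTENT_PATTERNS.items()}
    total = sum(counts.values())
    if total == 0:
        return {"general": 1.0}
    return {intent: n / total for intent, n in counts.items() if n}


def scale_weights(intent: dict[str, float], base: float = 1.0) -> dict[str, float]:
    """Re-scale fusion weights by intent; a weight can grow but never drops below base."""
    weights = {}
    for strategy in STRATEGIES:
        boost = sum(
            p * INTENT_BOOSTS.get(name, {}).get(strategy, 0.0) for name, p in intent.items()
        )
        weights[strategy] = base * (1.0 + boost)
    return weights
```

With these placeholder rules, "Why did the deploy fail, and what caused it?" maps entirely to the causal intent: the causal weight doubles, graph rises by half, and the other strategies keep their base weight.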

Full retrieval traces

Every run persists a RetrievalTrace with per-strategy outputs and size-capped why payloads, visualized as an interactive React Flow DAG.

  • per-strategy hits + timing
  • intent + fusion weights
  • graph path per hit
Trace DAG (illustrative): query → vector · causal · graph → fused + trace_id
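A persisted trace with size-capped why payloads could be modelled roughly as follows. `RetrievalTrace` and `RetrievalHit` are named in the text, but every field shown here, the `cap_why` helper, and the 2 KB cap are assumptions for illustration. Capping drops the largest fields first and records which ones were dropped, so the trace stays honest about what is missing.

```python
import json
import uuid
from dataclasses import dataclass, field

MAX_WHY_BYTES = 2048  # hypothetical cap; the real limit isn't stated


def _json_size(payload: dict) -> int:
    return len(json.dumps(payload, default=str).encode("utf-8"))


def cap_why(why: dict, limit: int = MAX_WHY_BYTES) -> dict:
    """Drop the largest why fields until the JSON payload fits in `limit` bytes."""
    capped = dict(why)
    dropped: list[str] = []

    def with_marker(d: dict) -> dict:
        return {**d, "_dropped": dropped} if dropped else d

    while capped and _json_size(with_marker(capped)) > limit:
        largest = max(capped, key=lambda key: _json_size({key: capped[key]}))
        dropped.append(largest)
        del capped[largest]
    return with_marker(capped)


@dataclass
class RetrievalHit:  # fields are illustrative
    id: str
    strategy: str
    score: float
    why: dict  # e.g. {"graph_path": [...], "recency": 0.9, "weight": 1.2}


@dataclass
class RetrievalTrace:  # fields are illustrative
    query: str
    intent: dict[str, float]
    weights: dict[str, float]
    per_strategy: dict[str, list[RetrievalHit]]
    timings_ms: dict[str, float]
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        """Serialize for persistence, capping every why payload."""
        return json.dumps({
            "trace_id": self.trace_id,
            "query": self.query,
            "intent": self.intent,
            "weights": self.weights,
            "timings_ms": self.timings_ms,
            "per_strategy": {
                strategy: [
                    {"id": h.id, "score": h.score, "why": cap_why(h.why)} for h in hits
                ]
                for strategy, hits in self.per_strategy.items()
            },
        }, default=str)
```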

Per-user knowledge graph

Neo4j stores documents, chunks, entities, episodes, and events with typed, timestamped edges. Retrieval traverses 1–2 hops for context the vectors miss.
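The 1–2 hop expansion can be illustrated without a database. In Neo4j this would be a variable-length pattern such as `MATCH p = (c:Chunk {id: $id})-[*1..2]-(n) RETURN n, p`. The sketch below does a comparable breadth-first walk over a hypothetical in-memory adjacency map standing in for one user's graph, keeping one shortest path per reached node so it can go straight into the hit's why trace.

```python
from collections import deque

# Hypothetical stand-in for one user's graph:
# node id -> list of (relationship type, neighbour id), as stored (directed).
Graph = dict[str, list[tuple[str, str]]]


def expand(graph: Graph, seeds: list[str], max_hops: int = 2) -> dict[str, list[str]]:
    """Return nodes within `max_hops` of any seed, each with the path that reached it.

    Paths alternate node ids and relationship types, e.g.
    ["chunk:1", "MENTIONS", "entity:redis"].
    """
    paths: dict[str, list[str]] = {s: [s] for s in seeds}
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:  # don't walk past the hop limit
            continue
        for rel, neighbour in graph.get(node, []):
            if neighbour in paths:  # BFS: the first path found is a shortest one
                continue
            paths[neighbour] = paths[node] + [rel, neighbour]
            queue.append((neighbour, depth + 1))
    return {n: p for n, p in paths.items() if n not in seeds}
```

Edges are followed only in their stored direction here; for an undirected walk like the Cypher pattern above, store each edge in both directions.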

Episodic & causal memory

Events are grouped into ordered episodes; declared and inferred CAUSED_BY / FIXED_BY edges let you ask “what caused this?” and “how was it fixed?”
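Answering “what caused this?” and “how was it fixed?” amounts to walking CAUSED_BY edges back from an event and collecting FIXED_BY edges along the way. A minimal sketch, assuming hypothetical in-memory maps with a single recorded cause per event (a real graph can hold several), plus a guard in case edges ever loop back:

```python
def causal_chain(caused_by: dict[str, str], event: str, max_hops: int = 5) -> list[str]:
    """Follow CAUSED_BY edges from `event` back towards a root cause.

    `caused_by` maps an effect to its cause, matching the edge direction
    (Outage B -[:CAUSED_BY]-> Incident A).
    """
    chain = [event]
    while len(chain) <= max_hops and chain[-1] in caused_by:
        cause = caused_by[chain[-1]]
        if cause in chain:  # stop if the edges ever form a loop
            break
        chain.append(cause)
    return chain


def fixes_along(fixed_by: dict[str, list[str]], chain: list[str]) -> dict[str, list[str]]:
    """Collect FIXED_BY targets for every event on the chain."""
    return {event: fixed_by[event] for event in chain if event in fixed_by}
```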

Entity layer

Documents are mined for entities, linked to the chunks that mention them and to each other via DEPENDS_ON. Powers entity-aware graph expansion.
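Entity mining and entity-aware expansion can be sketched with a plain vocabulary match. This is a deliberately naive stand-in, since the actual extractor isn't described: it assumes single-word entity names from a known vocabulary, builds MENTIONS links from each entity to the chunks containing it, and expands a query by one DEPENDS_ON hop before collecting chunks. Both function names are hypothetical.

```python
import re
from collections import defaultdict


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def mine_mentions(chunks: dict[str, str], vocabulary: set[str]) -> dict[str, set[str]]:
    """Map each vocabulary entity to the ids of chunks that mention it (MENTIONS)."""
    mentions: dict[str, set[str]] = defaultdict(set)
    for chunk_id, text in chunks.items():
        for entity in vocabulary & _tokens(text):
            mentions[entity].add(chunk_id)
    return dict(mentions)


def entity_expand(
    query: str,
    mentions: dict[str, set[str]],
    depends_on: dict[str, list[str]],
) -> set[str]:
    """Chunks mentioning a query entity or anything it DEPENDS_ON (one hop)."""
    known = set(mentions) | set(depends_on)
    entities = _tokens(query) & known
    entities |= {dep for e in entities for dep in depends_on.get(e, [])}
    return {chunk for e in entities for chunk in mentions.get(e, set())}
```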

Per-user, per-namespace vectors

FAISS indexes are isolated per user and per namespace (documents, memories, episodes, events). Lazy-loaded, snapshotted to local/S3, hot-reloaded across processes.
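Structural isolation means no code path searches without a user: every lookup is keyed by `(namespace, user_id)`. A minimal sketch of a lazily loading registry, where `IndexRegistry` and the `loader` callback are hypothetical; the loader is where a real version would, for example, read a FAISS snapshot from local disk or S3.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
NAMESPACES = frozenset({"documents", "memories", "episodes", "events"})


class IndexRegistry(Generic[T]):
    """Lazily loads one index per (namespace, user_id); there is no global index."""

    def __init__(self, loader: Callable[[str, str], T]) -> None:
        self._loader = loader  # hypothetical hook, e.g. load a snapshot
        self._indexes: dict[tuple[str, str], T] = {}
        self._lock = threading.Lock()

    def get(self, namespace: str, user_id: str) -> T:
        if namespace not in NAMESPACES:
            raise ValueError(f"unknown namespace: {namespace}")
        key = (namespace, user_id)
        with self._lock:
            if key not in self._indexes:  # first access triggers the load
                self._indexes[key] = self._loader(namespace, user_id)
            return self._indexes[key]
```

Loading under the lock keeps the sketch simple; a busier service might load outside it so one user's cold start doesn't block others.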

Pluggable LLM adapters

OpenAI, Anthropic, Google Gemini, and local Ollama — behind one adapter interface. Embeddings are decoupled from chat and include a keyless local option.
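One way to express one adapter interface with embeddings decoupled from chat is a pair of `typing.Protocol` classes. Both protocols and their method names are hypothetical. `HashingEmbedder` is a dependency-free toy based on feature hashing, not the sentence-transformers option listed below; it only shows that retrieval code can depend on the embedding protocol alone.

```python
import hashlib
import math
from typing import Protocol


class ChatAdapter(Protocol):
    """Hypothetical chat interface an OpenAI, Anthropic, Gemini, or Ollama adapter would implement."""

    def complete(self, system: str, user: str) -> str: ...


class EmbeddingAdapter(Protocol):
    """Hypothetical embedding interface, kept separate from chat."""

    dim: int

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy keyless embedder: hashes each token into one of `dim` buckets."""

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                digest = hashlib.sha1(token.encode("utf-8")).digest()
                vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
            norm = math.sqrt(sum(x * x for x in vec))
            vectors.append([x / norm for x in vec] if norm else vec)  # L2-normalize
        return vectors


def embedding_dim(adapter: EmbeddingAdapter) -> int:
    """Retrieval code depends only on the protocol, not on a provider."""
    return adapter.dim
```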

Memory hierarchy

Short-term (Redis), long-term (FAISS-backed), semantic (graph), and episodic memory — with decay, reinforcement, and archival.
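Decay, reinforcement, and archival can be modelled with an exponential half-life. The 30-day half-life, the reinforcement boost, and the archival threshold are hypothetical values, and exponential decay is one common choice rather than a documented detail of this engine. Times are plain day counts to keep the sketch self-contained.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = 30.0   # hypothetical: strength halves every 30 idle days
REINFORCE_BOOST = 0.5   # hypothetical: strength added on each recall
ARCHIVE_BELOW = 0.05    # hypothetical: archive once strength drops below this


@dataclass
class Memory:  # hypothetical shape
    text: str
    strength: float = 1.0
    last_access_day: float = 0.0  # time in days, to keep the sketch clock-free

    def current_strength(self, now_day: float) -> float:
        """Exponential decay since the last access."""
        idle = max(0.0, now_day - self.last_access_day)
        return self.strength * 0.5 ** (idle / HALF_LIFE_DAYS)

    def reinforce(self, now_day: float) -> None:
        """A recall keeps the decay so far, adds a boost (capped at 1.0), and resets the clock."""
        self.strength = min(1.0, self.current_strength(now_day) + REINFORCE_BOOST)
        self.last_access_day = now_day

    def should_archive(self, now_day: float) -> bool:
        return self.current_strength(now_day) < ARCHIVE_BELOW
```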

Adaptive context composer

Token-budgeted bin-filling + MMR diversity, producing structured context (causal chains, timelines, prior incidents, knowledge) ready to prompt.
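Token-budgeted bin filling with MMR can be sketched in two steps: split the budget across bins by share, then, inside each bin, greedily take the candidate with the best maximal-marginal-relevance score, λ·relevance − (1 − λ)·(max similarity to what is already picked), as long as it still fits. The bin names come from the text; the shares, λ = 0.7, the `Candidate` type, and the caller-supplied `similarity` function are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:  # hypothetical hit shape
    id: str
    relevance: float  # e.g. the fused 0-1 score
    tokens: int


Similarity = Callable[[str, str], float]  # assumed to return values in [0, 1]


def mmr_fill(cands: list[Candidate], similarity: Similarity, budget: int, lam: float = 0.7) -> list[str]:
    """Greedy MMR selection that never exceeds `budget` tokens."""
    selected: list[Candidate] = []
    remaining = list(cands)
    used = 0
    while remaining:
        def mmr(c: Candidate) -> float:
            redundancy = max((similarity(c.id, s.id) for s in selected), default=0.0)
            return lam * c.relevance - (1 - lam) * redundancy

        best = max(remaining, key=mmr)
        remaining.remove(best)
        if used + best.tokens > budget:
            continue  # too big for what's left; a smaller candidate may still fit
        selected.append(best)
        used += best.tokens
    return [c.id for c in selected]


def compose(
    bins: dict[str, list[Candidate]],
    shares: dict[str, float],
    total_budget: int,
    similarity: Similarity,
) -> dict[str, list[str]]:
    """Give each bin (e.g. "causal_chains", "knowledge") its share of the budget, then fill it."""
    return {
        name: mmr_fill(cands, similarity, int(total_budget * shares.get(name, 0.0)))
        for name, cands in bins.items()
    }
```

For simplicity, tokens a bin leaves unused are not handed to other bins.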

Architecture

Four stores, one responsibility each.

No blurred boundaries. Each store does exactly one job, and per-user isolation is structural — not a filter bolted on.

FAISS

ANN semantic search

The only ANN semantic-search path. Indexes are per-user, per-namespace — keyed by (namespace, user_id) over documents, memories, episodes, events. Snapshotted to local/S3, lazily loaded on first access, hot-reloaded across processes via a Redis version key.

Postgres 16

Source of truth

Source of truth for all structured data: users, documents, chunks (with raw embedding JSON as the FAISS rebuild source), memories, events, episodes, entities, and retrieval traces. No vector search.

Neo4j 5 (+APOC)

Relationship graph

The relationship / context graph. Nodes: :Document, :Chunk, :Entity, :Episode, :Event. Typed edges: :HAS_CHUNK, :NEXT, :SUPPORTS / :CONTRADICTS / :ELABORATES, :MENTIONS, :DEPENDS_ON, :HAS_EVENT, :CAUSED_BY / :FIXED_BY.

Redis

STM / cache / pub-sub

Session memory (STM), embedding cache, and FAISS version pub/sub (vector:{ns}:{user_id}:version).
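Cross-process hot reload through the version key can be sketched as a check on access: if `vector:{ns}:{user_id}:version` has changed since the index was loaded, reload it. The text describes pub/sub; this sketch swaps in simple polling of the key to stay self-contained, and the class name and `loader` hook are hypothetical. `store` only needs a `get(key)` method, which `redis.Redis.get` provides (returning bytes, or None when the key is missing); a writer would bump the key, for example with Redis INCR, after saving a new snapshot.

```python
from typing import Callable, Generic, Optional, Protocol, TypeVar

T = TypeVar("T")


class VersionStore(Protocol):
    """Anything with get(key); redis.Redis fits this shape."""

    def get(self, key: str) -> Optional[bytes]: ...


def version_key(ns: str, user_id: str) -> str:
    # Key format taken from the Redis description above.
    return f"vector:{ns}:{user_id}:version"


class HotReloadingIndex(Generic[T]):
    """Hypothetical wrapper: reloads the index whenever the shared version key changes."""

    def __init__(self, store: VersionStore, ns: str, user_id: str,
                 loader: Callable[[str, str], T]) -> None:
        self._store = store
        self._key = version_key(ns, user_id)
        self._ns, self._user_id = ns, user_id
        self._loader = loader  # hypothetical hook, e.g. read the latest snapshot
        self._loaded = False
        self._version: Optional[bytes] = None
        self._index: Optional[T] = None

    def get(self) -> T:
        current = self._store.get(self._key)
        if not self._loaded or current != self._version:
            self._index = self._loader(self._ns, self._user_id)
            self._version = current
            self._loaded = True
        return self._index  # type: ignore[return-value]
```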

FAISS is the only ANN path

Postgres embedding JSON columns are cold-storage rebuild sources — never queried for search. (pgvector was removed in migration 0008.)

Per-user isolation is structural

Not a filter bolted on — FAISS indexes and the knowledge graph are both scoped per user.

Frontend: Next.js · React · TypeScript
↓ HTTP / SSE
API: FastAPI · REST & streaming
↓
Cognitive Retrieval Engine: multi-strategy retrieval, fusion & context composition
↓
FAISS (vector) · Postgres (relational) · Neo4j (graph) · Redis (cache)

Celery workers: async background processing

FastAPI · Celery · SQLAlchemy + Alembic · FAISS · Neo4j 5 (+APOC) · Postgres 16 · Redis · Next.js + React · Cytoscape.js + React Flow · OpenTelemetry

LLM / embeddings are provider-agnostic behind one adapter interface:

OpenAI · Anthropic · Google Gemini · local Ollama · sentence-transformers (embed-only, keyless)
How it works

Two real flows, not one black box.

An asynchronous write path makes data both vector-searchable and graph-traversable; the signature read path fuses seven strategies into a traceable bundle.

The signature pipeline.
01

Query

POST /api/retrieval (showcase) or POST /api/chat (Assistant).

02

Classify intent

A rule-based classifier produces an intent distribution that re-scales (never gates) the fusion weights.

03

7 strategies (parallel)

asyncio.gather runs all enabled strategies with a timeout; each returns scored RetrievalHits carrying a why trace.

04

Weighted RRF fusion

WeightedRRF merges, deduplicates by id, and normalizes to 0–1 using the intent-scaled weights.

05

Composed context + trace_id

The ContextComposer bins hits, fills a token budget, applies MMR for diversity, and persists a RetrievalTrace whose trace_id is returned — a structured bundle plus a permalink to its full reasoning.
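Step 03 can be sketched with `asyncio.gather` plus a per-strategy `asyncio.wait_for`, so one slow or failing strategy degrades to an empty result instead of stalling the whole query. Whether the engine applies its timeout per strategy or to the whole gather isn't stated; per-strategy is an assumption here, as are the `Strategy` signature and the result shape, which records the hits and timing a RetrievalTrace would persist.

```python
import asyncio
import time
from typing import Awaitable, Callable

Strategy = Callable[[str], Awaitable[list[str]]]  # hypothetical: query -> hit ids


async def run_strategies(
    query: str,
    strategies: dict[str, Strategy],
    timeout_s: float = 0.5,
) -> dict[str, dict]:
    """Run every enabled strategy concurrently, each under its own timeout."""

    async def timed(name: str, strategy: Strategy) -> tuple[str, list[str], str, float]:
        start = time.perf_counter()
        try:
            hits = await asyncio.wait_for(strategy(query), timeout=timeout_s)
            status = "ok"
        except asyncio.TimeoutError:
            hits, status = [], "timeout"
        except Exception as exc:  # a failing strategy must not sink the query
            hits, status = [], f"error: {type(exc).__name__}"
        return name, hits, status, (time.perf_counter() - start) * 1000.0

    results = await asyncio.gather(*(timed(n, s) for n, s in strategies.items()))
    return {
        name: {"hits": hits, "status": status, "ms": ms}
        for name, hits, status, ms in results
    }
```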

Why not just RAG?

Beyond similarity search.

Plain RAG → Cortex

  • One similarity search → Seven parallel strategies, fused by intent
  • Flat chunk list → Graph-aware, episode-ordered, causally linked results
  • Black-box ranking → Every hit carries a why trace + visual DAG
  • No sense of time → Temporal & episodic strategies weigh recency and order
  • No cause-and-effect → Causal graph answers “what caused / fixed this”
  • Global index → Per-user, per-namespace isolation
Use cases

Where it earns its keep.

DevOps / incident retrieval

“What caused last week's outage, and how was it fixed?” The causal graph traverses CAUSED_BY / FIXED_BY edges across episodes.

Agent long-term memory

Persistent, decaying, reinforceable memory so agents recall prior sessions without re-stuffing the prompt.

Explainable KB search

Surface supporting and contradicting passages, with the reasoning visible for audit.

Research / document understanding

Entity-linked, cross-document relationships beyond keyword or vector matching.

Showcase

This is a context-retrieval showcase, not a chatbot.

Type a query and watch all seven strategies light up — see the per-strategy hits, the fused ranking, and the reasoning behind each. Context is shown to you, not silently piped to an LLM. Want a synthesis? Hit Summarize context — the one place the LLM is intentionally invoked.

FAQ

Questions, answered.

Cortex is more than a vector database. FAISS handles ANN search, but the engine layers a graph, episodic/causal memory, and intent-aware fusion on top; vector search is one of seven strategies.

Retrieval you can reason about.

Give your LLM a memory that remembers in graphs, episodes, and cause-and-effect — and shows you every step.

Per-user isolation · Every hit traced