Acervo

Semantic compression layer for AI agents.
Your agent's context window is finite. Acervo makes it infinite.

21x efficiency vs a tool-using agent · 100% ground truth · 79 turns tested · 3 domains

v0.5.0 · April 2026 · Apache 2.0 · Open Source

The Problem

Every AI chat sends the entire conversation history on every turn. Turn 1 costs 200 tokens. Turn 50 costs 9,000. Turn 100 hits the context window limit and starts losing information.

Acervo replaces growing history with a constant ~350 tokens of compressed knowledge.
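A rough linear model makes the gap concrete. The ~180 tokens-per-turn average below is an assumption picked to match the turn-50 figure quoted above; real turns vary in length.

```python
# Rough linear model of the token costs described above.
def naive_cost(turn, avg_tokens_per_turn=180):
    # Resending the whole history: cost grows with the turn number.
    return turn * avg_tokens_per_turn

def acervo_cost(turn, compressed_tokens=350):
    # Compressed knowledge: cost is constant regardless of turn number.
    return compressed_tokens

print(naive_cost(50))    # 9000, the figure quoted above
print(acervo_cost(50))   # 350
```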

How It Works

S1: Extract entities & topics
S2: Search graph by topic (BFS)
S3: Compress to ~350 tokens
LLM: Responds with graph knowledge

After the LLM responds, S1.5 extracts new knowledge back into the graph. The graph grows. The context doesn't.
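The loop above can be sketched end to end. Every function name and data structure here is an illustrative stand-in, not Acervo's actual API.

```python
# Minimal runnable sketch of the S1 -> S2 -> S3 pipeline stages.

def extract_entities(text, known):
    # S1 stand-in: keyword spotting against known entity names.
    return [e for e in known if e.lower() in text.lower()]

def bfs_search(graph, seeds, depth=1):
    # S2 stand-in: one-hop BFS from the topic seeds.
    found = set(seeds)
    for _ in range(depth):
        found |= {n for s in list(found) for n in graph.get(s, [])}
    return found

def compress(subgraph):
    # S3 stand-in: serialize the subgraph into a compact context string.
    return "; ".join(sorted(subgraph))

graph = {"Supabase": ["Checkear", "Walletfy"]}
known = ["Supabase", "Firebase"]
seeds = extract_entities("Which projects use Supabase?", known)
context = compress(bfs_search(graph, seeds))
# context now holds only Supabase-related nodes for the LLM prompt
```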

Live Example

Real data from the C1 test scenario. The graph grows as the user talks, and retrieval uses BFS traversal from the topic seed.

👤 "We have 4 projects: Butaco with Angular, Checkear with React and Supabase, Walletfy with Next.js and Supabase, Altovallestudio with Next.js and Firebase"
S1 extracts 9 entities, 18 edges · Graph: 9 nodes
👤 "Which projects use Supabase?"
S2 BFS from Supabase → HOT: Supabase · WARM: Checkear, Walletfy
🤖 "Checkear and Walletfy use Supabase." — 81 tokens of context
👤 "And which ones use Firebase?"
S2 BFS from Firebase → HOT: Firebase · WARM: Butaco, Altovallestudio
🤖 "Butaco and Altovallestudio." — same graph, different traversal

The graph grows. The context doesn't. Turn 50 costs the same as turn 2.
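A minimal sketch of the traversal behind this example, assuming HOT means the seed node and WARM means its direct neighbors. The edge data mirrors the C1 scenario; the code is illustrative, not Acervo's implementation.

```python
from collections import deque

edges = {
    "Supabase": ["Checkear", "Walletfy"],
    "Firebase": ["Butaco", "Altovallestudio"],
}

def layers(graph, seed):
    # Standard BFS, recording distance from the seed node.
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, []):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    hot = [n for n, d in dist.items() if d == 0]   # the seed itself
    warm = [n for n, d in dist.items() if d == 1]  # direct neighbors
    return hot, warm

hot, warm = layers(edges, "Supabase")
# hot  == ["Supabase"]
# warm == ["Checkear", "Walletfy"]
```

Asking about Firebase reruns the same traversal from a different seed, which is why the context switches without touching the rest of the graph.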

The Efficiency Story

An agent with tools makes 2–3 tool calls per question, consuming 7,000+ tokens. Acervo answers the same questions with ~350 tokens and zero tool calls.

21x fewer tokens than an agent with tools · same questions, same correct answers
12x in v0.4 → 21x in v0.5 (format compression)
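The headline ratio is a back-of-envelope division of the figures above:

```python
# Back-of-envelope check of the headline ratio, using the figures
# quoted in the text.
agent_tokens = 7000     # lower bound from "7,000+ tokens"
acervo_tokens = 350
ratio = agent_tokens / acervo_tokens
print(ratio)            # 20.0, in line with the measured 21.3x
```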

What We Test

RESOLVE 85% · GROUND 100% · RECALL 67% · FOCUS 100% · ADAPT 89%

RESOLVE (85%) — Can Acervo answer questions that need project knowledge? A stateless LLM can't. An agent needs 3 tool calls. Acervo has it in the graph.

GROUND (100%) — Does Acervo prevent hallucination? "Does this project use GraphQL?" — Acervo checks the graph: no GraphQL node.

RECALL (67%) — Can Acervo remember facts from earlier? Our weakest category. Improving extraction = improving recall.

FOCUS (100%) — Does Acervo send only relevant context? When you ask about auth, only auth nodes arrive.

ADAPT (89%) — Can Acervo handle topic changes? BFS starts from a different seed node — context switches instantly.

Graph Quality

v0.5 introduced automated quality specs: verify the graph contains what it SHOULD and doesn't contain what it SHOULDN'T. No more phantom entities.

Project                          Checks  Entities  Nodes  Edges
P1 Code (Todo App)               28/28   7         231    1,109
P2 Literature (Sherlock Holmes)  21/21   5         40     307
P3 PM Docs                       32/32   6         108    331
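A quality spec in the SHOULD / SHOULDN'T style reduces to plain set checks. Node names and the function shape here are illustrative, not Acervo's spec format.

```python
# Sketch of a SHOULD / SHOULDN'T graph quality spec.

def check_graph(nodes, should, should_not):
    missing = [n for n in should if n not in nodes]      # SHOULD-haves absent
    phantom = [n for n in should_not if n in nodes]      # SHOULDN'T-haves present
    return missing, phantom

nodes = {"Checkear", "React", "Supabase"}
missing, phantom = check_graph(
    nodes,
    should=["Checkear", "Supabase"],   # the graph must contain these
    should_not=["GraphQL"],            # ...and must not contain phantoms
)
# the spec passes when both lists come back empty
```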

Conversation Scenarios

v0.4 only worked with pre-indexed projects. v0.5 adds real-time conversation memory: as you chat, the knowledge graph grows. Every fact becomes a node. Every relationship becomes an edge.

Scenario                     Turns  Passed  Graph      Entity Acc
C1: Multi-project Portfolio  10     7/10    13n / 27e  72%
C2: Personal Knowledge       6      3/6     5n / 4e    60%
C3: Progressive Building     8      7/8     6n / 5e    83%
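The real-time growth described above reduces to a simple invariant: facts add nodes, relationships add edges. A minimal sketch with illustrative structures, not Acervo's internal model:

```python
# Every fact becomes a node, every relationship an edge.
graph = {"nodes": set(), "edges": set()}

def learn(subject, relation, obj):
    graph["nodes"] |= {subject, obj}
    graph["edges"].add((subject, relation, obj))

learn("Walletfy", "uses", "Next.js")
learn("Walletfy", "uses", "Supabase")
# graph now holds 3 nodes and 2 edges; the per-turn context sent to the
# LLM stays constant while the graph keeps growing
```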

v0.4 → v0.5

Metric                  v0.4.0                  v0.5.0                Change
Architecture            God module (1,848 LOC)  Hexagonal (~200 LOC)  Refactor
GROUND accuracy         92%                     100%                  +8%
Efficiency ratio        12.1x                   21.3x                 +76%
Conversation pipeline   Not working             71% pass rate         NEW
BFS semantic layers     Not working             HOT/WARM/COLD         NEW
Graph quality specs     None                    85/85 checks          NEW
warm_tokens > 0 (conv)  0%                      80%+
RESOLVE accuracy        100%                    85%                   -15%

Known Gaps

v0.5 is not perfect. Here's what we know doesn't work well yet.

RECALL: 67%
S1.5 doesn't always extract facts from the assistant's response. Fix: more S1.5 training examples in fine-tune v3.
RESOLVE dropped 100% → 85%
BFS-based S2 is more precise but sometimes misses the right seed node. Fix: combine BFS with keyword fallback in v0.6.
Person extraction in non-technical context
The model misses person names outside code discussions. Fix: more diverse training data.
S1 intent: 78% accuracy
Over-classifies as "overview" when it should be "specific." Fix: more intent training examples.
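The keyword fallback proposed for the RESOLVE gap could look something like this sketch; the names and the substring matching rule are assumptions, not the planned v0.6 implementation.

```python
# Hypothetical BFS seed selection with a keyword fallback: prefer an
# exact node match, else fall back to substring matching over node names.

def pick_seed(topic, node_names):
    if topic in node_names:
        return topic                       # exact match: BFS seeds here
    t = topic.lower()
    for name in node_names:                # keyword fallback
        if t in name.lower() or name.lower() in t:
            return name
    return None                            # no seed found

print(pick_seed("supabase auth", ["Supabase", "Firebase"]))  # Supabase
```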