Roadmap

Current version: v0.3.0 (PyPI · Changelog)

v0.3.0 proved that conversation memory works: 76% token savings, 94% context hits. Now we need to prove that indexation works just as well, make Acervo usable from any tool, and then scale.

Small versions, one theme per release, each with publishable benchmarks.

Overview

v0.3.0  ========  DONE    Conversation memory works (benchmarks, CLI, .md ingestion)
v0.4.0  ========  NEXT    Indexation works for real (4 domains, formats, fine-tune v2)
v0.5.0  ========          Usable from anywhere (MCP server, TS SDK, integrations)
v0.6.0  ========          Production-ready (Docker, metrics, progressive retrieval)
v0.7.0  ========          Ecosystem (multi-tenant, packs, Studio v2)

Version	What it proves	Evidence produced
v0.3.0	Conversation memory	76% savings, scissors chart
v0.4.0	Document indexation	Indexation scorecard, query recall per domain
v0.5.0	Works in any tool	MCP server downloads, integration demos
v0.6.0	Deploy in production	Docker one-liner, live metrics
v0.7.0	Scales to teams	Multi-user demos, knowledge pack catalog

What works today (v0.3.0)

Knowledge graph with JSON persistence
UNIVERSAL / PERSONAL two-layer architecture
prepare() / process() context proxy API
S1 Unified extraction (topic + entities in one call)
S1.5 Async graph curation (merges, corrections)
Fine-tuned extraction model (Qwen 3.5 9B, single model for chat + extraction)
Topic-based context layers (HOT/WARM/COLD)
History windowing (constant token usage)
acervo serve / acervo up — proxy + dev stack
acervo index — structural + semantic codebase indexing
Document ingestion with graph-linked chunks (.md)
Node-scoped chunk retrieval with specificity classifier
Graph inspection CLI (acervo graph show/search/delete/merge/repair)
Configurable logging (--log-level trace|debug|info)
Reproducible benchmarks (360 turns, 6 scenarios, HTML reports)

v0.4.0 — "Indexation works for real"

Goal: Same level of evidence for document indexation that we already have for conversation memory.

New ingestion formats

Format	Parser	Use case
`.txt`	Line/paragraph splitter	Literature, books
`.pdf`	PyMuPDF / pdfplumber	Academic papers, manuals
`.docx`	python-docx	Business docs, specs
`.md`	Already works	Technical docs

Semantic chunking

Replace fixed-size chunking with embedding-based boundary detection. Consecutive paragraphs are embedded; chunk breaks happen where cosine similarity drops. Hierarchical chunks for long documents (section -> subsection -> semantic chunk).

4 domain benchmarks

Domain	Test material	Key metrics
Code	Small (~20 files) + large (~200 files) project	Node coverage, import accuracy, query recall
Literature	Short story + novel	Character coverage, event recall, context tokens
Academic	Short paper + thesis (PDF)	Concept extraction, methodology accuracy
Multi-project	3 simultaneous projects	Project isolation, shared UNIVERSAL nodes

Each domain produces HTML benchmark reports: indexation scorecard, query recall chart, token efficiency comparison (Acervo node-scoped vs global RAG).

Fine-tune v2

Training data from M3 failure modes. Target: extraction accuracy 85% -> 92%+. New training signal: chunk retrieval decisions (summary_only | with_chunks) as S1 output.

v0.5.0 — "Usable from anywhere"

MCP Server

The most important integration. An Acervo MCP server exposing tools (acervo_prepare, acervo_process, acervo_index, acervo_search, acervo_status) and resources (acervo://graph, acervo://nodes/{id}, acervo://traces/latest).

Compatible with Claude Desktop, Cursor, Windsurf, Continue.dev. Install Acervo, add the MCP server to your config, and you have persistent memory in any tool.

TypeScript SDK

npm install acervo-client — REST API wrapper with full types. Examples with Vercel AI SDK and Next.js.

Framework integrations

AcervoMemory — LangChain ConversationBufferMemory drop-in
AcervoRetriever — LlamaIndex retriever drop-in

Same interface, but with knowledge graph compression instead of raw chunks.

v0.6.0 — "Production-ready"

Docker Compose

docker compose up -> Acervo proxy + Ollama + model + ChromaDB. GPU passthrough, persistent volumes. "Try Acervo in 30 seconds."

Progressive retrieval

Hot layer insufficient -> automatically escalate to warm, then cold. Detection via topic confidence and LLM "no info" signals. Configurable budget per layer.

Runtime metrics

GET /acervo/metrics — Prometheus-compatible endpoint. Tokens saved, compression ratio, latency, hit rate. Minimal dashboard in Acervo Studio.

Advanced error recovery

Automatic graph backup before destructive operations. acervo graph rollback to revert. LLM down -> cache last known graph context.

v0.7.0 — "Ecosystem"

Multi-tenant graphs

UNIVERSAL layer shared across users, PERSONAL scoped per API key. Automatic UNIVERSAL node merging.

Knowledge packs

Pre-built domain graphs. acervo pack install javascript. Export/import graphs. GitHub-based registry.

Acervo Studio v2

Real-time graph visualization, visual node/relation editor, session comparator, metrics dashboard.

Model v3+

Extraction accuracy >95%. Multi-language: PT, FR, DE. Document reference extraction. Benchmark vs GPT-4o.

What's NOT planned before v1.0

Feature	Reason
Cloud-hosted version	Needs infra + billing
Voice/audio ingestion	Text-first, niche
Image/multimodal extraction	Requires multimodal model
Graph database backend (Neo4j)	JSON + ChromaDB scales enough
Real-time collaboration	Multi-tenant first

Want to help?

Check out Contributing or open an issue to discuss ideas.