# Document Ingestion
Index markdown files into the knowledge graph. Chunks are linked to graph nodes for node-scoped retrieval.
## How It Works

- **Parse** — the structural parser splits `.md` files into sections by heading
- **Enrich** — the semantic enricher generates embeddings for each chunk
- **Store** — chunks are stored in ChromaDB with their embeddings
- **Link** — chunk IDs are linked to file and section nodes in the graph
When a user asks a question, Acervo activates relevant graph nodes and retrieves only the chunks linked to those nodes (not a global search across all chunks).
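This node-scoped lookup can be sketched in plain Python (the dict-based store, the node IDs, and the `retrieve_node_scoped` helper below are illustrative stand-ins, not Acervo's actual data structures):

```python
# Illustrative in-memory model: every stored chunk is linked to a graph node.
chunks = {
    "c1": {"node_id": "notes_md/intro", "text": "Acervo overview ..."},
    "c2": {"node_id": "notes_md/setup", "text": "Install steps ..."},
    "c3": {"node_id": "other_md/misc",  "text": "Unrelated content ..."},
}

def retrieve_node_scoped(activated_nodes: set[str]) -> list[str]:
    """Return only the chunks linked to activated graph nodes,
    never searching across the full chunk collection."""
    return [c["text"] for c in chunks.values() if c["node_id"] in activated_nodes]

hits = retrieve_node_scoped({"notes_md/intro", "notes_md/setup"})
```

Because candidates are filtered by `node_id` up front, chunks from unrelated files never enter the result set, regardless of embedding similarity.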
## CLI

### Index a file

```sh
acervo index --path docs/notes.md
```

This runs the full pipeline: parse, enrich, store, link.

### Check indexed documents

```sh
acervo graph show --kind file   # list file nodes
acervo graph show my_notes_md   # show node detail with chunk_ids
```
## REST API

### Upload a document

```http
POST /acervo/documents
Content-Type: multipart/form-data

file: notes.md
```

Response:

```json
{
  "document_id": "notes_md",
  "chunk_count": 12,
  "node_id": "notes_md"
}
```
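The upload endpoint takes a standard multipart/form-data body. As a sketch, one way a client could assemble that body with only the Python standard library (the `build_multipart` helper, the `text/markdown` part type, and the host/port in the commented request are assumptions, not part of Acervo's API):

```python
import uuid

def build_multipart(field: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Assemble a single-file multipart/form-data body by hand.
    Returns (body, content_type); an HTTP library's multipart
    support would normally do this for you."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: text/markdown\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"

body, content_type = build_multipart("file", "notes.md", b"# Notes\n")
# To send (host and port assumed, not documented here):
# urllib.request.Request("http://localhost:8000/acervo/documents",
#                        data=body, headers={"Content-Type": content_type})
```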
### List documents

```http
GET /acervo/documents
```

### Document detail

```http
GET /acervo/documents/{id}
```

### Delete a document

```http
DELETE /acervo/documents/{id}
```
Removes the file node, section nodes, and all linked chunks from the vector store.
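In outline, that delete cascade might look like the following (the graph and vector-store shapes here are invented for illustration, not Acervo's internals):

```python
# Toy shapes: a file node lists its section nodes; sections carry chunk_ids.
graph = {
    "notes_md": {"kind": "file", "sections": ["notes_md/intro"]},
    "notes_md/intro": {"kind": "section", "chunk_ids": ["c1", "c2"]},
}
vector_store = {"c1": [0.1, 0.2], "c2": [0.3, 0.4]}  # chunk_id -> embedding

def delete_document(file_id: str) -> None:
    """Remove the file node, its section nodes, and every linked chunk."""
    file_node = graph.pop(file_id)
    for section_id in file_node["sections"]:
        section = graph.pop(section_id)
        for chunk_id in section["chunk_ids"]:
            vector_store.pop(chunk_id, None)

delete_document("notes_md")
```

Walking from the file node through its sections to the chunk IDs guarantees no orphaned embeddings remain in the vector store.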
## Specificity Classifier
Not all questions need chunks. Acervo uses a heuristic classifier to decide:
- **Specific** queries (code, numbers, dates, "show me", "exact") — retrieve the top 3 chunks from activated nodes
- **Conceptual** queries (explain, why, overview, summarize) — use node summaries only
This keeps conceptual answers concise (~100 tokens of context) while specific questions get detailed chunks (~400 tokens).
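A heuristic of this shape can be implemented as simple cue matching; this sketch infers its cue lists from the examples above and is not Acervo's actual classifier:

```python
import re

# Hypothetical cue list inferred from the examples above.
SPECIFIC_CUES = ("show me", "exact")

def classify(query: str) -> str:
    """Return 'specific' (retrieve top-3 chunks from activated nodes)
    or 'conceptual' (node summaries only)."""
    q = query.lower()
    # Digits, inline code, or specificity phrases suggest a detailed lookup.
    if re.search(r"\d|`", q) or any(cue in q for cue in SPECIFIC_CUES):
        return "specific"
    return "conceptual"

classify("Show me the exact config value")  # → "specific"
classify("Why does Acervo use a graph?")    # → "conceptual"
```

Defaulting to "conceptual" keeps the cheaper, summary-only path as the fallback when no cue fires.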
## Supported Formats

Currently only `.md` (Markdown) files are supported. The structural parser splits by heading hierarchy.
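A minimal heading-based splitter, written here only to illustrate the idea (Acervo's structural parser is more involved, and `split_by_heading` is a made-up name):

```python
def split_by_heading(markdown: str) -> list[dict]:
    """Split markdown into sections at ATX headings (#, ##, ...).
    Any text before the first heading becomes a preamble section."""
    sections = []
    title, body = None, []
    for line in markdown.splitlines():
        if line.startswith("#"):
            # Close the previous section, unless it is an empty preamble.
            if title is not None or any(l.strip() for l in body):
                sections.append({"title": title or "(preamble)",
                                 "body": "\n".join(body).strip()})
            title, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    sections.append({"title": title or "(preamble)", "body": "\n".join(body).strip()})
    return sections

sections = split_by_heading("# A\nIntro text\n## B\nDetails")
```

A real parser would also track heading depth to recover the hierarchy; this flat version only separates sections.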
## Configuration

Embeddings must be configured in `.acervo/config.toml`:

```toml
[acervo.embeddings]
url = "http://localhost:11434"
model = "qwen3-embedding"
```
Vector store data is persisted in `.acervo/data/vectordb/`.