# Tutorial: Build a chat with persistent memory in 5 minutes
This guide walks you through setting up Acervo with a local LLM and running a chat that remembers everything across sessions.
## Prerequisites
- Python 3.11 or higher
- LM Studio (free, runs LLMs locally)
## Step 1: Download a model
- Open LM Studio
- Search for `qwen2.5-3b-instruct` in the model browser
- Download it (about 2GB)
- Go to the Local Server tab
- Load the model and click Start Server
- The server starts at `http://localhost:1234/v1`
### Why a 3B model?
Acervo uses a small utility model for entity extraction, topic detection, and query planning. A 3B parameter model is fast and cheap — it runs on most laptops. Your main chat model can be larger (7B, 13B, etc.) and run separately.
## Step 2: Install Acervo
```bash
pip install acervo
```
Or install from source:

```bash
git clone https://github.com/sandyeveliz/acervo.git
cd acervo
pip install -e .
```
## Step 3: Run the example
Save this as `chat.py` (or use `examples/chat.py` from the repo):
```python
import asyncio
from acervo import Acervo, OpenAIClient


async def main():
    llm = OpenAIClient(
        base_url="http://localhost:1234/v1",
        model="qwen2.5-3b-instruct",
        api_key="lm-studio",
    )
    memory = Acervo(llm=llm, owner="User")
    history = [{"role": "system", "content": "You are a helpful assistant."}]
    print("Acervo Chat (type 'quit' to exit)\n")

    while True:
        user_input = input("You: ").strip()
        if not user_input or user_input.lower() in ("quit", "exit", "q"):
            break
        history.append({"role": "user", "content": user_input})

        # Acervo enriches context from the knowledge graph
        prep = await memory.prepare(user_input, history)

        # Call the LLM with enriched context
        response = await llm.chat(prep.context_stack, temperature=0.7, max_tokens=500)
        print(f"AI: {response}\n")
        history.append({"role": "assistant", "content": response})

        # Acervo extracts knowledge from the response
        await memory.process(user_input, response)
        print(f" [{memory.graph.node_count} nodes, {memory.graph.edge_count} edges]\n")


asyncio.run(main())
```
Run it:

```bash
python chat.py
```
## Step 4: Try it out
Tell the agent some facts:

```
You: My name is Sandy and I live in Cipolletti
AI: Nice to meet you, Sandy! What would you like to talk about?
 [2 nodes, 1 edges]

You: I work at Altovallestudio, we build software
AI: Interesting! What kind of projects does Altovallestudio work on?
 [3 nodes, 3 edges]
```
Now quit (type `quit` or press Ctrl+C) and restart the script. The graph persists:

```
You: What do you know about me?
AI: You're Sandy, you live in Cipolletti and work at Altovallestudio.
 [3 nodes, 3 edges]
```
The knowledge survived the restart because Acervo stores it in `data/graph/nodes.json`.
## Step 5: Look at the graph
Open `data/graph/nodes.json` to see what Acervo extracted:
```json
[
  {
    "id": "sandy",
    "label": "Sandy",
    "type": "Persona",
    "layer": "PERSONAL",
    "owner": "User",
    "facts": [
      {"fact": "Sandy lives in Cipolletti", "source": "user"},
      {"fact": "Sandy works at Altovallestudio", "source": "user"}
    ]
  },
  {
    "id": "cipolletti",
    "label": "Cipolletti",
    "type": "Lugar",
    "layer": "UNIVERSAL",
    "facts": []
  }
]
```
And `data/graph/edges.json` for the relationships:
```json
[
  {"source": "sandy", "target": "cipolletti", "relation": "ubicado_en"},
  {"source": "sandy", "target": "altovallestudio", "relation": "TRABAJA_EN"}
]
```
## What's happening under the hood
Each turn, Acervo runs a 3-step pipeline:
```mermaid
flowchart LR
    subgraph prepare["acervo.prepare()"]
        TD["Topic\nDetector"] --> QP["Query\nPlanner"]
        QP --> CI["Context\nIndex"]
    end
    subgraph process["acervo.process()"]
        EX["Extractor"] --> GP["Graph\nPersist"]
    end
    UM["User\nmessage"] --> prepare
    prepare --> LLM["Your app\ncalls LLM"]
    LLM --> process
    process --> KG["Knowledge\ngraph grows"]
    style prepare fill:#2d5016,stroke:#4a8c1c
    style process fill:#2d5016,stroke:#4a8c1c
    style LLM fill:#1a3a5c,stroke:#3a7abd
```
1. `prepare()` detects the topic, plans what data to retrieve, and builds a context stack from the knowledge graph
2. Your app calls the LLM with the enriched context
3. `process()` extracts entities, relations, and facts from the response and persists them to the graph
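The ordering is the contract: `prepare()` runs before the LLM call, `process()` after it. A minimal sketch of one turn with stubbed-out stages (the stub bodies and `StubPrep` class are illustrative, not Acervo's real internals; only the call order and shapes mirror `chat.py` above):

```python
import asyncio

calls = []  # records the order the pipeline stages run in

class StubPrep:
    """Illustrative stand-in for the object memory.prepare() returns."""
    def __init__(self, context_stack):
        self.context_stack = context_stack

async def prepare(user_input, history):
    calls.append("prepare")  # real Acervo enriches this from the knowledge graph
    return StubPrep(history + [{"role": "user", "content": user_input}])

async def chat(context_stack):
    calls.append("chat")  # stands in for llm.chat(...)
    return "stub response"

async def process(user_input, response):
    calls.append("process")  # real Acervo extracts entities/facts here

async def one_turn(user_input, history):
    prep = await prepare(user_input, history)  # 1. enrich context
    response = await chat(prep.context_stack)  # 2. your app calls the LLM
    await process(user_input, response)        # 3. extract and persist knowledge
    return response

print(asyncio.run(one_turn("hi", [])))  # prints: stub response
```

Because your app owns step 2, Acervo stays LLM-agnostic: any client that accepts a message list can sit in the middle.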
## The context stack
The context that `prepare()` builds has a specific structure:
```mermaid
flowchart TD
    S["[system] Your app's system prompt\n(immutable, KV cached)"]
    W["[user] Warm context from graph\n(verified knowledge about the topic)"]
    A["[assistant] 'Entendido.'\n(acknowledgment)"]
    H["[user/assistant] Hot layer\n(last 2 turn pairs from history)"]
    U["[user] Current user message"]
    S --> W --> A --> H --> U
    style S fill:#1a3a5c,stroke:#3a7abd
    style W fill:#2d5016,stroke:#4a8c1c
    style A fill:#2d5016,stroke:#4a8c1c
    style H fill:#5c3a1a,stroke:#bd7a3a
    style U fill:#5c1a1a,stroke:#bd3a3a
```
- System prompt stays immutable (KV cache friendly)
- Warm context from the graph enters as a separate user message — always included when relevant
- Hot layer keeps only the last 2 turn pairs from conversation history
- Token usage stays flat: the graph grows, but the context window doesn't
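The stacking above can be sketched as a plain function. This is an illustrative reconstruction of the diagram's layout, not Acervo's internal code; the function name and parameters are assumptions for the sketch:

```python
def build_context_stack(system_prompt, warm_context, history, user_msg, hot_pairs=2):
    """Assemble messages in the order: system, warm, ack, hot layer, current."""
    stack = [{"role": "system", "content": system_prompt}]  # immutable, KV-cache friendly
    if warm_context:
        stack.append({"role": "user", "content": warm_context})       # verified graph knowledge
        stack.append({"role": "assistant", "content": "Entendido."})  # acknowledgment turn
    stack.extend(history[-2 * hot_pairs:])  # hot layer: last 2 turn pairs only
    stack.append({"role": "user", "content": user_msg})
    return stack

# Five turns of history, but the stack stays the same size
history = []
for i in range(5):
    history.append({"role": "user", "content": f"turn {i}"})
    history.append({"role": "assistant", "content": f"reply {i}"})

stack = build_context_stack(
    "You are a helpful assistant.",
    "Sandy lives in Cipolletti and works at Altovallestudio.",
    history,
    "What do you know about me?",
)
print(len(stack))  # prints: 8 (system + warm + ack + 4 hot messages + current)
```

Because the hot layer is a fixed window and the warm context is distilled from the graph, the prompt stays the same size no matter how long the conversation runs.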
## Next steps
- Configuration — customize context settings, token budgets, embedding thresholds
- Knowledge Layers — understand UNIVERSAL vs PERSONAL layers
- Web Search example — add Brave Search for real-time data