Retrieval-augmented generation (RAG) systems treat knowledge as disconnected chunks and rely purely on vector similarity to find relevant context. This works for simple lookups but breaks down when agents need to reason across multiple pieces of information or remember previous interactions. The core issue is architectural: RAG retrofits context onto stateless systems rather than building with memory as a foundation.
This talk demonstrates an alternative approach using cognee, an open-source Python library that combines knowledge graphs with vector search to create persistent memory. I'll start by showing the basic API, which reduces the typical RAG boilerplate to a few lines of async Python. From there, we'll look at what happens under the hood: how documents get transformed into graph structures, how the ECL pipeline (Extract, Cognify, Load) processes different data types, and how queries traverse both graph relationships and vector similarity.
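To give a feel for the Extract, Cognify, Load shape described above, here is a toy, self-contained sketch in plain async Python. Every function and data structure below is invented for illustration; it is not cognee's API or internals, where an LLM would do the "cognify" step rather than a string heuristic.

```python
import asyncio

def extract(document: str) -> list[str]:
    """Extract: split raw text into chunks (naive sentence split)."""
    return [s.strip() for s in document.split(".") if s.strip()]

def cognify(chunks: list[str]) -> list[tuple[str, str, str]]:
    """Cognify: turn chunks into (subject, relation, object) triples.
    A real system would prompt an LLM; this toy only handles 'X is Y'."""
    triples = []
    for chunk in chunks:
        if " is " in chunk:
            subj, obj = chunk.split(" is ", 1)
            triples.append((subj.strip(), "is", obj.strip()))
    return triples

async def load(triples, graph: dict, vectors: dict):
    """Load: write triples into a graph store and index node text."""
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
        vectors[subj] = subj.lower()  # placeholder for an embedding vector

async def run_pipeline(document: str):
    graph, vectors = {}, {}
    triples = cognify(extract(document))
    await load(triples, graph, vectors)
    return graph, vectors

graph, vectors = asyncio.run(run_pipeline(
    "Kuzu is an embedded graph database. LanceDB is a vector store."))
print(graph)
```

The point of the sketch is the separation of stages: extraction is cheap and local, cognification is where structure (and LLM cost) enters, and loading writes to two stores at once so later queries can use either.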
The live coding portion will build a working memory system using Kuzu for the graph layer and LanceDB for vectors, with Ollama providing local LLM inference. No API keys or cloud services required. I'll walk through adding unstructured data and executing searches that combine graph traversal with semantic matching.
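As a rough mental model for a search that combines graph traversal with semantic matching, the sketch below ranks nodes by cosine similarity to pick entry points, then expands each entry point along its graph edges. The embeddings, graph, and scoring are all made up for illustration; Kuzu, LanceDB, and cognee each implement these stages with real machinery.

```python
import math

# Stand-in embeddings per node (real ones come from an embedding model).
EMBEDDINGS = {
    "kuzu": [1.0, 0.0, 0.2],
    "lancedb": [0.1, 1.0, 0.0],
    "ollama": [0.0, 0.2, 1.0],
}

# Adjacency list: node -> list of (relation, neighbor) edges.
GRAPH = {
    "kuzu": [("stores", "graph data")],
    "lancedb": [("stores", "vectors")],
    "ollama": [("provides", "local inference")],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def hybrid_search(query_vec, top_k=1):
    # 1) Vector stage: rank nodes by similarity to the query.
    ranked = sorted(EMBEDDINGS,
                    key=lambda n: cosine(query_vec, EMBEDDINGS[n]),
                    reverse=True)
    entry_points = ranked[:top_k]
    # 2) Graph stage: expand each entry point along its edges.
    results = []
    for node in entry_points:
        for rel, neighbor in GRAPH.get(node, []):
            results.append((node, rel, neighbor))
    return results

print(hybrid_search([1.0, 0.0, 0.0]))  # query vector closest to "kuzu"
```

The design point is that the vector stage handles fuzzy phrasing while the graph stage pulls in related facts that share no surface similarity with the query, which is exactly where pure vector retrieval falls short.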
We'll also cover practical considerations: when graph-based retrieval outperforms pure vector search, how to define custom ontologies for domain-specific applications, and the tradeoffs between different retrieval strategies. The talk concludes with a brief look at feedback mechanisms that allow the memory layer to improve over time based on user corrections.
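One way to picture a correction-driven feedback mechanism is as a per-edge score that retrieval ranks by and that user feedback nudges up or down. The sketch below is entirely hypothetical and is not how cognee implements feedback; real systems would persist, normalize, and decay these signals.

```python
# Each graph edge carries a score used for ranking competing facts.
edges = {
    ("kuzu", "is", "a graph database"): 1.0,
    ("kuzu", "is", "a relational database"): 1.0,  # wrong fact, same score
}

def best_fact(subject):
    """Return the highest-scored edge for a subject."""
    candidates = [e for e in edges if e[0] == subject]
    return max(candidates, key=lambda e: edges[e])

def feedback(edge, correct: bool, rate=0.5):
    """Reinforce confirmed edges, penalize corrected ones."""
    edges[edge] += rate if correct else -rate

# A user flags the wrong answer; ranking flips on the next query.
feedback(("kuzu", "is", "a relational database"), correct=False)
print(best_fact("kuzu"))
```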
Attendees will leave with working code they can run locally and a clear understanding of when a memory-first architecture is the right choice over conventional RAG.