Schema + Search
How RA-H stores, indexes, and retrieves your graph
Why SQLite?
RA-H uses SQLite so your graph can live locally in one portable database file.
That file is not just a pile of notes. It stores structured nodes, edges, source chunks, metadata, full-text indexes, and semantic-vector lookup tables.
That means RA-H can retrieve likely matches directly from indexes instead of asking an agent to reread every note, transcript, or markdown file from scratch.
Core Schema
nodes, edges, and chunks.Query + Retrieval Flow
Main Tables
nodes
One saved thing: a source, idea, person, project, decision, note, transcript, or artifact.
Important fields:
titledescriptionsourcelinkmetadatachunk_statusevent_dateembeddingembedding_updated_atembedding_text
source is the canonical long-form field. It is what RA-H splits into chunks.
embedding_text is not user-authored content. It is the readable input RA-H used to create the whole-node embedding.
edges
Connections between nodes.
The important public field is explanation: the human-readable reason the connection exists.
chunks
Readable slices of long source text.
Shape:
idnode_idchunk_idxtextembedding_typemetadata
Chunks are what let RA-H find a relevant passage inside a transcript, article, book, PDF, or long note.
vec_chunks
Machine-readable vectors for chunks.
chunks.text is readable prose. vec_chunks.embedding is a 1536-number fingerprint of the chunk's meaning. The join is:
chunks.id = vec_chunks.chunk_id
Use vec_chunks when the system needs the right passage inside a long source.
vec_nodes
Machine-readable vectors for whole nodes.
The join is:
nodes.id = vec_nodes.node_id
RA-H also stores:
nodes.embedding, the binary vector BLOB on the node rownodes.embedding_updated_at, when that vector was generatednodes.embedding_text, the readable text used to make the node-level embedding
Use vec_nodes when the system needs the right node as a whole object.
nodes_fts and chunks_fts
FTS means full-text search.
nodes_ftsindexes node title, description, and source.chunks_ftsindexes chunk text.
Full-text search is best when the words matter: names, quotes, exact phrases, product names, error messages, or near-literal recall.
How Indexing Works
Save source text
|
+--> store node
+--> update full-text node index
+--> create whole-node embedding
+--> write nodes.embedding, nodes.embedding_text, and vec_nodes
+--> split source into chunks
+--> update full-text chunk index
+--> create chunk embeddings
+--> write vec_chunks
Full-text search asks: where do these words appear?
Semantic search asks: what has similar meaning, even if the words differ?
Graph traversal asks: what is connected to this thing, and why?
Whole-node semantic search and passage-level semantic search are different:
vec_nodesfinds likely relevant saved objects.vec_chunksfinds likely relevant excerpts inside source text.
Chunked vs Vectorized
chunk_status = chunked means readable chunk rows exist.
It does not always prove every historical chunk has a matching row in vec_chunks. Current ingestion is expected to create both chunks and vectors, but older data may need a separate backfill/repair pass.
Use precise language:
- node embedded: node-level vector exists on
nodes.embedding - node present in
vec_nodes: optimized whole-node vector lookup row exists - chunks exist: readable passage rows exist in
chunks - chunks embedded: matching passage vectors exist in
vec_chunks
Retired Structures
These are not current public schema doctrine:
dimensionsnode_dimensionscontexts- old
content/chunk/typeas the main node model is_pinnedas a current public contract
Why This Beats Loose Markdown
A markdown folder can be useful as a source or export layer. But RA-H's indexed SQLite graph can:
- find likely matches without rereading every file
- combine exact text search with semantic search
- retrieve passages inside long sources
- connect results through explicit graph edges
- keep one local source of truth for app and MCP access