Schema + Search

How RA-H stores, indexes, and retrieves your graph

Why SQLite?

RA-H uses SQLite so your graph can live locally in one portable database file.

That file is not just a pile of notes. It stores structured nodes, edges, source chunks, metadata, full-text indexes, and semantic-vector lookup tables.

That means RA-H can retrieve likely matches directly from indexes instead of asking an agent to reread every note, transcript, or markdown file from scratch.

Core Schema

RA-H core schemaA schema diagram centered on nodes, edges, and chunks, with full-text and vector indexes shown as secondary lookup tables.The source-of-truth schema is three readable tablesFull-text and vector tables sit beside them so search can be fast.nodesid, title, descriptionsource, link, metadataembedding, embedding_textOne saved object in your graph.edgesfrom_node_idto_node_idexplanation, sourcechunksid, node_id, chunk_idxtext, embedding_type, metadataReadable passages split from node source text.nodes_ftsword index for nodesvec_nodesmeaning index for nodeschunks_ftsword index for passagesvec_chunksmeaning index for passagesfromtonode source is splitmirrorsmirrorsmirrorsmirrorsHow to read itPrimary tables hold readable data.Secondary tables are rebuilt lookupsurfaces, not separate user content.
The public schema should be read from the three primary tables first: nodes, edges, and chunks.

Query + Retrieval Flow

RA-H query and retrieval flowA simple flow showing how a person or agent question becomes text search, semantic search, graph lookup, and a final answer with cited nodes and passages.What happens when you or an agent asks RA-H a questionThe system gathers candidate nodes and passages, then gives the agent the useful evidence.1. Query"Find the podcast sectionabout indexing."2a. Exact wordsnodes_fts, chunks_fts2b. Similar meaningvec_nodes, vec_chunks2c. Connectionsedges3. Candidate evidencelikely nodesexact text hitsrelevant passagesnearby connected nodes4. Responseanswer, cite, oropen the sourcePractical ruleText search finds words. Vector search finds meaning. Edges add graph context.
Retrieval is not one table lookup. RA-H combines word matches, semantic matches, and graph relationships before handing evidence to the user or agent.

Main Tables

nodes

One saved thing: a source, idea, person, project, decision, note, transcript, or artifact.

Important fields:

  • title
  • description
  • source
  • link
  • metadata
  • chunk_status
  • event_date
  • embedding
  • embedding_updated_at
  • embedding_text

source is the canonical long-form field. It is what RA-H splits into chunks.

embedding_text is not user-authored content. It is the readable input RA-H used to create the whole-node embedding.

edges

Connections between nodes.

The important public field is explanation: the human-readable reason the connection exists.

chunks

Readable slices of long source text.

Shape:

  • id
  • node_id
  • chunk_idx
  • text
  • embedding_type
  • metadata

Chunks are what let RA-H find a relevant passage inside a transcript, article, book, PDF, or long note.

vec_chunks

Machine-readable vectors for chunks.

chunks.text is readable prose. vec_chunks.embedding is a 1536-number fingerprint of the chunk's meaning. The join is:

chunks.id = vec_chunks.chunk_id

Use vec_chunks when the system needs the right passage inside a long source.

vec_nodes

Machine-readable vectors for whole nodes.

The join is:

nodes.id = vec_nodes.node_id

RA-H also stores:

  • nodes.embedding, the binary vector BLOB on the node row
  • nodes.embedding_updated_at, when that vector was generated
  • nodes.embedding_text, the readable text used to make the node-level embedding

Use vec_nodes when the system needs the right node as a whole object.

nodes_fts and chunks_fts

FTS means full-text search.

  • nodes_fts indexes node title, description, and source.
  • chunks_fts indexes chunk text.

Full-text search is best when the words matter: names, quotes, exact phrases, product names, error messages, or near-literal recall.

How Indexing Works

Save source text
  |
  +--> store node
  +--> update full-text node index
  +--> create whole-node embedding
  +--> write nodes.embedding, nodes.embedding_text, and vec_nodes
  +--> split source into chunks
  +--> update full-text chunk index
  +--> create chunk embeddings
  +--> write vec_chunks

Full-text search asks: where do these words appear?

Semantic search asks: what has similar meaning, even if the words differ?

Graph traversal asks: what is connected to this thing, and why?

Whole-node semantic search and passage-level semantic search are different:

  • vec_nodes finds likely relevant saved objects.
  • vec_chunks finds likely relevant excerpts inside source text.

Chunked vs Vectorized

chunk_status = chunked means readable chunk rows exist.

It does not always prove every historical chunk has a matching row in vec_chunks. Current ingestion is expected to create both chunks and vectors, but older data may need a separate backfill/repair pass.

Use precise language:

  • node embedded: node-level vector exists on nodes.embedding
  • node present in vec_nodes: optimized whole-node vector lookup row exists
  • chunks exist: readable passage rows exist in chunks
  • chunks embedded: matching passage vectors exist in vec_chunks

Retired Structures

These are not current public schema doctrine:

  • dimensions
  • node_dimensions
  • contexts
  • old content / chunk / type as the main node model
  • is_pinned as a current public contract

Why This Beats Loose Markdown

A markdown folder can be useful as a source or export layer. But RA-H's indexed SQLite graph can:

  • find likely matches without rereading every file
  • combine exact text search with semantic search
  • retrieve passages inside long sources
  • connect results through explicit graph edges
  • keep one local source of truth for app and MCP access