Saturday, March 14, 2026

Structure is Good: How I Built a Wiki-Base (Context Engine) for the Latent Space Community

I did a workshop at Latent Space Builders Club and wanted to record a condensed version. This covers the core idea, plus an implementation I built for the LS community — useful whether you've used RA-H before or you're just thinking about how to structure external context.


The debate

This makes people angry. It gets religious.

One side: keep it simple, flat files and folders. The other side: everything should be a graph. Strong opinions everywhere.

I'm interested in this because I've built a SQLite-backed tool for externalizing context — fully open source, Mac app available. Simple database, good ingestion pipelines.

The debate gets heated. I've posted about this in subreddits. Some received it well. Some roasted me. One post I keep coming back to: if you don't understand databases, you'll end up building one that's way more complicated than it needs to be. That was me. I had a mess of Obsidian files with backlinks everywhere. Should have just started with a database.


The actual point

People over-index on the type of storage they use. What actually matters is the context window.

There's just a context window and how you fill it. Who cares what you use if what you're putting back into the context is shit?

The thesis: more and more people are going to interface through AI — coding agents, language models, whatever. That system has a context window. You want it to pull the right context from an external corpus — not too much, just the right amount. And you want it to write back to that corpus so the next interaction is better.

Structure enables this. It lets agents read more efficiently. It lets them write back cleanly. And it's a forcing function — it makes you think carefully about what you're actually storing.


Why structure matters

For personal knowledge management: how do you distinguish between ideas, insights, chat conversations, and active projects? For businesses, this becomes even more critical.

You also need an ingestion pipeline — a way to automatically push relevant context into the substrate in a structured way.

And this is only going to matter more. Sub-agents are already happening. Multiple agents reading and writing to the same context substrate, refining data in parallel. Structure becomes the foundation that makes that work.


The context window is the bottleneck

Even with an infinite context window, you still need to think about what you put in it. Irrelevant context kills performance.

A few positions I hold:

  • Pixels are ephemeral. The interface will change. The context window is what matters.
  • The constraint isn't files vs database — it's how efficiently you get the right context to the model.
  • Everybody should be building an external context corpus. When I ask, people agree. When I ask what they're actually doing — almost nobody is. They leave context in the model or half-heartedly dump notes into an Obsidian vault. I'd say: give me a good argument against building this.
  • Structure makes context usable. Whatever storage you use.

My position: SQLite

In most cases, SQLite is the best abstraction. I know graph databases and heavier traversal have their place. I know file systems work for codebases. But SQLite is underused.

Simple tables, an edges table for connections, and you can mirror most of what a graph database does. I call what I built a wiki-base — intentionally neutral, so I don't start a war.
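The nodes-plus-edges pattern is easy to sketch. Here's a minimal illustration using Python's built-in sqlite3 — the table and column names are mine, not the actual wiki-base schema:

```python
import sqlite3

# Illustrative schema -- not the real wiki-base tables, just the
# nodes + edges pattern: everything is a node, connections are rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id          INTEGER PRIMARY KEY,
    type        TEXT NOT NULL,      -- 'podcast', 'member', 'article', ...
    title       TEXT NOT NULL,
    description TEXT                -- explicit, agent-facing description
);
CREATE TABLE edges (
    id        INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES nodes(id),
    target_id INTEGER NOT NULL REFERENCES nodes(id),
    relation  TEXT NOT NULL         -- 'hosted', 'guest_on', 'mentions', ...
);
""")
```

Two tables, and you can already do typed graph traversal with plain SQL joins.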


The Latent Space wiki-base

I wanted to put this to the test with a real community. Latent Space is a content engine for AI engineers — podcasts, articles, Discord, Builders Club, paper club, writers workshops, AI News. Lots of content. Lots of connections.

The schema is simple. Almost everything goes into a single nodes table — podcasts, guests, members, articles, workshops. All nodes. The relationships between them live in a separate edges table.

This is the key difference from a file system. Backlinks try to represent connections between documents. In a database, those connections are first-class citizens. The edges table maps them explicitly — including the nature of the relationship.

Example: I exist as a node. The workshop I gave today is a node. There's an edge between us that says I hosted it. A podcast I listened to — there's an edge with the type of relationship. This sounds simple but it's exactly what you need for an agent to navigate context at scale.
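In SQL terms, that navigation is a single join. A toy sketch (illustrative schema and data, not the production tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, title TEXT);
CREATE TABLE edges (source_id INTEGER, target_id INTEGER, relation TEXT);
""")
conn.execute("INSERT INTO nodes VALUES (1, 'member', 'Me')")
conn.execute("INSERT INTO nodes VALUES (2, 'workshop', 'Context Engineering Workshop')")
conn.execute("INSERT INTO edges VALUES (1, 2, 'hosted')")

# One query answers: what is this member connected to, and how?
rows = conn.execute("""
    SELECT n.type, n.title, e.relation
    FROM edges e JOIN nodes n ON n.id = e.target_id
    WHERE e.source_id = 1
""").fetchall()
# rows -> [('workshop', 'Context Engineering Workshop', 'hosted')]
```

The relation type travels with the connection, so an agent doesn't have to infer it.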

Chat conversations with the agent are also stored.


Skills and documentation

Alongside the database, there's a set of skills — documentation that tells the agent how to work with the wiki-base. Inspired by Anthropic and OpenAI's work on how to structure agent-facing docs.

Skills include: DB operations (schema overview), member profiles (how to pull in and store member info), and event scheduling (so the bot can help people sign up for workshops and paper clubs).


Two interfaces

MCP server — install it, connect it to your coding agent, and you can query the Latent Space context corpus directly. Recent podcasts, member profiles, workshop signups — all from Claude Code or whatever agent you're using.

Discord bot — same thing inside the Latent Space Discord. Ask questions about podcasts, papers, articles. Sign up for events. Guardrails on what can be written back for now.


Ingestion and enrichment

A cron runs hourly — checks for new podcasts, workshops, anything new, pulls it in, and runs it through an enrichment pipeline.

The enrichment pipeline embeds content at the node level, then chunks it and embeds each chunk into a separate chunks table. Every 30 minutes it extracts entities (hosts, guests, themes), checks whether they already exist in the DB, and creates connections automatically.
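The chunking half of that pipeline is simple to sketch. Assumptions here: a naive fixed-size chunker (the real pipeline likely chunks more intelligently) and an embedding column left NULL for a model to fill in later:

```python
import sqlite3

def chunk_text(text, size=200, overlap=40):
    """Naive fixed-size chunker with overlap -- a stand-in for
    whatever the real enrichment pipeline uses."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chunks (
    id        INTEGER PRIMARY KEY,
    node_id   INTEGER NOT NULL,    -- which node this chunk came from
    position  INTEGER NOT NULL,    -- chunk order within the node
    content   TEXT NOT NULL,
    embedding BLOB                 -- filled in by an embedding model
);
""")

transcript = "word " * 200  # stand-in for a podcast transcript
for i, chunk in enumerate(chunk_text(transcript)):
    conn.execute(
        "INSERT INTO chunks (node_id, position, content) VALUES (?, ?, ?)",
        (42, i, chunk),
    )
```

Keeping `node_id` on every chunk is what lets a chunk-level hit resolve back to its parent node and that node's edges.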


Indexing

Three indexes: vector, FTS5, and B-tree.

Node-level embeddings handle general questions. Chunk-level embeddings handle specific ones — "what did Dylan Patel say about X" runs a vector search against the chunks table. For broad queries, searching just the nodes table is usually enough.
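The FTS5 side needs no extra infrastructure — it ships with most SQLite builds, including the one bundled with Python. A minimal sketch (illustrative content, and note vector search would need a separate extension such as sqlite-vec):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Full-text index over chunk content; requires an SQLite build
# with the FTS5 extension enabled (standard in most Python builds).
conn.execute("CREATE VIRTUAL TABLE chunk_fts USING fts5(content)")
conn.executemany(
    "INSERT INTO chunk_fts (content) VALUES (?)",
    [
        ("Dylan Patel discussed GPU supply chains on the podcast",),
        ("A workshop on structuring external context",),
    ],
)
# Keyword query, best matches first via FTS5's built-in rank.
rows = conn.execute(
    "SELECT content FROM chunk_fts WHERE chunk_fts MATCH ? ORDER BY rank",
    ("dylan AND gpu",),
).fetchall()
```

In practice you'd combine this keyword search with the vector search and merge the results.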

You can bolt search onto a file system, but not at this level of granularity.


The thing most people miss

The description of what each thing is needs to be extremely explicit.

You're building a map — an ontology — of everything in the corpus. How things connect and what they are is more important than the things themselves. But most ingestion systems don't explain to the model what it's ingesting.

The model doesn't live in the messy human world. It works with what's in its training distribution. So you have to be precise. If it's a podcast, say it's a podcast. If it's an idea, say it's an idea. If the description is vague, the agent can't make the right connections.

I have to fight my enrichment pipeline constantly to be more specific. It's the hardest part, and the most important.

Same goes for edges. It's obvious to a human that a podcast links to Dylan Patel because he was a guest. It's not obvious to a model running searches at scale. The edge needs to say why.
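Concretely, the difference between a vague edge and an explicit one might look like this (illustrative rows and column names, not the real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edges (source_id INT, target_id INT, relation TEXT, description TEXT)"
)
# Vague: the model has to guess why these two nodes are linked.
conn.execute("INSERT INTO edges VALUES (10, 99, 'related', NULL)")
# Explicit: the edge itself states the relationship and the why.
conn.execute(
    "INSERT INTO edges VALUES (10, 99, 'guest_on', "
    "'Dylan Patel appeared as a guest on this podcast episode')"
)
```

An agent running searches at scale only ever sees the second kind of edge clearly.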


Get involved

If you're in the Latent Space Discord, /join adds you to the context graph. Ask Slop questions about any content in the ecosystem.

The whole point is to build a living external context corpus — continually read from and written to — and find interesting use cases for what agents can do with it.

I think this is where most businesses and communities are heading. Customers and members will increasingly use agents to interact with content and data. Better surfaces for that are going to be expected.

And at the individual level — everyone should own their own context. That's my opinion.

All open source. Join the Latent Space community, or ping me directly.


Build your own context corpus: