Monday, February 16, 2026
Build Your Own Knowledge Base (Open Source)
Everything I'm about to share is fully open source. It's free, and you can do what you want with it. Promise — this is not an AI slop-coded product pitch.
The Idea
Everybody who is building, or using, or interacting with AI should be building their own external context.
We're at a point where if you give really good context to these tools, to these language models, they can really help you do incredible things as a thinking partner, as a research assistant, as a coding agent. And if you're lazy, which most people are, they're going to make you completely stupid. Building your own externalised context is the best way to use these tools more effectively.
What do I mean by this? All of your knowledge, and all of the interactions you're having with language models, are likely valuable context — if they're organised in the right way.
Currently, for most of us, this kind of knowledge or context is fragmented across different platforms and devices. You might be using Grok or ChatGPT or Claude or whatever else. All of these different tools, these different language models. And then you're kind of, in a half-assed way, storing that knowledge either in projects within these tools or in separate software — Notes, Notion, Roam, Obsidian.
You really need to be more thoughtfully organising, connecting, and collecting your context in a way that both you and increasingly these agents that you're working with can leverage.
What I'm Going to Show You
I'm going to give you a template that I've used for myself and show you how to set it up.
I think this is the best and most appropriate way to store and organise your external context. I'm open to hearing some pushback — I've tried all of the different approaches. Notion, Roam, NotebookLM, Obsidian.
I love Obsidian with Claude Code. I just don't think that any of these are the right abstraction moving forward.
It basically just boils down to this: if you think you're increasingly going to be interfacing with your knowledge and with your context through AI, then you should be building a system that's appropriate for them — not trying to sticky-tape on after-the-fact solutions.
A simple, local, relational database has to be the best abstraction we have. Part of the reason is that if you're actually doing a good job externalising your context, the connections between things in that context graph are just as important as the things themselves. A relational database lets you store those connections in a separate table of their own, each with a semantic label the language model can use.
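To make that concrete, here's a minimal sketch of the relational shape I'm describing, in TypeScript. The actual repo stores this in SQLite, and its real table and column names may differ — these types and the neighbours helper are purely illustrative:

```typescript
// Hypothetical sketch of the relational structure: nodes in one table,
// connections as first-class rows in a separate edges table.
interface KnowledgeNode {
  id: number;
  title: string;
  description: string; // verbose, model-facing explanation of what this is
  dimension: string;   // single-layer folder, e.g. "idea", "podcast"
}

interface KnowledgeEdge {
  sourceId: number;
  targetId: number;
  relation: string;    // semantic label, e.g. "implemented-by", "contradicts"
  explanation: string; // why these two things connect
}

const nodes: KnowledgeNode[] = [
  { id: 1, title: "External context", description: "The practice of storing knowledge outside any one AI tool.", dimension: "idea" },
  { id: 2, title: "Relational databases", description: "Tables with typed relationships between rows.", dimension: "idea" },
];

// Because edges live in their own table, a connection can carry meaning of its own.
const edges: KnowledgeEdge[] = [
  { sourceId: 1, targetId: 2, relation: "implemented-by",
    explanation: "A relational DB is the storage layer for external context." },
];

// A join: everything connected to a given node, with the semantic label attached.
function neighbours(id: number): { node: KnowledgeNode; relation: string }[] {
  return edges
    .filter(e => e.sourceId === id || e.targetId === id)
    .map(e => ({
      node: nodes.find(n => n.id === (e.sourceId === id ? e.targetId : e.sourceId))!,
      relation: e.relation,
    }));
}
```

The point of the separate edges table is exactly that `relation` and `explanation` exist per-connection, not per-node.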
Setting It Up
I'm going to explain this in a way that somebody with no technical experience, somebody who hasn't used GitHub or open source software, can follow along. It's really easy now that we have these tools. I'm assuming that you're going to be using some coding agent — I'm going to use Claude Code, but you could also use OpenCode, Cursor, or Codex. You can use any of these tools.
This is the repo. All you need to know is when you clone it, it's just like you're downloading a traditional piece of software. It's going to go onto your device, and you're going to have that database I'm talking about. There's a script that's going to seed and generate a local SQLite database — it's just one file, and it's going to exist in the library. You never even have to open up that file. It's going to have those relational tables I spoke about.
It's also going to install the dependencies and basically set up a simple TypeScript frontend that you can use to interact with your data. Or interact with your context, I should say.
Now the great thing about these coding agents is you can just grab the repo URL and point Claude, or whatever coding agent you're using, directly at it — and it's going to be able to install it for you. It's going to take away any of the gnarlier, overwhelming stuff that would normally come with running open source software.
The database itself is stored in your Mac's Library folder, which Apple deliberately makes hard to access — it holds the kind of stuff that, if you mess with it, could break the applications on your Mac. It's good that it lives in there. You never have to go in and find it, but you can if you want. It's a single SQLite file.
Running the App
The app will be easy to find, just like any other application on your device. It's open source now, so you can modify it, but I would say stick to the structure for now.
Jump into the directory, run npm run dev, and it's going to launch the application locally on your device. It's that easy. You could also ask Claude Code to run it for you — you can have it running persistently in the background. The app is basically always available, and it doesn't need to be running for your agent to interact with the database. It's just nice to have open on your device whenever you want.
Adding Content
If you, the muggle human interacting with your context, want to add stuff, you can do that through the app. The system is set up so that it can ingest content really quickly. You can grab a podcast conversation, drop the URL in. You can also put anything else in there — a note, an idea, or you can drag a PDF. It's going to appear in the database as a node.
One thing I should mention — when you first download and clone the repo and open it up, it's probably going to prompt you for an API key. So go get an OpenAI API key. It literally costs like cents per day. I use it every day.
What that API key does is power a few pipelines running in the background. The content you're adding — a podcast, for example — gets chunked and embedded into the knowledge base using the OpenAI key. A cheap model, GPT-4o mini, also runs in the background, enhancing your data as it comes in and applying dimensions. It's going to cost you cents at most per day, and it's worth it because it enhances, cleans up, and organises your data — which is good for lazy people.
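As a rough illustration of the chunking step before embedding — the real pipeline's chunk size and overlap may differ, and the actual call to OpenAI's embeddings endpoint is left as a comment rather than made here:

```typescript
// Hypothetical sketch: split a long transcript into overlapping chunks.
// Each chunk would then be sent to OpenAI's embeddings endpoint and the
// resulting vector stored alongside the chunk text in a separate table.
function chunkTranscript(text: string, chunkSize = 800, overlap = 100): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

The overlap is there so a sentence cut at a chunk boundary still appears whole in at least one chunk.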
The Interface
The app is pretty intuitive. You basically have different panes that are just different ways of visualising your data. There's a full database view so you can see the actual table. There's a map or graph that displays your nodes and how they connect together. And there's a dimension view — dimensions are basically just one simple layer of organisation that exists as folders, and the system can help organise the data as it comes in.
The MCP Server — This is the Fun Stuff
The whole idea of making this open source — the reason I'm doing this — is partly because I don't want to be greedy and I want to make open source tools. But it's also the way that things are changing. A lot of people are going to be using these coding harnesses, and rather than having a monolithic piece of software where the agent lives in the tool and you pay for that, I think people are going to be using these external tools and then interfacing through them to write to their database.
So what we're going to set up here is an MCP server. This is all listed in the readme document and your coding agent is going to know exactly how to do this.
I won't go into great depth on what MCP is — you can go and read more about that. Basically, it's just a standardised way for products or services to expose their capabilities to a user or an agent. For example, you might want to connect your email to your agent or plug into Salesforce — it's just a more consistent way for the agent to use tools to interact with that service.
What's more interesting here is that this kind of flips the situation. What you're doing is allowing Claude, or whatever agent you're using, to have a bridge directly to your knowledge graph. This is kind of like giving Claude access to your second brain. You're basically allowing it to continually read, search, add, and connect things directly between your interactions with the coding agent and this ontology or knowledge graph that you're building.
When you install the MCP server, your Claude Code instance is basically being updated with access to this bridge. The way the tools have been configured is going to encourage the agent to be continually reading and writing from this knowledge graph for you. It's going to be kind of ambiently building your knowledge graph in the background, if you set things up correctly.
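For reference, Claude Code can pick up project-scoped MCP servers from a .mcp.json file. The server name and command below are hypothetical — follow the repo's readme for the real values:

```json
{
  "mcpServers": {
    "knowledge-base": {
      "command": "node",
      "args": ["./mcp-server.js"]
    }
  }
}
```

Your coding agent can write this configuration for you as part of the install.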
Using It
Once the MCP server is installed, you just restart Claude Code and the tools are ready.
Most of the tools are pretty self-explanatory — they're just simple ways to interact with the knowledge base. There's a SQLite query tool and other tools that have the schema of your knowledge graph and the dimensions in context.
The getContext tool is important — it basically allows your agent to traverse your knowledge graph and pull in interesting and relevant things. As you build your context graph, certain nodes are going to accumulate more connections. That indicates to the agent that these might be hub nodes or things that are more important for context.
So when you're embarking on any mission with Claude Code — whether that's research or coding or building an application — it's going to be able to pull in relevant context by traversing that graph. Rather than having things stored in a single memory markdown file, the idea is that your context, your ideas, your beliefs, your memory — this is all fluid and changing and evolving over time. By giving your agent access to your knowledge graph, it's going to be able to pull in very efficient and very effective context.
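Conceptually, that traversal looks something like this sketch — count connections to find the hubs, then walk outward a bounded number of hops. The real getContext tool's logic lives in the repo and will be more sophisticated:

```typescript
// Hypothetical sketch of hub detection plus bounded graph traversal.
type Edge = { sourceId: number; targetId: number };

// Count how many connections each node has; high-degree nodes are hub candidates.
function degree(edges: Edge[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const e of edges) {
    counts.set(e.sourceId, (counts.get(e.sourceId) ?? 0) + 1);
    counts.set(e.targetId, (counts.get(e.targetId) ?? 0) + 1);
  }
  return counts;
}

// Collect every node within `depth` hops of a starting node (breadth-first).
function contextFrom(edges: Edge[], start: number, depth: number): Set<number> {
  const seen = new Set<number>([start]);
  let frontier = [start];
  for (let d = 0; d < depth; d++) {
    const next: number[] = [];
    for (const e of edges) {
      for (const [a, b] of [[e.sourceId, e.targetId], [e.targetId, e.sourceId]]) {
        if (frontier.includes(a) && !seen.has(b)) {
          seen.add(b);
          next.push(b);
        }
      }
    }
    frontier = next;
  }
  return seen;
}
```

This is why hub nodes matter: starting the walk from a highly-connected node pulls in the neighbourhood that's most relevant to you.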
Here's an example: I can say "hey, can you go and do some research on Peter Steinberger and add this as a separate node for me to follow up?" Now Claude can go off, do some web search, and come back and start creating a new node in my context graph. It automatically added the edge — the connection — without me asking, without me telling it to. That's pretty cool. And as you build this graph more thoughtfully, it gets better at doing this.
The Schema — Quick Overview
Everything in the app is just a different way to view or augment your data.
Nodes are the atomic units of context that exist within your knowledge graph. It could be an idea, an insight, a memory, an event, a person or entity, a podcast, an article — they all exist as individual nodes.
The description is really important. Models need verbose explanations of what things are, in very simple terms. Any item that goes into the context graph should have a very specific description — the most high-level abstraction of what the thing is, why it's important, and ideally why it's important for you.
There's a notes field, which you can edit and update with your own notes.
Edges are the connections, and they're also really important. The connections between nodes should have really good explanations, and the system should be able to infer the type of connection. Why these two things connect is critical, because it allows for dynamic traversal for context.
There's a source field. For something really big, like a podcast with a long transcript, the transcript is automatically pulled in, chunked, and embedded in a separate table. That's just for longer items, to allow vector-embedding search over longer research material.
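That vector search boils down to cosine similarity over the stored embeddings. Here's a minimal sketch, assuming the vectors have already come back from OpenAI's embeddings endpoint (the embedding dimensions and ranking details in the real app may differ):

```typescript
// Hypothetical sketch: rank stored chunk embeddings by cosine similarity
// to a query embedding, and return the k best-matching chunk texts.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topChunks(
  queryEmbedding: number[],
  chunks: { text: string; embedding: number[] }[],
  k = 3,
): string[] {
  return [...chunks]
    .sort((x, y) => cosine(queryEmbedding, y.embedding) - cosine(queryEmbedding, x.embedding))
    .slice(0, k)
    .map(c => c.text);
}
```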
And then dimensions — a simple, single layer of organisation. You can add new dimensions, and you should always add a dimension description. That informs the model of what belongs in that folder. When you add new stuff, it looks at the description and says "ah, this looks like a podcast, or an idea, or a preference, or a memory, or a project." You can lock the ones that are most important to you.
By design, there's not much hierarchical organisation — there's no folders within folders. The heavy lifting should be done by the edges you create more thoughtfully, and the descriptions you've written for individual nodes.
Hub Nodes and Context
The context feature shows your most highly-connected nodes. The nodes with the most connections appear to the agent as the ones that are most contextually important for you.
What I like to do in my fully built-out context graph is have hub nodes that reflect the big projects I'm working on, areas of interest, things I'm building. They have hundreds and in some cases thousands of connections. So when I start using one of these agents, it has that awareness that this is the stuff that's most important to me. It can then traverse out from those hub nodes.
By default, if you are thoughtfully adding stuff and connecting stuff, the information in the context graph that you connect the most will go to the top of the graph and appear to the agent first.
The graph view will cluster the most-connected nodes toward the middle — that's basically a representation of what the agent is going to see. It sees those most highly-connected nodes first, and then it can traverse out.
Guides
Guides are like skills. You can store raw information in markdown with some frontmatter, and the agent can refer to these guides when it's trying to do something for you.
I've pre-packaged the app with foundational, universal guides that the agent can use when organising your data. It can look in there for the schema to see how your database is structured. It can look for your preferences on how you like to store or enrich or organise data. And there's information on how to traverse the context graph to pull in valuable information.
You can do anything with guides though. I have guides that help me create audio debriefs, guides that automatically pull in research and resources that I like and build me flashcards — all of this kind of thing you can do.
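A guide file is just markdown with a frontmatter header. As a sketch (the actual field names the app expects may differ), parsing one looks roughly like:

```typescript
// Hypothetical sketch: split a guide into its frontmatter metadata and body.
// Frontmatter is the block between a pair of "---" lines at the top of the file.
function parseGuide(markdown: string): { meta: Record<string, string>; body: string } {
  const match = markdown.match(/^---\n([\s\S]*?)\n---\n?/);
  if (!match) return { meta: {}, body: markdown }; // no frontmatter present
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const i = line.indexOf(":");
    if (i > 0) meta[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { meta, body: markdown.slice(match[0].length) };
}
```

The metadata is what lets the agent decide which guide applies before reading the whole body.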
Get Involved
That's basically it. If you're going to give this a try, please come and join the Discord. Give me a star on GitHub.
Contributions would be really appreciated. I'd just love to start building this out and understanding how different people want to use this and build their context graph — hopefully get some feedback and be able to evolve the product and the framework.