LLM Wiki – example of an "idea file" (gist.github.com)

by tamnd 95 comments 296 points

[−] devnullbrain 41d ago
I don't see why this wouldn't just lead to model collapse:

https://www.nature.com/articles/s41586-024-07566-y

If you've spent any time using LLMs to write documentation you'll see this for yourself: each compounding pass just rewrites valid information into something less terse.

I find it concerning Karpathy doesn't see this. But I'm not surprised, because AI maximalists seem to find it really difficult to be... "normal"?

Rule of thumb: if you find yourself needing to broadcast the special LLM sauce you came up with instead of what it helped you produce, ask yourself why.

[−] gojomo 41d ago
Here in 2026, many forms of training LLMs on (well-chosen) outputs of themselves, or other LLMs, have delivered gigantic wins. So 2024 & earlier fears of 'model collapse' will lead your intuition astray about what's productive.

It is unlikely you are accurately perceiving some limitation that Karpathy does not.

[−] ChrisGreenHeur 41d ago
The article is not about training LLMs; it is about using an LLM to write a wiki for personal use. It assumes a fully trained LLM such as ChatGPT or Claude already exists to be used.
[−] khalic 40d ago
Don't even try; after vibe coding, people seem to be adopting vibe thinking. "Model collapse sounds cool, I'm gonna use it without looking it up."
[−] hombre_fatal 40d ago
Also, TFA prescribes putting ground truth source files into a /raw directory.

Everything is derived from them and backlinks into them, which is what lets you stay vigilant about staleness, correctness, drift, and more. Just like in a human-built knowledge base.

[−] sebmellen 41d ago
Edit for context: the sibling comment from karpathy is gone after being flagged to oblivion. Not sure if he deleted it or if it was removed based on the number of flags. He had copy-pasted a few snarky responses from Claude and essentially said “Claude has this to say to you:” followed by a super long run-on paragraph of slop.

————

Wow, I respect karpathy so much and have learned a ton from him. But WTF is the sibling comment he wrote as a response to you? Just pasting a Claude-written slop retort… it’s sad.

Maybe we need to update that old maxim about “if you don’t have something nice to say, don’t say it” to “if you don’t have something human to say, don’t say it.”

So many really smart people I know have seen the ‘ghost in the machine’ and as a result have slowly lost their human faculties. Ezra Klein, of all people, had a great article about this recently titled “I Saw Something New in San Francisco” (gift link if you want to read it): https://www.nytimes.com/2026/03/29/opinion/ai-claude-chatgpt...

[−] prodigycorp 41d ago
It's not sad. He's a person like you and me. devnullbrain's comment is snarky: he invoked model collapse, which has nothing to do with the topic of a wiki/KB; he wrote that karpathy is not normal; and he seemed to imply that the idea was useless. I'd be pretty in my feels too, and the fact that he wrote it and then deleted it seems like a perfectly normal-guy thing to do.
[−] sebmellen 40d ago
Yeah. I know you didn’t see it, but it was truly a substance-free response. Glad to see he deleted it and I know I’ve been guilty of the same kind of knee-jerk response before.
[−] prodigycorp 40d ago
I saw it. It sucked, I agree. But like you said, we all get one (or a few) of those.
[−] jahala 40d ago
I did a proof of concept for self-updating HTML files (polyglot bash/html) some weeks ago. It actually works quite well; with simple prompting it seems not to just go in circles (https://github.com/jahala/o-o)
[−] kwar13 41d ago
also my experience. it can't even keep up with a simple claude.md let alone a whole wiki...
[−] Vetch 41d ago
This sounds very much like Licklider's 1960 essay on intelligence amplification, "Man-Computer Symbiosis":

> Men will set the goals and supply the motivations, of course, at least in the early years. They will formulate hypotheses. They will ask questions. They will think of mechanisms, procedures, and models. They will remember that such-and-such a person did some possibly relevant work on a topic of interest back in 1947, or at any rate shortly after World War II, and they will have an idea in what journals it might have been published. In general, they will make approximate and fallible, but leading, contributions, and they will define criteria and serve as evaluators, judging the contributions of the equipment and guiding the general line of thought.

> In addition, men will handle the very-low-probability situations when such situations do actually arise. (In current man-machine systems, that is one of the human operator's most important functions. The sum of the probabilities of very-low-probability alternatives is often much too large to neglect. ) Men will fill in the gaps, either in the problem solution or in the computer program, when the computer has no mode or routine that is applicable in a particular circumstance.

> The information-processing equipment, for its part, will convert hypotheses into testable models and then test the models against data (which the human operator may designate roughly and identify as relevant when the computer presents them for his approval). The equipment will answer questions. It will simulate the mechanisms and models, carry out the procedures, and display the results to the operator. It will transform data, plot graphs ("cutting the cake" in whatever way the human operator specifies, or in several alternative ways if the human operator is not sure what he wants). The equipment will interpolate, extrapolate, and transform. It will convert static equations or logical statements into dynamic models so the human operator can examine their behavior. In general, it will carry out the routinizable, clerical operations that fill the intervals between decisions.

https://www.organism.earth/library/document/man-computer-sym...

[−] ramoz 41d ago
Wow. Fascinating insights he had.

e.g. (amongst many others) Desk-Surface Display and Control: Certainly, for effective man-computer interaction, it will be necessary for the man and the computer to draw graphs and pictures and to write notes and equations to each other on the same display surface. The man should be able to present a function to the computer, in a rough but rapid fashion, by drawing a graph. The computer should read the man's writing, perhaps on the condition that it be in clear block capitals, and it should immediately post, at the location of each hand-drawn symbol, the corresponding character as interpreted and put into precise type-face.

[−] kenforthewin 41d ago
This is just RAG. Yes, it's not using a vector database, but it's building an index file of semantic connections and constructing hierarchical semantic structures in the filesystem to aid retrieval... this is RAG.

On a sidenote, I've been building an AI powered knowledge base (yes, it uses RAG) that has wiki synthesis and similar ideas, take a look at https://github.com/kenforthewin/atomic

[−] panarky 41d ago
There's nothing about RAG that requires embeddings.

The retrieval part can be grep if you don't care about semantic search.

[−] alfiedotwtf 41d ago
You should have started your comment with “I have a few qualms with this app”.

I’ve been thinking about something along the lines of an LLM-WIKI for a while now, which could truly act as a wingman-executive-assistant-second-brain, but OP has gone deeper than my ADHD thoughts could have possibly gone.

Looking forward to seeing this come to fruition

[−] Jet_Xu 41d ago
I believe multimodal KB + agentic RAG is a suitable solution for a personal KB. Imagine you have tons of office docs and want to dig into some complex topics within them. You could try https://github.com/JetXu-LLM/DocMason

It fully retrieves all diagram and chart info from PowerPoint and Excel files, and then leverages native AI agents (e.g. Codex) to conduct agentic RAG.

[−] locknitpicker 41d ago

> This is just RAG.

More to the point, this is how LLM assistants like GitHub Copilot use their custom instructions file, aka copilot-instructions.md

https://docs.github.com/en/copilot/how-tos/configure-custom-...

[−] darkhanakh 41d ago
eh, i'd push back on "just RAG". like, yes, the retrieval-generation loop is RAG-shaped, no one's arguing that. but the interesting bit here is the write loop: the LLM is authoring and maintaining the wiki itself, building backlinks, filing its own outputs back in. that's not retrieval, that's knowledge synthesis. in vanilla RAG your corpus is static; here it isn't

also the linting pass is doing something genuinely different: auditing inconsistencies, imputing missing data, suggesting connections. that's closer to an assistant maintaining a zettelkasten than a search engine returning top-k chunks

cool project btw will check it out

[−] devmor 41d ago
This is just persistent memory RAG. I have had a setup like this since about a day after I started using copilot, except it's an MCP server that uses sqlite-vec and has recall endpoints to contextually load the proper data instead of a bunch of extra files polluting context.

OP's example isn't something new or incredibly thoughtful at all; in fact this pattern gets "discovered" every other day here, on reddit, or on social media in general by people who don't have the foresight to just look around and see what other people are doing.

[−] kenforthewin 41d ago
I agree with you, the linting pass seems valuable and it's something I'm thinking about adding - it's a great idea.

What I'm pushing back on specifically is the insistence that the core loop - retrieving the most relevant pieces of knowledge for wiki synthesis - is not RAG. In order for the LLM to do a good job at this, it needs some way to retrieve the most relevant info. Whether that's via vector DB queries or a structured index/filesystem approach, that fundamental problem - retrieving the best data for the LLM's context - is RAG. It's a problem that has been studied and evaluated for years now.

thanks for checking it out

[−] Covenant0028 41d ago
I'm curious how this linting step scales with larger wikis. Looking for an inconsistency across N files requires on the order of N² comparisons, and that's assuming each file contains a single idea.
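For a rough sense of scale: all-pairs checking grows quadratically, while bucketing pages first by a topic tag (a hypothetical piece of metadata, not something the gist prescribes) only pays the quadratic cost within each bucket. A back-of-the-envelope sketch in Python:

```python
from collections import Counter

def naive_pairs(n: int) -> int:
    """All-pairs consistency checks across n files: n*(n-1)/2."""
    return n * (n - 1) // 2

def bucketed_pairs(topic_of: dict[str, str]) -> int:
    """Only compare files that share a topic tag."""
    sizes = Counter(topic_of.values())  # files per topic
    return sum(k * (k - 1) // 2 for k in sizes.values())

# 1,000 files: 499,500 naive comparisons, but if an index splits them
# into 50 topics of 20 files each, only 50 * 190 = 9,500 remain.
```

Which is arguably another argument for the index file: it doubles as a pruning structure for the lint pass.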
[−] Imanari 41d ago
Isn’t this just kicking the can down the road?

> but the LLM is rediscovering knowledge from scratch on every question

Unless the wiki stays fully in context, the LLM now has to re-read the wiki instead of re-reading the source files. Also, this will introduce and accumulate subtle errors as we start to regurgitate second-order information.

I totally get the idea but I think next gen models with 10M context and/or 1000tps will make this obsolete.

[−] nidnogg 40d ago
I've recently lazied out big time on a company project going down a similar rabbit hole. After having a burnout episode and dealing with sole caregiver woes in the family for the past year, I've had less and less energy to piece together intense, correct thought sequences at work.

As such I've taken to delegating substantial parts of architecture and discovery to multi-agent workflows that always refer back to a wiki-like castle of markdown files I've built over time with them, fronted by Obsidian so I can peek in efficiently often enough.

Now I'm certainly doing something wrong, but the gaps are just too many to count. If anything, this creates a weird new type of tech debt. Almost like a persistent brain gap. I miss thinking harder and I think it would get me out of this one for sure. But the wiki workflow is just too addictive to stop.

[−] voidhorse 41d ago
This makes me feel like karpathy is a tad behind the times. Many agent users I know already do precisely this as part of "agentic" development. If you use a harness, the harness is already empowered to do much of this under the hood, no fancy instruction file required. Just ask the agent to update some knowledge directory at the end of each convo, done. If you really need to automate it, write some scheduling tool that tells the agent to read past convos and summarize. It really is that easy.
[−] nurettin 41d ago
He really wants to shine, but how is this different from Claude memory or skills? When I encounter something it had difficulty doing, or it consistently starts off with incorrect assumptions, I solve for it and tell it to remember this. If it goes on a long trial-and-error loop to accomplish something, once it works I tell it to create a skill.
[−] mpazik 41d ago
Happy to see this getting attention. The friction shows up once you mix docs with structured things like work items or ADRs. Flat markdown doesn't query well and gets inconsistent. You can read TASKS.md fine, but the agent can't ask "show me open tasks blocking this epic" without scanning prose or maintaining a parallel index.

The AGENTS.md approach papers over this by teaching the LLM the folder conventions. It works until the data gets complex, and it gets worse after many iterations.

Both are needed: files that open in any editor, and a structured interface the agent can actually query. Been building toward that with Binder (github.com/mpazik/binder), a local knowledge platform. Data lives in a structured DB but renders to plain markdown with bi-directional sync. LSP gives editors autocomplete and validation. Agents and scripts get the same data through CLI or MCP.
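To make the gap concrete, here is a rough sketch (Python; the `status:`/`blocks:` frontmatter keys are hypothetical, not Binder's actual schema) of the kind of structured query that flat markdown forces you to write a parser for:

```python
def frontmatter(text: str) -> dict[str, str]:
    """Parse a minimal 'key: value' frontmatter block delimited by ---."""
    meta = {}
    lines = text.splitlines()
    if lines and lines[0] == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def open_tasks_blocking(pages: dict[str, str], epic: str) -> list[str]:
    """'Show me open tasks blocking this epic' over raw markdown pages."""
    return [
        name
        for name, text in pages.items()
        if (meta := frontmatter(text)).get("status") == "open"
        and meta.get("blocks") == epic
    ]
```

With a structured DB behind the files, this becomes a one-line query instead of ad-hoc parsing that every agent has to reinvent.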

[−] saberience 35d ago
Karpathy is at his best when working on teaching materials for ML beginners.

When it comes to other stuff he seems to be in a X/Twitter induced AI Psychosis like Garry Tan where he thinks everything he does is amazing and novel because he gets glazed by 1000 X bots who just post "You're amazing" after anything he tweets.

This is most definitely an (old) solution in search of a problem. Plenty of people have been trying variants of this for several years at this point, but the issue isn't putting stuff into a wiki, or git, or markdown files. It's how you keep them up to date, how you deal with conflicts, how you deal with bloat, how you decide what to keep and delete over time, and also, once you've got this big mass of notes and markdown, when you surface it.

It sounds great on paper until you try and use it and realize that in reality it isn't that useful and doesn't become part of your daily life. That is, it's more fun to build than to actually use, and you don't end up using it outside of the initial novelty.

[−] cyanydeez 41d ago
Too much context pollution.

Start with short text context, and flow through DAGs, choose-your-own-adventure style. We already breached context limits. Now's the time to let LLMs build their contexts through decision trees and prune dead ends.

[−] cookiengineer 40d ago
This is essentially the hack that Claude Code did with their dedicated sub-agent that summarizes a list of previous messages and maintains different types of memory (temporary, persistent, cross-agent). They also built tools so that agents can talk to each other and use that summary to propagate knowledge to other agents if needed.

Setting aside that their codebase is absolute slopcrap, I think something like this might work nicely if it's built from the ground up.

For my own test environment I'm relying on Golang and its conventions (go build, go test, go fmt, gopls etc) which saves a lot of prompts and tokens down the line. Additionally I think that spec driven development might be more successful but I haven't found out yet what the right amount of details for specifications is, so that semantic anchors can help summarize it better.

Anyways if you're curious, it's made for short agent lifecycles and it kinda works every time most of the time: https://github.com/cookiengineer/exocomp

Still need to implement the summarizing agent and memory parts, it's a little fiddlework to get that right so I'm currently experimenting a lot locally with both ollama and vllm as an inference engine.

[−] sornaensis 40d ago
I've been working on my own thing with more of a 'management' angle to it (https://github.com/Sornaensis/hmem). It lets me connect memories to tasks and projects across all of my workspaces, and gives me a live SPA to view and edit everything, which in my experience makes controlling what the models are doing a lot easier, in a way that suits how I think versus other project management or markdown systems.

I would be interested in trying to make the models go into more of a research mode and organize their knowledge inside it, but I've found this turns into something like LLM soup.

For coding projects, the best experience I have had is clear requirements and a lot of refinement followed through with well documented code and modules. And only a few big 'memories' to keep the overall vision in scope. Once I go beyond that, the impact goes down a lot, and the models seem to make more mistakes than I would expect.

[−] gchamonlive 41d ago
I don't think this is taking it as far as it can go.

Everything should live in the repo. Code and docs yes. But also the planning files, epics, work items, architectural documentation and decisions. Here is a small example of my Linux system doc: https://github.com/gchamon/archie/tree/main/docs

And you don't need to reinvent the wheel. Code docs can live either right next to the code in the readme, or in docs/ if they're too big for a single file or the context spans multiple modules. ADRs live in docs/architecture/decisions. Epics and work items can also live in the docs.

Everything is for agents and everything is for humans; anything agent-specific goes in AGENTS.md and docs/agents or something similar, and even those are for humans too.

In a nutshell, put everything in the repo, reuse standards as much as possible, the idea being it's likely the structure is already embedded in the model, and always review documentation changes.

[−] zbyforgotpass 40d ago
A list of systems that implement this or related ideas: https://zby.github.io/commonplace/notes/related-systems/rela...

This list is also part of my own contender in this race: https://zby.github.io/commonplace/ - my own LLM operated knowledge base (this is the html rendering of that KB - there is also the github repo linked there).

The main feature is that I use it to build a theory about such systems, and the neat trick is that LLMs can read this theory and implement it, so the very theory works as an LLM runtime too.

It works for me - but it has some rough edges still - so I guess it is not for everyone.

[−] asakin 37d ago
I'd been working on a personal wiki system for a while and adapted it to implement this pattern. Ended up as a git template:

https://github.com/asakin/llm-context-base

Main additions on top of the pattern: a training period where the AI learns how you work over 30 days then gets quieter over time, a metadata standard so files are queryable by summary, and a lint pass for stale content and context loading optimization. Never have to design a taxonomy upfront.

[−] thoughtpeddler 40d ago
I’m surprised Karpathy thinks this is a viable solution for a quasi-continual learning system. Yes, it’s cool to experiment with these sort of ‘intermediate knowledge systems’, but the real goal within this current LLM paradigm remains clear: new information within a knowledge system should be manifest through updating the weights! Many efforts are taking a crack at this, which this really helpful talk [0] by Jack Morris goes into. I’m in full agreement with the other comments here: this ‘LLM Wiki’ merely results in “2nd-order information” that will only muddy the picture.

[0] “Stuffing Context is not Memory, Updating Weights Is": https://www.youtube.com/watch?v=Jty4s9-Jb78

[−] gbro3n 40d ago
I built AS Notes for VS Code (https://www.asnotes.io) with the option for this usage pattern in mind. By augmenting VS Code so it has the tooling we use in personal knowledge management systems, it makes it easy to write, link and update markdown / wikilinked notes manually (with mermaid / LaTeX rendering capability also) - but by using VS Code we have easy access to an Agent harness that we can direct to work on, or use our notes as context. Others have pointed out that context bloat is an issue, but no more so than when you use the copilot harness (or any other) inside a large codebase. I find I get more value from my AI conversations when I persist the outputs in markdown like this.
[−] atbpaca 41d ago
An LLM that maintains a Confluence space. That looks like an interesting idea!
[−] tristanMatthias 40d ago
I built a tool[0] in a similar vein to this. It's for the codebase specifically: it uses hashes to detect source file changes and distills everything down with an LLM into a single asset that explains each file. All addressable via a CLI.

I find it helps a LOT with discovery. Llm spends a lot less time figuring out where things are. It’s essentially “cached discovery”

[0] https://github.com/tristanmatthias/llmdoc

[−] ractive 41d ago
I built a tool for exactly this: helping an LLM navigate and search a knowledge base of md files. It helps a lot when building such a wiki by providing basic content search à la grep, but also structured search over frontmatter properties. It also helps with moving files around without breaking links, and with fixing links automatically. It is a CLI tool, mainly meant to be driven by AI tools.

Check it out: https://github.com/ractive/hyalo

[−] foo42 41d ago
I built a very similar system into my own assistant type project. In all honesty though I've not used it enough to know how well it works out in practice.
[−] mbreese 41d ago
I’ve been doing something similar with a RAG system where in addition to storing the documents, we use an LLM to pull out “facts”. We’re using the LLM to look for relationships between different entities. This is then also returned when we query the database.

But I like the idea of an LLM generated/maintained wiki. That might be a useful addition to allow for more interactive exploration of a document database.

[−] estetlinus 41d ago
Sounds like a solution in search of a problem.
[−] mememememememo 41d ago
This sounds like compaction for RAG.
[−] argee 41d ago
This is what Semiont is trying to do, to some extent [0].

Doesn't really feel that useful in practice.

[0] https://github.com/The-AI-Alliance/semiont

[−] jdthedisciple 41d ago
The challenge to me seems to be quality assurance:

I'd rather have it source the original document every time than an LLM-generated wiki, which I most likely wouldn't have the time to fact-check and review myself.

[−] thegeolab 39d ago
Built a versioned schema standard on top of this workflow: github.com/arturseo-geo/llm-knowledge-base
[−] Lockal 41d ago
This thing has already existed for multiple years; see https://deepwiki.com/ (99% autonomous, but it can be manually steered: see https://docs.devin.ai/work-with-devin/deepwiki#steering-deep...). There have also been multiple attempts to replicate it with local LLMs.

The problem is that it is still slop: not only does it add a lot of noise ("architecture" diagrams based on cherry-picked filenames, incomplete data tables, hyperfocusing on strange things), it also hallucinates, adding factually incorrect information (while direct questions to the LLM return correct information).

[−] serendipty01 41d ago
[−] 0123456789ABCDE 41d ago
this is so validating

https://grimoire-pt5.sprites.app/

[−] ansc 40d ago
The comments in the gist are depressing.
[−] qaadika 41d ago

> You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time.

I'm not sure how you can get any closer to "turning your thinking over to machines." These tasks may be "grunt work," but it's while doing these things that new ideas pop in, or you decide on a particular or novel way to organize or frame information. Many of my insights in my (analog? vanilla? my human-written) Obsidian vault (that I consider my "personal wiki") have been made or expanded on because I happened to see one note after another in doing the "grunt work", or just by opening one note and seeing its title right beside a previously forgotten one.

There's nothing "personal" about a knowledge base you filled by asking AI questions. It's the AI's database, you just ask it to write stuff. Learn how to learn and answer your own damn questions.

Soon pedagogy will be a piece of paper that says "Ask AI."

I hate this idea that a result is all that matters, and the quicker you can get the result the better, at any cost (mental or financial, short-term or long-term).

If we optimized showers to be 20 seconds, we'd stop having shower thoughts. I like my shower thoughts. And so too my grunt-work thoughts.

---

As an aside, I'm not totally against AI writing in a personal knowledgebase. I include it at times in my own. But since I started my current obsidian vault in 2023 (now 4100 self-written notes, including maybe up to 5% Web Clipper notes), I've had a Templater (Obsidian plugin) template I wrap around anything AI-written to 'quarantine' it from my own words:

==BEGIN AI-GENERATED CONTENT==

<% tp.file.cursor(1) %>

==END AI-GENERATED CONTENT==

I've used this consistently and it's helped me keep (and develop) my own writing voice apart from any of my AI usage. It actually motivates me to write more, because I know I could always take the easy route and chunk whatever I'm thinking into the AI, but I'm choosing not to by writing it myself, with my own vocabulary, in my own voice, with my own framing. I trick myself into writing because my pride tells me I can express my knowledge better than the AI can.

I also manually copy and paste from wherever I'm using AI into my notes. Nothing automated. The friction keeps me from sliding into the happy path of turning my brain off.

[−] claudiug 40d ago
I had the feeling that Karpathy is not a software developer at all; reading his tweets is like watching him discover ideas from half a century ago. Like specification...
[−] vbarsoum 39d ago

I built an implementation of this and tested it on 3 Alex Hormozi books (~155K words, 68 source files). Some data for the skeptics:

The naive version (each book as 1 file) produced exactly the slop people are describing here. But splitting into chapter-level files and recompiling changed the output categorically. Same model, same prompts; the only variable was source granularity.

The compiler produced 210 concept pages with 4,597 cross-references (19.2 avg links per page). 20+ concepts synthesized across all 3 books unprompted; one pulled from 11 source files and found a genuine contradiction between two books that neither makes explicit. 173K words of output from 155K input. It's not compression, it's synthesis.

The thing I think the "this is just RAG" comments are missing: a vector database is only useful to machines. You can't open a .faiss file and browse it. A wiki is useful to both. I open these files in Obsidian, browse the graph, follow links, read concept pages, no AI needed. But when I do ask the AI a question, it reads the same wiki pages I do, and the answers are better than RAG because the knowledge is already structured and cross-referenced instead of retrieved as raw chunks.

That's the key insight in Karpathy's idea. The compiled wiki is the interface for humans AND the knowledge layer for AI. Same artifact, two audiences.

Cost: ~12M tokens, ~10-15 min. Repo: https://github.com/vbarsoum1/llm-wiki-compiler
[−] kmaitreys 40d ago

> You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it.

Then what is the point? Why so averse to using your own brain? Why are tech bros like this?

[−] meidad_g 41d ago
[flagged]
[−] ariasbruno 40d ago
[dead]
[−] sschlegel 36d ago
[dead]
[−] aimemobe 36d ago
[flagged]
[−] vlsiddarth7 38d ago
[dead]
[−] MarcelinoGMX3C 40d ago
[dead]
[−] bambushu 40d ago
[flagged]
[−] maryjeiel 39d ago
[flagged]
[−] alejandrosplitt 40d ago
[dead]
[−] LeonTing1010 40d ago
[flagged]