Even cool projects can learn from others. Maybe they missed something that could benefit the project, or made some interesting technical choice that gives a different result.
For the readers/learners, it's useful to understand the differences so we know what details matter, and which are just stylistic choices.
> Who cares how it compares

Well, the person who asked the question, for one. I'm sure they're not the only one. Best not to assume why people are asking, though; that saves you the time of writing irrelevant comments.
Is there any documentation for this? The code is probably the simplest (Not So) Large Language Model implementation possible, but it is not straightforward to understand for developers who aren't familiar with multi-head attention, ReLU FFNs, LayerNorm, and learned positional embeddings.
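For anyone unfamiliar with those terms, here is a minimal NumPy sketch of one transformer block that wires them all together (single attention head for brevity; all sizes and weights here are made up for illustration, not taken from the GuppyLM code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, vocab = 16, 5, 50          # embedding dim, sequence length, vocab size

# Token + learned positional embeddings (both are trainable lookup tables).
tok_emb = rng.normal(size=(vocab, d)) * 0.02
pos_emb = rng.normal(size=(T, d)) * 0.02

def layernorm(x, eps=1e-5):
    # Normalize each position's vector to zero mean, unit variance.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones((T, T)), k=1).astype(bool)  # causal mask:
    scores[mask] = -np.inf                             # can't attend ahead
    return softmax(scores) @ v

Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.02 for _ in range(3))
W1 = rng.normal(size=(d, 4 * d)) * 0.02
W2 = rng.normal(size=(4 * d, d)) * 0.02

ids = rng.integers(0, vocab, size=T)
x = tok_emb[ids] + pos_emb[np.arange(T)]       # embeddings + positions
x = x + attention(layernorm(x), Wq, Wk, Wv)    # attention sub-block (pre-norm)
x = x + np.maximum(0, layernorm(x) @ W1) @ W2  # ReLU feed-forward sub-block
logits = layernorm(x) @ tok_emb.T              # tied output projection
print(logits.shape)                            # (5, 50): next-token scores per position
```

A real model stacks several of these blocks and splits attention into multiple heads, but the data flow is the same.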
This project shares similarities with Minix. Minix is still used at universities as an educational tool for teaching operating system design; it is the operating system that taught Linus Torvalds how to design (monolithic) operating systems. Similarly, having students add capabilities to GuppyLM is a good way to learn LLM design.
Cool project. I'm working on something where multiple LLM agents share a world and interact with each other autonomously. One thing that surprised me is how much the "world" matters: same model, same prompt, but put it in a system with resource constraints, other agents, and persistent memory, and the behavior changes dramatically. It made me realize we spend too much time optimizing the model and not enough thinking about the environment it operates in.
I love these kinds of educational implementations.
I really want to praise the (unintentional?) nod to Nagel: by limiting capabilities to the representation of a fish, the user is immediately able to understand the constraints. It can only talk like a fish because it's very simple.
Especially compared to public models, that's a really simple correspondence to grok intuitively (small LLM -> only as verbose as a fish, larger LLM -> more verbose), so kudos to the author for making it that simple and fun.
Could it be possible to train an LLM only through chat messages, without any other data or input?
If Guppy doesn't know regular expressions yet, could I teach them to it just by conversation? It's a fish, so it probably wouldn't understand much of my blabbing, but it would be interesting to give it a try.

Or is there some hard architectural limit in current LLMs that means training needs to be done offline and with a fairly large training set?
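Not an architectural limit, as far as I know: the same next-token loss can be applied to a single chat message as an online weight update. The sketch below shows the idea on a toy logit table (a stand-in for real model weights; the vocabulary and learning rate are made up). The practical problems are that one message is a very noisy gradient signal and that repeated small updates cause catastrophic forgetting, which is why real training uses large offline datasets:

```python
import numpy as np

# Toy "model": a table of next-token logits, nudged by SGD one message at a time.
vocab = {"<s>": 0, "regex": 1, "matches": 2, "text": 3, "blub": 4}
W = np.zeros((len(vocab), len(vocab)))  # W[i, j] = logit of token j after token i

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def online_step(message, lr=0.5):
    """One gradient step of next-token cross-entropy on a single chat message."""
    ids = [vocab["<s>"]] + [vocab[w] for w in message.split()]
    for prev, nxt in zip(ids, ids[1:]):
        grad = softmax(W[prev])
        grad[nxt] -= 1.0            # dLoss/dlogits = probs - onehot(target)
        W[prev] -= lr * grad

# "Teaching by conversation": each chat turn nudges the weights a little.
for _ in range(20):
    online_step("regex matches text")

p = softmax(W[vocab["regex"]])[vocab["matches"]]
print(p)   # probability of "matches" after "regex" has grown well above chance
```

What chat interfaces actually exploit instead is in-context learning: the "teaching" lives in the prompt, and no weights change at all.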
I am trying to find how the synthetic data was created (looking through the repo) and didn't find it. Maybe I am missing it; I would love to see the prompts and process behind that aspect of the training data generation!
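While we wait for the actual prompts, a common low-tech approach for tiny models is template expansion: hand-write persona and topic fragments, cross them, and optionally paraphrase each line with a bigger LLM afterwards. A purely hypothetical sketch (none of these templates are from the GuppyLM repo):

```python
import itertools, json, random

random.seed(0)

# Illustrative fragments only; a real pipeline would use many more,
# plus an LLM paraphrase pass for variety.
topics = ["food", "the big shape", "the glass wall", "bubbles"]
user_templates = ["tell me about {t}", "what do you think of {t}?"]
fish_templates = ["{t}. i like {t}. blub.", "i saw {t} today. it was shiny."]

dataset = []
for topic, u, f in itertools.product(topics, user_templates, fish_templates):
    dataset.append({"user": u.format(t=topic), "guppy": f.format(t=topic)})

random.shuffle(dataset)
print(json.dumps(dataset[0]))
print(len(dataset))   # 4 topics x 2 user x 2 fish templates = 16 pairs
```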
This is a nice idea. A tiny implementation can be way more useful for learning than yet another wrapper around a big model, especially if it keeps the training loop and inference path small enough to read end to end.
Does this work by just training once with next-token prediction? I'd like to understand better how it creates fluent sentences, if anyone can provide insights.
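Assuming it follows the standard recipe, yes: a single objective. The trick is that every position in every training sentence is its own training example, so one pass over the data yields many (prefix -> next token) pairs, and minimizing cross-entropy over all of them forces the model to absorb grammar, word choice, and topic at once. A tiny illustration:

```python
# Next-token prediction: every position in a training sentence becomes an
# (input prefix -> target token) example.
sentence = "i like micro pellets".split()
pairs = [(sentence[:i], sentence[i]) for i in range(1, len(sentence))]
for prefix, target in pairs:
    print(prefix, "->", target)
# ['i'] -> like
# ['i', 'like'] -> micro
# ['i', 'like', 'micro'] -> pellets
```

Fluency then falls out of scale: with enough such pairs, the lowest-loss strategy is to model the actual statistics of well-formed sentences.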
Wow, that is such a cool idea, and honestly very much needed. LLMs seem to be this black box nobody understands, so I love every effort to make the whole thing less mysterious. I will definitely have a look at dabbling with this, may it not be a goldfish LLM :)
> it's useful to understand the differences so we know what details matter, and which are just stylistic choices.

This isn't art; it's science & engineering.
You> hello
Guppy> hi. did you bring micro pellets.
You> HELLO
Guppy> i don't know what it means but it's mine.
How does it handle unknown queries?
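Presumably the same way any autoregressive LM does: there is no special "unknown" branch. The model always produces a probability distribution over its own small vocabulary and samples from it, so an out-of-distribution query like "HELLO" still yields in-distribution fish talk. A sketch of that sampling step (vocabulary and logit values made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["blub", "mine", "pellets", "shiny", "."]
logits = np.array([2.0, 1.5, 0.5, 0.0, 1.0])   # made-up scores for one step

def sample(logits, temperature=1.0):
    # Softmax over the whole vocabulary, then draw one token.
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return vocab[rng.choice(len(vocab), p=p)]

print([sample(logits) for _ in range(5)])  # always fish-vocabulary tokens
```

So "i don't know what it means but it's mine" is just the highest-probability fish-flavored continuation, not a dedicated fallback path.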
> you're my favorite big shape. my mouth are happy when you're here.
Laughed loudly :-D
Now, I ask, have LLMs been demystified for you? :D
I am still impressed by how much (for the most part) trivial statistics and a lot of compute can do.
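"Trivial statistics" really is the starting point. Even a raw bigram table, just counting which word follows which, already produces text with local structure; scale the same idea up to longer contexts with learned features and lots of compute and you arrive at an LLM. A toy sketch on a made-up corpus:

```python
import random

random.seed(0)
corpus = "i like pellets . i like bubbles . bubbles are shiny .".split()

# Count followers: a plain bigram table.
follows = {}
for a, b in zip(corpus, corpus[1:]):
    follows.setdefault(a, []).append(b)

# Generate by repeatedly sampling a follower of the current word.
word, out = "i", ["i"]
for _ in range(8):
    word = random.choice(follows[word])
    out.append(word)
print(" ".join(out))
```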
But right now people make it a hobby, and that thing can run on a laptop.
This is just so wild.
> A 9M model can't conditionally follow instructions
How many parameters would you need for that?
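There's no hard empirical threshold, but back-of-the-envelope parameter counting shows how fast capacity grows with width and depth, and why the jump from toy models to instruction-following models is so large (the formula ignores biases and norm parameters; the example configs are illustrative, not GuppyLM's actual shapes):

```python
# Rough transformer parameter count:
#   per layer: 4*d^2 (attention projections) + 8*d^2 (FFN with 4x hidden) = 12*d^2
#   plus the token embedding table: vocab * d
def n_params(d_model, n_layers, vocab):
    return 12 * n_layers * d_model**2 + vocab * d_model

print(n_params(256, 8, 8000))     # 8339456  -- a ~9M-class toy model
print(n_params(2048, 24, 32000))  # 1273495552 -- a ~1.3B-class model
```

Reliable general instruction following tends to show up only in the hundreds-of-millions-and-up range with strong instruction-tuning data, though narrow, heavily curated tasks can work smaller.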
* How does training work? In the cloud or on my own dev machine?
* How do I create a gguf?