I've seen a lot of such systems come and go. One of my friends is working on probably the best (VC-funded) memory system right now.
The problem always is that when there are too many memories, the context gets overloaded and the AI starts ignoring the system prompt.
Definitely not a solved problem, and there need to be benchmarks to evaluate these solutions. Benchmarks themselves can be easily gamed, though, and aren't universally applicable.
Seems interesting. I'll give it a try on my agent; memory is definitely an ongoing issue. How long have you been running this in a continuous state? Also, have you tried other LLMs and seen a difference in how well they can use it?
Your example is with Codex - OpenAI could implement this easily on their end, right? Every prompt of yours was an API call and they have a log, so they could easily re-create a quick history of what you did/asked for before.
Has been working for me for a couple of months already. So far human curation of context is the way to go.
I guess the markdown approach really has an advantage over others.
PS: Something I built on markdown: https://voiden.md/