I have found it to be the complete opposite, tbh. Not Lisp proper, but I've been generating Scheme with Claude for about 5 months and it's a pleasure. What I did was make sure CLAUDE.md had clear examples, and I also added a skill that leverages ast-grep for AST-safe replacement (the biggest pain is that sometimes Claude will mess up the parens, but lately it even came up with its own Python scripts to count the parens and balance the expressions on its own).
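For what it's worth, the paren-counting script Claude wrote probably looked something like this (my own reconstruction, not the actual script): count net open-minus-close parens while skipping strings and `;` line comments, since a `)` inside either shouldn't count.

```python
def paren_balance(source: str) -> int:
    """Return net open-minus-close parens, ignoring strings and comments.

    Positive means unclosed '('; negative means extra ')'.
    """
    depth = 0
    in_string = False
    in_comment = False
    i = 0
    while i < len(source):
        ch = source[i]
        if in_comment:
            if ch == "\n":
                in_comment = False
        elif in_string:
            if ch == "\\":
                i += 1              # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == ";":
            in_comment = True       # line comment runs to end of line
        elif ch == '"':
            in_string = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        i += 1
    return depth

print(paren_balance('(define (square x) (* x x))'))      # → 0
print(paren_balance('(display "unbalanced ) inside"'))   # → 1
```

A real skill would also report *where* the imbalance starts, but even this crude net count is enough for the model to self-correct.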
I created Schematra[1] and also a schematra-starter-kit[2] that can be spun up from Claude to create a project and get you ready in less than 5 minutes. I've created 10+ side projects this way and it's been a great joy. I even added a Scheme reviewer agent that is extremely strict and focuses on Scheme best practices (it's all in the starter kit, btw).
I don't think the lack of training material is what makes LLMs poor at writing Lisp. I think it's the lack of guidelines: add enough of them, and the fact that Lisp has such an inherently simple, regular grammar makes it a prime candidate (IMO) for code generation.
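On the "simple grammar" point: an entire s-expression reader fits in a couple dozen lines, which is part of why the output is so easy to validate mechanically. A toy sketch in Python (no string literals or quote syntax, just atoms and nesting):

```python
def tokenize(src: str) -> list[str]:
    # Parens become standalone tokens; whitespace separates atoms.
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens: list[str]):
    """Consume tokens (mutating the list) and return nested lists/atoms."""
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(parse(tokens))
        tokens.pop(0)           # drop the closing ')'
        return lst
    try:
        return int(tok)         # numeric atom
    except ValueError:
        return tok              # symbol

print(parse(tokenize("(+ 1 (* 2 3))")))   # → ['+', 1, ['*', 2, 3]]
```

Compare that to what a conformant parser for almost any infix language needs, and the "prime candidate for code generation" argument becomes concrete: the model's output can be checked for well-formedness almost for free.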
Thanks for the Scheme setup examples. I have created very simple skill markdown files for Common Lisp and Hylang/Hy (a Clojure-like Lisp on top of Python). I need to spend more effort on my skills files, though.
This is incredibly useful - not for Scheme, but for someone like me interested in bootstrapping languages and frameworks in general. I hope you find a way to share the best practices you've learned in a broader context.
I have been using AI to write Clojure code for the past half year. The frontier LLMs have no problem writing idiomatic Clojure. Both Codex and Claude Code fix their missing closing parentheses quickly, so I wouldn't say "writing Lisp is AI resistant". In fact, Clojure is a great fit for AI coding agents: it is token efficient, and the existing Clojure code used for training is mostly high-quality, as Clojure tends to attract experienced coders.
I am glad you enjoyed it. I am happy to report that the next release will have many new features: Raft-consensus-based high availability (with an extensive Jepsen test suite); a built-in MCP server; built-in llama.cpp for in-DB embeddings; a JSON API; and language bindings for Java, Python, and JavaScript.
Interesting, and not quite my experience. While I do get better agentic coding results for Python projects, I also get good results working with Common Lisp projects. I have a habit of opening an Emacs buffer and writing a huge prompt with documentation details, sometimes sample code in other languages, or, if I am hitting APIs, a working curl example. For Common Lisp my initial prompts are often huge, but I find thinking about a problem and prompt creation to be fun.
The article mentions a REPL skill. I don’t do that: letting model+tools run sbcl is sufficient.
Personally, I think we're using LLMs wrong for programming. Computer programs are solutions to a given constraint logic problem (the specs).
We should be using LLMs to translate from (fuzzy) human specifications to formal specifications (potentially resolving contradictions), and then solving the resulting logic problem with a proper reasoning algorithm. That would also guarantee correctness.
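A toy version of that split, with the "formal spec" expressed as constraints and a brute-force search standing in for the reasoning algorithm (illustrative only; a real pipeline would hand the spec to an SMT solver or similar, and the constraint set here is something I made up, not from the comment):

```python
from itertools import product

# Hypothetical formal spec an LLM might distill from a fuzzy request
# like "pick three distinct digits, in increasing order, summing to 10":
constraints = [
    lambda a, b, c: a + b + c == 10,
    lambda a, b, c: a < b < c,
]

def solve(constraints, domain=range(10)):
    # Exhaustive search over a finite domain: every returned tuple
    # provably satisfies the spec, so correctness is guaranteed.
    return [v for v in product(domain, repeat=3)
            if all(c(*v) for c in constraints)]

solutions = solve(constraints)
print(solutions[0])   # → (0, 1, 9)
```

The point of the architecture is the division of labor: the LLM only produces the `constraints` list, and the solver (not the LLM) vouches for every solution.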
This rings true for me. LLMs in my experience are great at Go, a little less good at Java, and much less good at GCL (internal config language).
This is definitely partly training data, but if you give an LLM a simple language to use on the fly it can usually do ok. I think the real problem is complexity.
Go and Java require very little mental modelling of the problem; everything is written down on the page quite clearly (more so with Go, but still with Java).
In GCL, however, the semantics are _weird_ and the scoping is unlike most languages, because it's designed for DSLs. Writing DSL content requires little thought from humans, but authoring DSLs requires a fair amount of mental modelling about the structure of the data that is not present on the page. I'd wager Lisp is similar: more of a mental model is required.
The problem is of course that LLMs don't have a mental model, or at least what they do have is far from what humans have. This is very apparent when doing non-trivial code, non-CRUD, non-React, anything that requires thinking hard about problems more than it requires monkeys at typewriters.
I've had it write Scheme with little issue -- it even completed the latter half of a small toy compiler. I think the REPL is the issue, not the coding; forcing it to treat the REPL like another conversation participant is likely the only way for that to work, and this article does not handle it that way. Instead, hand it a compiler and let it use the workflow it is optimized for.
Claude has really helped me improve my Emacs config (elisp) substantially, and sometimes even fix issues I've found in packages. My Emacs setup is the best it has ever been. I can't say it always just works and produces the best solution; sometimes it would mess up closing parens or even make things up (e.g., it suggested load-theme-hook, which doesn't exist). But overall, changing things in Emacs and learning elisp is definitely much easier for me now (I'm not good with elisp, but I'm a pretty good Racket programmer).
I learned Common Lisp years ago while working in the AI lab at the University of Toronto, and parts of this article resonated strongly with me.
However, if you abandon the idea of REPL-driven development, the frontier models from Anthropic and OpenAI are actually very capable of writing Lisp code. They sometimes struggle editing it (messing up parens), but usually the first pass is pretty good.
I've been on an LLM kick the past few months, and two of my favorite AI-coded (mostly) projects are, interestingly, REPL-focused. icl (https://github.com/atgreen/icl) is a TUI and browser-based front end for your CL REPL designed to make REPL programming for humans more fun, whether you use it stand-alone, or as an Emacs companion. Even more fun is whistler (https://github.com/atgreen/whistler), which allows you to write/compile/load eBPF code in lisp right from your REPL. In this case, the AI wrote the highly optimizing SSA-based compiler from scratch, and it is competitive against (and sometimes beating) clang -O2. I mean... I say the AI wrote it... but I had to tell it what I wanted in some detail. I start every project by generating a PRD, and then having multiple AIs review that until we all agree that it makes sense, is complete enough, and is the right approach to whatever I'm doing.
I am a bit (ok, very) worried that AI will most likely kill language diversity in programming. I also don't see it settling on a more optimal solution; it will probably just use the most available languages out there and be very hard to push out of that rut. And it's not limited to languages: I expect knowledge ruts all over the place, and since humans and AI both choose the path of least resistance, I don't see an active way to fight this.
[1]: https://schematra.com/
[2]: https://forgejo.rolando.cl/cpm/schematra-starter-kit
LLMs are a "worse is better" kind of solution.