I have found it to be the complete opposite, tbh. Not Lisp proper, but I've been generating Scheme with Claude for about 5 months and it's a pleasure. What I did was make sure CLAUDE.md had clear examples, and I also added a skill that leverages ast-grep for AST-safe replacement. The biggest pain is that sometimes Claude will mess up the parens, but lately it has even come up with its own Python scripts to count the parens and balance the expressions on its own.
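For a sense of what such a paren-counting script might look like (this is my own minimal sketch, not the script Claude actually wrote), a balance checker has to skip strings, comments, and character literals before counting:

```python
def paren_balance(src: str) -> int:
    """Return net paren depth for Scheme source:
    0 means balanced, >0 unclosed opens, <0 extra closes."""
    depth = 0
    i, n = 0, len(src)
    in_string = False
    in_comment = False
    while i < n:
        c = src[i]
        if in_string:
            if c == '\\':
                i += 1              # skip escaped character inside string
            elif c == '"':
                in_string = False
        elif in_comment:
            if c == '\n':
                in_comment = False  # line comment ends at newline
        elif c == '"':
            in_string = True
        elif c == ';':
            in_comment = True
        elif c == '#' and i + 1 < n and src[i + 1] == '\\':
            i += 2                  # character literal like #\( -- skip it
        elif c in '([':
            depth += 1
        elif c in ')]':
            depth -= 1
        i += 1
    return depth
```

Feeding a file through this after each edit catches the most common Claude failure mode (a dropped or extra closing paren) before the code ever reaches the Scheme reader.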
I created Schematra[1] and also a schematra-starter-kit[2] that can be spun up from Claude to create a project and get you ready in less than 5 minutes. I've created 10+ side projects this way and it's been a great joy. I even added a Scheme reviewer agent that is extremely strict and focuses on Scheme best practices (it's all in the starter kit, btw).
I don't think the lack of training material is what makes LLMs poor at writing Lisp. I think it's the lack of guidelines: add enough of them, and Lisp's inherently simple pattern and grammar make it (IMO) a prime candidate for code generation.
Thanks for the Scheme setup examples. I have created very simple skills markdown files for Common Lisp and Hylang/Hy (a Clojure-like Lisp on top of Python). I need to spend more effort on my skills files, though.
This is incredibly useful - not for Scheme, but for someone like me interested in bootstrapping languages and frameworks in general. I hope you find a way to share the best practices you've learned in a broader context.
I have been using AI to write Clojure code for the past half year. The frontier LLMs have no problem writing idiomatic Clojure code. Both Codex and Claude Code fix their missing closing parentheses quickly. So I wouldn't say "writing Lisp is AI resistant". In fact, Clojure is a great fit for AI coding agents: it is token efficient, and the existing Clojure code used for training is mostly high-quality code, as Clojure tends to attract experienced coders.
I am glad you enjoyed it. I am happy to report that the next release will have many new features: Raft-consensus-based high availability (with an extensive Jepsen test suite); a built-in MCP server; built-in llama.cpp for in-DB embeddings; a JSON API; and language bindings for Java, Python, and JavaScript.
Interesting, and not quite my experience. While I do get better agentic coding results for Python projects, I also get good results working with Common Lisp projects. I have a habit of opening an Emacs buffer and writing a huge prompt with documentation details, sometimes sample code in other languages, or, if I am hitting APIs, a working curl example. For Common Lisp my initial prompts are often huge, but I find thinking about a problem and crafting the prompt to be fun.
The article mentions a REPL skill. I don’t do that: letting model+tools run sbcl is sufficient.
Yes, I've also found LLMs can generate working Common Lisp code quite well, albeit I've only been solving simple problems.
I haven't tried integrating one into a REPL or even command-line tools, though. The LLM can't experience the benefit of a REPL, so it makes sense that it struggled with it and preferred feeding entire programs into sbcl each time.
Personally, I think we're using LLMs wrong for programming. Computer programs are solutions to a given constraint logic problem (the specs).
We should be using LLMs to translate from (fuzzy) human specifications to formal specifications (potentially resolving contradictions), and then solving the resulting logic problem with a proper reasoning algorithm. That would also guarantee correctness.
Full program inference from specs is actually a very hard problem, because the compiler/SAT solver cannot autonomously derive loop invariants (or, similarly, inductive hypotheses) that are necessary to write correct code. So using a LLM that can look at the spec and provide a heuristic solution makes a lot of sense. Obviously the solution still has to be verified, though.
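A toy sketch of that guess-then-verify split (my own illustration; `spec`, `heuristic_solve`, and `verified_solve` are made-up names, and the "heuristic" stands in for an LLM's candidate): the checkable spec is what guarantees correctness, while exhaustive search is the sound-but-exponential fallback when the guess fails verification.

```python
from itertools import permutations

def spec(xs, ys):
    """The formal spec: ys is a sorted permutation of xs."""
    return list(ys) == sorted(ys) and sorted(xs) == sorted(ys)

def heuristic_solve(xs):
    # Stand-in for the LLM's unverified guess. Here it happens to be
    # correct (Python's sort), but nothing in the pipeline assumes that.
    return sorted(xs)

def verified_solve(xs):
    ys = heuristic_solve(xs)
    if spec(xs, ys):
        return ys            # heuristic guess verified against the spec
    # Fallback: brute-force search over the solution space --
    # guaranteed to satisfy the spec, but exponentially slow.
    for cand in permutations(xs):
        if spec(xs, cand):
            return list(cand)
    raise ValueError("spec unsatisfiable")
```

The point of the sketch is the division of labor: the heuristic may be arbitrarily clever and arbitrarily wrong, because only the verified output ever leaves `verified_solve`.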
Perhaps you meant to say "coding", not "programming". AI is immensely helpful for programming. Coding is just the last, and in a proper programming session sometimes even unnecessary step - there are times when an adequate investigation requires deleting code rather than writing new one, or writing pages of documentation without a single code change.
You have to be a detective and know what threads to pull to rope in the relevant data, digging inductively and deductively - soaring high to get the "big picture" of things and diving into the depths of a single code line change.
I've been developing software for decades now (not claiming to be great, but at least I think I've built certain intuition and knack for it), and I always struggled with the "story telling" aspect of it - you need to compose a story about every bug, every feature request - in your head, your notes, your diagrams. A story with actors, with plot lines, with beginning, middle, and end. With a villain, a hero, and stakes. But software doesn't work that way. It's fundamentally an exploratory, iterative, often chaotic process. You're not telling what happened - you're constructing a plausible fiction that satisfies the format. The tension I felt for decades is that I am a systems thinker being asked to repeatedly perform as a narrator, and that is hard.
Modern AI is already capable of digging up the details for my narrative for me - I gave it access to everything - Slack, Jira, GitHub, Splunk, k8s, Prometheus, Grafana, Miro, etc. - and now I can ask it to explain a single line of code - including historical context, every conversation, every debate, every ADR, diagram, bug, and stack trace - and it's complete bananas.
It doesn't mean I don't have to work anymore; if anything, I have to work more now, because now I can - the reasons become irrelevant (see Steve Jobs' janitor vs. CEO quote). I didn't earn a leadership role - AI has granted it? Forced me into it? Honestly, I don't know anymore. I have mixed feelings about all of it. It is exciting and scary at the same time. Things that I dreamed about are coming true in a way that I couldn't even imagine, and I don't know how to feel about it all.
In case you’re not familiar, I will point you to the classical program synthesis literature. There the task is to take a spec written in say first-order logic, and output a program that satisfies this spec.
I think the biggest barrier to adoption of program synthesis is writing the spec/maintaining it as the project matures. Sometimes we don’t even know what we want as the spec until we have a first draft of the program. But as you’re pointing out, LLMs could help address all of these problems.
This rings true for me. LLMs in my experience are great at Go, a little less good at Java, and much less good at GCL (internal config language).
This is definitely partly training data, but if you give an LLM a simple language to use on the fly it can usually do ok. I think the real problem is complexity.
Go and Java require very little mental modelling of the problem, everything is written down on the page really quite clearly (moreso with Go, but still with Java).
In GCL, however, the semantics are _weird_ - the scoping is unlike most languages, because it's designed for DSLs. Writing DSL content requires little thought from humans, but authoring DSLs requires a fair amount of mental modelling about the structure of the data that is not present on the page. I'd wager that Lisp is similar: more of a mental model is required.
The problem is of course that LLMs don't have a mental model, or at least what they do have is far from what humans have. This is very apparent when doing non-trivial code, non-CRUD, non-React, anything that requires thinking hard about problems more than it requires monkeys at typewriters.
I bet it would do much better at HCL (or Starlark, maybe even YAML - something it has seen plenty of examples of in the wild).
This is a weird moment in time where proprietary technology can hurt more than it can help, even if it's superior to what's available in public in principle.
How many docs do you put in the context? We maintain a lot of DSL code internally, and each file has a copy of the spec + guide as a comment at the top. It's about 50 LOC, and the relevant models are great at writing it.
I've had it write Scheme with little issue -- it even completed the latter half of a small toy compiler. I think the REPL is the issue, not the coding; forcing it to treat the REPL like another conversation participant is likely the only way for that to work, and this article does not handle it that way. Instead, hand it a compiler and let it use the workflow it is optimized for.
Claude has really helped me improve my Emacs config (elisp) substantially, and it has sometimes even fixed issues I've found in packages. My Emacs setup is the best it has ever been. I can't say it just works and always produces the best solution - sometimes it would f** up closing parens or even make things up (e.g. it suggested load-theme-hook, which doesn't exist). But overall, changing things in Emacs and learning elisp is definitely much easier for me now (I'm not good with elisp, but I'm a pretty good Racket programmer).
I learned Common Lisp years ago while working in the AI lab at the University of Toronto, and parts of this article resonated strongly with me.
However, if you abandon the idea of REPL-driven development, then the frontier models from Anthropic and OpenAI are actually very capable of writing Lisp code. They sometimes struggle with editing it (messing up parens), but usually the first pass is pretty good.
I've been on an LLM kick the past few months, and two of my favorite AI-coded (mostly) projects are, interestingly, REPL-focused. icl (https://github.com/atgreen/icl) is a TUI and browser-based front end for your CL REPL designed to make REPL programming more fun for humans, whether you use it stand-alone or as an Emacs companion. Even more fun is whistler (https://github.com/atgreen/whistler), which lets you write/compile/load eBPF code in Lisp right from your REPL. In this case, the AI wrote the highly optimizing SSA-based compiler from scratch, and it is competitive against (and sometimes beats) clang -O2. I mean... I say the AI wrote it... but I had to tell it what I wanted in some detail. I start every project by generating a PRD, and then having multiple AIs review that until we all agree that it makes sense, is complete enough, and is the right approach to whatever I'm doing.
I am a bit (ok, very) worried that AI will most likely kill language diversity in programming. I also don't see it settling on a more optimal solution; it will probably just use the most available languages out there and be very hard to push out of that rut. And it's not limited to languages - I expect knowledge ruts all over the place, and with humans and AI both choosing the path of least resistance, I don't see an active way to fight this.
[1]: https://schematra.com/
[2]: https://forgejo.rolando.cl/cpm/schematra-starter-kit
LLMs are a "worse is better" kind of solution.
> We should be using LLMs to translate from (fuzzy) human specifications to formal specifications (potentially resolving contradictions)
Agreed! This is why having LLMs write assembly or binary, as people suggest, is IMO moving in the wrong direction.
> then solving the resulting logic problem with a proper reasoning algorithm. That would also guarantee correctness.
Yes! I.e. write in a high-level programming language, and have a compiler, the reasoning algorithm, output binary code.
It seems like we're already doing this!