C89cc.sh – standalone C89/ELF64 compiler in pure portable shell (gist.github.com)

by gaigalas 59 comments 201 points

[−] gaigalas 44d ago
Single standalone file, no external tools used, PATH='' (empty), portable (bash, dash, ksh, zsh), produces x86-64 ELF executables, has a built-in mini-libc.

Usage:

  printf 'int main(){puts("hello");return 0;}' | sh c89cc.sh > hello
  chmod +x hello
  ./hello

[−] angry_octet 43d ago
I can't think of a reason to use c89cc.sh, but I salute this effort nonetheless.
[−] gaigalas 42d ago
The main show is the techniques for writing portable shell scripts, not the compiler.

If you want something one would actually use, try my project tuish:

http://github.com/alganet/tuish

[−] honkcity 42d ago
Is there a linter to ensure scripts are portable across shells? I try to write them that way, but I'm certainly no master, so I write them to work with BusyBox.
[−] gaigalas 42d ago
A linter, not yet.

You can use what I use: https://github.com/alganet/shell-versions

It's a container with lots of shells that you can test. Like esvu but for the shell.

The docs might be a little outdated; hit me with an issue if you use it and run into any problems (I'm also the author).
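A minimal sketch of the kind of check a multi-shell setup makes easy: run the same script under every listed shell that is installed and report failures. This is not the shell-versions interface; `run_in_shells` is a hypothetical helper written as a plain POSIX loop.

```shell
# Hypothetical helper (not part of shell-versions): run one script under
# each named shell that is installed, print a status line per shell, and
# return non-zero if any installed shell failed.
run_in_shells() {
  _script=$1
  shift
  _failed=0
  for _sh in "$@"; do
    # Skip shells that are not on PATH instead of counting them as failures.
    if ! command -v "$_sh" >/dev/null 2>&1; then
      echo "skip: $_sh"
      continue
    fi
    if "$_sh" "$_script" >/dev/null 2>&1; then
      echo "ok:   $_sh"
    else
      echo "FAIL: $_sh"
      _failed=1
    fi
  done
  return "$_failed"
}
```

Usage would look like `run_in_shells ./myscript.sh bash dash ksh zsh mksh yash`; inside a container that ships many shells, the list can simply grow.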

[−] pvtmert 42d ago
I think ShellCheck helps quite a lot, though you must set the shell to "sh" (not bash), e.g. with a # shellcheck shell=sh directive in the comments...
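For example, a script can pin the dialect with its shebang plus a ShellCheck directive comment (`shellcheck -s sh script.sh` does the same from the command line). With the dialect set to sh, ShellCheck should report bash-only syntax such as `[[ ]]` or `${var//a/b}`. The `is_number` function is only a hypothetical stand-in for some portable code.

```shell
#!/bin/sh
# shellcheck shell=sh
# The directive above tells ShellCheck to lint this file as POSIX sh, so
# bashisms such as [[ ... ]] or ${var//a/b} get reported.

# Portable test for "non-empty string of ASCII digits", using case
# patterns instead of bash's [[ $1 =~ ^[0-9]+$ ]].
is_number() {
  case $1 in
    '' | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}
```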
[−] t-3 43d ago
Why not POSIX or some common external tools where it makes sense? Most of those big switch statements could be easily replaced with some standard programs that already exist everywhere.
[−] gaigalas 43d ago
One main reason is performance. Forking for other tools is very expensive.
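To make the fork cost concrete, here is a minimal sketch (function names are hypothetical). Both loops count to N, but the first forks on every iteration (an external `expr`, plus a `$(...)` subshell in shells such as bash and dash), while the second stays inside the shell using POSIX arithmetic expansion. With PATH='' the first version would not even run, since `expr` could not be found.

```shell
# Forks on every iteration: $(...) and the external `expr` binary.
count_with_expr() {
  _i=0
  while [ "$_i" -lt "$1" ]; do
    _i=$(expr "$_i" + 1)
  done
  echo "$_i"
}

# No forks: [ is a builtin in bash, dash, ksh and zsh, and $((...)) is
# evaluated by the shell itself.
count_builtin() {
  _i=0
  while [ "$_i" -lt "$1" ]; do
    _i=$((_i + 1))
  done
  echo "$_i"
}
```

In bash, ksh or zsh, where `time` is a keyword that can time a function, comparing `time count_with_expr 2000` against `time count_builtin 2000` shows the gap on your own machine; no specific numbers are claimed here.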

That said, using larger sed or awk programs instead of ad-hoc calls for small snippets would perhaps be net-positive for performance and readability.
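A minimal sketch of that trade-off (hypothetical function names): the ad hoc version chains three `sed` processes, each doing one small edit, while the batched version runs the same three commands as a single `sed` program and so pays for one fork+exec. The output is the same because `sed` applies the `-e` commands to each line in order.

```shell
# Ad hoc: three sed processes, one small edit each.
normalize_adhoc() {
  sed 's/^[[:space:]]*//' | sed 's/[[:space:]]*$//' | sed '/^$/d'
}

# Batched: one sed process running the same edits in sequence:
# trim leading blanks, trim trailing blanks, then drop empty lines.
normalize_batched() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```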

I'm currently working on very strict bootstrap scenarios in which sed and awk might not be available, but a shell might be (if I'm able to write it). It is possible that in such scenarios, the first sed and awk versions will be shell-written polyfills anyway.
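As an illustration of what such a polyfill can look like, here is a minimal sketch of the fixed-string case of `sed 's/old/new/g'` in POSIX sh, using only `case` and parameter expansion. `replace_all` and the `REPLY` result variable are hypothetical names; returning through a variable instead of `$(...)` avoids a subshell fork in shells like dash. It works on one string and does not handle sed's regular expressions or line-by-line streaming.

```shell
# Hypothetical polyfill: replace every literal occurrence of $2 with $3
# in $1, like sed "s/old/new/g" for fixed strings. Result goes in REPLY.
replace_all() {
  _s=$1 _old=$2 _new=$3 _out=
  # An empty search string would never consume input, so return as-is.
  if [ -z "$_old" ]; then
    REPLY=$_s
    return
  fi
  while :; do
    case $_s in
      *"$_old"*)
        # Keep the text before the first match, then the replacement.
        # Quoting "$_old" makes it match literally, even if it has * or ?.
        _out=$_out${_s%%"$_old"*}$_new
        # Drop everything up to and including the first match.
        _s=${_s#*"$_old"}
        ;;
      *)
        break
        ;;
    esac
  done
  REPLY=$_out$_s
}
```

For example, `replace_all 'x*y*z' '*' '-'` leaves `x-y-z` in `REPLY`.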

[−] MisterTea 42d ago

> One main reason is performance

This assumes the executed program is no faster than the caller.
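Put as a rough cost model (an illustrative assumption, not a measurement): for $k$ items, let $c$ be the fixed cost of one fork+exec, $t_{\text{ext}}$ the external tool's time per item, and $t_{\text{sh}}$ the shell's own time per item.

$$
T_{\text{per-item}} = k\,(c + t_{\text{ext}}), \qquad
T_{\text{batched}} = c + k\,t_{\text{ext}}, \qquad
T_{\text{builtin}} = k\,t_{\text{sh}}
$$

A single batched call beats pure shell when $c + k\,t_{\text{ext}} < k\,t_{\text{sh}}$, i.e. when $k\,(t_{\text{sh}} - t_{\text{ext}}) > c$, so a faster external program pays back its one fork once there are enough items. Calling the external tool once per item only wins if $c + t_{\text{ext}} < t_{\text{sh}}$.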

[−] Brian_K_White 42d ago
Why not just use gcc which already exists everywhere?

When you answer that, same answer. If you can't imagine any answer for that, then the answer won't be convincing or make sense even if anyone tried to articulate it. Which is fine. Not everyone has to find meaning in the same things.

[−] t-3 42d ago
Shell without a userland is like FORTH without the ability to define new words. It's really contrary to the whole idea of what a shell is. Bootstrapping in very constrained conditions makes some sense, but where would you have a POSIX shell and not a POSIX userland (or close equivalent) to work with? When I wrote a similar compiler in shell, I purposely offloaded everything I could to external tools and used the shell for composition, so I found the approach intriguing and wanted to ask. I wasn't trying to criticize or dismiss the project, I think it's really cool or else I wouldn't have bothered to read the code in the first place.
[−] anthk 42d ago
BusyBox is what you need. As for Forth, Subleq+EForth can do a lot more than you think.
[−] Brian_K_White 42d ago
gcc exists essentially everywhere a shell exists too. If you're ok with using grep and bc or whatever, then why not gcc?

Or better yet, awk? awk is as old and ubiquitous as sh itself, present on every machine, even ancient ones that don't have a compiler because that was a paid extra. Only, unlike sh, it's a full, normal programming language that can do basically everything the shell can do, in a far more readable and sane way, instead of using weird expansions and abusing the command-line parser to get behavior it has no explicit features for. Just write directly in awk the same way you would in, say, Python or JS. If you have sh, especially if you also have the userland you are talking about, then you have awk. It's part of that userland in a way that gcc admittedly is not.
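A minimal sketch of that style (the function name and the "key value" input format are hypothetical): one awk program with a user-defined function and an associative array, both standard POSIX awk features that would take noticeably more ceremony in portable sh.

```shell
# Sum the second field per key from "key value" lines. `add` is an awk
# user-defined function; `sum` is an associative array keyed by $1.
totals_by_key() {
  awk '
    function add(key, amount) { sum[key] += amount }
    { add($1, $2) }
    END { for (k in sum) print k, sum[k] }
  '
}
```

The iteration order of `for (k in sum)` is unspecified, so pipe through `sort` when order matters: `printf 'a 1\nb 2\na 3\n' | totals_by_key | sort` prints `a 4` and then `b 2`.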

More in your vein, actually: when I do things like this I pick yet another ideal goal, different from both yours and the author's. I avoid all externals (and even child shells) to whatever extent possible, but I do use bash for all it's worth. Every possible intentional or hack bashism. Require bash, but leverage bash to within an inch of its life and require nothing else.

But this project targeting more portable code that doesn't require bash is really cool and valuable, even though it's not a standard I personally shoot for, even when I am specifically shell-golfing.

There are probably as many different points along the spectrum to draw the line as there are individual developers, each with some actually reasonable argument to justify drawing the line at that particular place.

To me using grep and sed and tr and ls and cat etc etc when I don't need them is just unsatisfying, inelegant, uninteresting.

If you are in bash or ksh93 or zsh, you don't need all kinds of things like basename, dirname, cut, tr, and wc, nor, most of the time, some of the more powerful stuff either. I have a shell function that uses the built-in read combined with a named pipe created in /tmp to make a sleep that doesn't need /bin/sleep. Why bother? Because it's awesome. And usually the only times I need sleep, it's in some rapid, short-duration polling loop that really is better if you don't have to fork, exec, and tear down on every iteration. It's bad enough to be polling like that in the first place. And it just doesn't matter that "probably all the externals will be there"; not using them is even better. These days a lot of once-common "userland" is no longer common or installed by default. A script that never tries to run dos2unix never cares that it's not installed, or that the BSD version behaves differently, or that the Mac version is stupid old, etc.
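As a sketch of the basename/dirname point: both can be done with POSIX parameter expansion alone, so this works in dash too, not only in bash, ksh93 or zsh. `base_name`, `dir_name` and the `REPLY` result variable are hypothetical names; results go in a variable rather than through `$(...)`, which would fork in shells such as dash. They cover ordinary paths, and a few corner cases (an empty argument to `base_name`, for instance) are simplified compared with the real utilities.

```shell
# Hypothetical basename replacement: no fork, result in REPLY.
base_name() {
  _p=$1
  # Strip trailing slashes, but never shrink below one character.
  while :; do
    case $_p in
      ?*/) _p=${_p%/} ;;
      *) break ;;
    esac
  done
  case $_p in
    /) REPLY=/ ;;              # the path was only slashes
    *) REPLY=${_p##*/} ;;      # keep what follows the last slash
  esac
}

# Hypothetical dirname replacement: no fork, result in REPLY.
dir_name() {
  _p=$1
  while :; do
    case $_p in
      ?*/) _p=${_p%/} ;;
      *) break ;;
    esac
  done
  case $_p in
    */*) _p=${_p%/*} ;;        # drop the last component
    *) REPLY=.; return ;;      # no slash at all: current directory
  esac
  # Drop slashes left behind (e.g. "a//b" -> "a/"), then map "" to "/".
  while :; do
    case $_p in
      ?*/) _p=${_p%/} ;;
      *) break ;;
    esac
  done
  REPLY=${_p:-/}
}
```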

[−] jonahx 43d ago
gorgeous!
[−] kelsey98765431 43d ago
Would be a lot better if it came with tests. Please do this justice and don't let it rot as a gist: make a real repo and add some docs and at least smoke tests of some kind. Thanks
[−] gaigalas 43d ago
This gist is a concatenation of several shell script modules which form a comprehensive parser library for the portable shell.

The main parser and emitter are BNF-generated (that's why they look so mechanical). The BNF parser generator is also written in portable shell (I posted another gist with a preview of it in another thread).
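To illustrate why generated parsers look mechanical, here is a hand-written sketch of the kind of code a generator might emit for the toy rule `sum ::= digit ( "+" digit )*` in portable sh: one function per rule, each returning 0 on a match and 1 on failure, with a small semantic action that adds up the digits. This is not the gist's actual output; the rule, the `_in`/`_c`/`_acc` variables and all function names are hypothetical.

```shell
# Input is consumed from $_in, one character at a time.
peek() { _c=${_in%"${_in#?}"}; }   # first character of $_in into $_c
advance() { _in=${_in#?}; }        # drop the first character

# digit ::= [0-9]
parse_digit() {
  peek
  case $_c in
    [0-9]) _acc=$((_acc + _c)); advance; return 0 ;;
    *) return 1 ;;
  esac
}

# sum ::= digit ( "+" digit )*
parse_sum() {
  _acc=0
  parse_digit || return 1
  while peek; [ "$_c" = '+' ]; do
    advance
    parse_digit || return 1
  done
  [ -z "$_in" ]   # succeed only if the whole input was consumed
}
```

For example, `_in='1+2+3'; parse_sum && echo "$_acc"` prints 6, while inputs such as `1+` or `12` make `parse_sum` fail.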

All modules have comprehensive tests, but it is still lacking documentation and not ready for prime time!

[−] akavel 42d ago
In the classic FLOSS tradition, it would be cool if you might still consider publishing such a "not-ready" repository - some people may (or may not!) be still interested, and also (sorry!) there's the bus factor... But on the other hand, in the classic FLOSS tradition, it's also 100% your decision and you have the full right to do any way you like!
[−] fuhsnn 43d ago
Don't understand why you were downvoted. An untested C compiler is simply worthless.
[−] uecker 43d ago
I am tempted to click the "report abuse" link ;-)
[−] _ache_ 43d ago
I'm tempted to execute it, but it may as well be shellcode I couldn't tell.
[−] jey 43d ago
It targets x86-64/ELF? I thought it would target sh to be portable?
[−] wengo314 42d ago
If one could bootstrap tcc with it, then it might be a viable tool.
[−] cestith 42d ago
I love this as a novelty, and it could be useful for bootstrapping a system that’s had a shell cross-compiled to it.

Thinking about this in the context of a job I used to do, security on shared hosting environments, it gives me a bit of a shiver. There are reasons compilers aren’t available to normal users on those.

[−] dmitrygr 43d ago
Many parts of this are clearly autogenerated, but that in no way diminishes the sickening impressiveness of it!
[−] dmead 42d ago
This is vibe coded right?
[−] redoh 42d ago
[flagged]
[−] JackSmith_YC 43d ago
[dead]
[−] self_awareness 43d ago
"Claude please generate me a C compiler in bash"

I mean, today it's possible to generate it in Tcl, Elisp, Windows BAT, PowerShell.

The effort is just 1 prompt.

The WHY question is much more important today -- "because I can" no longer makes sense, because we all can do much, much more with minimum effort today than before LLMs.