Great work :) If you're interested in Korean programming languages, there's a functional one called 'Nuri': https://github.com/suhdonghwi/nuri/
Rather than just translating keywords, it lets you write code that actually uses Korean grammar. For example, "10을 5로 나누고 출력하다" (literally "10 by 5 divide and print") outputs "2".
That is a fair feedback and I have known those languages which are very reasonable and fairly designed language. But I wanted to more focused on translated into rust for english speakers first, which would make bigger user for this language. Thanks for your feedback!
Yes, countries in the Sinosphere have historically used Chinese characters to write their languages. That's why Korean "yaksok" and Japanese "yakusoku" sound so similar. Both words are written with the same Chinese characters, "約束". The characters were borrowed from Chinese, but each language adapted them to its own pronunciation system.
For example, "library" is pronounced "tu-shu-guan" in Chinese, "do-seo-gwan" in Korean, and "to-sho-kan" in Japanese. All three can be written with the same characters, "圖書館". In modern Korea, though, people use Hangul, so very few Koreans actually know how to write "library" in Chinese characters. In Japan, Chinese characters are still heavily used, but for difficult ones, they often write kana alongside them as a reading aid.
It's very much like how Latin "universitas" became "university" in English, "universidad" in Spanish, and "università" in Italian.
There's a significant amount of Japanese loanwords in modern Korean due to Japanese annexation(1910-1945/1965), as well as in modern Chinese to much lesser extent.
These aren't an indication of a shared vocabulary or ancestry, just loanwords for concepts that were novel and scientific by victorian standards.
Wonderful! What a cool idea. For anyone interested, you can learn the whole of Hangul in an afternoon; it's cleverly designed to be very logical and has some handy mnemonics: https://korean.stackexchange.com/a/213
These are really cool! Will also add a version of these mnemonics to the Korean guide I have been writing: https://tolearnkorean.com/
Learning the Korean alphabet (Hangul) can be done quite quickly, it's only about as many "letters" as the English alphabet!
Remembering the words is a bit more difficult though, especially if you don't know a similar language. Have been using Anki and my own app for that: https://game.tolearnkorean.com/
That is a deep knowledge that even Korean-natives would not know. I will add this site as a reference to Github. I am glad that I have you as a supporter!
Really? That's how it was taught to me by Korean teachers at University. Even if it isn't daily-useful bit of info, it's such a fundamental component of the written form that I would have expected it to be common knowledge.
It's part of the official origin story that was published alongside the introduction of the script, so students will learn about it at some point long after they're already fluent readers and writers, and then promptly forget about this bit of trivia. (Do you remember that A is an upside-down ox head?) It probably doesn't help that the original explanation covers Middle Korean for an audience literate in Chinese: https://ko.wikisource.org/wiki/%ED%9B%88%EB%AF%BC%EC%A0%95%E...
Meanwhile, "Korean writing is so easy and logical you can learn it in no time at all" has become a meme to the point where I suspect the number of people who've been exposed to the meme and don't remember a single character might be larger than the number of Koreans who've heard about the tongue shape thing and still remember it.
When I was studying Computer Science in college, I once remarked how lucky we, English speakers, are that programming languages use English nouns and verbs. A ton of my classmates were here on a student visa, and English was not their first language. I always thought that programming in English put me at an advantage on the learning curve. I also always thought it was silly when someone would quip that programming should count for “foreign language” credit. Anyway, always cool to see non-English programming languages.
At a risk of going against the hivemind, I disagree.
I self-taught programming quite early in my life, way before I had a good command of the English language. I've read books in my native language, talked on programming forums in my native language. In the end the "english" in programming languages is just a handful of keywords, and it didn't hinder me one bit that I had no idea "int" stands for "integer".
Of course, I started by writing code like "bool es_primo(int numero)" (in my language), but there's nothing in C that says identifiers must be english, just convention. Standard library and packages nowadays would be a problem, but back then standard library were thin and "strcpy" name is obscure anyway. The real hard part was always learning how to program and design properly.
And for more advanced topics, documentation and learning materials in english only are HUGE problem for ESL, because one has to actually read and understand them. But this is not something programming language can help with.
That's coming from a Spanish speaker used to the alphabet, QUERTY, etc. I imagine you'd find it much more difficult if C were written in Chinese or Arabic, for instance.
I have a similar experience, I learned English much later than my first programming languages, and picking up some keywords and basic APIs was never an issue (it was BASIC and C/C++ at the time). Maybe I would occasionally look up in a dictionary what is 'needle' and 'haystack' in a code snippet, and I was puzzled by the ubiquitous "foo, bar, baz", which to my relief turned out to be equally cryptic for the native speakers. I still don't feel about code as a kind of English prose, it occupies a separate part of my brain, compared to the natural languages.
For people that use similar keyboards I don’t imagine it’s that different though like you said occasionally knowing that bool means Boolean or int means integer may make it slightly easier for English speakers I think a big disadvantage would likely be for people from say China that use incredibly different keyboards if I had to add a wildly different second language and switch to it every time I wanted to create a var or import something or write an if statement I’m not sure if I would’ve continued learning to code it may have been one step to many
True. English is a major reason why India is the IT back-office for most of the western world. I too have personally observed how my fellow classmates, who had done their schooling in their regional language, struggled with the coursework in college because it was solely in English. And some of them were state rankers - it felt bad to realise that they had to put in twice the effort needed to keep up their grades. I think there's a lot of potential wasted in India because of this kind of hardship / struggle - a lot of intelligent people are held back just because they lack an aptitude for multilingualism.
Naah, my non-english-speaking friends say that the keywords are less than 1% complexity of a programmer's job, so it really doesn't matter.
Also, in most languages you already can name variables/classes/members in any Unicode letters. So only "if/for/while" keywords and stdlib classes remain English. It makes little sense to translate those.
However, in the vast majority of cases, non-ASCII characters are rarely used for variable or function names during programming. This is because they can cause conflicts when using different encoding systems, and some automation tools fail to recognize them. Consequently, programmers in non-English speaking regions must invest more effort into naming variables than English speakers, as they have to translate all localized expressions into English.
When Toss, a Korean unicorn startup, announced that they would start using Korean for variable names within financial contexts, it sparked significant debate and a wide range of reactions among Korean programmers.
Nah. If anything, treating keywords as special sigils actually helps.
Also, not all natural languages are suitable for programming languages. In highly inflected languages you often end up with grammatically incorrect forms. Or with stilted language.
Thank you for your empathy. English has been the one of the most frequent languages for globe so that it is reasonable to Eng in many coding project, though.
It's may also be reasonable to make localized translations for a programming language. This is rarely done in reality for obvious reasons. An exception are Excel's function names. People who don't know English, or hardly know it, appreciate it.
That’s the least of their problems. The best computer science textbooks are published first and foremost in English and only translated belatedly. The research papers are in English and not often translated. Even the manuals of both commercial and FOSS programming tools tend not be translated. A few keywords is what, half an hour of rote memorization.
Hello, interesting project. I’m a native Korean speaker, so I wanted to share a quick perspective from Korean.
Nouns translate fairly naturally, but standalone verb commands in English need more care. In English, a verb like "find" can stand alone, but in Korean a verb usually needs an ending, and different endings can sound quite different or awkward depending on context. For example, "find" could become 찾다, 찾기, or 찾음, but those are not interchangeable.
Plural forms are also tricky. English distinguishes strongly between singular and plural, but Korean usually does not. Explicit plurals like “단어들” often sound unnatural unless the individuality of each item is important.And it feel same with "단어목록"
Overall, this is a very interesting project with real potential. I think it could become even stronger if it considers the structural differences between English and Korean, rather than treating it as simple keyword substitution.
A simple translation of keywords seems straightforward, I wonder why it's not standard.
# def two_sum(arr: list[int], target: int) -> list[int]:
펀크 투섬(아래이: 목록[정수], 타개트: 정수) -> 목록[정수]:
# n = len(arr)
ㄴ = 길이(아래이)
# start, end = 0, n - 1
시작, 끝 = 0, ㄴ - 1
# while start < end:
동안 시작 < 끝:
Code would be more compact, allowing things like more descriptive keywords e.g.
AbstractVerifiedIdentityAccountFactory vs 실명인증계정생성, but we'd lose out on the nice upper/lowercase distinction.
I hear that information processing speed is nearly the same across all languages though regardless of density, so in terms of processing speed, may not make much difference.
It never really took off. I think because computers already require users to read and type Latin letters in lots of other situations, and it's not that hard to learn what a few keywords mean, so you might as well stick with the English keywords everyone else is using.
Good point about compactness — 실명인증계정생성 vs AbstractVerifiedIdentityAccountFactory is a real example where Korean shines.
One distinction though: Han uses actual Korean words, not transliterations. 함수 means "function" in Korean, 만약 means "if" — they're real words Korean speakers already know.
Your example uses transliterations like 펀크 and 아래이 which would look odd to a Korean reader. That difference matters for readability.
Scratch supports Korean, but Scratch benefits from using JSON instead of bytes or code points to serialize programs, which allows the user to change their display language (similar to how hard tabs let users set indentation size).
There's probably a lot of reasons why non English programmers stick with English keywords, beyond just language/tooling support. Learning new keywords is already part of learning a programming language, and much of the documentation and resources available for languages and libraries are only in English. ASCII-only strings are still ubiquitous in software, like URLs and usernames. And in international teams, English is the go-to lingua franca.
Could this change with LLMs? Maybe, but most code in its training data is in English, so LLMs likely work most effectively in English.
I can't speak to Korean, but thinking about Japanese, one probable reason it wouldn't catch on is how tedious and inefficient it would be to constantly switch between input modes. Japanese input mode is designed for prose, and isn't well-suited to efficiently entering the symbols commonly used in programming. Even spaces. It results in needing a lot of extra keystrokes.
It’s fun to look at your code samples, have absolutely no clue what any of it means, and think about just how many non-English-speaking programmers must have felt that way looking at our all-English programming languages.
Except lisp: that’s just inscrutable symbols like cond and cons and car and cadr and a bunch of parens! :-)
This is indeed a cool project! Happy to see experiments on non-English programming languages. I have one question — not trying to be offensive or doubting, just out of curiosity — does Han make use of the unique properties of Hangul (or Korean in general)? Like, I remember sawing a Turkish programming language on HN the other day, and I might be wrong but my impression was it makes use of some syntax unique to Turkish, and I wonder if Han has similar features. Or, asking it differently, if I replaces only the lexer to another lexer recognizing a different script, will it not work?
I think is a dumb idea. Historically has been easy to learn programming languages because everyone uses the same things, no fragmention. If now everyone starts creating their own programming languages with different language everything is fragmented, you would to a company, and in the interview they will require you to know korean to work there, they will not be able to hire contractors, you will not be able to see an article about a nice feature on another programming language, and port it...
I suppose is again why I discuss with everyone as I would like to have a single language in the world, it would reduce wars, miscommunications, bound everyone closer. But ofc, the other point of view is that it reduces culture But I think it would happen as UK/US or Spanish, same language with variations, but everyone can understand each other.
I can't imagine what would have happened if Python or JS had been fragmented into X different languages because of egos, and instead of collaborating, decide each to create their own languages. I don't think we would be where we are today, probably AIs would not be around, since we would be fighting to understand so many different programming languages.
I don't know Korean but I really appreciate the interesting discussion around linguistics you started here. Some favorite comments that taught me something:
Your comment on how LLM tokenizers shorten common inputs in training data; Korean is more visually compact but suffers from poor token compression: https://news.ycombinator.com/item?id=47381843
@apt-apt-apt-apt pointed out in a separate comment that:
>A simple translation of keywords seems straightforward, I wonder why it's not standard.
I replied that for Japanese at least, probably due to symbol input being too tedious. However I think it's worth mentioning a potential mitigation, and maybe even an advantage.
As a mitigation, full-width symbols could be used instead, which are typically the default in Japanese input. Japanese itself is also fixed-width so if done across the board the language itself becomes fixed-width, not just by virtue of a font selection.
As an advantage, some logical symbols, greek letters, other rare characters are easy to input in Japanese mode, and this could lend itself to a more symbol-heavy language design. I already take advantage of this myself with Δ, φ, and τ use selectively in a few projects. Symbols with easy entry may differ by OS, but here are a few other examples that could be useful:
Have you considered rotating the layout? I always though a CJK programming language written vertically would be very ergonomic. Instead of scrolling vertically the program would flow right-to-left or left-to-right. I guess you'd probably want to rotate the bracket/paran glyphs which is a bit less trivial to do
As a native Korean speaker who learned programming in English the code looks very easy to read.
The downside is understanding logic feels harder because
- the order of Korean is diff from English
- e.g. "동안" (while) hard to reason about as "동안" comes after the condition in Korean linguistics and comes before in English.
So my only suggestion is to go diff direction from normal writing flow as shown below for readability
Great work! I still remember my high school IT class, where teachers were using multiple languages to explain the functionality of different 'modules' in programming language (it was still python 2.x back then), and by that time I was trying to do a 'multilingual' programming language (tbh it was just changing variables into different languages), and now 10 years have passed, and the dream is finally catching up with me.
I know this is mostly about keyword substitution but it still tickles me that you still write f(x) in this language and not (x)f given that Korean is SOV but I guess that's just how you notate that no matter what cultural context you're in. Hadn't ever considered that the convention of writing a function before its arguments might have been a contingency of this notation being developed by speakers of SVO languages.
This makes a lot of sense from a teaching perspective. If you're introducing programming to Korean-speaking kids, having keywords in their own language removes one abstraction layer. They can focus on the logic without also needing to memorize what 'while' or 'return' means in English first.
116 comments
Rather than just translating keywords, it lets you write code that actually uses Korean grammar. For example, "10을 5로 나누고 출력하다" (literally "10 by 5 divide and print") outputs "2".
You might already know this, but there's also a Korean programming language called 'Yaksok'. Here's a 2048 written entirely in Korean: https://github.com/yaksok/yaksok/blob/master/code_examples/2...
For example, "library" is pronounced "tu-shu-guan" in Chinese, "do-seo-gwan" in Korean, and "to-sho-kan" in Japanese. All three can be written with the same characters, "圖書館". In modern Korea, though, people use Hangul, so very few Koreans actually know how to write "library" in Chinese characters. In Japan, Chinese characters are still heavily used, but for difficult ones, they often write kana alongside them as a reading aid.
It's very much like how Latin "universitas" became "university" in English, "universidad" in Spanish, and "università" in Italian.
These aren't an indication of a shared vocabulary or ancestry, just loanwords for concepts that were novel and scientific by victorian standards.
a big one: hanja (kr) kanji (jp) both are 漢字
Learning the Korean alphabet (Hangul) can be done quite quickly, it's only about as many "letters" as the English alphabet!
Remembering the words is a bit more difficult though, especially if you don't know a similar language. Have been using Anki and my own app for that: https://game.tolearnkorean.com/
Meanwhile, "Korean writing is so easy and logical you can learn it in no time at all" has become a meme to the point where I suspect the number of people who've been exposed to the meme and don't remember a single character might be larger than the number of Koreans who've heard about the tongue shape thing and still remember it.
Also, ㄹ is obviously anatomically impossible for human tongues. It does however closely resemble similar letters in some Brahmic scripts. I'm partial to ʼPhags-pa ꡙ https://en.wikipedia.org/wiki/Origin_of_Hangul#%CA%BCPhags-p...
I self-taught programming quite early in my life, way before I had a good command of the English language. I've read books in my native language, talked on programming forums in my native language. In the end the "english" in programming languages is just a handful of keywords, and it didn't hinder me one bit that I had no idea "int" stands for "integer".
Of course, I started by writing code like "bool es_primo(int numero)" (in my language), but there's nothing in C that says identifiers must be english, just convention. Standard library and packages nowadays would be a problem, but back then standard library were thin and "strcpy" name is obscure anyway. The real hard part was always learning how to program and design properly.
And for more advanced topics, documentation and learning materials in english only are HUGE problem for ESL, because one has to actually read and understand them. But this is not something programming language can help with.
Also, in most languages you already can name variables/classes/members in any Unicode letters. So only "if/for/while" keywords and stdlib classes remain English. It makes little sense to translate those.
When Toss, a Korean unicorn startup, announced that they would start using Korean for variable names within financial contexts, it sparked significant debate and a wide range of reactions among Korean programmers.
Also, not all natural languages are suitable for programming languages. In highly inflected languages you often end up with grammatically incorrect forms. Or with stilted language.
Nouns translate fairly naturally, but standalone verb commands in English need more care. In English, a verb like "find" can stand alone, but in Korean a verb usually needs an ending, and different endings can sound quite different or awkward depending on context. For example, "find" could become 찾다, 찾기, or 찾음, but those are not interchangeable.
Plural forms are also tricky. English distinguishes strongly between singular and plural, but Korean usually does not. Explicit plurals like “단어들” often sound unnatural unless the individuality of each item is important.And it feel same with "단어목록"
Overall, this is a very interesting project with real potential. I think it could become even stronger if it considers the structural differences between English and Korean, rather than treating it as simple keyword substitution.
I hear that information processing speed is nearly the same across all languages though regardless of density, so in terms of processing speed, may not make much difference.
It never really took off. I think because computers already require users to read and type Latin letters in lots of other situations, and it's not that hard to learn what a few keywords mean, so you might as well stick with the English keywords everyone else is using.
One distinction though: Han uses actual Korean words, not transliterations. 함수 means "function" in Korean, 만약 means "if" — they're real words Korean speakers already know.
Your example uses transliterations like 펀크 and 아래이 which would look odd to a Korean reader. That difference matters for readability.
There's probably a lot of reasons why non English programmers stick with English keywords, beyond just language/tooling support. Learning new keywords is already part of learning a programming language, and much of the documentation and resources available for languages and libraries are only in English. ASCII-only strings are still ubiquitous in software, like URLs and usernames. And in international teams, English is the go-to lingua franca.
Could this change with LLMs? Maybe, but most code in its training data is in English, so LLMs likely work most effectively in English.
It’s fun to look at your code samples, have absolutely no clue what any of it means, and think about just how many non-English-speaking programmers must have felt that way looking at our all-English programming languages.
Except lisp: that’s just inscrutable symbols like cond and cons and car and cadr and a bunch of parens! :-)
I suppose is again why I discuss with everyone as I would like to have a single language in the world, it would reduce wars, miscommunications, bound everyone closer. But ofc, the other point of view is that it reduces culture But I think it would happen as UK/US or Spanish, same language with variations, but everyone can understand each other.
I can't imagine what would have happened if Python or JS had been fragmented into X different languages because of egos, and instead of collaborating, decide each to create their own languages. I don't think we would be where we are today, probably AIs would not be around, since we would be fighting to understand so many different programming languages.
Hangul's phonetic symbolic design: https://news.ycombinator.com/item?id=47382219
Korean plural forms: https://news.ycombinator.com/item?id=47386312
Your comment on how LLM tokenizers shorten common inputs in training data; Korean is more visually compact but suffers from poor token compression: https://news.ycombinator.com/item?id=47381843
Hangul keyboard layout - so cool that the layout is split between consonant and vowel hands and forms rhythmic harmony while typing: https://news.ycombinator.com/item?id=47382081
I replied that for Japanese at least, probably due to symbol input being too tedious. However I think it's worth mentioning a potential mitigation, and maybe even an advantage.
As a mitigation, full-width symbols could be used instead, which are typically the default in Japanese input. Japanese itself is also fixed-width so if done across the board the language itself becomes fixed-width, not just by virtue of a font selection.
As an advantage, some logical symbols, greek letters, other rare characters are easy to input in Japanese mode, and this could lend itself to a more symbol-heavy language design. I already take advantage of this myself with Δ, φ, and τ use selectively in a few projects. Symbols with easy entry may differ by OS, but here are a few other examples that could be useful:
≠, ≡, ∞, ∴, λ, θ, α, β, ・, °, ※
https://github.com/farant/rhubarb/blob/main/include/latina.h
edit: oh, maybe you can’t do full unicode. that’s too bad!
The downside is understanding logic feels harder because - the order of Korean is diff from English - e.g. "동안" (while) hard to reason about as "동안" comes after the condition in Korean linguistics and comes before in English.
So my only suggestion is to go diff direction from normal writing flow as shown below for readability
- from: 동안 n < 5 - to : n < 5 동안
This seems like a reasonably good security measure too