No Semicolons Needed (terts.dev)

by karakoram 89 comments 64 points

[−] Animats 56d ago
Classic mistakes in language design that have to be fixed later.

- "We don't need any attributes", like "const" or "mut". This eventually gets retrofitted, as it was to C, but by then there is too much code without attributes in use. Defaulting to the less restrictive option gives trouble for decades.

- "We don't need a Boolean type". Just use integers. This tends to give trouble if the language has either implicit conversion or type inference. Also, people write "|" instead of "||", and it almost works. C and Python both retrofitted "bool". When the retrofit comes, you find that programs have "True", "true", and "TRUE", all user-defined.

Then there's the whole area around Null, Nil, nil, and Option. Does NULL == NULL? It doesn't in SQL.

[−] tadfisher 55d ago
That's what's nice about coarse-grained feature options like Rust's editions or Haskell's "languages", you can opt in to better default behavior and retain compatibility with libraries coded to older standards.

The "null vs null" problem is commonly described as a problem with the concept of "null" or optional values; I think of it as a problem with how the language represents "references", whether via pointers or some opaque higher-level concept. Hoare's billion-dollar mistake was providing no way to express references that are guaranteed to be non-null, i.e. ones that always refer to a value that exists.

[−] dfawcus 55d ago
Attribute (qualifier), or storage class?

https://www.airs.com/blog/archives/428

The use of 'const' in C is very much a mixed blessing; I certainly have experience of the 'const poisoning' issue. Possibly it would have been better as a storage class.

For bool, yes it was a useful addition. Especially for the cases where old code would have something like:

    #define FLAG_A 1u
    #define FLAG_B 2u
    int has_flag_B (something *some) { return some->field & FLAG_B; }
and that was then combined with logic expecting 'true' to be 1; which could sneak in over time.
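
A minimal sketch of that failure mode, written in Python rather than C purely for illustration. The flag names mirror the snippet above; `has_flag_b_fixed` is a hypothetical helper, not something from the original code. The mask yields the bit value 2, which is truthy but not equal to 1:

```python
FLAG_A = 1
FLAG_B = 2

def has_flag_b(field):
    # Mirrors the old C idiom: returns the masked bits, not a 0/1 truth value.
    return field & FLAG_B

def has_flag_b_fixed(field):
    # Hypothetical fix: normalise to a real boolean before returning.
    return (field & FLAG_B) != 0

field = FLAG_A | FLAG_B
print(has_flag_b(field))                # 2: truthy, but not 1
print(has_flag_b(field) == True)        # False, because 2 != 1
print(has_flag_b_fixed(field) == True)  # True
```

Code that only tests truthiness keeps working, which is why the mismatch can sit unnoticed until something compares against 1.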
[−] Ferret7446 54d ago

> Does NULL == NULL? It doesn't in SQL.

That's because SQL null is semantically different. Most nulls are a concrete value (null pointer). SQL null is the relational null ("not known"). It's closer to NaN (but still different)
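
A quick way to see the difference, sketched with Python's bundled `sqlite3` module (SQLite is just the engine that happens to be at hand; other databases may surface the "unknown" result differently):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NULL = NULL evaluates to NULL ("unknown"), which sqlite3 hands back as None.
# NULL IS NULL is an ordinary true/false test and yields 1.
eq, is_null = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
conn.close()

nan = float("nan")
print(eq)            # None
print(is_null)       # 1
print(nan == nan)    # False: NaN compares unequal to itself
print(None == None)  # True: Python's None is a concrete value
```

Note the three behaviours: the relational null gives "unknown", NaN gives a definite false, and a concrete null value gives true.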

[−] bmandale 55d ago

> I would love to see a language try to implement a rule where only an indented line is considered part of the previous expression.

After python, it seems like every language decided that making parsing depend on indents was a bad idea. A shame, because humans pretty much only go by indents. An example I've frequently run into is where I forget a closing curly brace. The error is reported at the end of the file, and gives me no advice on where to go looking for the typo. The location should be obvious, as it's at exactly the point where the indentation stops matching the braces. But the parser doesn't look at indents at all, so it can't tell me that.

[−] stinkbeetle 55d ago

> An example I've frequently run into is where I forget a closing curly brace. The error is reported at the end of the file, and gives me no advice on where to go looking for the typo. The location should be obvious, as it's at exactly the point where the indentation stops matching the braces. But the parser doesn't look at indents at all, so it can't tell me that.

That's somewhat a quality of service issue though. Compilers should look at where the braces go out of kilter vs indentation and suggest the possible unmatched opening brace.
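
A rough sketch of such a heuristic, under loud assumptions: indentation uses spaces only, every opening brace ends its line, and every closing brace starts its line. `find_suspect_open_brace` is a hypothetical name, and no real compiler is claimed to work this way.

```python
def find_suspect_open_brace(source):
    """Return the 1-based line number of the opening brace that most likely
    lacks a closing brace, or None if braces and indentation agree."""
    stack = []  # (line_number, indent) for each still-open line ending in "{"
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip(" "))
        if stripped.startswith("}"):
            # A closer is expected at the same indent as its opener.
            if stack and stack[-1][1] > indent:
                return stack[-1][0]
            if stack:
                stack.pop()
        elif stack and indent <= stack[-1][1]:
            # A statement dedented to the opener's level: the block was never closed.
            return stack[-1][0]
        if stripped.endswith("{"):
            stack.append((number, indent))
    return stack[-1][0] if stack else None


broken = """fn main() {
    if x {
        foo();
    bar();
}
"""
print(find_suspect_open_brace(broken))  # 2
```

The brace count alone only fails at end of file; the indentation is what points at line 2.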

[−] Joker_vD 55d ago
Perhaps having unary minus (and especially unary plus) is just a bad idea in general; just mandate "0 - expr". To make negative constants work properly, you still have to either special case literals "256", "65536", etc. and ideally check whether they got negated or not, or introduce a special syntax just for them, like "~1" of ML for negative one, or "-1" (which you are not allowed to break with whitespace) of some other language I've forgotten the name of.

While we're at it, probably the unary bitwise complement could go as well? Obviously, "^(0-1)" would suck to write, but since 99% of the time bitwise "not" is used in expressions/statements like "expr &~ mask..." or "var &= ~mask", I feel like simply having a binary "and-not" operator that looks like "&~" or "&^" (Golang) is just better.

Also, a small prize (a "thank you!" from a stranger on the Internet i.e. me) to someone who can propose a good syntax for compound assignment with reversed subtraction:

    x ^= 0-1    # in-place bitwise complement
    x ^= true   # in-place logical negation
    x ?= 0      # in-place integer negation???
[−] xigoi 55d ago

> Also, a small prize (a "thank you!" from a stranger on the Internet i.e. me) to someone who can propose a good syntax for compound assignment with reversed subtraction:

I would do away with operator-assign operators and instead introduce a general syntax for updating a variable that can be used with any expression.

  x = _ + 1  # increment x
  x = 0 - _  # negate x
  output = sanitizeHtml(_)
  index = (_ + 1) % len(arr)
[−] Joker_vD 54d ago
Thank you! Having a syntactical construct to reference the value of the lhs of the assignment's operator in its rhs without recalculating it is a pretty nifty trick: it works because there is only one lvalue on the lhs of an assignment, so there is no ambiguity to what it refers to. And it could even be extended to work with multivalued assignments, although less neatly. Amazing! Thank you again!
[−] teo_zero 55d ago
+1 on the convenience of a "mask" operator that performs the and-not. It's an operation that's used more often than others which do have the privilege of having their own symbol, like xor.
[−] librasteve 56d ago
This article makes a strong case for every language to use ‘;’ as a statement separator.
[−] rao-v 55d ago
Exactly. I genuinely do not understand how any significant user of python can handle white space delimitation. You cannot copy or paste anything without busywork, your IDE or formatter dare not help you till you resolve the ambiguity.

One day https://github.com/mathialo/bython one day!

[−] xigoi 55d ago

> You cannot copy or paste anything without busywork

Sounds like a tool issue. My editor (Neovim with a few plugins) can handle copying/pasting with indentation just fine.

[−] eviks 55d ago

> your IDE or formatter dare not help you

Get the ones that do help you! Problem solved, enjoy your clean reading experience!

[−] rao-v 53d ago
The problem is that if you copy random code from the internet it cannot figure out the right indentation level - whitespace has meaning in python. What IDE can automagically handle this?
[−] eviks 53d ago
Do you mean "figure out that I'm pasting at level 3, so all pasted code should have +3 levels of indents" like plugins like this one do?

https://marketplace.visualstudio.com/items?itemName=hyesun.p...

(Sublime has the same, so does vim, see the comment above, and so do many real IDEs)

Or do you mean something different?

[−] rao-v 48d ago
This is nice, but it's not always the case that +3 indent is the right solution (e.g. if I'm copying already indented code it may be over indented).

It's basically a non-problem in most other languages, and an IDE formatter hook will always clean up the code and organize it correctly in a way that you cannot get in Python.

[−] eviks 47d ago
Have you not used any such IDEs/plugins? It's not X+3 indent, it's "starting at +3", so if you copy lines with +10 indent (overindented) and paste them at +3 indent, they all get their indents cut by 7 levels and end up at the same +3 level as expected.
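
A minimal sketch of that "starting at +N" behaviour using Python's standard `textwrap` module. `paste_at_level` is a hypothetical helper, and the 4-space indent unit is an assumption; real editor plugins may do this differently.

```python
import textwrap

def paste_at_level(snippet, level, unit="    "):
    # Strip the indentation shared by every line, then re-indent at the target
    # level. Relative indentation inside the snippet is preserved.
    return textwrap.indent(textwrap.dedent(snippet), unit * level)

# Copied from somewhere deeply nested: 10 and 14 leading spaces.
copied = "          if ok:\n              run()\n"
pasted = paste_at_level(copied, 3)
print(pasted)
```

The pasted block starts at 12 spaces (level 3) with the body one level deeper, regardless of how far the source was indented.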
[−] silon42 55d ago
looks cool..

Alternatively, I've several times used 'pass' as block terminator for my personal code.

[−] jasperry 55d ago
Indeed it does, by showing how many different and confusing types of parsing rules are used in languages that don't have statement terminators. Needing a parser clever enough to interpret essentially a 2-d code format seems like unnecessary complexity to me, because at its core a programming language is supposed to be a formal, unambiguous notation. Not that I'm against readability; I think having an unambiguous terminating mark makes it easier for humans to read as well. If you want to make a compiler smart enough to help by reading the indentation, that's fine, but don't require it as part of the notation.

Non-statement-based (functional) languages can be excepted, but I still think those are harder to read than statement-based languages.

[−] hajile 55d ago
Lisps aren’t necessarily functional, but don’t need semicolons either.
[−] II2II 55d ago
The syntax of languages like Lisp and Forth is so fundamentally different that they don't need an explicit statement separator. You don't have to think about many other things either, or I should say you don't have to think about them in the same way. Consider how much simpler the order of operations is in those languages.
[−] wvenable 55d ago
Lisp has explicit "statement" terminators (they just aren't semicolons).
[−] hajile 55d ago
All the lisps I know of have only expressions (no statements).
[−] marcosdumay 56d ago
Looks at 11 languages, ignores Haskell or anything really different...
[−] jfengel 55d ago
Once I learned Haskell, everything else looks pretty much identical. Java, C, C++, Smalltalk... At least Lisp looks a little bit different.
[−] librasteve 56d ago
or Raku
[−] jasperry 55d ago
Those are functional languages that generally don't use statements, so it makes sense to leave them out of a discussion about statement separators. If you think more people should use functional languages and so avoid the semicolon problem altogether, you could argue that.
[−] Blikkentrekker 55d ago
Functional hardly matters: Haskell has plenty of indentation, which is by the way interchangeable with { ... }; one can use either at one's own pleasure, and it's needed for many things.

Also, famously do { x ; y ; z } is just syntactic sugar for x >> y >> z in Haskell where >> is a normal pure operator.

[−] marcosdumay 55d ago
Yet, the author ends with a half-baked clone of the Haskell syntax.
[−] sheept 56d ago
Because formatters are increasingly popular, I think it'd be interesting to see a language that refuses to compile if the code is improperly formatted, and ships with a more tolerant formatter whose behavior can change from version to version. This way, the language can worry less about backwards compatibility or syntax edge cases, at the cost of taking away flexibility from its users.
[−] sheept 56d ago

> I would love to see a language try to implement a rule where only an indented line is considered part of the previous expression.

Elm does this (so maybe Haskell too). For example

    x = "hello "
     ++ "world"

    y = "hello "
    ++ "world" -- problem
[−] em-bee 55d ago
how to handle expressions that need more than two lines?
[−] kayson 55d ago
Are we really saving that much by not having semicolons? IDEs could probably autocomplete this with high success, and it removes ambiguity from weird edge cases. On the other hand, I've not once had to think about where go is putting semicolons...
[−] Mawr 55d ago

> The first thing that I dislike about this is that it encourages thinking of semicolons being inserted instead of statements being terminated

It might, but that's irrelevant since you never think about semicolons in Go at all.

> I like these formatting choices, but I'd prefer if the "wrong style" was still syntactically valid and a formatter would be able to fix it.

Your preference likely comes from some idealistic idea of "cleanliness" or similar, which isn't very convincing. Forcing everyone to use the same style is a huge win, to the point that it's a mistake to do anything else, as seen in the description of what Odin does. Completely wrong priorities there and refusal to learn from the past.

"Code formatting" isn't some inherent property of code that we must preserve at all costs, just a consequence of some unfortunate syntactical choices. There's no inherent reason why a language needs to allow you freedom to choose how to "format" your code. And there are in fact a lot of reasons why it shouldn't.

[−] IshKebab 55d ago

> how does Gleam determine that the expression continues on the second line?

The fact that it isn't obvious means the syntax is bad. Stuff this basic shouldn't be ambiguous.

> Go's lexer inserts a semicolon after the following tokens if they appear just before a newline ... [non-trivial list] ... Simple enough!

Again I beg to differ. Fundamentally it's just really difficult to make a rule that is actually simple, and lets you write code that you'd expect to work.

I think the author's indentation idea is fairly reasonable, though I think indentation sensitivity is pretty error-prone.
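
For reference, a simplified sketch of the Go rule being discussed, as I understand it from the Go spec: a semicolon is inserted after a line's final token if that token is an identifier, a literal, one of the keywords break/continue/fallthrough/return, or one of `++ -- ) ] }`. The function name and the rough literal check are assumptions, and the input is taken to be an already-lexed token.

```python
import re

GO_KEYWORDS = {
    "break", "case", "chan", "const", "continue", "default", "defer", "else",
    "fallthrough", "for", "func", "go", "goto", "if", "import", "interface",
    "map", "package", "range", "return", "select", "struct", "switch", "type",
    "var",
}
TERMINATING_KEYWORDS = {"break", "continue", "fallthrough", "return"}
TERMINATING_PUNCTUATION = {"++", "--", ")", "]", "}"}
IDENTIFIER = re.compile(r"[A-Za-z_]\w*\Z")
# Rough literal check (an approximation): numbers, or anything opening a quote.
LITERAL = re.compile(r"""\.?[0-9]|["'`]""")

def inserts_semicolon(last_token):
    """Would Go's lexer insert ';' after this token when it ends a line?"""
    if last_token in TERMINATING_KEYWORDS or last_token in TERMINATING_PUNCTUATION:
        return True
    if last_token in GO_KEYWORDS:
        return False  # other keywords are not identifiers
    return bool(IDENTIFIER.match(last_token) or LITERAL.match(last_token))

for token in ["x", "42", ")", "return", "+", ",", "{", "func"]:
    print(repr(token), inserts_semicolon(token))
```

This is why a line ending in `a` followed by a line starting with `+ b` is split into two statements, while ending the first line with the `+` keeps the expression going.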

[−] i_don_t_know 55d ago
The Haskell language report precisely defines how layout determines structure: https://www.haskell.org/onlinereport/haskell2010/haskellch10...

Any language that uses layout to determine structure should have a similarly precise definition.

[−] teo_zero 55d ago
It's very subjective, and I don't want to say that semicolons are good or bad, indentation is good or bad, etc.

But I think a language should decide whether white space is significant or not. If it's not, don't add exceptions!

Operators changing meaning when surrounded by spaces in otherwise context-free languages are an abomination!

[−] xigoi 55d ago

> Operators changing meaning when surrounded by spaces in otherwise context-free languages are an abomination!

How does that make the language not context-free?

[−] lightingthedark 55d ago
It's interesting seeing all of the different ways language designers have approached this problem. I have to say that my takeaway is that this seems like a pretty strong argument for explicit end of statements. There is enough complexity inherent in the code, adding more in order to avoid typing a semicolon doesn't seem like a worthwhile tradeoff.

I'm definitely biased by my preferences though, which are that I can always autoformat the code. This leads to a preference for explicit symbols elsewhere, for example I prefer curly brace languages to indentation based languages, for the same reason of being able to fully delegate formatting to the computer. I want to focus on the meaning of the code, not on line wrapping or indentation (but poorly formatted code does hinder understanding the meaning). Because code is still read more than it is written it just doesn't seem correct to introduce ambiguity like this.

Would love to hear from someone who does think this is worthwhile, why do you hate semicolons?

[−] duped 55d ago
Start from the perspective of the user seeing effectively:

> error: expected the character ';' at this exact location

The user wonders, "if the parser is smart enough to tell me this, why do I need to add it at all?"

The answer to that question "it's annoying to write the code to handle this correctly" is thoroughly lazy and boring. "My parser generator requires the grammar to be LR(1)" is even lazier. Human language doesn't fit into restrictive definitions of syntax, why should language for machines?

> Because code is still read more than it is written it just doesn't seem correct to introduce ambiguity like this.

That's why meaningful whitespace is better than semicolons. It forces you to write the ambiguous cases as readable code.

[−] estebank 55d ago
I used to hate semicolons. Then I started working in parser recovery for rustc. I now love semicolons.

Removing redundancy from syntax should be a non-goal, an anti-goal even. The more redundancy there is, the higher the likelihood of making a mistake while writing, but the higher the ability for humans and machines to understand the developer's intent unambiguously.

Having "flagposts" in the code lets people skim code ("I'm only looking at every pub fn") and the parser have a fighting chance of recovering ("found a parse error inside of a function def, consume everything until the first unmatched } which would correspond to the fn body start and mark the whole body as having failed parsing, let the rest of the compiler run"). Semicolons allow for that kind of recovery. And the same logic that you would use for automatic semicolon insertion can be used to tell the user where they forgot a semicolon. That way you get the ergonomics of writing code in a slightly less principled way while still being able to read principled code after you're done.

[−] duped 55d ago
Why is ";" different from \n from the perspective of the parser when handling recovery within scopes? Similarly, what's different with "consume everything until the first unmatched }" except substituting a DEDENT token generated by the lexer?
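
For what it's worth, CPython's tokenizer does emit such tokens, which can be seen with the standard `tokenize` module. This is only a sketch of what the lexer output looks like (the sample source is made up); it says nothing about how well a parser can recover from errors with it.

```python
import io
import tokenize

source = "if ready:\n    start()\n    log()\ndone()\n"

names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
# Keep only the tokens that encode layout: statement ends and block boundaries.
layout = [name for name in names if name in ("NEWLINE", "INDENT", "DEDENT")]
print(layout)
```

NEWLINE plays the role of the semicolon and INDENT/DEDENT the role of the braces, so the parser itself never looks at whitespace.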
[−] gcanyon 56d ago
Can anyone give a good reason why supporting syntax like:

    y = 2 * x
      - 3
is worth it?
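
For comparison, a sketch of how Python (just one point in the design space) treats that snippet, checked with the standard `ast` module. `statement_count` is a hypothetical helper.

```python
import ast

def statement_count(source):
    # Number of top-level statements the parser sees.
    return len(ast.parse(source).body)

# Unindented continuation: silently parsed as two statements.
print(statement_count("y = 2 * x\n- 3\n"))          # 2
# Parenthesised: one statement, the line break is ignored.
print(statement_count("y = (2 * x\n     - 3)\n"))   # 1

# Indented continuation without parentheses is rejected outright.
try:
    ast.parse("y = 2 * x\n  - 3\n")
except IndentationError as exc:
    print("rejected:", exc.msg)
```

The first case is the hazard: `- 3` is a valid expression statement on its own, so nothing complains and `y` ends up as `2 * x`.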
[−] whateveracct 56d ago
in Haskell, supporting that is how you get neat composition

such as applicative style formatted like this:

        f
    <$> x
    <*> y
    <*> z
[−] zephen 55d ago
Obviously, that's a really short expression.

So, the question is, if you have a long expression, should you have to worry too much about either adding parentheses, or making sure that your line break occurs inside a pair of parentheses.

It boils down to preference, but a language feature that supports whatever preference you have might be nice.

  priority = "URGENT"  if hours < 2  else
             "HIGH"    if hours < 24 else
             "MEDIUM"  if hours < 72 else
             "LOW"
[−] stack_framer 55d ago
I never actually type semicolons in my JavaScript / TypeScript. In work projects, my IDE adds them for me thanks to the linter. In personal projects, I just leave them out (I don't use a linter, so my IDE does not add them), and I've never had a problem. Not even once.

Semicolon FUD is for the birds.

[−] gethly 55d ago
[flagged]