The Rise of the Em-Dash in Hacker News Comments (boazsobrado.com)

by sobradob 96 comments 44 points


[−] Iuz 29d ago
I don't comment much, but I have read everything that Friedrich Nietzsche wrote and, because of him, have always used em-dashes in my writing. I think I even saw some memes in circles that discuss his work when people started realizing GPT used them a lot...
[−] MikeTheGreat 29d ago
genuine question: How could you tell they were em-dashes?

Like, I could see some people noticing that the book they're reading has dashes that are a bit longer than normal, but what made you think "That must be its own thing, separate from a normal dash" as opposed to something like "In this font the dashes are very long"?

[−] nagaiaida 29d ago
well, hyphenation will most likely insert a lot of regular dashes for easy comparison, which rules out "this font is blessed with uncommonly long dashes", and the differing uses of en and em dashes will cluster along grammatical lines (with em dashes separating clauses and en dashes relating concepts or bounds), which ought to eventually make it clear even to someone who initially bins those together, separate from hyphens.
[−] pinusc 28d ago
If you know the difference between em dash, en dash, and hyphen, you start seeing it everywhere—whether they are used correctly or not. Books tend to have correct typesetting, so if you see a dash used as an em dash ought to be used, and if it looks kinda long, you can assume it's an em dash. AFAIK manuscripts are often submitted with either hyphens or --- in place of em dashes, and then the editor or typesetter fixes it.

Also, it's called em dash because it's as long as the letter m (as a rule of thumb), so it's usually an easy visual comparison. Finally, a typeface with hyphens as long as em dashes would be terrible and quite noticeably wrong!
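
For anyone who would rather check than eyeball: the three marks are distinct Unicode characters, so a short Python sketch can tell them apart in any text. The function name `count_dashes` and the sample string are made up for illustration; the code points and `unicodedata.name` are standard.

```python
import unicodedata
from collections import Counter

# The three characters discussed above, written as escapes so they are unambiguous.
DASHES = {
    "\u002d": "hyphen",   # HYPHEN-MINUS, the key on every keyboard
    "\u2013": "en dash",  # EN DASH, typically used for ranges and relations
    "\u2014": "em dash",  # EM DASH, typically used to set off clauses
}

def count_dashes(text: str) -> Counter:
    """Count how often each dash-like character appears in text."""
    return Counter(DASHES[ch] for ch in text if ch in DASHES)

# Hypothetical sample, not taken from any real comment.
sample = "pages 10\u201312 are well-known \u2014 or so I am told"
counts = count_dashes(sample)

# Official Unicode names of the three characters, in the order listed above.
names = [unicodedata.name(ch) for ch in DASHES]
```

Counting by code point sidesteps the "maybe this font just has long dashes" question entirely, since the font never enters into it.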

[−] Iuz 29d ago
He uses it a lot, so it didn’t take long for me to notice that the dash was longer than usual. At that point, it felt less like a font quirk and more like a deliberate stylistic choice. I also recall that one of the translators mentioned his use of dashes.
[−] ranger_danger 29d ago
It's always funny to see people arguing that em-dash use is indicative of LLM usage, yet they don't realize where that training came from in the first place.
[−] palmotea 29d ago

> It's always funny to see people arguing that em-dash use is indicative of LLM usage, yet they don't realize where that training came from in the first place.

The em-dash is indicative of AI usage when it shows up in contexts where it doesn't belong, such as informal contexts like forum comments and emails (though "smart" substitutions do complicate the picture a bit).

It'd only be funny if they argued it indicated AI usage in contexts where it does belong, like formal writing.

[−] JKCalhoun 29d ago
Informal contexts are where I get to practice my writing in general. In terms of punctuation, I don't make a distinction. (I just say "bullshit" a lot more in the informal contexts. Durn, I did it again.)
[−] lamasery 29d ago
But the em-dash is a pretty informal mark... I'd tend to re-structure my sentences to avoid it more often in a formal context, than an informal one. It's what you reach for either for a specific effect, or because it's the least-disruptive way to keep writing without having to go back and edit mid-sentence, and end up with something that scans OK. It's super-informal.
[−] palmotea 29d ago

> But the em-dash is a pretty informal mark...

I think you have to make a distinction: there's using a dash as you describe and using the actual em-dash character. Without a smartquotes-type autocorrect feature (which admittedly is common in certain apps/platforms like Outlook and Word), an actual em-dash is awkward to type. I'd expect someone using it informally to just use a regular dash (-) or two (--).

I think you're automatically in a pretty formal writing context if you care if you use an em-dash character or not.

Which brings up an interesting idea: would Microsoft turn off its smartquotes-type autocorrect, because now it makes you look like a dumb AI-user? Probably, if they cared about their users. But I doubt they will because they're so into hyping AI that "Microslop" is a thing.

[−] lamasery 29d ago
It's easy to type, and even easy to discover, on the default Mac keyboard layout. Until recently, the main thing employment of the actual M-dash in web posts indicated was that the user was more likely than not typing their posts on a Mac—not for-sure, but better than even odds, despite Macs having a much smaller share than half the market.
[−] ranger_danger 28d ago

> when it shows up in contexts where it doesn't belong

I have known people that personally used em-dashes in all the wrong places way before AI... entire emails would just be paragraphs-long run-on sentences filled with dashes.

[−] yencabulator 28d ago
That doesn't explain "The Rise of the Em Dash".
[−] bb88 29d ago
I like the em-dash as well, as it provides more visual space than just a "-" or a ";" or a ", and".
[−] lamasery 29d ago
I got my heavy m-dash use from Salinger, many years ago. When I find some distinctive habit of an author I'm reading, and it's to my taste, I often rob them.
[−] illiac786 28d ago
I love em dashes. I always use them wrong, as a “pause for effect”. But it makes me smile.
[−] BeetleB 29d ago
Classic case of hacking the axis to exaggerate a point.

It went from 19.3 to 32.5. It did not even double. Which means that if you see a comment with an em-dash, it's more likely to be human than LLM.
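
The arithmetic behind that can be sketched in a few lines of Python. It takes the two figures quoted above at face value and makes one loud assumption that the thread does not establish: that human em-dash usage stayed at the old rate and every bit of the increase comes from LLM-written comments. The variable names are illustrative.

```python
before = 19.3  # rate quoted above for the earlier period (units as on the chart)
after = 32.5   # rate quoted above for the later period

# How many times larger the later rate is than the earlier one.
ratio = after / before

# ASSUMPTION: the whole increase is LLM text and the human rate did not change.
# Under that assumption, this is the share of em-dash comments that are LLM-written.
excess_share = (after - before) / after

print(f"ratio: {ratio:.2f}x")
print(f"share attributed to the increase: {excess_share:.1%}")
```

If that assumption holds, the share attributed to the increase stays below one half whenever the rate has less than doubled, which is the sense in which a given em-dash comment is still more likely human.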

[−] meisel 29d ago
Gotta love starting the y-axis above 0
[−] tmoertel 29d ago
While it is generally considered a No-No to start a bar chart from a baseline that is not zero, there is no corresponding prohibition, especially among numerically sophisticated audiences, for scatter plots or line charts. In general, we want graphs to focus on the area of variation.

For example, take a look at just about any stock chart (try https://www.google.com/finance/beta/quote/GOOG:NASDAQ?hl=en). There's actual money on the line, but no baseline. Why do you think that is?

[−] wtallis 29d ago
For stock prices, starting the y axis wherever is aesthetically pleasing makes some sense because everybody will have a different non-zero cost basis for their investment, and the graphs need to be able to clearly depict fluctuations that are minor on a percentage basis. For something like the em-dash prevalence on HN, the most meaningful question is whether it has doubled, tripled, or whatever relative to the pre-LLM corpus, and that's most clearly visually depicted by starting the y axis at precisely zero.
[−] throwway120385 29d ago
The real answer actually depends. In cases where you want to visually emphasize the ratio between any pair of values, you should start from zero. In cases where only the difference between any pair of values matters and the ratio is meaningless you can start at a different baseline. A surprising number of measures are interesting in their ratio though, so we generally prefer a zero-based chart.
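
A rough way to put a number on the ratio point, as a Python sketch: compare the ratio of two values with the ratio of their plotted heights when the axis starts somewhere other than zero. The helper name `apparent_ratio` is made up, and the axis start of 15 is a hypothetical value chosen for illustration, not one read off the article's chart.

```python
def apparent_ratio(low: float, high: float, axis_start: float) -> float:
    """Ratio of the plotted heights of two values above the axis start."""
    if axis_start >= low:
        raise ValueError("axis_start must be below the smaller value")
    return (high - axis_start) / (low - axis_start)

low, high = 19.3, 32.5  # the two figures quoted earlier in the thread

# With a zero baseline, plotted heights are proportional to the values themselves.
true_ratio = apparent_ratio(low, high, axis_start=0.0)

# HYPOTHETICAL truncated baseline: heights no longer reflect the ratio.
truncated_ratio = apparent_ratio(low, high, axis_start=15.0)

print(f"zero baseline: {true_ratio:.2f}x, truncated baseline: {truncated_ratio:.2f}x")
```

The difference between the two values is the same on either axis; only the ratio of heights changes, which is why the choice matters for ratio-type measures and not for difference-type ones.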
[−] BeetleB 29d ago

> In general, we want graphs to focus on the area of variation.

Visually, this is vastly exaggerating the variation. Actual usage did not even double.

[−] tmoertel 29d ago

> Visually, this is vastly exaggerating the variation. Actual usage did not even double.

No, it is literally showing the exact variation of interest. If you think it's exaggerating the variation, you are not reading the chart. You are glancing at the chart, ignoring what it actually says in multiple ways, and imagining it has a baseline of zero, when it clearly does not.

Read the chart. What does it actually say?

[−] BeetleB 29d ago

> If you think it's exaggerating the variation, you are not reading the chart.

That's true of every instance where a chart is criticized for playing around with the axes scale. Imagine the stock price of a company varied between 50.1 and 50.2 over a week. And I presented it as a chart with the min being 50.09 and max being 50.21, and drew all the variation over a large vertical space. And then tried to imply that the stock was volatile. What would be the problem?

Let me ask you this. What is the point of this chart (or any similar chart)? Simply presenting a table with all the values would have conveyed all the information - wouldn't you agree?

[−] tmoertel 29d ago

> > If you think it's exaggerating the variation, you are not reading the chart.

> That's true of every instance where a chart is criticized for playing around with the axes scale.

Indeed. The criticism, however, is only apt when the chart's intended audience is likely to have a hard time understanding what that chart is trying to communicate. If you're publishing a bar chart in USA Today and its y-axis doesn't start at zero, yeah, that's a problem.

But the OP's chart that started this whole thread? It's fine. First, the intended audience is HN readers, who can be assumed to be numerically literate. Second, it's a line chart whose y-axis labels make clear what the range of variation is. Third, the data points, themselves, are labeled with their values. Finally, the thrust of the chart, that em-dash usage in HN posts has markedly increased since the widespread adoption of LLMs, is itself also explicitly called out and labeled: "+79% from pre-AI baseline."

If you try to tell me that the author of that chart is trying to mislead HN readers about the growth of em-dash use on HN, I'm going to have a hard time taking your claim seriously.

> Imagine the stock price of a company varied between 50.1 and 50.2 over a week. And I presented it as a chart with the min being 50.09 and max being 50.21, and drew all the variation over a large vertical space.

I have an easy time imagining your chart because that's how stock charts are plotted. That's what the financial community expects. That's how it's done: The y axis is bracketed by the low and high values over the period being charted, perhaps after rounding to the nearest nice value. For example, today's chart for the Russell 2000 Index shows a gain of just 0.30%, similar to the tiny relative volatility in your example. The chart's y axis ranges from 2,695 to 2,715 (https://share.google/oKPQxlmZFsgSVoNOS). It does not start at zero.

If it did start at zero, it would be unsuited for its intended purpose. How would you observe the day's variation on what appeared to be a flat horizontal line at the top of a chart whose y axis ranged from 0 to 3000?

Why do you think the financial world does stock charts the way it does stock charts? Do you think financial analysts don't know how to communicate the day’s movement of a stock to each other?

> And then tried to imply that the stock was volatile. What would be the problem?

The problem would be that your audience, if they were accustomed to reading stock charts, would think you didn't know what you're talking about. Your chart would refute your claims, and anybody accustomed to reading stock charts would know it.

> Let me ask you this. What is the point of this chart (or any similar chart)? Simply presenting a table with all the values would have conveyed all the information - wouldn't you agree?

The point of this chart, like any good chart, is to present the intended information to the intended audience faster and more conveniently than the alternatives. (Do you have any problem with that claim?) And, in this case, I'd say the OP's chart met that standard. Likewise, I'd argue that the typical stock chart, which is bracketed by the stock's low and high values, meets that standard as well.

In both of those examples, you could also communicate the same information in a table, but a table wouldn't be as fast or convenient as a chart, given the expected audiences.

[−] BeetleB 29d ago

> If you try to tell me that the author of that chart is trying to mislead HN readers about the growth of em-dash use on HN, I'm going to have a hard time taking your claim seriously.

I am saying precisely that. A significant number of HN users have a strong (and IMO irrational) anti-LLM bias. And these people pollute the discussion forums accusing people of using LLMs to write the content/comments.

It's not a stretch to believe that those folks will look at the chart uncritically. Everyone - even the smartest of folks - has blind spots (this was quite apparent when I worked with top professors in their fields while in academia). And blind spots often correlate with their biases.

[−] tmoertel 29d ago

> I am saying precisely that [the author of that chart is trying to mislead HN readers about the growth of em-dash use on HN].

Well, then, do you believe that the following evidence supports or undermines your hypothesis that the author is trying to mislead HN readers about em-dash use?

1. The author explicitly labeled each data point with its numeric value so that even if readers ignored the y-axis labels they could not misread the points.

2. The author explicitly labeled the pre- to post-AI growth as +79% so that even if readers ignored the y-axis labels and the data-point labels they could not misread the growth.

(The fact that you posed an example about a stock chart earlier but then completely ignored my response that refuted your argument about it suggests that you are not likely to be swayed by evidence and reason, but I'm giving it this one last try.)

[−] mcphage 29d ago

> take a look at just about any stock chart

Honestly, I hate that about stock charts. They adjust the axes and scales so that the graph itself provides no information. Did it go up 1 point? 200 points? 5%? 50%? You can’t tell, because the graph is just a scale free squiggle.

[−] cosmotic 29d ago
And even worse, no glyph to denote the deviation
[−] ortusdux 29d ago
Did AI raise awareness of Em-dashes, causing more people to use them organically?
[−] xxxxxxxx 29d ago
This is interesting. I just fixed a Github issue where the code did not handle Em-Dash correctly. Ran some queries to check the stats there. No surprises: https://deepspaceplace.com/emdash
[−] ChrisArchitect 29d ago
Related from last year:

Show HN: Hacker News em dash user leaderboard pre-ChatGPT

https://news.ycombinator.com/item?id=45071722

[−] flowerthoughts 29d ago
And here I am, just wishing that someone with the knowledge would make font ligatures that render -- and --- as en and em dashes, so I could use them more.
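
Not a font ligature, but the same mapping is easy to sketch as plain-text substitution, in the spirit of the TeX convention where `--` and `---` stand for en and em dashes. The function name `texify_dashes` is made up, and this naive version will also rewrite things like command-line flags (`--help`), so treat it as an illustration only.

```python
def texify_dashes(text: str) -> str:
    """Replace --- with an em dash and -- with an en dash."""
    # Longest pattern first; otherwise '---' would become an en dash plus '-'.
    return text.replace("---", "\u2014").replace("--", "\u2013")

converted = texify_dashes("1990--2000 was long --- very long")
```

A ligature would do this at render time and leave the underlying text as ASCII; this sketch changes the characters themselves.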
[−] number6 29d ago
shameless self plug: https://emdashmanifesto.org/
[−] kkfx 28d ago
While I use it in LaTeX, I tend to avoid it in non-LaTeX contexts. BUT even though it's much used by LLMs, having switched to the EurKey layout years ago, I can type it on my keyboard, as well as × and many others, so it's not such a perfect AI indicator.
[−] northisup 29d ago
me waiting for the "the rise of posts analyzing the rise of the em-dash on hacker news" posts
[−] juped 29d ago
You can pry my em dash—short for "Emily's dash", after the poet—from my cold dead hands.
[−] lz400 29d ago
I just learnt that em dash in a mac is option+shift+hyphen. I hadn't realized it was so difficult and inconvenient, and in the end it looks so similar to the other one: — -. Thin value. It's no surprise humans barely use them. Then why did it get picked up so much by AIs? I'd have imagined it's not in a lot of training data. Print media practices I guess?
[−] crazygringo 29d ago
How is it picking the comments?

If it's all comments, including flagged/dead/downvoted/etc., then it's not reflective of the actual filtering HN does.

But if it's weighting comments by their likelihood of being read -- e.g. mostly top comments on popular stories -- then I'd be a lot more curious.

I'm not surprised AI spam has increased substantially. But I'd be surprised if it's affected the comments most people actually read to anywhere close to the degree shown in this graph.

[−] lapcat 29d ago
Now someone do "the rise of Hacker News meta-analysis blog posts".
[−] negura 29d ago
stylometric analysis can be used to profile you. so if you were using em-dashes, this is good news. it helps you blend in better than before
[−] andrewclunn 29d ago
[dead]
[−] derbOac 29d ago
[dead]
[−] Rekindle8090 29d ago
I'll stand firm on my belief that no one types an em or en dash. It's always an LLM. It's a pain in the ass to type on most keyboards, impossible on some, and pointless on phones.
[−] adampunk 29d ago
WooooooOOOOOOOOooooooooOOOOOOOOOoooooo—

— A spooky ghost

WooooooOOOOOOOOooooooooOOOOOOOOOoooooo-

- A less spooky ghost