88 em dashes in 10,500 words, and yet written by a human

Note: This post is also shared on LinkedIn.

“Are you AI, or simply a grammar pedant?” (Subtitle from the FT article: “Use of em dashes is being taken as a tell-tale sign of machine-generated writing.”)

That headline from the Financial Times stopped me in my tracks (on.ft.com/42m9YlK). I have been noticing the overuse of em dashes lately, too. So—why might LLMs be so fond of them?

I have a theory.

Back when I was a PhD student, I spent a lot of time skimming through papers—and carefully reading the ones that mattered. As a non-native English speaker, I was not just absorbing content. I was learning how to write academically.

Armed with a fluorescent highlighter, I would underline both the novelty of the research and stylistic gems—turns of phrase or grammatical constructions I could reuse in my own writing. That is when I discovered the em dash.

In math, we just nest parentheses. But in English, that feels clunky. Enter the em dash: elegant, powerful, and—to my French-educated eyes—exotic. I became an em dash abuser.

Need proof?

My MBA thesis, published before “Attention Is All You Need” (i.e., before Transformer-based LLMs existed), contains 88 em dashes… in just 10,500 words. That is about one every 120 words. You can verify it yourself: doi.org/10.17863/CAM.9207
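
For anyone who wants to check the arithmetic, or run the same count on their own writing, here is a minimal Python sketch. The function name `em_dash_stats` and the whitespace-based word count are assumptions made for illustration, not the method behind the figures above; a word processor may count words slightly differently.

```python
# Minimal sketch: count em dashes and the average number of words per em dash.
# Assumption: "words" are whitespace-separated tokens, so two words joined
# by an em dash with no surrounding spaces are counted as a single word.

EM_DASH = "\u2014"  # Unicode code point of the em dash


def em_dash_stats(text: str) -> dict:
    """Return the word count, em dash count, and words per em dash."""
    words = len(text.split())
    dashes = text.count(EM_DASH)
    # Avoid dividing by zero when the text contains no em dash at all.
    words_per_dash = words / dashes if dashes else float("inf")
    return {"words": words, "em_dashes": dashes, "words_per_dash": words_per_dash}


# Sanity check on the thesis figures quoted above: 10,500 words, 88 em dashes.
print(round(10_500 / 88))  # 119, i.e. roughly one em dash every 120 words
```

To try it on a document, read the file as text and pass the string in, for example `em_dash_stats(open("thesis.txt", encoding="utf-8").read())`, where `thesis.txt` is a hypothetical file name.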

So here is my hypothesis: perhaps LLMs were trained on a lot of academic texts—just like me. And they, too, picked up the habit.

A side note: if this is true, maybe think twice before relying on an LLM to validate your experimental design or A/B tests. After all, if the reproducibility crisis made it into the training data, your helpful assistant might just be p-hacking with the best of them—dressing up shortcuts and statistical noise in the language of solid science.