Profiling Hacker News users based on their comments (simonwillison.net)

by simonw 86 comments 88 points

[−] alexpotato 55d ago
Story time:

My first full time job (early 2000s) was working for a firm that did online cybersecurity related investigations for Fortune 500 companies (generally via a 3rd party law firm they had retained).

A big part of this was running investigations into people running "pump and dump" stock schemes on Yahoo message boards. We would generally start by scraping all of the posts for a user who had instigated one of these and then handing off the posts to an analyst.

It's amazing:

a. how much info people give out even when they think they are being careful

b. related to a, how even small tidbits combined over time can build a pretty accurate picture of who someone is.

e.g. they post "oh man, the Cubs lost", then a year later "went for a walk on Lakeshore drive", and another year later "there was a fire at my local subway stop", etc. You pretty quickly narrow down the rough neighborhood where they live in Chicago.

Combine that with tools like LexisNexis and you get a list of people that you can filter by age, sex, occupation, etc. We could narrow it down to <20 people based on other info they had shared.

Then you fold in their posting patterns and it's pretty obvious who is at work (posting 9am to 5pm) vs home (posting 7pm to 1am).

Again, you keep adding constraints and the intersection of the Venn diagrams gets smaller and smaller.
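
To make the narrowing concrete, here is a minimal Python sketch of the intersection idea. The candidate pool, the attribute names, and the `narrow` helper are all invented for illustration; this is only the set logic, not the tooling described above.

```python
# Hypothetical candidate pool: every name and attribute is made up for this example.
candidates = {
    "A": {"city": "Chicago", "cubs_fan": True, "near_lakeshore": True, "office_hours": True},
    "B": {"city": "Chicago", "cubs_fan": True, "near_lakeshore": False, "office_hours": True},
    "C": {"city": "Chicago", "cubs_fan": False, "near_lakeshore": True, "office_hours": True},
    "D": {"city": "Boston", "cubs_fan": False, "near_lakeshore": False, "office_hours": False},
    "E": {"city": "Chicago", "cubs_fan": True, "near_lakeshore": True, "office_hours": False},
}

# Each constraint is a label plus a predicate over one candidate's attributes.
constraints = [
    ("lives in Chicago", lambda c: c["city"] == "Chicago"),
    ("follows the Cubs", lambda c: c["cubs_fan"]),
    ("walks Lakeshore Drive", lambda c: c["near_lakeshore"]),
    ("posts during office hours", lambda c: c["office_hours"]),
]


def narrow(pool, constraints):
    """Apply constraints one by one, keeping only candidates that satisfy all of them."""
    remaining = set(pool)
    history = []
    for label, predicate in constraints:
        remaining = {name for name in remaining if predicate(pool[name])}
        history.append((label, len(remaining)))
    return remaining, history


remaining, history = narrow(candidates, constraints)
for label, count in history:
    print(f"{label}: {count} candidate(s) left")
```

No single constraint identifies anyone here; it is only the combination that shrinks the pool to one.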

This was all in the early 2000s before we had cellphones that tracked your location and ad infrastructure that followed you around the internet.

[−] eth0up 55d ago
So how do you think this situation will change now that LexisNexis, Oracle, Palantir, Clearview and others are all converging with our four frontier LLM models (plus military contracts) or directly with their own AI?

What used to require a little work is now instant. And we're much further into the predictive part than most will admit.

[−] martin-t 55d ago
I thought I had deja vu when reading your comment so I searched and found that you wrote something very similar 6 years ago, then 4 months ago and then 3 comments within the last month.

Out of curiosity and without meaning it to sound like an accusation, did you write such similar posts by hand or do you use some form of automation for commenting?

[−] alexpotato 55d ago
It’s funny because someone asked me about this on Twitter too. Specifically, how was I able to reply to tweets of other people with a relevant Twitter thread I had already written.

It’s all manual and I guess just how my brain works. My wife actually calls it “the database” because I can quickly access stories and I apparently tell them in a very similar way.

I’m just as impressed that you noticed and had the Déjà vu.

[−] ethbr1 55d ago
Out of curiosity, did you come from a family where older generations were storytellers? E.g. parents, extended, or grandparents?

In the sense that there were stories you heard retold (sometimes by the same person) over the years, mutating a bit in each retelling?

I think some brains get wired so that oral (or at least reproductive in some medium) story transmission is effortless, but affinity does seem to differ person-by-person.

[−] alexpotato 54d ago

> In the sense that there were stories you heard retold (sometimes by the same person) over the years, mutating a bit in each retelling?

It's funny you mention this b/c "yes". Both of my parents are big storytellers.

Never realized this so thanks for pointing it out!

[−] em-bee 55d ago
if it's a good story, it is worth retelling. my personal approach is to try to link to the old post or at least mention that i told this before. i don't know if that is better or not though. but certainly if the story fits, then it should be posted, and here it fits.

[−] alexpotato 54d ago

> my personal approach is to try to link to the old post or at least mention that i told this before. i don't know if that is better or not though

I do this on Twitter.

Specifically, I'll retweet the thread or tweet I already have written on the topic.

Retweeting is, IMHO, the best part of Twitter as it lets you "weave" a narrative of old tweets and threads. It's also why I think articles are dumb b/c you can't link to the specific part of an article like you can with a tweet thread.

[−] martin-t 55d ago
Thanks for the reply.

> reply to tweets of other people with a relevant Twitter thread I had already written

I noticed I also have topics I talk about regularly and this would be really nice. Some things only need one high-effort explanation and then linking to that.

Using LLMs would in theory save some effort by rephrasing so it doesn't look copy-pasted, but I am strongly opposed to the mild reality distortion they are prone to (like hallucinating random tidbits which never happened, using "A big part of this" as a rhetorical device instead of an actual quantity, etc.) in addition to flat-out lying and other mis-generation (I don't call them hallucinations since this is normal operation, not something exceptional).

[−] alexpotato 54d ago

> Some things only need one high-effort explanation and then linking to that.

I mentioned in a sibling thread that I do this on Twitter (and it's a lot of fun to get to re-use old threads for new audiences).

> Using LLMs would in theory save some effort by rephrasing so it doesn't look copy-pasted, but I am strongly opposed to the mild reality distortion they are prone to

Same thoughts from me. There is also a bit of "John Henry" [0] in that I want to keep my brain strong in this skill versus letting the machines take it away.

0 - https://en.wikipedia.org/wiki/John_Henry_(folklore)

[−] api 55d ago
Think about browser fingerprinting. Every little bit of info is literally one more bit, so by the time you get to 32 bits you’ve narrowed it down to one in four billion. An oversimplification but that’s the idea.
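
A rough Python sketch of that arithmetic, with the oversimplification made explicit: it assumes every trait is statistically independent of the others, which real fingerprint signals are not. The function names and the example fractions are hypothetical.

```python
import math


def bits_of_information(fraction_sharing_trait: float) -> float:
    """Bits revealed by a trait that this fraction of the population shares."""
    return -math.log2(fraction_sharing_trait)


def anonymity_set_size(population: int, total_bits: float) -> float:
    """Expected number of people still matching after total_bits are known.

    Assumes the traits are independent, so their bits simply add up.
    """
    return population / (2 ** total_bits)


# Three made-up traits shared by 1/2, 1/4 and 1/8 of people: 1 + 2 + 3 = 6 bits.
total = sum(bits_of_information(f) for f in (0.5, 0.25, 0.125))
print(total)                             # 6.0
print(anonymity_set_size(6400, total))   # 100.0
print(anonymity_set_size(2 ** 32, 32))   # 1.0, i.e. one in roughly 4.3 billion
```

Rare traits carry far more than one bit and common ones far less, so the count of "facts" matters less than how unusual each one is.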

Being strongly private online requires spy tradecraft levels of precaution.

[−] ethbr1 55d ago
Or just making certain topics verboten. Different pieces of information can be orders of magnitude more or less useful.

[−] nunez 55d ago
People search engines do a lot of the heavy lifting and can give you that data on a platter for a few dollars. I pay for a service that employs people to periodically do data removal requests with them. It's not great that _they_ have a bunch of data about me, but I'd rather it be in one place that tries to safeguard it than in a bunch of places all over the Internet. (There are A LOT of people search engines.)

As for using clues to discover people's whereabouts and such: lots of police/detective shows have turned "finding where people are through Instagram photos" into a meme. Most people don't think about cybersecurity outside of "oh, I need to change my password now."

[−] ethbr1 55d ago

> I pay for a service that employs people to periodically do data removal requests with them.

Curious about any recommendations from you or others.

> Most people don't think about cybersecurity outside of "oh, I need to change my password now."

Cyber anonymity is a concept that most people don't think about at all, especially after Facebook normalized real names in the ~2010s.

In many ways, it feels like Eternal September talking to younger people who never used a pseudonym online.

[−] tedmiston 55d ago

> b. related to a, how even small tidbits combined over time can build a pretty accurate picture of who someone is.

that's basically why clickstream analytics is so powerful

[−] stego-tech 55d ago
This is...disquieting. It's one thing to know that it's possible, another thing to know nation states or large megacorps are doing it, but another thing entirely to see such verbose output from free models about, well, me.

The first two, I've made peace with (nothing I can do about it anyway). The last one picks quite fiercely at old trauma that really makes me reconsider my socials in general, not just HN.

But maybe that's just the anxiety and trauma talking, encouraging me to recede back into the shadows and re-apply the old mask of "acceptableness" I've been trying to toss aside. Maybe the fact a free chatbot can do such a thorough analysis is in fact reason enough to stop worrying about every aspect of my identity and its perception by others, and instead just...be me, and deal with whatever consequences arise from that.

I dunno. Just...lot of emotions, here, most of them quite bad.

[−] johnfn 55d ago

> This is arguably their defining HN characteristic: they are one of the most vocal, persistent AI optimists on the platform. They claim ~90-95% of their shipped code is AI-generated, report 5-10x productivity gains, and have built a detailed methodology around it — using Playwright for visual verification, static typechecking as a hallucination filter, and e2e test suites as automated validation harnesses

Wow, I sound really annoying. Sorry about that everyone!

[−] Forgeties79 55d ago

> “Two things can be true at the same time” — he holds nuanced positions

I feel the need to point out that 99% of the time that phrase is essentially an insult and isn’t indicative of a “nuanced position” lol. It generally means “you’re myopic in your views/your argument lacks nuance.” That strikes me as a pretty charitable interpretation by the model there.

You seem like a good dude, and I’m not going to pretend I haven’t thrown out the flippant quip here and there in my comments. I just thought that interpretation was pretty funny.

[−] janalsncm 55d ago
Not doubting the method works in general, but Simon Willison is a public-enough figure so the baseline level of info is higher than just HN comments. If you turn off Claude’s web search:

> Simon Willison is a British software developer, blogger, and open-source advocate, best known for…

[−] sachaa 55d ago
You can also do this with a simple bookmarklet, no extension needed.

Create a new bookmark in your browser, name it something like "Profile HN User", and paste this as the URL:

javascript:void(function(){var u;var m=window.location.href.match(/news\.ycombinator\.com\/user\?id=([^&]+)/);if(m){u=m[1]}else{u=prompt(%27Enter HN username:%27)}if(!u)return;var msg=%27Profile this HN user: https://hn.algolia.com/api/v1/search_by_date?tags=comment,au...})()

If you're on an HN profile page (news.ycombinator.com/user?id=someone) it grabs the username automatically. Otherwise it prompts you to type one. It copies the profiling prompt to your clipboard and opens a new Claude conversation; then just press Cmd/Ctrl+V and hit Enter.
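
For anyone who would rather script it, here is a minimal Python sketch of the first two steps (pulling the username out of a profile URL and building a comments query). The bookmarklet above is truncated, so this does not reproduce it: the `author_<username>` tag and `hitsPerPage` parameter are my understanding of the Algolia HN Search API, and the helper names, the default of 100 hits, and the one-line prompt are assumptions for illustration.

```python
import re
from urllib.parse import urlencode

# Same pattern the bookmarklet uses to spot an HN profile page.
PROFILE_RE = re.compile(r"news\.ycombinator\.com/user\?id=([^&]+)")


def username_from_url(url):
    """Return the HN username from a profile URL, or None if it is not one."""
    match = PROFILE_RE.search(url)
    return match.group(1) if match else None


def comments_api_url(username, hits_per_page=100):
    """Build an Algolia search_by_date URL for a user's most recent comments.

    The tag format and hitsPerPage value are assumptions, not taken from the bookmarklet.
    """
    query = urlencode(
        {"tags": f"comment,author_{username}", "hitsPerPage": hits_per_page},
        safe=",",  # keep the comma between tags readable instead of %2C
    )
    return f"https://hn.algolia.com/api/v1/search_by_date?{query}"


def profiling_prompt(username):
    """Hypothetical one-line prompt; the real bookmarklet's full prompt is not shown."""
    return f"Profile this HN user: {comments_api_url(username)}"


user = username_from_url("https://news.ycombinator.com/user?id=simonw")
print(profiling_prompt(user))
```

This only builds strings and makes no network request, so fetching the JSON and pasting it into a chat is left to whatever client you prefer.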

[−] Ancapistani 55d ago
This is impressive, and a bit terrifying. My “profile” is extremely in depth and mostly accurate. I’ve always treated this account as at most pseudo-anonymous, so no harm done - but there is easily enough information there to identify me. In fact, I think I’ll try to do just that tomorrow as a weekend project.

I created this account after using my real name here for years, to build at least some kind of separation. At the time, I think I was applying for jobs and had a couple of interviews - positive ones, oddly enough - where my political views were referenced. Given our political climate in the US, I decided it would be best to make at least my current views more difficult to associate with me.

For me, this just underscores the fact that while we always knew those data were out there for someone determined to target you, this makes them an order of magnitude easier to access.

… I just typed out an explanation why I made the above statement, but decided not to post it as it describes a potential criminal act that would likely be very profitable :(.

[−] tamimio 55d ago

> Recurring Hobby Horse

> The word "engineer" being diluted by software/bootcamp culture is something they return to obsessively — arguably their strongest ideological position alongside surveillance criticism

Busted!!

That being said, I'm not surprised, because it listed exactly how I want my persona to appear. Does that mean I am like that IRL? No, I rarely bring up the above “engineer” term IRL, let alone obsess about it, but on HN it makes sense to bring up. The rest is mostly techie stuff that I usually don’t bring up with my friends or family. Also, this can apply to anything you produce, like your blog, books, or YouTube: that personality is what attracts (or repels) other people to be around you. It’s human society 101.

[−] raw_anon_1111 55d ago
Just pasting my last 100 comments into ChatGPT using the API and cutting out anything positive it said about me…

“Your communication style is direct and often adversarial, using rhetorical questions and sharp analogies to pressure-test assumptions, with little tolerance for what you see as naïve, performative, or abstract reasoning. You prioritize competence, execution, and practical tradeoffs over signaling or theory, and while that makes your analysis grounded and often incisive, it can also make your stance appear combative and less receptive to edge cases or emerging paradigms that don’t yet fit established incentive structures”

[−] zoogeny 55d ago
It is interesting that Marc Andreessen was having a bit of an X crash-out over his belief that introspection is bad [1].

I disagree because I tend to seek a middle way. I would agree that too much (excessive) introspection is bad. But I would argue that too little is equally bad.

I think obsessively examining one's own comment history would verge on excessive. I'm wondering how much LLM analysis of my public and private life can remain healthy.

[1] https://x.com/pmarca/status/2035190797218587116

[−] michaelteter 55d ago
And note that HN does not allow you to delete your comments after a short time passes.

If you contact them and ask for your data to be deleted, they will directly refuse.

[−] n2d4 55d ago
This was interesting to do on my own profile. It got a bunch of personality attributes about me right that I haven't directly mentioned on here, which is impressive.

I then followed it up with "Given my chat history, how do they compare to me?", and it started making comparisons of myself to myself. Very fun experience.

[−] irthomasthomas 55d ago
A friend made a cli tool, ideal for agents, which does this and can aggregate intelligence across multiple platforms.

https://github.com/bm-github/owasp-social-osint-agent

[−] Simulacra 55d ago
I've been doing this for a long time, it's amazing what ChatGPT can suss out with enough data. I like to feed it comments from message boards to try to uncover interesting business opportunities, or threads to follow for my own research.

[−] plun9 55d ago
You can just ask a chatbot about Hacker News or reddit users based on their username.

[−] ich 55d ago
Nice! Quite accurate as well. Apart from:

" The Atari + German book reference indicates:

* Interest in legacy computing / systems history "

No. I'm just that old. I read the book when the Atari ST was state of the art :-)

[−] few 55d ago

[−] jhanschoo 55d ago
I'm currently migrating from Gemini to Claude, and this post has been a boon for getting what Claude knows about me into its memory.

[−] JSR_FDED 55d ago
Given a profile like this, how good would an LLM be at figuring out whether the profile is from a bot or a real person?

[−] vpribish 55d ago
HAHAHA - I like me. but claude (sonnet 4.6) seemed like it was cheerleading a bit

[−] sgbeal 55d ago
(...does this thing to check own profile[^1]...)

> Old man raising fist at, and yelling at, clouds. Get off his lawn.

[^1]: not really - this is speculation (so... kinda the same thing the LLM is doing) but is possibly an accurate representation.

[−] SanjayMehta 55d ago
"Fetched 0 comments."

Edit: turns out it's case sensitive.

Sounds about right:

roughly “anti-imperialist realist” with Indian/Global South anchoring and paleoconservative/libertarian-adjacent distrust of state-corporate surveillance power.

[−] bibimsz 55d ago
hacker news is a goldmine since you can't delete comments nor even delete your account. this site is a privacy nightmare, in a world where everyone is excited to cancel and dox for unpopular opinions (on this site that means anything to the right of bernie sanders).

[−] Bishop197 52d ago
[dead]

[−] alexgandy 55d ago
This just in: posting ridiculous amounts of personal information on the internet can lead to you being profiled correctly. Wild stuff.