12k AI-generated blog posts added in a single commit (github.com)

by noslop 148 comments 155 points

[−] ConceitedCode 41d ago
I suspect we'll address this by just going back to older ranking algorithms for search. We'll go back to the primary signal of good content being links from trusted sources.

People gaming the content based algorithms will eventually cause their own downfall.

[−] eh_why_not 41d ago
It's becoming much harder to determine on a daily basis what content is original, thought-out by a person, and trustworthy. Ironically, verifiably-old content is easier to trust now. Examples from recent personal experience:

1) Some time ago I was searching for growing information about a specific and uncommonly-grown plant, and was led to a top-ranked website with long pages containing everything about it, including other plants. Surprised at how prolific the writing was, I spent more than an hour on the website, taking notes, etc. Every few paragraphs it would include an amazon affiliate link to something topical, which I thought was fair. Until I realized that the links near the bottom of the page were looking more random. Then it hit me, the website is all AI-generated, and the affiliate links themselves are also AI-chosen. And everything new I "learned" from that site was now useless because I had no way to know what was grounded in actual agricultural experience and what was hallucinated.

2) Recently I did a youtube search for a book I had just finished reading, looking for some reviews. Came across a channel that was reading the book as new audio (i.e. not the original published audiobook). I thought it was a fan making it. The voice was beautiful, soothing, and natural with all kinds of relevant emotions correctly included. I started listening to the book again, until I noticed a consistent error in word ordering being made every few lines. Then it hit me! The channel even included one upload with a video recording of a seemingly-real person reading with that voice. Both the audio and video are AI-generated, but very hard to tell.

3) Next to those videos, YT recommended many strange/new channels. One had the photo and the exact voice of a famous (and now very old) physicist, with tens of clickbaity titles about controversial topics in the domain. The only tell was that the voice was too vigorous and consistently energetic, while if you've listened to that physicist before, you know his cadence is slower. At first I thought maybe the channel is reading one of his books; no, the content itself was AI-generated, maybe based on his books. There was a lot of engagement, with many comments like "mind blown" and "learned so much today".

Both #1 and #3 are harmful, because you think you're learning from a reliable source but you end up learning hallucinated nothings. #2 I didn't mind much, still enjoyed the new voice, and even preferred it over my original audible version.

[−] fn-mote 41d ago
I thought somebody counted them… incredibly, the log message admits to committing 12,000 articles.

I guess that means the log message was authored by AI as well. Figures.

[−] arcza 41d ago
So whatever OneUptime is, I now know it has zero integrity and is something I should avoid.
[−] jpdb 41d ago
I've been seeing this company in ~all of my searches across various tech topics.

They're absolutely dominating search results. The quality isn't terrible, but there's so much content that I can't trust them to be accurate.

[−] TrackerFF 41d ago
I've seen an increase in this "firehose" tactic among the passive-income folks, where the idea is to just saturate certain niches with AI-generated content, and collect some cents here and some cents there - in the hopes it will generate as much money as maintaining a single high-quality content channel.

Don't know if they actually make any money doing it like that. A couple of weeks ago I stumbled across some content-creator that said he had hundreds of faceless YouTube channels, which was made possible due to AI tools.

[−] ThrowawayR2 41d ago
If the dead Internet theory wasn't true before, it sure will be soon.
[−] MattGaiser 41d ago
One of the issues is that the purpose of business internet writing is not to be read, but to be ranked well.
[−] raincole 41d ago
Serious question: What is this post about and why should we care? It's a repo with 35 stars. Is adding 12,000 posts in a single commit somehow technically difficult or significant?
[−] petterroea 41d ago
This is why I never trust blog posts any more. If a company logo is attached, it's just SEO garbage.
[−] hirako2000 41d ago

> All content must be original and not published anywhere else.

Do what I say, not what I do.

[−] CrzyLngPwd 41d ago
There doesn't seem to be a workable plan for how to cope with the onslaught of AI output, and it's going to get much worse.

The sentinel servers, meta/google/ms/etc. just seem to be largely ignoring it, or even supporting it.

It's already nauseatingly common on all major platforms.

[−] gib444 41d ago
"Showing 1 - 25 of 45488 posts"

I miss the days when we could assume that's just a pagination code bug

[−] wartywhoa23 41d ago
AI is the stellar moment for all mediocrity and conmen.
[−] miyuru 41d ago
The commit author is here too, and has posted only slop here as well.

https://news.ycombinator.com/submitted?id=ndhandala

I wonder when he will submit them here.

[−] username223 41d ago
[GitHub] platform activity is surging. — https://twitter.com/kdaigle/status/2040164759836778878
[−] chloeburbank 41d ago
I have visited a blog on this site while searching for something. Suffice to say it was a very shoddy attempt at a blog and at this point I should just network block this site entirely
[−] StrLght 41d ago
I am so glad DuckDuckGo allows blocking specific sites from the search. Just did this for a domain linked in this repository.
[−] avian 41d ago
Just this morning I opened up my RSS reader and found that it was flooded by weird, twisty prose exalting the virtues of online gambling. Since I follow a few blogs that post long-form content, I first thought this was satire or something, but after reading for a bit and seeing that the posts just never end, my best guess was it's just AI slop intended to drive traffic to some gambling site. Which site is not clear, since there were no links. All posts came from an RSS feed of an apparently abandoned tech blog I was following that had its last legit post in 2020. My guess is the domain expired, a squatter bought it, saw a bunch of requests for the RSS feed and grabbed the opportunity. Although to what end I'm not sure.
[−] troupo 41d ago
Ironic, considering the README:

--- start quote ---

These blog posts are written by the OneUptime team and open source contributors. We write about our experiences, our learnings, and our thoughts on the world of software development, Kubernetes, Ceph, SRE, DevOps, Cloud and more. We hope you find our posts helpful and insightful.

--- end quote ---

[−] Steppphennn 41d ago
I don’t see how the author isn’t embarrassed. Maybe it’s just me having imposter syndrome or maybe I can self reflect, maybe. If he used AI to slop all those articles up doesn’t he know any developer can use AI to get that content through the IDE? He’s trying to game something with a tool that effectively killed off that game in the first place.
[−] setnone 41d ago
i guess 11K won't do it and 13K is just way too much
[−] alin23 41d ago
They even have a scrolling 5-star reviews section, clearly generated: https://oneuptime.com/#reviews-title

https://github.com/OneUptime/oneuptime/commit/538e40c4ae724e...

https://github.com/OneUptime/oneuptime/commit/2bc585df20e6bb...

You can fabricate a professional business image in a few days with AI now. It's going to be hard to build an honest brand when everyone is going to point and say "vibe coded slop" because of examples like this website.

I'm already seeing such comments whenever someone posts an app on /r/macapps and it's really discouraging for beginners. If I would have met that resistance and amount of mean comments when I launched Lunar, I would have probably never put in that amount of effort.

[−] r_lee 41d ago
I've seen this blog slop on Google for the last month or so, no action taken whatsoever. it's mostly bullshit or regurgitated info from docs.

like Google or their Search team really doesn't seem to care at all. all of a sudden a random blog website just happens to rank first page on every topic

[−] whycombinetor 41d ago
If it's between a human or an AI copywriting SEO slop, I'm happy to see an AI take that job. SEO content marketing is so painful to read once you realize you're reading it, and I have to imagine it's as painful to write if you're a technically talented writer.
[−] srhyne 41d ago
I’ve naturally landed on a handful of their posts recently through search. I was impressed with the quality.

Interesting to see this after the fact.

[−] ieie3366 41d ago
Ironically due to slop I feel like we are regressing as a civilization

2020, want to know how to use Redix for Redis connections in Elixir? Google it and the results were most likely high quality, written by senior engineers who knew what they were doing

Today google that, and it will be endless amounts of slop

[−] mzajc 41d ago
uBlacklist is a great extension to avoid slopfarms like this: https://ublacklist.github.io/docs
[−] Topfi 41d ago
I know there is a lot of valid criticism of GitHub's poor performance when scrolling, but in this case I think we can let them off the hook.

I'll just leave this here: https://developers.google.com/search/help/report-quality-iss...

[−] schmookeeg 41d ago
We are all quickly becoming allergic to AI writing.

To fool us into thinking writing is not AI generated, we will create "human-ifying" filters to the LLM. This will introduce common keystroke, grammar, and spelling issues that surely no automation would ever create on its own.

Soon the writing most vaunted and trusted will be the writing that appears written by a 4 year old with a crayon.

Sigh.

[−] nelsonfigueroa 41d ago
Well, at least they're not exactly hiding it.
[−] WJW 41d ago
GitHub only reports 5012 changed files though.
[−] sigmonsays 41d ago
when AI starts training itself accidentally on AI generated content, we all lose...
[−] hoppp 41d ago
What is this monstrosity, cmon.

Why would anyone read AI generated blog posts when I can just ask AI for what I need already

For gaming SEO this is still bad, no backlinks.

[−] whattheheckheck 41d ago
If this shit can come from the llms why are we redragging it out of them?

To reverify correctness?

[−] cebert 41d ago
What is the point of this?
[−] nunez 41d ago
Welcome to the slop age!
[−] tadfisher 41d ago

> Showing 1-25 of 58891 posts

I have to imagine that one quality post worth reading would be linked in multiple places, thus would beat tens of thousands of slop articles for SEO purposes?

[−] nicbvs 41d ago
Trying to hide all their CVEs behind AI slop
[−] antiloper 41d ago
"Nawaz Dhandala"
[−] ugiox 41d ago
Now we know why GitHub has a hard time with stability and reliability. Because of this AI slop BS inflicted on us by the Silicon Valley tech bros and all their followers.
[−] cachius 41d ago
At which URL(s) are the blog posts visible?