The “small web” is bigger than you might think (kevinboone.me)

by speckx 241 comments 561 points


[−] susam 61d ago
A little shell function I have in my ~/.zshrc:

  pages() { for _ in {1..5}; do curl -sSw '%header{location}\n' https://indieblog.page/random | sed 's/.utm.*//'; done }
Here is an example output:

  $ pages
  https://alanpearce.eu/post/scriptura/
  https://jmablog.com/post/numberones/
  https://www.closingtags.com/blog/home-networking
  https://www.unsungnovelty.org/gallery/layers/
  https://thoughts.uncountable.uk/now/
On macOS, we can also automatically open the random pages in the default web browser with:

  $ open $(pages)
Another nice place to discover independently maintained personal websites is: https://kagi.com/smallweb
[−] unsungNovelty 61d ago
Hey!!!!!

That is my website! To be fair, the hard part is keeping a personal website regularly updated without making people think it's abandoned. I don't have a regular post cadence, so it looks like I don't touch the website at all for months. But I regularly update my posts and other sections even if there aren't any new posts.

I also wrote something similar to OP - https://www.unsungnovelty.org/posts/10/2024/life-of-a-blog-b...

And I'd like to also mention https://marginalia-search.com/ which is a small OSS search engine I have been using more and more these days. I find it great for finding IndieWeb / Small Web content.

[−] thesuitonym 60d ago
For my part, if I come across a personal site that hasn't been updated in a few months, I don't assume it's abandoned, just that the person hasn't had anything to say for a while. I'd rather see a site with updates every few months, or even once or twice a year, than one with an update every other week saying "Sorry I haven't updated."
[−] SyneRyder 60d ago
Not sure if this will be considered helpful, but if you include:

  <link rel="alternate" type="application/rss+xml" href="https://www.unsungnovelty.org/index.xml" />

in the HEAD of the pages on your website, it makes autodiscovery of the RSS feed a bit easier - not just for crawlers, but also for people with RSS plugins in their browser. It will make the RSS icon appear in their browser's URL field for easy subscription. Took me a while to find the RSS link at the bottom of your pages!

[−] unsungNovelty 56d ago
Thanks. Lemme look into that and will take the necessary actions.
[−] mikestorrent 60d ago
Now this is what makes the Small Web feel alive to me... creators randomly showing up like this on HN. I feel like I used to see this kind of thing more.
[−] sylware 60d ago
Sadly, this search engine is now javascript-only. So much for the "small" web...
[−] SyneRyder 60d ago
If that's an issue, and if you don't mind building something out yourself, Marginalia have an excellent API that you can connect to from your own personal non-Javascript meta-search engine. I did that, and I find Marginalia awesome to deal with. They're one of my favorite internet projects.

(Also, thanks for reminding me that it was time I donated something to the Marginalia project: https://buymeacoffee.com/marginalia.nu )

[−] sylware 59d ago
Are there 'public'/'anonymous' API keys I could use to perform a web search with CURL?

(I guess I would get json formatted search result data)

[−] SyneRyder 59d ago
There is! The API Key is literally "public". But apparently it often gets rate limited, because seemingly every Metasearch engine uses that one. I think there might also be a slightly less rate-limited one for Hacker News users if you search around (I no longer remember what it is since I got my own key in the end.)

You can get your own API key for free by emailing, but that would not be anonymous, I guess.

I don't have curl syntax to hand, but hopefully it's easy to figure out from these documents. I may come back and edit later with curl syntax if I get time:

https://about.marginalia-search.com/article/api/
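
In the meantime, something roughly like this should work with the "public" key, if I'm reading those docs right (the exact host and path are an assumption on my part, so double-check against the page above; results come back as JSON):

  $ curl -s 'https://api.marginalia-search.com/public/search/small%20web?count=5'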

[−] sylware 59d ago
Thx! This is great news.

If their email server handles self-hosted SMTP servers with IP-literal email addresses (with the IP taken from the SMTP connection, which is stronger than SPF), then indeed, I will probably ask for mine.

I wish major AI services would do the same or something close.

[−] marginalia_nu 58d ago
It shouldn't be. Where are you having issues?
[−] unsungNovelty 60d ago
Couple of things.

1. No, it's not javascript-only. https://old-search.marginalia.nu/ is still available. It is also mentioned in https://about.marginalia-search.com/article/redesign/ as something that is going to be there for a very long time.

2. I don't think using javascript makes it bad by itself. It's a very nice site now; I prefer it to the old version. My website doesn't use JS for any functionality yet, but I've never said never either. The need to use JS just hasn't arisen. The day it does, I will use it.

I understand the sentiment though. I used to be a no-JS guy, but I've been softened by having to use it professionally, only to think --- hmmm, not bad.

[−] sylware 59d ago
I do 100% disagree.

Web apps are gated by the abominations that are the WHATWG cartel's web engines, with even worse SDKs; mechanically they are certainly not 'small' and assuredly a definitive no-no.

And the 'old' interface, you bet I tried to use it... which is actually gated with javascript... so...

[−] marginalia_nu 58d ago
I assume you're blocked by the new bot blocker?

I've tested it in both w3m and dillo, should work fine as long as your browser renders noscript tags. It's very much designed from the ground up to handle browsers like that. Just requires you to manually wait a few seconds and then press the link.

One configuration that might break is if you're running something like chrome or firefox, and rigging it to not run JS. But it's really hard to support those types of configurations. If it works in w3m, it's no longer a "site requires JS" issue...

[−] rodarima 57d ago
Thanks a lot for considering no-JS browsers like Dillo; in the current web hellscape that is certainly a difficult task. I checked and it works well in Dillo on my end.
[−] sylware 57d ago
I have tested it many times, a while ago.

I used lynx and links2 (not netsurf yet); as far as I can recall, I never got what you're talking about.

I was brutally blocked and got the finger because none of those web browsers has javascript (or even CSS).

Thinking about it, I hate the WHATWG cartel even more for the damage they did to the web with their 'web apps'.

[−] marginalia_nu 57d ago
I don't know what to say.

Here's a video of me getting past the bot blocker with links2 I guess?

https://www.youtube.com/watch?v=19-nXUYe9cA

[−] sylware 57d ago
wt... never got that. Did I have parasites on my line??

I did re-test just now, and I get the same thing as in the vid.

meh.

[−] sylware 56d ago
BTW, this should not be called "old-search" but "classic-search", as it will never be "old" for as long as the web exists.
[−] Noumenon72 60d ago
They barely mentioned your website (fourth of five URLs, mainly talking about indieblog.page and kagi.com/smallweb), so "That is my website!" is confusing and makes it seem like you're autoresponding to a keyword.
[−] ddtaylor 61d ago
For anyone curious, this is the same on Linux, except you use xdg-open, like this:

  $ xdg-open $(pages)
[−] sdoering 61d ago
This is so lovely. Just adopted it for Arch, and set it up so that I can just type indy n (with "n" being any number) and it opens n pages in my browser.
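
For reference, an untested sketch of such a wrapper (not my exact script; the name indy and the seq loop are just illustrative, building on the pages() one-liner above):

  indy() {
    for _ in $(seq "${1:-1}"); do
      xdg-open "$(curl -sSw '%header{location}\n' https://indieblog.page/random | sed 's/.utm.*//')"
    done
  }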

Thanks for sharing.

[−] matheusmoreira 61d ago
These curated discovery services require RSS and Atom feeds. My site doesn't even have those. Looks like I'm too small for the small web.
[−] oooyay 61d ago
Caveat: Kagi gates that repo such that it doesn't allow self-submissions, so you're only going to see the chunk of websites submitted by other people who also know about the Kagi repo.
[−] viscousviolin 61d ago
That's a lovely bit of automation.
[−] dwedge 60d ago
That top one only updates once a year. Not saying that as a criticism, just noting how lucky he was to have updated recently enough to end up in this top comment.
[−] postalcoder 61d ago
Multiple layers of curation work really well. Specifically, using HN as a curation layer for Kagi's small web list. I implemented this on https://hcker.news. People who have small web blogs should post them on HN; a lot of people follow that list!
[−] varun_ch 61d ago
A fun trend on the "small web" is the use of 88x31 badges that link to friends' websites or to webrings. I have a few on my website, and you can browse a ton of small web websites that way.

https://varun.ch (at the bottom of the page)

There are also a couple of directories/network graphs: https://matdoes.dev/buttons and https://eightyeightthirty.one/

[−] 8organicbits 61d ago
One objection I have to the kagi smallweb approach is the avoidance of infrequently updated sites. Some of my favorite blogs post very rarely; but when they post it's a great read. When I discover a great new blog that hasn't been updated in years I'm excited to add it to my feed reader, because it's a really good signal that when they publish again it will be worth reading.
[−] freediver 61d ago
Kagi Small Web has about 32K sites, and I'd like to think that we have captured most of the (English-speaking) personal blogs out there (we are adding about 10 per day, and a significant effort went into discovering/finding them).

It is kind of sad that the entire size of this small web is only 30k sites these days.

[−] afisxisto 61d ago
Cool to see Gemini mentioned here. A few years back I created Station, Gemini's first "social network" of sorts, still running today: https://martinrue.com/station
[−] danhite 61d ago
Isn't this a simple compute opportunity? ...

> March 15 there were 1,251 updates [from feed of small websites ...] too active, to publish all the updates on a single page, even for just one day. Well, I could publish them, but nobody has time to read them all.

if the reader accumulates a small set of whitelisted keywords, perhaps selected via an optional tag-cloud UI, then that estimated 1,251 likely drops to roughly a single page (most days)

if you wish to serve that as noscript, it would suffice to partition in/visible content, e.g. by
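
As a rough illustration of the keyword-whitelist step, in shell (the file names and one-title-per-line format are made up for the example):

  $ grep -i -F -f whitelist.txt updates-today.txt | head -n 50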

[−] shermantanktop 61d ago
This is a specific definition of "small web" which is even narrower than the one I normally think of. But reading about Gemini, it does make me wonder if the original sin is client-side dynamism.

We could say: that's Javascript. But some Javascript operates only on the DOM. It's really XHR/fetch and friends that are the problem.

We could say: CSS is ok. But CSS can fetch remote resources and if JS isn't there, I wonder how long it would take for ad vendors to have CSS-only solutions...or maybe they do already?

[−] upboundspiral 61d ago
I think the article briefly touches on an important point: people still write blogs, but they are buried by Google, which now optimizes its algorithm for monetization rather than usefulness.

Anyone interested in seeing what the web looks like when the search engine selects for real people and not SEO-optimized slop should check out https://marginalia-search.com .

It's a search engine with the goal of finding exactly that - blogs, writings, all by real people. I am always fascinated by what it unearths when using it, and it really is a breath of fresh air.

It's currently funded by NLNet (temporarily) and the project's scope is really promising. It's one of those projects that I really hope succeeds long term.

The old web is not dead, just buried, and it can be unearthed. In my opinion an independent non monetized search engine is a public good as valuable as the internet archive.

As far as I know, Marginalia is the only project that, instead of just taking Google's index and massaging it a bit (like all the other search engines), is truly seeking to be independent and practical in its scope and goals.

[−] 627467 61d ago
I read a lot against monetization in the comments. I think that's because we are used to monetization being so exploitative, filled with dark patterns and bad incentives, on the Big Web.

But it doesn't need to be this way: the small web can also be about sustainable monetization. In fact, there's a whole page on that at https://indieweb.org/business-models

There's nothing wrong with "publishers" aspiring to get paid.

[−] wink 60d ago
I don't want to be part of the "small web" - I want to be part of the web. If my stuff can't be found in a sea of a million ad-ridden whatever sites, so be it, but I am not going out of my way to submit stuff to special search engines or web rings; I've been there in the 90s.
[−] jmclnx 61d ago
I moved my site to Gemini on sdf.org; I find it far easier to use and maintain. I also mirror it on gopher. Maintaining both is still easier than dealing with *panels or hosting my own. There is a lot of good content out there, for example:

gemini://gemi.dev/

FWIW, dillo now has plugins for both Gemini and Gopher, and the plugins work fine on the various BSDs.

[−] Peteragain 60d ago
I'm very keen on public libraries. I'm fortunate in that our village has a community run one, there is the county one, and I can get to The British Library. Why do these entities exist? A real question - not rhetorical. Whatever the answer, I am sure the same mechanism could "pay for" public hosting.
[−] tonymet 61d ago
I'm not sold on Gemini. Less utility, weaker and more immature tools. Investing in small HTTP-based websites is the right direction. One could formalize it as a browser extension or a small-web HTTP proxy that limits JS, DOM size, cookie access, etc., using existing web browsers and user agents.
[−] GuB-42 61d ago
I don't expect many people to agree but I think that the "small web" should reject encryption, which is the opposite direction that Gemini is taking.

I don't deny the importance of encryption; it is really what shaped the modern web, allowing for secure payment, private transfer of personal information, etc... See what I am getting at?

Removing encryption means that you can't reasonably do financial transactions, accounts and access restriction, exchange of private information, etc... You only share what you want to share publicly, with no restrictions. It seriously limits commercial potential, which is the point.

It also helps technically. If you want to make a tiny web server, like on a microcontroller, encryption is the hardest part. In addition, TLS comes with expiring certificates, requiring regular maintenance; you can't just set up your server and leave it alone for years, still working. Dropping encryption can also bring back simple caching proxies, great for poor connectivity.

Two problems remain with the lack of encryption. The first is authenticity: anyone can man-in-the-middle and change the web page; TLS prevents that. But what I think is an even better solution is to handle it at the content level: sign the content (like a GPG signature) rather than the server. This way you can guarantee the authenticity of the content, no matter where you are getting it from.
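
To illustrate, content-level signing could look roughly like this with GPG (just a sketch; file names are arbitrary, and readers need the publisher's public key from somewhere trusted):

  $ gpg --armor --detach-sign page.html     # publisher: produces page.html.asc next to the page
  $ gpg --verify page.html.asc page.html    # reader: verifies authenticity, wherever the copy came from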

The other thing is the usual argument about oppressive governments, etc... Well, if you want to protect yourself, TLS won't save you: you will be given away by your IP address. They may not see exactly what you are looking at, but the simple fact that you are connecting to a server containing sensitive data may be evidence enough. Protecting your identity is what networks like Tor are for, and you can hide a plain-text server behind the Tor network, which would act as the privacy layer.

[−] lasgawe 61d ago
Mm, yeah. I like the idea of the small web not as a size category but as a mindset: people publishing for the sake of sharing rather than optimizing for attention or monetization.
[−] tonymet 61d ago
hats off to https://1mb.club/ and https://512kb.club/ for cataloging and featuring small web experiences
[−] Gunax 61d ago
It's sad how the small web became invisible.

I used to use all sorts of small websites in 2005. But by 2015 I used only about 10 large ones.

Like many changes, I cannot pinpoint exactly when this happened. It just occurred to me one day that I do not run into many unusual websites any longer.

It's unfortunate that so much of our behavior is dictated by Google. I don't think it's malicious or even intentional, but at some point they stopped directing traffic to small websites.

And like a highway closure rippling through small-town economies, it was barely noticed by travellers but devastating to those on the receiving end. What were once quaint sites became abandoned.

The second force seems to be video. Because video is difficult and expensive to host, we moved away from websites. Travel blogs were replaced with travel vlogs. Tutorials became videos.

[−] qudat 61d ago
I built https://prose.sh as part of my journey into Gemini and back out. Ya, it's just a simple blog, but you can manage it completely with ssh, and it's compatible with hugo when people want to eject.
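
(Roughly speaking, publishing is just copying markdown over ssh, along the lines of the sketch below; the exact syntax is in the prose.sh docs.)

  $ scp my-post.md prose.sh:/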

We also recently released support for plain-text-lists, which is a Gemini-inspired spec that uses lists as its foundational structure.

https://pico.sh/plain-text-lists

example: https://blog.pico.sh/ann-034-plain-text-lists

[−] oxag3n 61d ago

> To be fair, I should point out that the “small” web was never defined by the number of sites, but by the lack of commercial influence.

That was my understanding before it grew - it's a web of small indie sites.

[−] lich_king 61d ago
It's easy to hand-curate a list of 5,000 "small web" URLs. The problem is scaling. For example, Kagi has a hand-curated "small web" filter, but I never use it because far more interesting and relevant "small web" websites are outside the filter than in it. The same is true for most other lists curated by individual folks. They're neat, but also sort of useless because they are too small: 95% of the things you're looking for are not there.

The question is how do you take it to a million? There probably are at least that many good personal and non-commercial websites out there, but if you open it up, you invite spam & slop.

[−] followdev 61d ago
I built FollowDev.com, which is like Kagi Small Web but for software developer blogs.

It has about 1000 blogs in the repo at the moment. Discovery was the most time-consuming part.