Modern SQLite: Features You Didn't Know It Had (slicker.me)

by thunderbong 63 comments 244 points

[−] aaviator42 43d ago
SQLite is insanely robust. I have developed websites serving hundreds of thousands of daily users where the storage layer is handled entirely by SQLite, via an abstraction layer I built that provides a handy key-value interface, so I don't have to craft queries when I just need data storage/retrieval: https://github.com/aaviator42/StorX
[−] arunix 42d ago
Does StorX have any special handling of concurrent writes, or would the user need to take care of that?
[−] aaviator42 41d ago
We use SQLite IMMEDIATE transactions, which lock the file for writes for a few milliseconds while committing data. This is not a problem in practice until you have more than a few dozen concurrent writers. StorX sets a default busy timeout of 1.5s, which you can adjust to your needs. You can also get a lot more out of it by being smart about how you spread your data over DB files (e.g. one file per user instead of one for multiple/all users), and by considering when you call openFile() and closeFile() (e.g. keep write transactions short, and don't leave a file handle open while running long calculations).
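For readers who want to see the pattern in isolation, here is a minimal sketch using Python's standard `sqlite3` module. It is not StorX's code: the `kv` table and the `open_db`, `write_key`, and `read_key` helpers are hypothetical names for illustration, and only the 1.5 s timeout is taken from the comment above.

```python
import os
import sqlite3
import tempfile

BUSY_TIMEOUT_S = 1.5  # mirrors the 1.5 s default mentioned above


def open_db(path, timeout=BUSY_TIMEOUT_S):
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN IMMEDIATE / COMMIT below is the only transaction control.
    conn = sqlite3.connect(path, timeout=timeout, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    return conn


def write_key(conn, key, value):
    # BEGIN IMMEDIATE takes the write lock up front. If another writer holds
    # it, SQLite waits up to the busy timeout, then raises "database is locked".
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def read_key(conn, key):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = open_db(demo_path)
write_key(conn, "greeting", "hello")
write_key(conn, "greeting", "hello again")  # overwrites the first value
print(read_key(conn, "greeting"))
```

Keeping the write inside one short transaction is the point: the lock is held only for the insert and the commit, never for application work.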
[−] krylon 43d ago
STRICT tables are something I appreciate very much, even though I cannot recall running into a problem that would have been prevented by their presence in the before-time. But it's good to have all the same.

I don't think I've ever done much with SQLite's JSON functions, but I have on one or two occasions used a constraint to enforce a TEXT column contains valid JSON, which would have been very tedious to do otherwise.
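A minimal sketch of that kind of constraint, assuming a SQLite build with the JSON functions available (they are built in by default in recent releases). The `events` table and the `try_insert` helper are made-up names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " payload TEXT NOT NULL CHECK (json_valid(payload))"
    ")"
)


def try_insert(payload):
    """Return True if the row was accepted, False if the CHECK rejected it."""
    try:
        conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
        return True
    except sqlite3.IntegrityError:
        # json_valid() returned 0, so the CHECK constraint failed.
        return False


print(try_insert('{"kind": "login"}'))
print(try_insert("{not json"))
```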

[−] crazygringo 43d ago

> even though I cannot recall running into a problem that would have been prevented by their presence in the before-time

I very, very much did. I was using a Python package that used a lot of NumPy internally, and sometimes its return values would be Python integers, and sometimes they'd be NumPy integers.

The Python integers would get written to SQLite as SQLite integers. The NumPy integers would get written to SQLite as SQLite binary blobs. That prevented even simple things like comparing values for equality.

Setting to STRICT caused an error whenever my code tried to insert a binary blob into an integer column, so I knew where in the code I needed to explicitly convert the values to Python integers when necessary.

[−] QuadrupleA 43d ago
Love SQLite and most of these features.

On the STRICT mode, I've asked this elsewhere and never gotten an answer: does anyone have a loose-typing example application where SQLite's non-strict, different-type-allowed-for-each-row has been a big benefit? I love the simplicity of SQLite's small number of column types, but the any-type-allowed-anywhere design always seemed a little strange.

[−] kherud 43d ago
SQLite seems very powerful for building FTS (user enters free text, expects high precision/recall results). Still, I feel like it's non-trivial to get good search quality.

I think the naive approach is to tokenize the input and append "*" for prefix matching. I'm not too experienced and this can probably be improved a lot. There are many settings like different tokenizers, stemming, etc. Additionally, a lot can be built on top like weighting, boosting exact matches, etc.

Does anyone know good resources for this to learn and draw inspiration from?
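As a starting point, the naive approach described above might look like the following sketch. It assumes a SQLite build with FTS5 enabled; the `docs` table, the sample rows, and the `to_prefix_query` and `search` helpers are hypothetical names for illustration. Ranking uses FTS5's built-in `rank` column, with no custom weighting.

```python
import re
import sqlite3


def to_prefix_query(user_input):
    """Turn free text into an FTS5 query where every term is a prefix match."""
    terms = re.findall(r"\w+", user_input)
    # Quoting each term keeps FTS5 from reading it as query syntax (AND, OR, NEAR).
    return " ".join(f'"{term}"*' for term in terms)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Kubernetes notes", "A DaemonSet runs one pod per node."),
        ("SQLite notes", "FTS5 supports prefix queries and bm25 ranking."),
    ],
)


def search(user_input):
    query = to_prefix_query(user_input)
    if not query:
        return []
    rows = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    ).fetchall()
    return [title for (title,) in rows]


print(search("daemon pod"))
```

Terms are combined with FTS5's implicit AND, so every term has to match somewhere in the row. Stemming, column weights, and boosting exact matches would be layered on top of this.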

[−] nikisweeting 43d ago
Surprised no one has mentioned Turso yet!

They recently landed multi-writer support in their Rust re-implementation of SQLite. The lack of it is personally the biggest issue I've had with using SQLite for high-concurrency applications.

PRAGMA journal_mode = 'mvcc';

https://docs.turso.tech/tursodb/concurrent-writes

Very excited to see if SQLite responds by adding native support, I'm hoping competition here will spur improvements on both sides.

[−] 101008 43d ago
Not sure if people are interested, but since I use SQLite in a lot of my own projects, I am working on a lightweight monitoring and safety layer for production SQLite. The idea is pretty simple: SQLite is amazing, but once it’s running in production you basically have zero observability. If something weird happens (unexpected writes, schema changes, background jobs touching tables, etc.) you only find out after the fact. It tries to solve that without touching application code. It's a Rust agent that runs next to your SQLite file and connects to a server where everything is logged. My current challenge is mostly encryption and trust.

Curious if others here are running SQLite in production and if you would be interested in something like this.

[−] captn3m0 43d ago
You can shorten your JSON queries using arrow notation in sqlite.

    SELECT
        settings -> '$.languages' AS languages
    FROM
        user_settings
    WHERE
        settings ->> '$.languages' LIKE '%"en"%';

I use them heavily with my jekyll-sqlite projects. See https://github.com/blr-today/website/blob/main/_config.yml#L... for example.
[−] malkia 43d ago
In the past I've used the backup API - https://sqlite.org/backup.html - to load a copy of a SQLite DB into memory while keeping another one live. I would do this after a certain user action, and then by doing a diff, I would know what changed... I guess it's a poor man's way of implementing PostgreSQL events... but it worked!

Granted, it was a small DB (a few megabytes). I also wanted to avoid collecting changes one by one; I simply wanted a diff against the last snapshot.
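A rough sketch of the snapshot-and-diff idea, using `Connection.backup()` from Python's `sqlite3` module (a wrapper around the backup API linked above). The `items` table and the `snapshot` and `diff_table` helpers are hypothetical, and comparing whole rows as tuples is only reasonable for small databases like the one described.

```python
import sqlite3


def snapshot(live):
    """Copy the live database into a fresh in-memory database via the backup API."""
    copy = sqlite3.connect(":memory:")
    live.backup(copy)
    return copy


def diff_table(before, after, table):
    """Return (added, removed) rows for one table, compared as whole-row tuples."""
    query = f"SELECT * FROM {table}"
    old_rows = set(before.execute(query).fetchall())
    new_rows = set(after.execute(query).fetchall())
    return sorted(new_rows - old_rows), sorted(old_rows - new_rows)


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
live.executemany("INSERT INTO items (name) VALUES (?)", [("apple",), ("pear",)])
live.commit()

before = snapshot(live)  # taken right before the user action

live.execute("UPDATE items SET name = 'plum' WHERE id = 2")
live.execute("INSERT INTO items (name) VALUES ('fig')")
live.commit()

added, removed = diff_table(before, live, "items")
print(added, removed)
```

An updated row shows up on both sides: its old version in `removed` and its new version in `added`.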

[−] faizshah 43d ago
There's also spellfix1, an extension you can enable to get fuzzy search.

And ON CONFLICT, which can help dedupe, among other things, in a simple and performant way.
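A small sketch of deduplicating with ON CONFLICT, assuming SQLite 3.24 or later for the upsert syntax. The `contacts` table, its columns, and the `record_contact` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts ("
    " email TEXT PRIMARY KEY,"
    " name TEXT,"
    " seen INTEGER NOT NULL DEFAULT 1"
    ")"
)


def record_contact(email, name):
    # On a duplicate email, update the existing row instead of failing:
    # keep the newest name and count how often the contact was seen.
    conn.execute(
        "INSERT INTO contacts (email, name) VALUES (?, ?) "
        "ON CONFLICT(email) DO UPDATE SET name = excluded.name, seen = seen + 1",
        (email, name),
    )


incoming = [
    ("a@example.com", "Ann"),
    ("b@example.com", "Bob"),
    ("a@example.com", "Ann B."),  # duplicate key
]
for email, name in incoming:
    record_contact(email, name)

rows = conn.execute("SELECT email, name, seen FROM contacts ORDER BY email").fetchall()
print(rows)
```

Using `DO NOTHING` instead of `DO UPDATE` would keep the first version of each row and silently drop later duplicates.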

[−] FooBarWidget 43d ago
I've found FTS5 not useful for serious fuzzy or subword full-text search. For example, I have documents containing "DaemonSet", but if the user searches for "Daemon", there are no results.
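The behaviour described here can be reproduced, along with two possible workarounds, in the sketch below. It assumes a build with FTS5 and SQLite 3.34 or later for the `trigram` tokenizer; the `words` and `grams` tables and the `count_hits` helper are illustrative names. Neither workaround gives typo-tolerant fuzzy search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Default tokenizer: "DaemonSet" is indexed as the single token "daemonset".
conn.execute("CREATE VIRTUAL TABLE words USING fts5(body)")
# Trigram tokenizer: indexes overlapping three-character sequences instead.
conn.execute("CREATE VIRTUAL TABLE grams USING fts5(body, tokenize='trigram')")
for table in ("words", "grams"):
    conn.execute(f"INSERT INTO {table} (body) VALUES ('Deploy the agent as a DaemonSet')")


def count_hits(table, query):
    sql = f"SELECT count(*) FROM {table} WHERE {table} MATCH ?"
    return conn.execute(sql, (query,)).fetchone()[0]


print(count_hits("words", "Daemon"))   # whole-token match against "daemonset"
print(count_hits("words", "Daemon*"))  # prefix query
print(count_hits("grams", "monSet"))   # substring match via trigrams
```

A prefix query only helps when the user types the start of the word; the trigram table also matches in the middle, at the cost of a larger index.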
[−] momo_dev 43d ago
the JSON functions are genuinely useful even for simple apps. i use sqlite as a dev database and being able to query JSON columns without a preprocessing step saves a lot of time. STRICT tables are also great, caught a bug where I was accidentally inserting the wrong type and it just silently worked in regular mode
[−] kristianp 43d ago
There are table-valued functions over JSON as well, as mentioned by [1].

https://sqlite.org/json1.html#table_valued_functions_for_par...

[1] https://news.ycombinator.com/item?id=47618597
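As an illustration, `json_each` can expand a JSON array into rows, which allows an exact match on each element instead of a LIKE over the serialized text. The sketch assumes a build with the JSON functions available. It reuses the `user_settings` and `settings` names from the arrow-notation example elsewhere in the thread; the `name` column and the sample data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_settings (name TEXT, settings TEXT)")
conn.executemany(
    "INSERT INTO user_settings (name, settings) VALUES (?, ?)",
    [
        ("ann", '{"languages": ["en", "de"]}'),
        ("bob", '{"languages": ["fr"]}'),
        ("cy", '{"languages": ["en-GB"]}'),
    ],
)


def users_with_language(code):
    # json_each() yields one row per array element; lang.value is the element.
    rows = conn.execute(
        "SELECT DISTINCT user_settings.name "
        "FROM user_settings, json_each(user_settings.settings, '$.languages') AS lang "
        "WHERE lang.value = ? "
        "ORDER BY user_settings.name",
        (code,),
    ).fetchall()
    return [name for (name,) in rows]


print(users_with_language("en"))
```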

[−] tombert 43d ago
For a long time I absolutely hated SQLite because of how terribly it was implemented in Emby, which made it so you couldn't load balance an Emby server because they kept a global lock on the database for exactly one process, but at this point I've grown a kind of begrudging respect for it, simply because it is the easiest way to shoehorn something (roughly) like journaling in terrible filesystems like exFAT.

I did this recently for a fork of the main MiSTer executable because of a few disagreements with how Sorg runs the project, and it was actually pretty easy to change out the save file features to use SQLite, and now things are a little more resistant to crashes and sudden power loss than you'd get with the terrible raw-dogged writing that it was doing before.

It's fast, well documented, and easy to use, and yeah it has a lot more features than people realize.

[−] andrewstuart 43d ago
Disturbing.

I did not know SQLite allows writing data that does not match the column type. Yuck. Now I need to review anything I built and fix it.

I understand why they wouldn’t, but STRICT should be the default.

[−] mpyne 43d ago
I actually needed that exact window function example earlier this week when I needed to figure out why our shared YNAB budget somehow got out of balance with the bank. SQLite to load the different CSVs and lay out the bank's view of the world against YNAB's with running totals was what I turned to.
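A sketch of what such a reconciliation might look like, assuming SQLite 3.25 or later for window functions. The `bank` and `budget` tables and all amounts (in cents) are invented for illustration; real CSV exports would need importing and cleaning first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bank (day TEXT, amount INTEGER)")
conn.execute("CREATE TABLE budget (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO bank VALUES (?, ?)",
    [("2024-01-01", 1000), ("2024-01-02", -250), ("2024-01-03", -40)],
)
conn.executemany(
    "INSERT INTO budget VALUES (?, ?)",
    [("2024-01-01", 1000), ("2024-01-02", -205), ("2024-01-03", -40)],
)

# Running total per source, then both views side by side for each day.
RUNNING_TOTALS = """
WITH b AS (SELECT day, SUM(amount) OVER (ORDER BY day) AS total FROM bank),
     y AS (SELECT day, SUM(amount) OVER (ORDER BY day) AS total FROM budget)
SELECT day, b.total, y.total
FROM b JOIN y USING (day)
ORDER BY day
"""

rows = conn.execute(RUNNING_TOTALS).fetchall()
first_mismatch = next(
    (day for day, bank_total, budget_total in rows if bank_total != budget_total),
    None,
)
print(rows, first_mismatch)
```

The first day on which the two running totals diverge is where the mistyped or missing transaction sits.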
[−] malkia 43d ago
One more "hidden" fact - Windows uses sqlite a lot, for a lot of tables. There is even

    "C:\Windows\System32\winsqlite3.dll" 
and

    "C:\Program Files (x86)\Windows Kits\10\Include\10.0.27975.0\um\winsqlite\winsqlite3.h"
    "C:\Program Files (x86)\Windows Kits\10\Include\10.0.27975.0\um\winsqlite\winsqlite3ext.h"

Well, it's compiled in its own way, which may not be to your liking, but it's there to use :)
[−] subhobroto 43d ago
None of these are news to the HN community. Write-ahead logging and concurrency PRAGMAs have been a given for a decade now. IIRC, FTS5 doesn't often come baked in and you have to compile the SQLite amalgamation to get it. If you do need better typing, you should really use PostgreSQL.
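For anyone who has not set these up before, a minimal sketch: it turns on write-ahead logging, sets a busy timeout, and checks whether the linked SQLite library reports FTS5 in its compile options. The file name, the 5000 ms timeout, and the helper names are arbitrary choices for illustration.

```python
import os
import sqlite3
import tempfile


def open_with_wal(path):
    conn = sqlite3.connect(path)
    # journal_mode returns the mode actually in effect ("wal" on success).
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    # Wait up to 5 s for a lock instead of failing immediately (arbitrary value).
    conn.execute("PRAGMA busy_timeout = 5000")
    return conn, mode


def has_compile_option(conn, name):
    """Check the build's reported compile options, e.g. 'ENABLE_FTS5'."""
    options = {row[0] for row in conn.execute("PRAGMA compile_options")}
    return name in options


demo_dir = tempfile.mkdtemp()
conn, mode = open_with_wal(os.path.join(demo_dir, "app.db"))
print(mode, has_compile_option(conn, "ENABLE_FTS5"))
```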

However, I will concede something the article doesn't mention at all: far fewer people are aware that you can build HA, cross-region replicated SQLite using purely OSS software, provided you architect your software around it. Now that would be a really good "Modern SQLite: Features You Didn't Know It Had" article!

Another interesting discussion point is how far self-hosted PostgreSQL and pgBackRest can get you to a near-zero data loss high RPO, RTO setup. It's simply amazing we can self-host all this.

[−] mastermage 42d ago
SQLite is pretty great. I personally would appreciate a bit more expressive types, though.
[−] fushihara 43d ago
I wish SQLite would add a bool type and proper date/time types. Is there really no plan to add them?

For bool, it could just be an alias of a numeric type. Something equivalent to number check(col = 0 or col = 1) would be perfectly fine.
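That workaround can be sketched as follows; the `tasks` table and the `set_done` helper are hypothetical. Python's `sqlite3` binds `True` and `False` as the integers 1 and 0, so the constraint accepts them directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " done INTEGER NOT NULL DEFAULT 0 CHECK (done IN (0, 1))"
    ")"
)
conn.execute("INSERT INTO tasks (title, done) VALUES (?, ?)", ("write docs", True))


def set_done(task_id, value):
    """Return True if the update was accepted, False if the CHECK rejected it."""
    try:
        conn.execute("UPDATE tasks SET done = ? WHERE id = ?", (value, task_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(set_done(1, False))  # accepted: stored as 0
print(set_done(1, 2))      # rejected: 2 is neither 0 nor 1
```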

Date/time handling is pretty weak. Having to store values as GMT text is just inconvenient.

When retrieving values in Node.js, I ended up using new Date(val), which caused the classic bug where a GMT-stored value gets interpreted in the local timezone.

The correct approach was new Date(val + ".Z"), but I really don’t want to deal with that kind of hassle anymore.
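The same pitfall exists outside Node.js. Below is a sketch in Python, assuming timestamps are stored in the `YYYY-MM-DD HH:MM:SS` text form that SQLite's `datetime('now')` produces, which is UTC. The `log` table and the `read_utc` helper are made-up names.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
# A fixed UTC timestamp in the same text format datetime('now') would produce.
conn.execute("INSERT INTO log (created_at) VALUES ('2024-03-01 12:00:00')")


def read_utc(text):
    """Parse SQLite's timestamp text and mark it explicitly as UTC."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    # Without this step the value is naive and may be treated as local time.
    return naive.replace(tzinfo=timezone.utc)


stored = conn.execute("SELECT created_at FROM log").fetchone()[0]
moment = read_utc(stored)
print(moment.isoformat())
```

Attaching the timezone at the read boundary keeps the rest of the program from ever seeing a naive value.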