A Faster Alternative to Jq (micahkepe.com)

by pistolario | 396 points | 258 comments

[−] regus 50d ago
Jq's syntax is so arcane I can never remember it and always need to look up how to get a value from simple JSON.
[−] marginalia_nu 49d ago
I think the big problem is it's a tool you usually reach for so rarely you never quite get the opportunity to really learn it well, so it always remains in that valley of despair where you know you should use it, but it's never intuitive or easy to use.

It's not unique in that regard. 'sed' is Turing complete[1][2], but few people get farther than learning how to do a basic regex substitution.

[1] https://catonmat.net/proof-that-sed-is-turing-complete

[2] And arguably a Turing tarpit.

[−] jasomill 49d ago
I was just going to say, jq is like sed in that I only use 1% of it 99% of the time, but unlike sed in that I'm not aware of any clearly better if less ubiquitous alternatives for that 1% (e.g., Perl or ripgrep for simple regex substitutions in pipelines, because of better regex dialects).

Closest I've come, if you're willing to overlook its verbosity and (lack of) speed, is actually PowerShell, if only because it's a bit nicer than Python or JavaScript for interactive use.

[−] HappMacDonald 49d ago
Yeah, sed (and friends) browbeat everyone into learning regex (which Perl then refined).

I think it might be more cognitive load than it is worth to expect everyone en masse to learn another single-line, punctuation-driven language to perform everyday tasks with.

[−] d35007 50d ago
That’s interesting! Can you say a little more? I find jq’s syntax and semantics to be simple and intuitive. It’s mostly dots, pipes, and brackets. It’s a lot like writing shell pipelines imo. And I tend to use it in the same way. Lots of one-time use invocations, so I spend more time writing jq filters than I spend reading them.

I suspect my use cases are less complex than yours. Or maybe jq just fits the way I think for some reason.

I dream of a world in which all CLI tools produce and consume JSON and we use jq to glue them together. Sounds like that would be a nightmare for you.

[−] randusername 50d ago
I'm not GP, I use jq all the time, but each time I use it I feel like I'm still a beginner because I don't get where I want to go on the first several attempts. Great tool, but IMO it is more intuitive to JSON people who want a CLI tool than to CLI people who want a JSON tool. In other words, I have my own preconceptions about how piping should work (on the whole thing, not iterating), and it always trips me up.

Here's an example of my white whale, converting JSON arrays to TSV.

    cat input.json | jq -S '(first|keys | map({key: ., value: .}) | from_entries), (.[])' | jq -r '[.[]] | @tsv' > out.tsv

[−] nh23423fefe 49d ago

    
[−] randusername 49d ago
oh my god how could I have been doing this for so long and not realize that you can redirect before your binary.

I knew cat was an anti-pattern, but I always thought it was so unreadable to redirect at the end

[−] attentive 49d ago
it seems smart until you accidentally type >input.json and nuke the file
[−] HappMacDonald 49d ago
That sounds like a mistake which would be easy to make at the end of the line too, unless you are contrasting input stream redirection against cat regardless of where it's written on the line?
[−] figmert 49d ago
You can use sponge for that.
[−] figmert 49d ago
Here's an easier to understand query for what you're trying to do (at least it's easier to understand for me):

    cat input.json | jq -r '(first | keys) as $cols | $cols, (.[] | [.[$cols[]]]) | @tsv'
That whole map and from_entries throws it off. It's not a good fit for what you're doing: @tsv expects a stream of arrays, whereas you're producing a stream of objects (with the header also being one) and then converting them to arrays. That is an unnecessary step and makes it a little harder to understand.
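The same header-then-rows shape, sketched in Python for comparison (an illustration, not jq itself; it assumes an array of flat objects that all share the first object's keys, and note that jq's `keys` sorts the column names while this keeps insertion order):

```python
import json

def array_to_tsv(text):
    """Convert a JSON array of flat objects into TSV: a header row, then one row per object."""
    rows = json.loads(text)
    cols = list(rows[0].keys())  # header from the first object, like `first | keys` in jq
    lines = ["\t".join(cols)]
    for row in rows:
        lines.append("\t".join(str(row[c]) for c in cols))
    return "\n".join(lines)

print(array_to_tsv('[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'))
# a	b
# 1	2
# 3	4
```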
[−] randusername 49d ago
Thanks for sharing, this is much better, though I actually think it is the perfect example to explain something that is brain-slippery about jq

look at $cols | $cols

my brain says hmm that's a typo, clearly they meant ; instead of | because nothing is getting piped, we just have two separate statements. Surely the assignment "exhausts the pipeline" and we're only passing null downstream

the pipelining has some implicit contextual stuff going on that I have to arrive at by trial and error each time since it doesn't fit in my worldview while I'm doing other shell stuff

[−] figmert 49d ago
I totally agree, it did take me a while to come to terms with the syntax of assigning variables, specifically due to that pipe at the end. I guess sometimes we just have to know the quirks of the tooling we use. I used to use PHP heavily in the 4 and 5 days, and kinda got used to all the quirks it had. So during reviews, I would pick up a lot of issues some of my colleagues did not.

Interestingly some things do use a semicolon in jq, specifically while, until, reduce and some others I can't remember right now.

[−] chuckadams 49d ago
Honestly both of those make me do the confused-dog-head-tilt thing. I'd go for something sexp based, perhaps with infix composition, map, and flatmap operators as sugar.
[−] lokar 49d ago
I find it much harder to remember / use each time than awk
[−] firesteelrain 49d ago
Trying to make a generic pipeline for json arrays because you don’t know the field names?
[−] coldtea 49d ago
Why the f would they want to hardcode the field names?
[−] firesteelrain 48d ago
Because usually I am dealing with data I know not anonymous data
[−] coldtea 48d ago
Doesn't have to be anonymous to be variable
[−] firesteelrain 48d ago
You know what I meant
[−] attentive 49d ago

> I dream of a world in which all CLI tools produce and consume JSON and we use jq to glue them together.

that world exists and is mature (PowerShell)

[−] stingraycharles 49d ago
Sounds similar to how PowerShell works, and it's not great. Plain text is better.
[−] rzzzt 49d ago
I'm often having trouble with figuring out in advance what the end result will be when processing an input array: an array of mapped objects or a series of self-contained JSON objects? Why? Which one is better? What if I would like to filter out some of the elements as part of the operation?
[−] xnx 48d ago
It's extra complicated under Windows because of issues escaping/wrapping quotes "" and pipes ^|.
[−] ivaniscoding 50d ago
Shameless plug, but you might like this: https://github.com/IvanIsCoding/celq

jq is the CLI I like the most, but sometimes even I struggled to understand the queries I wrote in the past. celq uses a more familiar language (CEL)

[−] xpe 50d ago
CEL looks interesting and useful, though it isn't common nor familiar imo (not for me at least). Quoting from https://github.com/google/cel-spec

    # Common Expression Language

    The Common Expression Language (CEL) implements common
    semantics for expression evaluation, enabling different
    applications to more easily interoperate.

    ## Key Applications

    - Security policy: organizations have complex infrastructure
      and need common tooling to reason about the system as a whole
    - Protocols: expressions are a useful data type and require
      interoperability across programming languages and platforms.
[−] ivaniscoding 50d ago
That’s some fair criticism, but the same page says the language was designed to have syntax similar to C and JavaScript.

I think my personal preference for syntax would be Python’s. One day I want to try writing a query tool with https://github.com/pydantic/monty

[−] TomNomNom 50d ago
Cool tool! Really appreciate the shoutout to gron in the readme, thanks! :)
[−] bigfishrunning 50d ago
I had never heard of CEL, looks useful though, thanks for posting this!
[−] dcre 49d ago
Funny that everyone is linking the tools they wrote for themselves to deal with this problem. I am no exception. I wrote one that just lets you write JavaScript. Imagine my surprise that this extremely naive implementation was faster than jq, even on large files.

    $ cat package.json | dq 'Object.keys(data).slice(0, 5)'
    [ "name", "type", "version", "scripts", "dependencies" ]
https://crespo.business/posts/dq-its-just-js/
[−] physicles 48d ago
Love it. This is so clearly the way to solve the jq writeability problem. I’m going to replace jq with this immediately.
[−] arunix 49d ago
Thanks. Can you say more about why TypeScript with Deno is your scripting language of choice?
[−] iLemming 49d ago
It's because JSON itself has so much useless cruft it's often annoying to deal with. I am forever indebted to my younger self for forcing me to learn Clojure. Most of the time I choose not to even bother with JSON anymore - EDN is semantically so much cleaner - it's almost twice as compact (yet lossless), it's far more readable (quotes and commas are optional), and it's easier to work with structurally. These days I'd use borkdude/jet or babashka and then deal with the data in a Clojure REPL - there I can inspect it from all sorts of angles, and it's far easier to group, sort, slice, dice, map and filter through it. One can even easily visualize the data using djblue/portal. Why most people strangle themselves with confusing jq operators unnecessarily, I will never understand. Clojure is not that hard - learn some basics and it comes in handy a lot, even when your team doesn't have any Clojure code.
[−] xendo 49d ago
Highly recommend gron. https://github.com/tomnomnom/gron
[−] epr 49d ago
To fix this I recently made myself a tiny tool I called jtree that recursively walks json, spitting out one line per leaf. Each line is the jq selector and leaf value separated by "=".

No more fiddling around trying to figure out the damn selector by trying to track the indentation level across a huge file. Also easy to pipe into fzf, then split on "=", trim, then pass to jq
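A minimal version of that walk fits in a few lines; a Python sketch of the idea (not the commenter's actual tool, which doesn't appear to be public):

```python
import json

def leaf_lines(value, path=""):
    """Recursively walk parsed JSON, yielding one 'jq-selector = leaf' line per scalar."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_lines(child, f"{path}.{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from leaf_lines(child, f"{path}[{i}]")
    else:
        yield f"{path} = {json.dumps(value)}"

doc = {"user": {"name": "ada", "tags": ["a", "b"]}}
for line in leaf_lines(doc):
    print(line)
# .user.name = "ada"
# .user.tags[0] = "a"
# .user.tags[1] = "b"
```

Each emitted path on the left is a valid jq selector, so a line picked out of fzf can be split on "=" and fed straight back to jq.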

[−] janderland 50d ago
JMESPath is what I wish jq was. Consistent grammar. Its only issue is that it lacks the ability to convert JSON to other formats like CSV.
[−] charlesdaniels 49d ago
If we're plugging jq alternatives, I'll plug my own: https://git.sr.ht/~charles/rq

I was working a lot with Rego (the DSL for Open Policy Agent) and realized it was actually a pretty nice syntax for jq-type use cases.

[−] voidfunc 50d ago
I just ask Opus to generate the queries for me these days.
[−] hilti 50d ago
LOL ... I can absolutely feel your pain. That's exactly why I created a graphical approach for myself. I shared the first version with friends and it turned into "ColumnLens" (ImGUI on Mac) app. Here is a use case from the healthcare industry: https://columnlens.com/industries/medical
[−] raydev 49d ago
Like I did with regex some years earlier, I worked on a project for a few weeks that required constant interactions with jq, and through that I managed to lock in the general shape of queries so that my google hints became much faster.

Of course, this doesn't matter now, I just ask an LLM to make the query for me if it's so complex that I can't do it by hand within seconds.

[−] dhuan_ 49d ago
I agree, even trivial tasks require us to go back to jq's manual to learn how to write its language.

This, among other reasons, is why I built: https://github.com/dhuan/dop

[−] justonceokay 49d ago
When I need it I find that relearning the jq syntax is still faster than whatever other harebrained scheme I might come up with to solve my problem. It’s just so useful 2x a year when I really need it
[−] LgWoodenBadger 49d ago
I completely agree. I much prefer leveraging actual javascript to get what I need instead of spending time trying to fumble my way through jq syntax.
[−] GaryNumanVevo 50d ago
yeah I literally just use gemini / claude to one-shot JQ queries now
[−] NSPG911 50d ago
I also genuinely hate using jq. It is one of the only things for which I rely heavily on AI.
[−] d0963319287 50d ago
[flagged]
[−] 1a527dd5 50d ago
I appreciate performance as much as the next person; but I see this endless battle to measure things in ns/us/ms as performative.

Sure there are 0.000001% edge cases where that MIGHT be the next big bottleneck.

I see the same thing repeated in various front end tooling too. They all claim to be _much_ faster than their counterpart.

9/10 whatever tooling you are using now will be perfectly fine. Example: I use grep a lot in an ad hoc manner; only on really large files do I switch to rg. But that is only in a handful of cases.

[−] Kovah 50d ago
I wonder so often about many new CLI tools whose primary selling point is their speed over other tools. Yet I personally have not encountered any case where a tool like jq feels incredibly slow, and I would feel the urge to find something else. What do people do all day that existing tools are no longer enough? Or is it that kind of "my new terminal opens 107ms faster now, and I don't notice it, but I simply feel better because I know"?
[−] hackrmn 50d ago
Having used jq and yq (which followed from the former, in spirit), I have never had to complain about the performance of the _latter_, which is an order of magnitude (or several) _slower_ than the former. So if there's something faster than jq, it's laudable that the author of the faster tool accomplished such a goal, but in the broader context I'd say the performance benefit would be required by a niche slice of the userbase. People who analyse JSON-formatted logs, perhaps? Then again, newline-delimited JSON reigns supreme in that particular kind of scenario, making the point of a faster jq moot again.

However, as someone who has always loved faster software and is an optimisation nerd, hats off!

[−] ifh-hn 50d ago
I learned a number of data-processing CLI tools: jq, mlr, htmlq, xsv, yq, etc., to name a few. Not to the level of completing Advent of Code or anything, but good enough for my day-to-day usage. It was never-ending with the number of formats I needed to extract data from, and the different syntaxes. All that changed when I found nushell though; it's replaced all of these tools for me. One syntax for everything, a breath of fresh air!
[−] Bigpet 50d ago
When initially opening the page it had broken colors in light mode. For anyone else encountering it: switch to dark mode and then back to light mode to fix it.
[−] Jenk 50d ago
I switched to Jaq[0] a while back for the 'correctness' sake rather than performance. But Jaq also claims to be more performant than jq.

[0]: https://github.com/01mf02/jaq

[−] jiehong 50d ago
First of all, congratulations! Nice tool!

Second, some comments on the presentation: the horizontal violin graphs are nice, but all tools have the same colours, and so it's just hard to even spot where jsongrep is. I'd recommend grouping by tool and colour coding it. Besides, jq itself isn't in the graphs at all (but the title of the post made me think it would be!).

Last, xLarge is a 190MiB file. I was surprised by that. It seems too low for xLarge. I check 400MiB JSON documents daily, and sometimes GiB ones.

[−] Asmod4n 50d ago
You could just take simdjson, use its ondemand api and then navigate it with .at_path(_with_wildcard) (https://github.com/simdjson/simdjson/blob/master/doc/basics....)

The whole tool would be like a few dozen lines of c++ and most likely be faster than this.

[−] maxloh 50d ago
From their README [0]:

> Jq is a powerful tool, but its imperative filter syntax can be verbose for common path-matching tasks. jsongrep is declarative: you describe the shape of the paths you want, and the engine finds them.

IMO, this isn't a common use case. The comparison here is essentially like Java vs Python. Jq is perfectly fine for quick peeking. If you actually need better performance, there are always faster ways to parse JSON than using a CLI.

[0]: https://github.com/micahkepe/jsongrep

[−] vindin 50d ago
The data viz of the benchmarks is really rough. I think you’d get a lot of leverage out of rebuilding it and using colors and/or shapes to extract additional dimensions. Nobody wants to scan through raw file paths as labels to try and figure out what the hell the results are
[−] allknowingfrog 49d ago
I deal with a fair amount of newline-delimited JSON in my day job, where each line in the file is a complete JSON object. I've seen this referred to as "jsonl", and it's not entirely uncommon for logs and other kinds of time-series data dumps. Do any of the popular JSON CLI tools work with this format? I didn't see any mention of it here.
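For what it's worth, jq handles this natively: it reads its input as a stream of JSON values by default, so newline-delimited JSON works without any flags (and `-s`/`--slurp` gathers the stream into one array). A hand-rolled reader is also tiny; a Python sketch:

```python
import io
import json

def jsonl_values(stream):
    """Yield one parsed value per non-empty line of newline-delimited JSON."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

data = io.StringIO('{"t": 1, "msg": "a"}\n{"t": 2, "msg": "b"}\n')
print([rec["msg"] for rec in jsonl_values(data)])
# ['a', 'b']
```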
[−] throwawaypath 50d ago
After reading the title, I was worried that this wasn't written in Rust!
[−] onedognight 50d ago
Having the equivalent jq expression in these examples might help to compare expressiveness, and it might help me see if jq could “just” use a DFA when a (sub)query admits one. grep, ripgrep, etc change algorithms based on the query and that makes the speed improvements automatic.
[−] bouk 50d ago
I highly recommend anyone to look at jq's VM implementation some time, it's kind of mind-blowing how it works under the hood: https://github.com/jqlang/jq/blob/master/src/execute.c

It does some kind of stack forking which is what allows its funky syntax

[−] Self-Perfection 48d ago
I think that in most cases jq is launched to extract a value from a relatively small JSON document, for which raw parsing speed does not matter much. jq is just really slow to start. Version 1.6 was especially abysmally slow to start, 10x slower than 1.5:

https://github.com/jqlang/jq/issues/1826

So any replacement candidate should also be benchmarked with something like hyperfine "jq .a <<< '{"a": 10 }'". This one-liner does not work as written but should illustrate the idea.

Also please just use jshon if you need to just extract specific value from some small JSON. jshon uses way less resources by any conceivable metric.

[−] skywhopper 50d ago
If the author cares, I can’t read everything on this page. The command snippets have a “BASH” pill in the top left that covers up the command I’m supposed to run. And then there are, I guess topic headings or something that are white-on-white text, so honestly I don’t know what they say or what they are.
[−] ontouchstart 50d ago
Everything can be written in JavaScript will be written in JavaScript.

Everything can be rewritten in Rust will be written in Rust.

[−] enricozb 50d ago
I am excited for some alternative syntax to jq's. I haven't given much thought to how I'd write a new JSON query syntax if I were writing things from scratch, but I personally never found the jq syntax intuitive. Perhaps I haven't given it enough effort to learn properly.
[−] Voranto 50d ago
Quick question: isn't an NFA→DFA conversion an O(2^n) algorithm? If a JSON file has a couple hundred values, its equivalent NFA will have a similar number of states, and the DFA will have 2^100 states, so I must be missing something.
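The missing piece is likely that the automaton is built from the query, not the document: a path query compiles to a handful of NFA states, and the JSON paths are merely the input run through the resulting DFA. The textbook subset construction also only materializes reachable subsets, so 2^n is a worst case; a sketch of the standard algorithm (an assumption about how jsongrep works, not a description of its actual internals):

```python
from collections import deque

def nfa_to_dfa(nfa, start, alphabet):
    """Textbook subset construction. `nfa` maps (state, symbol) -> set of
    successor states; DFA states are frozensets of NFA states. The worst
    case is 2^n states, but only reachable subsets are ever materialized."""
    start_set = frozenset([start])
    transitions = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            nxt = frozenset(t for s in current for t in nfa.get((s, sym), ()))
            transitions[(current, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return transitions, seen

# 3-state NFA for a toy path pattern "a, then anything, ending in b":
nfa = {(0, "a"): {1}, (1, "a"): {1}, (1, "b"): {1, 2}}
transitions, states = nfa_to_dfa(nfa, 0, ["a", "b"])
print(len(states))  # prints 4 -- far below the 2**3 = 8 possible subsets
```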
[−] tehnub 50d ago
I've been using jj, which apparently is also faster than jq https://github.com/tidwall/jj
[−] hilti 50d ago
I'm glad you adjusted the CSS while I was typing my comment. I needed to switch to dark mode to be able to read highlighted words.

Nice write up. I will try out your tool.

[−] steelbrain 50d ago
Surprised to see that there are no official binaries for arm64 Darwin, meaning macOS users will have to run it through the Rosetta 2 translation layer.
[−] sirfz 50d ago
Nowadays I'd just use clickhouse-local / chdb / duckdb to query json files (and pretty much any standard format files)
[−] quotemstr 50d ago
Reminder you can also get DuckDB to slurp the JSON natively and give you a much more expressive query model than anything jq-like.
[−] micahkepe 48d ago
OP jsongrep author here: v0.8.0 now has multi-format support for serializable formats![1]

[1]: https://github.com/micahkepe/jsongrep/releases/tag/v0.8.0

[−] luc4 50d ago
Since the query compilation needs exponential time, I wonder how large the queries can be before jsongrep becomes slower than all the other tools. In that regard, I think the library could benefit from some functionality for query compilation at compile-time.
[−] mlmonkey 50d ago
Minor suggestion: often I just want to extract one field, whose name I know exactly. I see that jg has an option -F like this:

$ cat sample.json | jg -F name

I would humbly suggest that a better syntax would be:

$ cat sample.json | jg .name

for a leaf node named "name"; or

$ cat sample.json | jg -F .name.

for any node named "name".

[−] soleveloper 49d ago
I already can't remember jq syntax. Naming this jg just means I'll type one, instinctively use the other's syntax, and get an error anyway. It's a DX trap.

But I will admit, the new syntax makes a lot more sense.

[−] keysersoze33 50d ago
I was a bit skeptical at first, but after reading more into jsongrep, it's actually very good. Only did a very quick test just now, and after stumbling over slightly different syntax to jq, am actually quite impressed. Give it a try
[−] vismit2000 49d ago
Table of contents seems inspired by the famous ripgrep post from 2016: https://burntsushi.net/ripgrep/
[−] wolfi1 50d ago
forgive my rant, but when I see "just install it with cargo" I immediately lose interest. How many GB do I have to install just to test a little tool? Sorry, not gonna do that
[−] arjie 50d ago
Thank you. Very cool. Going to try embedding this into my JSON viewer. One thing I’ve struggled with is that live querying in the UI is constrained by performance.
[−] stuaxo 50d ago
Nice.

Some bits of the site are hard to read: in "takes a query and a JSON input", "query" is rendered in white, and the background of the site is very light, which makes it hard to read.

[−] rswail 50d ago
Just about to read, but I had to change to dark mode to be able to see the examples, which are bold white on a white background.
[−] 1vuio0pswjnm7 49d ago
One problem I have not seen addressed by jq or its alternatives, though perhaps this one addresses it, is "JSON-like" data. That is, JSON that is not contained in a JSON file

For example, web pages sometimes contain inline "JSON". But as this is not a proper JSON file, jq-style utilities cannot process it

The solution I have used for years is a simple utility written in C using flex^1 (a "filter") that reformats "JSON" on stdin, regardless of whether the input is a proper JSON file or not, into stdout that is line-delimited, human-readable and therefore easy to process with common UNIX utilities

The size of the JSON input does not affect the filter's memory usage. Generally, a large JSON file is processed at the same speed with the same resource usage as a small one

The author here has provided musl static-pie binaries instead of glibc. HN commenters seeking to discredit musl often claim glibc is faster

Personally I choose musl for control not speed

1. jq also uses flex

[−] furryrain 50d ago
If it's easier to use than jq, they should sell the tool on that.
[−] coldtea 50d ago
Speed is good! Not a big fan of the syntax though.
[−] jrhey 49d ago
Since when was jq considered slow?
[−] PUSH_AX 50d ago
Is Jq slow?
[−] alexellisuk 50d ago
Quick comment for the author.

Just added this new tool to arkade, along with the existing jq/yq.

No Arm64 for Darwin.. seriously? (Only x86_64 darwin.. it's a "choice")

No Arm64 for Linux?

For Rust tools it's trivial to add these. Do you think you can do that for the next release?

https://github.com/micahkepe/jsongrep/releases/tag/v0.7.0

[−] peterohler 50d ago
Another alternative is oj, https://github.com/ohler55/ojg. I don't know how the performance compares to jq or any others but it does use JSONPath as the query language. It has a few other options for making nicely formatted JSON and colorizing JSON.
[−] damotiansheng 49d ago
[dead]
[−] leontloveless 50d ago
[dead]
[−] mitul005 50d ago
[flagged]
[−] ryguz 49d ago
[dead]