Ripgrep is faster than grep, ag, git grep, ucg, pt, sift (2016) (burntsushi.net)

by jxmorris12 160 comments 376 points

160 comments

[−] craftkiller 53d ago
One of my favorite moments in HN history was watching the authors of the various search tools decide on a common ".ignore" file as opposed to each having their own: https://news.ycombinator.com/item?id=12568245
[−] tmtvl 53d ago
I would argue that grep-like tools which read .gitignore violate the Principle of Least Astonishment (POLA). It would be fine if there were a --ignore flag to enable such functionality, but defaulting to it just feels wrong to me. Obviously smarter people than I disagree, but my dumdum head just feels that way.
[−] TallGuyShort 53d ago

> Obviously smarter people than I disagree, but my dumdum head just feels that way.

That's absolutely not it. What you're describing is part of the UNIX philosophy: programs should do one thing and do it well, and they should function in a way that makes them very versatile and composable, etc.

And that part of the philosophy works GREAT when everything follows another part of the philosophy: everything should be based on flat text files.

But for a number of reasons, and regardless of whatever we all think of those reasons, we live in a world that has a lot of stuff that is NOT the kind of flat text file grep was made for. Binary formats, minified JS, etc. And so to make the tool more practical on a modern *nix workstation, suddenly more people want defaults that are going to work on their flat text files and transparently ignore things like .git.

It's just that you've shown up to a wildly unprincipled world armed with principles.

[−] saghm 52d ago
Sure, but that UNIX philosophy is what got us "grep -r" as the way to search files across an entire directory, which would then compose with stuff like xargs and parallel to be able to do things concurrently. I'd argue that ripgrep shows that that bundling together stuff sometimes does end up with a user experience that people prefer. The nuance lies in figuring out where the balance between "not enough" and "too much" lies, and so far I've yet to see a pithy statement like the UNIX philosophy encapsulate it well.

Alternately, maybe people's idea of what "one thing" is ends up being more subjective than it sounds (or at least depends on context). "Searching through my code" at least sounds like a reasonable idea of "one thing", and it's not crazy that someone might consider "don't search through the stuff that isn't my code, like my npm dependencies or my Rust build artifacts" to be part of "doing it well". Having to specify it every time would be annoying, so you might want to put it in a config file, but then if it ends up being identical to your gitignore, having to manually symlink or copy it each time you modify it is annoying, so it's also not crazy to just use the gitignore by default with a way to opt out of it. Now we're just back where we started: custom .ignore files, fallback to .gitignore, and a flag for when you want to skip that.
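The layering described above can be modeled in a few lines. This is a minimal, hypothetical Python sketch, not ripgrep's matcher: it assumes gitignore-style "last matching pattern wins" with `!` negation, reads .ignore patterns after .gitignore ones so they can override them (as I understand ripgrep's docs, .ignore takes precedence over .gitignore), and uses `fnmatchcase`, whose `*` also matches `/`, so it is much looser than real gitignore globs. The `use_ignore_files` parameter and the example patterns are illustrative assumptions standing in for an opt-out flag such as ripgrep's `--no-ignore`.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, gitignore: list[str], dotignore: list[str],
               use_ignore_files: bool = True) -> bool:
    """Simplified model: later patterns win, and .ignore patterns come last.

    Uses fnmatchcase, so '*' also matches '/', unlike real gitignore globs.
    """
    if not use_ignore_files:  # the "opt out" switch, like a --no-ignore flag
        return False
    ignored = False
    # .gitignore first, then .ignore, so .ignore can override (last match wins).
    for pattern in gitignore + dotignore:
        negate = pattern.startswith("!")
        glob = pattern[1:] if negate else pattern
        if fnmatchcase(path, glob):
            ignored = not negate
    return ignored


gitignore = ["node_modules/*", "*.min.js"]
dotignore = ["!vendor.min.js"]  # hypothetical: re-include one file for searching
print(is_ignored("app.min.js", gitignore, dotignore))         # True
print(is_ignored("vendor.min.js", gitignore, dotignore))      # False
print(is_ignored("app.min.js", gitignore, dotignore, False))  # False
```

The point of the ordering is that the search-specific file only has to list the exceptions, while everything else is inherited from .gitignore.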

[−] rjzzleep 53d ago
Back in the day I would have agreed with you, but ever since JS ended up everywhere, you end up with minified JS files that are megabytes big and match everything. I still have muscle memory with grep -r, and it almost always ends up hitting some JS file that I didn't know existed, ruining the moment.
[−] MisterTea 53d ago

> Obviously smarter people than I disagree, but my dumdum head just feels that way.

No, you are correct, do not doubt yourself. Baked-in behavior catering to a completely separate tool is bad design. Git is the current version control software, but it's not the first nor the last. Imagine if we move to another source control system and are burdened with .gitignore files. No thanks.

The Unix tools are designed to be good and explicit at their individual jobs so they can be easily composed together to form more complex tools that cater to the task at hand.

[−] gregwtmtno 53d ago
I have to agree here. I love ripgrep, but at times I've had to go back to regular grep because I couldn't figure out what it was ignoring and why, and there were far too many settings to figure it out.
[−] henrebotha 53d ago
It's a tough one. Lately I've been doing rg -u every single time because too many things get ignored and I can't be bothered to figure out how to configure it more cleanly to do what I want by default.
[−] xorcist 52d ago
You are absolutely right. It is a good feature, but it must be a conscious decision. It should not be the default. You should set it in your shell alias or environment, just like you have something like

  LESS="-FQMR"
(no bell, more status, raw characters, exit if less than one page).

Those are also completely reasonable to use, but they must be set consciously; otherwise they might give results that confuse the user.

[−] alwillis 52d ago
ugrep agrees with you [1].

[1]: https://ugrep.com/

[−] keybored 53d ago
It’s the kind of thing that maybe makes sense today. Less likely to make sense twenty years from now though.

But that’s the kind of problem that only successful things have to worry about.

[−] carlosneves 53d ago
An --ignore-file= flag would be nice I guess:

  --ignore-file=.ignore
  --ignore-file=.gitignore
  --ignore-file=.dockerignore
  --ignore-file=.npmignore

etc.

but then, assuming all those share the same "ignore file syntax/grammar"...

[−] justin66 53d ago
Agreed. It's a footgun.
[−] drob518 53d ago
I’ve read this multiple times over the years and this post is still the most interesting and informative piece describing the problem of making a fast grep-like tool. I love that it doesn’t just describe how ripgrep works but also how all the other tools work and then compares the various techniques. It’s simultaneously a tutorial and an expert deep dive. Just a beautiful piece of writing. In a perfect world, all code would be similarly documented.
[−] boyter 53d ago
Such a good read. I actually went back through it the other day to steal the idea of searching for the least common byte to speed up my search tool https://github.com/boyter/cs, which, when coupled with the SIMD upper/lower-case search technique from fzf, cut the wall-clock runtime by a third.
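The "least common byte" idea can be sketched roughly like this: pick the needle byte that is expected to be rarest, let a fast single-byte scan find candidate positions, and only then compare the full needle. This is a minimal Python sketch under stated assumptions: the rarity ranking is made up for illustration, and `bytes.find` stands in for a vectorized memchr. It is not ripgrep's or cs's actual code.

```python
# Hypothetical rarity ranking, rarest byte first. A real implementation would
# rank all 256 byte values using frequencies measured on a large corpus; here
# any byte not listed is simply treated as common.
RAREST_FIRST = b"zqxjkvbpygfwmucldrhsnioate "
RANK = {byte: rank for rank, byte in enumerate(RAREST_FIRST)}
COMMON = len(RAREST_FIRST)


def rarest_byte_offset(needle: bytes) -> int:
    """Index of the needle byte with the lowest (rarest) rank; ties go to the first."""
    return min(range(len(needle)), key=lambda i: RANK.get(needle[i], COMMON))


def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return every start offset of needle, found by scanning for its rarest byte."""
    if not needle:
        raise ValueError("needle must be non-empty")
    k = rarest_byte_offset(needle)
    rare = needle[k]
    hits = []
    # bytes.find accepts an int byte value; it plays the role of memchr here.
    # A match starting at 0 puts the rare byte at k, so start scanning at k.
    pos = haystack.find(rare, k)
    while pos != -1:
        start = pos - k
        # Verify the whole needle only at candidate positions.
        if haystack[start:start + len(needle)] == needle:
            hits.append(start)
        pos = haystack.find(rare, pos + 1)
    return hits


print(find_all(b"hello quiz, quiz!", b"quiz"))  # [6, 12]
```

The payoff comes from the scan loop: if the chosen byte really is rare in the haystack, the expensive full comparison runs only a handful of times.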

There was this post from Cursor https://cursor.com/blog/fast-regex-search today about building an index for agents because they were hitting a limit with ripgrep, but I'm not sure what codebase they are hitting that warrants it, especially since they would have to be at 100-200 GB to get to 15s of runtime. Unless it's all matches, that is.

[−] raincole 53d ago
When I first heard about ripgrep my reaction was laughing. grep had been too established. No way something that isn't 100% compatible with grep could get any traction.

And I was dead wrong. Overnight everyone uses rg (me included).

[−] unxmaal 53d ago
I just got ripgrep ported to IRIX over the weekend.

It’s fast even on a 300 MHz Octane.

[−] wewewedxfgdf 53d ago
I was using ripgrep once and it had a bug that led me down a terrifying rabbit hole. I can't recall what it was, but it involved not being able to find text that absolutely should have been there.

Eventually I was considering rebuilding the machine completely but for some reason after a very long time digging deep into the rabbit hole I tried plain old grep and there was the data exactly where it should have been.

So it's such a vague story but it was a while back - I don't remember the specifics but I sure recall the panic.

[−] nikisweeting 52d ago
Ripgrep is used as the default search backend for ArchiveBox, such a good tool. I was on ag (the-silver-searcher) for years before I switched, but haven't gone back since.

There's also RGA (ripgrep-all) which searches binary files like PDFs, ebooks, doc files: https://github.com/phiresky/ripgrep-all

[−] ventana 53d ago
One thing I learned over the years is that the closer my setup is to the default one, the better. I tried switching to the latest and greatest replacements, such as ack or ripgrep for grep, or httpie for curl, only to always return to the defaults. Often, the return was caused by the frustration of not having the new tools installed on some random server I sshed into. It's probably just me being unable to persevere in keeping my environment customized, and I'm happy to see these alternative tools evolve and work for other people.
[−] dmix 53d ago
When Claude Code uses grep it's actually using rg underneath
[−] keybored 53d ago

> The binary name for ripgrep is rg.

I don’t understand when people typeset some name in verbatim, lowercase, but then have another name for the actual command. That’s confusing to me.

Programmers are too enamored with lower-case names. Why not Ripgrep? Then I can surmise that there might not be some program ripgrep(1) (there might be a shorter version), since using capital letters is not traditional for CLI programs.

Look at Stacked Git:

https://stacked-git.github.io/

> Stacked Git, StGit for short, is an application for managing Git commits as a stack of patches.

> ... The stg command line tool ...

Now, I’ve been puzzled in the past when inputting stgit doesn’t work. But here they call it StGit for short, and the actual command is typeset in verbatim (stg(1) would have also worked).

[−] krick 53d ago
I don't remember why I didn't switch from ag, but I remember it was a conscious decision. I think it had something to do with configuration: rg using an implicit '.ignore' file (a super-generic name instead of a proper tool-specific config) or even .gitignore, or something else very much unwarranted, that made it annoying to use. I cannot remember, really; I only remember that I spent too much time trying to make it behave and decided it wasn't worth it. Anyway, faster is nice, but somehow I never feel that ag is too slow for anything. The switch from the previous one (what was it? ack?) felt like a drastic improvement, but ag vs. rg wasn't much of a difference to me in practice.
[−] Self-Perfection 52d ago
HWisnu wrote cgrep, which he asserts is even faster, especially on a loaded system. He posted interesting benchmarks:

https://hwisnu.bearblog.dev/building-cgrep-using-safe_ch-cus...

It seems this was possible because ripgrep is inefficient in CPU usage when it runs multithreaded, using about 2x more CPU time than GNU grep.

https://hwisnu.bearblog.dev/levelized-cost-of-resources-in-b...

[−] p2detar 52d ago
I did a small 3-run test on an M3, and the results were shockingly impressive.

With 240 log files in various subfolders.

  grep -q -r "22:02" --include="*.log" 4.15s user 0.09s system 99% cpu 4.269 total
  grep -q -r "22:02" --include="*.log" 4.18s user 0.09s system 99% cpu 4.265 total
  grep -q -r "22:02" --include="*.log" 4.31s user 0.09s system 99% cpu 4.401 total
  rg -q "22:02" -t log 0.01s user 0.01s system 83% cpu 0.018 total
  rg -q "22:02" -t log 0.01s user 0.01s system 93% cpu 0.017 total
  rg -q "22:02" -t log 0.01s user 0.01s system 95% cpu 0.018 total

I really did not expect it to be that fast.

[−] Royalaid 52d ago
I don't know if this is a coincidence or not, but Cursor just made a post breaking down why they moved to their own solution in place of ripgrep, and it makes a lot of sense from a cursory (haha) read.

https://cursor.com/blog/fast-regex-search

[−] pipe01 53d ago
(2016)
[−] pgporada 52d ago
Last week I experienced a data truncation issue where I ran an rg -zF fixed string search piped into another rg -F. The dataset was roughly 10 million lines. Doing a single rg -z with a regex glob in the middle didn't encounter that issue.
[−] TacticalCoder 53d ago
And burntsushi is one of us: he's regularly here on HN. Big thanks to him. As soon as rg came out I was building it on Linux. Now it ships stock with Debian (since Bookworm? Don't remember): thanks, thanks and more thanks.
[−] ianberdin 53d ago
It’s a pure delight to read these docs / this pitch.
[−] dist-epoch 53d ago
(2024) gg: A fast, more lightweight ripgrep alternative for daily use cases

https://reddit.com/r/rust/comments/1fvzfnb/gg_a_fast_more_li...

[−] groundzeros2015 53d ago
That’s because it doesn’t do the same work. It’s not an equivalent tool to grep.
[−] evilturnip 53d ago
nowgrep is supposedly even faster than ripgrep:

https://x.com/CharlieMQV/status/1972647630653227054

[−] davikr 53d ago
qgrep is faster if you're fine with indexing. Worth it.
[−] jedisct1 53d ago
ugrep is my daily driver. https://ugrep.com

The TUI is great, and approximate matches are insanely useful.

[−] dinkumthinkum 53d ago
There is also ugrep, which is quite a good project. https://github.com/Genivia/ugrep
[−] cbm-vic-20 53d ago
fd:find::rg:grep

Someone please make an awesome new sed and awk.

[−] AdmiralAsshat 53d ago
Is it still?
[−] tgtweak 53d ago
codex is basically a ripgrep wrapper at this point :)
[−] brtkwr 53d ago
Hasn’t someone rewritten ripgrep in rust by now? C’mon it’s 2026. Oh wait it was written in Rust (back in 2016).
[−] travisdrake 53d ago
still a good read
[−] wolandark 53d ago
and incompatible with grep syntax, which makes it useless to most system admins
[−] chriswep 53d ago
It seems to me that rg is the number one most important part that enables LLMs to be smart agents in a codebase. Who would have thought that a code search tool would enable AGI?
[−] npn 53d ago
Faster is not always the best thing. I still remember when VS Code switched to ripgrep and I had to change my habits. Before then, I could just open VS Code on any folder and do something with it, even if the folder contained millions of small text files. It worked fine before, but then rg was picked, and it happily used all of my CPU cores scanning files, leaving me unable to do anything for a while.

To be honest I hate all the new rust replacement tools, they introduce new behavior just for the sake of it, it's annoying.