Git commands I run before reading any code (piechowski.io)

by grepsedawk 509 comments 2338 points

[−] pzmarzly 37d ago
Jujutsu equivalents, if anyone is curious:

What Changes the Most

    jj log --no-graph -r 'ancestors(trunk()) & committer_date(after:"1 year ago")' \
      -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
      | sort | uniq -c | sort -nr | head -20
Who Built This

    jj log --no-graph -r 'ancestors(trunk()) & ~merges()' \
      -T 'self.author().name() ++ "\n"' \
      | sort | uniq -c | sort -nr
Where Do Bugs Cluster

    jj log --no-graph -r 'ancestors(trunk()) & description(regex:"(?i)fix|bug|broken")' \
      -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
      | sort | uniq -c | sort -nr | head -20
Is This Project Accelerating or Dying

    jj log --no-graph -r 'ancestors(trunk())' \
      -T 'self.committer().timestamp().format("%Y-%m") ++ "\n"' \
      | sort | uniq -c
How Often Is the Team Firefighting

    jj log --no-graph \
      -r 'ancestors(trunk()) & committer_date(after:"1 year ago") & description(regex:"(?i)revert|hotfix|emergency|rollback")'
Much more verbose, closer to programming than shell scripting. But fewer flags to remember.
[−] palata 37d ago
To me, it makes jujutsu look like the Nix of VCSes.

Not meaning to offend anyone: Nix is cool, but adds complexity. And as a disclaimer: I used jujutsu for a few months and went back to git. Mostly because git is wired in my fingers, and git is everywhere. Those examples of what jujutsu can do and not git sound nice, but in those few months I never remotely had a need for them, so it felt overkill for me.

[−] Jenk 37d ago
Tbf you wouldn't use/switch to jj for (because of) those kinds of commands; they are quite the outlier in the grand list of reasons to use jj. However, the option to use the revset language in that manner is a high-ranking reason to use jj in my opinion.

The most frequent "complex" command I use is to find commits in my name that are unsigned, and then sign them (this is owing to my workflow with agents that commit on my behalf but I'm not going to give agents my private key!)

    jj log -r 'mine() & ~signed()'

    # or if yolo mode...

    jj sign -r 'mine() & ~signed()'
I hadn't even spared a moment to consider the git equivalent but I would humbly expect it to be quite obtuse.
[−] palata 37d ago
Actually, signing was one of the annoying parts of jujutsu for me: I sign with a security key, and the way jujutsu handled signing was very painful to me (I know it can be configured and I tried a few different ways, but it felt inherent to how jujutsu handles commits (revisions?)).
[−] arccy 37d ago
The only reasonable way to use signing in jj is with the sign-on-push config https://docs.jj-vcs.dev/latest/config/#automatically-signing... rather than as commits are made
[−] Zambyte 37d ago
Why? I have my signing behavior set to own and I haven't noticed any issues, but I don't actually rely on signatures for much.
[−] singron 37d ago
If you need to type in a password to unlock your keychain (e.g. default behavior for gpg-agent), then signing commits one at a time constantly is annoying.

Does "own" try to sign working copy snapshot commits too? That would greatly increase the number and frequency of signatures.

[−] Zambyte 37d ago
Ah, I use my SSH key to sign my commits and I don't have a password on my SSH key.

> Does "own" try to sign working copy snapshot commits too?

Yes

[−] rjh29 37d ago
It's the dvorak of git... Maybe more efficient but incompatible with everyone else and a very loud vocal minority.

You can find this pattern again and again. How many redditors say 120fps is essential for gaming or absolutely require a mechanical keyboard?

[−] mamcx 37d ago
No, jj is super simple in daily use, in contrast with git, which is a constant chore (and any sane person uses aliases). This includes stuff that in git is a total mess of complexity, like dealing with rebases. So don't judge the tool by this odd case.
[−] alper 36d ago
Nix does not really work in that even basic things are absurdly complicated and can take days of messing with poor libraries and documentation.

That's not been my experience with jj which after the initial hurdle is a breeze.

[−] buu700 37d ago
I don't know about jujutsu, but I've actually found that Nix removes a lot of complexity. It's essentially just npm for tooling.

Managing a flake.nix can be a bit more complex than a package.json in practice, due to the flexibility of the format and some quirks around Nix's default caching behavior, but working with it is a breath of fresh air compared to relying on globally installed tools. Having said that, you might want to check out Devbox. I haven't used it myself, but found it recently and thought it looked like a nice abstraction over raw Nix.

[−] qudat 36d ago
To be completely fair to JJ: you can still use git commands and any aliases with it. I daily JJ in all of my repos but I created these aliases inside of git. That's one of the great aspects of JJ: it's fully compatible with git.
[−] stingraycharles 37d ago
I don’t understand how people can remember all these custom scripting languages. I can’t even remember most git flags, I’m ecstatic when I remember how to iterate over arrays in “jq”, I can’t fathom how people remember these types of syntaxes.
[−] bsuvc 37d ago
I love how the author thinks developers write commit messages.

All joking aside, it really is a chronic problem in the corporate world. Most codebases I encounter just have "changed stuff" or "hope this works now".

It's a small minority of developers (myself included) who consider the git commit log to be important enough to spend time writing something meaningful.

AI-generated commit messages help with this a lot, if developers would actually use them (I hope they will).

[−] joshstrange 37d ago
I ran these commands on a number of codebases I work on and I have to say they paint a very different picture than the reality I know to be true.

> git shortlog -sn --no-merges

Is the most egregious. In one codebase there is a developer's name at the top of the list who outpaced the number 2 by almost 3x the number of commits. That developer no longer works at the company? Crisis? Nope, the opposite. The developer was a net-negative to the team in more ways than one, didn't understand the codebase very well at all, and just happened to commit every time they turned around for some reason.

[−] RickHull 37d ago
Thanks for this. My updated relevant portion of ~/.gitconfig:

    [alias]
        st = status
        ci = commit
        co = checkout
        br = branch
        df = diff
        dfs = diff --stat
        dfc = diff --cached
        dfh = diff --histogram
        dfn = diff --name-status
        rs = restore
        rsc = restore --staged
        last = log -1 HEAD
        lg = log --graph --decorate --oneline --abbrev-commit
        cm = commit -m
        ca = commit --amend
        cane = commit --amend --no-edit
        who = shortlog -sn --no-merges HEAD
        dmg = log --oneline -i -E --grep='(incident|outage|downtime|rollback|revert|mitigate|mitigation|hotfix|broke|prod)' --since='1 year ago'
        bugs = log --oneline -i -E --grep='(bug|bugfix|fix|fixed|fixes|defect|regression|hotfix|broke)' --since='1 year ago'
        bugfiles = !git log --name-only --format='' -i -E --grep='(bug|bugfix|fix|fixed|fixes|defect|regression|hotfix|broke)' --since='1 year ago' | sort | uniq -c | sort -nr
        monthly = !git log --since='1 year ago' --format='%ad' --date=format:'%Y-%m' | sort | uniq -c
        churn = !git log --format='' --name-only --diff-filter=AM --since='1 year ago' | sort | uniq -c | sort -nr | head -20
[−] ramon156 37d ago

> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”

The most changed file is the one people are afraid of touching?

[−] JetSetIlly 37d ago
Some nice ideas but the regexes should include word boundaries. For example:

    git log -i -E --grep="\b(fix|fixed|fixes|bug|broken)\b" --name-only --format='' | sort | uniq -c | sort -nr | head -20

I have a project with a large package named "debugger". The presence of "bug" within "debugger" causes the original command to go crazy.
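
A quick way to see the difference, as a rough sketch: the commit subjects below are made up for illustration, and `grep -w` is used here as a stand-in for `\b`, which is a GNU-style extension rather than POSIX ERE and may behave differently depending on how git was built.

```shell
# Hypothetical commit subjects, purely for illustration.
subjects() {
  printf '%s\n' \
    'Fix crash in parser' \
    'Refactor debugger package' \
    'Add prefix option' \
    'bug: off-by-one in pager'
}

# Without boundaries, "debugger" and "prefix" also count as matches.
loose() { subjects | grep -ciE 'fix|bug|broken'; }

# -w only accepts matches that form whole words.
strict() { subjects | grep -ciwE 'fix|bug|broken'; }

loose    # prints 4
strict   # prints 2
```

The unanchored pattern counts every sample line, while the whole-word version keeps only the two that actually mention a fix or a bug.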

[−] mattrighetti 37d ago
I have a summary alias that kind of does similar things

  # summary: print a helpful summary of some typical metrics
  summary = "!f() { \
    printf \"Summary of this branch...\n\"; \
    printf \"%s\n\" $(git rev-parse --abbrev-ref HEAD); \
    printf \"%s first commit timestamp\n\" $(git log --date-order --format=%cI | tail -1); \
    printf \"%s latest commit timestamp\n\" $(git log -1 --date-order --format=%cI); \
    printf \"%d commit count\n\" $(git rev-list --count HEAD); \
    printf \"%d date count\n\" $(git log --format=oneline --format=\"%ad\" --date=format:\"%Y-%m-%d\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
    printf \"%d tag count\n\" $(git tag | wc -l); \
    printf \"%d author count\n\" $(git log --format=oneline --format=\"%aE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
    printf \"%d committer count\n\" $(git log --format=oneline --format=\"%cE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
    printf \"%d local branch count\n\" $(git branch | grep -v \" -> \" | wc -l); \
    printf \"%d remote branch count\n\" $(git branch -r | grep -v \" -> \" | wc -l); \
    printf \"\nSummary of this directory...\n\"; \
    printf \"%s\n\" $(pwd); \
    printf \"%d file count via git ls-files\n\" $(git ls-files | wc -l); \
    printf \"%d file count via find command\n\" $(find . | wc -l); \
    printf \"%d disk usage\n\" $(du -s | awk '{print $1}'); \
    printf \"\nMost-active authors, with commit count and %%...\n\"; git log-of-count-and-email | head -7; \
    printf \"\nMost-active dates, with commit count and %%...\n\"; git log-of-count-and-day | head -7; \
    printf \"\nMost-active files, with churn count\n\"; git churn | head -7; \
  }; f"
EDIT: props to https://github.com/GitAlias/gitalias
[−] icedchai 37d ago
I wouldn't trust "commit counts." The quality and content of a "commit" can vary widely between developers. I have one guy on my team who commits only working code that has been thoroughly tested locally, another guy who commits one line changes that often don't work, only to be followed by fixes, and more fixes. His "commits" have about 1/100th of the value of the first guy.
[−] whstl 37d ago
> One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions.

In my experience, when the team doesn't squash, this will reflect the messiest members of the team.

The top committer on the repository I maintain has 8x more commits than the second one. They were fired before I joined and nobody even remembers what they did. Git itself says: not much, just changing the same few files over and over.

Of course if nobody is making a mess in their own commits, this is not an issue. But if they are, squash can be quite more truthful.

[−] croemer 37d ago
Rather than using an LLM to write fluffy paragraphs explaining what each command does and what it tells them, the author should have shown their output (truncated if necessary)
[−] blenderob 37d ago

> Is This Project Accelerating or Dying

> git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c

If the commit frequency goes down, does it really mean that the project is dying? Maybe it is just becoming stable?

[−] aa-jv 37d ago
Great tips, added to notes.txt for future use ..

Another one I do, is:

    $ alias gss='git for-each-ref --sort=-committerdate'

    $ gss

    ce652ca83817e83f6041f7e5cd177f2d023a5489 commit refs/heads/project-feature-development
    ce652ca83817e83f6041f7e5cd177f2d023a5489 commit refs/remotes/origin/project-feature-development
    1ef272ea1d3552b59c3d22478afa9819d90dfb39 commit refs/remotes/origin/feature/feature-removal-from-good-state
    c30b4c67298a5fa944d0b387119c1e5ddaf551f1 commit refs/remotes/origin/feature/feature-removal
    eda340eb2c9e75eeb650b5a8850b1879b6b1f704 commit refs/remotes/origin/HEAD
    eda340eb2c9e75eeb650b5a8850b1879b6b1f704 commit refs/remotes/origin/main
    3f874b24fd49c1011e6866c8ec0f259991a24c94 commit refs/heads/project-bugfix-emergency
    ...

This way I can see right away which branches are 'ahead' of the pack, what 'the pack' looks like, and what is up and coming for future reference ... in fact I use the 'gss' alias to find out what's going on, regularly, i.e. "git fetch --all && gss" - doing this regularly, and even historically logging it to a file on login, helps see activity in the repo without too much digging. I just watch the hashes.
[−] youknownothing 37d ago
I like the mindset, it reminds me of "Your code as a crime scene" by Adam Tornhill: https://www.adamtornhill.com/articles/crimescene/codeascrime...

Also, very tangentially, to the notion of the Developer's Legacy Index: https://www.javaadvent.com/2021/12/using-jgit-to-analyse-the...

[−] kelnos 37d ago
I really wanted to like this. The author presents a well-thought-out rationale for what conclusions to draw, but I'm skeptical. Commit counts aren't a great signal: yes, the person with the highest count might be the person who built it or knows the most about it, but that could also be the person who is sloppy with commits (when they don't squash), or someone who makes a lot of mistakes and has to go back and fix them.

The grep for bugs is not particularly comprehensive: it will pick up some things that aren't bugs, and will miss a bunch of things too.

The "project accelerating or dying" seems odd to me. By definition, the bulk of commits/changes will be at the very beginning of history. And regardless, "stability" doesn't mean "dying".

[−] aledevv 36d ago

> Commit count by month, for the entire history of the repo. I scan the output looking for shapes. A steady rhythm is healthy. But what does it look like when the count drops by half in a single month?

Let's NOT jump to conclusions; it could mean many things. For example, a period with other priorities, different urgencies, other issues external to the project itself and beyond our control, vacations, illnesses, or anything else that could impact the commit history.

I think these considerations and the others expressed in this article can easily lead to hasty conclusions and erroneous deductions, too simplistic.

Coding flow, like business needs, cannot always be objectively and deterministically measured.

[−] StableAlkyne 37d ago
Biggest life changer for me has been:

    git clone --depth 1 --branch $SomeReleaseTag $SomeRepoURL

If you only want to build something, it only downloads what you need to build it. I've probably saved a few terabytes at this point!

[−] moritzwarhier 37d ago
Interesting ideas, but some seem very overgeneralized to me, e.g.:

> How Often Is the Team Firefighting

> git log --oneline --since="1 year ago" | grep -iE 'revert|hotfix|emergency|rollback'

> Crisis patterns are easy to read. Either they’re there or they’re not.

I disagree with the last two quoted sentences, and also, they sound like an LLM.

[−] fzaninotto 37d ago
Instead of focusing on the top 20 files, you can map the entire codebase with data taken from git log using ArcheoloGit [1].

[1]: https://github.com/marmelab/ArcheoloGit

[−] bullen 37d ago
Dying or stabilizing?

Most good projects end up solving a problem permanently and if there is no salary to protect with bogus new features it is then to be considered final?

[−] gherkinnn 37d ago
These are some helpful heuristics, thanks.

This list is also one of many arguments for maintaining good Git discipline.

[−] ivanjermakov 37d ago
When at work we migrated to monorepo, there was an implicit decision to drop commit history. I was the loudest one to make everyone understand how important it is.
[−] alaudet 37d ago
This is good stuff. Why I never think of things like this is beyond me. Thanks
[−] Cthulhu_ 37d ago
For "what changes the most", in my project it's package.json / lock (because of automatic dependency updates) and translation / localization files; I'd argue that's pretty normal and healthy.

For the "bus factor", there's one guy and then there's me, but I stopped being a primary contributor to this project nearly two years ago, lol.

[−] juliob 28d ago
There's also https://github.com/hjr265/gittop, which provides a lot more useful views
[−] Yondle 37d ago
Hey guys this was just meant to give you inspiration, it's not a set of rules. How about using what works for you (:
[−] fmbb 37d ago

> One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions.

Well isn't it typical that the person who wrote is also the person that merged? I have never worked in a place where that is not the norm for application code.

Even if you are one of those insane teams that do not squash merge because keeping everyone's spelling fixes and "try CI again" commits is important for some reason, you will still not see who _wrote_ the code, you will only see who committed the code. And if the person that wrote the code is not also the person that merges the code, I see no reason to trust that the person making commits is also the person writing the code.

[−] TacticalCoder 37d ago

> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”

I've got my Emacs set up to display next to every file that is versioned the number of commits that file has been modified in (for the curious: using a modified all-the-icons-ivy-rich + custom elisp code + custom Bash scripts I wrote and it's trickier than it seems to do in a way that doesn't slow everything down). For example in the menu to open a file or open a recently visited file etc.: basically in every file list, in addition to its size, owner, permissions, etc. I also add the number of commits if it's a versioned file.

I like the fix/bug/broken search in TFA to see where the bugs gather.

[−] niedbalski 37d ago
Ages ago, Google released an algorithm to identify hotspots in code by using commit messages. https://github.com/niedbalski/python-bugspots
[−] arthurjj 37d ago
These were interesting but I don't know if they'd work on most or any of the places I've worked. Most places and teams I've worked at have 2-3 small repos per project. Are most places working with monorepos these days?
[−] niedbalski 37d ago
Ages ago Google wrote an algorithm to detect hotspots by using commit messages, https://github.com/niedbalski/python-bugspots
[−] alkonaut 37d ago
Trusting the messages to contain specific keywords seems optimistic. I don't think I ever used "emergency" or "hotfix". "Revert" is sometimes automatically created by some tools (e.g. un-merging a PR).
[−] pscanf 37d ago
I just finished¹ building an experimental tool that tries to figure out if a repo is slopware or not just by looking at its git history (plus some GitHub activity data).

The takeaway from my experiment is that you can really tell a lot by how / when / what people commit, but conclusions are very hard to generalize.

For example, I've also stumbled upon the "merge vs squash" issue, where squashes compress and mostly hide big chunks of history, so drawing conclusions from a squashed commit is basically just wild guessing.

(The author of course has also flagged this. But I just wanted to add my voice: yeah, careful to generalize.)

¹ Nothing is ever finished.

[−] yonatan8070 37d ago
My team usually uses "Squash and merge" when we finish PRs, so I feel that would skew the results significantly as it hides 99% of the commit messages inside the long description of the single squashed merge commit.
[−] seba_dos1 37d ago

> If the team squashes every PR into a single commit, this output reflects who merged, not who wrote.

Squash-merge workflows are stupid (you lose information without gaining anything in return as it was easily filterable at retrieval anyway) and only useful as a workaround for people not knowing how to use git, but git stores the author and committer names separately, so it doesn't matter who merged, but rather whether the squashed patchset consisted of commits with multiple authors (and even then you could store it with Co-authored-by trailers, but that's harder to use in such oneliners).

[−] konovalov-nk 37d ago
To me all of these are symptoms of the problem that I outlined in my recent blog post: https://news.ycombinator.com/item?id=47606192

and it touches in detail what exactly commit standards should be, and even how to automate this on CI level.

And then I also have idea/vision how to connect commits to actual product/technical/infra specs, and how to make it all granular and maintainable, and also IDE support.

I would love to see any feedback on my efforts. If you decide to go through my entire 3 posts I wrote, thank you

[−] mikaoelitiana 37d ago
I created a small TUI based on the article https://github.com/mikaoelitiana/git-audit
[−] amai 34d ago
Is there a git command to find the developer who deleted the most lines of code? That is the best developer in the team.
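
There is no single built-in command for this as far as I know, but `git log --numstat` gets close. A rough sketch, assuming the `@` prefix (an arbitrary marker chosen here) is enough to tell author lines apart from numstat lines; `deleted_by_author` is a made-up helper name:

```shell
# Sum deleted lines per author from `git log --numstat --format='@%aN'`.
# numstat lines look like "added<TAB>deleted<TAB>path"; binary files show
# "-" in both columns and are skipped by the numeric check.
deleted_by_author() {
  awk -F'\t' '
    /^@/                       { author = substr($0, 2); next }
    NF == 3 && $2 ~ /^[0-9]+$/ { deleted[author] += $2 }
    END { for (a in deleted) printf "%d\t%s\n", deleted[a], a }
  ' | sort -nr
}

# Usage (inside a repository):
#   git log --no-merges --numstat --format='@%aN' | deleted_by_author
```

Like the commit counts, this says nothing about whether the deletions were good ones.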
[−] nola-a 37d ago
For more insights on Git, check out https://github.com/nolasoft/okgit
[−] traceroute66 37d ago

> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about.

What a weird check and assumption.

I mean, surely most of the "20 most-changed files" will be README and docs, plus language-specific lock-files etc. ?

So if you're not accounting for those in your git/jj syntax you're going to end up with an awful lot of false-positive noise.
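
One way to cut that noise, as a sketch only: filter the path list before counting. The exclusion pattern below is an assumption and would need adjusting per repository; `drop_noise` and `rank_paths` are made-up helper names.

```shell
# Drop blank lines and paths that churn for mechanical reasons.
# The pattern list is an assumption; extend it for your own repo.
drop_noise() {
  grep -vE '^$|(^|/)(package-lock\.json|yarn\.lock|Cargo\.lock|README(\.[a-z]+)?)$|^docs/'
}

# Rank whatever is left by how often each path appears on stdin.
rank_paths() {
  drop_noise | sort | uniq -c | sort -nr | head -20
}

# Usage (inside a repository):
#   git log --format='' --name-only --since='1 year ago' | rank_paths
```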

[−] nextlevelwizard 37d ago
These are actually fun to run. Just checked from work who makes the most commits and found I have as many commits in the past 2 years as the next 3 people.

That probably isn’t a good sign

[−] pvtmert 37d ago
The best is: You know that you have a major issue when the data (especially ones around commit messages) is empty or noisy.

Plus, adding an extra point: When you run git log --oneline --graph and the pattern on the left is more complex than the Persian carpet patterns or Ancient Egyptian writings in the Great Pyramid of Giza, you know it's an engineering & process quality issue rather than a problem with the code itself...

[−] boxed 37d ago
Just looking at how often a file changes without knowing how big the file is seems a bit silly. Surely it should be changes/line or something?
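
A rough sketch of what changes per line could look like, assuming the current line count of each file in the working tree is a fair denominator (it ignores how big the file was at the time of each change); `churn_per_line` is a made-up helper name:

```shell
# Read one path per line (e.g. from `git log --format='' --name-only`) and
# print "changes-per-line<TAB>changes<TAB>lines<TAB>path", highest ratio first.
churn_per_line() {
  grep -v '^$' | sort | uniq -c | while read -r count path; do
    [ -f "$path" ] || continue                 # skip deleted or renamed files
    lines=$(wc -l < "$path" | tr -d ' ')
    [ "$lines" -gt 0 ] || continue             # avoid dividing by zero
    awk -v c="$count" -v l="$lines" -v p="$path" \
      'BEGIN { printf "%.3f\t%d\t%d\t%s\n", c / l, c, l, p }'
  done | sort -nr
}

# Usage (from the repository root):
#   git log --format='' --name-only --since='1 year ago' | churn_per_line | head -20
```

Small files that change often float to the top, which may or may not be the signal you want.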
[−] md224 37d ago
The last sentence of the article is "Here’s what the rest of the week looks like." and then it just stops. Am I missing something?
[−] fishbacon 36d ago
Excellent set of commands.

Of course the two most useful ones would never be useful in the code base I am currently working on.

"fix" might be the single most common commit message, and after that comes "."

Trying for two years, to get people to include at least some information in their commit messages, has exhausted me.

[−] jlarocco 37d ago
I'm so used to magit, it seems kind of primitive to pipe git output around like this.

Anyway, I can glean a lot of this information in a few minutes scrolling through and filtering the log in magit, and it doesn't require memorizing a bunch of command line arguments.