The Git Commands I Run Before Reading Any Code

(piechowski.io)

170 points | by grepsedawk 2 hours ago

13 comments

  • pzmarzly 1 hour ago
    Jujutsu equivalents, if anyone is curious:

    What Changes the Most

        jj log --no-graph -r 'ancestors(trunk()) & committer_date(after:"1 year ago")' \
          -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
          | sort | uniq -c | sort -nr | head -20
    
    Who Built This

        jj log --no-graph -r 'ancestors(trunk()) & ~merges()' \
          -T 'self.author().name() ++ "\n"' \
          | sort | uniq -c | sort -nr
    
    Where Do Bugs Cluster

        jj log --no-graph -r 'ancestors(trunk()) & description(regex:"(?i)fix|bug|broken")' \
          -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
          | sort | uniq -c | sort -nr | head -20
    
    Is This Project Accelerating or Dying

        jj log --no-graph -r 'ancestors(trunk())' \
          -T 'self.committer().timestamp().format("%Y-%m") ++ "\n"' \
          | sort | uniq -c
    
    How Often Is the Team Firefighting

        jj log --no-graph \
          -r 'ancestors(trunk()) & committer_date(after:"1 year ago") & description(regex:"(?i)revert|hotfix|emergency|rollback")'
    
    Much more verbose, closer to programming than shell scripting. But fewer flags to remember.
    • palata 55 minutes ago
      To me, it makes jujutsu look like the Nix of VCSes.

      Not meaning to offend anyone: Nix is cool, but adds complexity. And as a disclaimer: I used jujutsu for a few months and went back to git. Mostly because git is wired in my fingers, and git is everywhere. Those examples of what jujutsu can do and not git sound nice, but in those few months I never remotely had a need for them, so it felt overkill for me.

      • Jenk 19 minutes ago
        Tbf you wouldn't use/switch to jj because of those kinds of commands; they are quite the outlier in the grand list of reasons to use jj. However, the option to use the revset language in that manner is a high-ranking reason to use jj in my opinion.

        The most frequent "complex" command I use is to find commits in my name that are unsigned, and then sign them (this is owing to my workflow with agents that commit on my behalf but I'm not going to give agents my private key!)

            jj log -r 'mine() & ~signed()'
        
            # or if yolo mode...
        
            jj sign -r 'mine() & ~signed()'
        
        I hadn't even spared a moment to consider the git equivalent but I would humbly expect it to be quite obtuse.
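
For comparison, a rough git-side sketch of the listing half. It assumes the `%G?` pretty-format placeholder, which prints a one-letter signature status per commit ("N" meaning no signature). The helper name `unsigned_only` is made up for illustration, and re-signing would still need a rebase, which this sketch does not attempt.

```shell
# Hypothetical helper: reads "HASH STATUS" lines on stdin, where STATUS is the
# one-letter code git prints for %G? ("N" means no signature), and prints the
# hashes of the unsigned commits only.
unsigned_only() {
  awk '$2 == "N" { print $1 }'
}
```

Possible usage, assuming your commits are identifiable by the configured email: `git log --author="$(git config user.email)" --format='%H %G?' | unsigned_only`.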
        • palata 10 minutes ago
          Actually, signing was one of the annoying parts of jujutsu for me: I sign with a security key, and the way jujutsu handled signing was very painful to me (I know it can be configured and I tried a few different ways, but it felt inherent to how jujutsu handles commits (revisions?)).
  • mattrighetti 18 minutes ago
    I have a summary alias that kind of does similar things

      # summary: print a helpful summary of some typical metrics
      summary = "!f() { \
        printf \"Summary of this branch...\n\"; \
        printf \"%s\n\" $(git rev-parse --abbrev-ref HEAD); \
        printf \"%s first commit timestamp\n\" $(git log --date-order --format=%cI | tail -1); \
        printf \"%s latest commit timestamp\n\" $(git log -1 --date-order --format=%cI); \
        printf \"%d commit count\n\" $(git rev-list --count HEAD); \
        printf \"%d date count\n\" $(git log --format=oneline --format=\"%ad\" --date=format:\"%Y-%m-%d\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
        printf \"%d tag count\n\" $(git tag | wc -l); \
        printf \"%d author count\n\" $(git log --format=oneline --format=\"%aE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
        printf \"%d committer count\n\" $(git log --format=oneline --format=\"%cE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
        printf \"%d local branch count\n\" $(git branch | grep -v \" -> \" | wc -l); \
        printf \"%d remote branch count\n\" $(git branch -r | grep -v \" -> \" | wc -l); \
        printf \"\nSummary of this directory...\n\"; \
        printf \"%s\n\" $(pwd); \
        printf \"%d file count via git ls-files\n\" $(git ls-files | wc -l); \
        printf \"%d file count via find command\n\" $(find . | wc -l); \
        printf \"%d disk usage\n\" $(du -s | awk '{print $1}'); \
        printf \"\nMost-active authors, with commit count and %%...\n\"; git log-of-count-and-email | head -7; \
        printf \"\nMost-active dates, with commit count and %%...\n\"; git log-of-count-and-day | head -7; \
        printf \"\nMost-active files, with churn count\n\"; git churn | head -7; \
      }; f"
    • duskdozer 15 minutes ago
      Curious - why write it as a function in presumably .gitconfig and not just a git-summary script in your path? Just seems like a lot of extra escapes and quotes and stuff
      • mattrighetti 3 minutes ago
        It's a very old config that I copied from someone many years ago, agree that it's a bit hard to parse visually.
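
A minimal sketch of the standalone-script variant, covering only a small subset of the alias. The function names `distinct_count` and `summary` are illustrative choices, not taken from the original config.

```shell
#!/bin/sh
# Sketch of a standalone `git-summary` script (a small subset of the alias above).

# Count distinct lines on stdin; stands in for the awk associative-array idiom.
distinct_count() {
  sort -u | wc -l | tr -d ' '
}

# Print a few of the metrics from the alias, using ordinary shell quoting
# instead of escaped quotes inside .gitconfig.
summary() {
  printf '%s current branch\n' "$(git rev-parse --abbrev-ref HEAD)"
  printf '%s commit count\n' "$(git rev-list --count HEAD)"
  printf '%s author count\n' "$(git log --format=%aE | distinct_count)"
  printf '%s tag count\n' "$(git tag | wc -l | tr -d ' ')"
}
```

To try it, add a final line calling `summary`, save the file as `git-summary` somewhere on your PATH, and make it executable. Git runs executables named `git-<name>` as `git <name>`, so it becomes `git summary`.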
  • ramon156 54 minutes ago
    > The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”

    The most changed file is the one people are afraid of touching?

    • rbonvall 26 minutes ago
      Just like that place that's so crowded nobody goes there anymore.
    • mchaver 26 minutes ago
      Definitely not in my experience. The most changed are the change logs, files with version numbers and readmes. I don't think anyone is afraid of keeping those up to date.
    • dewey 50 minutes ago
      I've just tried this, and the most touched files are also the most irrelevant or boring files (auto generated, entry-point of the service etc.) in my tests.
      • nulltrace 28 minutes ago
        Yeah same thing happens with lockfiles and CI configs. You end up filtering out half the list before it tells you anything useful.
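
One way to do that filtering up front, as a rough sketch: drop known-noisy paths before counting. The helper name `churn_top` and the exclusion list are assumptions made for illustration; the pattern would need adjusting per project.

```shell
# Hypothetical helper: reads one file path per line on stdin (for example the
# output of `git log --format= --name-only`), drops blank lines and paths that
# match an assumed noise list, then prints the most frequently seen paths.
# Optional first argument: how many rows to show (default 20).
churn_top() {
  # Assumed noise list: lockfiles, changelogs, CI configs.
  noise='(^|/)(package-lock\.json|pnpm-lock\.yaml|yarn\.lock|Cargo\.lock|CHANGELOG(\.md)?)$|^\.github/'
  grep -v '^$' | grep -Ev "$noise" | sort | uniq -c | sort -nr | head -n "${1:-20}"
}
```

Possible usage from the repository root: `git log --format= --name-only --since="1 year ago" | churn_top 20`.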
    • mememememememo 51 minutes ago
      Yes. Because the fear is buttressed with necessity. You have to edit the file, and so does everyone else, and that is a recipe for a lot of mess. I can think back over years of files like this. Usually kilolines of impossible-to-reason-about do-everything.
    • szszrk 31 minutes ago
      It could also be that a frequently edited file had the most opportunities to be broken. And it was edited by the most random crowd.
  • JetSetIlly 44 minutes ago
    Some nice ideas but the regexes should include word boundaries. For example:

        git log -i -E --grep="\b(fix|fixed|fixes|bug|broken)\b" --name-only --format='' | sort | uniq -c | sort -nr | head -20

    I have a project with a large package named "debugger". The presence of "bug" within "debugger" causes the original command to go crazy.
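
The effect is easy to reproduce on a few made-up commit subjects, without a repository. In this small sketch the sample subjects and function names are invented for illustration, and `grep -w` (whole-word matching) plays the role of the `\b(...)\b` wrapper.

```shell
# Made-up commit subjects, for illustration only.
sample_subjects() {
  printf '%s\n' \
    'Fix crash in parser' \
    'Add debugger panel' \
    'Refactor debugger stepping' \
    'Fixes bug in tokenizer'
}

# Substring match: "bug" also hits "debugger".
count_loose() {
  sample_subjects | grep -icE 'fix|bug|broken'
}

# Whole-word match: "debugger" no longer counts as a bug-fix subject.
count_strict() {
  sample_subjects | grep -icwE 'fix|fixed|fixes|bug|broken'
}
```

On these four subjects `count_loose` reports 4 and `count_strict` reports 2, since the two "debugger" subjects only match as substrings.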

  • croemer 16 minutes ago
    Rather than using an LLM to write fluffy paragraphs explaining what each command does and what it tells them, the author should have shown their output (truncated if necessary)
  • gherkinnn 1 hour ago
    These are some helpful heuristics, thanks.

    This list is also one of many arguments for maintaining good Git discipline.

  • seba_dos1 46 minutes ago
    > If the team squashes every PR into a single commit, this output reflects who merged, not who wrote.

    Squash-merge workflows are stupid (you lose information without gaining anything in return as it was easily filterable at retrieval anyway) and only useful as a workaround for people not knowing how to use git, but git stores the author and committer names separately, so it doesn't matter who merged, but rather whether the squashed patchset consisted of commits with multiple authors (and even then you could store it with Co-authored-by trailers, but that's harder to use in such oneliners).
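
A rough sketch of how co-authors could still be counted in a one-liner style. It assumes each commit is printed as a marker line followed by the full message, and the `@author` marker and the helper name `count_contributors` are arbitrary choices made for this example.

```shell
# Hypothetical helper: reads text on stdin in which each commit's author is a
# line "@author Name <email>" (an arbitrary marker chosen for this sketch) and
# each co-author is a "Co-authored-by: Name <email>" trailer line. Prints one
# count per person, authors and co-authors combined.
count_contributors() {
  sed -n \
    -e 's/^[Cc]o-[Aa]uthored-[Bb]y: *//p' \
    -e 's/^@author //p' \
    | sort | uniq -c | sort -nr
}
```

Possible usage: `git log --no-merges --format='@author %an <%ae>%n%B' | count_contributors`. A commit body line that happens to start with `@author ` would be miscounted, which is the price of keeping the sketch short.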

    • theshrike79 35 minutes ago
      Can you explain to me (an avid squash-merger) what extra information do you gain by having commits that say "argh, let's see if this works", "crap, the CI is failing again, small fix to see if it works", "pushing before leaving for vacation" in the main git history?

      With a squash merge one PR is one commit, simple, clean and easy to roll back or cherry-pick to another branch.

      • seba_dos1 27 minutes ago
        These commits reaching the reviewer are a sign of either not knowing how to use git or not respecting their time. You clean things up and split into logical chunks when you get ready to push into a shared place.
        • croemer 20 minutes ago
          What if the shared place is the place where you run a bunch of CI? Then you push your work early to a branch to see the results, fix them etc.
          • mr_mitm 4 minutes ago
            You can always force-push a cleaned up version of your branch when you are ready for review, or start a new one and delete the WIP one.
          • seba_dos1 19 minutes ago
            You can do whatever you want with stuff nobody else looks at. I do too.
        • zaphirplane 23 minutes ago
          What are examples of better ones? I don’t get the "let me show the world my work" approach, and I’m not a fan of large PRs.
          • duskdozer 0 minutes ago
            if you mean better messages, it's not really that. those junk messages should be rewritten and if the commits don't stand alone, merged together with rebase. it's the "logical chunks" the parent mentioned.

            it's hard to say fully, but unless a changeset is quite small or otherwise is basically 0% or 100%, there are usually smaller steps.

            like kind of contrived but say you have one function that uses a helper. if there's a bug in the function, and it turns out to fix that it makes a lot more sense to change the return type of the helper, you would make commit 1 to change the return type, then commit 2 fix the bug. would these be separate PRs? probably not to me but I guess it depends on your project workflow. keeping them in separate commits even if they're small lets you bisect more easily later on in case there was some unforeseen or untested problem that was introduced, leading you to smaller chunks of code to check for the cause.

        • yokoprime 20 minutes ago
          Haha, good luck working with a team with more than 2 people. A good reviewer looks at the end-state and does not care about individual commits. If I'm curious about a specific change I just look at the blame.
          • hhjinks 16 minutes ago
            You review code not to verify the actual output of the code, but the code itself. For bugs, for maintainability. Commit hygiene is part of that.
          • seba_dos1 14 minutes ago
            I have no troubles working on big FLOSS projects where reviews usually happen at the commit level :)
      • Aachen 25 minutes ago
        If someone uses git commits like the save function of their editor and doesn't write messages intended for reading by anyone else, it makes sense to want to hide them

        For other cases, you lose the information about why things are this way. It's too verbose to //comment on every line with how it came to be this way but on (non-rare in total, but rare per line) occasion it's useful to see what the change was that made the line be like this, or even just who to potentially ask for help (when >1 person worked on a feature branch, which I'd say is common)

        • seba_dos1 3 minutes ago
          > If someone uses git commits like the save function of their editor

          I use it like that too and yet the reviewers don't get to see these commits. Git has very powerful tools for manipulating the commit graph that many people just don't bother to learn. Imagine if I sent a patchset to the Linux Kernel Mailing List containing such "fix typo", "please work now", "wtf" patches - my shamelessness has its limits!

    • arnorhs 28 minutes ago
      The author is talking about the case where you have coherent commits, probably from multiple PRs/merges, that get merged into a main branch as a single commit.

      Yeah, I can imagine it being annoying that squashing in that case wipes the author attribution, when not everybody is doing PRs against the main branch.

      However, calling all squash-merge workflows "stupid" without any nuance.. well that's "stupid" :)

      • seba_dos1 20 minutes ago
        I don't think there's much nuance in the "I don't know --first-parent exists" workflow. Yes, you may sometimes squash-merge a contribution coming from someone who can't use git well when you realize that it will just be simpler for everyone to do that than to demand them to clean their stuff up, but that's pretty much the only time you actually have a good reason to do that.
    • filcuk 34 minutes ago
      Having the tree easy to filter doesn't matter if it returns hundreds of commits you have to sift through for no reason.
  • boxed 4 minutes ago
    Just looking at how often a file changes without knowing how big the file is seems a bit silly. Surely it should be changes/line or something?
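
A minimal sketch of that normalization, dividing each file's change count by its current line count. The helper name `churn_per_line` is made up for this example, and current length is only a crude proxy for how big the file was when the changes happened.

```shell
# Hypothetical helper: reads "COUNT PATH" lines on stdin (the shape produced by
# `sort | uniq -c`) and prints "RATIO COUNT LINES PATH" for every path that
# still exists, sorted by RATIO (changes per current line), highest first.
churn_per_line() {
  while read -r count path; do
    [ -f "$path" ] || continue            # skip deleted or renamed files
    lines=$(wc -l < "$path" | tr -d ' ')
    [ "$lines" -gt 0 ] || continue        # avoid dividing by zero
    awk -v c="$count" -v l="$lines" -v p="$path" \
      'BEGIN { printf "%.3f %d %d %s\n", c / l, c, l, p }' < /dev/null
  done | sort -nr
}
```

Possible usage from the repository root: `git log --format= --name-only --since="1 year ago" | grep -v '^$' | sort | uniq -c | churn_per_line | head -20`.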
  • traceroute66 45 minutes ago
    > The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about.

    What a weird check and assumption.

    I mean, surely most of the "20 most-changed files" will be README and docs, plus language-specific lock-files etc. ?

    So if you're not accounting for those in your git/jj syntax you're going to end up with an awful lot of false-positive noise.

    • theshrike79 34 minutes ago
      Why would you touch the README file hundreds of times a year?

      You're right about package.json, pnpm-lock etc though, but those are easy to filter out if the project in question uses them.

      • traceroute66 23 minutes ago
        > Why would you touch the README file hundreds of times a year?

        You're right, perhaps I should have said CHANGELOG etc.

        Although some projects e.g. bump version numbers in README or add extra one-liner examples ....

      • raxxorraxor 16 minutes ago
        Some readme files include changelogs. But aside from that I think this can still net some useful information. I like to look at the most recently changed files in a repo as well.
  • aa-jv 22 minutes ago
    Great tips, added to notes.txt for future use ..

    Another one I do, is:

        $alias gss='git for-each-ref --sort=-committerdate'
    
        $gss
    
        ce652ca83817e83f6041f7e5cd177f2d023a5489 commit refs/heads/project-feature-development
        ce652ca83817e83f6041f7e5cd177f2d023a5489 commit refs/remotes/origin/project-feature-development
        1ef272ea1d3552b59c3d22478afa9819d90dfb39 commit refs/remotes/origin/feature/feature-removal-from-good-state
        c30b4c67298a5fa944d0b387119c1e5ddaf551f1 commit refs/remotes/origin/feature/feature-removal
        eda340eb2c9e75eeb650b5a8850b1879b6b1f704 commit refs/remotes/origin/HEAD
        eda340eb2c9e75eeb650b5a8850b1879b6b1f704 commit refs/remotes/origin/main
        3f874b24fd49c1011e6866c8ec0f259991a24c94 commit refs/heads/project-bugfix-emergency
        ...
    
    
    This way I can see right away which branches are 'ahead' of the pack, what 'the pack' looks like, and what is up and coming for future reference. In fact I use the 'gss' alias regularly to find out what's going on, i.e. "git fetch --all && gss". Doing this regularly, and even historically logging it to a file on login, helps see activity in the repo without too much digging. I just watch the hashes.