What's Lost in the Squash

There have been countless posts written over the past decade about git workflow and the merits and pitfalls of rebase operations. I don't expect this page to add anything new to the discourse at large, but I want to document the ways GitHub's Squash and Merge button complicates my workflow.

My goal in writing this instead of referencing one of its many predecessors is to restrict myself to the pain points that have come up in practice in recent work, rather than the many hypothetical “what if?” scenarios we imagine when we ask ourselves what we want from a version control system.

I hope you'll rein me in if I go too far astray.

Downstream Pull Requests

The branch being squashed may already have other work-in-progress branches based on it. Squash-merge keeps the branch's original commits off the main timeline while including their contents, which leaves other branches with those commits in their ancestry in a confusing state.

Here's what a work-in-progress branch looked like after its parent branch (#4342) was squash-merged as 211be. The left line is trunk:

fix (build): use gradle to resolve all runtime classpath for facades  keturn 27 22:15 2a70bdbef
chore (build): refactor lambda to named function  keturn 27 21:54 006c8b482
chore: use short-form copyright  keturn 27 21:15 15d0194e3
perf (build): don't ask jcenter for org.terasology dependencies  keturn 27 20:12 c3bfef012
chore (build): remove old snowplow repo  keturn 27 19:54 2e42c09a6
feat(JOML): migrate to Rectanglei nui.animation (#4341)  MichaelP 28  6:34 94caff32d
build: save build time by not checking jcenter for terasology dependencies (#4342)  keturn 28  6:07 211bea317
fix: casing fix for homedir arg from Gradle (#4337)  Cervator 27 21:54 4d11f8917
feat: add BlockArea to replace Rect2i (#4050)  MichaelP 27  9:13 691aac09a

Some of the commits early in that branch were part of the squash-merged #4342, but we don't know which ones. I've picked out c3bfe manually, guessing from its commit message, but my branch management tools don't show me that automatically. We know the commit just before it must also have been in that branch, but what about the one after it?

If the repo still had the old branch marked, we could tell that way, but our practice is to delete branches after they're merged to keep the clutter down.

Now the work-in-progress branch looks like it has diverged from trunk more than it really has, and more of its commits touch files that trunk also changed than is really necessary. Because those files are changed in exactly the same way the squashed commit changes them, they will merge cleanly. [I'm also concerned about the effect on the branch's list of modified files, but I haven't recorded an example of that, so I'll refrain from posting assumptions as facts.]
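The confusion can be reproduced in a scratch repository. The script below is a minimal sketch that assumes `git merge --squash` followed by `git commit` is a fair local stand-in for GitHub's Squash and Merge button; the branch names `parent` and `downstream` and the PR number `#1234` are hypothetical. It builds a parent branch with two commits and a downstream branch on top of it, squash-merges the parent into `main`, and deletes the parent branch as we would after merging.

```shell
#!/bin/sh
# Sketch: emulate "Squash and merge" locally, then inspect a downstream branch.
# Assumption: `git merge --squash` + `git commit` stands in for GitHub's button;
# branch names and the "#1234" PR number are hypothetical.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the trunk branch "main"
git config user.name "Example" && git config user.email "example@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "trunk: initial"

# Parent branch with two granular commits.
git checkout -q -b parent
echo one > a.txt && git add a.txt && git commit -q -m "parent: first change"
echo two > b.txt && git add b.txt && git commit -q -m "parent: second change"

# Work-in-progress branch built on top of the parent branch.
git checkout -q -b downstream
echo three > c.txt && git add c.txt && git commit -q -m "downstream: own change"

# Squash-merge parent into main, then delete it as we do after merging.
git checkout -q main
git merge -q --squash parent
git commit -q -m "parent (#1234)"
git branch -q -D parent

# By ancestry, all three downstream commits still look absent from main:
git log --format=%s main..downstream
# Diffing from the merge base (the initial commit) names a.txt and b.txt too:
git diff --name-only main...downstream
```

Only `c.txt` is new work on `downstream`, yet `git log main..downstream` lists all three commits, and the diff from the merge base (`main...downstream`) names `a.txt` and `b.txt` as well. A two-dot `git diff main downstream` lists only `c.txt`, because the squashed contents are already identical on both sides.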

Granular commits may contain useful information

Sometimes I find myself looking at a specific commit on GitHub. I might be hoping to find a hint in the commit message that was too low-level to include in the overview of the whole branch, or checking to see if a specific test was checked in at the same time as the change to that method, or maybe the diff of the PR as a whole is messy due to renamed files and I need to screen those out so I can tell if any of the logic within those files changed.

Of course, if the PR was squashed, we'll never arrive at any of those individual commits by looking at the git log or git annotate for a specific file. But despite my protests earlier about not being able to tell which commits were part of the merged branch, GitHub's web interface does retain this information: commits for #4342.

Right under the commit message there, where I'd usually look to find information about which branches and release tags this change was included in, I find this instead:

This commit does not belong to any branch on this repository.

Was this commit not merged? But that change exists in the present! Was it made under some other PR, or some later iteration of this one?

It's probably possible to figure that out with some careful reading if we're seeing that commit as part of its PR, as there are elements on that page that navigate back to the PR's overall discussion and timeline. But if we look up that commit from another context, we get here:

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

— with no indication that it was part of a merged PR whatsoever.

Squashing destroyed the history available to anyone not using GitHub's own interface, and that interface leaves me second-guessing whether I'm really looking at something that is part of the accepted history.
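The same loss shows up in a local clone. Under the same assumption that `git merge --squash` plus `git commit` approximates GitHub's button (the branch name, file name, and PR number are hypothetical), this sketch squash-merges a two-commit branch, deletes the branch, and then looks for one of the granular commits through the file's history and through the branch list.

```shell
#!/bin/sh
# Sketch: after a squash-merge and branch deletion, where did the granular
# commits go? Assumption: `git merge --squash` approximates GitHub's button;
# "feature", "notes.txt" and "#1234" are hypothetical names.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "initial"

git checkout -q -b feature
echo 'test' > notes.txt && git add notes.txt
git commit -q -m "feature: check in the test first"
granular=$(git rev-parse HEAD)             # remember one granular commit
echo 'implementation' >> notes.txt
git commit -q -am "feature: then change the method"

git checkout -q main
git merge -q --squash feature
git commit -q -m "feature (#1234)"
git branch -q -D feature                   # delete the branch after merging

git log --format=%s -- notes.txt           # file history: only the squash commit
git blame -l -s notes.txt                  # every line blamed on the squash commit
git branch --contains "$granular"          # no branch contains it: prints nothing
```

`git log` and `git blame` on `notes.txt` only ever reach the squash commit, and `git branch --contains` finds no branch holding the granular commit, a local analogue of the "does not belong to any branch" message. The commit object itself still exists in this repository, but only the reflog refers to it, and it goes away once that entry expires and the object is garbage collected.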

Cleaning out old branches

For the same reason we delete branches after they've been merged on GitHub, it's useful to do the same in one's local repository. When I have a branch that's been fully merged, IntelliJ lets me delete it without fuss. But when I have a local branch that does not seem to be merged into any other branch, and does not match any branch on the remote, it'll warn about discarding unmerged commits.

It's a useful safety feature: it warns you that you're about to discard data you won't be able to get back even with version control (…unless you dig through your local reflog before it gets garbage collected). But it will need to get more sophisticated if squashing and discarding commits is a regular part of the workflow.
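Plain git's `git branch -d` applies the same kind of ancestry test, so the refusal is easy to reproduce. The second half of this sketch shows one possible "more sophisticated" check. It is a heuristic, not something IntelliJ or git does for you: it collapses the branch into a throwaway commit on its merge base with `git commit-tree`, then asks `git cherry` whether `main` already contains an equivalent patch. It only matches when the squash commit's diff is equivalent to the branch's diff from its merge base; if trunk had already changed the same files before the squash, or the squashed result differed, the check reports a mismatch and keeps the branch. Branch names and the PR number are hypothetical.

```shell
#!/bin/sh
# Sketch: detect a squash-merged branch before deleting it.
# Assumptions: `git merge --squash` approximates GitHub's button; the
# patch-equivalence test below is a heuristic, not a built-in git feature.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "initial"

git checkout -q -b feature
echo one > a.txt && git add a.txt && git commit -q -m "feature: part one"
echo two >> a.txt && git commit -q -am "feature: part two"

git checkout -q main
git merge -q --squash feature
git commit -q -m "feature (#1234)"

# Ancestry-based checks don't recognize the squash:
git branch --merged main                  # lists only main
git branch -d feature || echo "refused: feature is not fully merged by ancestry"

# Heuristic: one throwaway commit holding the branch's whole diff from its
# merge base; `git cherry` prefixes it with "-" if main has an equivalent patch.
base=$(git merge-base main feature)
tmp=$(git commit-tree -p "$base" -m "throwaway squash of feature" \
      "$(git rev-parse 'feature^{tree}')")
case "$(git cherry main "$tmp")" in
  -*) echo "feature looks squash-merged; deleting"; git branch -q -D feature ;;
  *)  echo "feature has changes main lacks; keeping it" ;;
esac
```

The throwaway commit created by `git commit-tree` is left dangling, which is harmless; no branch points at it, and it is eventually garbage collected.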

Appendix: Commit List

The list of commits and their branches shown above is an HTML approximation of what I see in IntelliJ's git log view, but if you prefer git's command line, simply type:

git log --graph \
    --date='format-local:%b %e %k:%M' \
    --pretty='format:%s %aL %ad %h' \
    2a70bdbe 94caff32 \
    $( git merge-base 2a70bdbe 94caff32 )'^!'