Git and SHA-256: stage 4

In case you’d missed it, Git 2.29 has recently been released with a full stage 4 SHA-256 implementation. What exactly does that mean? Let’s take a look.

Git has a transition plan for the hash function, which you can find in the source repository at Documentation/technical/hash-function-transition.txt. It describes four different modes of operation:

1. Dark launch, where data is stored in SHA-256 but input and output are in SHA-1.
2. Early transition, where both algorithms are allowed in input, but output is in SHA-1.
3. Late transition, where both algorithms are allowed in input, but output is in SHA-256.
4. Post-transition, where everything is SHA-256.

Note that in all these cases, SHA-256 is used for the data on disk.

What we have now is the post-transition stage (stage 4): a repository is either SHA-1 only or SHA-256 only, and the two kinds are not interoperable. Starting with the last stage may seem bizarre, but it is actually much simpler to implement than the modes that have to translate between the two algorithms. As a result, people who desperately want to live on the bleeding edge (or who have regulatory requirements) can switch to SHA-256 now while the rest of the compatibility code is implemented.
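To see why the two kinds of repository cannot simply talk to each other, it helps to look at how an object gets its name. The following is a minimal sketch, not Git's own code: it assumes the usual loose-object naming scheme (hash of the header `blob <size>` plus a NUL byte, followed by the content) and applies it with both algorithms. The function name `git_blob_name` is made up for illustration.

```python
import hashlib


def git_blob_name(data: bytes, algo: str) -> str:
    """Name a blob the way Git does: hash 'blob <size>\\0' plus the content."""
    header = b"blob %d\0" % len(data)
    return hashlib.new(algo, header + data).hexdigest()


content = b"hello world\n"
sha1_name = git_blob_name(content, "sha1")      # 40 hex digits
sha256_name = git_blob_name(content, "sha256")  # 64 hex digits

print(sha1_name)
print(sha256_name)
```

The same content ends up with two unrelated names, and every tree and commit that refers to it embeds one of them. For comparison, a repository created with `git init --object-format=sha256` should report the 64-digit name for this file via `git hash-object`.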

The next step is to implement writing objects to a lookup table. That is, when working in a SHA-256 repository with the feature enabled, Git will produce two versions of each object it writes: a SHA-256 version, which it stores in the repository, and a SHA-1 version, which it does not store. It then records a mapping from the SHA-1 name to the SHA-256 name. This is important because, for example, the SHA-1 version of a tree needs to reference SHA-1 blob names, so it must be possible to map blob names in both directions. The table will also allow objects to be looked up by either name.
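The idea can be sketched in a few lines. This is a toy model under stated assumptions, not the format Git will use: `TranslationTable` and its methods are hypothetical names, the table lives in memory, trees contain only regular files (mode `100644`), and entries are sorted by plain name, which matches Git's ordering only for flat directories with ASCII names.

```python
import hashlib


def object_id(algo: str, kind: str, body: bytes) -> bytes:
    """Raw object name: hash of '<kind> <size>\\0' followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).digest()


class TranslationTable:
    """Hypothetical in-memory stand-in for the SHA-1 <-> SHA-256 lookup table."""

    def __init__(self):
        self.sha1_to_sha256 = {}
        self.sha256_to_sha1 = {}
        self.store = {}  # only SHA-256 objects are kept: name -> (kind, body)

    def _record(self, kind, sha256_body, sha1_body):
        new_id = object_id("sha256", kind, sha256_body)
        old_id = object_id("sha1", kind, sha1_body)
        self.store[new_id] = (kind, sha256_body)  # SHA-256 version is written
        self.sha1_to_sha256[old_id] = new_id      # SHA-1 version is only mapped
        self.sha256_to_sha1[new_id] = old_id
        return new_id

    def write_blob(self, data: bytes) -> bytes:
        # Blobs reference no other objects, so both versions share one body.
        return self._record("blob", data, data)

    def write_tree(self, entries: dict) -> bytes:
        """entries maps file name -> SHA-256 blob name (regular files only)."""
        sha256_body = b""
        sha1_body = b""
        for name in sorted(entries):
            prefix = b"100644 " + name.encode() + b"\0"
            sha256_body += prefix + entries[name]
            # The SHA-1 tree must name the SHA-1 version of each blob.
            sha1_body += prefix + self.sha256_to_sha1[entries[name]]
        return self._record("tree", sha256_body, sha1_body)

    def lookup(self, name: bytes):
        """Accept either kind of name and return the stored object."""
        return self.store[self.sha1_to_sha256.get(name, name)]


table = TranslationTable()
blob = table.write_blob(b"hello world\n")
tree = table.write_tree({"greeting.txt": blob})
```

Note that the SHA-1 tree body is built from the SHA-1 blob name, which is why the blob has to be in the table before the tree can be translated. After this, `table.lookup()` returns the same stored object whether it is given the 20-byte SHA-1 name or the 32-byte SHA-256 name.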

Once this is available, we can then rewrite the objects on the fly when we interoperate with a remote repository. This will be slow and inefficient, but it will work. The goal is that most users will use SHA-256 on disk, but SHA-1 can be served to users using legacy implementations.
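As a rough illustration of what rewriting on the fly involves, here is a sketch that converts the body of a SHA-256 tree into the body a SHA-1 client would expect. It assumes the standard tree entry layout (`<mode> <name>`, a NUL byte, then the raw object name) and a mapping that is already populated; `rewrite_tree` and `blob_name` are hypothetical helpers, not Git functions.

```python
import hashlib


def blob_name(algo: str, data: bytes) -> bytes:
    """Raw blob name under the given algorithm."""
    return hashlib.new(algo, b"blob %d\0" % len(data) + data).digest()


def rewrite_tree(body: bytes, to_sha1: dict) -> bytes:
    """Rewrite a SHA-256 tree body into the equivalent SHA-1 tree body."""
    out = bytearray()
    pos = 0
    while pos < len(body):
        nul = body.index(b"\0", pos)          # end of "<mode> <name>"
        sha256_name = body[nul + 1:nul + 33]  # 32 raw bytes follow the NUL
        out += body[pos:nul + 1]              # mode and name are unchanged
        out += to_sha1[sha256_name]           # 20 raw bytes replace the 32
        pos = nul + 33
    return bytes(out)


data = b"hello world\n"
new_name = blob_name("sha256", data)
old_name = blob_name("sha1", data)

sha256_tree = b"100644 greeting.txt\0" + new_name
sha1_tree = rewrite_tree(sha256_tree, {new_name: old_name})
```

Every tree, commit, and tag sent to a legacy client would need this kind of pass, with one table lookup per reference, which gives a sense of why the approach is expected to be slow.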

A final question you might be asking yourself is what our next hash function transition will look like. What happens when SHA-256 becomes weak?

The good news is that this is relatively easy. The framework for multiple hash algorithms and the transition code will already be present. The biggest problems will be agreeing on a hash to use and updating the test suite, which has many large lists of constants. But overall, this could probably be done in a single large (40-50 patch) series instead of many such series.