Git internals and SHA-1

LWN reminds us that Git still uses SHA-1 by default. Commit or tag signing is not a mitigation, and to understand why you need to know a little about Git’s internal structure.

Git internally looks rather like a content-addressable filesystem, with four object types: tags, commits, trees and blobs.

Content-addressable means that an object’s address is derived from its content, so changing the content changes the way you reference it. Git achieves this using a cryptographic hash function. Here is an illustration of the internal structure of an example repository I created, containing two files (./foo.txt and ./bar/bar.txt) committed separately, and then tagged:
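To make the addressing concrete, here is a minimal Python sketch of how a blob’s ID is derived under the default SHA-1 object format. The helper name git_object_id is my own invention for illustration; it only uses the standard hashlib module.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to this content."""
    # Git hashes a short header ("<type> <size>\0") followed by the raw content.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_object_id("blob", b"hello world\n")
modified = git_object_id("blob", b"hello world!\n")

print(original)  # should match `echo 'hello world' | git hash-object --stdin`
print(modified)  # a one-character change gives a completely different ID
```

Because the ID depends on every byte of the content, any reference to the old ID no longer points at the modified file.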

Graphic showing an example Git internal structure featuring tags, commits, trees and blobs, and how these relate to each other.

You can see how ‘trees’ represent directories, ‘blobs’ represent files, and so on. Because identical content always hashes to the same ID, Git can avoid internal duplication of files or directories which remain identical. The hash function also allows very efficient lookup of each object within Git’s on-disk storage.
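The deduplication and lookup can be sketched in a few lines of Python. This is a simplified model of Git’s ‘loose object’ layout only (it ignores packfiles), and the function names and the temporary directory standing in for .git/objects are assumptions made for the example.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, content: bytes) -> str:
    """Store content roughly the way Git stores a loose object; return its ID."""
    store = f"{obj_type} {len(content)}".encode() + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    # The first two hex digits name a directory, the remaining 38 name the file.
    path = objects_dir / object_id[:2] / object_id[2:]
    if not path.exists():  # identical content is only ever stored once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return object_id


def read_loose_object(objects_dir: Path, object_id: str) -> bytes:
    """Go straight from an ID to the file on disk and return the content."""
    path = objects_dir / object_id[:2] / object_id[2:]
    store = zlib.decompress(path.read_bytes())
    _header, _, content = store.partition(b"\0")
    return content


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"  # stand-in for .git/objects
    first = write_loose_object(objects, "blob", b"same content\n")
    second = write_loose_object(objects, "blob", b"same content\n")
    stored_files = [p for p in objects.rglob("*") if p.is_file()]
    roundtrip = read_loose_object(objects, first)
    print(first == second, len(stored_files), roundtrip)
```

Writing the same content twice produces one file, and reading it back needs no search: the ID is the path.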

Tag and commit signatures do not directly sign the files in the repository; that is, the input to the signature function is the content of the tag/commit object, rather than the files themselves. This is analogous to the way that GPG signatures actually sign a cryptographic hash of your email, and there was a time when this too defaulted to SHA-1. An attacker who can break that hash function can bypass the guarantees of the signature function.
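A small Python sketch shows what a signature actually covers. The file content, author line and timestamps below are made up for illustration, and git_object_id is a hypothetical helper. As I understand it, real Git embeds the resulting signature in a gpgsig header of the commit, which this sketch leaves out.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID for an object of the given type."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


file_content = b"contents of foo.txt\n"  # illustrative file content
blob_id = git_object_id("blob", file_content)

# A tree entry is "<mode> <name>\0" followed by the 20 raw bytes of the ID.
tree_content = b"100644 foo.txt\0" + bytes.fromhex(blob_id)
tree_id = git_object_id("tree", tree_content)

# The author and committer lines are invented for this example.
commit_content = (
    f"tree {tree_id}\n"
    "author A U Thor <author@example.com> 1656460800 +0000\n"
    "committer A U Thor <author@example.com> 1656460800 +0000\n"
    "\n"
    "Add foo.txt\n"
).encode()

# This is the text a commit signature would be computed over.
print(commit_content.decode())
# The file itself is reachable only through the chain of hashes:
# commit -> tree ID -> blob ID -> file content.
print(file_content in commit_content)
```

The signed text names the tree by hash and nothing else, so the signature vouches for the files only as far as the hash function can be trusted.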

A motivated attacker might be able to replace a blob, commit or tree in a Git repository using a SHA-1 collision. Replacing a blob seems easier to me than replacing a commit or tree, because file content does not have to conform to any particular format.
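To illustrate the shape of such an attack without needing a real SHA-1 collision, here is a toy Python sketch that deliberately weakens the hash to its first four hex digits so that a brute-force search finishes instantly. Everything here (the truncation, the function names, the file contents) is an assumption for demonstration; attacking full SHA-1 is vastly more expensive and works differently in detail.

```python
import hashlib
from itertools import count


def toy_object_id(obj_type: str, content: bytes) -> str:
    """Deliberately weak ID: only the first 4 hex digits of the SHA-1 ID."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()[:4]


def find_colliding_blobs() -> tuple[bytes, bytes]:
    """Search for two different blobs that share the same toy ID."""
    seen: dict[str, bytes] = {}
    for i in count():
        candidate = f"file version {i}\n".encode()
        object_id = toy_object_id("blob", candidate)
        if object_id in seen:
            return seen[object_id], candidate
        seen[object_id] = candidate


def toy_tree_id(blob: bytes) -> str:
    """ID of a one-file tree; it records the blob's ID, not its content."""
    entry = b"100644 foo.txt\0" + bytes.fromhex(toy_object_id("blob", blob))
    return toy_object_id("tree", entry)


good, evil = find_colliding_blobs()
print(good, evil)
# Different file contents, yet the tree (and so any signed commit
# pointing at it) is byte-for-byte identical.
print(toy_tree_id(good) == toy_tree_id(evil))
```

Swapping one blob for the other changes nothing above it in the object graph, which is why a signature on the commit would still verify.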

There is one key technical mitigation to this in Git, which is the SHA-1DC algorithm; this aims to detect and prevent known collision attacks. However, I will have to leave the cryptanalysis of this to the cryptographers!

So, is this in your threat model? Do we need to lobby GitHub for SHA-256 support? Either way, I look forward to the future operational challenge of migrating the entire world’s Git repositories across to SHA-256.

Tim Retout

A solution architect

By Tim Retout, 2022-06-29