In The Wizard of Oz, Dorothy and her travelling companions learn that the magical wizard is just an old man hiding behind a curtain, operating a control panel. The special effects that he controlled merely made him seem magical. A simple look behind the curtains may reveal a much simpler truth.
As we saw in the previous post, Git’s object storage is quite straightforward. But we only tried to store content that we passed to Git directly via stdin. When we use Git as a version control system, however, we want to store files in its repository, and Git makes us stage files before we can store them. Sometimes this is referred to as adding your file to the staging area, also known as the index. This sounds like the file is being moved to some magical place, but that’s not really what’s happening.
Where is this magical index?
There is nothing magical about the index. Don’t believe me? Let’s look behind the curtain…
We’ll start by creating a simple file and check how Git will hash the contents of this file using the knowledge of Git’s internals that we gained in the previous post.
> echo "San Francisco" > test
> cat test | git hash-object --stdin
9a63b5a6857cd18567033f8b54463458d67ca3b6
> git add test
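As a reminder of what git hash-object does with the content, here is a minimal sketch in Python. It assumes a repository that uses SHA-1 object IDs (the default), and the function name git_blob_hash is made up for illustration; it is not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a small header ("blob <size in bytes>\0") followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# echo appends a newline, so the file holds "San Francisco\n".
print(git_blob_hash(b"San Francisco\n"))
```

Comparing the printed ID with the output of git hash-object is a quick way to check the sketch.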
After we add a file to the index, the file contents are stored in the repository. But the .git directory also now contains a new file named index. This is the man behind the curtain: the index.
Unfortunately the index is a binary file, so we cannot simply read its contents. Instead we’ll use Git’s low-level ls-files command to inspect the index.
What’s in the index?
The ls-files command lists the files recorded in the index. By default it prints only their paths; with the --stage option it also prints each entry’s mode, blob hash and stage number.
> git ls-files
test
> git ls-files --stage
100644 9a63b5a6857cd18567033f8b54463458d67ca3b6 0 test
The astute reader recognises the file’s mode (100644) and the hash value from before. There’s also a stage number (0), which we’re going to ignore for now. Not visible in the ls-files output, but also present in the index, is the file’s last modification time.
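For the curious, the binary layout is simple enough that we can peek at it ourselves. Below is a minimal sketch in Python, assuming an index in format version 2 or 3 with SHA-1 object IDs. It makes no attempt to read extensions or to verify the trailing checksum, and the function name parse_index and the dictionary keys are made up for illustration.

```python
import struct


def parse_index(data: bytes) -> list:
    """Parse the header and entries of a version 2 or 3 Git index file."""
    # 12-byte header: signature "DIRC", format version, number of entries.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    if version not in (2, 3):
        raise ValueError("this sketch only handles index versions 2 and 3")

    entries = []
    offset = 12
    for _ in range(count):
        # Ten 32-bit stat fields, a 20-byte SHA-1 and 16 bits of flags.
        (_ctime_s, _ctime_ns, mtime_s, _mtime_ns, _dev, _ino,
         mode, _uid, _gid, size, sha1, flags) = struct.unpack(
            ">10I20sH", data[offset:offset + 62])
        path_start = offset + 62
        if flags & 0x4000:
            # Version 3 "extended" entries carry 16 more flag bits.
            path_start += 2
        path_end = data.index(b"\x00", path_start)
        entries.append({
            "mode": format(mode, "o"),
            "sha1": sha1.hex(),
            "stage": (flags >> 12) & 0x3,
            "size": size,
            "mtime": mtime_s,
            "path": data[path_start:path_end].decode("utf-8", "replace"),
        })
        # Each entry is NUL-padded (1 to 8 bytes) to a multiple of 8 bytes.
        offset += (path_end - offset + 8) // 8 * 8
    return entries
```

Feeding it the bytes of .git/index, for example parse_index(open(".git/index", "rb").read()), should list the same mode, hash and stage number that ls-files --stage prints, plus the file size and modification time that ls-files keeps to itself.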
So the index contains metadata for objects stored in its repository, in an optimised binary format. This allows Git to maintain a virtual tree that can be efficiently compared to the working tree. So we can change the file in the working tree and compare it to the file we staged earlier.
> echo "New York" > test
> cat test | git hash-object --stdin
825b603258f31bd8639378c5b314f33056dcb20c
> git diff
diff --git a/test b/test
index 9a63b5a..825b603 100644
--- a/test
+++ b/test
@@ -1 +1 @@
-San Francisco
+New York
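The metadata is what makes this comparison cheap. The following Python sketch illustrates the idea under simplifying assumptions: it looks only at the size and modification time before falling back to hashing, whereas real Git consults more fields and takes extra care with timestamps. The function name is_modified and the shape of the entry dictionary are hypothetical.

```python
import hashlib
import os


def is_modified(path: str, entry: dict) -> bool:
    """Report whether the file at `path` differs from its staged version.

    `entry` is a hypothetical dict holding the 'size', 'mtime' and 'sha1'
    values that were recorded when the file was staged.
    """
    st = os.stat(path)
    if st.st_size == entry["size"] and int(st.st_mtime) == entry["mtime"]:
        # Cheap path: the stat data is unchanged, so assume the content is too.
        return False
    # Slow path: hash the content the way Git hashes a blob and compare IDs.
    with open(path, "rb") as f:
        content = f.read()
    blob_id = hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()
    return blob_id != entry["sha1"]
```

The point of the design is that the file only has to be read and hashed when the cached stat data no longer matches.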
The fact that the file’s content at the time of staging is stored as a blob in the repository enables us to revert our changes:
> git checkout test
> cat test
San Francisco
If instead we change the file and ‘re-stage’ it, Git will store a new blob in its repository (825b), and the entry in the index will point to this new blob. The old blob (9a63) is still present in the repository, but nothing references it any more, so it can be removed using the git prune command.
> echo "New York" > test
> git add test
> git ls-files --stage
100644 825b603258f31bd8639378c5b314f33056dcb20c 0 test
> git prune --verbose
9a63b5a6857cd18567033f8b54463458d67ca3b6 blob
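To see for ourselves whether a blob is still stored, we can look for its file in the object store. This is a small Python sketch that assumes the object is a loose object, meaning it has not been packed into a packfile; the helper names loose_object_path and has_loose_object are made up for illustration.

```python
import os


def loose_object_path(git_dir: str, object_id: str) -> str:
    """Return the path where a loose object with this ID would be stored."""
    # Loose objects live at objects/<first 2 hex digits>/<remaining digits>.
    return os.path.join(git_dir, "objects", object_id[:2], object_id[2:])


def has_loose_object(git_dir: str, object_id: str) -> bool:
    """Check whether the loose object file exists in the given .git directory."""
    return os.path.isfile(loose_object_path(git_dir, object_id))
```

Calling has_loose_object(".git", "9a63b5a6857cd18567033f8b54463458d67ca3b6") before and after git prune is one way to watch the old blob disappear.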
Removing a file from the index
We can remove the reference to the object in the repository from the index using the git rm command with the --cached option. This does not remove the object itself from the repository and, because of --cached, it does not remove the file from the working tree either.
> git rm --cached test
rm 'test'
Now that the blob is no longer referenced by the index, we can remove it from the repository using git prune.
Pay no attention to that man behind the curtain.
Except we did, and what we found was the familiar way that Git stores content as blobs in its repository, and a relatively simple binary file that stores references to said blobs along with some metadata. The metadata allows Git to track changes to files in the working tree.