Page 39 - MSDN Magazine, August 2017

string value for the tree path (or simply NUL for the root tree). The following value is an ASCII value, so it is to be read as the “7” you see in the hex editor: the number of blob entries covered by the current tree (because this is the root tree, it has the same number of entries you saw earlier when issuing the git ls-files --stage command). The next character is a space, followed by another ASCII number to represent the number of subtrees the current tree has.
The root tree for our project has only 1 subtree: MSDNConsoleApp. This value is followed by a linefeed character, then the SHA-1 for the tree. The SHA-1 starts at offset 291H, beginning with 0d21e2.
Let’s confirm that 0d21e2 is actually the root tree SHA-1. To do that, go to the command window and enter:
git log
This displays details of the recent commits:
commit 5192391e9f907eeb47aa38d1c6a3a4ea78e33564
Author: Jonathan Waldman <jonathan.waldman@live.com>
Date: Mon May 8 21:24:15 2017 -0500

    Add project files.

commit dc0d3343fa24e912f08bc18aaa6f664a4a020079
Author: Jonathan Waldman <jonathan.waldman@live.com>
Date: Mon May 8 21:24:07 2017 -0500

    Add .gitignore and .gitattributes.
The most recent commit is the one with the timestamp 21:24:15, so that’s the one that last updated the index. I can use that commit’s SHA-1 to find the root-tree SHA-1 value:
git cat-file -p 51923
This generates the following output:
tree 0d21e2f7f760f77ead2cb85cc128efb13f56401d
parent dc0d3343fa24e912f08bc18aaa6f664a4a020079
author Jonathan Waldman <jonathan.waldman@live.com> 1494296655 -0500
committer Jonathan Waldman <jonathan.waldman@live.com> 1494296655 -0500
The preceding tree entry is the root tree object. It confirms that the 0d21e2 value at offset 291H in the index dump is, in fact, the SHA-1 for the root tree object.
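The same check can be scripted. The sketch below parses the header lines of a commit object as printed by git cat-file -p and pulls out the tree SHA-1. The function name parse_commit_headers is hypothetical, and the commit text is pasted in by hand here instead of being read from a repository.

```python
def parse_commit_headers(text: str) -> dict:
    """Collect the header lines of a commit object (everything before the blank line)."""
    headers = {}
    for line in text.splitlines():
        if not line:
            break  # a blank line separates the headers from the commit message
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)  # "parent" repeats for merge commits
    return headers


commit_text = """\
tree 0d21e2f7f760f77ead2cb85cc128efb13f56401d
parent dc0d3343fa24e912f08bc18aaa6f664a4a020079
author Jonathan Waldman <jonathan.waldman@live.com> 1494296655 -0500
committer Jonathan Waldman <jonathan.waldman@live.com> 1494296655 -0500

Add project files.
"""

headers = parse_commit_headers(commit_text)
root_tree = headers["tree"][0]
# Compare against the six hex digits seen at offset 291H in the index dump.
print(root_tree.startswith("0d21e2"))
```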
The other tree entries appear immediately after the SHA-1 value, starting at offset 2A5H. To confirm the SHA-1 values for cached tree objects under the root tree, run this command:
git ls-tree -r -d master
This displays only the tree objects, recursively on the current branch:
040000 tree c7c367f2d5688dddc25e59525cc6b8efd0df914d MSDNConsoleApp
040000 tree 2723ceb04eda3051abf913782fadeebc97e0123c MSDNConsoleApp/Properties
The mode value of 040000 in the first column indicates that this object is a directory rather than a file.
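As a small illustration of how that mode column can be interpreted, the sketch below splits one line of git ls-tree output and tests the octal mode with Python's stat module. The helper name is_directory_entry is hypothetical, and the second sample line (a 100644 blob with a made-up hash and path) is invented purely for contrast.

```python
import stat


def is_directory_entry(line: str) -> bool:
    """Return True if an ls-tree line describes a directory (tree) entry."""
    # Fields are: mode, object type, SHA-1, path. Splitting at most three times
    # keeps any spaces inside the path intact.
    mode, kind, sha1, path = line.split(None, 3)
    return stat.S_ISDIR(int(mode, 8))  # the mode is printed in octal


tree_line = "040000 tree c7c367f2d5688dddc25e59525cc6b8efd0df914d MSDNConsoleApp"
# Hypothetical blob line, for contrast only; the hash and path are placeholders.
blob_line = "100644 blob 0000000000000000000000000000000000000000 example.txt"

print(is_directory_entry(tree_line), is_directory_entry(blob_line))
```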
Finally, the last 20 bytes of the index contain an SHA-1 hash representing the index itself, computed over all of the index content that precedes it. As expected, Git uses this SHA-1 value to validate the data integrity of the index.
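That integrity check is easy to reproduce. The following sketch assumes the whole index file has been read into memory, and it relies on the trailing hash covering every byte that precedes it, as Git's index-format documentation states. The function name index_checksum_ok is hypothetical, and the demonstration uses stand-in bytes, not a real index.

```python
import hashlib


def index_checksum_ok(data: bytes) -> bool:
    """Check that the last 20 bytes are the SHA-1 of everything before them."""
    if len(data) < 20:
        return False
    body, trailer = data[:-20], data[-20:]
    return hashlib.sha1(body).digest() == trailer


# Stand-in content; for a real check, pass the bytes of the repository's index file.
body = b"stand-in bytes, not a real index"
sample = body + hashlib.sha1(body).digest()
print(index_checksum_ok(sample))
```

Changing any byte of the body without recomputing the trailer makes the function return False, which is the condition Git treats as a corrupt index.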
While I’ve covered all of the entries in this article’s example index file, larger and more complex index files are the norm. The index file format supports additional extension data streams, such as:
• One that supports merging operations and merge-conflict resolution. It has the signature “REUC” (for resolve undo conflict).
• One for maintaining a cache of untracked files (these are files to be excluded from tracking, specified in the .gitignore and .git\info\exclude files and by the file pointed to by core.excludesfile). It has the signature “UNTR.”
• One to support a split-index mode in order to speed index updates for very large index files. It has the signature “link.”
The index’s extension feature makes it possible to continue adding to its capabilities.
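Because every extension begins with a 4-byte signature followed by a 32-bit length, a reader can step over extensions it doesn't recognize. The sketch below walks that sequence under two assumptions that this article does not spell out: the caller already knows the offset at which the extensions begin (the hypothetical start parameter), and the length is stored big-endian, as Git's index-format documentation specifies. The function name iter_extensions is hypothetical, and the sample bytes are synthetic.

```python
import struct
from typing import Iterator, Tuple


def iter_extensions(data: bytes, start: int) -> Iterator[Tuple[str, bytes]]:
    """Yield (signature, payload) for each extension between start and the checksum."""
    end = len(data) - 20              # the final 20 bytes are the index SHA-1
    pos = start
    while pos + 8 <= end:
        signature = data[pos:pos + 4].decode("ascii")
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])  # big-endian uint32
        yield signature, data[pos + 8:pos + 8 + length]
        pos += 8 + length


# Synthetic example: a 3-byte TREE payload, an empty "link" payload, then 20 filler
# bytes standing in for the trailing checksum.
sample = (b"TREE" + struct.pack(">I", 3) + b"abc"
          + b"link" + struct.pack(">I", 0)
          + b"\x00" * 20)
for signature, payload in iter_extensions(sample, 0):
    print(signature, len(payload))
```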
Wrapping Up
In this article, I reviewed the Git three-tree architecture and delved into details behind its index file. I showed you that Git updates the index in response to certain operations and that it also relies on information the index contains in order to carry out other operations.
It’s possible to use Git without thinking much about the index. Yet having knowledge about the index provides invaluable insight into Git’s core functionality while shedding light on how Git detects changes to files in the working directory, what the staging area is and why it’s useful, how Git manages merges, and why Git performs some operations so quickly. It also makes it easy to understand command-line variants of the checkout and rebase commands, and the difference between soft, mixed and hard resets. Such features let you specify whether the index, the working directory, or both should be updated when issuing certain commands. You’ll see such options when reading about Git workflows, strategies and advanced operations. The purpose of this article is to orient you to the important role the index plays so you can better digest the ways in which it can be leveraged.
Jonathan Waldman is a Microsoft Certified Professional who has worked with Microsoft technologies since their inception and who specializes in software ergonomics. Waldman is a member of the Pluralsight technical team and he currently leads institutional and private-sector software-development projects. He can be reached at jonathan.waldman@live.com.
Thanks to the following Microsoft technical experts for reviewing this article: Kraig Brockschmidt, Saeed Noursalehi, Ralph Squillace and Edward Thomson
Figure 8 The Git Index File Tree-Extension Object Data Format

Index File - Cached Tree-Extension Header

| Size | Field | Description |
|---|---|---|
| 4 bytes | TREE | Fixed signature for a cached tree-extension entry. |
| 4 bytes | | 32-bit number representing the length of TREE extension data |

Cached Tree-Extension Entry

| Size | Field | Description |
|---|---|---|
| Variable | Path | NUL-terminated path string (null only for the root tree). |
| ASCII number | Number of entries | ASCII number representing the number of entries in the index covered by this tree entry. |
| 1 byte | | 20H (space character) |
| ASCII number | Number of subtrees | ASCII number representing the number of subtrees this tree has. |
| 1 byte | | 0AH (linefeed character) |
| 20 bytes | Tree object’s SHA-1 | SHA-1 values of the tree object this entry produces. |