Page 36 - MSDN Magazine, August 2017

Following the 12-byte header is a list of n index entries, where n matches the number of entries described by the index header. The format for each index entry is presented in Figure 7. Git sorts index entries in ascending order based on the path/file name field.
The first 8 bytes represent the time the file was created as an offset from midnight of Jan. 1, 1970 (4 bytes of seconds followed by 4 bytes of nanoseconds). The second 8 bytes represent the time the file was modified, in the same form. Next are five 4-byte values (device, inode, mode, user id and group id) of file-attribute metadata related to the host OS. The only value used under Windows is the mode, which most often will be the octal 100644 I mentioned earlier when showing output from the ls-files command (stored in a 4-byte field, this converts to 81A4H, which you can see at position 26H in Figure 5).
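To make the layout of these first 36 bytes concrete, here is a minimal Python sketch that unpacks them as nine big-endian 32-bit values. The function name decode_stat_fields and the sample timestamp are hypothetical, chosen only for illustration; the sample bytes are built in the script rather than read from a real index file.

```python
import struct
from datetime import datetime, timezone

def decode_stat_fields(entry: bytes) -> dict:
    """Decode the first 36 bytes of an index entry (big-endian fields)."""
    (ctime_s, ctime_ns, mtime_s, mtime_ns,
     dev, ino, mode, uid, gid) = struct.unpack_from(">9I", entry, 0)
    return {
        "created": datetime.fromtimestamp(ctime_s, tz=timezone.utc),
        "modified": datetime.fromtimestamp(mtime_s, tz=timezone.utc),
        "device": dev,
        "inode": ino,
        "mode_octal": oct(mode),
        "mode_hex": hex(mode),
        "user_id": uid,
        "group_id": gid,
    }

# Hypothetical sample: an arbitrary 2017 timestamp and the common file mode.
sample = struct.pack(">9I", 1494278640, 0, 1494278640, 0,
                     0, 0, 0o100644, 0, 0)
fields = decode_stat_fields(sample)
print(fields["mode_octal"], fields["mode_hex"])  # 0o100644 0x81a4
```

The same value printed two ways shows why octal 100644 appears in ls-files output while the bytes 81 A4 appear in a hex dump.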
Following the metadata is the 4-byte length of the file contents. In Figure 5, this value starts at 030H, which shows 00 00 0A 15 (2,581 decimal), the length of the .gitattributes file on my system, as the directory listing of my working directory shows.
At offset 034H is the 20-byte SHA-1 value for the blob object:
1ff0c423042b46cb1d617b81efb715defbe8054d.
Remember, this SHA-1 points to the blob object that contains the file contents for the file in question: .gitattributes.
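As a reminder of where such a value comes from, the sketch below computes a blob's SHA-1 the way Git is generally documented to do it: by hashing the string "blob", a space, the content length in decimal, a NUL byte and then the content. The helper name blob_sha1 is hypothetical. It cannot reproduce the hash quoted above, because the contents of that .gitattributes file aren't part of this text, so it hashes an empty blob instead.

```python
import hashlib

def blob_sha1(content: bytes) -> str:
    """Return the SHA-1 Git would assign to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known hash.
print(blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```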
At 048H is a 2-byte value containing two 1-bit flags, a 2-bit merge-stage value, and a 12-bit length of the path/file name for the current index entry. Of the two 1-bit flags, the high-order bit designates whether the index entry has its assume-unchanged flag set (typically done using the Git update-index plumbing command); the low-order bit indicates whether another two bytes of data precede the path/file name entry (this bit can be 1 only for index versions 3 and higher). The next 2 bits hold a merge-stage value from 0 to 3, as described earlier. The 12-bit value contains the length of the path/file name string.
If the extended flag was set, a 2-byte value holds the skip-worktree and intent-to-add bit flags, along with filler placeholders.
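The bit layout of the two flag words can be sketched with a few masks and shifts. The function name decode_flags and its dictionary keys are hypothetical; the bit positions follow the high-to-low ordering described here and in Figure 7.

```python
def decode_flags(flags: int, extended_flags: int = 0) -> dict:
    """Split the 16-bit flags word (and optional extended word) into fields."""
    return {
        "assume_unchanged": bool(flags & 0x8000),   # highest bit
        "extended": bool(flags & 0x4000),           # next bit
        "merge_stage": (flags >> 12) & 0x3,         # 2 bits, values 0 to 3
        "name_length": flags & 0x0FFF,              # low 12 bits
        # Extended word: top bit reserved, then skip-worktree, intent-to-add.
        "skip_worktree": bool(extended_flags & 0x4000),
        "intent_to_add": bool(extended_flags & 0x2000),
    }

# ".gitattributes" is 14 characters long, so its flags word would be 000EH
# when no flag bits are set and the merge stage is 0.
print(decode_flags(0x000E)["name_length"])  # 14
```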
Finally, a variable-length sequence of bytes contains the path/file name. This value is terminated with one or more NUL characters. Following that termination is the next index entry or one or more index extension entries (as you’ll see shortly).
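Putting the pieces together, the following sketch parses one complete index entry from a byte buffer and reports where the next entry begins. It rests on one assumption not stated in the text: for index versions 2 and 3, the 1 to 8 NUL bytes pad each entry to a multiple of 8 bytes (version 4 uses a different path encoding and isn't handled). The names ENTRY_HEAD and parse_entry are hypothetical, and the sample entry is rebuilt from the values quoted above with zeroed timestamps.

```python
import struct

# Fixed-size part of an entry: ten 32-bit fields, 20-byte SHA-1, 16-bit flags.
ENTRY_HEAD = struct.Struct(">10I20sH")  # 62 bytes, big-endian

def parse_entry(buf: bytes, offset: int = 0):
    """Return (entry dict, offset of the next entry) for a version 2/3 index."""
    fields = ENTRY_HEAD.unpack_from(buf, offset)
    size, sha1, flags = fields[9], fields[10], fields[11]
    fixed = ENTRY_HEAD.size
    if flags & 0x4000:          # extended flag: 2 more bytes precede the name
        fixed += 2
    name_len = flags & 0x0FFF
    start = offset + fixed
    if name_len == 0x0FFF:      # length did not fit in 12 bits: scan for NUL
        name_len = buf.index(b"\0", start) - start
    name = buf[start:start + name_len].decode("utf-8", errors="replace")
    # Assumption: 1 to 8 NULs pad the entry to a multiple of 8 bytes.
    total = (fixed + name_len + 8) & ~7
    entry = {"mode": fields[6], "size": size, "sha1": sha1.hex(), "name": name}
    return entry, offset + total

# Hypothetical entry rebuilt from the values quoted in the text.
sha = bytes.fromhex("1ff0c423042b46cb1d617b81efb715defbe8054d")
name = b".gitattributes"
raw = ENTRY_HEAD.pack(0, 0, 0, 0, 0, 0, 0o100644, 0, 0, 2581, sha, len(name))
raw += name + b"\0" * 4         # 62 + 14 + 4 = 80 bytes
entry, next_offset = parse_entry(raw)
print(entry["name"], entry["size"], next_offset)  # .gitattributes 2581 80
```

Calling parse_entry repeatedly, feeding each returned offset back in, would walk the n entries that follow the 12-byte header.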
Earlier, I mentioned that Git doesn’t build tree objects until you commit what’s been staged. What that means is the index starts out with only path/file names and references to blob objects. As soon as you issue a commit, however, Git updates the index so it contains references to the tree objects it created during the last commit. If those directory references still exist in your working directory during the next commit, the cached tree object references can be used to reduce the work Git needs to do during the next commit. As you can see, the role of the index is multifaceted, and that’s why it’s described as an index, staging area and cache.
The index entry shown in Figure 7 supports only blob object references. To store tree objects, Git uses an extension.
Index Extensions
The index can include extension entries that store specialized data streams to provide additional information for the Git engine to consider as it monitors files in the working directory and when it prepares the next commit. To cache tree objects created during the last commit, Git adds a tree extension object to the index for the working directory’s root as well as for each sub-directory.
Figure 5, Marker 2, shows the final bytes of the index and captures the tree objects that are stored in the index. Figure 8 shows the format for the tree-extension data.
The tree-extension data header, which appears at offset 284H, is composed of the string “TREE” (marking the start of the cached tree extension data) followed by a 32-bit value that indicates the length of the extension data that follows. Next are entries for each tree entry: The first entry is a variable-length null-terminated
05/08/2017  09:24 PM    <DIR>          .
05/08/2017  09:24 PM    <DIR>          ..
05/08/2017  09:24 PM             2,581 .gitattributes
05/08/2017  09:24 PM             4,565 .gitignore
05/08/2017  09:24 PM    <DIR>          MSDNConsoleApp
05/08/2017  09:24 PM             1,009 MSDNConsoleApp.sln
               3 File(s)          8,155 bytes
               3 Dir(s)  92,069,982,208 bytes free

Figure 7 The Git Index File-Index Entry Data Format
Index File - Index Entry
Size | Field | Description
4 bytes | 32-bit created time in seconds | Number of seconds since Jan. 1, 1970, 00:00:00.
4 bytes | 32-bit created time - nanosecond component | Nanosecond component of the created time in seconds value.
4 bytes | 32-bit modified time in seconds | Number of seconds since Jan. 1, 1970, 00:00:00.
4 bytes | 32-bit modified time - nanosecond component | Nanosecond component of the modified time in seconds value.
4 bytes | device | Metadata associated with the file; these originate from file attributes used on the Unix OS (applies to device through group id).
4 bytes | inode |
4 bytes | mode |
4 bytes | user id |
4 bytes | group id |
4 bytes | file content length | Number of bytes of content in the file.
20 bytes | SHA-1 | Corresponding blob object’s SHA-1 value.
2 bytes | Flags (high to low bits) | 1 bit: assume-valid/assume-unchanged flag
 | | 1 bit: extended flag (must be 0 for versions less than 3; if 1, an additional 2 bytes follow before the path/file name)
 | | 2 bits: merge stage
 | | 12 bits: path/file name length (if less than 0xFFF)
2 bytes (version 3 or higher) | Flags (high to low bits) | 1 bit: future use
 | | 1 bit: skip-worktree flag (sparse checkout)
 | | 1 bit: intent-to-add flag (git add -N)
 | | 13 bits: unused, must be zero
Variable length | Path/file name | NUL terminated