Page 36 - MSDN Magazine, August 2017
Following the 12-byte header is a list of n index entries, where n matches the number of entries described by the index header. The format for each index entry is presented in Figure 7. Git sorts index entries in ascending order based on the path/file name field.
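As a rough illustration of where n comes from, the following Python sketch reads the 12-byte header. It assumes the layout given in Git's index-format documentation (a 4-byte "DIRC" signature, a 4-byte version and a 4-byte entry count, all big-endian); the function name and the sample bytes are invented for this example.

```python
import struct

def read_index_header(data: bytes):
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, count

# Hypothetical header: index version 2 with three entries.
sample = b"DIRC" + (2).to_bytes(4, "big") + (3).to_bytes(4, "big")
print(read_index_header(sample))  # (2, 3)
```

Against a real repository, the same function could be fed the first 12 bytes of the .git/index file.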
The first 8 bytes represent the time the file was created as an offset from midnight of Jan. 1, 1970. The second 8 bytes represent the time the file was modified as an offset from midnight of Jan. 1, 1970. Each timestamp is stored as a 32-bit seconds value followed by a 32-bit nanosecond component (see Figure 7). Next are five 4-byte values (device, inode, mode, user id and group id) of file-attribute metadata related to the host OS. The only value used under Windows is the mode, which most often will be the octal 100644 I mentioned earlier when showing output from the ls-files command (this converts to the value 81A4H, stored in the 4-byte mode field as 00 00 81 A4, which you can see at position 26H in Figure 5).
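The octal-to-hex conversion of the mode can be checked with a few lines of Python. This is only a sketch: the split into a 4-bit object type and 9-bit permissions is taken from Git's index-format documentation rather than from this article, and the variable names are illustrative.

```python
mode = 0o100644                      # octal mode as reported by ls-files
print(f"{mode:04X}")                 # 81A4

# Assumed bit layout (per Git's index-format documentation):
object_type = mode >> 12             # 0b1000 means a regular file
permissions = mode & 0o777           # Unix permission bits
print(bin(object_type), oct(permissions))  # 0b1000 0o644

# The field is 4 bytes wide and big-endian, so the two high bytes are zero.
stored = mode.to_bytes(4, "big")
print(stored.hex(" "))               # 00 00 81 a4
```

Because the two leading bytes are zero, the 81 A4 pair sits two bytes into the mode field, which is consistent with the field starting at 24H and the value being visible at 26H.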
Following the metadata is the 4-byte length of the file contents. In Figure 5, this value starts at 030H, which shows 00 00 0A 15 (2,581 decimal), the length of the .gitattributes file on my system, as the directory listing shows.
At offset 034H is the 20-byte SHA-1 value for the blob object:
1ff0c423042b46cb1d617b81efb715defbe8054d.
Remember, this SHA-1 points to the blob object that contains the file contents for the file in question: .gitattributes.
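To make the link between the file contents and the SHA-1 concrete, here is a minimal Python sketch of how a blob ID can be computed. It assumes Git's standard blob hashing (the SHA-1 of the text "blob", a space, the content length, a NUL byte and then the content); the function name is invented for this example. Reproducing the exact value above would require the exact bytes Git stored, which can differ from the working-directory file if line-ending conversion is in effect.

```python
import hashlib

def blob_sha1(content: bytes) -> str:
    """Compute the SHA-1 Git would assign to a blob holding `content`."""
    header = b"blob %d\x00" % len(content)   # e.g. b"blob 2581\x00"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID.
print(blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```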
At 048H is a 2-byte value containing two 1-bit flags, a 2-bit merge-stage value and a 12-bit length of the path/file name for the current index entry. Of the two 1-bit flags, the high-order bit designates whether the index entry has its assume-unchanged flag set (typically done using the Git update-index plumbing command); the low-order bit indicates whether another two bytes of data precede the path/file name entry (this bit can be 1 only for index versions 3 and higher). The next 2 bits hold a merge-stage value from 0 to 3, as described earlier. The 12-bit value contains the length of the path/file name string.
If the extended flag is set, a 2-byte value holds the skip-worktree and intent-to-add bit flags, along with filler placeholders.
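The bit layout of the two flag fields can be sketched with masks and shifts. The masks below follow the high-to-low ordering listed in Figure 7; the function names and the sample value 000EH are hypothetical and not read from Figure 5.

```python
def decode_flags(flags: int) -> dict:
    """Split the 16-bit flags field of an index entry into its parts."""
    return {
        "assume_unchanged": bool(flags & 0x8000),  # highest bit
        "extended": bool(flags & 0x4000),          # next bit, version 3+ only
        "stage": (flags >> 12) & 0x3,              # 2-bit merge stage, 0 to 3
        "name_length": flags & 0x0FFF,             # 12-bit path/file name length
    }

def decode_extended_flags(ext: int) -> dict:
    """Split the optional second 16-bit flags field (version 3 or higher)."""
    return {
        "skip_worktree": bool(ext & 0x4000),       # bit below the reserved bit
        "intent_to_add": bool(ext & 0x2000),
    }

# Hypothetical value: no flags, stage 0, a 14-character name
# (the length of ".gitattributes").
print(decode_flags(0x000E))
```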
Finally, a variable-length sequence of bytes contains the path/file name. This value is terminated with one or more NUL characters. Following that termination is the next index entry (the next blob reference in the index) or one or more index extension entries (as you’ll see shortly).
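Putting the fields together, the sketch below parses a single index entry and works out where the next one begins. It is an illustration under stated assumptions, not Git's own code: it targets index versions 2 and 3, and it assumes (per Git's index-format documentation) that 1 to 8 NUL bytes pad each entry to a multiple of 8 bytes. The function name, the dictionary keys and the zeroed timestamps in the sample are invented for this example.

```python
import struct

# Ten 32-bit fields, the 20-byte SHA-1 and the 16-bit flags: 62 bytes in all.
ENTRY_FIXED = struct.Struct(">10I20sH")

def parse_entry(data: bytes, offset: int):
    """Parse one index entry; return (entry, offset_of_next_entry)."""
    (ctime_s, ctime_ns, mtime_s, mtime_ns, dev, ino,
     mode, uid, gid, size, sha1, flags) = ENTRY_FIXED.unpack_from(data, offset)
    pos = offset + ENTRY_FIXED.size
    if flags & 0x4000:                  # extended flag: 2 more flag bytes
        pos += 2
    end = data.index(b"\x00", pos)      # the path/file name is NUL terminated
    path = data[pos:end].decode("utf-8")
    # Assumption: NUL padding rounds the entry up to a multiple of 8 bytes.
    entry_len = (end - offset + 8) & ~7
    entry = {"mode": mode, "size": size, "sha1": sha1.hex(), "path": path}
    return entry, offset + entry_len

# Hypothetical entry for .gitattributes with placeholder (zero) timestamps.
sample = ENTRY_FIXED.pack(
    0, 0, 0, 0, 0, 0, 0o100644, 0, 0, 2581,
    bytes.fromhex("1ff0c423042b46cb1d617b81efb715defbe8054d"), 14,
) + b".gitattributes"
sample += b"\x00" * (80 - len(sample))  # 76 bytes of data padded to 80

entry, next_offset = parse_entry(sample, 0)
print(entry["path"], entry["size"], next_offset)  # .gitattributes 2581 80
```

In a real index file the first entry would start at offset 12, right after the header, and the returned offset would be passed back in to read the following entry.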
Earlier, I mentioned that Git doesn’t build tree objects until you commit what’s been staged. What that means is the index starts out with only path/file names and references to blob objects. As soon as you issue a commit, however, Git updates the index so it contains references to the tree objects it created during the last commit. If those directory references still exist in your working directory during the next commit, the cached tree object references can be used to reduce the work Git needs to do during the next commit. As you can see, the role of the index is multifaceted, and that’s why it’s described as an index, staging area and cache.
The index entry shown in Figure 7 supports only blob object references. To store tree objects, Git uses an extension.
Index Extensions
The index can include extension entries that store specialized data streams to provide additional information for the Git engine to consider as it monitors files in the working directory and when it prepares the next commit. To cache tree objects created during the last commit, Git adds a tree extension object to the index for the working directory’s root as well as for each subdirectory.
Figure 5, Marker 2, shows the final bytes of the index and captures the tree objects that are stored in the index. Figure 8 shows the format for the tree extension data.
The tree-extension data header, which appears at offset 284H, is composed of the string “TREE” (marking the start of the cached tree extension data) followed by a 32-bit value that indicates the length of the extension data that follows. Next are entries for each tree entry: The first entry is a variable-length null-terminated
05/08/2017  09:24 PM    <DIR>          .
05/08/2017  09:24 PM    <DIR>          ..
05/08/2017  09:24 PM             2,581 .gitattributes
05/08/2017  09:24 PM             4,565 .gitignore
05/08/2017  09:24 PM    <DIR>          MSDNConsoleApp
05/08/2017  09:24 PM             1,009 MSDNConsoleApp.sln
               3 File(s)          8,155 bytes
               3 Dir(s)  92,069,982,208 bytes free

Figure 7 The Git Index File-Index Entry Data Format
Index File - Index Entry

Size                            Field                                        Description
4 bytes                         32-bit created time in seconds               Number of seconds since Jan. 1, 1970, 00:00:00.
4 bytes                         32-bit created time - nanosecond component   Nanosecond component of the created time in seconds value.
4 bytes                         32-bit modified time in seconds              Number of seconds since Jan. 1, 1970, 00:00:00.
4 bytes                         32-bit modified time - nanosecond component  Nanosecond component of the modified time in seconds value.
4 bytes                         device                                       Metadata associated with the file (device through group id); these originate from file attributes used on the Unix OS.
4 bytes                         inode
4 bytes                         mode
4 bytes                         user id
4 bytes                         group id
4 bytes                         file content length                          Number of bytes of content in the file.
20 bytes                        SHA-1                                        Corresponding blob object’s SHA-1 value.
2 bytes                         Flags (high to low bits)                     1-bit: assume-valid/assume-unchanged flag
                                                                             1-bit: extended flag (must be 0 for versions less than 3; if 1, then an additional 2 bytes follow before the path/file name)
                                                                             2-bit: merge stage
                                                                             12-bit: path/file name length (if less than 0xFFF)
2 bytes (version 3 or higher)   Flags (high to low bits)                     1-bit: future use
                                                                             1-bit: skip-worktree flag (sparse checkout)
                                                                             1-bit: intent-to-add flag (git add -N)
                                                                             13-bit: unused, must be zero
Variable length                 Path/file name                               NUL terminated