Page 31 - MSDN Magazine, August 2017

Figure 1 The Team Explorer Changes Pane Can Show the Same File in Its Changes and Staged Changes Sections
In Figure 2, the first tree is the collection of files and folders in the working directory (the OS directory that contains the hidden .git folder); the second tree is typically stored in a single binary file called index, located in the root of the .git folder; the third tree is composed of Git objects that represent the DAG (recall that SHA-1-named Git objects are located in two-hex-digit-named folders under .git\objects and can also be stored in "pack" files located in .git\objects\pack and in file paths defined by the .git\objects\info\alternates file). Keep in mind that the Git repo is defined by all files that sit in the .git folder. Often, people refer to the DAG as the Git repo, and that's not quite accurate: The index and the DAG are both contained in the Git repo.
Notice that while each tree stores a directory structure and files, each leverages different data structures in order to retain tree-specific metadata and to optimize storage and retrieval. The first tree (the working directory tree, also called "the working tree") is plainly the OS files and folders (no special data structures there, other than what's at the OS level) and serves the needs of the software developer and Visual Studio; the second tree (the Git index) straddles the working directory and the commit objects that form the DAG, thereby helping Git perform speedy working-directory file-content comparisons and quick commits; the third tree (the DAG) makes it possible for Git to track a history of commits, as discussed in the previous article. In its capacity as a robust version control system, Git adds helpful metadata to the items it stores in the index and in commit objects. For example, the metadata it stores in the index helps it detect changes to files in the working directory, while the metadata it stores in commit objects helps it track who issued the commit and for what reason.
To review the three trees in the three-tree architecture and to put some perspective around the remainder of this article's focus: You already know how the working-directory tree functions, because it's actually the OS file system you're already well-versed in using. And if you read my earlier article, you should have good working knowledge of the DAG. Thus, at this point, the missing link is the index tree (hereafter, "the index") that straddles the working directory and the DAG. In fact, the index plays such an important role that it's the sole subject of the remainder of this article.
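To make the on-disk layout of the second and third trees a little more concrete, here is a minimal Python sketch. It assumes a repository that uses SHA-1 object names, as described above, and it relies on two well-known details of Git's formats: a blob's name is the SHA-1 of the header "blob <size>" plus a NUL byte followed by the file content, and the index file begins with a 12-byte header (the signature DIRC, a version number, and an entry count). The function names are hypothetical and chosen only for illustration. Paths are shown with forward slashes; on Windows they appear as .git\objects\...

```python
import hashlib
import struct
from pathlib import PurePosixPath


def blob_object_id(content: bytes) -> str:
    """Name a blob the way Git does: SHA-1 over 'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> PurePosixPath:
    """A loose object lives in a folder named after its first two hex digits."""
    return PurePosixPath(".git/objects") / object_id[:2] / object_id[2:]


def read_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the first 12 bytes of the index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


if __name__ == "__main__":
    oid = blob_object_id(b"")  # the well-known ID of an empty file
    print(oid)
    print(loose_object_path(oid))
```

To try the header reader against a real repository, you could pass it the bytes of the index file, for example `read_index_header(open(".git/index", "rb").read())`. Objects that Git has moved into pack files won't be found at the loose-object path.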
How the Index Works
You might have heard the friendly advice that the index is synonymous with the "staging area." While that's somewhat accurate, to speak of it that way belies its true role, which is not only to support a staging area, but also to facilitate the ability of Git to detect changes to files in your working directory; to mediate the branch-merge process, so you can resolve conflicts on a file-by-file basis and safely abort the merge at any time; and to convert staged files and folders into tree objects whose references are written to the next commit object. Git also uses the index to retain information about files in the working tree and about objects retrieved from the DAG, thus further leveraging the index as a type of cache. Let's investigate the index more thoroughly.
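The change-detection role can be sketched in a few lines of Python. This is a simplified illustration, not Git's actual implementation: it assumes the cache keeps only a file's size, modification time and blob ID, whereas the real index records more fields per entry. The names CachedEntry, stage and is_modified are hypothetical. The idea is that a cheap comparison of file-system metadata usually settles the question, and the file content is hashed only when that metadata differs.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class CachedEntry:
    """What this toy cache remembers about a staged file."""
    size: int
    mtime_ns: int
    object_id: str


def hash_blob(content: bytes) -> str:
    """SHA-1 over 'blob <size>' + NUL + content, as Git names blobs."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def stage(path: str) -> CachedEntry:
    """Record the file's stat data and content hash at staging time."""
    st = os.stat(path)
    with open(path, "rb") as f:
        return CachedEntry(st.st_size, st.st_mtime_ns, hash_blob(f.read()))


def is_modified(path: str, entry: CachedEntry) -> bool:
    """Compare cheap stat data first; hash the content only if it differs."""
    st = os.stat(path)
    if (st.st_size, st.st_mtime_ns) == (entry.size, entry.mtime_ns):
        return False  # fast path: metadata matches, assume unchanged
    with open(path, "rb") as f:
        return hash_blob(f.read()) != entry.object_id
```

With this design, a file whose timestamp changed but whose content did not is still reported as unmodified, at the cost of one extra hash.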
The index implements its own self-contained file system, giving it the ability to store references to folders and files along with metadata about them. How and when Git updates this index depends on the kind of Git command issued and the command options specified (if you're so inclined, you can even use the Git update-index plumbing command to manage the index yourself), so exhaustive coverage here isn't possible. However, as you work with the Visual Studio Git tooling, it's helpful to be aware of the primary ways in which Git updates the index and in which Git uses information stored in the index. Figure 3 shows that Git updates the index with working directory data when you stage a file, and it updates the index with DAG data when you initiate a merge (if there are merge conflicts), clone or pull, or switch branches. On the other hand, Git relies on information stored in the index when it updates the DAG after you issue a commit, and when it updates the working directory after you clone or pull, or after you switch branches. Once you realize that Git relies on the index and that the index straddles so many Git operations, you'll begin to appreciate the advanced Git commands that modify the index, effectively empowering you to finesse how Git operates.
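The direction of these data flows can be modeled with a deliberately tiny Python sketch. It is a toy under stated assumptions, not a description of Git's internals: each tree is a plain dictionary mapping a path to its content, the history is a simple list of snapshots rather than a true DAG, and merges and conflicts are left out. The class name ToyRepo and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    working: dict[str, str] = field(default_factory=dict)        # path -> content on disk
    index: dict[str, str] = field(default_factory=dict)          # path -> staged content
    commits: list[dict[str, str]] = field(default_factory=list)  # snapshots, oldest first

    def stage(self, path: str) -> None:
        """Working directory -> index."""
        self.index[path] = self.working[path]

    def commit(self) -> int:
        """Index -> history: snapshot whatever is currently staged."""
        self.commits.append(dict(self.index))
        return len(self.commits) - 1

    def checkout(self, commit_id: int) -> None:
        """History -> index -> working directory."""
        self.index = dict(self.commits[commit_id])
        self.working = dict(self.index)


if __name__ == "__main__":
    repo = ToyRepo()
    repo.working["a.txt"] = "v1"
    repo.stage("a.txt")
    first = repo.commit()
    repo.working["a.txt"] = "v2"  # edited but not staged
    second = repo.commit()        # still records "v1", because the index is unchanged
    print(repo.commits[second]["a.txt"])
```

The point of the toy is that a commit is built from the index, not from the working directory, which is why an edit that hasn't been staged doesn't reach the next commit.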
Figure 2 The Git Three-Tree Architecture Leverages the All-Important Index File for Its Smart and Efficient Performance
Working Directory. Where stored: folder containing .git (OS file system)
Index (part of the Git repo). Where stored: .git\index (single binary file)
DAG (part of the Git repo). Where stored: .git\objects (compressed binary files)