Page 30 - MSDN Magazine, August 2017

DEVOPS
Git Internals: Architecture and Index Files
Jonathan Waldman
In my last article (msdn.com/magazine/mt809117), I showed how Git uses a directed acyclic graph (DAG) to organize a repo’s commit objects. I also explored the blob, tree and tag objects to which commit objects can refer. I concluded the article with an introduction to branching, including the distinction between HEAD and head. That article is a prerequisite to this one, in which I’ll discuss the Git “three-tree” architecture and the importance of its index file. Understanding these additional Git internals builds on that foundation. It will make you a more effective Git user and give you new insights as you explore the various Git operations fronted by the graphical Git tooling in the Visual Studio IDE.
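Each of those objects is named by a hash of its content. The following minimal Python sketch shows how a blob’s object ID is computed from a file’s bytes. It makes two assumptions: the repository uses Git’s default SHA-1 object format, and no line-ending or other filter conversion is applied before hashing. The function name blob_id is illustrative only and is not part of any Git API. For the same bytes, the result should match what git hash-object reports.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content.

    Git hashes a header ("blob", a space, the size in bytes, a NUL byte)
    followed by the raw file bytes. Assumes the default SHA-1 object format;
    repositories created with the SHA-256 format use a different hash.
    """
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


# The empty file always maps to the same well-known blob ID.
print(blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the ID depends only on content, two files with identical bytes share one blob, and any edit produces a new blob with a new ID.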
Recall from the last article that Visual Studio communicates with Git using a Git API, and that the Visual Studio IDE Git tooling abstracts away the complexity and capabilities of the underlying Git engine. That’s a boon for developers who want to implement a version-control workflow without relying on the Git command-line interface (CLI). Alas, the otherwise helpful Git abstractions of the IDE can sometimes lead to confusion. For example, take the basic workflow of adding a project to Git source control, modifying project files, staging them and then committing the staged files. To do that, you open the Team Explorer Changes pane to view the list of changed files and then select the ones you want to stage. Consider the leftmost image in Figure 1, which shows that I changed two files in the working directory (Marker 1).
In the next image to the right, I staged one of those changed files: Program.cs (Marker 2). When I did that, Program.cs appeared to have “moved” from the Changes list to the Staged Changes list. If I then modify and save the working directory’s copy of Program.cs, it continues to appear in the Staged Changes section (Marker 3), but it also appears in the Changes section (Marker 4)! Without understanding what Git is doing behind the scenes, you might be flummoxed until you figure out that two “copies” of Program.cs exist: one in the working folder and one in the Git internal database of objects. Even if you realize that, you might have no insight into what would happen when you unstage the staged copy, try to stage the second changed copy of Program.cs, undo changes to the working copy or switch branches.
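The double listing follows from how Git reports status. Staged changes are paths whose index entry differs from the HEAD commit. Unstaged changes are paths whose working-directory content differs from the index. The following toy Python model replays the Figure 1 scenario under simplifying assumptions. Each of the three states (HEAD, index and working directory) is a dict that maps a path to a blob ID. Untracked files are ignored. The second file name, Util.cs, and the file contents are hypothetical. This is a sketch of the comparison logic, not Git’s or Visual Studio’s implementation.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # SHA-1 of "blob <size>\0" + content, as Git computes for blob objects.
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()


def status(head: dict, index: dict, worktree: dict) -> tuple:
    """Return (staged, unstaged) sorted path lists.

    staged:   paths whose index entry differs from HEAD (Staged Changes list)
    unstaged: tracked paths whose working copy differs from the index
              (Changes list). Untracked files are ignored in this toy model.
    """
    staged = sorted(p for p in index.keys() | head.keys()
                    if index.get(p) != head.get(p))
    unstaged = sorted(p for p in index if worktree.get(p) != index[p])
    return staged, unstaged


# Start from a clean commit: HEAD, index and working directory all agree.
# "Util.cs" is a hypothetical name for the second changed file.
head = {"Program.cs": blob_id(b"class Program {}\n"),
        "Util.cs": blob_id(b"class Util {}\n")}
index = dict(head)
worktree = dict(head)

# Marker 1: edit both files in the working directory.
worktree["Program.cs"] = blob_id(b"class Program { /* edit 1 */ }\n")
worktree["Util.cs"] = blob_id(b"class Util { /* edit */ }\n")
print(status(head, index, worktree))  # ([], ['Program.cs', 'Util.cs'])

# Marker 2: stage Program.cs by recording its current blob ID in the index.
index["Program.cs"] = worktree["Program.cs"]
print(status(head, index, worktree))  # (['Program.cs'], ['Util.cs'])

# Markers 3 and 4: edit Program.cs again without re-staging it.
worktree["Program.cs"] = blob_id(b"class Program { /* edit 2 */ }\n")
print(status(head, index, worktree))  # (['Program.cs'], ['Program.cs', 'Util.cs'])
```

After the second edit, Program.cs shows up in both lists. The index still holds the blob from the first edit, which differs from HEAD, while the working copy has moved on to a third version.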
To truly grasp what Git is doing as you stage, unstage, undo, commit and check out files, you first must understand how Git is architected.
The Git Three-Tree Architecture
Git implements a three-tree architecture (a “tree” in this context refers to a directory structure and files). Working from left to right
This article discusses:
• Git’s three-tree architecture
• How the Git index works
• Index extensions
Technologies discussed:
Visual Studio 2017, Git for Windows 2.10