MSDN Magazine, June 2017


Commit bfeb09 contains a tree property with ID ca853d. Figure 7, Marker 2 shows the cat-file -p ca853d output. Each entry in a tree object has four properties: permissions, corresponding to the POSIX permissions mask of the object (040000 = Directory, 100644 = Regular non-executable file, 100664 = Regular non-executable group-writeable file, 100755 = Regular executable file, 120000 = Symbolic link, and 160000 = Gitlink); type (tree or blob); SHA-1 (for the tree or blob); and name. The name is the folder name (for tree objects) or the file name (for blob objects). Observe that this tree object is composed of three blob objects and another tree object. You can see that the three blobs refer to files .gitattributes, .gitignore and DemoConsole.sln, and that the tree refers to folder DemoConsoleApp (Figure 7, Marker 3). Although tree object ca853d is associated with the project's second commit, its first two blobs represent files .gitattributes and .gitignore, which were added during the first commit (see Figure 4, Marker 1)! These files appear in the tree for the second commit because each commit's tree describes the whole project at that point: everything carried forward from the previous commit along with the changes captured by the current commit. To "walk the tree" one level deeper, Figure 7, Marker 3 shows the cat-file -p a763da output, showing three more blobs (App.config, DemoConsoleApp.csproj and Program.cs) and another tree (folder Properties).
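To make the four properties of a tree entry concrete, here is a minimal Python sketch of how a tree object's raw body can be decoded. It assumes the standard Git layout (mode, a space, the name, a NUL byte, then 20 raw SHA-1 bytes per entry). The helper names encode_entry and parse_tree are hypothetical, and the SHA-1 values are made-up stand-ins, not the IDs shown in Figure 7.

```python
import hashlib


def encode_entry(mode: str, name: str, sha_hex: str) -> bytes:
    """Encode one tree entry: '<mode> <name>\\0' followed by 20 raw SHA-1 bytes."""
    return mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0" + bytes.fromhex(sha_hex)


def parse_tree(body: bytes):
    """Decode a raw tree body into (mode, type, sha1, name) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)      # end of the mode field
        nul = body.index(b"\0", space)     # end of the name field
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        sha_hex = body[nul + 1:nul + 21].hex()
        # Git stores directory modes without the leading zero ("40000").
        if mode == "40000":
            kind = "tree"
        elif mode == "160000":
            kind = "commit"                # a gitlink points at a commit
        else:
            kind = "blob"
        entries.append((mode.zfill(6), kind, sha_hex, name))
        pos = nul + 21                     # skip past the 20-byte SHA-1
    return entries


def fake_sha(label: str) -> str:
    """Made-up 40-character ID for illustration only."""
    return hashlib.sha1(label.encode("utf-8")).hexdigest()


# A stand-in for the root tree described above: three blobs and one subtree.
tree_body = b"".join([
    encode_entry("100644", ".gitattributes", fake_sha("attrs")),
    encode_entry("100644", ".gitignore", fake_sha("ignore")),
    encode_entry("100644", "DemoConsole.sln", fake_sha("sln")),
    encode_entry("40000", "DemoConsoleApp", fake_sha("app-folder")),
])

entries = parse_tree(tree_body)
for mode, kind, sha_hex, name in entries:
    print(mode, kind, sha_hex, name)
```

Each printed row has the same four columns that cat-file -p reports for a tree: permissions, type, SHA-1 and name.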
Blob objects are again just zlib-compressed files. If the uncompressed file contains text, you can extract a blob's entire content using the same cat-file command along with the blob ID (Figure 7, Marker 5). Because blob objects represent files, Git uses the SHA-1 blob ID to determine whether a file changed from the previous commit; it also uses SHA-1 values when diffing any two commits in the repo.
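The relationship between a file's content, its blob ID and the compressed object can be sketched in a few lines of Python. This sketch assumes Git's standard loose-object layout, in which the header "blob <size>" and a NUL byte are prepended to the content before hashing and compressing. The function names are hypothetical.

```python
import hashlib
import zlib


def blob_payload(content: bytes) -> bytes:
    """Prefix the content with the header Git hashes and stores: 'blob <size>\\0'."""
    return b"blob " + str(len(content)).encode("ascii") + b"\0" + content


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 blob ID for the given file content."""
    return hashlib.sha1(blob_payload(content)).hexdigest()


def store_blob(content: bytes) -> bytes:
    """Return the zlib-compressed bytes of a loose blob object."""
    return zlib.compress(blob_payload(content))


def read_blob(stored: bytes) -> bytes:
    """Decompress a loose blob object and strip its header."""
    raw = zlib.decompress(stored)
    _header, _, body = raw.partition(b"\0")
    return body


original = b"hello world\n"
edited = b"hello world!\n"

print(git_blob_id(original))
print(git_blob_id(original) == git_blob_id(original))  # same content, same ID
print(git_blob_id(original) == git_blob_id(edited))    # changed content, new ID
print(read_blob(store_blob(original)) == original)     # round trip is lossless
```

Comparing two blob IDs is enough to tell whether a file changed; the content itself never has to be compared.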
Tag Object
The cryptic alphanumeric nature of SHA-1 values can be a bit unwieldy to communicate. The tag object lets you assign a friendly name to any commit, tree or blob object, although it's most common to tag only commit objects. There are two types of tag object: lightweight and annotated. Both types appear as files in the .git\refs\tags folder, where the file name is the tag name. The content of a lightweight tag file is the SHA-1 of an existing commit, tree or blob object. The content of an annotated tag file is the SHA-1 of a tag object, which is stored in the .git\objects folder along with all other Git objects. To view the content of a tag object, use the same cat-file -p command. You'll see the SHA-1 value of the object that was tagged, along with the object type, tag author, date-time and tag message. There are a number of ways to tag commits in Visual Studio. One way is to click the Create Tag link in the Commit Details window (Figure 4). Tag names appear in the Commit Details window (Figure 4, Marker 3) and in View History reports (see the earlier Figure 3, Marker 9).
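The difference between the two tag types can be illustrated with a short Python sketch. It assumes the usual text layout of an annotated tag object (object, type, tag and tagger lines, a blank line, then the message). The commit ID, tag name and tagger below are invented for illustration, and the helper names are hypothetical.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """SHA-1 of a Git object: header '<kind> <size>\\0' followed by the body."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def annotated_tag_body(target: str, target_type: str, tag_name: str,
                       tagger: str, message: str) -> bytes:
    """Build the text stored inside an annotated tag object."""
    lines = [
        f"object {target}",      # SHA-1 of the object that was tagged
        f"type {target_type}",   # commit, tree or blob
        f"tag {tag_name}",
        f"tagger {tagger}",      # name, e-mail, timestamp and time zone
        "",
        message,
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


# Made-up stand-in for a real commit ID.
commit_id = hashlib.sha1(b"stand-in commit").hexdigest()

tag_body = annotated_tag_body(
    commit_id, "commit", "v1.0",
    "Jane Doe <jane@example.com> 1496275200 +0000",  # hypothetical tagger
    "First release",
)
tag_id = object_id("tag", tag_body)

# What each kind of tag file under refs\tags would hold:
lightweight_ref = commit_id + "\n"   # points straight at the commit
annotated_ref = tag_id + "\n"        # points at the tag object instead

print(tag_body.decode("utf-8"))
print("lightweight:", lightweight_ref.strip())
print("annotated:  ", annotated_ref.strip())
```

The lightweight file holds the commit ID itself, while the annotated file holds the ID of a separate tag object that in turn names the commit.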
Git populates the info and pack folders in the .git\objects folder when it applies storage optimizations to objects in the repo. I'll discuss these folders and the Git file-storage optimizations more fully in an upcoming article.
Armed with knowledge about the four Git object types, you can see why Git is referred to as a content-addressable file system: any kind of content across any number of files and folders can be reduced to a single SHA-1 value. That SHA-1 value can later be used to accurately and reliably retrieve the same content. Put another way, the SHA-1 is the key and the content is the value in an exalted implementation of the usually prosaic key-driven lookup table. Additionally, Git can economize when file content hasn't changed between commits because an unchanged file produces the same SHA-1 value. The commit object can therefore reference the same SHA-1 blob or tree ID value used by a previous commit without having to create any new objects, which means no new copies of files!
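The key-and-value idea can be sketched as a tiny in-memory store in Python. This is an illustration of content addressing in general, not Git's actual storage code; the ContentStore class and the file contents are hypothetical.

```python
import hashlib


class ContentStore:
    """A toy content-addressable store: the SHA-1 of the content is its key."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # Hash the content with a Git-style blob header to derive the key.
        key = hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()
        # Identical content always maps to the same key, so it is stored once.
        self._objects.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()

# First "commit": two files.
commit1 = {
    ".gitignore": store.put(b"bin/\nobj/\n"),
    "Program.cs": store.put(b"// version 1\n"),
}

# Second "commit": only Program.cs changed.
commit2 = {
    ".gitignore": store.put(b"bin/\nobj/\n"),
    "Program.cs": store.put(b"// version 2\n"),
}

print(commit1[".gitignore"] == commit2[".gitignore"])  # unchanged file, same key
print(len(store))  # three stored objects, not four
```

The second snapshot reuses the key of the unchanged file, so the store grows by one object rather than two.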
Branching
Before truly understanding what a Git branch is, you must master how Git internally defines a branch. Ultimately, this boils down to grasping the purpose of two key terms: head and HEAD.
The first, head (all lowercase), is a reference Git maintains for every new commit object. To illustrate how this works, Figure 8 shows several commits and branch operations. For Commit 01, Git creates the first head reference for the repo and names it master by default (master is an arbitrary name with no special meaning other than being the default; Git teams often rename this reference). When Git creates a new head reference, it creates a text file in its refs\heads folder and places the full SHA-1 for the new commit object into that file. For Commit 01, this means that Git creates a file called master and places the SHA-1 for commit object A1 into that file. For Commit 02, Git updates the master head file in the heads folder by removing the old SHA-1 value and replacing it with the full SHA-1 commit ID for A2. Git does the same thing for Commit 03: It updates the head file called master in the heads folder so that it holds the full commit ID for A3.
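The way the master head file is rewritten on each commit can be mimicked with a few lines of Python. This sketch works in a temporary folder rather than a real repo, uses made-up commit IDs for A1, A2 and A3, and relies on hypothetical helper names; it only models the file update, not the rest of what Git does during a commit.

```python
import hashlib
import tempfile
from pathlib import Path


def update_head(git_dir: Path, branch: str, commit_id: str) -> None:
    """Overwrite the branch's head file with the full commit ID."""
    ref = git_dir / "refs" / "heads" / branch
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(commit_id + "\n")


def read_head(git_dir: Path, branch: str) -> str:
    """Return the commit ID the branch currently points to."""
    return (git_dir / "refs" / "heads" / branch).read_text().strip()


# Made-up 40-character IDs standing in for commit objects A1, A2 and A3.
commits = {label: hashlib.sha1(label.encode("utf-8")).hexdigest()
           for label in ("A1", "A2", "A3")}

history = []
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    for label in ("A1", "A2", "A3"):
        update_head(git_dir, "master", commits[label])
        # After each commit the file holds exactly one ID: the newest commit.
        history.append(read_head(git_dir, "master"))
        print(label, "->", history[-1])
```

At every step the file contains a single commit ID, which is why a branch name identifies one commit rather than a list of commits.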
You might have guessed correctly that the name of the file called master in the heads folder is the branch name for the commit object to which it points. Perhaps oddly at first, a branch name points to a single commit object rather than to a sequence of commits (more on this specific concept in a moment).
Observe the Create Branch & Checkout Files section in Figure 8. Here, the user created a new branch for a print-preview feature in Visual Studio.
Figure 7 Using the Git CLI to Explore Tree Object Details
