Frank's Guide to Git Objects
Introduction
This article serves as the DLC to my previous article. It doesn't change the Git workflow; instead, it offers some insight into how Git works internally. We'll look at the four primary object types Git uses, and, spoiler alert: I already covered the main one in the previous article.
Hash Functions
A hash function takes an input, such as a file's contents, and produces a fixed-size hash code as output. Given the same unchanged input, it always produces the same hash code, no matter how many times it is invoked. If the input changes, even by a single character, the hash function yields a different hash code; for a well-designed hash function, two different inputs producing the same code by accident is astronomically unlikely.
In Figure #1, we input an arbitrary file into the hash function, resulting in the hash code S7MOP8. Repeating the process with the unaltered file maintains the hash code as S7MOP8. On the third attempt, after modifying the file, the hash function produces a new hash code, RF9GH6.
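The six-character codes in the figure are simplified for readability. As a rough sketch of the same experiment, the snippet below uses Python's standard hashlib with SHA-1, the hash function Git uses by default (newer repos can opt into SHA-256). The file contents are hypothetical stand-ins for the arbitrary file in the figure.

```python
import hashlib

def hash_code(data: bytes) -> str:
    """Return the SHA-1 hash code of some bytes as 40 hex characters."""
    return hashlib.sha1(data).hexdigest()

# Hypothetical file contents standing in for the arbitrary file in Figure #1.
original = b"print('hello')\n"

first = hash_code(original)
second = hash_code(original)                # same, unchanged input
modified = hash_code(b"print('hello!')\n")  # one character added

print(first == second)    # True: unchanged input gives the same hash code
print(first == modified)  # False: modified input gives a different hash code
print(len(first))         # 40
```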
Git relies on hash codes to determine whether anything in the repository has changed. When a file is staged with git add, Git hashes its contents and stores them as a blob object identified by that hash code. Comparing hash codes tells Git which files differ from the last commit, and running git commit then records those changes in a new commit object.
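Here is a minimal sketch of hash-based change detection at staging time. The `index` dictionary and the `stage` function are hypothetical simplifications: Git's real index (.git/index) is a binary file that also stores file modes, sizes, and timestamps, and git add additionally writes the blob into the object store. The header-plus-contents hashing, however, matches how Git computes a blob's hash code.

```python
import hashlib

def blob_hash(content: bytes) -> str:
    # Git hashes a small header ("blob <size>" plus a NUL byte) followed by the contents.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical, simplified staging area: file path -> hash code of its staged contents.
index: dict[str, str] = {}

def stage(path: str, content: bytes) -> bool:
    """Record the file's hash code and report whether it differs from the staged one."""
    new_hash = blob_hash(content)
    changed = index.get(path) != new_hash
    index[path] = new_hash
    return changed

print(stage("app.py", b"print('v1')\n"))  # True: first time this file is staged
print(stage("app.py", b"print('v1')\n"))  # False: same contents, same hash code
print(stage("app.py", b"print('v2')\n"))  # True: new contents, new hash code
```

Only the hash codes are compared, so Git never has to diff file contents just to know that something changed.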
Git Objects
The four fundamental types of Git objects (tags, blobs, trees, and commits) are all stored in the .git folder of a Git repo, under .git/objects. These objects form the backbone of Git and underpin its commands and workflows. Here is a brief overview before we look at each type in more detail:
- Tag: An object that assigns a human-readable string to another object. For example, a commit object can be tagged with a label like strictly-confidential.
- Blob: An object that stores the contents of a file, without the file's name.
- Tree: An object that stores the contents of a directory: the names of its files and subdirectories, and the hash codes of the blobs and trees they refer to.
- Commit: Created by the git commit command, this object records a snapshot of the repo, which is how Git tracks changes over time.
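For the tag object in the list above, here is a minimal sketch of how an annotated tag (created with git tag -a) is laid out as text and hashed; lightweight tags are plain references and create no object at all. The commit hash, tagger identity, and message are hypothetical placeholders, and the helper names `object_hash` and `tag_body` are illustrative, not part of any Git API.

```python
import hashlib

def object_hash(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: header "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def tag_body(target: str, target_type: str, name: str, tagger: str, message: str) -> bytes:
    """Build the text of an annotated tag object that points at another object."""
    return (
        f"object {target}\n"
        f"type {target_type}\n"
        f"tag {name}\n"
        f"tagger {tagger}\n"
        f"\n"
        f"{message}\n"
    ).encode()

# Hypothetical commit hash and tagger identity, for illustration only.
commit_id = "0123456789abcdef0123456789abcdef01234567"
tagger = "Frank <frank@example.com> 1700000000 +0000"

body = tag_body(commit_id, "commit", "strictly-confidential", tagger, "Do not share.")
print(body.decode())
print(object_hash("tag", body))  # the tag's own hash code, 40 hex characters
```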
Blobs
Git blobs, short for binary large objects, are the object type Git uses to store file contents in a Git repo. Their most notable characteristic is that they carry no metadata about the file, not even its name. Like every Git object, a blob is identified by a hash code, which is computed from the file's contents (plus a short type-and-size header) and never from its name; the mapping from a file name to that hash code is stored in a tree object.
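A minimal sketch of how a blob's hash code is computed, mirroring what git hash-object reports (for example, `echo 'test content' | git hash-object --stdin`). The function name `git_blob_hash` is just an illustrative helper.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Compute a blob's hash code: SHA-1 over "blob <size>\\0" followed by the contents."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Only the contents go in; the file name never does.
print(git_blob_hash(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_hash(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the name is not part of the input, two files with identical contents (say, a copied file under a new name) share a single blob.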
Figure #2 is a visual representation of a blob, featuring the contents of an arbitrary Python file along with its unique hash code.
Trees
Git trees are the object type Git uses to store the contents of a directory. Each entry in a tree holds a hash code pointing to either a blob or another tree, along with some metadata: the entry's file mode and its name. This is where a blob's file name lives: the tree pairs each name with the hash code of the blob that holds the contents.
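A sketch of how a tree object might be serialized and hashed, following Git's documented layout: each entry is "<mode> <name>", a NUL byte, then the 20 raw bytes of the hash code, with entries sorted by name. Mode "100644" is a regular file and "40000" a subdirectory; executables and symlinks use other modes not covered here. The file names and contents in the usage lines are hypothetical.

```python
import hashlib

def object_hash(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: header "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def blob_hash(content: bytes) -> str:
    return object_hash("blob", content)

def tree_hash(entries: list[tuple[str, str, str]]) -> str:
    """entries: (mode, name, hex hash code) for each file or subdirectory."""
    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Git sorts by name, comparing directory names as if they ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_hash in sorted(entries, key=sort_key):
        # "<mode> <name>\0" followed by the 20 raw bytes of the hash code.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_hash)
    return object_hash("tree", body)

# Hypothetical project: README.md at the root and main.py inside src/.
readme = blob_hash(b"# Demo\n")
main = blob_hash(b"print('hi')\n")
src = tree_hash([("100644", "main.py", main)])
root = tree_hash([("100644", "README.md", readme), ("40000", "src", src)])

print(root)
print(tree_hash([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, Git's empty tree
```

Renaming a file leaves its blob untouched but changes the tree, which is exactly why names belong in trees rather than blobs.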
Figure #3 is a visual representation of a tree alongside its child blob and child tree. Notice that each arrow points to the object with the matching hash code.
Commits
Git commit objects track the history of a Git repository. Each one stores metadata such as the author, the committer, and the commit message, and points to a tree object that captures a snapshot of the repo's state at the moment git commit is executed. Each commit also points to its parent commit; the only exception is the initial commit, which has no parent (and a merge commit has more than one). When you use Git's time-traveling features, such as git checkout, Git uses a commit's hash code to find that commit's tree and restore the matching contents.
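A minimal sketch of a commit object's text layout and hash code, including the parent link. The tree hashes, identity, and timestamp are hypothetical placeholders, and the `parents` dictionary is an illustrative stand-in for reading parent lines back out of stored commits.

```python
import hashlib

def object_hash(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: header "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_hash(tree: str, parents: list[str], author: str, committer: str, message: str) -> str:
    """Serialize a commit object's text and return its hash code."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # no parent line for the initial commit
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    return object_hash("commit", body)

tree_v1 = "a" * 40  # hypothetical tree hash for the first snapshot
tree_v2 = "b" * 40  # hypothetical tree hash for the second snapshot
who = "Frank <frank@example.com> 1700000000 +0000"  # hypothetical identity and timestamp

initial = commit_hash(tree_v1, [], who, who, "Initial commit")
child = commit_hash(tree_v2, [initial], who, who, "Second commit")

# Following parent links walks history backwards, in the spirit of git log.
parents = {child: initial, initial: None}
commit = child
while commit is not None:
    print(commit)
    commit = parents[commit]
```

Because the parent's hash code is part of the child's text, changing any earlier commit would change every hash code after it.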
Figure #4 is a visual representation of the initial commit and its child commit. The initial commit also carries a label, which no longer appears on the child commit. Also notice how the child commit points to its parent.
Conclusion
When you run git commit, Git builds a tree object from the staged snapshot of the repository (the blobs were already created by git add) and hashes it. If the resulting tree's hash code matches the tree of the previous commit, nothing has changed and Git does not create a commit; otherwise, it creates a new commit object that points to the new tree and to its parent. The new entries in that tree represent the updated files and directories, and the blob objects they point to hold the full contents of those files, not just the changes. Finally, tags can point to any kind of Git object.
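A simplified sketch of that "did anything change?" check, assuming a flat directory with no subdirectories: hash each file into a blob, list the blobs in one tree, and compare the tree against a hypothetical parent tree. Comparing trees rather than commits matters because a commit's own hash code includes timestamps and the parent hash, so it differs every time. In real Git, git commit reports that there is nothing to commit in the unchanged case unless you pass --allow-empty. The helper names are illustrative only.

```python
import hashlib

def object_hash(obj_type: str, body: bytes) -> str:
    """Hash an object the way Git does: header "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def snapshot_tree(files: dict[str, bytes]) -> str:
    """Hash a flat directory: one blob per file, listed by name in one tree."""
    body = b""
    for name in sorted(files):
        blob = object_hash("blob", files[name])
        body += f"100644 {name}".encode() + b"\0" + bytes.fromhex(blob)
    return object_hash("tree", body)

def needs_commit(files: dict[str, bytes], parent_tree: str) -> bool:
    """Simplified check: commit only if the snapshot's tree differs from the parent's tree."""
    return snapshot_tree(files) != parent_tree

files = {"app.py": b"print('v1')\n", "README.md": b"# Demo\n"}  # hypothetical repo contents
parent_tree = snapshot_tree(files)  # pretend this tree belongs to the last commit

print(needs_commit(files, parent_tree))  # False: nothing changed
files["app.py"] = b"print('v2')\n"
print(needs_commit(files, parent_tree))  # True: one blob changed, so the tree changed
```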
And that completes my second article on Git. I hope you enjoyed it and learned something new!