6. Each commit contains a few pieces of information:
● A snapshot of the entire repo
● Who made this change
● When this change was made
● A message describing the commit
● A pointer to the previous commit
(This is a bit of an over simplification, for a more detailed explanation, see:
http://blog.thoughtram.io/git/2014/11/18/the-anatomy-of-a-git-commit.html)
What’s in a commit
7. Each commit contains a few pieces of information:
● A snapshot of the entire repo
● Who made this change
● When this change was made
● A message describing the commit
● A pointer to the previous commit
(This is a bit of an over-simplification, for a more detailed explanation, see:
http://blog.thoughtram.io/git/2014/11/18/the-anatomy-of-a-git-commit.html)
What’s in a commit
SHA-1
Unique
40-
char
Hash
11. ● Each git repository starts out with a single branch,
called master.
● There can be multiple branches of each repo.
● Each branch is essentially a copy of the master branch
and all of it’s history
● Branches are cheap and easy (unlike TFS or SVN), so use
them as much as you want!
Branches: the connection between commits
12. Mental model: a linked list of commits
a4df41aaec56116dab6a71a35312 912bde5
13. Each commit contains a few pieces of information:
● A snapshot of the entire repository
● Who made this change
● When this change was made
● A message describing the commit
● A pointer to the previous commit
Mental model: a linked list of commits
a4df41aaec56116dab6a71a35312 912bde5
14. Mental model: a linked list of commits
a4df41aaec56116dab6a71a35312 912bde5
parent child
15. Mental model: a linked list of commits
a4df41aaec56116dab6a71a35312 912bde5
16. Branches are a reference to a commit
master
a4df41aaec56116dab6a71a35312 912bde5
27. Remote/Origin Local Staging/Index Workspace Stash
Your machinegithub
Central server where
shared git
repositories are
stored
--
Remotes typically
are “bare” - you
can’t directly
modify them
28. Your local copy
of the remote
git repository
--
Contains the
entire history
and all
branches of a
remote repo
Remote/Origin Local Staging/Index Workspace Stash
Your machinegithub
29. Snapshot of
changes to
the current
branch that
you want to
commit
--
Is a copy of
all of the
files in the
repo, not
just the
changed
files
Remote/Origin Local Staging/Index Workspace Stash
Your machinegithub
30. Where changes
to files are
made
--
Analogous to
the physical
directory
where files
are stored on
disk
Remote/Origin Local Staging/Index Workspace Stash
Your machinegithub
31. A place to
store changes
to files that
you aren’t
ready to
commit yet
--
Aka
“shelving”
changes for
later
Remote/Origin Local Staging/Index Workspace Stash
Your machinegithub
40. Aside: Detached HEAD
Detached HEAD means that HEAD is not a symbolic reference
anymore, therefore new commits will not be part of history.
Happens when you:
● Checkout a commit that is not the tip of a branch, or
● Checkout a remote tracking branch
Fix it by:
● Checking out a branch, or
● Create a new branch from this state
42. git reset HEAD@{x}
# or
git reset HEAD~x
# or
git reset <commit hash>
Reset
Go back to a previous
point in time
43. git reset --soft HEAD~
git reset --mixed HEAD~
git reset --hard HEAD~
Three types of
resetting
44. soft
Takes you back
in history, and
leaves your
changes in
staging
Takes you back
in history, and
discards those
changes
hardmixed
(default) takes
you back in
history, and
leaves your
changes in the
workspace
46. git revert <commit hash>
# or
git revert HEAD~X
Revert
Undo a public commit
47. ● Pass in the identifier(s) of specific commits
● Git creates a new commit that undoes the work of the
specified commit.
● You can revert a commit in the middle of other commits,
but if a later commit modifies the same file, you will
need to resolve that conflict.
Reverting commits
48. Revert multiple
commits
You can use either HEAD~
references, or commit
hashes
# range
git revert HEAD~3..HEAD
# or list newest->oldest
git revert HEAD~2 HEAD~3
HEAD~4
49. Revert without
auto-commit
In case you want to
double-check the revert
git revert --no-commit
<commit or range or list>
# leaves changes staged
# for manual commit
85. Why lots of commits are better
● Commits are cheap and easy
● Save progress over time - easier to go back
● Smaller diffs are easier to reason about
● Less chance of committing the wrong thing when you are
reviewing small changelists
87. Avoid the problem: Committing the wrong thing
● Set up your command line to show you what branch you are
on (https://github.com/jimeh/git-aware-prompt)
● Be careful about what files you edit - use .gitignore to
your advantage
● git status is your BFF
● Understand how staging works (subsequent changes to a
file are not reflected)
88. Fix the problem
Entering the wrong commit
message
# make sure nothing is in staging
git commit -amend
# follow prompts to change
# the commit message
89. Fix the problem
Forgetting to commit a
file
git add filename
git commit -amend
# follow prompts to change
# the commit message
90. Fix the problem
Committing the wrong file
# undo your last commit
# but leave the changes in staging
git reset HEAD~
# unstage the file
git reset HEAD filename
# re-do commit
git commit -m “Commit message”
91. Fix the problem
Committing to the wrong
branch (version one)
# undo the last commit, but leave
the changes available
git reset HEAD~ --soft
git stash
# move to the correct
# branch
git checkout name-of-the-correct-
branch
git stash pop
git add . # or add individual
files
git commit -m "your message here"
# now your changes are on the
correct branch
92. Fix the problem
Committing to the wrong
branch (version two)
git checkout name-of-the-correct-
branch
# grab the last commit to master
git cherry-pick master
# delete it from master
git checkout master
git reset HEAD~ --hard
94. Why a feature branch workflow is better
● Create a new branch of master, do your work on that
branch. When you are ready, merge your changes back into
master
● Safety net: you aren’t changing master directly until
your feature is ready for prime time
● Allows you to switch between tasks/manage unrelated
changes
● Preserve the history of larger features or long-term work
and share with a team
96. MERGE REBASE
● Adds a new commit to
your feature branch.
● You are resolving
potential conflicts
created by other
people’s code.
● Replays your commits
on top of the latest
of master.
● You are resolving
potential conflicts
created by your own
code.
Avoid the problem: rebase vs. merge
97. Rebasing gotchas & caveats
● Each commit is applied as a separate patch, so you might
need to resolve conflicts for each commit :( :( :(
● Don’t change the public history of a branch or you’re
gonna have a bad time.
● Some people like to keep commit history of branches.
Discuss with your team which is preferred!
98. git rebase -i HEAD~x
Avoid the
problem
Combine commits so you
have fewer conflicts to
deal with when rebasing
103. git checkout master && git
merge --squash feature-
branch
Merging back to
master
*controversial opinion*
104. Remote (Origin) Local Staging/Index Workspace Stash
master
Your VMgithub.etsycorp.com
Etsyweb
Etsyweb
feature-branch
master
105. ● Combines all of the commits in your feature branch into a
single changeset
● Leaves you in a state where the changes are not
committed, you need to make the final commit
● Lose historical connection to the feature branch
Squash merging
107. ● Communication with teammates
● Update your local master and origin/master all.the.time.
● Periodically squash commits to reduce the number of
commit conflicts to resolve (esp. When rebasing)
Avoid conflicts in the first place
108. DO DON’T
● Merge feature-branch
into master
● Rebase feature-
branch against
master
● Merge master into
feature-branch
● Rebase master against
feature-branch
Avoid conflicts: merge & rebase in the right direction
110. Fix the problem
Check for remaining
conflict markers before
committing
git diff --check
111. Pro tip
Use git mergetool command
to open a GUI to help
resolve conflicts. HIGHLY
RECOMMEND any Jetbrains
IDE, also meld looks good
git config merge.tool
<toolname>
112. If all else fails
Abort! abort!
git merge --abort
# or
git rebase --abort
114. Recap: Avoiding messes
● Understand the fundamentals (commits, branches, history,
environments, workflow)
● Use tools to help you work smarter (cmd line formatting
mergetool, etc.)
● Always be committing & branching
● Rebase & merge in the right direction
● DON’T PANIC! Everything is fixable (one way or another)
115. ● Visual git cheatsheet
http://ndpsoftware.com/git-cheatsheet.html
● Oh shit, git!
http://ohshitgit.com
● Atlassian git tutorial (esp. the advanced tutorials)
https://www.atlassian.com/git/
● Git for Humans book
https://abookapart.com/products/git-for-humans
Useful links
I don’t know how the rest of you learned git, but I learned it (poorly), by having someone stand next to me and give me a list of commands to run. I had no idea what I was doing or why or what anything meant and I’ve never in my 12 years as a professional web developer felt so stupid as those first few months trying to figure out git. That’s where oh shit git came from, actually, was my file of notes I kept on how I managed to get out of some sticky situations. Anyways, I’m not stupid, YOU aren’t stupid. Git is stupid, and hard. It was written by the guy who wrote linux so it’s just as confusing as fucking linux. BUT, it’s also incredibly powerful which is why it’s taken off so well. And if I can figure it out eventually, so can you. I promise.
So, how do you get out of your git messes - well, the #1 thing I’ve learned is don’t get into a mess in the first place! And I didn’t really start to figure out how to work in such a way that I wasn’t creating messes for myself to clean up, until I learned some more about the fundamentals of git: what the various structures are, and what all those commands actually *do*.
So, to that end, let’s start by going over some fundamental concepts. We are nearing the end of the month so hopefully most of you have used git at least a little bit, so let’s take a step back and see why things work the way they do. Hopefully these concepts will be the epiphany for you that they were for me.
Let’s dive deeper into commits
What is actually in a commit? When we talk about git, a commit doesn’t store just the file or files that have changed, it stores a snapshot of the entire tree of files in our repository (repository is a tree of files that we want to store, analogous to a directory).
It also saves information about who made a change, a timestamp of when it was made, a user-provided message describing the commit, and a pointer to the previous commit in our history (more on this later).
Git takes all of these pieces of information and uses them to generate a unique, 40-character long SHA1 hash. This is one of the powerful features of git - every commit is guaranteed to be unique! So, if two different people make the exact same change to the exact same file, each person’s commit will have a different, unique hash identifier.
So, if we look at the history of commits in a project, there are these long strings of seemingly random characters which are unique hashes that git uses to reference each point in time. And because of how our hashes are generated, each point in time is guaranteed unique.
In fact, git is so unique that you can usually reference only the first few characters of a commit and git knows which one you are talking about! The minimum is 4 characters, and the standard default is 7 characters - Etsy’s web repo (along with the unix kernel project) need up to 11 characters to determine uniqueness! I mention this because you might see these shorthand abbreviations in various places.
So, we’ve talked about commits, let’s talk about branches
Branches are the connection between our commits - they are a way to tie together the history of our project, by creating a chain of commits that are linked together chronologically. Every project starts out with a single branch, called master, and teams can create any number of additional branches, which are essentially copies of that master branch. If you come from a version control solution like TFS or SVN, like I did, you might view branches as a giant hassle that are a pain in the ass to create because you have to bother a grumpy dev-ops person and there is a ton of configuration and work involved. Git is not like that. Creating a new branch is as easy as typing a command, and you can use them as much as you want!
Now hold on tight because we’re about to get real computer sciencey!
Under the hood, branches are analogous to a programming data structure called a linked list (although, again, this is an oversimplification). To quickly go over a linked list for anyone who didn’t study CS - a linked list is a way to connect, in sequence, a collection of data - called nodes. Each node contains two things: some data and a reference to the next node in the sequence. So, in this visual representation, each commit is comprised of two things - the light blue half, represents the data in the commit.
So, the data for each node is essentially the first 4 items on our list of what is in a commit (although it doesn’t actually contain that data and is just a pointer to that data somewhere else, but that’s the general idea)
The other half of each commit node is a pointer back to it’s parent commit - aka, the last commit in history before this one. So that means that our first commit - all the way on the left there, doesn’t have a pointer since it is first in line. The second commit, the child, points back to it’s parent, the first commit.
And then the next commit points back to it’s parent, and so on down the line until we get to the end of the line - our latest commit.
Okay, so, now that we’ve learned that branches are a linked list of commits, that is actually a bit of a lie.
Branches are not the entire linked list of history, they are actually just a symbolic reference to a single commit. Our master “branch” is a reference that points at the most recent commit in our history. From that reference, we can walk back through each step in time by following each child commit’s pointer to it’s parent
So, only adding commits to a single branch is great, but branches are really powerful! so let’s run git branch and pass in the new branch name
Ta-da! Here’s our new branch. Which is just a pointer to the same commit as master! In practice, new-branch is a separate copy of the entire master branch, it has the exact same history and contains the same files. But, under the hood, it’s the same commits with two different branch references. A branch that always points to the same place as master isn’t very useful - we want to start adding new commits to our branch. Before that, we need to look at a new concept..
The HEAD, you’ve probably seen this in various contexts, but what is it, really?
The HEAD is another kind of symbolic reference that points to the top of your currently checked out branch. I know, there are a lot of these references! The reason that our new-branch was created as a pointer to that last commit, was because HEAD currently points to the master branch, which in turn points to that commit. The HEAD is great because it’s history is tracked through time, which we’ll get to in a bit. So, up to now, the HEAD has pointed at the tip of our master branch. Let’s change that.
There is only one HEAD per repository that moves to point to the commit you are currently working on. When you make a new commit, your HEAD moves to point to the new commit. If you checkout a different branch, the HEAD moves to point to the tip of that branch.
We run git checkout new-branch
Now, our HEAD is pointing to the new-branch symbolic ref, which in turn points to a commit in our history. From here on out, any commits we add are created with the parent reference set to whatever commit HEAD points to.
Now, if we add a new commit, our head reference was pointing to our new branch reference, which in turn pointed to the last commit (912bde5). Our new node is added with a pointer back to the HEAD as it’s parent commit. Git then moves our new-branch reference to point to our new commit, and the HEAD moves along with it. So, our new branch shares the history of all of the commits from master, but master is completely unchanged! From here, any new commits we add will be a child of the new-branch, etc. etc.
But say we are on a distributed team, and while we’re working in our new-branch, someone else has been adding commits to master. Now our trunk is starting to branch out and resemble a tree. This is where the power of git really comes into play - the shared history allows git to find the place where our code diverges and run a comparison between different states. And, we can bring our branches back together - something we’ll discuss in a bit. This simple yet powerful branch model is what makes git’s decentralization possible. And branches can start from any point! If I were to run git branch now, I’d add a new pointer to where HEAD is right now
Okay, so I’ve basically been lying to you this whole time. Branches in git are not a linked list or a tree, but actually a directed acyclic graph or DAG, which as you can see is bonkers to look at, but pretty awesome when it comes to features.
Before we dive into workflows, something that will help you to get out of your messes is a deeper understanding of the two main environments where we interact with git repos - remote and local
When working with git there are generally two places where you are interacting with code. A central git server where shared repositories are stored, called a remote, and your local git installation on your development machine. Git is a decentralized system in that every developer has a copy of the entire project and all of it’s history stored on their machine. But, generally teams have a “hub” where distributed team members push their code that acts as a single source of truth. Git “hubs” are generally installed bare, e.g. you cannot directly modify code stored there, all work is done on your local machine (this is why you typically switch from working in the command line to working in the github UI - the “hub” is bare)
The second place where you are interacting with code is on your local machine. Your machine is broken up into four sub-enviornments. (h/t to http://ndpsoftware.com/git-cheatsheet.html for this mental model!). First is your local copy of the remote git repository, which contains the entire history of a remote repo
Next is the staging area (or index) which is a snapshot of the changes that you want to commit. It’s important to understand that the staging/index is a snapshot of ALL of the files in the repo at a single point in time. So, when you add a file, you are moving that version of it into the staging copy - this took me a bit to understand, I thought you were adding the file to a list saying “watch these changes” but no, it’s saying, take a copy of this right now and put it into the staging area. If you then continue to make changes to the file, they are not reflected in staging until you run add again.
The workspace, is where you are actually making your changes - this is the directory on your machine where the files in the repo are stored on disk.
finally is the stash, which is a place to store or shelve your changes to files that you aren’t ready to commit yet. You can add files to the stash, go back and do other things in your workspace, and when you are ready go back and apply the stashed changes again. I won’t get into too much detail on the stash today, but it’s super useful and I highly recommend exploring it!
Okay. we’ve seen how history works in git, let’s go over some ways to view and change the history of a branch.
To see the history of our branch, we run git log
Git log shows me a list of all the commits in this branch. (this is from the jquery public repo on github) This shows me the tip/latest commit, and then follows the links backwards to the parent of this commit, and on and on until the beginning of time, the first commit.
But something that I think is actually even more useful is git reflog, which shows you the history of the HEAD
The reflog is a list of ALL the actions that you’ve taken in the repo. Reflog is the magic time machine that is going to save us from any catastrophic mistakes. Remember the HEAD reference? This is a list of the history of all the HEAD - a list of all commits that the HEAD has pointed to, which means it’s a list of all the actions you’ve taken, and you can reset back to any of these points in time - which we’ll come back to in a bit!
There are a few ways to move around in the history of your repo
Because every point in history on every branch has a unique identifier, we can go back to that point in time. However, checking out a previous point in time this way doesn’t change any of your history - it puts you into a READ ONLY state, aka a detached head.
So, the main point is that you are better off avoiding messes in the first place, there are two main ways to fix your messes - reset and revert.
Let’s say that I completely screwed up that merge, everything is borked… don’t panic, git reset HEAD to the rescue! Remember our old friend reflog? You can either use the @ syntax to pick a specific point in history from the reflog, or you can use the tilda syntax to go back x number of actions, or you can pass in a specific commit hash to go back to that point in history.
Beware - reset changes your history, so it only works if you haven’t pushed to a public remote yet!!!
There are three ways to reset your work - soft, mixed (the default) and hard.
Which one you want to use depends on the situation
I use this command quite a bit - if you do some work, but want to just throw it all away, git reset --hard with nothing after it will discard everything from your workspace and your staging area, and re-set you back to the tip of your current branch.
So, reset is great, but what do you do if you’ve already pushed a commit to a remote and you want to undo it? The answer is git revert! Revert also works if you want to undo a commit that has a child commit that you want to keep.
Revert, unlike reset, doesn’t take you back in time. Instead, with revert, you pass in the identifier of a specific commit, git creates a new commit that undoes the changes in that commit (by creating essentially a reverse patch)
Let’s go over a very simple branching workflow and how we sync our local and remote, let’s look at a trivial example using master. While trivial, this will help to explain some more difficult concepts.
So, i start by making some changes in my workspace to foo.php
I run git add, which takes a snapshot of the changed file and adds it to my staging snapshot
When we are ready, we run commit and save the new commit to our local copy of the master branch. Because my head is pointing at the master branch, the new commit is added with the tip of master as it’s new commit.
And then the pointer for master on our local gets moved to our new commit, and this is now the tip of the master branch.
Finally, I run push to copy my change out to my remote server.
Because our commit’s parent is also the tip of the master branch on the remote, this push goes through easily with no conflicts. This is called a “fast-forward”.
And then now the head of master is pointing to our new commit and both our local and remote repos are back in sync.
Buuuuutttt it’s usually not that simple when you are doing distributed development!
Typically when you work with other people, when you are ready to push out your changes, the remote looks more like this! While we’ve been busy on our local development, other people have pushed a bunch of commits to master. If we tried to do a plain old push at this point, github will reject our push because this isn’t a fast-forward! If we pushed this state to remote, we’d lose all of the commits that others have made.
Which brings me to the #1 rule of avoiding messes in Git -- always stay up to date! The more out of sync you get with your remote repo, the harder it will be to eventually merge in your local changes! So, we need to keep our branches in sync - We need to get all of everyone else’s commits AND our commit onto a branch together. There are a few different ways we could do this.
First, to really understand what is going on when we sync with our remote, we have to understand that There isn’t a *direct* connection between your local branch and the branch that lives on your remote server - remember remotes are bare so we can’t directly modify them! There is actually what is called a “tracking branch” that lives on your local, and is, like all git branches, just a symbolic reference to a commit in history (there are a lot of these refs, huh?).
So when we are looking at our local master branch, what is really happening is that we have a series of commits, and the reference for the tracking branch origin/master is pointing at the third commit, while our local “master” reference is pointing at the new commit to our local copy of master. Note that Tracking branches cannot be directly modified by us!
So as we saw, our view of the origin in our tracking branch is wrong, we haven’t taken into account the new commits that have been added to the origin. we run git fetch to update our tracking branch with the latest status of the remote origin/master. This actually updates ALL of your local tracking branches
Fetch creates this new tree structure in memory where the origin/master is on a different commit track than our local master! Both our master and the first commit in the purple box point to the same *parent* commit, but they are different paths
Now that we have an updated view of origin/master, let’s merge our changes into the master branch. Remember tracking branches cannot be directly modified, so if we want to merge them together, we have to merge the origin master INTO our master
Git creates a new merge commit in the history that ties both branches back together. Merge commits are different from regular commits in that they have more than one parent commit - in fact, you can merge multiple branches together at the same time, although that sounds like hell and I don’t know why you’d ever want to do that.
And then the pointer for your local master branch gets updated to point to the new merge commit.
That process we just followed of running git fetch and then git merge is what happens when you do a git pull! Git pull runs the fetch and the merge simultaneously, and we end up in the same state as if we’d run the two commands individually.
Git pull runs the fetch and the merge simultaneously, and we end up in the same state as if we’d run the two commands individually.
Now that we have incorporated both branches together, we can push our changes out to the remote.
Since this is a fast-forward (our new commit has the tip of the master branch as it’s parent), our push goes smoothly, the remote master is updated.
Then, the remote pointer for master moves up to the new merge commit, and we are the same once again. So, this got the job done, but it resulted in this gross branching history with multiple commits. Not to say that this is always bad, sometimes you might want to have this kind of branching history - it’s a team decision. But in a large distributed codebase, this quickly gets hard to manage - remember the fancy DAG graph? imagine dozens of branches overlapping and intersecting!
There is a better solution! Rebasing results in a nice linear history of commits without those nasty bubbles.
Let’s rewind to where we were after running fetch - we have our commit to master, and then we have a bunch of new commits in origin/master that we need to take into account.
We run git rebase against origin/master, Rebase tells our “master” branch to re-write it’s history, using the origin/master tracking branch as it’s base.
To re-write history, git goes back in time and finds the commit where our master diverges from the origin/master tracking branch, it detaches that commit from it’s parent commit (it saves a diff in temp files). Then it moves the tip of our local master to the same commit as the tip of origin/master
Finally, it will apply the diff for our new commit on top of master, effectively re-winding your changes at a new point in history at the tip of the origin/master branch. If there are multiple commits, it applies each one individually, in order at the new tip of the branch. This is referred to by git as “re-winding” history. Isn’t that beautiful? Our single commit stays a single commit, and our history is nice and linear.
So git pull --rebase is the all-in-one version of this process - similar to how git pull is fetch plus merge. Pull --rebase is fetch plus rebase.
Also known as git rpull - which, is a global alias we have here at Etsy for git pull --rebase.
Anyways, so here we are, we have a nice linear history in our local
We run git push again
And now it’s a simple fast-forward to merge our changes into the remote repository.
And then the head of our local master, origin master, and our tracking branch are all moved to the new commit.
But, commits are cheap and easy so there is no reason not to use them!
They help you to save your progress over time. How many times have you started working on something, gotten it partially, working, and then you change something else and everything breaks? All the time, right? If you had paused and made a commit at the point that things were partially working, you could roll back to that point if further work introduced a bug.
Smaller diffs are easier to reason about - scrolling through huge huge diffs each time can really make your eyes glaze over. Also, when you are reviewing a small changelist, it’s easier to notice something that doesn’t belong - a list of 4 filenames it’s way asier to spot that local config file that you didn’t want to commit, than in a list of 10 or 12 filenames.
But let’s say your human and you make a mistake - how do you fix the problem of committing the wrong thing?
Well, first, you avoid the problem!
Take the time to set up your command line to show you which branch you have currently checked out. I can’t tell you the number of times this has saved my ass!
Be super careful about what files you edit - personally, I am super anal retentive about never touching a file that isn’t part of the changes I’m making without first doing a commit of everything else.
Git status is your best command, I run it constantly to check where I am at.
Finally, understand what is happening when you stage a file. For a long time, I had completely the wrong idea about how staging works! Adding a file means that you take a snapshot of the file in that state and copy it into your staging environment. Any subsequent changes to the file are not reflected, and you can un-stage files!
So, let’s go over some common probems with committing the wrong file, and how to fix them. First is entering the wrong commit message. This is easily fixable with git commit -amend, which allows you to re-do the previous commmit.
Amend also lets you go back and add more files to a commit, say if you forget to commit a file. First, add your file to the staging area, then run git commit -amend
In a feature branch workflow, you create a new branch off of master, and do all of your work in that branch. When you’re feature is ready for prime time, you merge your changes back into master. So it’s a safety net that prevents you from accidentally committing and pushing buggy code to the live site. Additionally, In my work, I am frequently working on a longer-term project, but I get pinged throughout the day to help make smaller changes for other projects that my team is working on. If I followed a master-only workflow, I’d have a commit for one project that might not be complete, followed by another commit for a different task or project on my master. It is super hard to dis-entangle commits in the same branch that are for different, unrelated tasks. Finally, branches allow you to preserve the history of your work in a separate branch,
WIth a feature branch workflow, I need to make sure that I am keeping my branch up to date with master, and just like when I’m keeping my local branch in synch with it’s remote counterpart, git gives me a choice.
I can either rebase my feature branch against master, or I can merge the changes from master into my feature branch. So, this is a personal preference, but I prefer a rebase over a merge. Here’s why:
If we were to merge master into our feature branch, we are adding a new commit to our feature branch containing the updates from master. If another person edits a file that we also edited, we are potentially resolving conflicts created by other people’s changes to master which are different than in our old version of master - when we merge, our feature branch is the base, and master is applied to our branch.
However, if we rebase, then master is considered the base, and our changes are replayed on top of everyone else’s commits. If there are conflicts, they are created by *our* changes being re-applied to master, so it *should* be easier to resolve those conflicts.
Sooooo I’m sure you all were thinking about this, but the shitty thing about rebasing is that it applies each commit as an individual patch, so if you are rebasing a series of changes there is always a chance that you will need to resolve conflicts for each commit! Gross! I don’t know about you but everything I do in git is solely to avoid having to resolve conflicts. Also, a major caveat is that you typically shouldn’t do anything that changes the PUBLIC history of a branch. So, in this simple case, we are not changing the history of master, but you can use rebasing to change the history of a branch, and you’re gonna have a bad time if you do that.
So, if you are going to lean towards a feature branch workflow where you will be separate from master for a while and you plan to rebase against master to stay up-to-date, I highly recommend periodically combining multiple commits into one. This is also useful in many other situations like if you make two commits to master and would like to be able to roll the feature back with a single revert. The answer is our old friend rebase! We run rebase with the -i or “interactive” flag, and instead of passing in a separate branch, we pass in a shorthand for “the last two commits” - HEAD ~ x (we’ll get into this more shortly)
So when we run rebase like this, we are rebasing our feature branch *against itself* this is an important distinction! HEAD~X means, detach x number of commits and re-run them against the x+1th commit.
Okay, now, deep breath because we are diving into the command line...
The -i flag opens up the DREADED REBASE DIALOG. This was utterly terrifying to me the first time I saw it. TBH these dialogs are 99% of the things that suck about git. Now, in the interest of time, I’m not going to go through this entire flow, but needless to say.
The best way to survive this situation is to not panic. You can abort out of this process at any point!
After running the rebase to squash our commits, now we are back in a position where we have a single commit in our master that is not reflected in origin/master. We can do our regular rpull && push to push our change to origin and from there on to the live site.
This might be a bit controversial, but from here, I think the best way to get your changes back into master is to run a squash merge.
And now we are in this state, where our three commits in feature-branch have been combined into one.
Git merge squash will combine all of your commits into a single changeset in your staging area, and then it is up to you to actually commit your changes. Why is this controversial? Well, you lose all connection to your branch - in a typical merge commit, the commit has TWO parents. In this case, you have one parent. This is why doing this is controversial - we’ve now lost the explicit connection to our feature branch, so if we are trying to go back in history, simply following the commit back isn’t going to lead us to the branch where we did the work. BUT the other option to get a single commit to master is to rebase our feature branch against master and then squash the commits into one - but that also loses history! So, you are kind of damned either way, so personally I like the cleanliness of squashing.
Time for the elephant in the room: resolving conflicts :(
You don’t want to end up like this!!!
So, that’s it, hopefully this has helped you to understand what is going on under the hood a bit, some ideas about improved workflow, and how to avoid getting into messes in the first place!