AlBlue’s Blog

Macs, Modularity and More

Git Tip of the Week: Pushing and Pulling

2011, git, gtotw, tip

This week's Git Tip of the Week is about getting changes to and from other sources. You can subscribe to the feed if you want to receive new instalments automatically.

Understanding repositories

A Git repository is essentially a tree of commits, such that at any point in the commit history you have both a full representation of the repository's contents and a back reference to one (or more) parents. Each node in this tree is uniquely identified by its SHA-1 hash (or a unique abbreviation of it), which is derived from its contents, including the back-pointers to its parent(s).
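
To make the back reference concrete, here is a minimal sketch that builds a two-commit history in a throwaway directory and prints the raw commit object. The identity values and commit messages are placeholders invented for the demo.

```shell
# Throwaway repository; the identity below is a made-up placeholder.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q .

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"

# Print the raw commit object for B: tree, parent, author, committer, message.
git cat-file -p HEAD

# The parent recorded inside B is the hash of A.
parent=$(git rev-parse HEAD^)
recorded=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "parent of HEAD: $parent"
```

The `parent` line in the output is the back-pointer; a merge commit would list more than one.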

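The primary advantage of this model is that identical content always results in exactly the same node identity: two commits that record the same contents, parents, author details, timestamps and message will have the same hash, regardless of which repository they were created in.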
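
As a rough illustration of that identity property, the sketch below creates the same commit in two unrelated repositories and compares the resulting hashes. It assumes everything recorded in the commit is held constant, so the author and committer details and the timestamp are pinned through environment variables; all of the values are made up for the demo.

```shell
# Pin every input that is recorded in the commit (placeholder values).
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
export GIT_AUTHOR_DATE="1293883200 +0000" GIT_COMMITTER_DATE="1293883200 +0000"

# Create a fresh repository with one file and one commit; print the commit hash.
make_commit() {
  dir=$(mktemp -d)
  git init -q "$dir"
  echo "hello" > "$dir/file.txt"
  git -C "$dir" add file.txt
  git -C "$dir" commit -q -m "Add file"
  git -C "$dir" rev-parse HEAD
}

first=$(make_commit)
second=$(make_commit)
echo "$first"
echo "$second"
```

If the timestamps or author details were left to vary, the two hashes would differ even though the file contents are the same.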

The secondary advantage of this model is that when moving content between repositories, you don't need to move the entire tree of commits; you can identify the last common commit between the two trees and send only the commits that come after it. For example, if you have two developers whose trees look like:

Developer 1
A <- B <- C <- D
Developer 2
A <- B <- E <- F

then when Developer 1 wants to give his changes to Developer 2, the last known common node is B; so it is only necessary for Developer 1 to send commits C and D. Conversely, if Developer 2 wants to send her change sets to Developer 1, she only needs to send E and F.
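
The same reasoning can be reproduced locally. This sketch models the two developers as branches named dev1 and dev2 in a single throwaway repository (a simplification chosen for the demo; the branch names are hypothetical), then asks Git for the common commit and for the commits one side has that the other lacks. It illustrates the idea rather than the actual transfer protocol.

```shell
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Shared history: A <- B
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git branch dev1
git branch dev2

# Developer 1 adds C and D; Developer 2 adds E and F.
git checkout -q dev1
git commit -q --allow-empty -m "C"
git commit -q --allow-empty -m "D"
git checkout -q dev2
git commit -q --allow-empty -m "E"
git commit -q --allow-empty -m "F"

# The last common commit between the two histories.
base=$(git log -1 --format=%s "$(git merge-base dev1 dev2)")

# Commits reachable from dev1 but not from dev2, oldest first.
to_send=$(git log --reverse --format=%s dev2..dev1 | xargs)
echo "common: $base; Developer 1 needs to send: $to_send"
```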

Understanding remotes

Remotes allow a developer to track the state of repositories located on remote machines, and provide a mechanism to copy commits from one to another. When cloning a repository in the first place, a remote named origin is automatically set up to track the remote repository's state. By default, the operations for copying commits between repositories (push, pull and fetch) all work on the origin remote unless you specify otherwise.
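
A quick way to see the automatic origin remote is to clone something and list the remotes. The sketch below uses an empty local bare repository as a stand-in for a remote server; the directory names are hypothetical.

```shell
work=$(mktemp -d)
cd "$work"

# An empty bare repository standing in for the remote server.
git init -q --bare upstream.git

# Cloning configures a remote named "origin" that points at the source.
# (Git warns on stderr that the repository is empty; that is harmless here.)
git clone -q upstream.git checkout 2>/dev/null
cd checkout
git remote -v
remote_name=$(git remote)
```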

Unlike a centralised version control system, there is no concept of a single central repository. So when copying commits, you need to specify which repository to copy to. If you've set up a shared repository then this will be the one everyone works towards, but a distributed version control system opens up the possibility of many more organisational models which we won't cover in this tip.

A remote is configured with a URL, which takes the same form as that used to clone from. Typically, http and git URLs are read-only, whilst https and ssh URLs are used for (authenticated) reads and writes.

Pushing and pulling

For the purposes of this tip, we'll set up a new, empty copy of the repository (typically on a different machine) and push to it. This can take a number of forms:

  • file:///path/to/somewhere.git - a local file repository, initialised with git init --bare /path/to/somewhere.git
  • ssh://host/path/to/somewhere.git - a remote file repository, initialised with ssh host git init --bare /path/to/somewhere.git
  • ssh:// - repository created on GitHub

To add this as a new remote, run:

git remote add github ssh:// 

Thereafter, you can use the name “github” to refer to this remote. If you have used a different URL then feel free to use a different name; with an ssh URL, for instance, you might want to use the (unqualified) host name. To send the code to GitHub, run:

git push github

This will take all commits on the current branch that aren't in the remote repository and copy them up to the remote server. Note that if there have been subsequent changes on the remote you may see a message saying “non fast-forward push rejected” – this just means someone else has pushed their commits before you, and if you were to push your changes you'd overwrite theirs. In that case, pull their changes in first and then push again.
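
The rejection can be reproduced without a network. In this sketch a local bare repository stands in for the github remote and two working repositories, dev1 and dev2, play the two developers; all of these names are made up for the demo. The branch name master is passed explicitly so the example does not rely on any particular push defaults.

```shell
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)
cd "$work"

# Local bare repository standing in for the "github" remote.
git init -q --bare shared.git

# Each developer has their own repository with the same remote configured.
for dev in dev1 dev2; do
  git init -q "$dev"
  git -C "$dev" symbolic-ref HEAD refs/heads/master
  git -C "$dev" remote add github "file://$work/shared.git"
done

# Developer 1 publishes A, and Developer 2 picks it up.
git -C dev1 commit -q --allow-empty -m "A"
git -C dev1 push -q github master
git -C dev2 pull -q github master

# Both commit independently; Developer 1 pushes first.
git -C dev1 commit -q --allow-empty -m "C"
git -C dev1 push -q github master
git -C dev2 commit -q --allow-empty -m "E"

# Developer 2's push is refused because the remote has moved on.
if git -C dev2 push -q github master 2>/dev/null; then
  result="accepted"
else
  result="rejected"
fi
echo "Developer 2's push was $result"
```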

The converse of the push operation is either pull or fetch. Both of these will bring down commits from the remote repository; however, pull will also merge those changes into your local branch, whilst fetch only makes the commits available for inspection.

# Get the latest changes without merging
git fetch github
# Get the latest changes and merge them in
git pull github
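
To see the difference in practice, the following sketch again uses a local bare repository as a stand-in for the github remote, with hypothetical dev1 and dev2 repositories. After the fetch, Developer 2's branch has not moved, but the new commit is visible under the remote-tracking name github/master; the pull then brings the branch up to date.

```shell
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)
cd "$work"

git init -q --bare shared.git
for dev in dev1 dev2; do
  git init -q "$dev"
  git -C "$dev" symbolic-ref HEAD refs/heads/master
  git -C "$dev" remote add github "file://$work/shared.git"
done

# Both developers start from A.
git -C dev1 commit -q --allow-empty -m "A"
git -C dev1 push -q github master
git -C dev2 pull -q github master

# Developer 1 publishes a new commit C.
git -C dev1 commit -q --allow-empty -m "C"
git -C dev1 push -q github master

# fetch downloads C but leaves Developer 2's branch where it was.
git -C dev2 fetch -q github
after_fetch=$(git -C dev2 log -1 --format=%s)
incoming=$(git -C dev2 log --format=%s HEAD..github/master)

# pull also merges it in (nothing has diverged, so the branch just moves forward).
git -C dev2 pull -q --ff-only github master
after_pull=$(git -C dev2 log -1 --format=%s)
echo "after fetch: $after_fetch (incoming: $incoming); after pull: $after_pull"
```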

In the developer example above, if Developer 1 were to push his changes to GitHub, and Developer 2 were to pull those changes into her repository, it would automatically create a merge node which ties the two trees together. A merge node is one which has two or more parent commits; in this case, we'd create a commit G whose parents were D and F.

If Developer 2 pushes her changes back to GitHub (i.e. the merge commit) then this will be available for Developer 1 to pull. In this case, Developer 1 won't need to create a merge node (since the histories are already merged at that point) and instead will have a fast-forward merge. A fast-forward merge is simply one which moves forwards through a commit history; in other words, going from A to D would be considered a fast-forward merge.
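
The whole round trip can be sketched with a local bare repository standing in for GitHub. The repository names, identity and single-commit histories (D and F directly on top of A) are simplifications for the demo. The --no-rebase and --no-edit flags are there only so that recent Git versions perform a merge without prompting, and --ff-only makes the final pull fail rather than merge if a fast-forward were not possible.

```shell
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
work=$(mktemp -d)
cd "$work"

git init -q --bare shared.git
for dev in dev1 dev2; do
  git init -q "$dev"
  git -C "$dev" symbolic-ref HEAD refs/heads/master
  git -C "$dev" remote add github "file://$work/shared.git"
done

# Shared history A, then each developer commits on their own.
git -C dev1 commit -q --allow-empty -m "A"
git -C dev1 push -q github master
git -C dev2 pull -q github master
git -C dev1 commit -q --allow-empty -m "D"
git -C dev2 commit -q --allow-empty -m "F"

# Developer 1 pushes; Developer 2 pulls and gets a merge commit G.
git -C dev1 push -q github master
git -C dev2 pull -q --no-rebase --no-edit github master
parents=$(git -C dev2 log -1 --format=%P | wc -w | tr -d ' ')

# Developer 2 pushes G; Developer 1's pull is now a plain fast-forward.
git -C dev2 push -q github master
git -C dev1 pull -q --ff-only github master
tip1=$(git -C dev1 rev-parse HEAD)
tip2=$(git -C dev2 rev-parse HEAD)
echo "G has $parents parents; dev1 tip $tip1; dev2 tip $tip2"
```

Both repositories end up on the same merge commit, and Developer 1 never had to create a commit of his own to get there.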

We'll look more at merges and the difference between fetch and pull in the near future; but for now, if you are creating a backup copy of your repository (or simply making it available for others at GitHub) then you now have the tools to achieve that.

Come back next week for another instalment in the Git Tip of the Week series.