A Git repository is essentially a tree of commits, such that at any point in the commit history you have both a full representation of the repository's contents and a back reference to one (or more) parents. Each node in this tree is uniquely identified by its SHA-1 hash (or a unique abbreviation of it), which is derived from its contents, including the back-pointer to its parent(s).
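The primary advantage of this model is that identity follows from content: two developers who commit exactly the same change on top of the same parent, with the same metadata (author, committer, timestamps and message), will always end up with exactly the same node identity.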
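The content-derived identity described above can be checked with a small experiment. This is a minimal sketch, not part of the original tip: the directory names, file name, author details and fixed date are all hypothetical, and the author and committer metadata is pinned through environment variables because it is hashed along with the content.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Assumption for this sketch: pin author, committer and dates, since they
# are part of what gets hashed into the commit identifier.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
export GIT_AUTHOR_DATE="946684800 +0000" GIT_COMMITTER_DATE="946684800 +0000"

# Create two unrelated repositories holding exactly the same commit.
for name in one two; do
  git init -q "$work/$name"
  printf 'hello\n' > "$work/$name/file.txt"
  git -C "$work/$name" add file.txt
  git -C "$work/$name" commit -q -m "Add file"
done

id_one=$(git -C "$work/one" rev-parse HEAD)
id_two=$(git -C "$work/two" rev-parse HEAD)
echo "one: $id_one"
echo "two: $id_two"
```

Both lines should print the same identifier; changing the message, the date or the file contents in one of the repositories would give a different one.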
The secondary advantage of this model is that when moving content between repositories, you don't need to move the entire tree of commits; you can identify the last common commit between the two trees and send only the commits that follow it. For example, if you have two developers whose trees look like:
Developer 1: A <- B <- C <- D
Developer 2: A <- B <- E <- F
then when Developer 1 wants to give his changes to Developer 2, the last known common node is B, so it is only necessary for Developer 1 to send commits C and D. Conversely, if Developer 2 wants to send her change sets to Developer 1, she only needs to send E and F.
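The two-developer example can be reproduced locally to see which commits would need to travel. In this sketch, two branches in a single repository stand in for the two developers' repositories; the branch names dev1 and dev2, the identity settings and the use of empty commits are assumptions made purely for illustration.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical identity so that commits can be created anywhere.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# Helper: record an empty commit whose message is the node name.
commit() { git commit -q --allow-empty -m "$1"; }

commit A
commit B
b=$(git rev-parse HEAD)

# Developer 1: A <- B <- C <- D
git checkout -q -b dev1
commit C
commit D

# Developer 2: A <- B <- E <- F
git checkout -q -b dev2 "$b"
commit E
commit F

# The last common node, and the commits dev1 has that dev2 lacks.
base=$(git merge-base dev1 dev2)
to_send=$(git log --reverse --format=%s dev2..dev1)
echo "common node: $(git log -1 --format=%s "$base")"
echo "dev1 needs to send:"
echo "$to_send"
```

Swapping the range to dev1..dev2 lists the commits that would travel in the other direction.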
Remotes allow a developer to track the state of repositories located on remote machines, and provide a mechanism to copy commits from one to another. When cloning a repository in the first place, a remote named origin is automatically set up to track the remote repository's state. By default, the operations for copying commits between repositories (push, pull and fetch) all work on the origin remote unless you specify otherwise.
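To see the automatically created origin remote, you can clone a throwaway repository and list its remotes. This is a small sketch; the paths upstream.git and clone are hypothetical, and a local bare repository stands in for a remote machine.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# A local bare repository stands in for the remote machine.
git init -q --bare "$work/upstream.git"

# Cloning sets up a remote called "origin" pointing at the source.
# (Git warns on stderr that the repository is empty; that is expected here.)
git clone -q "$work/upstream.git" "$work/clone" 2>/dev/null

remotes=$(git -C "$work/clone" remote)
origin_url=$(git -C "$work/clone" config --get remote.origin.url)
echo "remotes:    $remotes"
echo "origin URL: $origin_url"
```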
Unlike a centralised version control system, there is no concept of a single central repository. So when copying commits, you need to specify which repository to copy to. If you've set up a shared repository then this will be the one everyone works towards, but a distributed version control system opens up the possibility of many more organisational models which we won't cover in this tip.
A remote is configured with a URL, which takes the same form as the one used to clone from. Typically, git URLs are read-only, whilst ssh URLs are used for (authenticated) reads and writes.
Pushing and pulling
For the purposes of this tip, we'll set up a new clone of the repository on a different machine and use it to push to. This can take a number of forms:
file:///path/to/somewhere.git - a local file repository, initialised with git init --bare /path/to/somewhere.git
ssh://host/path/to/somewhere.git - a remote file repository, initialised with ssh host git init --bare /path/to/somewhere.git
ssh://git@github.com/username/repositoryname.git - a repository created on GitHub
To add this as a new remote, run:
git remote add github ssh://git@github.com/username/repositoryname.git
Thereafter, you can use the name “github” to refer to this remote. If you have used a different URL then feel free to use a different name; for an ssh URL, for example, you might want to use the (unqualified) host name. To send the code to GitHub, run:
git push github
This will take all commits on the current branch that aren't yet in the remote repository and copy them up to the remote server. Note that if there have been subsequent changes on the remote you may see a message saying “non fast-forward push rejected”; this just means someone else has pushed their commits before you, and if you were to push your changes you'd overwrite theirs.
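Both the successful push and the rejected one can be tried out against a local bare repository. The sketch below is illustrative only: it uses a file:// URL, a hypothetical remote name backup, and a second clone (two) playing the part of the other person who pushes first.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Hypothetical identity so that commits can be created anywhere.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q --bare "$work/somewhere.git"

# First repository: commit, add the remote, push the current branch.
git init -q "$work/one"
cd "$work/one"
git commit -q --allow-empty -m "A"
git remote add backup "file://$work/somewhere.git"
branch=$(git rev-parse --abbrev-ref HEAD)
first_id=$(git rev-parse HEAD)
git push -q backup "$branch"
pushed_id=$(git --git-dir="$work/somewhere.git" rev-parse "refs/heads/$branch")
echo "remote now has: $pushed_id"

# Someone else clones, commits and pushes before we do.
git clone -q "$work/somewhere.git" "$work/two"
git -C "$work/two" commit -q --allow-empty -m "E"
git -C "$work/two" push -q origin HEAD

# Our next push is no longer a fast-forward, so the remote refuses it.
git commit -q --allow-empty -m "C"
if git push -q backup "$branch" 2>/dev/null; then
  second_push=accepted
else
  second_push=rejected
fi
echo "second push: $second_push"
```

The rejected push leaves the remote untouched; bringing the other commits down and merging them first makes the push a fast-forward again.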
The converse of the push operation is either pull or fetch. Both of these will bring down commits from the remote repository; however, pull will also merge those changes into your local branch, whilst fetch will only make the commits available for inspection.
# Get the latest changes without merging
git fetch github
# Get the latest changes and merge them in
git pull github
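The difference between the two commands shows up in where your local branch points afterwards. The following sketch assumes a local bare repository (shared.git) and two hypothetical clones, one and two; it records the branch tip in two before the fetch, after the fetch and after the pull.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Hypothetical identity so that commits can be created anywhere.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q --bare "$work/shared.git"

# Clone "one" publishes commit A, then "two" clones the result.
git clone -q "$work/shared.git" "$work/one" 2>/dev/null
git -C "$work/one" commit -q --allow-empty -m "A"
git -C "$work/one" push -q origin HEAD
git clone -q "$work/shared.git" "$work/two"

# "one" publishes a further commit that "two" has not seen yet.
git -C "$work/one" commit -q --allow-empty -m "B"
git -C "$work/one" push -q origin HEAD
latest=$(git -C "$work/one" rev-parse HEAD)

cd "$work/two"
branch=$(git rev-parse --abbrev-ref HEAD)
before=$(git rev-parse HEAD)

# fetch: the commits arrive, but the local branch stays where it was.
git fetch -q origin
after_fetch=$(git rev-parse HEAD)
tracking=$(git rev-parse "origin/$branch")

# pull: the fetched commits are also merged into the local branch.
git pull -q --ff-only origin "$branch"
after_pull=$(git rev-parse HEAD)

echo "before:      $before"
echo "after fetch: $after_fetch (origin/$branch is at $tracking)"
echo "after pull:  $after_pull"
```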
In the developer example above, if Developer 1 were to push his changes to GitHub, and Developer 2 were to pull those changes into her repository, it would automatically create a merge node which ties the two trees together. A merge node is one which has two or more parent commits; in this case, we'd create a commit G whose parents were D and F.
If Developer 2 pushes her changes back to GitHub (i.e. the merge commit) then this will be available for Developer 1 to pull. In this case, Developer 1 won't need to create a merge node (since the changes are already merged at that point) and will instead have a fast-forward merge. A fast-forward merge is simply one which moves forwards through a commit history; in other words, going from B to D would be considered a fast-forward merge.
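The merge node and the subsequent fast-forward can be replayed with two local clones. This is a sketch under stated assumptions: a local bare repository (shared.git) stands in for GitHub, dev1 and dev2 are hypothetical clone names, the commits are empty, and --no-rebase is passed explicitly because newer versions of Git otherwise ask how to reconcile diverged histories.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Hypothetical identity; GIT_MERGE_AUTOEDIT=no keeps the merge non-interactive.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
export GIT_MERGE_AUTOEDIT=no

# Helper: record an empty commit named $2 in the repository at $1.
commit() { git -C "$1" commit -q --allow-empty -m "$2"; }

git init -q --bare "$work/shared.git"

# Shared history A <- B, published by Developer 1 and cloned by Developer 2.
git clone -q "$work/shared.git" "$work/dev1" 2>/dev/null
commit "$work/dev1" A
commit "$work/dev1" B
git -C "$work/dev1" push -q origin HEAD
git clone -q "$work/shared.git" "$work/dev2"
branch=$(git -C "$work/dev2" rev-parse --abbrev-ref HEAD)

# Developer 1 adds C and D and pushes; Developer 2 adds E and F locally.
commit "$work/dev1" C
commit "$work/dev1" D
git -C "$work/dev1" push -q origin HEAD
commit "$work/dev2" E
commit "$work/dev2" F
d=$(git -C "$work/dev1" rev-parse HEAD)
f=$(git -C "$work/dev2" rev-parse HEAD)

# Developer 2 pulls: the histories have diverged, so a merge node G appears.
# Its first parent is the local tip F, its second the fetched tip D.
git -C "$work/dev2" pull -q --no-rebase --no-edit origin "$branch"
g=$(git -C "$work/dev2" rev-parse HEAD)
parents=$(git -C "$work/dev2" log -1 --format=%P)
echo "G = $g"
echo "parents of G: $parents"

# Developer 2 pushes G; Developer 1 can then simply fast-forward to it.
git -C "$work/dev2" push -q origin HEAD
git -C "$work/dev1" pull -q --ff-only origin "$branch"
dev1_head=$(git -C "$work/dev1" rev-parse HEAD)
echo "dev1 is now at: $dev1_head"
```

Because --ff-only refuses anything other than a fast-forward, the final pull succeeding is itself evidence that no new merge node was needed.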
We'll look more at merges and the difference between fetch and pull in the near future; but for now, if you are creating a backup copy of your repository (or simply making it available for others at GitHub) then you now have the tools to achieve that.
Come back next week for another instalment in the Git Tip of the Week series.