AlBlue’s Blog

Macs, Modularity and More

Embedding JGit

2013, eclipse, eclipsecon, jgit

I gave a lightning talk at EclipseCon Europe 2013 on “Embedding JGit”, covering the different levels of JGit integration. Here are the slides and a rough transcript of the talk; when the Eclipse Foundation YouTube video is up, I’ll link to it here as well.

Level Zero

Since JGit is an executable, you can simply fork out using Runtime.exec or use ProcessBuilder to execute a JGit command, e.g.:

Level0.java
// exec() is an instance method on Runtime; there is no System.exec
Runtime.getRuntime().exec("java -jar jgit.sh --git-dir /tmp/repo init");

The jgit.sh file is also an executable shell script, so on a Unix system you can invoke ./jgit.sh directly.

Of course, this is cheating somewhat since the JGit executable isn’t really embedded; but this can be useful for applications that are sensitive to memory pressure or where the execution can be done on a cloud host.

This approach has a number of advantages, specifically that the embedder already knows how to use it since JGit provides a (sub)set of the standard git commands.
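As a rough sketch of the fork-out approach, the following uses ProcessBuilder to run jgit.sh and read its output line by line. The class and method names (JGitProcess, command, run) are made up for illustration, and the location of jgit.sh is an assumption; adjust both to your setup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JGit.
final class JGitProcess {

    /** Builds the argument list for "java -jar <jarPath> --git-dir <gitDir> <args...>". */
    static List<String> command(String jarPath, String gitDir, String... args) {
        List<String> cmd = new ArrayList<>(
                Arrays.asList("java", "-jar", jarPath, "--git-dir", gitDir));
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    /** Runs the command, echoes each output line, and returns the exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        // Merge stderr into stdout so a single reader sees everything.
        Process process = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        return process.waitFor();
    }
}
```

For example, `JGitProcess.run(JGitProcess.command("jgit.sh", "/tmp/repo", "init"))` would run the same command as above and return its exit code. Passing the arguments as a list avoids the whitespace splitting that Runtime.exec applies to a single command string.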

Level One

If you need to embed JGit in an existing Java process, then it’s possible to use the program main class org.eclipse.jgit.pgm.Main and invoke the main method. The arguments can then be passed in as an array of Strings. This has the advantage that executing JGit doesn’t require spinning up a new JVM process, and as such, it can turn around multiple requests faster.

Since main is declared void, it returns no status code; to learn anything about the result, the embedder still has to capture the command’s output and parse it as a stream.

Level1.java
import org.eclipse.jgit.pgm.Main;

Main.main(new String[] { "--git-dir", "/tmp/repo/.git", "show", "HEAD" });

There are still some optimisations that get missed out if using this level; specifically, the JGit libraries have to parse the contents of the repository repeatedly as no information is shared between the runs.

Note that the repository passed here has to have the .git directory specified.

Level Two

The most popular way of interacting with JGit involves using the Git class to wrap a repository and to provide a set of porcelain commands. This is a set of commands that roughly mirror the high-level commands that are given at the command line; for example, .add() or .log().

Level2.java
import java.io.File;
import org.eclipse.jgit.api.Git;

Git git = Git.open(new File("/tmp/repo/.git"));
git.clean() ...
git.lsRemote() ...
git.log() ...

The advantages of using the Git class are that you get to re-use the same repository between invocations, so subsequent commands may be faster. You also have IDE completion and compile-time checking of the arguments, as opposed to the unchecked strings in the prior examples.

Commands are invoked using the builder pattern; the result of .clean() is actually a CleanCommand. To execute it, provide any necessary arguments and then invoke the .call() method:

Level2.java
git.clean().setCleanDirectories(true).setIgnore(true).call();
git.lsRemote().setRemote("origin").setTags(true).setHeads(true).call();

Although the builder allows an arbitrary number of arguments to be built up over repeated calls, care must be taken to ensure that any required arguments are set up appropriately.
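To make the builder flow concrete, here is a minimal sketch that lists recent commits via the log command. The helper class RecentCommits and its summarize method are illustrative names, not part of JGit, and the sketch assumes a Git instance opened as shown earlier.

```java
import java.util.ArrayList;
import java.util.List;
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.revwalk.RevCommit;

// Hypothetical helper, not part of JGit.
final class RecentCommits {

    /** Returns one "abbreviated-id short-message" line per commit, newest first. */
    static List<String> summarize(Git git, int maxCount) throws Exception {
        List<String> lines = new ArrayList<>();
        // log() only creates the command object; nothing runs until call().
        for (RevCommit commit : git.log().setMaxCount(maxCount).call()) {
            lines.add(commit.abbreviate(7).name() + " " + commit.getShortMessage());
        }
        return lines;
    }
}
```

Calling `RecentCommits.summarize(git, 5)` would return up to five lines, starting from HEAD, since no other start point is configured on the command.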

Level Three

The Git API provides a high-level interface to commands in a portable fashion, allowing for the building of porcelain (high-level commands that operate on lower layers called the plumbing).

To go one level further down, an instance of Repository is used.

This is typically constructed using a FileRepositoryBuilder, which again uses a builder pattern to instantiate a repository. This repository can then be re-used across multiple commands, or served via JGit using something like the org.eclipse.jgit.http.server.glue.MetaServlet class.

The Repository doesn’t provide much information on its own; it provides a means to evaluate certain tree-ish expressions such as HEAD and master~2. However, if you just need to know what the current branch is or get a list of tags, the Repository is all you need.

Level3.java
// create() expects the .git directory itself, not the working tree
Repository repository = FileRepositoryBuilder.create(new File("/tmp/repo/.git"));
Map<String, Ref> tags = repository.getTags();
Map<String, Ref> refs = repository.getAllRefs();
String currentBranch = repository.getBranch();
Ref head = repository.getRef("HEAD");
// print the raw content of the commit object that HEAD points at
repository.open(head.getObjectId()).copyTo(System.out);
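The builder form and the evaluation of expressions such as HEAD and master~2 might look like the sketch below. RepositoryInfo, open and describe are hypothetical names chosen for this example; the JGit calls used are setGitDir, setMustExist, build and resolve.

```java
import java.io.File;
import java.io.IOException;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.storage.file.FileRepositoryBuilder;

// Hypothetical helper, not part of JGit.
final class RepositoryInfo {

    /** Opens an existing repository; gitDir must be the .git directory. */
    static Repository open(File gitDir) throws IOException {
        return new FileRepositoryBuilder()
                .setGitDir(gitDir)
                .setMustExist(true) // fail instead of pointing at a missing repository
                .build();
    }

    /** Resolves an expression such as "HEAD" or "master~2" to an object id. */
    static String describe(Repository repository, String revision) throws IOException {
        // resolve() returns null when the expression does not match anything.
        ObjectId id = repository.resolve(revision);
        return revision + " -> " + (id == null ? "(unresolved)" : id.name());
    }
}
```

With a repository opened via `RepositoryInfo.open(new File("/tmp/repo/.git"))`, a call such as `RepositoryInfo.describe(repository, "master~2")` would yield the full id of the commit two steps behind master, or the unresolved marker if the branch has fewer commits.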

Level Four

Interacting with the Repository will only give read-only information, and only allow getting objects that are already known. To find out information from a path level or commit level, a couple of iterators must be used, known as RevWalk (commit iterator) and TreeWalk (path/directory iterator).

To implement a log-like command, you can get a RevWalk on the repository and then iterate over commits. To express a start point, the walker needs to know which commits are included (and also which commits are excluded).

Level4.java
RevWalk rw = new RevWalk(repository);
// resolve() takes a revision string and returns an ObjectId, not a Ref
ObjectId head = repository.resolve("HEAD");
rw.markStart(rw.parseCommit(head));
Iterator<RevCommit> it = rw.iterator();
while (it.hasNext()) {
  RevCommit commit = it.next();
  System.out.println(commit.abbreviate(6).name()
    + " " + commit.getShortMessage());
}
rw.dispose();

To get information about a specific path, the TreeWalk is used against a single commit:

Level4.java
TreeWalk tw = new TreeWalk(repository);
ObjectId tree = repository.resolve("HEAD^{tree}");
tw.addTree(tree); // tree '0'
tw.setRecursive(true);
tw.setFilter(PathFilter.create("some/file"));
while (tw.next()) {
  ObjectId id = tw.getObjectId(0);
  repository.open(id).copyTo(System.out);
}
tw.release(); // JGit 4.0 and later: tw.close()

Although this may look like a complex way of processing commits and directories, this maps to the underlying Git representation in an efficient manner. It also makes it possible to walk through multiple trees or ranges of commits in a single pass; additional filters such as an AuthorRevFilter or CommitTimeRevFilter can be used to restrict the ranges of commits, or similarly for the paths with subclasses of TreeFilter.
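As an illustration of included and excluded commits combined with a filter, the sketch below collects the commits reachable from one revision but not from another, skipping merges. CommitRange and between are hypothetical names for this example; the JGit pieces are markStart, markUninteresting, setRevFilter and RevFilter.NO_MERGES.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.Repository;
import org.eclipse.jgit.revwalk.RevCommit;
import org.eclipse.jgit.revwalk.RevWalk;
import org.eclipse.jgit.revwalk.filter.RevFilter;

// Hypothetical helper, not part of JGit.
final class CommitRange {

    /** Short messages of non-merge commits in the range from..to, newest first. */
    static List<String> between(Repository repository, String from, String to)
            throws IOException {
        ObjectId fromId = repository.resolve(from);
        ObjectId toId = repository.resolve(to);
        if (fromId == null || toId == null) {
            throw new IllegalArgumentException("cannot resolve " + from + ".." + to);
        }
        RevWalk rw = new RevWalk(repository);
        try {
            rw.markStart(rw.parseCommit(toId));           // included
            rw.markUninteresting(rw.parseCommit(fromId)); // excluded, with its ancestors
            rw.setRevFilter(RevFilter.NO_MERGES);
            List<String> messages = new ArrayList<>();
            for (RevCommit commit : rw) {
                messages.add(commit.getShortMessage());
            }
            return messages;
        } finally {
            rw.dispose();
        }
    }
}
```

This is roughly what `git log --no-merges from..to` expresses on the command line: for instance, `CommitRange.between(repository, "HEAD~2", "HEAD")` would return the messages of the two most recent commits on a linear history.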

Note that the walkers should be released/disposed at the end of the use to ensure that they do not retain information (and thus memory) that may be no longer of interest. Also note that the walkers are not thread-safe, so should only be invoked within a single thread.

Level Five

Finally, getting objects in and out of a Git repository requires the use of an ObjectInserter and an ObjectReader.

Knowledge of these is outside the scope of this tutorial, but here’s how to do a “Hello World” with JGit:

Level5.java
ObjectId hello = repository.newObjectInserter().insert(Constants.OBJ_BLOB,
  "hello world".getBytes("UTF-8"));
repository.newObjectReader().open(hello).copyTo(System.out);

Note that objects inserted into a Git repository become eligible for garbage collection unless they are referred to via a commit and a tree that is reachable from a ref in the repository.
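To give a feel for what the inserter hands back, here is a small sketch that computes a blob's id with the JDK alone, assuming the standard SHA-1 object format: the hash covers a "blob <length>" header, a NUL byte, and then the content. BlobId is a hypothetical class for illustration and does not touch a repository or use JGit.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of JGit.
final class BlobId {

    /** Returns the 40-character hex id Git assigns to a blob with this content. */
    static String blobId(byte[] content) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        // Header: object type, a space, the content length in decimal, then NUL.
        sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
        sha1.update(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // prints 95d09f2b10159347eece71399a7e2e907ea3df4f
        System.out.println(blobId("hello world".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Because the id depends only on the type and the content, inserting the same bytes twice yields the same object, which is why the “Hello World” example above can read the blob back using nothing but the returned id.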