AlBlue’s Blog

Macs, Modularity and More

Git Tip of the Week: Searching for Patterns

2011, git, gtotw, tip

This week’s Git Tip of the Week is about grepping to find content. You can subscribe to the feed if you want to receive new instalments automatically.


When investigating the contents of a repository, it’s not always obvious where to find the definition of a function (or method). Unix tools such as grep can find the content easily enough, but if there’s a large amount of generated data (such as compiled code), simply looking through all the files can be time-consuming.

To find files containing a specific piece of content, you could use something like find . -exec grep pattern '{}' ';'. This runs a separate grep process for every entry under the current directory. However, there’s a faster way of achieving this with git grep. Let’s say we wanted to find occurrences of the variable newPushURI in the EGit repository. We could use find to achieve this:


EGit (master)$ find . -exec grep newPushURI '{}' ';'
		URIish newPushURI = uri;
			newPushURI = newPushURI.setPort(GERRIT_DEFAULT_SSH_PORT);
			newPushURI = newPushURI.setScheme(Protocol.SSH.getDefaultScheme());
			newPushURI = newPushURI.setPort(GERRIT_DEFAULT_SSH_PORT);
			newPushURI = prependGerritHttpPathPrefix(newPushURI);
		uriText.setText(newPushURI.toString());
		scheme.select(scheme.indexOf(newPushURI.getScheme()));

OK, we’ve found some occurrences, but the output doesn’t include the names of the files, which isn’t too helpful. We could have find print each file name as well, but then the names of files without a match would clutter the output. Alternatively, we could write some kind of script or alias to handle the scan-and-test. It’s also not particularly fast:


EGit (master)$ time find . -exec grep newPushURI '{}' ';' > /dev/null
real	0m1.605s
user	0m0.541s
sys	0m0.849s
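
If you do want to stay with find, the file names can be recovered by letting grep print them. The following is a minimal sketch rather than the approach used above: it builds a throwaway directory (the file names Page.java and Other.java are made up for illustration) and relies on the -H option, which both GNU and BSD grep support but which is not part of POSIX.

```shell
#!/bin/sh
# Build a throwaway directory so the sketch does not depend on a real checkout.
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'URIish newPushURI = uri;\n' > "$demo/src/Page.java"
printf 'nothing to see here\n' > "$demo/src/Other.java"

cd "$demo" || exit 1

# -type f skips directories, -H prefixes every match with its file name,
# and '+' hands many files to a single grep instead of one process per file.
matches=$(find . -type f -exec grep -H newPushURI {} +)
printf '%s\n' "$matches"
```

Batching with + usually removes most of the process start-up cost, but find still walks every file on disk, including generated output and the .git directory.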

An alternative is to use git grep to scan the contents of the current working tree:


EGit (master)$ git grep newPushURI
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                URIish newPushURI = uri;
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setPort(GERRIT_DEFA
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setScheme(Protocol.
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setPort(GERRIT_DEFA
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = prependGerritHttpPathPrefix(ne
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                uriText.setText(newPushURI.toString());
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                scheme.select(scheme.indexOf(newPushURI.getScheme()

Not only does it show which files the matches are located in, it’s also well over an order of magnitude faster:


EGit (master)$ time git grep newPushURI > /dev/null
real	0m0.024s
user	0m0.014s
sys	0m0.033s

The arguments that git grep takes are much the same as those of grep itself; for example, -l lists files with matches (and -L lists files without), -E enables extended regexps, -i ignores case and -w matches whole words only.
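
To see these options side by side, here is a small sketch that runs them against a scratch repository. The repository and the two files in it (Page.java and Other.java) are invented for the example; only the options themselves come from git grep.

```shell
#!/bin/sh
# Scratch repository with two tracked files (hypothetical names and contents).
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
printf 'URIish newPushURI = uri;\n' > Page.java
printf 'String uriText = "none";\n' > Other.java
git add Page.java Other.java

with_match=$(git grep -l newPushURI)        # files containing the pattern
without_match=$(git grep -L newPushURI)     # files not containing it
any_case=$(git grep -il NEWPUSHURI)         # -i: case is ignored
whole_word=$(git grep -lw uri)              # -w: 'uri' must stand alone as a word
extended=$(git grep -lE 'new(Push|Fetch)URI')  # -E: ( | ) act as alternation

printf '%s\n' "$with_match" "$without_match" "$any_case" "$whole_word" "$extended"
```

Note that the files only need to be tracked (added to the index) for git grep to search them; no commit is required.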

There are also some options which are specific to git. The --no-index option searches files in the current directory even when they aren’t managed by Git, whilst --cached searches the blobs registered in the index rather than the files in the working tree.
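
The difference between the three search targets is easiest to see when the index, the working tree and an untracked file all disagree. The sketch below sets that up in a scratch repository; the file names tracked.txt and untracked.txt and their contents are assumptions made purely for illustration.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

printf 'alpha\n' > tracked.txt
git add tracked.txt                   # the index now records 'alpha'
printf 'beta\n' > tracked.txt         # the working tree copy says 'beta'
printf 'gamma\n' > untracked.txt      # never added to the index

worktree=$(git grep -l beta)               # default: tracked files in the working tree
index=$(git grep --cached -l alpha)        # --cached: the staged blob, still 'alpha'
no_index=$(git grep --no-index -l gamma)   # --no-index: plain files, tracked or not

printf '%s\n' "$worktree" "$index" "$no_index"
```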

As well as the current working tree, git grep can search a given treeish (tag, branch, commit), and the search can be restricted to subfolders within a repository. If we wanted to look for the regex extension.point in the stable-1.0 branch, limited to matches in the org.eclipse.egit.core folder, we could do:


EGit (master)$ git grep extension.point stable-1.0 -- org.eclipse.egit.core
stable-1.0:org.eclipse.egit.core/plugin.xml:   <extension point="org.eclipse.core.runtime.preferences">
stable-1.0:org.eclipse.egit.core/plugin.xml:  <extension point="org.eclipse.team.core.repository">
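
The same idea can be tried without an EGit checkout. This sketch creates a scratch repository, tags a commit, and then changes the working tree, so that the only way to find the match is through the treeish. The tag name demo-1.0, the folders core and ui, and the committer identity are all made up for the example.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
mkdir core ui
printf '<extension point="demo.core">\n' > core/plugin.xml
printf '<extension point="demo.ui">\n' > ui/plugin.xml
git add core ui

# A throwaway identity is passed inline so no global configuration is needed.
git -c user.name=Demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q -m 'Add plugin descriptors'
git tag demo-1.0

# Overwrite the working copy; the tagged commit still holds the old content.
printf '<plugin/>\n' > core/plugin.xml

# Search the tagged tree, limited to the core folder by the pathspec after '--'.
tagged=$(git grep 'extension.point' demo-1.0 -- core)
printf '%s\n' "$tagged"
```

Because the pattern is a regex, the . matches any single character, which is why it finds the space in extension point. Each result is prefixed with the treeish it came from.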

Finally, it’s possible to use -p (short for --show-function) to print the line naming the function in which a match occurs.


EGit (master)$ git grep -p newPushURI
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java=        private void setDefaults(RepositorySelection selection) {
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                URIish newPushURI = uri;
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setPort(GERRIT_DEFA
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setScheme(Protocol.
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = newPushURI.setPort(GERRIT_DEFA
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                        newPushURI = prependGerritHttpPathPrefix(ne
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                uriText.setText(newPushURI.toString());
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java:                scheme.select(scheme.indexOf(newPushURI.getScheme()

Note that the line naming the function is marked with = after the file name, instead of the : used for matching lines. This can be used to quickly find which functions contain a reference to a given pattern:


EGit (master)$ git grep -p newPushURI | grep java=
org.eclipse.egit.ui/src/org/eclipse/egit/ui/internal/clone/GerritConfigurationPage.java=        private void setDefaults(RepositorySelection selection) {
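
A reduced version of this filter can be reproduced in a scratch repository. The sketch below uses a made-up file, demo.c, with unindented function headers, on the assumption that no diff driver is configured for it; in that case Git falls back to treating lines that begin with a letter, underscore or dollar sign as function lines. Files covered by a diff driver in .gitattributes may be recognised differently.

```shell
#!/bin/sh
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

# Two hypothetical functions; only the first mentions the pattern.
cat > demo.c <<'EOF'
void setDefaults(void) {
    newPushURI = uri;
}

void unrelated(void) {
    other = 1;
}
EOF
git add demo.c

# Full -p output: the function line (marked '=') followed by the match (':').
annotated=$(git grep -p newPushURI)

# Keep only the function lines; -F makes grep treat 'demo.c=' literally.
functions=$(git grep -p newPushURI | grep -F 'demo.c=')

printf '%s\n' "$annotated"
printf '%s\n' "$functions"
```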

Come back next week for another instalment in the Git Tip of the Week series.