Sam Doshi personal ramblings

An introduction to Git submodules

Over on the Monome online community lines, we’ve been discussing breaking up the monolithic git repository used for the Eurorack modules’ firmware. This is just a brief overview of Git submodules to see if they are a good fit for us.

"A submodule allows you to keep another Git repository in a subdirectory of your repository. The other repository has its own history, which does not interfere with the history of the current repository. This can be used to have external dependencies such as third party libraries for example."

(From the Git submodule man page, emphasis mine.)

Let’s work through a simple example to see how it works…

Setup

First, we’ll make our shared library repo lib and add some commits.

mkdir lib
cd lib
git init
touch a
git add a
git commit -m "added a"
git mv a b
git commit -m "mv a b"
cd ..

Now, let’s create a repo repo1 (that will use the library via a submodule).

mkdir repo1
cd repo1
git init
cd ..

This gives us the following files (via tree):

.
├── lib
│   └── b
└── repo1

Let’s add the submodule to repo1 as the directory common.

cd repo1
git submodule add ../lib common
git commit -m "added submodule"
cd ..

And again, the output of tree:

.
├── lib
│   └── b
└── repo1
    └── common
        └── b

Now we have a repo repo1 that contains a submodule in the directory common from repo lib.
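To see what repo1 actually stores for the submodule, here is a minimal sketch that rebuilds a cut-down version of the setup in a scratch directory and inspects it. The scratch directory, the placeholder `demo` identity and the single-commit lib are my assumptions, not part of the walkthrough above. The `-c protocol.file.allow=always` flag is also an addition: recent Git releases (2.38.1 and later, as far as I know) refuse to clone a submodule from a local path by default and fail with "transport 'file' not allowed", so the sketch opts in for this one command.

```shell
set -eu
# Placeholder identity so the commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)   # scratch directory, so no real repos are touched
cd "$work"

# A cut-down lib with a single commit.
git init -q lib
touch lib/a
git -C lib add a
git -C lib commit -q -m "added a"

# repo1 with lib added as the submodule "common".
git init -q repo1
cd repo1
git -c protocol.file.allow=always submodule add ../lib common
git commit -q -m "added submodule"

# 1. .gitmodules maps the submodule path to the URL it was cloned from.
cat .gitmodules

# 2. The tree stores "common" as a single entry of mode 160000 and
#    type "commit": a pointer to one commit of lib, not to its files.
entry=$(git ls-tree HEAD common)
echo "$entry"
```

So a submodule costs repo1 two things: a line or two in .gitmodules and one commit pointer in its tree.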

Making changes to lib

Let’s make a change to lib to demonstrate what submodules allow us to do.

cd lib
git mv b c
git commit -m "mv b c"
cd ..

Again, consider the output of tree. Notice how repo1/common still contains b.

.
├── lib
│   └── c
└── repo1
    └── common
        └── b

Let’s clone repo1 to repo1_clone.

git clone repo1 repo1_clone

That gives us:

.
├── lib
│   └── c
├── repo1
│   └── common
│       └── b
└── repo1_clone
    └── common

Hmm, why hasn’t repo1_clone/common got any files in it?

cd repo1_clone
git submodule update --init
cd ..

That’s better [1]:

.
├── lib
│   └── c
├── repo1
│   └── common
│       └── b
└── repo1_clone
    └── common
        └── b
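If you want to check whether a submodule has been initialised without looking at the directory, git submodule status reports it. The sketch below is a hedged illustration in a scratch directory, with a placeholder identity and a one-commit lib (all my assumptions). A leading `-` means the submodule has not been initialised yet, and a leading space means it is checked out at the recorded commit. The `-c protocol.file.allow=always` flag is only there because newer Git versions block local-path submodule clones by default.

```shell
set -eu
# Placeholder identity so the commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)   # scratch directory
cd "$work"

git init -q lib
touch lib/a
git -C lib add a
git -C lib commit -q -m "added a"

git init -q repo1
cd repo1
git -c protocol.file.allow=always submodule add ../lib common
git commit -q -m "added submodule"
cd ..

# A plain clone records the submodule but does not populate it.
git clone -q repo1 repo1_clone
cd repo1_clone

before=$(git submodule status)
echo "before init: $before"     # line starts with "-"

git -c protocol.file.allow=always submodule update --init

after=$(git submodule status)
echo "after init:  $after"      # line starts with a space
```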

The important thing to note is that both repo1 and repo1_clone still contain b rather than c. This is because Git tracks a particular commit for the submodule rather than a branch or HEAD. You need to explicitly update the parent repo to track a new commit. This is great as it means changes to the upstream repo of the submodule are not forced upon us.

The output of git log --pretty="format:%H %s" in the lib repo is:

1e5717c64e1150ca1da08521a24d8469c2bdde00 mv b c
86a5b293fa8f860730cd96c11b29b5f03fc2a60e mv a b
3c163ca8fcf336907e1b2a121f25bd550a71e5e3 added a

The output of git submodule status in repo1 is:

86a5b293fa8f860730cd96c11b29b5f03fc2a60e common (heads/master)

Notice how the commit SHA for the common entry in git submodule status matches the second entry in the log for lib. This shouldn’t come as a surprise, as our submodule hasn’t been updated to the latest changes in lib yet.
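The same comparison can be scripted rather than read off by eye. This is a rough sketch under the same assumptions as before (scratch directory, placeholder identity, a shortened history for lib, and the `protocol.file.allow` override for newer Git). It reads the commit that repo1 pins from its tree and compares it with what is checked out in common and with the tip of lib.

```shell
set -eu
# Placeholder identity so the commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)   # scratch directory
cd "$work"

git init -q lib
touch lib/a
git -C lib add a
git -C lib commit -q -m "added a"

git init -q repo1
cd repo1
git -c protocol.file.allow=always submodule add ../lib common
git commit -q -m "added submodule"
cd ..

# lib moves on after the submodule was added.
git -C lib mv a b
git -C lib commit -q -m "mv a b"

# The commit recorded in repo1's tree (third field of the ls-tree line).
pinned=$(git -C repo1 ls-tree HEAD common | awk '{print $3}')
# The commit actually checked out inside repo1/common.
checked_out=$(git -C repo1/common rev-parse HEAD)
# The current tip of lib.
upstream=$(git -C lib rev-parse HEAD)

echo "pinned in repo1:  $pinned"
echo "checked out:      $checked_out"
echo "lib HEAD:         $upstream"
```

The first two values agree and the third differs, which is the "pinned commit" behaviour described above.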

Updating submodules

Let’s update repo1 to incorporate the changes made to lib.

cd repo1/common
git pull
cd ..
git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   common (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

The output of git status is telling us that we have modified common, so we need to commit that change to repo1 [2].

git add common
git commit -m "update common"
cd ..

Now the output of tree is:

.
├── lib
│   └── c
├── repo1
│   └── common
│       └── c
└── repo1_clone
    └── common
        └── b
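As an alternative to running git pull inside common, Git also has git submodule update --remote, which fetches the submodule’s upstream and checks out the tip of its tracked branch in one step. The sketch below is only an illustration of that option, with the usual assumptions (scratch directory, placeholder identity, shortened history, `protocol.file.allow` override for newer Git). Note that it leaves common on a detached HEAD by default, and the new pointer still has to be committed in repo1.

```shell
set -eu
# Placeholder identity so the commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)   # scratch directory
cd "$work"

git init -q lib
touch lib/a
git -C lib add a
git -C lib commit -q -m "added a"

git init -q repo1
cd repo1
git -c protocol.file.allow=always submodule add ../lib common
git commit -q -m "added submodule"

# lib moves on after the submodule was added.
git -C ../lib mv a b
git -C ../lib commit -q -m "mv a b"

# Fetch lib and check out the tip of its default branch inside common.
git -c protocol.file.allow=always submodule update --remote common

# As with the manual pull, the new pointer has to be committed in repo1.
git add common
git commit -q -m "update common"

pinned=$(git ls-tree HEAD common | awk '{print $3}')
upstream=$(git -C ../lib rev-parse HEAD)
echo "repo1 pins: $pinned"
echo "lib HEAD:   $upstream"
```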

The only thing left to do is to update repo1_clone to reflect those changes:

cd repo1_clone
git pull
git submodule update

The git pull updates repo1_clone to match repo1, but won’t update the common directory. Assuming that common is clean, you can run git submodule update to update common to the correct commit of lib.

The final output of tree is:

.
├── lib
│   └── c
├── repo1
│   └── common
│       └── c
└── repo1_clone
    └── common
        └── c

Taking things further

The common directories in repo1 and repo1_clone are normal Git repos that were cloned from lib. You can do all the normal things inside them that you would in any other Git repo: branch, checkout, commit, pull and even push. So if the work you’re doing on the lib repo is best done in the context of repo1, you can make your changes and commits in repo1/common. You just need to remember to commit the directory common to repo1 when you want repo1 to reference the new commits you’ve made to lib.
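Here is a hedged sketch of that workflow: commit inside repo1/common, publish the commit to lib, then record the new pointer in repo1. The file name `d` and the branch name `from-repo1` are hypothetical, chosen only for this example. I push to a separate branch because lib here is a non-bare repo, and Git refuses by default to push to the branch such a repo has checked out. The scratch directory, placeholder identity and `protocol.file.allow` override are assumptions as well.

```shell
set -eu
# Placeholder identity so the commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)   # scratch directory
cd "$work"

git init -q lib
touch lib/a
git -C lib add a
git -C lib commit -q -m "added a"

git init -q repo1
cd repo1
git -c protocol.file.allow=always submodule add ../lib common
git commit -q -m "added submodule"

# Work on lib from inside repo1/common, as in any other clone.
cd common
touch d                     # hypothetical new file
git add d
git commit -q -m "added d"

# Publish the commit so that other clones of repo1 can fetch it.
# "from-repo1" is a hypothetical branch name.
git push -q origin HEAD:refs/heads/from-repo1
cd ..

# Record the new submodule commit in repo1.
git add common
git commit -q -m "update common"

pinned=$(git ls-tree HEAD common | awk '{print $3}')
published=$(git -C ../lib rev-parse from-repo1)
echo "repo1 pins:        $pinned"
echo "published in lib:  $published"
```

The push matters: if repo1 references a submodule commit that only exists in your local common directory, nobody else will be able to check it out with git submodule update.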


  1. As an aside, we could have used git clone repo1 repo1_clone --recursive for our initial clone to avoid having to use git submodule update --init. Newer versions of Git spell the same option --recurse-submodules.
  2. If we change our mind, we can run git submodule update to revert the submodule to the commit recorded in repo1 (assuming that it is clean).