[Updated: December 27, 2020]
As software engineers, we often work on a project that depends on another project we are developing at the same time. The scenario may look like the following:
We have two projects, and each of them has its Git repository:
- A common library, say commonlib, is used by many projects. The library is self-contained and has its own test suite and documentation.
- A project called myproj has a dependency on commonlib.
While we are working on myproj, we may also need to update commonlib at the same time. If both commonlib and myproj happen to be Python projects, we can use setuptools’ development mode and Git submodules to make the work easier. This article demonstrates how to use development mode and submodules to deal with this situation. Hopefully, people who need to manage this type of workflow will find this article helpful.
commonlib and myproj are used as the example in the rest of the article, and the example assumes the code runs in a virtual environment under the following conditions:
- Ubuntu 20.04
- Python 3.9
- Git 2.25
The Challenges
First of all, for Python project development, we usually set up a virtual environment and install all the dependencies into it. Then, we start working on our project, i.e., myproj in this case. However, myproj needs commonlib, which we also work on at the same time. If we install commonlib in the usual way, e.g., pip install, the code is copied into the virtual environment, so we cannot use Git to track our changes to commonlib. This is the issue that development mode solves.
Second, commonlib is used by many projects, including myproj. During development, myproj may need to stick with a specific version or branch of commonlib, while other projects may need a different version of commonlib. To ensure that we use the correct branch or version of commonlib when we work on myproj, we can set the dependency as a Git submodule.
What is Development Mode?
Development mode allows a project to be both installed and editable.
Typically, we install a Python package from PyPI.
$ pip install <package_name>
Or, we install it from a local package.
$ pip install <path_to_local_archive>
Either way, the package will be installed in our (virtual) environment. When we install a Python package into our virtual environment, the package will be copied to /virtual_environment/lib/python3.9/site-packages/, for example. If we want to install commonlib onto our virtual environment, we can do:
$ git clone https://github.com/burpeesDaily/commonlib.git
$ pip install commonlib/
After the installation, commonlib will be shown as an installed package in the site-packages folder. We can use the ls command to check it. For example, the result may look like the following:
(demoenv) $ ls -l ~/demoenv/lib/python3.9/site-packages/
total 332
drwxr-xr-x 2 username username 4096 Nov 9 22:12 __pycache__
drwxr-xr-x 3 username username 4096 Dec 26 21:25 commonlib
drwxr-xr-x 2 username username 4096 Dec 26 21:25 commonlib-0.0.1-py3.9.egg-info
-rw-r--r-- 1 username username 53 Nov 9 22:34 commonlib.egg-link
-rw-r--r-- 1 username username 52 Nov 9 22:34 easy-install.pth
-rw-r--r-- 1 username username 126 Nov 9 21:42 easy_install.py
…
Development mode creates a link from the package to the virtual environment. With the development mode, a Python package can be installed to allow us to edit the code after the installation. Therefore, when we change the code, the change takes effect immediately in the virtual environment.
To install a Python package in development mode, use the command
$ pip install -e <path to the package>
Taking commonlib as an example, the result may look like the following:
(demoenv) $ python -m pip install -e commonlib/
Obtaining file:///home/username/commonlib
Installing collected packages: commonlib
Running setup.py develop for commonlib
Successfully installed commonlib
(demoenv) $ ls -l ~/demoenv/lib/python3.9/site-packages/
total 324
drwxr-xr-x 2 username username 4096 Nov 9 22:12 __pycache__
-rw-r--r-- 1 username username 31 Dec 26 22:28 commonlib.egg-link
-rw-r--r-- 1 username username 30 Dec 26 22:28 easy-install.pth
-rw-r--r-- 1 username username 126 Nov 9 21:42 easy_install.py
…
If we open the file commonlib.egg-link, we will see where it links to. For example,
(demoenv) $ cat ~/demoenv/lib/python3.9/site-packages/commonlib.egg-link
/home/username/commonlib
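To list every development-mode link in an environment at once, a small script can read the .egg-link files directly. The following is a minimal sketch, assuming the legacy "setup.py develop" layout shown above (newer pip and setuptools versions may record editable installs differently). The function name find_egg_links is our own and is not part of pip or setuptools.

```python
import sysconfig
from pathlib import Path


def find_egg_links(site_packages: str) -> dict[str, str]:
    """Map each *.egg-link file (by project name) to the directory it points to."""
    links = {}
    for link_file in sorted(Path(site_packages).glob("*.egg-link")):
        # The first line of an egg-link file is the path of the linked project.
        lines = link_file.read_text().splitlines()
        if lines:
            links[link_file.stem] = lines[0].strip()
    return links


if __name__ == "__main__":
    # Inspect the site-packages folder of the interpreter running this script.
    print(find_egg_links(sysconfig.get_paths()["purelib"]))
```

Run inside the virtual environment, the script prints a dictionary of the linked projects it finds; an empty dictionary means no egg-link style installs were found.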
Note that development mode is only available for a local project or a VCS URL. If we try to install a package from PyPI in development mode, the following error message will show. Using numpy as an example,
$ pip install -e numpy
numpy should either be a path to a local project or a VCS url beginning with svn+, git+, hg+, or bzr+
What is Git Submodule?
A Git submodule is a Git repository inside another Git repository; one Git repository holds a reference to the other. For example, myproj has a dependency on commonlib. If commonlib is a Git submodule of myproj, the picture below illustrates their relationship.
A Git submodule allows us to keep a Git repository as a subdirectory of another Git repository. When we clone myproj with --recurse-submodules, the specific commit of commonlib recorded in the submodule reference of myproj is downloaded from the commonlib repository. This way, we can clone another repository (i.e., commonlib) into our project (i.e., myproj) and keep the commits separate.
The following sections use commonlib and myproj to demonstrate the setup and workflow of development mode and submodules. They also assume we do everything from scratch, including setting up the Git repositories.
Setup the Projects
Assume commonlib provides a single, simple feature: greeting. The project layout and code look like the following:
commonlib/
├── LICENSE
├── README.rst
├── commonlib
│   ├── __init__.py
│   └── greeting.py
└── setup.py
greeting.py
def greeting(name: str):
    """Print a simple greeting with the name."""
    print(f"Howdy, {name}")
pyproject.toml
[project]
name = "commonlib"
version = "0.0.1"
authors = [
    {name="author name", email="author@email.com"}
]
description = "Simple Python package"
readme = {file = "README.rst", content-type = "text/x-rst"}
requires-python = ">=3"
dependencies = []
license = {text = "MIT License"}
classifiers = [
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python",
]
[project.urls]
repository = "https://github.com/burpeesDaily/commonlib"
(A complete example of commonlib can be found at https://github.com/burpeesDaily/commonlib)
Now, we are ready to set up the Git repositories for both commonlib and myproj. Before we do that, we need to set up a Git server. This example uses localhost (i.e., 127.0.0.1) as the Git server.
$ sudo useradd -m git   # -m creates the home directory used below
$ sudo passwd git
$ su git
$ cd ~
$ git init --bare commonlib
$ git init --bare myproj
Setup Git Repository for commonlib
After we have a Git server, we can add the existing commonlib to the Git server. Go back to the local user.
user:~$ cd commonlib/
user:~/commonlib$ git init
user:~/commonlib$ git add --all
user:~/commonlib$ git commit -a -m "Initialize commonlib repository"
user:~/commonlib$ git remote add origin git@127.0.0.1:commonlib
user:~/commonlib$ git push -u origin master
Setup Git Repository for myproj
For myproj, we can do a similar thing as for commonlib. The project layout and code are like the following:
myproj/
├── LICENSE
├── README.rst
├── app.py
└── setup.py
app.py
from commonlib import greeting

def run():
    greeting.greeting("Git Submodule")

if __name__ == "__main__":
    run()
pyproject.toml
[project]
name = "myproj"
version = "0.0.1"
authors = [
    {name="author name", email="author@email.com"}
]
description = "Simple Python project"
readme = {file = "README.rst", content-type = "text/x-rst"}
requires-python = ">=3"
dependencies = []
license = {text = "MIT License"}
classifiers = [
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python",
]
[project.urls]
repository = "https://github.com/burpeesDaily/python-template"
Then, add the existing code to the Git server.
user:~$ cd myproj/
user:~/myproj$ git init
user:~/myproj$ git add --all
user:~/myproj$ git commit -a -m "Initialize myproj repository"
user:~/myproj$ git remote add origin git@127.0.0.1:myproj
user:~/myproj$ git push -u origin master
Setup Git Submodule
Although Git submodules provide many features for all kinds of situations, the two most common use cases are (1) adding a repository as a submodule and (2) updating a submodule.
Add a Repository as a Submodule
Adding an existing repository as a submodule of another repository can be done by the following commands:
user:~$ cd myproj/
user:~/myproj$ git submodule add git@127.0.0.1:commonlib
user:~/myproj$ git submodule init
user:~/myproj$ git commit -a -m "Add commonlib as submodule"
user:~/myproj$ git push
After adding a submodule, a submodule reference, i.e., a .gitmodules file, will be created. It may look like the following:
(demoenv) user:~/workspace/myproj$ ls -al
total 40
drwxrwxr-x 4 user user 4096 Dec 20 07:20 .
drwxrwxr-x 10 user user 4096 Dec 20 06:47 ..
drwxrwxr-x 9 user user 4096 Dec 20 07:22 .git
-rw-rw-r-- 1 user user 1233 Dec 20 06:44 .gitignore
-rw-rw-r-- 1 user user 73 Dec 20 07:20 .gitmodules
-rw-rw-r-- 1 user user 1067 Dec 20 06:44 LICENSE
-rw-rw-r-- 1 user user 278 Dec 20 06:58 README.rst
-rw-rw-r-- 1 user user 123 Dec 20 06:57 app.py
drwxrwxr-x 3 user user 4096 Dec 20 07:20 commonlib
-rw-rw-r-- 1 user user 724 Dec 20 06:57 setup.py
If we open the file .gitmodules, we can see that it records the information of the submodules.
$ cat .gitmodules
[submodule "commonlib"]
	path = commonlib
	url = git@127.0.0.1:commonlib
Note: the url of the submodule in .gitmodules can be a relative path. For example, if both commonlib and myproj are located in the same folder on the Git server, the url can be simplified to ../commonlib.
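Because .gitmodules is a small INI-style file, we can also inspect it from Python when a script needs to know which submodules a project declares. The sketch below is only an illustration: it assumes a simple file like the one above (Git's config syntax is richer than what configparser understands), and read_submodules is a hypothetical helper name.

```python
import configparser


def read_submodules(gitmodules_text: str) -> dict[str, dict[str, str]]:
    """Return {submodule name: {"path": ..., "url": ...}} from .gitmodules text."""
    # Only "=" separates keys from values, so the ":" inside the url is kept.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(gitmodules_text)
    submodules = {}
    for section in parser.sections():
        # Section headers look like: submodule "commonlib"
        if section.startswith("submodule "):
            name = section[len("submodule "):].strip().strip('"')
            submodules[name] = dict(parser[section])
    return submodules


if __name__ == "__main__":
    example = (
        '[submodule "commonlib"]\n'
        "\tpath = commonlib\n"
        "\turl = git@127.0.0.1:commonlib\n"
    )
    print(read_submodules(example))
```

For anything beyond a quick check, asking Git itself (for example with git config -f .gitmodules --list) is likely the more robust choice.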
If we use GitHub to host our repositories, the submodule may look like below:
(The example, myproj, can be found at https://github.com/burpeesDaily/myproj)
Update a Submodule
Usually, there are two cases in which we may want to update a submodule: 1. Update a submodule because of some code changes. 2. Update a submodule to a newer or specific version.
Case 1: Update a submodule because of code changes
A submodule is just a Git repository inside another Git repository. When we make some changes on a submodule, we do the same thing as we usually do on a regular Git repository.
For example, we add a new function called greeting2 to commonlib.
greeting.py
def greeting2(name: str):
    """Print a simple greeting with the name."""
    print(f"How are you, {name}?")
We do the same thing for the submodule as we do for a regular repository: commit the change and push the change.
user:~$ cd myproj/commonlib
user:~/myproj/commonlib$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: greeting.py
no changes added to commit (use "git add" and/or "git commit -a")
user:~/myproj/commonlib$ git commit -a -m "Added a new greeting function."
user:~/myproj/commonlib$ git push
After we commit and push the submodule change, we can see that the submodule reference in the main project, i.e., myproj, has also changed. We then commit and push in myproj in the same way to update the reference, so that myproj points to the newer commonlib.
user:~/myproj/commonlib$ cd ../
user:~/myproj$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: commonlib (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
user:~/myproj$ git commit -a -m "Update submodule, commonlib"
user:~/myproj$ git push
Case 2: Update a submodule to a newer or specific version
When someone else modifies commonlib or adds new features, we may want to update the commonlib submodule to the more recent version.
For example, someone adds a new function called greeting3 to commonlib.
greeting.py
def greeting3():
    """Print a simple greeting."""
    print("How's it going?")
The commit hash appears in the git log output below.
user2:~$ git clone git@127.0.0.1:commonlib
user2:~$ cd commonlib
user2:~/commonlib$ vim commonlib/greeting.py # add the greeting3 function shown above
user2:~/commonlib$ git commit -a -m "Added greeting3 function."
user2:~/commonlib$ git push
user2:~/commonlib$ git log
commit 7735cf8460acd03f92e7c0529486c86ec83b2c0e (HEAD -> master, origin/master, origin/HEAD)
Author: user2 <user2@email.com>
Date: Sun Dec 22 00:27:09 2019 +0000
Added greeting3 function.
We update a submodule to a newer or specific version by changing the commit hash that the submodule points to.
The Git submodule official document says, “Submodule repositories stay in a detached HEAD state pointing to a specific commit. Changing that commit simply involves checking out a different tag or commit then adding the change to the parent repository.”
The following example updates the submodule to commit hash 7735cf8460acd03f92e7c0529486c86ec83b2c0e.
user:~/myproj$ cd commonlib
user:~/myproj/commonlib$ git pull
user:~/myproj/commonlib$ git checkout 7735cf8460acd03f92e7c0529486c86ec83b2c0e
Note: checking out '7735cf8460acd03f92e7c0529486c86ec83b2c0e'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 7735cf8 Added greeting3 function.
user:~/myproj/commonlib$ cd ..
user:~/myproj$ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: commonlib (new commits)
no changes added to commit (use "git add" and/or "git commit -a")
user:~/myproj$ git commit -a -m "Update submodule, commonlib, to the newer one."
user:~/myproj$ git push
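After the commit, it can be reassuring to confirm which commonlib commit myproj now records. Git stores a submodule in the parent's tree as an entry with mode 160000 and type "commit", which git ls-tree prints. The following sketch parses such a line; the helper names parse_gitlink and recorded_submodule_commit are our own, and the second function assumes git is on the PATH.

```python
import subprocess


def parse_gitlink(ls_tree_line: str) -> tuple[str, str]:
    """Split one `git ls-tree` line for a submodule into (commit hash, path)."""
    meta, path = ls_tree_line.rstrip("\n").split("\t", 1)
    mode, obj_type, sha = meta.split()
    # Submodules are stored as "gitlink" entries: mode 160000, type commit.
    if mode != "160000" or obj_type != "commit":
        raise ValueError(f"{path!r} is not a submodule entry")
    return sha, path


def recorded_submodule_commit(repo_dir: str, submodule_path: str) -> str:
    """Ask Git which commit the parent repository records for the submodule."""
    output = subprocess.run(
        ["git", "-C", repo_dir, "ls-tree", "HEAD", submodule_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gitlink(output)[0]
```

For the example above, recorded_submodule_commit("myproj", "commonlib") should return the hash we checked out, once the parent commit has been made.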
Setup Development Mode with Git Submodule
Development mode is a feature of setuptools, so enabling it requires nothing beyond the usual setup.py for packaging a Python project. However, when one Python project contains another Python project as a submodule and we want to install the submodule in development mode, we need to add the submodule to the main project’s requirements.txt file. For example, the requirements.txt of myproj can be the following.
# Install commonlib as development mode
-e ./commonlib # Path to the submodule
Therefore, when we install the dependencies of myproj, commonlib will be installed in development mode automatically.
Note that if the submodule also has its own dependencies, we can add the installation instruction to the main project’s requirements.txt file. For example,
# Install the dependencies for commonlib
-r ./commonlib/requirements.txt
# Install commonlib as development mode
-e ./commonlib # Path to the submodule
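One thing that can go wrong with this setup is running pip before the submodule has been downloaded: a clone made without --recurse-submodules leaves the commonlib directory empty, so there is nothing to install. The sketch below shows one way to check this up front. It is an illustration only: editable_paths and is_populated are hypothetical helpers, and the parsing handles just the simple lines used in this article.

```python
from pathlib import Path


def editable_paths(requirements_text: str) -> list[str]:
    """Return the targets of '-e' / '--editable' lines, ignoring comments."""
    paths = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split(" #", 1)[0].strip()  # drop trailing comments
        if line.startswith("#"):
            continue  # whole-line comment
        for flag in ("-e ", "--editable "):
            if line.startswith(flag):
                paths.append(line[len(flag):].strip())
    return paths


def is_populated(project_dir: str) -> bool:
    """True if the directory contains a setup.py or pyproject.toml file."""
    root = Path(project_dir)
    return (root / "setup.py").is_file() or (root / "pyproject.toml").is_file()


if __name__ == "__main__":
    for target in editable_paths(Path("requirements.txt").read_text()):
        if not is_populated(target):
            print(f"{target} looks empty; was the submodule downloaded?")
```

Running the script from the myproj folder before pip install -r requirements.txt gives an early hint when the submodule still needs to be fetched.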
Workflow
We typically need to work on both the main project and its dependency at the same time when we work on a big project that contains several smaller projects. In this case, we usually work with others as a team. The recommended workflow breaks down into two stages: the setup stage and the working stage.
Setup Stage
This stage prepares the code and working environment.
- Create a virtual environment.
- Use --recurse-submodules to download the source code. The --recurse-submodules option also downloads all the submodules.
$ git clone --recurse-submodules <URL_to_the_repository>
- Check out the branch. Usually, when we work on a feature or fix a bug, we create a branch. We should avoid working on the master (or develop) branch directly. More info about this can be found at https://guides.github.com/introduction/flow/
$ git checkout <branch_name>
- Install the dependencies onto the virtual environment.
$ pip install -r requirements.txt
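When the setup stage is repeated often, for example for every new team member, the steps above can be collected in a small script. The following is a minimal sketch under a few assumptions: python3 and git are on the PATH, the environment uses the POSIX layout (bin/pip), and env_dir is an absolute path. The names setup_steps and run_setup are hypothetical.

```python
import subprocess


def setup_steps(repo_url: str, project_dir: str, branch: str, env_dir: str):
    """Return (working directory, command) pairs for the setup stage, in order."""
    return [
        # 1. Create a virtual environment.
        (".", ["python3", "-m", "venv", env_dir]),
        # 2. Download the source code together with all the submodules.
        (".", ["git", "clone", "--recurse-submodules", repo_url, project_dir]),
        # 3. Check out the working branch.
        (project_dir, ["git", "checkout", branch]),
        # 4. Install the dependencies with the environment's own pip.
        (project_dir, [f"{env_dir}/bin/pip", "install", "-r", "requirements.txt"]),
    ]


def run_setup(steps) -> None:
    """Run each step in its working directory and stop at the first failure."""
    for cwd, command in steps:
        subprocess.run(command, cwd=cwd, check=True)
```

The last step runs inside the project directory so that the relative path ./commonlib in requirements.txt resolves as expected. If the branch pins a different submodule commit than the default branch, running git submodule update --init --recursive after the checkout brings the submodule in line with it.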
Working Stage
This stage is when we work on the actual issue. Besides the code changes in the main project, there are two cases in which we need to modify submodules.
Case 1: If we need to make some code change of a submodule:
- Create a branch of this change and create a Pull-Request (PR) for the submodule code change.
- After the PR is approved and the branch merged, update the submodule to the commit that the PR just merged.
Case 2: Someone updates a repository which is our submodule, and we want to update the submodule to the newer commit:
- Use git pull in the submodule folder to get the change.
- Update the commit hash of the submodule to the one we want.
- cd to the main project and commit the change of the submodule.
Conclusion
It is easy to make mistakes when we are working on multiple related projects at the same time. When we have to work in this situation, development mode and submodules provide an easy way to manage our projects. Using development mode and submodules may not be straightforward in the beginning. But once we get familiar with them, the combination helps us avoid mistakes and improves our productivity.
Hello Shun,
can you please show me the code from the __init__.py from the commonlib?
Im not sure how to handle the import statements in this workflow for a submodule.
Thanks,
Jugin
Hi Jugin,
The content of __init__.py is empty (https://github.com/burpeesDaily/commonlib/blob/ad5fe6a4b8ce0aeb63ed978e3f9594fe149a6667/commonlib/__init__.py). The __init__.py file is required to make Python treat the directory containing the file as a package, so that we can write an import such as "from commonlib import greeting", as app.py does.
You can check https://docs.python.org/3/tutorial/modules.html#packages for more detail. Hopefully, it helps.
Thanks for the great post.
What if there is a commonlib2 that is a submodule of both myproj and commonlib, and the two point to different versions of commonlib2, 1.0 and 2.0 respectively?
In that case, I was wondering which version will be imported when I import commonlib2 after I have installed all the dependencies of both myproj and commonlib.
Thanks,
Amit
Hi Amit,
Thank you for the question. So you are creating a situation like a diamond problem.
myproj --- commonlib --- commonlib2 (2.0)
myproj --- commonlib2 (1.0)
In this case, when you do
$ git clone --recurse-submodules
you will get two copies of commonlib2, and they may look like the following:
myproj
|-- commonlib
|   |-- commonlib2 (2.0)
|-- commonlib2 (1.0)
Regarding your question about which version will be imported when you import commonlib2: it depends on how your Python path is set up and how you install your packages. If you install everything into your virtual env, the last commonlib2 installed will be the version you import.
I would consider this situation a bug and try to avoid it. If commonlib2 is a common library used by everybody, you may want to make it a standalone library and install it into your virtual environment separately.
Hopefully, it helps 🙂
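When it is unclear which copy of a package Python picks up, asking the imported module where it was loaded from is a quick check. The snippet below is a small sketch: where_is is a hypothetical helper, and commonlib2 is the package name from the question above.

```python
import importlib


def where_is(module_name: str) -> str:
    """Import a module and return the file it was loaded from."""
    module = importlib.import_module(module_name)
    # Built-in modules and namespace packages have no usable __file__.
    return getattr(module, "__file__", None) or "<no file: built-in or namespace package>"


if __name__ == "__main__":
    # Any importable name works; "json" is used here because it always exists.
    print(where_is("json"))
```

Calling where_is("commonlib2") in the virtual environment would show whether the import resolves to myproj/commonlib2 or to myproj/commonlib/commonlib2.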
Yes thank you so much for your reply!!
Yes Shun it’s really helpful! Thank you for your clarity and explanations 🙂
Thank you! I am glad it helps. 🙂
It’s interesting, but IMO it doesn’t really work (or doesn’t show clearly enough for my level of understanding) for the really interesting part of all this, which is managing the dependencies of the submodule without having to explicitly list them in the parent module.
For instance, if I try to add to commonlib:
import numpy as np
And then add to /commonlib/requirements.txt:
numpy
The import fails, meaning that myproj doesn’t properly manage the dependencies of its submodule. Maybe I’m missing something obvious, but I cannot seem to get that to work. But that’s pretty important for this workflow, since any submodule (unless trivially simple) is likely to have some imports of its own. Solutions?
Hi Franck,
Thank you for the comment.
Based on my understanding of your question, you tried to do the following steps:
1. You have commonlib as a submodule inside myproj.
2. You installed commonlib as development mode.
3. Later, you added numpy in the requirements.txt of commonlib and also added “import numpy as np” in the code of commonlib.
4. You ran “pip install -r requirements.txt” at the level of myproj.
Step 4 did not install numpy in your environment. Is it correct?
This is expected because the lines we added to the requirements.txt of myproj do not tell pip to also install the requirements.txt of commonlib.
To solve this problem, we could add the line “-r ./commonlib/requirements.txt” to the requirements.txt of myproj, so it will install the dependencies of commonlib.
The new requirements.txt of myproj becomes the following:
# Install the dependencies for commonlib
-r ./commonlib/requirements.txt
# Install commonlib as development mode
-e ./commonlib # Path to the submodule
This is a good question. I will modify the article to cover this case. Thank you for your comment and hopefully, this solves your problem.
That is indeed pretty much the use case I had in mind. Makes lots of sense. My workaround was to explicitly add all the requirements in myproj/requirements.txt, but that somewhat undermines the purpose of having a nice system like this.
Your suggestion does solve that issue in a more elegant manner, with the benefit that any further requirement in commonlib will be managed by any project that uses it (provided they use the -r ./commonlib/requirements.txt in their own dependency management).
Thanks a lot!
You are welcome! I am glad it solves your problem.