Following the previous article, Development Environment Setup and Virtual Environment, this article does not discuss language differences but focuses on how a project is built and used. The purpose of writing code is to use it. In C++, when we start writing code (or even before), we need to decide how the code will be used (e.g., as an executable binary or a library) and then build it accordingly. Is that the case in Python? How do we do that? When I started learning Python, I thought Python was a scripting language whose code could only be run as a script. This is not true. Python code can be built and run as a library or an executable, like C++ and many other languages. However, both the way to do it and the terminology are quite different in Python. I wish someone had explained it to me using the concepts and language I was already familiar with when I started learning Python. That would have made the learning process much more straightforward. And that's the purpose of this article.
(Note that the Python code in the series assumes Python 3.7 or newer)
Build C++ Program
Before we start writing a C++ program, we need to decide what we will build, and usually we have two options: an executable or a library. A typical project layout for an executable may look like the following, which includes the C++ source code and a build file, CMakeLists.txt, if using CMake.
MyExecutableProject
├── CMakeLists.txt
├── include
│   └── MyLib.hpp
└── src
    ├── Main.cpp
    └── MyLib.cpp
Assuming the contents of the files are the following.
MyLib.hpp
class MyLib
{
public:
    static int add(int, int);
};
MyLib.cpp
#include "MyLib.hpp"
int MyLib::add(int a, int b)
{
    return (a + b);
}
Main.cpp
#include "MyLib.hpp"
#include <iostream>
int main()
{
    int a = 10;
    int b = 20;
    std::cout << a << " + " << b << " = " << MyLib::add(a, b) << std::endl;
    return 0;
}
CMakeLists.txt
cmake_minimum_required (VERSION 3.10)
project(MyExecutable)
set(SOURCES
${PROJECT_SOURCE_DIR}/src/MyLib.cpp
${PROJECT_SOURCE_DIR}/src/Main.cpp
)
include_directories(${PROJECT_SOURCE_DIR}/include)
add_executable(myexe ${SOURCES})
For this C++ program, we need to compile and build its code. Once it is built, an executable binary, myexe, is generated, and we can run it with ./myexe on Linux.
$ ./myexe
10 + 20 = 30
C++ Library
Building a C++ library is similar to building a C++ executable: it also needs C++ code and a build file.
MyLibraryProject
├── CMakeLists.txt
├── include
│   └── MyLib.hpp
└── src
    └── MyLib.cpp
The main difference is that we need to build the code as a library via the build tool, like CMake, so we define that in CMakeLists.txt to tell CMake how to build the project as a library (the example below assumes we build a static library).
cmake_minimum_required (VERSION 3.10)
project(MyLib)
set(SOURCES ${PROJECT_SOURCE_DIR}/src/MyLib.cpp)
include_directories(${PROJECT_SOURCE_DIR}/include)
add_library(mylib STATIC ${SOURCES})
After we build the library project, libmylib.a
will be generated, and our code can use it by including its header file and linking against the library.
Build and Run Python Program
The previous section shows what we usually do in C++. Can we do the same thing in Python? Do we need to, and if so, how? The answer is yes, but in a different way.
Python is an interpreted language, so a program written in Python can be executed by the Python interpreter directly without compiling. Therefore, writing Python code as a script and running it as a script is the most obvious option, especially when the Python program is as simple as a single file.
Run as a Script
Assuming we have a Python script named script.py
with the following content.
def add(a: int, b: int) -> int:
    return a + b
number_1 = 10
number_2 = 20
result = add(a=number_1, b=number_2)
print(f"{number_1} + {number_2} = {result}")
Running it as a script can be as easy as the following.
$ python script.py
10 + 20 = 30
The Python interpreter executes the script from top to bottom.
Besides, by adding a shebang, #!/usr/bin/env python, at the top of our script, the Python script can be executed like a shell script on a Linux system.
#!/usr/bin/env python
def add(a: int, b: int) -> int:
    return a + b
number_1 = 10
number_2 = 20
result = add(a=number_1, b=number_2)
print(f"{number_1} + {number_2} = {result}")
We can also remove its .py
extension to make it look like a shell script, which still works.
$ chmod +x script.py
$ mv script.py script
$ ./script
10 + 20 = 30
Note that #!/usr/bin/env python
specifies the path to the Python interpreter using the env command, which will execute the Python interpreter by looking at the PATH environment variable. In addition to using the env command, we can also use an absolute path in the shebang to specify the Python interpreter, e.g., #!/usr/local/bin/python.
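To see which interpreter a PATH-based shebang would pick, the lookup that env performs can be approximated from Python itself. The following is a minimal sketch, assuming the standard-library shutil.which is a close enough stand-in for the PATH search; the helper name resolve_interpreter is hypothetical and only used for illustration.

```python
import shutil
import sys
from typing import Optional


def resolve_interpreter(command: str = "python") -> Optional[str]:
    """Return the first executable named `command` found on PATH, or None."""
    return shutil.which(command)


if __name__ == "__main__":
    # Roughly what `#!/usr/bin/env python` would launch from this shell, if anything.
    print("found on PATH:", resolve_interpreter("python"))
    # The interpreter that is running this very script.
    print("currently running:", sys.executable)
```

If the two paths differ (for example, inside and outside a virtual environment), a script with an env-based shebang may run under a different interpreter than the one we expect.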
Python Module
In C++, we have header and CPP files, but in Python, there is only one type, .py
files. Also, a file containing Python definitions and statements is called a module. Therefore, every .py
file is a module; a Python script is also a module.
However, the idea of module and script is not the same. The main difference between a module and a script is that a script is designed to execute directly, whereas a module is made to be imported. But this is a conceptual difference; we can still import a Python script. Because of this, we don’t need to designate that our Python code is a library or an executable – every Python module is importable and executable. Moreover, there is more than one way to run a Python module. The following example will show different ways to run a Python module.
Assuming we have a Python module called mylib.py
with the following content.
def add(a: int, b: int) -> int:
    return a + b
if __name__ == "__main__":
    number_1 = 5
    number_2 = 7
    result = add(a=number_1, b=number_2)
    print("Running from the module...")
    print(f"{number_1} + {number_2} = {result}")
The first way is to use the Python interpreter with the module’s file name, like running a script (i.e., python mylib.py
). The other way is to use the -m option. When using this option, the Python interpreter will search sys.path for the named module and execute it. Usually, the module name is the Python file without the .py
extension.
$ python -m mylib
Running from the module...
5 + 7 = 12
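The standard library exposes a similar mechanism through runpy.run_module, which locates a module by name and executes it. The sketch below writes a small throwaway module (the name mylib_demo and the helper run_module_as_main are hypothetical) into a temporary directory, puts that directory on sys.path, and runs the module as if it were the top-level program.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Source of a throwaway module, mirroring the mylib.py example.
MODULE_SOURCE = '''
def add(a: int, b: int) -> int:
    return a + b

if __name__ == "__main__":
    print(f"5 + 7 = {add(a=5, b=7)}")
'''


def run_module_as_main(module_name: str, source: str) -> dict:
    """Write `source` to <tmp>/<module_name>.py and run it by module name."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{module_name}.py").write_text(source)
        sys.path.insert(0, tmp)  # make the module discoverable by name
        try:
            importlib.invalidate_caches()
            # run_name="__main__" makes the guarded block execute.
            return runpy.run_module(module_name, run_name="__main__")
        finally:
            sys.path.remove(tmp)


module_globals = run_module_as_main("mylib_demo", MODULE_SOURCE)
print(module_globals["__name__"])
```

The returned dictionary holds the module's globals after execution, so the add function and the value of __name__ can be inspected afterwards.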
The same mylib.py
module can also be imported as a library. Assuming the code we want to import mylib.py
into is called main.py
and both modules are in the same folder, the main.py
can import the mylib.py
module as follows.
import mylib
if __name__ == "__main__":
    number_1 = 10
    number_2 = 20
    result = mylib.add(a=number_1, b=number_2)
    print("Running from main...")
    print(f"{number_1} + {number_2} = {result}")
And then, we can run the main.py
module, as the previous example shows.
$ ls
main.py mylib.py
$ python main.py
Running from main...
10 + 20 = 30
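The import works because the directory of the script being run is placed on the module search path. As a rough way to check where a module would be loaded from before importing it, the sketch below uses importlib.util.find_spec; the helper name locate is hypothetical.

```python
import importlib.util
from typing import Optional


def locate(module_name: str) -> Optional[str]:
    """Return where a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


print(locate("json"))                # a path inside the standard library
print(locate("no_such_module_xyz"))  # None, since nothing on the search path matches
```

Running locate("mylib") from the folder that contains mylib.py should report that file's path.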
Why use the if __name__ == "__main__" statement?
One important thing to be aware of in the previous examples is that we add the if __name__ == "__main__" statement in the mylib.py module. Why?
The if-statement prevents code from executing when the module is imported. To understand this, we need to know what __name__ is.
__name__
is an attribute every Python module has, and it is the string that a module uses to identify itself. In other words, it’s the module’s name when executing it. However, the value is not always the same but depends on how the module was executed. When a module is imported, its __name__
is the module name (i.e., the .py file’s filename without .py). However, when the module is executed as an executable, its __name__
becomes __main__
.
__main__ is the name of the environment where the top-level code runs. Top-level code is the first user-specified Python module that starts running. It’s like main()
, the entry point in a C++ program. Therefore, when we run a module as an executable, it becomes the top-level environment, so its __name__
becomes __main__
.
Besides, __name__ is a global variable inside a module and an attribute outside of the module. In other words, it can be accessed from within or outside the module. Therefore, we can print its value to demonstrate how a module's __name__ is assigned when the module is imported and when it is executed.
Assuming we have a module called mymodule.py
with the following content.
def greeting(name: str) -> None:
    print(f"Hello, {name}")
print(__name__)
When we use the Python interpreter to import the mymodule.py module, the print function will show the value of __name__
is mymodule
.
>>> import mymodule
mymodule
Now, if we run the module as a script, its __name__
value becomes __main__
.
$ python mymodule.py
__main__
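The same observation can be made programmatically. The sketch below is only an illustration: it creates a throwaway module (the name mymodule_demo and the helper observe_name are hypothetical) that records its own __name__, then loads it once through a normal import and once through runpy.run_path, which executes a file as if it were run as a script.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path


def observe_name(module_name: str = "mymodule_demo"):
    """Return (__name__ seen when imported, __name__ seen when run as a script)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp, f"{module_name}.py")
        # The module simply stores whatever __name__ it was given.
        path.write_text("SEEN_NAME = __name__\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            imported = importlib.import_module(module_name)
            executed = runpy.run_path(str(path), run_name="__main__")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
        return imported.SEEN_NAME, executed["SEEN_NAME"]


print(observe_name())  # ('mymodule_demo', '__main__')
```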
With the understanding of __name__ and __main__, we can explain why we use the if __name__ == "__main__" statement in the mylib.py module and how it prevents code from executing on import: the code under the if-statement runs only when the module is executed as the top-level code. We saw this in the Python Module section: when main.py imported mylib.py, the Running from the module... message was not printed. We can also check what happens without the if-statement. Assume we have a mylib2.py module, which is almost identical to mylib.py, except that mylib2.py does not have the if __name__ == "__main__" statement.
def add(a: int, b: int) -> int:
    return a + b
number_1 = 11
number_2 = 17
result = add(a=number_1, b=number_2)
print("Running from the mylib2...")
print(f"{number_1} + {number_2} = {result}")
And a module main2.py
that imports mylib2.py
.
import mylib2
number_1 = 23
number_2 = 29
result = mylib2.add(a=number_1, b=number_2)
print("Running from main2...")
print(f"{number_1} + {number_2} = {result}")
When we run main2.py
, both “Running from…” statements are executed even if mylib2.py
was just imported.
$ ls
main2.py mylib2.py
$ python main2.py
Running from the mylib2...
11 + 17 = 28
Running from main2...
23 + 29 = 52
Therefore, if our module contains code that should run only when the module is running in the top-level environment, we should always place the code under the if __name__ == "__main__" statement.
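A common way to apply this rule is to move the top-level code into a function and call it from the guarded block. The following is a minimal sketch of that pattern, not something the examples above require; the function name main is a convention rather than a special name in Python.

```python
def add(a: int, b: int) -> int:
    return a + b


def main() -> None:
    """Entry point used only when the file is executed directly."""
    number_1 = 11
    number_2 = 17
    print(f"{number_1} + {number_2} = {add(a=number_1, b=number_2)}")


if __name__ == "__main__":
    # Skipped on import, so importing this module has no side effects.
    main()
```

Keeping the entry code in a function also means other modules can call main() explicitly if they need to.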
Python Package
So far, we have only talked about simple cases, but it's unlikely we can always put everything in one or two modules. When our project gets more complex, we must organize our code properly. The way Python structures modules is called a package. A Python package is essentially a folder containing Python modules. Packages help us organize Python code and give modules structured namespaces, which avoids naming collisions between modules. A simple package example could look like the following.
mypackage
├── __init__.py
└── mylibrary
    ├── __init__.py
    └── module.py
One package can contain another package; one package can also consist of multiple packages and modules. The example above shows that we have a package called mypackage
, and within mypackage
, we have another package, mylibrary
, which contains the module that provides the actual functionalities. Besides, each folder has an __init__.py
file. The __init__.py
files are required to make Python treat directories containing the file as packages (See Regular packages for the details). Most of the time, __init__.py
can be just an empty file.
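To make the link between folders and dotted names concrete, the sketch below builds a layout like the one above in a temporary directory and imports the innermost module. The package name demo_package and the helper build_and_import are hypothetical, chosen so the example does not clash with a real mypackage.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_and_import():
    """Create demo_package/mylibrary/module.py on disk and import it by dotted name."""
    with tempfile.TemporaryDirectory() as root:
        library_dir = Path(root, "demo_package", "mylibrary")
        library_dir.mkdir(parents=True)
        # Empty __init__.py files mark both folders as regular packages.
        Path(root, "demo_package", "__init__.py").touch()
        (library_dir / "__init__.py").touch()
        (library_dir / "module.py").write_text("def add(a, b):\n    return a + b\n")
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_package.mylibrary.module")
            return module.__name__, module.__package__, module.add(10, 20)
        finally:
            sys.path.remove(root)
            # Forget the throwaway package so later imports are unaffected.
            for name in [n for n in sys.modules if n.split(".")[0] == "demo_package"]:
                del sys.modules[name]


print(build_and_import())
# ('demo_package.mylibrary.module', 'demo_package.mylibrary', 30)
```

Each folder level becomes one component of the module's dotted name, which is what keeps modules with the same file name in different packages from colliding.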
Like Python modules, Python packages can be imported as a library or invoked as an executable.
Import a Package as a Library
Importing a package as a library is the same as importing a module. The only difference is that we need to include the package in the import path.
Assume module.py in mypackage (from the example above) has the following content.
def add(a: int, b: int) -> int:
    return a + b
Outside the mypackage package, we have a main.py that imports the module and uses it like a library. Since module is part of the mylibrary package, and the mylibrary package is part of the mypackage package, the import path needs to include both packages. Because we assume main.py is in the same location as mypackage, the import path becomes mypackage.mylibrary, as the code snippet below shows.
from mypackage.mylibrary import module
if __name__ == "__main__":
    number_1 = 10
    number_2 = 20
    result = module.add(a=number_1, b=number_2)
    print(f"{number_1} + {number_2} = {result}")
And then we can run main.py
as usual.
$ tree -L 3
.
├── main.py
└── mypackage
    ├── __init__.py
    └── mylibrary
        ├── __init__.py
        └── module.py
$ python main.py
10 + 20 = 30
(See python_package_library_example for the complete example)
Run a Package as an Executable
To make a Python package runnable, we need to add a __main__.py
file into the top level of the package, and then our mypackage
becomes the following.
mypackage
├── __init__.py
├── __main__.py
└── mylibrary
    ├── __init__.py
    └── module.py
The __main__.py is used to provide a command-line interface for a package. When the package is invoked directly from the command line using the -m
option, the Python interpreter will look for the __main__.py
to run. Therefore, we can put our entry point code into the __main__.py
file.
from mypackage.mylibrary import module
if __name__ == "__main__":
    number_1 = 10
    number_2 = 20
    result = module.add(a=number_1, b=number_2)
    print("Running from mypackage...")
    print(f"{number_1} + {number_2} = {result}")
As mentioned above, we use the -m
option with the package name to execute a package directly.
$ python -m mypackage
Running from mypackage...
10 + 20 = 30
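The lookup of __main__.py can be checked end to end with a throwaway package. The sketch below assumes the current interpreter (sys.executable) can be launched as a subprocess and that the working directory is on the module search path when -m is used, which is the default behavior; the package name demo_runnable and the helper run_package_with_m are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_package_with_m(package_name: str = "demo_runnable") -> str:
    """Create a package with a __main__.py and run it via `python -m <package>`."""
    with tempfile.TemporaryDirectory() as root:
        package_dir = Path(root, package_name)
        package_dir.mkdir()
        (package_dir / "__init__.py").touch()
        (package_dir / "__main__.py").write_text('print("Running from __main__.py")\n')
        completed = subprocess.run(
            [sys.executable, "-m", package_name],
            cwd=root,             # run from the folder that contains the package
            capture_output=True,  # requires Python 3.7 or newer
            text=True,
            check=True,
        )
        return completed.stdout


print(run_package_with_m())
```

Removing the __main__.py line from the sketch should make the same command fail, because the interpreter then has nothing to execute for the package.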
Build and Install a Package
In C++, we use CMakeLists.txt, a Makefile, or something similar to define the build and installation process. Python has a similar approach, and many tools serve this purpose, such as setuptools and wheel (see Distributing Python Modules for more details). The file used to define the process is called setup.py. The relationship between the setup.py file and the build tools is similar to that between CMakeLists.txt and CMake. Although the setup.py file mainly defines how we want the project to be built, distributed, and installed, it is just a regular Python file, so we can use any Python functionality in it. The following example demonstrates how it works with a package. Assume we have a project with the layout shown below and a setup.py file, which should be outside the package (mypackage in this example).
project
├── mypackage
│   ├── __init__.py
│   └── mylibrary
│       ├── __init__.py
│       └── module.py
└── setup.py
The basic setup.py
definition for the package looks like the following (the following examples use setuptools as the build tool).
"""Setup file for mypackage."""
import pathlib
import setuptools
# The directory containing this file
HERE = pathlib.Path(__file__).parent
# This call to setup() does all the work
setuptools.setup(
    name="my-package",
    version="0.0.1",
    description="A Python package example",
    packages=setuptools.find_packages(),
    python_requires=">=3.7",
)
The setuptools.setup()
function is the function where we input our definition for package build, and calling the function is actually all we need to do in the setup file.
(The detail of writing the setup.py
can be found at writing-the-setup-script , and the complete list of the keywords that setuptools.setup()
supports is available at setuptools – keywords)
Once our project has the setup.py file, our package is ready to distribute and install. Since it's Python, compiling the code is not necessary. The example below shows how to use pip to install the package into our Python environment.
$ tree -L 3
.
├── mypackage
│   ├── __init__.py
│   └── mylibrary
│       ├── __init__.py
│       └── module.py
└── setup.py
$ python -m pip install .
Processing /<path to the myproject>/project
Preparing metadata (setup.py) ... done
Using legacy 'setup.py install' for my-package, since package 'wheel' is not installed.
Installing collected packages: my-package
Attempting uninstall: my-package
Found existing installation: my-package 0.0.1
Uninstalling my-package-0.0.1:
Successfully uninstalled my-package-0.0.1
Running setup.py install for my-package ... done
Successfully installed my-package-0.0.1
The command will look for the local setup.py and install the package (mypackage in this example) into the site-packages folder of our Python environment. The name we specify in the setuptools.setup() function (my-package) is the distribution name that pip reports and records in its metadata, while the importable package keeps its folder name (e.g., <python env>/lib/python3.7/site-packages/mypackage).
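After installation, the distribution name can be used to query the installed version. The following is a small sketch, assuming Python 3.8 or newer, where importlib.metadata is part of the standard library (on Python 3.7 it would need the importlib_metadata backport); the helper name installed_version is hypothetical.

```python
from importlib import metadata  # standard library from Python 3.8
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Uses the distribution name passed to setup(), not the import name.
print(installed_version("my-package"))
```

If the example project has been installed as shown above, this should print the version given in setup.py; otherwise it prints None.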
Build and Install a Package with an Executable Entry
The previous sections showed that a Python package could behave like a library or an executable. For the latter, if the package contains a __main__.py file, we can execute it with the -m option after we install it. But there is more. Setuptools offers a convenient option that lets us define executable entries (e.g., a command-line interface or GUI) that will be generated during the installation. The following example describes how to add a CLI entry to our package.
The project is similar to the example in the previous section, but this time we add a bin folder with a cli.py
module as our command line interface.
project
├── mypackage
│   ├── __init__.py
│   ├── bin
│   │   ├── __init__.py
│   │   └── cli.py
│   └── mylibrary
│       ├── __init__.py
│       └── module.py
└── setup.py
And assuming the content of the cli.py
is the following.
from mypackage.mylibrary import module
def main():
    number_1 = 10
    number_2 = 20
    result = module.add(a=number_1, b=number_2)
    print("Running from entry...")
    print(f"{number_1} + {number_2} = {result}")
We use the entry_points
option in the setup.py
file to define the entry.
"""Setup file for mypackage."""
import pathlib
import setuptools
# The directory containing this file
HERE = pathlib.Path(__file__).parent
# This call to setup() does all the work
setuptools.setup(
    name="my-package",
    version="0.0.1",
    description="A Python package example",
    packages=setuptools.find_packages(),
    entry_points={"console_scripts": ["my-cli=mypackage.bin.cli:main"]},
    python_requires=">=3.7",
)
In the code snippet above, the keyword console_scripts indicates the entry is a console type, and console_scripts expects a list of console scripts. Since it's a list, it can hold multiple console entries. For each console entry, the format is <console_name>=<path_to_module:entry_function>. In this example, the main() function is the entry point, and its console script name is my-cli. Thus, when the package is installed, a my-cli script will be created and installed into our Python environment.
$ tree -L 3
.
├── mypackage
│   ├── __init__.py
│   ├── bin
│   │   ├── __init__.py
│   │   └── cli.py
│   └── mylibrary
│       ├── __init__.py
│       └── module.py
└── setup.py
$ python -m pip install .
$ my-cli
Running from entry...
10 + 20 = 30
We can use the which my-cli
command to verify that the my-cli
script has been installed into our Python environment.
$ which my-cli
/<python environment>/bin/my-cli
The generated my-cli
file may look like this.
#!/<python environment>/bin/python
# EASY-INSTALL-ENTRY-SCRIPT: 'my-package==0.0.1','console_scripts','my-cli'
__requires__ = 'my-package==0.0.1'
import re
import sys
from pkg_resources import load_entry_point
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(
        load_entry_point('my-package==0.0.1', 'console_scripts', 'my-cli')()
    )
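Conceptually, the generated script turns an entry such as my-cli=mypackage.bin.cli:main into a callable and calls it. The sketch below illustrates that idea by hand with importlib; it is not how pkg_resources implements load_entry_point, and the helper name load_entry is hypothetical. To stay runnable without installing anything, it resolves a standard-library function instead of mypackage.

```python
import importlib
from typing import Callable, Tuple


def load_entry(spec: str) -> Tuple[str, Callable]:
    """Split '<name>=<module.path>:<function>' and return (name, function object)."""
    name, _, target = spec.partition("=")
    module_path, _, function_name = target.partition(":")
    module = importlib.import_module(module_path.strip())
    return name.strip(), getattr(module, function_name.strip())


# "dump=json:dumps" follows the same format as "my-cli=mypackage.bin.cli:main".
name, func = load_entry("dump=json:dumps")
print(name, func([1, 2]))  # dump [1, 2]
```

With the example package installed, load_entry("my-cli=mypackage.bin.cli:main") would return the same main function that the my-cli script ends up calling.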
This ability adds flexibility to Python packages. It makes it very easy to provide CLI tools or test features for our packages.
Conclusion
In C++, we need to decide if we want to build an executable or a library, but the boundary between an executable and a library in Python is blurred. Python modules and packages can be imported as a library and executed as an executable. This flexibility allows a single Python package to contain both modules meant to be imported and modules meant to be executed.
(All the example code in this article is available at project_build_and_run)