[Updated: April 22, 2024]
This article describes the Python coding guidelines I follow to keep my Python projects consistent, readable, and maintainable.
Writing code is not complicated, especially in a language like Python. Writing clean code, however, is not simple. Clean code is about building high-quality software that is maintainable, works correctly, and is easy to understand. A project’s coding guideline is the minimum needed to keep the software at a consistent level of quality as it evolves. We also tend to think of programming languages as a way to communicate with machines, but that is not quite accurate: the primary audience of the code we write is other developers, including our future selves.
General Principles
The fundamental principle of Python programming is to write Pythonic code. Most of the Pythonic philosophy can be explained through PEP 20.
>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Acronyms for Software Development
In addition to PEP 20, several well-known software principles go by acronyms. They apply to Python as well as to most other programming languages.
Don’t Repeat Yourself (DRY) and Once and Only Once (OAOO)
Duplicated code should be avoided at all costs.
You Ain’t Gonna Need It (YAGNI) and Keep It Simple (KIS)
Although we want our code to be future-proof, do not build more than is necessary. Avoid over-engineering, while making sure that today’s decisions do not lock us in.
Easier to Ask for Forgiveness than Permission (EAFP) and Look Before You Leap (LBYL)
EAFP means running the code on the assumption that it will work and, if it does not, catching the exception and handling the failure in the except block.
In contrast, LBYL means checking the preconditions before performing the operation. This style is widespread in languages like C but is discouraged in Python.
Prefer EAFP to LBYL.
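A minimal sketch contrasting the two styles on a dictionary lookup; the function names, the `config` dictionary, and the fallback port 8080 are hypothetical:

```python
def get_port_lbyl(config: dict[str, int]) -> int:
    # LBYL: check the precondition first, then perform the operation.
    if "port" in config:
        return config["port"]
    return 8080


def get_port_eafp(config: dict[str, int]) -> int:
    # EAFP: perform the operation and handle the failure if it happens.
    try:
        return config["port"]
    except KeyError:
        return 8080


print(get_port_lbyl({"port": 5432}))  # 5432
print(get_port_eafp({}))  # 8080
```

For a plain dictionary lookup, `config.get("port", 8080)` is shorter still; the point here is only the shape of the two styles.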
SOLID Principles
- Single responsibility principle
A software component must have only one responsibility.
- Open/Closed principle
A software entity should be open for extension but closed for modification.
- Liskov’s substitution principle
A client should be able to use any subtype of a class in place of the class itself without noticing the difference. In other words, the client is unaware of changes in the class hierarchy.
- Interface segregation principle
Interfaces should be small, so that clients do not depend on methods they do not use.
- Dependency inversion principle
High-level modules should depend on abstractions of low-level modules rather than on their concrete implementations.
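To make the dependency inversion principle concrete, here is a sketch that uses `typing.Protocol` as the abstraction (an `abc.ABC` base class would work equally well); `MessageSender`, `InMemorySender`, and `Notifier` are hypothetical names:

```python
from typing import Protocol


class MessageSender(Protocol):
    """The abstraction that the high-level class depends on."""

    def send(self, message: str) -> None:
        ...


class InMemorySender:
    """A concrete, low-level implementation that just records messages."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


class Notifier:
    """A high-level class that only knows about MessageSender."""

    def __init__(self, sender: MessageSender) -> None:
        self._sender = sender

    def notify(self, user: str) -> None:
        self._sender.send(f"Hello, {user}")


sender = InMemorySender()
Notifier(sender).notify("Guido")
print(sender.sent)  # ['Hello, Guido']
```

Swapping `InMemorySender` for another class with a matching `send()` method requires no change to `Notifier`.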
Naming
The naming convention follows PEP 8 – Style Guide for Python Code with additional types.
| Type | Public | Private (or Protected) |
|---|---|---|
| Packages | lower_with_underscore | |
| Modules | lower_with_underscore | _lower_with_underscore |
| Classes | CamelCase | _CamelCase |
| Class Methods | lower_with_underscore() | _lower_with_underscore() |
| Exceptions | CamelCase | |
| Functions | lower_with_underscore() | _lower_with_underscore() |
| Global/Class Constants | CAPS_WITH_UNDERSCORE | _CAPS_WITH_UNDERSCORE |
| Global/Class Variables | lower_with_underscore | _lower_with_underscore |
| Instance Variables | lower_with_underscore | _lower_with_underscore |
| Function/Method Parameters | lower_with_underscore | |
| Local Variables | lower_with_underscore | |
| Enum Classes | CamelCase | _CamelCase |
| Enum Members | CAPS_WITH_UNDERSCORE | |
| Type Aliases and NewType | CamelCase | _CamelCase |
- Do not use __double_leading_and_trailing_underscore__ names. Python reserves this style.
- Using __double_leading_underscore names to mark a variable or method as private is discouraged. This style triggers name mangling and reduces readability.
- Although classes and exceptions should use CamelCase, keeping a well-known acronym in all caps (such as SQL) is encouraged for readability. For example,
class MySQLHelper:
    ...
- Make names descriptive. Use complete words and avoid abbreviations.
n = 0 # Bad! Meaningless name.
itm = 1 # Bad! Maybe it means "item" with "e" removed. Who knows?
eid = 2 # Bad! Maybe some kind of id? Extent id? Who knows?
- Boolean functions describing a characteristic start with is or has.
class Message:
    ...
    def is_connected(self) -> bool:
        ...
    def has_message(self) -> bool:
        ...
- Functions and variables denoting the number of elements should use count or length.
def message_count() -> int:  # Number of messages
    ...
def queue_length() -> int:  # Number of entries in queue
    ...
def queue_size() -> int:  # Avoid: Confusing.
    ...
- Use plural if an object indicates a collection or a list.
weights = np.zeros(number_of_attributes + 1)
Style Guideline
The coding style is adapted from PEP 8 and the Black Code Style.
General Rule
- Do NOT terminate lines with semicolons.
- Use four-space indentation.
- Each module (Python file) should have an appropriate license boilerplate: a proper copyright notice and a license notice should be in each nontrivial file in the package. (Any file more than ten lines long is nontrivial for this purpose.)
- Although we should keep functions as simple as possible, there is no hard limit on function length.
- Do NOT explicitly inherit from object if a class inherits from no other class.
- Use double quotes (") for string literals.
Line Length
The maximum line length is 88 characters, following Black’s line length recommendation. Exceptions are:
- URL
- Long import statements
- Long flags
- Long string without whitespace
- Linting disable comments
Import
- Use import for packages and modules only. Avoid import for individual classes or functions, except when the module provider recommends a different style or the style of importing modules loses readability.
from boto3.dynamodb import types  # Too generic; loses readability
from pyspark import sql # Spark does not prefer this style.
from pyspark.sql import SparkSession # Spark prefers this style.
- Use import x for importing packages and modules.
- Use from x import y where x is the package prefix and y is the module name with no prefix. For example,
from sklearn import preprocessing
from sound.effects import echo
...
label_encoder = preprocessing.LabelEncoder()
echo.EchoFilter(input, output, delay=0.7, atten=4)
- Use import y as z only when z is a standard abbreviation. For example,
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
- Do not use relative import. Even if the module is in the same package, use the full package name. This prevents unintentionally importing a package twice.
- Never import everything with a wildcard.
# Bad and buggy! Do NOT do this
from os import *
- Imports should be on separate lines and in alphabetical order.
import os
import sys
- Within each group, imports should be sorted lexicographically, ignoring case, according to each module’s full package path. A blank line may optionally be placed between import groups.
import os
import sys
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
Order
When importing groups, the order of groups should be:
- Python standard libraries
- 3rd party packages
- Other modules in the same package
import collections
import queue
import sys

from sklearn import model_selection
from sklearn import preprocessing

from mypackage import my_module_1
from mypackage import my_module_2
Typing
When importing the typing module, always import the class itself. Multiple typing classes should be imported on one line.
from typing import Any, Optional
Annotations
Use annotations wherever we can. PEP 3107 introduced function annotations; the basic idea is to tell readers of the code what to expect as the values of function arguments and return values.
- Write type annotations following PEP 484 – Type Hints.
from typing import Any
def function(a: int) -> list[int]:
    ...
def function_any_parameter(a: Any) -> str:
    ...
def function_parameter_with_default(a: int = 10) -> str:
    ...
class Point:
    x: float
    y: float
# A dictionary where the keys are strings, and the values are integers
name_counts: dict[str, int] = {
    "Adam": 10,
    "Guido": 12,
}
# A list of integers
numbers: list[int] = [1, 2, 3, 4, 5, 6]
String Format
Prefer f-strings (formatted string literals, 3.6 & PEP 498) for string interpolation.
name = "World"
greeting = f"Hello, {name}"
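A few more f-string forms, as a short sketch: format specs such as `:.2f` follow Python’s format-specification mini-language, and the `=` form is the 3.8 self-documenting feature listed at the end of this article. The variable names are illustrative only.

```python
import math

name = "World"
count = 3

greeting = f"Hello, {name}"
pi_text = f"{math.pi:.2f}"  # Format spec: two decimal places
padded = f"{count:>5}"  # Right-aligned in a 5-character field
debug = f"{count=}"  # Self-documenting form (3.8+)

print(greeting)  # Hello, World
print(pi_text)  # 3.14
print(repr(padded))  # '    3'
print(debug)  # count=3
```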
Shebang Line
Most .py files do not need to start with a #! line. If a file is meant to be executed directly as a script, start it with
#!/usr/bin/env python3
Whitespace
Whitespace should follow the PEP 8 section Whitespace in Expressions and Statements. For example,
import enum


class TraversalType(enum.Enum):
    PRE_ORDER = enum.auto()
    IN_ORDER = enum.auto()
    POST_ORDER = enum.auto()


class BinaryTree:
    def __init__(self, data: int) -> None:
        ...

    def insert(self, data: int) -> None:
        ...

    def traverse(self, traversal_type: TraversalType) -> None:
        ...


if __name__ == "__main__":
    tree = BinaryTree(data=30)
    tree.insert(10)
    tree.insert(20)
    tree.insert(40)
    print("In-Order")
    tree.traverse(traversal_type=TraversalType.IN_ORDER)
Method Order in a Class
Methods in a class should appear in the following order (from top to bottom):
- __init__
- __special__
- Properties
- Public methods
- Private methods
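A small, hypothetical class laid out in that order (`__init__`, a special method, a property, a public method, then a private method); `Temperature` and its members are illustrative only:

```python
class Temperature:
    def __init__(self, celsius: float) -> None:
        self._celsius = celsius

    def __repr__(self) -> str:
        return f"Temperature(celsius={self._celsius})"

    @property
    def fahrenheit(self) -> float:
        """Return the temperature in Fahrenheit."""
        return self._to_fahrenheit()

    def warm_up(self, delta: float) -> None:
        """Raise the temperature by delta degrees Celsius."""
        self._celsius += delta

    def _to_fahrenheit(self) -> float:
        # Private helper kept at the bottom of the class.
        return self._celsius * 9 / 5 + 32
```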
Coding Guideline
- Do NOT use mutable objects as the default arguments of functions or methods.
- Avoid global data. If global variables are necessary, making global variables constant is preferred.
- Lambda functions are fine for one-liners. If a lambda function’s code is longer than 80 characters, define it as a regular function instead.
- Use pathlib to manipulate file paths.
- Use dataclasses for classes that mainly hold data (3.7 & PEP 557).
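A sketch of the mutable-default pitfall and two common fixes: a `None` sentinel for functions, and `dataclasses.field(default_factory=...)` for data classes. The function and class names are hypothetical.

```python
import dataclasses
from typing import Optional


def append_bad(item: int, items: list[int] = []) -> list[int]:
    # The default list is created once, when the function is defined,
    # so every call without `items` shares (and mutates) the same list.
    items.append(item)
    return items


def append_good(item: int, items: Optional[list[int]] = None) -> list[int]:
    # Use None as a sentinel and create a fresh list on each call.
    if items is None:
        items = []
    items.append(item)
    return items


@dataclasses.dataclass
class ShoppingCart:
    owner: str
    # default_factory gives each instance its own list.
    items: list[str] = dataclasses.field(default_factory=list)


print(append_bad(1), append_bad(2))  # [1, 2] [1, 2]
print(append_good(1), append_good(2))  # [1] [2]
```

Declaring `items: list[str] = []` directly in a dataclass is rejected with a `ValueError` when the class is defined, which is why `default_factory` is needed there.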
List, Dictionary, and Set Comprehensions
Comprehensions are encouraged for simple cases and discouraged for complex ones.
# Ok
squares = [x * x for x in range(10)]
# Bad; Hard to read
result = [(x, y) for x in range(10) for y in range(5) if x * y > 10]
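One possible rewrite of the "Bad" example above: explicit nested loops build the same list and are easier to extend with extra logic.

```python
# The same pairs as the "Bad" comprehension, written as plain loops.
result = []
for x in range(10):
    for y in range(5):
        if x * y > 10:
            result.append((x, y))

print(result[:3])  # [(3, 4), (4, 3), (4, 4)]
```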
Open and Close
Explicitly close files and sockets when done with them. The preferred way to manage files is using the with statement.
with open("hello.txt") as hello_file:
    ...
For file-like objects that do not support the with statement, use contextlib.closing().
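import contextlib
import urllib.request
# In Python 3, urlopen() responses already support "with"; closing() is shown
# here for objects that only provide a close() method.
with contextlib.closing(urllib.request.urlopen("http://www.python.org/")) as front_page:
    for line in front_page:
        print(line)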
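For a self-contained illustration that needs no network access, here is a sketch with a hypothetical `Connection` class that has a `close()` method but no context-manager support. `contextlib.closing()` calls `close()` when the `with` block exits, even if an exception is raised inside it.

```python
import contextlib


class Connection:
    """A hypothetical resource with close() but no __enter__/__exit__."""

    def __init__(self) -> None:
        self.closed = False

    def read(self) -> str:
        return "data"

    def close(self) -> None:
        self.closed = True


connection = Connection()
with contextlib.closing(connection) as conn:
    print(conn.read())  # data
print(connection.closed)  # True
```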
Inheritance
- When a derived class defines an __init__() method, explicitly call the base class’s __init__(). The derived class does not call the base class’s __init__() automatically.
class MyClass(BaseClass):
    def __init__(self):
        BaseClass.__init__(self)
- When calling a method from the base class, prefer the explicit Base.method(self) style:
class Base:
    ...
    def greeting(self):
        ...
class Derived(Base):
    ...
    def echo(self):
        ...
        Base.greeting(self)
Why not use super()?
super() offers many nice features, such as solving the diamond problem using the MRO (method resolution order) and a shorter syntax. However, it contradicts the Zen of Python guideline "Explicit is better than implicit." It also makes calling a specific base class method harder. Consider the following case:
class Base1:
    def __init__(self) -> None:
        print("Base1")
    def greeting(self):
        print("Hello from Base1")
class Base2:
    def __init__(self) -> None:
        print("Base2")
    def greeting(self):
        print("Hello from Base2")
class Child(Base1, Base2):
    def __init__(self) -> None:
        super().__init__()
    def my_greeting(self):
        super().greeting()
c = Child()
c.my_greeting()
The output will be the following:
Base1
Hello from Base1
Only Base1’s __init__() runs because of the MRO: Base1.__init__() does not call super().__init__(), so the lookup stops there. To initialize both Base1 and Base2, the base classes need to call super().__init__() in their own __init__() methods so that the MRO chain continues. This adds complexity and reduces readability; a reader may wonder why Base1, which has no explicit base class, needs to call super().__init__().
Besides, in my_greeting(), the MRO resolves greeting() to Base1’s method. If we want to call Base2’s greeting(), we might try this:
class Child(Base1, Base2):
    def __init__(self) -> None:
        super().__init__()
    def my_greeting(self):
        super(Base2, self).greeting()
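This does not work. Child’s MRO is Child, Base1, Base2, object, and super(Base2, self) starts the lookup after Base2, so only object is searched and an AttributeError is raised. (See The Python 2.3 Method Resolution Order for more detail.)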
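To reach Base2 through super(), the lookup has to start after Base1 instead, which is exactly the kind of MRO reasoning the explicit style avoids. A simplified sketch (no __init__ methods, and greeting() returns a string instead of printing so it can be checked); the two helper method names are hypothetical:

```python
class Base1:
    def greeting(self) -> str:
        return "Hello from Base1"


class Base2:
    def greeting(self) -> str:
        return "Hello from Base2"


class Child(Base1, Base2):
    def greeting_from_base2_super(self) -> str:
        # super(Base1, self) starts the lookup after Base1 in the MRO,
        # which is where Base2 sits.
        return super(Base1, self).greeting()

    def greeting_from_base2_explicit(self) -> str:
        # The explicit style simply names the class.
        return Base2.greeting(self)


# ['Child', 'Base1', 'Base2', 'object']
print([cls.__name__ for cls in Child.__mro__])
print(Child().greeting_from_base2_super())  # Hello from Base2
print(Child().greeting_from_base2_explicit())  # Hello from Base2
```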
On the other hand, the main disadvantage of the Base.method(self, xxx) style is that a common base class is called multiple times in the diamond problem. The other problem occurs when the initialization order differs from the inheritance order, i.e., out of order. For example,
class Child(Base1, Base2):  # Base1 first and Base2 second
    def __init__(self) -> None:
        Base2.__init__(self)  # Initialize Base2 before Base1
        Base1.__init__(self)
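A minimal sketch of the diamond case, with hypothetical classes Root, Left, Right, and Bottom: when every class calls its base class explicitly, Root.__init__() runs once per inheritance path.

```python
calls: list[str] = []


class Root:
    def __init__(self) -> None:
        calls.append("Root")


class Left(Root):
    def __init__(self) -> None:
        Root.__init__(self)
        calls.append("Left")


class Right(Root):
    def __init__(self) -> None:
        Root.__init__(self)
        calls.append("Right")


class Bottom(Left, Right):
    def __init__(self) -> None:
        # Explicit calls: Root.__init__() ends up running twice.
        Left.__init__(self)
        Right.__init__(self)
        calls.append("Bottom")


Bottom()
print(calls)  # ['Root', 'Left', 'Root', 'Right', 'Bottom']
```

If every class instead called super().__init__(), Root.__init__() would run only once, following the MRO Bottom, Left, Right, Root.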
However, the diamond problem is avoidable, and out-of-order initialization is considered a bug.
In summary, super() provides some nice features, but it reduces readability, requires users to fully understand how the MRO works, and increases complexity in some cases. In contrast, the Base.method(self, xxx) style is more readable and makes it easier to control which base class methods are called.
Copy
Assignment statements in Python do not copy objects; they create bindings between a target and an object. For collections that are mutable or contain mutable items, an actual copy is sometimes needed so that we can change the copy without altering the original object.
Use the copy module to perform a deep copy.
Shallow Copy
import copy
# Initializing list 1
list1 = [1, 2, [3, 5], 4]
# Shallow copy: a new outer list that shares the nested objects
list2 = copy.copy(list1)
# Original elements of the list
print("The original elements before shallow copying")
for item in list1:
    print(item, end=" ")
# Modifying a nested element of the new list
list2[2][0] = 7
# Checking if the change is reflected
print("\nThe original elements after shallow copying")
for item in list1:
    print(item, end=" ")
Output:
The original elements before shallow copying
1 2 [3, 5] 4
The original elements after shallow copying
1 2 [7, 5] 4
Deep Copy
# Importing "copy" for copy operations
import copy
# Initializing list 1
list1 = [1, 2, [3, 5], 4]
# Deep copy
list2 = copy.deepcopy(list1)
# Original elements of the list
print("The original elements before deep copying")
for item in list1:
    print(item, end=" ")
# Modifying a nested element of the new list
list2[2][0] = 7
# Change is reflected in list2
print("\nThe new list of elements after deep copying")
for item in list2:
    print(item, end=" ")
# Change is NOT reflected in the original list
print("\nThe original elements after deep copying")
for item in list1:
    print(item, end=" ")
Output:
The original elements before deep copying
1 2 [3, 5] 4
The new list of elements after deep copying
1 2 [7, 5] 4
The original elements after deep copying
1 2 [3, 5] 4
Exceptions
Exceptions should be handled at the right level.
- Do NOT use exceptions as a goto-style control-flow mechanism. Raise exceptions when something goes wrong that callers need to be aware of.
- Do NOT expose tracebacks to end users.
- Avoid empty except blocks.
try:
    ...  # Do something
except:  # Avoid this
    pass
- Try to catch specific exceptions.
- When catching an exception, handle the error properly in the except block.
- When changing the type of exception, always use the following pattern.
raise <exception> from <original>
- Do not catch the AssertionError exception.
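A sketch that combines these rules: catch a specific exception, then translate it with `raise ... from ...` so the original error stays available as `__cause__`. `ConfigError` and `load_config()` are hypothetical names.

```python
import json
from typing import Any


class ConfigError(Exception):
    """Hypothetical domain-specific error raised by load_config()."""


def load_config(text: str) -> dict[str, Any]:
    try:
        return json.loads(text)
    except json.JSONDecodeError as error:
        # Catch the specific exception and translate it, keeping the
        # original exception as __cause__ for debugging.
        raise ConfigError("Invalid configuration") from error


try:
    load_config("{not json")
except ConfigError as error:
    print(error)  # Invalid configuration
    print(type(error.__cause__).__name__)  # JSONDecodeError
```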
Decorator and Properties
- Explicit getter and setter methods are not idiomatic in Python. Where you would write lightweight, straightforward getters and setters in languages like C++ or Java, use properties to get or set member variables instead.
import math
class Square:
    """A square with two properties: a writable area and a read-only perimeter."""
    def __init__(self, side: float) -> None:
        self.side: float = side
    @property
    def area(self) -> float:
        """Get or set the area of the square."""
        return self._get_area()
    @area.setter
    def area(self, area: float) -> None:
        self._set_area(area)
    def _get_area(self) -> float:
        """Indirect accessor to calculate the 'area' property."""
        return self.side ** 2
    def _set_area(self, area: float) -> None:
        """Indirect setter to set the 'area' property."""
        self.side = math.sqrt(area)
    @property
    def perimeter(self) -> float:
        """Get the perimeter."""
        return self.side * 4
Comments and Docstrings
Good code is not only self-explanatory but also well documented. Commenting and documenting are two different things: comments explain how the code works, while documentation describes what the code is supposed to do and how to use it. In Python, we use # for comments and docstrings for documentation.
General Rules
- Public methods must have docstrings. Private methods may not need them unless the implementation is tricky.
- A method that overrides a base class method may have a simple docstring sending the reader to its overridden method’s docstring. The rationale is that there is no need to repeat documentation already presented in the base method’s docstring in many places. However, if the overriding method’s behavior is substantially different from the overridden method, or details need to be provided (e.g., documenting additional side effects), a docstring with at least those differences is required on the overriding method.
- For readability, we can explicitly add # Override as a comment on an overriding method. The idea is inspired by the C++ override specifier.
class Base:
    def method(self):
        """A method in the base class."""
        # ...
class Derived(Base):
    # Override
    def method(self):
        """A method derived from the Base class.

        See Also
        --------
        :py:meth:`Base.method`.
        """
        # ...
Comments
The purpose of code comments is to explain how the code works to the developers who maintain it. Generally, comments should follow PEP 8 style.
Basic Example
def complex_function():
    # A comment explains how the code works or
    # something readers need to be aware of.
    print("do very complex stuff")
To flag a known issue or a needed improvement, use BUG, FIXME, or TODO.
For example,
# TODO: simplify the complex function
Documents
Code documentation describes what the code is supposed to do and how to use it. Its audience is the users of the code. In Python, we use docstrings to document code. A docstring is a string literal that appears as the first statement of a module, class, or function definition and describes it. The docstring is stored in the object’s __doc__ attribute and can be displayed with the help() function.
class Example:
    """This is an example of docstrings."""
    def method(self):
        """Docstrings for the method."""
        pass
>>> help(Example)
Help on class Example in module temp:
class Example(builtins.object)
| This is an example of docstrings.
|
| Methods defined here:
|
| method(self)
| Docstrings for the method.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
>>> print(Example.__doc__)
This is an example of docstrings.
>>> help(Example.method)
Help on function method in module temp:
method(self)
    Docstrings for the method.
The basic rules of docstrings are:
- Use triple double quotes (""").
- Docstrings should follow the NumPy/SciPy style, which extends PEP 8 and PEP 257.
- Use reStructuredText (reST) syntax so that Sphinx can render the documents.
The following sections summarize the numpydoc docstring guide.
Basic of Docstrings
A docstring consists of sections separated by headings. Each heading should be underlined with hyphens, and the sections should appear in the following order:
- A short (one-line) summary
- Extended summary: a detailed description. This section should clarify functionality, not implementation details or background theory.
- Attributes (if class) / Parameters (if function)
- Methods (if class)
- Returns
- Yields
- Raises
- Warnings
- See Also
- Notes: implementation details or background theory belong here.
- References
- Examples
For example,
from typing import Any, Optional
class Stack:
    """A simple stack data structure with basic operations.

    Attributes
    ----------
    size : int
        The size of the stack.

    Methods
    -------
    push(data: Any)
        Add an item onto the stack.
    pop() -> Any
        Remove the top item from the stack, and return the value of
        the removed item.
    dump()
        Print all the data from the stack.

    Examples
    --------
    >>> from my_package.my_data_structure import my_stack
    >>> stack = my_stack.Stack()
    >>> stack.push(10)
    >>> stack.push(3)
    >>> stack.push(14)
    >>> stack.dump()
    14 3 10
    >>> stack.size
    3
    >>> stack.pop()
    14
    >>> stack.dump()
    3 10
    >>> stack.size
    2
    """
    def __init__(self):
        self.top = None
        self._size = 0
    def push(self, data: Any) -> None:
        """Add an item onto the stack.

        Parameters
        ----------
        data : Any
            The data to be pushed onto the stack.
        """
        # ...
    def pop(self) -> Optional[Any]:
        """Remove the top item from the stack, and return its value.

        Returns
        -------
        Optional[Any]
            Return the value of the removed item. If the stack is
            empty, None will be returned.
        """
        # ...
    def dump(self) -> None:
        """Print all the data from the stack without removing any item."""
        # ...
    @property
    def size(self) -> int:
        """Return the stack size."""
        return self._size
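The Stack example above elides its method bodies. As a hedged sketch, assuming a singly linked list (suggested by the self.top attribute but not confirmed by the original), the bodies could be filled in as follows. The _Node helper class is hypothetical, and the docstring example is trimmed so that it runs without the hypothetical my_package import.

```python
from typing import Any, Optional


class _Node:
    """A single element of the linked list backing the stack (hypothetical)."""

    def __init__(self, data: Any, next_node: Optional["_Node"]) -> None:
        self.data = data
        self.next = next_node


class Stack:
    """A simple stack data structure backed by a singly linked list.

    Examples
    --------
    >>> stack = Stack()
    >>> stack.push(10)
    >>> stack.push(3)
    >>> stack.push(14)
    >>> stack.dump()
    14 3 10
    >>> stack.pop()
    14
    >>> stack.size
    2
    """

    def __init__(self) -> None:
        self.top: Optional[_Node] = None
        self._size = 0

    def push(self, data: Any) -> None:
        """Add an item onto the stack."""
        self.top = _Node(data, self.top)
        self._size += 1

    def pop(self) -> Optional[Any]:
        """Remove the top item and return its value, or None if empty."""
        if self.top is None:
            return None
        node = self.top
        self.top = node.next
        self._size -= 1
        return node.data

    def dump(self) -> None:
        """Print all the data from the stack without removing any item."""
        values = []
        node = self.top
        while node is not None:
            values.append(str(node.data))
            node = node.next
        print(" ".join(values))

    @property
    def size(self) -> int:
        """Return the stack size."""
        return self._size
```

Saving the sketch under a hypothetical name such as stack.py and running `python -m doctest stack.py -v` executes the docstring example as a test.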
Package and Module Documentation
Each package and module should have a docstring with at least a summary line. The documentation for a package should be in __init__.py. For module documenting, the docstrings should be at the top of the module.
Both package and module docstrings should follow this pattern where appropriate:
- Summary
- Extended summary
- Routine listings. Routine listings are encouraged, especially for large packages and modules, because they give a good overview of the package’s or module’s functionality.
- See Also
- Notes
- References
- Examples
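A minimal sketch of a module docstring that follows the pattern above (summary, extended summary, routine listings, notes, examples). The temperature-conversion module and its functions are hypothetical, and the exact spelling of the "Routine Listings" heading is an assumption.

```python
"""Utilities for converting temperatures between units.

This module provides small, dependency-free helpers for converting
between Celsius and Fahrenheit.

Routine Listings
----------------
celsius_to_fahrenheit
    Convert a temperature from Celsius to Fahrenheit.
fahrenheit_to_celsius
    Convert a temperature from Fahrenheit to Celsius.

Notes
-----
Conversions use F = C * 9 / 5 + 32.

Examples
--------
>>> celsius_to_fahrenheit(100.0)
212.0
"""


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9
```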
Linting and Type Checking
Seeing code in a consistent, structured form makes it easier to understand at a glance and increases the chance of spotting bugs and mistakes. Unlike C++ and Java, which have a compiler that catches errors and warns about potential bugs, Python needs additional tools to detect such problems. Some recommended tools:
- Flake8 is a tool for finding bugs and style problems in Python source code.
- Mypy is a static type checker for Python programs.
- Pydocstyle checks docstring style.
- Black formats code and checks coding style.
Linting Configurations
The configurations for the style in this article are listed below. Note that the templates only show settings that differ from the defaults.
Flake8
Filename: .flake8
[flake8]
max-line-length = 88
extend-ignore = E203, W503
Flake8 error codes: https://flake8.pycqa.org/en/latest/user/error-codes.html
Mypy
Filename: .mypy.ini
[mypy]
ignore_missing_imports = True
show_column_numbers = True
warn_redundant_casts = True
strict_equality = True
warn_return_any = True
disallow_any_unimported = True
warn_unreachable = True
disallow_untyped_defs = True
Mypy configuration options: https://github.com/python/mypy/blob/master/docs/source/config_file.rst
Pydocstyle
Filename: .pydocstyle
[pydocstyle]
convention=numpy
add-ignore=D104,D107
Pydocstyle error codes: http://www.pydocstyle.org/en/stable/error_codes.html
Black
The style in this article aligns with the Black style. No configuration file is needed.
Project Layout
A typical project could be organized as follows:
project
├── .flake8
├── .mypy.ini
├── .pydocstyle
├── LICENSE
├── pytest.ini
├── README.rst
├── docs
├── myproject
│ ├── __init__.py
│ ├── bin
│ │ ├── __init__.py
│ │ ├── app.py
│ ├── mylibrary
│ │ ├── __init__.py
│ │ ├── module1.py
│ │ ├── module2.py
│ └── mymodule.py
├── dev-requirements.txt
├── pyproject.toml
├── requirements.txt
└── tests
- Use requirements.txt for users of the project.
- Use dev-requirements.txt for developers who work on the project.
- Since the tests folder is at the top level of the project directory, the pytest configuration file should point to it:
Filename: pytest.ini
[pytest]
testpaths = tests
(A GitHub Python Template with the configurations and layout mentioned above is available at https://github.com/burpeesDaily/python-template)
Reference
- PEP 20 – The Zen of Python
- PEP 8 – Style Guide for Python Code
- PEP 3107 – Function Annotations
- PEP 484 – Type Hints
- PEP 585 – Type Hinting Generics In Standard Collections
- PEP 257 – Docstring Conventions
- NumPy Documentation Style
- Black Code Style
- Google Python Style Guide
Notable Features Since Python 3.5
- Type Hints (3.5 & PEP 484)
- Variable Annotations (3.6 & PEP 526)
- Generics in Standard Collections (3.9 & PEP 585)
- Union Type (3.10 & PEP 604)
- ParamSpec (3.10 & PEP 612)
- TypeAlias (3.10 & PEP 613)
- TypeGuard (3.10 & PEP 647)
- TypeVarTuple (3.11 & PEP 646)
- Making individual TypedDict items as required or not-required (3.11 & PEP 655)
- Self Type (3.11 & PEP 673)
- Arbitrary Literal String Type (3.11 & PEP 675)
- Underscores in Numeric Literals (3.6 & PEP 515)
- Formatted String Literals (3.6 & PEP 498)
- The = specifier in f-strings for self-documenting expressions and debugging (3.8)
- Context Variables (3.7 & PEP 567)
- Data Class (3.7 & PEP 557)
- Data class support for __slots__ and kw_only (3.10)
- Operators
- Structural Pattern Matching (3.10 & PEP 634)
- Exceptions