Python directed dependency graphs

pyproject.toml specifies Python package prerequisites and typically all other package metadata. Because this information is static, prerequisites can be machine-parsed recursively and very quickly without installing any packages first, which helps security analysis. Generally, specify Python package prerequisites in pyproject.toml as much as possible.

Python packages should minimize the size of their directed dependency graph for the best package longevity with minimum maintenance effort. However, the most effective use of programmer/scientist/engineer time generally comes from reusing code wherever appropriate. How do we evaluate the quality of prerequisites? Modern Python code includes these factors:

Long-term archiving of Python software requires recording both direct and indirect dependencies. This is commonly done with pip freeze, which gives a flat list of installed packages with no sense of the module hierarchy. The techniques described below provide a detailed, zoomable hierarchical view of Python module dependencies.

When packages specify their prerequisites in setup.py, dependency analysis generally requires installing them to determine their dependencies. That is, setup.py is executed recursively for each package to determine the overall set of required packages. This is bad for automated security analysis, which is slowed greatly by needing to install packages just to determine prerequisites. Modern Python packages solve this problem by specifying most package configuration in pyproject.toml.
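With static metadata, the recursive step reduces to a graph walk. This sketch computes the full set of direct and indirect prerequisites with a breadth-first search. The DIRECT map and the transitive_deps name are hypothetical; in practice each entry would come from parsing a package's static metadata rather than from running setup.py.

```python
from collections import deque

# Hypothetical direct-dependency map, as could be read from static metadata
# such as the [project] dependencies of each package's pyproject.toml.
DIRECT = {
    "mypkg": ["requests", "numpy"],
    "requests": ["urllib3", "idna", "certifi", "charset-normalizer"],
    "numpy": [],
    "urllib3": [],
    "idna": [],
    "certifi": [],
    "charset-normalizer": [],
}


def transitive_deps(root: str, direct: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk; the visited set also keeps dependency cycles from looping forever."""
    seen: set[str] = set()
    queue = deque(direct.get(root, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        queue.extend(direct.get(name, []))
    seen.discard(root)  # a cycle back to the root should not list the root as its own dependency
    return seen


print(sorted(transitive_deps("mypkg", DIRECT)))
```

The size of this set is one simple measure of how large a package's dependency graph is.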

Currently, pipdeptree is the most practical tool for plotting Python directed dependency graphs. This method assumes that:

  • self-test has adequate coverage to be meaningful for most users
  • packages only used as convenience methods for some users are under [project.optional-dependencies] in pyproject.toml
  • strictly necessary modules are specified
  • minimum Python version is specified
  • CI-only requirements are specified

The process below is targeted at packages used in “development mode”, that is, not installed into site-packages except for a link back to the code directory.

Install prereqs:

pip install virtualenv

In the Python package directory, create a new Python virtual environment. pipdeptree reports every package installed in the active environment, so the environment should contain only the analyzed package and its dependencies.

virtualenv testdep
. testdep/bin/activate
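A quick guard against running the analysis outside the virtual environment can be sketched as follows. It assumes a venv or virtualenv environment (virtualenv 20 or newer sets sys.base_prefix; older releases set sys.real_prefix); conda environments are not detected this way. The in_virtual_env name is hypothetical.

```python
import sys


def in_virtual_env() -> bool:
    """True when running inside a venv/virtualenv environment rather than the base Python."""
    # venv and virtualenv >= 20 point sys.prefix at the environment while
    # sys.base_prefix keeps pointing at the base install; old virtualenv set sys.real_prefix.
    return sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix")


if not in_virtual_env():
    print("warning: not in a virtual environment; the tree will include unrelated packages")
```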

pip install "pipdeptree[graphviz]"

Install the package to examine in development mode, along with whatever dependencies pip installs automatically:

pip install -e .

Print a hierarchical dependency tree:

pipdeptree

The tree should be short unless the package being tested is large. Try it first with a simple package to check that the dependency list matches expectations.

Now create the directed dependency graph for the package. Install Graphviz:

  • Linux: apt install graphviz
  • macOS: brew install graphviz
  • Windows: download the installer from the Graphviz website

and then:

pipdeptree --graph-output svg > dep.svg
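The SVG is rendered from a Graphviz DOT description of the graph. As a hand-rolled illustration of that format, not of how pipdeptree produces it, this hypothetical to_dot helper turns a dependency map into DOT text that the dot command can render.

```python
def to_dot(graph: dict[str, list[str]], name: str = "dependencies") -> str:
    """Render a dependency map as Graphviz DOT text, one directed edge per requirement."""
    lines = [f"digraph {name} {{"]
    for pkg in sorted(graph):
        lines.append(f'  "{pkg}";')  # declare every package, even ones without dependencies
        for dep in graph[pkg]:
            lines.append(f'  "{pkg}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example map.
example = {"mypkg": ["numpy", "requests"], "requests": ["idna"]}
print(to_dot(example))
```

Writing the printed text to dep.dot and running dot -Tsvg dep.dot > dep.svg renders it as an SVG.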

View the SVG in a web browser or an image viewer such as IrfanView.

The steps above can be wrapped up in this Bash script, pydeptree.sh, for a one-click Python dependency graph:

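#!/usr/bin/env bash
# usage: pydeptree.sh [package_directory]
set -o errexit

# optionally change to the package directory given as the first argument
if [[ -n "${1:-}" ]]; then cd "$1"; fi

virtualenv testdep     # it's OK if it already exists
. testdep/bin/activate

pip install "pipdeptree[graphviz]"
# assumes the package defines a "tests" extra under [project.optional-dependencies]
pip install -e ".[tests]"

pipdeptree --graph-output svg > dep.svg

deactivate     # shell function defined by activate, not a file to source

eog dep.svg &  # image viewing program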

Notes

Modulegraph is an established, maintained tool for creating a .dot dependency graph. Its output is extremely verbose: almost all of it consists of Python standard library modules, so the .dot output must be post-processed, for example with pydot, to be useful. What if we instead preemptively excluded a list of known stdlib modules, removing, say, 98% of the modulegraph output from the start?

pip install modulegraph

Examine a file’s imports, writing a .dot graph, then render it as SVG with Graphviz:

python -m modulegraph -q -g file.py > graph.dot
dot -Tsvg graph.dot > graph.svg

See the Modulegraph documentation for its other command line options.


Snakefood is another tool for generating Python dependency graphs.