The Hitchhiker's Guide to Python Packaging

 · Konstantinos Tsoumas

Imagine reaching for your sunglasses only to realize they’ve been on your head the whole time. Similarly, tracking down a missing dependency hidden in your own project folder can feel like a wild goose chase. Properly packaging your Python code is like labeling leftovers in the fridge: no more mystery meals or surprise errors when you least expect them.

Why Packaging Matters

A Python package is just an abstraction over code at a much higher level, the same way a function is an abstraction over a couple of lines of code. You're basically trying to zip up your code, its dependencies, hopes, dreams and that one weird bug you can't reproduce into a single file with a good-looking label on it! We'll get to a high-level overview of the packaging process, and explain every component, shortly. First, here is what good packaging buys you:

  • Avoid dependency hell. One missing or mismatched requirement can stop users in their tracks.
  • Ensure reproducibility. Pin versions and lock down builds for consistent environments.
  • Improve collaboration. A well‑packaged library or CLI is easier to share and extend.
  • Stop dealing with broken relative paths in exploration notebooks.

Scoping Your Package

Before you package, ask:

  • Who are the users? Data scientists, DevOps engineers, open‑source maintainers?
  • Where is your package intended to run? Different environments (like servers, desktops, etc.) have unique requirements for dependencies, performance, security and packaging formats.
  • What’s its nature? Is your software meant to be reusable (a library/module), or is it an end-user application or service? A library needs clean APIs, versioning and proper dependency management, whereas an application may need CLI support.
  • Security considerations? Think all the way from who is permitted to see the source code (open source vs. private) to licensing, credentials and firewall rules.
  • How are you going to distribute your package? PyPI, conda-forge, privately (like Azure DevOps), or all of the above?

The packaging process overview

[Figure 1: overview of the packaging process]

The figure above outlines the process of creating, publishing and distributing a distribution package through PyPI (the Python Package Index). It is a simplification and doesn't include every step.

Project Configuration (pyproject.toml)

This is your single configuration file, written in TOML (Tom's Obvious, Minimal Language), that packaging tools (among others) read to find out what is needed to build your project. It is organized into three TOML tables: [build-system], which declares the build backend and what it requires; [project], which holds your project's metadata and dependencies; and [tool], which holds tool-specific settings.

It generally looks like:

[build-system]
requires = ["poetry-core>=2.0.0"]  # poetry-core 2.x is needed to read the [project] table
build-backend = "poetry.core.masonry.api"

[project]
name = "myproject"
version = "0.1.0"
description = "A wonderful Python package"
authors = [ { name = "Alice Developer", email = "alice@example.com" } ]  # placeholder address
dependencies = ["requests>=2.25,<3.0"]

[tool.black]
line-length = 88

Package Formats

There are two main formats used to distribute Python packages: source distributions (sdists) and binary distributions (the most common being wheels at the time of writing).

To save you some thinking space: most libraries are distributed in both formats. The sdist acts as a fallback when a wheel cannot be installed, and it also serves downstream packagers (like Homebrew, Conda, MacPorts) and more.

Source distribution

  • What is it

Like with many things, we sometimes recognize a file's extension (think of .exe, .jpeg) without really knowing what’s going on inside, and that’s fine! Likewise, you might have seen .tar.gz (a source distribution, also called a tarball) or .whl (a binary distribution called a wheel) as a file extension.

As a concept, a source distribution is a compressed archive of the source code in its raw form; it has not been compiled or built. The ".tar.gz" file contains the source code plus a special file called 'PKG-INFO' that holds the project's metadata, so the metadata doesn't have to be computed every time, which saves us some time. Because it is simply an archive of the source code, a developer can inspect the contents of a source-distributed package (by unpacking it first). A common example of this is scikit-learn, whose code you can inspect openly.
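
As a rough illustration of "unpack it and look inside", the sketch below builds a tiny sdist-like tarball in memory and reads its PKG-INFO back using only the standard library. The file contents and the helper names `build_demo_sdist` and `read_pkg_info` are made up for this example; a real sdist is produced by your build backend and carries more metadata.

```python
import io
import tarfile
from email import message_from_string


def build_demo_sdist() -> bytes:
    """Create a tiny in-memory .tar.gz laid out like an sdist (demo data only)."""
    files = {
        "myproject-0.1.0/PKG-INFO": "Metadata-Version: 2.1\nName: myproject\nVersion: 0.1.0\n",
        "myproject-0.1.0/pyproject.toml": "[project]\nname = 'myproject'\n",
        "myproject-0.1.0/myproject/__init__.py": "__version__ = '0.1.0'\n",
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in files.items():
            payload = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_pkg_info(sdist_bytes: bytes) -> dict:
    """Return the PKG-INFO headers of an sdist as a plain dict."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as archive:
        member = next(m for m in archive.getmembers() if m.name.endswith("/PKG-INFO"))
        text = archive.extractfile(member).read().decode("utf-8")
    # PKG-INFO uses email-style "Key: value" headers, so the email parser can read it.
    # Note: turning it into a dict keeps only one value for repeated headers.
    return dict(message_from_string(text).items())


if __name__ == "__main__":
    print(read_pkg_info(build_demo_sdist()))
```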

  • Sdist naming convention

Sdists follow a simple naming convention like {distribution}-{version}.tar.gz.

  • When to use it
  • You need users (or downstream packagers) to recompile for their own platform
  • You want full transparency into every line of code
  • You want a universal fallback if a wheel install fails

Binary distribution

  • What is it

Binary distributions are constructed and behave very differently from source distributions, though the impact is not that big for pure-Python packages. Let's stick to the 'wheel', as it's the main type of binary distribution we find nowadays.

A wheel is already built. For projects with extension modules it ships compiled, ready-to-run code rather than the sources needed to compile it (how cool is that?!), while a pure-Python wheel simply ships the .py files ready to be copied into place. Compiled code is tied to an operating system and processor architecture, and often also to the Python interpreter's version. This means that, for a specific version of a project, there can be many wheels (one for each combination of those components) but only one sdist.

If you open up a wheel (it is a ZIP archive, so you can unpack it like any other ZIP file), you will find metadata about your project and a file named ‘RECORD’, which serves as an important integrity check. It's a comprehensive list of every file contained within the wheel along with a hash of its content, used to verify the integrity of the downloaded package (making sure no files have been altered or corrupted since the day the wheel was built).
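
To make the RECORD idea concrete, here is a minimal sketch that builds a tiny wheel-like ZIP in memory and then re-hashes every file against its RECORD entry. It assumes the usual RECORD layout of `path,sha256=<urlsafe base64 digest without padding>,size`; the helper names and the `tamper` switch are hypothetical and exist only for this demonstration.

```python
import base64
import csv
import hashlib
import io
import zipfile


def record_hash(data: bytes) -> str:
    """Hash bytes the way RECORD stores them: sha256, urlsafe base64, no padding."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def build_demo_wheel(tamper: bool = False) -> bytes:
    """Create a tiny in-memory wheel-like ZIP with a RECORD file (demo data only)."""
    files = {
        "myproject/__init__.py": b"__version__ = '0.1.0'\n",
        "myproject-0.1.0.dist-info/METADATA": b"Name: myproject\nVersion: 0.1.0\n",
    }
    record_path = "myproject-0.1.0.dist-info/RECORD"
    rows = [f"{name},{record_hash(data)},{len(data)}" for name, data in files.items()]
    rows.append(f"{record_path},,")  # RECORD cannot contain its own hash
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            if tamper and name == "myproject/__init__.py":
                data = b"print('surprise')\n"  # altered after RECORD was computed
            archive.writestr(name, data)
        archive.writestr(record_path, "\n".join(rows) + "\n")
    return buffer.getvalue()


def find_mismatches(wheel_bytes: bytes) -> list:
    """Return the paths whose contents do not match the hash listed in RECORD."""
    mismatches = []
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as archive:
        record_name = next(n for n in archive.namelist() if n.endswith(".dist-info/RECORD"))
        record_text = archive.read(record_name).decode("utf-8")
        for row in csv.reader(io.StringIO(record_text)):
            if len(row) != 3 or not row[1]:
                continue  # skip malformed rows and the hash-less RECORD entry
            path, expected, _size = row
            if record_hash(archive.read(path)) != expected:
                mismatches.append(path)
    return mismatches


if __name__ == "__main__":
    print(find_mismatches(build_demo_wheel()))
    print(find_mismatches(build_demo_wheel(tamper=True)))
```

Keep in mind that RECORD lives inside the wheel itself, so this kind of check catches corruption or careless edits rather than a determined attacker who rewrites both the file and its RECORD entry.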

  • Naming conventions

While sdists have a simple naming convention, wheel packages are more specific. A common naming convention for a wheel package would look like {distribution}-{version}(-{build tag})-{python tag}-{abi tag}-{platform tag}.whl. This is not random as the difference in naming reflects the different purposes of the two formats: wheels are built for specific environments and need to indicate compatibility, while sdists are source code archives that can be built into wheels.
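
The naming convention can be taken apart with a few lines of plain Python. This is only a sketch: `split_wheel_filename` and `WheelName` are hypothetical names, the filenames in the usage example are invented, and for real projects the `packaging` library's `packaging.utils.parse_wheel_filename` is the more thorough option.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def split_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:  # no build tag
        dist, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:  # optional build tag present
        dist, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(dist, version, build, python, abi, platform)


if __name__ == "__main__":
    # A pure-Python wheel: any Python 3, no ABI requirement, any platform.
    print(split_wheel_filename("myproject-0.1.0-py3-none-any.whl"))
    # An invented platform-specific wheel with a build tag of "1".
    print(split_wheel_filename("myproject-0.1.0-1-cp312-cp312-manylinux_2_17_x86_64.whl"))
```

The three trailing tags are what an installer compares against the current interpreter and machine to decide whether a given wheel is usable.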

  • When to use it

  • Your code is not pure Python (e.g. an extension module).

  • You want fast install and no build steps

  • You need integrity checks per file

Legacy notes: a couple of words on .egg files.

Eggs are amazing, in general, especially for breakfast.

Here though, they serve a different purpose. Egg is an old binary distribution format that has been superseded by the wheel format; PyPI stopped accepting egg uploads in August 2023 (https://blog.pypi.org/posts/2023-06-26-deprecate-egg-uploads/).

To name some important differences: egg was introduced back in 2004 by setuptools, whereas wheel only arrived in 2012. Egg was both a distribution format AND a runtime installation format! Additionally, the wheel format is versioned, unlike egg, and a wheel doesn't include any .pyc files.

Uploading your package

Now that you have built your package, we need a way to upload it so people can download and use it directly. PyPI, Anaconda and conda-forge are very popular software repositories for packages. The main difference between the three is that PyPI only hosts Python software, whereas the other two may go beyond that. As a result, packages that depend on non-Python code are often released on Anaconda or conda-forge (though developers sometimes release pure-Python packages there too).

In a company setting, you may want to release your package to a private repo, and there are many paid options provided by companies like Anaconda. It's also very common to upload your package to Azure DevOps, if the company you're working for follows an Azure tech stack.

However, we still need a way to get the package there. Twine is a command-line tool that transfers your distribution files and metadata to a web API, and it is the primary tool people use to upload packages to PyPI. Your build tool may also come with its own interface for uploading to PyPI, so have a look before blindly using something!

Download & Install

Assuming everything went well, users can now download and install the package. This is mostly done with pip, the most popular tool for installing Python packages (remember pip install pandas?).

pip install myproject==0.1.0
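
After installing, you may want to confirm which version actually landed in the environment. A minimal sketch using `importlib.metadata` from the standard library follows; the helper name `installed_version` is hypothetical, and "myproject" is just the example name used throughout this article.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if "myproject" is installed, otherwise None.
    print(installed_version("myproject"))
```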