Reproducible Machine Learning Workflows for Scientists with Pixi

Pixi

1 Conceptual overview

Pixi is a cross-platform package and environment manager that can handle complex development workflows (Arts et al., n.d.; Arts et al., n.d.). Importantly, Pixi automatically and non-optionally produces or updates a lock file for the software environments defined by the user whenever an action mutates an environment. The lock file is a structured file that lists every defined environment with its complete set of packages, along with a definition of each package that includes digest information for the binary. Pixi is written in Rust and leverages the language’s speed and technologies to solve environments quickly.

Pixi addresses the concept of computational reproducibility by focusing on a set of main features

  1. Virtual environment management: Pixi can create environments that contain conda packages and Python packages and use or switch between environments easily.
  2. Package management: Pixi enables the user to install, update, and remove packages from these environments through the pixi command line.
  3. Task management: Pixi has a task runner system built-in, which allows for tasks with custom logic and dependencies on other tasks to be created.

combined with robust behaviors

  1. Automatic lock files: Any changes to a Pixi workspace that can mutate the environments defined in it will automatically and non-optionally result in the Pixi lock file for the workspace being updated. This ensures that any and every state of a Pixi project is trivially computationally reproducible.
  2. Solving environments for other platforms: Pixi allows the user to solve environments for platforms other than that of the current machine. This allows users to solve and share environments with any collaborator, with confidence that all environments will work with no additional setup.
  3. Parity of conda and Python packages: Pixi allows for conda packages and Python packages to be used together seamlessly, and is unique in its ability to handle overlap in dependencies between them. Pixi will first solve all conda package requirements for the target environment, lock the environment, and then solve all the dependencies of the Python packages for the environment, determine if there are any overlaps with the existing conda environment, and then install only the missing Python dependencies. This allows for fully reproducible solves and for the two package ecosystems to complement each other rather than potentially cause conflicts.
  4. Efficient caching: Pixi uses an extremely efficient global caching scheme. This means that the first time a package is installed on a machine with Pixi is the slowest it will ever be to install it for any future project on the machine while the cache is still active.

Pixi users declaratively specify their project dependencies, which are recorded in a Pixi manifest pixi.toml file (which for Python projects can optionally be embedded in the [tool.pixi] tables of a pyproject.toml file) and automatically resolved in the pixi.lock lock file. This declarative approach lets users specify their project requirements efficiently while being guaranteed a static and reproducible environment from the lock file.

2 CUDA hardware accelerated environment creation

Combining the features of modern CUDA 12 conda packages with Pixi’s environment management, it is now possible to efficiently manage multiple software environments that can include both hardware accelerated and CPU environments. An example Pixi workspace is presented in Program 1

pixi.toml
[workspace]
channels = ["conda-forge"]
name = "ml-example"
platforms = ["linux-64", "osx-arm64", "win-64"]
version = "0.1.0"

[tasks]

[dependencies]
python = ">=3.13.5,<3.14"

[feature.cpu.dependencies]
pytorch-cpu = ">=2.7.1,<3"
torchvision = ">=0.22.0,<0.23"

[feature.cpu.tasks.train-cpu]
description = "Train MNIST on CPU"
cmd = "python src/torch_MNIST.py --epochs 2 --save-model --data-dir data"

[feature.gpu.system-requirements]
cuda = "12"

[feature.gpu.target.linux-64.dependencies]
pytorch-gpu = ">=2.7.1,<3"
torchvision = ">=0.22.0,<0.23"

[feature.gpu.tasks.train-gpu]
description = "Train MNIST on GPU"
cmd = "python src/torch_MNIST.py --epochs 14 --save-model --data-dir data"

[feature.inference.dependencies]
matplotlib = ">=3.10.3,<4"

[environments]
cpu = ["cpu"]
gpu = ["gpu"]
inference = ["gpu", "inference"]

Program 1: Example of a multi-platform and multi-environment Pixi manifest with all required information and constraints to resolve and install CUDA accelerated conda packages.

where the definition of multiple platforms allows the declared environments to be solved for every listed platform, regardless of the platform of the machine performing the solve

pixi.toml
[workspace]
channels = ["conda-forge"]
name = "ml-example"
platforms = ["linux-64", "osx-arm64", "win-64"]
version = "0.1.0"

the cpu feature defines dependencies and tasks that are accessible from the cpu environment

pixi.toml
...

[feature.cpu.dependencies]
pytorch-cpu = ">=2.7.1,<3"
torchvision = ">=0.22.0,<0.23"

[feature.cpu.tasks.train-cpu]
description = "Train MNIST on CPU"
cmd = "python src/torch_MNIST.py --epochs 2 --save-model --data-dir data"

...

[environments]
cpu = ["cpu"]

The gpu feature does the same for the gpu environment, but importantly it also defines a system-requirements table that specifies the system capabilities needed to install and run a Pixi workspace’s environments.

pixi.toml
...

[feature.gpu.system-requirements]
cuda = "12"

[feature.gpu.target.linux-64.dependencies]
pytorch-gpu = ">=2.7.1,<3"
torchvision = ">=0.22.0,<0.23"

[feature.gpu.tasks.train-gpu]
description = "Train MNIST on GPU"
cmd = "python src/torch_MNIST.py --epochs 14 --save-model --data-dir data"

...

[environments]
...
gpu = ["gpu"]

system-requirements build upon the concept of conda “virtual packages”, which let the dependency resolver enforce constraints by declaring the system’s compatibility with virtual packages such as __cuda. In the particular case of CUDA, the system-requirements table specifies the CUDA version the workspace expects the host system to support, as detected through the host system’s NVIDIA driver API. While the system-requirements field values do not correspond to lower or upper bounds, specifying that the workspace is expected to work on systems that support CUDA 12

pixi.toml
...

[feature.gpu.system-requirements]
cuda = "12"

...

ensures that packages depending on __cuda >= 12 are resolved correctly. Declaring the system requirement effectively causes the Pixi dependency resolver to find CUDA enabled packages that are compatible with CUDA 12 and prevents incompatible package builds from being resolved. Once these package dependencies have been resolved and locked, any system capable of meeting the system requirement will get working CUDA accelerated conda packages installed.
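To make the idea of a constraint such as __cuda >= 12 concrete, here is a deliberately simplified sketch of checking a declared CUDA version against a package's minimum. The function names are hypothetical, and the comparison handles only dotted numeric versions; the version ordering used by real conda resolvers is richer than this.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted numeric version such as "12.9" into integers."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(declared: str, minimum: str) -> bool:
    """Return True if the declared version is at least the minimum.

    Toy model of a `__cuda >= minimum` constraint, not Pixi's resolver.
    """
    a, b = parse_version(declared), parse_version(minimum)
    # Pad the shorter tuple with zeros so "12" and "12.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # A workspace declaring cuda = "12" against a package needing __cuda >= 12.
    print(satisfies_minimum("12", "12"))
    # The same package against a system that only supports CUDA 11.8.
    print(satisfies_minimum("11.8", "12"))
```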

Not all machines have an NVIDIA GPU that allows the system requirements to be resolved correctly. To allow machines without CUDA support to still resolve Pixi workspace requirements, shell environment overrides exist through the CONDA_OVERRIDE_CUDA environment variable. Setting CONDA_OVERRIDE_CUDA=12 on a machine that doesn’t meet the CUDA version requirements will override the supported virtual packages and set a value of __cuda=12 for the system. This can be seen by setting the override and then querying the workspace summary with pixi info, as in Program 2. This is powerful functionality, as it allows for environment specification, resolution, and locking for target platforms that users might not have access to, with assurance that the results are valid.

% pixi info
System
------------
       Pixi version: 0.50.2
           Platform: osx-arm64
   Virtual packages: __unix=0=0
                   : __osx=15.3.2=0
                   : __archspec=1=m2
...

% CONDA_OVERRIDE_CUDA=12 pixi info
System
------------
       Pixi version: 0.50.2
           Platform: osx-arm64
   Virtual packages: __unix=0=0
                   : __osx=15.3.2=0
                   : __cuda=12=0
                   : __archspec=1=m2
...

Program 2: Demonstration of using the CONDA_OVERRIDE_CUDA environment variable on a system with no CUDA support (an Apple silicon machine) to allow dependency resolution as if it supported CUDA 12.
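The same override can be applied from a script, for example in a CI job that locks the workspace on a machine without a GPU. The sketch below is one possible approach: the helper name cuda_override_env is hypothetical, and pixi info is only invoked if a pixi executable is found on the PATH.

```python
import os
import shutil
import subprocess


def cuda_override_env(cuda_version: str) -> dict[str, str]:
    """Return a copy of the current environment with CONDA_OVERRIDE_CUDA set.

    The calling process's own environment is left unchanged.
    """
    env = dict(os.environ)
    env["CONDA_OVERRIDE_CUDA"] = cuda_version
    return env


if __name__ == "__main__":
    env = cuda_override_env("12")
    if shutil.which("pixi") is None:
        print("pixi not found on PATH; skipping 'pixi info'")
    else:
        # Equivalent to running: CONDA_OVERRIDE_CUDA=12 pixi info
        subprocess.run(["pixi", "info"], env=env, check=True)
```

Passing the variable through a copied environment keeps the override scoped to the single command, just as the inline shell form in Program 2 does.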

Pixi also allows for feature composition to efficiently create new environments. Program 1’s gpu and inference features are combined and resolved collectively to provide a new CUDA accelerated inference environment that does not affect the gpu environment.

pixi.toml
...

[feature.inference.dependencies]
matplotlib = ">=3.10.3,<4"

[environments]
...
gpu = ["gpu"]
inference = ["gpu", "inference"]
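Feature composition can be pictured as merging the dependency tables of the listed features. The sketch below is a simplified model under stated assumptions: the data mirrors Program 1 (with the gpu dependencies shown as they apply on linux-64), the top-level dependencies are assumed to be included in every environment, and the helper name compose is hypothetical. Pixi itself solves the combined requirements with its dependency resolver rather than merging dictionaries.

```python
# Dependency tables taken from Program 1; "default" stands for the
# top-level [dependencies] table (an assumption of this sketch).
FEATURES = {
    "default": {"python": ">=3.13.5,<3.14"},
    "cpu": {"pytorch-cpu": ">=2.7.1,<3", "torchvision": ">=0.22.0,<0.23"},
    "gpu": {"pytorch-gpu": ">=2.7.1,<3", "torchvision": ">=0.22.0,<0.23"},
    "inference": {"matplotlib": ">=3.10.3,<4"},
}

ENVIRONMENTS = {
    "cpu": ["cpu"],
    "gpu": ["gpu"],
    "inference": ["gpu", "inference"],
}


def compose(env_name: str) -> dict[str, str]:
    """Merge the default dependencies with those of each listed feature.

    Later features overwrite earlier ones on a name clash, which is a
    simplification: a real solver would intersect the constraints.
    """
    merged = dict(FEATURES["default"])
    for feature in ENVIRONMENTS[env_name]:
        merged.update(FEATURES[feature])
    return merged


if __name__ == "__main__":
    for name in ENVIRONMENTS:
        print(name, "->", sorted(compose(name)))
```

Because each environment is composed independently, adding matplotlib through the inference feature leaves the gpu environment's requirements untouched.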

3 Locked environments

Once the workspace has been defined, any Pixi operation on the workspace will result in all environments in the workspace having their dependencies resolved and then fully specified (“locked”) at the digest (“hash”) level in a single pixi.lock Pixi lock file, as seen in Program 3. The lock file is a YAML file that contains two definition groups: environments and packages. The environments group lists every environment in the workspace for every platform with a complete listing of all packages in the environment. The packages group lists a full definition of every package that appears in the environments lists, including the package’s URL and digests (e.g. sha256, md5). Together, these groups provide a full description of every package in the Pixi workspace, along with its dependencies and constraints on other packages. Versioning the lock file along with the manifest file in a version control system allows workspaces to be fully reproducible to the byte level indefinitely into the future, conditioned on the continued existence of the package indexes the workspace pulls from (e.g. conda-forge, PyPI, the nvidia conda channel). Where long term preservation and reproducibility are important, there are community projects (Zwerschke et al., n.d.) that allow for downloading all dependencies of a Pixi environment and generating a tar archive containing all of the packages, which can later be unpacked and installed.

pixi.lock
version: 6
environments:
  cpu:
    channels:
    - url: https://conda.anaconda.org/conda-forge/
    packages:
      linux-64:

...

      - conda: https://conda.anaconda.org/conda-forge/linux-64/python-3.13.5-hec9711d_102_cp313.conda
      - conda: https://conda.anaconda.org/conda-forge/noarch/python_abi-3.13-8_cp313.conda
      - conda: https://conda.anaconda.org/conda-forge/linux-64/pytorch-2.7.1-cpu_mkl_py313_h58dab0e_103.conda
      - conda: https://conda.anaconda.org/conda-forge/linux-64/pytorch-cpu-2.7.1-cpu_mkl_hc60beec_103.conda

...

  gpu:
    channels:
    - url: https://conda.anaconda.org/conda-forge/
    packages:
      linux-64:

...

      - conda: https://conda.anaconda.org/conda-forge/linux-64/cuda-nvcc-tools-12.9.86-he02047a_2.conda
      - conda: https://conda.anaconda.org/conda-forge/linux-64/cuda-nvdisasm-12.9.88-hbd13f7d_0.conda
      - conda: https://conda.anaconda.org/conda-forge/linux-64/cuda-nvrtc-12.9.86-h5888daf_0.conda
      - conda: https://conda.anaconda.org/conda-forge/linux-64/cuda-nvtx-12.9.79-h5888daf_0.conda
      - conda: https://conda.anaconda.org/conda-forge/linux-64/cuda-nvvm-tools-12.9.86-h4bc722e_2.conda
      - conda: https://conda.anaconda.org/conda-forge/noarch/cuda-version-12.9-h4f385c5_3.conda
      - conda: https://conda.anaconda.org/conda-forge/linux-64/cudnn-9.10.1.4-hbcb9cd8_1.conda

...

      - conda: https://conda.anaconda.org/conda-forge/linux-64/python-3.13.5-hec9711d_102_cp313.conda
      - conda: https://conda.anaconda.org/conda-forge/noarch/python_abi-3.13-8_cp313.conda
      - conda: https://conda.anaconda.org/conda-forge/linux-64/pytorch-2.7.1-cuda129_mkl_py313_h1e53aa0_304.conda
      - conda: https://conda.anaconda.org/conda-forge/linux-64/pytorch-gpu-2.7.1-cuda129_mkl_h43a4b0b_304.conda

...

packages:

...

- conda: https://conda.anaconda.org/conda-forge/linux-64/pytorch-gpu-2.7.1-cuda129_mkl_h43a4b0b_304.conda
  sha256: af54e6535619f4e484d278d015df6ea67622e2194f78da2c0541958fc3d83d18
  md5: e374ee50f7d5171d82320bced8165e85
  depends:
  - pytorch 2.7.1 cuda*_mkl*304
  license: BSD-3-Clause
  license_family: BSD
  size: 48008
  timestamp: 1753886159800

...

Program 3: Example structure of a pixi.lock Pixi lock file showing the definition of the environments as well as a full description of each package used in each environment.
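The sha256 values recorded in the packages group make it possible to check a downloaded package archive against the lock file independently. The following sketch uses only the Python standard library; the function names are hypothetical, and it illustrates the general idea of digest verification rather than how Pixi itself is implemented.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lock_digest(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with a sha256 value copied from pixi.lock."""
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    # Stand-in file; in practice this would be a downloaded .conda archive
    # and the expected value would come from its entry in pixi.lock.
    sample = Path("sample_package.bin")
    sample.write_bytes(b"example package bytes")
    expected = hashlib.sha256(b"example package bytes").hexdigest()
    print(matches_lock_digest(sample, expected))
    sample.unlink()
```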

References
  1. Arts, R., Zalmstra, B., Vollprecht, W., de Jager, T., Morcotilo, N., & Hofer, J. (n.d.). pixi. https://github.com/prefix-dev/pixi/releases/tag/v0.50.2
  2. Arts, R., Zalmstra, B., Vollprecht, W., de Jager, T., Morcotilo, N., & Hofer, J. (n.d.). Pixi Documentation. https://pixi.sh/v0.50.2/
  3. Zwerschke, P., Elsner, D., & Stoyan, B. (n.d.). pixi-pack. https://github.com/Quantco/pixi-pack/releases/tag/v0.7.2