How to cheat setuptools-scm (Debian diary)

[2016-12-19: some additions]

This is another little issue from Python packaging for Debian, which I came across lately while packaging Bcolz, the compressed NumPy-based data container. Upstream uses setuptools-scm to determine the software’s version at build time from the source code management environment the code is in. This method is convenient for upstream development because the version number doesn’t need to be hard-coded, and people often just forget to update it (and other version-carrying files like docs/) when a new version of a project is released.

python-setuptools just needs to be added to the to do its job, and in Bcolz the code goes like this:

        'version_scheme': 'guess-next-dev',
        'local_scheme': 'dirty-tag',
        'write_to': 'bcolz/'

The file the version number is written to is bcolz/. This file is neither in the upstream code repository nor in the tarball released by the upstream developers; it is always generated at build time.

In Debian, it is an error if you try to build a package from a source tree which contains files that aren’t found in the corresponding tarball, like cruft from a previous build, or in which files have changed. Therefore every new package should also be test-built twice in a row in a non-chroot environment. Generally there are two ways to solve this: either you add the cruft to debian/clean, or you add the file, or a matching file pattern, to extend-diff-ignore in debian/source/options. Which method is better is debatable. I generally use the clean option if something isn’t in the upstream tarball, and the source/options solution if something is already in the upstream tarball but gets changed during a build. This is related to your preferred Git procedures: if you remove a file which is in the upstream tarball, these removals have to be checked in separately, and that has to be done every time a new upstream tarball is released, which is not very convenient. Another available option is to strip certain files from the upstream tarball by listing them under Files-Excluded in debian/copyright. By the way, the same applies to egg-info/: that folder may or may not be shipped in the upstream tarball, and files in that folder get changed during the build.

When the source code is put into a Git environment for Debian packaging, there can be problems with the version number setuptools-scm comes up with. This setuptools extension takes the version from the latest Git tag if it contains a version number, and that’s all right. In Git environments for Debian packaging (e.g. those of the Debian Science group, the Python groups and others) such a tag is available, since the commonly used upstream tags carry one [1]. The problem is that the upstream version which Debian uses [2] sometimes doesn’t match the original upstream version number, which is what is wanted for the version in bcolz/. For example, the suffix +ds is used if the upstream tarball has been stripped of prebuilt files or embedded convenience copies (as is the case with the Bcolz package, where c-blosc/ has been stripped because it is built as a separate package), and the suffix +dfsg shows that non-DFSG-free software has been removed (which can’t be distributed through the main archive section). For example, the version string for Bcolz found after the current build (1.1.0+ds1-1) is 1.1.0+ds1:

# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
version = '1.1.0+ds1'

But that’s not wanted, because this version has never been released, yet it appears everywhere:

$ pip list | grep bcolz
bcolz (1.1.0+ds1)
$ python3 -c 'import bcolz; bcolz.print_versions()'
bcolz version:     1.1.0+ds1

There are several ways to fix this. The one “with the crowbar” (as we say in German) is to patch use_scm_version out, but if you don’t provide any version in exchange, the version number used by Setuptools is then 0.0.0. The upstream version could be hard-coded into the patch, but then the maintainer must not forget to update it manually, which is not very convenient. Plus, the patched file could change upstream and the patch then might need to be unfuzzed, thus more work. Bad.

A patch can be avoided by setting and exporting the SETUPTOOLS_SCM_PRETEND_VERSION environment variable for setuptools-scm in debian/rules, which is done in some packages, judging by the results for that string on Debian Code Search. But how to avoid hard-coding the version number here? The dpkg-dev package (pulled in by build-essential) ships a Makefile snippet in /usr/share/dpkg/ which can be included in debian/rules. It defines several variables which are useful for packaging, e.g. DEB_SOURCE contains the source package name; they are extracted from debian/changelog. However, DEB_VERSION_UPSTREAM, which is available through it, yields the upstream version without epoch and Debian revision, and it doesn’t get any finer-grained out of the box.
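To make concrete what "upstream version without epoch and Debian revision" means, here is a minimal Python sketch of that split. It is not the dpkg implementation, only an illustration under the assumption that the epoch ends at the first colon and the Debian revision starts after the last hyphen; the names DebianVersion and split_debian_version are made up for this example.

```python
from typing import NamedTuple, Optional


class DebianVersion(NamedTuple):
    """Hypothetical container for the three parts of a package version."""
    epoch: Optional[str]
    upstream: str
    revision: Optional[str]


def split_debian_version(version: str) -> DebianVersion:
    """Split "[epoch:]upstream_version[-debian_revision]" into its parts."""
    epoch = None
    revision = None
    # Assumption: the epoch, if present, is everything before the first colon.
    if ":" in version:
        epoch, version = version.split(":", 1)
    # Assumption: the Debian revision, if present, follows the last hyphen.
    if "-" in version:
        version, revision = version.rsplit("-", 1)
    return DebianVersion(epoch, version, revision)


if __name__ == "__main__":
    # The upstream part still carries the repack suffix.
    print(split_debian_version("1.1.0+ds1-1").upstream)  # prints 1.1.0+ds1
```

The remaining upstream part still carries +ds1, which is exactly the bit that needs extra treatment.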

For a custom fix, a regular expression which removes the +... extensions (if present) from the bare upstream version string would be s/\+[^+]*//:

$ echo "1.1.0+ds1" | sed -e 's/\+[^+]*//'
$ echo "1.1.0" | sed -e 's/\+[^+]*//'
$ echo "1.1.0+dfsg12" | sed -e 's/\+[^+]*//'
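The same substitution can be tried out in Python, where the escaped plus sign is unambiguously a literal character. This is only a sketch for experimenting with the pattern; the function name strip_repack_suffix is made up here, and count=1 is used to mirror sed replacing only the first match.

```python
import re

# Same pattern as in the sed call: a literal "+" followed by any
# run of characters that are not "+".
_SUFFIX = re.compile(r"\+[^+]*")


def strip_repack_suffix(upstream_version: str) -> str:
    """Remove the first "+..." suffix (e.g. +ds1, +dfsg12), if present."""
    return _SUFFIX.sub("", upstream_version, count=1)


if __name__ == "__main__":
    for candidate in ("1.1.0+ds1", "1.1.0", "1.1.0+dfsg12"):
        # Each of the three candidates is reduced to 1.1.0.
        print(candidate, "->", strip_repack_suffix(candidate))
```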

With that, a custom variable VERSION_UPSTREAM can be derived from DEB_VERSION_UPSTREAM (provided by the included snippet) in debian/rules:

include /usr/share/dpkg/
VERSION_UPSTREAM = $(shell echo '$(DEB_VERSION_UPSTREAM)' | sed -e 's/\+[^+]*//')
export SETUPTOOLS_SCM_PRETEND_VERSION = $(VERSION_UPSTREAM)

Bam, that works (see the commit here):

# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
version = '1.1.0'

As an addition: I’ve seen that dh-python also takes care of SETUPTOOLS_SCM_PRETEND_VERSION since version 2.20160609. The environment variable is set by the Debhelper build system if python{3,}-setuptools-scm is among the build dependencies in debian/control. The Perl code for that is in dh/. I think the version number string above comes from dh-python’s pretended version, and not from any of the Git tags (which are currently debian/1.1.0+ds1-1 and upstream/1.1.0+ds1).

  1. For Git in Debian packaging, see e.g. the DEP-14 proposal (Recommended layout for Git packaging repositories).
  2. Following the scheme for package versions “[epoch:]upstream_version[-debian-revision]”.