Collected notes from Python packaging (Debian diary)

These are some collected notes on some particular problems from packaging Python stuff for Debian, and more are coming up like this in the future. Some of the issues discussed here might be rather simple and even benign for the experienced Python packager, but maybe this might be helpful for people coming across the same issues for the first time, wondering what’s going wrong. But some of the things discussed aren’t easy. Here are the notes for this posting, there is no particular order.

UnicodeDecoreError on open() in Python 3 running in non-UTF-8 environments

I’ve came across this problem again recently while packaging httpbin 0.5.0. The build breaks the following way like when trying to build with sbuild in a chroot, that’s the first run of setup.py with the default Python 3 interpreter:

I: pybuild base:184: python3.5 setup.py clean 
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    os.path.join(os.path.dirname(__file__), 'README.rst')).read()
  File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2386: ordinal not in range(128)
E: pybuild pybuild:274: clean: plugin distutils failed with: exit code=1: python3.5 setup.py clean 

One comes across UnicodeDecodeError-s quite oftenly on different occasions, mostly related to Python 2 packaging (like here). Here it’s the case that in setup.py the long_description is tried to be read from the opened UTF-8 encoded file README.rst:

long_description = open(
    os.path.join(os.path.dirname(__file__), 'README.rst')).read()

This is a problem for Python 3.5 (and other Python 3 versions) when setup.py is executed by an interpreter which is run in a non-UTF-8 environment1:

$ LANG=C.UTF-8 python3.5 setup.py clean
running clean
$ LANG=C python3.5 setup.py clean
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    os.path.join(os.path.dirname(__file__), 'README.rst')).read()
  File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2386: ordinal not in range(128)
$ LANG=C python2.7 setup.py clean
running clean

The reason for this error is that the default encoding for file object returned by open() e.g. in Python 3.5 is platform dependent, so that read() fails on that when there’s a mismatch:

>>> readme = open('README.rst')
>>> readme
<_io.TextIOWrapper name='README.rst' mode='r' encoding='ANSI_X3.4-1968'>

Non-UTF-8 build environments because $LANG isn’t particularly set at all or set to C are common or even default in Debian packaging, like in the continuous integration resp. test building for reproducible builds the primary environment features that (see here). That’s also true for the base system which sbuild creates:

$ schroot -d / -c unstable-amd64-sbuild -u root
(unstable-amd64-sbuild)root@varuna:/# locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

A problem like this is solved mostly elegant by installing some workaround in debian/rules. A quick and easy fix is to add export LC_ALL=C.UTF-8 here, which supersedes the locale settings of the build environment. $LC_ALL is commonly used to change the existing locale settings, it overrides all other locale variables with the same value (see here). C.UTF-8 is an UTF-8 locale which is available by default in a base system, it could be used without using the locales package (or worse, the huge locales-all package):

(unstable-amd64-sbuild)root@varuna:/# locale -a
C
C.UTF-8
POSIX

But this problem ideally should be taken care of upstream. In Python 3, the default open() is io.open(), for which the specific encoding could be given, so that the UnicodeDecodeError disappears. Python 2 uses os.open() for open(), which doesn’t support that, but has io.open(), too. A fix which works for both Python branches goes like this:

import io
long_description = io.open(
    os.path.join(os.path.dirname(__file__), 'README.rst'), encoding='utf-8').read()

non-deterministic order of requirements in egg-info/requires.txt

This problem appeared with prospector/0.11.7-5 in the reproducible builds test builds, that was the first package of Prospector running on Python 32. It came out that there were differences in the sorting order of the [with_everything] dependencies resp. requirements in prospector-0.11.7.egg-info/requires.txt if the package was build in varying systems:

├── prospector_0.11.7-5_all.deb
│   ├── file list
│   │ @@ -1,3 +1,3 @@
│   │  -rw-r--r--   0        0        0        4 2016-04-01 20:01:56.000000 debian-binary
│   │ --rw-r--r--   0        0        0     4343 2016-04-01 20:01:56.000000 control.tar.gz
│   │ +-rw-r--r--   0        0        0     4344 2016-04-01 20:01:56.000000 control.tar.gz
│   │  -rw-r--r--   0        0        0    74512 2016-04-01 20:01:56.000000 data.tar.xz
│   ├── control.tar.gz
│   │   ├── control.tar
│   │   │   ├── ./md5sums
│   │   │   │   ├── md5sums
│   │   │   │   │┄ Files in package differ
│   ├── data.tar.xz
│   │   ├── data.tar
│   │   │   ├── ./usr/share/prospector/prospector-0.11.7.egg-info/requires.txt
│   │   │   │ @@ -1,12 +1,12 @@
│   │   │   │  
│   │   │   │  [with_everything]
│   │   │   │ +pyroma>=1.6,<2.0
│   │   │   │  frosted>=1.4.1
│   │   │   │  vulture>=0.6
│   │   │   │ -pyroma>=1.6,<2.0

Reproducible builds folks recognized this to be a toolchain problem and set up the issue randonmness_in_python_setuptools_requires.txt to cover this. Plus, a wishlist bug against python-setuptools was filed to fix this. The patch which was provided by Chris Lamb adds sorting of dependencies in requires.txt for Setuptools by adding sorted() (stdlib) to _write_requirements() in command/egg_info.py:

--- a/setuptools/command/egg_info.py
+++ b/setuptools/command/egg_info.py
@@ -406,7 +406,7 @@ def warn_depends_obsolete(cmd, basename, filename):
 def _write_requirements(stream, reqs):
     lines = yield_lines(reqs or ())
     append_cr = lambda line: line + '\n'
-    lines = map(append_cr, lines)
+    lines = map(append_cr, sorted(lines))
     stream.writelines(lines)

In the toolchain, nothing sorts these requirements alphanumerically to make differences disappear, but what is the reason for these differences in the Prospector packages, though?

The problem is somewhat subtle. In setup.py, [with_everything] is a dictionary entry of _OPTIONAL (which is used for extras_require) that is created by a list comprehension out of the other values in that dictionary. The code goes like this:

_OPTIONAL = {
    'with_frosted': ('frosted>=1.4.1',),
    'with_vulture': ('vulture>=0.6',),
    'with_pyroma': ('pyroma>=1.6,<2.0',),
    'with_pep257': (),  # note: this is no longer optional, so this option will be removed in a future release
}
_OPTIONAL['with_everything'] = [req for req_list in _OPTIONAL.values() for req in req_list]

The result, the new _OPTIONAL dictionary including the key with_everything (which w/o further sorting is the source of the list of requirements requires.txt) e.g. looks like this (code snipped run through in IPython):

In [3]: _OPTIONAL
Out[3]: 
{'with_everything': ['vulture>=0.6', 'pyroma>=1.6,<2.0', 'frosted>=1.4.1'],
 'with_vulture': ('vulture>=0.6',),
 'with_pyroma': ('pyroma>=1.6,<2.0',),
 'with_frosted': ('frosted>=1.4.1',),
 'with_pep257': ()}

That list comprehension iterates over the other dictionary entries to gather the value of with_everything, and – Python programmers know that of course – dictionaries are not indexed and therefore the order of the key-value pairs isn’t fixed, but is determined by certain conditions. That’s the source for the non-determinism of this Debian package revision of Prospector.

This problem has been fixed by a patch, which just presorts the list of requirements before it gets added to _OPTIONAL:

@@ -76,8 +76,8 @@
     'with_pyroma': ('pyroma>=1.6,<2.0',),
     'with_pep257': (),  # note: this is no longer optional, so this option will be removed in a future release
 }
-_OPTIONAL['with_everything'] = [req for req_list in _OPTIONAL.values() for req in req_list]
-
+with_everything = [req for req_list in _OPTIONAL.values() for req in req_list]
+_OPTIONAL['with_everything'] = sorted(with_everything)

In comparison to the list method sort(), the function sorted() does not change iterables in-place, but returns a new object. Both could be used. As a side note, egg-info/requires.txt isn’t even needed, but that’s another issue.


  1. As an alternative to prefixing LC_ALL, env -i could be used to get an empty environment. [return]
  2. 0.11.7-4 already but this package revision was in experimental due to the switch to Python 3 and therefore not tested by reproducible builds continuous integration. [return]