Remotely deploy a WSGI application (as a Debian package) with Ansible

This is a mini workshop as an introduction into using Ansible for the administration of Debian systems. As an example it’s shown how this configuration management tool can be used to remotely set up a simple WSGI application running on an Apache web server on a Debian installation to make it available on the net. The application which is used as an example is httpbin by Runscope. This is an useful HTTP request service for the development of web software or any other purposes which features a number of specific endpoints that can be used for different testing matters. For example, the address http://<address>/user-agent of httpbin returns the user agent identification of the client program which has been used to query it (that’s taken from the header of the request). There are official instances of this request server running on the net, like the one at http://httpbin.org/. WSGI is a widespread standard for programming web application in Python, and httpbin is implemented in Python using the Flask web framework.

The basis of the workshop is a simple base installation of an up-to-date Debian 8 “Jessie” on a demonstration host, and the latest official release of that is 8.7. As a first step, the installation has to be switched over to the “testing” branch of Debian, because the Debian packages of httpbin are comparatively new and are going to be introduced into the “stable” branch of the archive the first time with the upcoming major release number 9 “Stretch”. After that, the Apache packages which are needed to make it available (apache2 and libapache2-mod-wsgi – other web servers of course could be used instead), and which are not part of a base installation, are installed from the archive. The web server then gets launched remotely, and the httpbin package will be also pulled and the service is going to be integrated into Apache to run on that. To achieve that, two configuration files must be deployed on the target system, and a few additional operations are needed to get everything working together. Every step is preconfigured within Ansible so that the whole process could be launched by a single command on the control node, and could be run on a single or a number of comparable target machines automatically and reproducibly.

If a server is needed for trying this workshop out, straightforward cloud server instances are available on the net for example at DigitalOcean, but – let me underline this – there are other cloud providers which offer the same things, too! If it’s needed for experiments or other purposes only for a limited time, low priced “droplets” are available here which are billed by the hour. After being registered, the machine(s) which is/are wanted could be set up easily over the web interface (choose “Debian 8.7” as OS), but there are also command line clients available like doctl (which is not yet available as a Debian package). For the convenient use of a droplet the user should generate a SSH key pair on the local machine, first:

$ ssh-keygen -t rsa -b 4096 -C "john@doe.com" -f ~/.ssh/mykey

The public part of the key ~/.ssh/mykey.pub then can be uploaded into the user account before the droplet is going to be created, it then could be integrated automatically. There is a good introduction on the whole process available in the excellent tutorial series serversforhackers.com, here. Ansible then can use the SSH keypair to login into a droplet without the need to type in the password every time. On a cloud server like this carrying a Debian base system, the examples in this workshop can be tried well. Ansible works client-less and doesn’t need to be installed on the remote system but only on the control node, however a Python 2.7 interpreter is needed there (the base system of DigitalOcean includes that).

For that Ansible can do anything on them, remote servers which are going to be controlled must be added to /etc/ansible/hosts. This is a configuration file in the INI format for DNS names and IP addresses. For a flexible organisation of the server inventory it’s possible to group hosts here, IP ranges could be given, and optional variables can be used among other useful things (the default file contains a couple of examples). One or a couple of servers (in Ansible they are called “hosts”) on which something particular is going to be happening (like httpbin is going to be installed) could be added like this (the group name is arbitrary):

[httpbin]
192.0.2.0

Whether Ansible could communicate with the hosts in the group and actually can operate on them can be verified by just pinging them like this:

$ ansible httpbin -m ping -u root --private-key=~/.ssh/mykey
192.0.2.0 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}

The command succeeded well, so it appears there isn’t no significant problem regarding this machine. The return value changed:false indicates that there haven’t been any changes on that host as a result of the execution of this command. Next to ping there are several other modules which could be used with the command line tool ansible the same way, and these modules are actually something like the core components of Ansible. The module shell for example can be used to execute shell commands on the remote machine like uname to get some system information returned from the server:

$ ansible httpbin -m shell -a "uname -a" -u root --private-key=~/.ssh/mykey
192.0.2.0 | SUCCESS | rc=0 >>
Linux debian-512mb-fra1-01 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux

In the same way, the module apt could be used to remotely install packages. But with that there’s no major advantage over other software products that offer a similar functionality, and using those modules on the command line is just the basics of Ansible usage.

Playbooks in Ansible are YAML scripts for the manipulation of the registered hosts in /etc/ansible/hosts. Different tasks can be defined here for successive processing, like a simple playbook for changing the package source from “stable” to “testing” for example goes like this:

---
 - hosts: httpbin
   tasks:
   - name: remove "jessie" package source
     apt_repository: repo='deb http://mirrors.digitalocean.com/debian jessie main' state=absent

   - name: add "testing" package source
     apt_repository: repo='deb http://httpredir.debian.org/debian testing main contrib non-free' state=present

   - name: upgrade packages
     apt: update_cache=yes upgrade=dist

First, like used with the CLI tool ansible above, the targeted host group httpbin is chosen. The default user “root” and the SSH key could be fixed here, too, to spare the need to give them on the command line. Then there are three tasks defined to get worked down consecutively: With the module apt_repository the preset package source “jessie” is removed from /etc/apt/sources.list. Then, a new package source for the “testing” archive gets added to /etc/apt/sources.list.d/ by using the same module (by the way, mirrors.digitalocean.org also provides testing, though, and that might be faster). After that, the apt module is used to upgrade the package inventory (it performs apt-get dist-upgrade), after an update of the package cache has taken place (by running apt-get update)

A playbook like this (the filename is arbitrary, but commonly carries the suffix .yml) can be run by the CLI tool ansible-playbook, like this:

$ ansible-playbook httpbin.yml -u root --private-key=~/.ssh/mykey

Ansible then works down the individual “plays” of the tasks on the remote server(s) top-down, and due to a high speed net connection and SSD block device hardware the change of the system to being a Debian Testing base installation only takes around a minute to complete in the cloud. While working, Ansible puts out status reports for the individual operations. If certain changes on the base system have been taken place already like when a playbook is run through one more time, the modules of course sense that and return just the information that the system haven’t been changed because it’s already there what have been wanted to change to. Beyond the basic playbook which is shown here there are more advanced features like register and when available to bind the execution of a play to the error-free result of a previous one.

The apt module then can be used in the playbook to install the three needed binary packages one after another:

   - name: install apache2
     apt: pkg=apache2 state=present

   - name: install mod_wsgi
     apt: pkg=libapache2-mod-wsgi state=present

   - name: install httpbin
     apt: pkg=python-httpbin state=present

The Debian packages are configured in a way that the Apache web server is running immediately after installation, and the Apache module mod_wsgi is automatically integrated. If that would be otherwise desired, there are Ansible modules available for operating on Apache which can reverse this if that is wanted. By the way, after the package have been installed the httpbin server can be launched with python -m httpbin.core, but this runs only a mini web server which is not suitable for productive use.

To get httpbin running on the Apache web server two configuration files are needed. They could be set up in the project directory on the control node and then uploaded onto the remote machine with another Ansible module. The file httpbin.wsgi (the name is again arbitrary) contains only a single line which is the starter for the WSGI application to run:

from httpbin import app as application

The module copy can be used to deploy that script on the host, while the target folder /var/www/httpbin must be set up before that by the module file. In addition to that, a separate user account like “httpbin” (the name is also arbitrary but picked up in the other config file) is needed to run it, and the module user can set this up. The demonstrational playbook continues, and the plays which are performing these three operations are going like this:

   - name: mkdir /var/www/httpbin
     file: path=/var/www/httpbin state=directory

   - name: set up user "httpbin"
     user: name=httpbin

   - name: copy WSGI starter
     copy: src=httpbin.wsgi dest=/var/www/httpbin/httpbin.wsgi owner=httpbin group=httpbin mode=0644 

Another configuration script httpbin.conf is needed for Apache on the remote server to include the WSGI application httpbin running as a virtual host. It goes like this:

<VirtualHost *>
 WSGIDaemonProcess httpbin user=httpbin group=httpbin threads=5
 WSGIScriptAlias / /var/www/httpbin/httpbin.wsgi

 <Directory /var/www/httpbin>
  WSGIProcessGroup httpbin
  WSGIApplicationGroup %{GLOBAL}
  Order allow,deny
  Allow from all
 </Directory>
</VirtualHost>

This file needs to be copied into the folder /etc/apache2/sites-available on the host, which already exists when the apache2 package is installed. The remaining operations which are missing to get anything running together are: The default welcome screen of Apache blocks anything else and should be disabled by Apache’s CLI tool a2dissite. And after that, the new virtual host needs to be activated with the complementary tool a2ensite – both could be run remotely by the module command. Then the Apache server on the remote machine must be restarted to read in the new configuration. You’ve guessed it already, that’s all easy to perform with Ansible:

   - name: deploy configuration script
     copy: src=httpbin.conf dest=/etc/apache2/sites-available owner=root group=root mode=0644

   - name: deactivate default welcome screen
     command: a2dissite 000-default.conf
     
   - name: activate httpbin virtual host
     command: a2ensite httpbin.conf

   - name: restart Apache
     service: name=apache2 state=restarted 

That’s it. After this playbook has been performed by Ansible on a (or several) freshly set up remote Debian base installation completely, then the httpbin request server is available running on the Apache web server and could be queried from anywhere by a web browser, or for example by curl:

$ curl http://192.0.2.0/user-agent
{
  "user-agent": "curl/7.50.1"
}

With the broad set of Ansible modules and the playbooks a lot of tasks can be accomplished like the example problem which has been explained here. But the range of functions of Ansible however is still even more comprehensive, but to discuss that would have blown the frame of this blog post. For example the playbooks offer more advanced features like event handler which can be used for recurring operations like the restart of Apache in more extensive projects. And beyond playbooks, templates could be set up in the roles which can behave differently on selected machine groups – Ansible uses Jinja2 as template engine for that. And the scope of functions of the basic modules could be expanded by employing external tools.

To drop a word on why it could be useful in certain situations to run own instances of the httpbin request server instead of using the official ones which are provided on the net by Runscope: Like some people would prefer to run a private instance for example in the local network instead of querying the one on the internet. Or for some development reasons a couple or even a large number of identical instances might be needed – Ansible is ideal for setting them up automatically. Anyway, the Javascript bindings to the tracking services like Google Analytics in httpbin/templates/trackingscripts.html are patched out in the Debian package. That could be another reason to prefer to set up an own instance on a Debian server.