About

This is a concise tutorial on Ansible. It starts by describing what Ansible is and what it is used for, provides instructions on how to install it, and gives an overview of its playbooks. It then goes into more detail on the files used by Ansible: inventory, configuration, and playbooks. It concludes by showcasing multiple use cases. It doesn’t cover everything, since doing so would make it look like the manual, but it is broad and deep enough to give you a jump start.
Vagrant will be used in this tutorial since it provides a convenient way to make development environments easily available for testing Ansible.

The recipes used in this tutorial can be found at https://github.com/alexconst/ansible_recipes.

What is Ansible

“Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates.” [1]
In order to do this it uses text files where configuration management, deployment, and orchestration tasks are defined.
The advantage of using a provisioning tool like Ansible is that its configuration files make the whole process reproducible and scalable to hundreds or thousands of servers, with the added benefit that these files can be put under version control. Another advantage is that Ansible’s modules are implemented to be idempotent, a property that plain shell scripts do not provide.
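
For example, an Ansible task using the file module can be run repeatedly with the same end result, whereas the equivalent shell command is not idempotent (a minimal sketch; the path is illustrative):

# idempotent: ensures the directory exists; reports "changed" only when it actually creates it
- file: path=/opt/myapp state=directory

# not idempotent: fails on the second run because the directory already exists
- shell: mkdir /opt/myapp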

An Ansible recipe is composed of:

  • one or more YAML playbook files, which define the tasks to be executed,
  • an inventory file, where target host machines are listed and grouped, and
  • an optional Ansible configuration file.

Other well-known provisioning tools include Puppet (2005), Chef (2008) and Salt (2011). So why choose Ansible (2012)?
There are multiple discussions [2][3][4] on this topic but with no clear winner standing out. The main reasons for this are the maturity of each tool, its prevalence inside a company, and the pros and cons each tool brings. Nonetheless, Chef does tend to be well regarded (and a better alternative than its Ruby counterpart Puppet), and Ansible does tend to be recommended for new users or installations.

The main reasons for choosing Ansible:

  • excellent documentation,
  • easy learning curve: due to the use of YAML, Python and its documentation,
  • declarative paradigm: configuration is done as data via YAML files not code,
  • agent-less architecture: only SSH is used, so no potentially vulnerable agents get installed,
  • batteries included: almost 500 modules,
  • use of Jinja2 templating language: for variables and loop constructs, and
  • Ansible Galaxy: a repository with thousands of Ansible recipes which you can customize to your needs.

Installation

To install Ansible on your host machine:

http://docs.ansible.com/ansible/intro_installation.html#installing-the-control-machine

# install Ansible from source
cd /usr/local/src
git clone git://github.com/ansible/ansible.git --recursive

# install dependencies:
sudo pip install paramiko PyYAML Jinja2 httplib2 six
# needed for the AWS EC2 inventory:
sudo pip install boto
# needed for connecting to a guest when using passwords
sudo apt-get install -y sshpass

# set up a default inventory file
echo "" >> ~/ansible_hosts


# add these lines to your shell rc file
# (unfortunately here documents break syntax highlighting)
# cat <<'EOF' >> "$your_shellrc_file"
export ANSIBLE_INVENTORY="$HOME/ansible_hosts"
export ANSIBLE_HOME="/usr/local/src/ansible"
alias env-ansible="source $ANSIBLE_HOME/hacking/env-setup"
# needed for the AWS EC2 inventory:
export ANSIBLE_EC2="$ANSIBLE_HOME/contrib/inventory/ec2.py"
alias ansible-inv-ec2="$ANSIBLE_EC2"
export EC2_INI_PATH="ec2.ini"
# EOF


# to use ansible, set its environment
env-ansible


# to update Ansible
cd /usr/local/src/ansible
git pull --rebase
git submodule update --init --recursive

Ansible overview

ansible --version
# ansible 2.1.0 (devel 0f15e59cb2) last updated 2016/02/09 15:31:35 (GMT +100)

If you wish to test the commands described in this section then start by preparing a test environment.

Use the following Vagrantfile and vagrant up to deploy a new environment.

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure(2) do |config|

  # Choose a box with VBox guest tools already installed
  config.vm.box = "ubuntu/wily64"

  # Set up hostname
  config.vm.hostname = "ansible-test"

  # Assign a static IP to the guest
  config.vm.network "private_network", ip: "192.168.22.50"

end

Create a nodes.ini inventory file:

192.168.22.50

Now you’re ready to test Ansible.

Ansible supports two operation modes: ad-hoc mode and playbook mode.

In ad-hoc mode commands are executed from the command line.
Examples:

# "ping" `all` nodes in the `nodes.ini` inventory file using the `vagrant` remote user
# the `ping` module tries to log in to a host, verify a usable Python, and return pong on success
ansible -m ping -u vagrant -i nodes.ini all --ask-pass

# collect system information (aka gathering facts)
ansible -m setup -u vagrant -i nodes.ini all --ask-pass > facts.txt

In playbook mode, tasks are executed sequentially as defined in the playbook file. In the example listed here we update the package listing and install htop.

---
# This playbook only has one play
# And it applies to all hosts in the inventory file
- hosts: all
  # we need privilege escalation to install software, so we become root
  become: yes
  # and we become root using sudo
  become_method: sudo
  # to perform the following tasks:
  # (and tasks should always have a name)
  tasks:
    - name: update package listing cache
      # use the Ansible apt module to:
      # update package list, but don't upgrade the system
      apt: update_cache=yes upgrade=no cache_valid_time=1800

    - name: install packages
      # use the Ansible apt module to:
      # install the listed packages to the latest available version
      apt: pkg={{ item }} state=latest
      with_items:
        - htop

To run the playbook:

# check that the playbook syntax is correct
ansible-playbook --syntax-check htop.yml
# run the playbook
ansible-playbook -i nodes.ini -u vagrant htop.yml --ask-pass

In the examples shown above we used the nodes.ini inventory, which only contained the IP address of the target machine. Alternatively, we could have used this vagrant.ini inventory file instead:

mymachine ansible_ssh_host=192.168.22.50 ansible_ssh_port=22 ansible_ssh_user='vagrant' ansible_ssh_private_key_file='./.vagrant/machines/default/virtualbox/private_key'

This simplifies the command for running playbooks to:

ansible-playbook -i vagrant.ini htop.yml

The first word in the vagrant.ini file, “mymachine”, works as a hostname alias when executing Ansible commands/playbooks; it doesn’t really need to be a hostname.
Also note that, in this particular case, because of the relative path used to specify the private SSH key, the inventory needs to be in the same directory as the Vagrantfile for it to work.

Inventory files

The inventory file lists and groups target host machines where the playbooks can be executed. An inventory file can look like this:

# group 'web' includes 30 webservers
[web]
webserver-[01:30].example.com

# and group 'db' includes 6 db servers
[db]
dbserver-[a-f].example.com

# and this is what the Vagrant inventory file looks like for a 'default' Vagrant machine
default ansible_ssh_host=127.0.0.1 ansible_ssh_port=2200 ansible_ssh_user='vagrant' ansible_ssh_private_key_file='/path/to/.vagrant/machines/default/virtualbox/private_key'

Or a more complex one:

# define group with 2 hosts
[europe]
host1
host2

# define group 'asia', where one of the hosts is also in the 'europe' group
# this may imply Ansible commands being executed twice on this host (but no
# worries since they are idempotent)
[asia]
host2
host3

# define a group of groups named 'euroasia' using the 'children' keyword
[euroasia:children]
europe
asia

# and set variables, to be used in the playbooks, for the 'euroasia' group
# NOTE: best practices actually recommend having the variables defined in a separate YAML file
[euroasia:vars]
ntp_server=ntp.london.example.com
proxy=proxy.london.example.com
some_custom_var=foobar

[global:children]
euroasia
america
oceania

Dynamic inventory files (with AWS)

Ansible also provides a way to get an inventory of hosts from third party sources, which is particularly useful when dealing with cloud providers. Ansible includes support for AWS EC2, Digital Ocean, Google CE, Linode, and OpenStack, among others. It even allows adding support for other sources of dynamic inventory [5].

Here follows an example using the AWS EC2 dynamic inventory (note that if you haven’t already, you will need to perform the steps described in the Installation section, namely installing the needed packages and setting the alias and environment variables in your shell rc file).

Get your EC2 external inventory script settings file ready:

# option 1: either use the default location previously set in your shell rc file
# this next line should simply echo "ec2.ini" (thus pointing to the current dir)
echo $EC2_INI_PATH
# copy the provided ec2.ini to the local dir
cp $ANSIBLE_HOME/contrib/inventory/ec2.ini .

# option 2: or set the path to your ec2 ini file
export EC2_INI_PATH="/path/to/ec2.ini"

In a real use case you may want to edit the ec2.ini file to better suit your needs. For example, you can speed up the query process by including or excluding regions of interest with the regions and regions_exclude variables, or change the cache_max_age variable, which specifies how long cached results remain valid before new API calls are made.
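
For instance, a trimmed-down ec2.ini could contain settings like these (the values are illustrative):

# only query these regions (instead of regions = all)
regions = eu-west-1,us-east-1
regions_exclude = us-gov-west-1,cn-north-1
# keep cached results for 5 minutes before hitting the API again
cache_max_age = 300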

To get a listing of running instances:

ansible-inv-ec2 --list

# or if you need to refresh the cache
ansible-inv-ec2 --list --refresh-cache

# or if you want to choose a particular AWS profile, in this case 'dev'
AWS_PROFILE="dev" ansible-inv-ec2 --list --refresh-cache

To execute an ad-hoc command:

# typically official Debian machines have the 'admin' user while Ubuntu machines have the 'ubuntu' user
instance_user="admin"
# path to your SSH keypair
instance_key="$HOME/.ssh/aws_developer.pem"
# despite passing a profile, you still need to specify a region
region="eu-west-1"

# execute the ping module
AWS_PROFILE="dev"  ansible -i "$ANSIBLE_EC2" -u "$instance_user" --private-key="$instance_key" "$region"  -m ping

To run our playbook that installs htop:

# by default Debian machines have the 'admin' user while Ubuntu machines have the 'ubuntu' user
instance_user="admin"
# path to your SSH keypair
instance_key="$HOME/.ssh/aws_developer.pem"

# run the playbook
AWS_PROFILE="dev"  ansible-playbook -i "$ANSIBLE_EC2" -u "$instance_user" --private-key="$instance_key"   htop.yml

A few other points worth mentioning:

  • the default ec2.ini is configured to run Ansible from outside AWS EC2, however this is not the most efficient way to manage those instances. Ideally you would also have an Ansible management instance running inside EC2.
  • when running Ansible from within AWS EC2, using internal DNS names and IP addresses makes more sense. This can be configured via the destination_variable setting, and it is actually required to access the instances when dealing with a private subnet inside a VPC.
  • when running a private subnet inside a VPC, those instances will only be listed in the inventory if the vpc_destination_variable is set to private_ip_address.
  • when working with dynamic inventories many dynamic groups are automatically created. So an instance with an AWS tag such as class:webserver would load variables from a group_vars/ec2_tag_class_webserver variables file.

Configuration files

Ansible will use the configuration options found in the first file it finds from the following list:

  • ANSIBLE_CONFIG: an environment variable pointing to a config file
  • ansible.cfg: in the current directory
  • .ansible.cfg: in the home directory
  • /etc/ansible/ansible.cfg

NOTE: it will only use one file. Settings are not merged.

The configuration file can be used to set a multitude of options regarding connectivity, parallelism, privilege escalation, among other settings. Nearly all of these options can be overridden in the playbooks or via command line flags. Check the documentation for a list of all options. Some of the most useful ones are:

  • forks: the default number of processes to spawn when communicating with remote hosts. By default it is automatically limited to the number of possible hosts.
  • gathering: by default it is set to implicit, which ignores the fact cache and gathers facts for each play unless gather_facts: False is set in the playbook. The explicit setting does the opposite. The smart setting only gathers facts once per playbook run. Both explicit and smart use the fact cache.
  • log_path: if configured it will be used to log information.
  • nocows: set to 1 if you don’t like them.
  • private_key_file: points to a private key file. You can use this config option instead of passing --private-key on the command line.
  • vault_password_file: sets the path to the Ansible Vault password file.

If you’re looking to optimize your operations look into the pipelining and accelerate_* options.

To configure Ansible to your needs, copy the template at $ANSIBLE_HOME/examples/ansible.cfg to your local directory and edit it.
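
As a starting point, a minimal ansible.cfg using some of the options above could look like this (the values are illustrative, not recommendations):

[defaults]
forks = 20
gathering = smart
log_path = /tmp/ansible.log
nocows = 1
private_key_file = ~/.ssh/id_rsa
vault_password_file = ~/.ansible_vault_pass

[ssh_connection]
pipelining = True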

Playbook files

Overview

Playbooks are YAML files that describe configuration, deployment and orchestration operations to be performed on a group of nodes.
Each playbook contains a list of plays, and each play includes a list of tasks targeted at a group of hosts, and each task calls an Ansible module.
Tasks are executed sequentially according to the order defined in the playbook, but each task is executed in parallel across hosts.
Apart from those there is also the concept of a handler: a task that executes at the end of a play (and only once), when triggered by any of the tasks that were set to notify it. Handlers are typically used to restart services or reboot the machine. A minimal handler sketch is shown after the module list below.

Commonly used entries in a playbook:

  • hosts: a list of one or more groups of nodes where the playbook will be executed.
  • gather_facts: you can turn off fact gathering by setting it to no (though it is usually better to set gathering = smart in your ansible.cfg instead).
  • vars: a list of variables that can be used both in the Jinja2 template files and also in the tasks in the playbook.
  • vars_files: a list of YAML files containing variable definitions.
  • remote_user: the remote user used to login to the node.
  • become: if set to yes the remote user will switch to root before executing the tasks.
  • become_method: defines the switch user method, typically it’s sudo.
  • tasks: a list of tasks.
  • task: each task makes use of a module to perform an operation and may notify a handler.
  • handlers: a list of handlers. With each handler being executed at most once, at the end of a play.

Some commonly used modules:

  • apt and yum: package management.
  • template: evaluate the input Jinja2 template file and copy its result to the remote node.
  • copy: copy a file to the remote node.
  • shell: execute a command through the shell, which makes environment variables and other shell features available.
  • command: execute a command without invoking a shell or using environment variables.
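
To tie these entries together, here is a minimal sketch of a playbook that uses a handler (the group, package and service names are illustrative):

---
- hosts: webservers
  remote_user: vagrant
  become: yes
  become_method: sudo
  vars:
    pkg_name: nginx
  tasks:
    - name: install package
      apt: name={{ pkg_name }} state=present
      # trigger the handler below; it runs at most once, at the end of the play
      notify: restart service

  handlers:
    - name: restart service
      service: name={{ pkg_name }} state=restarted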

Variables

Parametrization of a playbook can be done via variables. These can be defined in playbooks, inventories, and via the command line.
A special class of variables goes by the name of facts, which consist of information gathered from the target node, and are particularly useful when dealing with config files that need external IP addresses or the number of CPU cores. Facts are named using the following prefixes: ansible_, facter_ and ohai_; the first refers to Ansible’s own fact scheme, while the other two are present for convenience/migration purposes and refer respectively to the Puppet and Chef fact gathering systems.
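
For example, a task can reference facts directly (a minimal sketch; the message is illustrative):

- debug: msg="{{ inventory_hostname }} has {{ ansible_processor_vcpus }} vCPUs and address {{ ansible_default_ipv4.address }}"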

Ansible also makes it possible for a host to use facts from another host via hostvars. For example, suppose your load balancer needs the external IP address of the machines in the webservers group (groups can be accessed using the groups variable). That can be done as follows:

{% for host in groups['webservers'] %}
   {{ hostvars[host]['ansible_eth0']['ipv4']['address'] }}
{% endfor %}

Note: when using hostvars with Vagrant things can get a bit tricky. For it to work properly you need to have persistent fact caching enabled. To do this install redis and the python bindings:

apt-get install -y redis-server python-redis

And configure the use of redis in ansible.cfg:

gathering = smart
fact_caching = redis
fact_caching_timeout = 86400

Secrets

To avoid keeping sensitive information like passwords in plaintext it’s possible to use Ansible Vault to encrypt and decrypt secrets using AES-256. It basically works like this:

# set your editor
export EDITOR="vim"

# create a variables file in the vault
vaultfile="vars/main.yml"
ansible-vault create $vaultfile

# to edit a file in the vault
ansible-vault edit $vaultfile

# to run a playbook with encrypted variables (either in itself or in a dependency):
ansible-playbook $playbook.yml --ask-vault-pass
# or if it's stored in a file (as a single line in the file):
ansible-playbook $playbook.yml --vault-password-file $secret_file.txt
# the path to this file can also be configured in the environment variable ANSIBLE_VAULT_PASSWORD_FILE

# if the decryption process is slow, install the cryptography package
pip install cryptography
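
A common pattern is to keep the encrypted variables in a separate file and pull them into a playbook via vars_files (a sketch; the db_password variable is illustrative, and vars/main.yml matches the vault file created above):

---
- hosts: all
  vars_files:
    - vars/main.yml   # the vault-encrypted file created above
  tasks:
    - name: use a secret without printing it
      debug: msg="the database password has {{ db_password | length }} characters"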

Roles

There is one final concept to be aware of, and it regards reusability: Ansible makes it possible for a playbook to include other playbooks or roles.
Roles are a collection of playbooks that act as reusable building blocks. The file structure of a role can look something like this:

lamp_haproxy/roles/nagios
├── files
│   ├── ansible-managed-services.cfg
│   ├── localhost.cfg
│   └── nagios.cfg
├── handlers
│   └── main.yml
├── tasks
│   └── main.yml
└── templates
    ├── dbservers.cfg.j2
    ├── lbservers.cfg.j2
    └── webservers.cfg.j2

Ansible Galaxy is a community-driven website with thousands of roles that can be reused and customized to specific needs.
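
Roles from Ansible Galaxy are installed with the ansible-galaxy command (the role name below is just an illustrative placeholder):

# install a role from Ansible Galaxy into the default roles path
ansible-galaxy install username.rolename
# list installed roles
ansible-galaxy list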

Because the best way to understand playbooks is via examples, the next sections will do just that. But be sure to read the Best Practices guide first, as it will help you make the most out of playbooks and Ansible.

Example 1: SSH known_hosts management

Using Vagrant is very convenient for testing environments and provisioning. But with each new deploy (vagrant destroy followed by vagrant up) the guest gets new SSH host keys, so the fingerprints stored in $HOME/.ssh/known_hosts become stale, which can lead to errors when provisioning. While this behaviour can be a deterrent for MitM attacks, it becomes a nuisance when testing things out in a local environment since it requires intervention with each new deploy, namely editing known_hosts or running ssh-keygen -R $hosts.
A more interesting way to fix this is by running Ansible on the localhost. While this particular case is not the typical use of Ansible, the approach can be adapted for managing keys in other situations. And it’s also a good excuse to flex Ansible’s muscles.
The files for this example are in the refresh_ssh_public_keys directory.

Note: this playbook will be executed locally and modify the known_hosts file in the host computer; which makes it an exception to Ansible’s typical use case.

In this case the use of an inventory file isn’t strictly required but using one prevents a warning. The inventory file:

localhost              ansible_connection=local

The playbook:

---
# This playbook refreshes local SSH key fingerprints. Useful when using Vagrant.
#
# To run the playbook on your localhost:
#   ansible-playbook main.yml
# Or to avoid any warnings:
#   ansible-playbook -i localhost.ini main.yml
#
# NOTE: this should be used only in a trusted local environment. Otherwise you
# may be making yourself vulnerable to MitM attacks.
#
- hosts: localhost
  gather_facts: no
  vars:
    known_hosts_file: "~/.ssh/known_hosts"
    # Only hosts in this range will be updated:
    target_subnet: "192.168.22."
    host_start: 50
    host_end: 59
  tasks:
    - name: Check if the known hosts file exists
      file: "path={{ known_hosts_file }} state=file"
      # Save the task output/report/log to a register
      register: file_check
      # We ignore errors here because we'll handle them in the next task
      ignore_errors: true

    - name: Create the known hosts file when not found
      file: "path={{ known_hosts_file }} state=touch"
      # Use Jinja2 template filters to check if the field 'failed' exists
      when: file_check | failed

      # Don't Repeat Yourself. Save the target hosts list to a register
    - name: Dummy task to build list of nodes for ssh fingerprint
      assert: { that: "'a' == 'a'" }
      # create a custom sequence and save it to register target_hosts
      with_sequence:
        start={{host_start}}
        end={{host_end}}
        format={{target_subnet}}%i
      register: target_hosts

    - name: Remove SSH fingerprints if they exist
      known_hosts:
        state=absent
        path="{{known_hosts_file}}"
        host="{{item}}"
        # Preprocess data in register, using Jinja2 templates, in order to allow
        # easy access via {{item}} instead of {{item.item}}
      with_items: "{{ target_hosts.results | map(attribute='item') | list }}"

    - name: Add SSH fingerprints if the node is online
      # This task makes use of the lookup module which allows accessing data from
      # outside sources. In particular it uses the pipe lookup which returns the
      # raw output of the specified ssh-keyscan command.
      known_hosts:
        state=present
        path="{{known_hosts_file}}"
        host="{{item}}"
        key="{{ lookup('pipe', 'ssh-keyscan -H -T 1 {{item}}') }}"
      with_items: "{{ target_hosts.results | map(attribute='item') | list }}"
      ignore_errors: yes

To execute the playbook:

ansible-playbook -i localhost.ini main.yml

Example 2: nginx webserver

This section demonstrates the creation of a role for nginx.
The purpose of this example is to give an idea of what role creation involves as well as to further exemplify playbooks. It doesn’t aim to be a fully-fledged role, especially given that there are already very complete and versatile recipes available at Ansible Galaxy.

The nginx role file structure:

website
├── ansible.cfg                     # Ansible configuration file
├── example_nginx.ini               # Inventory with Vagrant machine details
├── example_nginx.Vagrantfile       # Vagrantfile for this example
├── example_nginx.yml               # Playbook for this example
├── group_vars
│   ├── all                         # Variables used by all hosts
│   └── webservers                  # Variables used by all webservers
├── roles
│   ├── common
│   │   └── tasks                   # Tasks executed by all roles
│   │       └── main.yml
│   └── nginx                       # nginx role
│       ├── files                   # Files to be copied to the guest machines
│       │   └── humans.txt
│       ├── handlers                # Handlers notified by tasks
│       │   └── main.yml
│       ├── tasks                   # nginx tasks
│       │   └── main.yml
│       └── templates               # Templates expanded and copied to the guest machines
│           ├── index.html.j2
│           ├── nginx.conf.j2
│           └── sites-available_default.j2
└── Vagrantfile                     # Symbolic link to example_nginx.Vagrantfile

The ansible.cfg file was edited from the original file to include these settings:

gathering = smart
log_path = /tmp/ansible.log

The example_nginx.ini inventory file includes all webservers:

[webservers]
default ansible_ssh_host=192.168.22.51 ansible_ssh_port=22 ansible_ssh_user='vagrant' ansible_ssh_private_key_file='./.vagrant/machines/default/virtualbox/private_key'

The “main” playbook example_nginx.yml installs nginx and deploys the website on all webservers, after executing the common role. In order to do so it switches to the root user via the become and become_method directives.

---
- name: deploy website
  hosts: webservers
  become: yes
  become_method: sudo

  roles:
    - common
    - nginx

The variables for the webservers:

---
website_root: /var/www/mysite

The variables defined in all:

---
website_port: 80

In this example website_port could be in the webservers variables file, but in the next example this information will also be needed by the load balancer, which is why it is placed in a common file.

The common role includes any operations that may be shared by more than one role:

---
# Use the apt module to: update package list, but don't upgrade the system
- name: update package listing cache
  apt: update_cache=yes upgrade=no cache_valid_time=1800

The main playbook for this role is responsible for installing nginx, configuring it via Jinja2 templates, deploying the website, and notifying the handler that restarts the nginx daemon. The playbook is as follows:

---
# Install latest version of nginx package.
# Cache is not updated here since that is done in the common role.
- name: install latest nginx
  apt: name=nginx state=latest update_cache=no
  notify: restart nginx

# Enable nginx to start at boot.
- name: enable nginx
  service: name=nginx enabled=yes

# Configure nginx settings.
- name: configure nginx settings
  template: src=nginx.conf.j2 dest=/etc/nginx/nginx.conf
  notify: restart nginx

# Configure nginx websites.
- name: configure nginx websites
  template: src=sites-available_default.j2 dest=/etc/nginx/sites-available/default
  notify: restart nginx

########################
# Copy the website.
# This could also include downloading from a git repo.
########################
- name: create website root dir
  file: path={{ website_root }} state=directory mode=755
- name: copy a file
  copy: src=humans.txt dest={{ website_root }}/humans.txt
  notify: restart nginx
- name: copy website index
  template: src=index.html.j2 dest={{ website_root }}/index.html
  notify: restart nginx

Several of the tasks notify a handler task that is responsible for restarting the nginx service. It’s defined in the handlers/main.yml as follows:

---
- name: restart nginx
  service: name=nginx state=restarted

Let us now look at the template files. First the nginx.conf.j2 template.
The first line contains a comment that will be expanded via the {{ ansible_managed }} variable, whose purpose is to timestamp the file and alert anyone reading it that the file is auto-generated and should not be edited.
The only other line of interest, in the Ansible context anyway, is {{ ansible_processor_vcpus * 2 }}, which sets the number of workers dynamically by combining Ansible fact gathering with the arithmetic capabilities of Jinja2 templates.

# {{ ansible_managed }}

user www-data;
worker_processes {{ ansible_processor_vcpus * 2 }};
pid /var/run/nginx.pid;

events {
        worker_connections  768;
}

http {
        ##
        # Basic Settings
        ##
        sendfile off;
        # sendfile disabled because of virtualbox bug
        # https://www.vagrantup.com/docs/synced-folders/virtualbox.html
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;
        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ##
        # Logging Settings
        ##
        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        ##
        # Virtual Host Configs
        ##
        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;
}

The next template is responsible for configuring the website. It expands the variables previously configured for the webserver hostname, port and website root directory location.

# {{ ansible_managed }}

server {
        server_name {{ ansible_hostname }};

        listen {{ website_port }};
        root {{ website_root }};
        index index.html index.htm index.nginx-debian.html;

        location / {
                try_files $uri $uri/ =404;
        }
}

The last template used in this role is for the index.html page. It shows (static) system information using Ansible facts:

<html>
<p>webserver <b>{{ ansible_hostname }}</b>:</p>
<p style="text-indent: 5em;">system: {{ ansible_lsb.description }} running kernel {{ ansible_kernel }}</p>
<p style="text-indent: 5em;">CPU: {{ ansible_processor_vcpus }} vCPUs at {{ ansible_processor[1] }}</p>
<p style="text-indent: 5em;">RAM: {{ ansible_memtotal_mb }} MiB</p>
<p style="text-indent: 5em;">disk: {{ (ansible_mounts[0].size_total/1024/1024/1024)|int }} GiB</p>
<p style="text-indent: 5em;">eth0: {{ ansible_eth0.ipv4.address }}</p>
<p style="text-indent: 5em;">eth1: {{ ansible_eth1.ipv4.address }}</p>
</html>

To deploy the website:

# Pick the Vagrantfile for this example
ln -f -s example_nginx.Vagrantfile Vagrantfile
# Start the VM instance
vagrant up

# Refresh SSH fingerprints for the 192.168.22.5x range on the host, otherwise
# Ansible would fail during provisioning with the message:
# "SSH encountered an unknown error during the connection. ..."
ansible-playbook -i ../refresh_ssh_public_keys/localhost.ini ../refresh_ssh_public_keys/main.yml

# Perform the provisioning
ansible-playbook -i example_nginx.ini  example_nginx.yml

# Access the website:
http://localhost:8080/

Example 3: HAProxy load balancer

Before we start, I just want to point out that the example_haproxy.(ini|yml|Vagrantfile) set was created as a draft for this example, so it will not be covered here. However the following Vagrantfile snippet, which is part of it, is still worth mentioning since it automates the whole provisioning process (i.e., there is no need to run ansible-playbook after vagrant up) while also showing the use of advanced Ansible settings in Vagrant.

  # provisioning using Ansible
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "example_haproxy.yml"
    ansible.inventory_path = "example_haproxy.ini"
    # when using an inventory file, the path to the private key must also be specified
    # either as an argument or in the inventory file itself (which it is)
    #ansible.raw_arguments  = [
    #  "--private-key=./.vagrant/machines/default/virtualbox/private_key"
    #]
  end

Now for the load balancer example.
In this example we deploy 2 nginx webservers and 1 HAProxy reverse proxy for load balancing.

Let us start by running it first:

# Pick the Vagrantfile for this example
ln -f -s example_load_balanced_website.Vagrantfile Vagrantfile
# Start the VM instances for the webservers and load balancer
vagrant up

# Refresh SSH fingerprints for the 192.168.22.5x range on the host, otherwise
# Ansible would fail during provisioning with the message:
# "SSH encountered an unknown error during the connection. ..."
ansible-playbook -i ../refresh_ssh_public_keys/localhost.ini ../refresh_ssh_public_keys/main.yml

# Perform the provisioning
ansible-playbook -i example_load_balanced_website.ini example_load_balanced_website.yml

# Check HAProxy stats
http://localhost:8080/haproxy?stats

# Access the website:
http://localhost:8080/

And if you refresh the page you’ll see that it gets served by a different webserver.

This Vagrantfile is responsible for providing the 3 machines:

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure(2) do |config|

  # Choose a box with VBox guest tools already installed
  #config.vm.box = "debian/jessie64"
  config.vm.box = "ubuntu/wily64"

  # Set up hostname
  config.vm.hostname = "ansible-nginx"

  # Message shown on vagrant up
  config.vm.post_up_message = "After provisioning check the website at http://localhost:8080/"

  # Share an additional folder with the guest VM.
  host_folder = ENV['HOME'] + "/home/downloads/share_vagrant"
  guest_folder = "/shared/"
  config.vm.synced_folder host_folder, guest_folder

  # Fine tune the virtualbox VM
  config.vm.provider "virtualbox" do |vb|
    vb.customize [
      "modifyvm", :id,
      "--cpus", "2",
      "--cpuexecutioncap", "50",
      "--memory", "512",
    ]
  end

  # fix annoyance, http://foo-o-rama.com/vagrant--stdin-is-not-a-tty--fix.html
  config.vm.provision "fix-no-tty", type: "shell" do |s|
    s.privileged = false
    s.inline = "sudo sed -i '/tty/!s/mesg n/tty -s \\&\\& mesg n/' /root/.profile"
  end
  # fix annoyance, http://serverfault.com/questions/500764/dpkg-reconfigure-unable-to-re-open-stdin-no-file-or-directory
  config.vm.provision "shell", inline: "echo 'export DEBIAN_FRONTEND=noninteractive' >> /root/.profile"
  config.vm.provision "shell", inline: "for user in /home/*; do echo 'export DEBIAN_FRONTEND=noninteractive' >> $user/.profile; done"


  #####################################
  # multi-machine environment specific
  #####################################

  # web servers
  (1..2).each do |i|
    config.vm.define "web#{i}" do |web|
      web.vm.hostname = "web#{i}"
      # Assign a static IP to the guest
      web.vm.network :private_network, ip: "192.168.22.5#{i}"
      # Create a forwarded port mapping
      web.vm.network "forwarded_port", guest: 80, host: "808#{i}"
      # web server specific provisioning
      web.vm.provision :shell, inline: "echo 'Web Server #{web.vm.hostname} reporting for duty.'"
    end
  end

  # lb server
  config.vm.define "lb" do |lb|
    lb.vm.hostname = "lb"
    # Assign a static IP to the guest
    lb.vm.network :private_network, ip: "192.168.22.50"
    # Create a forwarded port mapping
    lb.vm.network "forwarded_port", guest: 80, host: "8080"
    # override default settings
    lb.vm.provider "virtualbox" do |vb|
      vb.memory = "256"
    end
    # lb server specific provisioning
    lb.vm.provision :shell, inline: "echo 'Load Balancer #{lb.vm.hostname} ready to distribute workload.'"
  end

end

After vagrant up finishes it’s possible to SSH into the different machines using vagrant ssh $machine_name, as well as to run the playbook.
The main playbook example_load_balanced_website.yml has two plays. The first deploys the webservers and the second the load balancer.

---
- name: deploy webservers
  hosts: webservers
  become: yes
  become_method: sudo
  roles:
    - common
    - nginx

- name: deploy loadbalancer
  hosts: lbservers
  become: yes
  become_method: sudo
  roles:
    - common
    - haproxy

The inventory file example_load_balanced_website.ini defines the lbservers and webservers host groups:

[lbservers]
lb ansible_ssh_host=192.168.22.50 ansible_ssh_port=22 ansible_ssh_user='vagrant' ansible_ssh_private_key_file='./.vagrant/machines/lb/virtualbox/private_key'

[webservers]
web1 ansible_ssh_host=192.168.22.51 ansible_ssh_port=22 ansible_ssh_user='vagrant' ansible_ssh_private_key_file='./.vagrant/machines/web1/virtualbox/private_key'
web2 ansible_ssh_host=192.168.22.52 ansible_ssh_port=22 ansible_ssh_user='vagrant' ansible_ssh_private_key_file='./.vagrant/machines/web2/virtualbox/private_key'

And the config file ansible.cfg enables persistent caching using redis so that we can use the hostvars magic variable when configuring load balancing:

gathering = smart
fact_caching = redis
fact_caching_timeout = 86400

The other variables used are defined in group_vars/all, group_vars/webservers, and group_vars/lbservers:

# group_vars/all
---
website_port: 80

# group_vars/webservers
---
website_root: /var/www/mysite
# website_port is declared in the all group_vars since it's also used by the LB

# group_vars/lbservers
---
backend_name: backend_lbservers
daemon_name: proxy_daemon
balance: roundrobin
lb_listen_port: 80

The haproxy main playbook installs the package and notifies the handlers to restart the needed services:

---
# Install latest version of haproxy package.
# Cache is not updated here since that is done in the common role.
- name: install latest haproxy
  apt: name=haproxy state=latest update_cache=no
  notify: restart haproxy

# Enable haproxy to start at boot.
- name: enable haproxy
  service: name=haproxy enabled=yes

# Configure haproxy settings.
- name: configure haproxy settings
  template: src=haproxy.cfg.j2 dest=/etc/haproxy/haproxy.cfg
  # we need to restart rsyslog to enable haproxy logging to /var/log/haproxy.log
  # https://serverfault.com/questions/645924/haproxy-logging-to-syslog/751631#751631
  notify:
    - restart rsyslog
    - restart haproxy

The playbook for the haproxy handlers has no surprises (that is if you read the comments in the previous playbook):

---
- name: restart haproxy
  service: name=haproxy state=restarted

- name: restart rsyslog
  service: name=rsyslog state=restarted

Finally in the template for haproxy.cfg we use Ansible facts to configure the load balancing:

# {{ ansible_managed }}

global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout connect 5000
        timeout client  50000
        timeout server  50000
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

        # enable stats
        stats enable
        stats uri /haproxy?stats


backend {{ backend_name }}
    # set mode to HTTP
    mode http
    # set balancing algorithm for distributing requests
    balance     {{ balance }}
    # get list of machines in the lbservers group
    {% for host in groups['lbservers'] %}
        # each LB will listen for connections on lb_listen_port on all interfaces
        listen {{ daemon_name }} *:{{ lb_listen_port }}
    {% endfor %}
    # get list of machines in the webservers group
    {% for host in groups['webservers'] %}
        # requests will be forwarded to the webservers' WAN network interface (eth1) on port website_port
        # the check option enables health checks by HAProxy
        server {{ host }} {{ hostvars[host]['ansible_eth1'].ipv4.address }}:{{ website_port }} check
    {% endfor %}

Debugging and tips

  • To check if the playbook is valid execute ansible-playbook --syntax-check playbook.yml.

  • To have the playbook perform a dry run (i.e., without really changing anything) execute ansible-playbook --check playbook.yml.

  • To get the stdout and stderr of each task executed in the playbook use the -v flag.

  • To print statements and check variable values during playbook execution use the debug module. Examples:

---
- hosts: all 
  # Debug examples
  tasks:
    # print list of ipv4 addresses when the machine has a gateway defined
    - debug: msg="System {{ inventory_hostname }} has the following IPv4 addresses {{ ansible_all_ipv4_addresses }}"
      when: ansible_default_ipv4.gateway is defined

    # execute command and save result (including stdout and stderr) to a variable
    - shell: /usr/bin/uptime
      register: result
    # print variable
    - debug: var=result
  • To enable logging set the log_path in your ansible.cfg file.

  • To list the tasks that would be executed by an ansible-playbook command add the --list-tasks option.

  • To list the hosts that would be affected by an ansible-playbook command add the --list-hosts option. Especially useful when using the --limit option to limit to execution on a group of hosts.

  • To check if your nodes are reachable execute ansible all -m ping (install the sshpass package on your host system and add the --ask-pass option if you didn’t propagate an SSH keypair).

  • When dealing with large playbooks it may be useful to change the execution entry point or choose which tasks to execute. By using the --tags/--skip-tags options when executing a playbook it’s possible to filter which tasks do or don’t get executed. With the --start-at-task option it’s possible to choose the starting point in the playbook. A --step option is also provided to allow executing a playbook in interactive mode.
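
A few illustrative invocations of these options (the playbook, tag and task names are placeholders):

# run only the tasks tagged 'config', skipping the ones tagged 'packages'
ansible-playbook site.yml --tags config --skip-tags packages
# resume a long playbook at a given task
ansible-playbook site.yml --start-at-task "configure nginx settings"
# restrict execution to the webservers group and confirm each task interactively
ansible-playbook site.yml --limit webservers --step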

Troubleshooting

  • failure executing playbook
    ERROR: ansible "Failed to lock apt for exclusive operation"
    PROBLEM: the playbook or task needs to be executed as root. Note that sudo: yes has been deprecated and replaced by the become and become_method directives.
    SOLUTION:

    become: yes
    become_method: sudo
    
  • ruby gems not installed for all users
    PROBLEM: the ruby gems are installed for the user running the playbook
    SOLUTION: to have the gems installed system-wide (under /var/lib) instead, make sure you use the gem module’s user_install=no option, and vagrant destroy to start over. See the sketch after the references below.
    REFERENCES:
    http://docs.ansible.com/ansible/gem_module.html
    http://stackoverflow.com/questions/22115936/install-bundler-gem-using-ansible
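
    A task using that option could look like this (a sketch; the gem name is illustrative):

    - name: install bundler system-wide
      gem: name=bundler state=present user_install=no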

  • unable to connect to machine
    ERROR: SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue
    TROUBLESHOOTING: attempt to manually connect to the instance; if the problem is related to WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED then try the solution below
    SOLUTION: assuming you understand the cause of the problem then try again after executing the following:

    # select the instance with problems
    node="192.168.22.50"
    # remove the old key
    ssh-keygen -R $node
    # add the new key
    ssh-keyscan -H $node >> $HOME/.ssh/known_hosts
    

    TROUBLESHOOTING: if that doesn’t solve it then try using the -vvvv option when manually connecting with ssh to see if you can determine the root cause

  • unable to run ansible-inv-ec2 --list (or ec2.py --list)
    ERROR: ERROR: "Forbidden", while: getting RDS instances or ERROR: "Forbidden", while: getting ElastiCache clusters
    PROBLEM: the AWS credentials you’re using do not have access to AWS RDS and/or ElastiCache
    SOLUTION: edit your Ansible ec2.ini to have rds = False and/or elasticache = False

  • Ansible fails to read facts from hostvars
    ERROR: "AnsibleUndefinedVariable: 'dict object' has no attribute 'ansible_eth1'"
    PROBLEM: Vagrant runs provisioning for each machine independently which means that each machine is unaware of the other ones, much less of their facts.
    SOLUTION: enable fact_caching using redis. Check for the instructions in this tutorial.
    REFERENCES:
    http://blog.wjlr.org.uk/2014/12/30/multi-machine-vagrant-ansible-gotcha.html
    https://stackoverflow.com/questions/32544830/ansible-not-seeing-ansible-eth1-device

  • nginx fails to start
    ERROR: Job for nginx.service failed because the control process exited with error code. See "systemctl status nginx.service" and "journalctl -xe" for details.
    PROBLEM: it could be that there is a duplicate directive in your nginx.conf file.
    SOLUTION: remove any duplicate options in the nginx.conf file.

  • HAProxy doesn’t log
    ERROR: actually it does log. But it logs to /var/log/syslog
    SOLUTION: if not already present then make sure /etc/rsyslog.d/*haproxy.conf has this line: if $programname startswith 'haproxy' then /var/log/haproxy.log. After that run service rsyslog restart.
    REFERENCES:
    https://serverfault.com/questions/645924/haproxy-logging-to-syslog/751631#751631

Footnotes