Friday, July 09, 2010

Working with Chef cookbooks and roles

Welcome to the second installment of my Chef saga (you can read the first one here). This time I will walk you through creating your own cookbook, modifying an existing cookbook, creating a role and adding a client machine to that role. As usual, I got much help from the good people on the #chef IRC channel, especially the omnipresent @kallistec. All these tasks are also documented in one form or another on the Chef wiki, but I found it hard to put them all together, hence this blog post.

Downloading the Opscode Chef cookbooks

Step 0 for this task is to actually clone the Opscode repository. I created a directory called /srv/chef/repos on my Chef server box and ran this command inside it:

# git clone git://github.com/opscode/chef-repo.git

This will create /srv/chef/repos/chef-repo with a bunch of sub-directories, one of them being cookbooks.
I deleted the cookbooks directory and cloned the Opscode cookbooks in its place:

# cd /srv/chef/repos/chef-repo
# git clone git://github.com/opscode/cookbooks



Uploading the cookbooks to the Chef server

Just downloading the cookbooks somewhere on the file system is not enough. The Chef server needs to be made aware of their existence. You do that with the following knife command (which I ran on the Chef server box):

# knife cookbook upload -a -o /srv/chef/repos/chef-repo/cookbooks

BTW, if you specified a non-default configuration file location for knife when you configured it (I specified /etc/chef/knife.rb for example) then you need to make a symlink from that file to ~/.chef/knife.rb, otherwise knife will complain about not finding a config file. At least it complained to me in its strange Swedish accent.
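
In other words, something along these lines (using my /etc/chef/knife.rb location; adjust the path if yours differs):

# mkdir -p ~/.chef
# ln -s /etc/chef/knife.rb ~/.chef/knife.rb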

To go back to the knife command above: it says to upload all (-a) cookbooks it finds under the directory specified with -o.

Modifying an existing cookbook

If you look closely under the Opscode cookbooks, there's one called python. A cookbook contains one or more recipes, which reside under COOKBOOK_NAME/recipes. Most cookbooks have only one recipe which is a file called default.rb. In the case of the python cookbook, this recipe ensures that certain python packages such as python-dev, python-imaging, etc. get installed on the node running Chef client. To add more packages, simply edit default.rb (there's a certain weirdness in modifying a Ruby file to make a node install more Python packages....) and add your packages of choice.

Again, modifying a cookbook recipe on the file system is not enough; you need to let the Chef server know about the modification, and you do it by uploading the modified cookbook to the Chef server via knife:

# knife cookbook upload python -o /srv/chef/repos/chef-repo/cookbooks

Note the modified version of the 'knife cookbook upload' command. In this case, we don't specify '-a' for all cookbooks, but instead we specify a cookbook name (python). However, the directory remains the same. Do not make the mistake of specifying /srv/chef/repos/chef-repo/cookbooks/python as the target of your -o parameter, because it will not work. Trust me, I tried it until I was enlightened by @kallistec on the #chef IRC channel.

Creating your own cookbook

It's time to bite the bullet and create your own cookbook. Chef makes it easy to create all the files needed inside a cookbook. Run this command when in the top-level chef repository directory (/srv/chef/repos/chef-repo in my case):

# rake new_cookbook COOKBOOK=octopus

(like many other people, I felt the urge to cook Paul the Octopus when he accurately predicted Germany's loss to Spain)

This will create a directory called octopus under chef-repo/cookbooks and populate it with many other directories and files. At a minimum, you need to modify only octopus/recipes/default.rb. Let's assume you want to install some packages. You can take some inspiration from other recipes (build-essential for example), but it boils down to something like this:

include_recipe "build-essential"
include_recipe "ntp"
include_recipe "python"
include_recipe "screen"
include_recipe "git"

%w{chkconfig libssl-dev syslog-ng munin-node}.each do |pkg|
  package pkg do
    action :install
  end
end

Note that I'm also including other recipes in my default.rb file. They will be executed on the Chef client node, BUT they also need to be specified in the metadata file cookbooks/octopus/metadata.rb as dependencies, so that the Chef client node knows it needs to download them before running them. Trust me on this one, I speak again from bitter experience. Here is what my metadata.rb file looks like:

maintainer        "My Organization"
maintainer_email  "admin@example.com"
license           "Apache 2.0"
description       "Installs required packages for My Organization applications"
version           "0.1"
recipe            "octopus", "Installs required packages for My Organization applications"
depends           "build-essential"
depends           "ntp"
depends           "python"
depends           "screen"
depends           "git"

%w{ fedora redhat centos ubuntu debian }.each do |os|
  supports os
end

Now it's time to upload our brand new cookbook to the Chef server. As before, we'll use the knife utility:

# knife cookbook upload octopus -o /srv/chef/repos/chef-repo/cookbooks

If you have any syntax errors in the recipe file or the metadata file, knife will let you know about them. To verify that the cookbook was uploaded successfully to the server, run this command, which should list your cookbook along the others that you uploaded previously:

# knife cookbook list

Creating a role and associating a chef client machine with it

Chef supports the notion of roles, which is very important for automated configuration management because you can assign a client node to one or more roles (such as 'web' or 'db' for example) and have it execute recipes associated with those roles.

To add a role, simply create a file under chef-repo/roles. I called mine base.rb, with the contents:

name "base"
description "Base role (installs common packages)"
run_list("recipe[octopus]")

It's pretty self-explanatory. Clients associated with the 'base' role will run the 'octopus' recipe.

As with cookbooks, we need to upload the newly created role to the Chef server. The following knife command will do it (assuming you're in the chef-repo directory):

# knife role from file roles/base.rb

To verify that the role has been uploaded, you can run:

# knife role show base

and it should show a JSON output similar to:

{
    "name": "base",
    "default_attributes": {
    },
    "json_class": "Chef::Role",
    "run_list": [
      "recipe[octopus]"
    ],
    "description": "Base role (installs common packages)",
    "chef_type": "role",
    "override_attributes": {
    }
}

Now it's time to associate a Chef client with the new role. You can see which clients are already registered with your Chef server by running:

# knife client list

Now pick a client and see which roles it is associated with:

# knife node show client01.example.com -r
{
  "run_list": [
  ]
}

The run_list is empty for this client. Let's associate it with the role we created, called 'base':

# knife node run_list add client01.example.com "role[base]"

You should now see the 'base' role in the run_list:


# knife node show client01.example.com -r
{
  "run_list": [
    "role[base]"
  ]
}

The next time client01.example.com runs chef-client, it will figure out that it is part of the 'base' role, and it will follow the role's run_list by downloading and running the 'octopus' recipe from the Chef server.

In the next installment, I will talk about how to automatically bootstrap a Chef client in an EC2 environment. The goal there is to launch a new EC2 instance, have it install Chef client, have it assign a role to itself, then have it talk to the Chef server, download and run the recipes associated with the role.

Tuesday, July 06, 2010

Chef installation and minimal configuration

I started to play with Chef the other day. The instructions on the wiki are a bit confusing, but help on twitter (thanks @jtimberman) and on the #chef IRC channel (thanks @kallistec) has been great. I am at the very minimal stage of having a chef client talking to a chef server. I hasten to write down what I've done so far, both for my own sake and for others who might want to do the same. My OS is Ubuntu 10.04 32-bit on both machines.

First of all: as the chef wiki says, make sure you have FQDNs correctly set up on both client and server, and that they can ping each other at a minimum using the FQDN. I added the FQDN to the local IP address line in /etc/hosts, so that 'hostname -f' returned the FQDN correctly. In what follows, my Chef server machine is called chef.example.com and my Chef client machine is called client.example.com.
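
For example (the IP address below is made up), the relevant line in /etc/hosts on the server looks something like this, and 'hostname -f' should then return the FQDN:

# grep chef /etc/hosts
10.0.1.10   chef.example.com   chef
# hostname -f
chef.example.com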

Installing the Chef server


Here I went the Ruby Gems route, because the very latest Chef (0.9.4) had not been captured in the Ubuntu packages yet when I tried to install it.

a) install pre-requisites

# apt-get install ruby ruby1.8-dev libopenssl-ruby1.8 rdoc ri irb build-essential wget ssl-cert

b) install Ruby Gems

# wget http://production.cf.rubygems.org/rubygems/rubygems-1.3.7.tgz
# tar xvfz rubygems-1.3.7.tgz
# cd rubygems-1.3.7
# ruby setup.rb
# ln -sfv /usr/bin/gem1.8 /usr/bin/gem

c) install the Chef gem

# gem install chef

d) install the Chef server by bootstrapping with the chef-solo utility

d1) create /etc/chef/solo.rb with contents:


file_cache_path "/tmp/chef-solo"
cookbook_path "/tmp/chef-solo/cookbooks"
recipe_url "http://s3.amazonaws.com/chef-solo/bootstrap-latest.tar.gz"

d2) create /etc/chef/chef.json with contents:

{
  "bootstrap": {
    "chef": {
      "url_type": "http",
      "init_style": "runit",
      "path": "/srv/chef",
      "serve_path": "/srv/chef",
      "server_fqdn": "chef.example.com",
      "webui_enabled": true
    }
  },
  "run_list": [ "recipe[bootstrap::server]" ]
}

d3) run chef-solo to bootstrap the Chef server install:

# chef-solo -c /etc/chef/solo.rb -j /etc/chef/chef.json

e) create an initial admin client with the Knife utility, to interact with the API

# knife configure -i
Where should I put the config file? [~/.chef/knife.rb]
Please enter the chef server URL: [http://localhost:4000] http://chef.example.com
Please enter a clientname for the new client: [root]
Please enter the existing admin clientname: [chef-webui]
Please enter the location of the existing admin client's private key: [/etc/chef/webui.pem]
Please enter the validation clientname: [chef-validator]
Please enter the location of the validation key: [/etc/chef/validation.pem]
Please enter the path to a chef repository (or leave blank):

f) create an initial Chef repository

I created a directory called /srv/chef/repos , cd-ed to it and ran this command:

# git clone git://github.com/opscode/chef-repo.git

At this point, you should have a functional Chef server, although it won't help you much unless you configure some clients.

Installing a Chef client

Here's the bare minimum I did to get a Chef client to just talk to the Chef server configured above, without actually performing any cookbook recipe yet (I leave that for another post).

The first steps are very similar to the ones I followed when I installed the Chef server.


a) install pre-requisites

# apt-get install ruby ruby1.8-dev libopenssl-ruby1.8 rdoc ri irb build-essential wget ssl-cert

b) install Ruby Gems

# wget http://production.cf.rubygems.org/rubygems/rubygems-1.3.7.tgz
# tar xvfz rubygems-1.3.7.tgz
# cd rubygems-1.3.7
# ruby setup.rb
# ln -sfv /usr/bin/gem1.8 /usr/bin/gem

c) install the Chef gem

# gem install chef

d) install the Chef client by bootstrapping with the chef-solo utility

d1) create /etc/chef/solo.rb with contents:

file_cache_path "/tmp/chef-solo"
cookbook_path "/tmp/chef-solo/cookbooks"
recipe_url "http://s3.amazonaws.com/chef-solo/bootstrap-latest.tar.gz"

d2) create /etc/chef/chef.json with contents:

{
  "bootstrap": {
    "chef": {
      "url_type": "http",
      "init_style": "runit",
      "path": "/srv/chef",
      "serve_path": "/srv/chef",
      "server_fqdn": "chef.example.com",
      "webui_enabled": true
    }
  },
  "run_list": [ "recipe[bootstrap::client]" ]
}

Note that the only difference so far between the Chef server and the Chef client bootstrap files is the one directive at the end of chef.json, which is bootstrap::server for the server and bootstrap::client for the client.

Caveat for this step: if you mess up and bootstrap the client using the wrong chef.json file containing the bootstrap::server directive, you will end up with a server and not a client. I speak from experience -- I did exactly this, then when I tried to run chef-client on this box, I got:

WARN: HTTP Request Returned 401 Unauthorized: Failed to authenticate!

/usr/lib/ruby/1.8/net/http.rb:2097:in `error!': 401 "Unauthorized" (Net::HTTPServerException)

d3) run chef-solo to bootstrap the Chef client install:

# chef-solo -c /etc/chef/solo.rb -j /etc/chef/chef.json

At this point, you should have a file called client.rb in /etc/chef on your client machine, with contents similar to:

#
# Chef Client Config File
#
# Dynamically generated by Chef - local modifications will be replaced
#

log_level :info
log_location STDOUT
ssl_verify_mode :verify_none
chef_server_url "http://chef.example.com:4000"

validation_client_name "chef-validator"
validation_key "/etc/chef/validation.pem"
client_key "/etc/chef/client.pem"

file_cache_path "/srv/chef/cache"
pid_file "/var/run/chef/chef-client.pid"

Mixlib::Log::Formatter.show_time = false

e) validate the client against the server

e1) copy /etc/chef/validation.pem from the server to /etc/chef on the client
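
A quick way to do the copy is scp, assuming root ssh access from the client to the server (hostname and access method are from my setup; adjust as needed):

# scp root@chef.example.com:/etc/chef/validation.pem /etc/chef/validation.pem
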
e2) run chef-client on the client; for debug purposes you can use:

# chef-client -l debug

If everything goes well, you should see output similar to:

# chef-client
INFO: Starting Chef Run
INFO: Client key /etc/chef/client.pem is not present - registering
WARN: Node client.example.com has an empty run list.
INFO: Chef Run complete in 1.209376 seconds
INFO: Running report handlers
INFO: Report handlers complete

You should also have a file called client.pem containing a private key that the client will be using when talking to the server. At this point, you should remove validation.pem from /etc/chef on the client, as it is not needed any more.
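
In other words, on the client you can do a quick check and clean-up:

# ls -l /etc/chef/client.pem
# rm /etc/chef/validation.pem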

You can also run this command on the server to see if the client got registered with it:

# knife client list -c /etc/chef/knife.rb

The output should be something like:

[
"chef-validator",
"chef-webui",
"chef.example.com",
"root",
"client.example.com"
]

That's it for now. As I warned you, nothing exciting happened here except for having a Chef client that talks to a server but doesn't actually DO anything. Stay tuned for more installments in my continuing chef saga though...


Tuesday, June 15, 2010

Common nginx configuration options

Google reveals a wealth of tutorials and sample nginx config files, but in any case here are some configuration tips that have been helpful to me.

Include files

Don't be shy in splitting up your main nginx.conf file into several smaller files. Your co-workers will be grateful. A structure that has been working for me is to have one file where I define my upstream pools, one file where I define locations that point to upstream pools, and one file where I define servers that handle those locations.

Examples:

upstreams.conf

upstream cluster1 {
    fair;
    server app01:7060;
    server app01:7061;
    server app02:7060;
    server app02:7061;
}

upstream cluster2 {
    fair;
    server app01:7071;
    server app01:7072;
    server app02:7071;
    server app02:7072;
}

locations.conf


location / {
    root /var/www;
    include cache-control.conf;
    index index.html index.htm;
}

location /services/service1 {
    proxy_pass_header Server;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Scheme $scheme;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    add_header Pragma "no-cache";

    proxy_pass http://cluster1/;
}

location /services/service2 {
    proxy_pass_header Server;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Scheme $scheme;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    add_header Pragma "no-cache";

    proxy_pass http://cluster2/service2;
}

servers.conf

server {
    listen 80;
    include locations.conf;
}

At this point, your nginx.conf looks very clean and simple (you can still split it into more include files, by separating for example the gzip configuration options into their own file etc.)

nginx.conf

worker_processes 4;
worker_rlimit_nofile 10240;

events {
    worker_connections 10240;
    use epoll;
}

http {
    include upstreams.conf;

    include mime.types;
    default_type application/octet-stream;

    log_format custom '$remote_addr - $remote_user [$time_local] '
                      '"$request" $status $bytes_sent '
                      '"$http_referer" "$http_user_agent" "$http_x_forwarded_for" $request_time';

    access_log /usr/local/nginx/logs/access.log custom;

    proxy_buffering off;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    gzip on;
    gzip_min_length 10240;
    gzip_proxied expired no-cache no-store private auth;
    gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml application/xml+rss image/svg+xml application/x-font-ttf application/vnd.ms-fontobject;
    gzip_disable "MSIE [1-6]\.";

    # proxy cache config
    proxy_cache_path /mnt/nginx_cache levels=1:2
                     keys_zone=one:10m
                     inactive=7d max_size=10g;
    proxy_temp_path /var/tmp/nginx_temp;

    proxy_next_upstream error;

    include servers.conf;
}

This nginx.conf file is fairly vanilla in terms of the configuration options I used, but it's worth pointing some of them out.

Multiple worker processes

This is useful when you're running nginx on a multi-core box. Example:

worker_processes 4;

Increased number of file descriptors

This is useful for nginx instances that get hit by very high traffic. You want to increase the maximum number of file descriptors that nginx can use (the default on most Unix systems is 1024; run 'ulimit -n' to see the value on your system). Example:

worker_rlimit_nofile 10240;


Custom logging

See the log_format and access_log directives above. In particular, the "$http_x_forwarded_for" value is useful if nginx is behind another load balancer, and "$request_time" is useful to see the time taken by nginx when processing a request.

Compression

This is useful when you want to compress certain types of content sent back to the client. Examples:


gzip on;
gzip_min_length 10240;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain text/css text/xml text/javascript application/x-javascript application/xml application/xml+rss image/svg+xml application/x-font-ttf application/vnd.ms-fontobject;
gzip_disable "MSIE [1-6]\.";

Proxy options

These are options you can set per location. Examples:


proxy_pass_header Server;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Scheme $scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
add_header Pragma "no-cache";


Most of these options have to do with setting custom HTTP headers (in particular 'no-cache', in case you don't want to cache anything related to that particular location).

Proxy cache

Nginx can be used as a caching server. You need to define a proxy_cache_path and a proxy_temp_path under your http directive, then use them in the locations you want to cache.


proxy_cache_path /mnt/nginx_cache levels=1:2
keys_zone=one:10m
inactive=7d max_size=10g;
proxy_temp_path /var/tmp/nginx_temp;

In the location you want to cache, you would add something like this:

proxy_cache one;
proxy_cache_key mylocation.$request_uri;
proxy_cache_valid 200 302 304 10m;
proxy_cache_valid 301 1h;
proxy_cache_valid any 1m;
proxy_cache_use_stale error timeout invalid_header http_500 http_502 http_503 http_504 http_404;

HTTP caching options

Many times you want to cache certain types of content and not others. You can specify your caching rules in a file that you include in your root location:

location / {
    root /var/www;
    include cache-control.conf;
    index index.html index.htm;
}

You can specify different expire headers and cache options based on the request URI. Examples (inside cache-control.conf in my case):

# default cache 1 day
expires +1d;

if ($request_uri ~* "^/services/.*$") {
    expires +0d;
    add_header Pragma "no-cache";
}

if ($request_uri ~* "^/(index.html)?$") {
    expires +1h;
}

SSL

All you need to do here is to define another server in servers.conf, and have it include the locations you need (which can be the same ones handled by the server on port 80 for example):

server {
    server_name www.example.com;
    listen 443;
    ssl on;
    ssl_certificate /usr/local/nginx/ssl/cert.pem;
    ssl_certificate_key /usr/local/nginx/ssl/cert.key;

    include locations.conf;
}



syslog-ng tips and tricks

Although I've been contemplating using scribe for our logging needs, for now I'm using syslog-ng. It's been doing the job well so far. Here are a couple of configuration tips:

1) Sending log messages for a given log facility to a given log file

Let's say you want to send all haproxy log messages to a file called /var/log/haproxy.log. In haproxy.cfg you can say:

global
 log 127.0.0.1 local7 info

...which means -- log all messages to localhost, to log facility local7 and with a log level of info.

To direct these messages to a file called /var/log/haproxy.log, you need to define the following in /etc/syslog-ng/syslog-ng.conf:

i) a destination:

destination df_haproxy { file("/var/log/haproxy.log"); };

ii) a filter:

filter f_haproxy { facility(local7); };

iii) a log (which ties the destination to the filter):

log {
        source(s_all);
        filter(f_haproxy);
        destination(df_haproxy);
};

You also need to configure syslog-ng to accept log messages sent via UDP from localhost. Add this line to the s_all source element:

udp(ip(127.0.0.1) port(514));

Important note: since you're sending haproxy log messages to the local7 facility, this means that they'll also be captured by /var/log/syslog and /var/log/messages, since they are configured in syslog-ng.conf as destinations for the filters f_syslog and f_messages, which by default catch the local7 facility. As a result, you'll have triple logging of your haproxy messages. The solution? Add local7 to the list of facilities excluded from the f_syslog and f_messages filters.

2) Sending log messages to a remote log host

Assume you want to centralize log messages for a given service by sending them to a remote log host. Let's assume that the service logs via the local0 facility. The same procedure applies, with the creation of the following elements in syslog-ng.conf:

i) a destination


destination df_remote_log {
  udp("remote_loghost" port (5000));
};


ii) a filter:


filter f_myservice { facility(local0); };

iii) a log:

log {
        source(s_all);
        filter(f_myservice);
        destination(df_remote_log);
};

Note that you can also send messages for this particular filter (corresponding to local0) to a local file, by creating a destination pointing to that file and a log element tying the filter to that destination, like this:

destination df_local_log { file("/var/log/myservice.log"); };
log {
        source(s_all);
        filter(f_myservice);
        destination(df_local_log);
};

Finally, to finish the remote logging bit, you need to configure syslog-ng on the remote host to allow messages on UDP port 5000, and to log them to a local file. Here's my configuration on host "remote_loghost":

i) a new source allowing messages on port 5000:

source s_remote_logging {
    udp(ip(0.0.0.0) port(5000));
};

ii) a destination pointing to a local file:

destination df_common_log { file ("/var/log/myservice_common.log"); };

iii) a log combining the source and the destination above; I am using the predefined f_syslog filter here, because I don't need to select messages based on a given facility anymore:

log {
        source(s_remote_logging);
        filter(f_syslog);
        destination(df_common_log);
};




Thursday, June 03, 2010

Setting up PHP5/FastCGI with nginx

PHP is traditionally used with Apache, but can also be fronted by nginx. Here are some notes that I took while setting up PHP5/FastCGI behind nginx. My main source of inspiration was this howtoforge article. My OS flavor is Ubuntu 9.04 32-bit, but the same procedure applies to other flavors.

1) Install PHP5 and other required packages

# apt-get install php5 php5-cli php5-cgi php5-xcache php5-curl php5-sqlite libpcre3 libpcre3-dev libssl-dev

2) Configure xcache

I added the lines in the following gist to the end of /etc/php5/cgi/php.ini: http://gist.github.com/424172

3) Install spawn-fcgi, which used to be included in the lighttpd package, but now can be downloaded on its own from here.

# tar xvfz spawn-fcgi-1.6.3.tar.gz; cd spawn-fcgi-1.6.3; ./configure; make; make install

At this point you should have /usr/local/bin/spawn-fcgi installed. This wrapper needs to be launched at some point via a command line similar to this:

/usr/local/bin/spawn-fcgi -a 127.0.0.1 -p 9000 -u www-data -f /usr/bin/php5-cgi

This launches the php5-cgi process, which will listen on port 9000 for PHP requests.
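
To double-check that the FastCGI process is listening where nginx expects it, you can run something like:

# netstat -tlnp | grep :9000

and you should see php5-cgi bound to 127.0.0.1:9000.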

The original howtoforge article recommended writing an init.d wrapper for this command. I used to do this, but I noticed that the php5-cgi process dies quite frequently, so I needed something more robust. Hence...

4) Configure spawn-fcgi to be monitored by supervisor

I installed supervisor via:

# apt-get install python-setuptools
# easy_install supervisor
# echo_supervisord_conf > /etc/supervisord.conf

Then I added this section to /etc/supervisord.conf:

[program:php5-cgi]
command=/usr/local/bin/spawn-fcgi -n -a 127.0.0.1 -p 9000 -u www-data -f /usr/bin/php5-cgi
redirect_stderr=true          ; redirect proc stderr to stdout (default false)
stdout_logfile=/var/log/php5-cgi/php5-cgi.log
stdout_logfile_maxbytes=10MB   ; max # logfile bytes b4 rotation (default 50MB)

Note that I added the -n flag to the spawn-fcgi command line. This keeps the process it spawns (/usr/bin/php5-cgi) in the foreground, so that it can be daemonized and monitored properly by supervisor.

I use this init.d script to start/stop supervisord. Don't forget to 'chkconfig supervisord on' so that it starts at boot time.

Per the configuration above, I also created /var/log/php5-cgi and chown-ed it www-data:www-data. That directory will contain a file called php5-cgi.log which will capture both the stdout and stderr of the php5-cgi process.

When I started supervisord via 'service supervisord start', I was able to see /usr/bin/php5-cgi running. I killed that process, and yes, supervisor restarted it. May I just say that SUPERVISOR ROCKS!!!

5) Install and configure nginx

I prefer to install nginx from source.

# tar xvfz nginx-0.8.20.tar.gz; cd nginx-0.8.20; ./configure --with-http_stub_status_module; make; make install

To configure nginx to serve php files, add these 'location' sections to the server listening on port 80 in /usr/local/nginx/conf/nginx.conf:

      location / {
            root   /var/www/mydocroot;
            index  index.php index.html;
        }

    location ~ \.php$ {
        include /usr/local/nginx/conf/fastcgi_params;
        fastcgi_pass  127.0.0.1:9000;
        fastcgi_index index.php;
        fastcgi_param  SCRIPT_FILENAME  /var/www/mydocroot$fastcgi_script_name;
        }

(assuming of course your root document directory is /var/www/mydocroot)

Now nginx will forward all requests for *.php files to a process listening on port 9000 on localhost. That process is php5-cgi, as we know by now.

That's about it. Start up nginx and you should be in business.

Friday, May 21, 2010

Setting up a mail server in EC2 with postfix and Postini

Setting up a mail server in 'the cloud' is notoriously difficult, mainly because many cloud-based IP addresses are already blacklisted, and also because you don't have control over reverse DNS in many cases. One way to mitigate these factors is to use Postini for outbound email from your cloud-based mail server. I've experimented with this in a test environment, with fairly good results. Here are some notes I jotted down. I am indebted to Jeff Roberts for his help with the postfix setup.

Let's assume you have a domain called mydomain.com and you want to set up a mail server in EC2 (could be any cloud provider) so that you can both send and receive email. I will also assume that you'll use Postini -- which currently costs $11/year/user. In my setup, I am mostly interested in sending mail out and less in receiving mail, so I only have one user: info@mydomain.com.

EC2 setup

I deployed a c1.medium EC2 instance running Ubuntu 9.04 and I also gave it an Elastic IP. The host name of my mail server is mail01.

I then installed postfix via 'apt-get install postfix'.

Postfix setup

For starters, I configured postfix to receive mail for mydomain.com, and to restrict the set of IP addresses that are allowed to use it as a relay. The relevant lines in /etc/postfix/main.cf are:

myhostname = mail01
myorigin = /etc/mailname
mydestination = mydomain.com, mail01, localhost.localdomain, localhost
mynetworks = cidr:/etc/postfix/allowed_relays, 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128

The contents of /etc/postfix/allowed_relays are similar to:

10.1.2.3   tstapp01
10.5.6.7  stgapp01

These IPs/names are application servers that I configured to use mail01 as their outgoing mail server, so they need to be allowed to send mail out.

Postini setup

When you sign up for Postini, you can specify that the user you created is of admin type. When you then log in to https://login.postini.com/ as that user and specify that you want to go to the System Administration area, you'll be directed to a URL such as http://ac-s9.postini.com. Make a note of the number after the s, in this case 9. It will dictate which outbound Postini mail servers you will need to use, per these instructions. For s9 (which was my case), the CIDR range of the outbound Postini mail servers is 74.125.148.0/22.

Within the Postini System Administration interface, I had to edit the settings for Inbound Servers and Outbound Servers.

For Inbound Servers, I edited the Delivery Mgr tab and specified the Elastic IP of mail01 as the Email Server IP address.

For Outbound Servers, I clicked on 'Add Outbound Email Server' and specified a range whose beginning and end are both the Elastic IP of mail01. For Reinjection Hosts, I also specified the Elastic IP of mail01.

The Reinjection Host setup is necessary for using Postini as an outbound mail service. See the Postini Outbound Services Configuration Guide (PDF) for details on how to do this depending on the mail server software you use.

More Postfix setup

Now we are ready to configure postfix to use Postini as its own outbound relay. I just added one line to /etc/postfix/main.cf:

relayhost = outbounds9.obsmtp.com

(this will be different, depending on the number you get when logging into Postini; in my case the number is 9)

I also had to allow the CIDR range of the Postini mail servers to use mail01 as a relay (they need to reinject the messages into mail01, hence the need to specify mail01's external Elastic IP as the Reinjection Host IP above). I edited /etc/postfix/allowed_relays and added the line:

74.125.148.0/22 postini ac-s9
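
After editing main.cf and allowed_relays, reload postfix so the changes take effect (a quick sanity check first doesn't hurt):

# postfix check
# /etc/init.d/postfix reload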

DNS setup

To use Postini as an incoming mail server service, you need to configure mydomain.com to use Google Apps. Then you'll be able to specify the Postini mail servers that you'll use as MX records for mydomain.com.


DKIM setup

To further improve the chances that email you send out from the cloud will actually reach its intended recipients, it's a good idea to set up DKIM and SPF. For DKIM, see my blog post on "DKIM setup with postfix and OpenDKIM".

SPF setup

This is actually fairly easy to do. You basically need to add a DNS record which details the IP addresses of the mail servers which are allowed to send mail for mydomain.com. You can use an SPF setup wizard for this, and an SPF record testing tool to see if your setup is correct. In my case, I added a TXT record with the contents:

v=spf1 ip4:207.126.144.0/20 ip4:64.18.0.0/20 ip4:74.125.148.0/22 ip4:A.B.C.D -all

where the first 3 subnets are the 3 possible CIDR ranges for Postini servers, and the last IP which I denoted with A.B.C.D is mail01's Elastic IP address.
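
Once the TXT record has propagated, a quick way to eyeball it (besides the SPF testing tool mentioned above) is:

# dig +short TXT mydomain.com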

At this point, you should be able to both send email out through mail01 (from the servers you allowed to do so) and then through Postini, and also receive email for the users you created within mydomain.com. If you also have DKIM and SPF set up, you're in pretty good shape as far as email deliverability goes. Still, sending email reliably is one of the hardest things to do on today's Internet, so good luck!

Deploying MongoDB with master-slave replication

This is a quick post so I don't forget how I set up replication between a MongoDB master server running at a data center and a MongoDB slave running in EC2. I won't go into how to use MongoDB (I'll have another one or more posts about that), I'll just talk about installing and setting up MongoDB with replication.

Setting up the master server

My master server runs Ubuntu 9.10 64-bit, but similar instructions apply to other flavors of Linux.

1) Download MongoDB binary tarball from the mongodb.org downloads page (the current production version is 1.4.2). The 64-bit version is recommended. Since MongoDB uses memory-mapped files, if you use the 32-bit version your database files will have a 2 GB size limit. So get the 64-bit version.

2) Unpack the tar.gz somewhere on your file system. I unpacked it in /usr/local on the master server and created a mongodb symlink:

# cd /usr/local
# tar xvfz mongodb-linux-x86_64-1.4.1.tar.gz
# ln -s mongodb-linux-x86_64-1.4.1 mongodb

3) Create a mongodb group and a mongodb user. Create a data directory on a partition with generous disk space amounts. Create a log directory. Set permissions on these 2 directories for user and group mongodb:

# groupadd mongodb
# useradd -g mongodb mongodb
# mkdir /data/mongodb
# mkdir /var/log/mongodb
# chown -R mongodb:mongodb /data/mongodb /var/log/mongodb

4) Create a startup script in /etc/init.d. I modified this script so it also handles logging the way I wanted. I posted my version of the script as this gist. In my case the script is called mongodb.master. I chkconfig-ed it on and started it:

# chkconfig mongodb.master on
# service mongodb.master start

5) Add the location of the MongoDB utilities to your PATH. In my case, I added this line to .bashrc:

export PATH=$PATH:/usr/local/mongodb/bin

That's it, you should be in business now. If you type

# mongo

you should be connected to your MongoDB instance. You should also see a log file created in /var/log/mongodb/mongodb.log.

Also check out the very good administration-related documentation on mongodb.org.

Setting up the slave server

The installation steps 1) through 3) are the same on the slave server as on the master server.

Now let's assume you already created a database on your master server. If your database name is mydatabase, you should see files named mydatabase.0 through mydatabase.N and mydatabase.ns in your data directory. You can shut down the master server (via "service mongodb.master stop"), then simply copy these files over to the data directory of the slave server. Then start up the master server again.
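
As a sketch of the copy step (the slave hostname is an assumption; adjust the paths to your data directory):

# service mongodb.master stop
# rsync -av /data/mongodb/mydatabase.* slave01:/data/mongodb/
# service mongodb.master start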

At this point, in step 4 you're going to use a slightly different startup script. The main difference is in the way you start the mongod process:

DAEMON_OPTS="--slave --source $MASTER_SERVER --dbpath $DBPATH --logpath $LOGFILE --logappend run"

where MASTER_SERVER is the name or the IP address of your MongoDB master server. Note that you need port 27017 open on the master, in case you're running it behind a firewall.
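
If the master is firewalled with iptables, a rule along these lines (the slave's IP address is an assumption) will let the slave connect:

# iptables -A INPUT -p tcp -s 10.1.1.20 --dport 27017 -j ACCEPT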

Here is a gist with my slave startup script (which I call mongodb.slave).

Now when you run 'service mongodb.slave start' you should have your MongoDB slave database up and running, and sync-ing from the master. Check out the log file if you run into any issues.

Log rotation

The mongod daemon has a nifty feature: if you send it a SIGUSR1 signal via 'kill -USR1 PID', it will rotate its log file. The current mongodb.log will be timestamped, and a brand new mongodb.log will be started.
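
For example, assuming a single mongod process on the box:

# kill -USR1 $(pidof mongod)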

Monitoring and graphing

I use Nagios for monitoring and Munin for resource graphing/visualization. I found a good Nagios plugin for MongoDB at the Tag1 Consulting git repository: check_mongo. The 10gen CTO Eliot Horowitz maintains the MongoDB munin plugin on GitHub: mongo-munin.

Here are the Nagios commands I use on the master to monitor connectivity, connection counts and long operations:

/usr/local/nagios/libexec/check_mongo -H localhost -A connect
/usr/local/nagios/libexec/check_mongo -H localhost -A count
/usr/local/nagios/libexec/check_mongo -H localhost -A long

On the slave I also monitor slave lag:

/usr/local/nagios/libexec/check_mongo -H localhost -A slavelag

The munin plugin is easy to install. Just copy the files from github into /usr/share/munin/plugins and create symlinks to them in /etc/munin/plugins, then restart munin-node.

mongo_btree -> /usr/share/munin/plugins/mongo_btree
mongo_conn -> /usr/share/munin/plugins/mongo_conn
mongo_lock -> /usr/share/munin/plugins/mongo_lock
mongo_mem -> /usr/share/munin/plugins/mongo_mem
mongo_ops -> /usr/share/munin/plugins/mongo_ops
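
For the record, the symlinks above plus the munin-node restart can be done in one go with something like:

# cd /etc/munin/plugins
# for p in btree conn lock mem ops; do ln -snf /usr/share/munin/plugins/mongo_$p mongo_$p; done
# service munin-node restart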

You should then see nice graphs for btree stats, current connections, write lock percentage, memory usage and mongod operations.

That's it for now. I'll talk more about how to actually use MongoDB with pymongo in a future post. But just to express an opinion, I think MongoDB rocks! It hits a very sweet spot between SQL and noSQL technologies. At Evite we currently use it for analytics and reporting, but I'm keeping a close eye on sharding becoming production-ready so that we can potentially use it for web-facing traffic too.

Sunday, May 09, 2010

Python Testing Tools Taxonomy on bitbucket

Michael Foord (aka Fuzzyman aka @voidspace) has kindly jumpstarted the task of porting the Python Testing Tools Taxonomy (aka PTTT) to reStructuredText, and to put it up on bitbucket for easier collaboration. The new bitbucket project is called taxonomy. Michael ported the unit testing section so far.

Porting the original trac format to reST is not the most fun way to spend one's time, so we need collaborators. If you're interested in contributing to this project, please fork and let us know so we can commit your patches. Thanks!

Thursday, May 06, 2010

RabbitMQ clustering in Ubuntu

I've been banging my head against various RabbitMQ-related walls in the last couple of days, so I wanted to quickly jot down how I got clustering to work on Ubuntu boxes, before I forget.

Scenario A
  • 2 Ubuntu 9.04 64-bit servers (app01 and app02)
  • 1 Ubuntu 9.10 64-bit server (app03)
  • all servers are part of an internal DNS zone
My initial mistake here was to install the rabbitmq-server package via "apt-get install". On 9.04, I got version 1.5.4 of rabbitmq-server, while on 9.10 I got 1.6. As it turns out, the database versions (RabbitMQ uses the Mnesia database) are incompatible between these 2 versions when it comes to clustering.

When I tried to join the server running Ubuntu 9.10 to the cluster, it complained with:

error: {schema_integrity_check_failed,
           {aborted,{no_exists,rabbit_user,version}}}

So....I removed rabbitmq-server via "apt-get remove" on all 3 servers. I also removed /var/lib/rabbitmq (which contains the database directory mnesia) and /var/log/rabbitmq (which contains the logs).

I then downloaded the rabbitmq-server_1.7.2-1_all.deb package and installed it, but only after also installing erlang. So:

# apt-get install erlang
# dpkg -i rabbitmq-server_1.7.2-1_all.deb

The install process automatically starts up the rabbitmq server. You can see its status by running:

# rabbitmqctl status
Status of node 'rabbit@app02' ...
[{running_applications,[{rabbit,"RabbitMQ","1.7.2"},
                        {mnesia,"MNESIA  CXC 138 12","4.4.7"},
                        {os_mon,"CPO  CXC 138 46","2.1.8"},
                        {sasl,"SASL  CXC 138 11","2.1.5.4"},
                        {stdlib,"ERTS  CXC 138 10","1.15.5"},
                        {kernel,"ERTS  CXC 138 10","2.12.5"}]},
 {nodes,['rabbit@app02']},
 {running_nodes,['rabbit@app02']}]
...done.

Very important step: make sure the 3 servers have the same .erlang.cookie file. By default, every new installation creates a file containing a unique cookie in /var/lib/rabbitmq/.erlang.cookie. For clustering, the cookie needs to be the same, otherwise the servers are not allowed to access the shared cluster state on each other.

Here's what I did:
  • I stopped rabbitmq-server on app02 and app03 via '/etc/init.d/rabbitmq-server stop'
  • I copied /var/lib/rabbitmq/.erlang.cookie from app01 to app02 and app03
  • I removed /var/lib/rabbitmq/mnesia on app02 and app03 (you need to do this when you change the cookie)
  • finally I restarted rabbitmq-server on app02 and app03 via '/etc/init.d/rabbitmq-server start'
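
Expressed as commands, the steps above look roughly like this on app02 (and again on app03), assuming root ssh access between the boxes; you may also need to restore ownership and permissions on the copied cookie:

# /etc/init.d/rabbitmq-server stop
# scp app01:/var/lib/rabbitmq/.erlang.cookie /var/lib/rabbitmq/.erlang.cookie
# chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
# chmod 400 /var/lib/rabbitmq/.erlang.cookie
# rm -rf /var/lib/rabbitmq/mnesia
# /etc/init.d/rabbitmq-server start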

Now comes the actual clustering setup part. In my case, I wanted all nodes to be disk nodes as opposed to RAM nodes in terms of clustering, for durability purposes.

Here's what I did on app02, following the RabbitMQ clustering doc:

# rabbitmqctl stop_app
Stopping node 'rabbit@app02' ...
...done.

# rabbitmqctl reset
Resetting node 'rabbit@app02' ...
...done.

# rabbitmqctl cluster rabbit@app01 rabbit@app02
Clustering node 'rabbit@app02' with ['rabbit@app01',
                                         'rabbit@app02'] ...
...done.

# rabbitmqctl start_app
Starting node 'rabbit@app02' ...
...done.

# rabbitmqctl status
Status of node 'rabbit@app02' ...
[{running_applications,[{rabbit,"RabbitMQ","1.7.2"},
                        {mnesia,"MNESIA  CXC 138 12","4.4.7"},
                        {os_mon,"CPO  CXC 138 46","2.1.8"},
                        {sasl,"SASL  CXC 138 11","2.1.5.4"},
                        {stdlib,"ERTS  CXC 138 10","1.15.5"},
                        {kernel,"ERTS  CXC 138 10","2.12.5"}]},
 {nodes,['rabbit@app02','rabbit@app01']},
 {running_nodes,['rabbit@app01','rabbit@app02']}]

You need to make sure at this point that you can see both nodes app01 and app02 listed in the nodes list and also in the running_nodes list. If you forgot the cookie step, the steps above will still work, but you'll only see app02 listed under nodes and running_nodes. Only if you inspect the log file /var/log/rabbitmq/rabbit.log on app01 will you see errors such as

=ERROR REPORT==== 6-May-2010::11:12:03 ===
** Connection attempt from disallowed node rabbit@app02 **

Assuming you see both nodes app01 and app02 in the nodes list, you can proceed with the same steps on app03. At this point, you should see something similar to this if you run 'rabbitmqctl status' on each node:

# rabbitmqctl status
Status of node 'rabbit@app03' ...
[{running_applications,[{rabbit,"RabbitMQ","1.7.2"},
                        {mnesia,"MNESIA CXC 138 12","4.4.10"},
                        {os_mon,"CPO CXC 138 46","2.2.2"},
                        {sasl,"SASL CXC 138 11","2.1.6"},
                        {stdlib,"ERTS CXC 138 10","1.16.2"},
                        {kernel,"ERTS CXC 138 10","2.13.2"}]},
 {nodes,['rabbit@app03','rabbit@app02','rabbit@app01']},
 {running_nodes,['rabbit@app01','rabbit@app02','rabbit@app03']}]
...done.

Now you can shut down and restart 2 of these 3 nodes, and the cluster state will be preserved. I am still experimenting with what happens if all 3 nodes go down. I tried it and I got a timeout when trying to start up rabbitmq-server again on one of the nodes. I could only get it to start by removing the database directory /var/lib/rabbitmq/mnesia -- but this kinda defeats the purpose of persisting the state of the cluster on disk.

Now for the 2nd scenario, when the servers you want to cluster are not running DNS (for example if they're EC2 instances).

Scenario B
  • 2 Ubuntu 9.04 servers running as EC2 instances (ec2app01 and ec2app02)
  • no internal DNS
Let me cut to the chase here and say that the proper solution to this scenario is to have an internal DNS zone set up. The fact is that rabbitmq-server nodes communicate amongst themselves using DNS, so if you choose not to use DNS, and instead use entries in /etc/hosts or even IP addresses, you need to resort to ugly hacks involving editing the rabbitmq-server, rabbitmq-multi and rabbitmqctl scripts in /usr/lib/rabbitmq/bin and specifying -name instead of -sname everywhere you find that option in those scripts. And even then things might not work, and things will break when you upgrade rabbitmq-server. See this discussion and this blog post for more details on this issue. In a nutshell, the RabbitMQ clustering mechanism expects that the FQDN of each node can be resolved via DNS on all other nodes in the cluster.

So...my solution was to deploy an internal DNS server running bind9 in EC2 -- which is something that sooner or later you need to do if you run more than a handful of EC2 instances.


Future work

This post is strictly about RabbitMQ clustering setup. There are many other facets of deploying a highly-available RabbitMQ system.

For example, I am also experimenting with putting an HAProxy load balancer in front of my RabbitMQ servers. I want all clients to communicate with a single entry point in my RabbitMQ cluster, and I want that entry point to load balance across all servers in the cluster. I'll report on that setup later.

Another point that I need to figure out is how to deal with failures at the individual node level, given that RabbitMQ queues are not distributed across all nodes in a cluster, but instead they stay on the node where they were initially created. See this discussion for more details on this issue.

If you have deployed a RabbitMQ cluster and worked through some of these issues already, I'd love to hear about your experiences, so please leave a comment.

Sunday, May 02, 2010

Getting wireless NIC to work on Thinkpad T410s with Ubuntu Lucid

I tweeted this before, but a blog post is less ephemeral and serves better as a longer-term memory for myself and maybe for others.

The Thinkpad T410s uses the Realtek rtl8192se wireless chip which doesn't play well with Ubuntu Lucid, even with the general release from last week. To get it to work, I had to:


1) Download Linux driver for rtl8192se from this Realtek page.
2) Untar the package, run make and 'make install' as root.
3) Reboot.


Unfortunately, you'll need to do this every time you upgrade the kernel, unless support for this chipset gets better.

Friday, April 23, 2010

Deploying and monitoring mysql-proxy with supervisor

I've known about supervisor ever since Chris McDonough's PyCon 2008 talk. I've wanted to use it before, especially in conjunction with mysql-proxy, but my efforts to run it had not been met with success. That is, until I realized the simple but important mistake I was making: I was trying to run mysql-proxy in its own --daemon mode, and I was expecting supervisor to control the mysql-proxy process at the same time. Well, this is not how supervisor works.

Supervisor handles processes that run in the foreground, and it daemonizes them for you. It knows how to capture stdout and stderr for the process it controls and how to save those streams either combined or separately to their own log files.

Of course, what supervisor excels at is actually monitoring that the process is running -- it can detect that the process died or was killed, and it will restart that process within seconds. This sounds like the perfect scenario for the deployment of mysql-proxy. Here's what I did to get that to work:

1) Installed supervisor via easy_install.
# easy_install supervisor
I also created a configuration file for supervisord by running:
# echo_supervisord_conf > /etc/supervisord.conf
2) Downloaded the binary package for mysql-proxy (version 0.8 at the time of this post). I un-tar-ed the package in /opt/mysql-proxy-0.8.0-linux-glibc2.3-x86-64bit and I made a symlink to it called /opt/mysql-proxy.

3) I wanted mysql-proxy to send traffic to 2 MySQL servers that I had configured in master-master mode. I also wanted one MySQL server to get all traffic, and the 2nd server to be hit only if the first one went down. In short, I wanted active-passive failover mode for my 2 MySQL servers. I found this lua script written by Sheeri Cabral for doing the failover, but I had to modify it slightly -- at least in mysql-proxy 0.8, the proxy.backends variable is now called proxy.global.backends. The rest of the script was fine. I created a directory called /opt/mysql-proxy/scripts and saved my version of the script in a file called failover.lua in that directory.

4) I verified that I could correctly run mysql-proxy in foreground mode by issuing this command (where 10.1.1.1 and 10.1.1.2 are the IP addresses of my active and passive MySQL servers respectively):
/opt/mysql-proxy/bin/mysql-proxy \
--admin-address 127.0.0.1:43306 \
--proxy-address 127.0.0.1:33306 \
--proxy-backend-addresses=10.1.1.1:3306 \
--proxy-backend-addresses=10.1.1.2:3306 \
--proxy-lua-script=/opt/mysql-proxy/scripts/failover.lua

Note that I am specifying the local ports where mysql-proxy should listen for admin traffic (port 43306) and for regular MySQL proxying traffic (port 33306).
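
A quick sanity check is to point a mysql client at the proxy port (the user name and password are placeholders) and run a trivial query:

# mysql -h 127.0.0.1 -P 33306 -u someuser -p -e 'select 1'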

5) I created a [program] section in the /etc/supervisord.conf configuration file that looks like this:

[program:mysql-proxy-3306]
command=/opt/mysql-proxy/bin/mysql-proxy \
--admin-address 127.0.0.1:43306 \
--proxy-address 127.0.0.1:33306 \
--proxy-backend-addresses=stgdb101:3306 \
--proxy-backend-addresses=stgdb03:3306 \
--proxy-lua-script=/opt/mysql-proxy/scripts/failover.lua
redirect_stderr=true
stdout_logfile=/var/log/%(program_name)s.log
user=mysql

Here's a short explanation of each line in this section:
i) The program name can be anything you want. It is what gets displayed in the program list when you run supervisorctl.
ii) The command is the exact command you would use to run the process in the foreground.
iii) I do want to redirect stderr to stdout
iv) I do want to capture stdout to a file in /var/log named "program_name".log -- so in my case it's named mysql-proxy-3306.log
v) I want to run the mysql-proxy process as user mysql (if no user is specified, it will run as root)

I made two other modifications to /etc/supervisord.conf in the [supervisord] section:
logfile=/var/log/supervisord.log
pidfile=/var/run/supervisord.pid

I also found an init.d startup script example for supervisord, which I modified a bit. My version is here. To start up supervisord, I use '/etc/init.d/supervisord start' or 'service supervisord start'. To make supervisord re-read its configuration file, I use 'service supervisord reload'. I also ran 'chkconfig supervisord on' to have the init.d script start upon reboot.

6) After starting up supervisord, I could see the mysql-proxy process running. Any output that mysql-proxy might have is captured in /var/log/mysql-proxy-3306.log.
Also, by running supervisorctl, I could see something like this:

mysql-proxy-3306 RUNNING pid 1860, uptime 23:10:45

To test out the monitoring and restarting capabilities of supervisord, I did a hard kill -9 of the mysql-proxy process. In a matter of 2 seconds, the process was restarted by supervisord. Here are the lines related to this in /var/log/supervisord.log:

2010-04-22 02:04:59,161 INFO exited: mysql-proxy-3309 (terminated by SIGKILL; not expected)
2010-04-22 02:05:00,177 INFO spawned: 'mysql-proxy-3309' with pid 26137
2010-04-22 02:05:01,325 INFO success: mysql-proxy-3309 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

As Borat would say...I LIKE!!!

My intention is to apply this methodology to other processes that I was daemonizing via grizzled.os for example. The advantages of using supervisord are many, but of particular importance to me is that it will restart the process on errors, so I don't need to worry about doing that on my own.

That's about it for now. I will post again about how mysql-proxy behaves in failover mode in production conditions. For now I have it in staging.

Thursday, April 22, 2010

Installing the memcached Munin plugin

Munin doesn't ship with a memcached plugin. Almost all links that point to that plugin on the Munin wiki are broken. I finally found it here. Here's what I did to get it to work:
  • Download raw text of the plugin. I saved it as "memcached_". Replace @@PERL@@ variable at the top of the file with the path to perl on your system. Make memcached_ executable.
On the munin node that you want to get memcached stats for (ideally you'd add these steps to your automated deployment scripts; I added them in a fabric file called fab_munin.py):
  • Copy memcached_ in /usr/share/munin/plugins (or equivalent for your system)
  • Make 3 symlinks in /etc/munin/plugins to the memcached_ file, but name each symlink according to the metric that will be displayed in the munin graphs:
    • ln -snf /usr/share/munin/plugins/memcached_ memcached_bytes
    • ln -snf /usr/share/munin/plugins/memcached_ memcached_counters
    • ln -snf /usr/share/munin/plugins/memcached_ memcached_rates
  • Install the Perl Cache::Memcached module (you can do it via the cpan cmdline utility)
  • Restart the munin-node service
At this point you should see 3 graphs for memcached on the munin dashboard:
  • "Network traffic" (from the 'bytes' symlink)
  • "Current values" such as bytes allocated/current connections/current items (from the 'counters' symlink)
  • "Commands" such as cache misses/cache hits/GETs/SETs (from the 'rates' symlink).

Wednesday, April 07, 2010

"Load Balancing in the Cloud" whitepaper by Rightscale

Rightscale recently published a whitepaper titled "Load Balancing in the Cloud: Tools, Tips and Techniques" (you can get it from here, however registration is required). I found out about it from this post on the Rightscale blog which is a good introduction to the paper. Here are some things I found interesting in the paper:
  • they tested HAProxy, Zeus Technologies Load Balancer, aiCache's Web Accelerator and Amazon's Elastic Load Balancer (ELB), all deployed in Amazon EC2
  • the connection rate (requests/second) was the metric under test, because it turns out it's usually the limiting factor when deploying a LB in the cloud
  • it was nice to see HAProxy as pretty much the accepted software-based Open Source solution for load balancing in the cloud (the other LBs tested are not free/open source)
  • the load testing methodology was very sound, and it's worth studying; Brian Adler, the author of the whitepaper, clearly knows what he's doing, and he made sure he eliminated potential bottlenecks along the paths from load-generating clients to the Web servers behind the LB
  • ab and httperf were used to generate load
  • all the non-ELB solutions resulted in 5,000 requests/sec on average (Zeus was slightly better than HAProxy, 6.5K vs. 5.2K req/sec)
  • ELB was shown to be practically 'infinitely' scalable (for some value of infinity); however, the elasticity of ELB is gradual, so if you experience a sudden traffic spike, ELB might catch up too slowly for your needs
  • for slow and steady traffic increase though ELB seems to be ideal
  • it turns out that 100,000 packets/second total throughput (inbound + outbound) is a hard limit in EC2 and it is due to virtualization
  • so...if your Web traffic exceeds this limit, you need to deploy multiple load balancers (or use ELB of course); when I was at OpenX, we did just that by deploying multiple HAProxy instances and using DNS round-robin to load balance traffic across our load balancers
Overall, if you're interested in deploying high traffic Web sites in the cloud, I think it's worth your time to register and read the whitepaper.

Thursday, March 25, 2010

DKIM setup with postfix and OpenDKIM

If sending email is a critical part of your online presence, then it pays to look at ways to enhance the probability that messages you send will find their way into your recipients' inboxes, as opposed to their spam folders. This is fairly hard to achieve and there are no silver bullet guarantees, but there are some things you can do to try to enhance the reputation of your mail servers.

One thing you can do is use DKIM, which stands for DomainKeys Identified Mail. DKIM is the result of merging Yahoo's DomainKeys with Cisco's Identified Internet Mail. There's a great Wikipedia article on DKIM which I strongly recommend you read.

DKIM is a method for email authentication -- in a nutshell, mail senders use DKIM to digitally sign messages they send with a private key. They also publish the corresponding public key as a DNS record. On the receiving side, mail servers use the public key to verify the digital signature. So by using DKIM, you as a mail sender prove to your mail recipients that you are who you say you are.

Note that DKIM doesn't prevent spam. Spammers can also use DKIM and sign their messages. However, DKIM achieves an important goal: it prevents spammers from spoofing the source of their emails and impersonating users in other mail domains by forging the 'From:' header. If spammers want to use DKIM, they are thus forced to use their real domain name in the 'From:' header, which makes it easier for receiving mail servers to reject their email. See also 'Three myths about DKIM' by John R. Levine.

As an email sender, using DKIM increases the chances that your mail servers will be whitelisted and that your mail domain will be considered 'reputable'.

Enough theory, let's see how you can set up DKIM with postfix. I'm going to use OpenDKIM and postfix on an Ubuntu 9.04 server.

1) Install prerequisites

# apt-get install libssl-dev libmilter-dev

2) Download and install OpenDKIM

# wget http://downloads.sourceforge.net/project/opendkim/opendkim-2.0.1.tar.gz
# tar xvfz opendkim-2.0.1.tar.gz
# cd opendkim-2.0.1
# ./configure
# make
# make install

You will also need to add /usr/local/lib to the paths inspected by ldconfig, otherwise opendkim will not find its required shared libraries when starting up. I created a file called /etc/ld.so.conf.d/opendkim.conf containing just one line:

/usr/local/lib

Then I ran:

# ldconfig

3) Add a 'dkim' user and group

# useradd dkim

4) Use the opendkim-genkey.sh script to generate a private key (in PEM format) and a corresponding public key wrapped in a TXT record that you will publish in your DNS zone. The opendkim-genkey.sh script takes as arguments your DNS domain name (-d) and a so-called selector (-s), which identifies this particular private key/DNS record combination. You can choose any selector name you want; in my example I chose MAILOUT.

# cd ~/opendkim-2.0.1/opendkim
# ./opendkim-genkey.sh -d mydomain.com -s MAILOUT
# mkdir -p /var/db/dkim
# cp MAILOUT.private /var/db/dkim/MAILOUT.key.pem

The opendkim-genkey.sh script generates a private key called MAILOUT.private, which I'm copying to /var/db/dkim/MAILOUT.key.pem. It also generates a file called MAILOUT.txt which contains the TXT record that you need to add to your DNS zone:

# cat MAILOUT.txt
MAILOUT._domainkey IN TXT "v=DKIM1; g=*; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADGTeQKBgQDATldYm37cGn6W1TS1Ukqy2ata8w2mw+JagOC+GNEi02gvDyDtfTT0ivczPl45V1SqyQhdwF3dPnhkUXgxjjzBvU6+5uTFU4kzrKz0Ew7vywsCMcfKUjYhtVTAbKAy+5Vf3CNmOlFlJnnh/fdQHVsHqqNjQII/13bVtcfYPGOXJwIDAQAB" ; ----- DKIM MAILOUT for mydomain.com

5) Configure DNS

Add the generated TXT record to the DNS zone for mydomain.com.
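
To sanity-check that the record is actually visible once your zone change has propagated, you can query it directly. Here is a minimal sketch using the dnspython library (my own choice for this example -- dig or nslookup will tell you the same thing):

# quick check that the DKIM selector record resolves
# (requires the dnspython package; domain and selector are the ones from step 4)
import dns.resolver

answers = dns.resolver.query('MAILOUT._domainkey.mydomain.com', 'TXT')
for rdata in answers:
    print(rdata)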

6) Configure opendkim

There is a sample file called opendkim.conf.sample located in the root directory of the opendkim source distribution. I copied it as /etc/opendkim.conf, then I set the following variables (this article by Eland Systems was very helpful):

Canonicalization relaxed/simple
Domain mydomain.com
InternalHosts /var/db/dkim/internal_hosts
KeyFile /var/db/dkim/MAILOUT.key.pem
Selector MAILOUT
Socket inet:9999@localhost
Syslog Yes
UserID dkim
X-Header yes

Note the InternalHosts setting. It points to a file listing IP addresses that belong to servers which use your mail server as a mail relay. They could be for example your application servers that send email via your mail server. You need to list their IP addresses in that file, otherwise mail sent by them will NOT be signed by your mail server running opendkim.
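
For illustration, internal_hosts is just a plain text file with one address per line; a hypothetical example listing localhost plus two application servers would look like this:

127.0.0.1
10.0.10.11
10.0.10.12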

7) Start up opendkim


# /usr/local/sbin/opendkim -x /etc/opendkim.conf

At this point, opendkim is running and listening on the port you specified in the config file -- in my case 9999.

8) Configure postfix

You need to configure postfix to use opendkim as an SMTP milter.

Note that the postfix syntax in the opendkim documentation is wrong. I googled around and found this blog post which solved the issue.

Edit /etc/postfix/main.cf and add the following line:

smtpd_milters = inet:localhost:9999

(the port needs to be the same as the one opendkim is listening on)
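
Note that smtpd_milters only covers mail received over SMTP. Depending on your setup, you may also want the analogous non_smtpd_milters setting so that mail submitted locally on the mail server itself (via the sendmail/pickup path) gets signed too -- I'm mentioning it as a hint, not as something this particular setup required:

non_smtpd_milters = inet:localhost:9999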

Reload the postfix configuration:

# service postfix reload

9) Troubleshooting

At first, I didn't know about the InternalHosts trick, so I was baffled when email sent from my application servers wasn't being signed by opendkim. To troubleshoot, make sure you set

LogWhy yes

in opendkim.conf and then inspect /var/log/mail.log and google for any warnings you find (remember to always RTFL!!!).

When you're done troubleshooting, set LogWhy back to 'no' so that you don't log excessively.

10) Verifying your DKIM setup

At this point, try to send email to some of your email accounts such as gmail, yahoo, etc. When you get the email there, show all headers and make sure you see the DKIM-Signature and X-DKIM headers. Here's an example from an email I received in my GMail account (you need to click the 'Reply' drop-down and choose 'Show original' to see all the headers):


DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=mydomain.com;
s=MAILOUT; t=1269364019;
bh=5UokauIClLiW/0JovG5US3xUr2LcjGTnutud9Ke1V2E=;
h=MIME-Version:Content-Type:Content-Transfer-Encoding:Message-ID:
Subject:From:Reply-to:To:Date;
b=X7cgUltyjJpqN7xavEHPrLSTnEqyj7fsuS/4HDrs3YfZg2d8K9EdvqJ5gwdrmkVEL
M27ZDugI0yaK5C+cdOZ2Fyj8nG83nLwcnMz7X3EqCkHP0CPlv9FCXtjispxLJ2W3xc
AUQnp4vVChYb/TEGmQV+Ilzzf9a9WDUMhWlaljjM=
X-DKIM: OpenDKIM Filter v2.0.1 mymailserver 4B4B864925


That's about it. There are quite a few moving parts here, so I hope somebody will find this useful.

Monday, March 15, 2010

Automated deployments with Fabric - tips and tricks

Fabric has become my automated deployment tool of choice. You can find tons of good blog posts on it if you google it. I touched upon it in 2 blog posts, 'Automated deployments with Puppet and Fabric' and 'Automated deployment systems: push vs. pull'. Here are some more tips and tricks extracted from my recent work with Fabric.

If it's not in a Fabric fabfile, it's not deployable

This is a rule I'm trying to stick to. All deployments need to be automated. No ad-hoc, one-off deployments allowed. If you do allow them, they can quickly snowball into an unmaintainable, unreproducible mess.

Update (via njl): Only deploy files that are checked out of source control

So true. I should have included this when I wrote the post. Make sure your fabfiles, as well as all the files you deploy remotely, are indeed what you expect them to be. It's best to keep them in a source control repository and to be disciplined about checking them in every time you update them.

Along the same lines: when deploying our Tornado-based Web services here at Evite, we make sure our continuous integration system (we use Hudson) builds the eggs and saves them in a central location, and we install those eggs with Fabric from the central location.

Common Fabric operations

I find myself using only a small number of the Fabric functions. I mainly use these three operations:

1) copy files to a remote server (with the 'put' function)
2) run a command as a regular user on a remote server (with the 'run' function)
3) run a command as root on a remote server (with the 'sudo' function)

I also sometimes use the 'sed' function, which allows me, for example, to comment or uncomment lines in an Nginx configuration file so I can take servers in and out of the load balancer (for commenting lines out, you can also use the 'comment' function); a short sketch of all of these follows.
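
To make these concrete, here is a minimal sketch exercising the operations above (the file names and the nginx 'server' line are made up for the example; note that in the Fabric versions I've used, sed and comment live in fabric.contrib.files rather than fabric.api):

from fabric.api import put, run, sudo
from fabric.contrib.files import comment, sed

def example():
    put('myapp/myapp.conf', '~')                  # 1) copy a file to the remote server
    run('ls -l ~/myapp.conf')                     # 2) run a command as a regular user
    sudo('mv ~/myapp.conf /etc/myapp.conf')       # 3) run a command as root
    # take a backend out of the load balancer by commenting out its 'server' line
    comment('/usr/local/nginx/conf/nginx.conf', r'server 10\.0\.0\.5:8080;',
            use_sudo=True)
    # or edit a file in place with sed, e.g. flip a config flag
    sed('/etc/myapp.conf', 'debug = true', 'debug = false', use_sudo=True)
    sudo('service nginx reload')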

Complete example

Here is a complete example of installing munin-node on a server:
# cat fab_munin.py

from fabric.api import *

# Globals

env.project='MUNIN'
env.user = 'root'

# Environments
from environments import *

# Tasks

def install():
    install_prereqs()
    copy_munin_files()
    start()

def install_prereqs():
    sudo("apt-get -y install munin-node chkconfig")
    sudo("chkconfig munin-node on")

def copy_munin_files():
    run('mkdir -p munin')
    put('munin/munin-node.conf', 'munin')
    put('munin/munin-node', 'munin')
    sudo('cp munin/munin-node.conf /etc/munin/')
    sudo('cp munin/munin-node /etc/munin/plugin-conf.d')

def start():
    sudo('service munin-node start')

def stop():
    sudo('service munin-node stop')

def restart():
    sudo('service munin-node restart')



The main function in this fabfile is 'install'. Inside it, I call 3 other functions which can also be called on their own: 'install_prereqs' installs the munin-node package via apt-get, 'copy_munin_files' copies various configuration files to the remote server under ~/munin and then copies them via sudo to /etc/munin, and finally 'start' starts up the munin-node service. I find that this is a fairly common pattern for my deployments: install base packages, copy over configuration files, start or restart the service.

Note that I'm importing a module called 'environments'. That's where I define and name lists of hosts that I want to target during the deployment. For example, your environments.py file could contain the following:

from fabric.api import *

def stgapp():
    """App staging environment."""
    env.hosts = [
        'stgapp01',
        'stgapp02',
    ]

def stgdb():
    """DB staging environment."""
    env.hosts = [
        'stgdb01',
        'stgdb02',
    ]

def prdapp():
    """App production environment."""
    env.hosts = [
        'prdapp01',
        'prdapp02',
    ]

def prddb():
    """DB production environment."""
    env.hosts = [
        'prddb01',
        'prddb02',
    ]

To install munin-node on all the staging app servers, you would run:

fab -f fab_munin stgapp install

If you wanted to install munin-node on a server which is not part of any of the env.hosts lists defined in environments.py, you would just call:

fab -f fab_munin install

...and the fab utility will prompt you for a host name or IP on which to run the 'install' function. Simple and powerful.

Dealing with configurations for different environment types

A common issue I've seen is that different environment types (test, staging, production) need different configurations, both for services such as nginx or apache and for my own applications. One solution I found is to prefix my environment names with their type (tst for testing, stg for staging and prd for production), and to add a matching suffix to the configuration files: for example nginx.conf.tst for the testing environment, nginx.conf.stg for staging and nginx.conf.prd for production.

Then, in my fabfiles, I automatically send the correct configuration file over, based on the environment I specify on the command line. For example, I have this function in my fab_nginx.py fabfile:

def deploy_config():
    host_prefix = env.host[:3]
    env.config_file = 'nginx.conf.%s' % host_prefix
    put('nginx/%(config_file)s' % env, '~')
    sudo('mv ~/%(config_file)s /usr/local/nginx/conf/nginx.conf' % env)
    with settings(warn_only=True):
        sudo('service nginx restart')

When I run:
fab -f fab_nginx tstapp deploy_config
...this will deploy the configuration on the 2 hosts defined inside the tstapp list -- tstapp01 and tstapp02. The deploy_config function will capture 'tst' as the first 3 characters of the current host name, and it will then operate on the nginx.conf.tst file, sending it to the remote server as /usr/local/nginx/conf/nginx.conf.

I find that it is a good practice to not have local files called nginx.conf, because the risk of overwriting the wrong file in the wrong environment is increased. Instead, I keep 3 files around -- nginx.conf.tst, nginx.conf.stg and nginx.conf.prd -- and I copy them to remote servers accordingly.

Defining short functions inside a fabfile

My fabfiles are composed of short functions that can be called inside a longer function, or on their own. This gives me the flexibility to only deploy configuration files for example, or only install base packages, or only restart a service.

Automated deployments ensure repeatability

I may repeat myself here, but this point is worth rehashing: in my view, the best feature of an automated deployment system is that it transforms your deployments from an ad-hoc, off-the-cuff procedure into a repeatable and maintainable process that both development and operations teams can use (#devops anybody?). An added benefit is that you get good documentation for free for installing your application and its requirements: just copy and paste your fabfiles (or Puppet manifests) into a wiki page and there you have it.
