Linux


18
Aug 11

Auto rsync local changes to remote server

When developing software that’s intended to run on a server, I like to edit the code directly on my laptop, but don’t want to run the development server locally.

Instead, I have a dev setup on a remote server, and copy the local changes as needed, using rsync to copy only the changed bits.

Running rsync by hand is tedious, error-prone and easy to forget, so I automated it with a shell script that runs it whenever something in the directory changes.

The Script

The script is pretty simple. It uses inotifywait (part of inotify tools on Linux) to detect changes in the current directory and runs a command when that happens. To avoid running multiple commands on a related sequence of events, the script waits a second to see if there are more events coming.

The script itself doesn’t assume which command needs to be run, so it’s useful beyond just triggering rsync. You can find the script on GitHub.
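The following is not the script from GitHub, just a minimal sketch of the same idea, assuming bash and inotifywait are installed. The function names debounce_run and onchange are made up for this illustration.

```shell
#!/bin/bash
# Sketch only: not the author's onchange.sh. Function names are hypothetical.

# Read event lines from stdin. After the first event of a burst, keep
# draining until no further event arrives for one second, then run the
# given command once.
debounce_run() {
    local _event
    while read -r _event; do
        # swallow the rest of the burst (read gives up after 1 second)
        while read -r -t 1 _event; do :; done
        # stdin is the event pipe, so keep the command away from it
        "$@" </dev/null
    done
}

# Watch the current directory recursively and run the command on changes.
onchange() {
    inotifywait -m -r -q -e modify,create,delete,move . | debounce_run "$@"
}
```

With these definitions, `onchange rsync -avt --delete . server:/path/` should behave roughly as described above. Redirecting the command's stdin from /dev/null matters for rsync over ssh, which could otherwise swallow pending events from the pipe.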

Example usage

When working on a project, I start the watch script with something like:

    onchange.sh rsync -avt --delete . server:/path/

Speeding up rsync+ssh

Since I’m running rsync frequently, I want it to be as quick as possible (even though it’s happening in the background). Because the changes are very small, most of the time is spent establishing the SSH connection to the remote server.

To minimise this, I’m using SSH connection multiplexing. I just ssh into the server, even if I don’t need to do anything there. This keeps the connection open. Subsequent rsyncs over ssh just reuse it.
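One way to try this without editing ~/.ssh/config is to pass the multiplexing options on the command line. This is a sketch assuming OpenSSH; the socket path and the MUX_SSH variable name are arbitrary choices for the example, not taken from my actual setup.

```shell
#!/bin/bash
# Sketch: OpenSSH connection sharing. The socket path and the variable
# name are example choices.
#   ControlMaster=auto  reuse an existing master connection, or become one
#   ControlPath         where the shared socket lives (%r user, %h host, %p port)
MUX_SSH="ssh -o ControlMaster=auto -o ControlPath=$HOME/.ssh/mux-%r@%h:%p"

# Open the master connection once and leave it running in a terminal:
#   $MUX_SSH server
# Later rsync runs reuse it instead of negotiating a new connection:
#   rsync -e "$MUX_SSH" -avt --delete . server:/path/
```

The same two options can be put under a Host entry in ~/.ssh/config, in which case plain ssh and rsync pick them up automatically.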

Portability

Unfortunately, as this uses inotify tools directly, it’s not very portable to non-Linux systems. It’s easy to write the equivalent tool in, say, python using watchdog.

That’s left as an exercise to the reader ;-)

Update: there’s also lsyncd, a full-blown tool for live syncing local with remote folders.


18
Jun 11

A Django setup using Nginx and Gunicorn

This is a howto on setting up Django on a Linux (Ubuntu) system using Nginx as a reverse proxy and Gunicorn as a Django service.

Django, Gunicorn, Nginx

The conventional way to run Django in production these days is using Apache2 and mod_wsgi. While there’s nothing wrong with that approach, I prefer Nginx. I also like to be able to control Django server separately from the web server.

There are several production-ready servers for Django. The best seem to be Gunicorn and uWSGI, and Gunicorn seems the best supported and most active project.

When running the Django server separately from the web server, we need a way to start, stop and restart the Django server. A popular way of doing that in the Django world is Supervisor, although for Ubuntu users Upstart might be less hassle.

You probably already have a Django project you want to deploy, but for completeness’ sake, the steps here will use an empty toy “Hello World” Django project.

Preparation

First things first – you are using virtualenv, right? If not, you should.

  virtualenv --no-site-packages test
  cd test
  source bin/activate
  pip install gunicorn django
  django-admin.py startproject hello
  cd hello
  # to test the base setup works
  python manage.py runserver 0.0.0.0:8000

Gunicorn

Testing Django with Gunicorn is as simple as:

  gunicorn_django -b 0.0.0.0:8000

For production, we might want a few more options, and we want to make sure the server is executing in the correct environment. The easiest way is to create a shell script to set it all up:

  #!/bin/bash
  set -e
  LOGFILE=/var/log/gunicorn/hello.log
  LOGDIR=$(dirname $LOGFILE)
  NUM_WORKERS=3
  # user/group to run as
  USER=your_unix_user
  GROUP=your_unix_group
  cd /path/to/test/hello
  source ../bin/activate
  test -d $LOGDIR || mkdir -p $LOGDIR
  exec ../bin/gunicorn_django -w $NUM_WORKERS \
    --user=$USER --group=$GROUP --log-level=debug \
    --log-file=$LOGFILE 2>>$LOGFILE

NUM_WORKERS is the number of worker processes that will serve requests. You can set it as low as 1 if you’re on a small VPS. A popular formula is 1 + 2 * number_of_cpus on the machine (the logic being that half of the processes will be waiting for I/O, such as the database). YMMV.
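If you’d rather not hard-code the number, the formula can be computed when the script starts. A small sketch, assuming nproc from GNU coreutils is available; the function name is made up for the example.

```shell
#!/bin/bash
# Sketch: derive the worker count from the CPU count using the
# 1 + 2 * number_of_cpus rule of thumb. The function name is hypothetical.
workers_for_cpus() {
    local cpus=$1
    echo $(( 1 + 2 * cpus ))
}

# nproc prints the number of available processors; fall back to 1 if missing
NUM_WORKERS=$(workers_for_cpus "$(nproc 2>/dev/null || echo 1)")
echo "NUM_WORKERS=$NUM_WORKERS"
```

On a small VPS you may still want to cap the result by hand, since every worker costs memory.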

Don’t forget to mark the script as executable (chmod ug+x script.sh). You can run it from the command line for testing. Note that Gunicorn by default uses 127.0.0.1:8000 address (the same as Django debug server), which is fine if Nginx is on the same machine – you usually don’t want to have it wide open to anyone, and instead let Nginx handle incoming connections.

If you want to run several Django servers on the same machine, just make sure each uses a different port number.

Supervisor

Supervisor has extensive documentation, and this blog post is big already, so I’ll just point you to the official docs. The config file for running our server (/etc/supervisor/conf.d/hello.conf on Debian/Ubuntu) should look like this:

  [program:hello]
  directory = /path/to/test/hello/
  user = your_unix_user
  command = /path/to/test/hello/script.sh
  stdout_logfile = /path/to/logfile.log
  stderr_logfile = /path/to/logfile.log

Test it with supervisorctl {start,status,stop} hello (as root).

Upstart

An alternative on Ubuntu is Upstart, which has a similar config file (/etc/init/hello.conf). An example:

  description "Test Django instance"
  start on runlevel [2345]
  stop on runlevel [06]
  respawn
  respawn limit 10 5
  exec /path/to/test/hello/script.sh

Test it with service hello {start,status,stop} (as root).

Update 2011-11-14: To complete the Upstart setup, add a symlink named hello in /etc/init.d pointing to /lib/init/upstart-job. Run the following after creating the .conf file in /etc/init:

  sudo ln -s /lib/init/upstart-job /etc/init.d/hello

Update 2011-11-14: Christophe Meessen found and fixed several errors in the procedures and config files, and also provided info about the extra Upstart configuration I missed. Thanks Christophe!

Nginx

If you don’t have it set up, you should also install Nginx. The install procedure varies from system to system. On Debian and Ubuntu systems, it’s as simple as apt-get install nginx, and other Linux distributions usually have equivalent commands.

Nginx is mostly a drop-in replacement for Apache for serving static files, though there are some things to set up if you need to run PHP code as well.

For our setup, we need Nginx to serve as the reverse proxy for the upstream server(s). To do so, we add a server section to the config file:

server {
    listen   80;
    server_name example.com;
    # no security problem here, since / is always passed to upstream
    root /path/to/test/hello;
    # serve directly - analogous for static/staticfiles
    location /media/ {
        # if asset versioning is used
        if ($query_string) {
            expires max;
        }
    }
    location /admin/media/ {
        # this changes depending on your python version
        root /path/to/test/lib/python2.6/site-packages/django/contrib;
    }
    location / {
        proxy_pass_header Server;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_connect_timeout 10;
        proxy_read_timeout 10;
        proxy_pass http://localhost:8000/;
    }
    # what to serve if upstream is not available or crashes
    error_page 500 502 503 504 /media/50x.html;
}

Ubuntu and Debian systems keep Nginx config files in the same layout as for Apache, so the above could be added to /etc/nginx/sites-available/hello (and enabled by symlinking from the sites-enabled directory). Use nginx -t to test the config and nginx -s reload to reload the configuration.
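The enable-and-reload steps can be sketched as below. The function name and its optional second argument (the config root, defaulting to /etc/nginx) are conveniences invented for this example.

```shell
#!/bin/bash
# Sketch: enable a site the Debian/Ubuntu way. The function name and the
# optional second argument (config root) are made up for this example.
enable_site() {
    local name=$1 root=${2:-/etc/nginx}
    # -s symbolic link, -f replace an existing link, -n don't follow it
    ln -sfn "$root/sites-available/$name" "$root/sites-enabled/$name"
}

# Typical sequence, as root:
#   enable_site hello
#   nginx -t && nginx -s reload   # reload only if the config test passes
```

Chaining the reload after nginx -t means a typo in the new file won’t be loaded into the running server.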

That’s it

And that’s it. The services are really quite simple to set up once you know what goes where, the setup is flexible and performant, and the server environments are isolated so it’s possible to host many different services with varying requirements on the same machine.

Have improvements on the above or your own helpful tips, or found an error in the post? Share in the comments.


28
Feb 11

Fast WordPress/Nginx setup on a cheap VPS

I’ve got a single Linode 512 VPS server hosting a few of my WordPress powered sites (including this blog), a few Python web apps, a mail server and a few other fairly standard things. It’s not a high-traffic site so the server is not under much stress, but sometimes it does get a bit more visitors, so I was wondering what easy things I can do to ensure it doesn’t crash miserably if at any moment something on it actually gets popular.

Nginx instead of Apache

The first easy thing is to use Nginx instead of Apache for the web server. This may or may not be an easy thing to do – if you need (or prefer) to stay with Apache, Patrick McKenzie has some good advice on how to avoid Apache crashing and burning under load on a small VPS.

On the other hand, if all you’re using Apache for is mod_php (PHP) and mod_rewrite (clean URLs), it’s pretty easy to switch. It’s gotten a lot easier recently when PHP FastCGI Process Manager got included in the PHP mainline, finally giving PHP proper support for FastCGI. Since it’s a fairly new addition, depending on your operating system, you may have to compile php-fpm yourself, or already have it available in packages.

Nginx, being asynchronous (instead of using worker processes or threads, with one worker processing only one client at a time), can handle thousands of clients at the same time with little memory overhead, and is really efficient in serving static files. For PHP requests, it makes a FastCGI request “upstream” to a pool of php-fpm workers. To avoid having too many workers killing the server under load, as well as opening too many database connections (which is basically what happens with default Apache setup unless you tweak the worker and client limits), I only have 3 php worker processes (which serve all sites, not 3 per site).

Another advantage of Nginx over Apache workers is that it’s not susceptible to the slowloris attack. That is, it can handle a huge number of slow clients connected to it just fine. Also, KeepAlive is a non-issue here (see Patrick’s blog post about KeepAlive, why it’s useful and why it can bring your Apache server to its knees).

Cache aggressively

Only 3 PHP instances can’t serve that many requests. But, since the sites are fairly static, it’s easy to cache them. So I use the excellent Wp-Cache plugin, which caches the pages into static files and serves them whenever possible.

But why stop there? WP-Cache, although really fast, still needs to run, making the PHP workers busy. Nginx comes with support for caching FastCGI responses, meaning it can cache the pages too, and serve them even quicker. It does have a drawback though – it’s impractical (if not impossible) to invalidate the cache properly on updates, so the nginx cache will get stale.

But my sites are fairly static, and I can live with updates not being visible in less than a minute, so I have it cache the pages just for one minute. It’s a short interval, but it’s good enough because the request will probably hit the Wp-Cache anyways, so the overhead won’t be big.

UPDATE: I have since disabled the FastCGI cache due to it serving blank pages instead of the cached ones to some subset of visitors, so it’s off until I can see what happened. If you do use this caching, make sure you test it working properly from multiple IPs.

Nginx can also be used with memcached to have the pages in memory instead of disk, if you need really fast sites, but memory is the scarcest resource in a low-end VPS, so it’s better to be a bit slower than to start hitting swap because you’re trying to cache too many things in RAM.

Config files

To do this, I had to install and configure nginx and php-fpm, and install Wp-Cache plugin (which I used in default configuration). Relevant portion of my nginx config file:

# where to cache, how much disk to use
fastcgi_cache_path  /tmp/www-cache levels=1:2
        keys_zone=senkonetcache:8m max_size=30m;
server {
        # cache only 200 and 404 responses
        # caching 30x breaks WP login form
        fastcgi_cache_valid 200 1m;
        fastcgi_cache_valid 404 1m;
        location / {
                # clean urls; if there's no such file, rewrite it to point to
                # WP's index.php; unrelated nginx gotcha:
                # 'if' is declarative, see http://wiki.nginx.org/IfIsEvil
                if (!-e $request_filename) {
                        rewrite ^/(.*)$ /index.php?q=$1 last;
                }
                index index.php index.html;
        }
        # pass the *.php requests upstream
        location ~ \.php$ {
                # use the cache, Luke
                fastcgi_cache senkonetcache;
                fastcgi_pass 127.0.0.1:9000;
                fastcgi_index index.php;
                fastcgi_param SCRIPT_FILENAME
                        /var/www/senko.net/$fastcgi_script_name;
                include /etc/nginx/fastcgi_params;
                # if you want to be able to log in to WP, make
                # sure nginx doesn't eat all the cookies
                fastcgi_pass_header Set-Cookie;
        }
}

And my /etc/php5/fpm/pool.d/www.conf file (the default one is well documented, so read up on the options used here):

[www]
listen = 127.0.0.1:9000
listen.allowed_clients = 127.0.0.1
pm = dynamic
pm.max_children = 3
pm.start_servers = 1
pm.min_spare_servers = 1
pm.max_spare_servers = 3
pm.max_requests = 100

The numbers

I’ve used several tools (siege, httperf, ab, and my own custom tool) to try and test the performance. The tests were not very scientific, but all the tools more or less agree about the ballpark figures. I’ve recorded the numbers given by siege (with options -b -c 500 -t 2m, hitting only one URL):

  • no caching: 12 reqs/s, 32.3% availability, ~2.5 load average
  • only wp-cache: 166 reqs/s, 97.2% availability, ~1.25 load average
  • only nginx cache (warm, 1m run): 403 reqs/s, 99.7% availability, ~1 load average
  • nginx cache and wp-cache: 412 reqs/s, 99.7% availability, ~1 load average

No surprises there about caching being a good way to speed up your website. What was surprising is that the server behaved nicely and recovered very quickly even in the no caching case – so even if it does get smashed, it doesn’t break (it just fails the requests that it didn’t have time to serve).

Also, although the nginx-only and nginx/wp-cache numbers are almost the same, they hide the fact that the nginx-only cache behaved quite badly on a cold cache. When a request comes in, nginx either serves it from the cache or asks the upstream – but it doesn’t attempt to satisfy the rest of the requests from the first one. So if a lot of requests come while there’s nothing in the cache for that URL, all of them go upstream. When that happens, having Wp-Cache really saves the day.

In summary

  1. use Wp-Cache
  2. limit the number of workers (either fpm or Apache, whichever you use)
  3. Nginx is really fast
  4. Nginx FastCGI cache gives a lot of additional performance boost, if you’re fine with content being a bit stale

2
Feb 11

How to find out when a Linux system was installed

I bought my current laptop (Lenovo ThinkPad x200s, if you’re curious) a few years back, and have been running Debian unstable on it ever since. I was curious to find out when I last (re)installed it – I recall only installing it once, but I’ve got a few boxes lying around so I couldn’t be sure.

The fact that I upgrade the system regularly doesn’t help, as that touches basically everything on the system. I also can’t look into the logs, because I don’t keep them around for so long.

One of the things least likely to be touched is the /lost+found directory, unless you’ve reformatted your disk after installation (e.g. when restoring a backed-up system to a new disk), had filesystem problems, or use a file system that doesn’t need this feature:

root@tachyon:/# stat /lost+found
  File: `/lost+found'
  Size: 16384     	Blocks: 32         IO Block: 4096   directory
Device: fe01h/65025d	Inode: 11          Links: 2
Access: (0700/drwx------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2009-02-01 07:56:13.000000000 +0100
Modify: 2009-01-31 07:30:45.000000000 +0100
Change: 2010-02-05 08:10:53.000000000 +0100

So, I must’ve installed it on January 31st, 2009, at the latest.

The other trick is to check when the password of some system user was last changed, for example for sys, bin, or even root if you haven’t set/changed it (it’s not used on Ubuntu, for example):

root@tachyon:/# passwd -S sys
sys L 01/31/2009 0 99999 7 -1

Passwd agrees on the date. So, it seems that I really did install Debian on my laptop on January 31st 2009 (the day I bought it), and never reinstalled it. Nice :-)

Note that both methods work only if you’ve got the system time set correctly during the installation, which was the case here. That said, they should be applicable to any Linux system (and possibly wider), not only for Debian/Ubuntu systems.
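The /lost+found check can be wrapped in a small helper that prints just the date. This is a sketch assuming GNU stat (as found on Debian and Ubuntu); the function name is made up for the example.

```shell
#!/bin/bash
# Sketch: print the modification date (YYYY-MM-DD) of a path using GNU stat.
# The function name is hypothetical.
mtime_date() {
    # %y is the human-readable mtime, e.g. "2009-01-31 07:30:45.000000000 +0100"
    stat -c '%y' "$1" | cut -d' ' -f1
}

if [ -d /lost+found ]; then
    echo "Probably installed on or before: $(mtime_date /lost+found)"
fi
```

The same caveats apply as for the manual check: the date is only an upper bound, and only meaningful if the clock was right during installation.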

I found the /lost+found idea on LinuxQuestions, and the passwd idea on Alex Ballas’ blog. If you’ve got another trick for this, please share in the comments.


24
Jan 11

Easy backups using rsync

If you want to avoid the overkill of enterprise-grade Linux backup solutions just for your desktop or laptop, it’s quite easy to build a simple, OS X Time Machine-like backup system using rsync. The original idea is described in the rsync Time Machine tutorial, and I’m doing exactly that for one of my desktops, with a few tweaks and simplifications.

The basic idea is to just copy the data to a new directory on each backup, while using hardlinks instead of copying already backed-up files. This makes it very easy to browse the backup history, without wasting disk space on multiple file copies:

rsync -a --link-dest=/backup/previous /data/to/backup /backup/new
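To convince yourself that unchanged files are shared between snapshots rather than copied, you can compare a file in two snapshots with the shell’s -ef test. A small sketch; the helper name and the example paths are hypothetical.

```shell
#!/bin/bash
# Sketch: succeed if two paths are hard links to the same file.
# The helper name is made up for this example.
is_hardlinked() {
    # -ef is true when both paths refer to the same device and inode
    [ "$1" -ef "$2" ]
}

# Example with hypothetical snapshot paths:
#   is_hardlinked /backup/previous/backup/notes.txt /backup/new/backup/notes.txt \
#       && echo "shared, no extra disk space used"
```

A file that changed between the two runs is stored as a fresh copy in the new snapshot, so the check fails for it.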

I’m backing up to a (separate) local disk, but since I’m using rsync, it’s trivial to modify it to copy to a remote server. In fact, it’s better to use rsync to transfer files to a remote server than to mount network share on your computer and do local copy – network filesystems often mangle file information used by rsync to verify the files stay the same (so it can reuse them).

The OSX Time Machine does hourly, daily and weekly backups, but for the PC in question, daily backup for the last 7 days is good enough. So, for each backup I determine what’s the current date, what was the last backup name, and whether I need to delete any old backups:

cd $BACKUP_DIR # the ls calls below must list the backup directory

todays=$(date +'%Y-%m-%d') # nicely sortable names for backups
last=$(ls -r | head -1)
to_delete=$(ls -r | tail -n +7) # will keep the last 6 backups

rsync -aq --link-dest=$BACKUP_DIR/$last $SOURCES $BACKUP_DIR/$todays
[ -z "$to_delete" ] || rm -rf $to_delete
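Before trusting rm -rf with the result, it can help to look at what the retention rule selects. This sketch wraps the same ls/tail pipeline in a function that only prints the candidates; the function name and its arguments are made up for the example.

```shell
#!/bin/bash
# Sketch: list the snapshot names that fall outside the retention window.
# Function name and arguments are hypothetical.
old_backups() {
    local dir=$1 keep=$2
    # newest first (the date-based names sort chronologically), then skip
    # the first $keep entries and print whatever is left
    ls -1r "$dir" | tail -n +"$(( keep + 1 ))"
}

# Dry run against a hypothetical backup directory:
#   old_backups /backup/mypc 6
```

This relies on the directory containing nothing but the date-named snapshots, which is also what the snippet above assumes.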

To have daily backups, I can just put the complete script into /etc/cron.daily, which means it’ll be run early every morning, while I’m not using the computer. To make sure it doesn’t put too heavy load on the system, I can also put rsync into idle I/O class (using ionice), and give it the least priority for CPU time (using nice).

The complete script:

#!/bin/bash

# config
SOURCES="/data /home/senko"
BACKUP_DIR="/backup/mypc"

# abort if any of the commands fail
set -e

# the ls calls below must run inside the backup directory
cd $BACKUP_DIR

todays=$(date +'%Y-%m-%d') # nicely sortable names for backups
last=$(ls -r | head -1)
to_delete=$(ls -r | tail -n +7) # will keep the last 6 backups

ionice -c 3 nice -n +19 rsync -aq \
    --link-dest=${BACKUP_DIR}/${last} $SOURCES $BACKUP_DIR/${todays}

# now we're safe to remove the old one(s)
[ -z "$to_delete" ] || rm -rf $to_delete

One additional thing I’m considering doing in the future is putting a weekly snapshot into a compressed and encrypted archive and storing it somewhere else in case I really do need older data, and/or making the script a bit more complicated to have daily/weekly/monthly backups.

Update: Many people have commented on various backup tools available for Linux. That is true, there are a lot. What this post shows is that sometimes you don’t need to install and use a special tool, if a few lines of shell script will do just fine. Most of the tools recommended have their configuration part larger than the entire script shown here. So, while reinventing the wheel is definitely not the way to go, buying a car to go to the grocery shop instead of using a bicycle you already have is also not always optimal :-)