Linux


24
Jan 11

Easy backups using rsync

If you want to avoid the overkill of enterprise-grade Linux backup solutions just for your desktop or laptop, it’s quite easy to build a simple, OS X Time Machine-like backup system using rsync. The original idea is described in the rsync Time Machine tutorial, and I’m doing exactly that for one of my desktops, with a few tweaks and simplifications.

The basic idea is to just copy the data to a new directory on each backup, while using hardlinks instead of copying already backed-up files. This makes it very easy to browse the backup history, without wasting disk space on multiple file copies:

rsync -a --link-dest=/backup/previous /data/to/backup /backup/new
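To see why this costs no extra disk space, here is a minimal sketch of what --link-dest does for an unchanged file. It is in Python rather than shell only so that it is easy to run and check, and the directory and file names are made up for the illustration. Both directory entries end up pointing at the same inode:

```python
import os
import tempfile


def hardlink_demo():
    """Create one file, hard-link it into a second 'backup' directory, and
    report whether both names share an inode and how many names it has."""
    with tempfile.TemporaryDirectory() as root:
        previous = os.path.join(root, "previous")  # stands in for /backup/previous
        new = os.path.join(root, "new")            # stands in for /backup/new
        os.mkdir(previous)
        os.mkdir(new)

        original = os.path.join(previous, "notes.txt")
        with open(original, "w") as f:
            f.write("unchanged since the last backup\n")

        linked = os.path.join(new, "notes.txt")
        os.link(original, linked)  # a second name for the same data blocks

        a, b = os.stat(original), os.stat(linked)
        return a.st_ino == b.st_ino, a.st_nlink


if __name__ == "__main__":
    same_inode, link_count = hardlink_demo()
    print(same_inode, link_count)
```

Deleting an old backup directory only removes one of those names; the data stays on disk until the last backup that links to it is gone.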

I’m backing up to a (separate) local disk, but since I’m using rsync, it’s trivial to modify it to copy to a remote server. In fact, it’s better to use rsync to transfer files to a remote server than to mount a network share on your computer and do a local copy – network filesystems often mangle the file information rsync uses to verify that files have stayed the same (so it can reuse them).

OS X Time Machine does hourly, daily and weekly backups, but for the PC in question, a daily backup for the last 7 days is good enough. So, for each backup I determine the current date, the name of the last backup, and whether I need to delete any old backups:

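cd "$BACKUP_DIR" # the listings below must run inside the backup directory
todays=$(date +'%Y-%m-%d') # nicely sortable names for backups
last=$(ls -r | head -1)
to_delete=$(ls -r | tail -n +7) # keep the 6 newest existing backups (7 with today's)

# $SOURCES is left unquoted on purpose, so it splits into separate paths
rsync -aq --link-dest="$BACKUP_DIR/$last" $SOURCES "$BACKUP_DIR/$todays"
[ -z "$to_delete" ] || rm -rf $to_delete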

To have daily backups, I can just put the complete script into /etc/cron.daily, which means it’ll be run early every morning, while I’m not using the computer. To make sure it doesn’t put too heavy a load on the system, I can also put rsync into the idle I/O class (using ionice), and give it the lowest priority for CPU time (using nice).

The complete script:

#!/bin/bash

# config
SOURCES="/data /home/senko"
BACKUP_DIR="/backup/mypc"

# abort if any of the commands fail
set -e

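# work inside the backup directory, so the listings below see the backups
cd "$BACKUP_DIR"

todays=$(date +'%Y-%m-%d') # nicely sortable names for backups
last=$(ls -r | head -1)
to_delete=$(ls -r | tail -n +7) # keep the 6 newest existing backups (7 with today's)

# $SOURCES is left unquoted on purpose, so it splits into separate paths
ionice -c 3 nice -n +19 rsync -aq \
    --link-dest="${BACKUP_DIR}/${last}" $SOURCES "${BACKUP_DIR}/${todays}"

# now we're safe to remove the old one(s)
[ -z "$to_delete" ] || rm -rf $to_delete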

One additional thing I’m considering doing in the future is putting a weekly snapshot into a compressed and encrypted archive and storing it somewhere else in case I really do need older data, and/or making the script a bit more complicated to have daily/weekly/monthly backups.
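A rough sketch of how a daily/weekly/monthly selection could work, assuming the same YYYY-MM-DD directory names as above. It is written in Python rather than shell because the date arithmetic is easier to read and test that way. The function name and the default counts are made up for the illustration and are not part of the script above:

```python
from datetime import date


def backups_to_keep(names, daily=7, weekly=4, monthly=6):
    """Given backup directory names in YYYY-MM-DD form, return the sorted
    list of names to keep; everything else could be deleted."""
    dates = sorted((date.fromisoformat(n) for n in names), reverse=True)
    keep = set(dates[:daily])  # the newest `daily` backups

    # Newest backup of each ISO week; dates are newest-first, so the first
    # one seen for a given week is the one we want.
    weeks = {}
    for d in dates:
        weeks.setdefault(tuple(d.isocalendar())[:2], d)
    keep.update(sorted(weeks.values(), reverse=True)[:weekly])

    # Newest backup of each calendar month, same trick.
    months = {}
    for d in dates:
        months.setdefault((d.year, d.month), d)
    keep.update(sorted(months.values(), reverse=True)[:monthly])

    return sorted(d.isoformat() for d in keep)


if __name__ == "__main__":
    names = ["2011-01-%02d" % day for day in range(1, 25)]
    print(backups_to_keep(names, daily=2, weekly=3, monthly=1))
```

Since hardlinked snapshots share unchanged files, keeping a few extra weekly or monthly directories costs little more than the files that actually changed in between.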

Update: Many people have commented on various backup tools available for Linux. That is true, there are a lot. What this post shows is that sometimes you don’t need to install and use a special tool, if a few lines of shell script will do just fine. Most of the tools recommended have a configuration section larger than the entire script shown here. So, while reinventing the wheel is definitely not the way to go, buying a car to go to the grocery shop, instead of using a bicycle you already have, is also not always optimal :-)


10
Feb 10

If Linus was starting a kernel in 2010…

Imagine you’ve seen a blog post titled Free linux-like kernel sources for x86-64 PC’s with the following contents:

Do you pine for the nice days of linux 2.4, when men were men and wrote their own device drivers? Are you without a nice project and just dying to cut your teeth on a OS you can try to modify for your needs? Are you finding it frustrating when everything works on linux? No more all-nighters to get a nifty program working? Then this post might be just for you :-)

As I tweeted a month(?) ago, I’m working on a free version of a linux-lookalike for x86-64 computers. It has finally reached the stage where it’s even usable (though may not be depending on what you want), and I am willing to put out the sources for wider distribution. It is just version 0.02 (+1 (very small) patch already), but I’ve successfully run bash/gcc/gnu-make/gnu-sed/compress etc under it.

Sources for this pet project of mine can be found on github. The repo also contains some README-file and a couple of binaries to work under it (bash, update and gcc, what more can you ask for :-). Full kernel source is provided, as no linux code has been used. Library sources are only partially free, so that cannot be distributed currently. The system is able to compile “as-is” and has been known to work. Heh. Sources to the binaries (bash and gcc) can be found at the same place.

ALERT! WARNING! NOTE! These sources still need linux to be compiled (and gcc, possibly gcc4, haven’t tested), and you need linux to set it up if you want to run it, so it is not yet a standalone system for those of you without linux. I’m working on it. You also need to be something of a hacker to set it up (?), so for those hoping for an alternative to linux, please ignore me. It is currently meant for hackers interested in operating systems and x86-64’s with access to linux.

The system needs an IDE harddisk (sorry) and any VGA card. If you are still interested, please read my blog post about it, and/or ask me on Twitter for additional info.

I can (well, almost) hear you asking yourselves “why?”. Next year (or the one after it, who knows) will be The Year Of Linux Desktop, and I’ve already got Ubuntu. This is a program for hackers by a hacker. I’ve enjoyed doing it, and somebody might enjoy looking at it and even modifying it for their own needs. It is still small enough to understand, use and modify, and I’m looking forward to any comments you might have.

Does this sound familiar to you? If you know your Linux history, it should – it’s a ripoff of the original Linux announcement post that Linus posted to the comp.os.minix newsgroup.

A few points to consider:

  • look what a crazy (in a positive way) guy’s hobby did to the computer industry
  • is kernel programming still fun? should it be? is it serious work now?
  • is there still fun to be had in non-web related stuff?

20
Jan 10

Reading Wikipedia on N900

The N900 is a device with a lot of connectivity options and a very capable browser. With that, it’s a good Wikipedia reader out of the box. But not so if your connectivity is limited (you’re on top of a mountain, or roaming, or don’t have a data plan altogether and there are no open wifi hotspots).

Since I got a great Christmas gift from Collabora I’ve been poking at a small toy application that could store and read Wikipedia articles offline. Since Wikipedia hosts a huge number of articles, and device capabilities are limited compared to a desktop PC, this posed an interesting (but not unsolvable) challenge for the weekend hack sessions.

The result is Mawire (Maemo Wikipedia Reader):

Mawire 0.1 - home screen

Mawire 0.1 - search results

Mawire 0.1 - Article view

The application

Having worked on bits and pieces in Maemo 5, I knew my way around the Maemo 5 SDK and some of the APIs, but nevertheless the Developer’s Guide was of great help. I’ve also perused examples, code (such as browser launcher) and packaging from marnanel‘s raeddit application.

The application is a lightweight reader, so the aim is not to display the complete article, but rather to show enough information for a quick check, and provide a convenient way for the user to find more on Wikipedia itself.

install mawire
At the moment you can download the application from here, or browse or download the source code from github. I hope to upload it to maemo-extras soon. If you’re reading this from your N900, you can install mawire automatically by clicking on the install icon. Warning: It’s an early development (alpha) version, only tested on my device so far, so proceed with caution and only if you know what you’re doing.

Since writing this blog post (but before I hit the Publish button), I’ve played around with portrait mode functionality, and released version 0.2 which has portrait mode support. If your keyboard is closed, mawire will switch to portrait mode. When you slide it open at any time (e.g. for typing search queries, or when you want to copy/paste part of the article), mawire will switch to landscape mode.

Data handling

Since Wikipedia (especially the English edition) has a huge number of articles, the amount of text in a complete and comprehensive copy is just too large to conveniently use on the device. A recent enwiki dump contains more than 3 million articles and is almost 6GB of bzip2’d XML.

To minimise database size, and since I wasn’t trying to replicate complete Wikipedia functionality, I decided to strip the articles as much as possible: not only of markup (so only bold and emphasis are preserved), but also by including only the first few paragraphs, up to the first heading. The idea is that the topic overview is probably outlined first, and then each subsequent section expands the coverage (often also having a complete article of its own).
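As a small illustration of the “up to the first heading” idea (not the actual regexps used in mawire, just a sketch under the assumption that headings look like == Title == and links like [[target|label]]):

```python
import re

# A wiki heading line such as "== History ==" or "=== Early years ===".
HEADING = re.compile(r"^==+[^=\n].*?==+\s*$", re.MULTILINE)
# [[target|label]] or [[target]]; group 1 is the text shown to the reader.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def overview(wikitext):
    """Return the lead section of an article with wiki links flattened.
    Bold ('''...''') and emphasis (''...'') markup is left untouched."""
    match = HEADING.search(wikitext)
    lead = wikitext[:match.start()] if match else wikitext
    return LINK.sub(r"\1", lead).strip()


if __name__ == "__main__":
    sample = (
        "'''Foo''' is a [[bar|thing]] and [[baz]].\n\n"
        "More.\n\n"
        "== History ==\n"
        "Old stuff."
    )
    print(overview(sample))
```

Real articles also contain templates, references and tables, which this sketch does not attempt to handle.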

So, the reader only includes the overview and provides a “Read more…” button that connects to Wikipedia proper. This way the app can be used not only as an offline reader, but also as a quick Wikipedia launcher.

The other problem is the number of articles and search performance. If the database contains up to a few million articles, sequential searching through it is extremely slow. Unfortunately, the version of SQLite3 shipped on the N900 (and indeed in many Linux distros ATM) doesn’t support fast full-text search (FTS3), which is ideal for mawire.

So, the application currently ships with its own copy of the SQLite library with the FTS3 module enabled. It’s installed in a private lib directory so it doesn’t clash with the OS version (similar to what Firefox on Linux distros does), and is only ever used by mawire.

The data itself was prepared by a Python program that:

  1. parses the XML dump (using expat),
  2. extracts, parses and strips Wikipedia markup (using a bunch of hacked regexps, as I haven’t found a suitable MediaWiki markup parser – this is the weakest part of the program and I hope to improve it in the future),
  3. filters articles we want to exclude (special pages, lists of things, too short articles),
  4. compresses the article text (using zlib) and finally stores it in an SQLite3 database (using SQLAlchemy).
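The steps above can be sketched roughly as follows. This is not the mawire program itself: it skips the markup stripping, uses the standard sqlite3 module instead of SQLAlchemy to stay self-contained, parses the whole input in one go (a real 6GB dump would have to be fed to expat in chunks), and the table layout and filter rules are assumptions made up for the example:

```python
import sqlite3
import xml.parsers.expat
import zlib

SAMPLE = b"""<mediawiki>
<page><title>Rsync</title><revision><text>rsync is a utility for efficiently transferring and synchronizing files.</text></revision></page>
<page><title>Talk:Rsync</title><revision><text>Discussion page, long enough to pass the length filter easily.</text></revision></page>
<page><title>List of backup software</title><revision><text>A list page, also long enough to pass the length filter.</text></revision></page>
<page><title>Stub</title><revision><text>Too short.</text></revision></page>
</mediawiki>"""


def parse_pages(xml_bytes):
    """Return (title, text) pairs from a MediaWiki XML export, using expat."""
    pages, buf, current = [], [], {}

    def start(name, attrs):
        buf.clear()  # start collecting text for the element just opened

    def chars(data):
        buf.append(data)

    def end(name):
        if name in ("title", "text"):
            current[name] = "".join(buf)
        elif name == "page":
            pages.append((current.get("title", ""), current.get("text", "")))
            current.clear()
        buf.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)
    return pages


def wanted(title, text, min_length=40):
    """Crude stand-in filter: skip namespaced pages, lists and short stubs."""
    if ":" in title or title.startswith("List of"):
        return False
    return len(text) >= min_length


def store(conn, pages):
    """Compress each wanted article with zlib and store it as a BLOB."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (title TEXT PRIMARY KEY, body BLOB)"
    )
    for title, text in pages:
        if wanted(title, text):
            conn.execute(
                "INSERT OR REPLACE INTO articles (title, body) VALUES (?, ?)",
                (title, zlib.compress(text.encode("utf-8"))),
            )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, parse_pages(SAMPLE))
    for title, body in conn.execute("SELECT title, body FROM articles"):
        print(title, "->", zlib.decompress(body).decode("utf-8"))
```

Note that filtering on a colon in the title is only a rough approximation; it would also drop legitimate articles whose titles contain one.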

The final step, building the FTS3 index, is manual and consists of two SQL statements. This is so it can be done separately, on a machine that has SQLite with FTS3.
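The exact statements aren’t reproduced here, but building an FTS3 index over article titles could look something like this sketch. The table and column names are assumptions, and it only works if the SQLite behind Python’s sqlite3 module was built with FTS3, which is the case for most current builds:

```python
import sqlite3


def build_index(conn):
    """Two statements: create the FTS3 table, then fill it from `articles`."""
    conn.execute("CREATE VIRTUAL TABLE article_index USING fts3(title)")
    conn.execute(
        "INSERT INTO article_index (docid, title) "
        "SELECT rowid, title FROM articles"
    )
    conn.commit()


def search(conn, query):
    """Full-text search on titles; MATCH uses the index instead of a scan."""
    rows = conn.execute(
        "SELECT title FROM article_index WHERE title MATCH ? ORDER BY title",
        (query,),
    )
    return [row[0] for row in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT PRIMARY KEY, body BLOB)")
    conn.executemany(
        "INSERT INTO articles (title) VALUES (?)",
        [("Linux kernel",), ("Rsync",), ("History of Linux",)],
    )
    build_index(conn)
    return search(conn, "linux")


if __name__ == "__main__":
    print(demo())
```

Storing the article’s rowid as the docid keeps the index small, since the compressed article body stays in the main table and is only fetched for the article the user opens.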

The program is included in the mawire source code, so you can use it to create custom Wikipedia databases (or, indeed, a database for any MediaWiki-powered wiki).

install mawire-enwiki-small: A database of selected articles (the 3000 most visited, plus featured, good and vital articles; about 14 thousand in total) from the English Wikipedia (13.6MB). The database is installed in /opt so it doesn’t fill up your rootfs.

The complete English edition, as well as several other major language editions, is also available, but I haven’t created Maemo packages for them. You can download them directly, put them on your device or MMC card and select them from the application menu.

2011-04-02 update: Moved the software repository and the databases to S3, and updated the links in this post accordingly.


31
Jan 09

Retweeting IM status using Gwibber

What are you doing?

telepathy+gwibber

Microblogging is now much more (well, for some) than just answering the above question, but some people do want to publish what they’re doing at the moment. If they also use an IM client (and who doesn’t?), it makes sense to make the IM status message and “what I’m doing” tweets the same.

As my technology of choice when it comes to IM is Telepathy, I started thinking about creating a connection manager (essentially, a protocol support driver) for Twitter, but then it dawned on me that there’s a much simpler and better way to do it, if you just want to retweet the status messages.

My preferred Twitter client is gwibber, written in Python, and it was very easy to extend it to listen to presence status change messages from Telepathy’s Mission Control and repost them to the configured microblogging services. Together with Davyd‘s work on an improved status message widget for Empathy, I think this is going to be useful for people who change their presence often (and don’t use canned statuses).

To keep the requirements to a minimum (and since it’s a very simple client), I’ve used dbus-python directly, and also wrapped the whole thing so that the patched gwibber continues to run even if there’s no dbus-python available. The big ugly chunk of the change in the diff is preferences.glade, where for some reason glade reindented a whole lot of lines that I never touched (another table row with a single checkbox is the only UI change).
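The “keeps running without dbus-python” part boils down to a guarded import. Here is a minimal sketch of that pattern, not the actual patch: the function names are hypothetical, the D-Bus signal hookup itself is left out, and the duplicate check is an extra added for the illustration:

```python
try:
    import dbus  # provided by dbus-python; treated as optional
except ImportError:
    dbus = None

# The rest of the client can check this flag before touching D-Bus at all.
HAVE_DBUS = dbus is not None


def make_status_forwarder(post):
    """Return a callback that reposts each new, non-empty status message.

    `post` is whatever function sends text to the microblogging services;
    the callback would be connected to the presence-changed signal when
    HAVE_DBUS is true (that connection is not shown here)."""
    last = [None]

    def on_status_changed(message):
        message = (message or "").strip()
        if message and message != last[0]:
            last[0] = message
            post(message)

    return on_status_changed


if __name__ == "__main__":
    sent = []
    handler = make_status_forwarder(sent.append)
    for status in ["Hacking on gwibber", "Hacking on gwibber", "", "Lunch"]:
        handler(status)
    print(HAVE_DBUS, sent)
```

If the import fails, the flag is simply false and the status forwarding is never set up, while everything else in the client works as before.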

The code is in a bzr branch on Launchpad. Go, grab it and play with it! Keep in mind it’s fresh and not thoroughly tested. Also beware, ye rhythmbox (or other music player) users that update your IM status with every new song – this will spam your microblog if you let it :)

Does this mean I think Empathy should become a microblogging client? Absolutely not. There’s more to microblogging than sharing statuses, and existing clients (e.g. gwibber) already do it very well. This patch is just a way to integrate previously disconnected parts of your digital life (i.e. IM and microblogging). To publish your status (both on IM and Twitter), use Empathy. To microblog, use gwibber.


26
Aug 07

Epiphany plugin update

Since my original Epiphany plugin hack, Empathy has matured and now sports a new and improved version of its Python bindings, including working bindings for Empathy-GTK, so we can reuse its widgets in other apps instead of reinventing half-assed ones.

So, I’ve updated the hack to work with the new bindings, and to use Empathy’s fancy ContactListView widget with contacts’ aliases, avatars and status messages, instead of a plain ol’ boring JID list. I think it’s now usable (and pretty) enough to actually be used every day, and not just an example of what could be done – so if you’ve got a working Empathy setup already, give it a try! You can download the new version here: Empathy Link Share


Obligatory screenshot

Empathy and the bindings are still in heavy development, so it’s no surprise if some things don’t work as expected. If you get assertions like CRITICAL **: empathy_contact_list_get_members: assertion `EMPATHY_IS_CONTACT_LIST (list)' when you try to run the script although you are online, you might want to try and patch the bindings with this. Just go into the empathy/python/pyempathygtk/ directory, apply the diff with patch -p0 < path_to_diff_file, and rerun make and make install in the project directory.