Category archives: English

If Linus was starting a kernel in 2010…

Imagine you’ve seen a blog post titled Free linux-like kernel sources for x86-64 PC’s with the following contents:

Do you pine for the nice days of linux 2.4, when men were men and wrote
their own device drivers? Are you without a nice project and just dying
to cut your teeth on a OS you can try to modify for your needs? Are you
finding it frustrating when everything works on linux? No more all-
nighters to get a nifty program working? Then this post might be just
for you :-)

As I tweeted a month(?) ago, I’m working on a free version of a
linux-lookalike for x86-64 computers. It has finally reached the stage
where it’s even usable (though may not be depending on what you want),
and I am willing to put out the sources for wider distribution. It is
just version 0.02 (+1 (very small) patch already), but I’ve successfully
run bash/gcc/gnu-make/gnu-sed/compress etc under it.

Sources for this pet project of mine can be found on github. The repo also
contains some README-file and a couple of binaries to work under it
(bash, update and gcc, what more can you ask for :-). Full kernel
source is provided, as no linux code has been used. Library sources are
only partially free, so that cannot be distributed currently. The
system is able to compile “as-is” and has been known to work. Heh.
Sources to the binaries (bash and gcc) can be found at the same place.

ALERT! WARNING! NOTE! These sources still need linux to be compiled
(and gcc, possibly gcc4, haven’t tested), and you need linux to
set it up if you want to run it, so it is not yet a standalone system
for those of you without linux. I’m working on it. You also need to be
something of a hacker to set it up (?), so for those hoping for an
alternative to linux, please ignore me. It is currently meant for
hackers interested in operating systems and x86-64’s with access to linux.

The system needs an IDE harddisk (sorry) and any VGA card. If
you are still interested, please read my blog post about it, and/or ask me
on Twitter for additional info.

I can (well, almost) hear you asking yourselves “why?”. Next year (or the
one after it who knows) will be The Year Of Linux Desktop, and I’ve already got
Ubuntu. This is a program for hackers by a hacker. I’ve enjouyed doing
it, and somebody might enjoy looking at it and even modifying it for
their own needs. It is still small enough to understand, use and
modify, and I’m looking forward to any comments you might have.

Does this sound familiar to you? If you know your Linux history, it should – it’s a ripoff of the original Linux announcement that Linus posted to the comp.os.minix newsgroup.

A few points to consider:

  • look what a crazy (in a positive way) guy’s hobby did to the computer industry
  • is kernel programming still fun? should it be? is it serious work now?
  • is there still fun to be had in non-web related stuff?

Reading Wikipedia on N900

The N900 is a device with a lot of connectivity options and a very capable browser. With that, it’s a good Wikipedia reader out of the box. But not so if your connectivity is limited (you’re on top of a mountain, or roaming, or don’t have a data plan altogether and there are no open wifi hotspots).

Since I got a great Christmas gift from Collabora I’ve been poking at a small toy application that could store and read Wikipedia articles offline. Since Wikipedia hosts a huge number of articles, and device capabilities are limited compared to a desktop PC, this posed an interesting (but not unsolvable) challenge for the weekend hack sessions.

The result is Mawire (Maemo Wikipedia Reader):

Mawire 0.1 - home screen

Mawire 0.1 - search results

Mawire 0.1 - Article view

The application

Having worked on bits and pieces in Maemo 5, I knew my way around the Maemo 5 SDK and some of the APIs, but nevertheless the Developer’s Guide was of great help. I’ve also perused examples, code (such as the browser launcher) and packaging from marnanel’s raeddit application.

The application is a lightweight reader, so the aim is not to display the complete article, but rather to show enough information for a quick check, and to provide a convenient way for the user to find out more on Wikipedia itself.

install mawire
At the moment you can download the application from here, or browse or download the source code from github. I hope to upload it to maemo-extras soon. If you’re reading this from your N900, you can install mawire automatically by clicking on the install icon. Warning: It’s an early development (alpha) version, only tested on my device so far, so proceed with caution and only if you know what you’re doing.

Since writing this blog post (but before I hit the Publish button), I’ve played around with portrait mode functionality, and released version 0.2 which has portrait mode support. If your keyboard is closed, mawire will switch to portrait mode. When you slide it open at any time (e.g. for typing search queries, or when you want to copy/paste part of the article), mawire will switch to landscape mode.

Data handling

Since Wikipedia (especially the English edition) has a huge number of articles, the amount of text in a complete and comprehensive copy is just too large to conveniently use on the device. A recent enwiki dump contains more than 3 million articles and is almost 6GB of bzip2’d XML.

To minimise database size, and since I wasn’t trying to replicate complete Wikipedia functionality, I decided to strip the articles as much as possible: not only of markup (only bold and emphasis are preserved), but also of everything except the first few paragraphs, up to the first heading. The idea is that the topic overview is probably outlined first, and then each section expands the coverage (often also having a complete article of its own).

So, the reader only includes the overview and provides a “Read more…” button that connects to Wikipedia proper. This means the app can be used not only as an offline reader, but also as a quick Wikipedia launcher.

The other problem is the number of articles and search performance. If the database contains up to a few million articles, searching through it sequentially is extremely slow. Unfortunately, the version of SQLite3 shipped on the N900 (and indeed in many Linux distros at the moment) doesn’t support fast full-text search (FTS3), which is ideal for mawire.

So, the application currently ships with its own copy of the SQLite library with the FTS3 module enabled. It’s installed in a private lib directory so it doesn’t clash with the OS version (similar to what Firefox does on Linux distros), and is only ever used by mawire.

The data itself was prepared by a Python program that:

  1. parses the XML dump (using expat),
  2. extracts, parses and strips Wikipedia markup (using a bunch of hacked regexps, as I haven’t found a suitable MediaWiki markup parser – this is the weakest part of the program and I hope to improve it in the future),
  3. filters articles we want to exclude (special pages, lists of things, too short articles),
  4. compresses the article text (using zlib) and finally stores it in an SQLite3 database (using SQLAlchemy).
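To make steps 2 to 4 more concrete, here is a minimal sketch of the idea. It is not the actual mawire code: the function names, the regexps, the namespace list and the 20-character minimum are all made up for illustration, XML parsing is left out (pages come in as plain title/text pairs), and it talks to SQLite through Python’s sqlite3 module rather than SQLAlchemy to keep the example short.

```python
import re
import sqlite3
import zlib

# ASSUMPTION: illustrative values, not the ones mawire really uses.
SKIPPED_PREFIXES = ("Wikipedia:", "Template:", "Category:", "File:", "List of ")
MIN_LENGTH = 20


def lead_section(wikitext):
    """Return everything before the first '== Heading ==' line."""
    match = re.search(r"^==.*==\s*$", wikitext, flags=re.MULTILINE)
    return wikitext[:match.start()] if match else wikitext


def strip_markup(wikitext):
    """Very rough stripping; '' and ''' (emphasis, bold) are left alone."""
    text = re.sub(r"\{\{[^{}]*\}\}", "", wikitext)               # non-nested templates
    text = re.sub(r"<ref[^>]*/>", "", text)                      # self-closing refs
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)  # [[target|label]] -> label
    return text.strip()


def wanted(title, text):
    """Drop non-article pages, lists and articles that are too short."""
    return not title.startswith(SKIPPED_PREFIXES) and len(text) >= MIN_LENGTH


def build_database(pages, path=":memory:"):
    """Store zlib-compressed lead sections of (title, wikitext) pairs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (title TEXT PRIMARY KEY, body BLOB)"
    )
    for title, wikitext in pages:
        text = strip_markup(lead_section(wikitext))
        if wanted(title, text):
            conn.execute(
                "INSERT OR REPLACE INTO articles VALUES (?, ?)",
                (title, zlib.compress(text.encode("utf-8"))),
            )
    conn.commit()
    return conn


def read_article(conn, title):
    """Return the decompressed article text, or None if it is not stored."""
    row = conn.execute(
        "SELECT body FROM articles WHERE title = ?", (title,)
    ).fetchone()
    return zlib.decompress(row[0]).decode("utf-8") if row else None
```

Regexps like these break on nested templates and tables, which is exactly why the markup stripping is the weakest part of the real program too.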

The final step, building the FTS3 index, is manual and consists of two SQL statements. It is kept separate so that it can be done on a different machine, one whose SQLite has FTS3 enabled.
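The two statements could look roughly like the ones below. This is a sketch under assumptions, not the exact SQL mawire uses: the table and column names (articles, article_index, title) are hypothetical, and I’m assuming that only titles are indexed, since the article bodies are stored compressed. It needs an SQLite build with FTS3 enabled.

```python
import sqlite3


def build_title_index(conn):
    """Create an FTS3 index over article titles (two SQL statements)."""
    # ASSUMPTION: 'articles' and 'article_index' are illustrative names.
    conn.execute("CREATE VIRTUAL TABLE article_index USING fts3(title)")
    conn.execute(
        "INSERT INTO article_index (docid, title) "
        "SELECT rowid, title FROM articles"
    )
    conn.commit()


def search(conn, query):
    """Return matching titles; 'kern*' style prefix queries also work."""
    rows = conn.execute(
        "SELECT title FROM article_index WHERE title MATCH ? ORDER BY title",
        (query,),
    )
    return [row[0] for row in rows]
```

A MATCH query goes through the full-text index instead of scanning every row, which is what makes searching a few million titles bearable on the device.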

The program is included in the mawire source code, so you can use it to create custom Wikipedia databases (or, indeed, a database for any MediaWiki-powered wiki).

The mawire-enwiki-small package contains a database of selected articles (3000 most visited + featured + good + vital; about 14 thousand articles in total) from the English Wikipedia (13.6MB). The database is installed in /opt so it doesn’t fill up your rootfs.

The complete English edition, as well as several other major language editions, is also available, but I haven’t created Maemo packages for them. You can download them directly, put them on your device or MMC card and select them from the application menu.

Translate software using Google

googtext

Automatic statistical translation tools like Google Translate are getting better and better. They still often produce nonsensical and grammatically incorrect translations (depending on available language corpus, I guess), but the translations can be used to understand the crux of the text.

So, what happens if we try to apply these tools to translating software (specifically, translating gettext potfiles, which is the most widespread way of providing i18n in free software that I know of)? Might be good, might be bad, and it will probably be hilarious ;-)

That’s what my new pet project, googtext, does. It’s a small wrapper around the Google AJAX Language API that lets users upload their potfiles, extracts the message ids, passes them through the Language API, and saves the resulting messages to a new potfile that the user gets to download.
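The core loop can be sketched in a few lines. This is an illustration only, not the googtext source: translate_pot, po_escape and po_unescape are made-up names, the translate argument is a stand-in for whatever calls the translation service, and only simple single-line msgid/msgstr pairs are handled (multi-line strings and plural forms are left untouched).

```python
import re

# Matches a single-line msgid immediately followed by an empty msgstr.
ENTRY = re.compile(r'^msgid "(?P<id>.*)"\nmsgstr ""$', re.MULTILINE)


def po_unescape(text):
    """Turn PO escapes (\\n, \\t, \\", \\\\) back into plain characters."""
    special = {"n": "\n", "t": "\t"}
    return re.sub(r"\\(.)", lambda m: special.get(m.group(1), m.group(1)), text)


def po_escape(text):
    """Escape backslashes, quotes and newlines for use in a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def translate_pot(pot_text, translate):
    """Fill every empty msgstr with translate(msgid).

    ASSUMPTION: 'translate' is any callable taking and returning a string;
    here it stands in for the call to the translation service.
    """
    def fill(match):
        msgid = match.group("id")
        if not msgid:
            # The header entry has an empty msgid; leave it alone.
            return match.group(0)
        translated = translate(po_unescape(msgid))
        return 'msgid "%s"\nmsgstr "%s"' % (msgid, po_escape(translated))

    return ENTRY.sub(fill, pot_text)
```

Passing the translator in as a plain function keeps the potfile handling testable without network access: for a dry run, something like translate_pot(text, str.upper) is enough to see which entries would be filled in.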

Translations generated by googtext should certainly be reviewed and edited by a real person, but I think it’s sometimes easier to improve upon a bad translation than to start from scratch (depending on the situation), so this might have some use (or just might be funny :) So, try it and let me know what you think in the comments.

The entire code for this is in the public domain, so if you want to play with the code, you can download it here. Since I couldn’t find any Python module for the Language API, I’ve made a thin wrapper myself (the gtranslate module). There’s also the googtext utility itself (command line), and the accompanying PHP upload script and HTML pages.

Have fun!

Empathy, Telepathy, Farsight, … ?

In the Telepathy stack there are a lot of components, so often it gets very confusing for people hearing about it for the first time. So, here’s a very basic breakdown:

Telepathy
A framework for doing IM and VOIP. Supports Jabber/GTalk, SIP, IRC, MSN, ICQ,… (not all of those support VOIP at the moment). It has no UI and the end user doesn’t really need to be concerned about it.
Empathy
An IM/VOIP client for GNOME that uses the Telepathy framework.
Farsight
An extension to GStreamer that supports VOIP. (Farsight devs, mea culpa for dumbing this down so much). Farsight is used by Telepathy for actually moving around audio and video in calls.

That’s it. Easy, no? :-)

Retweeting IM status using Gwibber

What are you doing?

telepathy+gwibber

Microblogging is now much more (well, for some) than just answering the above question, but some people do want to publish what they’re doing at the moment. If they also use an IM client (and who doesn’t?), it makes sense to make the IM status message and “what I’m doing” tweets the same.

As my technology of choice when it comes to IM is Telepathy, I started thinking about creating a connection manager (essentially, a protocol support driver) for Twitter, but then it dawned on me that there’s a much simpler and better way to do it, if you just want to retweet the status messages.

My preferred Twitter client is gwibber, written in Python, and it was very easy to extend it to listen to presence status change messages from Telepathy’s Mission Control and repost them to the configured microblogging services. Together with Davyd’s work on an improved status message widget for Empathy, I think this is going to be useful for people who change their presence often (and don’t use canned statuses).

To keep the requirements to a minimum (and since it’s a very simple client), I’ve used dbus-python directly, and also wrapped the whole thing so that the patched gwibber continues to run even if there’s no dbus-python available. The big ugly chunk of the change in the diff is preferences.glade, where for some reason glade reindented a whole lot of lines that I never touched (another table row with a single checkbox is the only UI change).
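The optional-dependency wrapping looks roughly like this. Treat it as a sketch and not as the patch itself: the class and function names are invented, the interface and signal names are placeholders that I haven’t checked against the Mission Control D-Bus API, and skipping empty or repeated messages is an extra added here for illustration.

```python
try:
    import dbus
    from dbus.mainloop.glib import DBusGMainLoop
except ImportError:
    dbus = None  # keep running without presence support

# ASSUMPTION: placeholder names, check them against Mission Control's
# D-Bus interface before relying on them.
MC_INTERFACE = "org.freedesktop.Telepathy.MissionControl"
PRESENCE_SIGNAL = "PresenceChanged"


class StatusReposter:
    """Forward IM status messages to a microblog posting function."""

    def __init__(self, post):
        self.post = post          # callable taking the message text
        self.last_message = None

    def on_presence_changed(self, presence, message=""):
        message = (message or "").strip()
        # Skip empty statuses and exact repeats of the previous one.
        if message and message != self.last_message:
            self.last_message = message
            self.post(message)


def watch_presence(reposter):
    """Hook the reposter up to D-Bus; return False if that isn't possible."""
    if dbus is None:
        return False
    try:
        DBusGMainLoop(set_as_default=True)
        bus = dbus.SessionBus()
        bus.add_signal_receiver(
            reposter.on_presence_changed,
            signal_name=PRESENCE_SIGNAL,
            dbus_interface=MC_INTERFACE,
        )
    except dbus.DBusException:
        return False
    return True
```

Signals are only delivered while a GLib main loop is running, which a GTK application such as gwibber already has. If watch_presence returns False, the client simply carries on without the feature.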

The code is in a bzr branch on Launchpad. Go, grab it and play with it! Keep in mind it’s fresh and not thoroughly tested. Also beware, ye rhythmbox (or other music player) users that update your IM status with every new song – this will spam your microblog if you let it :)

Does this mean I think Empathy should become a microblogging client? Absolutely not. There’s more to microblogging than sharing statuses, and existing clients (e.g. gwibber) already do it very well. This patch is just a way to integrate previously disconnected parts of your digital life (i.e. IM and microblogging). To publish your status (both on IM and Twitter), use Empathy. To microblog, use gwibber.