Xfce

CLI tool to review PO files

  • September 19, 2010
  • Mike Massonnet
If there is one annoying thing about reviewing PO files, it is that it is nearly impossible. When there are two hundred messages in a PO file, how are you going to know which messages changed? Well, that's the way it currently works with Transifex, but there is very good news: first, a review board is already available, which is a good step forward, and second, it is going to get a good kick to make it awesome. But until that happens, I have written two scripts to do such a review.

A shell script msgdiff.sh

Pros: tools available on every system
Cons: ugly output, needs template file

#!/bin/sh
PO_ORIG=$1
PO_REVIEW=$2
PO_TEMPL=$3

MSGMERGE=msgmerge
DIFF=diff
PAGER=more
RM=/bin/rm
MKTEMP=mktemp

# Usage
if test "$PO_ORIG" = "" -o "$PO_REVIEW" = "" -o "$PO_TEMPL" = ""; then
        echo "Usage: $0 orig.po review.po template.pot"
        exit 1
fi

# Merge both PO files against the template so they share the same message set
TMP_ORIG=`$MKTEMP po-orig.XXX`
TMP_REVIEW=`$MKTEMP po-review.XXX`
$MSGMERGE "$PO_ORIG" "$PO_TEMPL" > "$TMP_ORIG" 2> /dev/null
$MSGMERGE "$PO_REVIEW" "$PO_TEMPL" > "$TMP_REVIEW" 2> /dev/null

# Diff
$DIFF -u "$TMP_ORIG" "$TMP_REVIEW" | $PAGER

# Clean up temporary files
$RM -f "$TMP_ORIG" "$TMP_REVIEW"

Example:
$ ./msgdiff.sh fr.po fr.review.po thunar.pot
[...]
#: ../thunar-vcs-plugin/tvp-git-action.c:265
-#, fuzzy
msgid "Menu|Bisect"
-msgstr "Différences détaillées"
+msgstr "Menu|Couper en deux"

#: ../thunar-vcs-plugin/tvp-git-action.c:265
msgid "Bisect"
-msgstr ""
+msgstr "Couper en deux"
[...]
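
If you do not have a template at hand, it can usually be regenerated from the project sources. A hedged sketch with gettext's xgettext (the source file list and keywords here are assumptions; most projects ship a Makefile or intltool target that does this properly):

$ xgettext --keyword=_ --keyword=N_ --from-code=UTF-8 -o thunar.pot ../thunar-vcs-plugin/*.c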

A Python script podiff.py

Pros: programmable output
Cons: external dependency

The script depends on polib, which can be installed with setuptools. Make sure setuptools is installed, then run the command sudo easy_install polib.

#!/usr/bin/env python
import polib

def podiff(path_po_orig, path_po_review):
    po_orig = polib.pofile(path_po_orig)
    po_review = polib.pofile(path_po_review)
    po_diff = polib.POFile()
    po_diff.header = "PO Diff Header"
    for entry in po_review:
        orig_entry = po_orig.find(entry.msgid)
        if not entry.obsolete and (orig_entry.msgstr != entry.msgstr
                or ("fuzzy" in orig_entry.flags) != ("fuzzy" in entry.flags)):
            po_diff.append(entry)
    return po_diff


if __name__ == "__main__":
    import sys
    import os.path

    # Usage
    if len(sys.argv) != 3 \
            or not os.path.isfile(sys.argv[1]) \
            or not os.path.isfile(sys.argv[2]):
        print "Usage: %s orig.po review.po" % sys.argv[0]
        sys.exit(1)

    # Retrieve diff
    path_po_orig = sys.argv[1]
    path_po_review = sys.argv[2]
    po_diff = podiff(path_po_orig, path_po_review)

    # Print out orig v. review messages
    po = polib.pofile(path_po_orig)
    for entry in po_diff:
        orig_entry = po.find(entry.msgid)
        orig_fuzzy = review_fuzzy = "fuzzy"
        if "fuzzy" not in orig_entry.flags:
            orig_fuzzy = "not fuzzy"
        if "fuzzy" not in entry.flags:
            review_fuzzy = "not fuzzy"
        print "'%s' was %s is %s\n\tOriginal => '%s'\n\tReviewed => '%s'\n" % (entry.msgid, orig_fuzzy, review_fuzzy, orig_entry.msgstr, entry.msgstr)

Example:
$ ./podiff.py fr.po fr.review.po
'Menu|Bisect' was fuzzy is not fuzzy
        Original => 'Différences détaillées'
        Reviewed => 'Menu|Couper en deux'

'Bisect' was not fuzzy is not fuzzy
        Original => ''
        Reviewed => 'Couper en deux'
[...]

Benchmarking Compression Tools

  • September 6, 2010
  • Mike Massonnet
Comparison of several compression tools: lzop, gzip, bzip2, 7zip, and xz.
  • Lzop: small and very fast, yet good compression.
  • Gzip: fast, with good compression.
  • Bzip2: slow for both compression and decompression, although very good compression.
  • 7-Zip: LZMA algorithm; slower than Bzip2 for compression but very good compression.
  • Xz: LZMA2, an evolution of the LZMA algorithm.

Preparation

  • Be skeptical about compression tools and want to promote your compression tool of choice
  • Compare old and new compression tools quickly and find interesting results
So much for the spirit. What you really need is to write a script (Bash, Ruby, Perl, anything will do), because you will want to generate the benchmark data automatically. I picked Ruby, as it is nowadays my language of choice for any kind of shell-like script. By choosing Ruby I get a large selection of classes to process the benchmarking data: a Benchmark class (wonderful), a CSV class (awfully documented, redundant), and a zillion gems for any kind of task I might need (although I always avoid them).

I first focused on retrieving the data I was interested in (memory, CPU time, and file size) and saving it in CSV format. That way I can easily produce charts with existing applications. I was thinking it might be possible to use GoogleCL to generate charts from the command line with Google Docs, but it isn't supported (maybe it will be, maybe it won't; it's up to gdata-python-client). However, there is an actual Google tool to generate charts: the Google Chart API, which works by encoding the chart into a URI that returns an image. The Google Image Chart Editor website helps you build the chart you want in a friendly WYSIWYG mode; after that it is just a matter of massaging the data into shape for the URI. While focusing on the charts, though, I found the Ruby gem googlecharts, which makes it easy to pass the data and save the image.

Ruby Script

The Ruby script needs the following:
  • Ruby 1.9 (the script was written against it)
  • Linux/procfs for reading the status of processes
  • Googlecharts: gem install googlecharts
  • ImageMagick for the command line tool convert (optional)
The Ruby script takes a path as argument, from which it creates a tarball inside a tmpfs directory in order to avoid I/O latencies from a hard drive. Next it runs a number of commands over the tarball, collecting benchmark data along the way. The benchmark data is then saved in CSV files that can be reused in spreadsheet applications. The data is also used to retrieve charts from the Google Chart API, and finally the ImageMagick tool convert is called to collect the charts inside a single image. The summary displayed on the standard output is also saved in a text file.
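
For readers who want the mechanics without reading the 300-line script, here is a minimal shell sketch of the same measurement loop. It is illustrative only: the real script is Ruby, it also records memory usage via /proc, and the use of /dev/shm as a mounted tmpfs is an assumption about your system.

#!/bin/sh
# Pack the target directory into a tarball on tmpfs to avoid disk latencies.
SRC=$1
TARBALL=/dev/shm/bench.tar
tar -cf "$TARBALL" "$SRC"

echo "tool,seconds,bytes" > results.csv
for tool in "lzop -9" "gzip -9" "bzip2 -9" "xz -9"; do
        start=`date +%s`
        $tool -c "$TARBALL" > /dev/shm/bench.out   # unquoted $tool splits into command + level
        end=`date +%s`
        size=`wc -c < /dev/shm/bench.out`
        echo "$tool,`expr $end - $start`,$size" >> results.csv
done
rm -f /dev/shm/bench.out "$TARBALL"

Wall-clock seconds from date are coarse; the real script uses finer-grained timers, but the shape of the measurement is the same.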

The script is a bit too long to paste here (more or less 300 lines), so you can download it from my workstation. If the link doesn't work, make sure the web browser doesn't encode ~ (e.g. to "%257E"); I've seen this happen with Safari (in my logs)! If you are really out of luck, it is available on Pastebin.

Benchmarks

The benchmarks cover three kinds of data: compressed media files, raw media files (image and sound; remember that the compression is lossless), and text files from an open source project.
Media Files
Does it make sense at all to compress already compressed data? Obviously not! Let's take a look at what happens anyway.


As you can see, the compression tools that focus on speed don't fail: they still do the job quickly while gaining a few hundred kilobytes. The other tools, however, simply waste a lot of time for no gain at all.

So always make sure to use a backup application without compression for media files, or the CPU will be heating up for nothing.
Raw Media Files
Will it make sense to compress raw data? Not really. Here are the results:


There is some gain in the order of megabytes now, but the process is still the same, and for that reason it is simply unsuitable. For media files there are existing formats that compress the data losslessly with a higher ratio and a lot faster.

Let's compare lossless compression of a sound file. The initial WAV source file has a size of 44MB and lasts 4m20s. Compressing this file with xz takes about 90s, which is very long, and it only reduces the size to 36MB. Now if you choose FLAC, a format designed for lossless audio compression, you get a record: the file is compressed in about 5s to a size of 24MB! The good thing about FLAC is that media players decode it at negligible CPU cost.
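
Reproducing that comparison is a two-liner (the file name is hypothetical; the timings and sizes are the ones measured above and will differ on other hardware):

$ time xz -k recording.wav    # about 90s here, produces recording.wav.xz at ~36MB
$ time flac recording.wav     # about 5s here, produces recording.flac at ~24MB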

The same goes for images, but I lack knowledge about photo formats, so your mileage may vary. Anyway, apart from the Windows bitmap format, you won't generally find images stored uncompressed, just as you won't find videos uncompressed... TIFF and RAW are the formats provided by many reflex cameras; they have lossless compression capabilities and carry a lot of information about image colors and so on, which makes them the perfect formats for photographers, as the photo itself doesn't contain any modifications. You can also choose the PNG format, but only for simple images.
Text Files
We finally get to the point where we can compare interesting results. Here we are compressing the kind of data that is most commonly distributed over the Internet.


Lzop and Gzip perform fast and have a good ratio. Bzip2 has a better ratio, and both the LZMA and LZMA2 algorithms do even better. Whether we use an initial archive of 10MB, 100MB, or 400MB, the charts always look like the one above. When choosing a compression format you get either good compression or speed, but definitely never both; you must choose between these two constraints.

Conclusion

I had never heard of the LZO format before I wanted to write this blog post. It looks like a good choice for end devices where CPU cost is crucial. The compression is always extremely fast, even for gigabytes of data, with a fairly good ratio. Gzip, the most widespread compression format, works much like Lzop in that it focuses by default on speed with good compression. But it can't beat Lzop in speed: even when compressing at level 1 it is slower by a matter of seconds, although it beats Lzop on the final size. When compressing with Lzop at level 9, the speed gets ridiculously slow and the final size still doesn't beat Gzip at its default level, where Gzip does the job faster anyway.
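
As a quick illustration of that trade-off (file name hypothetical; -1 is the fastest level for both tools, and gzip's default is -6):

$ time lzop -1 -c data.tar > data-1.tar.lzo   # fastest of all, largest result
$ time gzip -1 -c data.tar > data-1.tar.gz    # seconds behind lzop, smaller result
$ time lzop -9 -c data.tar > data-9.tar.lzo   # very slow, still larger than gzip's default
$ time gzip -c data.tar > data-6.tar.gz       # gzip default level (-6)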

Bzip2 sits between LZMA and Gzip. It is widely used as a default format nowadays because it beats Gzip in terms of compression ratio. It is of course slower at compression, but what really stands out is the decompression time: it is the worst of all the tools, in every case.

LZMA and LZMA2 behave almost identically. Unlike the other formats they use dynamic memory allocation: the larger the input data, the more memory is allocated. We can see that LZMA2, the evolution of LZMA, uses less memory but on the other hand costs more CPU time. Both have excellent decompression times, although Lzop and Gzip hold the best scores; then again, you can't have both an excellent compression ratio and an excellent compression time. The difference in compression ratio between the two formats is in the order of hundreds of kilobytes; after all, it is an evolution and not a revolution.

On a last note, I ran the benchmarks on an Intel Atom N270, which has two cores at 1.6GHz, but I made sure to run the compression tools on only one core.


Searching the desktop with Pinot and Catfish

  • September 2, 2010
  • Josh Saddler

I was looking around for desktop search frameworks today, specifically something with a gtk frontend that requires the fewest resources to run.

I discovered Pinot, a dbus-based file index/monitor/search tool. It even comes with a minimal gtk+ interface. I found few reviews on Pinot, and even fewer recent reviews comparing it to other search frameworks like Strigi, Tracker, and Beagle. I also discovered Catfish, a lightweight frontend to several different search services. There's not much out there on integrating Catfish and Pinot, so I forged ahead and wrote my own code, then did some trial-and-error experiments.

All ebuilds are available on my overnight overlay. Instructions for adding the overlay are on the wiki.

Writing the ebuilds

The only ebuild I found for Pinot is sadly out-of-date, and is completely incorrect. Also, it depends on libtextcat, and I never found an ebuild for that.

So, I wrote my own ebuilds for the latest versions of Pinot and libtextcat.

Not content with Pinot's minimal gtk+ interface, I decided to try Catfish, a PyGtk frontend for several different search engines, including Pinot. Catfish is made by the developer of Midori, a well-respected lightweight WebKit browser. While Catfish's development has been stalled for two years, I figured it was worth a shot, since its user interface is friendlier than Pinot's.

Catfish, like Pinot and libtextcat, is not in Portage, but there is an open bug for its inclusion. However, the ebuild for the latest version needed updating, as it didn't include Strigi or Pinot. So I rewrote it and added descriptive metadata.xml entries for Catfish's and Pinot's USE flags.

There's still a bit of work left on the Catfish ebuild, since there's a QA warning about not leaving precompiled Python object files in /usr/share/catfish. However, the application itself works perfectly. Just need to clean up the install process so that the bytecode doesn't clutter up the filesystem.

Pinot

On first run, Pinot will take a long time to index your files. I pointed it at my user's /home/ directory, which contains 51,000+ files, totaling 9.3GB on a Reiser3 filesystem with tail enabled. That operation took probably half an hour, and that's on a fast SSD! All of Pinot's indexes and databases take up 455MB, bringing my total /home/ usage to about 9.7GB. Pinot typically used about 50% of my CPU while doing so, sometimes dropping down to the 20s and 40s.

However, since Pinot is on a fast SSD, and it's running off a 2.3GHz dual-core Athlon backed by 4GB RAM, I didn't notice any performance hit while indexing. I'm not running any special kernels or schedulers (like BFS) either; just vanilla-source-2.6.35.4. There was no noticeable lag or slowdown, despite viewing two Thunar windows, working with four terminals, and browsing nine Firefox tabs. My system was only laggy when compiling Pinot and its dependencies.

Once my /home/ was indexed, I searched around. Queries were pretty much instantaneous. There's no easy way to measure the speed of each query, since it's much too fast to time with a stopwatch. That's probably mostly because of the SSD -- as it is, without a desktop indexer/search app, most similar queries take less than a second. Once the initial filesystem index is complete, Pinot drops back to just monitoring directories if you've told it to do so, relying on the inotify feature in the kernel. That drops CPU and memory usage to zero, as near as I can tell. Nice!
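
Two generic Linux checks are handy here; the first needs a kernel built with IKCONFIG, and raising the watch limit is only an assumption of what a very large home directory might require:

$ zgrep INOTIFY /proc/config.gz            # confirm inotify support in the running kernel
$ sysctl fs.inotify.max_user_watches       # raise this if a big tree exhausts the default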

Pinot's greatest advantage on my system, at least, is not its speed, but its usefulness for easily finding deeply buried files and folders.

Interestingly, even though Pinot by default is not supposed to index Git, CVS, or SVN repositories, it seems to ignore that setting. Searching for "catfish" turns up a document named catfish tricks as well as all the ebuilds and git logs that have "catfish" in the title. Apparently Pinot's regex filter isn't very reliable. I probably need to add another asterisk to disable searching or indexing of any files within a git directory.

Catfish

Catfish mostly works as expected, though it defaults to using "find" rather than "pinot" as its search engine. I haven't yet found a way to set it to use Pinot as the default search provider. Catfish is quick to load, and its layout is fairly intuitive. Sometimes, however, it will just stop working with Pinot: even though Pinot has indexed my entire home directory, Catfish won't return any search results, though I can get those results by using Pinot's own interface. The rest of the time it works great.

Besides offering a friendlier UI for searches, Catfish's real strengths are its useful options, both for presentation and for tying in with my desktop's filemanager. With a couple of commandline switches, Catfish can display thumbnails of various filetypes, use larger icons in search results, use various wrappers for opening and working with files, or even use powerful regex search methods. No, it won't have the awesome preview capabilities of Gloobus, but you also don't have to install all of Gnome to get similar features.

Right out of the box, Catfish will allow you to open files and folders obtained from your search results just by clicking them. I don't know if that works for all filemanagers, but it works with Thunar, which is all I ask.

I like to use Catfish in combination with another powerful feature of Thunar: custom actions. Since Thunar lacks a built-in search bar (aside from a rudimentary go-to alphabetical list when you press a key), how do you integrate a search utility? One way is by adding search functions to the right-click menu.

  1. Open a Thunar window, and go to Edit -> Configure custom actions.
  2. Click the plus icon: +. Give the action a helpful title, description, and icon. "Search" is pretty standard among icon sets, so there should always be one available even when you change themes.
  3. Add the action command: catfish --path=%F
  4. Now go to the Appearance Conditions tab. I left the file pattern as * and checked all boxes, so that no matter where I browse or click, I can launch a Catfish search.
  5. Save the new action and exit Thunar. The next Thunar window you launch will let you right-click anywhere in the browser to open a Catfish search.

You can add any commandline switch you like to the catfish command; just run catfish --help to see the available options.
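
For illustration, here are a couple of invocations one might wire into such an action; treat the switch names as assumptions to verify against your Catfish version's catfish --help output:

$ catfish --fileman=thunar --path=/home/josh     # open results with Thunar, start in a hypothetical home
$ catfish --thumbnails --large-icons --path=%F   # the %F form only makes sense inside a Thunar custom action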

Thunar's custom action feature is pretty nifty; there are all kinds of things you can put in the context menu. It comes with an example to open a terminal in the current directory. You can create actions to launch applications with a root prompt, convert one image type into another, play media, print or email documents, and more. If you can script it, you can write a trigger for it and stick it in the context menu. Just read the custom actions documentation for many more examples of what you can do with Thunar. Neat!

Looking forward

So, will I keep using Pinot and Catfish? Possibly. While I am leery of any process like Pinot that writes so often to my SSD, and I'm not at all happy with its database size compared to my actual directory size, I do like that it's fast and responsive. It doesn't seem to have the huge memory leaks or lag that Strigi/Nepomuk have in KDE. In fairness, KDE is trying to get us to believe in the power of the "semantic desktop," while Pinot and Catfish just want to create an easy frontend for finding stuff, without worrying about associating them with various files or activities.

As long as the database doesn't get too much larger, or the indexing/monitoring services use too many resources, I'll keep it around. I've got five+ years of accumulated files in various folders, with more constantly being loaded to and from offline backups. Pinot and Catfish can help with my hard drive spring cleaning, and help me locate stuff that I've just plain forgotten about. The older you get, the less you remember, right?

What I'd really like is a search bar built-in to Thunar, maybe in the upper right corner, backed by Pinot. That'd place everything I need right up front, without having to drill down through right-click menus.

* * *

Speaking of Thunar:

Do you use Thunar? Do you use Dropbox? Xfce developer Mike Massonnet posted a message to the xfce-dev list this morning with a link to a new project: Thunar Dropbox. It integrates the Dropbox service right into your favorite lightweight filemanager. No longer do you have to run Nautilus just to use Dropbox easily. Now you can use it within Thunar.


August Xfce desktop

  • August 11, 2010
  • nightmorph

This month’s Xfce desktop:

Corona and rings

icons: awoken
gtk+: axiomd
xfwm4: axiomd
background: The Crown of the Sun
cursor: Obsidian xcursors

The uncluttered version that shows off the wallpaper and conky configuration:

The crown of the sun

I built my environment around the wallpaper, an image of a solar eclipse, bringing out the haunting beauty of the sun’s corona. I cropped this photo from APOD to fit my screen dimensions.

With such a beautiful cosmic backdrop, I had to search for matching theme elements. I used the same window manager and gtk+ theme, axiomd. It’s nice and dark, with moon dust highlights.

It’s been a long, long time since I last installed conky. I decided to give it another go, now that it’s capable of doing beautiful things with Cairo and Lua. I was especially impressed by this configuration I found on the Arch Linux forums.

I made a few modifications to the ring meter scripts for conky. The end result is pretty decent, considering I haven’t done much heavy tweaking yet. You’ll need to emerge conky with the lua-cairo and lua-imlib USE flags set, or else the scripts won’t function.
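
On Gentoo that boils down to something like the following (a sketch: app-admin/conky is the standard package atom, but double-check the exact USE flag names with emerge -pv conky):

# echo "app-admin/conky lua-cairo lua-imlib" >> /etc/portage/package.use
# emerge app-admin/conky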

The rings frame the corona, with just a touch of transparency to blend it into the deeper space backdrop. From left to right, the rings measure: CPU core 2 load, memory usage, /usr/portage, /, and CPU core 1 load. Adding, removing, shrinking, or expanding rings is pretty easy. The ring scripts are well-commented. The biggest obstacle I’ve run into so far is adapting the configs to my screen size, ensuring that items are placed just right. I could tweak the ring’s curvature to precisely match the eclipse, but it’s close enough as it is.

I picked up the icon set because it’s very attractive for both dark and light environments. It’s very flexible, with numerous alternative icon versions, extra standalone icons, many distribution logos, and a number of helpful scripts inside the tarball. I used one of the included Gentoo logos as my Xfce menu icon.

The mouse cursor theme is glossy and dark, yet it has a few blue animations to add a splash of color. To get it, run emerge obsidian-xcursors.

Applications

In the foreground, Decibel Audio Player is running in the “mini” mode, playing a beautiful track by Planet Boelex.

Thunar is the filemanager open in the background. An Xfce terminal displays an eix-sync operation.

Running in the panel is an assortment of application launchers, including customized dropdown menus for frequently used programs.

After the Xfce menu, launchers, and taskbar, the notification area holds the tray icon for Decibel Audio Player. Then a genmon applet that runs my lastsync.sh Portage script. After genmon, there are plugins for volume control, the Orage clock, and local weather.

Now that I’m using conky, I can probably find a way to integrate the weather, clock, and Portage sync script with the existing ring meters, or even run it in another instance off to the side. Anything to reduce my crowded top panel.


Documentation status report, part 2

  • August 2, 2010
  • Josh Saddler

Been meaning to provide a follow-up to the last documentation report for a few days now, as well as a couple other news items.

Gentoo in the press

LWN ran an article on Linux distributions for PowerPC machines. Gentoo gets the top mention.

Package maintenance

I had the treecleaner team remove a package I maintained, WhaawMP. I hadn't used it in a long time and was no longer interested in maintaining it. Upstream seemed to be dead, and there were several user interface bugs and crashers in daily use. Also, I didn't want to put in the work to make the ebuild comply with the stupid Python3 stabilization forced on all our users. Thanks to Jeremy for punting it. If you're looking for a lightweight video player alternative, please read the comment I left on the bug. bug 315067

Documentation status

Now, down to the docs work I've done, mostly on the 21st and 22nd, after the last status report. The biggest news is that I finished rewriting the handbooks for the autobuilds. In two days, I did four architecture handbooks. I put in some long hours, but it felt good to finally have them all done.

Handbook updates

  • Sparc: updated the handbook for the autobuilds. Also fixed the kernel config "conditionals" by adding in version strings to the handbook index code, so that the latest stable version magically appears in the guide. Truly XSL is an awesome thing. The former GDP lead once said that writing for the handbook is almost like programming it. The code is designed to take variables, drop them in place for given conditions, and to test for those conditions depending on the presence of other variables (which we call "keys"), which architecture you're viewing, etc, and then drop those variables in to the rendered page. Once the XSL framework is in place, though, maintaining the GuideXML in the handbooks is much easier. We just drop the newest variable for LiveCD ISO size into the appropriate arch index, and it shows up as "115 MB" in that handbook. You can see some of our keys and how we use them.
  • PPC: updated handbook for the autobuilds. bug 260403, bug 292726, bug 234310
  • PPC: fixed the abstracts in the index. There were a lot of abstracts in the toplevel index. Abstracts are supposed to be in each chapter, so that the index just picks them up and includes them in the rendered page. Our XSL is frickin' amazing.
  • PPC: removed the warning and kernel config for voluntary preemption. I asked the PPC team if this old warning was still valid, and it turns out that the preempt code in the kernel actually works okay. Thanks to Joe for investigating.
  • PPC64: updated handbook for the autobuilds. bug 260403, bug 292726, bug 234310
  • MIPS: updated handbook. MIPS still doesn't have weekly stages or LiveCDs. Because MIPS media dates back to 2008, there are some things I can't fix in the handbook, like using eselect for profile management. If it's not in the stages or CDs, I can't document it. The profiles in particular have been significantly reworked for 10.0, and like everything else, will require some heavy rewrites in the handbook. The team is aware of how ancient their releases are, and are working to put out new media for more recent MIPS chips. bug 260403, bug 292726, bug 234310
  • AMD64: fixed a broken link to the AMD64 FAQ

Desktop doc updates

  • Xfce guide: updated the firefox package name. I was watching #gentoo-commits and happened to notice that nirbheek changed the name from mozilla-firefox to just firefox.

Other doc updates

  • OpenRC migration: added a note on kernel module variables and how OpenRC assigns priority. bug 269349
  • vpnc guide: updated the kernel configuration and adjusted the GuideXML to match coding standards. Thanks to tanderson for reporting via IRC. Also changed the text on vpnc overwriting /etc/resolv.conf. Old versions didn't overwrite it, but recent releases do. bug 330345
  • Optimization guide: I updated the GCC documentation links to point at the 4.4 series, since it's been stable for awhile now. The links were pointing to the old 4.3 series.

Project page updates

Website updates

  • Where: removed the last reference to 2008.0 media, as the handbooks have all been switched to the autobuilds. Only HPPA still referred to the 2008.0 LiveCD, since that's the last available release. That information has been in the HPPA handbooks for a long time.
  • Contact: added another note saying that PR does not provide user support. We've been getting a lot of emails asking us for support, so I've been adding notes to our project page and the toplevel contact page.
  • Lists: updated the list of mailing lists with information on closed and inactive lists. Thanks to Jeremy for the patch. bug 291860


Documentation status report

  • July 20, 2010
  • Josh Saddler

I've been smashing documentation bugs left and right since getting back from vacation, as well as searching out old documents and project pages and fixing 'em up.

Most of the updates have been to the installation & Portage handbooks, but there are many changes to the other documentation, including the desktop guides for graphics cards, and my Xfce guide. There's even a new doc on Logcheck, written by one of our developers.

Here's a brief summary of what I've done in the last week:

New documentation:

Handbook updates:

  • Change ccache recommendation; it's really only for developers: bug 327945
  • Use layman rather than gensync for working with Portage overlays: bug 305047
  • Add another note on IA32 emulation in the kernel for (non-)multilib users: bug 326691
  • Fix file verification process for the Alpha, AMD64, ARM, HPPA, IA64, and x86 handbooks: bug 283402. This was an old one: when we went to the weekly media autobuilds, Release Engineering signed the files with a new GPG key and changed how the files were signed. All the handbooks needed to be updated, as they still had the old keys and instructions from the previous release.
  • Update installation instructions for the autobuilds. Completed Alpha, AMD64, ARM, HPPA, IA64, and x86: bug 283402, bug 292726, bug 260403. Still need to do PPC, PPC64, Sparc, and possibly MIPS, if they have sufficiently recent media.
  • Use -march=core2 for recent Intel EM64T chips, rather than the old -march=nocona. Fix MCE section of kernel config. Add new Atom processor type: bug 323381
  • Update Grub documentation links. Upstream removed all grub legacy instructions in favor of grub2, which won't be stable any time soon. Fixed the handbooks and other docs to use the offsite Grub Wiki: bug 328679
  • Fix a missing fstab. Gave ARM the same generic fstab example as the other arches: bug 328095
  • category/package move for chkrootkit

Desktop doc updates:

  • Xfce guide: Change USE flags for opera; no longer needs qt-static. bug 328087
  • nVidia guide: Use new driver installation methods. Add links to xorg-server guide to get X configured before dealing with nVidia-specific issues. Update kernel and module info. bug 307481
  • ATI FAQ: General cleanups. Add R800 (Evergreen) info. Remove old GATOS project text. Update Catalyst availability section.

Other doc updates:

  • FAQ: Update Grub documentation link. Update gcc -march info for x86 and AMD64. Fix internal GuideXML code. bug 328679
  • Quickinstall guides (x86, LVM2+RAID): Fix ccache recommendation. bug 327945
  • LDAP guide: use more recent 2.3 configuration file shipped with the ebuild. bug 325497
  • SHOUTcast guide: Miscellaneous typo fixes. bug 323401
  • IPv6 guide: update net-dns/totd info now that it's stable. Fix GuideXML and minor text issues throughout. bug 326771. This doc presents an ongoing problem, because it recommends a package it shouldn't. I sent an email to the gentoo-dev mailing list asking for help with this one.
  • AMD64 FAQ: Update Flash installation info. Adobe decided to drop 64-bit versions (again) beginning in version 10.1, and our developers had to mask 10.0 for security reasons. This means that there is no Adobe Flash for non-multilib profile users. And nspluginwrapper is (once again) too unstable, so 32-bit Flash with a 64-bit browser is not recommended. Probably will have to install firefox-bin or some other 32-bit browser. Stupid Adobe.
  • UTF-8 guide: Fix wrong category for the Xfce terminal, leftover from when it was moved out of xfce-extra. bug 328977
  • Fix metadoc index for retired developers and add logcheck guide entry

Project page updates:

  • Overlays userguide: Extensive GuideXML, grammar, etc. rewrites to make the guide more readable and more helpful. Add more instructions for things like keywording packages per the Portage handbook. Add SCM homepage links. This series of updates was prompted by bug 305047, the gensync to layman change.
  • GUIs: Update retired developers
  • PR: Add note stating that PR does not offer user support, and list available support resources. Hopefully this will cut down on the amount of support requests the PR team receives in our inbox every month.

Website updates:

One of my fellow developers, jkt, has been helping out a bit in the last couple of weeks, closing bug 301840 and bug 325885. This was especially important when I was on vacation and then out sick. I'm always happy when someone besides me steps up and gets our docs into shape. Thanks, Jan!

So that's about it. There are still plenty of open documentation bugs, but the list has shrunk significantly. My biggest project now is to finish the rest of the handbooks for the weekly autobuild instructions. The rest of our open bugs will require just as many hours and days to fix, as large portions of our handbooks and guides will need to be rewritten. Hopefully I can at least get the autobuild updates done in the next few days.