Codingfulness


Tune your PostgreSQL and get 10x faster tests

For me, the biggest annoyance when running tests is how long they take to finish. In a customer’s project we are using RSpec with PostgreSQL, and I noticed that the hard disk was especially noisy while the tests were running. Curious about what was accessing the disk so much, I used pidstat to find out.

$ sudo pidstat -d 3 -T TASK
Linux 3.2.0-2-amd64 (silencio)  12/09/12  _x86_64_  (4 CPU)

01:29:21          PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
01:29:24          222      0,00      1,33      0,00  jbd2/sda5-8
01:29:24         1562      0,00     40,00      0,00  jbd2/dm-7-8
01:29:24         4203      0,00      4,00      0,00  gnome-terminal
01:29:24         6373      0,00     41,33      0,00  postgres
01:29:24         6376      0,00     10,67      0,00  postgres
01:29:24         6378      0,00    165,33      0,00  postgres
01:29:24         6579      0,00      8,00      0,00  ruby
01:29:24         6612      0,00    480,00     72,00  postgres

01:29:24          PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
01:29:27         1562      0,00     34,67      0,00  jbd2/dm-7-8
01:29:27         4203      0,00      1,33      0,00  gnome-terminal
01:29:27         6373      0,00     41,33      0,00  postgres
01:29:27         6376      0,00     13,33      0,00  postgres
01:29:27         6378      0,00    165,33      0,00  postgres
01:29:27         6612      0,00    226,67      0,00  postgres

The output is longer, but this fragment is enough to see that PostgreSQL is doing a lot of I/O. If you take a closer look, you will notice that all of it is writing (the kB_rd/s column is always 0,00). We can assume that (almost) everything needed to run the queries is cached in memory, so no reads are needed.
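Reading the columns by eye is enough here, but the same conclusion can be reached by adding up the write rate per command. The helper below is a sketch under assumptions, not part of the original setup: sum_writes_by_command is a hypothetical name, it expects the six-column pidstat -d layout shown above, and it converts the comma decimal separator (from the locale used in that output) before summing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add up the kB_wr/s column of saved `pidstat -d`
# output per command, highest first. Assumes the 6-column layout shown
# above: time, PID, kB_rd/s, kB_wr/s, kB_ccwr/s, Command.
sum_writes_by_command() {
  LC_ALL=C awk '
    $2 == "PID" { next }            # skip the repeated header lines
    $1 == "Average:" { next }       # skip the summary printed on exit
    NF == 6 {
      gsub(",", ".", $4)            # "480,00" (comma locale) -> "480.00"
      writes[$6] += $4              # accumulate kB_wr/s per command
    }
    END {
      for (cmd in writes) printf "%s %.2f\n", cmd, writes[cmd]
    }
  ' "$@" | LC_ALL=C sort -k2,2nr
}
```

For example, run pidstat -d 3 -T TASK > io.log during the test suite and then sum_writes_by_command io.log. Because it adds rates across all sampling intervals, the totals are only useful for ranking commands against each other, not as real throughput figures.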

I wondered how I could reduce the number of writes, since that should improve performance. I decided to play with the configuration options for the PostgreSQL WAL (write-ahead log) and memory. In Debian, these options can be modified in /etc/postgresql/9.1/main/postgresql.conf.

The changes are:

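work_mem = 100MB          # memory per sort/hash operation before spilling to temporary files
fsync = off               # do not force writes to disk; unsafe for real data
wal_writer_delay = 10000  # milliseconds (10 s) between WAL writer activity rounds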

With these three options I ran the tests again, and I was impressed by the results. Previously, the tests took more than 22 minutes to finish:

Finished in 22 minutes 18.48 seconds
146 examples, 0 failures

Now they finish in under 3 minutes, almost 8x faster (173 seconds instead of 1338):

Finished in 2 minutes 53.25 seconds
146 examples, 0 failures

Obviously, your mileage may vary: the difference will be smaller if your disk is fast. Even so, these settings should reduce the time needed to run the tests.

Don’t do this in production

The default values are the right ones for a production environment. What we are doing here is reducing disk access at the expense of durability, which is not a good idea if you are managing real data: with fsync off, a crash or power failure can leave the database corrupted. In a test/development environment, we can lose data and still be alive.

A better console with autols

Some years ago (on November 10th, 2006) I wrote one of my favourite console hacks.

At that time I was digging into Bash and discovered the PROMPT_COMMAND variable, which defines a command that is run before the prompt is shown. I thought it might be useful to list the files in a directory when entering it, so I would not have to repeat the cd, ls sequence so often. There are some constraints:

  • It has to be fast. I use the console a lot and don’t like waiting for the prompt to be ready.
  • The content should be limited. If a directory has hundreds or thousands of files it should not list them.
  • I would like to have a little header with a summary of the content (total size, number of directories, etc.).
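To make the constraints above concrete, here is a minimal Bash sketch of the idea. It is not the real autols (which is a C program with its own options); mini_autols and its MAX argument are hypothetical. It prints a one-line summary and lists entries only when there are few of them. Total size is left out on purpose, since getting it would need a stat call per file, which goes against the "it has to be fast" constraint.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the constraints above, not the real autols:
# print a summary of the current directory and list the entries only
# if there are at most MAX of them (default 20).
mini_autols() {
  local max=${1:-20} dirs=0 files=0 entry
  local -a entries=()
  for entry in *; do               # like ls, hidden files are skipped
    # with no matches the glob stays a literal "*": ignore it
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    entries+=("$entry")
    if [ -d "$entry" ]; then
      dirs=$((dirs + 1))
    else
      files=$((files + 1))
    fi
  done
  echo "directories: $dirs, files: $files"
  if [ "${#entries[@]}" -gt "$max" ]; then
    echo "(${#entries[@]} entries, not listed)"
  elif [ "${#entries[@]}" -gt 0 ]; then
    printf '  %s\n' "${entries[@]}"
  fi
}
```

Everything inside the loop uses shell built-ins, so no process is forked per entry; the only cost grows with the glob expansion itself.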

After a few minutes I had coded a dirty C program that did everything, with a little Bash support. Something like this:

I’ve been really enjoying this for years, and some weeks ago I realized that almost nobody knows this hack. Last Friday, in a technical meeting at Aentos, I decided to release it to anyone who is interested. I refactored the code (mostly to accept arguments, since I had hardcoded my own options) and published it on GitHub.

The project is available at https://github.com/ayosec/autols. Enjoy it!

How to detect a directory change

An interesting part of this hack is how to detect when the shell has moved to a different directory. In Zsh it is very easy, since it provides the chpwd hook function. With the add-zsh-hook function the code is even easier.

autoload add-zsh-hook

launch_autols() {
  eval autols $AUTOLS_OPTIONS
}

add-zsh-hook chpwd launch_autols

Bash has nothing like a directory change hook. RVM replaces the cd command with a custom function to detect directory changes, but that approach is not useful for me because I use many ways to change directory: autocd, pushd/popd, the awesome cdargs tool and even the CDPATH variable (yes, I use all of them). None of these methods will invoke the cd function. The best way (although not the nicest) to detect a new directory is to check the current directory every time the prompt is shown.

The result is a little uglier than the Zsh equivalent, but it works well.

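function launch_autols {
  # Run autols only when the directory changed since the last prompt.
  # Quoting keeps paths with spaces from breaking the test.
  if [ "$AUTOLS_LAST_DIR" != "$PWD" ]
  then
    AUTOLS_LAST_DIR=$PWD
    eval autols $AUTOLS_OPTIONS
  fi
}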

AUTOLS_LAST_DIR=$PWD
PROMPT_COMMAND="launch_autols;$PROMPT_COMMAND"
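Why a replacement cd function is not enough can be checked directly with pushd. The sketch below defines a cd wrapper in the spirit of the approach described above (a hypothetical stand-in, not RVM's actual code) and shows that pushd changes the directory without ever calling it, because pushd is a builtin that performs the change itself.

```bash
#!/usr/bin/env bash
# Hypothetical cd wrapper that reports every directory change it sees.
cd() {
  builtin cd "$@" && echo "hook saw: $PWD"
}

cd /tmp              # the wrapper runs and prints its message
pushd / > /dev/null  # no message: pushd does not go through the cd function
echo "now in: $PWD"  # ...but the directory did change
```

This is the reason the PROMPT_COMMAND version compares $PWD instead of hooking cd: the comparison catches the change no matter which command caused it.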

Was thinking about Java

I’ve just come across an article on Twitter listing 9 reasons to use Java. There are some good points, but the author didn’t say anything about the language itself, only about its ecosystem.

I don’t hate Java, although many people do. I believe that many of them hate Java just because a lot of people around them do. Indeed, in some communities, those who don’t hate Java are seen as weirdos.

I see Java as a very powerful language, but with an overly verbose syntax that is hard to write. If you usually code in languages like Ruby or Python, some Java snippets can scare you.

For example, take a look at this Ruby code:

Now, something similar in Java (they are two files):

Maybe you are thinking that this is madness. What about a C version?

Form your own opinion.

The Java Virtual Machine

IMO, the strongest point of Java is its virtual machine. The JVM is one of the best VMs that you can use right now. Fortunately, nowadays there are a lot of languages that run on the JVM. My favorite is Mirah.

Look at this:

This code compiles to pure Java. There are no dirty tricks, just a lot of work done by the really smart Mirah compiler.

(Updated)

Esteban Manchado wrote the Scala version of this example. Enjoy it!

A little reason for CoffeeScript

In Backbone.js, instead of writing

var titles = Books.map(function(book) {
  return book.get("title");
});

Just do

titles = Books.map (book) -> book.get "title"

Inline images in Twitter

Some Twitter clients, like Echofon and many others, can show images inside the tweet, something we can call inlined images. Some people find this awesome, others find it noisy. I like the feature, although I think being fast matters more when using Twitter. After some time trying a lot of clients (on desktop and mobile devices) I ended up using Twicca on my Android devices and the web version, in Firefox, on the desktop.

Twicca has no “inlined images”, and its interface is not as glamorous as other clients, but it is really powerful once you have been using it for a few days. The web version is functional enough most of the time, but it would be better with a few features, like showing the images.

A few weeks ago I discovered the dotjs add-on for Firefox, which is a port of the original dotjs (only available for Google Chrome). dotjs is something like a “GreaseMonkey for fast and little hacks”. Fast because you only need to create a ~/.js/domain.tld.js file (with your code editor, without using the browser interface) and put in it the JavaScript code to run every time you open a page in domain.tld or www.domain.tld. Little because you are limited to the things you can normally do in the page itself (I have missed GM_xmlhttpRequest for cross-domain requests), and there is nothing like a domain manager, so you cannot use patterns like “*.foo.bar” or have several files for the same domain (like hack1.google.com.js and hack2.google.com.js). Anyway, it is perfect for many cases.
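The one-file-per-domain rule described above can be illustrated with a tiny shell function that computes which file would apply to a given host. It only mirrors the behaviour as described in this post (a leading www. is dropped, no wildcards); dotjs_file_for_host is a hypothetical name, and this is not the add-on's actual code.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the lookup described above: one file per
# domain, with a leading "www." dropped so both hosts share one script.
dotjs_file_for_host() {
  local host=${1#www.}
  printf '%s/.js/%s.js\n' "$HOME" "$host"
}

dotjs_file_for_host www.twitter.com   # same file as for twitter.com
dotjs_file_for_host api.twitter.com   # a subdomain gets its own file
```

The second call shows the limitation mentioned above: with no patterns, api.twitter.com does not pick up the twitter.com script.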

I wrote an S³E (simple, silly and stupid extension) using dotjs to open YouTube videos in their own window. That was fun and very easy.

Yesterday I read an interesting post from Esteban Manchado. When I took a look at his Opera extension I thought that it would be nice to implement the inline images feature in the web version. After all, Twitter already shows images when you click on a tweet and open the sidebar, so it should be easy.

The first step was to find out how Twitter gets the image preview. I opened Firebug and clicked on a tweet with a link to an external image from a supported provider (like twitpic, yfrog or whatever). When you open it you can see several requests to the API. If you open those URLs you will see a lot of data, but none of it is useful to find the location of the image preview. So I guessed that the answer would be in the JavaScript code. There are a couple of files, and one of them (named phoenix.bundle.js) contains the strings “yfrog” and “twitpic”. Since the code is obfuscated I pasted it into jsbeautifier.org. After some minutes digging into the de-obfuscated code I found a way to reuse that logic. Basically, it is:

twttr.media.resolveImageUrl(url, 300, {
 success: function(image_preview_url) { /* receives the URL of the preview image */ },
 error: function(reason) { /* the URL could not be resolved */ }
});

The twttr.media object is not directly accessible from a dotjs file. As in GreaseMonkey, I had to go through unsafeWindow.

The result is in https://gist.github.com/1073923.

Enjoy!