Tim Retout's www presence

Thu, 03 Jan 2013

New Year

Another year. 2012 was busy - I moved house twice, changed jobs, and got married. In 2013, I should become a father, fingers crossed (due mid-April). Change is a familiar friend now.

I just listened to Tom Armitage speaking about coding on Radio 4 - I /think/ the podcast mp3 link will work for people outside the UK, but the iPlayer probably won't. If you can get hold of it, it's worth the 20 minutes of your time.

If I had to make a New Year's resolution, it would be to listen to more Radio 4 - there's such a lot of it, though. I'm going to try subscribing to some of their podcasts and listening to them on my commute - timeshifting some of the best bits. Might work.

Posted: 03 Jan 2013 23:09 | Tags: , ,

Fri, 21 Dec 2012

Perl Forking, Reference Counting and Copy-on-Write

I have been dealing with an interesting forking issue at work. It happens to involve Perl, but don't let that put you off.

So, suppose you need to perform an I/O-bound task that is eminently parallelizable (in our case, generating and sending lots of emails). You have learnt from previous such attempts, and broken out Parallel::Iterator from CPAN to give you easy fork()ing goodness. Forking can be very memory-efficient, at least under the Linux kernel, because pages are shared between the parent and the children via a copy-on-write system.

Further suppose that you want to generate and share a large data structure between the children, so that you can iterate over it. Copy-on-write pages should be cheap, right?

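use Parallel::Iterator qw( iterate );

# Build the large structure once in the parent, before any fork().
my $large_array_ref = get_data();

my $iter = iterate( sub {
    # The worker receives ( $index, $value ); the value here is the index to look up.
    my $i = $_[1];
    # Look up the element for this job.
    my $element = $large_array_ref->[$i];

    ...
}, [0..1000000] );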

Sadly, when you run your program, it gobbles up memory until the OOM killer steps in.

Our first problem was that the system malloc implementation was less good for this particular task than Perl's built-in malloc. Not a problem, we were using perlbrew anyway, so a quick few experimental rebuilds later and this was solved.

More interesting was the slow, 60MB/s leak that we saw after that. There were no circular references, and everything was going out of scope at the end of the function, so what was happening?

Recall that Perl uses reference counting to track memory allocation, and that each value's count is stored alongside the value itself. In the children, because we took a reference to an element of the large shared data structure, we incremented that element's reference count, which is a write to the relevant page in memory, so the page would get copied. Over time, as we iterated through the entire structure, the children would end up copying almost every page! This would double our memory costs. (We confirmed the diagnosis using 'smem', incidentally. Very useful.)
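
The effect can be seen in miniature with the core B module, which can report a value's reference count. This is only a sketch: the one-element $data array and the refcount() helper are made up for illustration, standing in for the real shared structure.

```perl
use strict;
use warnings;
use B ();

# Hypothetical stand-in for one element of the large shared structure.
my $data = [ { id => 1, foo => 'bar' } ];

# Report the reference count of whatever the given reference points at.
# @_ aliases its arguments, so calling this does not itself copy the reference.
sub refcount { B::svref_2object( $_[0] )->REFCNT }

my $before = refcount( $data->[0] );

# Storing the element in a lexical copies the reference, which
# increments the count held inside the hash itself: a memory write.
my $element = $data->[0];

my $after = refcount( $data->[0] );

print "before=$before after=$after\n";
```

In a forked child, that increment is enough to dirty the page holding the hash, even though the program never modifies the data.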

The copy-on-write semantics of fork() do not play well with reference-counted interpreted languages such as Perl or CPython. Apparently a similar issue occurs with some mark-and-sweep garbage-collection implementations - but Ruby 2.0 is reputed to be COW-friendly.

All was not lost, however - we just needed to avoid taking any references! Implement a deep copy that does not involve saving any intermediate variables along the way. This can be a bit long-winded, but it works.

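use Parallel::Iterator qw( iterate );

my $large_array_ref = get_data();

my $iter = iterate( sub {
    my $i = $_[1];
    my %clone;

    # Copy each field straight out of the shared structure, without
    # storing a reference to the element in a variable.
    $clone{id}  = $large_array_ref->[$i]{id};
    $clone{foo} = $large_array_ref->[$i]{foo};
    ...
}, [0..1000000] );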
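
As a rough check of the field-by-field approach, the core B module can compare an element's reference count before and after the copy. The $data array and refcount() helper below are hypothetical, and the check only looks at the count once the copy has finished; it says nothing about what the interpreter does internally along the way.

```perl
use strict;
use warnings;
use B ();

# Hypothetical stand-in for one element of the large shared structure.
my $data = [ { id => 42, foo => 'bar' } ];

# Report the reference count of whatever the given reference points at.
sub refcount { B::svref_2object( $_[0] )->REFCNT }

my $before = refcount( $data->[0] );

# Copy the fields directly; no variable ever holds a reference to the element.
my %clone;
$clone{id}  = $data->[0]{id};
$clone{foo} = $data->[0]{foo};

my $after = refcount( $data->[0] );

print "before=$before after=$after\n";
```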

This could be improved if we wrote an XS CPAN module that cloned data structures without incrementing any reference counts - I presume this is possible. We tried the most common deep-copy modules from CPAN, but have not yet found one that avoids reference counting.

This same problem almost certainly shows up when using the Apache prefork MPM and mod_perl - even read-only global variables can become unshared.

I would be very interested to learn of any other approaches people have found to solve this sort of problem - do email me.

Posted: 21 Dec 2012 22:38 | Tags:

Sat, 27 Oct 2012

Recruiting

On Monday, I need to start hiring a Perl programmer - or, at least, a programmer willing to write Perl. I work for a website where people post their CVs, which tends to help - although this will mean that my boss wants me to do it without going through recruiters. Which is fine. I just have to use the search interface that recruiters normally use.

And looking through all these CVs, it dawned on me that I don't have a clue whether any of the people are suitable for the job. I have to search for keywords that we think might be relevant - "Perl", I guess - and then sort through the hundreds of people who come back from the search. It's very painful, because you can't really judge a CV without reading it - and even that won't necessarily tell you the important things about that person. Do they actually write good code? Do they work well in a team?

When searching for a piece of information, you probably need just one website to answer your question; when searching for job candidates, I guess you need to see a range of CVs. And then you need to interview them; this could take weeks.

Sucks to be me.

Posted: 27 Oct 2012 17:21 | Tags:

Sat, 06 Oct 2012

Wedding

Today, Kate and I got married!

Thank you to everyone who sent best wishes. A big wedding party will follow in the next 18 months or so (when we've saved some money!), to which many more people will be invited. This was the minimum viable subset of wedding - we got the product to market early, and both stakeholders are very satisfied.

We had dinner at the Caribbean restaurant in town which is always busy - turns out there's a reason for that. They do these one-pot meals in enamel dishes, which tasted amazing.

It is a strange new feeling to be a husband. :)

Posted: 06 Oct 2012 21:00 | Tags:

Tue, 02 Oct 2012

The Library - a challenge

I visited my local library at the weekend, on a whim. (The weekend before, I'd been to the British Library for the first time, so I guess this inspired me.)

The computing sections at public libraries do not tend to inspire me, on the whole. Southampton's is actually relatively good - there are four tall bookcases assigned to computing, although one is introductory IT (Word and Excel), and another seems to be assigned to graphic design (Photoshop). Still, the two in the middle do have some programming texts - you just have to work around "Red Hat Linux 7 for Dummies" and the like.

I think the trouble is, computing books become obsolete so quickly; you don't get point releases of Shakespeare every six months. I also suspect the low demand for serious computing texts creates a vicious circle, where the type of people who might want that sort of thing know better than to look in the library for it.

It got me thinking - could this be changed? Could I change it?

Posted: 02 Oct 2012 20:57 | Tags:

Enscript 1.6.6

The other day, I released GNU Enscript 1.6.6. You should all go and send me bug reports.

It's basically the same as the 1.6.5.90 release, but more official. (I'm bored of the long version numbers - maybe I ought to knock a decimal point off.)

Posted: 02 Oct 2012 20:28 | Tags: ,

Thu, 13 Sep 2012

Inbox zero

My email's been out of control for a while now. I've noticed a correlation between the state of my email and my state of mind - I don't know which way the causation flows, if any.

At the weekend (after assembling the shelves), I archived my entire inbox. Again. But I find the hard part with email bankruptcy is preventing the entire cycle recurring. This time, I was more drastic.

I deleted all my labels, and all but three of my filters. I removed the Smart Labels, the chat widget, the calendar integration, the Google+ circles (more on that soon). There are no distractions in my inbox any more. I'm forcing myself to process every non-spam email that comes to my address.

(Yes, I'm embarrassed to say that I use Gmail - it seemed like a good idea at the time, because of the spam filtering. I occasionally notice bugs with ancient emails becoming unarchived or unread, and wonder if Google uses a probabilistic data structure to access all those attributes quickly enough. Can't investigate, though, because it's Gmail. Consider this the first step towards moving away.)

So rather than hiding the deluge in the background, I'm unsubscribing from a lot of mailing lists, and informing marketers that I don't want their communications. It turns out that I get a lot less email than I feared - and processing them this way means that high-traffic sources get removed first.

My remaining problem is what to do with email from mailing lists like debian-private - I don't really care enough to read everyone's vacation plans, but unsubscribing doesn't seem like the right thing either - there's (obviously) no online archive for this type of mailing list. For the moment, I've added a filter back to hide just vacation messages, but there's a more general problem lurking here.

Posted: 13 Sep 2012 21:33 | Tags:

Mon, 10 Sep 2012

Shelves

Bookshelves are a wonderful thing.

Kate and I have been living without enough book space since we moved in together - all our shelves have been double-stacked. Finding anything is a pain, because it's impossible to tell if you still own the book you're looking for.

No longer. One short visit to Ikea (plus delivery and assembly), and we have a huge 5x5 Expedit bookshelf dividing our living area. It has comfortably absorbed our entire collection (plus boxes) - we have double-stacked it, but can access both sides. Books are grouped by subject matter, so can actually be found. (We have a huge O'Reilly collection, but gave up sorting by publisher when we discovered various non-computing books in there...)

Now I can sit on the sofa, and have (paper) reading material within arm's reach. It's good to have something more to our living space than a television and a pair of laptops. As an added bonus, you can't see the washing up from the living room any more.

Screw e-books.

Posted: 10 Sep 2012 21:48 | Tags:

Wed, 06 Jun 2012

NMUs on the go

Today, as an experiment, I attempted to fix a Debian bug while on the train to work.

I use a 3G card from Three.co.uk in my Lenovo Thinkpad x121e, and my commute is from Southampton Central to Fleet (changing at Winchester) - just under an hour. 3G coverage is not 100%, but tends to be better around the major stops.

  • First, I found a bug. I used udd.debian.org to browse for a relatively simple RC bug, and found bug #674992 in actionaz. The fix was outlined in the report already, so there was very little thinking required.
  • Next, I confirmed the FTBFS using cowbuilder. Unfortunately, this required downloading roughly 120MB of dependencies - I have 1GB of data per month, but I couldn't afford to do this every day. I was lucky in that I was near Basingstoke at the time, so had a good HSDPA signal to get the bulk of this. The build had failed before I reached Fleet.
  • In the background, I updated debian/control and debian/changelog with the fix. I was able to set off the build, but had to suspend the laptop until lunchtime before it could finish. Cowbuilder needed to download only a few extra build-deps, as the vast majority were cached from the initial run.
  • On the train home, I checked over the result, signed it and uploaded. In this instance, the built package was small enough to upload, but I could see this being a problem with others.
  • Finally, I sent the nmudiff, although that was delayed briefly by a drop in connectivity before Southampton Airport.

Thoughts: firstly, part of me is amazed that this is possible. Secondly, there could be a case for a local Debian mirror on my laptop. Otherwise, an interesting experimental extension to UDD would be "Required bandwidth" - the sum of the recursive build-dependencies plus the upload size of the diff/binaries.

Posted: 06 Jun 2012 19:19 | Tags: ,

Tue, 08 May 2012

Engaged!

Following on from the weekend of change, I've got engaged to Kate. :)

We now need to organise a combined housewarming/engagement party...

Posted: 08 May 2012 20:52 | Tags:

Copyright © 2007-2012 Tim Retout