Tim Retout's www presence

Sat, 17 Jan 2015

CPAN PR Challenge - January - IO-Digest

I signed up to the CPAN Pull Request Challenge - apparently I'm entrant 170 of a few hundred.

My assigned dist for January was IO-Digest - this seems a fairly stable module. To get the ball rolling, I fixed the README, but this was somehow unsatisfying. :)

To follow up, I added Travis-CI support, with a view to validating the other open pull request - but that one looks likely to be a platform-specific problem.

Then I extended the Travis file to generate coverage reports, and separately realised the docs weren't quite fully complete, so fixed this and added a test.

Two of these have already been merged by the author, who was very responsive.

Part of me worries that Github is a centralized, proprietary platform that we now trust most of our software source code to. But activities such as this are surely a good thing - how much harder would it be to co-ordinate 300 volunteers to submit patches in a distributed fashion? I suppose you could do something similar with the list of Debian source packages and metadata about the upstream VCS, say...

Posted: 17 Jan 2015 22:01

Fri, 25 Jul 2014

London.pm's July 2014 tech meeting

Last night, I went to the London.pm tech meeting, along with a couple of colleagues from CV-Library. Between the talks, the unusually hot weather we're having in the UK at the moment, and my holiday all last week, it feels like I'm at a software conference. :)

The highlight for me was Thomas Klausner's talk about OX (and AngularJS). We bought him a drink at the pub later to pump him for information about using Bread::Board, with some success. It was worth the long, late commute back to Southampton.

All very enjoyable, and I hope they have more technical meetings soon. I'm planning to attend the London Perl Workshop later in the year.

Posted: 25 Jul 2014 08:36

Mon, 02 Dec 2013

How not to parse search queries

While I remember, I have uploaded the slides from my talk about Solr and Perl at the London Perl Workshop.

This talk was inspired by having seen and contributed to at least five different sets of Solr search code at my current job, all of which (I now believe) were doing it wrong. I distilled this hard-won knowledge into a 20-minute talk, which - funny story - I actually delivered twice to work around a cock-up in the printed schedule. I don't believe any video was successfully taken, but I may be proved wrong later.

I have also uploaded the Parse::Yapp grammar mentioned in the talk.

In case you don't have time to read the slides, the right way to present Solr via Perl is to use the 'edismax' parser, and write your code a bit like this:

my $solr = WebService::Solr->new($url);
my $s = $query->param('q');

# WebService::Solr::Query objects are useful for
# 'fq' params, but avoid them for main 'q' param.
my $options = {
 fq => [WebService::Solr::Query->new(...)],
};

# $options is already a hash reference, so pass it as-is.
$solr->search($s, $options);

The key thing here is not to put any complicated parsing code in between the user and Solr. Avoid Search::QueryParser at all costs.

Posted: 02 Dec 2013 22:58

Questhub.io

At the London Perl Workshop last Saturday, one of the lightning talks was about Questhub.io, formerly known as "play-perl.org".

It's social gamification for your task list, or something like that. Buzzword-tastic! But most importantly, there seems to be a nice community of programming types to procrastinate with you on your quests. This means I can finally get to work refuting lamby's prediction about gamification of Debian development!

Tasks are referred to as "Quests", and are pursued in themed "Realms", for that World of Warcraft feeling. For example, there's a "Perl" realm, and a "Lisp" realm, and a "Haskell" realm, but also non-programming realms like "Fitness" and "Japanese".

Of course, part of me now wants to construct a federated version which can be self-hosted. :) Another downside of questhub currently is the lack of SSL support - your session cookies are sent in plain text. I hope this changes soon.

Posted: 02 Dec 2013 21:55

Fri, 21 Dec 2012

Perl Forking, Reference Counting and Copy-on-Write

I have been dealing with an interesting forking issue at work. It happens to involve Perl, but don't let that put you off.

So, suppose you need to perform an I/O-bound task that is eminently parallelizable (in our case, generating and sending lots of emails). You have learnt from previous such attempts, and broken out Parallel::Iterator from CPAN to give you easy fork()ing goodness. Forking can be very memory-efficient, at least under the Linux kernel, because pages are shared between the parent and the children via a copy-on-write system.

Further suppose that you want to generate and share a large data structure between the children, so that you can iterate over it. Copy-on-write pages should be cheap, right?

use Parallel::Iterator qw( iterate );

my $large_array_ref = get_data();

my $iter = iterate( sub {
    my $i = $_[1];
    my $element = $large_array_ref->[$i];

    ...
}, [0..1000000] );

Sadly, when you run your program, it gobbles up memory until the OOM killer steps in.

Our first problem was that the system malloc implementation was less good for this particular task than Perl's built-in malloc. Not a problem, we were using perlbrew anyway, so a quick few experimental rebuilds later and this was solved.

More interesting was the slow, 60MB/s leak that we saw after that. There were no circular references, and everything was going out of scope at the end of the function, so what was happening?

Recall that Perl uses reference counting to manage memory, and that each value's count is stored alongside the value itself. In the children, because we took a reference to an element of the large shared data structure, we were incrementing that count and so writing to the relevant page in memory, which would then get copied. Over time, as we iterated through the entire structure, the children would end up copying almost every page! This would double our memory costs. (We confirmed the diagnosis using 'smem', incidentally. Very useful.)

The copy-on-write semantics of fork() do not play well with reference-counted interpreted languages such as Perl or CPython. Apparently a similar issue occurs with some mark-and-sweep garbage-collection implementations - but Ruby 2.0 is reputed to be COW-friendly.
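The reference count bump is easy to observe in a single process. Here is a minimal sketch (not the code from work) that uses the core B module to read the count on a hash before, during and after holding a second reference to it. The refcount() helper and the sample data are made up for illustration.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: report the reference count of whatever $_[0] points at.
sub refcount {
    return B::svref_2object( $_[0] )->REFCNT;
}

# Small stand-in for the large shared structure.
my $large_array_ref = [ { id => 1, foo => 'bar' } ];

my $before = refcount( $large_array_ref->[0] );

# This looks like a read, but copying the reference increments the
# count stored in the hash itself, which is a write to that memory.
my $element = $large_array_ref->[0];
my $during  = refcount( $large_array_ref->[0] );

# Dropping the copy decrements the count again: another write.
undef $element;
my $after = refcount( $large_array_ref->[0] );

print "before=$before during=$during after=$after\n";
```

I would expect this to print before=1 during=2 after=1. In a forked child, each of those count changes dirties the page holding the hash, which is what forces the kernel to copy it.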

All was not lost, however - we just needed to avoid taking any references! Implement a deep copy that does not involve saving any intermediate variables along the way. This can be a bit long-winded, but it works.

my $large_array_ref = get_data();

my $iter = iterate( sub {
    my $i = $_[1];
    my %clone;

    $clone{id}  = $large_array_ref->[$i]{id};
    $clone{foo} = $large_array_ref->[$i]{foo};
    ...
}, [0..1000000] );

This could be improved if we wrote an XS CPAN module that cloned data structures without incrementing any reference counts - I presume this is possible. We tried the most common deep-copy modules from CPAN, but have not yet found one that avoids reference counting.

This same problem almost certainly shows up when using the Apache prefork MPM and mod_perl - even read-only global variables can become unshared.

I would be very interested to learn of any other approaches people have found to solve this sort of problem - do email me.

Posted: 21 Dec 2012 22:38

Thu, 19 Jan 2012

Perl tutorial searches revisited

So since my last post about perl tutorials, the Perl Tutorial Hub has leaped from page 2 to be the top result for the relevant Google search. The Leeds tutorial has dropped off the first page.

I couldn't figure out how such a dramatic reversal could have happened, until I asked Mithaldu on IRC; the admins of the old Leeds tutorial have added a (delayed) redirect. So, Google has interpreted that as a 302 status, and given perl-tutorial.org all the old inbound links, presumably.

Perhaps there is hope for Perl yet. :)

Posted: 19 Jan 2012 23:32

Mon, 09 Jan 2012

Perl Tutorial

Hello, World!

Last year, a bit of a fuss was kicked up in the Perl community about the low quality of search results for the phrase "Perl tutorial". Various ideas for fixing this were proposed, including the handy Perl tutorial hub, but kicking Leeds University off the coveted top spot is going to be a real challenge.

The problem is, most Perl tutorials on the internet were written for Perl 4; modern Perl doesn't get a look-in. It's a miracle anyone manages to learn Perl at all...

While thinking over this problem, I was reading Mithaldu's original criteria for the "content creation" option. "Community effort"... "github repo"... "exported to HTML regularly"... if only Perl had some central site where you can publish documentation... that all Perl hackers can access and update... like CPAN.

So although my documentation-writing skills are pretty weak, I proudly introduce the Perl-Tutorial CPAN dist and github repository. The great thing about writing Perl documentation using POD is that you can link to other CPAN references so easily - as the basics get filled out, they can guide the user towards how to learn more about each topic. Everyone who's anyone knows how to send a pull request on github, and there seems to be far more of a community feel to CPAN these days.

Version 0.001 is just "Hello, World!" - but watch this space. :)

Posted: 09 Jan 2012 20:41

Sat, 11 Sep 2010

Debian Perl talk

Today I went to HantsLUG at IBM Hursley.

I delivered a talk on the Debian Perl team aimed at end users, which was well received - I got a head start by getting people in #debian-perl to review the slides beforehand, which was very helpful. I'm told there will be a video uploaded in a month or so.

I also plugged SmoothWall Express on Debian to some new people, and there was interest. My most recent discovery is that I probably need to extend netcfg in the Debian installer to allow configuring more than one network interface.

Posted: 11 Sep 2010 19:07

Thu, 14 Jan 2010

Net-NationalRail-LiveDepartureBoards

On Tuesday, I released version 0.02 of Net::NationalRail::LiveDepartureBoards to CPAN. So far, no one has complained. This module is probably of interest only to people in the UK; it looks up which trains are next to arrive/depart from a particular station.

This release was prompted by a patch sent to me by Ian Dash, implementing a filtering feature I was too lazy to write myself. If someone wants to put a fancy GNOME applet around it, I'd be grateful. ;) I think the next step is to add a nicer OO interface.

My original reason for writing the module was to advertise this SOAP API that ATOC publishes - it could easily be wrapped in languages other than Perl. That particular URL was found by inspecting the official widget for Windows Vista.

Posted: 14 Jan 2010 19:41

Thu, 07 Jan 2010

Hudson and Devel::Cover

I wrote a plugin for Hudson today, which integrates Devel::Cover (Perl's test coverage tool) into the build reports.

Actually, that's currently an exaggeration. All it does is add a checkbox in the configure page, and a link to Devel::Cover's reports on the build page when it's enabled. I spent the day remembering how to program in Java.

Tomorrow I might be in a position to extend it into something more attractive - I'll publish it very soon, but I need to run it past my employer. Watch this space.

In other news, I volunteered to package Hudson for Debian, and then discovered just how many dependencies it has. This should keep me going until squeeze+1, I think.

Posted: 07 Jan 2010 00:44

Sat, 14 Nov 2009

RC bug roundup

On Wednesday, I fixed #551228 in libgstreamer-perl - from the bug log, it looked like it would be an intriguing parallel-build problem, but I reckon it was just a faulty test.

Next I applied a patch from the upstream bug tracker for #544894 in libtk-filedialog-perl, which was fine; but then we noticed that there was no explicit copyright notice in the source, so it hasn't been uploaded yet. The code is from 1996, so we would request removal from Debian if it weren't for 'horae' depending on it. Hmm...

Then yesterday I found the time to test #520406 in libdbd-mysql-perl, which I remember being open as long ago as DebConf. It turns out it's dead easy - there's already a test case and a patch upstream, which was applied a while ago, so the bug is already fixed in squeeze. Today I prepared an updated package to fix this in lenny.

So, I'm still falling short of one RC bug a day, and pkg-perl has more RC bugs open than at the start of the week. :( The weekend's not over yet, though.

Posted: 14 Nov 2009 15:42

Sun, 25 Oct 2009

Transaction Scope Guards

I've been writing some Perl DBI code which involves some fairly involved error handling; I've been looking for a way to roll back transactions neatly when certain errors happen.

I very nearly reinvented the concept of a 'transaction scope guard' which I now find is implemented in DBIx::Class (with Scope::Guard implementing a more general version). A lexical variable can be used to detect in which cases a transaction should be ended, because the object it points to will get DESTROYed when it goes out of scope. Some rough code to illustrate the concept is below.

# $fallback controls whether we use nested transactions,
# which is slower but lets us commit all the other lines in the batch.
sub process_batch {
    my $fallback = shift;

    # This catches any exception handling that makes us leave the function.
    # The DESTROY method of $transaction contains $db->rollback()
    my $transaction = $db->txn_scope_guard();

    for (1..2000) {
        # File access error will get thrown to outside of function,
        # so the transaction will be rolled back.
        my $line = get_next_line();
        last unless $line;

        my $parsed;
        eval {$parsed = parse_line($line)};
        if ($@) {
            # Handle line parsing errors here without interfering with
            # DB error handling.
            warn "Could not parse $line\n";
            next;
        }

        # here's the actual db code
        eval {
            $db->savepoint() if $fallback;
            $db->process_one_line($parsed);
        };
        if ($@) {
            if ($fallback) {
                # Just roll back to savepoint.  Transaction continues.
                $db->rollback_to_savepoint();
            } else {
                # Propagate error outside of function, ending transaction.
                die $@;
            }
        }
    }

    $transaction->commit();
}

The advantage is that rollback code can be kept in one place, and not repeated all over the various error handling cases.

I had actually gone off this idea, because I couldn't see any documentation defining exactly when the DESTROY method gets called in Perl. But given it's got into DBIx::Class, it must be fine! I also prefer their API over what I was considering implementing; the transaction object that gets returned will handle only commit(), not any other database calls.
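To make the mechanism concrete without a database, here is a minimal sketch of a guard object. It is not DBIx::Class's implementation: the My::Guard class, its callback argument and do_work() are all hypothetical names for illustration. The guard runs its "rollback" callback from DESTROY unless commit() was called first.

```perl
use strict;
use warnings;

package My::Guard;    # hypothetical class, for illustration only

sub new {
    my ( $class, $on_rollback ) = @_;
    return bless { committed => 0, on_rollback => $on_rollback }, $class;
}

sub commit {
    my $self = shift;
    $self->{committed} = 1;
    return;
}

# Called when the last reference to the guard goes away, whether the
# enclosing scope was left normally or by an exception.
sub DESTROY {
    my $self = shift;
    local $@;    # do not clobber an exception that is in flight
    $self->{on_rollback}->() unless $self->{committed};
    return;
}

package main;

my @log;

sub do_work {
    my ($should_fail) = @_;
    my $guard = My::Guard->new( sub { push @log, 'rollback' } );

    die "something went wrong\n" if $should_fail;

    $guard->commit;
    push @log, 'commit';
    return;
}

do_work(0);             # leaves normally: committed, no rollback
eval { do_work(1) };    # dies: the guard is destroyed uncommitted

print join( ',', @log ), "\n";
```

This should print commit,rollback: the second call never reaches commit(), so the callback fires as the exception unwinds the function.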

Posted: 25 Oct 2009 01:11

Sun, 04 Oct 2009

Code Reuse

At work, I have been refactoring old Perl code. Part of me feels that this was tangential to the main aims of the project I've been assigned, but another part of me can list all the bugs I've found/fixed and the advantages in terms of maintainability, so on balance I think it was a good idea.

Something I like even more than tidying code is reducing the amount of code required. (I've been doing a lot of that as well.) Breaking code into reusable modules is the essence of what I'm trying to achieve - later projects within the company (and perhaps beyond) should not have to reinvent what I'm writing. I'm attempting to replace boilerplate code with existing modules from CPAN where I can - I'm fortunate in that this seems to be possible licensing-wise within our product.

Now, there is some code which loops over a list of plugins, and forks off a background process to deal with each of them. Which CPAN module should I use? There are a few which look promising (like the classic Proc::Daemon), but force the parent process to exit. Not good for a loop. There's one which does dodgy hacks with parsing the output of 'ps', rather than using 'kill 0 => $pid' like every other module. Some handle pidfiles (which I want, provided I can customize the name, but what if I didn't?), and some don't (which is fine, I'll just use one of the separate pidfile modules). But why do CPAN modules reimplement the pidfile handling themselves anyway?
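For anyone who hasn't met it, the 'kill 0' idiom mentioned above looks roughly like this. Signal 0 delivers nothing; it only asks whether the process could be signalled. The is_running() name is my own for this sketch, and treating EPERM as "alive" is an assumption about what a caller would want.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a process with this PID appears to exist.
sub is_running {
    my ($pid) = @_;

    # kill returns the number of processes successfully signalled.
    return 1 if kill 0 => $pid;

    # EPERM means the process exists but belongs to someone else.
    return $!{EPERM} ? 1 : 0;
}

# Our own PID is always a safe thing to test against.
my $self_alive = is_running($$);
print $self_alive ? "alive\n" : "gone\n";
```

A pidfile module would typically read the PID from the file and make a check like this before deciding whether the file is stale.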

One of the best approaches I've seen so far is MooseX::Daemonize. Moose's role system looks like it would be very useful, but I'm a bit hesitant to introduce Moose to the company just yet. Maybe I should.

My point is, there is all this code, but subtle problems are preventing its reuse. I see the same thing going on at work; there are some utility libraries, but they contain large numbers of mostly unrelated functions. Developers often do not spot code which might be useful later, so it is left as a subroutine in a script. Ironically, I am reimplementing these functions in a more modular fashion, and hopefully this will catch on.

Posted: 04 Oct 2009 23:27

Wed, 09 Sep 2009

Unit testing

I spent the last day and a half writing a vaguely interesting Perl module for testing some code which gives a subtly different answer each time (i.e. incorporates data from time() and /dev/urandom) and has side effects (i.e. writes to the file system).

By overriding Perl's built-in 'open' function, it is possible to prefix each filename with the location of a temporary directory, effectively emulating chroot(). I also replaced Perl's time() with one that always returned the same answer. This meant that the login code I was testing would return a reliable result.
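As a rough illustration of the time() half of this (not the module itself), a built-in can be replaced for the whole program by assigning to the CORE::GLOBAL namespace in a BEGIN block, before the code under test is compiled. The frozen timestamp here is an arbitrary value picked for the example.

```perl
use strict;
use warnings;

BEGIN {
    # Must happen at compile time, before any code that calls time().
    # The empty prototype mirrors the built-in, which takes no arguments.
    *CORE::GLOBAL::time = sub () { return 1_000_000_000 };
}

# Every call now sees the same instant, so results are repeatable.
my $now   = time();
my $later = time();

print "now=$now later=$later\n";
print scalar( gmtime($now) ), "\n";
```

The real clock is still reachable as CORE::time() if something genuinely needs it.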

You have to be careful with prototypes. Spot the difference:

my $result = gmtime(time+$seconds);
my $result = gmtime(time()+$seconds);

Without adding a prototype to the new time() function, these will give different answers. I now have to go back to work tomorrow and close a bug I mistakenly filed. :)
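The parsing difference can be reproduced with two ordinary subs, without overriding anything. In this sketch the sub names are made up; one has an empty prototype like the built-in time(), the other has none.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a replacement time().
sub fixed_time_proto ()  { return 1_000_000 }
sub fixed_time_noproto   { return 1_000_000 }

my $seconds = 60;

# With the () prototype, Perl knows the sub takes no arguments,
# so "+$seconds" is parsed as an addition.
my $with_proto = fixed_time_proto+$seconds;

# Without a prototype the sub is a list operator, so "+$seconds"
# is parsed as its argument, which the sub then ignores.
my $without_proto = fixed_time_noproto+$seconds;

print "with prototype:    $with_proto\n";
print "without prototype: $without_proto\n";
```

The first result should be 1000060 and the second 1000000: the same expression, differing only in how the parser treats the name in front of the plus sign. Writing the call with explicit parentheses, as in the second gmtime() line above, sidesteps the problem either way.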

I'm hoping to finish off my evil hacky overriding module and release it to CPAN. I want to add some routines to set up and tear down temporary chroot directories. Obviously there are some limitations to my approach; I'm not currently handling relative paths very well, and system() calls will not be "chrooted". But it should be quite handy and reusable in any case.

Posted: 09 Sep 2009 19:16

Thu, 08 Jan 2009

O RAILLY

I am not having a good year.

Traditionally, when annoyed, I make extravagant purchases that I may or may not regret later. In this new economic climate, however, I have found a substitute outlet.

Arriving soon at a CPAN mirror near you: Net::NationalRail::LiveDepartureBoards 0.01 - an interface to a SOAP API from ATOC. Given a station code, you can obtain the next few arrivals/departures/both.

This is in hacky Perl, but the module should be easy to translate to other languages which have SOAP libraries.

Posted: 08 Jan 2009 00:00

Sat, 01 Nov 2008

That time of year again

I tend to update DateTime::Event::WarwickUniversity at around this time each year, according to the changelog. Version 0.05 will appear on CPAN with the next update. My test cases still pass, at least.

Posted: 01 Nov 2008 00:00

Thu, 22 Nov 2007

More CPAN uploads

Following my update on Monday, I've made changes to the build systems of both DateTime::Calendar::WarwickUniversity and DateTime::Event::WarwickUniversity, in my search for higher kwalitee. These are not important updates; they just add a few more tests, and so on.

Posted: 22 Nov 2007 00:00

Mon, 19 Nov 2007

DateTime::Event::WarwickUniversity version 0.02

Warwick University appear to have changed some of their future term dates, so I have released version 0.02 of DateTime::Event::WarwickUniversity to CPAN.

This release also fixes bugs which were happening when using DateTime objects with time zones, so everyone should probably upgrade.

Overall, I'm surprised that it took me a year before I had an excuse for a new release. It would be worth adding the ability to get a real date from a given term week, but I haven't quite needed it yet.

Posted: 19 Nov 2007 00:00

Contact

Tim Retout tim@retout.co.uk
JabberID: tim@retout.co.uk

Comments

I'm afraid I have turned off comments for this blog, because of all the spam. Let's face it, I didn't read them anyway. Feel free to email me.

Copyright © 2007-2014 Tim Retout