seL4 clock magic

I have been looking at seL4 some more recently, and had a small patch merged today to remove a legacy Python module from a helper script. (I was trying to run the script on a system without that module installed, and it was almost easier to patch it out.)

However, the more I think about this code and how it’s used, the more it seems wrong on at least five other levels.

The patch itself is quite uninteresting; this script was importing the past module (part of future?) to use the xrange function. Python 2 used to have separate xrange and range functions, where range returned a list in memory while xrange generated an iterator. Because this seL4 script is iterating over a large range of values, it’s important the list is not generated in-memory. But Python 3 removed the xrange function and just has range return an object, so it’s trivial to avoid the module import.

Having thought carefully some more about the specific line, there’s surely an off-by-one error in it - range iterates over 0 to n-1, so this line shouldn’t be subtracting one if it’s looking to test all 32-bit values:

    for i in range(2**32-1):

But then again, this is being used for a ‘sanity check’ of a magic bit shift algorithm that speeds up division operations to convert CPU ticks to microseconds on 32-bit arm platforms. Surely if the algorithm’s good, it shouldn’t be necessary to validate it exhaustively against every possible 32-bit value?

Also, 32 bits isn’t enough, because this is 64-bit division. include/api/types.h shows that ticks_t is always a uint64_t, so if this were a proof by exhaustion it should run to 2**64 (though that would take infeasibly long).

As discussed in issue #1352, lots of people have been running this code with the wrong divisor anyway. But because the bit shift path is only used on 32-bit platforms, it’s not clear to me that there’s even any point in specifying CLK_SHIFT/MAGIC on platforms which are 64-bit only (e.g. the tx2 port).

And to follow this rabbit hole to the very end, in comments on PR #1435 and issue #1509 it’s clear that the future of this code is to remove it, as it’s 1. unnecessarily clever (on 64-bit platforms the equivalent code just uses a division, so performance can’t be that important), and 2. the entire concept of converting to microseconds breaks the seL4 principle of not abstracting away details of the hardware.

So this has left me unclear on whether my small patch was a good thing or not, but I certainly learnt something about this corner of seL4 timer handling. And I’ve ordered a copy of “Hacker’s Delight” on the recommendation of a code comment.