Sam’s Greenstone Blog 20/6/2011
admin. Monday, June 20th, 2011One of the things we are working on at the moment in Greenstone is 64-bit compatibility (especially on Linux and Mac). Last week while David was away at JCDL I decided to take a deeper look at the issue.
At the most basic level there are two problems. The first is that, in the C and C++ programming languages there exists a data type known as a “long” and it is usually used to store a long integer (a large whole number). On a 32-bit Linux or Mac machine a long is represented using 32 bits (32 ones or zeros), but on a 64-bit Linux or Mac machine a long is represented using 64 bits. This may seem harmless enough, but unfortunately some of our older code has been written assuming that a long is 32 bits and this can stop the code compiling successfully or (in the worst case) cause some fairly hard-to-track-down crashes.
The second problem is fairly similar. Pointers (which are basically big numbers that represent memory addresses) are the same as longs in that they are 32 bits long on a 32-bit machine and 64 bits long on a 64-bit machine. This is only ever an issue when we try and convert a pointer to a number. For example, in some places we try to convert a pointer to an int (32 bit integer on both 32-bit and 64-bit) which doesn’t even compile on 64-bit machines.
There are 3 different ways that we have tried to solve this problem. The first method (the one that we currently use) is to enable an option during compilation that compiles the code as if it were on a 32-bit computer. This method works but has a massive downside that, in order for our code libraries to work together we need to compile most of the rest of the code as if it is 32-bit as well.
The second method is to change all occurrences of long to a data type that is always 32 bits. This effectively makes it exactly the same as the 32-bit version but without the downside of the first method. The downside of making the program run as if it is 32-bit is that you lose the advantages of 64-bit (such as a bigger memory space).
The third method is to only change the areas that cause problems on 64-bit and to leave the rest of code as it is. This gives us the advantages of 64-bit as well as letting the rest of our code also remain 64-bit. The only downside is the potential to miss areas that are problematic and that might only show up at a later stage.
So last week I spent a lot of time debugging the various errors I encountered trying to implement the third method and for the most part it has been successful.