Anu’s blog entry for 5 March – 23 March
The first two weeks involved:
- generating some files for translation of the Greenstone interface (Mongolian, Bhutanese) and committing changes translators had submitted (Laotian)
- fixing up the GS2 CORBA code, including bringing it up to speed with the rest of GS2’s runtime code, so that CORBA works again: it can now compile once more, and the corbaserver and corbarecptldd client program run well against each other when on the same machine. Running the server against the client in a remote situation does not yet work, but it did not work in the demo/hello-1 example of the now-updated MICO package either.
- there was still a small error in the way the PDFBox extension tests for Java when Java is version 1.7 that made the extension not work with JDK1.7. The test for the presence of Java now has to run java -version rather than just java, since the return value in Java 7 is different from that in Java 6.
- when testing the Powerpoint plugin, it was found that the OpenOffice extension needed to be corrected to make jodconverter use the same port as that which OpenOffice is run on. It was moreover discovered that users can’t already have the graphical user interface of OO running in the background, nor can they start this, during Greenstone’s processing of documents using the OO extension.
This week:
- there was some issue with Greenstone 3’s tomcat server crashing on 64 bit Linux owing to a Java segmentation fault created by an error in the JNI code. Dr Bainbridge found out that the number of bytes to store pointers to data structures shared between Java and C++ code needed to be long rather than int, so MG’s and MGPP’s JNI code was updated. The error has not returned since, but debugging code has been left in for future debugging if required.
- Dr Te Taka hoped to update the Maori translations for Greenstone’s interface using Google’s Translator Toolkit (GTT), and suggested that Greenstone’s translation process be expanded to allow this so that other translators too could benefit from the toolkit for translation if they wanted. He found out that the toolkit accepted an open-XML format called TMX, Translation Memory eXchange, and thus would need the strings that required translation to be converted into the TMX XML format (rather than into the usual spreadsheets versions of the .excel.xml format which we currently generate). Two new XSLT files have been written which Te Taka may kindly be testing for us: the first generates the TMX translation files that translators can load into Google’s Translator Toolkit. The second XSLT takes translated TMX files and converts them into an intermediary format that can be processed in the usual manner when submitting new and updated translations back into Greenstone.
- currently looking at usersDB in GS3 having the correct values on startup.
Update: did not get much further with the GS3 usersDB as there was a lot more to be done with the translation files for GTT and their processing. The process became clearer thanks to Te Taka’s explanations and his testing at each stage. TMX files will only be needed the first time a translator migrates from GS’s usual translation procedure, which makes use of excel spreadsheet files, to Google’s toolkit. The TMX file will start them up with all the up-to-date translated strings that are available so far in GS3 for the selected language. For the strings that need to be translated and updated, the translator will get a text file that contains the unicode spreadsheet data (as comma separated values, but the file will have a .txt extension instead of .csv in order to preserve the unicode). The translator will then copy the English and <Language> columns of the spreadsheet into the GTT. Once their translation work is done, they can send these same columns back by way of the same spreadsheet.