Sam’s Greenstone Blog 26/11/2011

admin. Saturday, November 26th, 2011

This week has mostly been spent improving Greenstone 3’s ability to display paged documents. This has mainly involved upgrading the table of contents to better handle documents that have a lot of pages with names like “Page 1”, “Page 2”, “Page 3” etc., which makes the pages virtually indistinguishable by name. In this case it is much better to display images of the pages. Fortunately, many of these collections already have thumbnails available, so these are now displayed in the table of contents instead of the page names. Simply replacing the names with images, however, creates two more problems. The first is that a lot of images take up a lot of space on the page, and the second is that it greatly increases the amount the user has to download from the server for each page. Even though a single black and white thumbnail is likely to be only around 10KB, a document with a thousand of them (which is not unrealistic) adds up to around 10MB, and color images add up even faster.

To solve both of these problems I decided that a good option would be to create a box in the table of contents that shows only a few pages at a time and can be scrolled horizontally to go through the images of the pages. As well as saving space, this approach has the added benefit that images do not need to be loaded until they become visible within the box (i.e. until they have been scrolled into view). So I have implemented it so that images are loaded dynamically as they are needed.
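The idea of only loading what has been scrolled into view can be sketched as follows. This is a minimal illustration, not the Greenstone 3 code (which runs as Javascript in the browser): it assumes every thumbnail has the same width and sits in a single horizontal row, and `LazyThumbnailStrip` is a hypothetical helper name.

```python
def visible_indices(scroll_left, viewport_width, thumb_width, count):
    """Return the indices of thumbnails at least partly inside the viewport.

    Assumes fixed-width thumbnails laid out in one row starting at x = 0.
    """
    if count <= 0 or viewport_width <= 0:
        return range(0)
    first = max(0, scroll_left // thumb_width)
    # The last pixel column inside the viewport is scroll_left + viewport_width - 1.
    last = min(count - 1, (scroll_left + viewport_width - 1) // thumb_width)
    return range(first, last + 1)


class LazyThumbnailStrip:
    """Remembers which page images have been requested (hypothetical helper)."""

    def __init__(self, image_urls, thumb_width, viewport_width):
        self.image_urls = image_urls
        self.thumb_width = thumb_width
        self.viewport_width = viewport_width
        self.loaded = set()

    def on_scroll(self, scroll_left):
        """Return only the URLs that still need fetching after a scroll."""
        to_fetch = []
        for i in visible_indices(scroll_left, self.viewport_width,
                                 self.thumb_width, len(self.image_urls)):
            if i not in self.loaded:
                self.loaded.add(i)
                to_fetch.append(self.image_urls[i])
        return to_fetch
```

With a 350-pixel box and 100-pixel thumbnails, the first call fetches four images and each later scroll fetches only the newly exposed ones, so a thousand-page document never downloads more thumbnails than the user actually looks at.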

I have also added a new feature to Greenstone 3 that may prove useful in improving some of the interactions between XSLT and Javascript. Something I have needed to do fairly often recently is take parts of one page and add them to another. Our current method for doing this is to fetch the page we want and “cut” the part we need out of it. To make this interaction smoother I have added the ability to specify XSL templates in the CGI arguments given to a page. This allows Javascript AJAX calls to single out the exact part of the page they want, or even to create new information, all in a single AJAX call.
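As a rough sketch of what such a request could look like, the function below packs an XSL template into the query string of a page request. The argument name `xslTemplate`, the base URL and the `c` argument used in the example are placeholders for illustration only; the post does not give the real argument name Greenstone 3 uses.

```python
from urllib.parse import urlencode


def build_template_request(library_url, args, template_xml,
                           template_param="xslTemplate"):
    """Append an XSL template to the CGI arguments of a page request.

    `template_param` is a hypothetical argument name; the real Greenstone 3
    name may differ. urlencode takes care of escaping the XML safely.
    """
    query = dict(args)  # copy so the caller's dictionary is left unchanged
    query[template_param] = template_xml
    return library_url + "?" + urlencode(query)


# Example: ask for only the table-of-contents div of a page (illustrative XPath).
template = ('<xsl:template match="/page">'
            '<xsl:copy-of select="//div[@id=\'toc\']"/>'
            '</xsl:template>')
url = build_template_request("http://example.org/greenstone3/library",
                             {"c": "demo"}, template)
```

A Javascript AJAX call would send the same kind of URL; the point is that the template travels with the request, so the server returns just the fragment instead of a whole page to be cut up on the client.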

Sam’s Greenstone Blog 18/11/2011

admin. Friday, November 18th, 2011

This week has mostly been focused on bug fixing. One bug we discovered a while ago was that the code that highlights search terms in the text would also find occurrences of the terms inside tags (e.g. it would find the word farming inside the tag in <a href="farming.html">farming</a>). The fix was to exclude the characters inside tags from the highlighting search: when the code sees a < character, it ignores all characters until it sees a > character. You may be thinking “But what if there is a < in the document text?” This isn’t an issue, because any < or > characters in the document text that don’t belong to tags will have been escaped as &lt; and &gt;.
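The skip-from-< to-> rule described above can be sketched like this. It is an illustration rather than the Greenstone code; the `termHighlight` class name is a hypothetical choice, and matching is case-insensitive by assumption.

```python
import re


def highlight_terms(html, terms, css_class="termHighlight"):
    """Wrap search-term occurrences in a <span>, leaving tag contents alone.

    Everything from a '<' up to the next '>' is copied unchanged. This relies
    on literal '<' and '>' in document text being escaped as &lt; and &gt;.
    """
    if not terms:
        return html
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    out = []
    i = 0
    while i < len(html):
        if html[i] == "<":
            end = html.find(">", i)
            if end == -1:  # unterminated tag: copy the rest verbatim
                out.append(html[i:])
                break
            out.append(html[i:end + 1])  # the whole tag, untouched
            i = end + 1
        else:
            end = html.find("<", i)
            if end == -1:
                end = len(html)
            text = html[i:end]  # plain text between tags
            out.append(pattern.sub(
                lambda m: f'<span class="{css_class}">{m.group(0)}</span>', text))
            i = end
    return "".join(out)
```

For the example above, `highlight_terms('<a href="farming.html">farming</a>', ["farming"])` highlights only the link text and leaves the `href` value alone.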

Another bug I fixed was in the Document Structure Editor. When a collection was built, the contents of any images in it were wiped, leaving empty files, although the XML files were preserved correctly. The underlying cause was that the index directory was not being deleted correctly, because the server still had the collection loaded in the runtime system (so that it could be viewed) while it tried to delete the index. The fix was to deactivate the collection briefly in the runtime system so that the replacement (the newly built index replacing the old one) could take place.
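The deactivate, swap, reactivate sequence can be sketched as below. It assumes the usual layout in which a fresh build is written to the collection’s `building` directory and then becomes `index`; `deactivate` and `activate` are hypothetical callbacks standing in for whatever tells the runtime system to release and reload the collection.

```python
import os
import shutil


def replace_index(collection_dir, deactivate, activate):
    """Swap a newly built index into place while the collection is offline.

    `deactivate` and `activate` are hypothetical callbacks; in the real
    server they would unload and reload the collection.
    """
    building = os.path.join(collection_dir, "building")
    index = os.path.join(collection_dir, "index")
    deactivate()  # release the runtime system's hold on the old index
    try:
        if os.path.isdir(index):
            shutil.rmtree(index)  # delete the old index
        os.rename(building, index)  # the new build becomes the live index
    finally:
        activate()  # reload the collection even if the swap failed
```

The `finally` clause is a design choice for the sketch: even if the swap fails, the collection is brought back online rather than left deactivated.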

Another problem was with displaying paged-image collections. The system would only ever show the root section and the top-level sections, and nothing below them. I tracked this down to the top-level sections being marked as “leaf” nodes instead of “internal” nodes. Whether this is a bug or was done deliberately is something I will try to figure out next week.
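Why a single wrong node type hides a whole subtree can be seen in a simplified table-of-contents walk. This is a hypothetical illustration of the symptom, not Greenstone’s own code: it assumes the walk only descends into nodes marked “internal”.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    node_type: str  # "leaf" or "internal"
    children: list = field(default_factory=list)


def toc_titles(section, depth=0):
    """List section titles in table-of-contents order, indented by depth.

    Children are only visited when a node is marked "internal", so a section
    wrongly marked "leaf" hides everything beneath it.
    """
    titles = ["  " * depth + section.title]
    if section.node_type == "internal":
        for child in section.children:
            titles.extend(toc_titles(child, depth + 1))
    return titles
```

With an issue section marked “internal” the pages appear beneath it; mark the same section “leaf” and the listing stops at the issue, which matches the behaviour described above.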

Also next week I will do some work on enabling a basic form of spatial searching (searching by location) in any collections that contain documents with latitude and longitude information.
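One simple way a basic spatial search could work is a bounding-box filter over each document’s latitude and longitude. The sketch below is an assumption about the approach, not a description of what was implemented; the `Doc` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    lat: float
    lng: float


def search_bounding_box(docs, south, west, north, east):
    """Return documents whose coordinates fall inside a lat/lng rectangle.

    If west > east the box is taken to cross the 180° meridian, which
    matters for collections covering places such as New Zealand and Fiji.
    """
    def lng_inside(lng):
        if west <= east:
            return west <= lng <= east
        return lng >= west or lng <= east

    return [d for d in docs if south <= d.lat <= north and lng_inside(d.lng)]
```

A real implementation would more likely index the coordinates rather than scan every document, but the containment test stays the same.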

Anu’s blog entry for the week ending 11 Nov 2011

ak19. Friday, November 11th, 2011

As several people had encountered issues in the recent 2.85 release, a lot of this week was spent looking at them so that we can get 2.86 out as soon as possible.

The bugs and oversights are not fatal and work-arounds are possible:

1) If you don’t already have the PDF-Box extension for Greenstone installed, GLI will suggest where to obtain it. However, the URL it provides points to an older version of the PDF-Box extension that does not work. The version of PDF-Box that works with 2.85 can be obtained from

http://trac.greenstone.org/browser/main/tags/2.85/gs2-extensions/pdf-box/trunk/pdf-box-java.tar.gz

or http://trac.greenstone.org/browser/main/tags/2.85/gs2-extensions/pdf-box/trunk/pdf-box-java.zip

2) The Greenstone demo collection in 2.85 contains HTML files that can’t be converted into XML well enough to work with the Flash file generated by the Realistic Book feature. So if you’re thinking of testing the Realistic Book option of HTMLPlugin on the HTML files included in the Greenstone demo collection, rather than on your own HTML files, get the improved demo collection from SVN at http://svn.greenstone.org/main/trunk/greenstone2/collect/demo

3) On Windows Vista or Windows 7, if Greenstone is installed in a path containing brackets, such as “Program Files (x86)”, launching Greenstone is likely to fail. Spaces in Greenstone’s installation path are fine on Windows, but brackets are not yet handled well enough. This will be fixed in a future release of Greenstone 2.

4) The fourth bug is more serious in that there is no work-around. A member of the mailing list found it while using the Datelist Classifier: references to [ex.srclink] or [srclink] in his Format statements did not get resolved to the URL of the source file. (The default browsing classifiers had no problem with such Format statements and displayed the correct URL.) This has now been fixed by Dr Bainbridge, and the fix will be present in the next release of Greenstone.

5) Another discovery is that the OpenOffice extension now seems to have a problem on Ubuntu. This was not the case about two months ago when, after a bugfix, the extension was tested on Ubuntu both here and by another dedicated member of the Greenstone family on his own Ubuntu machine. However, the new problem has now been confirmed, including when run from the command line, and even older versions of the extension, which worked at one point, now behave the same way. Perhaps this has something to do with Ubuntu updates, but we’ll be investigating it further.

Sam’s Greenstone Blog 11/11/2011

admin. Friday, November 11th, 2011

This week I have been working on a different area of Greenstone 3 for a change. We noticed that Greenstone 3 was lacking the ability to display paged-image collections. For those of you who are not aware, a paged-image collection is a collection of (usually) scanned documents that consist of both the original images and the OCRed text. A good example collection in Greenstone 2 is the Māori Niupepa Collection. At the moment there seem to be multiple issues preventing collections like this from working correctly in Greenstone 3. As usual, we will also take this opportunity to explore upgrades to these features as we implement them for Greenstone 3. One area we discussed in particular was the way the document can be navigated: we intend to make it easier to scroll through pages. I’ll go into more detail once I start implementing it and have a better idea of what works well.

I have also fixed more minor bugs in the Document Editor and added the ability to modify the text of documents. The next feature in development is the ability to add, remove and modify metadata. We still need to decide on the best way to approach this, as it has the potential to be quite complicated, but once we have decided it should not take long to implement, and I have already done a lot of the client-side work for it.

Official Greenstone 2.85 released!

ak19. Friday, November 4th, 2011

At last, we did it. After a lot of testing, bug discovery and fixing, we’ve finally released Greenstone 2.85. It should be much improved over 2.84. There were also some last-minute changes since release candidate 2.

Please do grab a binary for your operating system by visiting the download page at http://www.greenstone.org/download and start using it!

The Release Notes can be found at http://wiki.greenstone.org/wiki/index.php/2.85_Release_Notes

Sam’s Greenstone Blog 4/11/2011

admin. Friday, November 4th, 2011

My work on the Document Structure Editor is on the back burner at the moment (although it is still progressing well), as I have been designing a prototype collection that integrates a map view into various parts of Greenstone to display the spatial information present in the collection. At this point I am modifying the Tipple Paradise Garden collection, a test collection created by the developers of Tipple (Tourist Information Provider Digital Library). It is particularly useful because each document in the collection has a latitude and longitude value associated with it.

So, using the Google Maps API, I have inserted a map into the browsing, searching and document pages. The map contains markers showing the locations of the documents on that page, and clicking a marker takes you to the corresponding document. An information bubble moves from marker to marker displaying the name of each document (this avoids displaying all the names at once, which could clutter the map). Next to the usual document links is another link that focuses the map on a single document (centring it).
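The page itself uses the Google Maps Javascript API, but the bookkeeping behind the map view is simple enough to sketch in a language-neutral way. The helpers below are hypothetical: one computes the rectangle enclosing all markers on a page (the kind of bounds a browser implementation might hand to the map’s `fitBounds` method), one finds its centre, and one gives the round-robin order in which a single information bubble could visit the markers. The sketch ignores markers on both sides of the 180° meridian.

```python
from itertools import cycle, islice


def marker_bounds(markers):
    """Return (south, west, north, east) enclosing every (lat, lng) marker."""
    if not markers:
        raise ValueError("no markers to fit")
    lats = [lat for lat, _ in markers]
    lngs = [lng for _, lng in markers]
    return min(lats), min(lngs), max(lats), max(lngs)


def bounds_centre(bounds):
    """Centre point of a (south, west, north, east) rectangle."""
    south, west, north, east = bounds
    return (south + north) / 2, (west + east) / 2


def bubble_sequence(titles, steps):
    """Order in which one info bubble visits the markers, round-robin."""
    return list(islice(cycle(titles), steps))
```

Showing one bubble at a time, as the post describes, is what keeps the map readable when a browse page has many documents.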

Next week I’ll be back to my Document Structure Editor work, where I will be trying to figure out why the Seamless Web Editor Javascript isn’t behaving as expected. Assuming I get it working I will be able to add text editing to the interface.

Greenstone 2.85rc2 (release candidate 2) released

ak19. Friday, October 28th, 2011

There was a lot of testing going on in the last 2 months, and I forgot all about writing blog entries.

The first stage of testing was to go through the Greenstone tutorials on Windows (Vista), Linux (Ubuntu) and Mac (Leopard). Some bugs were discovered and fixed, and after that RC1 of GS2.85 could be released.

Thereafter, further tests were conducted on all three operating systems. These covered combinations of the 3 indexers and 3 database types; processing of a range of file types, including with Greenstone’s PDFBox and OpenOffice extensions; filenames in different encodings, and HTML files in different encodings that link to each other; the remote Greenstone server and the GLI applet; and spaces in the file path on Windows. This time the tests were conducted on Windows XP, Linux CentOS and, again, Mac Leopard. Many bugs had slipped through the net during the first stage of testing, but they were caught this time around and fixed for the release of GS2.85 RC2.
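The indexer and database combinations multiply quickly once each platform is included. As a sketch, the test matrix can be enumerated with `itertools.product`. The three indexers are Greenstone 2’s MG, MGPP and Lucene; the three database types are assumed here to be GDBM, JDBM and SQLite, since the post only says “3 database types”.

```python
from itertools import product

INDEXERS = ["mg", "mgpp", "lucene"]
DATABASES = ["gdbm", "jdbm", "sqlite"]  # assumed: the post only says "3 database types"
PLATFORMS = ["Windows XP", "CentOS", "Mac Leopard"]


def test_matrix():
    """Every (platform, indexer, database) combination to build a collection with."""
    return [
        {"platform": p, "indexer": i, "database": d}
        for p, i, d in product(PLATFORMS, INDEXERS, DATABASES)
    ]
```

That gives 9 collection builds per platform and 27 in total, before the file-type, encoding and applet tests are counted.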

Greenstone 2.85 RC2 was finally released on Wednesday 26 October 2011. The Greenstone Team invites all those interested to please test the new release binaries out, which can be obtained from http://www.greenstone.org/snapshots, and write back on any bugs or issues encountered. The updated release notes are at http://wiki.greenstone.org/wiki/index.php/2.85_Release_Notes

The release notes already contain instructions on a patch for a minor issue that Diego discovered in the earlier release and which had persisted into the current one.

Sam’s Greenstone Blog 17/10/2011

admin. Monday, October 17th, 2011

Progress on the Document Structure Editor (the name is still undecided) is going well. It now actually makes the changes and then builds the collection, which results in the changes actually showing up in the documents, which is quite satisfying to see!

The building process takes a fair amount of time (especially if multiple collections need to be built), so we needed a way to inform the user of what is currently happening on the server. We originally had the code that triggers collection building on the server, as it made sense to build the collections straight after the archive files had been modified (which is essentially what this system does). However, this approach hit a roadblock: it struggles when multiple collections need to be built one after another and we want to keep the user informed through the web interface. Once each build is complete the collection must be activated (the building -> index step you may know about if you’ve ever built a collection from the command line rather than with GLI), and ordering these steps correctly became very tricky without a lot more code. So we decided to simplify the process and move the code that decides when, and which, collections to build to the client side.
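Moving the decision to the client turns the ordering problem into a simple sequential loop. The sketch below shows that shape in Python rather than the client-side Javascript; `build`, `activate` and `report` are hypothetical stand-ins for the requests the page sends to the server and for the status display.

```python
def rebuild_collections(collections, build, activate, report):
    """Build and then activate each collection strictly one after another.

    `build`, `activate` and `report` are hypothetical callbacks. Stops at the
    first failed build so later steps never run out of order.
    """
    for n, coll in enumerate(collections, start=1):
        report(f"Building {coll} ({n} of {len(collections)})")
        if not build(coll):
            report(f"Build of {coll} failed; stopping")
            return False
        report(f"Activating {coll}")
        activate(coll)  # the building -> index step
    report("All collections rebuilt")
    return True
```

Because each step waits for the previous one, the status messages naturally arrive in the right order, which is exactly what was hard to guarantee when the server drove the whole process.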

This week I will be continuing to work on this system, most likely focusing on editing metadata or document text.

Sam’s Greenstone Blog 7/10/2011

admin. Friday, October 7th, 2011

This week work has continued on the Document Basket/Document Maker/Document Structure Editor (we’re still deciding on the final name). The move and duplicate operations have been implemented and are successfully being mirrored on the server, so it is now very easy to move sections around and duplicate them. Unfortunately these changes do not yet show up in the regular interface as the collection will need to be built after each save and this has yet to be implemented.

I have also got the undo functionality up and running, so you can now undo all your operations up until the point where you choose to save. The client-side interface keeps a list of all the transactions you have made (move, duplicate, create, delete etc.) and undoes an operation by removing it from the list of transactions (and updating the interface). Once you choose to save, however, the list of transactions is sent to the server to be executed, so undoing is no longer as simple to implement. It may be implemented at some point in the future, but at this stage it would take more time than we can spare for something that is not essential.
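The undo-until-save behaviour described above comes down to a list used as a stack. Here is a minimal sketch; the operation names follow the post, but the dictionary layout, the section identifiers in the example and the `send` callback are assumptions for illustration.

```python
class TransactionLog:
    """Client-side list of pending edits that can be undone until saved."""

    def __init__(self):
        self.pending = []

    def record(self, operation, **details):
        """Remember an operation such as move, duplicate, create or delete."""
        self.pending.append({"operation": operation, **details})

    def undo(self):
        """Drop the most recent unsaved operation and return it (or None)."""
        return self.pending.pop() if self.pending else None

    def save(self, send):
        """Hand every pending operation to `send`, then clear the list.

        After this the operations have run on the server, so they can no
        longer be undone simply by removing them from the list.
        """
        batch, self.pending = self.pending, []
        send(batch)
        return len(batch)
```

For example, recording a move and a duplicate, undoing once and then saving sends only the move to the server.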

We have also begun implementing a way to modify metadata (such as document or section title, author or subject metadata) as part of the system. This provides a more direct way to modify metadata without having to use GLI, which collection designers may find quite useful. At the moment it only works on the client side and we have yet to connect it back to the server.

Next week I will continue to connect the missing operations to the server and add collection building as part of the save feature so the changes can be viewed.

Sam’s Greenstone Blog 3/10/2011

admin. Monday, October 3rd, 2011

Those who are eagerly awaiting the final version of 2.85 will not have to wait much longer. Anu has been working hard testing it on each of the platforms we support, and for the most part things are looking good. Any help with testing is always greatly appreciated, so if you would like to help us out, please download the 2.85 release candidate, which is available here. If you find any problems, join the mailing list, email us at <greenstone-users @ list.waikato.ac.nz> and let us know. The more you can tell us about the issue the better.

Work on the Document Basket functionality continues to go well. I am in the initial stages of connecting the front-end Javascript to the Java back end. To transmit the operations we are using JSON (rather than XML), as it is very simple to write in Javascript and we have found a good Java library (gson) that converts JSON back into objects. So hopefully this week we will start seeing some promising results.
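The round trip is easy to picture: the client serialises its list of operations to a JSON array, and the server parses it back into objects. In the real system the client is Javascript and the server uses gson; in this sketch Python’s `json` module stands in for both sides, and the operation fields are illustrative assumptions.

```python
import json


def encode_operations(operations):
    """Serialise a list of edit operations to compact JSON for an AJAX request."""
    return json.dumps(operations, separators=(",", ":"))


def decode_operations(payload):
    """Parse the JSON back into objects (the step gson performs in Java)."""
    ops = json.loads(payload)
    if not isinstance(ops, list):
        raise ValueError("expected a JSON array of operations")
    return ops


ops = [{"operation": "move", "from": "D1.2", "to": "D1.4"}]
payload = encode_operations(ops)
```

The equivalent XML would need a schema and more verbose parsing code on both ends, which is the simplicity argument the post makes for JSON.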