Archive for the ‘Greenstone3’ Category

Sam’s Greenstone Blog 2/12/2011

admin. Friday, December 2nd, 2011.

This week I have been tidying up the new paged-image functionality so that it loads each page dynamically (rather than doing a full page reload each time). I also added the ability for the user to choose between “Text view” (which only shows the OCR’d text), “Image view” (which shows the original image) and “Default view” (which shows both the text and the image). These views are also switched dynamically, which is nice, and the choice is remembered if you leave a document page and go to a new one.

I also fixed up an annoying problem with GLI. One of the ways you can customise collections in Greenstone 3 is by writing Javascript in the collectionConfig.xml file, and those familiar with XML will know that you cannot put ‘&’, ‘<’ or ‘>’ into text nodes (you have to replace them with &amp;, &lt; and &gt; respectively). These special characters are relatively common in Javascript, so each time they are used they have to be escaped. The problem we were having with GLI was that it would read in the file and replace the escaped characters with their usual forms (&, < and >), but when it went to save the file it wouldn’t escape them again. So the next time this file was read in, GLI would produce an error because the file was no longer valid XML. We eventually tracked this problem down and fixed it.
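To make the round trip concrete, here is a minimal sketch in Python using only the standard library. It is not GLI’s actual (Java) code; the script text and the <script> element name are made up for illustration. It shows that text written back without escaping no longer parses, while escaped text parses and reads back as the original Javascript.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

# Hypothetical Javascript a collection designer might embed in the config file.
script = "if (a < b && b > 0) { show(); }"

# Escaping on save turns &, < and > into &amp;, &lt; and &gt;.
escaped = escape(script)


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Saving the escaped form keeps the file valid ...
good = "<script>" + escaped + "</script>"
# ... while saving the raw characters (the bug described above) does not.
bad = "<script>" + script + "</script>"

print(is_well_formed(good))  # the escaped version parses
print(is_well_formed(bad))   # the unescaped version is rejected
print(ET.fromstring(good).text == script)  # parser hands back the original text
```

The point is simply that unescaping on read has to be paired with escaping on write; doing only one of the two is what corrupts the file.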

Next week I will continue to work on the paged-image functionality (specifically the “next page” and “previous page” buttons) as well as adding some new code to HTMLPlugin that will add any files referred to in CSS files (e.g. background-image) as associated files of the HTML page.

Sam’s Greenstone Blog 26/11/2011

admin. Saturday, November 26th, 2011.

This week has mostly been spent improving Greenstone 3’s capability to display paged documents. This has mostly involved upgrading the table of contents functionality to better handle documents that have a lot of pages and whose sections have names like “Page 1”, “Page 2”, “Page 3” etc., making them virtually indistinguishable by their names. In this case it would be much better if images of the pages were displayed. Fortunately many of these collections will already have thumbnails available, so these will now be displayed in the table of contents instead of the names. Simply replacing the names with images, however, results in two more problems. The first is that a lot of images take up a lot of space on the page, and the second is that it greatly increases the amount that the user has to download from the server for each page. Even though a single black and white thumbnail is likely to be only around 10KB in size, a thousand of these (which is not unrealistic) comes to around 10MB, and color images would add up even faster.

To solve both of these problems I decided that a good option would be to create a box in the table of contents that only shows a few pages at a time and can be scrolled from right to left to go through the images of the pages. As well as saving space, this approach also has the added benefit that images do not need to be loaded until they are visible within the box (i.e. they have been scrolled over). So I have implemented it so that images are loaded dynamically as necessary.
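As a rough illustration of the idea (not the actual Greenstone 3 Javascript), the sketch below works out which thumbnails fall inside the scroll box for a given scroll position and only “loads” those that have not been loaded before. The class name, the pixel sizes and the one-image preload margin are all assumptions made for the example.

```python
def visible_range(scroll_left, box_width, thumb_width, total, preload=1):
    """Return (first, last) indices of thumbnails in or next to the box."""
    if total == 0:
        return (0, -1)
    first = max(0, scroll_left // thumb_width - preload)
    last = min(total - 1, (scroll_left + box_width - 1) // thumb_width + preload)
    return (first, last)


class ThumbnailStrip:
    """Tracks which page thumbnails have been requested so far."""

    def __init__(self, total, thumb_width=100, box_width=500):
        self.total = total
        self.thumb_width = thumb_width
        self.box_width = box_width
        self.loaded = set()

    def scroll_to(self, scroll_left):
        """Return the indices that need loading after scrolling."""
        first, last = visible_range(
            scroll_left, self.box_width, self.thumb_width, self.total
        )
        newly = [i for i in range(first, last + 1) if i not in self.loaded]
        self.loaded.update(newly)
        return newly


strip = ThumbnailStrip(total=1000)
print(strip.scroll_to(0))    # thumbnails needed for the initial view
print(strip.scroll_to(250))  # only the ones not already loaded
print(len(strip.loaded), "of", strip.total, "thumbnails requested so far")
```

With a thousand pages, a reader who only looks at the first few screens of thumbnails downloads a handful of images rather than all of them.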

I have also added a new feature to Greenstone 3 that may prove useful in improving some of the interactions that happen between XSLT and Javascript. One thing I have been needing to do a reasonable amount recently is take parts of pages and add them to other pages. Our current method for doing this is to get the page we want and to “cut” the detail we want out of it. To hopefully smooth out this interaction I have added the ability for XSL templates to be specified in the CGI arguments given to the page. This allows Javascript AJAX calls to single out the exact part of the page they want or even create new information, all in a single AJAX call.

Sam’s Greenstone Blog 18/11/2011

admin. Friday, November 18th, 2011.

This week has mostly been focused on bug fixing. One bug we discovered a while ago was that the code that highlights search terms in the text would also find occurrences of the terms inside tags (e.g. it would find the word farming in <a href="farming.html">farming</a>). The fix was to exclude the characters inside these tags from being considered by the highlight searching code, by looking for the < character and ignoring all characters until we see a > character. You may be thinking “But what if there is a < in the document text?” The answer is that this isn’t an issue, as any < or > characters in the document text that don’t belong to tags will be escaped as &lt; and &gt;.
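The tag-skipping approach can be sketched in a few lines of Python. This is an illustration of the technique described above, not the Greenstone code itself: the function name and the <b> markers are assumptions, and it does a plain case-sensitive substring match without any word-boundary handling.

```python
def highlight(html, term, open_mark="<b>", close_mark="</b>"):
    """Wrap occurrences of term in markers, ignoring text inside tags."""
    out = []
    i = 0
    in_tag = False
    while i < len(html):
        ch = html[i]
        if in_tag:
            # Copy tag characters through untouched until the closing '>'.
            out.append(ch)
            if ch == ">":
                in_tag = False
            i += 1
        elif ch == "<":
            # Start of a tag: stop matching until we see '>'.
            in_tag = True
            out.append(ch)
            i += 1
        elif term and html.startswith(term, i):
            out.append(open_mark + term + close_mark)
            i += len(term)
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(highlight('<a href="farming.html">farming</a>', "farming"))
```

Because literal < and > in the document text arrive as &lt; and &gt;, the simple in-tag flag is enough here and no real HTML parser is needed.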

Another bug I fixed was to do with the Document Structure Editor. The bug was that it always wiped the contents of any images in the collection that was being built, leaving empty files, but the XML files were being preserved fine. The main bug was caused by the index directory not being deleted correctly. This was because the server still had the collection loaded in the runtime system (so that it can be viewed) while it tried to delete its index. So it required that the collection be briefly deactivated in the runtime system so that this replacement (the newly built index replacing the old one) could take place.

Another problem was with displaying paged-image collections. The system would only ever show the root-level section and the top-level sections and no sections lower than that. I tracked this down to the top-level sections being marked as “leaf” nodes instead of “internal” nodes. Whether this is a bug or whether this has been done deliberately I will try and figure out next week.

Also next week I will do some work on enabling a basic form of spatial searching (searching by locations) in any collections that contain documents with latitude and longitude information.

Sam’s Greenstone Blog 11/11/2011

admin. Friday, November 11th, 2011.

This week I have been working on a different area of Greenstone 3 for a change. We noticed that one area that was lacking in Greenstone 3 was the ability to display paged-image collections. For those of you who are not aware, a paged-image collection is a collection of (usually) scanned documents that consist of both the original images and the OCRed text. A good example collection in Greenstone 2 is the Māori Niupepa Collection. At the moment there seem to be multiple issues preventing collections like this from working correctly in Greenstone 3. As usual we will also be taking this opportunity to explore any upgrades for these features as we implement them for Greenstone 3. One particular area that we discussed was the way that the document could be navigated; we intend to make it easier to scroll through pages. But I’ll go into more detail once I start implementing it and have a better idea of what works well.

I have also fixed more minor bugs in the Document Editor and have also added the ability to modify the text of documents. The next feature in development is the ability to add/remove/modify metadata. We still need to decide on what is the best way to approach this issue as it has the potential to be quite complicated, but once we decide on that it should not take very long to implement and I have already done a lot of the client-side work for it.

Sam’s Greenstone Blog 4/11/2011

admin. Friday, November 4th, 2011.

My work on the Document Structure Editor is on the back-burner at the moment (although still progressing well) as I have been designing a prototype collection that integrates a map-view into the various parts of Greenstone, to display the spatial information present in the collection. At this point I am modifying the Tipple Paradise Garden collection, which is a test collection created by the developers of Tipple (Tourist Information Provider Digital Library). It is particularly useful as each document in the collection has a latitude and longitude value associated with it.

So, using the Google Maps API I have inserted a map into the browsing, searching and document pages. The map contains markers showing the locations of the documents contained on that page, and the markers can be clicked on to take you to the corresponding document. An information bubble moves from marker to marker displaying the name of each document (this is to avoid having all the names displayed at once, potentially creating a lot of clutter on the map). Next to the usual document links is another link that can be used to focus a single document on the map (centring it).

Next week I’ll be back to my Document Structure Editor work, where I will be trying to figure out why the Seamless Web Editor Javascript isn’t behaving as expected. Assuming I get it working I will be able to add text editing to the interface.

Sam’s Greenstone Blog 17/10/2011

admin. Monday, October 17th, 2011.

Progress on the Document Structure Editor (the name is still undecided) is going well. It now actually makes the changes and then builds the collection, which results in the changes actually showing up in the documents, which is quite satisfying to see!

The building process takes a reasonable amount of time (especially if multiple collections need to be built), so we needed a way to inform the user of what is currently happening on the server. We originally had the code that triggers the collection building on the server, as it made sense to build the collections straight after the archive files had been modified (which is essentially what this system does). This approach hit a road-block, however, as it runs into difficulty when multiple collections are to be built sequentially and we want to be able to inform the user of what’s happening through the web interface. Basically, once each build is complete the collection must be activated (the building -> index step you may know about if you’ve ever built a collection using the command line rather than GLI), and these things became very tricky to order correctly without requiring a lot more code. So we decided to make the process simpler and move the code that decides when and what collections to build to the client-side.
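The ordering problem is easier to see in code. The sketch below is only a Python stand-in for the client-side logic: the build, activate and report callbacks are hypothetical placeholders for whatever requests the real interface makes. It builds the collections one at a time, activates each one only after its build has finished, and reports progress at every step.

```python
def build_collections(names, build, activate, report):
    """Build then activate each collection in turn, reporting progress."""
    for name in names:
        report("Building " + name)
        build(name)      # assumed to return only once the build is complete
        report("Activating " + name)
        activate(name)   # the building -> index step for this collection
    report("Done")


# Example run with stand-in callbacks that just record what happened.
events = []
build_collections(
    ["collection-a", "collection-b"],
    build=lambda name: events.append(("build", name)),
    activate=lambda name: events.append(("activate", name)),
    report=print,
)
print(events)
```

Keeping this loop in one place is what makes it straightforward to tell the user which collection is being built or activated at any moment.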

This week I will be continuing to work on this system, most likely focusing on editing metadata or document text.

Sam’s Greenstone Blog 7/10/2011

admin. Friday, October 7th, 2011.

This week work has continued on the Document Basket/Document Maker/Document Structure Editor (we’re still deciding on the final name). The move and duplicate operations have been implemented and are successfully being mirrored on the server, so it is now very easy to move sections around and duplicate them. Unfortunately these changes do not yet show up in the regular interface as the collection will need to be built after each save and this has yet to be implemented.

I have also been able to get the undo functionality up and running, so you will now be able to undo all your operations up until the point where you choose to save. The client-side interface keeps a list of all the transactions you have made (move, duplicate, create, delete etc.) and is able to undo any operation you make on the client by removing it from the list of transactions (and updating the interface). Once you choose to save, however, the list of transactions is sent to the server to be executed, so undoing is no longer as simple to implement. It may be implemented some time in the future, but at this stage it would take more time than we can spare for something that is not essential.
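Here is a minimal Python sketch of that transaction list, assuming each transaction is just a small record of the operation and the sections involved. The class, method and field names are invented for the example and the real interface is written in Javascript. It assumes undo removes the most recent transaction, and saving hands the whole list to the server and empties it, which is why nothing can be undone afterwards.

```python
class TransactionLog:
    """Client-side list of unsaved operations with simple undo."""

    def __init__(self):
        self.pending = []

    def record(self, operation, **details):
        """Remember an operation (move, duplicate, create, delete ...)."""
        self.pending.append({"operation": operation, **details})

    def undo(self):
        """Drop and return the most recent unsaved operation, if any."""
        return self.pending.pop() if self.pending else None

    def save(self, send):
        """Send all pending operations to the server and clear the list."""
        batch = list(self.pending)
        send(batch)
        self.pending.clear()
        return batch


log = TransactionLog()
log.record("move", section="1.2", target="2")
log.record("duplicate", section="1.3")
print(log.undo())   # removes the duplicate again
log.save(print)     # hands the remaining move to the (stand-in) server
print(log.undo())   # nothing left to undo after saving
```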

We have also begun implementing a way to modify metadata (such as document/section title, author or subject metadata) as part of the system. This allows metadata to be modified more directly instead of having to use GLI, which collection designers may find quite useful. At the moment it is only working on the client-side and we have yet to connect it back to the server.

Next week I will continue to connect the missing operations to the server and add collection building as part of the save feature so the changes can be viewed.

Sam’s Greenstone Blog 3/10/2011

admin. Monday, October 3rd, 2011.

Those who are eagerly awaiting the release of the final version of 2.85 will not have to wait much longer. Anu has been working hard testing it on each of the platforms we support and for the most part things are looking good. Any assistance in testing is always greatly appreciated and if you would like to help us out then please download the 2.85 release candidate which is available here. If you find any problems then join the mailing list to email us at <greenstone-users @ list.waikato.ac.nz> and let us know. The more you can tell us about the issue the better.

Work on the Document Basket functionality continues to go well. I am in the initial stages of connecting the front-end Javascript to the Java back-end. To transmit the operations we are using JSON (rather than XML) as it is very simple to write in Javascript and we have found a good Java library (gson) that converts JSON back into an object. So hopefully this week we will start seeing some promising results.
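To give a feel for the exchange, the sketch below uses Python’s json module and a dataclass to play both roles: the client side serialising a list of operations, and the server side turning the JSON back into typed objects (the job gson does in Java). The Operation fields are assumptions made for the example, not the actual message format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Operation:
    """One editing operation; the field names here are hypothetical."""
    operation: str
    section_id: str
    target_id: str = ""


def encode(operations):
    """Client side: turn a list of operations into a JSON string."""
    return json.dumps([asdict(op) for op in operations])


def decode(payload):
    """Server side: rebuild Operation objects from the JSON string."""
    return [Operation(**item) for item in json.loads(payload)]


ops = [Operation("move", "1.2", "2"), Operation("delete", "1.4")]
payload = encode(ops)
print(payload)
print(decode(payload) == ops)  # the objects survive the round trip
```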

Sam’s Greenstone Blog 26/9/2011

admin. Monday, September 26th, 2011.

Hello again, sorry about the big delay between posts, things have been pretty busy here recently and remembering to write this often slips my mind.

We’ve released our first release candidate for 2.85 so please try it out and let us know if there are any issues. You can find it at http://www.greenstone.org/snapshots.

The front-end for the Document Basket (formerly the Document Maker) is looking really good now, sections can be added, moved around, duplicated and removed. The text of each section can also be easily edited thanks to Brook Novak’s Seaweed (Seamless Web Editing) technology which was developed here at the University of Waikato. For those of you who haven’t seen this yet, it basically allows you to click text on a web page and start editing it right there without the need for any complicated text boxes/buttons etc. Very cool.

We have yet to connect the front-end Document Basket interface to the back-end and we are still working on features such as undo, so that is what I will be working on this week.

Anu’s entry for the month of Aug 2011

ak19. Friday, August 26th, 2011.

It’s been about 4 weeks since I wrote an entry. In the meantime we’ve been tidying up the last of the To Do list items for the upcoming GS2 release and several of the To Do list items for the GS3 release. Sam is now hard at work on the GS3 interface alongside his other work on the Document Maker. It now looks like GS3 may be released separately, after GS2.

Some of the more involved things that required doing were:

  • testing OAI (dc.Resource Identifier issues) and downloading over OAI
  • The extracted embedded metadata, ex.*.metadata (e.g. ex.dc.* prefixes), needed to be handled differently from ex.metadata. This required some changes in various files and a lot of testing.
  • Conflicts between EmbeddedMetadataPlugin and some of the existing Plugins in the pipeline (OAI, DSpace, PDF plugins). Fortunately, Dr Bainbridge came up with fixes. After some testing, the known problems with these plugins no longer exist. With the tutorials we will continue to investigate how well other plugins interact with the EmbeddedMetaPlugin.
  • The OAI validator at openarchives now had a test where GS2’s OAI server failed and a different one where the GS3 OAI server failed. These have been fixed up.
  • The GS3 installer needed to have an admin page, like the GS2 installer does, where the user can enable admin pages and provide a password.
  • wvware.pl is a new intermediary script to launch wvware in its own particular environment. This script is necessary in order for wvware’s required environment not to be set globally (thereby tampering with Linux’ windowing/GUI libraries)
  • At the moment, after John Rose’s request, we’re in the process of merging the two server configuration files (glisite.cfg and llssite.cfg), so we can have just one, with some properties qualified by a “gli” prefix. The Server.jar code, the GS2 C++ code, the startup scripts and config files have been sufficiently modified to work with the work-in-progress on the GLI code, while still working with the stable GLI. Changing the GLI code was tricky two years ago, and made the code’s behaviour rather complex. Now that I’m in the process of testing the latest overhaul to it, the changes I’ve just made to what was stable are still very buggy and reproducing the bugs takes some time. Fortunately, without the changes to the GLI code, everything else committed is able to work as accurately as before, so if I break anything, it will be just the LocalLibraryServer.java GLI code that needs to be reverted once committed.
  • The above task has now been completely resolved, and changes committed after being tested thoroughly on both Windows and Linux.

Minor issues also kept popping up over the last month.

  • There was a Z3950 “issue” that sidetracked me and which turned out not to be an issue after all: The Library of Congress’ Z3950 address seems to return SRU data. The fix is simply for the user to use the right module of the download pane.
  • A bug in starting and stopping GS3 via GLI on Windows
  • One Greenstone member encountered a unicode issue that I wasn’t able to reproduce after initial investigations.
  • Minor but frustrating bugs with the GLI for GS3 have been resolved (an extra nested <format/> tag appearing when all format statements have been removed, and the preview button activating itself when editing format statements in an unbuilt GS3 collection)
  • Fixed GS3’s way of handling the port in the GSI application, so that it is no longer arbitrarily modified. The Do Not Modify port is still available.
  • Some requests on the mailing list like porting indexed databases from one GS2 version to the next, since changes had been made to the name of an ex.metadata