From: Diego Spano
Date: Wed Nov 5 08:52:42 2008
Subject: [greenstone-users] Archives - Index/assoc: why both?

Hi list,

it is well known that every document in the import folder generates a folder in
archives after the import process. The building process then copies all source
files to index/assoc, so the disk space needed to host all the files is
doubled.

I have a collection with almost 700,000 TIFF files, all imported with
PagedImgPlug. This collection is not a static one: every couple of days we
add new documents, so we have two options:

1- Use Lucene and incremental building: this sounds interesting, but we have
had many problems parsing doc.xml files, with accented characters, and with
many other things.

2- Use MGPP: it works great and has all the features we need, but
incremental indexing is not possible. So every few days we have to rebuild
everything again, and again... Generating the indexes takes a reasonable
amount of time, but a lot of time is spent copying 700,000 files from
archives to building/assoc and deleting the old index folder with its own
700,000 files.

The questions are:

a- Is it possible to link to the source files directly in the archives
folder? This would save a lot of time, because copying files from archives
to assoc would no longer be necessary. I remember that someone asked for
something like this, but I can't find the message in the email archives
collection. I think that buildcol.pl must be modified to work this way. Is
there anybody out there who can do it?

b- Is it possible to add an option in future releases that lets the user
choose whether buildcol.pl runs with the source documents left in place (in
the archives folders) or not?

TIA.

Diego Spano