mc-dir.pl is a tool for converting a directory of XML documents into a single XML file suitable for the Tamino massload utility inoxmld  --  version 1.0 by Jan Harmsen, 08-January-2002

mc-dir.pl is not an official Software AG product; please read mc-dir.pl itself for further information.

mc-dir.pl has been tested in conjunction with Tamino 2.3.1 / Tamino 3.1 on SuSE Linux / Win2K.


Watch out:
The Tamino massloader inoxmld needs approximately 10 times the size of the raw XML data as temporary working space!
If your index is large, even more space is needed.
For example, to massload 200 MB of XML data you will need at least 2 GB of temporary working space.
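
The rule of thumb above can be checked before starting a load. The following is a minimal Python sketch, not part of mc-dir.pl: the function names are hypothetical, the factor of 10 is the approximate figure stated above, and the estimate does not account for the extra space a large index needs.

```python
import os

def directory_bytes(directory):
    """Sum the sizes of all regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        return sum(e.stat().st_size for e in entries if e.is_file())

def estimate_temp_space_bytes(raw_xml_bytes, factor=10):
    """Rough temporary working space for inoxmld.

    `factor` is the approximate multiplier from the warning above;
    a large index needs more, so treat the result as a lower bound.
    """
    if raw_xml_bytes < 0:
        raise ValueError("raw_xml_bytes must be non-negative")
    return raw_xml_bytes * factor

MB = 1024 * 1024
needed = estimate_temp_space_bytes(200 * MB)
print(f"200 MB of XML needs at least {needed / MB:.0f} MB of temporary space")
```

For a real input directory, estimate_temp_space_bytes(directory_bytes("c:/mc-directory")) gives the same estimate based on the actual file sizes.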


technical information about mc-dir.pl


  1. short description

  2. usage instructions

  3. technical architecture: what mc-dir.pl will do

  4. inoxmld massload performance



  1. short description

    mc-dir.pl is a script that converts a directory of XML documents into one big XML file, which can be loaded directly into Tamino with the Tamino massload utility inoxmld.

    name:

    mc-dir.pl (massload conversion of a directory)

    input parameters:

    mandatory:
    name of a directory containing XML documents, e.g. c:/mc-directory
    optional:
    -nodocname: do not use the XML file name as the value of the ino:docname attribute

    output:

    XML output file (for use with the Tamino massload utility) and a log file

    usage:

    perl mc-dir.pl c:\mc-directory

    The files in the input directory must be well-formed XML documents, i.e. they must have the following form:

    <?xml version="1.0" encoding=..... ?>
    <root_element>
    ..... content ....
    </root_element>

    mc-dir.pl automatically removes any DOCTYPE declaration, because the massload utility inoxmld requires documents without one. Processing instructions (PIs) and comments that appear before the root element are kept.
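
    To illustrate the DOCTYPE handling described above, here is a minimal Python sketch. It is not the code of mc-dir.pl (which is a Perl script); the function names are hypothetical, and the scanner is a simplification that assumes the internal subset contains no quote characters inside comments.

```python
def _doctype_end(text, start):
    """Return the index just past the '>' that closes the DOCTYPE at `start`."""
    depth = 0      # nesting depth of '[' ... ']' (internal subset)
    quote = None   # currently open quote character, if any
    for i in range(start, len(text)):
        ch = text[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == ">" and depth == 0:
            return i + 1
    raise ValueError("unterminated DOCTYPE declaration")

def strip_doctype(text):
    """Remove a DOCTYPE declaration from the prolog; keep PIs and comments."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace() or text[pos] == "\ufeff":
            pos += 1                              # whitespace or byte order mark
        elif text.startswith("<?", pos):          # XML declaration or PI: keep
            end = text.find("?>", pos)
            if end == -1:
                break
            pos = end + 2
        elif text.startswith("<!--", pos):        # comment: keep
            end = text.find("-->", pos)
            if end == -1:
                break
            pos = end + 3
        elif text.startswith("<!DOCTYPE", pos):   # the part to drop
            return text[:pos] + text[_doctype_end(text, pos):]
        else:
            break                                 # root element reached
    return text
```

    Because the scan stops at the root element, text that merely looks like a DOCTYPE inside the document body (for example in a CDATA section) is left untouched.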



  2. usage instructions

    To test mc-dir.pl, simply run test-mc.bat; this runs mc-dir.pl on the test directory ./mc-directory.

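    To use mc-dir.pl on your own data, pass the directory containing your XML documents as the argument, e.g. perl mc-dir.pl c:\mc-directory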



  3. technical architecture: what mc-dir.pl will do



  4. inoxmld massload performance

    The time needed to load XML documents with inoxmld depends mainly on the complexity of the schema and on the size and number of the XML documents.
    Here are some test results for Tamino 2.3.1.4 on a WinNT workstation (Pentium 4, 1.7 GHz, 512 MB RAM). The Tamino TSD2 schema was 2.5 MB in size; loading 10 GB of XML data took 20 hours.

    number of docs   filesize in bytes   time in seconds   avg. docsize in kB   loadtime ms/doc   load kB/s

    schema 1:
              7419         233.039.371              1283                   31               172         177
              2547         332.300.218              1739                  127               682         186
              5925         336.734.739              1572                   55               265         209
              8500         206.796.668              1108                   24               130         182

    schema 2:
              7419         233.039.371               821                   31               111         277
              2547         332.300.218              1138                  127               447         285
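
    The three derived columns follow from the first three. The Python sketch below shows the arithmetic for the first row of schema 1; the function name is hypothetical, and 1 kB = 1024 bytes is an assumption that appears consistent with the load kB/s column. The table shows whole numbers, so the computed values differ in the fractional part.

```python
def load_metrics(num_docs, total_bytes, seconds, bytes_per_kb=1024):
    """Derive the per-document and throughput figures for one test run.

    bytes_per_kb=1024 is an assumption about how the table was computed.
    """
    return {
        "avg_doc_kb": total_bytes / num_docs / bytes_per_kb,
        "ms_per_doc": seconds * 1000 / num_docs,
        "kb_per_s": total_bytes / seconds / bytes_per_kb,
    }

# First row of schema 1: 7419 documents, 233,039,371 bytes, 1283 seconds.
m = load_metrics(7419, 233_039_371, 1283)
for name, value in m.items():
    print(f"{name}: {value:.1f}")
```

    The same function can be applied to your own test load to compare it with the figures above.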