indexer.xsl

Indexer: creates indices for everything under the body.

Builds indices on the basis of the subcorpus, file type, location in the tree and (redundantly) the tag name. The index structure is as follows (linearly):

  1. subcorpus identifier: an ISO 639-3 code (or a sequence of such codes) extracted from the file path
  2. underscore
  3. file type (source text, kind of annotation, etc.), using the classification from the main header (planned)
  4. underscore
  5. position among the siblings, for each subtree below <body>
  6. dash
  7. tag name

<body> is indexed separately - its index skips parts 4 and 5 from the above sequence.
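
The sequence above can be sketched in a few lines of Python. This is only an illustration of how the parts join, not the stylesheet's own code: the function name compose_index and the sample values ("pol", "text", "1.3.2") are hypothetical, and the <body> case reads the sentence above literally (parts 4 and 5 dropped, dash and tag name kept).

```python
def compose_index(iso_id, file_type, tag, position=None):
    """Join the index parts in the order listed in the description."""
    head = f"{iso_id}_{file_type}"  # parts 1-3
    if position is None:
        # Assumed <body> case: parts 4 and 5 are skipped, dash and tag remain.
        return f"{head}-{tag}"
    return f"{head}_{position}-{tag}"  # parts 4-7


element_index = compose_index("pol", "text", "p", "1.3.2")
body_index = compose_index("pol", "text", "body")
```

With these sample values, element_index is "pol_text_1.3.2-p" and body_index is "pol_text-body".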

TODO: possibly skip the entire file when text/@rend="noindex" is present, or skip an individual node and its children when */@rend="noindex" is encountered.

Two versions may be needed, one with indenting and one with indent set to "no" (compare the segmentation files); alternatively, indenting may be switched off for good.

Distributor: Open-Content Text Corpus (http://OCTC.sourceforge.net/)

Author:
Piotr Bański
Copyright:
the author(s), 2010; license: GPL v3 or any later version (http://www.gnu.org/licenses/gpl.html).
SVN Id:
$Id: indexer.xsl 426 2010-12-19 02:46:42Z bansp $
XSLT Version:
2.0
Namespace Prefix Summary:
f - func
xd - http://www.pnp-software.com/XSLTdoc
xi - http://www.w3.org/2001/XInclude
xs - http://www.w3.org/2001/XMLSchema
xsl - http://www.w3.org/1999/XSL/Transform
XPath Default Namespace:
http://www.tei-c.org/ns/1.0

Outputs Summary

No short description available

Variables Summary

xs:string* excepted_tags
Not implemented yet. It may eventually hold the (bare) names of non-indexed elements (gap, hi, q?); for now, the indexed elements are listed positively in the match template.
item()+ file_types
A lookup table for file types; it will be externalized at some point and only referenced from here.
item() index_root
There is no need to count from the top of the tree; <body> is enough.
xs:integer index_root_depth
No short description available
xs:string iso_id
Note that this assumes that only files under lg/ or align/ are indexed.
xs:string my_fname
Check the file name (and type) of the file being processed; recall that a dash MUST ONLY separate file modifiers.

Match Templates Summary

Don't copy the old xml:id if it is in the subtree being indexed (otherwise the new one would be overwritten)
Index the body as well
Go through the designated elements and (re)index them

Functions Summary

xs:string f:calc_pos (param: item() my_node)
Create the dot string for the element's position in the tree rooted in <body>
xs:string f:create_index (param: item() node)
No short description available

Outputs Detail

No short description available
Attributes
encoding
UTF-8
indent
no
method
xml

Variables Detail

xs:string* excepted_tags
Not implemented yet. It may eventually hold the (bare) names of non-indexed elements (gap, hi, q?); for now, the indexed elements are listed positively in the match template.
item()+ file_types
A lookup table for file types; it will be externalized at some point and only referenced from here.
Never use dashes here, because they separate modifiers from file names ("morph-2.xml", etc.).
item() index_root
There is no need to count from the top of the tree; <body> is enough.
Note that, somewhat kludgily, <body> is explicitly indexed below as well.
xs:integer index_root_depth
No short description available
xs:string iso_id
Note that this assumes that only files under lg/ or align/ are indexed.
xs:string my_fname
Check the file name (and type) of the file being processed; recall that a dash MUST ONLY separate file modifiers.
Additional assumption: file names end in '.xml'.
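
The assumptions behind iso_id and my_fname can be illustrated with a short Python sketch. It is not the stylesheet's logic: the helper names, the sample path, and the guess that the directory immediately below lg/ or align/ names the subcorpus are all hypothetical.

```python
from pathlib import PurePosixPath


def split_file_name(path):
    """Return (file_type, modifier) for a name such as 'morph-2.xml'."""
    name = PurePosixPath(path).name
    if not name.endswith(".xml"):
        raise ValueError("file names are assumed to end in '.xml'")
    stem = name[: -len(".xml")]
    # A dash only ever separates the file type from its modifier.
    file_type, _, modifier = stem.partition("-")
    return file_type, modifier or None


def iso_id_from_path(path):
    """Return the path component right after lg/ or align/ (an assumed layout)."""
    parts = PurePosixPath(path).parts
    for i, part in enumerate(parts[:-1]):
        if part in ("lg", "align"):
            return parts[i + 1]
    raise ValueError("path is assumed to contain lg/ or align/")


sample = "corpus/lg/pol/morph-2.xml"  # hypothetical path
file_type, modifier = split_file_name(sample)
iso_id = iso_id_from_path(sample)
```

For the sample path this gives the file type "morph", the modifier "2", and the subcorpus identifier "pol". A file type containing a dash would be split in the wrong place, which is why the lookup table must never use dashes.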

Match Templates Detail

Don't copy the old xml:id if it is in the subtree being indexed (otherwise the new one would be overwritten)
Index the body as well
This actually circumvents $index_root and should be done in the general template (above) by matching against that variable, but I don't want to spuriously check N times whether something happens (or not) to be the $index_root.
Go through the designated elements and (re)index them

Functions Detail

xs:string f:calc_pos (param: item() my_node)
Create the dot string for the element's position in the tree rooted in <body>
Parameters:
item() my_node -
xs:string f:create_index (param: item() node)
No short description available
Parameters:
item() node -
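
As a rough illustration of the dot string that f:calc_pos is described as producing, the Python sketch below walks a tree rooted in <body> and joins each element's sibling positions with dots. It is not a port of the XSLT function: the name dotted_positions and the sample document are hypothetical, and it assumes that every element child counts towards the sibling position, which the stylesheet may do differently.

```python
import xml.etree.ElementTree as ET


def dotted_positions(root):
    """Yield (element, dot string) for every descendant of root."""
    def walk(node, prefix):
        # Iterating an Element yields its child elements in document order.
        for n, child in enumerate(node, start=1):
            pos = f"{prefix}.{n}" if prefix else str(n)
            yield child, pos
            yield from walk(child, pos)

    yield from walk(root, "")


body = ET.fromstring("<body><div><p/><p><s/></p></div><div/></body>")
positions = [(el.tag, pos) for el, pos in dotted_positions(body)]
```

Here the second <p> of the first <div> gets "1.2" and the <s> inside it gets "1.2.1"; <body> itself receives no position, in line with its separate treatment.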