indexer.xsl
Imports
Indexer: creates indices for everything under the body.
Builds indices on the basis of the subcorpus, file type, location in the tree and (redundantly) the tag name. The index structure is as follows (linearly):
- subcorpus identifier as an ISO 639-3 code (or a sequence thereof), cut out from the path
- underscore
- file type (source text, kind of annotation, etc.), using the classification from the main header (soon)
- underscore
- position among the siblings, for each subtree below <body>
- dash
- tag name
<body> is indexed separately - its index skips parts 4 and 5 from the above sequence.
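As an illustration only, the linear structure listed above can be sketched in Python for an element below <body>. The function name, its arguments and the sample values are hypothetical and do not come from the stylesheet. The sketch assumes that `_` and `-` are the two separators and that the sibling positions are joined with dots, as the description of the position function suggests.

```python
def build_index(subcorpus, file_type, positions, tag):
    """Compose an index: subcorpus, "_", file type, "_", positions, "-", tag.

    Assumption: positions holds the 1-based sibling position at each level
    below <body>, and these are joined into a dot string.
    """
    dot_string = ".".join(str(p) for p in positions)
    return f"{subcorpus}_{file_type}_{dot_string}-{tag}"


# Hypothetical values: Polish subcorpus, a file type labelled "text",
# and a <p> that is the 2nd child of the 3rd child of the 1st child of <body>.
print(build_index("pol", "text", [1, 3, 2], "p"))  # pol_text_1.3.2-p
```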
TODO: possibly, make it skip the entire file upon text/@rend="noindex" or the individual node and its children when */@rend="noindex" is encountered.
There might need to be two versions, one with indenting, and one with indent set to "no"... (compare the segmentation files); alternatively, indenting may be switched off for good.
Distributor: Open-Content Text Corpus (http://OCTC.sourceforge.net/)
Author:
Piotr Bański
Copyright:
the author(s), 2010; license: GPL v3 or any later version (http://www.gnu.org/licenses/gpl.html).
SVN Id:
$Id: indexer.xsl 426 2010-12-19 02:46:42Z bansp $
XSLT Version:
2.0
Namespace Prefix Summary:
f - func
xd - http://www.pnp-software.com/XSLTdoc
xi - http://www.w3.org/2001/XInclude
xs - http://www.w3.org/2001/XMLSchema
xsl - http://www.w3.org/1999/XSL/Transform
XPath Default Namespace:
http://www.tei-c.org/ns/1.0
Variables Summary
Not going to implement this right now, but it may hold (bare) names of non-indexed elements (gap, hi, q?); for now, I list them positively, in the match template
A lookup-table for file types; it will be externalized at some point and only referenced from here
There is no need to count from the top of the tree, <body> is enough
No short description available
Match Templates Summary
Don't copy the old xml:id if it is in the subtree we are interested in (otherwise the new one would be overwritten)
Go through the designated elements and (re)index them
Functions Summary
Create the dot string for the element's position in the tree rooted in <body>
No short description available
Outputs Detail
Variables Detail
Not going to implement this right now, but it may hold (bare) names of non-indexed elements (gap, hi, q?); for now, I list them positively, in the match template
A lookup-table for file types; it will be externalized at some point and only referenced from here
Never use dashes here because they separate modifiers from file names ("morph-2.xml", etc.).
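A minimal Python sketch of such a lookup table follows, to show why dashes have to stay out of it. The table entries, the type codes and the function name are all hypothetical, since the real table lives in the stylesheet. The sketch assumes that a file name such as "morph-2.xml" consists of a base name, a dash, and a modifier.

```python
import os

# Hypothetical entries: base file name -> file type code used in the index.
FILE_TYPES = {
    "text": "txt",
    "morph": "mor",
    "segmentation": "seg",
}

# A dash marks the start of a modifier in a file name, so neither the keys
# nor the codes may contain one.
assert all("-" not in key and "-" not in code for key, code in FILE_TYPES.items())


def file_type_for(path):
    """Look up the file type code for a path such as 'pol/morph-2.xml'."""
    base = os.path.splitext(os.path.basename(path))[0]  # e.g. "morph-2"
    name = base.split("-", 1)[0]                        # drop the modifier
    return FILE_TYPES[name]


print(file_type_for("pol/morph-2.xml"))  # mor
```

If a key contained a dash, splitting on the first dash would cut the key itself in two and the lookup would fail.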
There is no need to count from the top of the tree, <body> is enough
Note that, somewhat kludgily, <body> is explicitly indexed below as well.
No short description available
Match Templates Detail
Don't copy the old xml:id if it is in the subtree we are interested in (otherwise the new one would be overwritten)
Index the body as well.
This actually circumvents $index_root. It should be done in the general template (above) by matching against that variable, but I don't want to check N times, spuriously, whether something happens (or not) to be the $index_root.
Go through the designated elements and (re)index them
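The (re)indexing pass can be sketched in Python as below. This is an illustration under stated assumptions, not a rendering of the template itself. The set of designated elements, the function name and the "pol_text" prefix are hypothetical, and positions are assumed to count all element siblings starting from 1.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
INDEXED = {"div", "p", "s"}  # hypothetical list of designated elements


def reindex(node, prefix, path=()):
    """Walk the subtree below node and set a fresh xml:id on designated elements."""
    for position, child in enumerate(node, start=1):
        child_path = path + (position,)
        name = child.tag.replace(TEI, "")
        if name in INDEXED:
            dots = ".".join(map(str, child_path))
            # Setting the attribute replaces any old xml:id on the element.
            child.set(XML_ID, f"{prefix}_{dots}-{name}")
        reindex(child, prefix, child_path)


doc = ET.fromstring(
    '<text xmlns="http://www.tei-c.org/ns/1.0">'
    '<body><div><p xml:id="old"/><hi/></div></body></text>'
)
body = doc.find(f".//{TEI}body")
reindex(body, "pol_text")
print(body[0][0].get(XML_ID))  # pol_text_1.1-p
```

Elements outside the designated set, such as <hi> here, keep whatever attributes they had and receive no index.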
Functions Detail
Create the dot string for the element's position in the tree rooted in <body>
Parameters:
item() my_node -
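What such a dot string could look like is sketched in Python below. The function name and the sample document are hypothetical. The sketch assumes that each step is the element's 1-based position among all of its element siblings, which may differ from what the stylesheet function actually counts.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def dot_string(root, node):
    """Return e.g. '1.2' for node's position in the tree rooted in <body>."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    body = root if root.tag == TEI + "body" else root.find(f".//{TEI}body")
    steps = []
    current = node
    while current is not body:
        parent = parent_of[current]  # KeyError if node is not below <body>
        steps.append(str(list(parent).index(current) + 1))
        current = parent
    return ".".join(reversed(steps))


doc = ET.fromstring(
    '<text xmlns="http://www.tei-c.org/ns/1.0">'
    "<body><div><p/><p/></div><div><p/></div></body></text>"
)
paragraphs = doc.findall(f".//{TEI}p")
print(dot_string(doc, paragraphs[1]))  # 1.2
print(dot_string(doc, paragraphs[2]))  # 2.1
```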
No short description available
Parameters:
item() node -