indexer.xsl

Imports

identity_transform.xsl

Indexer: creates indices for everything under the body.

Builds indices from the subcorpus, the file type, the location in the tree and (redundantly) the tag name. The parts of an index, in linear order, are:

1. subcorpus identifier: an ISO 639-3 code (or a sequence of such codes), cut out from the path
2. underscore
3. file type (source text, kind of annotation, etc.), using the classification from the main header (soon)
4. underscore
5. position among the siblings, for each subtree below <body>
6. dash
7. tag name

<body> is indexed separately - its index skips parts 4 and 5 from the above sequence.
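
As an illustration only (this is Python, not the stylesheet's own XSLT), the index layout described above might be assembled as follows. The sketch assumes that the separators are a literal underscore and a literal dash and that the position is a dot-separated string; the function name build_index and the sample values "pol" and "text" are hypothetical.

```python
def build_index(iso_id, file_type, position=None, tag="body"):
    """Assemble an index string from the parts listed above.

    All names and sample values here are illustrative assumptions.
    """
    prefix = f"{iso_id}_{file_type}"
    if position is None:
        # <body>: the second separator and the position are skipped
        return f"{prefix}-{tag}"
    return f"{prefix}_{position}-{tag}"


print(build_index("pol", "text", "1.2.3", "p"))  # pol_text_1.2.3-p
print(build_index("pol", "text"))                # pol_text-body
```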

TODO: possibly skip the entire file when text/@rend="noindex" is found, or skip an individual node and its children when */@rend="noindex" is encountered.

Two versions may be needed, one with indenting and one with indent set to "no" (compare the segmentation files); alternatively, indenting may be switched off for good.

Distributor: Open-Content Text Corpus (http://OCTC.sourceforge.net/)

Author:

Piotr Bański

Copyright: the author(s), 2010; license: GPL v3 or any later version (http://www.gnu.org/licenses/gpl.html).

SVN Id:

$Id: indexer.xsl 426 2010-12-19 02:46:42Z bansp $

XSLT Version:

2.0

Namespace Prefix Summary:

f - func

xd - http://www.pnp-software.com/XSLTdoc

xi - http://www.w3.org/2001/XInclude

xs - http://www.w3.org/2001/XMLSchema

xsl - http://www.w3.org/1999/XSL/Transform

XPath Default Namespace:

http://www.tei-c.org/ns/1.0

Outputs Summary

#default - source

No short description available

Variables Summary

xs:string* excepted_tags - source

Not implemented yet; it may eventually hold the (bare) names of elements excluded from indexing (gap, hi, q?). For now, the indexed elements are listed positively in the match template.

item()+ file_types - source

A lookup-table for file types; it will be externalized at some point and only referenced from here

item() index_root - source

There is no need to count from the top of the tree; <body> is enough.

xs:integer index_root_depth - source

No short description available

xs:string iso_id - source

Note: this assumes that only files under lg/ or align/ are indexed.

xs:string my_fname - source

Checks the name (and type) of the file being processed. Recall that a dash MUST ONLY be used to separate file modifiers.

Match Templates Summary

@xml:id - source

Do not copy the old xml:id if it is in the subtree being indexed (otherwise it would overwrite the new one).

body - source

Index the body as well

div | head | p | ab | item | list | hi | q | linkGrp | ptr | seg | s - source

Go through the designated elements and (re)index them

Functions Summary

xs:string f:calc_pos (param: item() my_node) - source

Creates the dot-separated string for the element's position in the tree rooted in <body>

xs:string f:create_index (param: item() node) - source

No short description available

Outputs Detail

#default - source

No short description available

Attributes

encoding

UTF-8

indent

method

xml

Variables Detail

xs:string* excepted_tags - source

Not implemented yet; it may eventually hold the (bare) names of elements excluded from indexing (gap, hi, q?). For now, the indexed elements are listed positively in the match template.

item()+ file_types - source

A lookup-table for file types; it will be externalized at some point and only referenced from here

Never use dashes here because they separate modifiers from file names ("morph-2.xml", etc.).
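
The no-dash rule could be checked mechanically. The following Python sketch is only an illustration: the table contents are hypothetical placeholders (the real lookup table lives in the stylesheet and is not reproduced here), and check_no_dashes is an invented helper name.

```python
# Hypothetical stand-in for the lookup table; the real entries are not shown here.
FILE_TYPES = {"text": "text", "morph": "morph"}


def check_no_dashes(table):
    """Raise ValueError if any key or value contains a dash.

    Dashes are reserved for separating modifiers from file names.
    """
    bad = [key for key, value in table.items() if "-" in key or "-" in value]
    if bad:
        raise ValueError(f"dash found in file type entries: {bad}")
    return table


check_no_dashes(FILE_TYPES)
```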

item() index_root - source

There is no need to count from the top of the tree; <body> is enough.

Note that, somewhat kludgily, <body> is explicitly indexed below as well.

xs:integer index_root_depth - source

No short description available

xs:string iso_id - source

Note: this assumes that only files under lg/ or align/ are indexed.
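
A rough Python illustration of cutting the identifier out of a path. It assumes, without confirmation from the stylesheet, that the identifier is the directory immediately below lg/ or align/; the function name and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def iso_id_from_path(path):
    """Return the directory name directly below lg/ or align/ (assumed layout)."""
    parts = PurePosixPath(path).parts
    # Exclude the last two parts so that the identifier is always a directory,
    # never the file name itself.
    for i, part in enumerate(parts[:-2]):
        if part in ("lg", "align"):
            return parts[i + 1]
    raise ValueError(f"no lg/ or align/ directory in {path!r}")


print(iso_id_from_path("corpus/lg/pol/text.xml"))  # pol
```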

xs:string my_fname - source

Checks the name (and type) of the file being processed. Recall that a dash MUST ONLY be used to separate file modifiers.

Additional assumption: files will end in '.xml'.
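
The two naming assumptions (a '.xml' suffix, and dashes used only before modifiers) can be sketched in Python as below. This is an illustration rather than the stylesheet's logic; split_file_name is a hypothetical name, and "morph-2.xml" is the example file name mentioned under file_types.

```python
def split_file_name(fname):
    """Split 'morph-2.xml' into the base name and a list of modifiers."""
    suffix = ".xml"
    if not fname.endswith(suffix):
        raise ValueError(f"expected a file name ending in {suffix!r}: {fname!r}")
    stem = fname[: -len(suffix)]
    # Everything after the first dash is treated as a modifier.
    base, *modifiers = stem.split("-")
    return base, modifiers


print(split_file_name("morph-2.xml"))  # ('morph', ['2'])
print(split_file_name("text.xml"))     # ('text', [])
```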

Match Templates Detail

@xml:id - source

Do not copy the old xml:id if it is in the subtree being indexed (otherwise it would overwrite the new one).

body - source

Index the body as well

This actually circumvents $index_root. It should be done in the general template (above) by matching against that variable, but that would mean spuriously checking N times whether a node happens to be the $index_root.

div | head | p | ab | item | list | hi | q | linkGrp | ptr | seg | s - source

Go through the designated elements and (re)index them
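
For readers who want to see the effect of (re)indexing, here is a small Python sketch using xml.etree.ElementTree. It is not the stylesheet's implementation. It assumes that positions count all element siblings, that non-designated elements (such as gap) still occupy a position, and that the index has the form prefix_position-tag; the prefix "pol_text" is a hypothetical value.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
INDEXED = {"div", "head", "p", "ab", "item", "list",
           "hi", "q", "linkGrp", "ptr", "seg", "s"}


def local_name(elem):
    """Tag name without any '{namespace}' prefix."""
    return elem.tag.rsplit("}", 1)[-1]


def reindex(elem, prefix, position=""):
    """Give every designated descendant of elem a fresh xml:id."""
    for n, child in enumerate(elem, start=1):
        child_pos = f"{position}.{n}" if position else str(n)
        name = local_name(child)
        if name in INDEXED:
            # Setting the attribute replaces any old xml:id.
            child.set(XML_ID, f"{prefix}_{child_pos}-{name}")
        reindex(child, prefix, child_pos)


body = ET.fromstring('<body><div xml:id="old"><p/><gap/><p/></div></body>')
reindex(body, "pol_text")
ids = [e.get(XML_ID) for e in body.iter() if e.get(XML_ID)]
print(ids)  # ['pol_text_1-div', 'pol_text_1.1-p', 'pol_text_1.3-p']
```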

Functions Detail

xs:string f:calc_pos (param: item() my_node) - source

Creates the dot-separated string for the element's position in the tree rooted in <body>

Parameters:

item() my_node -
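
The idea of a dot string can be illustrated in Python with xml.etree.ElementTree. This is a sketch, not a port of f:calc_pos: it assumes 1-based positions counted among all element siblings (whether the stylesheet counts all siblings or only same-name siblings is not stated here), and the function signature is hypothetical.

```python
import xml.etree.ElementTree as ET


def calc_pos(root, node):
    """Dot-separated 1-based sibling positions on the path from root to node."""
    # ElementTree has no parent pointers, so build a child-to-parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    while node is not root:
        parent = parent_of[node]
        steps.append(str(list(parent).index(node) + 1))
        node = parent
    return ".".join(reversed(steps))


body = ET.fromstring("<body><div><head/><p/><p><s/></p></div></body>")
s = body.find("./div/p/s")
print(calc_pos(body, s))  # 1.3.1
```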

xs:string f:create_index (param: item() node) - source

No short description available

Parameters:

item() node -