octc2tmx.xsl
This script creates TMX translation memory documents from Open-Content Text Corpus align.xml files.
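For orientation, here is a minimal sketch of the kind of TMX 1.4 structure such a conversion produces. It is written in Python rather than XSLT and is not the stylesheet's actual logic. The function name `build_tmx`, the input shape (a list of language-to-text mappings), the header values, and the sample segments are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, srclang="en"):
    """Build a minimal, un-namespaced TMX 1.4 tree.

    `units` is a list of dicts mapping a language code to segment text.
    This input shape is hypothetical; it is not the align.xml format.
    """
    tmx = ET.Element("tmx", {"version": "1.4"})
    # The header attributes below are placeholder values for the sketch.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",
        "creationtoolversion": "0",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        # One <tu> per aligned unit, one <tuv> per language in it.
        tu = ET.SubElement(body, "tu")
        for lang, text in unit.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


tmx = build_tmx([{"en": "Good morning.", "pl": "Dzień dobry."}])
print(ET.tostring(tmx, encoding="unicode"))
```

Each aligned unit becomes one `tu` element holding one `tuv` per language, which is why alignments involving more than two languages fit the same structure.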
It should probably cache the content of the <text> element of each monolingual document to reduce file access, though it is not clear whether the gain would be worth the effort.
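The caching idea can be illustrated outside XSLT. The sketch below is Python, not part of the stylesheet, and only shows the general technique of parsing each monolingual file once and reusing the result. The function name `load_text`, the file name `mono.xml`, and the sample content are hypothetical.

```python
import functools
import os
import tempfile
import xml.etree.ElementTree as ET


@functools.lru_cache(maxsize=None)
def load_text(path):
    """Parse `path` once and return its first descendant <text> element.

    Repeated calls with the same path are answered from the cache.
    """
    root = ET.parse(path).getroot()
    # "{*}text" matches <text> in any namespace or in none (Python 3.8+).
    return root.find(".//{*}text")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mono.xml")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("<TEI><text><p>Hello</p></text></TEI>")
    first = load_text(path)
    second = load_text(path)  # served from the cache, no file access
    print(first is second, load_text.cache_info().hits)
```

Whether this pays off depends on how often the same monolingual document is referenced from one alignment file, which is the open question raised above.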
TO DO: make it dive into divs that do not fully partition the given linkGrp. Perhaps differentiate between 1/many:many misalignments and 1/many:0 misalignments (but what if more than two languages are involved?). Make sure that the type of alignment (e.g. paragraph, sentence, etc.) can be retrieved from the align.xml files, so that the appropriate properties can be created in the TMX.
Distributor: Open-Content Text Corpus (http://OCTC.sourceforge.net/). Please report problems in our Trac instance.
The documentation of the input files is provided in the OCTC wiki.
Note that the official namespace for TMX is "http://www.lisa.org/tmx14", but I haven't seen it used even once, so the OCTC tools do not support it, for now. Please let us know if you encounter problems related to the (non-)use of this namespace.
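To show why the namespace matters to consumers of TMX files, here is a small Python sketch (not part of the OCTC tools) in which a lookup written for un-namespaced TMX finds nothing in a namespaced document, while a namespace-agnostic lookup handles both. The sample documents and the helper name `count_units` are made up for illustration.

```python
import xml.etree.ElementTree as ET

TMX_NS = "http://www.lisa.org/tmx14"

# Two made-up documents: one without the namespace, one with it.
plain = "<tmx version='1.4'><body><tu/><tu/></body></tmx>"
namespaced = f"<tmx xmlns='{TMX_NS}' version='1.4'><body><tu/></body></tmx>"


def count_units(xml_text):
    """Count <tu> elements whether or not the TMX namespace is used."""
    root = ET.fromstring(xml_text)
    # "{*}tu" matches tu in any namespace and in no namespace (Python 3.8+).
    return len(root.findall(".//{*}tu"))


# A plain "tu" lookup only sees un-namespaced elements.
missed = ET.fromstring(namespaced).findall(".//tu")

print(count_units(plain), count_units(namespaced), len(missed))
```

The same distinction applies in XSLT, where a match pattern on an unprefixed element name does not match elements that are in a namespace.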