Multilingual text alignment consists of identifying correspondences between text units (e.g., words, sentences or paragraphs) in parallel texts.
The main approach to alignment evaluation is to compare an alignment computed by a system with a manually produced reference alignment, usually called a gold standard. Different tasks have been defined in previous evaluation exercises such as Blinker, ARCADE, HLT-NAACL and ACL.
Alignment evaluations have generally relied on traditional IR measures such as precision, recall and F-measure. Related measures, tools and data include:
AER (Alignment Error Rate; Och and Ney, 2000), derived from the F-measure
ARCADE II Evaluation package
Data from the HLT-NAACL 2003 workshop on parallel texts (English, Romanian and French).
The Bible: parallel biblical texts available in several languages, including Chinese, Danish, English, French, Greek and Swahili.
The MULTEXT corpora (English, French, German, Italian and Spanish) and MULTEXT-East corpora (English, Bulgarian, Czech, Estonian, Hungarian, Romanian and Slovenian).
The ARCADE/ROMENSEVAL multilingual corpora (English, French, German, Italian, Spanish, Arabic, Chinese, Japanese, Greek, Persian and Russian).
Data from the ACL 2005 workshop on Building and Using Parallel Texts (English, Inuktitut, Romanian and Hindi).
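The comparison against a gold standard and the AER measure mentioned above can be made concrete with a short sketch. Following Och and Ney (2000), a reference alignment distinguishes sure links S from possible links P (with S a subset of P). For a system alignment A, precision = |A∩P| / |A|, recall = |A∩S| / |S|, and AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|). The line format parsed below (`sentence source target [S|P]`) and the function names `parse_links` and `evaluate` are hypothetical, chosen for illustration only. This is not the ARCADE II package or any official workshop scorer.

```python
from dataclasses import dataclass

# A link is (sentence_id, source_position, target_position).
Link = tuple[int, int, int]


@dataclass
class Reference:
    sure: set[Link]      # S: links the annotators marked as sure
    possible: set[Link]  # P: sure plus ambiguous links (S is a subset of P)


def parse_links(lines: list[str]) -> Reference:
    """Parse hypothetical 'sentence source target [S|P]' lines; a missing label means S."""
    sure: set[Link] = set()
    possible: set[Link] = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        link = (int(fields[0]), int(fields[1]), int(fields[2]))
        label = fields[3] if len(fields) > 3 else "S"
        if label == "S":
            sure.add(link)
        possible.add(link)  # every sure link is also a possible link
    return Reference(sure, possible)


def evaluate(system: set[Link], ref: Reference) -> dict[str, float]:
    """Precision, recall, F-measure and AER in the style of Och and Ney (2000)."""
    a, s = system, ref.sure
    p = ref.possible | ref.sure  # enforce that S is a subset of P
    a_s, a_p = len(a & s), len(a & p)
    precision = a_p / len(a) if a else 0.0
    recall = a_s / len(s) if s else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|); taken as 0 when A and S are both empty.
    aer = 1.0 - (a_s + a_p) / (len(a) + len(s)) if a or s else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "aer": aer}


# Toy example: two sure links and one possible link in sentence 1.
reference = parse_links(["1 1 1 S", "1 2 2 S", "1 3 3 P"])
system = {(1, 1, 1), (1, 2, 3), (1, 3, 3)}
scores = evaluate(system, reference)
print(scores)  # AER = 1 - (1 + 2) / (3 + 2) = 0.4
```

When every reference link is sure (S = P), the two intersections coincide and AER reduces to 1 - F, which is the sense in which AER is derived from the F-measure.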