Bifid - Parallel corpus alignment
July 29, 2024: We updated the server and everything looks fine
Last week we carried out some long-awaited maintenance on the hardware hosting this
website. Everything went smoothly, and we haven't encountered any bugs so far. If you do see something off, please drop a line to rogelio dot nazar at pucv dot cl. Cheers!
Bifid is a program that takes a set of documents with their translations
and performs different functions:
- It separates the set of documents into the two languages
- It aligns every document with its translation
- It aligns the sentences in each pair of documents
- It extracts a bilingual vocabulary from the aligned sentences
- It exports results in CSV and TMX formats
- It imports TMX documents, in case your corpus is already
aligned at the sentence level
and all you want is to obtain a bilingual vocabulary
- The bilingual vocabulary includes multi-word expressions.
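To give a rough idea of what extracting a bilingual vocabulary from aligned sentences can look like, here is a minimal sketch in Python. It is not Bifid's actual algorithm, which is not described on this page: it assumes a simple co-occurrence approach scored with the Dice coefficient, handles single words only (no multi-word expressions), and the function name `dice_vocabulary` and the `min_count` parameter are made up for illustration.

```python
from collections import Counter
from itertools import product


def dice_vocabulary(pairs, min_count=2):
    """Guess one translation per source word from aligned sentence pairs.

    pairs: list of (source_sentence, target_sentence) strings.
    Returns {source_word: (target_word, dice_score)}.
    This is an illustrative co-occurrence method, not Bifid's own.
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Count each word once per sentence (naive whitespace tokenization).
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        # Every source word co-occurs with every target word in the pair.
        joint.update(product(src_words, tgt_words))

    best = {}
    for (s, t), n in joint.items():
        if n < min_count:
            continue  # ignore pairs seen together too rarely
        # Dice coefficient: 2 * joint count / (sum of individual counts).
        score = 2 * n / (src_freq[s] + tgt_freq[t])
        if score > best.get(s, ("", 0.0))[1]:
            best[s] = (t, score)
    return best


if __name__ == "__main__":
    corpus = [
        ("the house", "la casa"),
        ("the dog", "el perro"),
        ("a house", "una casa"),
        ("a dog", "un perro"),
    ]
    for word, (translation, score) in dice_vocabulary(corpus).items():
        print(word, translation, round(score, 2))
```

Words that always appear together across the corpus get a score close to 1, while pairs that share a sentence only by chance are filtered out by `min_count`.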
Give it a try:
Here is a nice little parallel corpus in English
and Spanish extracted from
Revista Chilena de Neuropsiquiatría.
Download the zip file and then upload it
to your account.
If you already have a TMX file, you can upload it instead
and in this way bypass the document and sentence alignment.
Here is an example file from
the Opus corpus:
emea.tmx.zip (warning: this is a large file
and it takes time to process).
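If you would like to look inside a TMX file before uploading it, the following sketch reads the aligned segments with Python's standard library. It assumes the usual TMX layout (`tu` units containing `tuv` variants with a `seg` element and an `xml:lang` or older `lang` attribute); the function name `read_tmx` and its language parameters are hypothetical and have nothing to do with Bifid's own importer.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(source, src_lang="en", tgt_lang="es"):
    """Return a list of (source, target) segment pairs from a TMX file.

    source: a file name or a file object.
    Languages are matched on their first two letters, so "en-US" counts as "en".
    """
    pairs = []
    root = ET.parse(source).getroot()
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segs[lang[:2]] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs
```

For a file as large as the EMEA example, `ET.parse` loads the whole document into memory; a streaming approach such as `ET.iterparse` may be a better fit there.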
Lastly, if you want to try a different pair of languages, here is
a subset of the Canadian Hansards, with English and French.
Bifid has been online in one way or another since 2004 (yes, that makes 20 years now).
More recently, its server went down and the site was neglected.
But here it is again, restored to its former glory.
Some (old) publications on the project:
Nazar, R. (2011). Parallel corpus alignment at the document, sentence and vocabulary levels.
Procesamiento del Lenguaje Natural, n. 47.
Nazar, R. (2012). Bifid: un alineador de corpus paralelo a nivel de documento, oración y vocabulario.
Linguamatica, vol. 4, no. 2.
If you have questions, feel free to send an email to rogelio dot nazar at pucv dot cl