Technologies for Linguistic Analysis
Bifid: Parallel corpus alignment at the document, sentence and vocabulary levels

Bifid is a program for parallel corpus alignment.

Web demo: http://www.bifidalign.com/

July 29, 2024: We updated the server and everything looks fine.

Last week we carried out some long-awaited maintenance on the hardware hosting this website. Everything went smoothly and we haven't encountered any bugs so far. If you do notice something off, please drop a line to rogelio dot nazar at pucv dot cl. Cheers!

Bifid is a program that takes a set of documents together with their translations
and performs the following functions:
  1. It separates the documents into the two languages.
  2. It aligns every document with its translation.
  3. It aligns the sentences in each pair of documents.
  4. It extracts a bilingual vocabulary from the aligned sentences.
  5. It exports results in CSV and TMX formats.
  6. It imports TMX documents, in case you already have your corpus
    aligned at the sentence level and only want to obtain a bilingual vocabulary.
  7. The bilingual vocabulary includes multi-word expressions.
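The publications on the project describe how Bifid itself builds its vocabulary. What follows is not that method. It is a minimal, generic sketch of one common way to turn sentence-aligned pairs into a single-word bilingual vocabulary: score each co-occurring source/target word pair with the Dice coefficient, 2·f(s,t) / (f(s) + f(t)), where f counts the sentence pairs that contain a word. The function name `dice_lexicon`, its `min_freq` and `threshold` parameters, and the naive whitespace tokenization are assumptions for illustration. Unlike Bifid, it does not handle multi-word expressions.

```python
from collections import Counter
from itertools import product


def dice_lexicon(pairs, min_freq=2, threshold=0.5):
    """Map each source word to its best target word by Dice coefficient.

    Hypothetical helper, not Bifid's API. `pairs` is an iterable of
    (source_sentence, target_sentence) strings that are already aligned.
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Count each word at most once per sentence (naive whitespace tokens).
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        # Every source word co-occurs with every target word of the pair.
        joint.update(product(s_words, t_words))

    lexicon = {}
    for (s, t), together in joint.items():
        if together < min_freq:
            continue  # seen together too rarely to trust
        dice = 2 * together / (src_freq[s] + tgt_freq[t])
        # Keep only the highest-scoring candidate at or above the threshold.
        if dice >= threshold and dice > lexicon.get(s, (None, 0.0))[1]:
            lexicon[s] = (t, dice)
    return lexicon


if __name__ == "__main__":
    corpus = [
        ("the house", "la casa"),
        ("the red house", "la casa roja"),
        ("a red car", "un coche rojo"),
        ("the car", "el coche"),
        ("the door", "la puerta"),
    ]
    for word, (translation, score) in sorted(dice_lexicon(corpus).items()):
        print(f"{word}\t{translation}\t{score:.2f}")
```

On this toy corpus the sketch pairs car with coche, house with casa, and the with la, while red is dropped because it never meets the same Spanish word twice. The fact that a function word like "the" also receives a translation illustrates why raw association scores usually need further filtering.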

Give it a try:
Here is a nice little parallel corpus in English
and Spanish extracted from the
Revista Chilena de Neuropsiquiatría.
Download the zip file and then upload it to your account.

You can also upload a TMX file if you already have one,
and in this way bypass the document and sentence alignment steps.
Here is an example file from the OPUS corpus:
emea.tmx.zip (warning: this is a large file
and it takes time to process).
Lastly, if you want to try a different language pair, here is a
subset of the Canadian Hansards, in English and French.
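What a TMX upload provides is the list of aligned segment pairs stored in the file. In TMX, each translation unit is a `<tu>` element holding one `<tuv>` per language, with the text in a `<seg>` child. The sketch below shows how such pairs can be read with Python's standard `xml.etree.ElementTree`. It is not how Bifid parses uploads internally, and the function name `read_tmx_pairs` is hypothetical. It assumes an uncompressed file (unzip `emea.tmx.zip` first) and accepts both the `xml:lang` attribute of TMX 1.4 and the plain `lang` attribute of older TMX versions.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _base_lang(code):
    """Normalize a language tag such as 'EN-GB' to 'en'."""
    return code.lower().split("-")[0]


def read_tmx_pairs(source, src_lang, tgt_lang):
    """Yield (source_text, target_text) pairs from a TMX file.

    Hypothetical helper, not part of Bifid. `source` is a path or a
    binary file object; units lacking either language are skipped.
    """
    src, tgt = _base_lang(src_lang), _base_lang(tgt_lang)
    root = ET.parse(source).getroot()
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup (and its content) into text.
                segs[_base_lang(lang)] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            yield segs[src], segs[tgt]


if __name__ == "__main__":
    sample = b"""<tmx version="1.4"><header srclang="en"/><body>
    <tu><tuv xml:lang="en"><seg>The house</seg></tuv>
        <tuv xml:lang="es"><seg>La casa</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Untranslated</seg></tuv></tu>
    </body></tmx>"""
    for pair in read_tmx_pairs(io.BytesIO(sample), "en", "es"):
        print(pair)
```

`ET.parse` loads the whole tree into memory. For a file as large as the EMEA sample, `ET.iterparse`, with elements cleared once they have been processed, would keep memory use bounded.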

Bifid has been online in one way or another since 2004 (yes, that's nearly 20 years now).
Recently, its server went down and the project was neglected.
But here it is again, restored to its former glory.
Some (old) publications on the project:
Nazar, R. (2011). Parallel corpus alignment at the document, sentence and vocabulary levels.
Procesamiento del Lenguaje Natural, n. 47.

Nazar, R. (2012). Bifid: un alineador de corpus paralelo a nivel de documento, oración y vocabulario.
Linguamatica, vol. 4, no. 2.


If you have questions, feel free to send an email to rogelio dot nazar at pucv dot cl


Contact: rogelio.nazar at gmail.com
Related concepts: Parallel Corpus Alignment, Bilingual Vocabulary Extraction, Machine Translation, Computational Linguistics, Computational Lexicography