Bifid: Parallel corpus alignment at the document, sentence and vocabulary levels
Bifid is a program for parallel corpus alignment.
July 29, 2024: We updated the server and everything looks fine
Last week we did some long-awaited maintenance on the hardware hosting this website. Everything went smoothly and we haven't encountered any bugs so far. Still, if you happen to see something off, please drop a line to rogelio dot nazar at pucv dot cl. Cheers!
Bifid is a program that takes a set of documents with their translations and performs different functions: alignment at the document, sentence and vocabulary levels.
Give it a try: Here you have a nice little parallel corpus in English and Spanish extracted from Revista Chilena de Neuropsiquiatría. Download the zip file and upload it to your account. You can also upload a TMX file if you already have one, and in this way bypass the document and sentence alignment. Here is an example file from the OPUS corpus: emea.tmx.zip (warning: this is a large file and it takes time to process). Lastly, if you want to try a different pair of languages, here is a subset of the Canadian Hansards, in English and French.
Bifid has been online in one way or another since 2004 (yes, it's going to be 20 years now). Lately, its server had gone down and it was neglected. But here it is again, restored to its former glory.
Some (old) publications on the project:
Nazar, R. (2011). Parallel corpus alignment at the document, sentence and vocabulary levels. Procesamiento del Lenguaje Natural, n. 47.
Nazar, R. (2012). Bifid: un alineador de corpus paralelo a nivel de documento, oración y vocabulario. Linguamatica, vol. 4, no. 2.
If you have questions, feel free to send an email: rogelio dot nazar at pucv dot cl
Contact: rogelio.nazar at gmail.com
Related concepts: Parallel Corpus Alignment, Bilingual Vocabulary Extraction, Machine Translation, Computational Linguistics, Computational Lexicography