April 7, 2021
Five of our students get ANID scholarships: two of them with the best score in Chile

Ana Castro, Benjamín López Hidalgo, Javier Obreque, Natalie Mies and Maximiliano Ramírez studied linguistics with us at PUCV.cl and are members of the research group Tecling. Now ANID.cl has granted them a prestigious scholarship to continue with postgraduate studies. Two of these students, Ana and Javier, share first place in the national ranking, out of a total of 2129 applicants. Only 223 scholarships were granted, which shows how fierce the competition was. But the best part is that, according to the letter they received, scores are on a scale from 0 to 5, and yet Ana and Javier were assigned 5.042 points!

Update on April 10: there were five students, not four.

February 18, 2021
Dismark: first results of Project Fondecyt 1191481

The very first results of our research project Dismark (Fondecyt Regular 1191481) are already available: a taxonomy of discourse markers (DMs) in English, Spanish, German and French, and an algorithm to classify new DMs. Documentation is in the making, but you can already browse the categories and try the classifier. It's pretty cool.
But beware: these are raw results. We have not yet started to revise them. They may contain more than a few mistakes.

February 15, 2021
Irene Renau interviewed in El Mercurio

Our dear Irene Renau is once again the star of the local press! In this interview with ElMercurio.com she discusses the hot topic of writing in digital media.

January 23, 2021
Readeutsch: a reading pacer for parallel corpora

So, we need to learn German as quickly as possible. How do we do it? Well, one idea would be to read some literary pieces alongside their German translations. Seems like a good idea. Why don't you try it and tell us if it works?

UPDATE (January 27, 2021): we added new versions, one for the Spanish-German pair and another for French-Spanish.

January 16, 2021
Linguini: our new language detector

Happy times! Linguini is here! We have just created this program, which detects the main language of a text and then identifies fragments written in other languages. It's pretty cool.

This is the view from where we are located, by the Sausalito lagoon, a quiet and lovely place in Viña del Mar, Chile. Sunny days. Birds can be seen in the center of the lagoon.

As researchers, we are currently affiliated with:
Pontificia Universidad Católica de Valparaíso
Instituto de Literatura y Ciencias del Lenguaje

Av. El Bosque 1290, Viña del Mar, Chile

Upcoming Events

Around April 10, 2021 (delayed again): we are planning to launch text·a·gram, a comprehensive tool for Spanish text analysis, which will replace the current versions of our Deixis tool, as well as Modal and Marzopo. It will also include new functionalities such as reference detection and some degree of anaphora resolution. It will be the main tool in our Spanish text grammar courses.

Around May 2021: we had to postpone the launch of the new version of Bifid. But we will do it! It will include the new functions we have been working on: text segmentation (Segismund) and language detection (Linguini).

Latest ideas & research projects

We are developing new projects in computational linguistics and natural language processing:

+ Fondecyt Regular (2019-2021): "Polisemia regular de los sustantivos del español: análisis semiautomático de corpus, caracterización y tipología" (Regular polysemy of nouns in Spanish: semiautomatic corpus analysis, characterization and typology). Lead researcher: Irene Renau. Ref.: 1191204.

+ Fondecyt Regular (2019-2021): "Inducción automática de taxonomías de marcadores discursivos a partir de corpus multilingües" (Automatic induction of taxonomies of discourse markers from multilingual corpora). Lead researcher: Rogelio Nazar. Ref.: 1191481.

+ Ecos-Sud (international project between Chile and France): "Inducción automática de taxonomías del español y el francés mediante técnicas cuantitativas y estadística de corpus" (Automatic induction of Spanish and French taxonomies using quantitative techniques and corpus statistics). Lead researcher: Irene Renau. Ref.: C16H02.

+ Fondecyt Regular: "Desarrollo de la competencia terminológica a lo largo de la inserción disciplinar" (Development of terminological competence throughout disciplinary integration). Lead researcher: Sabela Fernández. Co-researcher: Rogelio Nazar. Ref.: 11121597.

Recent publications

+ Nazar, R.; Balvet, A.; Ferraro, G.; Marín, R.; Renau, I. (2020). "Pruning and repopulating a lexical taxonomy: experiments in Spanish, English and French". Journal of Intelligent Systems, vol. 30, num. 1, pp. 376-394.

+ Nazar, R.; Renau, I.; Acosta, N.; Robledo, H.; Soliman, H.; Zamora, S. (2020). "Corpus-Based Methods for Recognizing the Gender of Anthroponyms". Names: A Journal of Onomastics.

+ Asenjo, S.; Nazar, R. (2020). "Marcadores discursivos en niños de 7 años con trastorno específico del lenguaje: estudio descriptivo". RLA. Revista de lingüística teórica y aplicada, vol. 58, núm. 1, pp. 93-114.

+ Nazar, R.; Obreque, J.; Renau, I. (2020). "Tarántula -> araña -> animal: asignación de hiperónimos de segundo nivel basada en métodos de similitud distribucional". Procesamiento del Lenguaje Natural, núm. 64, pp. 29-36.

+ Renau, I.; Nazar, R.; Lecaros, V. (2020). "La evolución de las marcas ortográficas y tipográficas en los procesos de lexicalización de neologismos: un estudio en el vocabulario de la crisis económica en prensa española". Revista Española de Lingüística Aplicada, vol. 33, núm. 1, pp. 227-277.

Solutions for text processing

It is critical for organizations to have the ability to process information automatically, and very often that information is contained in documents to be read by humans rather than machines. We have different methods for text processing depending on the goal.

We can teach people how to automate their text processing routines. We can batch-process thousands of documents to extract information from them or to derive different types of statistics. We can also modify these documents, or generate databases or email correspondence based on the information extracted from them. Anything that involves intelligent management of information can benefit from some degree of automation, which frees up time, effort and resources.

Tell us what your needs are and we will show you what we can do about them.