Technologies for Linguistic Analysis
October 24, 2020:
New tool for the detection of discourse markers in Spanish

For a long time we needed a tool that could detect and classify discourse markers (DMs) in Spanish according to the taxonomy of Martín Zorraquino & Portolés (1999), since this is the taxonomy we use with our students of Text Grammars. We collected a large sample of DMs both manually and automatically, and the collection we now publish has been manually curated. It is published here in its entirety (if you spot a mistake, please let us know).
In this demo you can paste a text in Spanish and the program will detect and classify DMs according to said taxonomy.
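To give a rough idea of what lexicon-based detection of this kind can look like, here is a minimal sketch in Python. It is not the code behind the demo: the five-entry lexicon, the function name `detect_markers` and the choice of top-level class labels are assumptions made for illustration only, and a simple lookup like this does not handle ambiguous forms that only sometimes work as DMs.

```python
import re

# Hypothetical mini-lexicon for illustration only: five markers with
# top-level class labels. The curated collection mentioned above is far larger.
DM_LEXICON = {
    "sin embargo": "conector",
    "además": "conector",
    "es decir": "reformulador",
    "en primer lugar": "estructurador de la información",
    "por ejemplo": "operador argumentativo",
}


def detect_markers(text, lexicon=DM_LEXICON):
    """Return (offset, surface form, class) for every lexicon marker found in text."""
    # Try longer markers first so a multiword marker wins over a shorter one.
    alternatives = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(m) for m in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.group(1), lexicon[m.group(1).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sample = "En primer lugar, llueve. Sin embargo, saldremos."
    for offset, marker, label in detect_markers(sample):
        print(offset, marker, label)
```

Matching is case-insensitive and returns character offsets, so the hits could be highlighted in the original text.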

The documentation is still pending.

October 23, 2020:
We presented lots of papers at Wopatec 2020

Our team presented several papers at the Fifth Edition of Wopatec, which was this time co-organized with the Third International Congress of Computational and Corpus Linguistics, held on October 21-23, 2020 and hosted by Universidad de Antioquia in Colombia. Members of our team who presented their research were Javier Obreque, Ana Castro, Valentina Ravest, Benjamín López, Nicolás Acosta and Rogelio Nazar. Six presentations in total... yeah, we sort of invaded the program.
Check the web site for more details: https://cilcc20.wordpress.com/english/

October 15, 2020:
New web demo for modality detection in Spanish

Yeah, we know. We are crazy. We should focus on one thing at a time and try to make a piece of software that really works. Instead, we keep biting off more than we can chew. We can't help it! It's in our nature. We are now launching a new web demo for modality detection in Spanish. It takes a text as input and detects and classifies instances of modality. It's pretty cool.
Come on, give it a try:
And, you guessed it: the documentation is still pending.
Update: [October 23, 2020]: we added some documentation.

This is the view from where we are located, in the Sausalito lagoon, a quiet and lovely place in Viña del Mar, Chile. Sunny days. Birds can be seen in the center of the lagoon.

As researchers, we are currently affiliated with:
Pontificia Universidad Católica de Valparaíso
Instituto de Literatura y Ciencias del Lenguaje

Av. El Bosque 1290, Viña del Mar, Chile

Latest ideas & research projects

We are developing new projects in computational linguistics and natural language processing:

+ Fondecyt Regular (2019-2021): "Polisemia regular de los sustantivos del español: análisis semiautomático de corpus, caracterización y tipología" (Regular polysemy of nouns in Spanish: semiautomatic corpus analysis, characterization and typology). Lead researcher: Irene Renau. Ref.: 1191204.

+ Fondecyt Regular (2019-2021): "Inducción automática de taxonomías de marcadores discursivos a partir de corpus multilingües" (Automatic induction of taxonomies of discourse markers from multilingual corpora). Lead researcher: Rogelio Nazar. Ref.: 1191481.

+ Ecos-Sud (International Project between Chile and France): "Inducción automática de taxonomías del español y el francés mediante técnicas cuantitativas y estadística de corpus" (Automatic induction of taxonomies of Spanish and French using quantitative techniques and corpus statistics). Lead researcher: Irene Renau. Ref.: C16H02.

+ Fondecyt Regular: "Desarrollo de la competencia terminológica a lo largo de la inserción disciplinar" (Development of terminological competence throughout disciplinary insertion). Lead researcher: Sabela Fernández. Co-researcher: Rogelio Nazar. Ref.: 11121597.

Recent publications

+ Nazar, R.; Balvet, A.; Ferraro, G.; Marín, R.; Renau, I. (2020). "Pruning and repopulating a lexical taxonomy: experiments in Spanish, English and French". Journal of Intelligent Systems, (forthcoming...).

+ Nazar, R.; Obreque, J.; Renau, I. (2020). "Tarántula –> araña –> animal : asignación de hiperónimos de segundo nivel basada en métodos de similitud distribucional". Procesamiento del Lenguaje Natural, núm 64, pp. 29-36. PDF

+ Renau, I.; Nazar, R.; Lecaros, V. (2020). "La evolución de las marcas ortográficas y tipográficas en los procesos de lexicalización de neologismos: un estudio en el vocabulario de la crisis económica en prensa española". Revista Española de Lingüística Aplicada, vol. 33, núm. 1, pp. 227-277.

Solutions for text processing

It is critical for organizations to have the ability to process information automatically, and very often that information is contained in documents to be read by humans rather than machines. We have different methods for text processing depending on the goal.

We can teach people how to automate their text processing routines. We can batch-process thousands of documents to extract information from them or to derive different types of statistics. We can also modify these documents, or generate databases or email correspondence based on information extracted from them. Anything that involves intelligent management of information can benefit from some degree of automation, which frees up time, effort and resources.
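As a small illustration of what batch processing can mean in practice, here is a minimal Python sketch that walks a folder of plain-text files and derives word-frequency statistics for each one. The function name `batch_word_counts`, the `*.txt` pattern and the UTF-8 encoding are assumptions made for the example, not a description of our actual tools.

```python
import re
from collections import Counter
from pathlib import Path


def batch_word_counts(folder, pattern="*.txt"):
    """Return {file name: Counter of lowercase word tokens} for each matching file."""
    stats = {}
    for path in sorted(Path(folder).glob(pattern)):
        # Assumes the files are UTF-8 encoded plain text.
        text = path.read_text(encoding="utf-8")
        stats[path.name] = Counter(re.findall(r"\w+", text.lower()))
    return stats
```

Calling `batch_word_counts("my_documents")` on a folder of your own would return one frequency table per file, which can then be written to a spreadsheet or a database.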

Tell us what your needs are and we will show you what we can do about them.