Technologies for Linguistic Analysis
November 23, 2020:
New paper published in Names: A Journal of Onomastics

We are proud to announce that we have just published a new paper in Names: A Journal of Onomastics about our dear project GeNom (available at tecling.com/genom).
In the paper we describe a series of methods for automatically determining the gender of proper names, based on their co-occurrence with words and grammatical features in a large corpus. A method like this makes it possible to obtain real and up-to-date name-gender links, which can be applied to a variety of natural language processing tasks such as information extraction, machine translation, anaphora resolution or large-scale delivery or email correspondence, among others.
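As a rough illustration of the co-occurrence idea only (this is not the method described in the paper), the sketch below counts gendered Spanish cue words that appear just before a name and assigns the majority gender. The cue lists, the `guess_genders` function and the `window` parameter are all hypothetical and chosen for this example; a real system would rely on a large corpus and much richer grammatical features.

```python
from collections import Counter, defaultdict

# Hypothetical cue lists (assumption): gendered Spanish titles and determiners.
FEMININE_CUES = {"la", "señora", "sra", "doña", "ella"}
MASCULINE_CUES = {"el", "señor", "sr", "don", "él"}

PUNCTUATION = ".,;:!?¿¡\"'()"


def guess_genders(sentences, names, window=2):
    """Guess a gender label for each name from nearby gendered cue words.

    `names` is a set of lowercase names; `window` is the number of tokens
    inspected immediately before each occurrence of a name.
    """
    votes = defaultdict(Counter)
    for sentence in sentences:
        # Lowercase, split on whitespace and strip surrounding punctuation.
        tokens = [t.strip(PUNCTUATION) for t in sentence.lower().split()]
        for i, token in enumerate(tokens):
            if token not in names:
                continue
            for cue in tokens[max(0, i - window):i]:
                if cue in FEMININE_CUES:
                    votes[token]["F"] += 1
                elif cue in MASCULINE_CUES:
                    votes[token]["M"] += 1

    result = {}
    for name in names:
        feminine, masculine = votes[name]["F"], votes[name]["M"]
        if feminine > masculine:
            result[name] = "F"
        elif masculine > feminine:
            result[name] = "M"
        else:
            result[name] = "unknown"  # no evidence, or a tie
    return result
```

Called with a handful of sentences and a set of lowercase names, the function returns a dictionary mapping each name to "F", "M" or "unknown". Names with no gendered cue nearby stay "unknown", which is why a large corpus matters for this kind of approach.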

November 12, 2020
Irene Renau will give a talk about loan words

Irene Renau, founding member of Tecling.com, will give a talk about loan words at the Chilean Academy of Language. She will present data from our latest research (in collaboration with R. Nazar and L. Díaz) on monitoring the orthographic variants of loan words in 10 Spanish and Latin American newspapers. The event will take place today, Thursday, November 12, 2020, at 18:00. Follow the link below:

November 7, 2020:
New paper at RLA - Revista de Lingüística Teórica y Aplicada

From time to time we are lucky enough to have a brilliant student who wants to do her thesis with us. On this occasion, that brilliant student was Sara Asenjo. She has now published a paper in RLA - Revista de Lingüística Teórica y Aplicada, based on her thesis and co-authored with Rogelio Nazar, who was her advisor. The paper (written in Spanish) describes the use of discourse markers by 25 seven-year-old children diagnosed with specific language impairment (SLI), in contrast with a control group of the same number of typically developing (TD) children of the same age.

Tools & demos

We have implemented different types of applications and most of them can be tested online. Take a look.

+ Bifid: a parallel corpus aligner

+ Dsele: a model dictionary for ELE learners

+ Deixis: a tool for the identification of deixis in Spanish texts

+ EMaD: automatic categorization of Spanish discourse markers

+ Estilector: computer assisted writing for Spanish

+ GeNom: a program to detect the gender of proper nouns

+ HAT: a project for the treatment of polysemy in lexical taxonomies

+ Jaguar: a tool for statistical corpus analysis

+ Kind: a taxonomy induction algorithm

+ Kwico: a concordancer for big corpora

+ Marzopo: a program to detect discourse markers in Spanish

+ Modal: a program to detect modality in Spanish

+ Neven: a program to detect eventive nouns

+ Termout: a terminology extraction system

+ POL: named entity recognition and classification

+ Poppins: a supervised text classifier

+ Porcus: an interface for various taggers and parsers for Spanish

+ pullPOS: a project for the detection of plurals in Spanish

+ Sapo: a program to detect similarities between documents

+ Sicam: a program to split a Spanish word into syllables

+ Verbario: corpus pattern analysis in Spanish


This is the view from where we are located, by the Sausalito lagoon, a quiet and lovely place in Viña del Mar, Chile. Sunny days. Birds can be seen in the center of the lagoon.

As researchers, we are currently affiliated with:
Pontificia Universidad Católica de Valparaíso
Instituto de Literatura y Ciencias del Lenguaje

Av. El Bosque 1290, Viña del Mar, Chile

Upcoming Events

November 30, 2020: Rogelio Nazar will be guest editor of the special issue on Computational Linguistics of the journal Anales de lingüística, founded by the great Catalan linguist Joan Corominas in 1941, at Universidad Nacional de Cuyo, Argentina.
If you would like to have your paper published in this issue, send it by email to rogelio dot nazar at pucv dot cl before the deadline indicated in the heading.
Here is the call for papers (at the moment only available in Spanish).
And here are the instructions for authors.

Latest ideas & research projects

We are developing new projects in computational linguistics and natural language processing:

+ Fondecyt Regular (2019-2021): "Polisemia regular de los sustantivos del español: análisis semiautomático de corpus, caracterización y tipología" (Regular polysemy of nouns in Spanish: semiautomatic corpus analysis, characterization and typology). Lead researcher: Irene Renau. Ref.: 1191204.

+ Fondecyt Regular (2019-2021): "Inducción automática de taxonomías de marcadores discursivos a partir de corpus multilingües" (Automatic induction of taxonomies of discourse markers from multilingual corpora). Lead researcher: Rogelio Nazar. Ref.: 1191481.

+ Ecos-Sud (international project between Chile and France): "Inducción automática de taxonomías del español y el francés mediante técnicas cuantitativas y estadística de corpus" (Automatic induction of Spanish and French taxonomies using quantitative techniques and corpus statistics). Lead researcher: Irene Renau. Ref.: C16H02.

+ Fondecyt Regular: "Desarrollo de la competencia terminológica a lo largo de la inserción disciplinar" (Development of terminological competence over the course of disciplinary integration). Lead researcher: Sabela Fernández. Co-researcher: Rogelio Nazar. Ref.: 11121597.

Recent publications

+ Nazar, R.; Balvet, A.; Ferraro, G.; Marín, R.; Renau, I. (2020). "Pruning and repopulating a lexical taxonomy: experiments in Spanish, English and French". Journal of Intelligent Systems (forthcoming).

+ Nazar, R.; Obreque, J.; Renau, I. (2020). "Tarántula –> araña –> animal : asignación de hiperónimos de segundo nivel basada en métodos de similitud distribucional". Procesamiento del Lenguaje Natural, núm 64, pp. 29-36.

+ Renau, I.; Nazar, R.; Lecaros, V. (2020). "La evolución de las marcas ortográficas y tipográficas en los procesos de lexicalización de neologismos: un estudio en el vocabulario de la crisis económica en prensa española". Revista Española de Lingüística Aplicada, vol. 33, núm. 1, pp. 227-277.

Solutions for text processing

It is critical for organizations to be able to process information automatically, and very often that information is contained in documents meant to be read by humans rather than machines. We have different methods for text processing depending on the goal.

We can help by teaching people how to automate their text processing routines. We can batch-process thousands of documents to extract information from them or to derive different types of statistics. We can also modify these documents, or generate databases or email correspondence based on information extracted from them. Anything that involves intelligent management of information can benefit from different degrees of automation, which frees up time, effort and resources.

Tell us what your needs are and we will show you what we can do about them.