Tecling » The World is automatic
Technologies for Linguistic Analysis

July 23, 2021
Irene Renau delivers a talk at PhrasaLex II

Irene Renau presented the results of her research on regular polysemy in Spanish nouns at the PHRASALEX II Workshop on Phraseological Approaches to Learner’s Lexicography 2021. The title of the presentation is 'Regular Polysemy in context: a corpus based analysis of regular polysemy in Spanish nouns'. This event is going on right now! We will be updating...

July 16, 2021
Tildau: diacritic restoration of proper nouns

Some database systems strip the tildes (accent marks) from people's names. At our university, for example, all student records are in this state. Since we often need to email our students, we thought it would be nice to address them using the proper orthography. For this we developed Tildau. It takes a list of proper nouns without tildes and restores their diacritics.
For example:
input: Ramon Jose Sotomayor Diaz
output: Ramón José Sotomayor Díaz
We did this for our own purposes and share it as is. It only works with proper names and will not be useful for full diacritic restoration of documents.
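To give a rough idea of the task, here is a minimal sketch of one simple approach: a dictionary lookup keyed on the accent-free form of each name. This is not Tildau's actual algorithm, which is not described in this post. The function names (`strip_diacritics`, `build_lookup`, `restore`) and the small reference list are hypothetical and chosen only for illustration.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Return `word` with combining marks removed (e.g. José -> Jose)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_lookup(reference_names):
    """Map each accent-free, lowercased form to its correctly spelled form."""
    lookup = {}
    for name in reference_names:
        key = strip_diacritics(name).lower()
        # Keep the first spelling seen. A real system would have to rank
        # competing spellings, for instance by corpus frequency.
        lookup.setdefault(key, name)
    return lookup


def restore(full_name: str, lookup) -> str:
    """Restore diacritics token by token; unknown tokens are left untouched."""
    return " ".join(lookup.get(tok.lower(), tok) for tok in full_name.split())


# Hypothetical reference list of correctly spelled names (illustration only).
REFERENCE = ["Ramón", "José", "Díaz", "Sotomayor", "María", "Núñez"]
LOOKUP = build_lookup(REFERENCE)

print(restore("Ramon Jose Sotomayor Diaz", LOOKUP))
```

A lookup of this kind can only restore names that appear in the reference list, and it cannot decide between two valid spellings of the same stripped form, so it should be read as an illustration of the problem rather than a solution to it.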
Do you want to try it? Yeahhh, you know you want it. Come on, give it a try!

June 28, 2021
New paper in the International Journal of Lexicography

We've just realized our paper on new verbs has already been published online in the International Journal of Lexicography. In this paper, Ana Castro, Rogelio Nazar and Irene Renau explain how to obtain new verbs from Spanish corpora. But there are many interesting ideas in this research which can be applied to problems other than neology. For instance, the algorithm described there could very well be used for spell checking. We will be doing some of that soon. Stay tuned!

May 25, 2021
TEXT·A·GRAM is almost ready!

Yes, it is not ready, but it is almost ready. We still have to polish many details, but some functionality is already available:
The program has been designed to undertake different levels of text analysis for Spanish from the point of view of Text Grammar.
It will detect:

  1. The referents of the text (the objects the text talks about)
  2. Discourse markers (using Dismark's taxonomy)
  3. Personal, temporal and spatial deixis
  4. Modality

Other layers of analysis are planned as well and will be incorporated as soon as we can (i.e., the next long weekend we get).

May 19, 2021
New version of Porcus for French and English

Our colleague Nicolás Acosta modified the downloadable version of Porcus so that it can now process English and French in addition to Spanish, the only language supported by the previous version. The web demo still processes only Spanish, but at some point we will adapt it to handle the other two languages as well.
Enjoy in moderation!

April 23, 2021
New feature added to Kind: navigation by categories

This had been on our to-do list for a very long time. Today, between the loads of paperwork and meetings that seem to be a perennial part of working at a university, we somehow managed to finish this new feature of Kind, our Lexical Taxonomy Project. Now you can navigate the categories created so far, in Spanish, English and French, and this is of course very interesting. By doing this we also noticed, however, the terrible amount of vandalism and spambotting that this website suffers. All of this activity is feeding the database with lots of undesirable terms. So, be prepared to find some pretty weird stuff popping up from time to time. We will try to come up with some ideas to tackle this problem.
Please tell us if you find any bugs or if you have comments.

April 7, 2021
Five of our students get ANID scholarships: two of them with the best score in Chile

Ana Castro, Benjamín López Hidalgo, Javier Obreque, Natalie Mies and Maximiliano Ramírez studied linguistics with us at PUCV.cl and are members of the research group Tecling. Now, ANID.cl has just granted them a prestigious scholarship to continue with postgraduate studies. Two of these students, Ana and Javier, share first place in the national ranking, out of a total of 2129 people who applied for this benefit. Only 223 scholarships were granted in total, which shows how fierce the competition was. But the best part is that, according to the letter they received, the score is on a scale from 0 to 5, yet Ana and Javier were assigned 5.042 points anyway!!!

Update on April 10: they were five, not four.

February 18, 2021
Dismark: first results of Project Fondecyt 1191481

The very first results of our research project Dismark (Fondecyt Regular 1191481) are already available: a taxonomy of discourse markers (DMs) in English, Spanish, German and French, and an algorithm to classify new DMs. Documentation is in the making, but you can already browse the categories and try the classifier. It's pretty cool:
But beware: these are raw results. We have not yet started to revise them. They may contain more than a few mistakes.

Tools & demos

We have implemented different types of applications and most of them can be tested online. Take a look.

+ Bifid: a parallel corpus aligner

+ Cryptoman: a script to generate cryptograms

+ Dismark: a multilingual taxonomy of discourse markers (new!)

+ Dsele: a model dictionary for ELE learners

+ Estilector: computer assisted writing for Spanish

+ GeNom: a program to detect the gender of proper nouns

+ HAT: a project for the treatment of polysemy in lexical taxonomies

+ Jaguar: a tool for statistical corpus analysis

+ Kind: a lexical taxonomy induction algorithm

+ Kwico: a concordancer for big corpora

+ Lealem: a reading pacer for parallel German-Spanish texts

+ Leafran: a reading pacer for parallel French-Spanish texts

+ Linguini: a language detector

+ Neven: a program to detect eventive nouns

+ Termout: a terminology extraction system

+ POL: named entity recognition and classification

+ Poppins: a supervised text classifier

+ Porcus: an interface for various taggers and parsers for Spanish

+ pullPOS: a project for the detection of plurals in Spanish

+ Readeutsch: a reading pacer for parallel German-English texts

+ Sapo: a program to detect similarities between documents

+ Sicam: a program to split a Spanish word into syllables

+ TEXT·A·GRAM: a program to analyze Spanish texts

+ Verbario: corpus pattern analysis in Spanish


This is the view from where we are located, in the Sausalito lagoon, a quiet and lovely place in Viña del Mar, Chile. Sunny days. Birds can be seen in the center of the lagoon.

As researchers, we are currently affiliated with:
Pontificia Universidad Católica de Valparaíso
Instituto de Literatura y Ciencias del Lenguaje

Av. El Bosque 1290, Viña del Mar, Chile

Upcoming Events

Sometime in July/August 2021: we had to postpone our plans to launch the new version of Bifid. But we will do it! It will include the new functions we have been working on: text segmentation (Segismund) and language detection (Linguini).

September 7, 2021: Ana Castro, Rogelio Nazar and Irene Renau will be presenting papers at the EURALEX Conference (online): https://euralex2020.gr

September 22-24, 2021: Daniel Mora and Rogelio Nazar will be presenting their papers at the SEPLN 2021 Conference (online): http://www.hitz.eus/sepln2021

Latest ideas & research projects

We are developing new projects in computational linguistics and natural language processing:

+ Fondecyt Regular (2019-2021): "Polisemia regular de los sustantivos del español: análisis semiautomático de corpus, caracterización y tipología" (Regular polysemy of nouns in Spanish: semiautomatic corpus analysis, characterization and typology). Lead researcher: Irene Renau. Ref.: 1191204.

+ Fondecyt Regular (2019-2021): "Inducción automática de taxonomías de marcadores discursivos a partir de corpus multilingües" (Automatic induction of taxonomies of discourse markers from multilingual corpora). Lead researcher: Rogelio Nazar. Ref.: 1191481.

+ Ecos-Sud (International Project between Chile and France): "Inducción automática de taxonomías del español y el francés mediante técnicas cuantitativas y estadística de corpus" (Automatic induction of Spanish and French taxonomies using quantitative techniques and corpus statistics). Lead researcher: Irene Renau. Ref.: C16H02.

+ Fondecyt Regular: "Desarrollo de la competencia terminológica a lo largo de la inserción disciplinar" (Development of terminological competence throughout disciplinary integration). Lead researcher: Sabela Fernández. Co-researcher: Rogelio Nazar. Ref.: 11121597.

+ See more.

Recent publications

+ Nazar, R.; Balvet, A.; Ferraro, G.; Marín, R.; Renau, I. (2020). "Pruning and repopulating a lexical taxonomy: experiments in Spanish, English and French". Journal of Intelligent Systems, vol. 30, num. 1, pp. 376-394. PDF

+ Nazar, R.; Renau, I.; Acosta, N.; Robledo, H.; Soliman, H.; Zamora, S. (2020). "Corpus-Based Methods for Recognizing the Gender of Anthroponyms". Names: A Journal of Onomastics.

+ Asenjo, S.; Nazar, R. (2020). "Marcadores discursivos en niños de 7 años con trastorno específico del lenguaje: estudio descriptivo". RLA. Revista de lingüística teórica y aplicada, vol. 58, núm. 1, pp. 93-114. PDF.

+ Nazar, R.; Obreque, J.; Renau, I. (2020). "Tarántula –> araña –> animal : asignación de hiperónimos de segundo nivel basada en métodos de similitud distribucional". Procesamiento del Lenguaje Natural, núm. 64, pp. 29-36. PDF.

+ Renau, I.; Nazar, R.; Lecaros, V. (2020). "La evolución de las marcas ortográficas y tipográficas en los procesos de lexicalización de neologismos: un estudio en el vocabulario de la crisis económica en prensa española". Revista Española de Lingüística Aplicada, vol. 33, núm. 1, pp. 227-277.

+ See more.

Solutions for text processing

It is critical for organizations to have the ability to process information automatically, and very often that information is contained in documents to be read by humans rather than machines. We have different methods for text processing depending on the goal.

We can teach people how to automate their text processing routines. We can batch-process thousands of documents to extract information from them or to derive different types of statistics. We can also modify these documents, or generate databases or email correspondence based on information extracted from them. Anything that involves intelligent management of information can benefit from some degree of automation, which frees up time, effort and resources.

Tell us what your needs are and we will show you what we can do about them.