Tecling » The World is automatic
Technologies for Linguistic Analysis

October 19, 2021
Katabak: a new program for the analysis of references in scientific papers

Katabak (http://tecling.com/katabak) is a program that takes the text of a scientific paper in English or Spanish and checks that every citation in the body of the text appears in the reference list and vice versa. We badly needed something like this for revistasignos.cl because, as editors, we always have to do this manually, and it is an extremely laborious, error-prone task. A script to automate this procedure had been on our to-do list since 2014, more or less. And it took us less than two hours to write it! We would have saved countless hours of our precious time if we had done this earlier, but we are always very busy doing lots of other stuff. Anyway, it's not perfect yet. It makes some mistakes because it's brand new, so don't be too judgmental. It will take some more time to finish polishing this script. Try it and please let us know if you spot any of those mistakes.
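For readers curious about what such a check involves, here is a minimal sketch in Python. It is not Katabak's actual code: the regular expressions, the function names and the assumption of APA-like citations ("Surname (2021)", "(Surname & Other, 2020)", "Surname et al., 2019") are ours, for illustration only. It compares (first surname, year) pairs found in the body with those found in the reference list. It will be fooled by compound surnames and by capitalized words followed by a year ("In 2020"), which is exactly the kind of mistake a real tool has to polish away.

```python
import re

# Hypothetical pattern for APA-like in-text citations (an assumption, not Katabak's rule).
CITATION = re.compile(
    r"([A-ZÁÉÍÓÚÑ][^\W\d_]+)"                                  # first author's surname
    r"(?:\s+et al\.|\s+(?:&|and|y)\s+[A-ZÁÉÍÓÚÑ][^\W\d_]+)?"   # optional co-author part
    r",?\s+\(?(\d{4}[a-z]?)\)?"                                # year, with or without parentheses
)

# Hypothetical pattern for a reference entry: "Surname, N. (2021). Title..."
REFERENCE = re.compile(r"\s*([^\W\d_]+).*?\((\d{4}[a-z]?)\)")


def cited_keys(body):
    """Return the set of (surname, year) pairs cited in the body text."""
    return {(m.group(1), m.group(2)) for m in CITATION.finditer(body)}


def listed_keys(references):
    """Return the set of (surname, year) pairs found in the reference entries."""
    keys = set()
    for entry in references:
        m = REFERENCE.match(entry)
        if m:
            keys.add((m.group(1), m.group(2)))
    return keys


def check(body, references):
    """Return (cited but not listed, listed but never cited), both sorted."""
    cited = cited_keys(body)
    listed = listed_keys(references)
    return sorted(cited - listed), sorted(listed - cited)


body = "Nazar (2021) extends earlier work (Renau & Castro, 2020; Robledo et al., 2019)."
references = [
    "Nazar, R. (2021). A title.",
    "Renau, I.; Castro, A. (2020). Another title.",
    "Smith, J. (2018). Never cited.",
]
missing, uncited = check(body, references)
print("Cited but not in the reference list:", missing)
print("In the reference list but never cited:", uncited)
```

Comparing two sets keeps both directions of the check ("and vice versa") in a single step.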

September 30, 2021
New YouTube video: presentation of Tecling.com at Universidad del Salvador, Argentina

Irene Renau and Rogelio Nazar delivered a long talk entitled Tecling.com: Herramientas experimentales para el procesamiento de textos (Tecling.com: experimental tools for text processing) at the II Jornadas de Lingüística y Gramática Española, organized by Universidad del Salvador, Argentina. This presentation (in Spanish) is also available on YouTube:
It was a great experience and we are thankful to the organizers and the audience.

September 24, 2021
Today we presented a new paper about discourse markers at SEPLN 2021

Very early in the morning, Rogelio Nazar delivered a presentation about Dismark, our project on discourse markers. The presentation (in Spanish) is available as a video on YouTube:
(And yes, today we officially inaugurated our YouTube channel. We will be regularly posting new videos there.)

The paper is also available at the SEPLN journal: http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6383

September 21, 2021
New version of Project Sicam, for the analysis of Spanish poems

Happy times!
We have a new version of Project Sicam. This is a Perl implementation of the algorithm originally designed by Ricardo Martínez for the descriptive analysis of the metrics of syllables, verses and poems in Spanish.
Try it out:
Paste a poem in Spanish there and it will give you a very detailed analysis and classification of each word, verse and stanza, and of the poem as a whole.

September 16, 2021
Different talks, at different venues

We've been busy lately delivering talks at different international conferences. It has been quite a run. Things that happen in the new virtual era. First, on July 6, there was a paper at ELEX2021 about discourse markers. A few days later (July 23), a talk at PhrasaLex II about regular polysemy. Then, last week (September 7), we presented a paper at EURALEX about verbal neology. Next week (September 24) we will also be presenting papers on different topics at the SEPLN 2021 Conference (also online). And the week after that (September 30) we will be giving another online presentation at Universidad del Salvador, Argentina, to describe our research in general. Later, on October 25-29, 2021, we will be at ALED-2021 and Riterm 2021. We certainly are saving some money on plane tickets and hotels.

August 23, 2021
Hernán Robledo successfully defended his PhD Dissertation

Last Wednesday, our colleague Hernán Robledo defended his thesis about the automatic discovery of categories of discourse markers. In the framework of Project Fondecyt 1191481 and with Rogelio Nazar as advisor, Hernán explored methods to exploit parallel corpora using clustering algorithms. With such methods he obtained groups of Spanish discourse markers that fulfill the same function. The thesis, written in Spanish, will be made available online and we hope there will also be a paper about it soon, published in English.

August 7, 2021
Tildau now restores diacritics of full texts

Tildau is getting better: now it can perform diacritic restoration of full documents. When we started with this implementation, a few weeks ago, we were only capable of restoring diacritics in lists of Spanish proper nouns. Now, however, we can process running text and restore diacritics even in the tricky cases where both the accented and the unaccented form are genuine Spanish words, such as pérdida/perdida.
This comes almost ten years after our paper Spell-checking in Spanish: The case of diacritic accents.
The current implementation is not perfect, as it has some problems with high-frequency words such as qué/que or cómo/como, etc., which will probably need some special rules. But we will get there eventually and as it stands now it's pretty cool anyway. Come on! Give it a try:

July 23, 2021
Irene Renau delivers a talk at PhrasaLex II

Irene Renau presented the results of her research on regular polysemy in Spanish nouns at the PHRASALEX II Workshop on Phraseological Approaches to Learner’s Lexicography 2021. The title of the presentation is 'Regular Polysemy in context: a corpus based analysis of regular polysemy in Spanish nouns'. This event is going on right now!

July 16, 2021
Tildau: diacritic restoration of proper nouns

It often happens that some database systems eliminate tildes in people's names. For example, in our university, all students' records are in this state. As we often need to send email correspondence to them, we thought it would be nice to address them using the proper orthography. For this we developed Tildau. It takes a list of proper nouns without tildes and performs some sort of diacritic restoration.
For example:
input: Ramon Jose Sotomayor Diaz
output: Ramón José Sotomayor Díaz
We did this for our own purposes and share it as it is. It will only work with proper names. It will not be useful for full diacritic restoration of documents.
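To give an idea of the general approach, here is a minimal sketch in Python. It is only an illustration under our own assumptions, not Tildau's actual implementation: it supposes we already have a small list of correctly spelled names (the `KNOWN_NAMES` list below is hypothetical) and builds a lookup table from the stripped form of each name to its proper spelling. Names that are not in the table are left untouched, and if two different spellings collapse to the same stripped form, the last one simply wins.

```python
import unicodedata

# Hypothetical list of correctly spelled names; a real system would need a much larger one.
KNOWN_NAMES = ["Ramón", "José", "Díaz", "María", "Muñoz"]


def strip_diacritics(text):
    """Remove combining marks (accents, and also the tilde of ñ) from a string."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def build_lexicon(names):
    """Map each stripped, lowercased name to its correct spelling."""
    return {strip_diacritics(name).lower(): name for name in names}


def restore(full_name, lexicon):
    """Replace every token found in the lexicon; keep unknown tokens as they are."""
    tokens = full_name.split()
    restored = [lexicon.get(strip_diacritics(t).lower(), t) for t in tokens]
    return " ".join(restored)


lexicon = build_lexicon(KNOWN_NAMES)
print(restore("Ramon Jose Sotomayor Diaz", lexicon))
```

A plain lookup like this is enough for proper names because they are rarely ambiguous; running text is a different story, since many words exist both with and without an accent.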
Do you want to try it? Yeahhh you know you want it. Come on. Give it a try:

June 28, 2021
New paper at the International Journal of Lexicography

We've just realized our paper on new verbs has already been published online at the International Journal of Lexicography. In this paper, Ana Castro, Rogelio Nazar and Irene Renau explain how to obtain new verbs from Spanish corpora. But there are many interesting ideas in this research which can be applied to problems other than neology. For instance, the algorithm described there could very well be used for spell checking. We will be doing some of that stuff soon. Stay tuned!

Tools & demos

We have implemented different types of applications and most of them can be tested online. Take a look.

+ Bifid: a parallel corpus aligner

+ Cryptoman: a script to generate cryptograms

+ Dismark: a multilingual taxonomy of discourse markers (new!)

+ Dsele: a model dictionary for ELE learners

+ Estilector: computer assisted writing for Spanish

+ GeNom: a program to detect the gender of proper nouns

+ HAT: a project for the treatment of polysemy in lexical taxonomies

+ Jaguar: a tool for statistical corpus analysis

+ Kind: a lexical taxonomy induction algorithm

+ Kwico: a concordancer for big corpora

+ Lealem: a reading pacer for parallel German-Spanish texts

+ Leafran: a reading pacer for parallel French-Spanish texts

+ Linguini: a language detector

+ Neven: a program to detect eventive nouns

+ Termout: a terminology extraction system

+ POL: named entity recognition and classification

+ Poppins: a supervised text classifier

+ Porcus: an interface for various taggers and parsers for Spanish

+ pullPOS: a project for the detection of plurals in Spanish

+ Readeutsch: a reading pacer for parallel German-English texts

+ Sapo: a program to detect similarities between documents

+ Sicam: a program to split a Spanish word into syllables

+ TEXT·A·GRAM: a program to analyze Spanish texts

+ Verbario: corpus pattern analysis in Spanish


This is the view from where we are located, at the Sausalito lagoon, a quiet and lovely place in Viña del Mar, Chile. Sunny days. Birds can be seen in the center of the lagoon.

As researchers, we are currently affiliated with:
Pontificia Universidad Católica de Valparaíso
Instituto de Literatura y Ciencias del Lenguaje

Av. El Bosque 1290, Viña del Mar, Chile

Upcoming Events

October 25-29, 2021: Rogelio Nazar will be presenting at ALED 2021: https://comunidadaled.org/?p=911

October 26-29, 2021: Rogelio Nazar will be presenting at RITERM 2021: https://easychair.org/cfp/riterm2020_2021

Latest ideas & research projects

We are developing new projects in computational linguistics and natural language processing:

+ Fondecyt Regular (2019-2021): "Polisemia regular de los sustantivos del español: análisis semiautomático de corpus, caracterización y tipología" (Regular polysemy of nouns in Spanish: semiautomatic corpus analysis, characterization and typology). Lead researcher: Irene Renau. Ref.: 1191204.

+ Fondecyt Regular (2019-2021): "Inducción automática de taxonomías de marcadores discursivos a partir de corpus multilingües" (Automatic induction of taxonomies of discourse markers from multilingual corpora). Lead researcher: Rogelio Nazar. Ref.: 1191481.

+ Ecos-Sud (International Project between Chile and France): "Inducción automática de taxonomías del español y el francés mediante técnicas cuantitativas y estadística de corpus" (Automatic induction of Spanish and French taxonomies using quantitative techniques and corpus statistics). Lead researcher: Irene Renau. Ref.: C16H02.

+ Fondecyt Regular: "Desarrollo de la competencia terminológica a lo largo de la inserción disciplinar" (Development of terminological competence throughout disciplinary integration). Lead researcher: Sabela Fernández. Co-researcher: Rogelio Nazar. Ref.: 11121597.

+ See more.

Recent publications

+ Nazar, R. (2021). "Inducción automática de una taxonomía multilingüe de marcadores discursivos: primeros resultados en castellano, inglés, francés, alemán y catalán". Procesamiento del Lenguaje Natural, núm 67, pp. 127-138. PDF

+ Nazar, R. (2021). "Automatic induction of a multilingual taxonomy of discourse markers". Iztok Kosem et al. (eds.) Electronic lexicography in the 21st century: post-editing lexicography. Lexical Computing CZ s.r.o., Brno, pages 440-454. PDF

+ Castro, A.; Nazar, R.; Renau, I. (2021). "New verbs and dictionaries: a method for the automatic detection of neology in Spanish verbs". International Journal of Lexicography, ...

+ Nazar, R.; Renau, I.; Acosta, N.; Robledo, H.; Soliman, H.; Zamora, S. (2021). "Corpus-Based Methods for Recognizing the Gender of Anthroponyms". Names: A Journal of Onomastics, vol. 69, num. 3. PDF

+ See more.

Solutions for text processing

It is critical for organizations to have the ability to process information automatically, and very often that information is contained in documents to be read by humans rather than machines. We have different methods for text processing depending on the goal.

We can help people learn how to automate their text processing routines. We can batch-process thousands of documents to extract information from them or to derive different types of statistics. We can also modify these documents, or generate databases or email correspondence based on information extracted from them. Anything that involves intelligent management of information can benefit from different degrees of automation, which frees up time, effort and resources.

Tell us what your needs are and we will show you what we can do about them.