September 5, 2024 Prof. Elisabetta Jezek in the Winter Seminars on Lexical Semantics 2024
September 5, 2024 Hernán Robledo presented his work at the University of London
August 22, 2024 Impressive turnout for the Python workshop
August 17, 2024 TEXT·A·GRAM is online again
August 8, 2024 Introductory Python workshop
August 2, 2024 Our collaborator Ana Castro takes first place in the ANID scholarship competition
July 29, 2024 We will be presenting two papers at ICAI 2024
July 21, 2024 We are updating this website
July 11, 2024 Two talks by Irene Renau in less than 24 hours
July 10, 2024 We have a new article on discourse markers
July 1, 2024 We have a new version of Termout
June 28, 2024 Lucía Castillo gives a lecture on Open Science
June 14, 2024 We have a new version of Bifid
May 31, 2024 We have a new version of Kind
May 24, 2024 Irene Renau presented two talks in Murcia
May 6, 2024 We started cleaning up the Spanish Wiktionary
Tools & demos
We have implemented different types of applications, and most of them can be tested online. Take a look.
+ Bifid: a parallel corpus aligner
+ Compare: a simple script to compare two lists of words
+ Cryptoman: a script to generate cryptograms
+ Dismark: a multilingual taxonomy of discourse markers
+ Estilector: computer-assisted writing for Spanish
+ GeNom: a program to detect the gender of proper nouns
+ Jaguar: a tool for statistical corpus analysis
+ Kind: a lexical taxonomy induction algorithm
+ Kwico: a concordancer for large corpora
+ Lealem: a reading pacer for parallel German-Spanish texts
+ Leafran: a reading pacer for parallel French-Spanish texts
+ Linguini: a language detector
+ Neven: a program to detect eventive nouns
+ POL: named entity recognition and classification
+ Poppins: a supervised text classifier
+ Porcus: an interface for various taggers and parsers for Spanish
+ pullPOS: a project for the detection of plurals in Spanish
+ Punkt: punctuation of discourse markers in Spanish
+ Randall: a list randomizer
+ Readeutsch: a reading pacer for parallel German-English texts
+ Regex: a Perl script for regular expressions
+ Sapo: a program to detect similarities between documents
+ Sicam: a program to analyze Spanish poetry
+ Termout: a terminology extraction system
+ TEXT·A·GRAM: a program to analyze Spanish texts
+ Verbario: corpus pattern analysis in Spanish
This is the view from our location at the Sausalito lagoon, a quiet and lovely place in Viña del Mar, Chile, on a sunny day. Birds can be seen in the center of the lagoon. As researchers, we are currently affiliated with:
Av. El Bosque 1290, Viña del Mar, Chile
Upcoming Events [UPDATED: September 29, 2024]
October 8, 2024, at 17:00 Chilean time: Rogelio Nazar will give an online talk for the IDI Research Group at the Universidad de las Américas, titled 'Text·a·Gram: métodos cuantitativos para el análisis del discurso' (Text·a·Gram: quantitative methods for discourse analysis). The talk will present a line of research on modeling discourse genres, together with an open-source tool developed within that project, which extracts descriptive statistics on the distribution of discourse markers, deictics and modal operators in Spanish.
November 8, 2024, at 16:00 Madrid time (GMT+1) or 12:00 Chilean time (GMT-3): Irene Renau and Rogelio Nazar will present their research results at the II Seminario UAM: “Jornadas de lexicología y lexicografía del español: modelos, metodologías y herramientas” (Conference on Spanish lexicology and lexicography: models, methodologies and tools), an event organized by Rosario González, Beatriz Méndez, Elena de Miguel and Alberto Anula. The title of the presentation is 'La lingüística aplicada en acción: experimentos con herramientas para el procesamiento de texto' (Applied linguistics in action: experiments with text processing tools).
Latest ideas & research projects
We are developing new projects in computational linguistics and natural language processing:
Recent publications
+ Nazar, R.; Renau, I.; Robledo, H. (In press). Dismark and Text·a·Gram: Automatic identification and categorization of discourse markers in texts. In Proceedings of DISROM 2022 (Discourse Markers in Romance Languages, Craiova, 16-18 June 2022).
+ Obreque, J.; Nazar, R. (2023). Detección de operadores modales: una primera exploración en castellano. Linguamatica, 15(2): 37-49.
+ Renau, I. (2023). A corpus-based study of semantic neology of the Covid-19 pandemic. Quaderns de Filologia: Estudis Lingüístics, XXVIII: 55-76.
+ Nazar, R. (2023). Extensión, variación y evolución del léxico español. In Battaner, P.; Torner, S.; Renau, I. (eds.) Lexicografía hispánica / The Routledge Handbook of Spanish Lexicography. Chap. 14, pp. 204-218.
+ López-Hidalgo, B.; Renau, I.; Nazar, R. (2023). Correlación entre la metáfora orientacional BUENO ES ARRIBA / MALO ES ABAJO y polaridad positiva/negativa en verbos del español: un estudio con estadística de corpus. Humanidades Digitales, Corpus y Tecnología del Lenguaje. University of Groningen Press, pp. 307-323.
+ Nazar, R.; Acosta, N. (2023). Termout: a tool for the semi-automatic creation of term databases. In Haddad, A.; Terryn, A.; Mitkov, R.; Rapp, R.; Zweigenbaum, P.; Sharoff, S. (eds.) Proceedings of the Workshop on Computational Terminology in NLP and Translation Studies (ConTeNTS) Incorporating the 16th Workshop on Building and Using Comparable Corpora (BUCC). INCOMA, Shoumen, Bulgaria, pp. 9-18.
+ Nazar, R.; Renau, I. (2023). Estilector: un sistema de evaluación automática de la escritura académica en castellano. Revista Perspectiva Educacional, 62(2): 37-59.
+ Robledo, H.; Nazar, R. (2023). A proposal for the inductive categorisation of parenthetical discourse markers in Spanish using parallel corpora. International Journal of Corpus Linguistics. http://doi.org/10.1075/ijcl.20017.rob
+ Renau, I.; Nazar, R. (2022). Towards a multilingual dictionary of discourse markers: automatic extraction of units from parallel corpus. In Klosa-Kückelhaus, A.; Engelberg, S.; Möhrs, C.; Storjohann, P. (eds.) Dictionaries and Society. Proceedings of the XX EURALEX International Congress. Mannheim: IDS-Verlag, pp. 262-272.
+ Nazar, R.; Lindemann, D. (2022). Terminology extraction using co-occurrence patterns as predictors of semantic relevance. Proceedings of the TERM21 Workshop, Language Resources and Evaluation Conference (LREC 2022), Marseille, 20-25 June 2022, pp. 26-29.
Solutions for text processing
It is critical for organizations to be able to process information automatically, and very often that information is contained in documents meant to be read by humans rather than machines. We use different text processing methods depending on the goal. We can teach people how to automate their text processing routines. We can batch-process thousands of documents to extract information from them or to derive different types of statistics. We can also modify these documents, or generate databases or email correspondence based on the information extracted from them. Anything that involves intelligent management of information can benefit from some degree of automation, which frees up time, effort and resources. Tell us what you need and we will show you what we can do.