NEVEN
We present a study on the automatic
detection of non-deverbal eventive nouns, that is,
nouns that designate events but are not
derived from verbs, such
as fiesta (‘party’) or cóctel (‘cocktail’). Because
they lack the typical morphological features
of deverbal nouns, such as the suffixes -ción or -miento, they
are more difficult to detect.
In the present research we continue and extend the
work initiated by Resnik (2010), who offers a number
of cues for the detection of this type of lexical unit. We
apply Resnik’s ideas and add new ones, among
them the inductive analysis of the words that tend to
co-occur with eventive nouns in corpora, which we
use as predictors of eventive status. Furthermore,
we simplify the classification algorithm considerably
and run the experiments on a larger corpus,
EsTenTen (Kilgarriff & Renau, 2013), comprising more
than 9 billion running words. Finally, we present
the first results of the automatic extraction of eventive
nouns from the corpus, among which we find plenty
of non-deverbal nouns.
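The idea of using co-occurring words as predictors can be illustrated with a minimal Python sketch. This is not the procedure from the publication, only one plausible way to rank context words: it compares smoothed relative frequencies of words found next to known eventive nouns against those found next to other nouns. The function names, the smoothing value and the toy contexts are all assumptions made for illustration.

```python
from collections import Counter

def context_counts(contexts, target):
    """Count the words that share a context line with `target`."""
    counts = Counter()
    for line in contexts:
        tokens = line.lower().split()
        if target in tokens:
            counts.update(tok for tok in tokens if tok != target)
    return counts

def rank_predictors(eventive_counts, other_counts, smoothing=1.0):
    """Rank words by the ratio of their smoothed relative frequency
    in eventive contexts to that in non-eventive contexts."""
    vocab = set(eventive_counts) | set(other_counts)
    total_e = sum(eventive_counts.values()) + smoothing * len(vocab)
    total_o = sum(other_counts.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_e = (eventive_counts[word] + smoothing) / total_e
        p_o = (other_counts[word] + smoothing) / total_o
        scores[word] = p_e / p_o
    return sorted(scores, key=scores.get, reverse=True)

# Toy contexts invented for this example (not taken from EsTenTen).
eventive = context_counts(
    ["durante la fiesta", "la fiesta tuvo lugar ayer"], "fiesta")
other = context_counts(
    ["la mesa es grande", "sobre la mesa"], "mesa")

ranked = rank_predictors(eventive, other)
print(ranked)
```

In this toy setting, words that appear only around the eventive noun rank above words shared by both sets, which is the intuition behind treating them as cues.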
Web demo: http://www.tecling.com/neven
Source code:
http://www.tecling.com/neven/neven.zip (a zip archive; extract it before use).
Usage:
perl neven.pl input.txt > result.htm
Before running the script, you need the contexts of occurrence (concordances) of each word, extracted from a corpus. You will also need to edit the script
to set the path to the folder where these contexts are stored. The concordances of each word
are stored in a file named after the word's lemma.
You can obtain these concordances from any corpus using our free corpus concordancer, Kwico.
Comments in the script are currently only in Spanish.
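As a quick check of the setup described under Usage, the following Python sketch reports which lemmas have no concordance file yet. It assumes each file is named exactly after the lemma, with no extension, as the description suggests. The helper name `missing_concordances` and the folder name `contexts` are hypothetical and are not part of neven.pl.

```python
from pathlib import Path

def missing_concordances(lemmas, contexts_dir):
    """Return the lemmas that have no concordance file in `contexts_dir`.

    Assumption: each concordance file is named exactly after the lemma.
    """
    folder = Path(contexts_dir)
    return [lemma for lemma in lemmas if not (folder / lemma).is_file()]

# "contexts" is a placeholder; use the folder configured in the script.
print(missing_concordances(["fiesta", "cóctel"], "contexts"))
```

Any lemma listed in the output still needs its concordances extracted before the script is run.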
Pending Work: This program completely ignores morphological cues, so its output includes deverbal as well as non-deverbal eventive nouns. Users interested only in non-deverbal eventive nouns will need to make a few changes to the script to filter out nouns with deverbal morphology (e.g. -ción, -miento). Such a morphological filter is a safe and simple method and will be ready soon.
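Until that filter is available, a suffix check along these lines could be applied to the extracted nouns. This is a minimal Python sketch and not the authors' planned implementation: the suffix list contains only the two examples named above, and the function names are hypothetical.

```python
# Only -ción and -miento come from the text above; a real filter
# would presumably need a longer suffix list (assumption).
DEVERBAL_SUFFIXES = ("ción", "miento")

def has_deverbal_morphology(lemma):
    """True if the lemma ends with one of the listed suffixes."""
    return lemma.lower().endswith(DEVERBAL_SUFFIXES)

def keep_non_deverbal(lemmas):
    """Drop the lemmas that look deverbal by their suffix."""
    return [lemma for lemma in lemmas if not has_deverbal_morphology(lemma)]

candidates = ["fiesta", "cóctel", "celebración", "nacimiento"]
print(keep_non_deverbal(candidates))  # ['fiesta', 'cóctel']
```

Suffix matching is only a heuristic: it also discards any noun that merely happens to end in one of these strings, so the results may need manual review.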
Funding:
This research is supported by a grant from the Chilean
Government: Conicyt-Fondecyt 11140686, “Inducción
automática de taxonomías de sustantivos generales y especializados a partir de corpus textuales desde el enfoque de
la lingüística cuantitativa” (Automatic taxonomy induction from corpora for terminology and general vocabulary using quantitative measures). Lead researcher: Rogelio Nazar.
Related publications:
Nazar, R.; Soto, R.; Urrejola, K. (2017). Detección automática de nombres eventivos no deverbales en castellano: un enfoque cuantitativo basado en corpus. Revista Linguamatica, vol. 9, num. 2, pp. 21-31.
Related concepts: computational lexicography, inductive corpus analysis,
non-deverbal eventive nouns
Questions or comments? Feel free to drop us a line.