Termout.org/LING

Update (February 24, 2023): The new version of Termout.org is now online, so this website is obsolete and will soon be dismantled.

List of candidates under examination:
1) dataset (*)
(*) Terms present in our linguistics glossary

1) Candidate: dataset

Is in the gold standard

paper corpusSignosTxtLongLines416 - : sequences. We identify the causes of the errors of each type and suggest ways for preventing such errors with corresponding analysis of their cost and scale of impact. The analysis is performed for extractions from two Spanish-language text datasets: the FactSpaCIC dataset of grammatically correct and verified sentences and the RawWeb dataset of unedited text fragments from the Internet. Extraction is performed by the ExtrHech system.

paper corpusSignosTxtLongLines416 - : In this paper, we give extensive analysis and classification of errors in relation extraction that are specific to Open IE based on POS tagging and syntactic constraints. The analysis is performed for extractions from texts in the Spanish language. The experiments were conducted for two datasets: FactSpaCIC, a dataset of grammatically correct sentences collected from school textbooks (Aguilar, 2012), and RawWeb, a dataset of sentences randomly collected directly from the Web without any preprocessing except language detection (Horn, Zhila, Gelbukh & Lex, 2013). In the experiments, extraction was performed using ExtrHech, a state-of-the-art Open IE system for Spanish that implements an extraction method following the approach based on constraints over POS-tag sequences.

paper corpusSignosTxtLongLines416 - : The issues listed above caused errors on both datasets: FactSpaCIC and RawWeb. Below we describe some issues that did not occur in the grammatically correct dataset FactSpaCIC (possibly due to its limited size), yet they occurred on the (larger) RawWeb dataset.

paper corpusSignosTxtLongLines416 - : We have performed error analysis for two datasets: the FactSpaCIC dataset of grammatically correct verified sentences and the RawWeb dataset of texts directly extracted from the Internet. We have shown that the distributions of types of errors are similar for both datasets.

Evaluating the candidate dataset:

1) factspacic: 5
3) errors: 5 (*)
5) grammatically: 4
6) sentences: 4 (*)
7) extraction: 4 (*)
8) correct: 4
9) rawweb: 4

Language: eng
Frequency: 28
Docs: 6
Proper noun: / 28 = 0%
Co-occurrences with glossary: 3
Score: 4.016 = 3 + (1 + 4.95419631038688) / (1 + 4.85798099512757)
Candidate accepted
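
The arithmetic on the score line above can be checked with a short sketch. It assumes that the leading 3 is the count of co-occurrences with the glossary (it matches the figure reported just before the score) and treats the two long decimals as opaque inputs, since the page does not say what they measure. The function name and parameter names are hypothetical, chosen only for illustration; this is not Termout.org's actual code.

```python
def termhood_score(glossary_cooccurrences: int,
                   numerator_term: float,
                   denominator_term: float) -> float:
    """Hypothetical reconstruction of the score shown on the page:
    an integer count plus a ratio with 1 added to both parts."""
    # Assumption: the first addend is the glossary co-occurrence count.
    # Adding 1 to numerator and denominator keeps the ratio defined
    # even when the denominator term is 0.
    return glossary_cooccurrences + (1 + numerator_term) / (1 + denominator_term)


# Values copied verbatim from the report for the candidate "dataset".
score = termhood_score(3, 4.95419631038688, 4.85798099512757)
print(round(score, 3))  # 4.016
```

Under this reading, the ratio stays close to 1 whenever the two decimals are similar, so the glossary co-occurrence count dominates the final score.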

No bibliographic references associated with the term(s) were found.

(The existence of references devoted to a term is also an indication of termhood.)