Title: Errores ortográficos y de competencia en textos de la web en euskera
Publication Type: Journal Article
Year of Publication: 2010
Authors: Alegria, Iñaki, Izaskun Etxeberria, and Igor Leturia
Journal: Revista de la Asociación Española para el Procesamiento del Lenguaje Natural
Volume: 45
ISSN: 1135-5948
Abstract

The objective of the work presented in this paper is to estimate the quality of corpora retrieved from the Basque Web. The methodology followed is similar to that used for English and German by Ringlstetter et al. (2006). The main difference is that we reuse spelling checkers to detect errors. We think that in this way we obtain higher error coverage and that the method can be applied to other languages with practically no manual work, provided such tools are available for them. The results obtained can be useful for improving the quality of corpora obtained from the web by eliminating documents whose errors exceed a given threshold.

URL: http://www.elhuyar.org/hizkuntza-zerbitzuak/informazioa/corpus-tresnak/Errores_web.pdf