28.06.2016 22:58

Invited Talk by Fabian Barteld: Detecting spelling variants in non-standard texts

The Language Technology Group is pleased to announce the talk of

Fabian Barteld (University of Hamburg)


with the title


Detecting spelling variants in non-standard texts


The talk will take place on July 6th from 11:00 to 12:30 in room S2|02 A126.


Abstract:

Recent years have seen a growing interest in applying natural language processing tools to domains that feature non-standardized spelling. These domains include very different kinds of data, such as historical texts and computer-mediated communication (CMC). In computational linguistics, spelling variation in these types of texts has mainly been approached as deviation from a corresponding standard, i.e., the standard language in the case of CMC and a corresponding modern standard in the case of historical texts. Consequently, techniques like spell checking, rule-based string transduction, string similarity and machine translation are used to find the standard variant for non-standard spellings. In my talk, I will present experiments with an alternative approach to dealing with spelling variation, in which variants of the same word form are detected in the data without reference to a standard. Such an approach makes it possible to handle spelling variation in domains that lack a standardized variant and for word forms for which no standard variant exists, e.g., emoticons in CMC and extinct words in historical texts.

Bio:

Fabian Barteld studied German philology, media science, mathematics and computer science. He currently works at the University of Hamburg in the project "Referenzkorpus Mittelniederdeutsch/Niederrheinisch (1200-1650)" (Reference Corpus Middle Low German/Low Rhenish; http://referenzkorpus-mnd-nrh.de/) and is doing his PhD research on spelling variation.




Category:
General News


