Modeling Non-Standard Text Classification Tasks

Lipka, Nedim

Text classification deals with discovering knowledge in texts and is used for extracting, filtering, or retrieving information in streams and collections. The discovery of knowledge is operationalized by modeling text classification tasks, which is mainly a human-driven engineering process. The outcome of this process, a text classification model, is used to inductively learn a text classification solution from a priori classified examples. The building blocks of modeling text classification tasks cover four aspects: (1) the way examples are represented, (2) the way examples are selected, (3) the way classifiers learn from examples, and (4) the way models are selected. This thesis proposes methods that improve the prediction quality of text classification solutions for unseen examples, especially for non-standard tasks where standard models do not fit. The original contributions are related to the aforementioned building blocks: (1) Several topic-orthogonal text representations are studied in the context of non-standard tasks and a new representation, namely co-stems, is introduced. (2) A new active learning strategy that goes beyond standard sampling is examined. (3) A new one-class ensemble for improving the effectiveness of one-class classification is proposed. (4) A new model selection framework to cope with subclass distribution shifts that occur in dynamic environments is introduced.

Metadata
Document type:	Dissertation
Author:	Nedim Lipka
DOI (citation link):	https://doi.org/10.25643/bauhaus-universitaet.1862
URN (citation link):	https://nbn-resolving.org/urn:nbn:de:gbv:wim2-20130307-18626
Reviewer:	James Shanahan
Advisor:	Benno Stein
Language:	English
Date of publication (online):	3 June 2013
Year of first publication:	2013
Date of final examination:	25 February 2013
Release date:	7 March 2013
Publishing institution:	Bauhaus-Universität Weimar
Degree-granting institution:	Bauhaus-Universität Weimar, Fakultät Medien
Institutes and partner institutions:	Fakultät Medien / Professur Content Management und Webtechnologien
GND keyword:	Text Classification; Machine Learning
DDC classification:	000 Computer science, information science, general works / 000 Computer science, knowledge, systems / 000 Computer science, information science, general works
BKL classification:	54 Computer science
License (German):	Creative Commons 4.0 - Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND 4.0)

University Library
Weimar Open Access