54 Informatik
In this paper we introduce LUCI, a Lightweight Urban Calculation Interchange system, designed to bring the advantages of a calculation and content coordination system to small planning and design groups by means of open-source middleware. The middleware focuses on problems typical of urban planning and therefore features a geodata repository as well as job runtime administration to coordinate simulation models and their multiple views. The described system architecture is accompanied by two exemplary use cases that have been used to test and further develop our concepts and implementations.
Web applications that are based on user-generated content are often criticized for containing low-quality information; a popular example is the online encyclopedia Wikipedia. The major points of criticism pertain to the accuracy, neutrality, and reliability of information. Identifying low-quality information is an important task because a huge number of people around the world habitually turn to Wikipedia first when they need information. Existing research on quality assessment in Wikipedia either investigates only small samples of articles or deals with the classification of content into high-quality or low-quality. This thesis goes further: it targets the investigation of quality flaws, thus providing specific indications of the respects in which low-quality content needs improvement. The original contributions of this thesis, which relate to the fields of user-generated content analysis, data mining, and machine learning, can be summarized as follows:
(1) We propose the investigation of quality flaws in Wikipedia based on user-defined cleanup tags. Cleanup tags are commonly used in the Wikipedia community to tag content that has some shortcomings. Our approach is based on the hypothesis that each cleanup tag defines a particular quality flaw.
(2) We provide the first comprehensive breakdown of Wikipedia's quality flaw structure. We present a flaw organization schema, and we conduct an extensive exploratory data analysis which reveals (a) the flaws that actually exist, (b) the distribution of flaws in Wikipedia, and (c) the extent of flawed content.
(3) We present the first breakdown of Wikipedia's quality flaw evolution. We consider the entire history of the English Wikipedia from 2001 to 2012, which comprises more than 508 million page revisions, summing up to 7.9 TB. Our analysis reveals (a) how the incidence and the extent of flaws have evolved, and (b) how the handling and the perception of flaws have changed over time.
(4) We are the first to operationalize an algorithmic prediction of quality flaws in Wikipedia. We cast quality flaw prediction as a one-class classification problem, develop a tailored quality flaw model, and employ a dedicated one-class machine learning approach. A comprehensive evaluation based on human-labeled Wikipedia articles underlines the practical applicability of our approach.
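To illustrate what casting flaw prediction as a one-class problem means, the following is a minimal sketch, not the approach developed in the thesis: a classifier that is trained only on articles carrying a given cleanup tag and accepts new articles that lie close to them in feature space. The centroid-and-radius rule, the class name, and the two features (references per 1,000 words, share of unsourced paragraphs) are assumptions made for this example.

```python
import math


def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidOneClassClassifier:
    """Hypothetical one-class model: learns from flawed examples only."""

    def __init__(self, quantile=0.9):
        self.quantile = quantile  # share of training examples to enclose

    def fit(self, positives):
        self.center = centroid(positives)
        dists = sorted(distance(p, self.center) for p in positives)
        index = min(len(dists) - 1, int(self.quantile * len(dists)))
        self.radius = dists[index]
        return self

    def predict(self, vector):
        """True if the article resembles the flawed training articles."""
        return distance(vector, self.center) <= self.radius


# Made-up feature vectors of articles tagged as lacking references:
# [references per 1,000 words, share of unsourced paragraphs]
flawed = [[0.0, 0.9], [0.1, 0.8], [0.0, 1.0], [0.2, 0.85]]
model = CentroidOneClassClassifier().fit(flawed)
print(model.predict([0.05, 0.9]))  # close to the tagged articles
print(model.predict([5.0, 0.1]))   # well-referenced article, far away
```

No examples of flawless articles are needed for training, which matches the situation described above: cleanup tags mark flawed content, while untagged content is not guaranteed to be free of the flaw.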
Text classification deals with discovering knowledge in texts and is used for extracting, filtering, or retrieving information in streams and collections. The discovery of knowledge is operationalized by modeling text classification tasks, which is mainly a human-driven engineering process. The outcome of this process, a text classification model, is used to inductively learn a text classification solution from a priori classified examples. The building blocks of modeling text classification tasks cover four aspects: (1) the way examples are represented, (2) the way examples are selected, (3) the way classifiers learn from examples, and (4) the way models are selected.
This thesis proposes methods that improve the prediction quality of text classification solutions for unseen examples, especially for non-standard tasks where standard models do not fit. The original contributions are related to the aforementioned building blocks: (1) Several topic-orthogonal text representations are studied in the context of non-standard tasks and a new representation, namely co-stems, is introduced. (2) A new active learning strategy that goes beyond standard sampling is examined. (3) A new one-class ensemble for improving the effectiveness of one-class classification is proposed. (4) A new model selection framework to cope with subclass distribution shifts that occur in dynamic environments is introduced.
Quality assurance of the content of building information models (BIM) is becoming increasingly important as these models are used for a steadily growing range of applications. It must be carried out for every software application involved in the data exchange, in accordance with the project goal. The Industry Foundation Classes (IFC) provide an established format for describing and exchanging such a model. For the quality assurance process, a server-based test environment will be part of the new IFC certification procedure. To this end, the "iabi - Institut für angewandte Bauinformatik", in cooperation with "buildingSMART e.V." (http://www.buildingsmart.de), has implemented a Global Testing Documentation Server (GTDS). The GTDS is a database-backed web application with the following aims:
• Providing a tool for the qualitative testing of IFC-based models
• Supporting communication between IFC developers and users
• Documenting the quality of IFC-based software applications
• Providing a platform for the certification of IFC applications
The subject of this work is the design and exemplary implementation of a tool for the interactive visualization of quality deficits that the GTDS has detected in a model. The exemplary implementation is to build on the OPEN IFC TOOLS (http://www.openifctools.org).
Texts from the web can be reused individually or in large quantities. The former is called text reuse and the latter language reuse. We first present a comprehensive overview of the different ways in which text and language are reused today, and how exactly information retrieval technologies can be applied in this respect. The remainder of the thesis then deals with specific retrieval tasks. In general, our contributions consist of models and algorithms, their evaluation, and, for that purpose, large-scale corpus construction.
The thesis is divided into two parts. The first part introduces technologies for text reuse detection, and our contributions are as follows: (1) A unified view of projecting-based and embedding-based fingerprinting for near-duplicate detection and the first evaluation of fingerprint algorithms on Wikipedia revision histories as a new, large-scale corpus of near-duplicates. (2) A new retrieval model for the quantification of cross-language text similarity, which gets by without parallel corpora. We have evaluated the model in comparison to other models on many different pairs of languages. (3) An evaluation framework for text reuse and particularly plagiarism detectors, which consists of tailored detection performance measures and a large-scale corpus of automatically generated and manually written plagiarism cases. The latter have been obtained via crowdsourcing. This framework has been successfully applied to evaluate many different state-of-the-art plagiarism detection approaches within three international evaluation competitions.
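As a rough illustration of fingerprinting for near-duplicate detection, the sketch below hashes word shingles and compares two texts by the overlap of their smallest hash values. This is a generic min-wise shingle fingerprint, swapped in for illustration; it is not one of the projecting-based or embedding-based methods the thesis analyzes, and the function names, shingle length, and fingerprint size are assumptions.

```python
import hashlib


def shingles(text, k=3):
    """Set of overlapping k-word sequences of the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def fingerprint(text, k=3, size=64):
    """Keep the `size` smallest shingle hashes as a compact fingerprint."""
    hashes = sorted(
        int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)
        for s in shingles(text, k)
    )
    return set(hashes[:size])


def resemblance(fp_a, fp_b, size=64):
    """Estimate the Jaccard similarity of the underlying shingle sets."""
    smallest = sorted(fp_a | fp_b)[:size]
    shared = sum(1 for h in smallest if h in fp_a and h in fp_b)
    return shared / len(smallest)


# Two made-up "revisions" of the same sentence and one unrelated text.
rev_1 = "the quick brown fox jumps over the lazy dog near the river bank"
rev_2 = "the quick brown fox jumps over the sleepy dog near the river bank"
other = "fingerprints map documents to small sets of hash values"

print(resemblance(fingerprint(rev_1), fingerprint(rev_2)))  # high overlap
print(resemblance(fingerprint(rev_1), fingerprint(other)))  # no shared shingle
```

Successive revisions of an article typically differ in only a few places, so most shingles, and therefore most fingerprint values, stay the same; this is what makes revision histories a natural source of near-duplicate pairs.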
The second part introduces technologies that solve three retrieval tasks based on language reuse, and our contributions are as follows: (4) A new model for the comparison of textual and non-textual web items across media, which exploits web comments as a source of information about the topic of an item. In this context, we identify web comments as a largely neglected information source and introduce the rationale of comment retrieval. (5) Two new algorithms for query segmentation, which exploit web n-grams and Wikipedia as a means of discerning the user intent of a keyword query. Moreover, we crowdsource a new corpus for the evaluation of query segmentation which surpasses existing corpora by two orders of magnitude. (6) A new writing assistance tool called Netspeak, which is a search engine for commonly used language. Netspeak indexes the web in the form of web n-grams as a source of writing examples and implements a wildcard query processor on top of it.
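The idea of segmenting a keyword query with the help of n-gram frequencies can be sketched as follows. This is an illustration under stated assumptions, not the algorithms of the thesis: the n-gram counts are invented, and the scoring function (frequency weighted by segment length) is chosen here only because the abstract does not specify one.

```python
from itertools import combinations

# Hypothetical n-gram counts; a real system would look them up
# in a web n-gram corpus.
NGRAM_COUNTS = {
    "new york": 1000,
    "york times": 400,
    "new york times": 300,
}


def segmentations(words):
    """Yield every split of the word list into contiguous segments."""
    n = len(words)
    for r in range(n):
        for cuts in combinations(range(1, n), r):
            bounds = (0, *cuts, n)
            yield [words[a:b] for a, b in zip(bounds, bounds[1:])]


def score(segmentation, counts):
    """Sum of length-weighted frequencies over multi-word segments."""
    total = 0
    for seg in segmentation:
        if len(seg) >= 2:
            total += len(seg) ** len(seg) * counts.get(" ".join(seg), 0)
    return total


def segment(query, counts=NGRAM_COUNTS):
    """Return the highest-scoring segmentation of a keyword query."""
    words = query.lower().split()
    if not words:
        return []
    best = max(segmentations(words), key=lambda s: score(s, counts))
    return [" ".join(seg) for seg in best]


print(segment("new york times subscription"))
# ['new york times', 'subscription']
```

The exhaustive enumeration is affordable because keyword queries are short: a query of n words has 2^(n-1) possible segmentations.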
Design of a Player Model for an Extensible Game Platform for Education in Building Physics
(2012)
In the project Intelligentes Lernen (Intelligent Learning), the professorships Content Management und Web-Technologien, Systeme der Virtuellen Realität, and Bauphysik of the Bauhaus-Universität Weimar develop innovative information technologies for e-learning environments. In the subareas of retrieval, extraction, and visualization of large document collections, as well as simulation-based and plan-based knowledge transfer, algorithms and tools are being researched to make e-learning systems more capable and thereby optimize learning success.
The goal of the project in the area of simulation-based knowledge transfer is the development of a multiplayer online game (MOG) to support education in building physics.
In this bachelor's thesis, a player model for managing player-specific data is designed for this digital learning software and integrated into the existing framework. The work focuses on organizing the skills the player has acquired and on selecting suitable game tasks matched to the player's level of knowledge. For use in e-learning, the extensibility of the model to include new learning units is an essential requirement.
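A player model of this kind could be sketched as follows. The class names, the 1 to 5 difficulty scale, the topics, and the selection rule (offer tasks one step above the current level) are assumptions made for this example and are not taken from the thesis or its framework.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    topic: str       # learning unit the task belongs to
    difficulty: int  # hypothetical scale: 1 (easy) to 5 (hard)


@dataclass
class PlayerModel:
    name: str
    # Acquired skill level per topic; unknown topics count as level 0,
    # so new learning units can be added without changing the model.
    skills: dict = field(default_factory=dict)

    def record_result(self, task, solved):
        """Raise the skill level of the task's topic after a success."""
        if solved:
            level = self.skills.get(task.topic, 0)
            self.skills[task.topic] = max(level, task.difficulty)

    def suitable_tasks(self, tasks):
        """Tasks exactly one step above the player's level in their topic."""
        return [t for t in tasks
                if t.difficulty == self.skills.get(t.topic, 0) + 1]


tasks = [
    Task("U-value basics", "heat transfer", 1),
    Task("Multi-layer wall", "heat transfer", 2),
    Task("Dew point", "moisture", 1),
]
player = PlayerModel("Anna")
print([t.name for t in player.suitable_tasks(tasks)])
player.record_result(tasks[0], solved=True)
print([t.name for t in player.suitable_tasks(tasks)])
```

Keeping skills in a dictionary keyed by topic is one simple way to meet the extensibility requirement: a task from a new learning unit is offered at its entry level without any change to the model itself.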