000 Computer science, knowledge, systems
Search engines are very good at answering queries that look for facts. Still, information needs that concern forming opinions on a controversial topic or making a decision remain a challenge for search engines. Since they are optimized to retrieve satisfying answers, search engines might emphasize a specific stance on a controversial topic in their ranking, amplifying bias in society in an undesired way. Argument retrieval systems support users in forming opinions about controversial topics by retrieving arguments for a given query. In this thesis, we address challenges in argument retrieval systems that concern integrating them in search engines, developing generalizable argument mining approaches, and enabling frame-guided delivery of arguments.
Adapting argument retrieval systems to search engines should start by identifying and analyzing information needs that look for arguments. To identify questions that look for arguments, we develop a two-step annotation scheme that first determines whether the context of a question is controversial and, if so, assigns it one of three question types: factual, method, or argumentative. Using this annotation scheme, we create a question dataset from the logs of a major search engine and use it to analyze the characteristics of argumentative questions. The analysis shows that the proportion of argumentative questions on controversial topics is substantial and that they mainly ask for reasons and predictions. The dataset is further used to develop a classifier that maps each question to exactly one question type, reaching an F1-score of 0.78.
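For readers unfamiliar with the metric, the following is a minimal sketch of how an F1-score over the three question types could be computed. The abstract does not say which averaging was used, so the macro average shown here is an assumption, and the gold and predicted labels are invented toy data rather than results from the thesis.

```python
from collections import Counter

def f1_per_class(gold, pred):
    """Return the F1-score of every label occurring in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1-scores (assumed averaging)."""
    scores = f1_per_class(gold, pred)
    return sum(scores.values()) / len(scores)

# Toy labels, purely illustrative.
gold = ["factual", "method", "argumentative", "argumentative"]
pred = ["factual", "argumentative", "argumentative", "argumentative"]
per_class = f1_per_class(gold, pred)
overall = macro_f1(gold, pred)
```

With macro averaging, a rare question type counts as much as a frequent one, which matters when the class distribution in query logs is skewed.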
While the web offers an invaluable source of argumentative content to respond to argumentative questions, it is characterized by multiple genres (e.g., news articles and social fora). Exploiting the web as a source of arguments relies on developing argument mining approaches that generalize over genre. To this end, we approach the problem of how to extract argument units in a genre-robust way. Our experiments on argument unit segmentation show that transfer across genres is rather hard to achieve using existing sequence-to-sequence models.
Another property of text over which argument mining approaches should generalize is topic. Since new topics on which argument mining approaches have not been trained appear daily, these approaches should be developed in a topic-generalizable way. Towards this goal, we analyze the coverage of 31 argument corpora across topics using three topic ontologies. The analysis shows that the topics covered by existing argument corpora are biased toward a small subset of easily accessible controversial topics, hinting at the inability of existing approaches to generalize across topics. In addition to corpus construction standards, fostering topic generalizability requires a careful formulation of argument mining tasks. Same-side stance classification is a reformulation of stance classification that makes it less dependent on the topic. First experiments on this task show promising results in generalizing across topics.
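The reformulation can be illustrated with a small sketch that turns stance-labeled arguments into same-side pairs. It assumes, hypothetically, that each argument is a (topic, stance, text) tuple; the thesis's actual data format and pairing procedure are not given in the abstract.

```python
from itertools import combinations

def same_side_pairs(arguments):
    """Pair arguments on the same topic and label each pair True if both
    arguments take the same stance, False otherwise."""
    by_topic = {}
    for topic, stance, text in arguments:
        by_topic.setdefault(topic, []).append((stance, text))
    pairs = []
    for args in by_topic.values():
        for (stance_a, text_a), (stance_b, text_b) in combinations(args, 2):
            pairs.append((text_a, text_b, stance_a == stance_b))
    return pairs

# Invented example arguments.
arguments = [
    ("nuclear energy", "pro", "It emits little CO2."),
    ("nuclear energy", "con", "Waste storage is unsolved."),
    ("nuclear energy", "pro", "It provides stable base load."),
]
pairs = same_side_pairs(arguments)
```

The pair label no longer mentions "pro" or "con" with respect to a particular topic, which is what makes the task formulation less topic-dependent.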
To persuade their audience effectively, users of an argument retrieval system should select arguments from the retrieved results based on which frame of a controversial topic they emphasize. An open challenge is to develop an approach that identifies the frames of an argument. To this end, we define a frame as a subset of arguments that share an aspect. We operationalize this model via an approach that identifies and removes the topic of arguments before clustering them into frames. We evaluate the approach on a dataset that covers 12,326 frames and show that identifying the topic of an argument and removing it helps to identify its frames.
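The intuition behind removing the topic before clustering can be sketched on toy data. This is not the approach evaluated in the thesis: the token lists, the Jaccard similarity, the greedy clustering, and the threshold are all stand-in assumptions chosen to keep the example short.

```python
def remove_topic(tokens, topic_terms):
    """Drop every token that belongs to the argument's topic."""
    return [t for t in tokens if t not in topic_terms]

def jaccard(a, b):
    """Jaccard similarity of two token lists, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def greedy_cluster(docs, threshold):
    """Assign each document to the first cluster whose first member is
    similar enough; otherwise open a new cluster."""
    clusters = []
    for i, doc in enumerate(docs):
        for cluster in clusters:
            if jaccard(doc, docs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Invented arguments: (topic terms, argument tokens).
data = [
    ({"nuclear", "energy"}, ["nuclear", "energy", "cost", "expensive"]),
    ({"electric", "cars"}, ["electric", "cars", "cost", "expensive"]),
    ({"nuclear", "energy"}, ["nuclear", "energy", "safety", "risk"]),
    ({"electric", "cars"}, ["electric", "cars", "safety", "risk"]),
]
raw = [tokens for _, tokens in data]
stripped = [remove_topic(tokens, topic) for topic, tokens in data]

with_topic = greedy_cluster(raw, threshold=0.5)
without_topic = greedy_cluster(stripped, threshold=0.5)
```

In this toy setting, the topic words dilute the overlap between arguments that share an aspect, so no two arguments are grouped; once the topic words are removed, the cost arguments and the safety arguments each form one cluster across topics.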
Die Form der Datenbank
(2023)
Databases are today the most important technology for organizing and processing data. How did they become one of the most ubiquitous and, at the same time, most invisible practices enabling human cooperation? This study begins with a historiographical exploration of the central media concepts of databases and leads to the praxeological concept of "data as formation", in short: in-formation.
The first main part deals with the formatting of data through the processing of structured data by means of relational algebra. It works out how structure creates new knowledge. The second part discusses how databases are operationalized through the diagrammatic-epistemic space of the table. Third, the study examines transactions as explanations of how data and real-world actions can be coordinated and synchronized.
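As an illustration of how relational algebra derives new statements from structured data, the following sketch implements selection, projection, and natural join over lists of dictionaries. The tables and their contents are invented for this example and do not come from the study.

```python
def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attrs):
    """Projection: keep only the given attributes and drop duplicates."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    result = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[a] == r[a] for a in shared):
                result.append({**l, **r})
    return result

# Hypothetical tables.
parts = [
    {"part": "gear", "supplier_id": 1},
    {"part": "axle", "supplier_id": 2},
]
suppliers = [
    {"supplier_id": 1, "city": "Dresden"},
    {"supplier_id": 2, "city": "Erfurt"},
]

joined = natural_join(parts, suppliers)
from_dresden = project(select(joined, lambda r: r["city"] == "Dresden"), ["part"])
```

Neither table states which parts come from Dresden; that statement only appears once the two tables are joined on their shared attribute.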
The second main part examines how relational databases increasingly became the center of software applications and infrastructures, with a focus on economic practices. Taking a comparative approach, case studies from the GDR of the 1970s to 1990s are used to ask whether there was a "socialist" database management software. The "Western" production databases BOMP, COPICS, and MAPICS (IBM) as well as R2 (SAP) are discussed in interplay with the East German Sachgebietsorientierte Programmiersysteme (SOPS, subject-area-oriented programming systems) by Robotron. Finally, this part examines how the GDR developed its own relational database management system, DABA 1600, and in doing so reinterpreted "Western" technology.
The concluding chapter summarizes the concepts of relational databases as today's most important data organization technology. It discusses to what extent it is possible to decenter the historiographical narrative about the emergence of database management systems and its consequences for the history of computer science. It closes with the insight that Eastern and Western media of cooperation are remarkably similar in form and function; both are rooted in the deep genealogies of organizational and knowledge-forming data practices.
In addition to this media-studies work, the dissertation includes an artistic part, which is documented: through a series of vlogs, the fictional character "Data Proxy" explores current data ecologies.
Over the past decades, the growing demand for security in the digital world, e.g., the Internet, has led to numerous groundbreaking research topics in the field of cryptography. This thesis focuses on the design and analysis of cryptographic primitives and schemes to be used for the authentication of data and communication endpoints, i.e., users. It is structured into three parts. In the first part, we present the first freely scalable multi-block-length block-cipher-based compression function (Counter-bDM). The presented design is accompanied by a thorough security analysis regarding its preimage and collision security. The second and major part is devoted to password hashing. It is motivated by the large number of passwords leaked in recent years and by our discovery of side-channel attacks on scrypt, the first modern password scrambler that allowed the amount of memory required to compute a password hash to be parameterized. After summarizing which properties we expect from a modern password scrambler, we (1) describe a cache-timing attack on scrypt based on its password-dependent memory-access pattern and (2) outline an additional attack vector, garbage-collector attacks, that exploits optimizations which may fail to overwrite the internally used memory. Based on our observations, we introduce Catena, the first memory-demanding password-scrambling framework that allows a password-independent memory-access pattern for resistance to the aforementioned attacks. Catena was submitted to the Password Hashing Competition (PHC) and, after two years of rigorous analysis, ended up as a finalist, gaining special recognition for its agile framework approach and side-channel resistance. We provide six instances of Catena suitable for a variety of applications.
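The tunable memory requirement of scrypt mentioned above can be sketched with Python's standard `hashlib.scrypt`. The parameter values and the helper names are illustrative assumptions, not recommendations from the thesis; the memory estimate of roughly 128 * N * r bytes follows the scrypt specification (RFC 7914). The sketch does not model Catena or the attacks described.

```python
import hashlib
import os

def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's main working array: 128 * N * r bytes."""
    return 128 * n * r

def hash_password(password: str, salt: bytes,
                  n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    """Derive a 32-byte hash; n (cost) and r (block size) set the memory use.

    hash_password is a hypothetical helper name for this sketch."""
    return hashlib.scrypt(password.encode("utf-8"),
                          salt=salt, n=n, r=r, p=p, dklen=32)

# A small n keeps the example fast; it is far too low for real use.
salt = os.urandom(16)
digest = hash_password("correct horse battery staple", salt, n=2**10)
default_memory = scrypt_memory_bytes(2**14, 8)  # 16 MiB with n=2**14, r=8
```

Doubling `n` doubles the memory an attacker needs per password guess, which is the property that made memory-demanding password scramblers attractive in the first place.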
We close the second part of this thesis with an overview of modern password scramblers regarding their functional, security, and general properties, supported by a brief analysis of their resistance to garbage-collector attacks. The third part of this thesis is dedicated to the integrity (authenticity of data) of nonce-based authenticated encryption (NAE) schemes. We introduce the so-called j-IV-Collision Attack, which allows us to obtain an upper bound for an adversary that is provided with a first successful forgery and tries to efficiently compute j additional forgeries for a particular NAE scheme (in short: reforgeability). Additionally, we introduce the corresponding security notion j-INT-CTXT and provide a comparative analysis (regarding j-INT-CTXT security) of the third-round submissions to the CAESAR competition and the four classical and widely used NAE schemes CWC, CCM, EAX, and GCM.
The planning process in civil engineering is highly complex and not manageable in its entirety.
The state of the art decomposes complex tasks into smaller, manageable sub-tasks. Because the sub-tasks are closely interrelated, it is essential to couple them. From a software engineering point of view, however, this is quite challenging because of the numerous incompatible software applications on the market. This study pursues two main objectives. The first is the generic formulation of coupling strategies to support engineers in selecting and implementing adequate coupling strategies. This has been achieved using a coupling pattern language combined with a four-layered metamodel architecture, whose applicability has been demonstrated on a real coupling scenario. The second is the quality assessment of coupled software, which has been developed based on the evaluated schema mapping. The approach is described using mathematical expressions derived from set theory and graph theory, taking the various mapping patterns into account. Moreover, the coupling quality has been evaluated within the formalization process by considering the uncertainties that arise during mapping, resulting in global quality values that users can consult to assess the exchange. Finally, the applicability of the proposed approach has been shown using an engineering case study.