TY  - JOUR
A1  - Völske, Michael
A1  - Gollub, Tim
A1  - Hagen, Matthias
A1  - Stein, Benno
T1  - A keyquery-based classification system for CORE
JF  - D-Lib Magazine
N2  - We apply keyquery-based taxonomy composition to compute a classification system for the CORE dataset, a shared crawl of about 850,000 scientific papers. Keyquery-based taxonomy composition can be understood as a two-phase hierarchical document clustering technique that utilizes search queries as cluster labels: In a first phase, the document collection is indexed by a reference search engine, and the documents are tagged with the search queries for which they are relevant, their so-called keyqueries. In a second phase, a hierarchical clustering is formed from the keyqueries within an iterative process. We use the explicit topic model ESA as document retrieval model in order to index the CORE dataset in the reference search engine. Under the ESA retrieval model, documents are represented as vectors of similarities to Wikipedia articles, a methodology proven to be advantageous for text categorization tasks. Our paper presents the generated taxonomy and reports on quantitative properties such as document coverage and processing requirements.
KW  - Massendaten
KW  - Taxonomie
KW  - Dynamic Taxonomy Composition
KW  - Keyquery
KW  - Classification Systems
KW  - Reverted Index
KW  - Big Data Problem
Y1  - 2014
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:gbv:wim2-20170426-31662
ER  -

TY  - JOUR
A1  - Vakkari, Pertti
A1  - Völske, Michael
A1  - Potthast, Martin
A1  - Hagen, Matthias
A1  - Stein, Benno
T1  - Predicting essay quality from search and writing behavior
JF  - Journal of the Association for Information Science and Technology
N2  - Few studies have investigated how search behavior affects complex writing tasks. We analyze a dataset of 150 long essays whose authors searched the ClueWeb09 corpus for source material, while all querying, clicking, and writing activity was meticulously recorded. We model the effect of search and writing behavior on essay quality using path analysis. Since the boil-down and build-up writing strategies identified in previous research have been found to affect search behavior, we model each writing strategy separately. Our analysis shows that the search process contributes significantly to essay quality through both direct and mediated effects, while the author's writing strategy moderates this relationship. Our models explain 25–35% of the variation in essay quality through rather simple search and writing process characteristics alone, a fact that has implications for how search engines could personalize result pages for writing tasks. Authors' writing strategies and associated searching patterns differ, producing differences in essay quality. In a nutshell: essay quality improves if search and writing strategies harmonize. Build-up writers benefit from focused, in-depth querying, while boil-down writers fare better with a broader and shallower querying strategy.
KW  - Information Retrieval
KW  - Textproduktion
KW  - Suchverfahren
KW  - Aufsatz
KW  - Suchverhalten
KW  - Pfadanalyse
KW  - Suchmaschine
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:gbv:wim2-20210804-44692
UR  - https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24451
VL  - 2021
IS  - volume 72, issue 7
SP  - 839
EP  - 852
PB  - Wiley
CY  - Hoboken, NJ
ER  -

TY  - JOUR
A1  - Wiegmann, Matti
A1  - Kersten, Jens
A1  - Senaratne, Hansi
A1  - Potthast, Martin
A1  - Klan, Friederike
A1  - Stein, Benno
T1  - Opportunities and risks of disaster data from social media: a systematic review of incident information
JF  - Natural Hazards and Earth System Sciences
N2  - Compiling and disseminating information about incidents and disasters are key to disaster management and relief. But due to inherent limitations of the acquisition process, the required information is often incomplete or missing altogether. To fill these gaps, citizen observations spread through social media are widely considered to be a promising source of relevant information, and many studies propose new methods to tap this resource. Yet, the overarching question of whether and under which circumstances social media can supply relevant information (both qualitatively and quantitatively) remains unanswered. To shed some light on this question, we review 37 disaster and incident databases covering 27 incident types, compile a unified overview of the contained data and their collection processes, and identify the missing or incomplete information. The resulting data collection reveals six major use cases for social media analysis in incident data collection: (1) impact assessment and verification of model predictions, (2) narrative generation, (3) recruiting citizen volunteers, (4) supporting weakly institutionalized areas, (5) narrowing surveillance areas, and (6) reporting triggers for periodical surveillance. Furthermore, we discuss the benefits and shortcomings of using social media data for closing information gaps related to incidents and disasters.
KW  - Katastrophe
KW  - Social Media
KW  - Datenbank
KW  - Information
KW  - Katastrophenmanagement
KW  - Soziale Medien
KW  - Datensammlung
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:gbv:wim2-20210804-44634
UR  - https://nhess.copernicus.org/articles/21/1431/2021/nhess-21-1431-2021.html
VL  - 2021
IS  - Volume 21, Issue 5
SP  - 1431
EP  - 1444
PB  - European Geophysical Society
CY  - Katlenburg-Lindau
ER  -