TY  - JOUR
A1  - Frommholz, Ingo
A1  - al-Khateeb, Haider M.
A1  - Potthast, Martin
A1  - Ghasem, Zinnar
A1  - Shukla, Mitul
A1  - Short, Emma
T1  - On Textual Analysis and Machine Learning for Cyberstalking Detection
JF  - Datenbank-Spektrum
N2  - Cyber security has become a major concern for users and businesses alike. Cyberstalking and harassment have been identified as a growing anti-social problem. Besides detecting cyberstalking and harassment, there is a need to gather digital evidence, often by the victim. To this end, we provide an overview of and discuss relevant technological means, in particular coming from text analytics as well as machine learning, that are capable of addressing the above challenges. We present a framework for the detection of text-based cyberstalking and the role and challenges of some core techniques such as author identification, text classification and personalisation. We then discuss PAN, a network and evaluation initiative that focusses on digital text forensics, in particular author identification.
KW  - Text Mining
KW  - Machine Learning
Y1  - 2016
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:gbv:wim2-20170418-31352
SP  - 127
EP  - 135
ER  - 
TY  - JOUR
A1  - Vakkari, Pertti
A1  - Völske, Michael
A1  - Potthast, Martin
A1  - Hagen, Matthias
A1  - Stein, Benno
T1  - Predicting essay quality from search and writing behavior
JF  - Journal of the Association for Information Science and Technology
N2  - Few studies have investigated how search behavior affects complex writing tasks. We analyze a dataset of 150 long essays whose authors searched the ClueWeb09 corpus for source material, while all querying, clicking, and writing activity was meticulously recorded. We model the effect of search and writing behavior on essay quality using path analysis. Since the boil-down and build-up writing strategies identified in previous research have been found to affect search behavior, we model each writing strategy separately. Our analysis shows that the search process contributes significantly to essay quality through both direct and mediated effects, while the author's writing strategy moderates this relationship. Our models explain 25–35% of the variation in essay quality through rather simple search and writing process characteristics alone, a fact that has implications on how search engines could personalize result pages for writing tasks. Authors' writing strategies and associated searching patterns differ, producing differences in essay quality. In a nutshell: essay quality improves if search and writing strategies harmonize—build-up writers benefit from focused, in-depth querying, while boil-down writers fare better with a broader and shallower querying strategy.
KW  - Information Retrieval
KW  - Text Production
KW  - Search Methods
KW  - Essay
KW  - Search Behavior
KW  - Path Analysis
KW  - Search Engine
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:gbv:wim2-20210804-44692
UR  - https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24451
VL  - 72
IS  - 7
SP  - 839
EP  - 852
PB  - Wiley
CY  - Hoboken, NJ
ER  - 
TY  - JOUR
A1  - Wiegmann, Matti
A1  - Kersten, Jens
A1  - Senaratne, Hansi
A1  - Potthast, Martin
A1  - Klan, Friederike
A1  - Stein, Benno
T1  - Opportunities and risks of disaster data from social media: a systematic review of incident information
JF  - Natural Hazards and Earth System Sciences
N2  - Compiling and disseminating information about incidents and disasters are key to disaster management and relief. But due to inherent limitations of the acquisition process, the required information is often incomplete or missing altogether. To fill these gaps, citizen observations spread through social media are widely considered to be a promising source of relevant information, and many studies propose new methods to tap this resource. Yet, the overarching question of whether and under which circumstances social media can supply relevant information (both qualitatively and quantitatively) still remains unanswered. To shed some light on this question, we review 37 disaster and incident databases covering 27 incident types, compile a unified overview of the contained data and their collection processes, and identify the missing or incomplete information. The resulting data collection reveals six major use cases for social media analysis in incident data collection: (1) impact assessment and verification of model predictions, (2) narrative generation, (3) recruiting citizen volunteers, (4) supporting weakly institutionalized areas, (5) narrowing surveillance areas, and (6) reporting triggers for periodical surveillance. Furthermore, we discuss the benefits and shortcomings of using social media data for closing information gaps related to incidents and disasters.
KW  - Disaster
KW  - Social Media
KW  - Database
KW  - Information
KW  - Disaster Management
KW  - Data Collection
Y1  - 2021
U6  - http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:gbv:wim2-20210804-44634
UR  - https://nhess.copernicus.org/articles/21/1431/2021/nhess-21-1431-2021.html
VL  - 21
IS  - 5
SP  - 1431
EP  - 1444
PB  - European Geophysical Society
CY  - Katlenburg-Lindau
ER  -