BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20220812T074334Z
LOCATION:Samarkand Room
DTSTART;TZID=Europe/Stockholm:20220627T114500
DTEND;TZID=Europe/Stockholm:20220627T121500
UID:submissions.pasc-conference.org_PASC22_sess175_pap117@linklings.com
SUMMARY:Toward a Big Data Analysis System for Historical Newspaper Collect
 ions Research
DESCRIPTION:Paper\n\nToward a Big Data Analysis System for Historical News
 paper Collections Research\n\nPuthanveetil Satheesan, Bhavya, Davies, Crai
 g, Zhang...\n\nThe availability and generation of digitized newspaper coll
 ections have provided researchers in several domains with a powerful tool 
 to advance their research. More specifically, digitized historical newspap
 ers give us a magnifying glass into the past. In this paper, we propose a 
 scalable and customizable big data analysis system that enables researcher
 s to study complex questions about our society as depicted in news media f
 or the past few centuries by applying cutting-edge text analysis tools to 
 large historical newspaper collections. We discuss our experience with bui
 lding a preliminary version of such a system, including how we have addres
 sed the following challenges: processing millions of digitized newspaper p
 ages from various publications worldwide, which amount to hundreds of tera
 bytes of data; applying article segmentation and Optical Character Recogni
 tion (OCR) to historical newspapers, which vary between and within publica
 tions over time; retrieving relevant information to answer research questi
 ons from such data collections by applying human-in-the-loop machine learn
 ing; and enabling users to analyze topic evolution and semantic dynamics w
 ith multiple compatible analysis operators. We also present some prelimina
 ry results of using the proposed system to study the social construction o
 f juvenile delinquency in the United States and discuss important remainin
 g challenges to be tackled in the future.\n\nDomain: Computer Science and 
 Applied Mathematics, Humanities and Social Sciences
END:VEVENT
END:VCALENDAR
