Towards Events Annotated Corpus of Polish
Michał Marcińczuk, Marcin Oleksy, Jan Kocoń, Tomasz Bernaś and Michał Wolski
{michal.marcinczuk, marcin.oleksy, jan.kocon, tomasz.bernas, michal.wolski}@pwr.edu.pl
Institute of Informatics, Wrocław University of Technology
Wybrzeże Wyspiańskiego 27, Wrocław, Poland
April 10, 2015
Work financed as part of the investment in the CLARIN-PL research infrastructure funded by the Polish Ministry of Science and Higher Education.
Introduction

Event recognition
- Part of Natural Language Engineering.
- A major task in the Information Extraction (IE) field.
- The goal is to identify actions and some states described in text:
  - textual evidence of an event,
  - event arguments (who? when? where? ...),
  - event attributes (specific/generic, true/false, past/present, ...).
- A structured representation of events supports data mining.
M. Marcińczuk, et al. April 10, 2015 2 / 15
Example (1/2)
Text:
"Two Russians and a Frenchman left the Mir and endured a rough landing on the snow-covered plains of Central Asia on Thursday. (...) The two Russians arrived on the Mir last August (...). Solovyov celebrated his 50th birthday during his six-month space voyage."
Source: http://www.themoscowtimes.com/
What to annotate:
- temporal expressions,
- events,
- signals,
- links.
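One way to picture this kind of annotation is as stand-off spans over the raw text. The Python sketch below marks three event mentions and one temporal expression in the first sentence of the example and links "landing" to "Thursday". The `Span` class, the `annotate` helper and the label names are hypothetical illustrations, not the annotation format used in the project; `IS_INCLUDED` is one of the TimeML TLINK relation types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A stand-off annotation: a labelled character range in the text."""
    label: str  # hypothetical label names, e.g. "EVENT" or "TIMEX"
    start: int  # offset of the first character
    end: int    # offset one past the last character

    def text(self, doc: str) -> str:
        return doc[self.start:self.end]


def annotate(doc: str, label: str, phrase: str) -> Span:
    """Create a span for the first occurrence of `phrase` in `doc`."""
    start = doc.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in text")
    return Span(label, start, start + len(phrase))


doc = ("Two Russians and a Frenchman left the Mir and endured a rough "
       "landing on the snow-covered plains of Central Asia on Thursday.")

spans = [
    annotate(doc, "EVENT", "left"),
    annotate(doc, "EVENT", "endured"),
    annotate(doc, "EVENT", "landing"),
    annotate(doc, "TIMEX", "Thursday"),
]

# A temporal link: the landing took place within Thursday.
tlink = ("IS_INCLUDED", spans[2], spans[3])

for span in spans:
    print(span.label, span.start, span.end, span.text(doc))
```

Keeping annotations outside the text (as offsets) lets several layers, such as events, time expressions and links, coexist over the same document without nested markup.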
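Example (2/2)
Rob O Neil - weteran NAVY Seals, który zastrzelił w maju 2011 roku Osamę bin Ladena, po 16 latach służby odszedł z jednostki i ujawnił swoją tożsamość. Nazwisko mężczyzny wyszło na jaw po tym, jak amerykańska stacja informacyjna Fox News poinformowała, że żołnierz udzieli w niej wywiadu i opowie o całej akcji wymierzonej w szefa Al-Kaidy. Jak mówi jego ojciec, nie boją się zemsty ze strony Państwa Islamskiego, ani innych organizacji terrorystycznych.
English translation: Rob O Neil, a NAVY Seals veteran who shot Osama bin Laden in May 2011, left the unit after 16 years of service and revealed his identity. The man's name came to light after the American news channel Fox News reported that the soldier would give it an interview and talk about the whole operation against the head of Al-Qaeda. According to his father, they are not afraid of revenge from the Islamic State or other terrorist organizations.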
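Example (2/2): timeline
Figure: timeline of the events in the example text (Polish forms with English glosses).
Temporal expressions: maju 2011 roku (May 2011), 16 lat (16 years).
Events: zastrzelił (shot), służby (service), odszedł (left), ujawnił (revealed), poinformowała (reported), wyszło (na jaw) (came to light), mówi (says), udzieli (will give [an interview]), opowie (will tell), akcji (operation), (nie) boją się ((are not) afraid), zemsty (revenge).
Participants: Rob O Neil, Fox News, ojciec (father).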
Events in TimeML

What to annotate?
- Tensed verbs: A fresh flow of lava, gas and debris erupted there Saturday.
- Untensed verbs: Prime Minister Benjamin Netanyahu called the prime minister of the Netherlands to thank him for thousands of gas masks (...).
- Nominalizations: Israel will ask the US to delay a military strike against Iraq until the Jewish state is fully prepared for a possible Iraqi attack.
- Adjectives: A Philippine volcano, dormant for six centuries, began exploding with searing gases, thick ash and deadly debris.
- Prepositional phrases: All 75 people on board the Aeroflot Airbus died.
- Predicative clauses: "There is no reason why we would not be prepared," Mordechai told the Yediot Ahronot daily.
Classes
- REPORTING: say, report, announce, ...
- PERCEPTION: see, hear, watch, feel, ...
- ASPECTUAL: begin, start, finish, stop, continue, ...
- I_ACTION: attempt, try, promise, offer, regret, ...
- I_STATE: believe, want, wish, ...
- STATE: be on board, kidnapped, recovering, love, ...
- OCCURRENCE: die, crash, build, merge, sell, take advantage of, ...
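As a rough illustration of how the class inventory can be used, the following Python sketch is a hypothetical lexicon-lookup baseline built only from the single-word example triggers listed above. It is not a method described in the source: in real annotation the class depends on context and word sense, so a lookup like this can at best propose a default class for a trigger lemma.

```python
from typing import Dict, Optional, Set

# Hypothetical trigger lexicon: single-word lemmas taken from the class list.
# Multi-word triggers ("be on board", "take advantage of") and inflected
# forms ("kidnapped", "recovering") are left out of this sketch.
CLASS_TRIGGERS: Dict[str, Set[str]] = {
    "REPORTING": {"say", "report", "announce"},
    "PERCEPTION": {"see", "hear", "watch", "feel"},
    "ASPECTUAL": {"begin", "start", "finish", "stop", "continue"},
    "I_ACTION": {"attempt", "try", "promise", "offer", "regret"},
    "I_STATE": {"believe", "want", "wish"},
    "STATE": {"love"},
    "OCCURRENCE": {"die", "crash", "build", "merge", "sell"},
}

# Reverse index: lemma -> class (the sets above are disjoint).
LEMMA_TO_CLASS: Dict[str, str] = {
    lemma: event_class
    for event_class, lemmas in CLASS_TRIGGERS.items()
    for lemma in lemmas
}


def classify(lemma: str) -> Optional[str]:
    """Return the default class for a trigger lemma, or None if unknown."""
    return LEMMA_TO_CLASS.get(lemma.lower())


for word in ["announce", "Begin", "regret", "table"]:
    print(word, classify(word))
```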
Event annotation

Textual mentions (1/2)
Step 1: annotation of textual mentions of events.
Figure: categories of event mentions. An event Y may be an Action (dynamic), a State (static) or a Light_predicate (auxiliary verbs). When Y refers to another event X, it is Reporting, Perception or Aspectual if we know whether X occurred, and I_Action or I_State if we do not know whether X occurred.
Textual mentions (2/2)
Table 1: Top 5 mentions (orthographic forms) for each category, with occurrence counts.

Action            State             Reporting         Perception
pracy 53          ma 93             powiedział 41     widać 18
spotkania 43      mają 47           mówi 26           zobacz 18
spotkanie 40      mieć 23           stwierdził 13     zobaczyć 12
zginęło 35        miał 23           mówił 13          widzę 7
odbędzie 35       oznacza 21        informuje 9       posłuchać 5

I_Action          I_State           Light_predicate   Aspectual
zapowiedział 13   można 163         doszło 5          zakończył 13
pozwala 10        może 120          ma 5              zaczęła 12
zgodę 9           ma 45             ulec 5            zaczyna 10
wymaga 9          trzeba 42         dokonał 4         rozpoczął 9
proszę 8          należy 39         prowadzić 3       końcowa 8
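A per-category ranking like Table 1 can be produced by counting (form, category) pairs over all annotated mentions. The Python sketch below assumes the mentions have already been exported as such pairs; `top_forms` is a hypothetical helper and the toy input is made up for illustration, not taken from KPWr. It also shows that one form (here "ma") can be counted under several categories, as happens in Table 1.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_forms(
    mentions: Iterable[Tuple[str, str]], k: int = 5
) -> Dict[str, List[Tuple[str, int]]]:
    """Rank orthographic forms by frequency within each event category.

    `mentions` is an iterable of (orthographic_form, category) pairs.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for form, category in mentions:
        counts[category][form] += 1
    return {category: counter.most_common(k) for category, counter in counts.items()}


# Toy input, made up for illustration (not corpus data).
mentions = [
    ("powiedział", "Reporting"),
    ("mówi", "Reporting"),
    ("powiedział", "Reporting"),
    ("ma", "State"),
    ("ma", "I_State"),
    ("ma", "Light_predicate"),
]
print(top_forms(mentions, k=1))
```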
What has been done?

Event mentions in KPWr
Documents in KPWr: 1634
Documents annotated: 558
Annotations (unique): 9557
Annotations (total): 24023
Figure: number of annotations per category (Action, Aspectual, I_Action, I_State, Light_predicate, Perception, Reporting, State).
Annotation agreement
Positive specific agreement (PSA) between two annotators (A and B) for 100 documents from KPWr.

Category              A and B   A only   B only   PSA
Events (spans only)   3184      393      664      85.76%
Events                2561      1016     1287     68.98%
Action                2085      766      418      77.89%
Aspectual             46        4        10       86.79%
Perception            20        2        37       50.63%
Reporting             39        29       28       57.78%
I_Action              23        19       21       53.49%
I_State               115       61       70       63.71%
State                 213       92       634      36.98%
Light_predicate       20        41       20       39.60%
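The PSA values in the table are consistent with the standard definition PSA = 2·both / (2·both + A only + B only), which equals the F1 score obtained when either annotator is treated as the reference. The Python sketch below recomputes every row from its counts; the function name is our own. Because the formula is symmetric in the two "only" columns, the result does not depend on which of them belongs to A and which to B.

```python
from typing import Dict, Tuple


def positive_specific_agreement(both: int, only_a: int, only_b: int) -> float:
    """PSA = 2*both / (2*both + only_a + only_b); 0.0 if nothing was annotated."""
    denominator = 2 * both + only_a + only_b
    return 2 * both / denominator if denominator else 0.0


# Counts and reported PSA (%) copied from the agreement table.
TABLE: Dict[str, Tuple[int, int, int, float]] = {
    "Events (spans only)": (3184, 393, 664, 85.76),
    "Events": (2561, 1016, 1287, 68.98),
    "Action": (2085, 766, 418, 77.89),
    "Aspectual": (46, 4, 10, 86.79),
    "Perception": (20, 2, 37, 50.63),
    "Reporting": (39, 29, 28, 57.78),
    "I_Action": (23, 19, 21, 53.49),
    "I_State": (115, 61, 70, 63.71),
    "State": (213, 92, 634, 36.98),
    "Light_predicate": (20, 41, 20, 39.60),
}

for name, (both, only_a, only_b, reported) in TABLE.items():
    psa = 100 * positive_specific_agreement(both, only_a, only_b)
    print(f"{name:20} {psa:6.2f}%  (reported: {reported:.2f}%)")
```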
What is to be done?

Arguments
Step 2: linking event mentions with generic arguments:
- agency: who performed the action,
- temporal: when the action was performed, and how long the action lasted or the state held,
- spatial: where the action was performed.
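A minimal data structure for Step 2 could attach the three generic argument types to an event mention. The Python sketch below is hypothetical (the class names and the `mention` helper are ours, not the project's format) and uses the "landing" event from the English example: the travellers as agency, "Thursday" as the temporal argument and the plains of Central Asia as the spatial argument, under our reading of that sentence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Mention:
    """A span of text identified by its character offsets."""
    text: str
    start: int
    end: int


def mention(doc: str, phrase: str) -> Mention:
    """Hypothetical helper: locate the first occurrence of `phrase`."""
    start = doc.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in text")
    return Mention(phrase, start, start + len(phrase))


@dataclass
class EventWithArguments:
    event: Mention
    agency: List[Mention] = field(default_factory=list)    # who performed the action
    temporal: List[Mention] = field(default_factory=list)  # when / for how long
    spatial: List[Mention] = field(default_factory=list)   # where


doc = ("Two Russians and a Frenchman left the Mir and endured a rough "
       "landing on the snow-covered plains of Central Asia on Thursday.")

landing = EventWithArguments(
    event=mention(doc, "landing"),
    agency=[mention(doc, "Two Russians and a Frenchman")],
    temporal=[mention(doc, "Thursday")],
    spatial=[mention(doc, "the snow-covered plains of Central Asia")],
)
print(landing)
```

Lists are used for each argument type because an event may have several participants or several time and place expressions.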
Attributes
Step 3: describing event attributes:
- generality: specific or general,
- tense: past, present or future,
- polarity: affirmative or negative.
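Because each attribute in Step 3 takes a value from a small closed set, enumerations are a natural way to store and validate them. The Python sketch below is a hypothetical encoding (names are ours) and tags "nie boją się" ("they are not afraid") from the Polish example as specific, present and negative, which is our own reading of that sentence.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Generality(Enum):
    SPECIFIC = "specific"
    GENERAL = "general"


class Tense(Enum):
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"


class Polarity(Enum):
    AFFIRMATIVE = "affirmative"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class EventAttributes:
    generality: Generality
    tense: Tense
    polarity: Polarity


def parse_attributes(raw: Dict[str, str]) -> EventAttributes:
    """Convert raw string values to enums; an unknown value raises ValueError."""
    return EventAttributes(
        generality=Generality(raw["generality"]),
        tense=Tense(raw["tense"]),
        polarity=Polarity(raw["polarity"]),
    )


# "nie boją się" ("they are not afraid") from the Polish example text.
not_afraid = parse_attributes(
    {"generality": "specific", "tense": "present", "polarity": "negative"}
)
print(not_afraid)
```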
Event linking
Step 4: linking events:
- subordination link: relations between events (modal, factive, counter-factive, evidential, negative evidential, conditional),
- aspectual link: relation between an aspectual event and its argument event.
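The two link types can be sketched as typed relations between event mentions. In the hypothetical Python encoding below, the subordination types are the six listed on the slide (named as TimeML SLINK relation types), while the aspectual types (INITIATES, CULMINATES, TERMINATES, CONTINUES, REINITIATES) come from TimeML and are not listed on the slide. The two example links reuse sentences from the TimeML examples; the MODAL label for "ask ... to delay" is our own reading. Events are identified by their words only for brevity; real annotation would use mention IDs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


class SlinkType(Enum):
    """Subordination link types listed on the slide (TimeML SLINK names)."""
    MODAL = "MODAL"
    FACTIVE = "FACTIVE"
    COUNTER_FACTIVE = "COUNTER_FACTIVE"
    EVIDENTIAL = "EVIDENTIAL"
    NEG_EVIDENTIAL = "NEG_EVIDENTIAL"
    CONDITIONAL = "CONDITIONAL"


class AlinkType(Enum):
    """Aspectual link types as defined in TimeML (not listed on the slide)."""
    INITIATES = "INITIATES"
    CULMINATES = "CULMINATES"
    TERMINATES = "TERMINATES"
    CONTINUES = "CONTINUES"
    REINITIATES = "REINITIATES"


@dataclass(frozen=True)
class EventLink:
    source: str  # governing event, e.g. an aspectual or intensional verb
    target: str  # the event it takes as its argument
    rel: Union[SlinkType, AlinkType]


def subordination_links(links: List[EventLink]) -> List[EventLink]:
    """Keep only subordination (SLINK-style) links."""
    return [link for link in links if isinstance(link.rel, SlinkType)]


links = [
    # "A Philippine volcano (...) began exploding (...)": aspectual link
    EventLink("began", "exploding", AlinkType.INITIATES),
    # "Israel will ask the US to delay a military strike (...)": subordination link
    EventLink("ask", "delay", SlinkType.MODAL),
]
print(subordination_links(links))
```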
The End
Thank you for your attention.