Record: 000260339 (CZ-PrVSE)
Cataloging source: ABA006; language of cataloging: Czech; description conventions: RDA
Author: Vajdečka, Peter (ISIS:135768), dissertant
Title: Abstractive summarization of fact check reports with pre-trained transformer tuning on extractive summaries / Peter Vajdečka
Czech title: Abstraktní sumarizace zpráv o ověřování faktů pomocí předem natrénovaného transformeru na extraktivních souhrnech
Language: English
Year: 2022
Extent: ?? pages : digital, PDF file
Thesis supervisor: Vojtěch Svátek
Thesis note: Master's thesis (Ing.), Vysoká škola ekonomická v Praze, Fakulta informatiky a statistiky, 2022
Bibliography: Includes bibliography
Content type: Text (university qualification thesis)
Year of defense: 2022

Abstract: Fact checking is an activity that aims to remedy the global problem of the spread of disinformation. The results of this process, undertaken by numerous initiatives such as demagog.cz or politifact.com, are fact check reports written by human editors. Since the reports are frequently too long for a casual reader and contain auxiliary parts not directly relevant for judging the veracity of the claim, automated creation of fact check report summaries is a topical task. The reader could then look at the shorter summary, containing the most salient points of the report, and decide whether to dig deeper into some parts of the full report. In the field of natural language processing, neural network models with transformer architectures achieve state-of-the-art results on many downstream tasks, including text summarization. These models are pre-trained on a massive body of text, so only a small quantity of data is required to fine-tune them, in contrast to the large amounts of training data needed when the learning process starts from scratch for a particular application. We propose a novel procedure for text data reduction for the purpose of fine-tuning a natural language generation model, the Unified Text-to-Text Transformer (T5), to summarize a fact check report.
First, the Local Outlier Factor approach is used to generate an extractive summary of the report, with sentences vectorized using TF-IDF, DOC2VEC, and contextual BERT representations. In addition, BERT is fine-tuned specifically for the given task and achieves the best results of the vector representations compared. Finally, the T5 Transformer is fine-tuned on these extractive summaries (reports containing fewer sentences than the original ones) to generate the final abstractive summaries. On English texts from politifact.com, the new method outperformed all state-of-the-art methods. As regards the Czech language, we were, to our knowledge, the first to apply automatic summarization to demagog.cz data. For comparison, the new procedure was also applied to generate short summaries for a known Czech news dataset (SumeCzech); although we used only 10% of the initial training data for model fine-tuning, we surpassed most of the state-of-the-art results.

Mode of access: Internet
Field of study: Knowledge and Web Technologies [field of the master's thesis]
Genre: master's theses (fd132022, czenas; English form: eczenas)
Keywords: natural language generation; local outlier factor; natural language processing; neural network; transformer architecture; BERT; TF-IDF; DOC2VEC; fact-checking; summarization
Thesis supervisor: Svátek, Vojtěch, born 1 December 1967 (mzk2004217940)
Opponent: Vencovský, Filip (mzk2015883325)
Degree-granting institution: Vysoká škola ekonomická v Praze. Fakulta informatiky a statistiky (kn20010709399)

Links:
Thesis record in InSIS: https://insis.vse.cz/zp/78669/podrobnosti
Main thesis: https://insis.vse.cz/zp/78669
Supervisor's evaluation: https://insis.vse.cz/zp/78669/posudek/vedouci
Opponent's review: https://insis.vse.cz/zp/78669/posudek/oponent/74044
Thesis attachment: https://insis.vse.cz/zp/78669/priloha/23915
Identifier (dc:identifier): https://insis.vse.cz/zp/78669/podrobnosti