<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
 <record>
  <leader>04695ntm a22006017i 4500</leader>
  <controlfield tag="001">000260339</controlfield>
  <controlfield tag="003">CZ-PrVSE</controlfield>
  <controlfield tag="005">20220604071223.0</controlfield>
  <controlfield tag="006">m        d</controlfield>
  <controlfield tag="007">cr n||||||||||</controlfield>
  <controlfield tag="008">220604s2022    xr     fsbm   000 0 eng d</controlfield>
  <datafield tag="STA" ind1=" " ind2=" ">
   <subfield code="a">NEZPRACOVANÝ IMPORT</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
   <subfield code="a">ABA006</subfield>
   <subfield code="b">cze</subfield>
   <subfield code="c">ABA006</subfield>
   <subfield code="d">ABA006</subfield>
   <subfield code="e">rda</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
   <subfield code="a">Vajdečka, Peter</subfield>
   <subfield code="%">ISIS:135768</subfield>
   <subfield code="4">dis</subfield>
  </datafield>
  <datafield tag="242" ind1="1" ind2="0">
   <subfield code="a">Abstraktní sumarizace zpráv o ověřování faktů pomocí předem natrénovaného transformeru na extraktivních souhrnech</subfield>
   <subfield code="y">eng</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
   <subfield code="a">Abstractive summarization of fact check reports with pre-trained transformer tuning on extractive summaries /</subfield>
   <subfield code="c">Peter Vajdečka</subfield>
  </datafield>
  <datafield tag="264" ind1=" " ind2="0">
   <subfield code="c">2022</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
   <subfield code="a">?? stran :</subfield>
   <subfield code="3">digital, PDF soubor</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
   <subfield code="a">Vedoucí práce: Vojtěch Svátek</subfield>
  </datafield>
  <datafield tag="502" ind1=" " ind2=" ">
   <subfield code="a">Diplomová práce (Ing.)—Vysoká škola ekonomická v Praze. Fakulta informatiky a statistiky, 2022</subfield>
  </datafield>
  <datafield tag="504" ind1=" " ind2=" ">
   <subfield code="a">Obsahuje bibliografii</subfield>
  </datafield>
  <datafield tag="516" ind1=" " ind2=" ">
   <subfield code="a">Textový (vysokoškolská kvalifikační práce)</subfield>
  </datafield>
  <datafield tag="518" ind1=" " ind2=" ">
   <subfield code="a">Rok obhajoby 2022</subfield>
  </datafield>
  <datafield tag="520" ind1="3" ind2=" ">
   <subfield code="a">Fact checking is an activity aiming to remedy the global problem of disinformation spread. The result of this process, undertaken by numerous initiatives such as demagog.cz or politifact.com, are fact check reports written by human editors. Since the reports are frequently too long for a casual reader, and contain auxiliary parts not directly relevant for judging the claim veracity, automated creation of fact check report summaries is a topical task. The reader could then look at the shorter summary, containing the most salient points of the report, and then decide whether they dig deeper into some parts of the full report or not. In the field of natural language processing, neural network models with transformer architectures achieve state-of-the-art results on many downstream tasks, including text summarization. These models are trained on a massive textual knowledge base, which ensures that just a small quantity of data is required to fine-tune these models – in contrast to large amounts of training data needed when the learning process starts from scratch just for the particular application. We propose a novel procedure for text data reduction for the purpose of fine-tuning a natural language generation model, the Unified Text to Text Transformer (T5), in order to summarize a fact check report. First, the Local Outlier Factor approach is used to generate an extractive summary of the report, using sentence vectorization via the TF-IDF, DOC2VEC and BERT contextual representations. In addition, BERT is fine-tuned specifically for the given task and achieves the best results when compared to the other vector representations. Finally, the T5 Transformer is fine-tuned using these extractive summaries (reports containing fewer sentences than the original ones) to generate the final abstractive summaries. On English texts from politifact.com, the new method outperformed all state-of-the-art methods.</subfield>
  </datafield>
  <datafield tag="520" ind1="8" ind2=" ">
   <subfield code="a">As regards the Czech language, we were, to our knowledge, the first to apply automatic summarization to demagog.cz data. For comparison, the new procedure was also applied to generate short summaries for a known Czech news dataset (SumeCzech); although we only used 10 % of the initial training data for model fine-tuning, we overcame most of the state-of-the-art results.</subfield>
  </datafield>
  <datafield tag="538" ind1=" " ind2=" ">
   <subfield code="a">Způsob přístupu: Internet</subfield>
  </datafield>
  <datafield tag="653" ind1="0" ind2=" ">
   <subfield code="a">znalostní a webové technologie [obor dipl. práce]</subfield>
  </datafield>
  <datafield tag="655" ind1=" " ind2="7">
   <subfield code="a">diplomové práce</subfield>
   <subfield code="7">fd132022</subfield>
   <subfield code="2">czenas</subfield>
  </datafield>
  <datafield tag="655" ind1=" " ind2="9">
   <subfield code="a">master's theses</subfield>
   <subfield code="2">eczenas</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">natural language generation</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">local outlier factor</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">natural language processing</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">neural network</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">transformer architecture</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">BERT</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">TF-IDF</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">DOC2VEC</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">fact-checking</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">summarization</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Svátek, Vojtěch,</subfield>
   <subfield code="d">1967 prosinec 1.-</subfield>
   <subfield code="7">mzk2004217940</subfield>
   <subfield code="4">ths</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Vencovský, Filip</subfield>
   <subfield code="7">mzk2015883325</subfield>
   <subfield code="4">opn</subfield>
  </datafield>
  <datafield tag="710" ind1="2" ind2=" ">
   <subfield code="a">Vysoká škola ekonomická v Praze.</subfield>
   <subfield code="b">Fakulta informatiky a statistiky</subfield>
   <subfield code="7">kn20010709399</subfield>
   <subfield code="4">dgg</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/78669/podrobnosti</subfield>
   <subfield code="y">VŠKP v InSIS</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/78669</subfield>
   <subfield code="y">Hlavní práce</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/78669/posudek/vedouci</subfield>
   <subfield code="y">Hodnocení vedoucího</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/78669/posudek/oponent/74044</subfield>
   <subfield code="y">Oponentura</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/78669/priloha/23915</subfield>
   <subfield code="y">Přiloha k práci</subfield>
  </datafield>
  <datafield tag="999" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/78669/podrobnosti</subfield>
   <subfield code="y">dc:identifier</subfield>
  </datafield>
  <datafield tag="993" ind1=" " ind2=" ">
   <subfield code="x">NEPOSILAT</subfield>
   <subfield code="y">VSKP</subfield>
  </datafield>
  <datafield tag="999" ind1="4" ind2="9">
   <subfield code="a">vse78669</subfield>
   <subfield code="b">220602</subfield>
  </datafield>
  <datafield tag="999" ind1="4" ind2="5">
   <subfield code="x">78669</subfield>
  </datafield>
 </record>
</collection>
