<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
 <record>
  <leader>03750ntm a22004937i 4500</leader>
  <controlfield tag="001">000727719</controlfield>
  <controlfield tag="003">CZ-PrVSE</controlfield>
  <controlfield tag="005">20260201122909.0</controlfield>
  <controlfield tag="006">m        d</controlfield>
  <controlfield tag="007">cr n||||||||||</controlfield>
  <controlfield tag="008">260201s2025    xr     fsbm   000 0 eng d</controlfield>
  <datafield tag="STA" ind1=" " ind2=" ">
   <subfield code="a">NEZPRACOVANÝ IMPORT</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
   <subfield code="a">ABA006</subfield>
   <subfield code="b">cze</subfield>
   <subfield code="c">ABA006</subfield>
   <subfield code="d">ABA006</subfield>
   <subfield code="e">rda</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
   <subfield code="a">Jadviščoková, Šárka</subfield>
   <subfield code="%">ISIS:139382</subfield>
   <subfield code="4">dis</subfield>
  </datafield>
  <datafield tag="242" ind1="1" ind2="0">
   <subfield code="a">Možnosti testování moderního stacku pro kvalitu dat a strojové učení</subfield>
   <subfield code="y">eng</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
   <subfield code="a">Testing possibilities of a modern stack for data quality and machine learning /</subfield>
   <subfield code="c">Šárka Jadviščoková</subfield>
  </datafield>
  <datafield tag="264" ind1=" " ind2="0">
   <subfield code="c">2025</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
   <subfield code="a">?? stran :</subfield>
   <subfield code="3">digital, PDF soubor</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
   <subfield code="a">Vedoucí práce: Petr Máša</subfield>
  </datafield>
  <datafield tag="502" ind1=" " ind2=" ">
   <subfield code="a">Diplomová práce (Ing.)—Vysoká škola ekonomická v Praze. Fakulta informatiky a statistiky, 2026</subfield>
  </datafield>
  <datafield tag="504" ind1=" " ind2=" ">
   <subfield code="a">Obsahuje bibliografii</subfield>
  </datafield>
  <datafield tag="516" ind1=" " ind2=" ">
   <subfield code="a">Textový (vysokoškolská kvalifikační práce)</subfield>
  </datafield>
  <datafield tag="518" ind1=" " ind2=" ">
   <subfield code="a">Rok obhajoby 2026</subfield>
  </datafield>
  <datafield tag="520" ind1="3" ind2=" ">
   <subfield code="a">This thesis investigates the design, implementation, and evaluation of an automated data processing architecture in which data quality validation is tightly integrated into a machine learning pipeline. The work addresses a challenge from current data-driven systems: machine learning models often rely on datasets that may contain inconsistencies or violations of domain rules, because traditional pipelines enforce systematic quality controls before model training is not automatically implemented. A modular framework was developed using Docker for reproducible environments to address this gap and software such as Apache Airflow for orchestration, Great Expectations for data validation, and PostgreSQL for storage and tracking of experimental outputs were used. The system incorporates adaptive logic capable of detecting data quality issues and triggering remediation steps automatically. During the research, set of corruption scenarios was created – including missing values, outliers, categorical errors, and conjoined multi-type corruptions so the quantification of how different data defects influence both data quality metrics and model performance was possible. Random Forest and Logistic Regression models were evaluated on corrupted and remediated datasets to measure degradation and recovery effects. The results demonstrate that while machine learning performance is relatively robust to moderate corruption, data validation metrics reveal substantial structural distortions that models fail to expose. The integrated approach provides a more reliable and transparent workflow, ensuring that machine learning is trained only on validated. The thesis concludes by outlining the strengths, limitations, and practical applicability of the proposed architecture.</subfield>
  </datafield>
  <datafield tag="538" ind1=" " ind2=" ">
   <subfield code="a">Způsob přístupu: Internet</subfield>
  </datafield>
  <datafield tag="653" ind1="0" ind2=" ">
   <subfield code="a">data a analytika pro business [obor dipl. práce]</subfield>
  </datafield>
  <datafield tag="655" ind1=" " ind2="7">
   <subfield code="a">diplomové práce</subfield>
   <subfield code="7">fd132022</subfield>
   <subfield code="2">czenas</subfield>
  </datafield>
  <datafield tag="655" ind1=" " ind2="9">
   <subfield code="a">master's theses</subfield>
   <subfield code="2">eczenas</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">Machine Learning</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">Data Quality</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">Great Expectations</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Máša, Petr</subfield>
   <subfield code="%">ISIS:17194</subfield>
   <subfield code="4">ths</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Maryška, Miloš</subfield>
   <subfield code="7">ola2012698940</subfield>
   <subfield code="4">opn</subfield>
  </datafield>
  <datafield tag="710" ind1="2" ind2=" ">
   <subfield code="a">Vysoká škola ekonomická v Praze.</subfield>
   <subfield code="b">Fakulta informatiky a statistiky</subfield>
   <subfield code="7">kn20010709399</subfield>
   <subfield code="4">dgg</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/88324/podrobnosti</subfield>
   <subfield code="y">VŠKP v InSIS</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/88324</subfield>
   <subfield code="y">Hlavní práce</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/88324/posudek/vedouci</subfield>
   <subfield code="y">Hodnocení vedoucího</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/88324/posudek/oponent/88922</subfield>
   <subfield code="y">Oponentura</subfield>
  </datafield>
  <datafield tag="999" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/88324/podrobnosti</subfield>
   <subfield code="y">dc:identifier</subfield>
  </datafield>
  <datafield tag="993" ind1=" " ind2=" ">
   <subfield code="x">NEPOSILAT</subfield>
   <subfield code="y">VSKP</subfield>
  </datafield>
  <datafield tag="999" ind1="4" ind2="9">
   <subfield code="a">vse88324</subfield>
   <subfield code="b">260123</subfield>
  </datafield>
  <datafield tag="999" ind1="4" ind2="5">
   <subfield code="x">88324</subfield>
  </datafield>
 </record>
</collection>
