<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
 <record>
  <leader>06593ntm a22006617i 4500</leader>
  <controlfield tag="001">000703342</controlfield>
  <controlfield tag="003">CZ-PrVSE</controlfield>
  <controlfield tag="005">20240224103936.0</controlfield>
  <controlfield tag="006">m        d</controlfield>
  <controlfield tag="007">cr n||||||||||</controlfield>
  <controlfield tag="008">240224s2023    xr     fsbm   000 0 eng d</controlfield>
  <datafield tag="STA" ind1=" " ind2=" ">
   <subfield code="a">NEZPRACOVANÝ IMPORT</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
   <subfield code="a">ABA006</subfield>
   <subfield code="b">cze</subfield>
   <subfield code="c">ABA006</subfield>
   <subfield code="d">ABA006</subfield>
   <subfield code="e">rda</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
   <subfield code="a">Štěpánek, Lubomír</subfield>
   <subfield code="%">ISIS:120809</subfield>
   <subfield code="4">dis</subfield>
  </datafield>
  <datafield tag="242" ind1="1" ind2="0">
   <subfield code="a">Machine learning and other robust approaches at the service of survival analysis :</subfield>
   <subfield code="b">Alternatives to selected methods in statistical inference and prediction</subfield>
   <subfield code="y">eng</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
   <subfield code="a">Machine learning and other robust approaches at the service of survival analysis :</subfield>
   <subfield code="b">Alternatives to selected methods in statistical inference and prediction</subfield>
   <subfield code="c">Lubomír Štěpánek</subfield>
  </datafield>
  <datafield tag="264" ind1=" " ind2="0">
   <subfield code="c">2023</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
   <subfield code="a">?? stran :</subfield>
   <subfield code="3">digital, PDF soubor</subfield>
  </datafield>
  <datafield tag="500" ind1=" " ind2=" ">
   <subfield code="a">Vedoucí práce: Luboš Marek</subfield>
  </datafield>
  <datafield tag="502" ind1=" " ind2=" ">
   <subfield code="a">Disertační práce (Ph.D.)—Vysoká škola ekonomická v Praze. Fakulta informatiky a statistiky, 2024</subfield>
  </datafield>
  <datafield tag="504" ind1=" " ind2=" ">
   <subfield code="a">Obsahuje bibliografii</subfield>
  </datafield>
  <datafield tag="516" ind1=" " ind2=" ">
   <subfield code="a">Textový (vysokoškolská kvalifikační práce)</subfield>
  </datafield>
  <datafield tag="518" ind1=" " ind2=" ">
   <subfield code="a">Rok obhajoby 2024</subfield>
  </datafield>
  <datafield tag="520" ind1="3" ind2=" ">
   <subfield code="a">Survival analysis is a popular field of statistics that deals with many tasks in both statistical inference and prediction. While the comparison of survival curves, one of the typical inferential tasks, is performed using the log-rank test and other approaches, the time to an event of interest for a given individual is commonly predicted using the Cox proportional hazards model or similar models. However, all the methods in the survival toolbox are limited by relatively strict statistical assumptions, whose violation may bias the results when the techniques are applied to real data. In this work, we address the issue that the commonly used methods, in both statistical inference and prediction, are limited by their assumptions, and we improve them using robust approaches, particularly machine-learning algorithms and the delta method. In general, machine-learning approaches do not require such strict assumptions to be met, which is why they may yield more robust alternatives to the traditional techniques. While the log-rank test or the Cox proportional hazards model (or others) might be used within statistical inference in survival analysis for comparing two or more groups represented by their survival curves, we instead investigate tree-based methods for the same task and derive some new statistical properties of this approach. Intuitively, a random forest containing a large proportion of trees with sufficient complexity, adjusted by tree pruning, can classify individuals from various groups into two or more classes depicted by their survival curves, which tends to reject the null hypothesis of no statistical difference between the curves. Thus, the proportion of trees with sufficient complexity classifying into two or more groups, depicted by their survival curves, is very close to the p-value estimate, as an analogy of the classical Wald's t-test output of Cox's regression. We denote this analogy of the p-value as the phi-value.</subfield>
  </datafield>
  <datafield tag="520" ind1="8" ind2=" ">
   <subfield code="a">Furthermore, the level of pruning of the decision trees with which the random forest model is built can reduce the tree complexity and, therefore, modify the frequency of false rejections of the null hypothesis output by the random forest alternative. Also, survival curves can be approximately compared using confidence intervals around the Kaplan-Meier estimator of the survival probability for different groups. Therefore, using the delta method, we adjust the formula for the variance of the Kaplan-Meier estimator for particular cases when information about an event of interest is uncertain, e.g., not appropriately updated in time. Regarding prediction in survival analysis, the Cox model is limited by relatively strict statistical assumptions. We therefore propose decomposing the time-to-event variable into &quot;time&quot; and &quot;event&quot; components and using the latter as a target variable for various machine-learning classification algorithms, which are almost assumption-free, unlike the Cox model. While the time component is continuous and is used as one of the covariates, i.e., input variables for various classification algorithms such as logistic regression, naïve Bayes classifiers, decision trees, random forests, and artificial neural networks, the event component is binary and thus may be modeled using these classification algorithms. We further present simulations demonstrating how the random-forest-based method's rate of false null hypothesis rejection decreases with increasing tree pruning level. Finally, the adjusted Kaplan-Meier estimation and the time-to-event decomposition are applied to predict a decrease or non-decrease of IgG and IgM blood antibodies against COVID-19 (SARS-CoV-2), respectively, below a laboratory cut-off, for a given individual at a given time point. Based on the analytical derivations, simulations, and real-world data applications, the introduced methods seem to enrich the family of all alternatives for survival curves' comparison and time-to-event prediction, and, even</subfield>
  </datafield>
  <datafield tag="538" ind1=" " ind2=" ">
   <subfield code="a">Způsob přístupu: Internet</subfield>
  </datafield>
  <datafield tag="653" ind1="0" ind2=" ">
   <subfield code="a">statistika [obor disert. práce]</subfield>
  </datafield>
  <datafield tag="655" ind1=" " ind2="7">
   <subfield code="a">disertace</subfield>
   <subfield code="7">fd132024</subfield>
   <subfield code="2">czenas</subfield>
  </datafield>
  <datafield tag="655" ind1=" " ind2="9">
   <subfield code="a">dissertations</subfield>
   <subfield code="2">eczenas</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">machine learning</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">decision trees</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">time-to-event prediction</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">classification algorithms</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">COVID-19</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">antibody blood level decrease</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">robust methods</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">survival analysis</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">survival curves comparison</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">random forest</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">assumption-free</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">delta method</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">adjusted Kaplan-Meier estimator</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">Cox proportional hazard model</subfield>
  </datafield>
  <datafield tag="690" ind1=" " ind2=" ">
   <subfield code="a">time-to-event variable decomposition</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Marek, Luboš</subfield>
   <subfield code="7">pna2005262058</subfield>
   <subfield code="4">ths</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
   <subfield code="a">Komárková, Lenka,</subfield>
   <subfield code="d">1976-</subfield>
   <subfield code="7">xx0033675</subfield>
   <subfield code="4">opn</subfield>
  </datafield>
  <datafield tag="710" ind1="2" ind2=" ">
   <subfield code="a">Vysoká škola ekonomická v Praze.</subfield>
   <subfield code="b">Fakulta informatiky a statistiky</subfield>
   <subfield code="7">kn20010709399</subfield>
   <subfield code="4">dgg</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/75105/podrobnosti</subfield>
   <subfield code="y">VŠKP v InSIS</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/75105</subfield>
   <subfield code="y">Hlavní práce</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/75105/posudek/vedouci</subfield>
   <subfield code="y">Hodnocení vedoucího</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/75105/posudek/oponent/80878</subfield>
   <subfield code="y">Oponentura</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/75105/posudek/oponent/80879</subfield>
   <subfield code="y">Oponentura</subfield>
  </datafield>
  <datafield tag="999" ind1="4" ind2="0">
   <subfield code="u">https://insis.vse.cz/zp/75105/podrobnosti</subfield>
   <subfield code="y">dc:identifier</subfield>
  </datafield>
  <datafield tag="993" ind1=" " ind2=" ">
   <subfield code="x">NEPOSILAT</subfield>
   <subfield code="y">VSKP</subfield>
  </datafield>
  <datafield tag="999" ind1="4" ind2="9">
   <subfield code="a">vse75105</subfield>
   <subfield code="b">240224</subfield>
  </datafield>
  <datafield tag="999" ind1="4" ind2="5">
   <subfield code="x">75105</subfield>
  </datafield>
 </record>
</collection>
