Early Identification of COVID-19 with Machine Learning [COVID-19]

  • Research type

    Research Study

  • Full title

    Early identification of COVID-19 from undifferentiated medical presentations using Machine Learning

  • IRAS ID

    281832

  • Contact name

    Andrew Soltan

  • Contact email

    andrew.soltan@medsci.ox.ac.uk

  • Sponsor organisation

    University of Oxford / Clinical Trials and Research Governance

  • Duration of Study in the UK

    0 years, 9 months, 7 days

  • Research summary

    In December 2019, COVID-19, a novel illness caused by SARS-CoV-2, emerged in China and rapidly spread to the rest of the world. Because its clinical symptoms are non-specific, rapid identification of COVID-19 among undifferentiated medical presentations to hospital is essential for expedient care and for appropriate use of protective equipment and isolation rooms during early admission. Current testing for SARS-CoV-2 is by Polymerase Chain Reaction (PCR) of nasopharyngeal swabs, with a turnaround time of 24-48 hours in UK centres and up to 72 hours in California. The sensitivity of the PCR test is thought to be below 70%, and the need for equipment and trained operators has limited the scaling of testing facilities.

    This retrospective case-control study seeks to apply machine learning methods to provide expedited identification of COVID-19, within the first hour of admission, using immediately available laboratory blood tests and other data already collected as part of the standard of care. Algorithms will be trained on retrospective pre-morbid and front-door biochemistry of patients presenting with COVID-19 illness, in addition to a variety of acute respiratory presentations. Models will then be expanded, where data permits, to incorporate other data available at the same point in time, including physiological measurements and past diagnoses.

    Machine learning classifiers will be developed, and their performance assessed with cross-validation to guard against overfitting (including holding out data from other sites, according to data availability). Metrics used to evaluate performance will include AUROC (Area Under the Receiver Operating Characteristic curve), accuracy, F1 score, sensitivity and specificity, and precision and recall. An additional output of this work would be to identify risk factors, or phenotypes, that correspond to an increased likelihood of a positive test.

    There is a critical need for this urgent risk stratification, within the existing testing infrastructure, to expedite appropriate clinical care for the patient and to support judicious use of isolation facilities and protective equipment. Further, earlier identification during a global health emergency could permit timely implementation of appropriate infection control measures.

  • REC name

    N/A

  • REC reference

    N/A
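The evaluation metrics named in the research summary (AUROC, accuracy, F1, sensitivity, specificity, precision) can be illustrated with a minimal sketch. This is not the study's code: the function names, the 0/1 label encoding (1 = positive test) and the example values are assumptions made for illustration only. AUROC is computed here with the rank-based (Mann-Whitney) formulation, which may differ from whatever implementation the investigators used.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-dependent metrics from binary labels (1 = positive) and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Each ratio falls back to 0.0 when its denominator is empty.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and model scores, not study data.
    labels = [1, 1, 0, 0]
    model_scores = [0.9, 0.4, 0.6, 0.1]
    predictions = [1 if s >= 0.5 else 0 for s in model_scores]
    print(classification_metrics(labels, predictions))
    print(auroc(labels, model_scores))
```

AUROC is threshold-independent, whereas the other metrics depend on the cut-off chosen to turn scores into predictions (0.5 in the example above, an arbitrary choice).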