EDC Biomarker Discovery in Colorectal and Lung Cancer

  • Research type

    Research Study

  • Full title

    EDC Biomarker Discovery in Colorectal and Lung Cancer Serum Samples from UKCTOCS using microRNA, Tumour Autoantibody and Proteomics Technologies

  • IRAS ID

    163990

  • Contact name

    Usha Menon

  • Contact email

    u.menon@ucl.ac.uk

  • Sponsor organisation

    UCL

  • Duration of Study in the UK

    2 years, 0 months, 1 day

  • Research summary

    Summary of Research

    The Early Diagnosis Consortium is a joint project between Cancer Research UK, Cancer Research Technology, Abcodia Ltd and UCL. The aim of the project is to undertake discovery experiments for novel serum biomarkers for the early diagnosis of lung and colorectal cancer, using pre-diagnosis serum from the UKCTOCS Biobank. A number of technologies have previously been tested with external technology partners for their ability to detect analytes in UKCTOCS serum. On the basis of this work, it was decided to undertake the discovery studies using technologies that detect proteins, microRNA (by next-generation sequencing) and tumour autoantibodies, and to compare the signals in cases versus controls to identify a cancer-specific signal. Samples from the cancer cases will be taken close to diagnosis and at intervals up to 7 years before diagnosis, which will give the project the best chance of identifying markers that are up-regulated early in the course of the disease.

    Summary of Results

    Despite our improved understanding of cancer biology, the number of people diagnosed with cancer continues to increase. Current global figures estimate 12.7 million new cancer cases and 7.6 million cancer deaths per year. Lung cancer is the second most common cancer in the UK; although prognosis is generally poor, earlier detection allows treatment by chemotherapy and radiotherapy. Colorectal (bowel) cancer is the third most common cancer in the UK: 5-year survival is 75% if the disease is detected early, but once the cancer has metastasised, survival falls to only 5%. Survival figures from Cancer Research UK show that the stage at which a cancer is detected has the single greatest impact on survival. There is therefore a great need to identify novel biomarkers that can detect lung and colorectal cancer at the earliest possible stage.

    The principal research question of this project was the discovery of potential novel serum biomarkers for the early diagnosis of colorectal and lung cancers. Biomarkers that indicate the presence of cancer at least one year before clinical diagnosis would be of considerable clinical benefit to patients with either cancer type. This set of experimental studies examined the potential of four molecular technologies (proteomics, miRNAs, tumour autoantibodies and lipidomics) to deliver early detection serum biomarkers using UKCTOCS serum samples. Using pre-diagnostic serum samples from the UKCTOCS trial, collected at different time points before colorectal and lung cancer diagnosis, aimed to give the project the best chance of identifying markers that might be up-regulated early in the course of the disease. If successful, this could provide the basis for further validation studies of promising novel blood tests.

    A nested case-control set of colorectal cancer cases (n=67), with three serial samples taken at different time points before diagnosis, and matched controls was used initially in this project. This set was analysed on all four technology platforms. The platforms that showed promising results were further validated with an additional nested case-control set of another 67 cases with serial samples and matched controls. For lung cancer, a nested case-control set of 60 cases with three serial samples and matched controls was analysed with only one of the technologies. A summary of the outcome for each technology is presented below:

    Tumour autoantibodies: The initial results from the discovery phase of tumour autoantibody profiling using the colorectal set were promising: panels of 183 markers gave 87% correct identification of case/control status, whereas a panel of 200 markers gave 94% correct case/control classification. Biological relevance analysis of the significant biomarkers found that the identified panels were most prominently enriched for cell cycle and DNA replication markers, as expected from cancer biology and in line with the mechanistic role of tumour autoantibodies described in the literature. However, some confounding effects were noticed during the analysis that warranted further investigation in the next phase. To avoid or minimise confounding effects, such as the differences in antigenic reactivity between regional centres seen in the discovery phase, the validation phase used a balanced experimental design in which all samples from the same regional centre were processed in the same experimental round. The validation phase found around 50 proteins/clones to be associated with cancer, of which 38 had good predictive performance, with an area under the curve (AUC) >0.7. However, the analysis also showed that the effect of the regional centre at which samples were collected could not be fully eliminated. Different normalisation procedures were applied, and replicate testing produced comparable numbers of significant features, with 10-50% of them remaining constant across both replicates. Technical reproducibility (intra- and inter-day variation) and analytical performance were good, although there was scope for future improvement, for example in reducing the differences between replicates.
    An independent analysis of the results by the bioinformatics team at Imperial College concluded that although the results were of potential interest, they did not reach the significance required to justify pursuing them further with the current technology.
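
    The AUC values quoted in this summary (for example AUC >0.7) can be read as the probability that a randomly chosen case has a higher marker score than a randomly chosen control, with 0.5 meaning no better than chance. A minimal Python sketch of that rank-based calculation, assuming one score per sample and counting ties as half; the function name and the example scores are hypothetical and are not study data:

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5).

    This rank-based form equals the area under the ROC curve.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical marker levels, for illustration only
cases = [3.1, 2.4, 4.0, 1.9]
controls = [1.2, 2.0, 2.4, 0.8]
print(auc(cases, controls))  # prints 0.84375
```

    Enumerating every case/control pair keeps the definition explicit; for large cohorts a sort-based rank calculation gives the same value more efficiently.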

    The same autoantibody profiling approach used in the colorectal study was applied to the nested case-control set of lung cancer cases and matched controls (60 cases with three serial samples and matched controls). During the discovery phase, 315 proteins/clones were found with a potential association with lung cancer diagnosis. The results of this analysis were independently assessed by two bioinformatics teams. After consideration of all the data presented, it was decided that, because the autoantibody results for lung cancer did not reach the required significance, this technology was not worth pursuing further.

    Proteomics: During the discovery phase, 102 proteins were found to be significantly differentially expressed in colorectal cancer cases versus controls. Most of the significant proteins tended to be up-regulated in the cancer group over time, with very few decreasing. Biological annotation showed that 36% of the statistically significant proteins had known associations with colorectal cancer, so the dataset included proteins with both known and novel associations with this cancer type. Based on these results, 191 protein targets were used for validation. The protein expression results from this phase reproduced and confirmed those obtained in the discovery phase. Classification analysis was performed on the data, and panels of up to 4 target proteins were assessed for their ability to predict the development of colorectal cancer. Panels identified using data from samples collected 0-12 months before diagnosis gave the best classification performance. The identified panels of 4 or 5 proteins performed well in cancer prediction, with AUC values of 0.78-0.87. These results were independently confirmed by the bioinformatics team at Imperial College London. However, these values were judged not to provide an adequate level of performance for detecting colorectal cancer if the panel were to be used by the NHS in the future. It was therefore recommended that the CEA (carcinoembryonic antigen) biomarker be added to the analysis to check whether it improved performance. Analysis of CEA data alone revealed very consistent baseline values in the healthy control samples. A proportion of the cases had elevated levels of CEA; however, in some cases CEA levels stayed low even up to the point of diagnosis. Hence, CEA alone appeared to be a poor pre-diagnosis marker based on the data generated from the current sample set.
    The bioinformatics team at Imperial College London also combined the CEA data with the earlier proteomic data from the same set and found that the AUC improved to as high as 0.9. Review of the combined data concluded that this was a good level of performance for the combined panel, and it was recommended that these biomarkers be validated in an independent cohort of cases and controls. The analysis of this work has been completed and identified a series of protein biomarker panels that, when measured serially, could form the basis for blood tests to support early detection of colorectal cancer. A manuscript summarising the results of the whole proteomics work on this project is currently in preparation.

    Lipidomics: A chloroform/methanol/KCl Folch extraction method was used to isolate lipids from the serum samples. Extraction was carried out in the presence of standards for each lipid class to allow correction for recovery. Data were analysed using internally developed software. The results showed differences in lipid class abundance and in the structure of lipid species between cancer and control samples, and changes in some lipid species were observed longitudinally in the case samples as the cancer progressed. It is notable that the changes in long, unsaturated PE (phosphatidylethanolamine) and particularly PI (phosphatidylinositol) species mirror what the collaborators have previously observed in tissue from colorectal cancer patients. An independent analysis of the data by the bioinformatics team at Imperial College London confirmed that specific sphingomyelins, phosphatidylcholines and triacylglycerols had significant associations with cancer. However, the performance of multi-lipid panels was poor, with AUCs of 0.60-0.65, likely owing to high variability in lipid levels within the cohort. Further analysis by the bioinformatics team including additional variables found a similar level of performance (AUCs of 0.60-0.65).
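
    Recovery correction with class-specific standards typically scales each measured lipid by the fraction of its class standard recovered through the extraction. A minimal sketch, assuming each standard is spiked at a known amount before extraction and that every species in a class is recovered to the same extent as that class's standard; the function, class labels and amounts below are hypothetical and do not describe the internally developed software used in the study:

```python
def correct_recoveries(species, standards):
    """Scale each lipid species by the recovery of its class standard.

    species: list of (name, lipid_class, measured_amount) tuples
    standards: dict mapping lipid_class -> (spiked_amount, measured_amount)
    Assumes each species is recovered to the same extent as its class standard.
    """
    corrected = {}
    for name, lipid_class, measured in species:
        spiked, recovered = standards[lipid_class]
        if spiked <= 0 or recovered <= 0:
            raise ValueError(f"invalid standard for class {lipid_class}")
        # measured / (recovered / spiked), written to avoid an extra division
        corrected[name] = measured * spiked / recovered
    return corrected


# Hypothetical values for illustration only (arbitrary units)
standards = {"PC": (10.0, 8.0), "PE": (10.0, 5.0)}
species = [("PC 34:1", "PC", 4.0), ("PE 38:4", "PE", 1.5)]
print(correct_recoveries(species, standards))  # {'PC 34:1': 5.0, 'PE 38:4': 3.0}
```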

    miRNA: Total RNA was isolated and concentrated before preparation of small RNA libraries and next-generation sequencing on an Illumina HiSeq 2000 instrument. Quality control data showed efficient RNA recovery, good consistency of sample processing and successful library preparation. However, across all of the extensive data analysis methods used, there was no separation between cases and controls, or by time to diagnosis, clear enough to meet the confidence levels required to avoid very high false discovery rates. A multivariate machine learning approach was then used to test a number of feature selectors and configurations by splitting the samples into a training set and a test set (70/30 split), but again only very subtle changes in miRNA expression were detected. The best result, with 10 miRNAs, gave small effect-size differences and a prediction accuracy of only 64%, which was unacceptable. Therefore, although the microRNA experiments were completed to a high standard, they did not identify any significant hits for further use, and no additional work was performed with this technology. Despite this outcome, because the microRNA database is extremely large and comprises many isomers, further analysis was performed on the control group only by the bioinformatics team at Imperial College. The work from this sub-analysis has been published: "The 14q32 maternally imprinted locus is a major source of longitudinally stable circulating microRNAs as measured by small RNA sequencing" (Sci Rep. 2019 Oct 31;9(1):15787. doi: 10.1038/s41598-019-51948-6).
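
    The 70/30 training/test evaluation described above can be sketched as follows, assuming a stratified split (cases and controls divided separately so both sets keep a similar case/control ratio) and accuracy defined as the fraction of test samples whose predicted status matches the true status. The feature selectors and classifier are deliberately left out, and the helper names are hypothetical; this is not the pipeline actually used in the study:

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into training and test sets, keeping class proportions.

    Cases and controls are shuffled and split separately so each set keeps
    roughly the same case/control ratio.
    """
    rng = random.Random(seed)
    train, test = [], []
    for value in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted status matches the true status."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and the same length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical cohort: 1 = case, 0 = control
labels = [1] * 10 + [0] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 14 6
```

    A model would be fitted on the training indices only and scored with accuracy() on the held-out test indices, so that the reported figure reflects samples the model has not seen.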

  • REC name

    South Central - Hampshire B Research Ethics Committee

  • REC reference

    14/SC/1323

  • Date of REC Opinion

    10 Oct 2014

  • REC opinion

    Favourable Opinion