Developing machine learning tools for the label-free identification of cancer stem cells

Cardiff University

About the Project


Brain cancer affects almost 10,000 patients per year in the UK. The most frequent of these cancers in adults is glioblastoma (GBM), which is invariably lethal. GBMs are highly heterogeneous cancers, which is a major obstacle to the development of effective therapies. GBM cancer stem cells (GSCs) are key to the formation, maintenance, recurrence, and resistance of GBM tumours. Using Coherent Anti-Stokes Raman Scattering (CARS) microscopy, we measured global protein levels and found higher protein levels in GSCs than in non-stem GBM cells (NGCs). We have also applied CARS microscopy to human patient GBM samples and identified intratumour heterogeneity in global protein levels within GBM cells. This technological advance highlights the potential of CARS microscopy for the label-free identification of cancer stem cells in situ, which could revolutionize the study, diagnosis, and prognosis of GBM.

As part of a new research project, we will apply CARS imaging to human patient specimens. Label-free CARS visualization may help to detect the presence and position of GSCs in patient samples, as well as their spatio-temporal dynamics in relation to extracellular matrix, vasculature, and myelin (all of which can also be identified label-free).

Following CARS image acquisition, a key step is the development of robust quantitative computational analysis workflows able to extract chemical components and generate spectral and spatial ‘profiles’ of GSCs and NGCs. Our current methodology is based on non-negative matrix factorization, a form of multivariate analysis. In this unsupervised approach, random guesses for the spatial and spectral profiles are introduced and then modified by the algorithm to minimise the factorization error. At present, however, the reproducibility and stability of the method are not optimal, especially if the experimental data exhibit significant signal fluctuations.
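To illustrate the general idea, the following is a minimal sketch of non-negative matrix factorization with random initial guesses, written with NumPy. It is not the group's actual workflow: the multiplicative update rule, the function name nmf, the variable names, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def nmf(data, n_components, n_iter=300, seed=0, eps=1e-12):
    """Factorise data (pixels x spectral channels) as conc @ spectra, all non-negative."""
    rng = np.random.default_rng(seed)
    n_pixels, n_channels = data.shape
    # Random initial guesses for the spatial and spectral profiles.
    conc = rng.random((n_pixels, n_components)) + 1e-3
    spectra = rng.random((n_components, n_channels)) + 1e-3
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        spectra *= (conc.T @ data) / (conc.T @ conc @ spectra + eps)
        conc *= (data @ spectra.T) / (conc @ spectra @ spectra.T + eps)
        # Track the factorization error (Frobenius norm of the residual).
        errors.append(np.linalg.norm(data - conc @ spectra))
    return conc, spectra, errors

# Synthetic stand-in for a hyperspectral image (assumption, not real CARS data):
# 400 pixels, 30 spectral channels, 2 underlying chemical components.
rng = np.random.default_rng(42)
true_conc = rng.random((400, 2))
true_spectra = rng.random((2, 30))
data = true_conc @ true_spectra + 0.01 * rng.random((400, 30))

# Different random starting points can end at different solutions.
for seed in (0, 1):
    conc, spectra, errors = nmf(data, n_components=2, seed=seed)
    print(f"seed {seed}: factorisation error {errors[0]:.3f} -> {errors[-1]:.3f}")
```

Running the factorization from several seeds and comparing the retrieved profiles is one simple way to probe the reproducibility issue described above.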

Aims and Objectives

This PhD studentship will focus on:

1) Developing new factorisation methods for CARS hyperspectral images, to enable the robust identification of GSCs across human GBM patients.

2) Investigating alternative multivariate analyses, e.g. introducing convolutional constraints to reproduce local correlation between components, and deep learning architectures.

3) Using the retrieved components as features for unsupervised classification, highlighting the types and distributions of cell sub-populations within the tissue. Morphological features or other omics data can be integrated with the CARS components to improve the classification results.

4) Evaluating deep learning approaches, which can find complex patterns in the data rather than relying on the basic metrics of traditional machine learning, to establish whether these benefits outweigh the disadvantage of the additional computational load.
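As a rough illustration of aim 3, the sketch below clusters per-pixel component features without labels. The project does not specify a classification algorithm; k-means with a farthest-point initialisation is used here purely as an assumed example, and the two feature columns and their values are hypothetical stand-ins for retrieved CARS components.

```python
import numpy as np

def kmeans(features, n_clusters, n_iter=50, seed=0):
    """Plain k-means on a (pixels x features) matrix; returns labels and centres."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: one random pixel, then the pixel
    # farthest from all centres chosen so far.
    centres = [features[rng.integers(len(features))]]
    for _ in range(1, n_clusters):
        dist = np.min([np.linalg.norm(features - c, axis=1) for c in centres], axis=0)
        centres.append(features[dist.argmax()])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        # Assign every pixel to its nearest centre.
        dist = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its assigned pixels.
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members):
                centres[k] = members.mean(axis=0)
    # Final assignment consistent with the returned centres.
    dist = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
    return dist.argmin(axis=1), centres

# Hypothetical per-pixel features (two component concentrations) for two
# synthetic sub-populations of 200 pixels each; not real measurements.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=(1.0, 0.2), scale=0.05, size=(200, 2))
group_b = rng.normal(loc=(0.3, 0.6), scale=0.05, size=(200, 2))
features = np.vstack([group_a, group_b])

labels, centres = kmeans(features, n_clusters=2)
print("pixels per cluster:", np.bincount(labels))
```

Morphological or omics features could be appended as extra columns of the feature matrix, ideally after rescaling so that no single feature dominates the distance.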

Methods and Anticipated Results

Transformative technologies and data environments for diagnosis and disease prediction. New methodologies for CARS image analysis have the potential to enable label-free identification of cancer stem cells, which are more resistant to anti-cancer therapies and are a candidate source of tumour recurrence. Improved detection of cancer stem cells in patient samples would improve diagnosis for brain tumour patients.

The improved methods will also help to enhance a sparse sampling algorithm that we have developed to decrease the acquisition time of CARS imaging. This will be crucial for increasing the volume of experimental data available, which in turn will further consolidate the machine learning predictions.

Importantly, the new knowledge acquired during the studentship could be easily transferred to the analysis of other experimental (imaging and beyond) data, increasing the employability of the student at the end of the project. 
