Big dAta in patients with breast cANCEr: risk prognosis for clinically relevant impairments of health-related quality of life based on Machine Learning – EORTC BALANCE

Principal investigator(s)
Lonneke van de Poll-Franse
IKNL
Eindhoven, Netherlands
Bernhard Holzner
Medical University of Innsbruck
Innsbruck, Austria
Project coordinator(s)
Thijs van der Heijden
Netherlands Cancer Institute
Amsterdam, Netherlands
Niclas Hubel
Medical University of Innsbruck
Innsbruck, Austria

Project summary

Big data has found its way into cancer research and medicine but is not yet living up to its promise for personalized care of individual cancer patients. The bottleneck is that machine learning models, by their nature, need far more data than ‘classic’ statistical regression analyses. Existing, internationally scattered datasets need to be harmonized, analyzed and presented effectively to both clinicians and patients so that they can be used to inform treatment decisions.

The BALANCE project has started by collecting datasets on health-related quality of life (HRQoL) in breast cancer patients, measured with the EORTC QLQ-C30 and BR23/42. The first step is to accrue a sufficient amount of data from multiple international sources. Datasets from both research and clinical routine will be harmonized into a dataset large enough to support comprehensive and accurate risk prediction models that can help patients and their healthcare providers understand a patient’s risk of experiencing poor HRQoL in the future.

Besides the EORTC QLQ-C30 and/or BR23/42 data, additional clinical and patient data are needed for the prediction models. However, there is no uniform standard for these additional data, which makes harmonizing the datasets the first challenge. Challenges of this kind can lead to loss of data and limit the scope of the prediction models.
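
The harmonization challenge can be illustrated with a minimal sketch. It assumes each source dataset uses its own column names and that some sources do not collect every variable. The common schema, the source names and the column mappings below are hypothetical and chosen only for illustration; they are not the project's actual data dictionary.

```python
from typing import Any, Dict, List, Optional

# Hypothetical common schema; field names are illustrative assumptions.
COMMON_FIELDS = ["patient_id", "age_years", "global_health_status"]

# Hypothetical per-source mappings from local column names to common fields.
SOURCE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "trial_a": {"pid": "patient_id", "age": "age_years", "ql2": "global_health_status"},
    "clinic_b": {"patient": "patient_id", "gh_score": "global_health_status"},
}


def harmonize(record: Dict[str, Any], source: str) -> Dict[str, Optional[Any]]:
    """Rename a source record's columns to the common schema.

    Fields the source does not collect are set to None so the gap is explicit;
    columns without a mapping are dropped.
    """
    mapping = SOURCE_MAPPINGS[source]
    out: Dict[str, Optional[Any]] = {field: None for field in COMMON_FIELDS}
    for local_name, value in record.items():
        common_name = mapping.get(local_name)
        if common_name is not None:
            out[common_name] = value
    out["source"] = source  # keep provenance for later checks
    return out


def missing_fraction(rows: List[Dict[str, Optional[Any]]], field: str) -> float:
    """Share of harmonized rows in which a common field is missing."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r[field] is None) / len(rows)


rows = [
    harmonize({"pid": "A1", "age": 54, "ql2": 66.7}, "trial_a"),
    harmonize({"patient": "B7", "gh_score": 50.0, "ward": "3C"}, "clinic_b"),
]
print(missing_fraction(rows, "age_years"))
```

In this toy case the second source has no age variable, so half of the harmonized rows lack it. This is the kind of data loss that can narrow the set of variables available to the prediction models.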

The prediction models must be comprehensive and accurate enough for prediction at the level of the individual patient. Model building will follow a two-track approach: first “classical” regression models, then machine learning (ML)/deep learning (DL) models. Predictive regression models do not unlock all the information stored in big datasets, because they are rigid and their variables are selected before data analysis. ML/DL models can learn and self-correct by running multiple algorithms over the same dataset, which unlocks more information than regression models while factoring in more variables. ML/DL, regression and hybrid models will be compared with each other to assess the prognostic accuracy each achieves.
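
One way such a comparison could look is sketched below. The project description does not name an accuracy metric; the area under the ROC curve (AUC) is used here purely as an assumption, and the outcomes and predicted risks are made-up toy values, not project results. AUC measures discrimination only; a full comparison would typically also look at calibration.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy held-out outcomes: 1 = clinically relevant HRQoL impairment, 0 = none.
y_true = [1, 0, 1, 0, 1, 0]

# Made-up predicted risks from two hypothetical models on the same patients.
risk_regression = [0.8, 0.4, 0.5, 0.6, 0.7, 0.2]
risk_ml = [0.9, 0.3, 0.6, 0.4, 0.8, 0.1]

results = {
    "regression": auc(y_true, risk_regression),
    "ml": auc(y_true, risk_ml),
}
for name, value in results.items():
    print(f"{name}: AUC = {value:.3f}")
```

Both models are scored on the same held-out patients so that the numbers are directly comparable; a hybrid model would simply add a third entry to `results`.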

The main deliverable will be statistical models based on ML and DL that allow the prognostication of clinically important problems and symptoms, supporting the early identification of cancer patients at risk of experiencing them. The BALANCE project also aims to be a proof of concept for bringing together data internationally to develop risk prediction models for HRQoL using data on demographic, (bio)medical, treatment, lifestyle and economic factors, as well as on comorbidities and psychological condition.

Achievements

Minimal data requirement reached via recruitment
Dataset accrual – Ongoing
Dataset harmonisation – Ongoing
Model building – Started

Future plans

We are open to new collaborators and data, so please contact t.vd.heijden@nki.nl if you want to collaborate and/or have a breast cancer HRQoL dataset.

For patients

Quality of life in cancer patients fluctuates across diagnosis, treatment and survivorship, and differs between patients. The aim of the project is to give patients and healthcare professionals an individualized prediction of which domains of quality of life are expected to change and how those changes will affect the patient.

This project collects existing data on the quality of life of breast cancer patients and survivors from studies and care centres into one large database. This database is analyzed in two ways: with statistical modelling and with artificial intelligence algorithms. These models and algorithms will provide the individual predictions described above and give information about which quality of life impairments to expect from the disease and/or its treatment.
