Clinical AI
Improving clinical trial efficiency
to accelerate the delivery of innovative treatments to patients

Within the pharmaceutical industry, clinical trials are essential for delivering new drugs and treatments to the public. At minimum, a novel drug must undergo three phases of study to receive FDA approval. Phase 3 trials are usually the last and most difficult hurdle to clear because of their complexity and duration, which can range from 2 to 15 years.

Because of their significant logistical complexity, Phase 3 trials carry a high risk of both delays and failure, which translates directly into delays in getting life-saving drugs to patients. In addition, trial success typically depends on the personal experience of the trial organizer. We hypothesize that providing objective, data-driven insight into trial duration, rather than relying on individual judgment alone, can increase the chances of trial success.

Despite the number of trials conducted and a global clinical research services market valued at $55.86 billion, no robust, publicly accessible tool exists for predicting trial durations. Our work fills this gap with a machine learning pipeline that estimates the duration of Phase 3 oncology trials, the most resource-intensive phase in drug development.

We propose a Clinical AI Web API that uses machine learning to predict the duration of clinical trials. This tool aims to serve Clinical Research Organizations (CROs), pharmaceutical companies, and researchers. By entering their preliminary study parameters, the API will provide them with accurate predictions that can aid in better resource management, budgeting, and strategic planning.

Our Team

With diverse academic and professional expertise, we enrich our project with in-depth industry knowledge, innovative analytical perspectives, and advanced data management skills.

Cynthia Xu
Data and ML Engineering Lead
Backend Developer

Applied ML Fellow,
Los Alamos National Laboratory

Adeline Chin
Market Research Lead
Frontend Developer

Clinical Data Associate,
Translational Drug Development

Jooyeon Hahm
ML Research Lead
Website Developer

ML Engineer,
EBSCO Information Services

Data

Our project leverages a substantial dataset from ClinicalTrials.gov, a comprehensive global registry for clinical research studies. Specifically, we have sourced data from 19,049 completed interventional cancer studies conducted between January 1, 2011, and May 30, 2024, which includes 5,053 Phase 1 studies, 5,982 Phase 2 studies, and 1,634 Phase 3 studies. This dataset provides extensive details on each trial, including trial locations, enrollment numbers, participant eligibility, types of interventions, and methods of outcome measurement, offering a robust foundation for our machine learning model to accurately predict clinical trial durations.
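The cohort filter described above can be sketched in a few lines of pandas. This is a minimal illustration, assuming a flat export of ClinicalTrials.gov records with hypothetical column names (`overall_status`, `study_type`, `conditions`, `start_date`, `phase`); the project's actual extraction pipeline is not reproduced here.

```python
import pandas as pd

def filter_cohort(df: pd.DataFrame) -> pd.DataFrame:
    """Keep completed interventional cancer studies started 2011-2024."""
    start = pd.to_datetime(df["start_date"])
    mask = (
        (df["overall_status"] == "COMPLETED")
        & (df["study_type"] == "INTERVENTIONAL")
        # Match any condition string mentioning cancer, case-insensitively.
        & df["conditions"].str.contains("cancer", case=False)
        & start.between("2011-01-01", "2024-05-30")
        & df["phase"].isin(["PHASE1", "PHASE2", "PHASE3"])
    )
    return df[mask]
```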

Data to Features

Our feature engineering process transformed the study protocol data into trainable features, including:

  • Unique study identifiers, durations of primary and overall study completion, and binned categories for those durations

  • Numbers of conditions and groups examined, age groups of participants, and study locations

  • Types and numbers of interventions, sponsor types, intervention models, responsible parties, the presence of a data monitoring committee, allocation types, masking levels, enrollment counts, and the inclusion of healthy volunteers

  • Study purposes (treatment, diagnostic, prevention, supportive), intervention types (procedural, device, behavioral, drug, radiation, biological), and outcome measures (overall survival, duration of response, adverse outcome)
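As an illustration of two of the engineered features described above, the sketch below computes a study's duration and its binned category from start and completion dates. The column names and the use of equal-frequency bins are assumptions for demonstration, not the project's exact cut points.

```python
import pandas as pd

def add_duration_features(df: pd.DataFrame, n_bins: int = 2) -> pd.DataFrame:
    """Derive duration in months and an equal-frequency duration bin."""
    df = df.copy()
    # Average month length of 30.44 days converts day counts to months.
    df["duration_months"] = (
        pd.to_datetime(df["completion_date"]) - pd.to_datetime(df["start_date"])
    ).dt.days / 30.44
    # Equal-frequency bins turn the continuous target into class labels.
    df["duration_bin"] = pd.qcut(df["duration_months"], q=n_bins, labels=False)
    return df
```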

Our Unique Features

From free-text fields, we extracted new variables such as the durations of primary and secondary outcome measurements and the numbers of inclusion and exclusion criteria. We also incorporated 5-year survival rates for specific cancer types to improve the model's performance in predicting study durations.

  • Durations of primary and secondary outcome measures

  • Numbers of patient inclusion and exclusion criteria

  • 5-year survival rates for categorical cancer types

Models

With our extensive and unique features, we trained Random Forest, LightGBM, and XGBoost classification models using binning strategies of 2, 3, 4, and 5 bins to forecast study durations. Our best-performing model achieved an accuracy of 0.803 and a precision of 0.805, which, to our knowledge, makes it the most accurate publicly available model for forecasting study durations. This performance demonstrates the effectiveness of our feature engineering and the robustness of our approach to complex clinical trial data. Sample model specifications and metrics are shown below.
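One binning-and-training run can be sketched as below, assuming a feature matrix `X` and a continuous duration target in months; the project's actual features, splits, and hyperparameters are not reproduced here.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train_binned_classifier(X, duration_months, n_bins=2, seed=0):
    """Bin a continuous duration target, then fit and score a classifier."""
    # Equal-frequency binning converts the regression target into classes.
    y = pd.qcut(duration_months, q=n_bins, labels=False)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_tr, y_tr)
    return clf, accuracy_score(y_te, clf.predict(X_te))
```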

Model     Algorithm       Accuracy   Study Phase   Training Set Size   Bins
Model 1   Random Forest   0.803      3             1,134               2
Model 2   LightGBM        0.735      2             4,187               2
Model 3   XGBoost         0.603      1             3,537               3

Our Product

Our product takes information about your clinical trial, feeds it into our model, and returns a prediction of your study duration. You can enter the information through our user-friendly interface. Watch our demo for a detailed walkthrough.
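As an illustration of the kind of study parameters the interface collects, a request might carry fields like the following. All field names and values here are hypothetical examples, not the product's actual API schema.

```json
{
  "phase": "PHASE3",
  "enrollment": 450,
  "num_conditions": 1,
  "num_inclusion_criteria": 12,
  "num_exclusion_criteria": 9,
  "primary_outcome_timeframe_months": 24,
  "has_data_monitoring_committee": true
}
```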

Results

Our binary predictive model is more generalizable and accurate than the most recently published models. It successfully predicted whether Phase 3 trials, which are the longest and most complex, would align with the average duration, achieving an accuracy of 0.803. This result highlights the robustness and reliability of our approach, offering significant potential for improving the planning and management of clinical trials. By providing precise duration estimates, our model can help streamline operations and optimize resource allocation in clinical research organizations.

Long et al. (2024)                        Our Model
Trained on Phase 1 lymphoma studies       Trained on all Phase 1, 2, and 3 studies
Tested only on lymphoma and lung cancer   Tested on all cancer types
Small testing set size                    Robust testing set size
Did not address text data                 Extracted trainable features from text data
Highest accuracy: 0.725                   Highest accuracy: 0.803

Acknowledgements

We would like to express our sincere gratitude to Professor Puya Vahabi (Berkeley MIDS) for his invaluable guidance and support throughout this project. We also extend our heartfelt thanks to Professor Korin Reid (Berkeley MIDS) for her unwavering encouragement and assistance during challenging times. Their expertise and mentorship have been instrumental in the successful completion of this work. We also thank our classmates for their feedback and encouragement.