
Enabling AI‑ready oncology cohorts from hospital data across indications and regions


Executive summary

A leading global pharmaceutical company partnered with BC Platforms to advance AI-driven oncology research across indications and regions. Although the client’s AI capabilities were well established, progress was constrained by limited access to high-quality hospital data, especially longitudinal clinical data and imaging, including radiotherapy. 

BC Platforms built multi-modal AI-ready oncology cohorts directly from hospital clinical and imaging systems using a reusable and traceable regulatory-grade approach. By working inside hospital data environments, we unlocked clinically rich data not available through traditional real-world data sources. 

The result was a set of longitudinal, medically contextualized, anonymized datasets that supported multiple AI use cases across indications and regions while reducing operational complexity and creating a reusable foundation for future oncology R&D.

Project snapshot

Indications: Renal cell carcinoma (RCC), non-small cell lung cancer (NSCLC), head & neck squamous cell carcinoma (HNSCC)
Approach: Hospital-based cohort construction using a regulatory-grade, reusable methodology
Countries: France, Cyprus, Thailand, US
Data: Longitudinal clinical data and multi-modal imaging, including radiotherapy
Focus: Building patient-level, multi-modal, AI-ready oncology cohorts from hospital data
Applications: Patient stratification, progression risk, early intervention
Design: Multi-indication, multi-country cohort development using a global umbrella protocol

Key metrics

~1,000 patients
4 countries
6-8 months from hospital data to AI‑ready cohorts 

Background 

Our client, a global pharmaceutical company with advanced AI and data science capabilities, was using AI in its oncology R&D to improve patient stratification, understand disease progression, and support earlier treatment and intervention.

The constraint was no longer algorithm development, but access to fit-for-purpose data. Traditional real-world data sources lacked the clinical depth, longitudinal structure, and imaging needed for oncology AI, especially radiotherapy and other hospital-based data. 

To support multiple indications and regions, our client needed consistent, AI-ready datasets built from real clinical practice. That meant accessing hospital clinical and imaging systems directly and doing so through a scalable model for site engagement, approvals, and governance. 

Our client’s internal teams were not set up to manage hospital engagement, ethics approvals, and multi-country data governance at scale, leaving no repeatable way to generate AI-ready oncology data from hospital systems. 

Challenge 

Our challenge was to build longitudinal, anonymized, patient-level oncology datasets directly from hospitals and make them usable for AI across multiple indications and geographies. This required solving for data access, cohort definition, multi-modal harmonization and integration, data governance, and asset reuse.

Accessing hospital clinical and imaging data

The most valuable oncology patient data — longitudinal clinical records, radiology, and radiotherapy — sits in hospital systems such as EHRs, PACS, and treatment planning platforms. Because these data are heterogeneous, locally governed, and rarely standardized, access depends on direct hospital engagement and site-specific approvals that add time and complexity. 

Defining cohorts fit for AI use cases

AI model development requires cohorts defined around clinical questions such as progression risk or treatment response — not around whatever data happens to be available. In hospital settings, variable documentation, inconsistent coding, and unstructured fields make eligible patients hard to identify, often increasing manual review and delaying model development. 

Integrating multimodal data while preserving clinical context 

Successful analysis depends on linking clinical, radiology, and radiotherapy data over time. Because these systems use different identifiers and standards, patient-level integration is technically complex, and incomplete linkages can undermine interpretability and confidence in model outputs. 
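To make the linkage problem concrete, here is a minimal sketch in Python. It is an illustration only, not the project's implementation: the crosswalk table, the system names (`ehr`, `pacs`, `rt`), the identifiers, and the record fields are all hypothetical. It groups records from three systems into one timeline per patient and keeps unlinked records visible rather than silently dropping them.

```python
from collections import defaultdict

# Hypothetical crosswalk: (source system, local identifier) -> study-level patient key.
CROSSWALK = {
    ("ehr", "MRN-001"): "P1",
    ("pacs", "ACC-77"): "P1",
    ("rt", "RT-9"): "P1",
    ("ehr", "MRN-002"): "P2",
}


def link_records(records, crosswalk):
    """Group records by patient key and sort each timeline by date.

    Records whose local identifier has no crosswalk entry are returned
    separately, so linkage gaps can be reviewed instead of being lost.
    """
    timelines = defaultdict(list)
    unlinked = []
    for rec in records:
        key = crosswalk.get((rec["system"], rec["local_id"]))
        if key is None:
            unlinked.append(rec)
        else:
            timelines[key].append(rec)
    for events in timelines.values():
        # ISO 8601 date strings sort correctly as plain strings.
        events.sort(key=lambda r: r["date"])
    return dict(timelines), unlinked


# Illustrative records only; none of these values come from the project.
records = [
    {"system": "rt", "local_id": "RT-9", "date": "2021-03-10", "event": "radiotherapy plan"},
    {"system": "ehr", "local_id": "MRN-001", "date": "2021-01-05", "event": "diagnosis"},
    {"system": "pacs", "local_id": "ACC-77", "date": "2021-02-01", "event": "CT scan"},
    {"system": "pacs", "local_id": "ACC-99", "date": "2021-02-15", "event": "MRI scan"},
]

timelines, unlinked = link_records(records, CROSSWALK)
```

In this sketch the imaging record `ACC-99` has no crosswalk entry, so it lands in `unlinked`. Reporting that list alongside the cohort is one way to quantify how complete the linkage is before a model is trained on it.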

Managing regulatory and governance complexity across countries 

Multi-country oncology research involves different ethics frameworks, data protection rules, and hospital-specific governance requirements. Managing repeated submissions, agreements, and compliance steps slows execution and makes regional expansion difficult for research and analytics teams. 

Reusing data and operating models across indications 

Many AI initiatives are still run as one-off projects tied to a single indication or geography. Rebuilding data access, governance, and infrastructure for each use case adds cost, delays execution, and prevents the creation of reusable oncology data assets.

Together, these barriers made it difficult for our client to create a scalable, repeatable approach to AI-ready oncology cohorts from hospital data.

Solution 

We built AI-ready oncology cohorts directly from hospital clinical and imaging systems using a repeatable, regulatory-grade model designed to scale across indications and regions. 

The model combined direct hospital access, data harmonization, multi-modal data integration, and scalable governance, with our team also managing hospital coordination, approvals, and delivery end-to-end.

Direct access to hospital clinical and imaging data

We accessed clinical and imaging data directly from hospital systems, including EHRs, radiology platforms, and radiotherapy environments. This unlocked longitudinal, patient-level data with clinical context that traditional secondary sources do not provide. 

AI-ready cohort construction with a common data model

Raw hospital data were transformed into anonymized, patient-level, longitudinal oncology datasets aligned across sites and countries. Using BC Unify, our data harmonization and mastering platform, fragmented clinical and imaging data were standardized into a consistent structure that supported reliable cohort definitions and reuse without rebuilding from scratch for each project.
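As a simplified illustration of what harmonizing to a common structure involves, the Python sketch below maps site-specific field names and value codes onto one shared schema. It does not represent BC Unify or its interfaces: the site names, field mappings, and value vocabulary are hypothetical and chosen only to show the idea.

```python
# Hypothetical per-site mappings from local field names to common-model fields.
FIELD_MAPS = {
    "site_a": {"dx_code": "diagnosis_code", "dx_dt": "diagnosis_date", "sexe": "sex"},
    "site_b": {"icd10": "diagnosis_code", "date_of_dx": "diagnosis_date", "gender": "sex"},
}

# Hypothetical value vocabulary for one harmonized field.
SEX_VALUES = {"m": "male", "f": "female", "male": "male", "female": "female"}


def harmonize(site, row):
    """Rename a site's fields to the common schema and normalize coded values.

    Returns the harmonized row plus the list of source fields that had no
    mapping, so that gaps in the mapping can be reviewed rather than hidden.
    """
    mapping = FIELD_MAPS[site]
    out = {"site": site}
    unmapped = []
    for field, value in row.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)
        else:
            out[target] = value
    if "sex" in out:
        out["sex"] = SEX_VALUES.get(str(out["sex"]).strip().lower(), "unknown")
    return out, unmapped


# Illustrative rows only; the values are made up for the example.
row_a, unmapped_a = harmonize(
    "site_a", {"dx_code": "C64", "dx_dt": "2021-01-05", "sexe": "F", "notes": "free text"}
)
row_b, unmapped_b = harmonize(
    "site_b", {"icd10": "C34.1", "date_of_dx": "2020-11-20", "gender": "Male"}
)
```

After this step both sites expose the same field names (`diagnosis_code`, `diagnosis_date`, `sex`), so a single cohort definition can be applied to either one. Fields with no mapping, such as `notes` here, are reported back instead of being carried over unreviewed.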

Multimodal data integration including radiotherapy 

Clinical, radiology, and radiotherapy data were integrated at the patient level over time using BC Image, which automates image extraction, de‑identification and anonymization, and processing at scale. This preserved clinical context while producing datasets suitable for model training and validation in real‑world care pathways.

Secure, governed data access for AI teams 

AI-ready datasets were delivered through BC Mosaic, our trusted research environment, providing secure access and governed collaboration for analysis. This gave the client’s AI and analytics teams harmonized data without having to manage infrastructure or access controls themselves. 

Regulatory-grade governance framework at scale

BC Platforms implemented a global umbrella protocol per indication covering ethics, data protection, and governance across hospitals and countries. This reduced set-up time for new sites and supported expansion without restarting regulatory and contracting processes. 

Reusable model across indications and regions 

The same hospital network, governance model, and data infrastructure were reused as the project expanded across indications and from Europe to the US. This replaced one-off efforts with a scalable operating model for generating AI-ready oncology datasets from hospital data. In practice, this gave the client a repeatable way to turn fragmented hospital data into AI-ready oncology cohorts at scale. 

Impact 

The project delivered faster access to hospital clinical and imaging data, reduced operational burden, and established a scalable way to generate AI‑ready oncology datasets across indications and regions.

Faster access to hospital-grade data

The client gained access to longitudinal clinical and imaging data — particularly radiotherapy — directly from hospital systems, reducing the time required to build datasets. This removed a key bottleneck in AI model development. 

Reduced operational and regulatory burden 

BC Platforms managed hospital coordination, ethics approvals, and governance. Our client’s internal teams were able to focus on research and model development rather than compliance and data access processes. 

Clinically meaningful datasets for AI 

Datasets combined clinical data with imaging and radiotherapy in a longitudinal, patient‑level structure. This supported development of AI models grounded in real clinical practice. 

Scalability across indications and regions 

The approach was reused across oncology indications and expanded from Europe to the US, replacing one‑off efforts with a consistent model. 

Foundation for future oncology R&D 

The client established a reusable data and governance foundation supporting future AI initiatives and broader clinical strategy. This enabled a shift from project‑based data access to a repeatable approach for generating and using hospital-based patient datasets across indications and regions. 

Why BC Platforms 

BC Platforms was selected for our ability to operate directly within hospital data environments and deliver clinically rich oncology datasets not accessible through traditional real-world data sources.

Rather than rely on secondary or partially de-identified datasets, we worked inside the hospital ecosystem to access and structure clinical and imaging data, including radiotherapy. Specifically, we:

  • Enabled direct access to hospital clinical and imaging systems, including radiotherapy 
  • Delivered longitudinal, patient‑level datasets with full medical context 
  • Aligned regulatory‑grade execution with clinical development standards 
  • Maintained full ownership of hospital coordination, ethics, and governance 
  • Provided a reusable framework supporting expansion across indications and regions

This approach resulted in higher data quality, faster execution, reduced internal burden for our client’s AI teams, and a scalable foundation supporting ongoing oncology research and AI development.

Conclusion

For AI models supporting advanced oncology research and use cases, the real constraint is not ambition or algorithms. It’s access to clinically rich hospital data that can be governed, integrated, and reused at scale.

We turned that bottleneck into a highly usable operating model — giving our client a scalable advantage in building AI-ready oncology cohorts from real clinical practice. 

For more information on how we support AI-ready oncology evidence generation, contact our team.