Situation
A leading pharmaceutical company received the UK Biobank cohort, which includes phenotypic data on n = 500,000 participants, to support deeper target discovery. Company researchers sought to explore the UK Biobank dataset at scale, building complex queries that take multiple parameters into account and further refine cohorts to match diseases of interest
Challenge
To deliver its full value, the company’s data needed to be clean, easy to navigate and diverse. Ongoing curation and harmonization are also required to maintain the dataset and integrate it with other data as it becomes available
The company’s in-house capabilities could not curate and harmonize the data efficiently, forcing it to accept a lower data ROI unless a solution could be found
Action
Curation and Standardization
- BC Platforms first worked with researchers to understand their needs, expectations and research questions
- Based on client needs, BC Platforms configured its proprietary software platform to support the client’s objectives
- BC Platforms then ingested, curated and harmonized the n = 500,000 dataset
Data Exploration & Management
To ensure the dataset could be used to its full potential now and in the future, BC Platforms applied its Ontology Browser and BC|MATCH curation solution, enabling additional data pipelines to be built for the dataset as the company’s efforts expand
Impact
Deepened Target Identification: BC Platforms enabled researchers to build complex queries that take multiple parameters into account, refining cohorts to match the disease of interest more closely and to target more precisely
Time & Resource Savings: Time from data release to usability was significantly reduced, from months to days. Additionally, ongoing management effort was minimized because the dataset can quickly incorporate new datasets
Continued Dataset ROI: Subsequent data releases have been integrated and harmonized in under a week