INTEGRATE: Driving Excellence in Integrative Cancer Research through Innovative Biomedical Infrastructures (INTEGRATE)

Published on: 31/10/2013

The INTEGRATE project aims to develop innovative infrastructures to enable data and knowledge sharing and to foster large-scale collaboration in biomedical research. Its flexible infrastructure components and tools will bring together heterogeneous multi-scale biomedical data generated through standard and novel technologies within post-genomic clinical trials.

There is a strong need in biomedical research, especially for complex heterogeneous diseases such as cancer, for a harmonisation of efforts on all fronts: integrating the available data and knowledge into comprehensive models supported by interoperable infrastructures and tools, standardising methodologies, and achieving wide-scale data sharing and reuse as well as multidisciplinary collaboration.

INTEGRATE aims to build solutions that enable a large, multidisciplinary biomedical community, ranging from basic, translational and clinical researchers to the pharmaceutical industry, to collaborate, share data and knowledge, and build and share predictive models of response to therapies, with the end goal of improving patient outcomes. Moving away from empirical medicine towards evidence-based personalised care has the potential both to improve patient outcomes dramatically and to reduce costs. INTEGRATE will deliver:

  • reconfigurable infrastructure components;
  • tools for sharing and collaboration;
  • standards-based data models;
  • repositories of data, models and knowledge.

The INTEGRATE environment will enable:

  • Collection, preservation, management and reuse of data collected within multi-centric clinical trials. These unique comprehensive datasets will be made available through uniform interfaces to support information sharing and collaborative knowledge generation.
  • Multi-disciplinary collaboration, providing an environment and tools that support researchers across domains, institutions and industries to jointly contribute to research objectives, develop common methodologies and complex analyses, and efficiently make use of each other's expertise and results.
  • Collaborative definition and development of relevant clinical questions and more efficient validation of potential biomarker results and predictive models in clinical trials.

INTEGRATE facilitates the collaborative development, preservation and sharing of realistic, validated, multi-scale models that predict response to novel therapies and anti-cancer drugs. We will propose methodologies for model development, a modelling framework, and predictive multi-scale models in the context of breast cancer. INTEGRATE also provides standards-based interoperability with existing research and clinical infrastructures to support efficient reuse and integration of information.

Policy Context

As biomedical research evolves, specific needs emerge. Information from research is combined with clinical data collected in clinical trials. In further research, such data are integrated with data from molecular diagnostics and used to develop clinical decision support (CDS) tools that assist care providers in the clinic. Access to patient clinical data for research and for the development of CDS tools and therapy applications is therefore critical. When collecting patient data, special care should be taken to ensure that the entire process of data extraction and linkage complies with all applicable legal and ethical requirements and with European and local laws and regulations, including, but not limited to, Directive 95/46/EC (Data Protection Directive), medical secrecy laws, the ICH Guideline for Good Clinical Practice and the Declaration of Helsinki. INTEGRATE aims to deliver reconfigurable infrastructure components and tools for sharing and collaboration to the biomedical research community. As a consortium, we promote full compliance with legislation and ethical guidelines and aim to follow the highest standards of privacy and ethics in developing our results.

The project is funded by the European Commission (programme acronym: FP7-ICT; sub-programme area: FP7-ICT-2009-6).


Non-exhaustive list of legislation, regulations and policy:

EU directives:

  • 95/46/EC (Data Protection Directive)
  • 2001/20/EC (clinical trials on medicinal products for human use)
  • 2005/28/EC (directive on good clinical practice)

Description of target users and groups

The first aim of INTEGRATE is to meet the needs of the academic cancer research community, starting with the Breast International Group (BIG) and expanding to other cancer research communities. INTEGRATE will build on the existing BIG network to support adoption of the INTEGRATE solutions. This network comprises 50 cooperative research groups (typically national groups of hospitals), covering over 3,000 hospitals. Beyond the academic cancer research community, the sustainability of the project's results depends on buy-in from commercial partners (e.g. the pharmaceutical industry).

Exchanges with similar health ICT projects (SAGE, TRANSCEND, etc.) will be important throughout the life of the INTEGRATE project to share experience and maximise interoperability. Sharing clinical, pathology and high-throughput genomic data (in particular next-generation sequencing data) efficiently and securely is a challenge for which no definitive solutions have yet been identified; with the know-how it acquires, the INTEGRATE consortium could contribute to emerging initiatives in this area.

INTEGRATE solutions will gradually be proposed outside the initial research network to a wide inter-disciplinary academic cancer research community, comprising oncologists, pathologists and imaging specialists, clinical trial managers, IT specialists and data analysts.

In the process of defining the user needs, five main roles have been identified for the users of the INTEGRATE environment: clinicians, core laboratory staff, clinical trial administrators, IT administrators and researchers. There is not necessarily a one-to-one correspondence between a person and a role. For example, the same person can act as a clinician or as a researcher depending on the circumstances.

Description of the way to implement the initiative

Sharing and integration of data from oncology clinical trials

At the centre of INTEGRATE is a shared repository of clinical, genomic and imaging data, originating from multiple clinical trials in breast cancer. By accessing data from multiple trials, researchers will be able to build predictive models, identify biomarkers and answer other research questions faster and with more confidence. Additionally, fine-grained control of access to subsets of the data by different user groups will enable flexible patterns of collaboration.
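Fine-grained access control of this kind can be sketched as follows, assuming a simple model in which every record belongs to one trial and one data type, and each user group is granted a set of trials and data types. The names `Record`, `Grant` and `visible_records`, and the trial and group labels, are hypothetical illustrations, not part of the INTEGRATE platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    patient_id: str
    trial: str      # trial the record originates from
    data_type: str  # e.g. "clinical", "genomic", "imaging"


@dataclass(frozen=True)
class Grant:
    group: str                  # user group holding the grant
    trials: frozenset[str]      # trials the grant covers
    data_types: frozenset[str]  # data types the grant covers


def visible_records(records, grants, user_groups):
    """Deny by default: a record is visible only if a grant held by one of
    the user's groups covers both its trial and its data type."""
    active = [g for g in grants if g.group in user_groups]
    return [
        r for r in records
        if any(r.trial in g.trials and r.data_type in g.data_types for g in active)
    ]


# Illustrative data (hypothetical trials and groups)
records = [
    Record("P1", "TRIAL-A", "clinical"),
    Record("P1", "TRIAL-A", "genomic"),
    Record("P2", "TRIAL-B", "imaging"),
]
grants = [
    Grant("pathologists", frozenset({"TRIAL-A", "TRIAL-B"}), frozenset({"imaging"})),
    Grant("trial-a-team", frozenset({"TRIAL-A"}), frozenset({"clinical", "genomic"})),
]

print(visible_records(records, grants, {"pathologists"}))
```

Because one person may act in several roles, access in this sketch is computed from the union of all groups the user belongs to; a user with no matching grant sees nothing.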

The “semantic interoperability layer”, consisting of the core data set and flexible data and metadata models, will allow the easy integration of data and the implementation of intelligent workflows that incorporate knowledge of breast cancer. By adopting commonly agreed standards for data sharing and medical nomenclature, INTEGRATE will also be able to “talk” to other data-sharing platforms and become a data hub integrated in a wider network.
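The role of a core data set can be illustrated by mapping trial-specific field names and value encodings onto shared core concepts before data are pooled. In this sketch the concept names, local field names and encodings are hypothetical placeholders; a real semantic layer would bind each concept to a standard terminology such as SNOMED CT or LOINC, which the sketch does not attempt to reproduce.

```python
# Hypothetical per-trial mappings: local field -> (core concept, value translator).
# Concept names are illustrative placeholders, not SNOMED CT or LOINC codes.
MAPPINGS = {
    "TRIAL-A": {
        "age_yrs": ("age_years", int),
        "er_status": ("estrogen_receptor_status",
                      {"pos": "positive", "neg": "negative"}.get),
    },
    "TRIAL-B": {
        "AGE": ("age_years", lambda v: int(float(v))),
        "ER": ("estrogen_receptor_status",
               {"1": "positive", "0": "negative"}.get),
    },
}


def to_core(trial, local_record):
    """Translate one trial-specific record into core-concept form.

    Fields without a mapping are returned separately so they can be reviewed
    instead of being silently dropped; unrecognised coded values become None.
    """
    mapping = MAPPINGS[trial]
    core, unmapped = {}, []
    for field, value in local_record.items():
        if field not in mapping:
            unmapped.append(field)
            continue
        concept, translate = mapping[field]
        core[concept] = translate(value)
    return core, unmapped


print(to_core("TRIAL-A", {"age_yrs": "52", "er_status": "pos", "site": "X"}))
print(to_core("TRIAL-B", {"AGE": "47.0", "ER": "0"}))
```

Once both trials' records are expressed in the same core concepts, they can be queried together without the query needing to know each trial's local conventions.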


Support for patient screening in oncology clinical trials

INTEGRATE will provide tools to streamline the screening phase of breast cancer clinical trials. Before a patient is enrolled in a clinical trial, she (or he) must meet a number of eligibility criteria, such as age, cancer type and stage, or previous or concomitant treatments. INTEGRATE will facilitate this by managing lists of eligibility criteria for registered trials and by automating electronic data capture and the evaluation of the criteria. It will also provide interfaces for linking to electronic health records and extracting the clinical data relevant to eligibility, for acquiring molecular testing data from central laboratories, and for tracking biological samples.
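Automated evaluation of eligibility criteria can be sketched as follows, assuming each criterion is a predicate over a patient record and that missing data yield an "unknown" outcome for the clinician to resolve rather than a silent failure. The criteria, thresholds and field names are invented for illustration and are not taken from any BIG trial.

```python
from enum import Enum


class Outcome(Enum):
    MET = "met"
    NOT_MET = "not met"
    UNKNOWN = "unknown"  # required data missing from the patient record


def criterion(fields, predicate):
    """Build a criterion that needs `fields`; it returns UNKNOWN if any is missing."""
    def evaluate(patient):
        if any(patient.get(f) is None for f in fields):
            return Outcome.UNKNOWN
        return Outcome.MET if predicate(patient) else Outcome.NOT_MET
    return evaluate


# Hypothetical criteria for an illustrative trial
CRITERIA = {
    "age 18-70": criterion(["age"], lambda p: 18 <= p["age"] <= 70),
    "HER2-positive": criterion(["her2"], lambda p: p["her2"] == "positive"),
    "no prior chemotherapy": criterion(["prior_chemo"], lambda p: not p["prior_chemo"]),
}


def screen(patient, criteria=CRITERIA):
    """Evaluate all criteria: any NOT_MET excludes the patient, any UNKNOWN
    leaves eligibility pending, otherwise the patient is eligible."""
    results = {name: check(patient) for name, check in criteria.items()}
    if Outcome.NOT_MET in results.values():
        status = "ineligible"
    elif Outcome.UNKNOWN in results.values():
        status = "pending"
    else:
        status = "eligible"
    return status, results


status, results = screen({"age": 55, "her2": "positive"})
print(status, {name: r.value for name, r in results.items()})
```

Keeping "unknown" separate from "not met" lets the screening tool list exactly which data still need to be captured before a decision can be made.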

INTEGRATE will also provide tools for central review of data across trials, allowing panels of experts to be defined and providing a framework for accessing and annotating data. In this context, it will integrate tools for visualising and annotating digital pathology images.

Finally, tools for statistical and bioinformatics analyses of data will also be incorporated in INTEGRATE, and a collaborative environment will be provided where researchers can share and annotate statistical models built from the data. The modular nature of the architecture will make it possible to easily plug in analytical components on top of the data querying component.


Tools for researchers

The loose coupling between these components through a standards-based, service-oriented architecture will create a platform that can be readily adapted to changing requirements and integrate external components. This approach also ensures that parts of INTEGRATE can be reused in other contexts.


  • The development of the INTEGRATE environment follows a standards-based approach
  • The semantics of the data will be captured by standard terminology systems such as SNOMED CT, ICD and LOINC to facilitate compatibility with other available systems and to support their reuse
  • Interoperability will be achieved through the identification of a set of core concepts which fully cover the chosen clinical domain and the available data
  • The set of concepts will be validated by clinical and knowledge engineering experts to assure proper coverage and soundness
  • The needs of the clinical users have been collected and incorporated into clinical scenarios, which will constitute the basis of our technical development.
  • INTEGRATE will bring together newly developed technologies specifically produced for this effort and existing tools via an interoperable semantic layer
  • The environment will initially be evaluated in early breast cancer trials developed within the Breast International Group (BIG). It will subsequently be extended to advanced breast cancer and to other tumour types.

A first video showing the functionality of the patient-screening application for clinical trials has been uploaded to our website. To support dissemination of our tools and training of end users, similar videos demonstrating the other INTEGRATE tools will be produced.

Technology solution

Significant effort and financial investment in biomedical research and in the healthcare industry have resulted in a wealth of data, information and knowledge with the potential to bring substantial qualitative improvements in patient outcomes. However, the heterogeneity of data, the low adoption of shared standards, the fragmentation of methodologies, infrastructure and tools, the duplication of efforts, and insufficient collaboration across disciplines, organisations and industries limit the impact of these investments on both biomedical research and clinical care. To address these pressing issues, biomedical and clinical research and clinical care, especially for complex heterogeneous diseases such as cancer, need a harmonisation of efforts on all fronts: integrating the available data and knowledge into comprehensive models supported by interoperable infrastructures and tools, standardising methodologies, and achieving wide-scale data sharing and reuse as well as multidisciplinary collaboration.

The INTEGRATE project addresses both the technological standardisation aspects and the clinical aspects related to the standardisation of methodologies and data representation. The use of standards is essential in our project to ensure wide-scale adoption of our solutions and to enable sustainability. To facilitate information sharing and collaboration in the wider biomedical community, we need to use existing standards wherever possible. Additionally, building solutions on standards enables us to draw on previous efforts in data sharing, modelling and knowledge generation, and to access important external sources of data and knowledge.

Developing predictive models in an environment built together with the clinical trials community allows us to benefit from an accelerated adoption process into clinical practice. Additionally, by focusing on interoperability with existing clinical infrastructures, INTEGRATE aims to reduce the distance between research and clinical practice.

It is widely recognised today that further progress in the life sciences depends on our ability to develop and use common representations (ontologies, integrated vocabularies, etc.) to model and describe heterogeneous information. Established terminologies and ontologies are used in our project to give our framework a better semantic underpinning, providing the necessary semantic layer on top of the usual lexical/keyword approaches.

Detailed information on the standards used in the INTEGRATE project can be found in the project deliverables: “D2.1 State of the art report on standards” provides an overview of the standardisation landscape relevant for the project, while “D2.4 Initial System Architecture and Implementation Status” describes how the various standards are used by the INTEGRATE components (e.g. semantic interoperability layer, security components, end-user applications, etc.).

The main idea behind the INTEGRATE project is to efficiently close the loop between clinical research and clinical care by providing a standards-based, secure and semantically-aware data sharing environment and building essential tools for clinical research and clinical care.  The environment should support the data and knowledge flow between clinical research and clinical care, and speed up the transfer of new results from clinical research into care. The figure below depicts the areas to which the project contributes through infrastructure components and tools.

Figure: areas to which the INTEGRATE project contributes through infrastructure components and tools.

Technology choice: Proprietary technology, Standards-based technology, Mainly (or only) open standards, Open source software

Main results, benefits and impacts

Key applications developed by the project aim to:

  • Improve patient screening for clinical trials: Streamline the screening process by (semi-)automatically evaluating the criteria with the data of the individual patients
  • Enable cohort selection and analysis for the generation of research hypotheses and retrospective validation based on electronic health record (EHR) and clinical trial (CT) data
  • Support efficient collaborative review of pathology data.
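Cohort selection for hypothesis generation can be sketched as a filter over pooled records followed by a cross-tabulation of two variables, assuming the records have already been harmonised into common fields. The field names and the four records below are synthetic and purely illustrative; `select_cohort` and `cross_tab` are hypothetical helpers, not INTEGRATE functions.

```python
from collections import Counter

# Synthetic example records pooled from EHR and trial data (hypothetical fields).
# "pCR" stands for pathological complete response.
patients = [
    {"id": 1, "stage": "II", "marker": "high", "response": "pCR"},
    {"id": 2, "stage": "II", "marker": "low", "response": "no pCR"},
    {"id": 3, "stage": "II", "marker": "high", "response": "no pCR"},
    {"id": 4, "stage": "III", "marker": "high", "response": "pCR"},
]


def select_cohort(records, **conditions):
    """Keep records whose fields equal every given value."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]


def cross_tab(cohort, row, col):
    """Count records by the (row, col) value pair, e.g. marker vs response."""
    return Counter((r.get(row), r.get(col)) for r in cohort)


cohort = select_cohort(patients, stage="II")
table = cross_tab(cohort, "marker", "response")
print([r["id"] for r in cohort])
print(table)
```

A table like this only suggests a hypothesis; validating it would still require proper statistical analysis on a sufficiently large cohort.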

Figure 3 below includes screens of the patient-screening application for clinical trials. The application matches clinical trial eligibility criteria against the available patient data and helps the clinician efficiently identify the trials for which the patient may be eligible.

Figure 3: screens of the patient-screening application for clinical trials.

Figure 4 below depicts our application for central pathology review and collaboration. Raw patient image data are uploaded to the Central Repository (the INTEGRATE data sharing environment). The image data are processed (sliced into tiles) so that they can be viewed in the Central Pathology Review (CPR) Image Viewer. Moderators access these image data through the CPR platform and use them to define new Review Protocols. An internal mechanism allows Protocol Templates to be defined and subsequently used to open new Review Protocols. Additional collaboration tools assist moderators and reviewers (messaging service, notification services, conflict resolution mechanism). An image viewer and annotator helps reviewers record and store their observations on the image.
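The tiling step can be illustrated by computing a multi-resolution tile pyramid of the kind commonly used by web-based slide viewers, assuming square 256-pixel tiles and each level halving the image dimensions until a single tile suffices. The tile size, the stopping rule and the helper names are assumptions; the actual scheme used by the CPR Image Viewer is not described here, and reading or writing pixel data would additionally require an imaging library.

```python
import math


def pyramid_levels(width, height, tile=256):
    """List (width, height, cols, rows) for each resolution level, starting at
    full resolution and halving both dimensions until one tile suffices."""
    levels = []
    w, h = width, height
    while True:
        cols, rows = math.ceil(w / tile), math.ceil(h / tile)
        levels.append((w, h, cols, rows))
        if cols <= 1 and rows <= 1:
            break
        w, h = max(1, math.ceil(w / 2)), max(1, math.ceil(h / 2))
    return levels


def tile_bounds(level_w, level_h, col, row, tile=256):
    """Pixel box (left, top, right, bottom) of one tile; edge tiles are cropped."""
    left, top = col * tile, row * tile
    return left, top, min(left + tile, level_w), min(top + tile, level_h)


# Hypothetical whole-slide image of 100,000 x 80,000 pixels
levels = pyramid_levels(100_000, 80_000)
print(len(levels), levels[0])
print(sum(cols * rows for _, _, cols, rows in levels))
```

The viewer then only needs to fetch the tiles covering the visible region at the current zoom level, instead of transferring the whole multi-gigapixel image.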

Figure 4: application for central pathology review and collaboration.


The objectives of INTEGRATE are three-fold:

  1. to create a platform for the sharing of large multi-level datasets and knowledge generated by oncology clinical trials;
  2. to provide tools to streamline the way patients are screened for enrolment into clinical trials of targeted anticancer therapies;
  3. to provide, on top of the INTEGRATE data environment, tools that support research on retrospective data (cohort studies, analysis, predictive modelling, etc.) and collaboration.

Beyond the immediate benefits that it will bring to oncology research, INTEGRATE, through its open architecture, will be a test bed for new patterns of data sharing and collaboration between academic laboratories, non-profit research organisations and pharmaceutical and biotechnology companies (regardless of the disease domain on which they focus). Finally, it will foster the adoption of good practices in order to facilitate faster and better development of anticancer therapies.

INTEGRATE will develop a standards-based environment that supports coordinated and collaborative biomedical research by enabling optimal exploitation of breast cancer clinical trial data, and thus potentially contributes to significant advances in cancer research. The INTEGRATE methodology for the collection, processing, sharing and reuse of data will make the processing of breast cancer clinical trials more efficient, making better use of the associated financial investment, and will improve communication and speed up the transfer of new expertise and tools to patients and the clinical care setting.

INTEGRATE is being developed, and will first be used, in the context of the Breast International Group (BIG) research programme to support molecular screening and data sharing for its members and collaborators. In parallel, INTEGRATE is reaching out to other ICT initiatives (EURECA, p-medicine, TRANSCEND, Sage Bionetworks, EHR4CR, VPH NOE, CHIC, etc.) and end-user groups through meetings, participation in events and the INTEGRATE Dissemination and Launching Events (September 2013, May 2014). Publications at conferences and our six-monthly newsletter are also important channels for disseminating our results and connecting with the clinical and research communities.

Project goals:

  • Overcome obstacles to collaborative research
  • Better use of high-quality data
  • Analysis across disciplines to develop clinical and molecular biomarkers and tailored treatment
  • Speed delivery of new technologies and treatments to patients
  • Reduce economic burden of cancer research

Return on investment

Return on investment: Larger than €10,000,000

Track record of sharing

The communication and dissemination approach of INTEGRATE is implemented at different levels. It is based on solid project-level knowledge sharing and communication patterns, and gradually extends to different target user groups, from the BIG network to other collaborative networks in the biomedical domain and to the general public.

The major external dissemination activities aimed to:

  • Identify the different external user groups that could benefit from the INTEGRATE project’s results and the best channels of communication to reach them;
  • Effectively use these communication channels to present the INTEGRATE project’s results and tools;
  • Establish links and encourage synergies with similar projects and initiatives;
  • Provide the foundation of a comprehensive exploitation strategy;
  • Participate in convergence activities at EU-level and co-organise dissemination events (workshops, summer schools, conference sessions).

Although our focus has been clinical research in breast cancer, many of our solutions and tools can be applied in different clinical domains and the approaches to data sharing and collaboration are generic enough to be relevant for the wide biomedical community.

To share the project results with the community, we have established links with several other data-sharing initiatives (e.g. Sage Bionetworks in the US) and with other projects (p-medicine, CHIC, etc.), participated in conferences and events, and published newsletters that included, alongside the latest project results, interviews with opinion leaders in the field presenting their perspectives on the need for data sharing and collaboration. To promote our project in the user community, we recently organised a mini-symposium, “INTEGRATE – The Potential of Data Sharing”, at the European Cancer Congress 2013 (Amsterdam, The Netherlands, 27 September 2013).

The INTEGRATE public website presents general project information, participant information, downloadable publications and deliverables. Furthermore, it informs viewers about previous and forthcoming events and activities of the project.

To ensure that the project outcomes fulfil the needs of our user community we will organise evaluation & validation workshops with clinical experts from different healthcare organisations in the EU and beyond, able to contribute a range of distinct perspectives and contexts to the project.

Lessons learnt

  • There is still a significant need in the biomedical community for tools and solutions supporting wide-scale data sharing and collaboration. While relevant standards exist or are emerging, there is still a high degree of heterogeneity and ad-hoc development.
  • More profiles and guidelines on using standards need to be developed, because standards in the medical domain still leave considerable room for variation in implementation, which encourages heterogeneity and impairs interoperability. At the same time, standards-based solutions need to take into account the specificities of each clinical domain, which creates a need for customisation.
  • To enable data sharing and collaboration, we need to move beyond technical issues: the right mindset must be built in the community, and an ecosystem that encourages sharing and open communication must be supported.

Scope: International