PROTEUS — PRACTICE

Pooling and Exchanging Data: Chapter 15

In this excerpt (Chapter 15) from the PROTEUS-Practice Guide, you’ll discover some of the approaches to data architecture and curation available for patient-reported outcome (PRO) systems.

This webpage contains the entire contents of Chapter 15. You can also download the PROTEUS-Practice Guide by clicking here.

Key Points

Pooled data can be stored either in centralized data warehouses or in distributed data warehouses
Identifying an appropriate data model and associated meta-data is an important aspect of maximizing the utility of pooled patient-reported outcome (PRO) data, and there are numerous models to choose from

Overview

Combining PRO data collected across different medical sites and at different points in time can create robust datasets, allowing more meaningful research questions to be answered. However, it is important to ensure some degree of consistency across the aggregated data and in how they are specified in the data model.

There are two overarching approaches to data architecture and curation. The first is through a centralized warehouse, which stores all the extracted data from many sites. The second is through a federated/distributed warehouse, wherein sites maintain their own data and respond to data analysis queries independently.

There are hundreds of available data models to choose from. Several factors must be considered when choosing a data model, including the granularity and specificity of the data and the clinical domains supported by the data model. Examples of popular data models include PCORnet, Consolidated Clinical Document Architecture (CCDA), and Shared Health Research Informatics Network (SHRINE).

Questions and Considerations

A. What are the different architectural approaches?

Centralized data warehouse

Centralized data warehouses store all the data extracted from many different sites that use a given system
Centralized data warehouses are typically maintained by a coordinating center that ensures that the data are entered into the warehouse and available for use
Centralized data storage facilitates better data analysis, as it allows statisticians to see which data were collected and which are missing, and to conduct quality checks
All sites contributing to the centralized warehouse must address legal, regulatory, and proprietary data sharing issues
Contributing sites need to agree on a standard data interchange format
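
The centralized approach above can be illustrated with a minimal sketch. It assumes a hypothetical interchange format (the ProRecord fields, site names, and function names are invented for illustration and do not come from any specific system): each site submits records in the agreed format, and the coordinating center pools them and checks which scores are missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProRecord:
    """One PRO observation in a hypothetical agreed interchange format."""
    site_id: str
    patient_id: str
    measure: str            # e.g. the name of a questionnaire
    score: Optional[float]  # None when no response was recorded


def pool(site_extracts: dict[str, list[ProRecord]]) -> list[ProRecord]:
    """Combine every site's extract into one central list."""
    pooled: list[ProRecord] = []
    for records in site_extracts.values():
        pooled.extend(records)
    return pooled


def missing_by_site(pooled: list[ProRecord]) -> dict[str, int]:
    """Count records with no score per site, as a simple quality check."""
    counts: dict[str, int] = {}
    for record in pooled:
        counts.setdefault(record.site_id, 0)
        if record.score is None:
            counts[record.site_id] += 1
    return counts


# Illustrative extracts from two made-up sites.
extracts = {
    "site_a": [
        ProRecord("site_a", "p1", "pain", 4.0),
        ProRecord("site_a", "p2", "pain", None),
    ],
    "site_b": [ProRecord("site_b", "p9", "pain", 7.0)],
}
warehouse = pool(extracts)
missing = missing_by_site(warehouse)
```

Because every record sits in one place, a check such as missing_by_site can be run once over the whole pooled dataset rather than separately at each site.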

Federated/distributed data warehouse

In federated/distributed data warehouses, data are kept in a locally maintained data warehouse at each site
Data analysis queries can be submitted to the local sites, which run the analysis and respond with summary data
There are fewer organizational concerns about sharing potentially identifiable patient data, as the local site has control of the data and only reports aggregated results
Although data are held locally, this approach still requires the sites to agree on how local data types and permitted values map to the standard values and formats
Record linkage, such as linking a patient's records across sites or to other data sources, is more difficult
It may be difficult or impossible to replicate analyses, since they are conducted at the local level
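
The federated pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the design of any particular network: the site names, local codes, and function names are hypothetical. Each site maps its local code to the agreed standard value, runs the query on its own data, and returns only a count and a sum, which the coordinator combines.

```python
# Hypothetical local stores: each site keeps its own patient-level rows
# and uses its own local code for the same measure.
SITE_DATA = {
    "site_a": {
        "code_map": {"PAIN_NRS": "pain"},
        "rows": [("PAIN_NRS", 4.0), ("PAIN_NRS", 6.0)],
    },
    "site_b": {
        "code_map": {"pain-score": "pain"},
        "rows": [("pain-score", 8.0)],
    },
}


def run_local_query(site: dict, measure: str) -> dict:
    """Runs at the site and returns summary data only, never the rows."""
    scores = [
        score
        for code, score in site["rows"]
        if site["code_map"].get(code) == measure
    ]
    return {"n": len(scores), "total": sum(scores)}


def federated_mean(measure: str) -> float:
    """Combine per-site summaries into an overall mean score."""
    summaries = [run_local_query(site, measure) for site in SITE_DATA.values()]
    n = sum(s["n"] for s in summaries)
    if n == 0:
        raise ValueError(f"no site reported data for {measure!r}")
    return sum(s["total"] for s in summaries) / n


overall = federated_mean("pain")
```

The coordinator never sees individual scores, which is why sharing concerns are reduced. It is also why analyses that need patient-level rows, such as record linkage, are harder under this approach.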

B. What are the considerations for choosing a common data model?

Pooled PRO data have little value if there is not a consistent data model and meta-data
Considerations when selecting or creating a common data model include:
- Granularity of data and whether person-level analyses are supported
- De-identification and other limitations of data sets, including bins or categories of data rather than specific values (e.g. age range rather than date of birth)
- Data specificity (e.g. how de-identification was handled with respect to dates)
- Clinical domain(s) in the data model
- Governance of the data model
- Whether the model uses standard interoperability references
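
As an illustration of the de-identification consideration above (an age range rather than a date of birth), the following sketch replaces a date of birth with a binned age. The field names and the 10-year band width are assumptions chosen for the example, not a regulatory standard or a feature of any listed data model.

```python
from datetime import date


def age_on(birth: date, as_of: date) -> int:
    """Age in whole years on the reference date."""
    years = as_of.year - birth.year
    # Subtract one year if the birthday has not yet occurred in as_of's year.
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1
    return years


def age_band(age: int, width: int = 10) -> str:
    """Bin an age into a range such as '40-49' (band width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, as_of: date) -> dict:
    """Return a copy of the record with date_of_birth replaced by age_band."""
    out = {key: value for key, value in record.items() if key != "date_of_birth"}
    out["age_band"] = age_band(age_on(record["date_of_birth"], as_of))
    return out


# Hypothetical patient-level record.
raw = {"patient_id": "p1", "date_of_birth": date(1980, 6, 15), "pain": 4.0}
shared = deidentify(raw, as_of=date(2024, 1, 1))
```

Binning of this kind lowers the granularity of the pooled data: analyses that need exact ages or dates are no longer possible, which is the trade-off the data model's documentation should make explicit.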

C. What are some examples of data models to choose from?

There are hundreds of data models to consider
Here are examples of several popular options
- PCORnet: Developed by the Patient-Centered Outcomes Research Institute (PCORI). It describes the meaning of each data item and, in some instances, the context in which the data were collected
- Consolidated Clinical Document Architecture (CCDA): A general-purpose XML-based clinical data interchange format. It is commonly available in electronic health records that are certified by the Office of the National Coordinator – Authorized Testing and Certification Body. It is often used to move data from one system to another when the two systems have different internal data models
- i2b2 – Shared Health Research Informatics Network (SHRINE): An open-source, XML-based network that allows groups to link their aggregated counts of patients meeting selected inclusion and exclusion criteria for demographics and other variables
- Project-specific ad hoc data models: As opposed to choosing from an existing data model, a new data model can be created that includes only the data required for a specific project


Relevant Primary Resources

The information presented here is an overview of pooling and exchanging data. For more detailed information please see the following sources:

Article

Users’ Guide to Integrating Patient-Reported Outcomes in Electronic Health Records


Background and Citing the PROTEUS-Practice Guide

Nothing in this Guide should be construed to represent or warrant that persons using this Guide have complied with all applicable laws and regulations. All individuals and organizations using this Guide are responsible for complying with the applicable laws, regulations, and regulatory requirements of the relevant jurisdiction.

Each chapter of the Guide lists the key foundational resources that informed its content. To appropriately recognize the foundational resources, we encourage you to cite both the Guide and the relevant foundational resource(s). Recommended citations are provided here.

Suggested Citation

The PROTEUS Guide to Implementing Patient-reported Outcomes in Clinical Practice
A synthesis of resources. Prepared by Crossnohere N, Brundage M, Snyder C, and the Advisory Group, 2023. Available at: TheProteusConsortium.org.
