The HCSRN’s Virtual Data Warehouse (VDW) facilitates multi-site research while protecting patient privacy and proprietary health practice information. Originally developed by the Cancer Research Network, the VDW now supports studies of cancer, drug safety, cardiovascular disease, mental health, and more. HCSRN organizations agree on data to make available for research and derive standard definitions and formats – but each HCSRN member organization maintains control of its own data via a “distributed” or “federated” model. That is, the VDW is not a central database. Instead, administrative, clinical, and claims data are translated to a common set of data standards at each site.
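As a rough illustration of what translating local data to a common standard can look like, the Python sketch below maps one locally formatted row into a shared record structure. It is a minimal sketch only: the field names (`member_no`, `gender_cd`, `dob`), the code mapping, and the `CommonDemographics` structure are hypothetical and are not taken from the VDW specifications, and sites typically do this work in SAS rather than Python.

```python
from dataclasses import dataclass


# Hypothetical common-format record; field names are illustrative only
# and do not reproduce any actual VDW specification.
@dataclass(frozen=True)
class CommonDemographics:
    person_id: str
    sex: str         # "F", "M", or "U" (unknown)
    birth_year: int


# Hypothetical site-specific codes mapped to the shared code set.
SITE_SEX_CODES = {"1": "M", "2": "F"}


def to_common(local_row: dict) -> CommonDemographics:
    """Translate one locally formatted row into the common structure."""
    return CommonDemographics(
        person_id=str(local_row["member_no"]),
        # Any local code not in the mapping becomes "U" (unknown).
        sex=SITE_SEX_CODES.get(str(local_row["gender_cd"]), "U"),
        # Assumes the local date of birth is a "YYYY-MM-DD" string.
        birth_year=int(str(local_row["dob"])[:4]),
    )


row = {"member_no": 1001, "gender_cd": "2", "dob": "1960-05-17"}
record = to_common(row)
```

Each site would keep its own mapping table, so the same analytic program can read every site's output without knowing the local codes.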

The VDW is a cornerstone of HCSRN collaborative research, protecting privacy and fostering standardization.



Governance and Management

  • HCSRN Board: Provides overall policy and direction setting about content, resources, and access.
  • VDW Operations Committee (VOC): Manages development activities across the HCSRN and provides technical input.
  • VDW Data Area Workgroups: Define, maintain, and interpret data file specifications, propose new variables, identify site-specific issues with data standards, and provide scientific input for each data area.
  • VDW Implementation Group (VIG): Site data managers and others who extract data from local systems, convert it to standard VDW structures, ratify the data specifications, and share best practices.

Financing and Staff

  • VOC staff support comes through a modest annual contribution that HCSRN members make to a common operations fund. Member sites also support the analysts at each site who map local data to the common format, as well as the PIs and analysts who serve on Data Area Workgroups and the VIG.


  • We use published data standards where available (e.g., CPT-4) and create our own when necessary.

Hardware and Software

  • Each site needs hardware and software (mainly SAS®) to store, retrieve, process, and manage datasets. VDW files are separate from health plan files. We have found that research activities are better supported when a specialized research data warehouse is maintained at each member's center.


  • Sites contribute limited data documentation (e.g., the source of variables and any variation) to a password-protected Web portal.

Quality Control

  • Periodic checks look at ranges, cross-field agreement, implausible data patterns, and cross-site comparisons.
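The kinds of checks listed above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the HCSRN's actual quality-control programs: the field names (`birth_year`, `enr_start`, `enr_end`), the thresholds, and the function names are all hypothetical.

```python
from datetime import date
from statistics import median


def check_range(rows, field, low, high):
    """Return rows whose value for `field` falls outside [low, high]."""
    return [r for r in rows if not (low <= r[field] <= high)]


def check_cross_field(rows):
    """Return rows where the enrollment end date precedes the start date."""
    return [r for r in rows if r["enr_end"] < r["enr_start"]]


def flag_outlier_sites(site_rates, tolerance):
    """Return sites whose rate differs from the cross-site median by more than `tolerance`."""
    mid = median(site_rates.values())
    return sorted(s for s, rate in site_rates.items() if abs(rate - mid) > tolerance)


# Hypothetical records, for illustration only.
rows = [
    {"person_id": "A", "birth_year": 1950,
     "enr_start": date(2010, 1, 1), "enr_end": date(2012, 12, 31)},
    {"person_id": "B", "birth_year": 1850,
     "enr_start": date(2011, 1, 1), "enr_end": date(2011, 6, 30)},
    {"person_id": "C", "birth_year": 1980,
     "enr_start": date(2015, 6, 1), "enr_end": date(2014, 6, 1)},
]

out_of_range = check_range(rows, "birth_year", 1900, 2025)
inconsistent = check_cross_field(rows)
outlier_sites = flag_outlier_sites(
    {"site_a": 0.10, "site_b": 0.12, "site_c": 0.40}, tolerance=0.1
)
```

A flagged record or site is a prompt for review rather than proof of an error, since source data can legitimately differ across sites.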

Specification Update Procedures

  • Specification and standards updates are developed by Data Area Workgroups. The VOC and VIG review them and set implementation milestones and target dates.

Available Data

  • Each institution’s VDW data remain at its site until a study-specific need arises. Only the minimum necessary data are extracted, and only after ethical, contractual, and HIPAA requirements are met.

Data Domains

  • Laboratory Results
  • Enrollment and Demographics
  • Cancer
  • Pharmacy
  • Utilization
  • Vital Signs and Social History
  • Census




  1. Work with collaborators to develop the study protocol.

  2. Obtain relevant regulatory, contractual and ethical approvals.

  3. Develop a SAS analytic program at the lead study site.

  4. The project sites run the program against their local VDW and return project-specific datasets (often aggregate) to the lead site for data pooling. This process may be iterative, depending on the complexity and availability of the data within the VDW.
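The run-locally, pool-centrally pattern in steps 3 and 4 can be sketched as follows. This is a simplified Python illustration, not an actual HCSRN program (those are written in SAS, per step 3): the functions `site_program` and `pool` and the fields `age_group` and `has_condition` are hypothetical.

```python
from collections import Counter


def site_program(local_rows):
    """Runs at each site against local data; returns aggregate counts only."""
    return Counter(r["age_group"] for r in local_rows if r["has_condition"])


def pool(site_results):
    """Runs at the lead site; sums the aggregate counts returned by each site."""
    total = Counter()
    for counts in site_results:
        total.update(counts)
    return dict(total)


# Hypothetical row-level data that never leaves each site.
site_a = [
    {"age_group": "40-64", "has_condition": True},
    {"age_group": "65+", "has_condition": False},
]
site_b = [
    {"age_group": "40-64", "has_condition": True},
    {"age_group": "65+", "has_condition": True},
]

# Only the per-site aggregates are sent to the lead site for pooling.
pooled = pool([site_program(site_a), site_program(site_b)])
```

Because every site stores data in the same structure, the same program runs unchanged at each site, and only the aggregates cross institutional boundaries.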




  • Underlying data are collected for treatment, payment, and operations – not for research.

  • Source data vary substantially within and across sites.

  • Health plans continually change their information systems, often requiring adaptation or re-implementation at sites.

  • Sharing data beyond project collaborators is complicated for technical, regulatory, and political reasons.

  • Maintaining these processes takes money. Project-specific grant funding does not support the level of cross-site and cross-project upkeep and knowledge sharing that is needed.

It takes time to:

  • Agree on the need for a new variable or data area.

  • Develop clear specifications to guide implementers and end-users.

  • Implement new variables at each site.

  • Verify and document the implementations.

  • Consult with users throughout.




The VDW is an example of a distributed (or federated) data-sharing model based on electronic clinical, claims, and administrative health care data. It is applicable to multi-site health services and population health research. With planning and ongoing funding, it yields data across institutions and over time. Benefits for multi-site analysis include:

  • Improved data efficiency, accuracy, and completeness.

  • Analytical precision plus patient and institutional protections.

  • More generalizable results.