The Penn Medicine BioBank (PMBB) is a research program that recruits participants throughout the University of Pennsylvania Health System (5.7 million outpatient visits in fiscal year 2018), enrolling them at the time of outpatient visits. Patients participate by completing a questionnaire, donating a blood sample, and allowing researchers access to their Electronic Health Record (EHR) information. The PMBB has recruited approximately 60,000 participants to date. Nearly 25% of PMBB participants (n ≈ 15,000) are of African ancestry, consistent with the demographics of the greater Philadelphia catchment area. This makes the PMBB one of the largest single-institution academic biobanks of individuals of African ancestry in the country. Approximately 20,000 of these participants have been genotyped, with genotypes imputed to the 1000 Genomes (1KG) Phase 3 reference panel using the Michigan Imputation Server, and 12,000 have whole-exome sequence data. These data are from PMBB-Release-2017.

Penn Medicine BioBank Team:

  • Meghan Livingstone
  • Yuki Bradford
  • Scott Damrauer
  • Stephanie DerOhannessian
  • Scott Dudek
  • Mike Feldman
  • Renae Judy
  • Anastasia Lucas
  • Daniel Rader
  • Marjorie Risman
  • Marylyn Ritchie
  • Giorgio Sirugo
  • Anurag Verma
  • Shefali Verma
  • JoEllen Weaver

Regeneron Genetics Center

The genetic data included in the PMBB allele frequency browser are in genome build Genome Reference Consortium Human Build 37 (GRCh37), also known as hg19.

We derive clinical information about PMBB participants from the Electronic Health Record (EHR). The Penn Medicine EHR is built on Epic and is used by all Penn Medicine clinicians, with consistent clinical definitions, formularies, clinical decision support, and other features. PennChart, the branded Epic product, now supports all 2,000 physicians practicing throughout the Penn health system. Penn Data Store (PDS), Penn Medicine's clinical data warehouse, contains over 6 million patient records and other discrete clinical information combined from 12 source systems across the enterprise. The PDS uses standardized vocabularies from national coding systems, including SNOMED, LOINC, and RxNorm, and all data are being modeled in the OMOP Common Data Model.

Population Characteristics of Penn Medicine BioBank

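|                                | Genotype Chip   | Exome Sequence |
|--------------------------------|-----------------|----------------|
| Total patients                 | 19,515          | 10,900         |
| Female (%)                     | 7,856 (41%)     | 4,432 (40.7%)  |
| Median age at enrollment (yr)  | 66              | 67             |
| Body mass index                | 30.02 (13 - 83) |                |
| Race                           |                 |                |
|   European American            | 11,580          | 8,198          |
|   African American             | 5,988           | 2,172          |
|   Admixed American             | 795             | 304            |
|   East Asian                   | 79              | 79             |
|   South East Asian             | 114             | 114            |
|   Other                        | 575             | 33             |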

Here we show the first two principal components of ancestry, computed from common variants. Colors indicate ancestry groups: AFR = African; AMR = Admixed American; EAS = East Asian; EUR = European; SAS = South Asian.
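This page does not describe how the principal components were computed. As a rough illustration only, the sketch below standardizes a samples-by-variants dosage matrix (0, 1, or 2 copies of the alternate allele) and takes the top two components with NumPy's SVD, a common approach in ancestry PCA. The function name `top_pcs` is hypothetical. The sketch assumes there are no missing genotypes and skips LD pruning and projection onto a reference panel such as 1KG, steps a production pipeline would typically include.

```python
import numpy as np


def top_pcs(genotypes: np.ndarray, k: int = 2) -> np.ndarray:
    """Return top-k principal component scores (samples x k).

    genotypes: samples x variants matrix of alternate-allele dosages (0, 1, 2),
    assumed to contain no missing values (hypothetical helper, not the PMBB pipeline).
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0  # alternate-allele frequency per variant
    # Drop monomorphic variants: they carry no ancestry signal
    # and would cause a division by zero below.
    keep = (freq > 0.0) & (freq < 1.0)
    g, freq = g[:, keep], freq[keep]
    # Center each variant and scale it by its binomial standard deviation.
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    # Thin SVD: Z = U S V^T; sample scores on PC i are U[:, i] * S[i].
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


if __name__ == "__main__":
    # Toy data (not PMBB data): two simulated groups with very different
    # allele frequencies at 200 variants.
    rng = np.random.default_rng(0)
    group_a = rng.binomial(2, 0.1, size=(50, 200))
    group_b = rng.binomial(2, 0.9, size=(50, 200))
    pcs = top_pcs(np.vstack([group_a, group_b]))
    print(pcs.shape)  # (100, 2)
```

On this toy data, the first component separates the two simulated groups. Real ancestry structure is far subtler than this.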

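Relatedness between individuals was estimated using identity-by-descent (IBD). We removed related individuals using a PI_HAT threshold of 0.25, the value expected for second-degree relatives such as half-siblings or a grandparent and grandchild. Closer relatives, such as parent-offspring pairs and full siblings, have an expected PI_HAT of 0.5. First cousins have an expected PI_HAT of 0.125.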
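The relatedness filter can be sketched as follows. PI_HAT, as reported by PLINK, combines the estimated probabilities of sharing one (Z1) or two (Z2) alleles IBD as Z2 + Z1/2. This page does not state which member of each related pair was dropped. The greedy rule below is an assumption: it drops the sample involved in the most related pairs and breaks ties by ID. The strict `>` comparison is also an assumption, and `pi_hat` and `prune_related` are hypothetical names.

```python
from collections import Counter


def pi_hat(z1: float, z2: float) -> float:
    """Proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def prune_related(pairs, threshold=0.25):
    """Return sample IDs to drop so no retained pair has PI_HAT > threshold.

    pairs: iterable of (id1, id2, z1, z2) tuples, one per sample pair.
    The greedy removal rule is an illustrative assumption.
    """
    related = [(a, b) for a, b, z1, z2 in pairs if pi_hat(z1, z2) > threshold]
    dropped = set()
    while related:
        # Count how many remaining related pairs each sample appears in.
        counts = Counter(s for pair in related for s in pair)
        # Drop the most-connected sample; break ties by the smaller ID.
        worst = min(counts, key=lambda s: (-counts[s], s))
        dropped.add(worst)
        related = [(a, b) for a, b in related if worst not in (a, b)]
    return dropped


if __name__ == "__main__":
    # Toy example: P1 is the parent of siblings P2 and P3; P4 and P5 are first cousins.
    pairs = [
        ("P1", "P2", 1.0, 0.0),   # parent-offspring, PI_HAT 0.5
        ("P1", "P3", 1.0, 0.0),   # parent-offspring, PI_HAT 0.5
        ("P2", "P3", 0.5, 0.25),  # full siblings, PI_HAT 0.5
        ("P4", "P5", 0.25, 0.0),  # first cousins, PI_HAT 0.125
    ]
    print(sorted(prune_related(pairs)))  # ['P1', 'P2']
```

In the toy family, all three pairs exceed the threshold. The sketch first drops P1, then P2, and keeps P3. The first-cousin pair falls below 0.25 and is left untouched.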