Genes & Health has generated and collated genetic and NHS health data from multiple sources, with individual written volunteer consent.
Health Data
At Genes & Health, we annually refresh our repository of health data from:
- NHS Primary care (from London GP surgeries via Integrated Care Boards). Includes diagnoses, pathology, prescribing, demographics (including Townsend and IMD deprivation indices) and many other data types.
- NHS Secondary care (from London and Bradford NHS Trust hospitals). Includes diagnoses, pathology, inpatient prescribing, imaging results (and in some cases images), specialist datasets (e.g. iQemo, Aria).
- National NHS England (formerly NHS Digital). Includes HES, cancer, maternity, demographics and multiple other datasets.
- The one-page questionnaire at study entry. This questionnaire gathers brief demographic information (for recontact and to obtain NHS number), current health status (including type 2 diabetes), and details of parental relatedness.
- Other data collected at volunteer recall, or by REDCap or Qualtrics survey. For example: attitudes towards the genetic risks associated with type 2 diabetes; a whole-cohort mental health questionnaire and online cognitive assessment (from 2024).
Genetic Data
Multiple types of genetic data have been generated, including:
- Illumina GSAv3 chip genotyping arrays (all volunteers, open access) with TOPMed-r3 imputation. Note that r3 gives a substantial improvement over r2 for Genes & Health because there are many more South Asian volunteers in the r3 WGS panel.
- Low-mid depth exome sequencing* (~5000 volunteers from 2015-2019, sequenced at Wellcome Sanger Institute, open access). * We no longer recommend use of this dataset; it has been superseded by the high depth exomes.
- High depth exome sequencing (all volunteers, open access following a priority period for Industry Consortium partners). 150 bp paired-end high depth exomes using Twist Clinical Research Alliance reagent (sequenced at Broad Institute).
GWAS data, precomputed for all available phenotypes, are freely downloadable under a CC BY-SA licence.
(To download, install the gcloud CLI following the Google Cloud SDK documentation, then access the data at gs://genesandhealth_publicdatasets/)
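As a minimal sketch of the access step, assuming the gcloud CLI is already installed and on your PATH, the bucket contents can be listed as follows. The bucket path is the one given above; the object path in the commented copy command is a placeholder to be replaced with a real entry from the listing, and whether sign-in is required is not stated here.

```shell
# Bucket path as given in the text above.
BUCKET="gs://genesandhealth_publicdatasets/"

if command -v gcloud >/dev/null 2>&1; then
    # List the top-level contents of the bucket.
    gcloud storage ls "$BUCKET" \
        || echo "Listing failed; check network access and gcloud configuration." >&2
    # To copy a file locally, substitute a real object path from the listing
    # (the path below is a placeholder, not an actual file name):
    # gcloud storage cp "${BUCKET}<path-from-listing>" .
else
    echo "gcloud not found; install it from the Google Cloud SDK documentation first." >&2
fi
```

Listing first and copying second avoids guessing at file names, since the layout of the bucket is not described on this page.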
Omic Data
To date, Genes & Health has re-engaged with over 2,000 individuals, inviting them to participate in ‘recall by genotype’ studies. Participation in these studies necessitates a visit to one of our clinical facilities located in Whitechapel, Bradford, or Manchester, during which participants contribute small samples of blood and/or urine.
In 2025, we will release datasets of:
- single cell RNA-seq on 1,521 individuals (CARDINAL project, PI Nicole Soranzo), as well as additional CITE-seq, ATAC-seq and 5′ scRNA-seq on smaller subsets.
- plasma proteomics (Olink, SomaLogic, Seer) on ~1,500-1,700 individuals.
- plasma metabolomics (Metabolon) on ~1,700 individuals.