Submit SNP Array Data

PharmGKB defines SNP array data as data on genomic alterations, such as single nucleotide polymorphisms (SNPs), that are detected with high-density arrays (e.g., Affymetrix 100K arrays). These arrays may also detect deletions, amplifications, and loss of heterozygosity on a genome-wide scale.

Currently, PharmGKB accepts data from the following platforms:

  • Affymetrix
  • Illumina

In addition to the raw data files, PharmGKB requires a completed SNP Array Template for each submission. The data in this template ensure that your submission carries the minimal metadata needed to index it appropriately on the website. Please note that existing fields in the template, including the instructions, must not be altered; altering the template will cause your submission to be rejected.

If your SNP array covers more than 65,536 positions, they will not fit in the Excel file. In this case, please provide a text file named rsids.txt that contains the list of rsIDs, one per line. If a position does not have an rsID, use the Golden Path position in the format "chrZ:N" (Z being the chromosome number and N being the numerical portion of the position), for example chr5:176811098. Zip the Excel file and the rsids.txt file together and submit the zip file instead.
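As a rough illustration of the file layout described above, the following Python sketch writes rsids.txt and zips it together with the template. The function names, the default zip file name, and the (rsid, chrom, pos) tuple input are assumptions made for this example; they are not part of any PharmGKB tooling.

```python
import zipfile
from pathlib import Path


def format_position(rsid, chrom, pos):
    """Return the rsID if there is one, else the Golden Path position as chrZ:N."""
    if rsid:
        return rsid
    return f"chr{chrom}:{pos}"


def write_rsids(markers, out_path="rsids.txt"):
    """Write one identifier per line; markers is an iterable of (rsid, chrom, pos)."""
    lines = [format_position(rsid, chrom, pos) for rsid, chrom, pos in markers]
    Path(out_path).write_text("\n".join(lines) + "\n")
    return lines


def bundle(template_path, rsids_path, zip_path="submission.zip"):
    """Zip the completed Excel template together with rsids.txt (hypothetical zip name)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(template_path, arcname=Path(template_path).name)
        zf.write(rsids_path, arcname=Path(rsids_path).name)
    return zip_path
```

For example, a marker without an rsID on chromosome 5 at position 176811098 would be passed as `(None, 5, 176811098)` and written as `chr5:176811098`.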

PharmGKB subject and sample IDs can be obtained from submission@pharmgkb.org.

The SNP Array Template file, or the zip file containing the template and rsids.txt, should be submitted by email to submission@pharmgkb.org.

For Affymetrix data, PharmGKB requires the Affymetrix .CEL file. This file is generated from the .dat file by Affymetrix MAS 5 or Affymetrix GeneChip Operating Software (GCOS). The native GCOS .CEL format is a proprietary binary format. Before uploading GCOS .CEL files, open the GCOS Manager program and export the .CEL file; the export converts it into a text file that the database can read.

For Illumina data, PharmGKB requires the Illumina FinalReport file, exported from the Illumina software as a .txt file. The following columns are required:

  • GC Score
  • Allele1 - Top
  • Allele2 - Top
  • Top Genomic Sequence
  • Chr
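
Before uploading, it may help to confirm that the required columns listed above are present. The Python sketch below checks a single column-header line. It assumes the export is tab-delimited and that you have already located the header line in the file; both are assumptions, and the function name is hypothetical.

```python
# Column names taken from the list of required FinalReport columns.
REQUIRED_COLUMNS = [
    "GC Score",
    "Allele1 - Top",
    "Allele2 - Top",
    "Top Genomic Sequence",
    "Chr",
]


def missing_columns(header_line, delimiter="\t"):
    """Return the required columns that are absent from a column-header line."""
    # Assumption: columns are separated by tabs; pass another delimiter if not.
    present = {name.strip() for name in header_line.rstrip("\r\n").split(delimiter)}
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

An empty list means every required column was found; otherwise the list names the columns to add to the export.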

The Affymetrix .CEL file or the Illumina FinalReport file should be submitted by FTP to ftp.pharmgkb.org. Please contact us for login information.
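
If you prefer to script the transfer, the following is a minimal sketch using Python's standard ftplib module. The user name and password are placeholders for the login information supplied by PharmGKB, the function names are hypothetical, and the sketch assumes files are uploaded to the default directory after login.

```python
from ftplib import FTP
from pathlib import Path


def upload_file(ftp, local_path):
    """Upload one file in binary mode over an already-open FTP connection."""
    name = Path(local_path).name
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {name}", fh)
    return name


def submit(local_path, user, password, host="ftp.pharmgkb.org"):
    """Connect, log in with the credentials supplied by PharmGKB, and upload."""
    # Assumption: no change of directory is needed after login.
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        return upload_file(ftp, local_path)
```

Keeping `upload_file` separate from the connection setup makes it possible to send several .CEL or FinalReport files over one session.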

If the submission passes validation, it will be given a Submission ID and placed in the queue to be processed by PharmGKB. Once processed, it will be uploaded to the PharmGKB preview site, and you will receive an email letting you know that it is available for review.

If you have any further questions or comments, or need assistance, please contact the PharmGKB team.

PharmGKB® is a registered trademark of HHS and is financially supported by NIH/NIGMS. It is managed at Stanford University (GM61374).
©2001-2010 PharmGKB.