X-chromosomal and autosomal data from the Human Genome Diversity Panel, analyzed in S Ramachandran, NA Rosenberg, MW Feldman, and J Wakeley (2008), "Population differentiation and migration: coalescence times in a two-sex island model for autosomal and X-linked loci". Theor Pop Biol. Vol. 74:291-301

  • readme [txt]
  • Plot of fraction of heterozygous loci out of those loci with non-missing data, from 36 X-linked loci genotyped in 1064 individuals. [pdf]
  • Archive of X-linked and autosomal genotype data files used in this study. [tar archive] [zip archive]
  • Note: the archives generate a new directory when extracted, and also include the readme and plot available above.
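
The quantity shown in the plot above, the fraction of heterozygous loci out of those loci with non-missing data, can be sketched for a single individual as follows. This is a minimal illustration, not the code used in the study: the genotype representation (a pair of allele labels per locus), the use of None for missing data, and the function name heterozygous_fraction are all assumptions made for the example, and the files in the archive may code genotypes differently (see the readme).

```python
from typing import Optional, Sequence, Tuple

# A genotype is a pair of allele labels; None marks missing data.
# This coding is an assumption for illustration only.
Genotype = Optional[Tuple[str, str]]


def heterozygous_fraction(genotypes: Sequence[Genotype]) -> Optional[float]:
    """Return the fraction of heterozygous loci among loci with non-missing data.

    Returns None when every locus is missing, since the fraction is undefined.
    """
    observed = [g for g in genotypes if g is not None]
    if not observed:
        return None
    heterozygous = sum(1 for a, b in observed if a != b)
    return heterozygous / len(observed)


# Hypothetical individual typed at five loci, one of which is missing.
example = [("120", "124"), ("98", "98"), None, ("201", "205"), ("150", "150")]
fraction = heterozygous_fraction(example)
print(fraction)
```

For X-linked loci, an individual carrying a single X chromosome has only one allele per locus, so a fraction well above zero for such an individual would point to a genotyping or labeling problem.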

Outbreak data, analyzed in KF Smith et al. (2014), "Global rise in human infectious disease outbreaks". J Roy Soc Interface Vol. 11: 20140950

  • Outbreak data: including README and Excel workbook [zip archive]
  • Sample disease parser: including README, as well as example input and output files [zip archive]

A new approach for inferring population size changes over time, developed and implemented in JA Palacios, J Wakeley, and S Ramachandran (2015), "Bayesian nonparametric inference of population size changes from sequential genealogies". Genetics Vol. 201: 281-304

  • R code: including a dynamic document compiled using the knitr R package and test data. [zip archive]

Phoneme data for 2082 languages, analyzed in N Creanza et al. (2015), "A comparison of worldwide phonemic and genetic variation in human populations". Proc Natl Acad Sci USA Vol. 112: 1265-1272

  • includes: README, presence-absence data for 728 phonemes in 2082 languages, along with metadata for languages studied and phonemes compiled by Merritt Ruhlen. [zip archive]
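
As a rough illustration of how presence-absence data of this kind can be handled, the sketch below treats each language as a row of 0/1 values, one per phoneme. The phoneme labels, language rows, and function names are hypothetical and chosen only for the example; the actual file layout is described in the README inside the archive.

```python
from typing import List, Sequence


def inventory_size(row: Sequence[int]) -> int:
    """Count the phonemes marked present (1) in one language's row."""
    return sum(row)


def shared_phonemes(
    phonemes: Sequence[str], row_a: Sequence[int], row_b: Sequence[int]
) -> List[str]:
    """List the phonemes present in both languages."""
    return [p for p, a, b in zip(phonemes, row_a, row_b) if a and b]


# Hypothetical toy data: four phonemes, two languages.
phonemes = ["p", "t", "k", "m"]
language_a = [1, 1, 0, 1]
language_b = [1, 0, 1, 1]

print(inventory_size(language_a))
print(shared_phonemes(phonemes, language_a, language_b))
```

The same row-wise representation extends directly to the full table of 728 phonemes by 2082 languages.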

pong: fast analysis and visualization of latent clusters in population genetic data

pong is a freely available software package, released by Behr et al. (2016, Bioinformatics), for post-processing output from clustering inference using population genetic data. It combines a network-graphical approach for analyzing and visualizing membership in latent clusters with an interactive D3.js-based visualization. pong outpaces current solutions by more than an order of magnitude in runtime while providing a user-friendly, interactive visualization of population structure that is more accurate than those produced by current tools. Thus, pong enables unprecedented levels of scale and accuracy in the analysis of population structure from multilocus genotype data.

pong requires Python 2.7 and a modern web browser (e.g. Chrome, Firefox, Safari). pong is not compatible with Internet Explorer. pong is hosted on PyPI and can thus be easily installed with pip by running:

pip install pong

Resources

PEGASUS: the Precise, Efficient Gene Association Score Using SNPs

PEGASUS is a freely available software package, released by Nakka et al. (2016, Genetics), for combining SNP-level p-values into gene scores and conducting gene-level association tests with a phenotype of interest. PEGASUS computes gene scores of association analytically and produces gene scores with as much as 10 orders of magnitude higher numerical precision than competing methods.

PEGASUS requires Perl 5, R (3.0.2 or higher), PLINK (1.07; 1.9 beta 3, 7 Jun is also okay), and the R packages corpcor and CompQuadForm.

Resources

SWIF(r): SWeep Inference Framework (controlling for co*r*relation)

SWIF(r) is freely available software, released by Sugden et al. (2018, Nature Communications), for calculating SNP-based probabilities of adaptation based on training simulations from a demographic model. Code for training and running SWIF(r), as well as for calibrating the probabilistic output and visualizing learned distributions, can be found at the SWIF(r) git repository.

SWIF(r) requires Python v2.7, Matplotlib v1.7, SciPy v0.16, and Scikit-learn v0.17.

Resources

  • the SWIF(r) git repository, which contains example training data and output, code for training and running SWIF(r), and for calibrating probabilistic output and visualizing trained distributions.

WINGS: Ward clustering to identify Internal Node branch length outliers using Gene Scores

WINGS is freely available software, released by McGuirl, Smith et al. (2020, Genetics), for identifying groups of phenotypes sharing a core set of genes enriched for mutations in cases. Code in MATLAB for running WINGS can be found at the WINGS git repository.

WINGS requires MATLAB with the Statistics and Machine Learning Toolbox.

Resources