Download

Relate is available for academic use. For the terms of non-academic use, please read the LICENCE file, which is included with each software download.

Pre-compiled binaries (last updated: 02/09/2019)


In the downloaded directory, we have included a toy data set. You can try out Relate using this toy data set by following the instructions on our getting started page.

If you have any problems getting the program to work on your machine or would like to request an executable for a platform not shown here, please send a message to leo.speidel [at] outlook [dot] com.

We document changes to previous versions in a change-log.

1000 Genomes coalescence rates, allele ages, and selection p-values


We deposited Relate-estimated coalescence rates, allele ages, and p-values for evidence of positive selection for the 1000 Genomes Project here.
These were obtained by first estimating the joint genealogy of all 1000 GP populations and then extracting the embedded genealogy for each population. For each population's genealogy, we jointly estimated the population size history and branch lengths. Variants segregating in more than one population therefore have correlated but different allele ages in each population.

Relater: R package for handling Relate output files


I made an R package for handling Relate output files (MIT license). It's still under development but the main functionality is working.

You can install this package, e.g., using devtools:
install.packages("devtools") # only needed if devtools is not installed yet
devtools::install_github("leospeidel/relater")

Add-on modules


Relate takes input in the haps/sample file format, one of the output formats of ShapeIt. We provide code to convert files from the hap/legend/sample and vcf file formats. Relate also comes with tools for determining the ancestral allele using an ancestral genome and for filtering SNPs using a genomic mask.
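As a rough illustration of what the conversion and polarisation steps involve, the Python sketch below turns one phased, biallelic VCF data line into a haps line and then recodes it so that 0 denotes the ancestral allele. It is not Relate's converter (the tools shipped with Relate should be used on real data); the function names are hypothetical, and the haps column layout assumed here is chromosome, SNP ID, position, allele coded 0, allele coded 1, followed by two 0/1 columns per individual.

```python
from typing import Optional


def vcf_record_to_haps(line: str) -> str:
    """Convert one phased, biallelic VCF data line to a haps line (hypothetical helper).

    Assumed haps layout: chromosome, SNP ID, position, allele coded 0,
    allele coded 1, then two 0/1 haplotype columns per individual.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, snp_id, ref, alt = fields[0], fields[1], fields[2], fields[3], fields[4]
    if "," in alt:
        raise ValueError(f"multi-allelic site at {chrom}:{pos}")
    haplotypes = []
    for sample in fields[9:]:
        gt = sample.split(":")[0]  # assumes GT is the first FORMAT key
        alleles = gt.split("|")
        if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
            raise ValueError(f"unphased or missing genotype {gt!r} at {chrom}:{pos}")
        haplotypes.extend(alleles)
    return " ".join([chrom, snp_id, pos, ref, alt] + haplotypes)


def polarise(haps_line: str, ancestral: str) -> Optional[str]:
    """Recode a haps line so 0 is the ancestral allele; None if neither allele matches."""
    cols = haps_line.split()
    allele0, allele1 = cols[3], cols[4]
    if ancestral == allele0:
        return haps_line
    if ancestral == allele1:
        # Swap the allele columns and flip every haplotype (0 <-> 1).
        flipped = ["1" if h == "0" else "0" for h in cols[5:]]
        return " ".join(cols[:3] + [allele1, allele0] + flipped)
    return None  # ancestral state unknown or a third allele: a pipeline might drop this SNP
```

For example, the tab-separated record `1 100 rs1 A G . PASS . GT 0|1 1|1` becomes `1 rs1 100 A G 0 1 1 1`, and polarising it with ancestral allele G gives `1 rs1 100 G A 1 0 0 0`.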
We provide code for converting individual trees of interest into Newick format, which is convenient, for instance, when visualising a tree. We also provide code for extracting genealogies corresponding to a subpopulation.
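Newick writes a tree as nested parentheses, with each branch length after a colon and a terminating semicolon. The Python sketch below shows the idea for a tree stored as a hypothetical `children` mapping plus node ages (leaves at age 0); this is not the data structure of Relate's .anc files, and RelateExtract remains the tool for real output. Branch lengths are taken as the age difference between a parent and its child, in whatever units the ages use.

```python
from typing import Dict, List


def to_newick(children: Dict[int, List[int]], age: Dict[int, float], root: int) -> str:
    """Serialise a rooted tree to Newick (hypothetical representation).

    children maps each internal node to its child nodes; nodes absent from it are
    leaves, labelled by their integer ID. An explicit stack is used instead of
    recursion so deep (e.g. caterpillar-shaped) trees do not hit Python's recursion limit.
    """
    rendered: Dict[int, str] = {}
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if not kids:
            rendered[node] = str(node)
        elif expanded:
            # All children are rendered; attach branch length = parent age - child age.
            parts = [f"{rendered.pop(c)}:{age[node] - age[c]:.10g}" for c in kids]
            rendered[node] = "(" + ",".join(parts) + ")"
        else:
            stack.append((node, True))  # revisit after the children
            stack.extend((c, False) for c in kids)
    return rendered[root] + ";"
```

For a tree where leaves 0 and 1 coalesce at age 1.5 and join leaf 2 at age 3, this yields `((0:1.5,1:1.5):1.5,2:3);`.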
Relate can be used to plot a marginal tree of interest. The tree shown in the figure is from the LCT region, where a mutation at SNP rs4988235 is believed to be responsible for lactose tolerance in Europeans (here GBR). The derived allele at this SNP has spread rapidly in GBR, which is indicative of strong positive selection.
Relate can be used to estimate population sizes and separation histories between populations. The figure shows the separation history between FIN and GBR in the 1000GP data set. The inset shows the matrix of coalescence rates between pairs of haplotypes 9,000 years ago. Rows and columns are sorted by the population labels of the haplotypes, as indicated by the colour to the left of the matrix.
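As a simplified illustration of an epoch-wise coalescence rate, the Python sketch below assumes we already have coalescence times for a set of haplotype pairs and fits a piecewise-constant rate per epoch: coalescences in the epoch divided by the total pair-time spent uncoalesced in it. This is not Relate's estimator, which works on the full set of reconstructed trees. The sketch also computes 2*c_AB/(c_AA + c_BB), a common way of summarising separation between populations A and B; that this is exactly the quantity plotted in the figure is an assumption, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def epoch_coalescence_rates(times: Sequence[float], epoch_starts: Sequence[float]) -> List[float]:
    """Piecewise-constant coalescence rate per epoch from pairwise coalescence times.

    epoch_starts must be increasing and begin at 0; the last epoch is open-ended.
    Rate = (pairs coalescing in the epoch) / (total time pairs spend uncoalesced in it).
    """
    edges = list(epoch_starts) + [math.inf]
    rates = []
    for start, end in zip(edges[:-1], edges[1:]):
        events = sum(1 for t in times if start <= t < end)
        exposure = sum(max(0.0, min(t, end) - start) for t in times)
        rates.append(events / exposure if exposure > 0 else math.nan)
    return rates


def relative_cross_coalescence(c_ab: Sequence[float], c_aa: Sequence[float],
                               c_bb: Sequence[float]) -> List[float]:
    """2*c_AB / (c_AA + c_BB) per epoch: about 1 before separation, towards 0 after it."""
    return [2 * x / (y + z) if y + z > 0 else math.nan
            for x, y, z in zip(c_ab, c_aa, c_bb)]
```

With pair coalescence times 1 and 3 and epochs starting at 0 and 2, the first epoch sees one coalescence over 3 units of pair-time (rate 1/3) and the second sees one over 1 unit (rate 1).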
Relate can be used to estimate mutation rates through time. The figure shows TCC to TTC mutation rates for all 26 populations in the 1000GP data set. Trends shared between mutation categories are removed by dividing by the overall average mutation rate, and for each population the resulting rate is normalized so that its mean over time equals one. Consistent with previous estimates, we see an increase in the rate of TCC to TTC mutations about 10,000 to 30,000 years ago in Europeans and South Asians.
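The two normalisation steps described above amount to simple arithmetic on per-epoch rate estimates. In the Python sketch below, the input lists (one value per epoch, for one population) and the optional epoch-length weights are hypothetical; whether the mean over time is weighted by epoch length is an assumption, so the sketch defaults to an unweighted mean.

```python
from typing import List, Optional, Sequence


def normalised_mutation_rate(category_rate: Sequence[float],
                             overall_rate: Sequence[float],
                             weights: Optional[Sequence[float]] = None) -> List[float]:
    """Divide a category's rate by the overall rate, then rescale to a (weighted) mean of 1."""
    if len(category_rate) != len(overall_rate):
        raise ValueError("one value per epoch is required in both inputs")
    # Step 1: remove trends shared by all mutation categories.
    ratio = [c / o for c, o in zip(category_rate, overall_rate)]
    # Step 2: rescale so that the mean over time equals one.
    w = list(weights) if weights is not None else [1.0] * len(ratio)
    mean = sum(wi * r for wi, r in zip(w, ratio)) / sum(w)
    return [r / mean for r in ratio]
```

A value above 1 in some epoch then means that, in that population, the category made up a larger share of mutations at that time than it did on average over time.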
Relate can be used to detect evidence for positive selection. We calculate a p-value for selection evidence that quantifies how quickly a mutation has spread in the population. The figure shows a Manhattan plot for GBR with clear peaks in the LCT and MHC regions, which are known targets of positive selection.
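The exact conditioning used by Relate's selection test is not spelled out on this page, so the Python sketch below only illustrates the kind of neutral calculation a "how quickly did it spread" p-value can rest on. Under the standard coalescent (for any population size history, because size changes rescale time but do not alter the tree topology), when a sample of n haplotypes traces back to k ancestral lineages, the number f of present-day descendants of any one of those lineages satisfies P(f) = C(n-f-1, k-2) / C(n-1, k-1) for 1 <= f <= n-k+1. The tail probability P(F >= f) is small when a lineage that existed among many others has nonetheless left unusually many descendants. Which k to use for a given mutation is left to the caller as an assumption, and the function names are hypothetical.

```python
from math import comb


def descendant_count_pmf(f: int, n: int, k: int) -> float:
    """Neutral probability that a given one of k lineages has exactly f of n sampled descendants."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if k == 1:
        return 1.0 if f == n else 0.0
    if not 1 <= f <= n - k + 1:
        return 0.0
    # Family sizes are uniform over compositions of n into k positive parts.
    return comb(n - f - 1, k - 2) / comb(n - 1, k - 1)


def spread_pvalue(f: int, n: int, k: int) -> float:
    """P(F >= f): neutral chance of at least f descendants for one of k lineages."""
    return sum(descendant_count_pmf(x, n, k) for x in range(max(f, 1), n + 1))
```

For example, with n = 4 and k = 2 each lineage is equally likely to have 1, 2 or 3 descendants, so `spread_pvalue(3, 4, 2)` is 1/3.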

Change-log


Date: 2nd September 2019
  • Fixed a bug in ConvertToTreeSequence that meant internal nodes were flagged as sample nodes.
Date: 13th August 2019
  • Fixed a small bug that meant memory was misallocated when the --memory value was set too large.
Date: 24th July 2019
  • Fixed a bug in PrepareInputFiles.sh that meant the script terminated before filtering SNPs using a genomic mask when input was not gzipped.
  • Changed the default y-axis scale in TreeView so that it is no longer a log scale.
Date: 27th May 2019
  • Modified the default criteria for terminating population size estimation: estimation now terminates once at least 2 iterations have been run and "mean absolute error"/mu is less than 0.1.
  • Modified memory allocation in Relate, which previously allocated too much memory for very small sample sizes (N less than 10).
  • Modified the error message shown when the poplabels file is misspecified in FinalizePopulationSize; you can simply rerun this step at the end.
  • Included the script RelateSGE.sh for running Relate on an SGE cluster.
Date: 16th May 2019
  • Fixed a bug in FinalizePopulationSize --mode EstimatePopulationSize, where I had forgotten to close a file leading to an error on some platforms.
  • The .poplabels file can now be tab-separated.
  • Fixed a bug in TreeView which threw an error when completely fixed SNPs were included in the haps/sample file.
Date: 14th May 2019
  • Fixed a bug in FinalizePopulationSize introduced in v1.0.9, which threw an error message for correct input files.
  • Fixed a bug in SampleBranchLengths which was introducing small rounding errors.
Date: 13th May 2019
  • Fixed a bug in SampleBranchLengths.sh, where a filename was hard-coded by mistake.
Date: 3rd May 2019
  • Implemented a module which samples branch lengths from the posterior given tree topologies and effective population size histories.
  • Added some more detail to documentation for RelateSelection.
  • Added a more meaningful error message when .poplabels file is misspecified in EstimatePopulationSize.sh.
Date: 15th March 2019
  • DetectSelection.sh now requires different input and output names, since it would otherwise overwrite files.
  • DetectSelection.sh now works also if additional columns are not appended to the .mut file using ./RelateFileFormats --mode GenerateSNPAnnotations. However, code is more efficient if these columns are appended.
  • Switched to using R package cowplot in TreeView.sh.
Date: 25th February 2019
  • Fixed a bug in the painting that sometimes affects tree topologies but has only a small effect (not visible in distance measures when comparing to the truth in simulations).
Date: 6th February 2019
  • I added a function for converting anc/mut output files to the tree sequence file format (tskit). Currently, some information is lost in this conversion.
Date: 18th October 2018
  • I had introduced a bug in update v1.0.4 in the function RelateExtract --mode TreeAtSNPAsNewick which has been fixed.
Date: 13th October 2018
  • Substituted zcat with gunzip -c to fix a bug on Macs.
  • Added requirement of R version >= 3.3.1 for TreeView because of a known bug with grid.draw() in older versions.
  • Fixed bug in RelateExtract --mode TreeAtSNPAsNewick, which outputted the first tree when snp_of_interest was not a SNP in the data set. Changed option snp_of_interest to pos_of_interest and this function now prints the tree at the position of interest. In addition, output filename was not using the option -o, which has been corrected.
Date: 30th August 2018
  • Implemented a pipeline for calculating p-values for selection evidence. Updated the corresponding entry in the documentation.
  • Bug fix in RelateSelection --mode Frequency: the previous version had a bug whenever two internal nodes had the same age.
  • Fixed function RelateFileFormats --mode ConvertFromVcf which previously parsed the vcf incorrectly in some cases.
Date: 16th July 2018
  • Implemented parallelization of module EstimatePopulationSize.
Date: 30th June 2018
  • Bug fix in parsing function of haps/sample.
Date: 4th June 2018
  • Initial release.