
Relate estimates genome-wide genealogies in the form of trees that adapt to changes in local ancestry caused by recombination. The method, which is scalable to thousands of samples, is described in the following paper. Please cite this paper if you use our software in your study.
Citations:
Relate is available for academic use. To see rules for non-academic use, please read the LICENCE file, which is included with each software download.
Alternatively, you can compile your own version by downloading the source code from this GitHub repository.
In the downloaded directory, we have included a toy data set. You can try out Relate using this toy data set by following the instructions on our getting started page.
If you have any problems getting the program to work on your machine or would like to request an executable for a platform not shown here, please send a message to leo.speidel [at] outlook [dot] com.
We document changes to previous versions in a change-log.
We provide ancestral genomes, genomic mask files, precomputed coalescence rates, and recombination rates for hg37 and hg38 here.
Versions >1.1.0 include a few new features such as:
We deposited Relate-estimated coalescence rates, allele ages, and p-values for evidence of positive selection for the 1000 Genomes Project here.
These were obtained by estimating the joint genealogy of all 1000 GP populations and then extracting the embedded genealogy for each population.
For the genealogy of each population, we jointly estimated the population size history and branch lengths.
Variants segregating in more than one population therefore have correlated but different allele ages in each population.
You can download inferred trees from the links below:
I made an R package for handling Relate output files (MIT license). It's still under development but the main functionality is working.
You can install this package, e.g., using
library(devtools)
devtools::install_github("leospeidel/relater")
This GitHub repo contains functions to convert between Relate and tskit format (MIT license).
It doubles as a C++ library you can link to if you want to use some of our Relate functions in C++.
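# Convert from tskit tree sequence format to Relate anc/mut format
relate_lib/bin/Convert --mode ConvertFromTreeSequence \
--anc example.anc.gz \
--mut example.mut.gz \
-i example
# Convert from Relate anc/mut format to tskit tree sequence format
relate_lib/bin/Convert --mode ConvertToTreeSequence \
--anc example.anc.gz \
--mut example.mut.gz \
-o example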
Thanks to Nathaniel S. Pope, you can now also specify an argument that compresses these Relate-converted tree sequences by assigning the same age to nodes with identical descendant sets across adjacent trees.
relate_lib/bin/Convert --mode ConvertToTreeSequence \
--compress \
--anc example.anc.gz \
--mut example.mut.gz \
-o example
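The idea behind the compression option can be illustrated with a small Python sketch. This is not the relate_lib implementation: the data layout (one dictionary per tree, mapping a node's descendant set to its age), the function name share_ages, and the choice of the mean as the shared age are all assumptions made for illustration only.

```python
from statistics import fmean

def share_ages(trees):
    """Give nodes with the same descendant set in adjacent trees a common age.

    trees: list of dicts, one per marginal tree in genomic order, mapping
    frozenset(descendant samples) -> node age (hypothetical layout).
    The shared age is the mean over the run of trees (an assumption).
    """
    out = [dict(t) for t in trees]
    for clade in set().union(*trees):
        run = []  # indices of consecutive trees that contain this clade
        for i in range(len(trees) + 1):
            if i < len(trees) and clade in trees[i]:
                run.append(i)
            else:
                if run:
                    shared = fmean(trees[j][clade] for j in run)
                    for j in run:
                        out[j][clade] = shared
                run = []
    return out

# Toy example with made-up ages for three adjacent trees.
ab = frozenset({"a", "b"})
bc = frozenset({"b", "c"})
abc = frozenset({"a", "b", "c"})
trees = [{ab: 10.0, abc: 30.0}, {ab: 14.0, abc: 34.0}, {bc: 8.0, abc: 29.0}]
compressed = share_ages(trees)
```

Once such nodes carry a single age, a tree sequence can in principle store them as one node spanning several trees rather than one node per tree, which is presumably where the reduction in size comes from.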
Relate can be used to plot a marginal tree of interest. This tree corresponds to the LCT region, where a mutation at SNP rs4988235 is believed to be responsible for lactose tolerance in Europeans (here GBR). We can see that the derived allele at this SNP has spread rapidly in GBR, which is indicative of strong positive selection.
Relate can be used to estimate population sizes and separation histories between populations. The figure shows the separation history between FIN and GBR in the 1000GP data set. The inset shows the matrix of coalescence rates between pairs of haplotypes 9,000 years ago. Rows and columns are sorted by population labels of haplotypes, as indicated by the colour on the left of the matrix.
Relate can be used to estimate mutation rates through time. The figure shows TCC to TTC mutation rates for all 26 populations in the 1000GP data set. Trends shared between mutation categories are eliminated by dividing by the overall average mutation rate. For each population, the mutation rate is normalized such that its mean over time equals one. Consistent with previous estimates, we see an increase in the mutation rate of TCC to TTC mutations about 10,000 to 30,000 years ago in Europeans and South Asians.
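The two normalization steps described for this figure can be sketched in Python as follows. This is an illustrative reading of the caption, not Relate's own code: the function name relative_rate, the input layout, the toy numbers, and the use of an unweighted mean over epochs are assumptions.

```python
from statistics import fmean

def relative_rate(category_rates, target):
    """Normalize one mutation category's rate through time for one population.

    category_rates: dict mapping category name -> list of rates, one per
    time epoch (hypothetical layout).
    """
    n_epochs = len(category_rates[target])
    # Overall average rate across categories in each epoch (the shared trend).
    overall = [fmean(r[e] for r in category_rates.values())
               for e in range(n_epochs)]
    # Step 1: divide out the trend shared between mutation categories.
    ratio = [category_rates[target][e] / overall[e] for e in range(n_epochs)]
    # Step 2: rescale so that the mean over time equals one.
    mean_ratio = fmean(ratio)
    return [x / mean_ratio for x in ratio]

# Made-up rates over three epochs; the middle epoch has a TCC>TTC excess.
toy = {"TCC>TTC": [1.0, 1.5, 1.0], "other": [1.0, 1.0, 1.0]}
normalized = relative_rate(toy, "TCC>TTC")
```

After both steps, values above one mark epochs in which the category is elevated relative to its own long-run average.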
Relate can be used to detect evidence for positive selection. We calculate a p-value for selection evidence that quantifies how quickly a mutation has spread in the population. The figure shows a Manhattan plot for GBR with clear peaks in the LCT and MHC regions, both known targets of positive selection.
Relate can infer genealogies for non-contemporary samples, such as high coverage ancient genomes or time-stamped samples of bacteria or viruses. This plot was generated using the TreeViewSamples.sh script, using 100 sampled branch lengths. The tree represents the posterior mean times, and "error bars" indicate the 0.025 and 0.975 quantiles of the posterior density of coalescence ages.
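As a rough illustration of how such a summary is formed, the following Python sketch computes the mean and the 0.025 and 0.975 quantiles from 100 sampled ages of a single coalescence event. The samples are simulated from an arbitrary gamma distribution purely as a stand-in; in practice they would come from Relate's sampled branch lengths, whose file format is not covered here.

```python
import random
from statistics import fmean, quantiles

random.seed(1)

# Stand-in for 100 posterior samples of one coalescence age
# (arbitrary gamma distribution, hypothetical units).
samples = [random.gammavariate(5.0, 200.0) for _ in range(100)]

# Posterior mean, used to place the node in the plotted tree.
mean_age = fmean(samples)

# n=40 yields cut points at 0.025, 0.05, ..., 0.975;
# the first and last are the ends of the "error bar".
cuts = quantiles(samples, n=40, method="inclusive")
lower, upper = cuts[0], cuts[-1]
```

With only 100 samples the extreme quantiles are themselves noisy, so the ends of the error bars should be read as approximate.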
Using the SampleBranchLengths.sh script, Relate can sample branch lengths from the posterior. These samples can then be used, e.g., by CLUES to infer allele frequency trajectories and selection coefficients using an importance sampling scheme. See the CLUES repo for more details.