This page provides information about the computer program SPASIBA, an R package for spatial continuous assignment from genetic data.

SPASIBA provides functions to perform the following tasks:

- Simulating data from a geostatistical model (function `SPASIBA.sim`)
- Inferring parameters of a covariance function model of spatial genetic variation (function `SPASIBA.inf`)
- Performing spatial prediction of allele frequencies (function `SPASIBA.inf`)
- Performing spatial assignment of individuals of unknown geographic origin (function `SPASIBA.inf`)

To run SPASIBA, you first need to install R, the R packages `INLA` and `RandomFields`, and the R package `SPASIBA` itself.

To install R, follow the instructions on CRAN.

To install INLA, follow the instructions on the R-INLA project homepage.

To install the package `RandomFields`, type at the R prompt:

`install.packages('RandomFields')`

To install SPASIBA:

Download the file SPASIBA_0.0.2.tar.gz to your computer.

The instructions below assume that the file SPASIBA_0.0.2.tar.gz is stored in a folder named Downloads in your home folder (it can be anywhere else). Open R and set Downloads as the working directory for your R session. You can do so from the menu Session -> Set Working Directory or by typing something along the lines of:

`setwd('/home/gilles/Downloads')`

The folder ‘Downloads’ is now the working directory for your current R session (the folder where R looks first when reading and writing files).

Now type the line below in the R command-line:

`install.packages(pkgs='SPASIBA_0.0.2.tar.gz' , repos=NULL , type='source')`

You can check that `SPASIBA` has been installed correctly by trying to load it:

`library(SPASIBA)`
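Before loading the package, it can help to see which of the prerequisites listed above are still missing. The lines below are a minimal sketch using only base R; the variable names `needed`, `is.installed` and `missing.pkgs` are illustrative and not part of SPASIBA.

```r
## Illustrative check (not part of SPASIBA): which required packages are missing?
needed <- c("INLA", "RandomFields", "SPASIBA")

## requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
is.installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing.pkgs <- needed[!is.installed]

if (length(missing.pkgs) > 0) {
  message("Missing packages: ", paste(missing.pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```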

To use `SPASIBA`, you need four data matrices in your R session:

- A matrix of allele counts for the various reference (or training) populations. One row per population, one column per locus. Missing data not allowed.
- A matrix with one row per population and one column per locus giving the haploid population sample size. Missing data not allowed. Missingness of genotypes in the reference population is handled in this way: an individual with SNP genotype {0,1} will result in allele counts {0,1} and haploid population size 2. An individual with SNP genotype {NA,1} will result in allele counts {0,1} and haploid population size 1.
- A matrix containing the coordinates of the reference sampling sites. One row per sampling site, two columns (xy Cartesian coordinates or lon-lat). Missing data not allowed.
- A matrix of genotypes of individuals of unknown geographic origin. One row per individual, one column per locus. This should contain allele counts of an arbitrary reference allele at each SNP locus, hence 0, 1, 2 or NA (missing data are allowed here).
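As an illustration of how the first two matrices relate to individual genotypes, the sketch below builds them from toy data. It rests on assumptions that are not stated on this page: each diploid individual is stored as two allele copies coded 0/1 (NA when missing), and the allele count is taken to be the number of observed copies of the allele coded 1. All object names (`pop`, `allele1`, `allele2`, `toy.geno.ref`, `toy.size.pop.ref`) are hypothetical.

```r
## Toy data (hypothetical): 3 diploid individuals, 2 SNP loci, 2 populations.
## Rows = individuals, columns = loci; matrix() fills column by column.
pop     <- c("A", "A", "B")
allele1 <- matrix(c(0, NA, 1,     # locus 1, first allele copy
                    1,  0, 1),    # locus 2, first allele copy
                  nrow = 3)
allele2 <- matrix(c(1, 1, 1,      # locus 1, second allele copy
                    1, 0, NA),    # locus 2, second allele copy
                  nrow = 3)

## Number of observed (non-missing) allele copies per individual and locus
observed <- (!is.na(allele1)) + (!is.na(allele2))

## Number of copies of the allele coded 1, counting missing copies as 0
count1 <- ifelse(is.na(allele1), 0, allele1) + ifelse(is.na(allele2), 0, allele2)

## Sum over individuals within each population (one row per population)
toy.geno.ref     <- rowsum(count1,   group = pop)
toy.size.pop.ref <- rowsum(observed, group = pop)

toy.geno.ref
toy.size.pop.ref
```

With this construction, a missing allele copy lowers the haploid sample size of its population at that locus instead of producing a missing value in the reference matrices.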

Assuming these matrices exist as plain text files on your disk, you can read them into R with the `read.table` function. If you are in doubt about the format of the data, you can open the various files in the data folder on the SPASIBA homepage. See below for an example.

The main function `SPASIBA.inf` returns various objects stacked in a list. This includes a matrix of estimated coordinates for individuals of unknown geographic origin.

Besides the present web page, users can find information about the various functions in the R online help, e.g.

`?SPASIBA.inf`

In the example below, the data are stored in a folder on the SPASIBA homepage. They could equally be stored in a folder on your local computer or anywhere else on the web.

```
## reading coordinates of reference populations
coord.ref = read.table('https://i-pri.org/special/Biostatistics/Software/Spasiba/data/coord.ref.txt')
# reading allele counts of reference populations
geno.ref = read.table('https://i-pri.org/special/Biostatistics/Software/Spasiba/data/geno.ref.txt')
geno.ref = as.matrix(geno.ref)
# reading haploid reference population sizes
size.pop.ref = read.table('https://i-pri.org/special/Biostatistics/Software/Spasiba/data/size.pop.ref.txt')
size.pop.ref = as.matrix(size.pop.ref)
## reading genotypes of individuals of unknown geographic origin
geno.unknown = read.table('https://i-pri.org/special/Biostatistics/Software/Spasiba/data/geno.unknown.txt')
geno.unknown = as.matrix(geno.unknown)
## reading true coordinates of individuals assumed here to be of unknown geographic origin
## if you have such a file you don't need the SPASIBA program!
true.coord.unknown = read.table('https://i-pri.org/special/Biostatistics/Software/Spasiba/data/true.coord.unknown.txt')
```

You can check that the various data matrices have been loaded properly with the `head` function, e.g.

`head(coord.ref, 10) ## inspecting the first 10 rows only`
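Beyond a visual check, the format requirements listed earlier can be tested programmatically. The function below is a sketch, not part of SPASIBA: the name `check.spasiba.input` is hypothetical, and the last check (an allele count cannot exceed the haploid sample size) is an assumption inferred from the description of the data rather than a documented requirement.

```r
## Hypothetical helper (not part of SPASIBA): returns a character vector
## describing format problems, or an empty vector if none are found.
check.spasiba.input <- function(geno.ref, size.pop.ref, coord.ref, geno.unknown) {
  geno.ref     <- as.matrix(geno.ref)
  size.pop.ref <- as.matrix(size.pop.ref)
  coord.ref    <- as.matrix(coord.ref)
  geno.unknown <- as.matrix(geno.unknown)
  problems <- character(0)

  same.dim <- identical(dim(geno.ref), dim(size.pop.ref))
  if (!same.dim)
    problems <- c(problems, "geno.ref and size.pop.ref differ in dimension")
  if (nrow(coord.ref) != nrow(geno.ref) || ncol(coord.ref) != 2)
    problems <- c(problems, "coord.ref needs one row per population and two columns")
  if (ncol(geno.unknown) != ncol(geno.ref))
    problems <- c(problems, "geno.unknown and geno.ref differ in number of loci")
  if (anyNA(geno.ref) || anyNA(size.pop.ref) || anyNA(coord.ref))
    problems <- c(problems, "missing values in reference data")
  if (!all(geno.unknown %in% c(0, 1, 2, NA)))
    problems <- c(problems, "geno.unknown has values other than 0, 1, 2 or NA")
  ## Assumption: a count of allele copies cannot exceed the number of copies sampled
  if (same.dim && any(geno.ref > size.pop.ref, na.rm = TRUE))
    problems <- c(problems, "allele count exceeds haploid sample size")
  problems
}

## Usage on small toy matrices (2 populations, 2 loci, 2 unknown individuals)
toy.problems <- check.spasiba.input(
  geno.ref     = matrix(c(2, 2, 2, 1), 2),
  size.pop.ref = matrix(c(3, 2, 4, 1), 2),
  coord.ref    = matrix(c(0, 1, 0, 1), 2),
  geno.unknown = matrix(c(0, 2, NA, 1), 2))
toy.problems
```

On the example data, the same call would be `check.spasiba.input(geno.ref, size.pop.ref, coord.ref, geno.unknown)`.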

```
## loading the packages
require(INLA)
require(SPASIBA)
## Calling SPASIBA function for inference, prediction and assignment
res <- SPASIBA.inf(geno.ref=geno.ref,
                   ploidy=2,
                   coord.ref=coord.ref,
                   sphere=FALSE,
                   size.pop.ref=size.pop.ref,
                   geno.unknown=geno.unknown,
                   make.inf=TRUE,
                   loc.infcov = 1:30,
                   make.pred=TRUE,
                   make.assign=TRUE)
```

The R object returned by the code above and stored in `res` is a list (an object consisting of several objects). The estimated coordinates of the samples of unknown geographic origin are stored in the element named `coord.unknown.est`. They can be accessed as `res$coord.unknown.est` and, for example, plotted together with the sampling sites by

```
plot(res$coord.unknown.est,pch=3,col=3,cex=1.3,lwd=2,xlab='',ylab='',asp=1,
axes=TRUE,ylim=c(0,1.2))
legend(col=c(3,2,4),pch=c(3,1,1),#cex=c(1.3,1,1.5),
legend=c('estimate','ref pops','true'),
x=.8,y=1.2,border=FALSE)
points(coord.ref,col=2,pch=1,cex=1)
points(true.coord.unknown,col=4,cex=1.5,lwd=2)
arrows(x0=true.coord.unknown[,1],
y0=true.coord.unknown[,2],
x1=res$coord.unknown.est[,1],
y1=res$coord.unknown.est[,2],
code=2,length=0.1,angle=10,lwd=.3)
```
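When the true coordinates are known, as in this example, the assignment error can also be summarised numerically. The sketch below computes the Euclidean distance between estimated and true positions, which assumes planar coordinates (as with `sphere=FALSE` above). The function name `assignment.error` and the toy matrices are hypothetical and not part of SPASIBA.

```r
## Hypothetical helper (not part of SPASIBA): Euclidean distance between
## estimated and true coordinates, one value per individual.
assignment.error <- function(est, true) {
  est  <- as.matrix(est)
  true <- as.matrix(true)
  stopifnot(identical(dim(est), dim(true)), ncol(est) == 2)
  sqrt(rowSums((est - true)^2))
}

## Toy illustration with two individuals
toy.true <- matrix(c(0, 0,
                     1, 1), ncol = 2, byrow = TRUE)
toy.est  <- matrix(c(3, 4,
                     1, 1), ncol = 2, byrow = TRUE)
toy.err <- assignment.error(toy.est, toy.true)
summary(toy.err)

## With the objects from the example above, the call would be:
## err <- assignment.error(res$coord.unknown.est, true.coord.unknown)
```

For lon-lat coordinates, a great-circle distance would be more appropriate than the planar distance used here.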

See the post on The Molecular Ecologist for inspiration.

The model and algorithm underlying the SPASIBA program are described in

- Guillot, G., Jónsson, H., Hinge, A., Manchih, N., & Orlando, L. (2015). Accurate continuous geographic assignment from low- to high-density SNP data. Bioinformatics, 32(7), 1106-1108.

The INLA method and the SPDE-GMRF model are presented in

F. Lindgren, H. Rue, and E. Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society, series B, 73(4):423–498, 2011.

T. G. Martins, D. Simpson, F. Lindgren, and H. Rue. Bayesian computing with INLA: New features. Computational Statistics and Data Analysis, 67:68–83, 2013.

H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society, series B, 71(2):1–35, 2009.

H. Rue, S. Martino, F. Lindgren, D. Simpson, A. Riebler, and E. Krainski. INLA: Functions which allow to perform full Bayesian analysis of latent Gaussian models using Integrated Nested Laplace Approximation, 2014. http://www.r-inla.org/.

D. Simpson, F. Lindgren, and H. Rue. Think continuous: Markovian Gaussian models in spatial statistics. Spatial Statistics, 1:16–29,

b i o s t a t i s t i c s @ i-pri.org