Grade of Membership (GoM) model clustering of aDNA samples using DNA damage patterns

Fits a mixed membership (Grade of Membership) model that clusters aDNA samples by their DNA damage patterns: mutation type, flanking bases, distance from the read end, and strand-break information. The default implementation follows the modeling framework of Shiraishi et al. (2015).

archaic_fit(
  dat,
  K,
  tol = 0.1,
  labs = NULL,
  gom_method = "independent",
  gom.control = list(),
  output_dir = NULL
)

Arguments

dat	One of: (a) the output of `archaic_prepare`; (b) a vector of directories containing the MFF files to be modeled jointly; or (c) a matrix of counts, with samples along the rows and mismatch signatures along the columns.
K	The number of clusters to fit.
tol	The convergence tolerance for the GoM model fit. Defaults to 0.1.
labs	A factor of labels used to group the samples in visualization. It may be used, for example, to distinguish samples from different labs or from different library preparations.
gom_method	The type of GoM model. Defaults to `independent`, the model proposed by Shiraishi2015. The alternative is `full`, which uses the implementation due to Taddy2012.
gom.control	A list of control parameters for the GoM model fit. Defaults to an empty list.
output_dir	The directory in which the fitted model is saved. If NULL (the default), the current working directory is used.
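
As a minimal sketch of input form (c), the snippet below builds a small count matrix with samples along the rows and mismatch signatures along the columns, plus a matching `labs` factor. The sample names, signature names, and counts are placeholders invented for illustration; real column names would be the mismatch signatures of the actual data. The call to `archaic_fit` is left commented out because it cannot run meaningfully on this toy input.

```r
# Toy input for option (c): 4 samples x 6 mismatch signatures.
# All names and counts here are made up for illustration only.
set.seed(1)
sig_names <- paste0("sig_", 1:6)
counts <- matrix(rpois(4 * 6, lambda = 20), nrow = 4,
                 dimnames = list(paste0("sample_", 1:4), sig_names))

# Grouping labels, one per sample (row), used only for visualization.
labs <- factor(c("lab_A", "lab_A", "lab_B", "lab_B"))

# Basic sanity checks before fitting: non-negative counts, one label per row.
stopifnot(is.matrix(counts), all(counts >= 0), length(labs) == nrow(counts))

# With real data, the fit would then be requested along these lines,
# using only arguments listed in the usage above:
# fit <- archaic_fit(dat = counts, K = 2, tol = 0.1, labs = labs,
#                    gom_method = "independent", output_dir = tempdir())
```

Writing to `tempdir()` in the commented call is just one way to avoid saving output into the current working directory while experimenting.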

Value

A GoM model fitted to the aggregated data from `archaic_prepare`. The output contains both the clusters (each represented by its mismatch signature frequencies) and the mixing proportions of the clusters in each sample/MFF file. It also returns a model assessment score, such as the BIC, for comparing fitted models.
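
To make the two outputs concrete, here is a rough sketch of how cluster frequencies and mixing proportions combine in a GoM model, and how a BIC-style score could be computed from them. The object names `theta` and `omega`, the toy numbers, and the choice of sample size in the BIC are assumptions for illustration; they are not the actual fields or the exact score returned by `archaic_fit`.

```r
# Hypothetical fitted quantities for K = 2 clusters (names are illustrative).
# theta: signatures x K; each column holds one cluster's signature frequencies.
theta <- cbind(cluster_1 = c(0.7, 0.2, 0.1),
               cluster_2 = c(0.1, 0.3, 0.6))
rownames(theta) <- c("sig_1", "sig_2", "sig_3")

# omega: samples x K; each row holds one sample's mixing proportions.
omega <- rbind(sample_1 = c(0.9, 0.1),
               sample_2 = c(0.5, 0.5))

# Each sample's expected signature frequencies are a mixture of the clusters.
expected_freq <- omega %*% t(theta)

# Multinomial log-likelihood (up to an additive constant) of toy counts.
counts <- rbind(sample_1 = c(60, 25, 15), sample_2 = c(40, 25, 35))
loglik <- sum(counts * log(expected_freq))

# Textbook BIC; lower values indicate a better balance of fit and complexity.
bic <- function(loglik, n_params, n_obs) -2 * loglik + n_params * log(n_obs)

# Free parameters: each theta column and each omega row sums to one.
n_params <- ncol(theta) * (nrow(theta) - 1) + nrow(omega) * (ncol(omega) - 1)

# Assumption: the total count is used as the number of observations.
bic_value <- bic(loglik, n_params, n_obs = sum(counts))
```

Under this convention, fits with different `K` on the same data would be compared by their scores, with the caveat that the package may define its score differently.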

References

Taddy2012. Taddy, M., 2012, March. On estimation and selection for topic models. In Artificial Intelligence and Statistics (pp. 1184-1193).

Shiraishi2015. Shiraishi, Y., Tremmel, G., Miyano, S. and Stephens, M., 2015. A simple model-based approach to inferring and visualizing cancer mutation signatures. PLoS Genetics, 11(12), p.e1005657.