R/archaic_fit.R
archaic_fit.Rd
Fits a mixed membership (grade of membership, GoM) model to cluster aDNA samples using DNA damage patterns: mutation type, flanking bases, distance from the read end, and strand-break information. By default, the model follows the modeling framework of Shiraishi et al. (2015).
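The independent-feature framework of Shiraishi et al. (2015) treats each damage pattern as a tuple of features and gives each cluster (signature) its own probability vector per feature. A pattern's probability under a signature is therefore the product of its per-feature probabilities, and a sample's pattern distribution is a mixture over signatures weighted by that sample's mixing proportions. The following is a minimal R sketch of that calculation, not the aRchaic implementation. The toy features (`mutation`, `left`, `strand`), their levels, and the helpers `pattern_prob` and `mixture_prob` are hypothetical and were chosen only for illustration.

```r
# Hypothetical toy signatures (K = 2) over three features. Under the
# independent-feature assumption each signature stores one probability
# vector per feature.
sig <- list(
  list(mutation = c("C->T" = 0.7, "G->A" = 0.2, "other" = 0.1),
       left     = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25),
       strand   = c("-" = 0.5, "+" = 0.5)),
  list(mutation = c("C->T" = 0.1, "G->A" = 0.1, "other" = 0.8),
       left     = c(A = 0.4, C = 0.1, G = 0.1, T = 0.4),
       strand   = c("-" = 0.5, "+" = 0.5))
)

# P(x | signature k): product over features f of F_{k,f}(x_f)
pattern_prob <- function(signature, x) {
  prod(vapply(names(x), function(f) signature[[f]][[x[[f]]]], numeric(1)))
}

# P(x | sample): sum over k of q_k * P(x | signature k),
# where q holds the sample's mixing proportions
mixture_prob <- function(sigs, q, x) {
  sum(q * vapply(sigs, pattern_prob, numeric(1), x = x))
}

x <- c(mutation = "C->T", left = "A", strand = "+")
q <- c(0.6, 0.4)
mixture_prob(sig, q, x)  # 0.6 * 0.0875 + 0.4 * 0.02 = 0.0605
```

Factorizing each signature by feature keeps the number of free parameters per signature at the sum of the feature sizes rather than their product.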
archaic_fit(dat, K, tol = 0.1, labs = NULL, gom_method = "independent", gom.control = list(), output_dir = NULL)
Argument | Description |
---|---|
dat | Either (a) output from |
K | The number of clusters to fit. |
tol | The convergence tolerance for the GoM model fit. |
labs | A factor of labels used to group the samples in the visualization, for example to distinguish samples from different labs or different library preparations. |
gom_method | The GoM method type. Defaults to "independent". |
gom.control | A list of control parameters for the GoM model fit. |
output_dir | The output directory where the fitted model is saved. If NULL, the current working directory is used. |
Fits a GoM model to the aggregated data from archaic_prepare and returns both the clusters (each represented by its mismatch signature frequencies) and the mixing proportions of the clusters in each sample/MFF file. It also returns a model assessment score, such as the BIC, that can be used to compare models.
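The BIC trades goodness of fit against model size, so it can be used to compare fits with different numbers of clusters K. What follows is a minimal R sketch of such a comparison that assumes the standard BIC formula. The parameter count in `gom_n_par` (free signature probabilities plus free mixing proportions for an independent-feature model) and the helper names are illustrative assumptions. They do not necessarily reflect how archaic_fit computes its score. The log-likelihoods are made-up inputs.

```r
# Standard Bayesian information criterion: smaller values are preferred.
bic <- function(loglik, n_par, n_obs) {
  -2 * loglik + n_par * log(n_obs)
}

# Hypothetical parameter count for an independent-feature GoM model:
# each of K signatures has sum(M_f - 1) free feature probabilities, and
# each of N samples has K - 1 free mixing proportions.
gom_n_par <- function(K, levels_per_feature, N) {
  K * sum(levels_per_feature - 1) + N * (K - 1)
}

# Hypothetical helper: return the K whose fit has the smallest BIC.
pick_K <- function(logliks, Ks, levels_per_feature, N, n_obs) {
  scores <- mapply(function(ll, K) {
    bic(ll, gom_n_par(K, levels_per_feature, N), n_obs)
  }, logliks, Ks)
  Ks[which.min(scores)]
}

# Made-up log-likelihoods for K = 2, 3, 4 on 10 samples and 5000 patterns,
# with four toy features having 6, 4, 4 and 2 levels.
pick_K(logliks = c(-1200, -1000, -990), Ks = 2:4,
       levels_per_feature = c(6, 4, 4, 2), N = 10, n_obs = 5000)
```

With these inputs, the small likelihood gain at K = 4 does not offset its larger penalty, so K = 3 is selected.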
Taddy, M., 2012. On estimation and selection for topic models. In Artificial Intelligence and Statistics (pp. 1184-1193).
Shiraishi, Y., Tremmel, G., Miyano, S. and Stephens, M., 2015. A simple model-based approach to inferring and visualizing cancer mutation signatures. PLoS Genetics, 11(12), p.e1005657.