For each directory in the vector dirs, aggregates the mismatch feature counts from the Mismatch Feature Format (MFF) files in that directory into a single .RData file, which is saved in the same directory.

archaic_prepare(
  dirs,
  max_pos = 20,
  one_mismatch = FALSE,
  from_scratch = FALSE,
  delete = FALSE,
  output_rda = TRUE
)

Arguments

dirs

The directory/directories containing the MFF files.

max_pos

The maximum position, counted from the ends of the reads, at which mismatches are considered in aRchaic. Defaults to 20.

one_mismatch

Boolean indicating whether to use only reads containing one mismatch. Defaults to FALSE.

from_scratch

Boolean indicating whether to perform data preparation from scratch and regenerate the .RData file. If FALSE, the function looks for an existing .RData file and updates it based on the MFF (.csv) files present in the directory. Defaults to FALSE.

delete

Boolean indicating whether to delete from the .RData file, if present, the rows corresponding to files that are no longer present in the directory. Defaults to FALSE.

output_rda

If non-NULL (defaults to TRUE), the processed data for each directory in dirs is saved as a .RData file.
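
The way from_scratch and delete interact can be pictured with a small toy model. This is a sketch, not the package's implementation: it assumes the rows of the stored count matrix are named after the MFF files and that every file yields the same set of signature columns. The names update_counts, count_fun, the file names and the signature labels are all hypothetical.

```r
# Toy model of the update behaviour described for from_scratch and delete.
# Assumption: rows of the stored matrix are named after the MFF files.
update_counts <- function(stored, files_present, count_fun,
                          from_scratch = FALSE, delete = FALSE) {
  if (from_scratch || is.null(stored)) {
    # Rebuild everything from the files currently in the directory.
    to_add <- files_present
    kept <- NULL
  } else {
    # Only files without a stored row need to be processed.
    to_add <- setdiff(files_present, rownames(stored))
    kept <- stored
    if (delete) {
      # Drop rows whose files are no longer in the directory.
      kept <- stored[rownames(stored) %in% files_present, , drop = FALSE]
    }
  }
  if (length(to_add) == 0) return(kept)
  new_rows <- do.call(rbind, lapply(to_add, count_fun))
  rownames(new_rows) <- to_add
  rbind(kept, new_rows)
}

# Placeholder signature labels and a stand-in for reading one MFF file.
sigs <- c("sig_1", "sig_2", "sig_3")
count_fun <- function(f) setNames(rep(nchar(f), length(sigs)), sigs)

# Previously stored counts for two files; one has since left the directory.
stored <- matrix(c(1, 4, 2, 5, 3, 6), nrow = 2,
                 dimnames = list(c("a.csv", "b.csv"), sigs))
files_present <- c("b.csv", "c.csv")

res_keep <- update_counts(stored, files_present, count_fun)
res_del  <- update_counts(stored, files_present, count_fun, delete = TRUE)
res_new  <- update_counts(stored, files_present, count_fun, from_scratch = TRUE)
```

In this toy model, res_keep retains the stale row for a.csv, res_del drops it, and res_new contains only rows rebuilt from the files currently present.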

Value

Returns a list with one element per directory in dirs. Each element is a matrix whose rows are samples (one per MFF file) and whose columns are mismatch signatures (comprising features such as mismatch type, flanking bases and strand break information). Each cell reports the number of times the given mismatch signature is observed in the corresponding MFF file.
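
The shape of the returned value can be illustrated with a toy stand-in. The sketch below does not call archaic_prepare; it builds a list by hand with the structure described above, so the sample names, signature labels and counts are placeholders rather than real aRchaic output. In real use, out would come from a call such as archaic_prepare(dirs).

```r
# Toy stand-in for the returned list: one count matrix per directory.
sigs <- c("sig_1", "sig_2", "sig_3")  # placeholder signature labels
out <- list(
  matrix(c(5, 0, 2,
           1, 3, 4), nrow = 2, byrow = TRUE,
         dimnames = list(c("sample_1.csv", "sample_2.csv"), sigs)),
  matrix(c(7, 1, 0), nrow = 1,
         dimnames = list("sample_3.csv", sigs))
)

# One list element per directory.
n_dirs <- length(out)

# Number of samples (MFF files) found in each directory.
n_samples <- vapply(out, nrow, integer(1))

# Total mismatch count per sample, computed directory by directory.
total_counts <- lapply(out, rowSums)

# Pooling across directories by row-binding; this assumes all matrices
# share the same signature columns, which real output may not guarantee.
pooled <- do.call(rbind, out)
```

Pooling with rbind is only a convenience for the toy data; with real output it would be worth checking that the column names agree across directories before combining.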