library(methClust)
Here we present a small demo of how methClust can be used to fit a Grade of Membership model on a simulated example. This example will be replaced with a more appropriate real example with the further development of this package.
We first define the grades of membership matrix \(\omega_{N \times K}\) where \(N\) is the number of samples and \(K\) is the number of clusters
n.out <- 500
omega_sim <- rbind( cbind( rep(1, n.out), rep(0, n.out)),
cbind( rep(0, n.out), rep(1, n.out)),
cbind( seq(0.6, 0.4, length.out = n.out),
1- seq(0.6, 0.4,length.out=n.out)) )
K <- dim(omega_sim)[2]
barplot(t(omega_sim),
col = 2:(K+1),
axisnames = F, space = 0,
border = NA,
main=paste("No. of clusters=", K),
las=1, ylim = c(0,1), cex.axis=1.5,cex.main=1.4)
Next we define the cluster profiles \(f_{K \times B}\) where \(0 \leq f_{kb} \leq 1\) and \(B\) denotes the number of genomic bins that the genome has been partitioned into.
m.out <- 200
freq_sim <- cbind(c(rep(0.8, m.out), rep(0.2, m.out), rep(0.5, m.out), rep(0.01, m.out)),
c(rep(0.01, m.out), rep(0.01, m.out), rep(0.5, m.out), rep(0.8, m.out)))
We next define matrices for the number of methylated sites M and unmethylated sites U per sample annd per bin, which are assumed to obey Equation 1.
prob <- omega_sim %*% t(freq_sim)
Y <- matrix(rpois(dim(prob)[1]*dim(prob)[2], 1000), dim(prob)[1], dim(prob)[2])
M <- matrix(0, dim(Y)[1], dim(Y)[2])
for(m in 1:dim(Y)[1]){
for(n in 1:dim(Y)[2]){
M[m,n] <- rbinom(1, Y[m,n], prob = prob[m,n])
}
}
U = Y - M
Check the dimensions the methylation matrix M and the unmethylation matrix (they should be same).
dim(M)
## [1] 1500 800
dim(U)
## [1] 1500 800
Now we fit methClust
topics <- meth_topics(M, U, K=2, tol = 10, use_squarem = FALSE)
##
## Estimating on a 1500 samples collection.
## log posterior increase: 112921576.8, 10933.8, 346.4, 347.9, 127.3, 38.2, 21.2, 14.6, 11.1, done.
The esimated bar chart representation of the grades of membership are as follows
barplot(t(topics$omega),
col = 2:(K+1),
axisnames = F, space = 0,
border = NA,
main=paste("No. of clusters=", K),
las=1, ylim = c(0,1), cex.axis=1.5,cex.main=1.4)
This R Markdown site was created with workflowr