Authors: Josep M. Badia [aut, cre] (https://orcid.org/0000-0002-5704-1124)
Last modified: 2021-12-03 16:51:03
Compiled: Fri Jan 7 20:40:41 2022
This vignette describes the annotate function in detail. Reading the MS2ID introduction first is advised.
annotate is the MS2ID function that annotates MS/MS query spectra: every query spectrum is compared against a reference library, and the compounds whose spectra are similar are listed. The function requires an MS2ID reference library, as described in the MS2ID introduction; here we use the sample library shipped with MS2ID:
## Decompress the MS2ID library that comes with MS2ID
library(utils)
MS2IDzipFile <- system.file("extdata/MS2IDLibrary.zip", package = "MS2ID",
                            mustWork = TRUE)
MS2IDdirectory <- dirname(unzip(MS2IDzipFile, exdir = tempdir()))[1]
The following example shows the basic usage of the annotate function. Query spectra are provided by pointing to the folder that contains the mzML files.
library(utils)
queryFile <- system.file("extdata/QRYspectra.zip", package = "MS2ID")
queryFolder <- file.path(tempdir(), "QRYspectra")
utils::unzip(queryFile, exdir = queryFolder)
#create the MS2ID object and annotate
library(MS2ID)
MS2IDobj <- MS2ID(MS2IDdirectory)
annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj)
Optionally, query spectra can be provided as a Spectra object (Gatto, Rainer, and Gibb 2021). This makes it easy to use the tools of the Spectra package to subset the query spectra, for example by retention time and MS level.
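library(Spectra)
library(magrittr)  # provides the %>% pipe, in case no attached package exports it
querySpectra <- Spectra::Spectra(dir(queryFolder, full.names = TRUE))
# keep MS level 2 spectra within the 100-400 retention time window
querySpectra <- querySpectra %>%
    Spectra::filterMsLevel(2) %>%
    Spectra::filterRt(c(100, 400))
# annotate
annotResult <- MS2ID::annotate(QRYdata = querySpectra, MS2ID = MS2IDobj)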
The annotate function returns an Annot object.
This object stores the annotation results so we can:
- browse them visually with the MS2IDgui function.
- export them with the export2xlsx function.
- access their contents with the following functions:
  - hits(): returns a cross-reference data frame containing the annotation hits and the ids of the spectra and compounds involved, together with further columns such as the metric values, propAdduct and cmnMasses (see the output below).
  - qrySpectra(): returns a Spectra object (Spectra package) containing both the successful query spectra and the consensus spectra, together with their source spectra (see consensus spectra).
  - refSpectra(): returns a Spectra object (Spectra package) with the reference spectra present in the hits table.
  - refCompound(): returns a data frame containing metadata of the reference compounds present in the hits table.
  - infoAnnotation(): returns the variables used in the annotate function.

In the example below, we use Spectra tools to browse the annotation results, although the visual browsing provided by the MS2IDgui function is recommended.
#merge hits and compound info
result <- merge(x = hits(annotResult), y = refCompound(annotResult),
by.x = "idREFcomp", by.y = "id", all.y = FALSE)
head(result)
## idREFcomp cosine idREFspect idQRYspect propAdduct cmnMasses ID_db
## 1 1 0.9318017 32 2253 M+H 2 MoNA
## 2 1 0.9198663 29 2254 M+H 2 MoNA
## 3 1 0.9330611 29 2253 M+H 2 MoNA
## 4 1 0.9563727 30 2341 M+H 2 MoNA
## 5 1 0.9526843 33 2341 M+H 2 MoNA
## 6 1 0.9233419 32 2254 M+H 3 MoNA
## name formula exactmass inchikey smiles
## 1 Bilirubin C33H36N4O6 584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N NA
## 2 Bilirubin C33H36N4O6 584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N NA
## 3 Bilirubin C33H36N4O6 584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N NA
## 4 Bilirubin C33H36N4O6 584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N NA
## 5 Bilirubin C33H36N4O6 584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N NA
## 6 Bilirubin C33H36N4O6 584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N NA
library(Spectra)
# Subset spectra and metadata to the query spectrum of the first hit
idQRYspect_1 <- result$idQRYspect[1]
result_1 <- dplyr::filter(result, idQRYspect == idQRYspect_1)
qrySpct_1 <- qrySpectra(annotResult)
qrySpct_1 <- qrySpct_1[qrySpct_1$id %in% result_1$idQRYspect]
refSpct_1 <- refSpectra(annotResult)
refSpct_1 <- refSpct_1[refSpct_1$id %in% result_1$idREFspect]
#compare query spectrum with its first hit reference spectrum
refSpct_draw <- refSpct_1[1]
refSpct_draw$intensity <- refSpct_draw$intensity/max(refSpct_draw$intensity)
qrySpct_1$intensity <- qrySpct_1$intensity/max(qrySpct_1$intensity)
plotSpectraMirror(qrySpct_1, refSpct_draw)
By default, the annotate function tries to summarize adjacent MS/MS spectra into consensus spectra: the resulting consensus spectra are annotated instead of the query spectra they summarize (along with the query spectra that could not be merged into a consensus). This strategy reduces artifacts and noise and significantly shortens the annotation time.
A group of adjacent query spectra is considered the source of a consensus spectrum when all of them share the same precursor mass, collision energy and polarity. In addition, every spectrum must be similar to the apex spectrum of the group (cosine > consCos argument) and close to it in time (less than 20 seconds away). The resulting consensus spectrum is formed by the fragments present in the majority of the source query spectra (the required ratio is set by the consComm argument).
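The fragment-voting step described above can be pictured with a minimal base R sketch. It is not the MS2ID implementation: the helper name consensusSketch, the matching of fragments by rounding m/z to two decimals, and the use of the mean intensity are all assumptions made for illustration.

```r
# Hypothetical helper (not part of MS2ID): build a consensus spectrum from
# source spectra given as data frames with columns mz and intensity.
consensusSketch <- function(spectra, consComm = 2/3, digits = 2) {
  perSpectrum <- lapply(spectra, function(s) {
    # Assumption: fragments are matched by rounding m/z to `digits` decimals
    b <- data.frame(mzBin = round(s$mz, digits), intensity = s$intensity)
    # One row per bin and spectrum (peaks falling in the same bin are summed)
    aggregate(intensity ~ mzBin, data = b, FUN = sum)
  })
  pooled <- do.call(rbind, perSpectrum)
  # Number of source spectra containing each fragment, and its mean intensity
  counts <- aggregate(intensity ~ mzBin, data = pooled, FUN = length)
  means <- aggregate(intensity ~ mzBin, data = pooled, FUN = mean)
  # Keep fragments present in at least a consComm fraction of the spectra
  keep <- counts$intensity / length(spectra) >= consComm
  data.frame(mz = means$mzBin[keep], intensity = means$intensity[keep])
}

# Three toy source spectra (made-up values)
s1 <- data.frame(mz = c(100.001, 150.052, 200.10), intensity = c(10, 50, 5))
s2 <- data.frame(mz = c(100.002, 150.049), intensity = c(12, 40))
s3 <- data.frame(mz = c(150.051, 300.20), intensity = c(60, 3))
cons <- consensusSketch(list(s1, s2, s3), consComm = 0.6)
print(cons)  # only the fragments near m/z 100 and 150.05 remain
```

In this toy case the fragments near m/z 200.10 and 300.20 appear in only one of the three spectra, so they are dropped as likely noise.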
The final Annot object will contain not only the query spectra and the consensus spectra that succeeded in the annotation (i.e. those with hits), but also the query spectra used to form the successful consensus spectra. Although the MS2IDgui function is the recommended way to elucidate the nature of the query spectra, it is also possible to check it by analyzing some of the variables contained in the Annot object.
The algorithm will not annotate the spectra with rol=3.
In annotation, subsetting the reference library not only reduces the computing time significantly, by taking advantage of the MS2ID backend and the fragments index; it also prunes the result and removes nonsensical hits. For example, the cmnFrags = c(m, n) argument restricts the comparison to reference spectra that share at least m peaks with the query spectrum among the n most intense peaks of each.
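As a rough illustration of the cmnFrags criterion, the base R sketch below checks whether two spectra share at least m peaks among their n most intense ones. The helper shareTopPeaks and the m/z tolerance tol are hypothetical; MS2ID relies on its own fragments index and may match peaks differently.

```r
# Hypothetical helper (not the MS2ID implementation).
# qry and ref are data frames with columns mz and intensity.
shareTopPeaks <- function(qry, ref, cmnFrags = c(2, 5), tol = 0.01) {
  m <- cmnFrags[1]
  n <- cmnFrags[2]
  # m/z values of the n most intense peaks of a spectrum
  topMz <- function(s) {
    ord <- order(s$intensity, decreasing = TRUE)
    s$mz[head(ord, n)]
  }
  qryTop <- topMz(qry)
  refTop <- topMz(ref)
  # Assumption: two peaks match when their m/z differ by at most `tol`
  matched <- vapply(qryTop, function(x) any(abs(refTop - x) <= tol),
                    logical(1))
  sum(matched) >= m
}

# Toy spectra (made-up values)
qry <- data.frame(mz = c(50, 75.5, 100.1, 120, 150.2, 180),
                  intensity = c(5, 80, 100, 2, 60, 1))
ref <- data.frame(mz = c(75.5, 100.1, 150.2, 199, 210),
                  intensity = c(70, 90, 10, 100, 50))
loose <- shareTopPeaks(qry, ref, cmnFrags = c(3, 5))   # TRUE: 3 shared peaks
strict <- shareTopPeaks(qry, ref, cmnFrags = c(3, 3))  # FALSE: only 2 shared
```

Narrowing n makes the filter stricter, because a weak shared peak (here m/z 150.2 in the reference) no longer counts.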
In addition, the cmnPrecMass argument limits the reference spectra to those with the same precursor mass as the query spectrum. The cmnNeutralMass argument, on the other hand, limits the reference spectra to those whose neutral mass is plausible given the query precursor (considering all possible adducts).
annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj,
                        cmnFrags = c(3, 5),
                        cmnPrecMass = TRUE, cmnNeutralMass = TRUE,
                        cmnPolarity = TRUE, predicted = FALSE)
Other arguments subset the reference spectra according to their experimental nature (predicted) or to the polarity of the query spectrum (cmnPolarity).
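The neutral mass criterion described above can be sketched in base R as follows. The short adduct table, the helper plausibleRef and the 10 ppm tolerance are assumptions for illustration only; the adducts and tolerance that MS2ID actually considers are not stated here.

```r
# Hypothetical adduct table (positive mode, singly charged): mass shift in Da
adducts <- c("M+H" = 1.007276, "M+NH4" = 18.033823, "M+Na" = 22.989218)

# Hypothetical helper: is a reference neutral mass plausible for a precursor?
plausibleRef <- function(refExactMass, precursorMz, adducts, ppm = 10) {
  # One candidate neutral mass per adduct
  candidates <- precursorMz - adducts
  # Assumption: a candidate matches within a relative tolerance in ppm
  any(abs(candidates - refExactMass) / refExactMass * 1e6 <= ppm)
}

# Bilirubin (exact mass 584.2635) observed as [M+H]+
okBilirubin <- plausibleRef(584.2635, precursorMz = 585.2708, adducts)
# A precursor that no adduct in the table can explain
okOther <- plausibleRef(584.2635, precursorMz = 500, adducts)
```

Here okBilirubin is TRUE because the M+H candidate (about 584.2635) matches the reference mass, while okOther is FALSE.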
By default, the annotate function uses cosine similarity as the metric to compare two spectra; a comparison is considered a hit when it exceeds the default threshold of 0.8.
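For readers unfamiliar with the metric, cosine similarity between two spectra can be sketched in base R as below. The sketch assumes that both intensity vectors are already aligned on the same m/z positions; how MS2ID aligns the peaks is not covered here, and cosineSim is a hypothetical helper.

```r
# Hypothetical helper: cosine similarity between two aligned intensity vectors
cosineSim <- function(a, b) {
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Toy intensity vectors on a common m/z grid (made-up values)
qryInt <- c(0, 10, 100, 50)
refInt <- c(5, 0, 90, 60)
simil <- cosineSim(qryInt, refInt)
isHit <- simil > 0.8  # compared against the default threshold of 0.8
```

Because the value depends only on the angle between the two vectors, rescaling the intensities of either spectrum does not change it.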
The function can also compute several metrics simultaneously. In that case, a spectrum comparison is considered a hit when at least one of the metrics fulfills its threshold. Note that fulfilling a threshold has a different meaning depending on the metric: the topsoe and squared_chord metrics return a lower number when the spectra are more similar so, unlike the rest, they produce a hit when the returned value is lower than the threshold.
annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj,
                        metrics = c("fidelity", "cosine", "topsoe"),
                        metricsThresh = c(0.6, 0.8, 0.6))
head(MS2ID::hits(annotResult))
## fidelity cosine topsoe idREFspect idQRYspect idREFcomp
## 2236.1 0.6182551 0.2691026 0.68745142 170 2236 10
## 2236.2 0.9486782 0.9867201 0.08079319 173 2236 10
## 2236.3 0.9750072 0.9981363 0.03686771 174 2236 10
## 2236.4 0.9849112 0.9994530 0.02179557 175 2236 10
## 2236.5 0.9070347 0.9408979 0.16563675 176 2236 10
## 2236.6 0.7294529 0.5787374 0.48779452 177 2236 10
## propAdduct cmnMasses
## 2236.1 M+H 6
## 2236.2 M+H 4
## 2236.3 M+H 3
## 2236.4 M+H 5
## 2236.5 M+H 5
## 2236.6 M+H 5
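The "at least one metric fulfills its threshold" rule can be sketched in base R as follows. The helper isHitSketch is hypothetical, and whether MS2ID uses strict or non-strict comparisons is an assumption; the first set of values is taken from row 2236.1 of the output above.

```r
# Hypothetical helper (not the MS2ID implementation).
# values and thresholds are numeric vectors named by metric.
isHitSketch <- function(values, thresholds,
                        lowerIsBetter = c("topsoe", "squared_chord")) {
  stopifnot(identical(names(values), names(thresholds)))
  # Distance-like metrics pass when below the threshold, the rest when above
  passes <- ifelse(names(values) %in% lowerIsBetter,
                   values < thresholds,
                   values > thresholds)
  any(passes)
}

thresholds <- c(fidelity = 0.6, cosine = 0.8, topsoe = 0.6)
# Row 2236.1 above: only fidelity fulfills its threshold, which is enough
row1 <- c(fidelity = 0.6182551, cosine = 0.2691026, topsoe = 0.68745142)
hit1 <- isHitSketch(row1, thresholds)
# Made-up comparison where no metric fulfills its threshold
rowNone <- c(fidelity = 0.5, cosine = 0.7, topsoe = 0.9)
hitNone <- isHitSketch(rowNone, thresholds)
```

This explains why rows with a cosine well below 0.8 can still appear in the hits table when several metrics are requested.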
Moreover, users can define their own metric by passing a function as an argument; in the following example, the foo function uses cosine + 1 as the metric.
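foo <- function(finalMatrix){
    # the two rows of finalMatrix are the two spectra being compared
    vector1 <- finalMatrix[1, ]
    vector2 <- finalMatrix[2, ]
    CosplusOne <- 1 + suppressMessages(
        philentropy::distance(rbind(vector1, vector2), method = "cosine")
    )
    # the name of the result labels the metric column in the hits table
    names(CosplusOne) <- "CosplusOne"
    return(CosplusOne)
}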
annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj,
                        metrics = c("cosine"), metricsThresh = c(0.8),
                        metricFUN = foo, metricFUNThresh = 1.8)
head(MS2ID::hits(annotResult))
## cosine metricFunc.CosplusOne idREFspect idQRYspect idREFcomp
## 2236.2 0.9867201 1.986720 173 2236 10
## 2236.3 0.9981363 1.998136 174 2236 10
## 2236.4 0.9994530 1.999453 175 2236 10
## 2236.5 0.9408979 1.940898 176 2236 10
## 2236.9 0.9972315 1.997231 180 2236 10
## 2236.10 0.9994336 1.999434 181 2236 10
## propAdduct cmnMasses
## 2236.2 M+H 4
## 2236.3 M+H 3
## 2236.4 M+H 5
## 2236.5 M+H 5
## 2236.9 M+H 2
## 2236.10 M+H 5
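A user-defined metric does not need to depend on philentropy. The sketch below follows the same shape as foo (a two-row matrix as input, a named numeric value as output) using only base R. The metric itself, a histogram intersection of the normalized intensities, and the name intersectionMetric are illustrative assumptions, not something provided by MS2ID.

```r
# Hypothetical custom metric with the same input/output shape as foo.
intersectionMetric <- function(finalMatrix) {
  # Normalize each spectrum so that its intensities sum to 1
  p <- finalMatrix[1, ] / sum(finalMatrix[1, ])
  q <- finalMatrix[2, ] / sum(finalMatrix[2, ])
  # Shared intensity: 1 for identical profiles, 0 for disjoint ones
  shared <- sum(pmin(p, q))
  names(shared) <- "intersection"
  shared
}

# Quick check on toy matrices before using the function in an annotation
sameProfile <- intersectionMetric(rbind(c(10, 0, 30), c(20, 0, 60)))
disjoint <- intersectionMetric(rbind(c(1, 0), c(0, 1)))
```

Testing the function on small matrices like these is a cheap way to confirm that a higher value means more similar spectra before choosing a value for metricFUNThresh.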