Authors: Josep M. Badia [aut, cre] (https://orcid.org/0000-0002-5704-1124)
Last modified: 2021-12-03 16:51:03
Compiled: Fri Jan 7 20:40:41 2022

This vignette describes the annotate function in detail. Reading the MS2ID introduction first is advised.

Introduction

annotate is the MS2ID function that annotates MS/MS query spectra: every query spectrum is compared with a reference library, and compounds with a similar spectrum are listed. This function requires an MS2ID reference library, as described in the MS2ID introduction; here we will use the sample library bundled with MS2ID:

## Decompress the MS2ID library that comes with MS2ID
MS2IDzipFile <- system.file("extdata/MS2IDLibrary.zip", package = "MS2ID",
                            mustWork = TRUE)
library(utils)
MS2IDdirectory <- dirname(unzip(MS2IDzipFile, exdir = tempdir()))[1]

The following example shows the basic usage of the annotate function. Query spectra are provided by pointing to the folder that contains the mzML files.

library(utils)
queryFile <- system.file("extdata/QRYspectra.zip", package = "MS2ID")
queryFolder <- file.path(tempdir(), "QRYspectra")
utils::unzip(queryFile, exdir = queryFolder)
#create the MS2ID object and annotate
library(MS2ID)
MS2IDobj <- MS2ID(MS2IDdirectory)
annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj)

Optionally, query spectra can also be provided in the form of a Spectra object (Gatto, Rainer, and Gibb 2021). This makes it possible to use the tools of the Spectra package to subset the query spectra, for example by retention time and MS level.

library(Spectra)
library(magrittr) #provides the %>% pipe used below, in case it is not already attached
querySpectra <- Spectra::Spectra(dir(queryFolder, full.names = TRUE))
querySpectra <- querySpectra %>%
  Spectra::filterMsLevel(2) %>% 
  Spectra::filterRt(c(100, 400))
#annotate
annotResult <- MS2ID::annotate(QRYdata = querySpectra, MS2ID = MS2IDobj)

The annotate function returns an Annot object.

Annot object

This object stores the annotation so we can:

  • Browse the results graphically with the MS2IDgui function.
  • Export all the data as an xlsx file by using the export2xlsx function.
  • Extract any content by using the following getters (custom functions to get the info).
    • hits(): returns a cross-reference data frame containing the annotation hits, the ids of the spectra and compounds involved, and:
      • propAdduct: the proposed adduct that would match the query precursor mass with the neutral mass of the reference compound.
      • cmnMasses: number of fragments in common between the query and the reference spectrum.
    • qrySpectra(): returns a Spectra object (Spectra package) containing both the successful query spectra and the successful consensus spectra, along with the source spectra of the latter (see consensus spectra).
    • refSpectra(): returns a Spectra object (Spectra package) with the reference spectra present in the hits table.
    • refCompound(): returns a data frame containing metadata of the reference compounds present in the hits table.
    • infoAnnotation(): returns the variables used in the annotate call.

In the example below, we use Spectra tools to browse the annotation results, although the visual browsing provided by the MS2IDgui function is preferable.

#merge hits and compound info
result <- merge(x = hits(annotResult), y = refCompound(annotResult), 
                by.x = "idREFcomp", by.y = "id", all.y = FALSE)
head(result)
##   idREFcomp    cosine idREFspect idQRYspect propAdduct cmnMasses ID_db
## 1         1 0.9318017         32       2253        M+H         2  MoNA
## 2         1 0.9198663         29       2254        M+H         2  MoNA
## 3         1 0.9330611         29       2253        M+H         2  MoNA
## 4         1 0.9563727         30       2341        M+H         2  MoNA
## 5         1 0.9526843         33       2341        M+H         2  MoNA
## 6         1 0.9233419         32       2254        M+H         3  MoNA
##        name    formula exactmass                    inchikey smiles
## 1 Bilirubin C33H36N4O6  584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N     NA
## 2 Bilirubin C33H36N4O6  584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N     NA
## 3 Bilirubin C33H36N4O6  584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N     NA
## 4 Bilirubin C33H36N4O6  584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N     NA
## 5 Bilirubin C33H36N4O6  584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N     NA
## 6 Bilirubin C33H36N4O6  584.2635 BPYKTIZUTYGOLE-IFADSCNNSA-N     NA
library(Spectra)
#subset spectra and metadata to the query spectrum of the first hit
idQRYspect_1 <- result$idQRYspect[1]
result_1 <- dplyr::filter(result, idQRYspect == idQRYspect_1)
qrySpct_1 <- qrySpectra(annotResult)
qrySpct_1 <- qrySpct_1[qrySpct_1$id %in% result_1$idQRYspect]
refSpct_1 <- refSpectra(annotResult)
refSpct_1 <- refSpct_1[refSpct_1$id %in% result_1$idREFspect]
#compare query spectrum with its first hit reference spectrum
refSpct_draw <- refSpct_1[1]
refSpct_draw$intensity <- refSpct_draw$intensity/max(refSpct_draw$intensity)
qrySpct_1$intensity <- qrySpct_1$intensity/max(qrySpct_1$intensity)
plotSpectraMirror(qrySpct_1, refSpct_draw)

Consensus spectra

By default, the annotate function tries to summarize adjacent MS/MS spectra into consensus spectra: the resulting consensus spectra are annotated instead of the query spectra they summarize (along with the query spectra that could not be merged into a consensus). This strategy reduces artifacts and noise and significantly shortens the annotation time.
A group of adjacent query spectra is considered the source of a consensus spectrum when all of them have the same precursor mass, collision energy and polarity. In addition, every spectrum must be similar to the apex spectrum of the group (cosine > consCos argument) and close to it in time (less than 20 seconds apart). The resulting consensus spectrum is formed by the fragments present in the majority of the source query spectra (ratio determined by the consComm argument).
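
The fragment-selection step described above can be pictured with a small base R sketch. It is only an illustration, not the MS2ID implementation: the function name build_consensus, matching m/z values by rounding, averaging the intensities, and the 0.6 ratio used as consComm are all assumptions made for the example.

```r
# Toy consensus builder (hypothetical helper, not part of MS2ID).
# Keeps the fragments present in at least `consComm` of the source spectra.
# Assumptions: m/z values are matched by rounding; intensities are averaged.
build_consensus <- function(spectra, consComm = 0.6, digits = 2) {
  # One row per (source spectrum, m/z bin) pair
  peaks <- do.call(rbind, lapply(seq_along(spectra), function(i) {
    data.frame(spectrum = i,
               bin = round(spectra[[i]]$mz * 10^digits),
               intensity = spectra[[i]]$intensity)
  }))
  # Number of distinct source spectra in which each bin appears
  n_spectra <- tapply(peaks$spectrum, peaks$bin,
                      function(s) length(unique(s)))
  common <- as.numeric(
    names(n_spectra)[n_spectra >= consComm * length(spectra)]
  )
  kept <- peaks[peaks$bin %in% common, ]
  # Average the intensity of every retained fragment
  out <- aggregate(intensity ~ bin, data = kept, FUN = mean)
  data.frame(mz = out$bin / 10^digits, intensity = out$intensity)
}

# Three made-up source spectra
src <- list(
  data.frame(mz = c(100.011, 150.052, 200.103), intensity = c(10, 50, 100)),
  data.frame(mz = c(100.012, 200.101),          intensity = c(12, 90)),
  data.frame(mz = c(150.049, 200.099, 250.200), intensity = c(40, 110, 5))
)
consensus <- build_consensus(src)
consensus
```

In this toy data the fragment near m/z 250.2 appears in only one of the three spectra, so it is dropped from the consensus.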
The final Annot object contains not only the query spectra and consensus spectra that succeeded in the annotation (i.e. with hits), but also the query spectra used to form the successful consensus spectra. Although the MS2IDgui function is the recommended way to elucidate the nature of the query spectra, it is also possible to check it by analyzing some of the variables contained in the Annot object.

  • When the spectrum is a consensus spectrum, the spectrumId_CONS and rtime_CONS variables show the id and the retention time of its source query spectra.
  • rol is an integer that encodes the nature of the spectrum:
    • 1: query spectrum not grouped, so it does not form a consensus.
    • 2: query spectrum grouped but not similar to the apex spectrum; it does not form a consensus.
    • 3: query spectrum that is a source for a consensus spectrum.
    • 4: consensus spectrum.

The algorithm will not annotate the spectra with rol=3.

qrySpct <- qrySpectra(annotResult)
#show query data concerning consensus formation
head(Spectra::spectraData(qrySpct, 
                          c("id", "spectrumId_CONS", "rtime_CONS", "rol")))
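
As a rough illustration of how the rol codes can be used, the following base R sketch labels a made-up metadata table and keeps only the spectra that the algorithm annotates. The data frame qryMeta and the label texts are hypothetical; only the column name rol and its four codes come from the description above.

```r
# Hypothetical metadata mimicking the id and rol columns shown above
qryMeta <- data.frame(
  id  = c(11, 12, 13, 14, 15),
  rol = c(1L, 3L, 3L, 4L, 2L)
)

# Human-readable labels for the four rol codes (wording is ours)
rolLabels <- c("1" = "not grouped",
               "2" = "grouped, not similar to apex",
               "3" = "source of a consensus",
               "4" = "consensus")
qryMeta$rolLabel <- unname(rolLabels[as.character(qryMeta$rol)])

# Spectra with rol = 3 are not annotated themselves
annotated <- qryMeta[qryMeta$rol != 3L, ]
annotated
```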

Subsetting the reference library

In annotation, subsetting the reference library not only reduces the computing time significantly, by taking advantage of the MS2ID backend and the fragment index; it also prunes the results and cuts off nonsensical hits. For example, the cmnFrags = c(m, n) argument limits the reference spectra to those that share at least m peaks with the query spectrum among the top n most intense peaks of each.
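
To make the cmnFrags = c(m, n) criterion concrete, here is a minimal base R sketch of the test applied to one query/reference pair. The function shares_top_peaks is a hypothetical helper, and matching m/z values by rounding is an assumption; MS2ID resolves this through its own fragment index.

```r
# TRUE when query and reference share at least m peaks among their
# n most intense peaks (hypothetical helper, not part of MS2ID).
# Assumption: two peaks match when their rounded m/z values coincide.
shares_top_peaks <- function(qry, ref, m = 3, n = 5, digits = 2) {
  top_bins <- function(s) {
    ord <- order(s$intensity, decreasing = TRUE)
    round(s$mz[head(ord, n)] * 10^digits)
  }
  length(intersect(top_bins(qry), top_bins(ref))) >= m
}

# Made-up spectra
qry  <- data.frame(mz = c(50.01, 77.04, 105.03, 120.08, 150.05, 180.10),
                   intensity = c(5, 80, 100, 60, 20, 40))
refA <- data.frame(mz = c(77.04, 105.03, 120.08, 133.00, 165.07),
                   intensity = c(70, 100, 30, 10, 50))
refB <- data.frame(mz = c(91.05, 105.03, 200.00),
                   intensity = c(100, 20, 50))

keepA <- shares_top_peaks(qry, refA)  # three shared top peaks
keepB <- shares_top_peaks(qry, refB)  # only one shared top peak
c(refA = keepA, refB = keepB)
```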
In addition, the cmnPrecMass argument limits the reference spectra to those with the same precursor mass as the query spectrum. In contrast, cmnNeutralMass limits the reference spectra to those whose neutral mass is compatible with the query precursor (considering all possible adducts).

annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj, 
                        cmnFrags = c(3, 5),
                        cmnPrecMass = TRUE, cmnNeutralMass = TRUE, 
                        cmnPolarity = TRUE, predicted = FALSE)

Other arguments subset the reference spectra according to their experimental or predicted nature (predicted) or to the polarity of the query spectrum (cmnPolarity).
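
The idea behind cmnNeutralMass (and the propAdduct column of the hits table) can be sketched in base R as follows. This is an illustration under stated assumptions: the adduct list is a small hand-picked set of singly charged positive adducts, the 10 ppm tolerance is arbitrary, and plausible_adducts is a hypothetical helper; the adducts and tolerance that MS2ID actually uses are not described here.

```r
# Mass shifts (Da) of a few singly charged positive adducts (assumed list)
adducts <- c("M+H" = 1.007276, "M+Na" = 22.989218, "M+NH4" = 18.033823)

# Adducts that reconcile a query precursor m/z with a reference neutral mass
# (hypothetical helper; the ppm tolerance is an arbitrary choice)
plausible_adducts <- function(precMz, neutralMass, ppm = 10) {
  expected <- neutralMass + adducts
  names(expected)[abs(expected - precMz) / expected * 1e6 <= ppm]
}

# Bilirubin neutral mass taken from the refCompound output shown earlier
match1 <- plausible_adducts(precMz = 585.2708, neutralMass = 584.2635)
match2 <- plausible_adducts(precMz = 600.0000, neutralMass = 584.2635)
match1
```

A reference compound for which no adduct fits, as in the second call, would be left out of the comparison.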

Annotation with different metrics

By default, the annotate function uses cosine similarity as the metric to compare two spectra; a comparison is considered a hit when its value exceeds the threshold, which is 0.8 by default.
The function also allows the simultaneous calculation of different metrics. In that case, a spectrum comparison is considered a hit when at least one of the metrics fulfills its threshold value. Note that fulfilling a threshold has a different meaning depending on the metric: the topsoe and squared_chord metrics return a lower number when the spectra are more similar so, unlike the rest, they produce a hit when the returned value is lower than the threshold.

annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj,
                        metrics = c("fidelity", "cosine", "topsoe"),
                        metricsThresh = c(0.6, 0.8, 0.6))
head(MS2ID::hits(annotResult))
##         fidelity    cosine     topsoe idREFspect idQRYspect idREFcomp
## 2236.1 0.6182551 0.2691026 0.68745142        170       2236        10
## 2236.2 0.9486782 0.9867201 0.08079319        173       2236        10
## 2236.3 0.9750072 0.9981363 0.03686771        174       2236        10
## 2236.4 0.9849112 0.9994530 0.02179557        175       2236        10
## 2236.5 0.9070347 0.9408979 0.16563675        176       2236        10
## 2236.6 0.7294529 0.5787374 0.48779452        177       2236        10
##        propAdduct cmnMasses
## 2236.1        M+H         6
## 2236.2        M+H         4
## 2236.3        M+H         3
## 2236.4        M+H         5
## 2236.5        M+H         5
## 2236.6        M+H         5
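
The hit rule described above explains why rows such as 2236.1 appear in the table even though their cosine is low. The following base R sketch reproduces the decision for individual rows; is_hit is a hypothetical helper, and treating values exactly equal to a similarity threshold as passing is an assumption.

```r
# A comparison is a hit when at least one metric fulfills its threshold.
# Distance-like metrics must fall below the threshold; the rest must reach it.
# (Hypothetical helper; boundary handling is an assumption.)
is_hit <- function(values, thresholds,
                   distanceMetrics = c("topsoe", "squared_chord")) {
  thresholds <- thresholds[names(values)]
  isDistance <- names(values) %in% distanceMetrics
  passed <- ifelse(isDistance, values < thresholds, values >= thresholds)
  any(passed)
}

thresh <- c(fidelity = 0.6, cosine = 0.8, topsoe = 0.6)

# Values copied from rows 2236.1 and 2236.6 of the output above
row1 <- c(fidelity = 0.6182551, cosine = 0.2691026, topsoe = 0.68745142)
row6 <- c(fidelity = 0.7294529, cosine = 0.5787374, topsoe = 0.48779452)
# Made-up comparison that fails every threshold
noHit <- c(fidelity = 0.5, cosine = 0.5, topsoe = 0.9)

c(row1 = is_hit(row1, thresh),
  row6 = is_hit(row6, thresh),
  noHit = is_hit(noHit, thresh))
```

Row 2236.1 qualifies through fidelity alone, while row 2236.6 qualifies through both fidelity and topsoe.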

Moreover, users can define their own metric by passing a function as an argument; in the following example, the foo function uses cosine + 1 as a metric.

foo <- function(finalMatrix){
  vector1 <- finalMatrix[1,]
  vector2 <- finalMatrix[2,]
  CosplusOne <- 1+ suppressMessages(
    philentropy::distance(rbind(vector1, vector2), method = "cosine")
    )
  names(CosplusOne) <- "CosplusOne"
  return(CosplusOne)
}

annotResult <- annotate(QRYdata = queryFolder, MS2ID = MS2IDobj,
                        metrics = c("cosine"), metricsThresh = c(0.8),
                        metricFUN = foo, metricFUNThresh = 1.8)
head(MS2ID::hits(annotResult))
##            cosine metricFunc.CosplusOne idREFspect idQRYspect idREFcomp
## 2236.2  0.9867201              1.986720        173       2236        10
## 2236.3  0.9981363              1.998136        174       2236        10
## 2236.4  0.9994530              1.999453        175       2236        10
## 2236.5  0.9408979              1.940898        176       2236        10
## 2236.9  0.9972315              1.997231        180       2236        10
## 2236.10 0.9994336              1.999434        181       2236        10
##         propAdduct cmnMasses
## 2236.2         M+H         4
## 2236.3         M+H         3
## 2236.4         M+H         5
## 2236.5         M+H         5
## 2236.9         M+H         2
## 2236.10        M+H         5
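
A custom metric can be checked on toy data before it is passed to annotate. The sketch below computes the same cosine + 1 value in base R, without philentropy. It assumes, as the indexing in foo suggests, that the function receives a two-row numeric matrix with the aligned intensities of the two spectra; the name cosPlusOne and the toy matrices are ours.

```r
# Base R equivalent of the cosine + 1 metric (hypothetical helper).
# Assumption: finalMatrix has two rows holding aligned intensities.
cosPlusOne <- function(finalMatrix) {
  v1 <- finalMatrix[1, ]
  v2 <- finalMatrix[2, ]
  value <- 1 + sum(v1 * v2) / sqrt(sum(v1^2) * sum(v2^2))
  names(value) <- "CosplusOne"
  value
}

# Proportional spectra give the maximum value, 2
same <- cosPlusOne(rbind(c(1, 2, 3), c(2, 4, 6)))
# Spectra without common fragments give the minimum value, 1
disjoint <- cosPlusOne(rbind(c(1, 0, 0), c(0, 5, 0)))
c(same, disjoint)
```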

References

Gatto, Laurent, Johannes Rainer, and Sebastian Gibb. 2021. Spectra: Spectra Infrastructure for Mass Spectrometry Data. https://github.com/RforMassSpectrometry/Spectra.