Paired Mass Distance(PMD) analysis for GC/LC-MS based non-targeted analysis
BY Miao Yu 2021-01-21 翻译by fallingstar 2022-05-18
Introduction of Paired Mass Distance analysis
pmd
package use Paired Mass Distance1 (PMD) relationship to analysis the GC/LC-MS based non-targeted data. PMD means the distance between two masses or mass to charge ratios 2. In mass spectrometry, PMD would keep the same value between two masses and two mass to charge ratios(m/z). There are two kinds of PMD involved in this package: PMD from the same compound and PMD from different compounds. In GC/LC-MS or XCMS based non-targeted data analysis, peaks could be separated by chronograph and same compound means ions from similar retention times or ions co-eluted by certain column.
pmd
包使用成对质量距离(pmd)关系来分析基于GC/LC-MS的非靶向数据。PMD是指两个质量之间的距离或质量电荷比。在质谱分析中,PMD将在两个质量和两个质荷比(m/z)之间保持相同的值。此包中涉及两种PMD:来自同一化合物的PMD和来自不同化合物的PMD。在基于GC/LC-MS或XCMS的非靶向数据分析中,色谱峰可通过计时器分离,相同的化合物(表示保留时间相似的离子或通过特定柱共同洗脱的离子)
PMD from the same compound
For MS1 full scan data, we could build retention time 3 (RT) bins to assign peaks into different RT groups by retention time hierarchical clustering analysis. For each RT group, the peaks should come from same compounds or co-elutes. If certain PMD appeared in multiple RT groups, it would be related to the relationship about adducts, neutral loss, isotopologues or common fragments ions.
对于MS1全扫描数据,我们可以构建保留时间(RT)箱,通过保留时间层次聚类分析将峰值分配到不同的RT组。对于每个RT组,峰应来自相同的化合物或共洗脱物。如果某些PMD出现在多个RT组中,则可能与加合物、中性丢失、同位素或常见碎片离子的关系有关。
PMD from different compounds
The peaks from different retention time groups would like to be different compounds separated by chronograph. The PMD would reflect the relationship about homologous series or chemical reactions.
不同保留时间组的峰可能是用计时器分离的不同化合物。PMD可以反映同系物系列或化学反应之间的关系。
GlobalStd algorithm use the PMD within same RT group to find independent peaks among certain data set. Then, structure/reaction directed analysis use PMD from different RT groups to screen important compounds or reactions.
GlobalStd算法使用同一RT组内的PMD在特定数据集中找到独立的峰值。然后,使用来自不同RT组的PMD来筛选重要的化合物或反应从而进行结构/反应导向分析。
Data format
The input data should be a list
object with at least two elements from a peaks list:
- mass to charge ratio with name of
mz
, high resolution mass spectrometry is required - retention time with name of
rt
However, I suggested to add intensity 4 and group information to the list for validation of PMD analysis.
输入数据应为“list”对象,其中至少有两个组成部分来自峰值列表:
- 质量电荷比,名称为“mz”,需要高分辨率质谱
- 名称为“rt”的保留时间` 然而,我建议将强度和组信息添加到列表中,以验证PMD分析。
In this package, a data set from in vivo solid phase micro-extraction(SPME) was attached. This data set contain 9 samples from 3 fish with triplicates samples for each fish. Here is the data structure:
library(pmd)
data("spmeinvivo")
str(spmeinvivo)
#> List of 4
#> $ data : num [1:1459, 1:9] 1095 10439 10154 2797 90211 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:1459] "100.1/170" "100.5/86" "101/85" "103.1/348" ...
#> .. ..$ : chr [1:9] "1405_Fish1_F1" "1405_Fish1_F2" "1405_Fish1_F3" "1405_Fish2_F1" ...
#> $ group:'data.frame': 9 obs. of 2 variables:
#> ..$ sample_name : chr [1:9] "1405_Fish1_F1" "1405_Fish1_F2" "1405_Fish1_F3" "1405_Fish2_F1" ...
#> ..$ sample_group: chr [1:9] "fish1" "fish1" "fish1" "fish2" ...
#> $ mz : num [1:1459] 100 101 101 103 104 ...
#> $ rt : num [1:1459] 170.2 86.3 84.9 348.1 48.8 ...
You could build this list
or mzrt
object from the xcms
objects via enviGCMS
package. When you have a xcmsSet
object or XCMSnExp
object named xset
, you could use enviGCMS::getmzrt(xset)
to get such list. Of course you could build such list by yourself.
GlobalStd algorithm
GlobalStd algorithm try to find independent peaks among certain peaks list. The first step is retention time hierarchical clustering analysis. The second step is to find the relationship among adducts, neutral loss, isotopologues and common fragments ions. The third step is to screen the independent peaks.
GlobalStd算法试图在特定的峰值列表中找到独立的峰值。第一步是保留时间层次聚类分析。第二步是找出加合物、中性损失、同位素和常见碎片离子之间的关系。第三步是筛选独立峰。
Here is a workflow for this algorithm:
knitr::include_graphics('https://yufree.github.io/presentation/figure/GlobalStd.png')
STEP1: Retention time hierarchical clustering 保留时间层次聚类分析
pmd <- getpaired(spmeinvivo)
#> 75 retention time cluster found.
#> 369 paired masses found
#> 5 unique within RT clusters high frequency PMD(s) used for further investigation.
#> The unique within RT clusters high frequency PMD(s) is(are) 28.03 21.98 44.03 17.03 18.01.
#> 719 isotopologue(s) related paired mass found.
#> 492 multi-charger(s) related paired mass found.
plotrtg(pmd)
This plot would show the distribution of RT groups. The rtcutoff
in function getpaired
could be used to set the cutoff of the distances in retention time hierarchical clustering analysis. Retention time cluster cutoff should fit the peak picking algorithm. For HPLC, 10 is suggested and 5 could be used for UPLC.
此图将显示RT组的分布。函数“getpaired”中的“rtcutoff”可用于设置保留时间层次聚类分析中距离的截止值。保留时间群集截止应符合峰值拾取算法。对于HPLC,建议使用10,UPLC可使用5。
Global PMD’s retention time group numbers should be around 20 percent of the retention time cluster numbers. For example, if you find 100 retention time clusters, I suggested you use 20 as the cutoff of empirical global PMD’s retention time group numbers. If you don’t specificlly assign a value to ng
, the algorithm will select such recommendation by default setting.
全局PMD的保留时间组数应为保留时间组总数的20%左右。例如,如果您总共找到100个保留时间组,我建议您经验性地使用20作为全局PMD保留时间组数的阈值。如果没有为“ng”变量指定特定的值,算法将在默认设置下选择此类建议。
Take care of the retention time cluster with lots of peaks. In this case, such cluster could be co-eluted compounds on certain column. It would be wise to trim the retention time window for high quality peaks. Another important hint is that pre-filter your peak list by black samples 5 or other quality control samples. Otherwise the running time would be long and lots of pmd relationship would be just from noise.
在一个保留时间组中有很多峰值的情况下要小心。这种簇可以是特定柱上的共洗脱化合物。明智的做法是调整保留时间窗口,以获得高质量的峰值。另一个重要提示是,通过黑色样本或其他质量控制样本对峰值列表进行预过滤。否则程序的运行时间会很长,而且计算到的很多pmd关系都是由噪声引起的(并没有意义)。
STEP2: Relationship among adducts, neutral loss, isotopologues and common fragments ions 找出加合物、中性损失、同位素和常见碎片离子之间的关系
The ng
in function getpaired
could be used to set cutoff of global PMD’s retention time group numbers. If ng
is 10, at least 10 of the retention time groups should contain the shown PMD relationship. You could use plotpaired
to show the distribution.
函数“getpaired”中的“ng”可用于设置全局PMD保留时间组数的阈值。如果“ng”为10,则至少10个保留时间组应包含在预定的PMD关系之中。可以使用“plotpaired”显示其分布情况。
plotpaired(pmd)
You could also show the distribution of PMD relationship by index:
# show the unique PMD found by getpaired function
for(i in 1:length(unique(pmd$paired$diff2))){
diff <- unique(pmd$paired$diff2)[i]
index <- pmd$paired$diff2 == diff
plotpaired(pmd,index)
}
This is an easy way to find potential adducts of the data by high frequency PMD from the same compound. For example, 21.98 Da could be the mass distances between [M+H]+[M+H]+ and [M+Na]+[M+Na]+. In this case, user could find the potential adducts or neutral loss even when they have no preferred adducts list. If one adduct exist in certain analytical system, the high frequency PMD will reveal such relationship. The high frequency PMD list could also be used to check the fragmental pattern of in-source reactions as long as such patterns are popular among all collected ions.
这是一种通过高频PMD从同一化合物数据中找到潜在加合物的简单方法。例如,21.98 Da是[M+H]+[M+H]+和[M+Na]+[M+Na]+之间的质量距离。在这种情况下,用户可以找到潜在的加合物或中性损失,即使他们没有首选加合物列表。如果某一分析体系中存在一个加合物,高频PMD将揭示这种关系。高频PMD列表也可用于检查源内反应的碎片模式,只要这些模式在所有收集的离子中都很普遍。
STEP3: Screen the independent peaks
You could use getstd
function to get the independent peaks. Independent peaks mean the peaks list removing the redundant peaks such as adducts, neutral loss, isotopologues and comment fragments ions found by PMD analysis in STEP2. Ideally, those peaks could be molecular ions while they might still contain redundant peaks.
您可以使用“getstd”函数来获取独立的峰值。独立峰是指去除步骤2中PMD分析发现的加合物、中性损失、同位素和注释碎片离子等冗余峰的峰列表。理想情况下,这些峰可能是分子离子,而它们可能仍然包含冗余峰。
std <- getstd(pmd)
#> 8 retention group(s) have single peaks. 14 23 32 33 54 55 56 75
#> 11 group(s) with multiple peaks while no isotope/paired relationship 4 5 7 8 11 41 42 49 68 72 73
#> 9 group(s) with multiple peaks with isotope without paired relationship 2 9 22 26 52 62 64 66 70
#> 4 group(s) with paired relationship without isotope 1 10 15 18
#> 43 group(s) with paired relationship and isotope 3 6 12 13 16 17 19 20 21 24 25 27 28 29 30 31 34 35 36 37 38 39 40 43 44 45 46 47 48 50 51 53 57 58 59 60 61 63 65 67 69 71 74
#> 291 std mass found.
Here you could plot the peaks by plotstd
function to show the distribution of independent peaks:
plotstd(std)
You could also plot the peaks distribution by assign a retention time group via plotstdrt
:
par(mfrow = c(2,3))
plotstdrt(std,rtcluster = 23,main = 'Retention time group 23')
plotstdrt(std,rtcluster = 9,main = 'Retention time group 9')
plotstdrt(std,rtcluster = 18,main = 'Retention time group 18')
plotstdrt(std,rtcluster = 67,main = 'Retention time group 67')
plotstdrt(std,rtcluster = 49,main = 'Retention time group 49')
plotstdrt(std,rtcluster = 6,main = 'Retention time group 6')
Extra filter with correlation coefficient cutoff 带有相关系数阈值的额外滤波器
Original GlobalStd algorithm only use mass to charge ratio and retention time of peaks to select independent peaks. However, if intensity data across samples are available, correlation coefficient of paired ions could be used to further filter the random noise in high frequency PMDs. You could set up cutoff of Pearson Correlation Coefficient between peaks to refine the peaks selected by GlobalStd within same retention time groups. In this case, the numbers of selected independent peaks will be further reduced. When you use this parameter, make sure the intensity data are from real samples instead of blank samples, which will affect the calculation of correlation coefficient.
原GlobalStd算法仅利用峰的质量电荷比和保留时间来选择独立的峰。然而,如果样本间的强度数据可用,则可以使用成对离子的相关系数来进一步过滤高频PMD中的随机噪声。您可以设置峰值之间Pearson相关系数的阈值,以优化GlobalStd在相同保留时间组内选择的峰值。在这种情况下,所选独立峰的数量将进一步减少。使用此参数时,请确保强度数据来自真实样本,而不是空白样本,这将影响相关系数的计算。
std2 <- getstd(pmd,corcutoff = 0.9)
#> 8 retention group(s) have single peaks. 14 23 32 33 54 55 56 75
#> 23 group(s) with multiple peaks while no isotope/paired relationship 2 4 5 7 8 10 11 15 18 26 35 39 41 42 49 50 59 62 68 69 70 72 73
#> 14 group(s) with multiple peaks with isotope without paired relationship 9 12 22 24 27 28 34 51 52 57 60 64 66 71
#> 3 group(s) with paired relationship without isotope 1 53 74
#> 27 group(s) with paired relationship and isotope 3 6 13 16 17 19 20 21 25 29 30 31 36 37 38 40 43 44 45 46 47 48 58 61 63 65 67
#> 120 std mass found.
Validation by principal components analysis(PCA) 主要成分分析验证
You need to check the GlobalStd algorithm’s results by principal components analysis(PCA). If we removed too much peaks containing information, the score plot of reduced data set would show great changes.
您需要通过主成分分析(PCA)检查GlobalStd算法的结果。如果我们删除了太多包含信息的峰值,则缩减数据集的得分图将显示出巨大的变化。
library(enviGCMS)
par(mfrow = c(2,2),mar = c(4,4,2,1)+0.1)
plotpca(std$data,lv = as.numeric(as.factor(std$group$sample_group)),main = "all peaks")
plotpca(std$data[std$stdmassindex,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(std$stdmassindex),"independent peaks"))
plotpca(std2$data[std2$stdmassindex,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(std2$stdmassindex),"reduced independent peaks"))
You might find original GlobalStd algorithm show a similar PCA score plot with original data while GlobalStd algorithm considering intensity data seems change the profile. The major reason is that correlation coefficient option in the algorithm will remove the paired ions without strong correlation. It will be aggressive to remove low intensity peaks, which are vulnerable by baseline noise. However, such options would be helpful if you only concern high quanlity peaks for following analysis. Otherwise, original GlobalStd will keep the most information for explorer purpose.
您可能会发现,原始GlobalStd算法显示了与原始数据类似的PCA得分图,而考虑强度数据的GlobalStd算法似乎会改变轮廓。主要原因是算法中的相关系数选项会去除相关性不强的成对离子。去除易受基线噪声影响的低强度峰值将是一种积极的做法。然而,如果您只关注后续分析的高质量峰值,则此类选项将非常有用。否则,原始的GlobalStd将保留大部分信息以供探索需要。
Comparison with other pseudo spectra extraction method 与其他伪谱提取方法的比较
GlobalStd algorithm in pmd
package could be treated as a method to extract pseudo spectra. You could use getcluster
to get peaks groups information for all GlobalStd peaks. This function would consider the merge of GlobalStd peaks when certain peak is involved in multiple clusters. Then you could choose export peaks with the highest intensities or base peaks in each GlobalStd merged peaks groups. Meanwhile, you could also include the correlation coefficient cutoff to further improve the data quality.
pmd包中的GlobalStd算法可以作为一种提取伪谱的方法。您可以使用“getcluster”获取所有GlobalStd峰值的峰值组信息。当某个峰值涉及多个集群时,此函数将考虑GlobalStd峰值的合并。然后,您可以在每个GlobalStd合并峰组中选择具有最高强度的输出峰或基峰。同时,还可以在其中使用相关系数阈值,以进一步提高数据质量。
stdcluster <- getcluster(std)
# extract pseudospectra for std peak 71
idx <- unique(stdcluster$cluster$largei[stdcluster$cluster$i==71])
plot(stdcluster$cluster$mz[stdcluster$cluster$largei==idx],stdcluster$cluster$ins[stdcluster$cluster$largei==idx],type = 'h',xlab = 'm/z',ylab = 'intensity',main = 'pseudo spectra for GlobalStd peak 71')
# export peaks with the highest intensities in each GlobalStd peaks groups.
data <- stdcluster$data[stdcluster$stdmassindex2,]
# considering the correlation coefficient cutoff
stdcluster2 <- getcluster(std, corcutoff = 0.9)
# considering the correlation coefficient cutoff for both psedospectra extraction and GlobalStd algorithm
stdcluster3 <- getcluster(std2, corcutoff = 0.9)
We supplied getcorcluster
to find peaks groups by correlation analysis only. The base peaks of correlation cluster were selected to stand for the compounds.
我们提供了“getcorcluster”,仅通过相关分析找到峰群。选择相关簇的基峰代表化合物。
corcluster <- getcorcluster(spmeinvivo)
#> 75 retention time cluster found.
# extract pseudospectra 1@46
peak <- corcluster$cluster[corcluster$cluster$largei == '1@46',]
plot(peak$ins~peak$mz,type = 'h',xlab = 'm/z',ylab = 'intensity',main = 'pseudo spectra for correlation cluster')
Then we could compare the compare reduced result using PCA similarity factor. A good peak selection algorithm could show a high PCA similarity factor compared with original data set while retain the minimized number of peaks.
然后,我们可以使用PCA相似因子来比较减少的结果。一个好的峰值选择算法可以在保持最小峰值数的同时,与原始数据集相比显示出较高的PCA相似度。
par(mfrow = c(3,3),mar = c(4,4,2,1)+0.1)
plotpca(std$data[std$stdmassindex,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(std$stdmassindex),"independent peaks"))
plotpca(std$data[stdcluster$stdmassindex2,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(stdcluster$stdmassindex2),"independent base peaks"))
plotpca(std$data[stdcluster2$stdmassindex2,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(stdcluster2$stdmassindex2),"independent reduced base peaks"))
plotpca(std$data[corcluster$stdmassindex,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(corcluster$stdmassindex),"peaks without correlationship"))
plotpca(std$data[corcluster$stdmassindex2,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(corcluster$stdmassindex2),"base peaks without correlationship"))
plotpca(std$data,lv = as.numeric(as.factor(std$group$sample_group)),main = paste(nrow(std$data),"all peaks"))
plotpca(std$data[stdcluster3$stdmassindex2,],lv = as.numeric(as.factor(std$group$sample_group)),main = paste(sum(stdcluster3$stdmassindex2),"reduced independent base peaks"))
pcasf(std$data, std$data[std$stdmassindex,])
#> pcasf
#> 0.9993497
pcasf(std$data, std$data[stdcluster$stdmassindex2,])
#> pcasf
#> 0.9993578
pcasf(std$data, std$data[stdcluster2$stdmassindex2,])
#> pcasf
#> 0.999346
pcasf(std$data, std$data[corcluster$stdmassindex,])
#> pcasf
#> 0.9471586
pcasf(std$data, std$data[corcluster$stdmassindex2,])
#> pcasf
#> 0.9497193
pcasf(std$data, std$data[stdcluster3$stdmassindex2,])
#> pcasf
#> 0.713527
In this case, five peaks selection algorithms are fine to stand for the original peaks with PCA similarity score larger than 0.9. However, the independent base peaks retain the most information with relative low numbers of peaks.
在此案例,五个峰值选择算法可以很好地代表PCA相似度得分大于0.9的原始峰值。然而在相对峰数较少的情况下,独立的基峰保留的信息最多。
Structure/Reaction directed analysis 结构/反应导向分析
getsda
function could be used to perform Structure/reaction directed analysis. The cutoff of frequency is automate found by PMD network analysis with the largest mean distance of all nodes.
getsda
函数可用于执行结构/反应导向分析。通过PMD网络分析,以所有节点的最大平均距离自动找到频率阈值。
sda <- getsda(std)
#> PMD frequency cutoff is 6 by PMD network analysis with largest network average distance 6.67 .
#> 53 groups were found as high frequency PMD group.
#> 0 was found as high frequency PMD.
#> 1.98 was found as high frequency PMD.
#> 2.01 was found as high frequency PMD.
#> 2.02 was found as high frequency PMD.
#> 6.97 was found as high frequency PMD.
#> 11.96 was found as high frequency PMD.
#> 12 was found as high frequency PMD.
#> 13.98 was found as high frequency PMD.
#> 14.02 was found as high frequency PMD.
#> 14.05 was found as high frequency PMD.
#> 15.99 was found as high frequency PMD.
#> 16.03 was found as high frequency PMD.
#> 19.04 was found as high frequency PMD.
#> 28.03 was found as high frequency PMD.
#> 30.05 was found as high frequency PMD.
#> 31.99 was found as high frequency PMD.
#> 33.02 was found as high frequency PMD.
#> 37.02 was found as high frequency PMD.
#> 42.05 was found as high frequency PMD.
#> 48.04 was found as high frequency PMD.
#> 48.98 was found as high frequency PMD.
#> 49.02 was found as high frequency PMD.
#> 54.05 was found as high frequency PMD.
#> 56.06 was found as high frequency PMD.
#> 56.1 was found as high frequency PMD.
#> 58.04 was found as high frequency PMD.
#> 58.08 was found as high frequency PMD.
#> 58.11 was found as high frequency PMD.
#> 63.96 was found as high frequency PMD.
#> 66.05 was found as high frequency PMD.
#> 68.06 was found as high frequency PMD.
#> 70.04 was found as high frequency PMD.
#> 70.08 was found as high frequency PMD.
#> 74.02 was found as high frequency PMD.
#> 80.03 was found as high frequency PMD.
#> 82.08 was found as high frequency PMD.
#> 88.05 was found as high frequency PMD.
#> 91.1 was found as high frequency PMD.
#> 93.12 was found as high frequency PMD.
#> 94.1 was found as high frequency PMD.
#> 96.09 was found as high frequency PMD.
#> 101.05 was found as high frequency PMD.
#> 108.13 was found as high frequency PMD.
#> 110.11 was found as high frequency PMD.
#> 112.16 was found as high frequency PMD.
#> 116.08 was found as high frequency PMD.
#> 122.15 was found as high frequency PMD.
#> 124.16 was found as high frequency PMD.
#> 126.14 was found as high frequency PMD.
#> 144.18 was found as high frequency PMD.
#> 148.04 was found as high frequency PMD.
#> 150.2 was found as high frequency PMD.
#> 173.18 was found as high frequency PMD.
Such largest mean distance of all nodes is calculated for top 1 to 100 (if possible) high frequency PMDs. Here is a demo for the network generation process.
如此可以为第1-100位(可能的话)高频pmds计算节点间最大平均距离计算,以下是一个pmd网络生成的示例:
library(igraph)
#>
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#>
#> decompose, spectrum
#> The following object is masked from 'package:base':
#>
#> union
cdf <- sda$sda
# get the PMDs and frequency
pmds <- as.numeric(names(sort(table(cdf$diff2),decreasing = T)))
freq <- sort(table(cdf$diff2),decreasing = T)
# filter the frequency larger than 10 for demo
pmds <- pmds[freq>10]
cdf <- sda$sda[sda$sda$diff2 %in% pmds,]
g <- igraph::graph_from_data_frame(cdf,directed = F)
l <- igraph::layout_with_fr(g)
for(i in 1:length(pmds)){
g2 <- igraph::delete_edges(g,which(E(g)$diff2%in%pmds[1:i]))
plot(g2,edge.width=1,vertex.label="",vertex.size=1,layout=l,main=paste('Top',length(pmds)-i,'high frequency PMDs'))
}
Here we could find more and more compounds will be connected with more high frequency PMDs. Meanwhile, the mean distance of all network nodes will increase. However, some PMDs are generated by random combination of ions. In this case, if we included those PMDs for the network, the mean distance of all network nodes will decrease. Here, the largest mean distance means no more information will be found for certain data set and such value is used as the cutoff for high frequency PMDs selection.
在这里,我们可以发现越来越多的化合物将与更高频的pmds连接。同时,所有网络节点的平均距离将增加。然而,一些pmd是由离子的随机组合产生的。在这种情况下,如果我们将这些PMD包括在网络中,则所有网络节点的平均距离将减少。此处,最大平均距离意味着无法找到特定数据集的更多信息,该值用作选择高频PMDs的截止值。
You could use plotstdsda
to show the distribution of the selected paired peaks.
plotstdsda(sda)
You could also use index to show the distribution of certain PMDs.
par(mfrow = c(1,3),mar = c(4,4,2,1)+0.1)
plotstdsda(sda,sda$sda$diff2 == 2.02)
plotstdsda(sda,sda$sda$diff2 == 28.03)
plotstdsda(sda,sda$sda$diff2 == 58.04)
Structure/reaction directed analysis could be directly performed on all the peaks, which is slow to process:
结构/反应导向分析可直接对所有峰进行,但处理速度较慢:
sdaall <- getsda(spmeinvivo)
#> PMD frequency cutoff is 104 by PMD network analysis with largest network average distance 14.06 .
#> 6 groups were found as high frequency PMD group.
#> 0 was found as high frequency PMD.
#> 2.02 was found as high frequency PMD.
#> 28.03 was found as high frequency PMD.
#> 31.01 was found as high frequency PMD.
#> 58.04 was found as high frequency PMD.
#> 116.08 was found as high frequency PMD.
par(mfrow = c(1,3),mar = c(4,4,2,1)+0.1)
plotstdsda(sdaall,sdaall$sda$diff2 == 2.02)
plotstdsda(sdaall,sdaall$sda$diff2 == 28.03)
plotstdsda(sdaall,sdaall$sda$diff2 == 58.04)
Extra filter with correlation coefficient cutoff 带有相关系数阈值的额外滤波器
Structure/Reaction directed analysis could also use correlation to restrict the paired ions. However, similar to GlobalStd algorithm, such cutoff will remove low intensity data. Researcher should have a clear idea to use this cutoff.
结构/反应导向分析也可以使用相关性来限制成对离子。然而,与GlobalStd算法类似,这种截止将删除低强度数据。研究人员应该有一个明确的想法来使用这个阈值。
sda2 <- getsda(std, corcutoff = 0.9)
#> PMD frequency cutoff is 6 by PMD network analysis with largest network average distance 6.67 .
#> 41 groups were found as high frequency PMD group.
#> 0 was found as high frequency PMD.
#> 1.98 was found as high frequency PMD.
#> 2.01 was found as high frequency PMD.
#> 2.02 was found as high frequency PMD.
#> 11.96 was found as high frequency PMD.
#> 12 was found as high frequency PMD.
#> 13.98 was found as high frequency PMD.
#> 14.02 was found as high frequency PMD.
#> 14.05 was found as high frequency PMD.
#> 15.99 was found as high frequency PMD.
#> 16.03 was found as high frequency PMD.
#> 19.04 was found as high frequency PMD.
#> 28.03 was found as high frequency PMD.
#> 30.05 was found as high frequency PMD.
#> 31.99 was found as high frequency PMD.
#> 33.02 was found as high frequency PMD.
#> 42.05 was found as high frequency PMD.
#> 48.98 was found as high frequency PMD.
#> 49.02 was found as high frequency PMD.
#> 54.05 was found as high frequency PMD.
#> 56.06 was found as high frequency PMD.
#> 58.04 was found as high frequency PMD.
#> 58.08 was found as high frequency PMD.
#> 63.96 was found as high frequency PMD.
#> 66.05 was found as high frequency PMD.
#> 68.06 was found as high frequency PMD.
#> 70.08 was found as high frequency PMD.
#> 74.02 was found as high frequency PMD.
#> 80.03 was found as high frequency PMD.
#> 82.08 was found as high frequency PMD.
#> 88.05 was found as high frequency PMD.
#> 93.12 was found as high frequency PMD.
#> 94.1 was found as high frequency PMD.
#> 96.09 was found as high frequency PMD.
#> 108.13 was found as high frequency PMD.
#> 110.11 was found as high frequency PMD.
#> 112.16 was found as high frequency PMD.
#> 116.08 was found as high frequency PMD.
#> 122.15 was found as high frequency PMD.
#> 124.16 was found as high frequency PMD.
#> 126.14 was found as high frequency PMD.
plotstdsda(sda2)
Structure/reaction directed analysis for peaks/compounds only data 仅限峰/化合物数据的结构/反应导向分析
When you only have data of peaks without retention time or compounds list, structure/reaction directed analysis could also be done by getrda
function.
当您只有没有保留时间或化合物列表的峰数据时,也可以通过“getrda”功能进行结构/反应导向分析。
sda <- getrda(spmeinvivo$mz[std$stdmassindex])
#> 36668 pmd found.
#> 3 pmd used.
Wrap function for GlobalStd algorithm GlobalStd算法的Wrap函数
globalstd
function is a wrap function to process GlobalStd algorithm and structure/reaction directed analysis in one line. All the plot function could be directly used on the list
objects from globalstd
function. If you want to perform structure/reaction directed analysis, set the sda=T
in the globalstd
function.
globalstd
函数是一个包装函数,用于在一行中处理globalstd算法和结构/反应导向分析。所有绘图函数都可以直接用于“globalstd”函数中的“list”对象。如果要执行结构/反应导向分析,请在“globalstd”函数中设置“sda=T”。
result <- globalstd(spmeinvivo, sda=FALSE)
#> 75 retention time cluster found.
#> 369 paired masses found
#> 5 unique within RT clusters high frequency PMD(s) used for further investigation.
#> The unique within RT clusters high frequency PMD(s) is(are) 28.03 21.98 44.03 17.03 18.01.
#> 719 isotopologue(s) related paired mass found.
#> 492 multi-charger(s) related paired mass found.
#> 8 retention group(s) have single peaks. 14 23 32 33 54 55 56 75
#> 11 group(s) with multiple peaks while no isotope/paired relationship 4 5 7 8 11 41 42 49 68 72 73
#> 9 group(s) with multiple peaks with isotope without paired relationship 2 9 22 26 52 62 64 66 70
#> 4 group(s) with paired relationship without isotope 1 10 15 18
#> 43 group(s) with paired relationship and isotope 3 6 12 13 16 17 19 20 21 24 25 27 28 29 30 31 34 35 36 37 38 39 40 43 44 45 46 47 48 50 51 53 57 58 59 60 61 63 65 67 69 71 74
#> 291 std mass found.
Use independent peaks for MS/MS validation (PMDDA) 使用独立峰值进行MS/MS验证
Independent peaks are supposing generated from different compounds. We could use those peaks for MS/MS analysis instead of DIA 6 or DDA 7. Here we need multiple injections for one sample since it might be impossible to get all ions’ fragmental ions in one injection with good sensitivity. You could use gettarget
to generate the index for the injections and output the peaks for each run.
假设独立峰由不同的化合物产生。我们可以使用这些峰进行MS/MS分析,而不是DIA或DDA。在这里,我们需要对一个样品进行多次注入,因为可能不可能在一次注入中获得所有离子的碎片离子并具有良好的灵敏度。您可以使用“gettarget”为注入生成索引,并输出每次运行的峰值。
# you need retention time for independent peaks
index <- gettarget(std$rt[std$stdmassindex])
#> You need 10 injections!
# output the ions for each injection
table(index)
#> index
#> 1 2 3 4 5 6 7 8 9 10
#> 25 16 33 22 32 28 46 34 29 26
# show the ions for the first injection
std$mz[index==1]
#> [1] 100.5107 112.0183 115.9640 118.0652 132.0779 137.0472 137.9885 149.9530
#> [9] 155.1293 155.1295 167.0709 170.0330 170.0932 174.9383 176.0305 181.9872
#> [17] 192.1380 195.1138 197.1285 198.1852 228.1973 236.9406 239.1490 245.1944
#> [25] 251.2385 261.2591 265.4216 267.1772 267.9535 270.3185 270.3185 270.3185
#> [33] 273.8902 277.1815 277.1896 286.3101 288.2546 291.0712 296.9066 301.1419
#> [41] 305.2480 309.2046 311.2560 313.1439 313.3297 337.3298 341.0180 341.3512
#> [49] 353.3603 358.2640 367.9923 372.3197 375.2147 376.3179 383.2052 385.2926
#> [57] 386.2783 394.8754 399.1231 399.3274 413.2660 416.0377 416.3062 420.3193
#> [65] 424.8970 429.3692 432.9177 440.8696 442.3373 444.3844 485.2901 494.8114
#> [73] 498.9017 514.8764 521.1371 538.1637 541.3942 543.1198 543.4015 556.4416
#> [81] 559.4247 560.2188 567.1783 568.8923 570.2830 593.1578 596.1559 598.8366
#> [89] 605.2231 608.8745 617.4657 630.7621 638.3081 640.1961 642.1942 680.4633
#> [97] 692.4941 695.5039 707.8415 711.3532 736.4916 744.8477 750.7856 752.5158
#> [105] 764.5237 768.6393 771.6456 779.5153 786.8255 787.5110 808.6510 813.3338
#> [113] 814.4154 826.6806 833.8251 840.3407 855.8153 864.3202 867.6960 868.4448
#> [121] 911.7489 925.4477 929.8218 949.8072 970.7962 997.8091
std$rt[index==1]
#> [1] 86.3490 85.3860 85.0640 639.2070 212.6550 161.8250 727.2225
#> [8] 1079.6400 775.8610 785.8260 538.2950 215.0585 235.9920 216.1500
#> [15] 166.9580 48.8480 462.2210 169.9680 170.6010 612.2700 453.1570
#> [22] 147.8960 170.2755 615.0550 639.1010 639.1010 145.8285 416.1050
#> [29] 146.3950 895.9095 823.9060 802.2200 145.9680 588.6960 439.6780
#> [36] 895.6960 486.4080 161.3960 145.1090 583.7690 599.8400 600.9120
#> [43] 585.6975 581.3045 636.9560 595.1260 717.1835 639.3140 648.9580
#> [50] 480.8010 762.5750 659.8815 218.8600 594.6970 574.3400 561.2700
#> [57] 493.3975 217.1550 527.9020 582.4810 665.0290 762.5750 503.1510
#> [64] 687.8085 213.7130 613.1270 213.9420 214.8080 404.5340 582.4815
#> [71] 582.6970 870.6220 213.7270 215.7020 762.5770 762.7890 628.3425
#> [78] 762.7890 439.2500 546.9100 439.4630 170.0260 762.3630 213.7270
#> [85] 473.5890 819.4060 819.5150 216.9670 762.5750 213.9410 455.1500
#> [92] 145.1850 630.7000 818.9790 818.7645 468.4360 528.2230 698.3510
#> [99] 214.2010 215.2290 714.7460 214.3560 217.4155 522.4370 773.9350
#> [106] 624.0560 613.5550 519.6690 215.4870 692.6730 628.5550 214.9295
#> [113] 490.2940 628.4480 214.0170 213.3340 215.4970 639.2070 632.6270
#> [120] 493.9370 650.4570 476.5790 213.3340 214.6300 213.9110 213.3590
Shiny application Shiny应用
An interactive document has been included in this package to perform PMD analysis. You need to prepare a csv file with m/z and retention time of peaks. Such csv file could be generated by run enviGCMS::getcsv()
on the list
object from enviGCMS::getmzrt(xset)
function. The xset
should be XCMSnExp
object or xcmsSet
object. You could also generate the csv file by enviGCMS::getmzrt(xset,name = 'test')
. You will find the csv file in the working dictionary named test.csv
.
Then you could run runPMD()
to start the Graphical user interface(GUI) for GlobalStd algorithm and structure/reaction directed analysis.
Conclusion 结论
pmd
package could be used to reduce the redundancy peaks for GC/LC-MS based research and perform structure/reaction directed analysis to screen known and unknown important compounds or reactions.
pmd
软件包可用于减少GC/LC-MS研究的冗余峰,并进行结构/反应导向分析,以筛选已知和未知的重要化合物或反应。
注释
-
成对质量差 ↩
-
In mass spectroscopy, the mass-to-charge ratio (symbols: m/z, m/e) of a cation is equal to the mass of the cation divided by its charge. ↩
-
保留时间;Retention time is the time that a solute spends in a column or it can be defined as the time spent in the stationary and mobile phases. The longer retention time depends on the interaction of the analyte with the stationary phase. The stronger the interaction, the more will be the interaction time. ↩
-
质谱信号强度 ↩
-
黑试样 ↩
-
DIA(data-independent acquisition,数据非依赖性采集)技术是近年来发展起来的一种新的质谱技术,属于非标记蛋白质组学方法。采用数据非依赖性扫描模式:将质谱整个全扫描范围分为若干个窗口,然后对每个窗口中的所有离子进行检测、碎裂,从而无遗漏、无差异地获得样本中所有离子的信息。从而降低样本检测的缺失值,同时提高定量准确性和重复性,实现大样本队列中高稳定,高准确的蛋白质组定量分析。 与DDA技术相比,DIA技术的优势包括:(1)采集所有的离子信息,实现更高的数据覆盖度;(2)减少采集的随机性,实现更高的检测重现性、稳定性;(3)采用碎片离子定量,定量精密度、准确性、线性范围大大提高。基于上述技术优势,DIA技术尤其适用于大规模样本的高度覆盖、稳定和可追溯地分析。 ↩
-
DDA为数据依赖性采集方式,选择一级质谱中特定数量的肽段分子(如信号强度最强的前10个离子)进行高能碰撞碎裂,产生的碎片离子送入二级质谱检测。 ↩