Data analysis of GC-MS and LC-MS in Envrionmental Science
By Miao Yu 2020-06-04 部分翻译by fallingstar 2022-05-20
Qualitative and quantitative analysis of contaminants are the core of the Environmental Science. GC/LC-MS might be one of the most popular instruments for such analytical methonds. Previous works such as xcms
were devoloped for GC-MS data. However, such packages have limited functions for environmental analysis. In this package. I added functions for various GC/LC-MS data analysis purposes used in environmental analysis. Such feature could not only reveal certain problems, but also help the user find out the unknown patterns in the dataset of GC/LC-MS.
污染物的定性和定量分析是环境科学的核心。GC/LC-MS可能是此类分析方法中最常用的仪器之一。以前的工作,如“xcms”是为GC-MS数据开发的。然而,此类软件包的环境分析功能有限。在此包中。我为环境分析中使用的各种GC/LC-MS数据分析目的添加了功能。这种特征不仅可以揭示某些问题,还可以帮助用户发现GC/LC-MS数据集中的未知模式。
Data analysis for single sample
Data structure of GC/LC-MS full scan mode
GC/LC is used for separation and MS is used for detection in a GC/LC-MS system. The collected data are intensities of certain mass at different retention time. When we perform analysis on certain column in full scan mode, the counts of different mass were collected in each scan. The drwell time for each scan might only last for 500ms or less. Then the next scan begins with a different retention time. Here we could use a matrix to stand for those data. Each column stands for each mass and row stands for the retention time of that scan. Such matrix could be treated as time series data. In this package, we treat such data as matrix
type.
在GC/LC-MS系统中,GC/LC用于分离,MS用于检测。收集的数据是不同保留时间下一定质量的强度。当我们在全扫描模式下对某一列进行分析时,在每次扫描中收集不同质量的计数。每次扫描的drwell时间可能仅持续500ms或更短。然后,下一次扫描以不同的保留时间开始。在这里,我们可以使用矩阵来表示这些数据。每列代表每个质量,每行代表该扫描的保留时间。这种矩阵可以被视为时间序列数据。在这个包中,我们将这些数据视为“matrix”类型。
For high-resolution MS, building such matrix is tricky. We might need to bin the RAW data to make alignment for different scans into a matrix. Such works could be done by xcms
.
对于高分辨率MS,构建这样的矩阵很棘手。我们可能需要将原始数据装箱,以便将不同扫描对齐到一个矩阵中。这些工作可以由“xcms”完成。
Data structure of GC-MS SIM mode
When you perform a selected ions monitor(SIM) mode analysis, only few mass data were collected and each mass would have counts and retention time as a time seris data. In this package, we treat such data as data.frame
type.
当您执行所选离子监测器(SIM)模式分析时,仅收集了少量质量数据,并且每个质量都将计数和保留时间作为时间序列数据。在这个包中,我们将这些数据视为“data.frame“类型。
Data input
You could use getmd
to import the mass spectrum data as supported by xcms
and get the profile of GC-MS data matrix. mzstep
is the bin step for mass:
data <- enviGCMS:::getmd('data/data1.CDF', mzstep = 0.1)
You could also subset the data by the mass(m/z 100-1000) or retention time range(40-100s) in getmd
function:
data <- enviGCMS:::getmd(data,mzrange=c(100,1000),rtrange=c(40,100))
You could also combined the mass full-scan data with the same range of retention time by cbmd
:
data <- cbmd(data1,data2,data3)
Visulation of mass spectrum data
You could plot the Total Ion Chromatogram(TIC) for certain RT and mass range.
plottic(data,rt=c(3.1,25),ms=c(100,1000))
You could also plot the mass spectrum for certain RT range. You could use the returned MSP files for NIST search:
plotrtms(data,rt=c(3.1,25),ms=c(100,1000))
The Extracted Ion Chromatogram(EIC) is also support by enviGCMS
and the returned data could be analysised for molecular isotopes:
plotmsrt(data,ms=500,rt=c(3.1,25))
You could use plotms
or plotmz
to show the heatmap or scatter plot for LC/GC-MS data, which is very useful for exploratory data analysis.
plotms(data)
plotmz(data)
You could change the retention time into the temprature if it is a constant speed of temperature rising process. But you need show the temprature range.
plott(data,temp = c(100,320))
Data analysis for influnces from GC-MS
enviGCMS
supplied many functions for decreasing the noise during the analysis process. findline
could be used for find line of the boundary regression model for noise. comparems
could be used to make a point-to-point data subtraction of two full-scan mass spectrum data. plotgroup
could be used convert the data matrix into a 0-1 heatmap according to threshold. plotsub
could be used to show the self backgroud subtraction of full-scan data. plotsms
shows the RSD of the intensity of full scan data. plothist
could be used to find the data distribution of the histgram of the intensities of full scan data.
enviGCMS
在分析过程中提供了许多降低噪声的功能。“findline”可用于噪声边界回归模型的查找线。“comparems”可用于对两个全扫描质谱数据进行点对点数据相减。“plotgroup”可用于根据阈值将数据矩阵转换为0-1热图。“plotsub”可用于显示全扫描数据的自背景差。“plotsms”显示全扫描数据强度的RSD。“plothist”可用于查找全扫描数据强度的组图的数据分布。
Data analysis for molecular isotope ratio
Some functions could be used to caculate the molecular isotope ratio. EIC data could be import into GetIntergration
and return the infomation of found peaks. Getisotoplogues
could be used to caculate the molecular isotope ratio of certain molecular. Some shortcut function such as batch
and qbatch
could be used to caculate molecular isotope ratio for mutiple and single molecular in EIC data.
一些函数可以用来计算分子同位素比。EIC数据可以导入“GetIntegration”,并返回找到的峰值信息。“GetSotologues”可以用来计算某些分子的分子同位素比率。一些快捷函数如“batch”和“qbatch”可用于计算EIC数据中多分子和单分子的分子同位素比。
Quantitative analysis for short-chain chlorinated paraffins(SCCPs)
enviGCMS
supply function to perform Quantitative analysis for short-chain chlorinated paraffins(SCCPs) with Q-tof data. Use getsccp
to make Quantitative analysis for SCCPs.
If you want a graphical user interface for SCCPs analysis, a shiny application is developed in this package. You could use runsccp()
to power on the application in a browser.
enviGCMS
的提供函数来利用Q-tof数据对短链氯化石蜡(SCCP)进行定量分析。使用“getsccp”对短链氯化石蜡进行定量分析。
如果您想要一个用于短链氯化石蜡分析的图形用户界面,那么可以使用此软件包中开发一个shiny应用。您可以使用“runsccp()”在浏览器中打开应用程序。
Data analysis for multiple samples
In environmetnal non-target analysis, when multiple samples are collected, problem will raise from the heterogeneity among samples. For example, retention time would shift due to the column. In those cases, xcms
package could be used to get a peaks list across samples within certain retention time and m/z. enviGCMS
package has some wrapped function to get the peaks list. Besides, some specific functions such as group comparision, batch correction and visulization are also included.
在环境非靶向分析中,当采集多个样本时,样本之间的异质性会引发问题。例如,保留时间会因列而改变。在这些情况下,“xcms”包可用于在一定的保留时间和m/z内获取样本的峰值列表。“enviGCMS”包具有一些包装函数来获取峰值列表。此外,还包括一些具体的功能,如组比较、批量校正和可视化。
Wrap function for xcms
package
-
getdata
could be used to get thexcmsSet
object in one step with optimized methods -
getdata2
could be used to get theXCMSnExp
object in one step with optimized methods -
getmzrt
could get a list asmzrt
object with peaks list, mz, retention time and class of samples fromxcmsSet
/XCMSnExp
object. You could also save relatedxcmsSet
andxcmsEIC
object for further analysis. It also support to output the file for metaboanalyst, xMSannotator, Mummichog pathway analysis and paired mass distance(PMD) analysis. -
getmzrtcsv
could read in the csv files and return a list for peaks list asmzrt
object
Data imputation and filtering
-
getimputation
could impute NA in the peaks list. -
getfilter
could filter the data based on row and colomn index. -
getdoe
could filter the data based on rsd and intensity and generate group mean, standard deviation, and group rsd. -
getpower
could compute the power for known peaks list or the sample numbers for each peak -
getoverlappeak
could get the overlap peaks by mass and retention time range for two different list objects
Visulation of peaks list data
-
plotmr
could plot the scatter plot for peaks list with threshold -
plotmrc
could plot the differences as scatter plot for peaks list with threshold between two group of data -
plotrsd
could plot the rsd influnces of data in different groups -
gifmr
could plot scatter plot for peaks list and output gif file for mutiple groups -
plotpca
could plot pca score plot -
plothm
could plot heapmap -
plotden
could plot density of multiple samples -
plotrla
could plot Relative Log Abundance (RLA) plots -
plotridges
could plot Relative Log Abundance Ridge (RLAR) plots
Summary
In general, enviGCMS
could be used to explore single data or peaks list from GC/LC-MS and extract certain patterns in the data with various purposes.