MSKCC分子分型实验步骤翻译(部分)

rainoffallingstar 于 2022-01-08 发布

MSKCC分子分型实验步骤翻译(部分)

原料与工具

  1. SP142 assay (Ventana, AZ) #用于PD-L1表达免疫组化

  2. 高纯度FFPET RNA分离试剂盒(Roche)#提取RNA

  3. Qubit和安捷伦生物分析仪 #RNA数量和质量评估

  4. TruSeq RNA Access technology (Illumina) # 生成全转录组数据

  5. R/Bioconductor软件包GenomicAlignments

  6. Foundation One T7 assay (Foundation Medicine Inc., Cambridge, MA, USA)

  7. STAR v2.7.2b

  8. STAR-Fusion v1.9.1

  9. consensus NMF clustering (CRAN. R package version 0.22.0, Brunet et al., 2004)

  10. the random forest machine learning algorithm (R package randomForest)

  11. QuSAGE analysis (R/Bionconductor qusage v2.18.0)

  12. 其他(详见实验步骤)

实验步骤

1.PD-L1免疫组化与评分

PD-L1 expression was assessed by immunohistochemistry using the SP142 assay (Ventana, AZ). Tumors were characterized as PDL1+ if PD-L1 staining of any intensity on immune cells covered >=1% of tumor area occupied by tumor cells, associated intratumoral,and contiguous peri-tumoral desmoplastic stroma.

2.RNA处理

Formalin-fixed paraffin-embedded (FFPE) tissue was macro-dissected for tumor area using H&E as a guide. RNA was extracted using the High Pure FFPET RNA Isolation Kit (Roche) and assessed by Qubit and Agilent Bioanalyzer for quantity and quality. First strand cDNA synthesis was primed from total RNA using random primers, followed by the generation of second strand cDNA with dUTP in place of dTTP in the master mix to facilitate preservation of strand information. Libraries were enriched for the mRNA fraction by positive selection using a cocktail of biotinylated oligos corresponding to coding regions of the genome. Libraries were sequenced using the Illumina sequencing method.

以H&E染色为指导,对福尔马林固定石蜡包埋(FFPE)组织进行肿瘤区域的宏观解剖。使用高纯度FFPET RNA分离试剂盒(Roche)提取RNA,并通过Qubit和安捷伦生物分析仪进行数量和质量评估。用随机引物从总RNA中诱导第一条cDNA链合成,然后在底物中用dUTP代替dTTP产生第二条cDNA链,以便于保存序列信息。通过使用与基因组编码区相对应的生物素化寡聚物混合物进行正选择得到的mRNA分数。使用Illumina测序方法对这些基因进行测序。

3.生成RNA-seq数据及其处理

Whole-transcriptome profiles were generated using TruSeq RNA Access technology (Illumina). RNA-seq reads were first aligned to ribosomal RNA sequences to remove ribosomal reads. The remaining reads were aligned to the human reference genome (NCBI Build 38) using GSNAP (Wu and Nacu, 2010; Wu et al., 2016) version 2013-10-10, allowing a maximum of two mismatches per 75 base sequence (parameters: ‘-M 2 -n 10 -B 2 -i 1 -N 1 -w 200000 -E 1-pairmax-rna = 200000 –clip-overlap). To quantify gene expression levels, the number of reads mapped to the exons of each RefSeq gene was calculated using the functionality provided by the R/Bioconductor package GenomicAlignments. Raw counts were adjusted for gene length using transcript-per-million (TPM) normalization, and subsequently log2-transformed. Raw and processed data are available under the data sharing agreement.

全转录组数据通过TruSeq RNA Access technology (Illumina)生成,RNA序列首先与核糖体RNA序列比对以排除之,其他序列与人类基因组序列对比,为了量化基因的表达水平,使用R/Bioconductor软件包GenomicAlignments提供的功能计算了映射到每个RefSeq基因外显子上的读数。原始计数根据基因长度进行了调整,使用每百万转录本(TPM)归一化测量基因长度,并随后进行对数转换。?

4.DNA突变及复制数的统计

DNA Mutation and Copy-Number Profiling by Foundation One

Comprehensive genomic profiling (综合基因组分析CGP) was carried out using the Foundation One T7 assay (Foundation Medicine Inc., Cambridge, MA, USA) in a Clinical Laboratory Improvement Amendments (CLIA)-certified, College of American Pathologists (CAP)-accredited laboratory. Approval for genetic evaluation was obtained from the Western Institutional Review Board (Protocol No. 20152817).

Hybrid capture was carried out for all coding exons from up to 395 cancer-related genes plus select introns from up to 31 genes frequently rearranged in cancer. We assessed all classes of genomic alterations (GA) including short variant (missense, stop, nonstart, splice site point mutations as well as short indels), biallelic deletions, amplifications and rearrangement alterations, as previously described (Frampton et al., 2013). Shallow copy-number loss (CN=1) was called using similar methodology to arm-level calling.

对多达395个癌症相关基因的所有编码外显子和多达31个经常在癌症中重排的基因的选定内含子进行了混合捕获。我们评估了所有类别的基因组改变(GA),包括短的变异(错义、停止、非开始、剪接点突变以及短的 indels)、双亲缺失、扩增和重排改变,如以前所述(Frampton等人,2013)。浅层拷贝数损失(CN=1)的调用方法与臂级调用类似。

Normalized coverage data for exonic, intronic, and SNP targets accounting for stromal admixture were plotted on a logarithmic scale and minor allele SNP frequencies were concordantly plotted. Custom circular binary segmentation further clustered targets and minor allele SNPs to define upper and lower bounds of genomic segments. Signal-to-noise ratios for each segment were used to determine whether it was gained or lost. The sum of those segment sizes determined the fraction of each segment gained or lost. For gene alteration analyses described in this manuscript, we leveraged position-level information to define per-gene alteration profiles, and dichotomized every gene’s mutational profile as altered (including copy-number loss or gain) or non-altered.

将外显子、内隐子和SNP目标的归一化覆盖率数据以对数尺度绘制,小等位基因SNP的频率也被一致绘制出来。自定义循环二元分割进一步对目标和小等位基因SNP进行聚类,以定义基因组片段的上限和下限。每个区段的信噪比被用来确定它是获得还是失去。这些片段的大小之和决定了每个片段获得或失去的部分。对于本稿中描述的基因改变分析,我们利用位置级别的信息来定义每个基因的改变概况,并将每个基因的突变概况二分为改变(包括拷贝数损失或增加)或不改变。

5.监测基因融合

Paired trimmed/clipped and de-duplicated RNAseq reads were used to identify gene fusion events. Reads were aligned using STAR v2.7.2b with default parameters to the GRCh38 genome. This aligned output was used as input to STAR-Fusion v1.9.1 (Haas et al. using the developer-supplied gencode v33 CTAT library from April 6, 2020. We required each fusion gene to be supported by atleast two reads

成对的修剪/剪裁和去掉重复的RNAseq读数被用来识别基因融合事件。用STAR v2.7.2b(默认参数为GRCh38基因组)对比。这个对比的输出结果被用作STAR-Fusion v1.9.1的输入。我们要求每个融合基因至少要有两个读码框支持

6.T效应和血管生成性基因印记的阈值定义和验证

T-effector and Angiogenesis Gene Signature Threshold Definition and Validation

RNA-seq data from the randomized Phase II trial IMmotion150 were processed as described above. Transcriptional signature scores were derived from T-effector and Angiogenesis signatures (McDermott et al., 2018) for each sample, and hazard ratios were calculated at various gene expression scores. Gene expression score cutoffs of 2.93 (40% prevalence) and 5.82 (50% prevalence) were defined for the T-effector and Angiogenesis signatures in IMmotion150 based on a combination of prevalence and Hazard Ratio plateauing. These absolute thresholds were prospectively applied to the IMmotion151 data to classify tumors with high and low T-effector and Angiogenesis signatures. Cox-proportional hazard regression models were fit to compare PFS in atezolizumab+bevacizumab or sunitinib-treated patients in gene expression high and low subsets.

7.非负矩阵分解(Non-negative Matrix Factorization ,NMF)

We selected 3072 genes (top10%) with the highest variability across tumors , using Median Absolute Deviation (中位数绝对偏差,MAD) analysis. Subclasses were then computed by reducing the dimensionality of the expression data from thousands of genes to a few metagenes using consensus NMF clustering (CRAN. R package version 0.22.0, Brunet et al., 2004). This method computes multiple k-factor factorization decompositions of the expression matrix and evaluates the stability of the solutions using a cophenetic coefficient. The most robust consensus NMF clustering of 823 patient tumors using the 3072 most variable genes selected and testing k=2 to k=8 was identified as k=7.

8.在IMmotion150中验证NMF 集丛

To validate molecular subtypes derived in IMmotion151, we used the random forest machine learning algorithm (R package randomForest) to derive a classifier and then predict the NMF clusters in an independent data set (IMmotion150). A random forest classifier involves learning a large number of binary decision trees from random subsets of a training set. These trees in the classifier can then be used in a predication algorithm to identify the similarity of a given sample to a given class in the training set. Before learning the random forest classifier, we preprocessed the data to generate the training set. First, we limited the gene expression matrix in the test and training set to the top 10% most variable genes in IMmotion151 (n = 3,072), from which the initial NMF classification was derived and we normalized (z-score transformed) the gene expression values in each set to ensure that the test and training set were on the same scale. Finally, we learned the random forest classifier on the IMmotion151 derived trained data and then utilized the classifier to predict the NMF classes in IMmotion150. We subsequently evaluated expression of gene expression signatures assessed in IMmotion151 (Figure 1C) in the NMF clusters identified in IMmotion150. (Figure S4).

9. 基因表达的定量集分析(QuSAGE)

To understand biological pathways underlying NMF clustering, we conducted QuSAGE analysis (R/Bionconductor qusage v2.18.0) to compare each cluster to all others, leveraging MSigDb hallmark gene sets to identify enriched pathways within each cluster. Enrichment scores were represented as a heatmap (Figure 1B).

10.基因印记与评分方法

Gene signatures were defined as follows:

  1. Angiogenesis: VEGFA, KDR, ESM1, PECAM1, ANGPTL4, CD34; T-effector: CD8A,EOMES, PRF1, IFNG, CD274

  2. Fatty Acid Oxidation /AMP-activated protein kinase (FAO/AMPK): CPT2, PPARA, CPT1A, PRKAA2,PDK2, PRKAB1; Cell Cycle: CDK2, CDK4, CDK6, BUB1B, CCNE1, POLQ, AURKA, MKI67, CCNB2;

  3. Fatty Acid Synthesis (FAS)/Pentose Phosphate: FASN, PARP1, ACACA, G6PD, TKT, TALDO1, PGD;

  4. Stroma: FAP, FN1, COL5A1, COL5A2, POSTN, COL1A1,COL1A2, MMP2;

  5. Myeloid Inflammation: CXCL1, CXCL2, CXCL3, CXCL8, IL6, PTGS2;

  6. Complement Cascade: F2, C1S, C1R, CFB,C3;

  7. Omega Oxidation: CYP4F3, CYP8B1, NNMT, MGST1, MAOA, CYP4F11, CYP4F2, CYP4F12;

  8. snoRNA: SNORD38A, SNORD104,SNORD32A, SNORD68, SNORD66, SNORD100.

Signature scores are calculated as the median z-score of genes(基因的中位z分数) included in each signature for each sample. When summarized by patient group, as in Figure 1D, log2-transformed expression data were first aggregated by patient group using the mean, and subsequently converted to a group z-score.

11.量化和统计分析

All analyses were conducted using Rv3.6.1. Unless otherwise stated, all comparisons for continuous variables use the two-sided Mann-Whitney test (R function wilcox.test) for two groups and the Kruskal-Wallis test (R function kruskal.test) for more than two groups. Dunn’s post-hoc test was applied with Benjamini-Hochberg multiple testing correction for pairwise comparisons. For categorical variables, Pearson’s Chi-squared test with continuity correction was used (R function chisq.test). Unless otherwise stated, FDR-adjusted p-values are reported. : p<0.05; **: p<0.01; **: p<0.001. Survival analyses were conducted using Cox-proportional hazard models using the R survival package (v3.1.7). Log-rank p-values were reported for survival analyses including more than two groups. For all boxplots, the horizontal line represents the median. The lower and upper hinges correspond to the first and third quartiles. The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the interquartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR of the hinge.