Abstract
Accurate prioritization of immunogenic neoantigens is key to developing personalized cancer vaccines and distinguishing those patients likely to respond to immune checkpoint inhibition. However, there is no consensus regarding which characteristics best predict neoantigen immunogenicity, and no model to date has both high sensitivity and specificity and a significant association with survival in response to immunotherapy. We address these challenges in the prioritization of immunogenic neoantigens by 1) identifying which neoantigen characteristics best predict immunogenicity, 2) integrating these characteristics into an immunogenicity score, NeoScore, and 3) demonstrating an improved association of the NeoScore with response to immune checkpoint inhibition compared to mutational burden. One thousand random and evenly split combinations of immunogenic and non-immunogenic neoantigens from a validated dataset were analyzed using a regularized regression model for characteristic selection. The selected characteristics, the dissociation constant and binding stability of the neoantigen:MHC class I complex and expression of the mutated gene in the tumor, were integrated into the NeoScore. A web application is provided for calculation of the NeoScore. The NeoScore results in improved, or equivalent, performance in four test datasets as measured by sensitivity, specificity, and area under the receiver operator characteristics curve compared to previous models. Among cutaneous melanoma patients treated with immune checkpoint inhibition, a high NeoScore had a greater association with improved survival compared to mutational burden. Overall, the NeoScore has the potential to improve neoantigen prioritization for the development of personalized vaccines and contribute to the determination of which patients are likely to respond to immunotherapy.
Introduction
Cancers arise through mutations in the genome of healthy human cells. As these mutations occur, some will produce mutated proteins, which have the potential to be processed into neoantigens that bind MHC class I and are presented on the cell surface. These neoantigens then act as tumor-specific targets with the potential to elicit a cytotoxic CD8+ T cell response (Tran, Robbins, and Rosenberg 2017; Schumacher and Schreiber 2015; Yarchoan et al. 2017; J. P. Ward, Gubin, and Schreiber 2016). Tumor-specific neoantigens have strong potential to be targets of T cell-mediated destruction, because they are not subject to immune tolerance or non-reactivity to self. However, there are two key ways in which the above mechanism may fail in tumor destruction. For one, recent evidence suggests that most neoantigens do not elicit an immune response in their natural state, termed immunologic ignorance (Linette et al. 2019). Second, once a T cell response is mounted to a neoantigen, that response may become exhausted over time due to inhibitory signals from the tumor microenvironment (Rausch and Hastings 2017).
Circumventing these limitations to re-invigorate the host immune response has been the goal of many recent cancer therapies. To address immunologic ignorance, personalized cancer vaccines have been created and have demonstrated early success (Sahin et al. 2017; Hilf et al. 2019; Keskin et al. 2019; Carreno et al. 2015; Gubin et al. 2015; Ott et al. 2017). These vaccines have taken several forms, including direct exposure to neoantigens (Ott et al. 2017), neoantigen-encoding RNA vaccines (Sahin et al. 2017), neoantigen-loaded dendritic cell vaccines (Carreno et al. 2015), and adoptive transfer of neoantigen-specific T cells (Tran et al. 2014; Yee et al. 2002; Zacharakis et al. 2018). Each of these methods requires accurate knowledge of the neoantigens presented by the tumor cell with the potential to elicit an immune response. In silico prioritization methods have been used to prioritize which set of neoantigens should be experimentally tested, but the ability to prioritize the immunogenicity of each neoantigen, with high sensitivity and specificity, is still limited (Wells et al. 2020; Zhou et al. 2019; Wood et al. 2018; Bjerregaard et al. 2017; Kim et al. 2018).
Exhaustion of T cells and attenuation of T cell activation can be overcome by immune checkpoint inhibition, such as monoclonal antibodies against PD-1 or CTLA-4. Immune checkpoint inhibition blocks inhibitory signals to the T cells to enhance T cell-mediated tumor destruction (Gubin et al. 2014; Van Allen et al. 2015; Lommatzsch, Bratke, and Stoll 2018). However, immune checkpoint inhibition is only effective in a subset of patients (Rausch and Hastings 2017), and there is no consensus on how to prioritize which patients will respond (McGranahan et al. 2016; Hellmann et al. 2019; Rizvi, Hellmann, et al. 2015; Van Allen et al. 2015; Snyder et al. 2014; Łuksza et al. 2017; Zhou et al. 2019). In a pan-cancer analysis, Yarchoan et al. demonstrated that cancer types with a higher mutational burden, such as melanoma, had improved response to anti-PD-1 therapy compared to cancer types with a lower mutational burden (Yarchoan, Hopkins, and Jaffee 2017). There are limitations to mutational burden as a predictor of response to immune checkpoint inhibition.
First, in multiple myeloma, there was an association of increased tumor mutational burden with decreased response to immune checkpoint inhibition (Miller et al. 2017). Second, in melanoma, the association between the tumor mutational burden and response to immune checkpoint inhibition was confounded by the melanoma subtype (Liu et al. 2019). Finally, in lung cancers that progressed after treatment with immune checkpoint inhibitors, there was an increase in the tumor mutational burden compared to the pretreatment state (Anagnostou et al. 2017), thus, contradicting the expectation that tumor cells resistant to immune checkpoint inhibition would have a low number of neoantigens. Together, these findings suggest that tumor mutational burden is not sufficient for predicting response to immune checkpoint inhibition.
Several recent papers have been dedicated to predicting neoantigen immunogenicity based on the characteristics of validated immunogenic neoantigens (Zhou et al. 2019; Łuksza et al. 2017; Wood et al. 2018; Wells et al. 2020; Schmidt et al. 2020). However, there is no consensus regarding which neoantigen characteristics are important for the prioritization of immunogenic neoantigens or the best way to combine the characteristics into an overall immunogenicity score. To identify the characteristics that best prioritize the immunogenicity of the neoantigens, a model-based approach was applied to evaluate neoantigen characteristics encompassing expression, processing, presentation, and T cell recognition in prioritizing neoantigen immunogenicity. The selected characteristics were then combined into an overall logistic regression model called the “NeoScore.” The development of the NeoScore has largely focused on melanoma, except for one lung cancer patient included in the TESLA consortium dataset (Wells et al. 2020).
Immune checkpoint inhibition and personalized neoantigen vaccines are particularly effective in mutation-rich melanoma (Hodi et al. 2018; Rizvi, Garon, et al. 2015; Solomon, Beavis, and Darcy 2020; Shemesh et al. 2021). However, even in melanoma, a positive outcome from these interventions is not assured (Rausch and Hastings 2017). To assess the clinical utility of the NeoScore and its ability to improve assessment of outcome in melanoma, the relationship of the NeoScore to survival in response to immunotherapy was tested using the datasets of Van Allen et al. 2015 and Liu et al. 2019.
Results
Overlap of Strelka and GATK Mutect2 identifies the maximum number of validated immunogenic mutations
The first step in effective prioritization of neoantigens is to identify a high-fidelity list of somatic mutations. Isolating somatic mutations in cancers is more difficult than variant calling in normal tissue since cancers do not follow the typical rules of copy number or heterozygosity and often consist of multiple clonal populations with normal tissue contamination. The TESLA consortium validated neoantigens derived from mutations identified by 25 teams. The methods used for identifying the somatic mutations were not reported, making it difficult to reproduce all neoantigens (Wells et al. 2020). To maximize the immunogenic mutations identified, three highly rated programs were used to identify single nucleotide variants (SNVs) and small insertions and deletions (indels) for the data from the TESLA consortium: VarScan2, GATK Mutect2, and Strelka (Koboldt et al. 2012; Benjamin et al. 2019; Saunders et al. 2012). A large degree of overlap was found in the ability of each program to identify the 34 validated immunogenic mutations. As shown in Figure 1, GATK Mutect2 and Strelka successfully identified 27/34 mutations, while VarScan2 identified 24/34 (Supplementary Tables 1). The mutations from each of these programs overlapped, such that 27 was the maximum number of immunogenic mutations identified. To ensure that the success of VarScan2 in identifying the immunogenic mutations was not hindered by over-filtering, the unfiltered output was assessed and only one immunogenic mutation had been eliminated by filtration steps (Supplementary Table 1). GATK Mutect2 and Strelka identified the same number of immunogenic mutations with or without their respective filters (data not shown). LoFreq was also tested for the identification of somatic mutations, as LoFreq is optimized to call low-frequency mutations (Wilm et al. 2012). However, LoFreq did not identify any additional immunogenic mutations (Supplementary Figure 1, Supplementary Table 1). The neoantigens derived from unidentified immunogenic mutations may be due to the reference genome version, peptide generation steps, or mutations identified by programs that were not tested. Based on these results in the TESLA consortium data, the overlap of GATK Mutect2 and Strelka was used in applying our model to predict progression-free survival in response to immune checkpoint inhibition in melanoma. Overall, the combination of GATK Mutect2 and Strelka identified greater than or equal to the number of validated immunogenic neoantigens as in all other reported pipelines (Wells et al. 2020), while simultaneously decreasing the total number of potential neoantigens by 89.44%.
Dissociation constant and binding stability of the neoantigen:MHC class I complex as well as expression are significantly different between immunogenic and non-immunogenic neoantigens
A set of computationally predicted neoantigen characteristics were calculated for the expression, processing, presentation, and T cell receptor (TCR) recognition of SNV and small indel-derived neoantigens (Figure 2A). Figure 2B demonstrates that the set of characteristics considered is inclusive of all of those considered by the three models to which the NeoScore is compared. To begin, the distribution of each of these characteristics was assessed for immunogenic and non-immunogenic neoantigens. A log transformation was applied to standardize characteristics with a large degree of skewness. The neoantigens assessed here and used to fit the overall model were restricted by three inclusion criteria: 1) they had been tested for immunogenicity by the TESLA consortium, 2) they were identified by GATK Mutect2 and Strelka, and 3) they had expression data. The final dataset used includes 5 patients with a total of 344 neoantigens, which had been tested for their ability to elicit a T cell response using multimer staining. The TESLA consortium found that 26 of the 344 tested neoantigens elicited a T cell response in an unvaccinated state (Wells et al. 2020).
Expression was considered in two ways: 1) mRNA expression level as calculated from RNAseq data and 2) variant allele frequency (VAF). Immunogenic neoantigens had a significantly higher expression level than non-immunogenic neoantigens (p=6.835×10−6, Figure 2C). The clonality of the neoantigen was calculated by FastClone (Xiao et al. 2020), but FastClone did not converge for 4/6 of the samples, so the VAF was used instead. No significant difference in the VAF was observed between immunogenic and non-immunogenic neoantigens (p=0.360, Figure 2D).
Processing steps for the neoantigen were considered as both the proteasomal cleavage potential and TAP (transporter associated with antigen processing) potential scores from netCTLpan (Stranzl et al. 2010). While there was a higher average for both characteristics in the immunogenic than non-immunogenic neoantigens, no statistically significant difference was observed (p=0.858 and p=0.848, respectively) (Figure 2E-F). The binding to the MHC class I molecule was then considered by both the dissociation constant and stability of the MHC class I:neoantigen interaction. The dissociation constant was calculated by netMHCpan (Reynisson et al. 2020). The MHC class I dissociation constants were significantly lower in immunogenic than non-immunogenic neoantigens, indicating a higher binding affinity of immunogenic neoantigens to MHC class I (p=2.69×10−7, Figure 2G). The binding stability was calculated by netMHCstabpan (Rasmussen et al. 2016) and showed significantly higher values for immunogenic over non-immunogenic neoantigens. (p=4.199×10−5, Figure 2H).
The ability of the neoantigen to stimulate a T cell response was predicted using the model created by Łuksza et al. to calculate a characteristic referred to as the TCR recognition potential (Łuksza et al. 2017). The TCR recognition potential is a probabilistic model that considers the sequence similarity between the neoantigen and a known T cell epitope from the Immune Epitope Database (IEDB) as a proxy for the binding affinity of the neoantigen:MHC class I-TCR interaction. The TCR recognition potential showed no statistically significant difference between the immunogenic and non-immunogenic neoantigens (p=0.670, Figure 2I). Two methods were then applied to account for T cell development. During maturation in the thymus, T cells expressing TCRs with high avidity to normal human peptides undergo apoptosis. Thus, a neoantigen with a high degree of sequence similarity to a normal human peptide is less likely to elicit a T cell response. The first method used to account for T cell development was the sequence similarity to the closest matched normal human peptide. The sequence similarity was calculated by performing a BLAST search and subsequently a BLOSUM62 alignment matrix as described (Wood et al. 2018). The resulting sequence similarity was normalized by dividing by the neoantigen length. No statistically significant difference was observed in the sequence similarity for immunogenic and non-immunogenic neoantigens (p=0.206, Figure 2J). The second method considered was the amplitude. The amplitude is calculated by adjusting the dissociation constant of the neoantigen:MHC class I complex by the dissociation constant of the closest matched human peptide:MHC class I complex. The amplitude adjusts for the regulation of TCR specificities during T cell maturation but also considers that only a normal human peptide capable of binding an MHC class I molecule will significantly impact the immunogenicity of the neoantigen. The amplitude was not significantly different between immunogenic and non-immunogenic neoantigens (p=0.206, Figure 2K).
One final characteristic considered was the hydrophobicity of the neoantigen. The hydrophobicity of the neoantigen has been proposed to be associated with greater neoantigen immunogenicity because of the hydrophobicity of key anchor residues in the MHC class I binding cleft, as well as residues in the TCR, having the greatest interaction with the neoantigen (Chowell et al. 2015). Mixed results have been reported for the association of hydrophobicity and immunogenicity to date (Chowell et al. 2015; Zhou et al. 2019; Łuksza et al. 2017; Wells et al. 2020). One reason is the use of different methods for hydrophobicity, three of which are considered here. In the first method, the hydrophobicity is calculated as a fraction of the neoantigen residues that are hydrophobic (Wells et al. 2020). The TESLA consortium reported a significantly lower hydrophobicity of immunogenic neoantigens compared to non-immunogenic neoantigens, which is in the opposite direction as expected. While there was not a statistically significant difference in the neoantigens included in our analysis, the hydrophobicity fraction still had a lower average for immunogenic than non-immunogenic neoantigens (Figure 3, p=0.211). No difference in hydrophobicity was seen in additional datasets from Carreno or Strønen et al., and a significantly higher hydrophobicity of immunogenic neoantigens was observed in the dataset from Ott et al. (Figure 3, p=0.0168). The second method calculated the hydrophobicity fraction using the empirical observations from Chowell et al. to determine which amino acids would increase the likelihood of immunogenicity (Chowell et al. 2015). The method using the observations from Chowell et al. considers both hydrophobicity and other chemical properties such as side chain bulkiness and polarity. No differences in hydrophobicity were seen across the four datasets using the empirical observations from Chowell et al. (Figure 3). The final method considered the hydrophobicity at the anchor residues. A mutation that changed a previously hydrophobic anchor residue to a hydrophilic residue was given a score of zero, while all other changes or no change were given a score of one (Łuksza et al. 2017). While there was no statistically significant difference in hydrophobicity for any of the four datasets using the method of Łuksza et al., three out of four datasets showed a greater percentage of immunogenic neoantigens without a loss of hydrophobicity at an anchor residue (Figure 3). Only the method from Łuksza was included moving forward, since it had the greatest consistency across datasets.
Next, the degree of correlation between the characteristics calculated for each neoantigen was assessed. Only two characteristics were significantly correlated: the dissociation constant and the binding stability (Figure 4A). Despite their correlation, there is evidence that the two characteristics both contribute to accurate neoantigen prioritization. While the dissociation constant assesses the affinity of the interaction between the neoantigen and the MHC class I molecule, the stability predicts the length of time that the neoantigen will remain bound. The importance of binding stability is demonstrated in Figure 4B, where clustering of the immunogenic neoantigens in the upper left-hand corner can be observed, indicating the influence of both characteristics in determining immunogenicity.
Capietto et al. recently found that a different set of neoantigen characteristics will influence immunogenicity, if the mutation occurs in an anchor residue compared to mutations in non-anchor residues (Capietto et al. 2020). They demonstrated that, if a mutation occurred in an anchor residue, the amplitude had a greater predictive value than the dissociation constant of the neoantigen:MHC class I complex alone. To assess the influence of the mutation position, the distribution of the amplitude and the unadjusted dissociation constant for neoantigens with a mutation in an anchor or a non-anchor residue were analyzed. To maximize the chances of detecting a significant difference with the relatively low number of immunogenic neoantigens derived from mutations in anchor residues (n=13), the analysis of the impact of mutation’s position was performed across the combination of four datasets (Strønen et al. 2016; Ott et al. 2017; Carreno et al. 2015; Wells et al. 2020). As shown in Supplementary Figure 2, no statistically significant difference was observed for the amplitude with either anchor or non-anchor residue mutations. While the anchor-residue mutations did have a higher average amplitude in immunogenic neoantigens, the difference was not statistically significant. In contrast, the dissociation constant was significantly lower in immunogenic neoantigens than non-immunogenic neoantigens. Therefore, there was not a compelling reason to fit separate models for neoantigens with mutations in anchor residues and those with mutations in non-anchor residues. Furthermore, because the amplitude is mathematically dependent on the dissociation constant, the amplitude was not included moving forward.
Regularized regression approach creates a neoantigen prioritization model, NeoScore, that outperforms existing models that score each neoantigen
While analysis of the distribution of each characteristic determined those with a statistically significant difference between immunogenic and non-immunogenic neoantigens, we next assessed whether other characteristics influenced the ability to optimally prioritize SNV and small indel-derived immunogenic neoantigens. A regularized regression is an independent, model-based approach to determine the group of characteristics that are important for discriminating between immunogenic and non-immunogenic neoantigens, whether or not each characteristic is statistically significant. A logistic model using an elastic net-based regularization was unable to identify individual characteristics that were predictive of immunogenicity after shrinkage penalties were applied; a consequence that is often seen with a small effective sample size. The effective sample size for a binomial analysis is based on the class with the smallest number of observations, which is the immunogenic neoantigens (n=26). Consequently, a cross-validation approach was applied to select the best subset of neoantigen characteristics. One thousand combinations of 26 non-immunogenic and 26 immunogenic neoantigens were randomly selected, and the elastic net regularized regression was fit on each combination (Figure 5A). Performing 1000 random samples allowed for the examination of the impact of neoantigen characteristics on immunogenicity while adjusting for the small effective sample size. The number of times each characteristic was selected out of the 1000 random combinations was tracked. The dissociation constant, binding stability, and mRNA expression were each selected over 800 of the 1000 times, while no other characteristic was selected over 500 times (Figure 5B). These fits were consistently able to distinguish immunogenic and non-immunogenic neoantigens, as demonstrated by the high density of area under the receiver operator characteristics curve (AUC) values around the mean of 0.85 (25th percentile 0.818, 75th percentile 0.885) (Figure 5C).
Based on the results of the model-based regression approach, a logistic regression model was fit with the dissociation constant, binding stability, and expression in the TESLA consortium data. The final logistic regression model will be called the “NeoScore.” The equation for the model is as follows: Where E is the scaled, log-transformed expression value, Kd is the log-transformed dissociation constant in units of nanomolar (nM), and S is the log-transformed stability measured as the half-life of the binding interaction in units of hours. The coefficients for expression and stability were both positive, as expected since higher expression and longer binding times likely increase the chance of a neoantigen to elicit an immune response. The coefficient for the dissociation constant was negative, as expected since a lower dissociation constant indicates a higher binding affinity. These coefficients match the observed directions of change from Figure 2. Raw data from NetMHCpan, NetMHCstabpan, and Salmon can be processed and combined to return a set of neoantigens prioritized by their NeoScore using the following web application: https://bordene.shinyapps.io/MHCI_neoantigen_prioritization/.
In the TESLA consortium dataset, the NeoScore had an AUC of 0.846, which exceeds the AUC of 0.70 needed to be considered a discriminatory model (Hosmer, Lemeshow, and Sturdivant 2013). The NeoScore also outperformed the AUC of the Łuksza model (AUC=0.615) (Table 1; Figure 6A). The NeoScore was then tested in four additional datasets. The NeoScore outperformed the Łuksza model in the Cohen (0.834 AUC vs. 0.689 AUC) and Strønen datasets (0.739 AUC vs. 0.620 AUC) (Cohen et al. 2015; Strønen et al. 2016) (Table 1; Figure 6B,C). In the Carreno dataset, the NeoScore slightly outperformed Łuksza and the pTuneos hydrophobicity model (0.704 AUC for NeoScore, 0.696 AUC for pTuneos hydrophobicity, and 0.657 AUC for Łuksza) (Table 1; Figure 6D). Published results of the immunogenicity scores from the pTuneos model were used (Zhou et al. 2019), as the model was not able to be successfully run with the other datasets. Both the hydrophobicity-only model and the full model provided by pTuneos are included, as the hydrophobicity model outperformed the full model. Similarly, in the Ott dataset, the NeoScore slightly outperformed the Łuksza model (0.609 AUC for NeoScore and 0.575 AUC for Łuksza) (Ott et al. 2017) (Table 1; Figure 6E).
Since the model by the TESLA consortium consists of a single set of recommended thresholds for the neoantigen:MHC class I dissociation constant, neoantigen:MHC class I binding stability, and expression, the NeoScore could not be compared to the TESLA consortium model in terms of the AUC. Therefore, an optimal threshold for the NeoScore was selected using optimal cutpointr from R statistical software (https://github.com/thie1e/cutpointr) that maximized the sum of the sensitivity and specificity in the TESLA dataset and classified the NeoScore into high and low immunogenicity. The reported sensitivity and specificity for each dataset are based on the threshold optimized in the TESLA dataset (−2.3346). Across all datasets considered, the sensitivity and specificity of the NeoScore were statistically equivalent to that achieved by the TESLA consortium, as demonstrated by the overlap of the 95% confidence interval (Table 1).
Despite the high performance in the TESLA and Cohen datasets, there is a marked decrease in the performance of the NeoScore in the Carreno, Strønen, and Ott datasets. A likely cause for the decreased performance is how the immunogenicity was tested in these datasets. TESLA and Cohen both tested for reactive T cells present in the patient with no additional T cell stimulation. In contrast, Carreno et al. administered a dendritic cell vaccine with each of the predicted neoantigens and subsequently tested for an immune response to each neoantigen (Carreno et al. 2015). Strønen et al. exposed PBMCs from healthy patients to dendritic cells transfected with the neoantigen of interest. They then tested for an immune response to those neoantigens (Strønen et al. 2016). Ott et al. immunized patients with pools of long synthetic peptides and then tested for an immune response to each neoantigen that could be generated from the long peptides (Ott et al. 2017). None of these three methods rely on the expression of the neoantigen in the tumor to activate neoantigen-specific T cells. Therefore, a logistic regression model was fit to the TESLA dataset using only the neoantigen:MHC class I binding stability and dissociation constant. The following abbreviated NeoScore model was obtained: The coefficients obtained for stability and dissociation constant are comparable to those obtained in the full NeoScore model. The threshold for the abbreviated NeoScore was optimized in the TESLA dataset (−2.6269) and then tested in the Cohen, Strønen, Carreno, and Ott datasets. As expected, the abbreviated NeoScore underperformed compared to the full NeoScore model in Cohen and TESLA, but outperformed the full NeoScore model in both Carreno and Strønen (Table 1; Figure 6). These results suggest that expression predicts neoantigen immunogenicity when priming of T cell responses is dependent on expression of the neoantigen by the tumor. However, when the T cells are stimulated independently of expression by the tumor, the predictive benefit of expression is undermined. The poor performance of Ott across models remains unexplained. The Ott dataset did not significantly differ from the other datasets in terms of the distribution of the location of the mutations (anchor vs. non-anchor residues) or the general distribution of the characteristics (data not shown). Overall, elimination of expression enhanced the performance of the model when considering the immune response stimulated independently of the tumor.
To further reaffirm the subset of neoantigen characteristics selected, a logistic regression model was fit using all nine neoantigen characteristics from Figure 5B, which did not notably improve the AUC (Supplementary Figure 3). Furthermore, the optimism, as calculated by the RMS package in R, is much higher for the model that included all neoantigen characteristics (8.20%) than the NeoScore (1.78%). The optimism indicates the likelihood that the model is overfitting the training data, which would make a less generalizable model. The lack of benefit from the additional characteristics adds support that the subset of characteristics selected is optimal for prioritizing immunogenic neoantigens. Overall, the model with the neoantigen:MHC class I dissociation constant and binding stability, and mRNA expression showed the best potential to consistently separate immunogenic and non-immunogenic neoantigens in validated datasets.
A high NeoScore has a greater association with improved survival compared with mutational burden in cutaneous melanoma patients treated with immunotherapy
Once the ability of the NeoScore to discriminate immunogenic from non-immunogenic neoantigens with high sensitivity and specificity was established, the model was applied to assess clinical outcome in melanoma as measured by progression-free survival in response to immune checkpoint inhibition. The survival analysis was performed across two datasets, one with treatment with anti-CTLA-4 monoclonal antibodies (Van Allen et al. 2015) and the other with anti-PD-1 monoclonal antibodies (Liu et al. 2019). In both datasets, the cohort was restricted to immunotherapy-naive patients with cutaneous melanoma (Liu et al. 2019; Van Allen et al. 2015), which allowed us to assess the factors that drive response to immune checkpoint inhibition in melanoma with a high tumor mutational burden and no previous immunoediting. While the range of mutational burden in cutaneous melanoma is wide (1,368-33,591 for the Van Allen dataset and 864-24,292 for the Liu dataset), all samples have a high mutational burden due to UV-induced, DNA mutations. The final cohort sizes were 34 for Van Allen and 53 for Liu. However, the statistical power of the survival analysis is limited by the number of deaths observed in each dataset, resulting in an effective sample size of 22 for Van Allen and 20 for Liu.
Identification of somatic mutations, generation of neoantigens, as well as calculation of expression, neoantigen:MHC class I dissociation constant and binding stability for each of the patients was performed as described in the methods. These three characteristics were then used to calculate the NeoScore for each neoantigen as described above. Since each patient has many neoantigens with a wide range of NeoScores, there are many potential ways to summarize the neoantigen profile for the sake of comparison between patients. Three ways were selected here: 1) the mutational burden, 2) the neoantigen burden, and 3) the highest NeoScore for a neoantigen from the patient, referred to as the “maximum NeoScore.” The mutational burden and neoantigen burden were attempted for the sake of comparison to the literature, while the maximum NeoScore was used in accordance with the principle that a small number of the most immunogenic neoantigens can drive the immune response. Survival analysis was then performed separately for the mutational burden, neoantigen burden, and the maximum NeoScore.
Although an optimal NeoScore threshold was determined for the prediction of neoantigen immunogenicity, when applied as the threshold for the survival analysis there was not a significant association between a maximum NeoScore that exceeded the optimized threshold and increased progression-free survival (data not shown). There are two possible reasons that there is not a significant association at the threshold optimized in the TESLA dataset. First, the optimal threshold was developed using a validated dataset that only included a subset of possible neoantigens, but here the clinical applicability is assessed on a full set of neoantigens. Second, the NeoScore was trained on the natural T cell responses in the absence of any treatment, but patients in the survival datasets underwent treatment with different immunotherapies. Treatment with immunotherapy may increase the immune response to the tumor and decrease the threshold for immunogenicity required to elicit a response, so unique optimal thresholds for survival analysis were determined for the maximum NeoScore using the MaxStat package from R statistical software (Hothorn 2007). MaxStat assesses both the distribution of the variable (here the maximum NeoScore) and the survival across patients to select a threshold that maximizes the difference in the survival between the groups with a high and low NeoScore. Since MaxStat optimizes the threshold to detect a difference in survival, the results of the survival analysis with the optimized thresholds will need to be confirmed in an independent test dataset.
Next, the association of mutational burden and neoantigen burden with survival in response to immune checkpoint inhibition was evaluated. For the sake of consistency, MaxStat was also used to select the optimal threshold for the mutational burden and neoantigen burden. In the Liu dataset, there was no association between mutational burden and progression-free survival in response to immune checkpoint inhibition (p=0.37) (Figure 7A). In the Van Allen dataset, a high tumor mutational burden was significantly associated with poor progression-free survival (p=0.047) (Figure 7B), which is consistent with the findings of Liu et al. 2019. The neoantigen burden was strongly correlated with the mutational burden in both datasets (Figure 7C and D). On survival analysis, the same results were observed for neoantigen burden as for mutational burden where a high neoantigen burden was not associated with improved progression-free survival in the Van Allen dataset and was associated with decreased progression-free survival in the Liu dataset (data not shown).
When tested with the MaxStat split, patients in the Van Allen dataset with a high maximum NeoScore (above -2.77) had significantly improved progression-free survival (p < 0.0001) (Figure 7E). Similarly, in the Liu dataset, patients with a high maximum NeoScore (above -2.47) had longer progression-free survival, but the difference was not statistically significant (p=0.084) (Figure 7F). Despite the lack of a statistically significant difference, both datasets show that progression-free survival is higher for patients with high maximum NeoScore compared to lower maximum NeoScore. Overall, these results suggest an improved association of the NeoScore with survival following treatment with immunotherapy, compared with tumor mutational burden, in tumors with high tumor mutational burden.
Discussion
Prioritization of immunogenic neoantigens is critical for applications to both the development of personalized vaccines and the identification of patients that are likely to benefit from treatment with immune checkpoint inhibition. However, many neoantigen characteristics have been suggested in the literature to date with no consensus on which characteristics impact whether the neoantigen generates a T cell response. Additionally, no model has demonstrated both the ability to score each neoantigen with high sensitivity and specificity and significant association with survival in response to immune checkpoint inhibition. The successes of this study are as follows: 1) identification of those neoantigen characteristics of greatest importance in determining neoantigen immunogenicity, 2) the combination of these characteristics into a single overall immunogenicity score, the NeoScore, with practical applications to personalized vaccine development, 3) integration of the NeoScore into a web application for easy use, and 4) demonstration of the clinical significance of the NeoScore in melanoma.
A model-based statistical prediction approach was used to select the characteristics of SNV and small indel-derived neoantigens that were most predictive of immunogenicity. The dissociation constant and binding stability of the neoantigen:MHC class I complex, and the expression were the combination of neoantigen characteristics best able to discriminate between immunogenic and non-immunogenic neoantigens. While the identification of these three characteristics is consistent with the recent findings of the TESLA consortium (Wells et al. 2020), it is important to note that the approaches taken by our group and the original group differed in several key ways. First, a completely agnostic approach was applied to the selection of characteristics, including all characteristics that have been suggested in the literature to date; whereas the TESLA consortium began with those shown to have statistical significance. Taking a completely agnostic approach ensured the greatest potential to select the combination of characteristics that maximized the separation of immunogenic and non-immunogenic neoantigens. Second, additional characteristics were included that were not considered by the TESLA consortium, including variant allele frequency and sequence similarity to the closest matched human peptide. Finally, these results were expanded by combining the characteristics into an overall immunogenicity score, the NeoScore. A web application has been made available to calculate the NeoScore or the abbreviated NeoScore and provide a list of neoantigens prioritized by their predicted immunogenicity. The web application is expected to streamline the application of NeoScore for research purposes.
One of the key advantages of the NeoScore is that it allows for the prioritization of neoantigens that may not exceed the single set of thresholds provided by the TESLA consortium. While a threshold for the NeoScore was optimized in the TESLA dataset and demonstrated strong performance across the test datasets, the optimized threshold is necessarily conservative for two reasons. First, all datasets used to build and test the NeoScore consisted of neoantigens that had already been prioritized by the original group. Thus, the NeoScore is trained to discriminate between top candidates. Second, the NeoScore is trained on the natural T cell responses to neoantigens with no stimulation by a therapeutic agent. Treatment with immunotherapy or personalized vaccines may be able to elicit an immune response to neoantigens with a lower NeoScore. The conservativeness of the optimized threshold is highlighted by the fact that lower thresholds are selected for survival analysis, which suggests that a patient with a maximum NeoScore below the threshold may still be able to elicit an immune response to the tumor in the context of immunotherapy. Therefore, all neoantigens can be ranked in order of their NeoScore. Researchers applying NeoScore to a new dataset can first rank the predicted neoantigens, then decide on the optimal number of neoantigens to test for a patient based on the unique neoantigen profile of the given patient.
A surprising finding is the lack of association between mutational burden and progression-free survival in the Van Allen dataset and the association of increased mutational burden with decreased progression-free survival in the Liu dataset. These results are inconsistent with the association of increased mutation burden with increased response rate to immune checkpoint inhibition observed across cancer types (Yarchoan, Hopkins, and Jaffee 2017). An explanation for the lack of association of mutation burden with survival following treatment with immune checkpoint inhibition is that the cutaneous melanoma subset consists of all tumors with a high mutational burden. A tumor with a particularly low mutational burden may not have any neoantigens, causing it to have a poor response to immune checkpoint inhibition. However, a high mutational burden alone does not guarantee a good response to immune checkpoint inhibition. For example, the patient in the Liu dataset with the lowest progression-free survival had 12,973 mutations, but not a single neoantigen predicted to be immunogenic. Similarly, the patient with the second lowest progression-free survival in the Van Allen dataset had 22,666 immunogenic neoantigens with none predicted to be immunogenic. Both cases highlight that, for tumors with a high mutational burden, there are additional characteristics beyond mutational burden that are of importance in determining the immune response. As demonstrated by our work, a high maximum NeoScore has an improved association with progression-free survival compared to mutational burden.
The literature to date supports that mutational burden is associated with response to therapy across cancers (Yarchoan, Hopkins, and Jaffee 2017) or across cancer sub-types, but not within tumors with a high mutational burden. Within non-small cell lung cancer, there was a significant increase in response rates and survival in patients with a higher mutational burden than those with a lower mutational burden (Rizvi, Hellmann, et al. 2015). These results reflect a split between the patients with a mutational signature from smoking carcinogens and those with no evidence of exposure to smoking carcinogens. In contrast, within small cell lung cancers, which are nearly universally associated with smoking, there was a weaker association of mutational burden with response to therapy (Hellmann et al. 2019). Three independent studies demonstrated a significant association between high mutational burden and improved response to immune checkpoint inhibition with either anti-CTLA-4 or anti-PD-1 monoclonal antibody treatment in melanoma patients (Snyder et al. 2014; Van Allen et al. 2015, Liu et al. 2019). However, these studies included cutaneous and occult melanoma, which have a high mutational burden, as well as acral and mucosal melanoma, which have lower mutational burdens. As demonstrated here, there is no association between high mutational burden and improved progression-free survival in the Van Allen and Liu datasets when restricted to cutaneous melanoma. As noted by Snyder et al., the patient in their dataset with the highest number of mutations had minimal or no benefit from anti-CTLA-4 monoclonal antibody treatment. Overall, in combination with prior studies, our results highlight the importance of considering the immunogenicity of the neoantigen in predicting the response to immune checkpoint inhibition.
Despite the successes of our work to date, there are several limitations that highlight areas for future research. First, the maximum immunogenicity score is not able to account for the response to therapy of all patients. One possible reason that the NeoScore is not able to fully predict treatment response is that the model did not include neoantigens derived from gene fusions, products of noncanonical open reading frames, canonical open reading frames with a frameshift, or large indels. To our knowledge, there is no available dataset that has validated neoantigens derived from these sources. Given that recent work has demonstrated that a single neoantigen from a gene fusion product can drive complete tumor regression (Yang et al. 2019) and non-canonical proteins disproportionately generate MHC class I binding neoantigens (Ruiz Cuevas et al. 2021), patients misclassified by our model may have had neoantigens from one of these classes of mutations. Additionally, since each of the validated datasets tested the immunogenicity of neoantigens that had already been prioritized by the original group, there may be classes of neoantigens that are immunogenic but were not included in any test dataset. This is particularly important given recent evidence that there are classes of neoantigens that have very low binding to MHC class I (Brennick et al. 2020). The inclusion of neoantigens derived from a broader set of mutations may alter the neoantigen characteristics selected as important by our regularized regression approach. For example, neoantigens derived from gene fusions, large indels, or frameshifts are likely to have less sequence similarity to normal human peptides than an SNV or small indel-derived neoantigen, causing characteristics such as the sequence similarity or amplitude to be of greater importance. The need to consider additional types of mutations underscores the importance of generating a validated dataset, including these additional mutations, and repeating characteristic selection. A second reason that the NeoScore may not be able to account for the response to therapy of all patients is that the model does not consider whether the neoantigens that ranked as immunogenic were able to elicit a CD4+ T cell response. Studies have observed that the most effective vaccines are those with neoantigens that elicit a combined CD4+ and CD8+ response (Kreiter et al. 2015; Alspach et al. 2019; Ossendorp et al. 1998; Bennett et al. 1997; Xu, Kallinteris, and von Hofe 2012). These studies highlight the importance of creating and integrating a similar prioritization model for MHC class II-restricted neoantigens.
This work has successfully identified the key neoantigen characteristics associated with neoantigen immunogenicity using an agnostic approach that considered all the characteristics suggested by the literature to date. These characteristics have been integrated into a single, overall score, the NeoScore, that predicts the immunogenicity of each neoantigen with high sensitivity and specificity. Finally, the maximum NeoScore has improved association with progression-free survival in response to treatment with immune checkpoint inhibition in cutaneous melanoma compared to mutational burden. NeoScore is anticipated to improve neoantigen prioritization for the development of personalized vaccines and the determination of which patients are likely to respond to immunotherapy.
Data Availability
WES and RNAseq data for the TESLA consortium dataset are available through Synapse. Validation data for the TESLA consortium data is available through the supplementary material from the Wells et al. manuscript. All tested neoantigens, expression data, and validation information for the test datasets are provided through supplementary data tables from their respective publications. RNAseq data for the Cohen dataset was obtained through the NCBI with accession number SRP062169. WES and RNAseq data for both the Liu and Van Allen datasets are accessible through dbgap with the accession number phs000452. Links to all datasets are provided below.
https://www.synapse.org/#!Synapse:syn21048999/wiki/603788
https://ars.els-cdn.com/content/image/1-s2.0-S0092867420311569-mmc4.xlsx
https://trace.ddbj.nig.ac.jp/DRASearch/study?acc=SRP062169
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4607110/bin/JCI82416sdt2.xls
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4549796/bin/NIHMS713231-supplement-Tables_S1-S3.xlsx
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5577644/bin/NIHMS892660-supplement-Supp_4.pdf
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000452.v3.p1
Author contributions
Conceptualization E.S.B., K.H.B., M.A.W., and K.T.H. Methodology E.S.B, B.J.L., M.A.W., and K.T.H. Software E.S.B. Formal Analysis E.S.B. and B.J.L. Investigation E.S.B. Resources K.H.B., M.A.W., and K.T.H. Writing -- Original Draft E.S.B. and K.T.H. Writing -- Reviewing and Editing E.S.B., K.H.B., B.J.L., M.A.W., and K.T.H. Visualization E.S.B. Supervision K.T.H. Funding Acquisition K.T.H.
Declaration of interests
The authors declare no competing interests.
Methods
Training datasets
For training, whole-exome sequencing (WES) and RNA sequencing (RNAseq) data were obtained from the TESLA consortium database on Synapse (Wells et al. 2020). The data come from four patients with melanoma and a single lung cancer patient. Neoantigens were used for model creation if they were 1) tested for immunogenicity by the TESLA consortium, 2) identified by either GATK Mutect2 or Strelka, and 3) had expression data. A description of the datasets used, and all relevant accession details are provided in Supplementary Table 2.
Test datasets
For testing of the model, lists of tested neoantigens from melanoma tumors were obtained (Strønen et al. 2016; Cohen et al. 2015; Carreno et al. 2015; Ott et al. 2017). These datasets contain 357 tested neoantigens (n=7 immunogenic neoantigens) (Cohen et al. 2015), 149 tested neoantigens (n=18 immunogenic neoantigens) (Ott et al. 2017), 57 tested neoantigens (n=11 immunogenic neoantigens) (Strønen et al. 2016) and 21 tested neoantigens (n=9 immunogenic neoantigens) (Carreno et al. 2015). The neoantigens were tested for immunogenicity with tetramer staining and cytokine secretion (Cohen et al. 2015), ELISPOT (Ott et al. 2017), and multimer staining (Carreno et al. 2015; Strønen et al. 2016). Cohen and Carreno also provided the expression data quantified as either fragments per kilobase of transcript per million mapped reads (FPKM), reads per kilobase of transcript per million mapped reads (RPKM), or transcripts per million (TPM) (Strønen et al. 2016; Cohen et al. 2015; Carreno et al. 2015; Ott et al. 2017). For the Cohen dataset, no expression data was provided. The RNAseq data from the Cohen dataset was obtained and analyzed as described below for read count quantification. Accession information is in Supplementary Table 2.
Survival analysis datasets
For survival analysis, WES and RNAseq data were obtained from the Van Allen et al. and Liu et al. cohort (all accession information is in Supplementary Table 2). The inclusion criteria for the Van Allen dataset were as follows: 1) both WES and RNAseq data available, 2) cutaneous melanoma as the primary lesion. All patients in the Van Allen cohort underwent treatment with an anti-CTLA-4 monoclonal antibody (ipilimumab). Inclusion criteria for the Liu et al dataset were as follows: 1) both WES and RNAseq data available, 2) cutaneous melanoma as the primary lesion, 3) no prior treatment with an anti-CTLA-4 monoclonal antibody, and 4) HLA types identifiable with arcasHLA (Orenbuch et al. 2020). All patients in the Liu cohort underwent treatment with an anti-PD-1 monoclonal antibody (nivolumab or pembrolizumab).
Quality control and trimming
WES and RNAseq FASTQ files from the TESLA consortium and Liu et al. and RNAseq FASTQ files from Cohen et al. were visualized for quality using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). FASTQ files were trimmed for quality using Trimmomatic (Bolger, Lohse, and Usadel 2014) IlluminaClip with the following parameters: seed_mismatches = 2, palindrome_clip_threshold = 30, simple_clip_threshold = 10, leading = 10, trailing = 10, winsize = 4, winqual = 15. Quality was then re-visualized after trimming.
Read mapping
Trimmed WES reads were mapped to the GRCh38.p7 reference genome, from 1000 genomes (Clarke et al. 2017), and read group labels were added using BWA-mem (Li 2013). SAM files were converted to BAM, coordinate sorted, and read mates were corrected using SAMtools v. 1.4 (Li et al. 2009). The BAM files were then converted to pileup format using SAMtools v. 1.4 (Li et al. 2009).
Identification of somatic mutations
Single nucleotide variants (SNVs) and small insertions/deletions (indels) were identified using four programs, along with their recommended filters: GATK Mutect2 version 4.1.7.0 with default parameters, Varscan2 version 2.3.9 with minimum coverage of 10, minimum variant allele frequency of 0.08, and somatic p-value of 0.05, Strelka version 2.9.2 with default parameters, and LoFreq version 2.1.1 with default parameters (Benjamin et al. 2019; Koboldt et al. 2012; Saunders et al. 2012; Wilm et al. 2012). Results for GATK Mutect2 were filtered with the recommended FilterMutectCalls, and Varscan2 results were filtered using the Perl false-positive filter (https://github.com/ckandoth/variant-filter). Results from all four programs were examined with and without their respective filters. LoFreq results were not filtered to allow maximal potential to identify the missing mutations. Matched normal samples were used as the reference for each sample.
Variant annotation and neoantigen generation
Somatic mutations were separated from those SNVs that fell within 1 bp of an indel position, as these are likely to be false positives due to alignment errors. The mutations were annotated using the Variant Effect Predictor (VEP) tool from Ensembl version 90.9 (McLaren et al. 2016). Then, 17mer amino acid sequences were generated for each mutation using pVAC-Seq tools version 3.0.5 (Hundal et al. 2016). Finally, the 17mers were split into all possible 9 or 10mers where the mutation of interest was in every location. The full pipeline from read mapping through to the identification of somatic mutations is available at https://github.com/SexChrLab/Cancer_Genomics.
Calculation of neoantigen characteristics
For each of the validated neoantigens, 10 separate neoantigen characteristics with potential significance in predicting expression, processing, MHC binding, and T cell recognition potential were calculated as described here. The full pipeline for calculation and processing of the neoantigen characteristics can be found at https://github.com/ElizabethBorden/Process_peptide_lists. A log transformation was applied if the distribution of the characteristic had a large degree of skewness. The difference in the values for each of the neoantigen characteristics between immunogenic and non-immunogenic neoantigens was assessed using a two-sample, two-sided t-test. Correlation coefficients were calculated using Spearman correlation coefficients. P-values below 0.05 were considered statistically significant.
Expression
For the datasets from Cohen et al., the TESLA consortium, and Liu et al., transcriptome assembly and read count quantifications were completed with Salmon version 0.11.3, using the Ensembl GRCh38.p7 reference genome (Cunningham et al. 2015; Patro et al. 2017). mRNA expression in units of TPM was log-transformed, and a constant of 0.1 was added to all values before the transformation to avoid taking the log of zero. To account for the different units used across each dataset, expression values were centered and normalized by subtracting the mean and then dividing by the standard deviation.
Clonality
Copy number variation was calculated with sequenza (Favero et al. 2015), and clonality was calculated using the deconvolution software, FastClone (Xiao et al. 2020).
Variant allele frequency
Variant allele frequency (VAF) as calculated by GATK Mutect2 (Benjamin et al. 2019). Since the VAF was only calculated by GATK Mutect2, but not for Strelka, 14 missing data points were estimated as the average of the rest of the data. No statistically significant difference in the VAF was observed between immunogenic and non-immunogenic neoantigens, either with or without these data points.
Processing
Cleavage and TAP transport potentials were calculated for each of the available neoantigens using NetCTLpan1.0 (Stranzl et al. 2010).
Dissociation constants
Dissociation constants of the neoantigen:MHC class I complex were calculated in nanomolar (nM) units using NetMHCpan4.0 (Reynisson et al. 2020). These values were log-transformed before inclusion in the model.
Binding stability
Binding stability of the neoantigen:MHC class I complex was then calculated as the half-life in units of hours using NetMHCstabpan1.0 (Reynisson et al. 2020; Rasmussen et al. 2016). These values were log-transformed before inclusion in the model.
Hydrophobicity method from the TESLA consortium
The number of hydrophobic residues was divided by the total number of residues in the neoantigen to create a “hydrophobicity fraction” (Wells et al. 2020). In keeping with the methods of the TESLA consortium, hydrophobic residues here were isoleucine, leucine, phenylalanine, methionine, tryptophan, valine, and cysteine.
Hydrophobicity with empirical prevalence
The hydrophobicity fraction was calculated as described for the TESLA consortium using the amino acids found to be empirically prevalent (Chowell et al. 2015). Proline, leucine, and methionine were considered to have a high probability and given a score of +2. Glycine, tryptophan, phenylalanine, isoleucine, and valine were found to have a medium probability and given a score of +1. All others were given a score of 0.
Hydrophobicity Łuksza method
A neoantigen was given a score of zero if the mutation at the anchor residue changed from a hydrophobic to a hydrophilic residue (Łuksza et al. 2017). All other neoantigens were given a score of one. Hydrophobic neoantigens here were defined as alanine, isoleucine, leucine, methionine, phenylalanine, tryptophan, tyrosine, and valine.
TCR recognition
Prediction of TCR recognition potential, R, was calculated as described (Łuksza et al. 2017). A BLOSUM62 similarity matrix was used to assess the sequence similarity between a neoantigen and the closest matched known T cell epitope from the Immune Epitope Database (IEDB) (Vita et al. 2015). The sequence similarity was then used in place of binding energies, and the TCR recognition potential was calculated as: Where |s, e| is the sequence similarity, a is the horizontal displacement of the binding curve, and k sets the steepness of the curve at a. Based on the model fit by Łuksza et al., the parameters a and k were set to 26 and 4.87 respectively (Łuksza et al. 2017). These parameters were optimized for both melanoma and lung cancer patients, the two cancer populations included in our training and test datasets. Finally, Z(k) is the partition function over the unbound state and all bound states, calculated as follows:
Sequence Similarity
The closest matched human peptide was identified using Blast v. 2.10.1 (Camacho et al. 2009), and the sequence similarity of the potential neoantigen to the closest matched human peptide was calculated using a BLOSUM62 matrix, as described (Wood et al. 2018). The sequence similarity was normalized across neoantigen length by dividing by the number of amino acids.
Amplitude
Dissociation constants for the neoantigen:MHC class I complex were then adjusted by the dissociation constants for the closest matched normal human peptide:MHC class I complex, a characteristic called amplitude. The ratio was taken of the dissociation constant of the closest matched human peptide:MHC class I to the dissociation constant of the neoantigen:MHC class I as follows (Łuksza et al. 2017):
Analysis of anchor vs. non-anchor residue mutations
Neoantigens were separated by their mutation position (anchor vs. non-anchor residues). Then, both the amplitude and the dissociation constant of the neoantigen:MHC class I complex were compared between the immunogenic and non-immunogenic neoantigens within each group. Comparisons were done using a two-sample, two-sided t-test. P-values below 0.05 were considered statistically significant.
HLA typing
HLA types were identified on normal tissue samples for each patient in the Liu et al. dataset using arcasHLA (Orenbuch et al. 2020). One limitation of the arcasHLA method is that all 6 HLA alleles were not determined for each patient and at least one HLA (A, B, or C) was excluded for 18 patients. HLA types provided in the supplementary data were used for each patient in the Van Allen et al. dataset (Van Allen et al. 2015).
Elastic net regression modeling
The TESLA consortium data was split into 1000 random subgroups containing all the immunogenic neoantigens (n=26) and an equal number of randomly selected non-immunogenic neoantigens. An elastic net regression was performed using the glmnet package in R for each of these splits (Engebretsen and Bohlin 2019). The selected coefficients and area under the receiver operator characteristics curve (AUC) from each model were tabulated across the 1000 splits.
Logistic regression modeling
Logistic regression modeling was performed with the RMS package in R statistical software (https://cran.r-project.org/web/packages/rms/rms.pdf).
Survival analysis
To avoid scaling expression values on a patient-level basis, coefficients were determined on the TESLA consortium data for log-transformed, non-scaled TPM expression data. The intercept changed to -4.0494 and the expression coefficient to 0.4749, but there was no change the coefficients of the dissociation constant or binding stability and no affect the performance of the model. The optimal threshold for each of the mutational burden, neoantigen burden, and maximum NeoScore were determined using MaxStat (Hothorn 2007). Then the survival analysis was performed between scores above the selected threshold and below using a Kaplan Meier analysis and a log-rank test for statistical significance. P-values below 0.05 were considered statistically significant.
Acknowledgments
We thank Dr. Tanya N. Phung for her advice and assistance in comparing the software for the identification of SNVs and indels and integrating the pipeline from raw whole-exome sequencing data to potential neoantigens. We thank Dr. Chi Zhou for his assistance with the pTuneos software. We thank Anngela Adams for her assistance in polishing figures. This work was supported in part by the Springboard Initiative from the University of Arizona College of Medicine-Phoenix (K.T.H), and the University of Arizona College of Medicine-Phoenix M.D./Ph.D. Program (E.S.B).