Abstract
Aging-related processes such as cellular senescence are believed to underlie the accumulation of diseases over time, causing (co-)morbidity, including cancer, thromboembolism and stroke. Intervening in these processes may delay, stop or reverse morbidity. To study the link between (co-)morbidity and aging by exploring biomarkers and molecular mechanisms of disease-triggered deterioration, we will recruit 50 patients with pancreatic ductal adenocarcinoma, 50 patients with (thromboembolic) ischemic stroke and 50 controls at Rostock University Medical Center. We will gather routine blood data, clinical performance measurements and patient-reported outcomes at up to 9 points in time, and in-depth transcriptomics and proteomics at two early time points. Aiming for clinically relevant biomarkers, the primary outcome is a composite of probable sarcopenia, clinical performance (described by the ECOG Performance Status for patients with pancreatic ductal adenocarcinoma and the modified Rankin Scale for patients with stroke) and quality of life. Further outcomes cover other aspects of morbidity, such as cognitive decline, and of comorbidity, such as vascular or cancerous events. The data analysis is comprehensive in that it includes biostatistics and machine learning, following established standard approaches as well as additional explorative approaches. Predictive biomarkers for interventions addressing senescence may become available if the biomarkers that we find are predominantly related to aging / cellular senescence. Similarly, diagnostic biomarkers will be explored for their relationship to aging / cellular senescence. Our findings will require validation in independent studies, and our dataset should be useful to validate the findings of other studies. In some of the explorative analyses, we will include insights from systems biology modeling as well as from preclinical animal models.
We humbly suggest that our detailed study protocol and data analysis plan may also guide other biomarker exploration trials.
In Brief
The SASKit (“Senescence-Associated Systems diagnostics Kit for cancer and stroke”) study primarily aims to discover novel biomarkers for deterioration of health and (co-)morbidities triggered by pancreatic ductal adenocarcinoma or ischemic stroke.
Introduction
Study Rationale and Aims
The primary aim of the SASKit (“Senescence-Associated Systems diagnostics Kit for cancer and stroke”) study is to discover a set of molecular biomarkers for outcomes after pancreatic ductal adenocarcinoma (PDAC) and ischemic stroke (IS) that are specifically useful to predict disease-triggered deterioration of health (“disease deterioration” for short) in terms of probable sarcopenia 1, reduced clinical performance and reduced quality of life (QoL). The outcomes also include the (co-)morbidity of vascular events (here defined as stroke, myocardial infarction, and venous or arterial thromboembolism) in patients with PDAC, which are frequently observed in addition to sarcopenia. Also included are the (co-)morbidity of any kind of cancer and cognitive decline following IS. Moreover, we consider mortality as the most canonical outcome. Following up on the primary aim, we will investigate the nature of the molecular biomarkers to find out whether cellular senescence and other aging-associated processes contribute to disease deterioration. As a secondary aim, we will search for diagnostic biomarkers related to cellular senescence and other aging-related processes that may differentiate healthy controls from PDAC or IS patients. In the following, we therefore motivate our study by describing the prevalence and outcomes of PDAC and IS, the known predictors of these outcomes, and the specific prevalence of co-morbidity and known predictors of this co-morbidity. The role of cellular senescence in aging and disease is described in Box 1. The background of the cancerous and vascular comorbidity is described in Box 2. To avoid unclear or circular terminology, we define a biomarker very generally, simply as a feature (data point) f1 that successfully predicts another feature f2 at a later time-point 2, in a biomedical context. Here, features may be composite ones, based on the measurement of individual features.
Often, feature f1 refers to molecular data, while feature f2 refers to phenotypic data, such as clinical outcomes. Ultimately, we aim to identify biomarkers that are easy to measure, and that are then validated in other studies to predict a clinically relevant outcome.
Aging and cellular senescence
Extra lifetime gained over the last century has led to the widespread emergence of age-related diseases that are rarely seen in younger people. Older patients are thus more likely to have several comorbidities, which makes treatment difficult and expensive. Over recent years, strong evidence has accumulated that senescent cells (i.e. non-dividing, arrested but metabolically active cells that escape apoptosis) are causally involved in diseases such as atherosclerosis, cancer, fibrosis, pancreatitis, osteoarthritis, Alzheimer disease and metabolic disorders 43,44. Evidence that senescent cells are not merely correlated with aging and disease but causally involved comes from recent studies that transplanted senescent cells from old into young mice 45. This resulted in persistent functional impairment as well as the spread of cellular senescence to host tissues. Another strong line of evidence comes from experiments that removed senescent cells from aged mice using senolytics 45-47. In each case, an increase in lifespan and a delay of typical age-related diseases were observed. Most recently, the results of human pilot trials of putative senolytic treatments for idiopathic pulmonary fibrosis and osteoarthritis have been reported. One team 48 treated patients with idiopathic pulmonary fibrosis with dasatinib and quercetin and demonstrated safety as well as notable improvements in some physical abilities. Furthermore, a human phase-1 study demonstrated that a senolytic compound applied locally in patients with osteoarthritis of the knee was safe and well tolerated 49. A clinically meaningful improvement in several measures, including pain and function, as well as modulation of certain senescence-associated secretory phenotype (SASP) factors and disease-related biomarkers, was observed after a single dose.
Pancreatic ductal adenocarcinoma: prevalence and outcomes
The incidence of pancreatic cancer is increasing; in 2017, the global incidence was 5.7 per 100,000 person-years 3. Age is the most important risk factor, and incidence peaks at 65 to 69 years in males and 75 to 79 years in females 3. Pancreatic ductal adenocarcinoma (PDAC) is the most common histological type of pancreatic cancer 4. The disease is characterized by late clinical presentation 5, early metastases and poor prognosis, with a one-year survival rate in Europe of only 15% 6. Many patients have unresectable disease at the time of diagnosis, either locally advanced or already metastatic. Therefore, therapy is palliative, consisting of chemotherapy and/or best supportive care. Disease deterioration with weight loss and low muscle strength, that is, cachexia and sarcopenia 7, follows, for some patients rapidly (within a few weeks) and for others over a longer interval of one or two years. Recent developments in oncology have shown little benefit in clinical trials of patients with PDAC 8. Inflammation, desmoplasia and early metastases are deemed responsible for the difficulties in targeting the disease. Moreover, vascular events are frequent in the course of PDAC and may contribute to disease deterioration or early death. Venous thromboembolism is the most common such event, occurring in up to 34% of patients with metastatic PDAC 9,10, but arterial ischemic events such as stroke are also reported 11-14 (see also Box 2). Therefore, deterioration and mortality in PDAC cannot be explained by tumor progression alone; other factors such as sarcopenia/cachexia and vascular events contribute as well. Furthermore, we hypothesize that aging-related processes such as cellular senescence and chronic inflammation are an underlying cause of all these factors.
Cellular senescence and the comorbidity of cancer and vascular events
Some cancers such as PDAC can trigger vascular events through hypercoagulation, reflecting Trousseau’s syndrome, first reported 150 years ago 11. In turn, strong associations between coagulation, cellular senescence and the SASP were demonstrated recently 50. Cellular senescence can suppress PDAC and cancerous proliferation in general, but it can also trigger tumor progression by fostering inflammatory processes, including the SASP; after ischemic stroke, on the other hand, it attenuates recovery 51-55. For both diseases, causal influences can be traced back to molecular determinants: PAI-1 (also known as SERPINE1 and part of the SASP) is involved in cancer-triggered thromboembolism 52,54 and in stroke recovery in animals 56. Other proteins involved in cellular senescence, specifically inflammatory cytokines such as IL6, and the lesser-known osteopontin and gelsolin, are also markers for both PDAC and stroke 57-60. The cyclin-dependent kinase CDK5 61 is implicated in the progression of PDAC as well as in recovery from stroke 55,62. Moreover, apart from being genetic risk factors 63,64, the most prominent drivers of cellular senescence (p16/CDKN2A and p21/CDKN1A) also promote PDAC progression 65 and endothelial embolic and arteriosclerotic mechanisms of stroke 66. Finally, two small-molecule interventions into cellular senescence, fisetin and quercetin, are potential treatments for both PDAC and stroke. In the case of stroke, quercetin crosses the blood-brain barrier and improves stroke outcome 67. In the case of PDAC, quercetin was observed to inhibit pancreatic cancer growth in vitro and in vivo 68. Fisetin is found in various fruits (especially strawberries); it is chemically similar to quercetin and has strong putative senolytic effects, extending the lifespan of mice even when the intervention started only at an advanced age 69.
In a study involving nude mice implanted with prostate cancer cells, treatment with fisetin significantly retarded tumor growth 70. Also, in case of lung cancer, there is evidence for the beneficial effects of fisetin. One study showed that fisetin provides protection against benzo(a)pyrene [B(a)P]-induced lung carcinogenesis in albino mice 71 and another in vivo study demonstrated the synergistic effects of fisetin and cyclophosphamide in reducing the growth of lung carcinoma in mice 72. Several other studies have also demonstrated its anticarcinogenic, neurotrophic and anti-inflammatory effects that are beneficial in numerous diseases, including pancreatic cancer and stroke 73.
Pancreatic ductal adenocarcinoma: known biomarkers and clinical scores
In PDAC patients, there is a lack of established scores describing the risk of disease deterioration, and the risk of sarcopenia/cachexia in particular. Referring to the endpoint of overall survival, some recent studies have tried to establish inflammation-based scores to better characterize outcome in PDAC. In a retrospective analysis of 386 patients with PDAC of different stages, the CRP/albumin (CRP/Alb) ratio, neutrophil–lymphocyte ratio (NLR), platelet–lymphocyte ratio (PLR) and modified Glasgow prognostic score (mGPS) were studied 15. In patients with locally advanced and metastatic disease, the CRP/Alb ratio was an independent factor of poor survival 15. Another retrospective study evaluating CA19-9, CEA, CRP, LDH and bilirubin levels in patients with locally advanced and metastatic pancreatic cancer treated with chemotherapy showed independent prognostic significance for overall survival only for a decline of CA19-9 during treatment 16. Other studies have evaluated risk factors for thromboembolic events in patients with pancreatic cancer and, more generally, in patients with cancer 17 (see also Box 2). The Khorana score, developed more than ten years ago, is widely used to estimate the risk of venous thromboembolism in cancer patients 18; it integrates standard laboratory parameters (platelet count, hemoglobin, leukocyte count), body mass index (BMI) and the cancer site (with pancreatic and gastric cancer classified as very high risk). Still, its performance was questioned in a retrospective cohort of pancreatic cancer patients 19 and in a prospective cohort study of patients with different cancer types, among them 109 with pancreatic cancer 17. The clinical association of PDAC, sarcopenia/cachexia and thromboembolism is well described 11, but its pathophysiology is still not understood 20. Within the SASKit study, we aim to identify biomarkers and molecular mechanisms contributing to this clinical association by investigating their relation to clinically relevant outcomes.
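The inflammation-based ratios and the Khorana score named above are simple arithmetic on routine laboratory values. The Python sketch below shows how they could be computed from a participant's blood data. It is illustrative only: the mGPS cut-offs (CRP > 10 mg/L, albumin < 35 g/L) and the Khorana point assignments follow their commonly published definitions rather than anything specified in this protocol, and the function names and units (CRP in mg/L, albumin and hemoglobin in g/L, cell counts in 10^9/L) are assumptions.

```python
"""Sketch: inflammation-based ratios, mGPS and Khorana score (illustrative only)."""


def inflammation_ratios(crp_mg_l, albumin_g_l, neutrophils, lymphocytes, platelets):
    """CRP/Alb ratio, NLR and PLR; cell counts must share one unit (e.g. 10^9/L)."""
    return {
        "crp_alb": crp_mg_l / albumin_g_l,
        "nlr": neutrophils / lymphocytes,
        "plr": platelets / lymphocytes,
    }


def modified_glasgow_prognostic_score(crp_mg_l, albumin_g_l):
    """mGPS: 0 if CRP <= 10 mg/L; otherwise 1 (albumin >= 35 g/L) or 2 (albumin < 35 g/L)."""
    if crp_mg_l <= 10:
        return 0
    return 2 if albumin_g_l < 35 else 1


# Khorana site categories as commonly published; very high risk sites score 2 points.
VERY_HIGH_RISK_SITES = {"pancreas", "stomach"}
HIGH_RISK_SITES = {"lung", "lymphoma", "gynecologic", "bladder", "testicular"}


def khorana_score(site, platelets_1e9_per_l, hemoglobin_g_l, leukocytes_1e9_per_l,
                  bmi, uses_red_cell_growth_factors=False):
    """Return the Khorana venous thromboembolism risk score (0 to 6)."""
    score = 0
    if site in VERY_HIGH_RISK_SITES:
        score += 2
    elif site in HIGH_RISK_SITES:
        score += 1
    if platelets_1e9_per_l >= 350:
        score += 1
    if hemoglobin_g_l < 100 or uses_red_cell_growth_factors:
        score += 1
    if leukocytes_1e9_per_l > 11:
        score += 1
    if bmi >= 35:
        score += 1
    return score


def khorana_risk_category(score):
    """Map the score to the usual low / intermediate / high risk groups."""
    if score == 0:
        return "low"
    return "intermediate" if score <= 2 else "high"


if __name__ == "__main__":
    print(inflammation_ratios(20.0, 40.0, 6.0, 2.0, 300.0))
    s = khorana_score("pancreas", 360, 120, 8.0, 24.0)
    print(s, khorana_risk_category(s))
```

Because every patient in the PDAC-subtrial starts with 2 site points, the Khorana score in this cohort can only discriminate through the laboratory and BMI items, which is one way to read the performance concerns cited above.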
Ischemic stroke: prevalence and outcomes
Ischemic stroke (IS) occurs in the German population with an incidence of 236 per 100,000 per year 21. The mean age of acute stroke patients is 73-74 years, with more than 80% of patients being over 60 years old. After a first stroke, nearly 5% of patients suffer a second stroke within a year. Mortality after IS is about 12% within one year and about 30% within five years 21. Mildly to moderately disabled stroke survivors showed a higher prevalence of sarcopenia more than 6 months after stroke onset than non-stroke individuals (13.2% vs 5.3%) 22. The mechanisms underlying sarcopenia include loss of muscle mass, reduction of fibre cross-sectional area and increased intramuscular fat deposition, occurring between 3 weeks and 6 months after stroke in both paretic and non-paretic limbs 23. Comorbid or subsequent cancer may facilitate sarcopenia after IS. A US nationwide inpatient sample study reported that 10% of hospitalized IS patients have comorbid cancer, 16% of them gastrointestinal cancer and 1% PDAC, and that this association may be on the rise 24. Additionally, within two years after IS, another 2% to 4% of patients receive a new cancer diagnosis 25-27. Within the SASKit study, we aim to identify biomarkers that predict outcome after IS in terms of general health state (i.e. sarcopenia, deterioration of clinical performance, cognitive functioning, frailty) and quality of life, as well as (co-)morbidity, as we do for the PDAC cohort.
Ischemic stroke: known biomarkers and clinical scores
In an early study of 956 patients with acute IS, determinants of long-term mortality were age, obesity, cardiac arrhythmias, diabetes mellitus, coronary heart disease and organic brain syndrome at discharge from hospital; interestingly, hypercholesterolaemia and smoking did not affect long-term outcome 28. More recent studies have uniformly identified age and stroke severity, usually assessed on the NIHSS or similar scales, as biomarkers of long-term functional outcome and mortality after stroke 29,30. Fibrinogen has been related to long-term outcome after stroke 31,32. Data on the predictive value of serum bilirubin levels for the long-term risk of cardiovascular disease are conflicting: some studies support a predictive value (e.g. 33-35), while others do not (e.g. 36). CRP levels have also been reported to affect long-term functional outcome after IS 37, and early neurological deterioration after IS has been related to decreasing albumin levels and elevated CRP and fibrinogen levels 38. Potential biomarkers for occult cancer in IS patients include elevated D-dimers, fibrinogen, and CRP; infarction in multiple vascular territories; and poor nutritional status 39. Interestingly, IS patients with elevation of at least two of the following coagulation-related serum markers in the post-acute phase of stroke, that is, D-dimer, prothrombin fragment 1.2, thrombin-antithrombin complex and fibrin monomer, were more likely to have occult cancer or recurrent stroke during a follow-up of 1.4±0.8 years 40. In another study of acute IS patients, high D-dimer levels at admission were independently associated with recurrent stroke and all-cause mortality during follow-up of up to 3 years 41. These findings underpin the idea of shared risk factors for unfavorable outcomes in IS and cancer, and they suggest that there may be coagulation-related biomarkers indicating an early stage of carcinogenesis or stroke (see also Box 2).
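The post-acute coagulation criterion cited above (at least two of four markers elevated) is a counting rule. The Python sketch below makes it explicit. The upper limits are deliberately left as a caller-supplied dictionary, because the cut-offs of the cited study are not given here; the limits in the usage example are hypothetical placeholders, not clinical reference values, and the marker keys are illustrative names.

```python
"""Sketch: flag participants with >= 2 elevated coagulation-related markers."""

# Hypothetical keys for the four markers named in the text.
COAGULATION_MARKERS = (
    "d_dimer",
    "prothrombin_fragment_1_2",
    "thrombin_antithrombin_complex",
    "fibrin_monomer",
)


def count_elevated(values, upper_limits):
    """Count markers above their upper limit; missing values (None/absent) are skipped."""
    count = 0
    for marker in COAGULATION_MARKERS:
        value = values.get(marker)
        if value is not None and value > upper_limits[marker]:
            count += 1
    return count


def coagulation_flag(values, upper_limits, min_elevated=2):
    """True if at least `min_elevated` markers exceed their upper limits."""
    return count_elevated(values, upper_limits) >= min_elevated


if __name__ == "__main__":
    # Hypothetical placeholder limits (arbitrary units), NOT clinical reference values.
    limits = {m: 1.0 for m in COAGULATION_MARKERS}
    patient = {"d_dimer": 2.5, "fibrin_monomer": 1.4, "thrombin_antithrombin_complex": 0.8}
    print(count_elevated(patient, limits), coagulation_flag(patient, limits))
```

Skipping missing markers means a participant with incomplete data can only be under-flagged, never over-flagged; an analysis might instead prefer to report such cases as indeterminate.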
Nevertheless, the clinical biomarkers that currently exist for predicting outcome are limited in their performance and clinical utility, and there is a need to overcome the limitations of current predictive models 42.
Methods
The presentation is based on the reporting recommendations for tumor marker prognostic studies (REMARK), that is, items (1) – (11) of the REMARK checklist 74.
Study design
The SASKit (“Senescence-Associated Systems diagnostics Kit for cancer and stroke”) study is designed as a prospective observational cohort study to identify biomarkers for disease deterioration in patients with PDAC or with IS and, specifically, for the (co-)morbidities of these diseases, including vascular events and sarcopenia following the diagnosis of PDAC as well as cancer and cognitive decline following IS. All patients will be treated for their diseases in accordance with current guidelines or therapy standards and at the physician’s discretion. Owing to the observational study design, regular treatment of the patient is not affected apart from blood sampling (20 to 80 ml at up to 7 time-points over the following years). Assessment of disease deterioration will be based on standardized clinical performance measurements and on patient-reported outcomes obtained by questionnaires (see below for details). Additionally, data from clinical charts and information from the general practitioner will be collected. The SASKit study is divided into two subtrials with a common control group, both featuring essentially the same outcomes, predictor measurements and data analysis approaches.
Characteristics of participants (patients and controls)
In the first subtrial (PDAC-subtrial), patients with an initial diagnosis of PDAC in a locally advanced or metastatic stage without previous systemic therapy will be considered for enrollment, whereas patients with a (thromboembolic) IS of the supratentorial brain region within the past 5 to 10 days and a definite brain infarction volume >10 ml on magnetic resonance imaging (MRI) will be considered for the second subtrial (IS-subtrial). Except for some explorative analyses, the subtrials will be analyzed separately.
Within both subtrials, persons without PDAC or IS and with no other malignant disease or other (hemorrhagic) stroke during the past two years are eligible as controls. Potential controls will be recruited from persons who have lived in the same household as the patient within the last 2 years, differ in age from the patient by at most 12 years and are neither brothers nor sisters of the patient (i.e. spouses, second-degree relatives or friends). The controls are selected so that their age and gender structure approximately reflects the age and gender distribution of the patients. Therefore, the age and gender of the patients will be continuously recorded, and the controls will be selected in such a way that their gender frequency distribution at any time corresponds approximately to that of the currently recruited patients.
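The household, age-difference and sibling criteria, together with the continuous gender matching, can be expressed as two small functions. In this Python sketch, the data fields are hypothetical, and the greedy rule for choosing the gender of the next control (pick the gender whose count among controls lags furthest behind its expected count given the patients recruited so far) is an assumption; the protocol only requires that the distributions correspond approximately.

```python
"""Sketch: control eligibility and gender frequency matching (hypothetical fields)."""
from collections import Counter
from dataclasses import dataclass


@dataclass
class ControlCandidate:
    age: int
    sex: str                            # e.g. "female" / "male"
    shared_household_last_2y: bool      # lived with the patient within the last 2 years
    is_sibling: bool                    # brother or sister of the patient
    has_pdac_or_is: bool
    malignancy_or_stroke_last_2y: bool  # other malignancy or other (hemorrhagic) stroke


def control_eligible(candidate, patient_age, max_age_difference=12):
    """Apply the control criteria described in the protocol text."""
    return (
        candidate.shared_household_last_2y
        and not candidate.is_sibling
        and abs(candidate.age - patient_age) <= max_age_difference
        and not candidate.has_pdac_or_is
        and not candidate.malignancy_or_stroke_last_2y
    )


def next_control_sex(patient_sexes, control_sexes):
    """Return the sex most under-represented among controls relative to patients."""
    patients = Counter(patient_sexes)
    controls = Counter(control_sexes)
    n_patients = len(patient_sexes)
    n_after_next = len(control_sexes) + 1
    # Expected count of each sex among n_after_next controls minus the current count.
    deficit = {
        sex: patients[sex] / n_patients * n_after_next - controls[sex]
        for sex in patients
    }
    return max(deficit, key=deficit.get)
```

A greedy rule of this kind keeps the running gender distributions close at every point of recruitment, which is what the text asks for, rather than only at the end of the study.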
The following criteria lead to exclusion from participation in the study for both patients and controls, at time of recruitment:
previous or current medical tumor therapy
other cancer within the past 2 years
previous stroke with persistent deficit
myocardial infarction within the past 2 years
therapeutic anticoagulation within the past 2 years for longer than 1 month
pre-existing dementia
chronic heart failure stage NYHA IV
terminal renal insufficiency with hemodialysis
known HIV infection
known active hepatitis C
pregnancy
age < 18 years.
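Screening against the exclusion list above amounts to checking a fixed set of yes/no items plus an age limit. The Python sketch below returns the criteria a candidate meets, so that the reason for an exclusion can be documented; the flag names are hypothetical, and treating an unanswered item as "no" is a simplification a real case report form should avoid.

```python
"""Sketch: screening against the exclusion criteria (hypothetical flag names)."""

EXCLUSION_CRITERIA = {
    "medical_tumor_therapy": "previous or current medical tumor therapy",
    "other_cancer_2y": "other cancer within the past 2 years",
    "stroke_with_persistent_deficit": "previous stroke with persistent deficit",
    "myocardial_infarction_2y": "myocardial infarction within the past 2 years",
    "anticoagulation_over_1_month_2y":
        "therapeutic anticoagulation within the past 2 years for longer than 1 month",
    "dementia": "pre-existing dementia",
    "nyha_iv_heart_failure": "chronic heart failure stage NYHA IV",
    "hemodialysis": "terminal renal insufficiency with hemodialysis",
    "hiv": "known HIV infection",
    "active_hepatitis_c": "known active hepatitis C",
    "pregnancy": "pregnancy",
}
MIN_AGE_YEARS = 18


def exclusion_reasons(flags, age_years):
    """List the criteria met at recruitment; an empty list means not excluded.

    Flags missing from `flags` are treated as False (simplification).
    """
    reasons = [text for key, text in EXCLUSION_CRITERIA.items() if flags.get(key, False)]
    if age_years < MIN_AGE_YEARS:
        reasons.append("age < 18 years")
    return reasons


def is_excluded(flags, age_years):
    """True if any exclusion criterion applies."""
    return bool(exclusion_reasons(flags, age_years))
```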
Both subtrials will be implemented according to the same standardized protocol. After written informed consent of each participant, patients and controls will be followed up at 3, 12, 24, 36 and 48 months after their inclusion in the trial, whenever possible. The PDAC-subtrial includes an additional time-point for examinations at 6 months after inclusion, given that mortality due to PDAC is expected to be accelerated as compared to IS.
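The visit schedule above can be generated per subtrial and turned into target dates. In this Python sketch, the calendar rule (same day of month, clamped to the last day of shorter months) is an assumption, and the extra month-6 visit is keyed on the subtrial as the text states; whether PDAC-subtrial controls also attend the month-6 visit is not specified in the text.

```python
"""Sketch: planned visit months and target dates (calendar rule is an assumption)."""
import calendar
from datetime import date

FOLLOW_UP_MONTHS = (3, 12, 24, 36, 48)
PDAC_EXTRA_MONTH = 6


def visit_months(subtrial):
    """Months after inclusion: 0 (inclusion) plus follow-ups; PDAC adds month 6."""
    if subtrial not in ("PDAC", "IS"):
        raise ValueError(f"unknown subtrial: {subtrial!r}")
    months = [0, *FOLLOW_UP_MONTHS]
    if subtrial == "PDAC":
        months.append(PDAC_EXTRA_MONTH)
    return sorted(months)


def add_months(start, months):
    """Same day of month `months` later, clamped to that month's last day."""
    years, month_index = divmod(start.month - 1 + months, 12)
    year = start.year + years
    month = month_index + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def visit_schedule(subtrial, inclusion_date):
    """Map each planned visit month to its target calendar date."""
    return {m: add_months(inclusion_date, m) for m in visit_months(subtrial)}
```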
The study is expected to start in the second quarter of 2020 and will finish with the last participant’s follow-up at 48 months. By that time, we expect that 50 PDAC patients, 50 IS patients and 50 controls will have participated in the trial. The study will be conducted at the Rostock University Medical Center (UMR), Germany, at Clinic III - Hematology, Oncology, Palliative Medicine and at the Department of Neurology; the institutions of the other co-authors are supporting the study in a variety of ways. The study protocol has been approved by the ethics committee of the UMR. The study is registered at the German Clinical Trials Register (DRKS00021184) and will be conducted following ICH-GCP.
General health- and disease-related and demographic data
General data of the study participants will be recorded at the beginning of the study (“month 0”) and consist of the following: age, sex, BMI, temperature, blood pressure, heart rate (ECG). Furthermore, through interviews the following additional data will be recorded: vascular risk factors (arterial hypertension, diabetes, hyperlipidaemia, smoking habits), history of vascular events (stroke, myocardial infarction, venous or arterial thromboembolism), atrial fibrillation, history of cancer, current medication, surgery or blood transfusions in the past three months and vascular or cancerous events affecting any first degree relatives. These data may provide influential factors for explorative analyses, or be employed to interpret and discuss the results of the study.
Blood sampling
Blood sampling will be done in a standardized fashion, that is, fasting and between 8 and 10 am, for all assays. Routine blood parameters will be recorded at the time-points described above (months 0 to 48). These consist of differential blood count, INR (international normalized ratio of prothrombin time), partial thromboplastin time, D-dimers, fibrinogen, factor XII, albumin, bilirubin, high-sensitivity CRP, CA19-9, cholesterol and HbA1c.
Experimental blood analysis (PAI-1 and omics) will be done for patients at month 0 in the case of PDAC and at month 0 or month 3 in the case of stroke (the 3-month time point is taken if it reflects a better state of the patient as described by the NIHSS), and furthermore at month 3 in the case of PDAC and at month 12 in the case of stroke. For controls, the experimental blood analysis will be carried out at months 0 and 12, assuming that their data do not change much in the 3 months after baseline. The justification for taking the better state in the case of stroke is to maximize the differences from the 12-month follow-up data. In terms of practicality (being able to calculate a biomarker signature sooner), however, the state at month 0 should be selected for all stroke patients. Since blood samples will be taken, pre-processed and frozen at month 0 in all cases, we are in principle able to perform the experimental blood analysis for all stroke patients at month 0, and we can do this analysis retrospectively if deemed necessary. We also take blood from PDAC patients at month 12, to have the option of an experimental blood analysis if deemed useful. In the following, we refer to the baseline time-point (month 0, or month 3 for stroke patients who improved) and the landmark time-point (month 3 for PDAC patients and month 12 for stroke patients and controls). The experimental blood analysis is done earlier for PDAC because of the high expected mortality within the first year.
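The choice of baseline and landmark time-points described above can be written as a small decision function. This Python sketch assumes that a "better state" on the NIHSS means a strictly lower score at month 3 than at month 0 (lower NIHSS indicates less severe deficit) and that month 0 is kept whenever either score is missing; both are assumptions, not protocol rules.

```python
"""Sketch: baseline and landmark months for the experimental blood analysis."""


def analysis_time_points(group, nihss_month0=None, nihss_month3=None):
    """Return (baseline_month, landmark_month) for 'PDAC', 'IS' or 'control'.

    For IS, month 3 becomes the baseline only if the NIHSS (lower = less severe)
    improved strictly between month 0 and month 3; if either score is missing,
    month 0 is kept.
    """
    if group == "PDAC":
        return 0, 3
    if group == "control":
        return 0, 12
    if group == "IS":
        improved = (
            nihss_month0 is not None
            and nihss_month3 is not None
            and nihss_month3 < nihss_month0
        )
        return (3 if improved else 0), 12
    raise ValueError(f"unknown group: {group!r}")
```

Setting the IS baseline always to month 0 (the practical option mentioned in the text) would only require dropping the `improved` branch.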
The experimental blood analysis includes PAI-1 (see Box 2) as well as high-throughput (omics) analyses, that is, transcriptomics and proteomics of T cells and proteomics of serum. T cells are of interest because they were reported to carry the strongest signal with respect to cellular senescence, based on the marker p16 75. We intend to measure gelsolin and osteopontin as well, provided that sufficiently standardized assays become available in due time; otherwise, the blood collected for this measurement will be used to measure cytokines/chemokines that are part of the SASP, such as IL6, IL8 and TNFC, by ELISA. At the time of writing, we do not yet have reliable estimates of the amount of blood cells that will remain available for measuring protein expression, so either an antibody-based protein array (in case of low amounts) or mass spectrometry (in case of sufficiently high amounts) will be used. For the blood serum, we intend to use the same protein measurement method. In the default case of a protein array, we plan to use the novel, dedicated “Senescence Associated Secretory Phenotype (SASP) Antibody Sampler Kit” (covering approx. 10 SASP-related proteins; Cell Signaling Technology) for both cellular and serum proteomics. Further exploratory molecular analyses that are not (yet) funded but are permitted by the ethics approval include single-cell analyses of blood, methylation assays for calculating epigenetic clocks 76, genetics by SNP array or whole-genome sequencing, and telomere length. A separate ethics approval was granted for an optional skin biopsy; skin microbiome analyses are planned as well.
Blood sample processing for the experimental analysis will be performed according to standard operating procedures (SOP) at the research laboratory of Clinic III - Hematology, Oncology, Palliative Medicine. The procedures include flow cytometric control of sampling quality, including the distribution of cell types and vitality, as performed in routine diagnostics. Isolation of peripheral blood mononuclear cells (PBMCs) will also follow the SOP used by the laboratory in routine diagnostics. T cell separation will be performed according to an established workflow based on magnetic bead purification (Miltenyi MACS), following the manufacturer’s instructions. T cell fraction purity and vitality will then be verified by flow cytometry as described above. Nucleic acid and protein isolation will be performed by column separation (Qiagen, Hilden, Germany) according to the SOP of the research laboratory. RNA integrity numbers (RIN) will be determined using an Agilent Bioanalyzer as instructed by the manufacturer. Samples with RIN values above 6 will qualify for RNAseq or Clariom D array analyses; for RNAseq, the average sequencing depth will be set at approx. 40 × 10^6 reads per sample.
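The RNA quality gate and the sequencing depth stated above translate into a simple triage step and a read budget. In this Python sketch, the sample IDs are hypothetical, and the 300-sample figure in the note below is only an illustrative upper bound (150 participants at two time-points, assuming every sample passes the RIN gate).

```python
"""Sketch: RIN-based triage and a rough sequencing-depth budget."""

MIN_RIN_EXCLUSIVE = 6.0               # "RIN values above 6" qualify
TARGET_READS_PER_SAMPLE = 40_000_000  # approx. 40 x 10^6 reads per sample


def triage_by_rin(rin_by_sample):
    """Split sample IDs into (qualifying, failing) lists by RIN."""
    qualifying = sorted(s for s, rin in rin_by_sample.items() if rin > MIN_RIN_EXCLUSIVE)
    failing = sorted(s for s, rin in rin_by_sample.items() if rin <= MIN_RIN_EXCLUSIVE)
    return qualifying, failing


def total_reads_needed(n_samples, reads_per_sample=TARGET_READS_PER_SAMPLE):
    """Total reads to budget for `n_samples` RNAseq libraries at the target depth."""
    return n_samples * reads_per_sample
```

For example, 300 qualifying samples at the target depth correspond to 12 × 10^9 reads in total; samples with a RIN of exactly 6 fall on the failing side because the text requires values above 6.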
Clinical performance measurements and patient-reported outcomes
At baseline and at each follow-up, handgrip strength (“grip strength” for short) is measured using a digital hand dynamometer (Jamar Plus). The test is performed while sitting comfortably, with the shoulder adducted, the elbow placed on the tabletop and flexed to 90 degrees, and the forearm and wrist in a neutral position 77. The highest of three measurements of maximal isometric contraction of the dominant hand (or, if that hand is paralyzed due to IS, of the unaffected hand) is documented in kg. Further, the following clinical performance measurements are evaluated by the study physician or study nurse according to standard protocols: ECOG Performance Status (ECOG PS) 78, modified Rankin Scale (mRS) 79, Canadian Study on Health & Aging Clinical Frailty Scale (CSHA-CFS) 80, NIH Stroke Scale (NIHSS) 81 and Montreal Cognitive Assessment (MOCA) 82. All raters are certified for the applicable scores (mRS, NIHSS, MOCA). Patient-reported outcomes (measured by questionnaires) are the following: EQ-5D-5L and EQ-VAS (generic evaluation of QoL in 5 domains and overall on a visual analog scale) 83, HADS-D (evaluation of anxiety and depression) 84, WHODAS 2.0 (WHO Disability Assessment Schedule) 85 and, for patients with PDAC, FACIT-Pal (evaluating QoL with a focus on palliative symptoms and needs) 86,87. All questionnaires are administered following the suppliers’ instructions.
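The rule for which grip-strength value is documented (best of three trials, dominant hand unless it is paralyzed) is small but easy to apply inconsistently across raters. This Python sketch encodes it; the input layout (a dictionary of readings per hand) and the function name are hypothetical.

```python
"""Sketch: the grip-strength value documented at a visit (hypothetical data layout)."""


def documented_grip_strength(trials_kg, dominant_hand, paralyzed_hand=None):
    """Highest of three trials of the dominant hand, or of the other hand if the
    dominant hand is paralyzed after IS.

    trials_kg: dict mapping "left"/"right" to a list of three readings in kg.
    """
    hand = dominant_hand
    if paralyzed_hand == dominant_hand:
        hand = "left" if dominant_hand == "right" else "right"
    readings = trials_kg[hand]
    if len(readings) != 3:
        raise ValueError(f"expected 3 readings for the {hand} hand, got {len(readings)}")
    return max(readings)
```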
Follow up data
Apart from the clinical and patient-reported outcomes, further follow-up data are BMI, temperature, blood pressure, heart rate (ECG), atrial fibrillation, current medication, tumor treatment, comorbidity (any vascular or cancer event), hospital admissions or palliative care. Additionally, based on clinical charts and information from the general practitioner, we will record medication, (co-)morbidity and mortality. Just like the general health- and disease-related and demographic data recorded at time of recruitment, these data may provide influential factors for explorative analyses, or be employed to interpret and discuss the results of the study.
Endpoints
In both subtrials, the primary endpoint is a composite measure of “disease deterioration”, defined as the first occurrence within a follow-up interval of at least one of the following:
(a) Probable sarcopenia, measured by grip strength less than 27 kg for males and less than 16 kg for females (according to the revised European consensus, EWGSOP2 1).
(b) Deterioration of clinical performance, that is, of the ECOG PS by at least two points (PDAC-subtrial) or of the mRS by at least one point (IS-subtrial).
(c) Deterioration of QoL, described as a reduction of the EQ-5D-5L index score by at least 0.07 and a deterioration of the EQ-VAS (range 0-100) by at least 7 points.
Deterioration will be considered between baseline (month 0) and the respective follow-up investigation. As described above, for patients with IS whose condition (measured by the NIHSS) has improved within the first 3 months, this time point (month 3) will be used as the baseline instead. Item (a) is the deterioration from “no sarcopenia” to “probable sarcopenia” as defined by current consensus 1. Grip strength has been widely used to assess muscle strength, which is currently the most reliable measure of muscle function, the loss of which indicates sarcopenia 1. ECOG PS is established for describing the general condition of patients with cancer, whereas mRS is established for patients with stroke. Both scores reflect death (ECOG PS of 5, mRS of 6), and death from any cause will always be counted. The EQ-5D-5L evaluates QoL in five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), all of which are relevant for patients with PDAC and IS. Furthermore, it is a generic score, so results will be comparable across different diseases (as recently described in patients with stroke 88) and with the general population 89. Even though disease-specific scores might evaluate symptom burden in even more detail, the EQ-5D-5L was recently shown to be comparable to QoL scores developed specifically for pulmonary embolism and deep vein thrombosis (that is, PEmb-QoL, VEINES-QOL/Sym and PACT-Q2) in terms of acceptability, validity and responsiveness 90. The minimal important difference that describes a clinical deterioration lies in the range of 0.07 to 0.09 index points for the EQ-5D-5L and 7 to 10 points for the VAS 91, which is the basis for the definition of item (c).
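The composite primary endpoint can be evaluated visit by visit against the baseline. The Python sketch below is one reading of items (a) to (c) under stated assumptions: item (a) requires no probable sarcopenia at baseline, item (c) requires both the EQ-5D-5L index and the EQ-VAS criterion (following the wording "and"), and the data class and function names are hypothetical, not the study's analysis code.

```python
"""Sketch: evaluating the composite primary endpoint (items a, b, c)."""
from dataclasses import dataclass
from typing import Dict, Optional

GRIP_CUTOFF_KG = {"male": 27.0, "female": 16.0}  # EWGSOP2 probable sarcopenia


@dataclass
class Visit:
    grip_kg: float
    eq5d_index: float
    eq_vas: float
    ecog: Optional[int] = None  # PDAC-subtrial
    mrs: Optional[int] = None   # IS-subtrial


def probable_sarcopenia(grip_kg, sex):
    return grip_kg < GRIP_CUTOFF_KG[sex]


def deterioration_items(baseline, follow_up, sex, scale):
    """Return which items (a), (b), (c) are met at one follow-up visit."""
    item_a = (probable_sarcopenia(follow_up.grip_kg, sex)
              and not probable_sarcopenia(baseline.grip_kg, sex))
    if scale == "ECOG":
        item_b = follow_up.ecog - baseline.ecog >= 2
    elif scale == "mRS":
        item_b = follow_up.mrs - baseline.mrs >= 1
    else:
        raise ValueError(f"unknown scale: {scale!r}")
    # Rounding guards against floating-point noise in index differences.
    eq5d_drop = round(baseline.eq5d_index - follow_up.eq5d_index, 6)
    item_c = eq5d_drop >= 0.07 and baseline.eq_vas - follow_up.eq_vas >= 7
    return {"a": item_a, "b": item_b, "c": item_c}


def first_deterioration_month(baseline, follow_ups: Dict[int, Visit], sex, scale):
    """Earliest follow-up month at which any item is met, or None."""
    for month in sorted(follow_ups):
        if any(deterioration_items(baseline, follow_ups[month], sex, scale).values()):
            return month
    return None
```

For controls, `scale` would be chosen according to the subcohort they serve as controls for; an integrative analysis would pass `"mRS"` throughout.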
Controls reach their endpoint by the same definition as the subcohort for which they serve as control; in any integrative analysis of both subtrials, a deterioration of the mRS by at least one point will be used as the criterion (instead of ECOG PS), because stroke patients in general have a slower deterioration than PDAC patients, and controls naturally have the slowest expected deterioration.
The primary composite endpoint and all secondary endpoints will be evaluated in a first analysis, based on data obtained until summer 2021, and in a second analysis, based on data obtained until summer 2023, and in a third analysis at the end of the study. The second analysis may be delayed until data of 90% of the study participants are available (at least including the month 12 follow up) and it may then constitute the “main” analysis of the study.
The following secondary endpoints are evaluated:
each component of the primary endpoint (separately);
occurrence of disease-specific (co-)morbidities, as follows
new vascular events (stroke, myocardial infarction, venous or arterial thromboembolism), specifically in patients with PDAC;
new cancer, specifically in patients with IS;
probable sarcopenia (based on grip strength);
cognitive decline (deterioration of the MoCA score by 3 points from the best value at baseline);
frailty, defined as a CSHA-CFS level of 6, 7, or 8;
all-cause mortality.
Further, a sum-score summarizing all measurements of phenotypic variables (grip strength, clinical performance measurements, comorbid events, mortality) will be considered as a surrogate for “aging”; all continuous-scaled components are normalized to a common scale with a mean of zero and a standard deviation of one, and all components of the sum-score are given equal weight.
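A minimal sketch of the equal-weight sum-score, assuming a participants-by-components matrix. Only continuous columns are standardized, as described above; the sign flip that makes higher values mean “worse” for every component (e.g., for grip strength) is our assumption, as are the function and argument names.

```python
import numpy as np


def aging_sum_score(values, continuous, higher_is_worse):
    """Equal-weight sum-score over phenotypic components.

    values: participants x components; continuous and higher_is_worse hold
    one boolean per component.
    """
    x = np.asarray(values, dtype=float).copy()
    cont = np.asarray(continuous, dtype=bool)
    # Standardize continuous components to mean 0 and SD 1 (binary ones stay 0/1)
    mu = x[:, cont].mean(axis=0)
    sd = x[:, cont].std(axis=0, ddof=1)
    x[:, cont] = (x[:, cont] - mu) / sd
    # ASSUMPTION: align directions so that a higher value always means "worse"
    signs = np.where(np.asarray(higher_is_worse, dtype=bool), 1.0, -1.0)
    return (x * signs).sum(axis=1)


# Hypothetical data: grip strength (kg) and a comorbid event indicator
print(aging_sum_score([[30, 0], [40, 1], [20, 0]], [True, False], [False, True]))
```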
Predictors
While all phenotypic features (grip strength, clinical performance, patient-reported outcomes, comorbid events, mortality) contribute to the definition of endpoints (as dependent variables/parameters), all routine and experimental blood features (PAI-1, omics) are considered potential predictors, also called independent variables/parameters. This delineation is justified by (a) the paradigm that (clinical) relevance is tied to high-level phenotypes describing health and survival, specifically including QoL2, and (b) the goal of developing a “senescence-associated systems diagnostics kit” that includes a careful selection of biomarkers contributing, as much as possible, also to molecular-mechanistic insights into PDAC, IS and their (co-)morbidity, which we hypothesize to be related to cellular senescence and aging. Age and gender will be included as mandatory covariates (also termed confounders, that is, predictors that we do not aim to explore, or upon which we wish to improve) in all statistical models. Further covariates are smoking, the baseline NIHSS score in case of IS, as well as locally advanced vs metastatic PDAC and treatment modality in case of PDAC. As described, the successful predictors identified by our study through the statistical analyses outlined below are called biomarkers; we stress that these are only candidates for the ultimate goal of clinically validated biomarkers and, in particular, still need to be validated in further studies (e.g., in other cohorts). A set of biomarkers is also called a biomarker signature.
Blinding and pseudonymization
No blinding will be done during the study. However, because the primary composite endpoint is based on standardized definitions, it will be documented without subjective influence, which keeps detection bias to a minimum. Furthermore, information bias will be minimized by using simple measurements that are applied in daily practice or are self-reported and easy to perform (e.g., EQ-5D-5L). The rigorous inclusion of all eligible patients within the recruitment period will help to minimize selection bias. All patient data are pseudonymized for all investigators except the attending physician and the study nurse. Since all major data analyses rely on known outcome information (e.g., supervised machine learning with cross-validation), the data analysis is not blinded to outcomes either; it will, however, also be performed on the pseudonymized data. Protection of the personal and clinical data of all patients and controls will follow all relevant legal regulations.
Sample size
No formal sample size calculation was performed a priori for this observational study. The prevalence of PDAC, combined with the requirement to complete the study within a reasonable timeframe, implied a target of 50 patients per group (PDAC, IS and control group). Nevertheless, a power analysis showed that a sample size of 50 patients provides 80% power to detect, by a non-parametric Wilcoxon statistic at a significance level of 5%, an AUC of 0.75 for a particular biomarker signature against the null hypothesis value of 0.5, assuming that about three times as many patients will reach the primary endpoint as will not reach it92.
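The power statement can be approximately reproduced with the Hanley–McNeil (1982) variance of the empirical AUC (which equals the Wilcoxon statistic) and a normal approximation. This is a sketch under assumptions: participants reaching the primary endpoint form the “positive” group (37.5 of 50 at a 3:1 ratio), the test is two-sided, and the method actually used in 92 may differ in detail.

```python
import math
from statistics import NormalDist


def hanley_mcneil_var(auc, n_pos, n_neg):
    """Variance of an empirical AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    return (auc * (1 - auc)
            + (n_pos - 1) * (q1 - auc ** 2)
            + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)


def auc_power(auc_alt=0.75, auc_null=0.5, n=50, pos_to_neg=3.0, alpha=0.05):
    """Approximate power of a two-sided test of H0: AUC = auc_null."""
    n_pos = n * pos_to_neg / (1.0 + pos_to_neg)   # reach the primary endpoint
    n_neg = n - n_pos                              # do not reach it
    se_null = math.sqrt(hanley_mcneil_var(auc_null, n_pos, n_neg))
    se_alt = math.sqrt(hanley_mcneil_var(auc_alt, n_pos, n_neg))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((abs(auc_alt - auc_null) - z_crit * se_null) / se_alt)


print(round(auc_power(), 2))  # approximately 0.81
```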
Data Analysis Plan
General considerations
The guiding criteria for biomarker identification in the SASKit study are maximization of the predictive signal, clinical relevance/utility, biomedical/molecular/clinical interpretability, and practicality/cost. Given the relatively low number of participants in this in-depth study, we must use all available information to maximize the signal for the endpoints and predictors outlined above. Regarding endpoints, whenever possible, we thus wish to consider the (censored) time-to-event information inherent in the baseline and follow-up examinations and in the mortality data. The primary endpoint was defined to combine expected clinical utility with maximum signal. In defining the (secondary) endpoints, we considered an array of clinically relevant single endpoints as well as a sum-score of all phenotypic measurements; we hypothesize that the latter carries the most signal. Given the small sample, we cannot set aside a separate validation dataset. (For the predictors considered to be covariates/confounders, please see the section on “Predictors”, above.)
Data quality assessment and cleaning
The need for (and the amount of) data cleaning cannot easily be estimated beforehand; we plan to follow the MarkAGE guidelines 93 to deal with missing values, and to detect and rectify outliers and batch artefacts.
Predictor/Feature integration
Regarding predictors (features), we first note that they are measured at baseline (month 0 or 3) and at one landmark (the main follow-up, that is, month 3 or 12). While baseline features can be used without restriction, landmark features can, of course, only be used to predict outcomes occurring after the landmark. Further, we need to handle the high dimensionality of the omics features. Here, upfront feature integration (e.g., by averaging measurements as described below) is considered preferable, specifically for the high-dimensional omics data, for the following reasons.
A small feature space allows for an easier understanding and interpretation, see, e.g., 94.
Integrated features can be used as input for both the standard biostatistics and the standard machine learning parts of the analysis.
Using a few features is more time-tested than newer methods that jointly compute the prediction model and select the features, although the latter are often claimed to be superior by their developers.
Naturally, feature integration reduces multicollinearity and the risk of overfitting, and makes multiple testing less of an issue. This counters the “curse of dimensionality” and “de-noises” the data towards better prediction performance 94,95.
Feature integration allows the handling of feature heterogeneity, which in our case refers to routine blood measurements as well as various omics data types.
In the explorative analyses, systems biology modelling and the parallelogram approach are both expected to deliver further small sets of integrated, highly informative features, which may, e.g., dominate system behaviour, or which are believed to translate well from animal models to humans (see below).
While most features will be available for the baseline and the landmark time-point, utilizing baseline data is clinically more useful, simply because the prediction for the endpoint is available much earlier. Nevertheless, in the explorative analyses, we will investigate the predictive power of changes in feature measurements from baseline to landmark, given that such changes may be more informative about future disease deterioration (and other endpoints) than just baseline values.
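The landmark restriction can be sketched as follows: only participants still event-free and uncensored at the landmark contribute, their clock is reset at the landmark, and baseline-to-landmark changes become candidate predictors. The data layout and all names are hypothetical; this is the generic landmark approach, not necessarily the exact implementation used in the study.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str
    time: float       # months from study entry to event or censoring
    event: bool       # True if the endpoint was reached
    baseline: dict    # feature name -> value at baseline
    landmark: dict    # feature name -> value at the landmark visit


def landmark_dataset(participants, landmark_month):
    """Rows for predicting outcomes after the landmark from feature changes."""
    rows = []
    for p in participants:
        if p.time <= landmark_month:
            continue  # event or censoring before the landmark: not at risk
        deltas = {f"delta_{name}": p.landmark[name] - value
                  for name, value in p.baseline.items() if name in p.landmark}
        rows.append({"pid": p.pid,
                     "time": p.time - landmark_month,  # reset clock at landmark
                     "event": p.event,
                     **deltas})
    return rows
```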
Specific omics data feature integration
Notably, we face a heterogeneous “multi-view” dataset, usually referred to as “multi-omics”. Our feature integration approach (see above) is also known as “late integration”: measurements for the different omics data types are reduced early on to activation scores for pathways or subnetworks, which are then integrated at a “late” level. To calculate subnetwork activation scores, we use by default the ExprEssence/FocusHeuristics linkscore 96,97, taking the links (gene/protein interactions) from a functional interaction network, by default STRING. Our experience with the linkscore motivates us to include it as one of the feature integration approaches proposed below, contributing to up to 10 of the features on which the standard biostatistics and machine learning analyses are based. Specifically, we take the average expression for all patients (a list of expression values, one per gene) and the average for all controls (likewise one value per gene) to calculate a linkscore for each STRING interaction, and assemble a “condensed” network consisting of the 50 highest-scoring interactions. These interactions form subnetworks, and we take the average linkscore of each subnetwork as its activation score. Alternative methods such as keypathwayminer will be used in the exploratory analyses (see below). For pathways (such as KEGG), we will calculate pathway activation scores using Gene Set Variation Analysis (GSVA) 98. This method calculates pathway activation scores from expression data, is suited for both microarray and RNA-seq data, and performed strongly in a recent benchmarking analysis 99.
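The condensation step (keep the 50 highest-scoring interactions, treat connected components as subnetworks, average their linkscores) can be sketched in Python. Note that `toy_linkscore` is a hypothetical placeholder (the change in the product of the two genes' mean expression from controls to patients), not the published ExprEssence linkscore formula; only the subnetwork assembly and averaging follow the text.

```python
from collections import defaultdict


def toy_linkscore(ctrl, pat, a, b):
    # HYPOTHETICAL stand-in, not the published ExprEssence formula:
    # change of the product of both endpoints' mean expression.
    return pat[a] * pat[b] - ctrl[a] * ctrl[b]


def condensed_subnetworks(edges, ctrl, pat, top_k=50):
    """Keep the top_k interactions and score each connected subnetwork."""
    scored = [(toy_linkscore(ctrl, pat, a, b), a, b) for a, b in edges
              if a in ctrl and b in ctrl and a in pat and b in pat]
    scored.sort(key=lambda e: e[0], reverse=True)
    kept = scored[:top_k]

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _, a, b in kept:               # union the endpoints of kept edges
        parent[find(a)] = find(b)

    comps = defaultdict(lambda: {"nodes": set(), "scores": []})
    for s, a, b in kept:
        comp = comps[find(a)]
        comp["nodes"].update((a, b))
        comp["scores"].append(s)
    result = [(tuple(sorted(c["nodes"])), sum(c["scores"]) / len(c["scores"]))
              for c in comps.values()]
    return sorted(result, key=lambda r: r[1], reverse=True)
```

The first five entries of the returned list would correspond to the “top-5 subnetworks” used as integrated features.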
The GSVA-based pathway activation scores can subsequently be compared between patients and controls in the same way as ordinary gene expression data, by calculating, for each pathway, a fold-change of the pathway activation scores between patients and controls. Here, using the limma R package, we compare all patients with all controls, adjusting the individual patient/control pathway activation scores for age and gender. An example of this approach is given in the GSVA publication, where differential pathway activation was identified between acute lymphoblastic leukemia and mixed-lineage leukemia 98. The major downside of feature integration may be information loss: subsequent statistical and machine-learning-based analyses receive only a tiny fraction of the total information available.
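The covariate-adjusted patient-versus-control comparison of pathway scores can be approximated by an ordinary least-squares fit per pathway. This sketch is a simplification: limma fits the same kind of linear model (via `lmFit`) but additionally moderates the residual variances with empirical Bayes (`eBayes`), which is omitted here. The function name and design layout are our own.

```python
import numpy as np


def adjusted_group_difference(scores, is_patient, age, is_female):
    """Per-pathway patient-vs-control difference adjusted for age and gender.

    scores: samples x pathways (e.g., GSVA scores). Ordinary least squares
    only; no empirical Bayes variance moderation as in limma.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    design = np.column_stack([np.ones(n), is_patient, age, is_female]).astype(float)
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef[1]  # row 1 holds the coefficient of the patient indicator
```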
Gene expression data (transcriptomics) will be our preferred omics data type. Nevertheless, proteins are closer to the phenotype than transcripts, so we do not wish to ignore them. We are therefore prepared to handle both kinds of proteome data that we may expect (see “Experimental blood analyses”, above), as follows.
Large-scale data, likely based on mass spectrometry, covering hundreds or more proteins that can be identified and quantified in all investigated conditions, enabling differential analysis.
Small-scale data, likely based on antibody arrays, covering tens of proteins or fewer.
Apart from platform-specific raw data preprocessing, once log-fold changes describing differential expression are established, we expect to handle the large-scale proteome data essentially like the transcriptomics data, and the small-scale proteome data similarly to the routine blood data, for cells and serum alike. Overall, the omics data are expected to vary along three main coordinates, namely:
as blood cell transcriptomics and proteomics as well as serum proteomics;
longitudinal in time (for baseline and landmark); and
for PDAC, IS and control.
All coordinates can be exploited for differential analyses, even though the PDAC and IS data will be analyzed separately except for some integrative explorative analyses (see below). In the explorative analyses, the longitudinal transcriptomics of the patients and controls will also be analyzed together, see below. For the standard biostatistics and machine learning analyses, we plan to employ 5 approaches to feature integration, each yielding a shortlist of 5 integrated features, as follows.
(1) (5 features) A first shortlist of features will consist of the following expert selection from the routine blood measurements (incl. PAI-1): neutrophil-lymphocyte ratio, fibrinogen, high-sensitivity C-reactive protein, albumin and PAI-1.
(2) (5 features) For the cellular gene expression measurements, we use ExprEssence/FocusHeuristics (see above) to identify the 5 highest-scoring subnetworks.
(3) (5 features) Again for the cellular gene expression measurements, we use GSVA (see above) to identify the 5 most strongly changing pathways as features.
(4) + (5) (10 features) For the serum proteomics data:
In case of large-scale serum proteomics data, we proceed as in (2) + (3);
In case of small-scale serum proteomics data, we proceed as follows:
if the number of features measured successfully is on the order of 10, we refrain from any processing;
if the number of features is on the order of 10 to 100, we select the 10 features with the smallest p-values for a difference between the mean values of patients and controls, based on a t-test.
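Selection of the 10 smallest t-test p-values for mid-sized small-scale proteome panels can be sketched with SciPy's `ttest_ind` (Student's two-sample t-test, its default). The function name is hypothetical; whether a Welch test (`equal_var=False`) would be preferable is left open by the text.

```python
import numpy as np
from scipy.stats import ttest_ind


def top_k_by_ttest(patients, controls, names, k=10):
    """Names of the k features with the smallest two-sample t-test p-values.

    patients, controls: samples x features arrays with the same feature order.
    """
    _, pvalues = ttest_ind(patients, controls, axis=0)  # Student's t-test
    order = np.argsort(pvalues)[:k]
    return [names[i] for i in order]
```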
For genomic features as per (2), the feature measurements for an individual patient or control will then be the average linkscores of the 5 selected subnetworks. For genomic features as per (3), the feature measurements for each patient/control will be the GSVA scores of the 5 selected pathways. By construction, we expect the resulting features to reflect the up-/downregulation of disease-related transcripts/proteins or pathways/subnetworks. By using the GSVA-based integrated features as input to the Cox proportional hazards models of the biostatistical analyses, we closely follow the “Survival analysis in ovarian carcinoma” example described in the GSVA publication 98. Regarding the expert selection from the routine blood measurements, we are aware that some of these features may be considered to have an almost trivial relationship to outcome in the diseases we study; e.g., fibrinogen may correlate strongly with the size of the stroke-damaged brain area and may thus be considered a covariate. However, to our knowledge, none of these features is a validated clinical biomarker for the outcomes studied here, and a combination of simple biomarkers may well be key to the best possible prediction. We selected the neutrophil-lymphocyte ratio specifically because it is cheap to measure; like many other blood-based features, however, it is easily influenced by acute infection.
Exploratory feature integration
Apart from the FocusHeuristics/ExprEssence linkscore, we employ alternatives such as keypathwayminer 100. Further, we calculate pathway activation scores for the following senescence-related KEGG pathways, which include PAI-1 (see the Introduction) but do not refer to a specific disease, as of February 2020: Cellular senescence, HIF-1 signaling pathway, p53 signaling pathway, Apelin signaling pathway, Hippo signaling pathway, Complement and coagulation cascades. “Early integration” by, e.g., first averaging transcript and protein expression on a single-gene basis, is also planned.
Choice of data analysis methods for biomarker discovery
We will consider two main approaches to data analysis, one motivated by statistical methods, the other by machine learning. Although this delineation may ultimately be artificial, we consider regression to be the core ingredient of the former and supervised learning to characterize the latter. We will apply “standard” methods (mostly in biostatistics) and explore novel approaches (mostly in machine learning; preserving signal implies a focus on supervised approaches here). Data analysis for biomarker discovery trials in a clinical setting is usually described from a biostatistical perspective, and biostatisticians have also developed methods to cope with the high dimensionality of omics data (see below). On the other hand, the challenges of omics data have also spurred the recent publication of many machine learning methods, which have not yet become routine in clinical trial analysis but which we wish to test (see below). We will focus on methods readily available in SAS or as R packages. Notably, the right choice of method depends in part on known unknowns, such as the strength of the signal (incl. the amount of missing data) in the routine blood measurements and the omics data.
Prediction model quality measures
Unlike intervention trials, with their highly standardized aim of establishing a statistically significant superiority (or non-inferiority) of one intervention compared to another (or to standard of care), observational biomarker trials are a more recent development with fewer precisely quantified criteria of success and a stronger need to consider the effect size: even if a biomarker signature significantly improves the prediction of an outcome, raising the accuracy of the prediction from, say, 70% to 75% may not be clinically meaningful, depending on the prevalence of the condition to be predicted, the cost of the biomarker measurement, etc. We thus aim to identify biomarkers that make the largest possible difference in prediction accuracy, compared with established scores where such a comparison is possible (see also below). For the biostatistics part, the concordance statistic (c-index) will be used as an overall measure of predictive accuracy, and time-dependent ROC curves and AUC will be used to summarize the predictive accuracy at different cut-off points in time. For the machine learning part, the cross-validated accuracy and AUC/c-index are used, following 94; to account for a potential Simpson’s paradox, we will either analyse the data stratified by gender, or add such an analysis and check for consistency. More generally, to investigate the role of confounders (and, if necessary, to correct for them) in the machine learning part, we wish to use the permutation technique described in 101. We expect to identify a set of biomarkers that achieves an accuracy of 75% or more, or an AUC of 0.75 or more, in predicting the primary endpoint, with a precision of +/- 12% 102. This precision estimate is half the width of a 95% confidence interval (CI) for a probability of 75%, obtained by extending Table 6 of 102, which shows precision up to a sample size of N=30.
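For orientation, Harrell's c-index and the precision figure can be sketched in a few lines of Python. The c-index version below is the simple O(n²) form (a pair is comparable if the shorter time is an observed event; tied risks count 1/2; tied times are ignored), and the half-width uses the normal (Wald) approximation, which is our assumption about how Table 6 of 102 was extended; it gives about 0.12 for p = 0.75 and N = 50.

```python
import math


def harrell_c_index(time, event, risk):
    """Harrell's concordance index; higher risk should mean an earlier event."""
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # only observed events can anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def ci_half_width(p, n, z=1.959964):
    """Half-width of a Wald 95% CI for a proportion p estimated from n cases."""
    return z * math.sqrt(p * (1 - p) / n)


print(round(ci_half_width(0.75, 50), 3))  # 0.12
```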
Standard biostatistical analyses
A Cox proportional hazards regression model adjusted for age and gender will be used to estimate the hazard ratio (HR) and corresponding 95% CI for predicting the primary composite endpoint, separately within the PDAC cohort and the IS cohort. The 5 shortlists of 5 features (see above) will provide the canonical predictors, analyzed together. To select the features most strongly related to the primary endpoint, we will use the procedure proposed by Sauerbrei et al. 103, as follows. First, 100 bootstrap samples will be generated, and a multivariable Cox proportional hazards regression model with backward elimination (selection level 0.05) will be fitted to each bootstrap replication of the original data set. In a second step, features with a relative selection frequency of 30% or less over all bootstrap samples will be eliminated. In a third step, each feature Xi for which the hypothesis of independence in combination with a feature Xj can be rejected will be eliminated if Xi is less important when Xj is included in the model, or if it does not gain importance when Xj is excluded from the model. All remaining features will be included in the final model. Graphical and numerical methods will be used to check the proportional hazards assumption 104 in the final model. Results will be reported as p-values, HRs and corresponding 95% CIs; a p-value of p ≤ 0.05 will be interpreted as indicating statistical significance. From the final model, a risk score will be calculated for each patient as the sum, over all features, of the patient's feature measurement multiplied by the estimated regression coefficient of that feature. The c-index will be used as an overall measure of the predictive accuracy of the resulting score, and a time-dependent ROC curve and AUC will summarize its predictive accuracy at specific times.
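The second step of the bootstrap procedure and the final risk score are simple to express. In this sketch the per-replicate backward-elimination Cox fits are assumed to have been run elsewhere and are represented only by the sets of selected features; the pairwise third step is not shown. All names and values are illustrative.

```python
from collections import Counter


def selection_frequencies(selected_per_bootstrap, features):
    """Relative frequency with which each feature was selected."""
    counts = Counter(f for selected in selected_per_bootstrap for f in set(selected))
    n_boot = len(selected_per_bootstrap)
    return {f: counts[f] / n_boot for f in features}


def screen(freqs, threshold=0.30):
    """Step 2: drop features selected in 30% of the replicates or fewer."""
    return [f for f, q in freqs.items() if q > threshold]


def risk_score(measurements, coefficients):
    """Linear predictor: sum of coefficient * measurement over model features."""
    return sum(coefficients[f] * measurements[f] for f in coefficients)


# Hypothetical outcome of 10 bootstrap replicates (normally 100)
boot = [{"a", "c"}] * 5 + [{"a", "b"}] * 3 + [set()] * 2
print(screen(selection_frequencies(boot, ["a", "b", "c"])))  # ['a', 'c']
```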
All secondary endpoints will be evaluated using the same approach as the primary endpoint, except for the sum-score used as a surrogate for “aging”. For this endpoint, a linear mixed effects model with a random intercept and a spatial power covariance structure will be fitted to the data to estimate the progression of “aging”; the spatial power structure is chosen to reflect the unequal intervals between follow-up investigations. Model assumptions and model fit will be checked by visual inspection of residuals and by influence diagnostics. Missing values will be handled by a likelihood-based approach within the framework of linear mixed models, assuming that values are missing at random. Results will be reported as p-values at a significance level of 5%, accompanied by the value of the test statistic and the degrees of freedom. In addition, 95% CIs for the progression (slope) will be provided.
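For unequally spaced visits, the spatial power structure sets the residual covariance between visits at times t_i and t_j to σ²·ρ^|t_i − t_j|, and a random intercept with variance τ² adds a constant to every entry. The sketch below builds the implied marginal covariance matrix for one participant; the visit months are hypothetical.

```python
import numpy as np


def spatial_power_cov(times, sigma2, rho):
    """R[i, j] = sigma2 * rho ** |t_i - t_j| for unequally spaced visits."""
    t = np.asarray(times, dtype=float)
    return sigma2 * rho ** np.abs(t[:, None] - t[None, :])


def marginal_cov(times, tau2, sigma2, rho):
    """Random intercept (variance tau2) plus spatial power residuals."""
    n = len(times)
    return tau2 * np.ones((n, n)) + spatial_power_cov(times, sigma2, rho)


# Hypothetical visit months for one participant
print(np.round(marginal_cov([0, 3, 6, 12], tau2=0.5, sigma2=1.0, rho=0.9), 3))
```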
Additional exploratory biostatistical analyses
Again, the primary composite endpoint as well as all secondary endpoints will be evaluated separately within the PDAC cohort and the IS cohort of the respective sub-trials. In a first approach, a separate Cox proportional hazards model adjusted for age and gender will be fitted for each omics feature (R package survival), using a cut-off of 0.05 on the false discovery rate. In a second approach, all omics features will be considered simultaneously in a multivariable Cox model, adjusted for age and gender. To this end, the component-wise likelihood-based boosting algorithm proposed by Binder and Schumacher (2008) 105 (R package CoxBoost) will be used to develop a biomarker signature.
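The false discovery rate cut-off can be applied with Benjamini–Hochberg adjusted p-values, as computed by R's `p.adjust(method = "BH")`; the Python sketch below assumes BH is the intended FDR procedure, which the text does not state. Features with an adjusted p-value of 0.05 or less would be retained.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, moving from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted
```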
Standard machine learning
For the machine learning part, the primary outcome and all secondary outcomes give rise to an assignment of predictor/feature lists to survival times, one such list per study participant, from which biomarkers are then learned in a supervised fashion. As described, in the standard analyses, feature integration (see above) will precede the actual calculation of the model (“deep” learning approaches that take in “all” features are part of the exploratory analyses, see below). As in the standard biostatistical analyses, the 5 shortlists of 5 features each (see above) will provide the canonical predictors, analyzed together. To exploit the time-to-event information, we will employ random survival forests (RSF) as described by 106, which offer the following advantages.
RSF can now be considered a time-tested approach, and it was the subject of a recent extensive review 65 and of a systematic comparison with LASSO approaches in the case without feature selection (107, see their Table 7 for its competitive performance which is not reflected in their abstract).
RSF can also work on essentially all features, without a preceding feature integration/selection step, and then be compared, in the explorative machine learning analyses described below, to survival support vector machines (SSVM) and to a novel method Path2Surv that “conjointly” performs feature selection and model training, see 94.
RSF was recently compared to Cox-nnet 108, a neural network approach which we consider as very promising for the exploratory part, see also below.
RSF offers a considerable degree of interpretability, given that RSFs are derived from decision trees.
RSF is considered “completely data driven and thus independent of model assumptions” and “in case of high dimensional data, limitations of univariate regression approaches such as overfitting, unreliable estimation of regression coefficients, inflated standard errors or convergence problems do not apply”65.
In the machine learning part, we calculate accuracy and AUC/c-index using cross-validation to make the best use of our limited sample size, following the setup of 94 and 107 (who, however, set aside separate validation datasets).
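With about 50 participants per subtrial, every fold should contain events. A minimal sketch, assuming stratification by event status (not specified in the text), deals participants to k folds round-robin, events first; each fold then serves once as the test set, e.g., for an RSF fitted on the remaining folds and evaluated by the c-index. The function name is hypothetical.

```python
import random


def stratified_folds(event, k=5, seed=0):
    """Assign each participant a fold index 0..k-1, balancing events.

    Events are dealt round-robin first, then non-events, continuing the
    same counter so that fold sizes differ by at most one.
    """
    rng = random.Random(seed)
    folds = [0] * len(event)
    counter = 0
    for status in (True, False):
        members = [i for i, e in enumerate(event) if bool(e) == status]
        rng.shuffle(members)
        for i in members:
            folds[i] = counter % k
            counter += 1
    return folds
```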
Additional exploratory machine learning
Apart from the more time-tested standard machine learning described above, we will also explore recently proposed methods for which it is less straightforward to tell whether they are fit for purpose in our case, even though their developers usually claim superiority based on some test/validation datasets. Specifically, as mentioned above, we expect to test Path2Surv and SSVM 94 as well as Cox-nnet 108 (without prior feature integration); the latter in particular promises a high degree of interpretability. We further explore CNet (employing the censored-data variant) for interpretable biomarkers. We also plan to employ the PASNet 109, SurvivalNet 110 and SVRc 70 packages. The longitudinal transcriptomics of the patients and the controls may also be analyzed integratively based on the “optimal discovery procedure” 111, considering, however, that landmark feature data can only be used to predict events after the landmark. Finally, we will map the differential omics data onto a human “healthspan pathway map” 112, that is, a set of clusters/pathways based on health-related genetic data that we assembled recently.
Explorative systems biology modelling, explorative parallelogram approach and transfer learning
As mentioned, systems biology modelling and parallelogram extrapolation 113,114 are expected to deliver small sets of highly informative features, by contributing features that dominate model behaviour or that are shown to translate from the SASKit animal model data. Given the comparatively small number of study participants (but in-depth measurements), we also wish to explore “transfer learning”, which aims to utilize large amounts of public knowledge in the form of latent variables. Specifically, we plan to use, and wish to develop further, the MultiPLIER 115 approach, which was motivated by the analysis of rare-disease data. MultiPLIER utilizes the RNA-seq-based recount2 compendium; apart from the functional network and pathway data used in the feature selection part, this compendium is expected to be our main source of biological knowledge entering the calculations for biomarker discovery.
Miscellaneous exploratory approaches and discovery of diagnostic biomarkers
We will also use unsupervised machine learning to generate descriptive multi-omics correlation networks, as most recently employed by 116, where they were supplemented by linear mixed effects models using (un-)restricted maximum likelihood approaches; in that very recent biomarker discovery trial, of a design similar to ours but with many more longitudinal omics time-points, we could not identify any other biomarker discovery methods being used. If genetic data become available, we will include them in some analyses; specifically, we will investigate the added value of expression quantitative trait loci (eQTL) analyses. PDAC and IS data will be analyzed together in some integrative exploratory analyses. In that case, the occurrence of specific endpoints will be evaluated according to group membership (PDAC or IS): in addition to the biomarker signature, a group variable indicating PDAC or IS patients will be included in the analysis to assess the difference in the progression of the respective endpoints between PDAC and IS patients. We also wish to compare PDAC and IS patient data to data of healthy controls (adjusted for age and gender) by means of logistic regression models, with the aim of identifying candidate biomarkers for the diagnosis of the respective disease; we will then specifically investigate the association of these diagnostic biomarker candidates with cellular senescence and other aging-related processes (see also the next paragraph).
Further analyses, and comparison with existing biomarkers and biomarker signatures
Towards the end of the study, we will investigate the overlap between the various biomarker identification approaches we employed, assuming that the most frequently found biomarkers may be the most robust and valid ones. Moreover, we will compare our findings with existing biomarkers and signatures. Regarding the prediction of vascular events, we will specifically calculate the Khorana and related scores17 for comparison, and report the difference in performance. Further, for all biomarkers we find, we will check their association with cellular senescence by manual inspection, literature investigation, comparison to CellAge117 and the SASP Atlas50, or by formal enrichment analyses if the number of biomarkers is large enough for this to be meaningful. Also, in a final step, we plan to identify and filter out biomarkers that are volatile in the controls. In addition, we aim to compare the biomarker profiles before and after the co-morbid event. Finally, publicly available data from other trials with sufficient overlap with our predictors will be used as validation datasets.
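For the comparison with the Khorana score, the original definition can be written down directly: 2 points for very-high-risk tumour sites (stomach, pancreas), 1 point for high-risk sites (lung, lymphoma, gynecologic, bladder, testicular), and 1 point each for a pre-chemotherapy platelet count of at least 350 × 10⁹/L, hemoglobin below 10 g/dL or use of erythropoiesis-stimulating agents, a leukocyte count above 11 × 10⁹/L, and a BMI of at least 35 kg/m². The function below is a sketch with hypothetical names; the related scores mentioned in the text are not covered, and every PDAC patient starts with 2 points by construction.

```python
VERY_HIGH_RISK_SITES = {"stomach", "pancreas"}
HIGH_RISK_SITES = {"lung", "lymphoma", "gynecologic", "bladder", "testicular"}


def khorana_score(site: str, platelets: float, hemoglobin_g_dl: float,
                  uses_esa: bool, leukocytes: float, bmi: float) -> int:
    """Khorana risk score for cancer-associated venous thromboembolism.

    platelets and leukocytes in 10^9/L, hemoglobin in g/dL, BMI in kg/m^2.
    """
    if site in VERY_HIGH_RISK_SITES:
        score = 2
    elif site in HIGH_RISK_SITES:
        score = 1
    else:
        score = 0
    score += int(platelets >= 350)
    score += int(hemoglobin_g_dl < 10 or uses_esa)
    score += int(leukocytes > 11)
    score += int(bmi >= 35)
    return score
```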
Discussion
Limitations
Arguably, the most serious limitation of the SASKit study is the low number of participants. As mentioned above, within the 4-year time frame of the entire study, we cannot expect to recruit many more than the 50 PDAC patients included in this study at the Rostock University Medical Center. We could recruit more stroke patients and more controls; however, given the call for proposals under which this exploratory (not confirmatory) study was applied for and funded, we considered that, within a limited budget, in-depth omics characterization, animal models (to be detailed in a follow-up publication) and a comprehensive data analysis plan including systems biology modelling were important aspects of our study that we did not want to exclude.
The two most obvious risks to the main goal of finding good biomarkers for the primary outcome based on the standard data analysis are the following. First, we found it hard to estimate the distribution of events as defined by the primary outcome; we cannot exclude that too many events occur already at the start of the study or before the first follow-up, specifically in the PDAC subtrial, limiting the information available to the subsequent time-to-event analyses. Then again, had we defined the primary outcome more conservatively, there would have been a risk that too few events occur before the end of the study. Second, we could not identify role-model publications reporting biomarker explorations that used machine learning methods, except, to some extent, for 116, so that we are entering partly unknown territory. The two most obvious risks to our goal of investigating the role of cellular senescence in the (co-)morbidity of PDAC and IS could be an insufficient number of co-morbid events and the complex role of treatment in PDAC, where additional cellular senescence is most likely triggered by therapeutic intervention118. Then again, all molecular high-throughput analyses are essentially explorative, and we are open to discovering biomarkers of disease that do not relate to any of our pre-specified hypotheses.
Implications
We designed the SASKit study to synergistically address several aims that we consider relevant for disease-specific prognosis and treatment as well as for primary, secondary and tertiary prevention. By employing clinical performance measurements and patient-reported outcomes, we aim for clinical relevance, and we suggest that prognostic biomarker signatures for general health and QoL are perhaps more important than those for (progression-free) survival, although far more data exist on the latter than on the former. Moreover, good treatment options are still lacking for both PDAC and stroke, and the more we find cellular senescence implicated in disease deterioration, at least in a subgroup of patients with a specific biomarker signature, the more confidently we can suggest, and further explore, seno-therapeutic interventions for these two diseases.
Notably, we are in the process of starting a parallel human study in healthy elderly people. It will test interventions into cellular senescence based on food rich in seno-interventional compounds, and we expect it to adopt many aspects of the study design presented here. That study will also investigate aging- and senescence-related outcomes, so it can be seen as a test of a cautious yet potentially very effective approach to primary prevention. If the diagnostic biomarkers we find in the SASKit study relate to cellular senescence, this would constitute further evidence in favor of (cautious) seno-interventions. It would point towards a universal approach to disease prevention that tackles fundamental aging-related processes (see Boxes 1 and 2).
Secondary prevention aims to reduce the impact of a disease that has already occurred. The SASKit study could ultimately support it if we can demonstrate, and confirm in follow-up studies, a distinctive role of cellular senescence (and/or of other aging-related processes such as inflammation/inflammaging119) in disease deterioration as defined here. Finally, tertiary prevention aims to attenuate the impact of an ongoing disease. Evidence for tertiary prevention by seno-therapeutic intervention may also emerge, depending on how accurate, relevant and specific our biomarkers turn out to be.
Last but not least, we expect our in-depth molecular analyses to provide mechanistic insights into the etiology of the diseases studied here. We regard these diseases as models for investigating the fundamental role of aging in general, and of cellular senescence in particular, in disease and dysfunction.
Data Availability
N/A
Conflict of Interest
Dr. Walter reports personal fees from Ipsen Pharma, grants and personal fees from Merz Pharma, personal fees from Allergan, personal fees from Bristol-Myers Squibb, personal fees from Daiichi Sankyo, personal fees from Bayer Vital, personal fees from Boehringer Ingelheim, personal fees from Pfizer, personal fees from Thieme, and personal fees from Elsevier Press, all outside the submitted work. The other authors have nothing to disclose.
Funding
We acknowledge the financial support by the Federal Ministry of Education and Research (BMBF) of Germany for the SASKit study (FKZ 01ZX1903A). The funder had no role in the design of the study.
Abbreviations
- AUC: Area Under the Curve
- BMI: Body Mass Index
- CA19-9: Carbohydrate Antigen 19-9
- CEA: Carcinoembryonic Antigen
- CI: Confidence Interval
- CRP: C-Reactive Protein
- ECOG: Eastern Cooperative Oncology Group
- HR: Hazard Ratio
- INR: International Normalized Ratio
- IS: Ischemic Stroke
- LDH: Lactate Dehydrogenase
- NIHSS: National Institutes of Health Stroke Scale
- NYHA: New York Heart Association
- PDAC: Pancreatic Ductal Adenocarcinoma
- PS: Performance Status
- QoL: Quality of Life
- ROC: Receiver Operating Characteristic
- RSF: Random Survival Forests
- SASKit: Senescence-Associated Systems diagnostics Kit for cancer and stroke
- SASP: Senescence-Associated Secretory Phenotype
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.
- 109.
- 110.
- 111.
- 112.
- 113.
- 114.
- 115.
- 116.
- 117.
- 118.
- 119.