## ABSTRACT

Deep learning approaches can uncover complex patterns in data. In particular, variational autoencoders (VAEs) achieve this by a non-linear mapping of data into a low-dimensional latent space. Motivated by an application to psychological resilience in the Mainz Resilience Project (MARP), which features intermittent longitudinal measurements of stressors and mental health, we propose an approach for individualized, dynamic modeling in this latent space. Specifically, we utilize ordinary differential equations (ODEs) and develop a novel technique for obtaining person-specific ODE parameters even in settings with a rather small number of individuals and observations, incomplete data, and a differing number of observations per individual. This technique allows us to subsequently investigate individual reactions to stimuli, such as the mental health impact of stressors. A potentially large number of baseline characteristics can then be linked to this individual response by regularized regression, e.g., for identifying resilience factors. Thus, our new method provides a way of connecting different kinds of complex longitudinal and baseline measures via individualized, dynamic models. The promising results obtained in the exemplary resilience application indicate that our proposal for dynamic deep learning might also be more generally useful for other application domains.

## Introduction

Their currently is a renaissance of artificial intelligence techniques where, e.g., deep learning can provide human-level performance for detecting patterns in images, when trained on large data collections^{1}. Deep learning has further been successful with non-image data, where humans cannot directly detect patterns, and also for rather small sample sizes^{2}. Importantly, deep learning has recently been combined with time dynamic techniques such as ordinary differential equations (ODEs)^{3–6}, which promises both the advantages of hypothesis-driven modeling and detection of potentially complex patterns. This is why we turned to a combination of deep learning and ODEs when faced with modeling psychological resilience. We propose a novel approach for combining variational autoencoders^{7} (VAEs)—which offer latent representations—with ODEs. Specifically, each individual receives an own set of ODE parameters for obtaining individualized models of their anticipated mental health response to stressor exposure and for subsequently identifying associated predictive factors from additional baseline measurements.

Psychological resilience is the maintenance or rapid recovery of a healthy mental state during and after times of adversity^{8}. One element of the definition is that both mental health and adversity can change over time and may even do so permanently. Seminal resilience studies^{9} investigated how mental health changes in response to a single potentially-traumatic life event and identified common mental health trajectory patterns such as for resilient individuals (showing stably good or improving mental health in the months or years after the event) and more vulnerable individuals (showing stably poor or worsening mental health). These studies implicitly assume that the observed temporal changes in mental health are due to only *a single stressor event* and that individual differences in the mental health trajectory can be explained by some baseline individual characteristic. However, most individuals are *continuously exposed to more or less severe stressors*. These may include macrostressors (severe life events) but also more “mundane” microstressors, or daily hassles^{10}, which are also known to have an impact on mental health^{11,12}. Further, major life events are likely to be followed by other macro- or microstressors^{13,14}. For instance, this means that deteriorations in mental health in the aftermath of a trauma may also have other causes than the trauma. Conversely, not developing long-term mental health problems may be a consequence of only moderate stressor exposure after the trauma. In this case, a resilient trajectory would not result from some protective baseline characteristics but benign life circumstances. The necessity to consider the ongoing, more or less continuous nature of stressor exposure is even more obvious when trying to understand resilience to longer-lasting or chronic adversity, such as adverse life circumstances or life transition phases. In short, investigating resilience should ideally involve repeated longitudinal measurements of stressors and mental health^{8,9,13,15}.

This necessity of longitudinal stressor and mental health measurements poses considerable challenges to data acquisition. Usually, mental health and stressor data are acquired intermittently, and not always are measurements equally spaced. The Mainz Resilience Project^{12} (MARP) also faces these problems; self-report measurements are made using online questionnaires every three months *±* two weeks. Additionally, a relevant number of observations (13.7%) are missing and the individual time series differ considerably in the numbers of observations (on average 9.1 observations, with a *min* = 2, *max* = 14, and *sd* = 3.5 in the data set used for learning the individual trajectories). Further, the three-monthly microstressor assessment of daily hassles only covers the past week. In this empirical context—which is prototypical for many longitudinal studies—the challenge is to transform the irregularly sampled mental health and stressor exposure data into continuous time series that are amenable to dynamic analysis.

In the current application, we learn the long-term individual individual mental health and stressor trajectories in the latent space, obtained from a VAE, with a system of ODEs that allows for encoding assumptions on the interplay of stressors and mental health. The ODE system is parameterized by another neural network, which receives longitudinal summary statistics as input, which also allows for different numbers of potentially irregularly spaced observation time points. On a more abstract level, our proposed new method showcases the versatility of combining deep learning techniques with modeling based on domain knowledge and underlines the usefulness of this particular new method in an application of practical relevance.

Dynamic techniques, such as differential equations, have already been proposed for modeling resilience^{16}. Montpetit et al.^{17} presented a coupled version of the multivariate latent differential equation model^{18} for resilience, which estimates the actual values of the latent variable, as well as the slope and curvature simultaneously. However, this model does not allow for individualized parameters. Driver and Voelkle^{19} allow for individualized differential equations, but only for rather simple linear transformations for mapping from observed values to latent space, also requiring assumptions on the distribution of differential equation parameters, which may be problematic with respect to the topology in parameter space. In contrast, our proposal strives to combine the advantages of potentially complex non-linear transformations and individualized differential equations, making as little distributional assumptions as possible, as made possible by a combination of VAEs and individualized ODEs.

Naturally, there are other techniques that could potentially be used. For example, growth mixture models identify average starting levels and trajectories and allow for—normally distributed—deviation from the overall intercept and slope. However, these require an initial dimension reduction step (e.g., calculating a sum score or a factor solution), which is not informed by the longitudinal model and frequently based on rather simple linear transformations. By contrast, our method is designed to solve the problem of dimension reduction and trajectory modeling iteratively. Latent class mixture modeling has also been employed in resilience research^{9} to identify subgroups in the mental health response to trauma, to thereby derive categorical prediction targets (i.e., class membership); we expand on this by providing a set of continuous prediction targets. For enforcing smoothness across time, convolutional inference nets have been proposed, which also can find the lower-dimensional representations and connect them via Gaussian processes^{20}. Yet, this does not provide an explicit model, i.e., cannot be used for determining the anticipated stress response.

A brief sketch of the algorithm is provided next. After this, we show how these individualized models can be linked to baseline measurements for identifying resilience factors, before concluding with a discussion and outlook. In the methods section, we provide more technical details of the proposed method, in particular on how the challenging estimation problem is addressed, and describe the MARP study on psychological resilience, which motivated methods development and is used for illustration. We also provide a Jupyter Notebook on GitLab that shows how our approach can recover structure from exemplary simulated data, and can be used as a starting point for adapting the approach to other applications.

## Results

### An overview of the algorithm, data structure, and training procedure

Figure 1 shows how the data flows through the algorithm. We map observations into a latent space using VAEs^{7}. Specifically, separate VAE encoder networks, for mental health (mh; 28 General Health Questionnaire^{21} items, where higher values indicate more severe mental health problems) and stressor load (sl; the MIMIS battery of 58 daily hassles^{22}) respectively, reduce the number of dimensions to one for each type of measurement (Figure 1a). The encoder weights, i.e., the parameters that determine the non-linear mapping, and the decoder weights, which specify the mapping from the latent space back to the level of original measurements, are trained by minimizing the evidence lower bound^{7} based on a Poisson log-likelihood to reflect the count character of the data.

The mapping of the observed values in the latent space is performed for each individual and time point where measurements are available, allowing to get a first impression of temporal trajectories by plotting the latent values along time (Figure 1b). Subsequently, each individual time course is aggregated into several summary statistics, again separately for the latent mental health and stressor values. These summary statistics, e.g., include the integral over an interpolated step function and the difference of the first and last values (see Table 1 for a full list), and are chosen such that they do not require fixed observation time points or the same number of measurements for all individuals. These summary statistics then serve as inputs to a feed-forward ODEnet (Figure 1c), which provides parameters for an ODE system. Thus, optimal combinations and transformations of the summary statistics are determined empirically. For the resilience application, we chose a rather simple ODE system, to reduce complexity when faced with a limited number of individuals and observations. Specifically, the parameters of the ODEnet are trained to minimize the squared distance between the solution of the ODE system and the latent values at the point in time when measurements were obtained from the individuals. We allow the stressor time course to additionally incorporate abrupt external inputs (i.e., sudden changes in the life situation). Accordingly, new observations can override the current state obtained from the ODE. For this purpose, the ODE solver is stopped at the measurement time points, and the latent values for stressors (but not for mental health) are updated. Subsequently, the individualized solutions of the ODE systems are visualized together with the latent values (Figure 1d).

### ODE system to model resilience trajectories in the latent space

Figure 1d) shows an exemplary solution of an individualized ODE system, which connects expected values of the distributions in the latent space. We use an ODE system that couples the latent representations of mental health and stressor load. The exact design of such an ODE system is a crucial modeling decision since it governs how each component changes and, accordingly, requires domain expertise. Given our sample size and sampling frequency, we work with a rather simple ODE system to keep the number of estimated ODE parameters *η _{i}* small. Specifically, we use the system
where changes in , the latent value reflecting mental health problems, and , the latent value reflecting stressors load, are driven by the current own value. Additionally, mental health changes in response to stressor load. Separate parameter values

*η*

_{i,}_{1}

*− η*

_{i,}_{3}for each individual

*i*allow for individualized trajectories of mental health and stressors.

Negative values for *η _{i,}*

_{1}and

*η*

_{i,}_{3}effectively realize system-inherent damping

^{16}, where high mental health and stressor values, respectively, are more quickly driven back to low latent mental health and stressor values. Thus, a high negative

*η*

_{i,}_{1}in particular reflects good recovery from mental health problems (since the majority of observations are mapped into the positive valued latent space or close to zero, see Figure 1c). Positive values for

*η*

_{i,}_{2}realize the adverse effect of stressors on mental health. A low positive

*η*

_{i,}_{2}value thus reflects a low mental health impact of stressors in the individual.

At each realized measurement, the value of the integrator of the latent stressor load is updated to the mapping of the actually observed value . This reflects the obvious notion that stressor levels are only partly driven by an endogenous property of the ODE system (i.e., damping) but mainly reflect exogenous forces, that is, the sudden occurrence or absence of stressors that lead to abrupt changes in the latent stressor values (see above).

The benefits of such an ODE system in comparison to discrete-time models like regression are crucial for analyzing the data from the MARP study at hand. Most importantly, differential equations take all available information at the precise time into account. Thereby, irregular sampling intervals and entirely (or partly) missing observations are dealt with straightforwardly.

### Person-specific ODE parameters

An additional advantage of our approach is that the ODE parameters for every respondent are estimated in an unrestricted manner, i.e., without explicit constraints regarding their distribution, direction, or strength. This is accomplished by estimating the parameters of the ODE system *η _{i,}*

_{1}to

*η*

_{i,}_{3}with the ODEnet—a feed-forward neural network—with 16 summary statistics as inputs (see Figure 1c and Table 1 for details). While we are using linear ODE system for the specific resilience application, i.e., also an analytical solution exists, the proposed approaches enables more complicated non-linear systems. Importantly, the summary statistics are selected with data availability in mind. Since some MARP participants just started recently to provide data, all summary statistics need to be computable with only two observations of mental health and stressors (not necessarily at the same time), which is the minimum requirement to be included in the longitudinal analysis.

The ODEnet provides the individual parameters of the ODE system as outputs. Accordingly, the overall structure of how mental health and stressors change is the same for all respondents and specified by the ODE system. Yet, the values of the summary statistics—included in the ODEnet—differ from person to person. Therefore, each individual receives an own variant of the ODE system. The ODEnet is trained to minimize the squared difference between the ODE solutions and and the latent values and , using a standard neural network backpropagation algorithm. In contrast to previous methods (e.g., growth mixture models), the individual parameters do not obey any predefined distribution; they can be estimated in the absence of such restrictions by optimizing the fit to the data.

### Quantification and prediction of resilience

In addition to longitudinal measurements of mental health and stressor load, the MARP study incorporates a large battery of additional baseline measurements for identifying potential resilience factors, i.e., characteristics of individuals that might predict reduced reactivity of mental health to stressor load. For identifying such resilience factors, we can either predict the individual parameters *η _{i}* directly or apply an artificial stress test to each individual (see Figure 2). This artificial stress test means that the latent mental health value of an individual is initialized to an average value, and the latent stressor load value is set to an initial value that corresponds to a high stressor level. The stress reaction pattern of each individual then is characterized by how the latent values develop according to the individualized ODE system, if no further external input is provided. Predicting each of the ODE parameters separately allows for investigating the different dimensions of resilience in our model (i.e., recovery of mental health (

*η*

_{i,}_{1}) and reactivity of mental health to stressors (

*η*

_{i,}_{2})) independently. In contrast, the artificial stress test considers all parameters jointly, to obtain a holistic picture of resilience. This also serves as a diagnostic tool to investigate whether the training of the algorithm succeeded. We choose four points in time (after 5, 9, 15, and 20 months) and predict the individual mental health values with measurements from the baseline battery, covering proteomics, behavioral measures, and questionnaires.

Specifically, we use the lasso, a regularized regression approach that provides variable selection^{23}. The amount of regularization is chosen according to prediction performance, specifically by 6-fold cross-validation. For a robust result, this procedure is repeated 1000 times on resampling data sets, obtained by sampling with replacement, resulting in inclusion frequencies^{24}. Thus, a baseline measurement would receive inclusion frequency equal to or close to zero, if there would be no connection between the baseline measurement and our proposed quantification of resilience by the stress test. Quite in contrast, we find rather strong relations between our derived quantification of resilience and some baseline measurement, which could thus be considered to be potential resilience factors (see Figure 3). Since the present paper has its focus on the proposed new method, and not on resilience research, we do not report specific selected variables and refrain from further interpretation.

## Discussion

We combine deep generative learning with a system of ODEs to obtain individualized dynamic models, and illustrate this novel approach in an application on psychological resilience. Our proposal methods can handle problems typical for real-world applications, specifically irregular measurement time points and limited sample size. In particular, the augmentation of neural networks with ODEs allows us to incorporate domain expertise, which can dramatically reduce the sample size requirements of deep learning^{6}. An important ingredient of this approach is differential programming, i.e., automatic differentiation in complex models, which allows for tailoring algorithms to particular needs, such as combining neural networks with ODEs. This framework is more specifically used in our proposed method to allow the integrator of the ODE solver to jump to updates of the latent stressor load values.

We show that this flexible, non-linear, and versatile combination of scientific modeling and machine learning can be utilized to quantify resilience and identify resilience factors, and agree with others^{3,6}that neural differential equations promise considerable opportunities for this particular field and other disciplines working with longitudinal data.

From a subject matter point of view, the most substantial benefit of our approach are the resulting individualized dynamic models. Building upon the overall structure of stress research, according to which mental health problems change due to stressor load, and incorporate the main lesson from resilience research that there are significant individual differences in how people deal with stressors and recover after stressful episodes. In fact, these individual differences are what resilience research is investigating. Instead of reporting population-level prevalence (like, e.g., what percentage of people can be labeled as resilient) we are interested in modeling the mental health trajectory of each person in response to his or her stressor load. This perspective on each individual also addresses the needs of health professionals who are interested in personalized predictions tailored to their particular client, strongly informed by his or her unique history. Hence, we are interested in the most flexible approach that allows for capturing personal nuances best. Compared to this, we argue that it is of secondary interest (at most) to assign people to categorical labels like resilient and vulnerable or force individual deviations from the grand mean to be aligned with predefined distributions. In this regard, we argue that the continuous individual differences—expressed here as ODE parameters and harnessed with an artificial stress test—are meaningful quantifications of resilience and help to identify resilience factors from large baseline batteries.

A limitation of the proposed model is the simplicity of the ODE system. There are valid theoretical reasons to assume that resilience might be better captured using an additional second-order derivative (i.e., acceleration) of mental health problems. While our first-order ODE system was a deliberative decision in consideration of the small number of observations and individuals (for dynamic modeling and deep learning standards at least), future research might experiment with more complicated systems of differential equations. Also, our approach assumes stationarity and accordingly neglects the possibility of long-term learning and plasticity (i.e., temporal changes in resilience). While we could extend our model to weaken this assumption, our main focus in this work is on the identification of resilience factors based on all available longitudinal measures. Future work will investigate repeated baseline observations and time-dependent predictors. Yet, already the individualized dynamic models enabled by our new method provide an important building block for better understanding psychological resilience. Therefore, we are confident that other applications that required individualized dynamic models might benefit from our approach.

## Methods

### Dimensionality reduction per time point with a VAE for count data

We choose variational autoencoders (VAEs) to find a lower-dimensional latent representation because they provide a flexible framework with numerous potential extensions and applications. VAEs comprise a recognition model and a generative model. The purpose of training the recognition model (aka inference model or encoder) is to find the variational parameters *φ* of a neural network to approximate the posterior distribution of the latent variable *z* given the inputs *x*, i.e., *q _{φ}*(

*z|x*). The parameters of the generative model (aka decoder)

*θ*are trained to increase the log-likelihood of the inputs given random samples from the posterior distributions. To train the recognition and generative model simultaneously, we maximize the evidence lower bound (ELBO) of the marginal log-likelihood log where the first term of the right hand is the expectation of the log-likelihood of

*x*given

*z*with respect to

*q*(

*z|x*). The Kullback-Leibler divergence (

*D*) penalizes deviations of the posterior from the prior. For computation, we plug in the Poisson log-likelihood and the closed form of

_{KL}*D*for a Gaussian prior and posterior. Thereby, our training objective becomes where

_{KL}*µ*and

_{i}*σ*are the mean and standard deviation of the observation and

_{i}*λ*is the expected value of the Poisson distribution. The Poisson distribution does not perfectly fit the data since we found moderate levels of overdispersion for some of our items. Yet, the Poisson distribution avoids an additional parameter, which might be difficult to determine, by coupling expected value and dispersion. We used

*J*nodes and

*tanh*activation functions in the middle layers. In the final layer, we used a

*ReLU*activation function to strictly pass non-negative values to

*λ*. All neural networks were trained with Flux.jl

^{25}and the Adam

^{26}optimizer with standard decay rates (

*β*

_{1}= 0.9 and

*β*

_{2}= 0.999).

### Individual ODE parameter estimation with the ODEnet

The parameters of the ODEnet *τ* were trained to minimize squared difference of the trajectory at the precise point in time *t* and the mean of the latent space distribution *µ _{zi,t}*.

The ODEnet is a separate feed-forward neural network with two layers and 16 inputs; the mid-layer and end-layer have three nodes (due to data size considerations). As inputs, we use a mixture of integrals, first and last observations, differences, and autocorrelations (see Table 1).

We used a piecewise constant step function to approximate the integral of the individual trajectories defined as *s _{i}* = where

*c*is the value of or on the interval

_{m}*I*= (

_{m}*t*

_{i−}_{1}

*,t*) and

_{i}*l*(

*I*) the length (

_{m}*t*

_{i}−t_{i−}_{1}) of this interval. We also included . It was left to gradient descent/backpropagation to find the best combination of these inputs to minimize

*ℒ*(

_{ODE}*τ,η*) given the value of the ODE at a particular point in time

_{i}*f*(

_{zi}*t*) which, in turn, depends on the individual set of ODE parameters

_{i}*η*.

_{i}We use a *ReLU* activation function in the middle layer and no transformation in the final layer. The choice of inputs is restricted by the minimum requirement of two observed values for each group of variables at any point in time. We scale the inputs to ensure approximately equal numerical size. For reasons explained above, we update the integrator at every realized measure of . Such steps are implemented in DifferentialEquations.jl^{27} and differentiable through neural nets via DiffEqFlux.jl^{28}. All solved ODE systems start the the first measurement *µ _{zi,}*

_{1}. The ODEnet was trained with the Adam optimizer

^{26}and standard decay rates (

*β*

_{1}= 0.9 and

*β*

_{2}= 0.999). To deal with unit non-response,

*ℒ*(

*θ,φ*;

*x*) and

*ℒ*(

_{ODE}*τ,η*) are only evaluated at actual measurement time points.

_{i}### Training

We experimented with different training strategies. The best results are obtained with a combination of separate pretraining of the VAE and ODEnet. Subsequently, we also compared joint training of all involved components vs. a sequential strategy where we trained the VAE components separate from the ODEnet in one epoch. This leads to further decreases in the loss components. In this analysis, we pretrain both VAEs with a learning rate *α* = 5^{−5} for 80 epochs. Then, we pretrain the ODEnet separately for 50 epochs with *α* = 1^{−6}. Then, we sequentially train both nets in the same epoch with *α* = 1^{−6} for 100 epochs.

### MARP Data

The Mainz Resilience Project (MARP) is an ongoing study that started in 2016 with a planned study duration per participant of seven years and is conducted by the University Medical Center Mainz and the Leibniz Institute for Resilience Research^{12}. All available observations (*n _{sl}* = 1.233 and

*n*= 1.196) were used to train the VAEs. To be included in the longitudinal analysis, at least two observations of mental health and stressor load—at any point in time within the observation window of 3.5 years—were necessary. This basic requirement was met by

_{mh}*N*= 166 participants. Four respondents were excluded due to unusual patterns. In this longitudinal data set, on average 9.1 observations (

*min*= 2,

*max*= 14,

*sd*= 3.5) per person are available. The majority of the longitudinal measures are gathered via an online assessment roughly every three months. Besides the request to provide data on their mental health and stressors (daily hassles and life events, of which only daily hassles are used here), the participants went through an extensive baseline examination, which includes in total

*p*= 350 neuroimaging, behavioral, proteomics, and survey data variables. One hundred fifty-five participants met the longitudinal requirements and had a complete baseline battery; hence, they were included in the lasso analysis (see Figure 2). Participants were recruited in a critical life phase aged 18 to 20 at study inclusion with a prehistory of critical life events.

## Data Availability

Data cannot be made publically available at the moment.

## Author contributions statement

H.B. and G.K. developed the idea, constructed and tested the models, and wrote the paper. All authors contributed through discussions and reviewed and approved the manuscript.

## Additional information

Competing interests: R.K. receives advisory honoraria from JoyVentures, Herzlia, Israel. We have no conflict of interest to declare.

## Acknowledgements

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 777084 (G.K., S.P., H.E., O.T., R.K.) and the State of Rhineland-Palatinate, Germany (MARP program, DRZ program, Leibniz Institute for Resilience Research; M.K., A.Sch., M.W., O.T., R.K.).