A Deep Learning Based Cardiac Cine Segmentation Framework for Clinicians - Transfer Learning Application to 7T

Background: Artificial neural networks have shown promising performance in automatic segmentation of cardiac magnetic resonance imaging (MRI). However, initial training of such networks requires large amounts of annotated data, and generalization to different vendors, field strengths, sequence parameters, and pathologies is often limited. Transfer learning has been proposed to address this challenge, but specific recommendations on the type and amount of data required are lacking. In this study we aim to assess data requirements for transfer learning to cardiac 7T in humans, where the segmentation task can be challenging. In addition, we provide guidelines, tools, and annotated data to enable transfer learning approaches of other researchers and clinicians.

Methods: A publicly available model for bi-ventricular segmentation is used to annotate a publicly available data set. This labelled data set is subsequently used to train a neural network for segmentation of left ventricular and myocardial contours in cardiac cine MRI. The network is used as starting point for transfer learning to the segmentation task on 7T cine data of healthy volunteers (n=22, 7873 images). Structured and random data subsets of different sizes were used to systematically assess data requirements for successful transfer learning.

Results: Inconsistencies in the publicly available data set were corrected, labels created, and a neural network trained. On 7T cardiac cine images the initial model achieved DICE_LV=0.835 and DICE_MY=0.670. Transfer learning using 7T cine data and ImageNet weight initialization significantly (p<10^-3) improved model performance to DICE_LV=0.900 and DICE_MY=0.791. Using only end-systolic and end-diastolic images reduced training data by 90%, with no negative impact on segmentation performance (DICE_LV=0.908, DICE_MY=0.805).

Conclusions: This work demonstrates the benefits of transfer learning for cardiac cine image segmentation on a quantitative basis. We also make data, models and code publicly available, while providing practical guidelines for researchers planning transfer learning projects in cardiac MRI.


medRxiv preprint, this version posted June 17, 2020. https://doi.org/10.1101/2020.06.15.20131656. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Image segmentation, which is of great interest in cardiac magnetic resonance imaging, is applied to partition acquired images into functionally meaningful regions, allowing the extraction of quantitative static measures such as myocardial mass, left ventricle (LV) volume, right ventricle (RV) volume, and wall thickness, as well as dynamic measures such as wall motion and the ejection fraction (EF). Cardiac cine MRI is the accepted gold standard for this assessment of cardiac function [1] and anatomy and is therefore of paramount clinical importance [2,3]. Proper segmentation of such data sets is a tedious and time-consuming process that has increasingly been tackled using various deep learning approaches [4-7].

[...] intelligence. This led to ever increasing applications in medical imaging such as MRI [8], where tasks nowadays range from data acquisition and image reconstruction [9-11], image restoration [12,13], and image registration [14,15] to segmentation [16-19] as well as classification [20,21] and outcome prediction [22,23].

There is consensus in the field that the limited availability of labelled or annotated data due to data access, privacy issues, missing data harmonization, and data protection is one of the main obstacles for future clinical applications of deep neural networks [17,19,24]. While some resources like the UK Biobank [25] already exist to address this issue, the high quality standards and the amount of work required to organize and maintain such a resource make data access expensive. In addition, such data may already exceed the quality that is available in clinical routine cardiac MRI. This leads to neural networks that perform very well for a very specific task within a confined data space, where training and testing data share the same distribution. However, these networks usually lack generalization capabilities. While methods such as data augmentation, transfer learning, and weakly supervised, self-supervised, and unsupervised learning have been applied to overcome the issue of small datasets in research, it is unclear how much data is really required in order to create a well-generalizing network or to apply transfer learning.

In this work, we aim to enable researchers and clinicians in cardiology to apply deep learning-based segmentation models in their respective research by providing guidelines and easily accessible tools as well as annotated data for transfer learning. We create labels for a public data set, the Data Science Bowl Cardiac Challenge Data [26] (further referred to as Kaggle data set), which, at this point, does not have segmentation labels. We further create a base network for LV segmentation using these labels and evaluate its performance on 7T human cine data. In addition, we assess if transfer learning improves model performance for the 7T segmentation task and analyze how much and which data is required. The framework provided in this study, in combination with access to scripts and the data used, will enable researchers to reproduce our results and apply deep learning-based segmentation in their respective field.

The Kaggle Data Set
As mentioned above, cardiac MRI is the gold standard for the assessment of cardiac function, a key indicator of cardiac disease. The 2015 Data Science Bowl challenged participants to create an algorithm for automatic assessment of end-systolic and end-diastolic volumes (ESV and EDV) and thus ejection fraction, based on cardiac cine MRI. The data set consists of a training, a validation, and a test set. Once the challenge had ended, all sets and their corresponding volume information (end-systolic and end-diastolic) were made available for research and academic pursuits, leading to a total of 1140 "annotated" cardiac MRI examinations of normal and abnormal cardiac function. Images are in DICOM format, resolving up to 30 phases of the cardiac cycle. While we will focus on short axis images in this study, the Kaggle data set also contains alternative views. Examinations were done on 1.5 T and 3.0 T systems (Siemens Magnetom Aera and Skyra, Siemens Healthineers, Erlangen, Germany) with applications of both FLASH and TrueFISP sequences. An overview of the complete data set and its variation in patient data and sequence parameters is given in Table 1.

[...] given in the online repository. Training with a weight decay of 0.02 and a batch size of 32 was done for 30 epochs with frozen weights (lr = 1e-4) and another 30 epochs with unfrozen weights (lr = 1e-5). Details regarding frozen and unfrozen weights are provided in the online repository.
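The two-stage schedule described above can be written down as a small, framework-independent configuration. The sketch below is only an illustration: the names `Stage`, `SCHEDULE`, `total_epochs`, and `steps_per_epoch` are hypothetical, and the flag `pretrained_layers_frozen` reflects an assumption about what "frozen weights" refers to, since the exact layers are specified only in the online repository.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Stage:
    """One phase of the two-stage training schedule."""
    name: str
    epochs: int
    learning_rate: float
    # Assumption: "frozen" refers to the pretrained layers of the network.
    pretrained_layers_frozen: bool


# Hyperparameters quoted in the text.
WEIGHT_DECAY = 0.02
BATCH_SIZE = 32

SCHEDULE = (
    Stage("frozen", epochs=30, learning_rate=1e-4, pretrained_layers_frozen=True),
    Stage("unfrozen", epochs=30, learning_rate=1e-5, pretrained_layers_frozen=False),
)


def total_epochs(schedule):
    """Total number of epochs over all stages."""
    return sum(stage.epochs for stage in schedule)


def steps_per_epoch(n_images, batch_size=BATCH_SIZE):
    """Number of optimizer steps needed to see every image once."""
    return ceil(n_images / batch_size)


if __name__ == "__main__":
    for stage in SCHEDULE:
        state = "frozen" if stage.pretrained_layers_frozen else "trainable"
        print(f"{stage.name}: {stage.epochs} epochs, lr={stage.learning_rate}, "
              f"pretrained layers {state}")
    print("total epochs:", total_epochs(SCHEDULE))
```

Keeping the schedule as data rather than hard-coding it in a training loop makes it easier to reuse the same two stages when re-training on a new domain.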
The smallest training set (p5) was used initially, image size was 256x256, and moderate data [...]
In the first step we evaluated the influence of the architecture (VGG16, ResNet34, ResNet50) compared to the fully convolutional network by Bai et al. [4] trained on UKBB data (further referred to as UKBB model). Due to memory limitations, we had to reduce the batch size for training of the VGG16 and the ResNet50 models.

In the second step, we assessed variations in the loss function such as cross-entropy (default), generalized DICE [34], and focal loss. In the third and last step we evaluated the influence of the number of training images using the confidence sets p5, p10, and p15.

We assessed the influence of training data resolution, training a model with lower input resolution (128x128, r34_CE_p5_128 [...]

All assessments regarding transfer learning to 7T data are done using model r34_CE_p5_s2. As initial point of comparison we used the UKBB model to create labels for 7T data, in order to assess the generalization capability of a model that was trained on a very homogeneous data set (UKBB).

Following approval of the local ethics committee (7/17-SC), n=22 volunteers (14 female, 8 male) were examined using a 7T whole body MRI system (Siemens MAGNETOM Terra, Erlangen, Germany) and a 1TX/16RX thorax coil (MRI Tools, Berlin, Germany) [35]. Written informed consent was obtained prior to all measurements. Age was 22-53 years, body weight 52-95 kg, and height 151-185 cm. For triggering, both the integrated ECG and an external acoustic triggering system (MRI Tools, Berlin, Germany) were used in order to synchronize measurements with the heartbeat, choosing whichever method provided a more stable trigger signal during the examination. Images were obtained using a cardiovascular (CV) GRE cine sequence. Sequence parameters were: TE = 3.57 ms, FOV = 340 mm x 320 mm, interpolated voxel size = 0.66 x 0.66 x 6 mm, GRAPPA acceleration factors R = 2 and R = 3. Depending on the heart rate, 6-11 segments and 20-35 cardiac phases were measured using retrospective gating. Short axis cine stacks for volumetric evaluation varied in the number of slices (14-17), and multiple breath-holds (~13 s) were necessary to acquire the whole stack. Images were assigned to training, validation and test sets (14, 5, 3 subjects and 5076, 1842, 955 images, respectively). All images were manually segmented by an expert radiologist (TR). Three data sets of the test set were additionally segmented by an expert cardiologist (WS), in order to obtain an estimate of inter-observer variability.
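Because all images of one volunteer are highly correlated, the split into training, validation and test sets is made per subject rather than per image. The following sketch illustrates such a subject-level split with the 14/5/3 proportions stated above. It assumes a random assignment with a fixed seed, which the text does not specify, and the function names and the `(subject_id, image_name)` tuple layout are hypothetical.

```python
import random


def split_by_subject(subject_ids, n_train, n_val, n_test, seed=0):
    """Assign whole subjects to train/val/test so no subject spans two sets.

    Assumption: subjects are assigned at random; the study does not state
    how the assignment was made.
    """
    ids = sorted(subject_ids)
    if n_train + n_val + n_test != len(ids):
        raise ValueError("split sizes must add up to the number of subjects")
    random.Random(seed).shuffle(ids)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def images_of(subjects, images):
    """Select all (subject_id, image_name) tuples belonging to the given subjects."""
    wanted = set(subjects)
    return [image for image in images if image[0] in wanted]


if __name__ == "__main__":
    volunteers = [f"vol{i:02d}" for i in range(1, 23)]  # 22 hypothetical IDs
    split = split_by_subject(volunteers, n_train=14, n_val=5, n_test=3)
    for name, members in split.items():
        print(name, len(members), members)
```

Splitting by subject avoids leakage of near-identical slices or phases from the training set into the test set.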

Starting Point for Model Training - 7T Human
To assess the efficacy of transfer learning for LV segmentation based on clinical 1.5T and 3T data and experimental (human) 7T data, we compare models with varying degrees of training and transfer learning. Using a U-Net architecture with a ResNet34 backbone (r34_CE_p5_s2), we generated the following three models: [...]

Data Requirements for Model Training - 7T Human
To assess how much and what data is required for convergence of a model, we trained all models (R, TL, TL2) with subsets of the training data. These subsets were created in two ways:
1) Complete subject data (all slices and all phases) from 14, 7, 3, 1 subjects (5076, 2626, 1001, 306 images, respectively); partial subject data (only end-systolic and end-diastolic images) from all subjects (448 images)
2) Shuffle all images once, create a list of images (1-5076), and generate subsets corresponding to the respective image numbers from subset 1, always starting the count with image #1

When training with subsets, the model is exposed to a smaller number of images in every epoch. We therefore increased the number of epochs for the subsets to correct for this effect.
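The two subset strategies above can be sketched as follows. This is a minimal illustration, not the code used in the study: the image records are hypothetical dictionaries with `subject` and `phase_label` keys, and the end-systolic/end-diastolic labels are assumed to be known in advance. The epoch rule in `scaled_epochs` is likewise an assumption (it keeps the total number of image presentations roughly constant), because the text only states that the number of epochs was increased.

```python
import random
from math import ceil


def structured_subset(images, subjects):
    """Strategy 1: all slices and phases of the selected subjects."""
    keep = set(subjects)
    return [im for im in images if im["subject"] in keep]


def es_ed_subset(images):
    """Strategy 1, partial data: only end-systolic and end-diastolic images."""
    return [im for im in images if im["phase_label"] in ("ES", "ED")]


def random_subsets(images, sizes, seed=0):
    """Strategy 2: shuffle once, then take the first k images for each size k.

    Every smaller subset is therefore contained in every larger one.
    """
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)
    return {k: shuffled[:k] for k in sizes}


def scaled_epochs(base_epochs, full_size, subset_size):
    """Assumed rule: scale epochs so the number of images seen stays constant."""
    return ceil(base_epochs * full_size / subset_size)
```

With the image counts reported above, `random_subsets(images, [306, 1001, 2626, 5076])` would yield random subsets of the same sizes as the structured ones, which is what makes the two strategies directly comparable.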

Parameter Search
Results of the parameter search are illustrated in Figure 1, showing the absolute distance between the EF predictions based on model segmentation and ground truth data provided by Kaggle. Overall, the impact of parameter variation on model performance was small (3.64-4.06% mean distance to ground truth EF).

In a first approach to interpret these results, we compared varying architectures, such as ResNet34, ResNet50, and VGG16, with the UKBB model (Figure 1A). All models led to lower mean and median distance values compared to the UKBB model (Table 2). The lowest median distance values were found using a ResNet50 (2.79%), while the lowest mean distance values were found using a ResNet34 (3.64%). However, differences in the absolute distance between the models (r34, r50) were rather small (Δ0.08%). Considering computational demand, we selected the ResNet34.

In the next step of the parameter search we evaluated model performance using varying loss functions, namely cross-entropy, generalized DICE, and focal loss (Figure 1B). Using generalized DICE led to the highest mean (3.93%) and median (3.07%) distance values. Median distance values were similar for cross-entropy and focal loss (2.87% vs 2.86%), while the mean distance value was lowest using cross-entropy (3.64%).

We thus selected cross-entropy for the next step of the parameter search, where we evaluated model performance using varying confidence sets: 5%, 10%, 15% (Figure 1C). Using the various confidence sets only slightly affected median distance values (2.87%, 2.89%, 2.91%). Based on EF predictions, the model r34_CE_p5_s1 performed best, achieving a mean distance value of 3.64%.
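The evaluation metric used in the parameter search can be illustrated with a short sketch. It relies on the standard definition of the ejection fraction, EF = (EDV - ESV) / EDV, expressed in percent, and reports the absolute distance between predicted and reference EF in percentage points. The function names and the example volumes are hypothetical and are not taken from the Kaggle data.

```python
from statistics import mean, median


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volume."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def ef_distance(predicted, reference):
    """Absolute EF distance in percentage points; inputs are (EDV, ESV) pairs."""
    return abs(ejection_fraction(*predicted) - ejection_fraction(*reference))


def summarize(distances):
    """Mean and median distance, the two summary values compared between models."""
    return {"mean": mean(distances), "median": median(distances)}


if __name__ == "__main__":
    # Hypothetical (EDV, ESV) pairs in ml: model prediction vs. reference.
    cases = [((118.0, 50.0), (120.0, 48.0)),
             ((150.0, 70.0), (145.0, 66.0)),
             ((95.0, 30.0), (100.0, 35.0))]
    distances = [ef_distance(pred, ref) for pred, ref in cases]
    print(summarize(distances))
```

Reporting both mean and median is useful here because a few examinations with poor segmentations can inflate the mean while leaving the median nearly unchanged.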

[...] volunteers, while the subsets consist of 7, 3, and 1 volunteer. For the most part, curves follow the trend described for the full data set, while each reduction in volunteers led to lower starting points. Peak performances remain similar with a reduction to 7 volunteers, but drop using subset n3, in particular for models R and TL. Only for a very small number of training images (n1) [...]

For small subsets, such as n3 and n1, starting points as well as peak performances of all models are higher using the random selection of training images instead of all images from a set (3/1) of volunteers. The same trend is shown for the set n7 using models R and TL.

Using only end-systolic and end-diastolic images led to similar convergence speed and peak performance regarding DICE scores compared to the full data set [...]
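The DICE scores reported in this section measure the overlap between a predicted and a reference segmentation, DICE = 2|A ∩ B| / (|A| + |B|), computed separately for each class. The sketch below shows this computation on small label maps in plain Python. The label values (1 for the LV cavity, 2 for the myocardium) and the convention of returning 1.0 when a class is absent from both maps are assumptions made for illustration.

```python
LV, MY = 1, 2  # assumed label values: LV cavity and myocardium


def dice(pred, truth, label):
    """DICE overlap for one class between two equally sized 2D label maps."""
    n_pred = n_truth = n_both = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            n_pred += p == label
            n_truth += t == label
            n_both += p == label and t == label
    if n_pred + n_truth == 0:
        return 1.0  # assumed convention: class absent in both maps
    return 2.0 * n_both / (n_pred + n_truth)


if __name__ == "__main__":
    truth = [[0, 2, 2, 0],
             [2, 1, 1, 2],
             [2, 1, 1, 2],
             [0, 2, 2, 0]]
    pred = [[0, 2, 2, 0],
            [2, 1, 1, 2],
            [2, 1, 2, 2],   # one LV pixel mislabelled as myocardium
            [0, 2, 2, 0]]
    print("DICE_LV:", round(dice(pred, truth, LV), 3))
    print("DICE_MY:", round(dice(pred, truth, MY), 3))
```

In this toy example a single mislabelled pixel lowers the LV score more than the myocardium score, because the LV region contains fewer pixels.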

In this study, we successfully used a specialized, publicly available model [4] to produce labels for a public data set of clinical 1.5 and 3T cardiac cine MRI, enabling access to more annotated data. Based on these labels we created a basic AI model that other researchers can use for their individual segmentation tasks. In addition, we applied transfer learning to segmentation of 7T human cine data, demonstrating that models based on these labels and a moderate amount of new domain data enable state-of-the-art segmentation results.

One of the obstacles to getting started in deep learning-based segmentation is the large amount of annotated data required to train an initial model. In this study we circumvent this problem by using the public Kaggle data set, to which we provide labels. The quality of these labels was evaluated using the volume information (end-systolic and end-diastolic volumes) included in the original Kaggle data set. Therefore, careful data curation had to be applied to avoid data inconsistencies (slice spacing, changes in image dimensions and image resolution, as well as missing slices) within individual patients. In addition, we found that label quality was connected to image orientation and image resolution. Scores (mean distance between labels and Kaggle "ground truth"), data curation scripts, as well as labels are provided in the online repository, enabling future use in other studies. We want to point out that label quality and accuracy were assessed via comparison to volume information only, with rare exceptions of visual confirmation. Thresholds of 5%, 10%, and 15% (deviation from the "ground truth") for the subsets used in this study were chosen arbitrarily. With 54540, 162480, and 239350 images, respectively, we assumed these three sets to provide a reasonable compromise between label accuracy and label quantity needed to assess data requirements in this specific transfer learning application.

Based on the now annotated data we trained initial segmentation models with varying architectures (ResNet34, ResNet50, VGG16), varying loss functions (cross-entropy, generalized DICE, focal loss), and varying training sets (p5, p10, p15). The final model we selected was a ResNet34, using cross-entropy as a loss function and the p5 set for training with an image resolution of 256x256. While we selected this model based on performance (mean distance to ground truth EF), overall impacts of parameter variations (3.64-4.06% mean distance to ground truth EF) were rather small. Similar to the use in this study, researchers or clinicians can use this model as a starting point for their respective transfer learning applications.

Considering the performance of this model on 7T human cine data (DICE_LV: 0.84, DICE_MY: 0.67), generalization capability appears limited. This is also true for the UKBB model (7T human cine, DICE_LV: 0.67, DICE_MY: 0.52). As the authors [4] point out, the UKBB model was "trained on a single data set, the UK Biobank dataset, which is a relatively homogenous dataset" and might therefore "not generalize well to other vendor or sequence datasets". With respect to the performance on 7T data this just means that, compared to the UKBB dataset, the Kaggle data set contains image patterns and characteristics more similar to the 7T data we acquired. In addition, it emphasizes why improvements in generalization [37-39] are needed and why we applied an additional step of transfer learning to 7T data.

Due to differences in training data, our initial models based on UKBB labels outperformed the UKBB model on the Kaggle data. While the UKBB model was trained on the homogeneous UKBB data, our models were trained on the heterogeneous Kaggle data itself. In addition, we applied data augmentation with respect to rotations and contrast and used only Kaggle data with the most accurate (top 15%) labels.
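The selection of confidence sets by thresholding the deviation from the Kaggle volume information can be sketched as follows. The exact score used in the study is provided only in the online repository, so the score below (the mean relative deviation of the label-derived EDV and ESV from the Kaggle values) is an assumption, and the record layout and function names are hypothetical.

```python
def relative_deviation(predicted, reference):
    """Relative deviation in percent of a predicted volume from its reference."""
    return abs(predicted - reference) / reference * 100.0


def exam_score(exam):
    """Assumed score: mean relative deviation of EDV and ESV for one examination."""
    return (relative_deviation(exam["edv_label"], exam["edv_kaggle"])
            + relative_deviation(exam["esv_label"], exam["esv_kaggle"])) / 2.0


def confidence_sets(exams, thresholds=(5, 10, 15)):
    """Group examination IDs into nested sets p5, p10, p15 by score threshold."""
    return {f"p{t}": [e["id"] for e in exams if exam_score(e) <= t]
            for t in thresholds}


if __name__ == "__main__":
    # Hypothetical examinations with label-derived and Kaggle volumes in ml.
    exams = [
        {"id": "A", "edv_label": 100.0, "edv_kaggle": 100.0,
         "esv_label": 50.0, "esv_kaggle": 50.0},
        {"id": "B", "edv_label": 108.0, "edv_kaggle": 100.0,
         "esv_label": 53.0, "esv_kaggle": 50.0},
        {"id": "C", "edv_label": 130.0, "edv_kaggle": 100.0,
         "esv_label": 60.0, "esv_kaggle": 50.0},
    ]
    print(confidence_sets(exams))
```

Because the sets are nested, a stricter threshold always trades label quantity for label accuracy, which is the compromise discussed above.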
While multiple studies [4,5,26,40] have demonstrated great image segmentation results for one specific dataset, these models have not been tested on other datasets or initially lack generalization capability. In this study, we show that transfer learning leads to improved model performance. DICE scores achieved on 7T human cine data prior to and after transfer learning were DICE_LV: 0.84, DICE_MY: 0.67 and DICE_LV: 0.92, DICE_MY: 0.81, respectively. This was comparable to human inter-observer variability (DICE_LV: 0.94 and DICE_MY: 0.81) and is within the range of state-of-the-art results, despite the relatively small set of training data [19]. In addition, the variability in EDV (3.5%) and ESV (10.5%) between our model and the expert radiologist is in good agreement with literature reports of inter-observer variability (EDV: 2.5-5.3%, ESV: 6.8-13.9%) [41] based on SSFP CMR imaging.

Typically, segmentation of the left ventricle is done to evaluate ejection fraction, a clinically used parameter. In this study we show that the model-based volume prediction on the test set is very accurate for apical, mid-cavity and basal slices, with the exception of the most basal slice, where myocardial tissue moves in and out of plane throughout the cardiac cycle. Since we do not have a "ground-truth" segmentation for the Kaggle data and no information on labelling protocols, we do not know if there is any consistency in the definition of basal slices or the inclusion or exclusion of papillary muscle.

While transfer learning allows models to adapt to similar tasks and new datasets containing new characteristics and patterns, this step also requires new labels. This aspect is often a limitation, since labelled medical data is difficult to acquire, particularly in areas that require domain-specific knowledge. In addition, the manual labelling process for high quality segmentations itself is often tedious and labor intensive. In this study we show that transfer learning applications (ImageNet weights to Kaggle data to 7T data) for cardiac cine [...]

For small training datasets (n≤1001) we show that a random selection of images from multiple volunteers leads to better performance compared to the selection of all images from a smaller number of volunteers (n=3 or n=1, Figure 6). Generalization capabilities of a model increase with the amount of variation provided in the training data. Thus, using data from a multitude of patients or volunteers, where morphology and therefore image content and contrast differ, may be more beneficial than providing the same number of more coherent images from a small number of volunteers. Furthermore, we demonstrate that the number of required images can be drastically reduced (from 5076 to 448 images) by using labelled data from specific heart phases, end-diastolic and end-systolic, instead of all images. This may be possible, because [...]

In summary, how much and which kind of data should be included in the transfer learning process should be carefully considered prior to labelling new data. In particular, the notion of providing data patient by patient may result in higher data requirements than necessary. There are various other routine cardiac MR examinations such as T2, T1, LGE, and even T2* that require segmentation [38,39,42]. Transfer learning applications to image segmentation of such varying contrasts may benefit from the amount of annotated data and the framework provided in this study.

With respect to future use of this annotated data we recommend researchers take the following steps:
1) use the pre-trained model we provide (r34_CE_p5_s2)
2) re-train with training data from the new domain and tune hyperparameters using validation data from the new domain
3) evaluate model performance on a test set from the new domain
In this study, we used only the 5-15% of the most accurate Kaggle labels to create our base models. Thus, researchers attempting to train their own base network using the labelled Kaggle data should always assess label quality.

There are some limitations connected to the use of the Kaggle dataset. While there are variations in measurement parameters, such as resolution, FOV, matrix size, TE, TR, bandwidth, and slice thickness, most examinations (~90%) were done at 1.5T. In addition, all data was acquired using Siemens whole body MRI systems. Models trained using this dataset might thus not generalize well to other vendor datasets, requiring transfer learning as demonstrated in this study.

Since no disease-related information is provided in the Kaggle dataset, we have no knowledge of which and how many pathological patterns are currently represented in the dataset. In this study we demonstrate that transfer learning to 7T data of healthy human volunteers enables DICE scores of DICE_LV: 0.92 and DICE_MY: 0.81. A clinical application would require a performance assessment or transfer learning for specific cardiac pathologies, both beyond the scope of this cardiology-related methodological work.

Furthermore, the accuracy of the labels we created was assessed based on comparison to provided volume information only, and visual confirmation of the contours may be biased, because we do not know if the provided volume information is based on consistent definitions of basal slices or the inclusion or exclusion of papillary muscle. This should be considered when creating models based on this dataset. In general, there is a need for a standard benchmark dataset, where labels are based on standardized protocols and images are representations of diverse clinical phenotypes (diseases, vendors, field strengths, sequences, protocols).

In this study, we provide access to annotated cardiac cine MRI data and AI models, which can be used as a starting point for transfer learning applications. Using such a base model, we demonstrate that transfer learning from clinical 1.5 and 3T cine data to 7T cine data is feasible with moderate data requirements, enabling future applications to other cardiac MRI examinations such as T2, T1, LGE, and even T2*. Furthermore, we show that not all data has the same value with respect to transfer learning approaches and that careful selection of the training data may drastically reduce data requirements.

Figure 1: [...] A: Architectures (r34: ResNet34, r50: ResNet50, v16: VGG16, UKBB). B: Loss functions (CE: cross-entropy, DICE, focal loss). C: Confidence sets (p5: 5%, p10: 10%, p15: 15%).