A Voice App Design for Heart Failure Self-Management: A Pilot Study

There is growing interest in investigating the feasibility of using voice user interfaces as a platform for digital therapeutics in chronic disease management. While digital therapeutics are mostly deployed as smartphone applications, some demographics struggle when using touch screens and often cannot complete tasks independently. This research aimed to evaluate how heart failure patients interacted with a voice app version of an already existing digital therapeutic, Medly, using a mixed-methods concurrent triangulation approach. The objective was to determine the acceptability and feasibility of the voice app by better understanding who this platform is best suited for. Quantitative data included engagement levels and accuracy rates. Participants (n=20) used the voice app over a four-week period and completed questionnaires and semi-structured interviews relating to acceptability, ease of use, and workload. The average engagement level was 73%, with a 14% decline between week one and week four. The largest difference in engagement levels was between the oldest and youngest demographics (84% and 43%, respectively). The Medly voice app had an overall accuracy rate of 97.8% and was successful in sending data to the clinic. Users were accepting of the technology (ranking it in the 80th percentile) and felt it did not require a lot of work (2.1 on a 7-point Likert scale). However, 13% of users were less inclined to use the voice app at the end of the study. The following themes and subthemes emerged: (1) feasibility of clinical integration: user adaptation to the voice app's conversational style and device unreliability; and (2) voice app acceptability: good device integration within the household, users blaming themselves for voice app problems, and the voice app missing desirable user features. The voice app proved to be most beneficial to those who are older, have flexible schedules, are confident with using technology, and are experiencing other medical conditions.

Introduction
Background
Chronic diseases are the leading cause of death and disability worldwide, with over 41 million people dying each year due to these diseases (1). Cardiovascular diseases, such as heart attacks and high blood pressure, are responsible for most chronic disease deaths (17.9 million people) (1). Patient self-care is considered to be essential in the prevention and management of chronic diseases (2) as studies have shown the benefits of this approach, which include improved health outcomes, decreased clinic visits, and decreased health costs (3). Mobile health, also referred to as mHealth, is a type of digital health technology that involves the use of mobile devices for medical and public health practice (4), and enables the integration of self-care support into a patient's routine (5). While mHealth apps are one of the most popular tools for helping patients with chronic conditions manage their health at home (6), there is research to suggest voice apps are an emerging platform that will create alternative interaction models that some patient demographics may find more accessible.
Voice user interfaces (VUIs) are becoming more prevalent in the healthcare field for a variety of different purposes. With VUIs, the user is able to interact with a computing system using only speech, with voice apps being one example of this technology. The primary advantage of implementing VUIs in any environment is simplicity, since they do not require the user to interact with a hand-held technology, as we are typically accustomed to. So far, VUIs have been used to help those who have speech or hearing difficulty, to improve patient engagement, and to support aging in place (7). This technology has also been used in the clinical setting by supporting physician note transcription and the patient registration process (7). Devices that offer VUI capability are also continuing to gain popularity in consumer households and are becoming more integrated into our daily lifestyles due to their convenience, ease of use, and affordability. As a result, there is growing interest in investigating the feasibility of using smart speakers to improve patient engagement, with a specific focus on chronic disease management.

medRxiv preprint doi: https://doi.org/10.1101/2022.04.06.22273509; this version posted April 10, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Heart Failure
Previous research has begun to investigate the potential for patients to manage their heart failure (HF) using a voice app (publication pending). HF is a condition that develops after the heart muscle becomes damaged or weak due to cardiovascular diseases, such as heart attacks and high blood pressure. When the heart muscle becomes damaged or weakens, it is unable to pump enough blood to meet the body's needs for blood and oxygen (8). Medly is an evidence-based HF self-management program that has been developed by the University Health Network (UHN) and is implemented as part of the standard of care at UHN's Ted Rogers Center of Excellence for Heart Failure clinic (9). This program is currently available to patients through a smartphone app and enables them to log clinically relevant physiological measurements and symptoms, which are then used in the Medly algorithm to generate an automated self-care message. A voice app version of Medly has already been built as part of previous work, and a usability study has been performed with the voice app at UHN's Heart Failure Clinic.
Objectives
The results from a previous usability study show promise that a voice app for chronic disease management, such as HF, is feasible to deploy and acceptable to patients (publication

A total of 20 participants were recruited for this study, based on findings from the literature suggesting that a sample size between 10 and 30 users (10) is appropriate for pilot studies. To help mitigate potential bias, we aimed to recruit both participants who were 'new' to Medly (less than 2 months since being onboarded to the program) and those who were 'existing' Medly patients (more than 2 months since being onboarded). In the end, we recruited a total of 7 new and 13 existing Medly patients.
Recruited participants were required to perform a double entry of their Medly measurements for the four-week duration; more specifically, they were asked to first input their Medly

Proctor et al.'s Implementation Outcomes

Acceptability
The perception among patients that the Medly voice app is agreeable or satisfactory.

Feasibility
The extent to which the Medly voice app can be successfully used by patients.

Unified Theory of Acceptance and Use of Technology 2 (UTAUT2)
Effort expectancy
The degree of ease associated with using the Medly voice app.
Questionnaires and semi-structured interview questions were influenced by the System Usability Scale (SUS) (13) and NASA Task Load Index (TLX) (14) standardized assessment tools.
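For readers unfamiliar with the SUS, the sketch below shows the standard scoring rule for the unmodified ten-item instrument (responses on a 1 to 5 scale, yielding a score from 0 to 100). Because the study questionnaires were only influenced by the SUS, the items and scoring actually used in this study may differ; the function name `sus_score` is illustrative and not taken from the study.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for ten item responses on a 1-5 scale.

    Standard SUS rule: odd-numbered items contribute (response - 1),
    even-numbered items contribute (5 - response), and the sum is
    multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("the standard SUS has exactly 10 items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # positively worded item
        else:
            total += 5 - response   # negatively worded item
    return total * 2.5


# A participant who answers neutrally (3) on every item scores 50.
neutral = sus_score([3] * 10)
```

Scoring each participant separately and then averaging keeps the per-participant scores available for sub-group comparisons.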
Quantitative data was also gathered through semi-structured interviews by asking participants how often the voice app misheard their measurements, how many times they were required to correct wrongly recorded data, and how many times they missed inputting their measurements and why (engagement levels). Accuracy rates were calculated by comparing the measurements inputted on the smartphone app versus the voice app.
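The accuracy comparison described above can be sketched as follows. This is a minimal illustration, assuming each measurement is available as a pair of values (one from the smartphone app, one from the voice app) and that the smartphone entry is treated as the reference; the `PairedEntry` structure, field names, and `tolerance` parameter are hypothetical and not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedEntry:
    """One measurement entered on both platforms (hypothetical layout)."""
    measure: str              # e.g. "weight" or "systolic_bp"
    smartphone_value: float   # treated here as the reference value
    voice_value: float


def accuracy_rate(entries, tolerance=0.0):
    """Percentage of voice app entries that match the smartphone entry."""
    if not entries:
        raise ValueError("at least one paired entry is required")
    matches = sum(
        1 for e in entries
        if abs(e.smartphone_value - e.voice_value) <= tolerance
    )
    return 100.0 * matches / len(entries)


def accuracy_from_counts(total_entries, incorrect_entries):
    """Accuracy as a percentage, given aggregate counts."""
    if total_entries <= 0 or not 0 <= incorrect_entries <= total_entries:
        raise ValueError("counts must satisfy 0 <= incorrect <= total, total > 0")
    return 100.0 * (total_entries - incorrect_entries) / total_entries


# Illustrative pairs only; these are not study data.
sample = [
    PairedEntry("weight", 82.5, 82.5),
    PairedEntry("systolic_bp", 120, 120),
    PairedEntry("diastolic_bp", 80, 18),   # a misheard value
    PairedEntry("heart_rate", 64, 64),
]
sample_accuracy = accuracy_rate(sample)

# Nine incorrect entries out of 411 gives roughly 97.8%.
reported = round(accuracy_from_counts(411, 9), 1)
```

Keeping the paired values, rather than only a match flag, makes it possible to inspect which measurement types were most often misheard.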
Data Collection
The study coordinator performed an onboarding session over the phone with each participant to help them set up and access the Medly voice app, and provided them with an instruction manual (Fig S1, Multimedia Appendix 1). Participants were also required to answer a baseline questionnaire (Table S1, Multimedia Appendix 2) to help the study coordinator understand their comfort levels with using technology. Participants were made aware that they needed to perform a double entry of their Medly measurements for the four-week duration and were told to prioritize the Medly smartphone app, namely to input measurements on the phone first and to only follow guidance from the smartphone app.
questionnaires, both overall and question specific. Data was categorized in different ways using various characteristics and then analyzed to identify any trends or commonalities.

The quantitative data was then triangulated with the qualitative data findings, namely from the semi-structured interviews. Interview transcripts were analyzed by the study coordinator (AB) using a deductive approach. Specifically, the transcripts were analyzed under the guidance of Proctor et al.'s Implementation Outcomes framework, with a focus on the acceptability and feasibility constructs. Sub-themes were then identified to better describe the study findings. The transcripts and coding were organized using Microsoft Word.

Characteristics of Study Participants
A total of 20 patients were recruited for the study, with a fairly even split between genders (female: 9/20, 45%; male: 11/20, 55%) and an average age of 57.8 (SD 13.1) years. All patients who were recruited were required to be enrolled in the Medly program, with a mix between those who had recently been onboarded to the program (defined as less than 2 months since the time of recruitment; 7/20 patients, 35%) and those who had been enrolled in the program for longer (13/20 patients, 65%). Other patient characteristics were also collected through a baseline questionnaire for the purposes of this study, such as comfort levels with technology and whether or not they had used a smart speaker before (18 of the 20 participants returned this questionnaire). The statistics for each of these characteristics can be seen in Table 2.
Table 2. Patient characteristics used to categorize and sort data in the study.

Sex, n (%)

In addition to calculating the overall engagement levels, patient characteristics from Table 3 were also used to group the study population and compare the results among sub-groups to identify any noticeable trends. These results can be seen in

There were also consistently higher engagement levels in the group that had never interacted with smart speakers before when compared to those who had (7.6% difference). Both groups steadily declined in engagement as the weeks progressed, with similar overall differences between the week one and week four averages.
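The weekly engagement comparison above can be illustrated with a short sketch. It assumes that engagement is the share of study days on which a participant submitted measurements through the voice app, and that a decline is expressed in percentage points; this section does not state the exact definition, so both are assumptions, and the function names and sample data are hypothetical.

```python
def weekly_engagement(submitted_days, days_per_week=7):
    """Return the percentage of days with a submission for each study week.

    submitted_days: one boolean per study day, True if the participant
    submitted measurements through the voice app on that day.
    """
    if not submitted_days or len(submitted_days) % days_per_week != 0:
        raise ValueError("expected a whole number of weeks of daily flags")
    rates = []
    for start in range(0, len(submitted_days), days_per_week):
        week = submitted_days[start:start + days_per_week]
        rates.append(100.0 * sum(week) / days_per_week)
    return rates


def first_to_last_decline(weekly_rates):
    """Percentage-point drop between the first and the last week."""
    return weekly_rates[0] - weekly_rates[-1]


# Illustrative 28-day record for one participant; not study data.
record = (
    [True] * 7                    # week one: every day
    + [True] * 6 + [False]        # week two: six of seven days
    + [True] * 5 + [False] * 2    # week three: five of seven days
    + [True] * 4 + [False] * 3    # week four: four of seven days
)
rates = weekly_engagement(record)
drop = first_to_last_decline(rates)
```

Averaging these per-participant weekly rates within each sub-group (for example, prior smart speaker use) would give group-level trends of the kind compared in this section.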
Over the four-week duration (28 days), nine of the 411 entries submitted using the Medly voice app were incorrect measurements, indicating an overall accuracy rate of 97.8%. The errors varied between weight and blood pressure. A small subset of participants (four) were not able to successfully submit their correct readings, which led to the nine errors that were recorded.
Feasibility of Medly Voice
The NASA-TLX was used in this study to better assess the workload perceived by the study participants when using the Medly voice app. A 4% increase was seen in average scores between the week two and week four results, indicating a slightly higher workload. While the averages for each of the questions were fairly low, questions relating to (1) success rates, (2) how hard they needed to work to accomplish the task, and (3) feelings of discouragement, irritation, and stress scored the worst when compared to the rest of the questions. These results can be seen in Fig S2, Multimedia Appendix 4. Participants also felt less successful with using the Medly voice app at the end of the study than they did at the end of week two (22% difference in results).

Acceptability of the Medly Voice App
When analyzing the scores based on the different age groups, it was found that the youngest demographic felt they needed to work the most (highest average of 2.67) when compared to the middle-aged (average of 1.61) and oldest (average of 2.12) demographics. It was also found that those who were new to Medly felt more rushed when using the voice app and less successful when inputting their measurements when compared to those who had been on the Medly program for a longer time (~15% difference in scores for each question). Those who described themselves as less confident when using technology consistently gave poorer scores for each of the questions, indicating they had a more difficult time than those who described themselves as confident (Table S1, Multimedia Appendix 4).
In summary, the youngest age group felt they needed to work the most, the study population collectively felt they needed to provide slightly more effort as time went on, and those who were less familiar with technology had more difficulty using the voice app when compared to those who were more confident.
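The sub-group comparison summarized above amounts to averaging questionnaire scores within each group. The sketch below assumes responses are available as (group label, score) pairs on the 7-point scale; the group labels, data layout, and function name are hypothetical, and the sample values are not study data.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_group(responses, scale_min=1, scale_max=7):
    """Average Likert score per sub-group, rounded to two decimals.

    responses: iterable of (group_label, score) pairs.
    """
    grouped = defaultdict(list)
    for group, score in responses:
        if not scale_min <= score <= scale_max:
            raise ValueError(f"score {score} is outside the scale")
        grouped[group].append(score)
    return {group: round(mean(scores), 2) for group, scores in grouped.items()}


# Illustrative responses to a single workload question; not study data.
example = [
    ("youngest", 3), ("youngest", 2),
    ("middle-aged", 1), ("middle-aged", 2),
    ("oldest", 2), ("oldest", 2),
]
averages = mean_score_by_group(example)
```

On a workload scale where lower scores mean less perceived effort, the group with the highest average is the one that felt it needed to work the most.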
The UTAUT2 questionnaire was used to better understand participants' thoughts regarding facilitating conditions, effort expectancy, habit, and behavioral intention when it comes to using the voice app. The biggest difference between the week two and week four results was in whether they would use the Medly voice app in the future, with a 13% decline in the average score. The oldest demographic was the least keen on using it in the future, while the middle-aged demographic was the most interested in future use. When asked if the voice app became a habit, those who had used the technology before agreed more than those who did not have experience using the device (19% difference in responses).
Overall, all participants felt the voice app required low effort to use and that it was easy for them to operate. They were less certain about whether using the voice app had become a habit for them (a finding supported by the engagement levels), and were least certain about whether they would use the voice app in the future, as seen in Table S2, Multimedia Appendix 4.

"I learned how to get into her rhythm as opposed to her getting into my rhythm."

[Participant 04]

Specific strategies were employed to change their speaking style and most often involved modifying the volume, tone, pace, and style at which they spoke. Different strategies seemed to work better for different participants, specifically with regard to the pace at which they spoke.
Another interaction strategy employed by most participants involved using the device's touchscreen capabilities. In most cases, this alternative input was favored over using voice since it was simpler to use and, most importantly, faster.

Some participants also described their experience as "pleasant" when interacting with the device, while others specifically felt the need to use manners and to be polite while conversing with it:

demographics in the study, especially those who primarily spoke languages other than English.
Lastly, because most participants from this study had never interacted with a smart speaker before, their thoughts and feedback may be influenced by the fact that they were interacting with a novel technology. As a result, their thoughts on the device itself could be reflected in their responses, even though any voice user interface device could have been used for the study.
Conclusions
This study utilized a mixed-methods approach to investigate the acceptability and feasibility of deploying a voice app for digital therapeutics (DTx) used in chronic disease management. Our findings were consistent with previous research when it came to engagement levels, with the oldest age group showing the best, most consistent results. We recommend this platform be offered to those who are older (60+ years), have less busy schedules, exhibit high confidence levels when using technology, or experience symptoms (such as fatigue or headaches) from chronic conditions. While the technology could benefit from some advancements, participants were successful in finding ways to improve their conversational experience, suggesting that an app like this could be feasible to deploy in the clinic for future use.
Conflicts of Interest
JC and HR are part of the team that founded the Medly system under the intellectual property policies of the UHN and may benefit from future commercialization of this technology.

Acknowledgments
First, the authors wish to thank the patients who participated in this study. Thank you to the Medly nurse coordinators Mary O'Sullivan, Sarvatit Bhatt, Eva Pavic, Tina Carriere, and Annabelle Fontanilla for helping with the recruitment process. We also wish to express our gratitude to Quynh Pham and Patrick Ware for helping guide this research project's methodology, as well as Cait Nuun, Madison Taylor, and Denise Ng for their REB expertise and guidance.