ABSTRACT
Objective Little is known about the concerns of people living with diabetes. This study aims to identify key diabetes-related concerns in the USA and primary emotions associated with those concerns using information shared on Twitter.
Research Design and Methods A total of 11.7 million diabetes-related tweets in English were collected between April 2017 and July 2019. Machine learning methods were used to filter tweets with personal content, to geolocate (to the US) and to identify clusters of tweets with emotional elements. A sentiment analysis was then applied to each cluster.
Results We identified 46,400 tweets with emotional elements in the USA from which 30 clusters were identified; five clusters (18% of tweets) were related to insulin pricing with both positive emotions (joy, love) referring to advocacy for affordable insulin and sadness emotions related to the frustration of insulin prices, five clusters (12% of tweets) to solidarity and support with a majority of joy and love emotions expressed. The most negative topics (10% of tweets) were related to diabetes distress (24% sadness, 27% anger, 21% fear elements), to diabetic and insulin shocks (45% anger, 46% fear) and comorbidities (40% sadness).
Conclusions Our work has enabled us to describe the main topics of interest about diabetes and their associated emotions. Insulin pricing is a central diabetes-related concern and is associated with feelings of sadness. Social media data are complementary to traditional diabetes epidemiology and can contribute to public decision making around key topics of interest around diabetes.
What is already known about this subject?
It is very challenging to collect data a population level to truly understand what are the key concerns of people with diabetes in the real life
Social media such as Twitter can serve as a relevant source of information
There are key challenges for a global and fair access to insulin
What are the new findings?
This work represents an unprecedented overview of the large diabetes online community in the USA
With 18% of the tweets related to insulin pricing, we have observed that it is a major concern in the diabetes community in the USA. People regularly express fear, anger and sadness about potential diabetes-related complications and comorbidities. However, there is a great amount of support and solidarity among the diabetes online community with numerous posts related to positive emotions
How might these results change the focus of research or clinical practice?
Our work presents a reproducible approach to easily capture relevant information about the main concerns of people with diabetes that is usually not available in typical clinical or epidemiological settings. This could help to shape more relevant public health actions by targeting what really matters to people with diabetes.
INTRODUCTION
Stress, fears and negative emotions related to the management of diabetes are considered to be the most important psychosocial health factor in the management of diabetes.1 It has been shown that diabetes distress is associated with decisional conflict and therefore has an impact on day-to-day disease management and the long term risk of diabetes-related complications.2 Reducing diabetes-related distress may improve HbA1c and reduce the burden of disease among people with diabetes.3,4 However, current diabetes management guidelines dedicate little attention to the prevention and management of diabetes-related distress. This is due, in part, to the limited knowledge of the sources of stress, anxiety and concerns among people with diabetes and to the difficulty to capture5,6 them with existing evaluation scales such as the Problem Areas in Diabetes Scale (PAID or PAID-T for adolescents)7 or the Diabetes Distress Scale (DDS).5,8,9 By neglecting the impact of psychological factors, future interventions may not adequately address the needs of people with diabetes. Moreover, there is no extensive study in the literature about the impact of socioeconomic environment on diabetes-related concerns.
Social media data offer a unique opportunity to supplement current measures of diabetes-related distress and assess the sentiments of people with diabetes given the very active online diabetes community,10,11 in particular on Twitter. Twitter is a microblogging and social networking service with 321 million monthly active users.12
The aims of the present study were to identify diabetes-related concerns in the USA and to identify the primary emotions and sentiments associated with them.
METHODS
Data collection
A tweet extraction engine was developed in April 2017 and since then has collected more than 11.7 million (status July 2019) diabetes-related tweets in English via Twitter’s Streaming Application Programming Interface (API), based on a list of diabetes-related keywords, such as diabetes, hypoglycemia, hyperglycemia and insulin, from all over the world (see Supplement S1 for the full list of keywords used). All data collected in this study were publicly posted on Twitter. Therefore, according to the privacy policy of Twitter,13 users agree to have this information available to the general public. In this study, we restricted the analysis to 46,400 US-based tweets with personal and emotional content (see Figure 1 for an overview over the workflow).
For further details concerning data representation, data preprocessing, training of machine learning classifiers to filter tweets with personal content, from institutional tweets (organisations, advertisement, research news, etc.) related to diabetes; detect and exclude jokes; predict the gender and type of diabetes of users as well as the geolocation algorithm to determine tweets in the USA, please refer to Supplementary 2. For the present work, the statistical unit considered is a tweet and not a user profile. This implicates that a user may have several tweets.
Identifying emotions
To identify emotions among diabetes-related tweets, we combined detailed Parrot’s classification of emotions14 with a dictionary of emotional keywords present in the two most common tools to assess psychological health in people with diabetes, namely the Problem Areas in Diabetes questionnaire and the Diabetes Distress Scale. Parrot identified over 100 emotions and conceptualized them as a tree structured list (full list Supplement S3) with the six primary emotions being joy, love, surprise, sadness, anger and fear. To capture as many tweets as possible containing emotions, we hypothesized that the synonyms of the words from Parrot’s classification should be included as well. Synonyms were identified using the WordNet database.15
Moreover, due to the nature of Twitter as a microblogging service (short messages), people use emoticons and emojis that express special meanings. Emoticons constitute a metacommunicative pictorial representation of a facial expression using punctuation marks and letters, such as ‘:-)’ or ‘:D’.16 Emojis are incorporated into sets of characters available in mobile phones, such as ‘’ or ‘’. We used Wolny’s categorisation of emojis and emoticons into emotional categories17 similar to the six primary emotions defined by Parrot. This allowed us to determine a subset of emotional tweets containing either emotional words or emojis/emoticons.
Sentiment Analysis
Sentiment analysis is commonly used to investigate the positive or negative opinion within a tweet. We used a widely accepted18–20 human-validated sentiment analysis tool Valence Aware Dictionary for Sentiment Reasoning (VADER).21 VADER was specifically developed for social media data and combines a lexicon and the processing of the sentence characteristics to determine sentence polarity. VADER computes sentiment and valence for each word level and provides positive, negative and neutral scores at the sentence level. We used the compound score, which is unidimensional and a normalized measure of sentiment between −1 (most negative) and +1 (most positive).
Topic extraction
We applied the unsupervised machine learning algorithm K-means22 using cosine similarity, to group the emotional tweets into topics/clusters. In order to define the right number of clusters k, the Silhouette score,23 measuring how on average each data point is closer to its cluster’s center than to any other cluster, was used. We obtained the highest score for k=30 clusters. Each topic/cluster was then labelled according to the most contributing tweets (those closest to the topic center) and the most frequent words (top words).
Assessment of the mean income
We studied the associations between the topics of interest for people with or talking about diabetes and the mean income of the city, based on data from the 2017 American Community Survey.24 The tertiles for the mean household income were calculated: low income - [$24,609, $67,224], medium income - [$67,224, $86,758], high income - [$86,758, $394,259]. Each geolocated tweet was associated with the mean income for its corresponding city and then assigned to its respective tertile: low income, medium (med) income, high income. Associations between topics and mean income tertile groups were assessed with Chi-squared tests. For each topic the p-value was calculated using a Chi2 test between the binary variable if a tweet belongs to the corresponding topic and the tertile categories of the city-level mean income from the tweet’s origin.
Python (version > 3.6) and the packages scikit-learn25 and gensim26 were used to implement the methods described above. Algorithms related to the present study are open source under the following address: https://github.com/WDDS/Tweet-Diabetes-Classification.
RESULTS
Spatial distribution of diabetes-related tweets
This analysis was based on 167,743 geolocated, diabetes-related tweets from the USA, see Figure 2. The highest number of tweets were seen in California (N: 18,551) and Texas (N: 14,237), whereas Vermont (N: 197) and Wyoming (N: 131) had a low number of tweets. At the city level, New York City (N: 9,663), Los Angeles (N: 5,301) and Chicago (N: 4,884) had the highest number of tweets.
Topics of interest
Among all US diabetes-related tweets, 46,407 were identified as emotional tweets (28% of tweets) of which 14,485 (31%) were written by men, 20,228 (44%) by women and 11,694 (25%) from unknown sex; 20,285 (44%) were predicted as from people with type 1 diabetes, 4,375 (9%) from type 2 diabetes and 21,747 (47%) diabetes type unknown. Figure 3 shows the detailed description of the 30 topics of interest about diabetes. In most topics, an over-representation of women could be observed, in particular in topics 10, 23 and 26 related to the importance of affordable insulin and the oral glucose tolerance test (OGTT). In contrast, men more frequently discussed topics around diabetes-related stories (topic n°6) and ‘diabetic/insulin shock’ (topic n°29). Discussions about ‘advocacy for affordable insulin’ (topic n°10) and enjoying the exchange in the diabetes online community (topic°5) were almost exclusively dominated by people with type 1 diabetes. The only topics where people with type 2 diabetes had significantly more tweets than on average is topic 19 related to the confusion between type 1 and type 2 diabetes or in topic 17 when they tweeted about the anniversary of their diagnosis.
Primary emotions related to the topics of interest
Figure 4 illustrates the sentiment and emotions distribution over the 30 topics. In Supplemental S4 the most common emotional words and emojis/emoticons have been listed for each topic. Overall, we found that the most positive topics were related to support and solidarity within the diabetes online community (topic n°1) with 72% of emotions related to joy and about inspiring relatives living with diabetes (topic n°2) with joy (36%) and love (31%) elements. In contrast, the most negative topics were related to ‘diabetes-related comorbidities’ (topic °30) of which 40% of the emotions were related to sadness, 19% to anger and 22 % to fear; ‘diabetic/insulin shock’ (topic n°29) with an over-representation of anger (45% of all emotions) and fear (46%); “diabetes distress” (topic n°28) with sadness (24%), anger (27%) and fear (21%) emotions expressed. Topics 2 (“Inspiring relatives living with diabetes”) and 4 (“Diabetes awareness / Support / Donation”) showed a higher proportion of surprise emotions. The topics in which people with or talking about diabetes exchange, support each other and show solidarity (topics 1, 2, 3, 4, 5) contained joy and love emotions most frequently. Topic 9 was related to beauty products with diabetes-related names and was considered irrelevant for our analyses and disregarded.
Insulin pricing
Insulin pricing is a major topic of interest mentioned in 5 of the 30 topics (18% of tweets). A key topic associated with insulin pricing was related to the frustration of not being able to afford insulin (topic n°25), in which many tweets expressed sadness (24%), anger (21%) and fear (20%). Topic 13 regrouped tweets in which people are looking for “glucose guardians” who would be able to support them financially in obtaining insulin. Topic 10 described the tweets of people who are fighting for affordable insulin, which are represented by the hashtag ‘#insulin4all’. The positive emotions in this group showed mutual support.
Associations between topics of interest and mean income
We observed that topics such as n°10 ‘advocacy for affordable insulin’, n°22: ‘insulin’ and n°25: ‘frustration with insulin prices’ were positively associated with mean household city income such that cities with higher incomes were more likely to post tweets about these topics (P<0.001 for all) (see Supplementary 5). Similarly, we found that topics 24: ‘misunderstandings of diabetes’ (P<0.001), 19: ‘confusion between type 1 and type 2 diabetes’ (P<0.05), 12: ‘life with type 1 diabetes’ (P<0.01) and 21: ‘glucose variability’ (P<0.05) were positively associated with mean city income. Likewise, positive associations were found between topics 24 and 19 which are related to language use and the way people talk about type 1 and type 2 diabetes with cities with higher incomes being more likely to talk about these subjects.
In contrast, we observed that topics 5: ‘diabetes social media advocacy’ (#DSMA) group enjoying online support (P<0.001), 18: ‘day-to-day stories about diabetes’ (P<0.001) and 26: ‘oral glucose tolerance test (OGTT)’ (P<0.001) were negatively associated with mean city income such that cities with lower income were more likely to post these tweets.
Furthermore, cities with medium mean income tweet about topic 15: ‘Insulin pump complications’ more frequently (P<0.001).
DISCUSSION
Our findings suggest that Twitter is useful to capture diabetes-related topics of interest and the emotions associated with those topics. Our key findings suggest that there is a great amount of support and solidarity among the diabetes online community with numerous tweets related to joy and love emotions observed. On the other hand, there also is considerable frustration (annoyance) about the ability of people to distinguish between type 1 and type 2 diabetes and fear, anger and sadness related to diabetes-related complications and comorbidities as well as around insulin pricing.
We found that insulin pricing is a major concern in tweets shared in the US (18% of the emotional tweets related to this topic) and is associated with both positive (joy, love) and negative (sadness, anger, fear) emotions. People frequently share their frustration with insulin prices,27 access to insulin,28 and search for ways to help them finance it by resorting to a “glucose guardian” or from donations which are often accompanied with negative emotions. Positive emotions are present when it comes to solidarity in the fight for affordable insulin and the exchange over insulin prices in the community in general. We observed associations of topics addressing insulin pricing to be more frequent in cities with high mean incomes. This does not indicate that people living in cities with a high mean household income feel more concerned about the insulin prices, but that they probably have a greater ability to organize and gather around this issue. A large number of tweets geolocated in cities with a high mean household income include the hashtag ‘#insulin4all’, a campaign that unites the diabetes community around the access to treatment for everyone. The Diabetes Social Media Advocacy (#DSMA) group on Twitter is a community of people exchanging about diabetes related topics, in which the price of insulin is a central concern.
Comparison with the literature
Hughes et al. suggested the importance of understanding emotions and self-control (regulation of thoughts, emotions, and behavior) for health outcomes in adolescents with diabetes.29 Hagger et al. reviewed diabetes distress among adolescents with type 1 diabetes and found that a substantial proportion experienced elevated diabetes distress and that this is often associated with suboptimal glycemic control.30 Richman et al. suggested that positive emotions may play a protective role in the development of disease.31 Ogbera et al. showed that emotional distress is associated with poor self-care in type 2 diabetes.32 A study conducted by Iturralde et al. showed that anxiety is highly comorbid with depression among individuals with type 2 diabetes.4 Our study aligns with these results showing that emotions and diabetes distress topics are frequent and concern people with diabetes or people talking about diabetes on social media.
Similar to Nguyen et al.,33 who showed that individuals living in zip codes with high percentages of happy and physically-active tweets had lower obesity prevalence based on geolocated Twitter data, we have shown gradients between topics related to diabetes on Twitter and the household income level of their city.
It is known that there are key challenges for a global and fair access to insulin.34 With respect to insulin pricing, we are the first to exhibit and quantify, on a large sample of people with or talking about diabetes, the extent of the crisis in the USA based on social media data. In addition, we have also been able to highlight the different emotions and fears associated with the crisis around insulin pricing and therefore the potential impacts on the health of people with diabetes.
Strengths and Limitations
This study has numerous strengths. First, a major advantage of using social media data is that information is expressed spontaneously, on a large scale, in a real-time manner, in what can be considered as an open digital space with flat role hierarchy for information sharing and development of online communities. This potentially minimises biases such as responder bias that you would observe in traditional and observational studies. We evaluated tweets related to diabetes from a large number of people with a great variability in their profiles. The methodologies developed in this study present an innovative way to concentrate on relevant (personal, emotional) geolocated tweets (US), to identify topics of interest and emotions shared within topics. This approach is able to capture trends in the diabetes online community and socioeconomic factors can be associated with social media data at the ecological level. Moreover, topics are very less medically oriented which allows us to identify alternative topics which would not have been detected in a traditional clinical setting.
There are, however, several limitations to consider. First, diabetes-related concerns expressed on Twitter may not be representative of all people with diabetes. However, it has been previously suggested that it can be partially offset by the large variability in the profiles of people sharing data in the first place, which is key in digital epidemiology.35 Indeed, we have observed a large variability in their profiles, allowing us to identify various clusters of interest. Besides, the precision of our filter classifiers (personal content, jokes), gender and type of diabetes classifiers is not perfectly accurate which means that we cannot guarantee that 100% of tweets are posted by actual people with diabetes and it was sometimes impossible to define the sex or type of diabetes. Besides, our results should be interpreted in the context of the Twitter population solely. For instance, we found an over-representation of people with type 1 diabetes and women in our study when compared to what is known from the worldwide diabetes epidemiology literature36 which may be explained by the younger demographic of Twitter users.37 We were unable to account for several clinical and environmental factors that may help us tease out these associations. The label provided by the researchers for each topic is not exclusive. By refining the tweets in each topic, more subtopics could be defined. This could be a future direction to investigate. Finally, the geolocation of tweets was partially based on locations the users provided, which might not be their true location. No causal inference can be made regarding the link between the mean household income per city and the topics of interest of people residing in the corresponding city; and would be subject to the ecological fallacy. A person tweeting from a city of low mean income does not have necessarily a low income. A natural perspective of our work is to extend our analyses to include more countries and languages.
Conclusion
In this study we investigated diabetes-related topics and their associated emotions. We showed that insulin pricing is a central concern and comes with feelings of sadness. We have also shown that feelings, emotions and stress, which are not usually well taken into account in traditional epidemiology, are essential elements to capture the full picture of the real-life of people with diabetes. Our work should encourage future studies to take them into account. Future intervention studies should focus on reducing the level of stress and fears in people with diabetes.
Social media provides a useful observatory for diabetes issues as it is a direct source to capture information about people with diabetes feelings, emotions, beliefs and fears related to diabetes, diabetes treatment and complications among the large and active diabetes online community. The use of Twitter analysis on diabetes could inform the public debate about diabetes issues and help to contribute directly to public and clinical decision making.
Data Availability
All code is publicly available in this github repository: https://github.com/WDDS/Tweet-Diabetes-Classification Data is available upon reasonable request
Role of the funding source
This work has been supported by the MSDAvenir Foundation, the French Speaking Diabetes Society and the Luxembourg Institute of Health. These study sponsors had no role in the design nor the interpretation of the results of the present study. Adrian Ahne, Francisco Orchard and Thomas Czernichow are supported by Epiconcept Company. Epiconcept was involved in the data collection and writing of the report. No study sponsor influenced the decision to submit the paper for publication.
Declaration of Interest
The authors declare that they have no competing interests. The study sponsors had no role in the study design, analyses and interpretation of the results.
Acknowledgment
Guy Fagherazzi takes full responsibility for the work as a whole, and for the decision to submit and publish the manuscript.
The authors’ contributions were as follows: G.F. designed the research; A.A. and G.F. conducted the research; F.O. and A.A collected the data; A.A., F.O. and G.F. analyzed data; A.A., G.F., F.O., T.C, X.T interpreted the data; A.A. and G.F. drafted the article; G.F., F.O., T.C., X.T., B.B., J.L.H., C.P. and S.P. revised the manuscript critically; GF had primary responsibility for the final content of the manuscript. All authors read and approved the final manuscript.