Abstract
Background Diagnoses of “personality disorder” are prevalent among people using community secondary mental health services. Whilst the effectiveness of a range of community-based treatments have been considered, as the NHS budget is finite, it is also important to consider the cost-effectiveness of those interventions.
Aims To assess the cost-effectiveness of primary or secondary care community-based interventions for people with complex emotional needs that meet criteria for a diagnosis of “personality disorder” to inform healthcare policy making.
Method Systematic review (PRESPORO #: CRD42020134068) of five databases, supplemented by reference list screening and citation tracking of included papers. We included economic evaluations of interventions for adults with complex emotional needs associated with a diagnosis of ‘personality disorder’ in community mental health settings published between before 18 September 2019. Study quality was assessed using the CHEERS statement. Narrative synthesis was used to summarise study findings.
Results Eighteen studies were included. The studies mainly evaluated psychotherapeutic interventions. Studies were also identified which evaluated altering the setting in which care was delivered and joint crisis plans. No strong economic evidence to support a single intervention or model of community-based care was identified.
Conclusion There is no robust economic evidence to support a single intervention or model of community-based care for people with complex emotional needs. The review identified the strongest evidence for Dialectical Behavioural Therapy with all three identified studies indicating the intervention is likely to be cost-effective in community settings compared to treatment as usual. Further research is needed to provide robust evidence on the cost-effectiveness of community-based interventions upon which decision makers can confidently base guidelines or allocate resources.
Introduction
Globally, it is estimated that approximately eight percent of the population experience complex emotional needs that meet the diagnostic criteria for ‘personality disorder’ which is described broadly as an enduring and pervasive pattern of emotional and cognitive difficulties which affect the way in which a person relates to others or understand themselves(1). The diagnosis is often associated with high rates of co-morbidity (2) high levels of service use (3) and high treatment costs (4).
In European and North American community secondary mental healthcare settings, the prevalence of ‘personality disorder’ diagnoses are estimated to be above 40% (5). A range of psychological therapies show some evidence of effectiveness; dialectical behavioural therapy (DBT) and Mentalisation-Based Treatment (MBT) have the most well-established evidence base (6,7). While establishing clinical effectiveness of psychological therapies and models of care is crucial, it is also important for decision makers to consider their value for money. Health and social care resources are limited and, as such, there are competing demands for scarce resources. It is therefore a growing requirement that assessments of new treatments and therapies include an economic evaluation (8). Through robust economic evaluation decision makers can consider the opportunity cost of funding one intervention over another, as for every potential gain from a funded intervention, given a limited funding pool, there are potential losses from the next best alternative option that is forgone (9).
Economic evaluation seeks to compare the costs of an intervention against its outcomes. The main approaches are cost-effectiveness analysis (CEA), cost-benefit analysis (CBA), cost consequences analysis (CCA), or cost-utility analysis (CUA) (10). CEAs compare the incremental cost of an intervention to incremental changes in outcomes using a condition specific measure. For complex emotional needs these may include an assessment of functioning or distress. CBA measures outcomes using monetary units and are uncommon in healthcare evaluations partly due to the challenge of valuing health effects in monetary terms. CCA presents costs against a number of outcome measures to support multi-criterion decision making. Finally, cost-utility analysis (CUA) is a type of CEA and compares the incremental cost of an intervention against changes in a generic measure of health, usually quality-adjusted life years (QALYs) where one QALY is equivalent to one year of life in perfect health. Results can be compared against a willingness to pay (WTP) threshold which indicates how much a society will pay for a QALY. The National Institute for Health and Care Excellence use a threshold of up to £30,000 in England (11).
We are aware of two systematic reviews of economic evaluations of interventions for ‘personality disorder’(3,4). The scope of these reviews was limited to interventions for people with diagnoses of ‘borderline personality disorder’ only and the most recent papers included were published in 2014. The aim of this paper is to assess the cost-effectiveness of community-based interventions for people with complex emotional needs that meet criteria for diagnoses of ‘personality disorder’ compared to usual care (as defined by each study) or other active interventions. To do this we systematically review and assess the quality of relevant economic evaluations.
Methods
This systematic literature review was carried out in accordance with “The Preferred Reporting items for Systematic Reviews and Meta-Analyses” (PRISMA) checklist(12). A protocol for the search strategy and methods has been registered with PROSPERO (CRD42020134068). We use ‘complex emotional needs’ as our preferred terminology rather than ‘personality disorder’. This choice seeks to recognise that although some people find the diagnostic term helpful, many find it to be invalidating and stigmatising. However, we do use the term ‘personality disorder’ when referring to original papers and search strategy inclusion criteria (diagnostic inclusion criterion for the service model described) as appropriate.
Eligibility criteria
Inclusion criteria (detailed in Table 1) required that studies (i) included an economic evaluation where both costs and outcomes are reported and where a formal or informal linkage between costs and outcomes is made; (ii) included adults attending a general or specialist community mental health service with complex emotional needs that meet the diagnostic criteria for ‘personality disorder’ (other than ‘antisocial personality disorder’) (iii) evaluated community-based services, treatments and interventions; (iv) compared cost-effectiveness to usual care or another active intervention.
Publication date was limited to January 1st1990 to September 18th2019, and language of publication to English. Studies of interventions for adults with diagnosed ‘antisocial personality disorder’ were excluded, as there are often high levels of forensic service use which would limit or prevent the implementation of community-based interventions. Cost-minimisation studies were excluded as they require interventions to have an equivalent effect, which would be incredibly unlikely in this setting. Community-based services, treatments and interventions are defined as any services, treatments and interventions provided outside of an inpatient setting excluding pharmacological treatments.
Information sources and search terms
Electronic searches of the following bibliographic databases were conducted: MEDLINE, PsychINFO, EMBASE, Global Health, and NHSEED database. Search terms can be found in the appendix. Search domains covered are: ‘Personality Disorder’, ‘Community Care’ and ‘Economic Evaluation’. Citation tracking of included studies was conducted using Google Scholar. Reference lists of included studies were also screened.
Selection process and data collection
Three authors (JB, TS and RS) screened titles and abstracts against the inclusion criteria independently using a systematic review web application called Rayyan.qcri.org, which has a blinding feature. A third author (PM) independently reviewed and resolved disagreements. Full text screening was conducted by JB and TS independently (with PM resolving disagreements) using eligibility forms pertaining to each aspect of the inclusion criteria. Reviewers were not blinded to authors, institutions, or journals.
Data were extracted into standard forms by JB. Duplicate extraction was completed by TS for 25% of papers and PM resolved disagreements. Data were collected on the characteristics, methods, and results of each study. The study characteristics data included the country, funder, intervention (including levels of contact), the comparator, study design, economic approach (i.e. whether a cost-effectiveness or cost-utility analysis was used), a summary of the inclusion and exclusion criteria, the sample size, and sample characteristics (mean age, proportion female and proportion employed).
Data on the study methods included the follow-up period, perspective, costing (approach to recording service use, valuing the intervention, sources of unit costs, and discounting), and outcomes (main economic outcome, QALY derivation if relevant, and discounting). The perspective is the range of costs included in an analysis. A narrow perspective would only include costs relevant to the service provider. A broad perspective would include costs to other parts of society, such as the criminal justice system, social services, a patient’s employer, or voluntary/informal care. Inclusion of all impacts would constitute a societal perspective which is rarely achieved. However, here we follow the convention of defining a societal perspective as including healthcare and lost workdays. Resource use measurement methods include reviewing hospital records or conducting patient interviews/surveys. The source of unit costs includes national routine data, information from local providers, and fees or tariffs. The outcome is the measure of effectiveness used in the evaluation. This can be specific to a condition, such as an improvement in a condition-specific scale, or generic such as QALYs. For both costs and outcomes, data on any discounting has been recorded. Discounting is used to reflect the perceived lower value attached to future cost and outcomes, due to the widely accepted view of positive time preference. It is conventional to only discount costs and outcomes that occur beyond a one-year period (9).
Data analysis
Meta-analysis was not undertaken as the economic evaluations are context specific and the review does not focus on one particular outcome. Narrative synthesis was undertaken to describe the main study findings based on the reporting of the data items described above, including outcomes and costs for the intervention and comparator groups, and the cost-utility or cost effectiveness results. Results were often expressed as incremental cost-effectiveness or cost-utility ratios. These ratios are typically derived from mean values and due to variation around the mean, consideration should be given to uncertainty in these estimates.
Quality assessment
Study quality was assessed using the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement, which sets out a standard for the reporting of economic evaluations (13). The statement evaluates quality of reporting rather than the quality of evidence. Two statements relate to the introduction, sixteen to the methods, five to the results, and there is one statement each for the title, abstract, sources of funding, and conflicts of interest. JB appraised 50% of papers and AC appraised the remaining 50%. TS provided an independent appraisal of 25% of papers and any disagreements were resolved by PM. Each paper will be assigned a score, which corresponds to how many of the 28 statements on the CHEERS checklist the paper complies with (0 = does not comply, 0.5 = partially complies, and 1 = fully complies). Some of the statements in the checklist relate to the quality and fullness of reporting, which can be influenced by publications (e.g. the structure of the abstract) rather than the quality of research itself. A breakdown of each paper’s score has been reported in the results and in table 6, so scores for methodology and results can be assessed independently.
Results
Figure 1 shows the PRISMA flowchart pertaining to the review; eighteen papers were included in the review. One paper (Bamelis et al 2015) presented both cost-effectiveness and cost-utility analyses for each of two separate interventions (14). For reporting purposes, these interventions are treated separately (labelled ‘a’ and ‘b’) although both were compared to the same control group.
PM resolved 14 disagreements between JB & TS during title and abstract screening; two during full text screening; 14 over data extraction; and disagreements in 4 papers in the quality assessment.
Study Characteristics
Table 2 shows the characteristics of the studies included in the review. Studies were conducted in UK, Europe, and Australia, with sample sizes ranging from 34 to 642 (mean 195.2, SD 167.7). The youngest sample had a mean age of 28.4 years and the oldest sample mean was 39.2 years. The percentage of women in the samples ranged from 25% to 95%. The sample with the lowest rate of employment was 9.5% and the highest rate of employment was 68.5%, though seven studies did not report this.
Five of the studies included patients with any ‘personality disorder’ diagnosis (Grenyer et al 2018, McMurran et al 2016, Kvarstein et al 2013, Priebe et al 2012, Tyrer et al 2011) (15–19). Three of the evaluations looked at ‘Cluster C personality disorders’, described as avoidant, dependent, obsessive-compulsive (Bamelis et al 2015a, Bamelis et al 2015b, Soeteman et al 2011) (14,20) and one study looked at ‘Cluster B personality disorders’ described as antisocial, borderline, histrionic, and narcissistic (Soeteman et al 2010) (21). Eight studies only recruited participants with a ‘borderline personality disorder’ diagnosis (Blankers et al 2019, Borschmann et al 2013, Davidson et al 2010, Palmer et al 2006, Pasieczny & Connor 2011, Sinnaeve et al 2018, van Asselt et al 2008) (22–28). Nine of the studies required a recent episode of self-harm, attendance at an emergency department, or in-patient stay (Borschmann et al 2013, Davidson et al 2010, Grenyer et al 2018, Murphy et al 2019, Palmer et al 2006, Pasieczny & Connor 2011, Priebe et al 2012, Sinnaeve et al 2018, Tyrer et al 2004) (15,18,23–27,29,30). Two studies only included patients with other comorbid severe mental illness (Ranger et al 2009, Tyrer et al 2011) (31,32), of which one of these studies also required substance dependence (Tyrer et al 2011). One study included a general sample of which less than half had complex emotional needs that met personality disorder diagnostic criteria (Tyrer et al 2004) (30).
The interventions evaluated in the 18 studies are described in Table 7. Psychotherapeutic interventions included dialectical behavioural therapy (n=3), types of cognitive therapy (n=3), Nidotherapy (n=2), schema-focused therapy (SFT) (n=2), psycho-education with problem solving (PEPS) (n = 1), and mentalisation based therapy delivered in a day hospital setting (MBT) (n = 1). Other interventions comprised of altering the setting in which care was delivered (n=2), adopting a stepped-care approach (n=3), and developing joint crisis plans (JCPs) (n = 1).
Treatment as usual was the comparative intervention in half of the included papers (n=8). Other comparators included psychotherapy provided by out-patient services (n=4), psychotherapy provided by day-hospital services (n=2), care provided by assertive outreach services (n=2), transference-focused psychotherapy (n=1), and baseline data (n=1). More than three-quarters of the included papers employed a randomised controlled trial design (n= 14), two used Markov models, one a wait-list controlled trial, and one a quasi-experimental design.
A total of 33 different outcome measures were used across the eighteen studies (Table 5) with inconsistent approaches to discounting outcomes (five studies applied a discount rate of between 3-5% and 13 studies applied no discounting). Over half of the papers employed a cost-utility analysis (n=10). Other approaches used were cost-consequences analyses (n=5) and cost-effectiveness analyses (n=5). Outcomes and costs reported by all studies are reported in Table 4.
DBT
Each of the three studies of DBT used a different method of economic evaluation (Table 2). Murphy et al conducted a quasi-experimental study using a CUA, Pasieczny & Connor adopted a CCA for their wait-list controlled trial, and Priebe et al performed a randomised control trial based CEA(18,26,29). The mean (SD) follow-up period for these studies was 12 (6) months. Murphy et al and Pasieczny & Connor adopted a narrow healthcare perspective, while Priebe et al took a broader societal view including lost employment costs.
Two of the studies (Pasieczny & Connor and Priebe et al) reported significant improvements in clinical outcomes (Table 4). Pasieczny & Connor reported fewer suicide attempts, Emergency Department (ED) visits, admissions and inpatient days. Priebe et al reported a reduction in self-harm events (rate ratio 0.78 (95% confidence interval (CI) 0.76-0.80). Pasieczny & Connor and Murphy et al, both reported DBT to be less costly than the control, though neither reported statistical significance. Murphy et al reported an ICER for DBT of €1965 per QALY, with the sensitivity analysis suggesting a 62% probability that DBT was cost-effective at a threshold (i.e. how much society is considered to value a QALY) of €45,000 EUR per QALY. Priebe et al reported the ICER of DBT was £36 per percentage point decrease in the incidence of self-harm.
Quality appraisal found the mean score of studies evaluating DBT to meet 67% (SD = 13%) of the applicable criteria (Table 5). Priebe et al met the most criteria, scoring 15.5 out of 19 applicable statements. Both Priebe et al and Murphy et al had good scores for the reporting of their methods (7.5 out of 9 and 7 out of 10, respectively) and results (1.5 out of 3 and 1.5 out of 2, respectively). Pasieczny & Connor had the lowest quality scores overall (9.5 of 19).
Cognitive therapies
Details of the three studies evaluating cognitive therapies are given in Table 3. All three studies used data from randomised controlled trials. Palmer et al carried out a CUA, whereas Davidson et al and Tyrer et al (2004) employed a CCA (24,25,30). Tyrer et al (2004) evaluated manual-assisted cognitive therapy (MACT), with the other two evaluating cognitive behavioural therapy (CBT). The follow-up periods for these studies were the most varied for a specific intervention category (mean 36 months SD 31.75). Tyrer et al (2004) and Palmer et al took a societal perspective and Davidson et al employed a Health, social care and criminal justice perspective. All three used a version of the Client Service Receipt Inventory to record resource use, though Davidson et al and Palmer et al also used hospital records.
Only Tyrer et al (2004) reported a change in outcome that favoured cognitive therapies, the incidence rate of a first episode of self-harm per person year was 0.584 in the MACT group and 0.71 in the TAU group. The frequency of self-harm events remained relatively unchanged (MACT = 2.84 per year vs TAU = 2.54, after outliers had been excluded). Davidson et al and Palmer et al reported the difference in costs between CBT and TAU, though they were not found to be significant. Palmer et al found CBT was less costly but also less effective and reported an ICER of £6376 per QALY. When WTP is below the ICER, CBT has a higher probability of being cost-effective, but when WTP is above the ICER, TAU has a higher probability of being cost-effective. At the UK threshold of £30,000 per QALY, the probability of CBT being cost effective was only ∼25%, whereas the probability of TAU being cost-effective was ∼75%, making TAU the more cost-effective choice. Neither Davidson et al nor Tyrer et al (2004) calculated an ICER.
Quality appraisal found the papers had a mean score of 72% (SD 13%). Palmer et al had the second highest quality appraisal score, scoring 18 points across 19 applicable criteria (95%). Tyrer et al (2004) scored 13 out of 19, failing to meet the title and abstract, and methodological criteria well (0.5 of 2, and 6.5 of 9, respectively). Davidson et al had the lowest QA score of the three, again, scoring poorly on criteria for the title and abstract, and the methods, but also the disclosure criteria (1 of 2, 4 of 10, and 1 of 2, respectively).
Stepped care
Details of methods used in the three studies of stepped-care interventions are presented in Table 3. All three studies used data from randomised controlled trials. Sinnaeve et al carried out a CUA, Kvarstein et al employed a CEA and Grenyer et al conducted a CCA(15,17,27). The mean (SD) follow up period was 32 (18.33) months. The perspective of Kvarstein et al and Sinnaeve et al included healthcare and one other sector’s costs (social care and employment respectively), while Grenyer et al took the narrowest perspective of all the studies included in the review, only including in-patient stays.
The model of stepped care varied between studies. Grenyer et al considered a brief intervention clinic delivered within 36 hours of a crisis presentation, followed by longer term community based psychological therapy. A significant reduction in inpatient days was reported. Sinnaeve et al evaluated three months of DBT delivered as a residential intervention, followed by six months of outpatient DBT, and compared this to 12 months of standard outpatient DBT. Sinnaeve et al reported QALYs with mean (SD) utility scores; these were higher for the step down DBT care group than the comparator however this difference was not statistically significant (stepped care DBT = 0.65 (0.33) vs outpatient DBT = 0.62 (0.28)). Kvarstein et al evaluated intensive psychotherapy in day hospitals followed by outpatient individual and group therapy. Lower improvements in functioning were reported (measured on the Global Assessment of Functioning scale) compared to standard out-patient care (Table 4).
None of the studies reported significant differences in costs. Sinnaeve et al reported an ICER of €278,067 per QALY;at a threshold of €80,000 per QALY there was a 21% likelihood that stepped care was the most cost-effective option. Kvarstein et al reported an ICER for outpatient care compared to stepped care for the ‘avoidant personality disorder’ subgroup (but not the borderline subgroup) which was €1,092 per additional point gained in the Global Assessment of Functioning scale (GAF) (favouring outpatient care). No uncertainty analysis was reported.
Mean (SD) scores of applicable criteria met during quality appraisal were 70% (3%). Stepped care was the only intervention for which none of the papers scored above 80% on the quality appraisal. All three papers scored well in the criteria relating to the title and abstract, introduction, and discussion. However, all three papers failed to meet multiple criteria for the methods, resulting in a mean score of 5.3 out of 9.3 applicable criteria.
Nidotherapy
Two studies reported economic evaluations of Nidotherapy using data from a single pilot randomised controlled trial in a population with difficult to manage needs, ‘personality disorder’ diagnoses and other comorbid severe mental illness. Ranger et al evaluated the pilot using a CEA (31). Tyrer et al (2011) later evaluated Nidotherapy in a sub-group of patients with substance misuse issues, employing a CCA using data from the same trial (32). The studies used a broad perspective and a follow-up period of 12 months.
Ranger et al did not observe any significant effects on outcomes, with a trend towards reduced symptoms in the intervention group. Tyrer et al (2011) reported a significant reduction in bed days in secondary sub-group analyses and also found Nidotherapy was less costly, although the difference was not statistically significant at the p<0.05 confidence level. Ranger et al still found Nidotherapy to be dominant (i.e. it resulted in lower costs and better secondary outcomes than the comparator, although these results were not significant) and even at a threshold of £0 per point of improvement in the brief psychiatric rating scale (BPRS) it had a 60% likelihood of being cost-effective.
Quality appraisal scores were 64.9% for Ranger et al and 82% for Tyrer et al (2011). Both studies scored similarly on the reporting of their methods and results, however Ranger et al did not fully meet the quality criteria relating to the introduction, discussion, and disclosure.
Schema therapy
Two studies evaluated schema therapy using randomised controlled trials, and both conducted a CUA and CEA(14,28). Bamelis et al (2015a) compared schema therapy to TAU and van Asselt et al compared schema therapy to transference-focused psychotherapy. Both studies used the same measure of effectiveness, the proportion of recovered patients, and report this to be higher in the schema therapy group than the comparator group. For Bamelis et al (2015a) this was 81.4% vs 51.2%, respectively (‘recovered’ defined as a SCID-II score ≤15) and van Asselt et alreported this to be 52% vs 29%, respectively (‘recovered’ defined as a BPDSI score ≤15). Bamelis et al (2015a) found schema therapy produced a greater median gain in number of QALYs than TAU, though this was not significant (2.34 vs 2.23; p = 0.51). van Asselt et al report total QALYs for the schema therapy group was lower than transference-focused psychotherapy, this difference was also not significant (2.15 vs 2.27, respectively: 95% confidence interval −0.51 to 0.28). van Asselt et al reported an incremental difference in mean costs: schema therapy was €8,969 less costly than transference-focused psychotherapy, though this difference was not significant (95% CI = −21,775 to 3,546). Bamelis et al (2015a) also reported a lower mean (95% CI) cost for schema therapy compared to TAU: €23,805.00 (21,014 to 26,791) vs €26,333.00 (22,384 to 30,605).
Both studies found that schema therapy was dominant, with regard to cost per recovered patient and cost per QALY. Bamelis et al (2015a) showed schema therapy to have an 80% probability of being cost-effective at a WTP threshold of 0, for both cost per recovered patient and cost per QALY. Van Asselt et al reported the likelihood of schema therapy being cost effective, in terms of recovered patients and QALYs, was over 90% when WTP = 0. Both Bamelis et al (2015a) and van Asselt et al observed that as the threshold for recovered patients increased, the probability of schema therapy being cost-effective also increased, however, as the threshold for QALYs increased, the likelihood of schema therapy being cost-effective decreased.
In the quality appraisal, schema therapy studies scored the highest out of all the interventions, with a mean (SD) of 95% (0.0%) of the applicable criteria met. This was the highest mean quality appraisal score for an intervention (5.4% higher than ‘Setting’ at 89.6%). van Asselt et al fully met all applicable criteria for all sections except the methods, for which the score was 9.5 out of 10. Bamelis et al scored similarly well (19 out of 20) but lost half a point in both the methods and results sections.
Interventions defined by setting
Two evaluations conducted by Soeteman et al employed Markov models to evaluate the effect of care being provided in different settings for samples with either cluster B or cluster C ‘personality disorder’ diagnoses(20,21), using data from the SCEPTRE trial (33,34). Markov models were used in these cost-utility analyses comparing out-patient, day-hospital and inpatient care (one of the studies varied the duration of day-hospital and inpatient care between long and short term, Soeteman et al 2011). Both studies adopted a broad societal perspective, as well as repeating the models taking a narrower, health service provider perspective.
Soeteman et al reported that for a cluster B ‘personality disorder’ group care at a day-hospital was associated with the greatest QALY gains, and for cluster C ‘personality disorder’ group, short-term inpatient care produced the greatest estimated number of QALYs. For cluster B patients, out-patient care was the least costly and also dominated other settings (being less costly and more effective), and for cluster C patients, short-term day hospital treatment was the least costly, and all long-term options were dominated. Estimated cost effectiveness acceptability curves (CEAC) for cluster B showed out-patient care was 84% likely to be cost-effective when the threshold was €0 per QALY. For cluster C patients the estimated CEAC showed short-term inpatient care to be 60% likely to be cost-effective at a high WTP threshold of €80,000 per QALY compared to short-term day-hospital care.
The mean (SD) quality appraisal score for these studies was 9.6% (2.1%). Both papers scored similarly, fully meeting the title and abstract, introduction, and discussion criteria. As the papers were both models, they had the most applicable methods and results criteria, and scored reasonably well, Soetman et al (2010) scored 13.5 out of 14, and Soetman et al (2011) scored 12.5 out of 14. Both papers lost half a point in the results section for only partially meeting one of the criteria (2.5 out of 3). Soetman et al (2010) did not provide information on funding or conflict of interests, and so scored 0 out of 2 on the disclosure statements.
Other interventions
Four studies were identified which looked at other interventions: joint crisis plans (JCPs) (23), psycho-education with problem solving (PEPS) (16), clarification orientated psychotherapy (COP) (14) and mentalization-based treatment delivered in a day hospital (MBT) (22). All of them used randomised controlled trial data and employed a CUA, and two of these studies, Blankers et al and Bamelis et al (2015b), also conducted a CEA (Table 2). All of these studies took a societal perspective.
Borschmann et al did not report a significant difference in costs or outcomes however their economic analysis found JCPs to be dominant over usual care, with over 80% probability of being cost-effective when the threshold was £0 per QALY. This is not unusual as even in the absence of a significant clinical effect, there can still be a finding of cost-effectiveness. The latter combines point estimates of cost and outcome differences and in some cases, as here, can show high probabilities of interventions being cost-effective. McMurran et al found PEPs was also dominant, having a 64% chance of being more cost-effective compared to usual care with a threshold of £20,000 to £30,000 per QALY.
Bamelis et al (2015b) found COP was inferior to TAU and SFT. Blankers et al found MBT was dominated by TAU in the CUA but also found that in the CEA the ICER per patient in remission was €29,314. At a threshold of €45,000 the chance of MBT being cost effective (when considering ‘remission’ as an outcome) was only slight (55%).
Quality appraisal of Bamelis et al has already been discussed under the schema therapy results. Mean (SD) quality appraisal scores were the highest for the remaining three studies, being 89% (7.3%). Blankers et al met 83% of applicable criteria in the quality appraisal, meeting fully the criteria for title and abstract, introduction, discussion, and disclosure. In the results they met all but one criterion, which they partially fulfilled (results = 1.5 out 2), and in the methods they failed to meet three of 11 applicable criteria (methods = 8 out of 11). Quality appraisal of Borschmann et al found they met 83% of applicable criteria (15 out of 18). They fully met the criteria set out for the introduction, results and discussion, but were short one point in the title and abstract, methods and disclosure sections. McMurran et al met 16.5 of a possible 19 (87%) during quality appraisal. They fully met the criteria for the title and abstract, introduction, results, and disclosure. However, they failed to meet two criteria in the methods (score = 8 out of 10) and only partially met the criteria for the discussion (0.5 out of 1).
Discussion
Summary of main findings
This paper has reviewed economic evaluations of community-based interventions for people with complex emotional needs meeting criteria for ‘personality disorder’ diagnoses. A diverse range of interventions were identified, with no strong evidence found for the cost-effectiveness of any single intervention or model of care.
The strongest evidence was for DBT delivered in community settings: all three identified studies indicated the intervention is likely to be cost-effective compared to treatment as usual. However, consideration should be given to the limited pool of economic evidence when interpreting this finding. The review also identified evidence to support the use of schema focused therapy, joint crisis plans, stepped care (as described by Grenyer et al), Nidotherapy, psychoeducation with problem-solving, and MACT. The authors with lived experience highlighted that the review did not identify any economic evaluations which considered service user led interventions or workforce development interventions, such as those which focus on therapeutic alliance.
Across all 18 studies the evidence was weakened by small sample sizes or quality of evidence. Of the 12 studies which found evidence to favour the intervention evaluated, only five reported a statistically significant effect (however, even non-significant effects when combined with cost differences can indicate cost-effectiveness). Of the five studies with significant effects, there were also limitations around reliability of evidence in at least three of the studies: Grenyer et al took a narrow health provider perspective to evaluate their stepped care intervention, limiting relevance for health policy makers (15). The drivers for observed differences between groups were unclear and although significant differences in bed days were observed, there was no difference in admission rates overall (however, reduced bed days alone may be an important effect). The authors also acknowledge that as there were staff transfers between sites in the cluster RCT design, the reliability of evidence may be limited. Pasieczny & Connor used a non-randomised trial design which can lead to biased estimates of effect (26). Tyrer et al (2011) relied on a subsample of trial participants; unplanned subgroup analysis of trial data can lead to unreliable results by increasing the risk of chance findings (32).
Quality of evidence
While DBT, CBT and stepped care have been the most extensively researched, the numbers of economic evaluations for each of these interventions are relatively few and provide insufficient evidence upon which decision makers can confidently base guidelines or allocate resources. This contrasts with other areas such as depression or schizophrenia where reasonable agreement about interventions exist. The review found interventions were sometimes poorly described, limiting reproducibility and usefulness for implementation decisions. Several studies used data from the same trials to report subsequent sub-group analysis or longer term follow up (BOSCOT/POMCAT/SCEPTRE). This approach weakens evidence as risk of chance findings are increased and bias may be repeated across more than one study.
Limitations
Our search strategy included ‘community/outpatient’ terms which may have excluded some relevant studies which did not mention the setting in the title or abstract or where care was delivered in a day service. The potential impact of this is likely to be reduced by our supplementary strategies of citation tracking and reference list screening. As has been found in two related reviews(3,4), there was significant heterogeneity in the included studies which prevented meta-analysis of findings. The results are therefore reported in narrative form making interpretation of findings more complex. In addition, the comparator was most often ‘usual care’ which is context specific and rarely described in detail limiting generalisability. The review scope did not intend to include interventions for people with ‘antisocial personality disorder’ diagnoses. However, some of the identified studies considered samples with ‘any personality disorder diagnosis’, another required that patients had other comorbid severe mental illness, and another sample included individuals with complex emotional needs meeting ‘personality disorder’ diagnostic criteria as well as those not meeting this criteria. Given the general nature of the samples, and because these studies met all other inclusion criteria, they were included in the review. However, this presents a key limitation for identifying and interpreting findings specific to the population on interest and highlights that diagnostic categorisation, whilst essential for literature searching, is a limitation of the review scope. Finally, to simplify reporting, we assigned a quality assessment score to each paper which may risk over simplifying quality assessment where it is important to understand the areas of strength and weakness. To address this limitation, we have provided detail on areas of quality assessed in Table 6.
Recommendations for future research
The majority of included studies relied on data from small randomised controlled trials (RCTs). Whilst RCTs are the gold standard of evidence generation on the effectiveness of treatment, efficacy evidence can also be generated through evaluations in ‘real world’ settings. Such evidence can be more informative for decision making as results are often more generalisable. Economic modelling can also support decision making whilst presenting uncertainty in choices. Given the scarcity of economic evidence in this area, observational studies alongside the delivery of community services would be valuable.
The review found that while a range of economic approaches were used, studies using CUAs applied the most rigorous and transferable methods. The follow-up periods were longer, the perspectives were broader, and resource use was more often collected through a combination of patient report and patient records improving the accuracy of healthcare utilisation estimates. The main benefit of CUAs is that they enable comparison across disease areas by measuring effect in QALYs. All CUAs in this review used the EQ-5D to derive QALYs (in line with NICE guidelines). The EQ-5D has been criticised for not picking up all important aspects of health, and so for all health conditions consideration must be given to its reliability (does it produce consistent results), validity (does it measure what it intends to measure), and sensitivity (does the instrument identify genuine changes in health states). The EQ-5D has been shown to be moderately responsive to change in symptoms in individuals with ‘personality disorder’ diagnoses (35). There is a lack of evidence on its validity in capturing all important aspects of health in this population (36). Nonetheless, due to the measure’s simplicity and because it can be used to generate QALYs, it has been recommended as appropriate for use in this population (35).
A limitation of CUAs is that QALYs singularly focus on health benefits which prevents comparisons across sectors (e.g. comparing whether an education intervention may be more cost-effective than a health intervention) and may underestimate benefits where interventions have wider outcomes such as employment. A majority of studies included in this review took a societal perspective in measuring costs and effects. Five studies also used CCAs, presenting costs alongside a number of outcomes. Whilst this can make interpretation challenging, CCAs may be justified given the multifaceted nature of the conditions being studied and the potential breadth of effects. A broad perspective and presenting multiple outcomes alongside QALYs may therefore be appropriate where interventions aim to improve outcomes beyond health gain, and where cost implications may fall outside of the health and care budget.
Future research should aim to co-produce studies with people with lived experience of diagnoses of ‘personality disorder’ or complex emotional needs to help ensure important outcomes, as well as costs, are captured from a broad perspective. Choice of outcome measures should also be informed by previous studies and local guidelines. Consistency in measurement and reporting will support the development of a stronger evidence base for community-based interventions as information can be pooled across multiple studies. Researchers may wish to consult the ICHOM Standard Set for Personality Disorders when selecting measures (37).
Studies must only evaluate therapies which are well developed, hold face validity with people who are using and delivering services, and must be based on sound theoretical foundations. High quality research is also needed on models of care, including the intensity and duration of interventions, with clear description of how services are configured to enable reproducibility. Researchers may wish to refer to the forthcoming Finamore et al. “Systematic scoping of community-based service models for people with personality disorder” to support a standardised description and understanding of models of care. The logic models articulated in this paper may also support the identification of relevant outcome measures. Finally, the review identified a gap in the type of interventions evaluated with no service user led interventions or workforce interventions identified. These two areas should be considered key areas for future economic research.
Lived Experience Commentary by Eva Broeckelmann
Having spent years struggling to access suitable treatment for CEN, I am pleased to see the lack of robust economic evidence to support a single intervention or model of community-based care.
Especially considering the inherently heterogeneous nature of any given sample with a “PD” diagnosis - where e.g. two people with “BPD” may have no more than one highly subjective trait in common -, there simply is no ‘one-size-fits-all’ approach. Therefore, any study results must be treated with utmost caution before making generic policy decisions that indiscriminately apply to everyone with this label.
Despite being recommended by NICE, I do not consider the EQ-5D to be an appropriate outcome measure for this population. CEN symptoms can fluctuate so frequently that any snapshot in time on such a rudimentary measure is virtually meaningless for assessing long-term recovery, and it will be crucial to develop more nuanced alternatives for future studies.
Ultimately, the policy aim to prioritise specific interventions based on cost-effectiveness neglects the fact that strong therapeutic relationships with trusted clinicians are considerably more important for treatment success than the particular modality used. Thus, such comparisons are of limited value, whereas it could eventually be much more cost-effective to focus research and resources on improved training for clinicians instead.
Lived Experience Commentary by Tamar Jeynes
It is interesting that in life much interacts with and mirrors itself. This systematic review endeavoured to be robust: 18 economic evaluations in 19.5 years fit the inclusion criteria. Nine different interventions, each with their own access criteria for service users.
Collecting robust economic data involves resource. This luxury is afforded to better funded interventions, which then make a case for future funding. Many service users do not meet criteria for these interventions.
It is interesting to identify what is missing.
There are no economic evaluations of survivor led or co-produced interventions, such as co-facilitation of therapies, survivor-led networks or crisis houses. These are more likely to not have inclusion criteria, reaching people that other interventions cannot. They often do not have the resource to conduct economic evaluations. Many have ceased to exist. Excluded service users only have access to costly emergency services during crisis. Being excluded depletes the internal resources needed to value their worth. Many cease to exist.
It is interesting that in life much interacts with and mirrors itself. When an intervention cannot demonstrate its worth, it cannot access funding which includes resource to measure its worth. When the only interventions that can demonstrate their worth are ones with inclusion criteria, excluded service users remain without a service. The cycle continues.
Conclusions
There is no robust economic evidence to support a single intervention or model of community-based care for people with complex emotional needs that meet criteria for ‘personality disorder’ diagnoses. In line with clinical evidence, the review identified the strongest economic evidence for Dialectical Behavioural Therapy with all three identified studies indicating the intervention is likely to be cost-effective in community settings compared to treatment as usual. Consideration should be given to the limited availability of economic evaluations when interpreting this finding.
The studies included in this review were heterogeneous in terms of methods, outcome measurement (with 33 outcome measures used in the 18 studies), and the interventions studied. Future studies should aim to improve consistency in this field, and, given the paucity of evidence generated from small clinical trials, seek to evaluate existing and new services to provide ‘real world’ evidence on the cost and effects of therapies and services. Finally, whilst empirical evidence remains limited, economic modelling can support decisions on the best course of action.
Data Availability
Data availability is not applicable to this article as no new data were created or analysed in this study.
Contributions and Support
All authors contributed to the development of the methods and protocol. Searches, screening, data extraction, and quality appraisal were conducted by JB, TS, RS, AC and PM. JB, AC, SO, AS and PM drafted the manuscript. BLE, RS, SJ, EB, TJ and MC critically reviewed the manuscript. EB and TJ drafted the lived experience commentary. PM is the guarantor.
Funding
This paper presents independent research commissioned and funded by the National Institute for Health Research (NIHR) Policy Research Programme, conducted by the NIHR Policy Research Unit (PRU) in Mental Health (Grant number: PR-PRU-0916-22003). The views expressed are those of the authors and not necessarily those of the NIHR, the Department of Health and Social Care or its arm’s length bodies, or other government departments.
EB and TJ are also authors of the independently written Lived experience commentary which follows the Discussion.
Data Availability and Declarations
Data availability is not applicable to this article as no new data were created or analysed in this study.
None of the authors have any competing interests to declare.
Appendix Search Terms
Footnotes
Registered to PROSPERO (ID: CRD42020134068).