The NICE COVID-19 search strategy for Ovid MEDLINE and Embase: developing and maintaining a strategy to support rapid guidelines

Introduction: The United Kingdom's (UK) National Institute for Health and Care Excellence (NICE) needs access to evidence on COVID-19 to develop rapid guidelines for healthcare professionals. This paper reports on how the NICE COVID-19 search strategy for identifying references in Ovid MEDLINE and Embase has been developed and maintained.
Methods: Each free-text line from the June 2020 version of the NICE COVID-19 search strategy was categorised as Critical, High, Medium, Low or Zero priority, according to the number of results and their relevance to NICE. Five search options were devised and tested by combining them with a search for drug treatments. The two prioritised options were compared to the COVID-19 Limit available in Ovid. New subject headings were tested and added. The selected option was refined to make the strategy simpler to use.
Results: The updated strategy combines free-text terms, categorised as Critical, High and Medium priority for NICE, with appropriate subject headings.
Discussion: The paper describes the challenges of maintaining a search strategy during the COVID-19 pandemic, as terminology continues to evolve.
Conclusions: A search strategy for identifying COVID-19 references, within the remit of NICE, has been developed. The recommended strategy could be considered for validation at an appropriate point in the pandemic. It is hoped that understanding how NICE has maintained its COVID-19 strategy will encourage further discussion on the challenges.


Acknowledgements
The authors would like to thank Caroline De Brún and Nicola Pearce-Smith from Public Health England for the original search terms and for helpful comments on the various versions of the strategy.

Use of NICE COVID-19 content internationally
Our COVID-19 rapid guidelines and evidence summaries are exempt from our overseas reuse application, licence and fee. This means you can:
• adopt the guidelines for your own healthcare setting
• adapt the guidelines by combining them with your own local content
• translate the resultant outputs.
When using content from our COVID-19 rapid guidelines and evidence summaries you must:
• make all your outputs reusing NICE content freely available to others
• acknowledge the use of NICE content, and link to the source content on our website
• only use the NICE logo if the original NICE guidance publication is used in its entirety without including additional content
• tell us how our content has been used by emailing reuseofcontent@nice.org.uk, to support the evaluation and development of our guidance.
We cannot accept responsibility or liability for the use of our content in third party outputs.
Further information on reuse of content is available on the NICE website.

NICE COVID-19 rapid guidelines
The United Kingdom's (UK) National Institute for Health and Care Excellence (NICE) uses the best available evidence to develop recommendations on a range of health and social care topics (NICE, 2020c). In March 2020, NICE was given responsibility for developing a series of products on COVID-19, including rapid guidelines. NICE had published 24 rapid guidelines by January 2021 (NICE, 2021). The rapid guidelines focused on managing symptoms and complications, therapeutics, clinically vulnerable conditions and managing health services during the pandemic (Southall, Taske, Power, Desai, & Baillie, 2021).
To develop and maintain these guidelines, NICE needs access to evidence.
One of the ways to identify evidence is to search a database. This paper describes how a search strategy has been updated to ensure it identifies COVID-19 references from MEDLINE and Embase using the Ovid platform.

Purpose of the paper
The purpose of this paper is to provide information specialists and other expert searchers with a detailed description of how the NICE COVID-19 search strategy has been developed and maintained. Search strategies need to be adapted as the COVID-19 pandemic progresses and the information landscape develops. Search strategies developed when the condition and virus did not even have names in early 2020 need to be updated to ensure they are appropriate to later stages of the pandemic. It is unclear, in May 2021, how the pandemic will develop and whether any further adjustments will have to be made to ensure search strategies remain appropriate.
It is important to emphasise that the search strategy has been developed to support the NICE remit of managing symptoms, therapeutics, vulnerable conditions, and managing services. The strategy has not been tested for coverage of aspects of the COVID-19 pandemic that are outside of the NICE remit (e.g. vaccinations or serological testing).
The NICE COVID-19 strategy has not been validated and this is not intended to be a definitive search strategy. This strategy could be tailored by other organisations to suit their information needs. The purpose of this paper is to demonstrate the steps NICE has undertaken to ensure that the strategy is up to date. This paper also describes the technical challenges of searching for COVID-19. It is unusual to describe the development process in such detail, showing how a strategy has evolved, rather than just presenting the final strategy. The issues discussed in this paper will be faced by all systematic searches on COVID-19. It is hoped that understanding how NICE has approached these issues will encourage further discussion. Once these issues have been resolved, the strategy could be validated and used as a search filter. Suggestions for further iterations of the strategy are encouraged, as are any opportunities to collaborate on an overarching COVID-19 search filter.

Literature searching process for NICE COVID-19 rapid guidelines
Literature searches for the NICE rapid guidelines are conducted according to the methods manuals for guidelines (NICE, 2020c) and health and social care emergencies (NICE, 2020b). The searches are undertaken in a range of sources, such as: MEDLINE and Embase (Ovid), the Cochrane COVID-19 Study Register (https://covid-19.cochrane.org) and NICE Evidence Search (https://www.evidence.nhs.uk). Using a wide range of sources, including some specific to COVID-19, enables NICE to use a more specific search strategy in MEDLINE and Embase.
As well as producing the rapid guidelines, NICE has maintained them by monitoring all new COVID-19 references added to MEDLINE ALL, Embase and a range of other databases and websites since 16 March 2020 (NICE, 2020b, section 17). These searches are run on a weekly basis so that NICE can capture the new references and assess whether any updates to the published rapid guidelines are necessary. By 9 March 2021, NICE had processed nearly 503,000 references from all sources, including approximately 332,000 from MEDLINE and 100,000 from Embase. These weekly searches are referred to as the 'Surveillance process', outlined below.

Developing the baseline NICE COVID-19 strategy
NICE began work on the COVID-19 rapid guidelines on 16 March 2020 and published the first three on 20 March 2020 (NICE, 2021). It was essential to develop specific search strategies that could be used easily under urgent time constraints. Searches for the rapid guidelines would be written, tested, peer reviewed and performed in a single day, compared to the standard NICE guideline process spanning several weeks.
Version 1 of the NICE COVID-19 search strategy was developed on 16 March 2020 from a list of terms (see Appendix A) developed by Public Health England (PHE) Knowledge and Library Services (Public Health England, 2021). This incorporated a variety of terms for the virus and condition that had been used in January 2020, such as "novel coronavirus" and "nCoV". The strategy was continually developed over the subsequent weeks, as the evidence changed, new ideas emerged, and naming conventions were established. Appendix B provides a brief overview of the modifications made with each version. The strategy was peer reviewed by a NICE information specialist at each stage. Appendix C shows the terms that were removed from version 8 as they had not been used in the literature, such as "Ncorona". By June 2020, NICE had developed version 9 (see Strategy A in Appendix D) and this is the baseline strategy for testing in this paper.

Changes to the information landscape
The information landscape on COVID-19 has changed since March 2020. The volume of evidence has increased (Teixeira da Silva, Tsigaris, & Erfanmanesh, 2021). Terminology has been standardised since the World Health Organization (WHO) named the condition COVID-19 and the virus SARS-CoV-2 in February 2020 (World Health Organization, 2020). The National Library of Medicine has expanded Medical Subject Headings (MeSH) to include new specific terms, such as "COVID-19" (National Library of Medicine, 2020). The Emtree subject headings for Embase have also been updated (Elsevier, 2021). In Ovid, updates to MeSH were available from February 2021 and to Emtree from April 2021. The NICE search strategy from June 2020 needed to be updated to account for these changes.
medRxiv preprint, https://doi.org/10.1101/2021.06.11.21258749; this version posted June 14, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Other COVID-19 strategies
There are a number of search strategies available for finding evidence on COVID-19. Lazarus et al. (2020)

Aims and objectives
The purpose of this paper is to show how the NICE COVID-19 search strategy has been developed and maintained and to describe the challenges of searching at this point in the pandemic (May 2021).
The aims for NICE when developing the latest updates to the NICE COVID-19 search strategy for Ovid MEDLINE and Embase were to simplify the strategy and increase its specificity.
The objectives for NICE during the development period from December 2020 to May 2021 were to:
• Analyse the value of each free-text line used in version 9 of the NICE search strategy.
• Create various search strategy options, according to the contribution of each free-text line.
• Test the recall of the search strategy options.
• Refine the free-text terms used in the chosen option.
• Select appropriate subject headings for the final strategy.
• Consider the complexity and ease of use of the chosen option.
A series of iterative steps were undertaken to check if the NICE COVID-19 strategy could be updated. The purpose of each step was to improve the specificity of the strategy, which measures the number of references that are not relevant and are not retrieved as a proportion of the total number of references not relevant (Jenkins, 2004). In other words, any amendments to reduce the number of results retrieved by the strategy should only remove references not relevant to NICE. This was done alongside changes that would make the strategy easier to use.
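The definition of specificity used here can be written as a short calculation. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not taken from the NICE tests.

```python
def specificity(not_relevant_not_retrieved: int, not_relevant_retrieved: int) -> float:
    """Proportion of non-relevant references that the search did not retrieve."""
    total_not_relevant = not_relevant_not_retrieved + not_relevant_retrieved
    if total_not_relevant == 0:
        raise ValueError("there are no non-relevant references to measure against")
    return not_relevant_not_retrieved / total_not_relevant


# Hypothetical counts: 900 non-relevant references were not retrieved,
# while 100 non-relevant references were retrieved by the search.
print(f"specificity = {specificity(900, 100):.2f}")
```

A change that removes only non-relevant references from the results moves counts from the second argument to the first, so specificity rises without any relevant reference being lost.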

Developing the baseline strategy
The MEDLINE and Embase strategies were developed separately so that they were optimised for each database.
Appendix B provides a brief overview describing how versions 1-8 were developed from March to June 2020. The full MEDLINE strategies are presented in supplementary File A. Appendix C provides the results of a test that was run in May 2020 on the individual free-text terms contained in version 8. This resulted in several redundant terms being removed and the creation of version 9 in June 2020. The MEDLINE and Embase strategies used the same free-text terms, and the MeSH subject headings were mapped to Emtree.

Search volume from each free-text line
The testing was done in stages from December 2020 to April 2021. MEDLINE ALL and Embase (from 1974) were used throughout testing. The specific dates of the Ovid segment used are recorded in the title of the tables. The full search strategies for both databases are available in the supplementary information (File A for MEDLINE and File B for Embase).
The baseline strategy was run in December 2020 and the number of results from each of the 23 free-text lines was recorded. Individual terms were not tested. Instead, appropriate variations of a term were included, for example "SARSCoV2" and "SARS-CoV-2". Including free-text variants reflects searching practice, for example in the guide to peer reviewing strategies (McGowan et al., 2016). Individual terms were refined at a later stage.

Relevant references from each free-text line
The next stage recorded the unique contribution from each of the 23 free-text lines. Testing was limited to references published in 2020 and 2021 during the COVID-19 pandemic to avoid retrieving irrelevant references from previous pandemics. For example, no relevant references were published before December 2019 with the free text "severe acute respiratory syndrome".
The results unique to each line were isolated by comparing strategies that did and did not contain the free-text line. For instance, to test Lines 1 and 5:
Line 1: (or/1-23) NOT (or/2-23)
Line 5: (or/1-23) NOT (or/1-4,6-23)
Additional testing was carried out to establish whether the references were uniquely retrieved by that free-text line when subject headings were added to the strategy. For example:
Line 1: (or/1-23) NOT (or/2-23) NOT Subject Headings
Line 5: (or/1-23) NOT (or/1-4,6-23) NOT Subject Headings
The strategies were repeated with limits applied to remove animal studies, non-English language papers and certain publication formats (letters, historical articles, comments, editorials, and news items). The search was also limited to references published in 2020-2021. The applied limits replicated the NICE rapid guideline development process more realistically (NICE, 2020b, section 8). The search was therefore done in the following format:
Line 1: (or/1-23 AND Limits) NOT (or/2-23 AND Limits)
Line 5: (or/1-23 AND Limits) NOT (or/1-4,6-23 AND Limits)
A RIS file was downloaded from each free-text line containing unique references. The relevancy of these references was assessed by checking the decisions made in the weekly NICE Surveillance process. This process uses the baseline strategy, so NICE had already assessed each reference for relevance. NICE uses EPPI Reviewer version 5 (EPPI-R5) for reference and review management. The Surveillance results were obtained from the relevant EPPI-R5 file on 8 December 2020 to ensure that all the references for this test had been screened.
'Relevant' in this context means that the references were potentially of interest to NICE at title and abstract screening; it does not necessarily mean they would be included in a rapid guideline.
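The line-isolation searches follow a regular pattern, so the search statement for any of the 23 lines can be generated rather than typed by hand. The following is a minimal sketch, assuming the free-text lines are numbered consecutively from 1; the function names are hypothetical and only the basic format (without limits or subject headings) is produced.

```python
def _line_range(start: int, end: int) -> str:
    """Format a run of line numbers as 'n' or 'n-m'."""
    return str(start) if start == end else f"{start}-{end}"


def isolate_line(line: int, total: int = 23) -> str:
    """Build a statement that keeps only the results unique to one free-text line."""
    if total < 2 or not 1 <= line <= total:
        raise ValueError("line must be between 1 and total, and total at least 2")
    others = []
    if line > 1:
        others.append(_line_range(1, line - 1))      # lines before the tested line
    if line < total:
        others.append(_line_range(line + 1, total))  # lines after the tested line
    return f"(or/1-{total}) NOT (or/{','.join(others)})"


for tested in (1, 5):
    print(f"Line {tested}: {isolate_line(tested)}")
```

For Lines 1 and 5 this reproduces the two statements shown in the text.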

Categorising the free-text lines into five options
The free-text lines were categorised according to the number of references they retrieved and the number of these relevant to NICE. Each line of free text was categorised as being of 'Critical', 'High', 'Medium', 'Low' or 'Zero' importance to NICE. The categories are explained in Figure 1.
Once the free text had been categorised, five versions of the strategy were developed for further testing (Figure 2). All five versions contained the same subject headings, as the purpose was to test the value of the free text.
Strategy A was the baseline and represented the current process. Strategies B-E are progressively more specific. The structure of the strategies is described in Figure 2 and the full searches are available in Appendix D.

Priority: Definition
Critical: Removing this line would mean >10 relevant references would have been missed.
High: Removing this line would mean 1-10 relevant references would have been missed.
Medium: Removing this line would not affect the number of relevant references, but it has >1400 results, which suggests the terms are being used in the literature and could be useful in a sensitive search.
Low: Removing this line would not affect the number of relevant references, and it has <750 results, which suggests the terms are not often used.
Zero: These lines have zero results and so the terms have never been used in the COVID-19 literature.

Figure 3.
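The Figure 1 criteria amount to a simple decision rule. The sketch below expresses that rule in code under stated assumptions: the function and parameter names are hypothetical, and because Figure 1 does not say how a line with no relevant references and between 750 and 1400 results would be treated, the sketch labels that case 'Unclassified' rather than guessing.

```python
def categorise_line(total_results: int, unique_relevant: int) -> str:
    """Assign a free-text line to a priority category using the Figure 1 criteria."""
    if total_results == 0:
        return "Zero"          # terms never used in the literature
    if unique_relevant > 10:
        return "Critical"      # more than 10 relevant references would be missed
    if unique_relevant >= 1:
        return "High"          # 1-10 relevant references would be missed
    if total_results > 1400:
        return "Medium"        # no unique relevant references, but terms are in use
    if total_results < 750:
        return "Low"           # no unique relevant references, terms rarely used
    return "Unclassified"      # assumption: Figure 1 does not cover 750-1400 results


# MEDLINE figures reported in this paper for "CoV" and "SARS-CoV-2".
print(categorise_line(total_results=30044, unique_relevant=0))
print(categorise_line(total_results=25185, unique_relevant=2))
```

With the MEDLINE figures reported for "CoV" and "SARS-CoV-2", the rule returns 'Medium' and 'High' respectively, matching the categories described in the results.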
Search strategies for the drugs were obtained from previous NICE work. The search strategies for each drug were checked for consistency (e.g. fields used and truncation) and they were made as sensitive as possible (e.g. clinical trial ID numbers were included). The purpose was to test the free text in the five COVID-19 options and so the same drug terms were used in each test.
Test strategies were run in the Ovid segments updated on 15 February 2021, using Strategy A as the baseline. The strategies were in the format:
(COVID-19 Strategy A AND Drug Terms) NOT (COVID-19 Strategy B AND Drug Terms)
The strategies were also run with limits applied for English language, publication date, animal studies and certain publication formats.
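The logic of this test format can be illustrated with sets of reference identifiers. This is a toy sketch rather than the process NICE ran in Ovid: the function name and the identifiers are hypothetical.

```python
from typing import Set


def missed_by_option(baseline: Set[str], option: Set[str], drug_terms: Set[str]) -> Set[str]:
    """References found by (baseline AND drugs) but not by (option AND drugs)."""
    return (baseline & drug_terms) - (option & drug_terms)


# Hypothetical reference identifiers for illustration only.
strategy_a = {"ref1", "ref2", "ref3", "ref4"}   # baseline COVID-19 strategy
strategy_b = {"ref1", "ref2"}                   # a more specific option
drug_search = {"ref2", "ref3", "ref5"}          # results of the drug terms

print(sorted(missed_by_option(strategy_a, strategy_b, drug_search)))
```

Only the references in the resulting set need to be screened, because every other drug reference is retrieved by both strategies.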
To determine whether any of these results were relevant to NICE, the references that would be missed were screened. 'Relevant' meant a reference that could reasonably be expected in the search results, not one that would necessarily be included after full-text appraisal. The screening was done in EPPI-R5 by the two authors. All references were double screened, and any discrepancies were discussed and reconciled. This screening was done according to the criteria in Figure 3, which were derived from the appropriate NICE methods manual (NICE, 2020e).

Comparison with the Ovid COVID-19 Limit
The testing undertaken up to this point eliminated three of the options but it was insufficient to discriminate between all five of them. Further testing was required to analyse the benefits of Strategies C and D. They were compared to the COVID-19 Limit that is built into Ovid Embase (Wolters Kluwer, 2021a) and MEDLINE (Wolters Kluwer, 2021b). This comparison was chosen as the search strategies in Lazarus et al. (2020) were designed for PubMed and it was not known how they would perform in Ovid. It was felt to be a realistic comparison, as the Limit was designed using Ovid. The first comparison was made in this format:
Ovid COVID-19 Limit NOT Strategy C
followed by the equivalent comparison for Strategy D. The results from each search were downloaded as a RIS file and imported into EPPI-R5 for screening. The authors double screened all the results, before discussing and reconciling any discrepancies.

Refinements to create the recommended strategy
The Ovid COVID-19 Limit comparisons were used to make a final decision on which of the five options to pursue. The chosen strategy was refined further to ensure it was effective. A number of refinements were checked individually, as described in the results section. The refinements had the potential to reduce or increase the overall number of references retrieved. The references that would be added or missed by the refinements were double screened by the authors and any discrepancies were discussed and reconciled. The refinements included checking how best to incorporate subject headings. The final action reduced the number of lines to make it easier to run, without affecting the performance of the strategy.

Volume from each free-text line
The number of references retrieved by each line of free text ranged from 0-70,497 in MEDLINE (Table 1) and 0-69,314 in Embase (Table 2). Once the usual NICE limits were applied, retrieval ranged from 0-52,037 and 0-48,442 in MEDLINE and Embase respectively.
Table 1 shows that nine of the 23 lines retrieved no unique references in MEDLINE, while 14 lines had at least one reference that was not found by any other free text. In MEDLINE, these 14 lines made a unique contribution, even when the strategy incorporated MeSH headings (Table 1).
The numbers differed in Embase (Table 2), although the overall performance was similar: 14 lines found unique references and nine lines found zero. In Embase, 13 free-text lines still made a unique contribution after subject headings were added (Table 2). The Emtree headings now retrieved the reference previously identified by the free text for "HCoV-19".
The highest number of unique references was associated with the line for "COVID-19", which retrieved 12,540 MEDLINE and 1051 Embase references (with limits and subject headings applied) that would have been entirely missed without this free-text line (Tables 1 and 2).
Similar results were obtained from Embase, with four lines having unique references when free text was used and two lines when limits and Emtree were applied (Table 4). In Embase, NICE would lose two unique relevant references without free text for "coronavirus" and 37 without "COVID-19" (with limits and Emtree applied). The Embase lines for "corona adjacent virus" and "SARS-CoV-2" had unique references when using free text but not when Emtree and limits were added.

Categorising the free-text lines into five options
The results in Tables 1-4 were used to categorise the free-text lines according to the criteria in Figure 1 and to create the five options in Figure 2, which are detailed in Appendix D. The same decisions on the free-text lines were made in both databases, although there were some minor discrepancies between the MEDLINE and Embase results (outlined in Tables 1-4). Relevant free-text terms in one database are likely to contribute in the other.
As indicated in Tables 1-2, four lines did not retrieve a single reference and were unlikely to contribute relevant references to NICE. These lines were categorised as 'Zero' and removed to create Strategy B. Several lines were categorised as 'Low' priority, including the line for pneumonia adjacent to Wuhan. This line did retrieve 44 unique references in MEDLINE (Table 1) and 37 in Embase (Table 2), when limits and subject headings were applied, although none were relevant to NICE (Tables 3-4). The terms were occasionally used in the literature but there was a low chance of the results being relevant to NICE.
Four lines, categorised as 'Medium', did not retrieve any unique references relevant to NICE but were retained in Strategy C because the free-text terms were often used in the literature. For example, the term "CoV" retrieved 30,044 references from MEDLINE (Table 1) and 30,759 from Embase (Table 2) without limits or subject headings applied, although none of the references unique to this line were relevant to NICE (Tables 3-4).
Two lines were categorised as 'High' and were included in Strategy D. The line featuring "corona adjacent to virus or viral" had no unique relevant references from MEDLINE (Table 3) and one from Embase (Table 4). The free-text terms for "SARS-CoV-2" had high recall (25,185 MEDLINE results in Table 1 and 24,612 from Embase in Table 2), but only two unique references from MEDLINE (Table 3) and three from Embase (Table 4) were relevant.
Two lines, categorised as 'Critical', were added to Strategy E. The term "coronavirus" retrieved 90 unique relevant references in MEDLINE (Table 3) and 1128 in Embase (Table 4) that would have been missed without this free-text line.

Performance of the five options when searching for drugs
The five options were tested with sensitive searches for the 17 drugs listed in Figure 3 using the Ovid segments dated 15 February 2021. The references that would be missed by the more specific options were downloaded, as RIS files, and imported into EPPI-R5. Table 5 shows there were six MEDLINE results to review from Strategy C, 22 from D and 105 from E; and in Embase there were 15 from Strategy C, 78 from D and 86 from E. The references were screened independently and there was 88% agreement before reconciling the discrepancies.
Table 5 shows that Strategy B consistently retrieved the same number of results as the baseline. Strategy A was excluded from further testing as these additional free-text lines were not required.
Strategy E missed 72 relevant references from MEDLINE, including 33 when limits were applied. Strategy E missed a total of seven relevant references in Embase, of which four would be missed when limits were applied (Table 5).
As there was a high risk of missing relevant references from this increase in specificity, Strategy E was eliminated from further testing.
Strategy C and Strategy D were retained for further testing. Compared to Strategy A, Strategy C was shorter and retrieved six fewer references in MEDLINE and 15 in Embase, none of which were relevant (Table 5). Strategy D missed one relevant reference in both databases with limits applied (Seghatchian, 2021). It did not seem necessary to eliminate Strategy D at this stage, as there was potential to refine it further.

Comparison with the Ovid COVID-19 Limit
The recall of Strategies C and D was tested against the Ovid COVID-19 Limit. Strategy C had 78,422 MEDLINE results with limits applied and missed 308 of the 78,221 references in the Ovid Limit (Table 6). After screening, 11 of the 308 were found to be relevant to NICE. Strategy D had 78,183 MEDLINE results and missed 332 from the Ovid Limit, of which 31 were relevant (Table 6).
Strategy D missed 20 more relevant references than Strategy C in MEDLINE (Table 6) and seven more in Embase (Table 7). Strategy C was therefore prioritised for the next test, as it achieved appropriate specificity for NICE. The difference between Strategy C and D is the inclusion of free text (categorised as 'Medium'), covering "CoV", "2019-nCoV", "2019 novel" and "severe acute respiratory syndrome".
Tests were run to investigate the references retrieved by Strategy C and missed by the Ovid COVID-19 Limit. The Ovid Limit in MEDLINE missed 514 references found by Strategy C (Table 8). Strategy C correctly identified 280 references (for example, they referred to coronaviruses in the abstract) and the rest were correctly excluded by Ovid (for example, "CoV" was an abbreviation for "Coefficients of Variation"). The 280 were screened again and 256 were excluded on relevance (for instance, references to Feline Coronavirus or Coronavirus NL63) and 24 were identified as relevant to COVID-19 for NICE. This process was repeated in Embase, which found that the Ovid Limit missed 758 references retrieved by Strategy C, 12 of which were relevant to NICE (Table 9).

Refinements to create the recommended strategy
The screening for Tables 6-9 highlighted several potential refinements that could be made to Strategy C. The full test strategies are available in the supplementary information (File A for MEDLINE and File B for Embase).
One of the issues identified in Tables 6-7 was that the Ovid COVID-19 Limit retrieved references that mentioned "pandemic" without specifying COVID-19. For this reason, adding the free-text term "pandemic" was considered too broad. The narrower free-text term "corona pandemic" was tested instead but was also rejected (Tables 10-11). Testing also showed that there was little value in using a wildcard to retrieve the misspelling "coronvirus".
The screening for Tables 8-9 highlighted the imprecision of the term "CoV", as it was used to abbreviate phrases such as "cut-off volume" and "central vessel trunk" as well as COVID-19. These terms could be excluded from the strategy using a Boolean NOT operator, as none of the 159 references in MEDLINE (Table 10) nor the 162 references in Embase (Table 11) discussed COVID-19. The decision to exclude these terms was adopted in the final strategy. The exclusions were only applied to this specific free-text line, rather than the whole strategy, to ensure references discussing COVID-19 were not inadvertently excluded.
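The reason for applying the exclusions to a single free-text line can be shown with sets of reference identifiers. The sketch below is a toy illustration: the function names and identifiers are hypothetical and do not come from the NICE test data.

```python
from typing import Set


def line_level_not(cov_line: Set[str], other_lines: Set[str], exclusions: Set[str]) -> Set[str]:
    """Apply the NOT to the "CoV" line only, then combine with the other lines."""
    return (cov_line - exclusions) | other_lines


def strategy_level_not(cov_line: Set[str], other_lines: Set[str], exclusions: Set[str]) -> Set[str]:
    """Combine all lines first, then apply the NOT to the whole strategy."""
    return (cov_line | other_lines) - exclusions


cov_line = {"a", "b", "c"}   # references retrieved by the "CoV" line
other_lines = {"c", "d"}     # references retrieved by lines such as "COVID-19"
exclusions = {"b", "c"}      # references using "CoV" for an unrelated phrase

print(sorted(line_level_not(cov_line, other_lines, exclusions)))
print(sorted(strategy_level_not(cov_line, other_lines, exclusions)))
```

Reference "c" uses an excluded phrase but is also retrieved by another line, so it survives the line-level exclusion and is lost only when the NOT is applied to the whole strategy.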
In Strategy C, terms such as "nCoV2019*" and "nCoV-2019*" had been included. Testing showed that these terms were captured by "Ncov*" and could be removed, with no negative impact. Additional techniques were also explored but rejected: removing the truncation from "Ncov" would lose one relevant reference from both MEDLINE and Embase, while truncating the phrase "n-cov" added six MEDLINE and eight Embase references of no relevance to NICE (Tables 10-11).
Changes to the free text for "COVID-19" were tested. As Strategy C included "COVID-2019*", the term "COVID2019*" was checked for consistency. This addition was rejected as it retrieved no references in either database.
The term "covid" had been used without truncation in Strategy C. However, this contributed to relevant references being missed that were retrieved by the Ovid Limit (Tables 6-7). Applying unlimited truncation increased the MEDLINE results by 135, of which seven were relevant to NICE (Table 10). In Embase, only two of the 139 additional results were relevant (Table 11). To retrieve the relevant references, while limiting the number of irrelevant records, a more specific version was tested. By using the Ovid command to limit truncation to two characters, the number of references retrieved by Strategy C increased by four in both MEDLINE and Embase, of which two were relevant (Tables 10-11). Therefore, the term "COVID*2" was added to the strategy, which allowed the free-text line to be simplified, as this captured the individual terms: "COVID-19*", "COVID19*" and "COVID-2019*". Testing confirmed that removing these terms did not affect the results in either database.
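The difference between unlimited truncation and truncation limited to two characters can be approximated with a regular expression. This is a rough sketch, not a description of how Ovid is implemented: it assumes that punctuation such as hyphens separates words, the function name is hypothetical, and "covidien" is an invented example of a longer word sharing the same stem.

```python
import re
from typing import Optional

# Assumption: words are runs of letters and digits; hyphens act as word breaks.
_WORD = re.compile(r"[a-z0-9]+")


def matches_truncation(text: str, stem: str, max_extra: Optional[int] = None) -> bool:
    """True if any word in text is the stem followed by the allowed extra characters.

    max_extra=None mimics unlimited truncation; an integer mimics limited truncation.
    """
    suffix = "[a-z0-9]*" if max_extra is None else f"[a-z0-9]{{0,{max_extra}}}"
    pattern = re.compile(re.escape(stem.lower()) + suffix)
    return any(pattern.fullmatch(word) for word in _WORD.findall(text.lower()))


for term in ("COVID-19", "COVID19", "COVID-2019", "covidien"):
    limited = matches_truncation(term, "covid", max_extra=2)
    unlimited = matches_truncation(term, "covid")
    print(f"{term}: limited={limited}, unlimited={unlimited}")
```

Under these assumptions the limited form still matches "COVID-19", "COVID19" and "COVID-2019", while a longer word with the same stem is matched only by unlimited truncation.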
Up to this point, the exploded MeSH headings "Coronavirus" and "Coronavirus infections" had been used. It was felt that the specificity of the strategy could be improved once Ovid had incorporated MeSH 2021. Several variations were tested. The first version included MeSH headings from higher in the hierarchy ("Coronavirus", "Betacoronavirus" and "Coronavirus Infections") to exclude irrelevant headings (such as "Alphacoronavirus 1") that were previously being retrieved. This reduced Strategy C by two results (Table 10). Next, using only headings from lower in the MeSH hierarchy ("SARS-CoV-2" and "COVID-19") was tested. This reduced Strategy C by 23 results; although four of these mentioned COVID-19, none were relevant to NICE. Additional MeSH 2021 headings were tested ("COVID-19 Testing" and "COVID-19 Vaccines") but none of the five results were relevant to NICE.
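The trade-off between exploding a heading high in the hierarchy and selecting headings lower down can be sketched as follows. The tree is a deliberately simplified toy that reuses a few heading names from the text; it is not the real MeSH tree, and the record indexing is hypothetical.

```python
# Toy hierarchy: parent heading -> child headings. Simplified for illustration;
# this is NOT the real MeSH tree structure.
TREE = {
    "Coronavirus": ["Alphacoronavirus", "Betacoronavirus"],
    "Alphacoronavirus": ["Alphacoronavirus 1"],
    "Betacoronavirus": ["SARS-CoV-2"],
}

# Hypothetical indexing: record ID -> subject headings assigned to it.
INDEXING = {
    "rec1": {"SARS-CoV-2"},
    "rec2": {"Alphacoronavirus 1"},
    "rec3": {"Betacoronavirus"},
}


def explode(heading: str) -> set[str]:
    """Return the heading together with all of its descendants."""
    headings = {heading}
    for child in TREE.get(heading, []):
        headings |= explode(child)
    return headings


def retrieve(headings: set[str]) -> set[str]:
    """Return records indexed with at least one of the given headings."""
    return {rec for rec, indexed in INDEXING.items() if indexed & headings}


exploded_top = retrieve(explode("Coronavirus"))      # broadest
exploded_mid = retrieve(explode("Betacoronavirus"))  # drops the alphacoronavirus record
lower_only = retrieve({"SARS-CoV-2"})                # most specific

print(sorted(exploded_top), sorted(exploded_mid), sorted(lower_only))
```

Each step down the toy hierarchy retrieves fewer records, which is why every variation in the text had to be checked for relevant references that would be lost.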
The baseline Embase strategy used exploded headings "Coronavirinae" and "Coronavirus infection". It also included the proposed Emtree headings "Coronavirus disease 2019" and "Severe acute respiratory syndrome coronavirus 2" using the subject heading (.sh) and proposed candidate (.dj) fields. The testing was done in April 2021, when the latest version of Emtree had been loaded into Ovid Embase. The test showed that "Coronavirus disease 2019" and "Severe acute respiratory syndrome coronavirus 2" were now full Emtree headings and could be used in that way.
This change reduced the results from Strategy C by 90. The next test found that using only headings from lower in the hierarchy ("Severe acute respiratory syndrome coronavirus 2" or "Coronavirus disease 2019") reduced results by 204, of which 22 were relevant to COVID-19 and one to NICE. The one relevant reference that NICE would miss (Cournoyer et al., 2021) discussed respiratory viruses in general (e.g. influenza) and had been indexed with the Emtree heading for SARS-CoV-1. The full-text paper was considered to have minimal value to NICE. Therefore, the decision to only use Emtree headings from lower in the hierarchy was retained.
The Emtree heading "Experimental coronavirus disease 2019" had been used to index 11 references at the time of testing. Although none of the results were unique to the heading, it could become important over time. Therefore, the heading was included in Strategy C.

Simplifying the refined Strategy C
The revisions in Tables 10-11 were tested individually to judge their incremental impact. The refinements to be retained in the recommended version were consolidated into a single strategy. The terms were then condensed into as few lines as possible to make the strategy easier to run.
The final strategy is presented in Figure 4 for MEDLINE and Figure 5 for Embase. The Embase strategy has also been simplified (the free text matches MEDLINE): three Emtree headings (one exploded) have replaced the two exploded headings and two phrases used in the subject heading (.sh) and proposed candidate (.dj) fields.
Compared to the baseline Strategy A, the recommended strategy reduced the MEDLINE results from 94,170 to 93,844 and Embase from 82,957 to 82,513 with limits applied (Table 12). This increase in specificity from Strategy A has been achieved with the loss of only one reference of minimal relevance to NICE that does not directly refer to COVID-19 (Cournoyer et al., 2021).
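The size of the reduction can be worked through directly from the counts quoted above. The short sketch below uses only the figures given in the text for Table 12; the function name is illustrative.

```python
# Result counts quoted in the text (Table 12), with limits applied:
# database -> (baseline Strategy A, recommended strategy).
COUNTS = {
    "MEDLINE": (94_170, 93_844),
    "Embase": (82_957, 82_513),
}


def reduction(before: int, after: int) -> tuple[int, float]:
    """Return the absolute reduction and the reduction as a % of the baseline."""
    removed = before - after
    return removed, 100 * removed / before


for database, (before, after) in COUNTS.items():
    removed, percent = reduction(before, after)
    print(f"{database}: {removed} fewer results ({percent:.2f}% of the baseline)")
```

The recommended strategy retrieves 326 fewer MEDLINE results and 444 fewer Embase results, which is less than 1% of the baseline in each database.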

Discussion
The recommended NICE COVID-19 search strategy is presented in Figures 4-5. This section discusses the decisions that were made to optimise the specificity of the search strategy, without affecting recall of references relevant to NICE.

Importance of free-text terms
The steps undertaken in this paper have updated the NICE COVID-19 search strategy to increase its specificity and to make it easier to run. Whilst the WHO have helped to stabilise the use of terminology (as reflected in the subject headings included in the strategy), the free text demonstrates that a variety of terms continue to be used to refer to COVID-19.
The strategy must include sensitive free-text terms to capture relevant references that have not been indexed.  (Table 3) and 48 from Embase (Table 4) in December 2020 without the free-text line for COVID-19.

Evolution of language
The continual development of the COVID-19 strategy demonstrates the evolution of language within the health sector. When developing search strategies, it is important to capture scientific terms as well as broader terms.
However, some potentially relevant free-text terms and subject headings can generate noise and reduce the specificity of the strategy. The term "pandemic", for instance, has been used throughout the past year. COVID-19 is the first pandemic affecting some countries in a century, while other countries have more recent experience, for instance with H1N1 in 2009 (Fineberg, 2014). Pandemic preparedness was also discussed before the COVID-19 outbreak (Moon et al., 2015). Therefore, it is difficult to distinguish references discussing past pandemics (Shearer, Moss, McVernon, Ross, & McCaw, 2020) from articles referring to the COVID-19 pandemic (da Silva et al., 2020; Smiianov et al., 2020) when no other identifiers have been included in the title or abstract. On this basis, the term "pandemic" was not included in the strategy as the aim of the update was to increase specificity. The references screened for this paper that referred to "the pandemic" seemed to refer to events that had happened since January 2020, rather than being about the clinical condition COVID-19 or the virus SARS-CoV-2. The issue requires further investigation.
Similarly, some references relied on country-specific phrases. For instance, "protect the NHS" (Hunter, 2020) and "test and trace" (Harding-Edgar, McCartney, & Pollock, 2020) are specific UK terms relating to the COVID-19 pandemic. These references were not retrieved by the final strategy as the titles and abstracts did not contain terminology unique to COVID-19.
References that use phrases specific to a country, without including COVID-19 terms, rely on people inferring the context of the article. These references were not relevant to the NICE remit, although they do highlight an interesting challenge with terminology. A solution could be found in creating an additional search strategy, using localised terms in conjunction with a geographical search filter to capture relevant additional literature, but this would require further exploration.

Spelling and formatting issues
When screening the results found by the Ovid COVID-19 Limit for Tables 6-7, the authors found potentially relevant references that were missed due to spelling errors, for instance, "coronvirus" (Chen et al., 2020). Formatting issues were also identified, such as "COVIDSafe thoracic surgery" (Seco, Wood, & Wilson, 2020), which was not retrieved because "COVID" and "Safe" were not separated by a space. Other papers were missed due to the creation of new (and unique) terminology, for instance "covidology" (Goldfarb, 2021), "covidization" (Pai, 2020) and "#COVIDZero" (Vogel, 2020). Given that the terms were unique to these specific references, the strategy was not amended to retrieve them. Unlimited truncation was not appropriate for maintaining specificity. There would be scope for organisations needing more sensitive searches than NICE to explore these terms further.
Note that three of these references have subsequently been indexed with headings that are retrieved by the NICE strategy (Chen et al., 2020; Seco et al., 2020; Vogel, 2020).

Truncation
The strategy includes the free-text term "COVID*2". The term is truncated to retrieve records that mention either the term COVID or variations of the term with up to two additional characters (e.g. letters, punctuation marks or numbers). Therefore, terms such as "COVID", "COVID19" and "COVID-19" are captured. Terms such as "Covidence" (a systematic-review tool) or "Covidien" (a company) are not retrieved by the strategy to maintain specificity. Testing showed that this approach retrieved the appropriate references, while reducing the number of free-text terms in the strategy.
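The behaviour of limited truncation can be approximated with a regular expression. The sketch below is an emulation under stated assumptions, not a description of how Ovid processes text: it assumes that punctuation such as a hyphen ends a word, so that "COVID-19" matches on the stem alone, and it counts only letters, digits and underscores as additional characters. The helper name is illustrative.

```python
import re


def limited_truncation(stem: str, max_extra: int) -> re.Pattern:
    """Compile a regex for the stem plus at most max_extra further word characters."""
    return re.compile(rf"\b{re.escape(stem)}\w{{0,{max_extra}}}\b", re.IGNORECASE)


covid_2 = limited_truncation("covid", 2)                 # emulates COVID*2
covid_unlimited = re.compile(r"\bcovid\w*", re.IGNORECASE)  # emulates COVID*

examples = [
    "COVID", "COVID19", "COVID-19", "COVID-2019",
    "Covidence", "Covidien", "covidology", "COVIDSafe",
]

for term in examples:
    limited = bool(covid_2.search(term))
    unlimited = bool(covid_unlimited.search(term))
    print(f"{term:12} limited={limited} unlimited={unlimited}")
```

Under these assumptions the limited pattern matches the four COVID-19 variants and rejects "Covidence", "Covidien", "covidology" and "COVIDSafe", whereas the unlimited pattern matches every example.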

Related conditions
During development, it was decided that terms for related named conditions (e.g. multisystem inflammatory syndrome, cytokine storm and Kawasaki disease) would not be included in the strategy. This was because the conditions are not triggered, exclusively, by COVID-19. The strategy retrieves references to secondary diseases linked to COVID-19 (Rubens, Akindele, Tschudy, & Sick-Samuels, 2021). It does not retrieve references, outside of the NICE remit, that do not state a link to COVID-19 (Shenker, Trogen, Schroeder, Ratner, & Kahn, 2020).

Other coronaviruses
The purpose of the strategy is to retrieve references on COVID-19 rather than other coronaviruses.
One of the changes between the baseline Strategy A and the recommended strategies in Figures 4-5 is that subject headings from lower in the hierarchy are used, for example "SARS-CoV-2" and "COVID-19" instead of exploding "Coronavirinae" or "Coronavirus infection". This increases specificity by reducing the number of results about other coronaviruses such as MERS, feline coronavirus (Takano, Satoh, & Doki, 2021) or porcine delta coronavirus (Gao et al., 2020). Consequently, the strategy does not retrieve references on MERS, SARS-CoV-1 or animal coronaviruses outside of the NICE remit, but it does allow for references that discuss them with COVID-19 to be retrieved.

Date limit
The NICE COVID-19 search strategy is limited by publication year, to retrieve references from 1 January 2020 to present day. This date range was used as references published prior to this date would not focus on COVID-19. The date limit reduces the number of irrelevant references retrieved on other types of coronavirus.

References reporting the initial outbreak in January 2020
The original search strategy, based on the list in Appendix A, included terms relating to the outbreak of COVID-19 in Wuhan, as the very first papers had broad titles and were difficult to locate with any accuracy, such as "Outbreak of pneumonia of unknown etiology in Wuhan, China" (Lu et al., 2020) or "Mysterious pneumonia in China" (Bagcchi, 2020). The recommended strategy no longer uses free-text terms relating to respiratory symptoms in Wuhan or seafood markets in China. As Tables 1-4 show, they were not necessary to retrieve references relevant to the NICE remit.
Organisations that need to retrieve references published before the WHO's naming convention in February 2020 (World Health Organization, 2020) may want to test whether any of this free text should be retained. Some of these references may now have been indexed with the current subject headings.

Validation
The strategy in Figures 4-5 has not been validated against an external gold standard, as would be expected of a search filter (Jenkins, 2004). Validation against a gold standard should be undertaken to test how the strategy performs against an independent set of references, once the pandemic has reached a stage where this would be appropriate.
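As an indication of what such a validation would measure, the sketch below computes recall (sensitivity) and precision for a set of retrieved references against a gold standard set. The reference identifiers are hypothetical and no validation results are implied.

```python
def evaluate(retrieved: set[str], gold: set[str]) -> dict[str, float]:
    """Compare retrieved references with a gold standard set of relevant references."""
    if not retrieved or not gold:
        raise ValueError("Both sets must contain at least one reference.")
    found = retrieved & gold
    return {
        "recall": len(found) / len(gold),          # share of the gold standard retrieved
        "precision": len(found) / len(retrieved),  # share of retrieved references that are relevant
    }


# Hypothetical identifiers, for illustration only.
gold_standard = {"ref1", "ref2", "ref3", "ref4"}
retrieved = {"ref1", "ref2", "ref3", "other1", "other2"}

metrics = evaluate(retrieved, gold_standard)
print(metrics)
```

In this toy example three of the four gold standard references are retrieved (recall 0.75) and three of the five retrieved references are relevant (precision 0.6).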

Screening process
The screening that has been undertaken for this update was done by the same information specialists that developed the strategy. For example, when screening the references to create Tables 5-11 the authors were aware whether marking a reference as relevant would favour the Ovid COVID-19 Limit or NICE strategy. To balance this, all references were double screened.
It was not in the authors' interests to create a filter that would either miss relevant references or unnecessarily reduce specificity for NICE.
The data for Tables 1-4 was collected in December 2020 and for Table 12 in April 2021. This means that the annual Ovid reload of data took place during the testing. Some of the testing might have different results if it were repeated after the reload.

NICE remit
The tests were run using the same limits that NICE applies when developing and monitoring its rapid guidelines. Using the language, format and animal study limits may have affected retrieval of references that other organisations would find relevant. The date limit has been used to restrict retrieval of references pre-dating this pandemic that are about other coronavirus infections, rather than COVID-19.
The MeSH 2021 terms for "COVID-19 testing" and "COVID-19 vaccines" were not included in this strategy as neither topic is within the remit of NICE. Terms on testing and vaccines would be considered for specific searches on these topics if NICE were to cover them.

Long COVID
NICE has published a rapid guideline on managing the long-term effects of COVID-19 (often described as "long COVID") and "post-COVID-19 syndrome" (NICE, 2020a). Figures 4-5 are not optimised for long COVID or post-COVID-19 syndrome, which require a multi-stranded approach and additional free text. The searches have been described in detail elsewhere (NICE, 2020d).

Variants of Concern and Variants of Interest
There has been international concern about the risks from variants of the SARS-CoV-2 virus (Mahase, 2021), such as variant VOC202101/02, the N501Y mutation and variant 501Y.V2. The strategy has not been tested for retrieval of references about these variants. This is felt to be a low risk as references would be expected to refer to variations or mutations of SARS-CoV-2. It is plausible that a reference without an abstract could have a title referring to, say, "Treating VOC202101/02". However, there are no specific MeSH headings for SARS-CoV-2 variants. At present, any MEDLINE references should be indexed with the subject headings used in the strategy.

Search fields
The search fields for the free-text lines were consistent throughout development of the strategies in Figures 4-5. The fields used were the title (.ti), abstract (.ab), keyword heading (.kw) and, for MEDLINE, keyword heading word (.kf). These fields were chosen in March 2020 when there was a lack of published information and terminology had not been established. It is acknowledged that additional testing could be done to confirm that the fields selected are the most appropriate. Further exploration would also be required to assess if additional fields would have a positive and meaningful impact on the strategy.
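To show how the same free-text terms can be paired with a different field list in each database, the sketch below builds a free-text line as a string. It assumes the common Ovid convention of a comma-separated field list delimited by full stops after the bracketed terms. The helper name and the two example terms are illustrative, and the output is not a line from Figures 4-5.

```python
# Field lists described in the text; .kf is used for MEDLINE only.
MEDLINE_FIELDS = ("ti", "ab", "kw", "kf")
EMBASE_FIELDS = ("ti", "ab", "kw")


def free_text_line(terms: list[str], fields: tuple[str, ...]) -> str:
    """Join terms with OR and append the field suffix, e.g. '(a or b).ti,ab.'"""
    if not terms or not fields:
        raise ValueError("At least one term and one field are required.")
    joined = " or ".join(terms)
    return f"({joined}).{','.join(fields)}."


terms = ["covid*2", "ncov*"]  # illustrative terms only
print(free_text_line(terms, MEDLINE_FIELDS))
print(free_text_line(terms, EMBASE_FIELDS))
```

Keeping the terms in one list and the fields in another makes it straightforward to keep the free text identical across the two databases while varying only the fields.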

Search sources
The contribution of MEDLINE and Embase to NICE rapid guidelines has not been assessed and it is assumed that they will still be searched. NICE has continued to search individual sources rather than relying on curated collections. The first evaluation of the Cochrane COVID-19 Study Register (Metzendorf & Featherstone, 2021) has demonstrated promising results for curated collections. It would be worth exploring whether the comprehensiveness, accuracy, and currency of these collections meet the needs of NICE. A key issue for NICE is ensuring timely access to pre-prints. It is hoped that the discussion in this paper can inform the search strategies being used to populate curated sources.

Conclusions
The strategy is designed specifically for COVID-19 references relevant to NICE. The strategy could be validated against a gold standard at an appropriate point in the pandemic.
The strategy is not intended to be prescriptive, and its appropriateness would need to be reviewed before being used by other organisations. This detailed description of how the strategy has been developed and maintained is intended to highlight common issues that expert searchers are likely to encounter. Additional strategies could be created to retrieve a broader set of results, for example to capture localised terminology (e.g. "track and trace"), general pandemic preparedness, serological testing or vaccinations.
The free-text terms and subject headings used in the NICE baseline strategy have been updated to make it easier to run and improve specificity. The recommended NICE COVID-19 search strategy to retrieve references relevant to NICE from Ovid MEDLINE and Embase is now available for use.
• The strategy is designed for the NICE remit on COVID-19.
• This is not configured for Long Covid, which should be searched separately.
• It does not aim to cover comprehensively other coronaviruses (such as MERS or SARS-CoV-1).
(outbreak* OR "respiratory illness" OR "respiratory disease" OR respiratory symptom* OR seafood market OR food market OR wildlife) and (Wuhan OR China OR Chinese)
SARSCov19 SARS-CoV-2 WN-CoV Wuhan "Wuhan coronavirus" 2019 novel "2019-nCoV"

17 March 2020 Quality Assurance of v1:
• Added the free-text term "epidemic".
• Added the free-text term "Huanan" on the lines with the term "Wuhan".
• Replaced AND with adj10 to increase specificity on the free-text line for "outbreak".
• Added a free-text line for the phrase "severe acute respiratory syndrome*".
• Added the kw field as this is a new area where the indexing is not up to date.
v3 18 March 2020 Increased the precision as it was retrieving too many papers on other epidemics in China (e.g. malaria):
• Split the lines on ("respiratory symptom*" or "seafood market") from ("outbreak* or wildlife* or pandemic") so that the adjacency could be made a lot narrower.
v4 19 March 2020 This version was added to Appendix L of the Interim process (NICE, 2020b):
• Identified and added the Emtree heading "Coronavirinae".
• Added "Coronavirinae" to the free text.
• Added "respiratory condition" to the free text and remodelled this line.
• Made changes to the way that "Wuhan" and "Huanan" were included as free-text terms.
v5 21 March 2020 Identified an additional MeSH heading:
• Added Coronavirus Infections (and rechecked searches completed 16-20 March).
v6 25 March 2020 Made the strategy more specific following feedback it was still over retrieving references about epidemics in China:
• Changed the position of "Wuhan" in the strategy but also added "pneumonia" to the line on respiratory conditions.
• Added the abbreviation "HCoV" for human coronaviruses to the free text; small impact as the relevant subject headings were already incorporated by exploding the ones higher in the hierarchy.
• Identified that the .kf field would be a useful addition following extensive testing in MEDLINE (it is not available in Embase).
• Changed all MEDLINE free-text lines to .ti,ab,kw,kf
v7 8 April 2020 No substantive changes but some free-text lines were separated to make the strategy easier to read.
v8 16 April 2020 Added terms to the free text for consistency:
• nCoV19 or "nCoV-19"
• "HCoV-19" or HCoV19.
v9 3 June 2020 In-depth review of the free-text lines and the structure (see Appendix C):
• Reviewed the free-text terms used for "COVID-19" and "SARS-CoV-2".
• Removed some of the free-text terms that had zero hits.
• Checked all lines for consistency, adding some truncation.
• Tested and decided not to add some additional terms that had been identified, such as "Betacoronavirus".
• Changed the structure slightly to remove references on pneumonia in China not relevant to this pandemic.
v10 16 April 2021 Extensive testing of each free-text line. Added new MeSH and Emtree headings now available in Ovid. Detailed comparison to the Ovid COVID-19 Limit. Complete restructuring to update the strategy.
The strategy:
• has been developed for the NICE remit on COVID-19.
• does not include the MeSH headings "exp COVID-19 Testing" or "COVID-19 Vaccines".
• includes subject headings for "COVID-19" and "SARS-CoV-2" rather than exploding the terms from higher in the hierarchy, as in previous versions.
• does not aim to cover comprehensively other coronaviruses (such as MERS or SARS-CoV-1).
• includes a publication date limit of 2019-current to restrict the number of references on other coronaviruses.
• has not been tested for retrieval of references about variations of SARS-CoV-2 (no missing references identified).
• uses COVID*2 to retrieve "COVID-19" and "COVID19" but avoid retrieving terms such as "Covidence"; a small number of references about COVID-19 that use unusual terminology (e.g. "covidology") are not retrieved but none of them are relevant to NICE.
• removes some of the free-text lines relating to e.g. "Wuhan" and "food markets"; this does not affect retrieval of references relevant to NICE but these terms could be useful for references written in January 2020 identifying the pandemic that have not been indexed.
• does not include terms for related conditions that are not exclusively caused by COVID-19 e.g. Macrophage Activation Syndrome, Cytokine Release Syndrome and Multisystem inflammatory syndrome.
• incorporates a Boolean NOT to exclude some phrases (e.g. coefficients of variation) that are abbreviated to CoV but are not relevant to COVID-19.
• does not aim to cover pandemic preparedness, as this would incorporate other conditions such as influenza.
• is not optimised for Long COVID (see NICE, 2020a).