Quantification of FDG whole body PET studies and need for standardisation of FDG-PET studies

Positron emission tomography (PET) using 2-[18F]fluoro-2-deoxy-d-glucose (FDG) has become an important tool for diagnosis, staging, prognosis and treatment-response monitoring in oncology [1]. Vansteenkiste and Stroobants [2] reviewed the role of FDG-PET in respiratory oncology. In their paper, the authors address various aspects of FDG-PET in oncology and discuss, amongst others, that improved mediastinal lymph node staging of patients is obtained using PET compared with computed tomography (CT) alone. De Geus-Oei et al. [3] performed a systematic review on the predictive and prognostic value of FDG-PET in non-small cell lung cancer and concluded that the degree of FDG uptake is of prognostic value at initial presentation, after induction treatment before resection, and in case of recurrence. Weber et al. [4] studied the use of PET for monitoring cancer therapy and for predicting outcome. That paper addresses many issues involved in quantitative assessment of therapy-induced changes in FDG tumour uptake. To date, there are numerous papers on the promising role of FDG-PET in oncology [5–9].

While the important potential of FDG-PET in oncology has been demonstrated in many investigations and papers, many of these studies discuss the need for standardisation, especially for multi-centre trials and for meta-analyses. Standardisation becomes especially important when quantification of FDG-PET studies using standardised uptake values (SUV) is performed [10–12]. Although visual inspection of FDG-PET images remains very important for diagnosis and response assessment, it has been shown that semiquantitative analysis (SUV) allows an objective assessment for lesion characterisation [13], prognostic stratification [3] and monitoring of treatment response [14]. The latter is generally measured by the relative change of SUV during treatment. At the same time, comparison of SUV results obtained from different centres is hampered by the wide variability in the acquisition, image reconstruction and data analysis procedures applied. Boellaard et al. [10] showed that many technical factors, such as image reconstruction settings and region of interest strategies, can have a major impact on the outcome of SUV. Jaskowiak et al. [15] showed the impact of OSEM reconstruction settings on SUV. Stahl et al. [16] and Krak et al. [17] evaluated the effects of various clinical and technical factors on test–retest variability and absolute outcome of SUV and the impact of using various SUV measures on observed treatment response. Recently, Westerterp et al. [12] evaluated quantification of FDG-PET studies in multicentre trials with a focus on methodological variability among centres. These and other studies showed the need for harmonisation of FDG-PET across centres.

The need for standardisation was already addressed in 1999 in a position paper by Young et al. [18]. In that paper, various methods for quantification of FDG-PET studies, such as visual inspection, use of SUV or full kinetic analysis, were discussed. In addition, factors affecting FDG uptake were described. Taking these factors into account, a set of recommendations was presented. More recently, in 2006, Shankar et al. [19] published the consensus recommendations for FDG-PET studies as an indicator of therapeutic response in patients in National Cancer Institute (NCI) trials. The guidelines given in the latter paper focus on patient preparation, image acquisition, image reconstruction, quantitative and semiquantitative image analysis, quality assurance, reproducibility and other factors in FDG-PET studies before and after a therapeutic intervention. Larson and Schwartz [20] stated that these NCI guidelines provide an excellent starting point for standardisation of the use of FDG-PET in clinical trials. Other authors, such as Zijlstra et al. [21] and Juweid et al. [22], stressed the need for harmonisation of interpretation of FDG-PET studies in lymphoma using visual inspection and/or relative changes in SUV. More guidelines are being developed as technology moves forward [23]. Recently, Delbeke et al. [24] provided procedure guidelines for tumour imaging with FDG PET-CT.

Although many issues are addressed in the aforementioned papers, there is still a need for standardisation of FDG-PET studies in multi-centre studies that focus on quantification using SUV. Therefore, a PET imaging working group was formed by the Dutch Society of Haemato-Oncology (HOVON, Hemato-Oncologie Vereniging voor Volwassenen in Nederland) and by Nuclear Medicine Physicians, representatives of the Dutch Society of Nuclear Medicine, and Medical Physicists, representatives of the Dutch Society of Nuclear Medicine and Dutch Society of Clinical Physics, with the task of formulating a protocol for standardisation and quantification of FDG whole body PET studies in multicentre trials in The Netherlands. The protocol specifically aims at standardisation of FDG-PET studies in order to allow inter-institute interchangeability of SUV. The first version of the protocol was finalized in November 2007 and was approved by the Dutch Society of Nuclear Medicine. The purpose of the present paper is to provide a description of the protocol and the rationale for the recommendations given. First, factors affecting SUV outcome will be summarized briefly, as these have been addressed extensively elsewhere. Next, guidelines and recommendations will be described in the main part of the paper. Finally, limitations, pending issues and future work will be addressed.

A brief summary of factors affecting SUV

Various factors and sources of error affect SUV outcome. These factors can be assigned to physiological effects, sources of error and physical effects.

Physiological factors affecting SUV are, for example, the plasma glucose level during FDG-PET scanning, FDG plasma clearance during the phase of FDG uptake, the scanning period and the effects of patient motion and breathing. As FDG uptake is time dependent, the time interval applied between FDG administration and execution of the PET study will affect SUV as well. For response monitoring, the time interval between the baseline FDG-PET study and start of treatment as well as the time intervals between subsequent FDG-PET studies and cycles of treatment are of importance. This is especially important when inflammatory processes may occur as may be the case during or after radiation therapy. These factors and their impact on SUV outcome are summarized in Table 1.

Table 1 Biological factors affecting SUV outcome

Table 2 presents some of the most common errors affecting SUV, as also described by Weber et al. [4] and Shankar et al. [19]. Correct calculation of SUV depends on an accurate (cross-)calibration between the PET or PET-CT scanner and the dose calibrator used for measuring the patient dose. Moreover, the net dose given to the patient needs to be known exactly. Thus, any remaining activity in the syringe after administration should be minimized to within 1% using appropriate procedures. Alternatively, remaining activity in the syringe may be measured after the administration. In the majority (95%) of the cases, remaining activity in the syringe was smaller than 3% of the specified dose, although occasionally (5% of the cases) remaining activity was more than 10% (unpublished data), which mainly occurred when specific activities were still high (i.e., shortly after production). The effects of dead volume in the syringe preparation system and administration system should thus be carefully taken into account. A subtraction method to derive the amount of activity in a syringe may be used. This method is generally based on withdrawing a small amount of FDG from a stock, positioned in a dose calibrator, and measuring the activity of the stock before and after withdrawal of the patient dose. If the stock contains a large amount of activity relative to that needed for the patient dose, care has to be taken that dead volume or statistical errors do not affect accuracy and reproducibility of net dose assessment. With proper procedures, however, this can be handled easily. A frequently overlooked problem may arise from incorrect synchronisation of the clocks in a PET system with those of the dose calibrator used (or of the computer attached to it for read-out) or with clocks in various rooms/locations.
Moreover, as the patient dose is specified at a certain dose calibration time, which is generally not exactly equal to the actual injection time, correct decay corrections have to be applied. This means that a correction for decay between the dose calibration time and the PET acquisition time needs to be applied to the dose specified at the dose calibration time. Paravenous injection may still allow visual interpretation of the PET study, but will make quantification based on SUV incorrect. A reproducible and correct administration of FDG is therefore required.
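The decay correction and net dose bookkeeping described above can be sketched as follows. This is a minimal illustration and not part of the protocol: the function names are hypothetical, the residual syringe activity is assumed to have been measured at a known clock time, and the SUV shown is the common body-weight normalisation, assuming a tissue density of 1 g/ml.

```python
import math
from datetime import datetime, timedelta

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 in minutes


def decay_factor(t_from, t_to, half_life_min=F18_HALF_LIFE_MIN):
    """Fraction of activity remaining at t_to relative to t_from."""
    elapsed_min = (t_to - t_from).total_seconds() / 60.0
    return math.exp(-math.log(2) * elapsed_min / half_life_min)


def net_dose_at_scan(dose_mbq, t_calibration, residual_mbq, t_residual, t_scan):
    """Net administered activity (MBq), decay corrected to the scan start.

    Assumption: the dose and the residual syringe activity are each measured
    on clocks that are synchronised with the PET system clock.
    """
    dose = dose_mbq * decay_factor(t_calibration, t_scan)
    residual = residual_mbq * decay_factor(t_residual, t_scan)
    return dose - residual


def suv_body_weight(conc_kbq_per_ml, net_dose_mbq, weight_kg):
    """Body-weight normalised SUV; MBq/kg equals kBq/g, so units cancel."""
    return conc_kbq_per_ml / (net_dose_mbq / weight_kg)


# Hypothetical example: dose calibrated at 09:00, scan started 60 min later.
t_cal = datetime(2008, 1, 1, 9, 0)
t_scan = t_cal + timedelta(minutes=60)
dose_at_scan = net_dose_at_scan(370.0, t_cal, 3.0, t_cal, t_scan)
```

A clock offset of a few minutes between dose calibrator and scanner enters directly through `decay_factor`, which is why synchronisation of clocks matters for SUV accuracy.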

Table 2 Errors affecting SUV

Physical or technical factors affecting SUV outcome (Table 3) are described in various papers. Data acquisition settings such as time per bed position, the amount of overlap between subsequent bed positions, acquisition mode (2D or 3D) and FDG dose affect scan statistics and/or the noise equivalent count level of PET studies. Poorer scan statistics and lower image signal-to-noise ratio (SNR) will result in an upward bias of SUV, especially when using the maximum SUV as final outcome parameter [10].
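The upward bias of maximum SUV with increasing noise can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from the protocol: a uniform region with a true SUV of 1.0, independent Gaussian voxel noise, and hypothetical function names.

```python
import random


def simulate_voxels(noise_sd, n_voxels=500, true_suv=1.0, seed=0):
    """Voxel values of a uniform region with additive Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    return [true_suv + noise_sd * rng.gauss(0.0, 1.0) for _ in range(n_voxels)]


def suv_max(voxels):
    """Maximum voxel value within the region."""
    return max(voxels)


def suv_mean(voxels):
    """Average voxel value within the region."""
    return sum(voxels) / len(voxels)


# The mean stays close to the true value, whereas the maximum grows
# with the noise level (i.e., with poorer scan statistics).
for sd in (0.0, 0.1, 0.2):
    voxels = simulate_voxels(sd)
    print(f"noise SD {sd:.1f}: mean {suv_mean(voxels):.3f}, max {suv_max(voxels):.3f}")
```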

Table 3 Technical factors affecting SUV

Nowadays, attenuation corrections will be mostly derived from a dedicated CT scan (CT-AC) acquired on a PET-CT scanner. Patient motion (e.g., breathing) may result in a mismatch between PET and CT and thus result in incorrect attenuation correction. The latter may be minimized by breathing instructions (breath holding at mid-inspiration volume or shallow breathing) during CT scanning. Use of respiratory gating would be beneficial, but is not yet routinely available in most centres. Use of contrast agents may affect accuracy of conversion of CT-AC image Hounsfield units into 511 keV linear attenuation correction factors and thereby accuracy of SUV. Hopefully, sophisticated algorithms to perform this conversion will be introduced allowing for use of contrast agents, as will be discussed in more detail later. Finally, attenuation correction based on CT could be affected by truncation of the CT image. Note that CT truncation does not only affect the quantification of PET at the position of CT image truncation alone, but may affect SUV of tumours located elsewhere in the same axial planes as well.

Image reconstruction methods and settings determine both quantitative accuracy and final image resolution. With iterative reconstruction methods, a sufficient number of iterations (or a sufficient product of the number of iterations and subsets) needs to be applied to ensure sufficient convergence of the algorithm. Insufficient convergence will result in an object- (and/or surroundings) dependent SUV outcome [15]. In addition to these settings, some filtering is generally performed during or after reconstruction. Typically, Gaussian filters of 5 to 10 mm full width at half maximum (FWHM) are applied, resulting in a clinical image resolution of 7 to 12 mm FWHM. Image matrix size and zoom factors applied during reconstruction determine the final voxel size and may lead to additional image resolution loss when the voxel size is larger than half the intended FWHM (Nyquist criterion). The final image resolution has a large effect on so-called partial volume effects (PVE), which result in an underestimation of SUV for small objects. PVE increase as image resolution becomes poorer (i.e., with larger FWHM) and become important mainly for lesions smaller than three times the FWHM. To date, no accurate and precise PVE corrections are available for objects smaller than about two times the FWHM (i.e., generally <15 to 20 mm diameter). Consequently, image resolution must be matched as closely as possible among centres in multi-centre trials to avoid differences in SUV due to differences in image resolution. Other data acquisition settings and reconstruction parameters, such as the use of time of flight and relaxation parameters, affect image quality and SUV outcome. When images have been collected, additional variability in SUV results from different data analysis strategies, especially region of interest (ROI) definitions. Fixed size ROI providing ‘SUV peak’, maximum pixel values providing ‘SUVmax’ or the average ROI value within a 2D or 3D ROI providing ‘SUV mean’ are being used.
Moreover, 2D and 3D ROI can be generated manually or using various semiautomatic procedures, the latter generally based on some percentage threshold value relative to maximum SUV. Examples of the effects of the number of OSEM iterations and different image resolutions using various ROI on SUV are shown in Fig. 1. Finally, various SUV normalisations are being utilized such as SUV normalised by body weight, body surface area or lean body mass.
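The resolution-related rules of thumb in the preceding paragraph can be sketched in a few lines. The quadrature rule for combining scanner resolution and filter width is an assumption (it holds when both are approximated by Gaussians) and is not stated in the text; the function names and the 5-mm scanner resolution are hypothetical.

```python
import math


def combined_fwhm(system_fwhm_mm, filter_fwhm_mm):
    """Approximate final resolution, assuming Gaussian blurs add in quadrature."""
    return math.hypot(system_fwhm_mm, filter_fwhm_mm)


def voxel_size_ok(voxel_mm, intended_fwhm_mm):
    """Voxel size should not exceed half the intended FWHM."""
    return voxel_mm <= intended_fwhm_mm / 2.0


def pve_relevant(lesion_diameter_mm, fwhm_mm):
    """Partial volume effects matter mainly below three times the FWHM."""
    return lesion_diameter_mm < 3.0 * fwhm_mm


# Hypothetical scanner with 5 mm intrinsic resolution and two filter widths.
for filter_mm in (5.0, 10.0):
    print(f"{filter_mm:.0f} mm filter -> {combined_fwhm(5.0, filter_mm):.1f} mm FWHM")
```

Under this assumption, a 5-mm scanner resolution with 5 to 10 mm filters gives roughly 7 to 11 mm, which is in line with the 7 to 12 mm range quoted above.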

Fig. 1

Images illustrating effects of the number of OSEM iterations and image smoothing on SUV. Representative image reconstruction settings, as observed in a previous multicentre study, were taken. Images were reconstructed using OSEM with two iterations and eight subsets and smoothed using an 8-mm FWHM Gaussian filter (a, c), and using OSEM with four iterations and 16 subsets and smoothed with a 5-mm FWHM Gaussian filter (b, d)

A standardised protocol for FDG-PET whole body studies

The protocol for standardisation and quantification of FDG-PET whole body studies is based on principles for minimizing intersubject and inter-institute variability of SUV measures:

  • Recommendations for patient preparation are specified in order to minimize patient related or other physiological effects on SUV accuracy and reproducibility. Moreover, guidelines aim at optimizing FDG uptake in the tumour, minimizing uptake in surrounding tissues (muscle, brown fat) and minimizing SUV variability.

  • Recommendations for the administration procedure are given to ensure that the net dose given to the patient is exactly known by avoiding (unknown) remaining activities during preparation and administration.

  • FDG dose is specified as a function of patient weight, scanning mode, percentage bed overlap, and acquisition time per bed position. This is intended to reduce the range in scan statistics between subjects (caused by differences in weight) and between scanners (caused by differences in sensitivity and in the applied overlap of subsequent bed positions).

  • Interchangeability of SUV (or its absence) is, to a large extent, affected by the overall spatial resolution of the PET images after reconstruction, filtering or smoothing and processing. Resolution matching across scanners or centres is therefore needed to allow for SUV interchangeability.

  • After PET acquisition and image reconstruction, SUV outcome is determined by data analysis procedures. Common region of interest (ROI) strategies are the use of fixed sized 2D or 3D ROI, manually defined ROI in one or more axial slices and 3D ROI based on region growing procedures while applying a user-specified threshold. Preferred and obligatory ROI therefore need to be specified.

  • Various SUV normalisations can be applied such as body weight, body surface area and lean body mass. Moreover, SUV may be corrected for blood glucose level. In order to interpret reported SUV it is mandatory to clearly indicate and specify which SUV measure will be used.

  • Finally, it is essential that (1) the PET scanner is technically operating according to specifications, that (2) the PET scanner and the dose calibrator used to determine patient FDG dose are calibrated against each other correctly, including synchronisation of clocks used within the department, and that (3) performance of the PET scanner and the characteristics of reconstructed PET studies, such as SNR, uniformity and spatial resolution (specified in terms of activity recovery coefficients), remain consistent or are known after hardware, firmware and software upgrades. To this end, multicentre QC measures are specified.

Patient preparation

  1. Patients should have fasted for at least 4 h and preferably 6 h before administration of FDG.

  2. Ample hydration is recommended by intake of 1 l of water during the 2 h prior to administration of FDG. In case of intravenous hydration, no glucose may be present in the infusate.

  3. After administration of FDG, patients should drink 0.5 l of water before the PET examination. When necessary, 0.5 l of water or saline can be given intravenously.

  4. Patients should perform no or minimal exercise from 4 h prior to FDG administration, after FDG administration and during the entire PET examination. However, patients are allowed to visit the rest room. Patients should be instructed to relax and to avoid motion and unnecessary talking.

  5. Patients are asked to void the bladder shortly (5 to 10 min) before the PET examination.

  6. Blood glucose level must be measured before administration of FDG. If the blood glucose level is greater than 11 mmol/l, the patient must be rescheduled. In case the blood glucose level is used to correct SUV, it must be measured at the clinical laboratory of the hospital using valid methodology.

  7. Waiting conditions, preparation room and room temperature must be comfortable to allow optimal resting conditions during and after FDG administration in order to minimize muscle and brown-fat uptake. Blankets must be provided to the patient when needed or requested.

  8. A urinary catheter may be used upon indication, preferably placed before administration of FDG.

  9. Administration of a diuretic may be considered in case of expected pathology in the lower abdomen.

  10. Routine use of tranquilizers (benzodiazepines) is not recommended, but may be considered in case tumours are expected in the head and neck region to minimize FDG muscle uptake. Benzodiazepines have no further use for suppressing the so-called brown fat phenomenon.

  11. Patient weight and height must be measured using calibrated and medically approved devices.

  12. Date of last cycle of chemotherapy, administration of growth factors (G-CSF, GM-CSF) or other treatments must be reported. Intervals between interventions and PET should be specified for each (research) protocol. For routine clinical care in solid tumours, the interval between the end of the last (chemo-)therapy cycle and FDG-PET must be at least 14 days. In case of radiation treatment, an interval up to 4 months may be required occasionally (e.g., larynx carcinoma). For malignant lymphoma therapy evaluation, time intervals have been specified by the Imaging Subcommittee of International Harmonization Project in Lymphoma [22].

In case of type II diabetes mellitus, the following additional guidelines apply:

  1. The study will preferably be scheduled at the end of the morning.

  2. Patients need to fast for at least 4 h. Intake of water is recommended as in nondiabetics.

  3. Oral antidiabetic drugs should be continued.

In case of type I and insulin-dependent type II diabetes mellitus, the following additional guidelines apply:

  1. It is preferred to reach normal blood glucose levels in mutual agreement with patient and referring physician.

  2. The study will preferably be scheduled at the end of the morning.

  3. The patient will have a normal breakfast at 7:00 a.m. and use a regular dose of insulin, followed by fasting as described above.

FDG administration procedure

The procedure aims at an accurate and reproducible administration of FDG, minimizing or avoiding remaining activity in the administration system and thus ensuring that the exact net dose administered is known. This is a prerequisite for accurate SUV calculations as discussed previously in the “A brief summary of factors affecting SUV” section because the net dose administered to the patient is used directly in the SUV equation (Eqs. 1 and 2).

It is preferred to administer FDG through a three-way valve system attached to a venous cannula. After injection, the entire administration system should be flushed with at least 10 cc saline to avoid remaining activity in the system. One should be aware of any activity remaining in the syringe, needle or administration system. If residual activity in the syringe, needle or administration system cannot be minimized (<1%) using appropriate procedures, the remaining activity should be measured afterwards in a calibrated dose calibrator.

FDG dosage in relation to scanner type, patient weight and time per bed position

A prescription of FDG dosage as a function of patient weight, scanner type, percentage bed overlap and time per bed position is given to minimize variability in scan statistics across scanners and patients (with different weights). As an upward bias of (maximum) SUV is seen with decreasing image SNR [10], minimizing variability in scan statistics across centres and patients will improve interchangeability of SUV in multi-centre studies.

In principle, a dosage of 5 MBq/kg for 2D scanners and 2.5 MBq/kg for 3D scanners with minimal bed overlap (<25%) is used, assuming a (fixed) 5 min emission scan per bed position. In case of 3D scanners with a 50% overlap of subsequent bed positions of a whole body PET scan, 1.25 MBq/kg is sufficient. For shorter scan times per bed position, the dosage can be adjusted proportionally but with an additional surplus of 10% to (partly) compensate for lower noise equivalent count rates per MBq at higher doses. The following formulas apply to determine the patient-specific FDG dosage (MBq):

  1. For 2D PET scanners with less than 25% bed overlap, the total FDG dose is given by:

    a. Dosage (MBq) = 27.5 × patient weight (kg) / scan time per bed position (min per bed)

  2. For 3D PET scanners with less than 25% bed overlap, the total FDG dose is given by:

    a. Dosage (MBq) = 13.8 × patient weight (kg) / scan time per bed position (min per bed)

  3. For 3D PET scanners with 50% bed overlap, the total FDG dose is given by:

    a. Dosage (MBq) = 6.9 × patient weight (kg) / scan time per bed position (min per bed)
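The three dosage formulas above can be written as a single function. This is a minimal sketch: the function name and argument conventions are hypothetical, and combinations of scanning mode and bed overlap that the text does not cover are rejected rather than guessed.

```python
def fdg_dosage_mbq(weight_kg, min_per_bed, mode, bed_overlap_percent):
    """Recommended FDG dosage (MBq) from weight and scan time per bed position.

    mode is "2D" or "3D"; bed_overlap_percent is the overlap between
    subsequent bed positions. Factors are in MBq·min/kg as given in the text.
    """
    if weight_kg <= 0 or min_per_bed <= 0:
        raise ValueError("weight and scan time per bed must be positive")
    if mode == "2D" and bed_overlap_percent < 25:
        factor = 27.5
    elif mode == "3D" and bed_overlap_percent < 25:
        factor = 13.8
    elif mode == "3D" and bed_overlap_percent == 50:
        factor = 6.9
    else:
        # Not specified in the text; left to the user to justify.
        raise ValueError("no dosage formula given for this mode/overlap")
    return factor * weight_kg / min_per_bed


# Example: an 80 kg patient, 2D scanner, 5 min per bed position.
print(fdg_dosage_mbq(80, 5, "2D", 10))
```

For the example shown, the formula gives 27.5 × 80 / 5 = 440 MBq.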

The following additional recommendations apply:

  1. Net administered dose should be within 10% of the recommended dose but must be exactly known within 3%.

  2. Increasing the FDG dosage to reduce overall scanning time should not result in count rates exceeding the maximum count rate capabilities of the PET scanner, i.e., the count rate above which corrections for dead time or other effects can no longer be performed accurately (within 5%). When specified, the recommendations of the manufacturer on maximum dosage should be followed.

  3. When variable scan times per bed position can be applied, it is allowed to reduce the scan time per bed position by 50% for scan trajectories outside thorax and abdomen (thus for head, neck and legs) to reduce overall scanning time. For the calculation of the FDG dose, it is still required to use the scan time per bed as needed for scans at the thorax and abdomen regions.

  4. To facilitate clinical use, a table specifying required FDG dose in 10 kg patient weight steps may be used. An example of such a table is given in Table 4.
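The 10% tolerance on the administered dose and a lookup table in 10 kg steps can be sketched as follows. This is an illustration only and does not reproduce Table 4: the weight range of 40 to 120 kg, the rounding to whole MBq and the function names are assumptions.

```python
def dose_within_tolerance(net_mbq, recommended_mbq, tolerance=0.10):
    """True when the net dose is within the given fraction of the recommendation."""
    return abs(net_mbq - recommended_mbq) <= tolerance * recommended_mbq


def dose_table(factor_mbq_min_per_kg, min_per_bed, weights_kg=range(40, 130, 10)):
    """Recommended dose (MBq, rounded) per weight step for one scanner setup.

    factor_mbq_min_per_kg is 27.5, 13.8 or 6.9 depending on scanning mode
    and bed overlap, as given in the dosage formulas.
    """
    return {w: round(factor_mbq_min_per_kg * w / min_per_bed) for w in weights_kg}


# Hypothetical table for a 2D scanner with 5 min per bed position.
for weight, dose in dose_table(27.5, 5).items():
    print(f"{weight:>3} kg: {dose} MBq")
```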

Table 4 FDG dose as function of patient weight (10 kg steps)

Exceptions

For new scanners, e.g., scanners with time of flight (ToF) or equipped with new reconstruction methods (e.g., recovery correction), it is allowed to deviate from the above specifications. In the latter case, it is up to the users to demonstrate that use of other than the recommended dose-scan time per bed specifications does not result in a change of observed SUV. To that end, the deviation from the above-given specifications must be justified using phantom experiments, as described later in the QC section of this protocol, as well as by a clinical study relevant to the ongoing multi-centre study (i.e., using the same patient population). In this way, this protocol facilitates a proper validation of new technology for quantitative FDG-PET studies and ensures exchange of SUV in multi-centre studies while allowing benefits of new technology.

PET and PET-CT acquisitions

  1. Emission scanning should start at 55 to 65 min after FDG administration.

  2. For most oncological applications, a ‘whole body’ scan trajectory extends from proximal thigh up to skull-base. Scanning should be in the caudocranial direction (‘feet first’) to have minimal bladder filling at the time of scanning the lower abdomen. Alternative scan trajectories can be used upon indication.

  3. During PET-CT, arms should preferably be elevated over the head to minimize the streak artefacts, e.g., due to beam hardening in CT-AC, that occur when the arms are positioned along the body.

  4. Correct isotope and decay times should be set or entered in the acquisition computer.

  5. FDG dosage, dose calibration time and patient weight must be correctly set when required by the acquisition software.

  6. Online random correction based on the delayed coincidence window technique is recommended.

  7. For CT-based attenuation correction, generally a low-dose CT is performed (CT-AC). For the CT-AC scan, the recommendations provided by the manufacturer may be followed. Generally, CT-AC scans are operated using 30 mAs or less. During and prior to CT-AC scanning, no intravenous contrast agents may be used (until it has been unequivocally demonstrated that this does not affect SUV).

  8. During CT-AC, shallow breathing is recommended for multi-slice CT scanners to minimize effects of patient motion on attenuation correction.

  9. The general PET-CT procedure starts with a scout scan for positioning the PET and CT scan trajectories. Next, a low-dose CT scan for attenuation correction is performed, followed by the emission scan. Upon indication, the procedure may be concluded by a diagnostic CT scan including use of intravenous contrast agents and applying maximal inspiration breath hold. Diagnostic CT settings should be according to specifications provided by the radiologist and/or specialised medical physicist.

Pitfalls and additional recommendations:

  1. For certain PET-CT scanners, the CT or CT-AC FOV may be smaller than that of the PET. Truncation of CT-AC should therefore be prevented, as it causes attenuation (and scatter) correction artefacts and quantification errors.

  2. In case transmission scans based on radioactive sources (generally Ge-68 line sources) are used for attenuation correction, these sources must have sufficient strength to acquire transmission scans. Low transmission source activity will result in poor SNR in transmission scans, affecting image quality and quantification of emission scans. Transmission scan sources should fulfil criteria specified by the manufacturer. Moreover, Ge-68 line sources need to be replaced at least every 1.5 years.

PET image reconstruction

Based on a phantom study (Fig. 2), it was observed that most modern PET-CT scanners provide images with a spatial resolution close to 7 mm FWHM when using the default, manufacturer-recommended image reconstruction settings. The aim is therefore that PET studies from various scanners and centres have a final image resolution as close as possible to 7 mm FWHM. When possible, specific image reconstruction algorithms and settings are defined per scanner. Settings are chosen such that, for iterative reconstructions, sufficient convergence is achieved. Other settings, such as smoothing filters and image matrix size, are chosen such that final image resolution is close to 7 mm FWHM. Optimal settings need to be derived so that activity concentration recovery coefficients as a function of sphere size meet QC specifications, as described later (Fig. 2). This procedure is followed because point sources should not be used for verification of image resolution in case of iterative reconstruction methods, which are exclusively used nowadays for FDG-PET whole body studies. Moreover, variably sized spheres, simulating tumours, resemble clinical conditions more closely than point sources. The phantom experiments, as part of multicentre QC procedures, will be described later.

Fig. 2

Average (±1 SD) activity concentration recovery coefficients as a function of sphere size observed with image quality control measurements at eight different scanners using (a) an adaptive 50% isocontour VOI (A50%) and (b) maximum pixel value

The following reconstruction parameters are defined.

  1. All correction methods needed for quantification need to be applied. These include decay correction, dead time correction, corrections for detector efficiencies (normalisation), random correction, attenuation correction and model-based scatter correction, such as the single scatter simulation method by Watson et al. [25] or similar ones.

  2. Additional and indicative reconstruction settings are suggested.

For Siemens/CTI scanners:

  • (FORE+) 2D OSEM reconstruction with four iterations and 16 subsets.

  • A post reconstruction smoothing using a 5-mm FWHM Gaussian filter

  • Model-based scatter correction.

  • Matrix size of 128 × 128

  • Zoom = 1.0

For GE scanners:

  • Default reconstruction settings (generally OSEM with about two iterations and 30 subsets for 2D scans and FORE+2D OSEM with five iterations and 21 subsets for 3D scans)

  • A post-reconstruction smoothing using a 5-mm FWHM Gaussian filter

  • Model-based scatter correction

  • Matrix size of 128 × 128

  • Zoom = 1.0

For Philips scanners:

  • For scanners equipped with time of flight (ToF), the default reconstruction methods (LOR based reconstruction including ToF) should be used applying ‘normal’ smoothing setting

  • For scanners without ToF, reconstructions should be performed using LOR-RAMLA or 3D RAMLA applying four iterations and other settings as specified by the manufacturer.

  • Matrix size of 144 × 144 (default)

  • Zoom = 1.0

  3. Exceptions and additional remarks

As technology moves forward and new scanners and/or new reconstruction algorithms are being developed, the aforementioned scanner-specific recommendations may become obsolete. Consequently, it is allowed to deviate from the recommended reconstruction parameters provided that reconstructed images still meet the quality control criteria, especially those for activity concentration recovery coefficients measured in the NEMA NU 2 2001 Image Quality phantom, as specified later in the QC section.

Scanners which are not equipped with accurate random correction (based on singles or delayed coincidence time window technique) or with model-based scatter correction cannot be used when absolute SUV quantification is needed. However, these scanners may still be used in studies in which relative SUV changes are used to assess treatment response provided that the patient is scanned in an identical manner on the same scanner and using the same reconstruction settings and analysis methods during all subsequent scans.

Data analysis—volume of interest definitions

Beyond acquisition and image reconstruction, the SUV result is determined by data analysis procedures. Common region or volume of interest (ROI, VOI) strategies are the use of fixed-sized 2D or 3D ROI, manually defined ROI in one or more axial slices and 3D ROI based on region-growing procedures while applying a user-specified threshold. In order to exchange SUV across centres, preferred and obligatory ROI need to be specified.

The following regions of interest are suggested to be useful:

  • 3D isocount contour at 41% of maximum uptake including local background corrections, i.e., an adaptive 41% VOI (VOIA41) [10, 26].

  • 3D isocount contour at 50% of maximum uptake (VOI50)

  • 3D isocount contour at 50% of maximum uptake including local background corrections (VOIA50) [10, 17].

  • 3D isocount contour at 70% of maximum uptake (VOI70).

  • 3D isocount contour at 70% of maximum uptake including local background corrections (VOIA70) [10, 17].

  • Maximum voxel value (Max)

Use of VOIA41 is based on results of recovery coefficient phantom experiments, such as the one described in the QC section later on, in which it can be observed that using an adaptive 41% threshold provides VOI with a volume close to the real volumes of the spheres used. Moreover, this or a similar threshold VOI has been suggested for radiation oncology purposes [26–29], although many other VOI strategies have been suggested for the latter purpose as well [30, 31].

VOI50, VOIA50, VOI70 and VOIA70 are commonly applied VOI and are included to allow comparison of SUV with those reported in previous studies. The maximum voxel value is nowadays most frequently used; it is especially attractive because it does not require specialized algorithms to automatically generate VOI, and it does not suffer from the observer variability that may be seen when using manually defined VOI.
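The isocount VOI listed above can be illustrated with a minimal region-growing sketch. It is not the implementation used by the authors. In particular, the adaptive threshold formula (background plus a fraction of the difference between maximum and background) is an assumption made for illustration, as are the function names, the 6-connected neighbourhood and the way the local background value is supplied.

```python
from collections import deque


def isocount_voi(image, fraction, background=0.0):
    """Grow a 3D isocount VOI from the hottest voxel.

    image      -- nested lists indexed [z][y][x] with activity concentrations
    fraction   -- e.g. 0.41, 0.50 or 0.70
    background -- local background value; 0.0 gives a plain percentage of the
                  maximum (VOI50, VOI70), a non-zero value gives an adaptive
                  threshold (VOIA41, VOIA50, VOIA70)
    Returns (set of (z, y, x) voxels, threshold used).
    """
    nz, ny, nx = len(image), len(image[0]), len(image[0][0])
    # The voxel with the maximum value is the seed of the region.
    seed = max(
        ((z, y, x) for z in range(nz) for y in range(ny) for x in range(nx)),
        key=lambda p: image[p[0]][p[1]][p[2]],
    )
    peak = image[seed[0]][seed[1]][seed[2]]
    # ASSUMPTION: adaptive threshold = background + fraction * (max - background).
    threshold = background + fraction * (peak - background)

    voi = {seed}
    queue = deque([seed])
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in steps:
            n = (z + dz, y + dy, x + dx)
            inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
            if inside and n not in voi and image[n[0]][n[1]][n[2]] >= threshold:
                voi.add(n)
                queue.append(n)
    return voi, threshold


def mean_in_voi(image, voi):
    """Average activity concentration over the voxels of a VOI."""
    return sum(image[z][y][x] for z, y, x in voi) / len(voi)
```

Because the region is grown from the maximum voxel, a nearby organ with high uptake that touches the lesion would be included in the VOI, which is one reason why the visual check recommended in this paper remains necessary.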

VOI recommendations:

  1. The maximum voxel value or maximum SUV (SUVmax) must always be reported.

  2. Use of larger VOI may provide SUV estimates with better precision (smaller uncertainty). Therefore, in a response monitoring setting, use of VOIA41 should be attempted in all sequential scans of a patient. When VOIA41 does not provide meaningful tumour definitions, which needs to be checked visually, smaller VOI obtained with higher percentage threshold values can be applied [10, 12, 17]. Subsequently and in this order, VOI50, VOIA50, VOI70 and VOIA70 may then be used, consistently in all scans of a subject. However, when it is expected that one or more of these VOI may not be useful for certain studies, it is optional to agree upon the use of VOI70 and the maximum uptake beforehand. In all cases, SUVmax and its change need to be reported in addition to all other VOI strategies agreed upon.

  3. Automatic procedures for generating 3D isocount contours may fail in case of low tumour-to-background ratios or in the vicinity of organs showing high uptake (bladder, heart, brain). It is recommended to visually check if the automatically generated volume of interest corresponds with the localisation of the lesion, i.e., without inclusion of non-tumour structures or organs. In case automated VOI strategies cannot be applied reliably, the maximum SUV should be derived from a manually drawn 3D volume of interest which includes the entire tumour and excludes normal tissue.
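The VOI recommendations above amount to a simple selection rule: take the first VOI in the stated order that is meaningful in every scan of a subject, and always report SUVmax and its change. A minimal sketch of that rule follows. The function names and the input format (one set of visually approved VOI names per scan) are hypothetical, and the percentage-change formula is the conventional definition rather than one given in this paper.

```python
# Order of preference as stated in the VOI recommendations.
PREFERENCE_ORDER = ["VOIA41", "VOI50", "VOIA50", "VOI70", "VOIA70"]


def choose_voi(usable_per_scan):
    """Pick one VOI strategy to be used consistently in all scans.

    usable_per_scan -- list with one set per scan, naming the VOI strategies
                       that passed the visual check for that scan
    Returns the first strategy usable in every scan, or None when only
    SUVmax can be reported.
    """
    for name in PREFERENCE_ORDER:
        if all(name in usable for usable in usable_per_scan):
            return name
    return None


def percent_change(baseline_suv, follow_up_suv):
    """Relative SUV change (in %) between a baseline and a later scan."""
    return 100.0 * (follow_up_suv - baseline_suv) / baseline_suv


if __name__ == "__main__":
    scans = [{"VOIA41", "VOI50", "VOI70"}, {"VOI50", "VOI70"}]
    print(choose_voi(scans))          # VOI50
    print(percent_change(8.0, 6.0))   # -25.0
```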

Data analysis—scan report and SUV calculation

  1. PET and PET-CT scans will be evaluated visually directly from a computer screen by a trained Nuclear Medicine Physician. Attention will be paid to the presence or absence of abnormal focal FDG avidities as well as their size and uptake. Both images reconstructed with and without attenuation correction need to be evaluated to rule out artefacts due to patient motion, metal implants, or contrast agents. Findings need to be reported and correlated to those seen with other diagnostic modalities or findings (if any).

  2. It is recommended to report SUV normalised by body weight. However, besides weight, patient height and gender should also be reported so that other SUV measures can be calculated. Use of blood glucose level correction is not mandatory, but may be performed in addition to SUV measures not corrected for blood glucose level. SUV normalised by body weight, without or with blood glucose level correction, is calculated as follows:

    $$\mathrm{SUV} = \frac{\mathrm{ACvoi}\ (\mathrm{kBq/ml})}{\mathrm{FDGdose}\ (\mathrm{MBq}) \,/\, \mathrm{BW}\ (\mathrm{kg})}$$
    (1)
    $$\mathrm{SUVglu} = \frac{\mathrm{ACvoi}\ (\mathrm{kBq/ml})}{\mathrm{FDGdose}\ (\mathrm{MBq}) \,/\, \mathrm{BW}\ (\mathrm{kg})} \times \frac{\mathrm{Pglu}\ (\mathrm{mmol/l})}{5.0}$$
    (2)

In Eqs. 1 and 2, ACvoi represents the average activity concentration in kBq/ml in the specified volume of interest (or the maximum value). FDGdose is the administered dose of FDG in MBq and BW is the body weight in kg. Pglu represents the blood glucose level in mmol/l. When SUV is corrected for blood glucose level, i.e., SUVglu, the observed Pglu is normalised by 5.0 mmol/l, which represents approximately the population average Pglu value.
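Eqs. 1 and 2 translate directly into code. The sketch below uses the units given above, so no conversion factor is needed (kBq/ml divided by MBq/kg yields g/ml). The function and parameter names are illustrative, and it is assumed that the administered dose has already been decay-corrected to the same reference time as the image data.

```python
def suv_bw(ac_voi_kbq_per_ml, fdg_dose_mbq, body_weight_kg):
    """Eq. 1: SUV normalised by body weight.

    ASSUMPTION: fdg_dose_mbq is already decay-corrected to the reference
    time of the activity concentration in the image.
    """
    return ac_voi_kbq_per_ml / (fdg_dose_mbq / body_weight_kg)


def suv_glu(ac_voi_kbq_per_ml, fdg_dose_mbq, body_weight_kg, pglu_mmol_per_l):
    """Eq. 2: body-weight SUV corrected for blood glucose level."""
    suv = suv_bw(ac_voi_kbq_per_ml, fdg_dose_mbq, body_weight_kg)
    return suv * pglu_mmol_per_l / 5.0


if __name__ == "__main__":
    # 5 kBq/ml in the VOI, 350 MBq administered, 70 kg patient, Pglu 6.0 mmol/l
    print(suv_bw(5.0, 350.0, 70.0))        # 1.0
    print(suv_glu(5.0, 350.0, 70.0, 6.0))  # 1.2
```

Passing the average concentration of a VOI gives the mean SUV for that VOI; passing the maximum voxel value gives SUVmax.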

Data exchange

When data (FDG-PET images) need to be exchanged within a multi-centre study, it is required to specify image data format and meta-data required for accurate SUV calculations. DICOM images may be stored in various units, such as ‘counts’, ‘SUV’ or ‘Bq/cc’. Therefore, when using DICOM format, one has to verify that all factors (scaling and calibration factors) are present in the header of the DICOM files in order to allow quantification of the images.

The following (minimal) recommendations are specified:

  • DICOM format should meet the specifications “Digital Imaging and Communications in Medicine (DICOM), Part 10: Media Storage and File Format for Media Interchange” given by the National Electrical Manufacturers Association (NEMA; http://dicom.nema.org/)

  • In addition, DICOM format should meet the Conformance Statement written by the manufacturer of the PET or PET-CT scanner.

  • All data, including DICOM images, need to be anonymised. Patient name and ID need to be erased or replaced by study ID and randomisation number, respectively, before sending these data to other centres. The birth date needs to be erased as well. This is especially required when data are exchanged over the Internet.

  • DICOM images or headers have to contain all factors (scaling and calibration factors) required for quantification of the images.

  • Transport of data or DICOM images may be performed by storing them on media (CD or DVD) and sending these by regular mail to other centres. However, electronic transport using ftp is allowed provided that the ftp connection is secured and the ftp server is password protected. Moreover, all (image) data need to be anonymised.

  • All additional information needed to calculate SUV must be provided. This information, such as study ID, patient randomisation number, patient weight, height, gender, administered dose, dose calibration time and injection time, may be entered during acquisition so that it will be present in the header of the DICOM image. It is, however, strongly recommended to exchange an additional digital scan report form providing all required data, as essential information is still frequently missing. The scan report form should contain study ID, patient randomisation number, height, weight, gender, administered dose, dose calibration time, injection time, PET scan start time, scan duration, hospital and department, responsible physician and/or address for correspondence.
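The anonymisation and scan report requirements listed above can be checked automatically before data are sent. The following sketch works on plain dictionaries rather than on real DICOM files. PatientName, PatientID and PatientBirthDate are standard DICOM attribute keywords, whereas the report field names and both function names are hypothetical choices made for this illustration.

```python
# Hypothetical field names for the digital scan report form.
REQUIRED_REPORT_FIELDS = (
    "study_id", "randomisation_number", "height", "weight", "gender",
    "administered_dose", "dose_calibration_time", "injection_time",
    "scan_start_time", "scan_duration", "hospital_and_department",
    "contact",
)


def anonymise(header, study_id, randomisation_number):
    """Return a copy of a header with identifying fields replaced.

    Patient name is replaced by the study ID, patient ID by the
    randomisation number, and the birth date is erased.
    """
    cleaned = dict(header)  # leave the original header untouched
    cleaned["PatientName"] = study_id
    cleaned["PatientID"] = randomisation_number
    cleaned["PatientBirthDate"] = ""
    return cleaned


def missing_report_fields(report):
    """List the required scan report fields that are absent or empty."""
    return [f for f in REQUIRED_REPORT_FIELDS if report.get(f) in (None, "")]


if __name__ == "__main__":
    header = {"PatientName": "Doe^Jane", "PatientID": "12345",
              "PatientBirthDate": "19600101", "PatientWeight": "70"}
    print(anonymise(header, "STUDY-01", "R-007"))
    print(missing_report_fields({"study_id": "STUDY-01", "weight": 70}))
```

Fields needed for SUV calculation, such as patient weight, are deliberately left in place by the anonymisation step.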

Multi-centre QC measures

In order to achieve interchangeability of SUV, a set of PET or PET-CT scanner quality control procedures is required. In case of PET-CT scanners, it is recommended to perform all CT QC measures as recommended and implemented by the manufacturer. Furthermore, execution of diagnostic CT studies and of CT QC measures on a PET-CT scanner should follow the recommendations provided by the Radiological Society and/or be performed upon indication by a radiologist. In addition, correct alignment between PET and CT data must be checked following manufacturer-supplied procedures.

In the remainder of this paper, recommendations focus on PET QC measures specific for accurate SUV measurements. The following PET QC measures are recommended: (1) daily quality control; (2) (cross-)calibration of the PET or PET-CT scanner against the (local) dose calibrator used for measurement of patient-specific FDG doses; and (3) an inter-institute cross-calibration and recovery coefficient assessment.

  1. Daily quality control (DailyQC)

The purpose of a DailyQC is to assess correct functioning of the PET or PET-CT system, i.e., to detect detector failures and/or (overall) drift. Most commercial PET and PET-CT systems are equipped with a (semi-) automatic procedure for performing a DailyQC. It is recommended to follow the procedure provided by the manufacturer. The user has to check whether the DailyQC has passed the test correctly.

  2. Quality control of (Cross-)Calibration of PET or PET-CT system

The purpose of the (cross-)calibration is to assure correct and direct calibration of the PET or PET-CT camera against the (local) dose calibrator used for measurement of the (FDG) dose to be administered to the patient [32]. In case patient-specific doses are directly ordered from or provided by a pharmaceutical company, the cross-calibration of the PET or PET-CT has to be performed based on a calibrated FDG sample provided by that company. Note that the cross-calibration is the direct relative calibration of the PET system against the (locally) used dose calibrator, and it therefore provides information about potential discrepancies in calibration between PET and dose calibrator. The latter is required because separate calibration of dose calibrators, well counters and PET-CT systems may still result in (small) discrepancies under specific clinical conditions. Standard operating procedures (SOP) for performing the PET (cross-)calibration QC have been made and provided to all Dutch centres. The procedure has been described elsewhere in more detail [32]. A crucial principle of the calibration QC is that, during execution of this measurement, acquisition and reconstruction settings should be identical to those used during patient studies (and as recommended in this paper). Following the SOP, the cross-calibration factor between the PET or PET-CT scanner and dose calibrator should equal one within 5%.
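The acceptance criterion above (a cross-calibration factor of one within 5%) can be sketched as follows. This is an illustration only and not the SOP itself. It assumes that the factor is defined as the activity concentration measured in the PET image divided by the concentration expected from the dose calibrator reading and the phantom volume, and that both values are decay-corrected to the same reference time. All names are hypothetical.

```python
def cross_calibration_factor(pet_kbq_per_ml, calibrator_activity_mbq,
                             phantom_volume_ml):
    """Ratio of PET-measured to dose-calibrator-derived concentration.

    ASSUMPTION: both inputs refer to the same reference time, and the
    factor is defined as PET / dose calibrator (not the inverse).
    """
    expected_kbq_per_ml = calibrator_activity_mbq * 1000.0 / phantom_volume_ml
    return pet_kbq_per_ml / expected_kbq_per_ml


def within_tolerance(factor, tolerance=0.05):
    """True when the factor equals one within the given tolerance (5%)."""
    return abs(factor - 1.0) <= tolerance


if __name__ == "__main__":
    # 70 MBq in a 7000 ml phantom gives an expected 10 kBq/ml.
    factor = cross_calibration_factor(10.3, 70.0, 7000.0)
    print(round(factor, 3), within_tolerance(factor))  # 1.03 True
```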

  3. Image quality and recovery coefficients

Although the aforementioned cross-calibration QC assures correct calibration for standard conditions, SUV quantification in multi-centre trials and inter-institute interchangeability of SUV is still hampered by differences in applied methodology (e.g., due to use of different scanners, acquisition protocols, reconstruction methods and settings). For example, differences in (reconstructed) image resolution will have a very large effect on SUV quantification in relatively small tumours with volumes up to 10 ml (depending on shape as well). Therefore, it is also required to assess accuracy of SUV using an image quality phantom containing variously sized spheres simulating tumours. Such a phantom will allow assessment of SUV accuracy in ‘tumour’-like conditions. The purposes of the ‘image quality and recovery coefficients’ quality control procedure are: (1) to assess SUV quantification in ‘tumour’-like structures; (2) to assess correct calibration/quantification for a nonstandard phantom; and (3) to determine recovery coefficients as a function of sphere (tumour) size. The latter data can be used to compensate for differences in image resolution and convergence properties of the available reconstruction algorithm amongst institutes and, if needed, to adapt local acquisition and reconstruction settings such that recovery coefficients match those specified in Tables 5 and 6.

Table 5 Recovery coefficient (RC) specifications for VOIA50
Table 6 Recovery coefficient (RC) specifications for ROI MAX

The ‘image quality and recovery coefficients’ quality control procedure is performed similarly to the “image quality, accuracy of attenuation and scatter corrections” procedure of the NEMA Standards Publication NU 2-2001 “performance measurements of positron emission tomographs”. The main difference is that, now, all spheres will be filled with an FDG solution with an activity concentration equal to about eight times that of the background compartment. Furthermore, acquisition, reconstruction and data analysis (ROI) methods to measure and determine sphere uptake are performed as recommended in this paper for patient studies. A standard operating procedure (SOP “Image Quality and recovery coefficients PET”, Dutch version) is available in the Dutch version of the approved recommendations protocol. Following the SOP, absolute activity concentration recovery coefficients as a function of sphere size are obtained. These recovery coefficients should be within the specifications given in Tables 5 and 6 for the two given ROI (A50% and MAX).
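A recovery coefficient is the ratio of the activity concentration measured in a sphere to the true concentration with which the sphere was filled. The sketch below computes this ratio per sphere and flags spheres that fall outside a set of limits. The function names are hypothetical, and the limits in the usage example are made-up placeholders, not the specifications of Tables 5 and 6.

```python
def recovery_coefficients(measured_kbq_per_ml, true_kbq_per_ml):
    """RC per sphere: measured / true activity concentration.

    measured_kbq_per_ml -- dict mapping sphere diameter (mm) to the measured
                           concentration for one ROI type (e.g. A50% or MAX)
    """
    return {d: m / true_kbq_per_ml for d, m in measured_kbq_per_ml.items()}


def out_of_spec(rcs, limits):
    """Sorted sphere diameters whose RC lies outside its (min, max) limits."""
    return sorted(d for d, rc in rcs.items()
                  if not (limits[d][0] <= rc <= limits[d][1]))


if __name__ == "__main__":
    # Placeholder numbers for illustration only; NOT taken from Tables 5 and 6.
    measured = {37: 19.0, 10: 6.0}
    limits = {37: (0.90, 1.10), 10: (0.40, 0.60)}
    rcs = recovery_coefficients(measured, true_kbq_per_ml=20.0)
    print(out_of_spec(rcs, limits))  # [10]
```

A sphere reported as out of specification would be a reason to adapt local acquisition or reconstruction settings and to repeat the measurement.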

Limitations, observations and open issues

The recommendations in this paper aim at standardisation of FDG-PET studies in order to achieve interchangeability of SUV in multi-centre trials. Yet, a few issues are still open for debate and need to be addressed in the near future. Also, during multi-centre phantom measurements, some limitations were encountered and will be discussed below.

First of all, the recommended FDG dose takes into account patient weight, time per bed position, applied bed overlap and scanning mode (2D or 3D). Although patient weight is used directly, the variation in image quality does not depend linearly on patient weight, and the relation between weight, height and gender is ignored. Alternative parameters for varying the dose could be measures such as lean body mass, body surface area or body mass index, but their relation with image quality has not been resolved either. Secondly, it is assumed that the effect of minimal versus 50% bed overlap on image SNR equals a factor of 2. Similarly, it is assumed that 2D and 3D acquisition modes result in a factor of 2 difference in SNR. Ideally, image SNR would be assessed for all scanners using various anthropomorphic phantoms (with various sizes and levels of outside field of view activities) to fully determine differences in image quality amongst various scanners, and those data would be used to specify the recommended FDG dose. The recommendations given in this paper, however, offer an advantage over the (usually fixed) dosages applied in most centres, while still being feasible in a clinical setting. The image quality QC experiments are used to assess any remaining impact of image SNR on SUV quantification. So far, it has been observed that differences between scanners and centres are within 10% provided recommendations are followed strictly (Fig. 2).

Occasionally, the possibility and flexibility to set acquisition and reconstruction parameters are limited. In practice, however, it was observed that present default reconstruction settings, as implemented by the manufacturer, closely fulfilled the recommendations given in this paper. The limited possibilities to change or overrule the manufacturer-provided settings are a form of standardisation in themselves and may therefore be considered ‘beneficial’ for the moment. Yet, the impact of future software and hardware upgrades on SUV quantification needs to be carefully monitored. Any change in default acquisition and reconstruction algorithms and their settings will directly affect observed SUV. Any software and hardware upgrade should therefore be clearly specified by the manufacturer, including a description of the effects of the upgrade on image quality (resolution, SNR). The authors therefore hope that PET and PET-CT scanner manufacturers are willing to clearly specify the meaning of acquisition and reconstruction parameters used in their software and graphical user interface. Moreover, sufficient flexibility in changing parameter settings would be beneficial to facilitate matching of image SNR, convergence of iterative reconstruction methods and image resolution in multicentre studies.

Another observation (data not shown) is that calibration and image quality, especially uniformity of pixel values of reconstructed images, may differ between (same) scanners of the same manufacturer at different sites. The consequence of this observation is that all QC measures have to be performed at each site to verify that the scanner characteristics and specifications fulfill the criteria given in the ‘NEDPAS’ protocol.

Occasionally, we observed that cross-calibration between PET scanner and dose calibrator revealed a (relative) difference of more than the recommended 5%. The cause of this discrepancy is not fully understood, as the phantom used to calibrate the scanner to the dose calibrator is or can be used to verify correct calibration during the cross-calibration QC measurement. One hypothesis is that acquisition and reconstruction settings and algorithms used during the calibration procedure are not equal to those used during the calibration QC (or verification) measurements and thus not equal to those used during patient studies. Discrepancies may also be caused, e.g., by the use of ‘calculated’ attenuation correction rather than using a CT-AC or a transmission scan. Other causes might be the use of (slightly) different reconstruction or correction algorithms or settings. Moreover, the calibration must be performed at (a range of) count rates normally encountered in clinical studies. It is therefore of utmost importance that the scanner is calibrated using acquisition and image reconstruction algorithms and settings identical to those used during patient studies. In this way, it is assured that results obtained during calibration are applicable to those obtained during the calibration QC and the patient studies.

An issue to resolve is the use of contrast agents during PET-CT studies. Several studies have been performed to assess the effects of intravenous contrast agents on the accuracy and image quality of PET-CT scans [33–39]. Impact of the use of IV contrast on SUV of lesions (tumours and nodes) ranged from 0% to 25%. Amplitude and frequency of the observed increase in SUV depend on the contrast agent used and on specific imaging protocols [40]. Moreover, effects of using intravenous contrast may differ between baseline and posttreatment scans [37]. A simple strategy to avoid any artefact or quantification error is to acquire a dedicated low-dose CT-AC scan prior to PET scanning. After concluding the entire PET procedure, a diagnostic CT scan including use of intravenous contrast can then be made. Using the latter strategy, it is also feasible to acquire the diagnostic CT at maximal inspiration to further enhance diagnostic accuracy, while the CT-AC can be used for localisation of FDG avid lesions.

In case of oral contrast agents, this scanning setup cannot be followed to avoid presence of contrast agents in the patient during CT-AC. To date, use of (diluted) oral contrast agents during PET-CT studies is a matter of debate. Various studies have shown that oral contrast agents affect quantification of FDG-PET studies, possibly up to 103% in the case of high density barium contrast [41], but effects on SUV seem to be small when using diluted or other types of contrast agents [42–45]. The amplitude of this effect thus depends strongly on the type of contrast agent and on a priori unknown local accumulations. Moreover, the increase of SUV seemed to be mostly observed in the ascending colon [46]. An interesting alternative might be the use of negative oral contrast agents, which do not affect the accuracy of CT-based attenuation correction and are therefore a promising possibility for PET-CT studies [47, 48]. Whether images obtained with negative oral contrast are of sufficient quality for diagnostic use needs to be assessed by the radiology community.

The potential effects of contrast agents on SUV may vary between scans and can create (extra) artificial SUV responses in longitudinal studies. Although effects of 0% to 25% seem small, this is just one of many such (small) factors affecting SUV accuracy and variability. The accumulated resulting inaccuracy led us to argue that all (small) factors contributing to variability in SUV across centres should be controlled as much as possible, and this reasoning is, in fact, the main driver of many of the presented recommendations. Based on phantom experiments (Fig. 2), inter-institute variability due to technical issues could be minimized to within 7% (1 SD) provided that PET procedures are strictly standardised. The clinical benefit of using contrast agents should therefore be carefully balanced against potentially increased SUV variability and/or incorrect assessments of treatment response of up to 10–20% in some cases [33, 37].

Conclusions

This paper describes a protocol for standardisation of quantitative FDG whole body multicentre PET studies. The protocol was successfully introduced in the Netherlands and has been approved by the Netherlands Society of Nuclear Medicine.