AUC indicates area under the curve; Mayo, model developed by the Mayo Clinic; PanCan 1b, Pan-Canadian Early Detection of Lung Cancer Study (PanCan) model using a parsimonious approach including spiculation; PanCan 2b, PanCan model using a comprehensive approach including spiculation; PanCan MeanDiam, PanCan 2b model replacing nodule diameter with mean diameter (calculated as [largest nodule diameter + perpendicular diameter]/2); PanCan Vol, PanCan-2b model replacing nodule diameter with volume; PKUPH, model developed by the Peking University People’s Hospital; UKLS, UK Lung Cancer Screening trial model; VA, model developed by the US Department of Veterans Affairs.
eTable 1. Low-Dose Computed Tomography Evaluation Algorithm Applied in the German Lung Cancer Screening Intervention Trial
eAppendix. Supplementary Methods
eTable 2. Coefficients of the Selected Lung-Cancer Risk Prediction Models
eFigure 1. Flow Graph Showing Inclusion and Exclusion Criteria for Lung Cancer Screening Intervention Trial Low-Dose Computed Tomography Arm Participants
eTable 3. Nodule Count By Size, Screening Round, Malignancy Status, and Nodule Type
eFigure 2. Receiver Operating Characteristic Curves of Nodules First Seen in the Incidence Screening Rounds
eTable 4. Observed vs Predicted Nodule Malignancy by Nodule Size in the Incidence Screening Rounds
eTable 5. Observed vs Predicted Nodule Malignancy Rates by Deciles of Predicted Risk in the Prevalence Screening Round
eTable 6. Observed vs Predicted Nodule Malignancy Rates by Deciles of Predicted Risk in the Incidence Screening Rounds
eTable 7. Evaluation of Absolute Risk Calibration of the Selected Models by Screening Round
eTable 8. Coefficients of Multivariable Logistic Regression Models Fitted on Data From the Lung Cancer Screening Intervention Trial
González Maldonado S, Delorme S, Hüsing A, et al. Evaluation of Prediction Models for Identifying Malignancy in Pulmonary Nodules Detected via Low-Dose Computed Tomography. JAMA Netw Open. 2020;3(2):e1921221. doi:10.1001/jamanetworkopen.2019.21221
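This diagnostic study included 1159 participants in the German Lung Cancer Screening Intervention trial. It found that models originally developed using data from the Pan-Canadian Early Detection of Lung Cancer Study showed excellent discrimination and calibration, compared with models developed in clinical settings, when applied to low-dose computed tomography images from prevalence screening.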
Malignancy prediction models based on participant-related characteristics and imaging parameters from low-dose computed tomography (CT) may improve decision-making regarding nodule management and diagnosis in lung cancer screening.
To externally validate 5 malignancy prediction models that were developed in screening settings, compared with 3 models that were developed in clinical settings, in terms of discrimination and absolute risk calibration among participants in the German Lung Cancer Screening Intervention trial.
Design, Setting, and Participants
In this population-based diagnostic study, malignancy probabilities were estimated by applying 8 prediction models to data from 1159 participants in the intervention arm of the Lung Cancer Screening Intervention trial, a randomized clinical trial conducted from October 23, 2007, to April 30, 2016, with ongoing follow-up. This analysis considers end points up to 1 year after individuals’ last screening visit. Inclusion criteria for participants were at least 1 noncalcified pulmonary nodule detected on any of 5 annual screening visits, receiving a lung cancer diagnosis within the active screening phase of the Lung Cancer Screening Intervention trial, and an unequivocal identification of the malignant nodules. Data analysis was performed from February 1, 2019, through December 5, 2019.
Five annual rounds of low-dose multislice CT.
Main Outcomes and Measures
Discrimination ability and calibration of malignancy probabilities estimated by 5 models developed in data from screening studies (4 Pan-Canadian Early Detection of Lung Cancer Study [PanCan] models using a parsimonious approach including nodule spiculation [PanCan-1b] or a comprehensive approach including nodule spiculation [PanCan-2b], and PanCan-2b replacing the nodule diameter variable with mean diameter [PanCan-MD] or volume [PanCan-VOL], as well as a model developed by the UK Lung Cancer Screening trial) and 3 models developed in clinical settings (US Department of Veterans Affairs, Mayo Clinic, and Peking University People’s Hospital).
A total of 1159 participants (median [range] age, 57.63 [50.34-71.89] years; 763 [65.8%] men) with 3903 pulmonary nodules were included in this study. For nodules detected in the prevalence round of CT, the PanCan models showed excellent discrimination (PanCan-1b: area under the curve [AUC], 0.93 [95% CI, 0.87-0.99]; PanCan-2b: AUC, 0.94 [95% CI, 0.89-0.99]; PanCan-MD: AUC, 0.94 [95% CI, 0.91-0.98]; PanCan-VOL: AUC, 0.94 [95% CI, 0.90-0.98]), and all of the screening models except PanCan-MD and PanCan-VOL showed acceptable calibration (PanCan-1b: Spiegelhalter z = −1.081; P = .28; PanCan-2b: Spiegelhalter z = 0.436; P = .67; PanCan-MD: Spiegelhalter z = 3.888; P < .001; PanCan-VOL: Spiegelhalter z = 1.978; P = .05; UK Lung Cancer Screening trial: Spiegelhalter z = −1.076; P = .28), whereas the other models showed worse discrimination and calibration, from an AUC of 0.58 (95% CI, 0.46-0.70) for the UK Lung Cancer Screening trial model to an AUC of 0.89 (95% CI, 0.82-0.97) for the Mayo Clinic model.
Conclusions and Relevance
This diagnostic study found that PanCan models showed excellent discrimination and calibration in prevalence screenings, confirming their ability to improve nodule management in screening settings, although calibration to nodules detected in follow-up scans should be improved. The models developed by the Mayo Clinic, Peking University People’s Hospital, Department of Veterans Affairs, and UK Lung Cancer Screening Trial did not perform as well.
The US National Lung Cancer Screening Trial (NLST)1 and other randomized clinical trials in Europe2-5 have shown that low-dose (LD) computed tomography (CT) is a viable screening tool for reducing lung cancer mortality among long-term smokers. However, screening entails harms in the form of false-positive screening test referrals and follow-up diagnostics. The efficiency of cancer screening using CT depends in part on optimized criteria for reporting and managing lung nodules; therefore, radiologic and oncologic expert societies have developed guidelines6-10 for using nodule characteristics as predictors of malignancy and as decision-making indicators to guide further diagnostic work.
For nodules detected by CT for which there is an absence of information on nodule growth, several research groups have developed statistical models to determine the likelihood of malignancy based on radiologic features and patient-related attributes, such as age, sex, smoking history, presence of emphysema or other respiratory diseases, and personal or family history of cancer. Initial models were based on data from clinical settings, such as at the Mayo Clinic,11 US Department of Veterans Affairs (VA) clinics,12 or Peking University People’s Hospital (PKUPH).13 However, these models were fitted to data from patients with high pretest probability of malignancy, focused on larger, mostly solid and often solitary nodules detected incidentally or in symptomatic individuals, and may overestimate malignancy risk for nodules detected in screening contexts.14
Malignancy prediction models were first developed in a screening setting at Brock University, St Catharines, Ontario, Canada, using data from prevalence screenings in the Pan-Canadian Early Detection of Lung Cancer Study (PanCan) LDCT screening trial.15 Initial PanCan models15 (sometimes also referred to as Brock models) were generated using either a parsimonious (hereafter, PanCan-1) or a comprehensive (hereafter, PanCan-2) approach for variable selection without (PanCan-1a and PanCan-2a) or with (PanCan-1b and PanCan-2b) nodule spiculation among the radiologic variables. The more comprehensive model including spiculation, PanCan-2b, was recently updated by replacing nodule diameter with multidimensional measurements: mean diameter (PanCan-MD) or volume (PanCan-VOL).16 Independently of PanCan, a new model was developed recently in the context of the UK Lung Cancer Screening (UKLS) trial17 using nodule volume.
The PanCan models have been externally validated for discrimination in the NLST16,18 and Danish Lung Cancer Screening Trial,19 but they have not been externally validated for calibration. The recent UKLS model17 has not been externally validated, to our knowledge. Using data from the German Lung Cancer Screening Intervention (LUSI) trial,5,20 we evaluated the earlier and more recent versions of the PanCan models, as well as the UKLS model, in terms of discrimination ability, calibration, and operational performance (eg, sensitivity, specificity, and positive predictive values). For comparison, we also present findings for the Mayo Clinic, VA, and PKUPH models, as well as for pulmonary nodules first observed in follow-up (ie, incidence) screenings.
The LUSI trial5,20 is a screening trial among adults aged 50 to 69 years with a history of heavy smoking (defined as ≥25 years of smoking ≥15 cigarettes per day or ≥30 years smoking ≥10 cigarettes per day and ≤10 years since smoking cessation) randomized into a screening intervention group, which included an LD multislice CT scan at time of randomization and 4 annual follow-up screening examinations, and a control group with no intervention. Participants were recruited as a random sample from population registries in Heidelberg, Germany, and surrounding areas and were assigned to the CT screening group or the control group. All participants provided written informed consent. The LUSI trial was approved by the ethics committee of the University of Heidelberg and the German Federal Office for Radiation Protection. This analysis of LUSI trial data is covered under the original ethical board approval per regulations of the ethics committee of the University of Heidelberg. This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guideline for diagnostic studies.
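The eligibility definition above can be written as a small predicate. This is a minimal sketch, not the trial's enrollment software: the function names are hypothetical, and it assumes that the limit of 10 years since smoking cessation applies to both smoking-intensity criteria (the sentence can also be read otherwise) and that current smokers are coded as 0 years since cessation.

```python
def meets_heavy_smoking_definition(years_smoked, cigarettes_per_day, years_since_quit):
    """Heavy-smoking definition as quoted in the text (hypothetical helper).

    Assumption: the cessation limit applies to both intensity criteria.
    Assumption: current smokers are passed with years_since_quit == 0.
    """
    criterion_a = years_smoked >= 25 and cigarettes_per_day >= 15
    criterion_b = years_smoked >= 30 and cigarettes_per_day >= 10
    return (criterion_a or criterion_b) and years_since_quit <= 10


def is_eligible(age, years_smoked, cigarettes_per_day, years_since_quit):
    """Combine the age range (50 to 69 years) with the smoking definition."""
    in_age_range = 50 <= age <= 69
    return in_age_range and meets_heavy_smoking_definition(
        years_smoked, cigarettes_per_day, years_since_quit
    )
```

For example, under these assumptions a 60-year-old who smoked 10 cigarettes per day for 30 years and quit 10 years ago qualifies, whereas the same person 11 years after quitting does not.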
Computed tomography scans were obtained using a Toshiba 16-row scanner from October 23, 2007, to December 31, 2009, or a Siemens 128-row scanner from March 18, 2010, until the end of the screening phase of the LUSI trial on May 25, 2016. We used Median computer-aided detection software (Median Technologies) to read CT images, and nodule measures were derived, with largest diameter given in millimeters, perpendicular diameter given in millimeters, and volume given in cubic millimeters. Perifissural nodules with oval or triangular shape and/or smooth delineations were excluded under the assumption that they were lymphatic nodules. Further data collected for each CT scan included nodule identifier, location, type (ie, solid or subsolid), shape and border characteristics (eg, presence of spiculation, translucence), and calcification. Presence of emphysema was determined on CT images using a densitometry method.21
All CT images were evaluated by 2 trained chest radiologists (including S.D.), and decisions were made based on nodule size and growth (in cases of nodules that had previously been recorded) (eTable 1 in the Supplement). Participants with a lung cancer diagnosis made later than 1 year after the last CT scan or for whom the malignant nodule(s) could not be identified in earlier CT images were excluded.
In all study participants, spirometry was performed using MasterScreen IOS (VIASYS Healthcare) to determine 1-second forced expiratory volume (FEV1) and forced vital capacity (FVC), and FEV1 to FVC ratios were calculated from the largest FEV1 and FVC values recorded in any 1 of 2 repeated assessments. Detailed descriptions of imaging, nodule, and lung function assessments are given in the eAppendix in the Supplement.
All nodules detected during screening were analyzed using data from the LDCT image in which they were first seen. For participant-related characteristics, we used the Mann-Whitney U test for differences in continuous data and the χ2 test or Fisher exact test for categorical data, as appropriate. For differences in nodule-specific characteristics, we used mixed-effects logistic regression with participant as the random effect.22 We applied 8 models from 6 published studies to our data to calculate the probability of nodule malignancy (eTable 2 in the Supplement): 4 PanCan models (PanCan-1b15: parsimonious model including spiculation; PanCan-2b15: full model including spiculation; PanCan-MD16: PanCan-2b with the mean of the largest and perpendicular diameters as nodule size; and PanCan-VOL16: PanCan-2b with nodule volume as nodule size), the recently developed UKLS model17; and the models developed in VA clinics,12 the Mayo Clinic,11 and PKUPH clinics.13
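Applying a published logistic model amounts to evaluating its linear predictor for each nodule and passing it through the logistic function. The sketch below illustrates this step together with the mean-diameter definition used for PanCan-MD. The intercept and coefficients are hypothetical placeholders chosen only for illustration; they are not the published values (those are listed in eTable 2 in the Supplement), and the published models may also transform some variables before use.

```python
import math


def mean_diameter(largest_mm, perpendicular_mm):
    """Mean of the largest and perpendicular nodule diameters, in millimeters."""
    return (largest_mm + perpendicular_mm) / 2.0


def logistic_probability(intercept, coefficients, predictors):
    """Predicted probability from a logistic model: 1 / (1 + exp(-linear predictor))."""
    linear = intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for illustration only (NOT any published model).
example_intercept = -8.0
example_coefficients = {"age_years": 0.03, "mean_diameter_mm": 0.15, "spiculation": 0.8}

nodule = {
    "age_years": 60,
    "mean_diameter_mm": mean_diameter(12.0, 8.0),  # 10.0 mm
    "spiculation": 1,
}
p = logistic_probability(example_intercept, example_coefficients, nodule)
```

With these placeholder values the linear predictor is −3.9, which corresponds to a predicted probability of roughly 2%.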
We explored associations of radiologic parameters and participant-related characteristics with risk of nodule malignancy by fitting multivariable logistic regression models via generalized estimating equations using prevalence and incidence screening rounds combined and accounting for the correlation structure of multiple pulmonary nodules per individual on the LUSI data. Model selection was based on the quasi–Akaike Information Criterion23 (eAppendix in the Supplement).
For each selected model, the ability to discriminate nodules by malignancy status was evaluated through cluster-adjusted receiver operating characteristic curves and the corresponding area underneath the curve (AUC).24 Given the correlation structure of nodules within participants, sensitivity, specificity, and positive and negative predictive values were estimated using generalized estimating equations.25-27 For comparison, we also estimated the discrimination capacity of models fitted on LUSI data based on a larger set of variables using cluster bootstrapping (B = 1000).24 Model calibration was assessed by examining observed vs predicted nodule malignancy rates by category of nodule size (<5 mm, 5 to <8 mm, 8 to 10 mm, or >10 mm; as used in the LUSI CT evaluation algorithm) as well as by deciles of predicted risk. The Hosmer-Lemeshow goodness of fit test28 was used to examine the fit between predicted and observed malignancy probabilities across deciles of predicted risk, and Brier scores (BS) and the Spiegelhalter z test were used to assess the overall deviance of risk predictions estimated by models vs observed rates.29 Hosmer-Lemeshow goodness of fit and χ2 for independence P values were 1-sided, and all other P values were 2-sided. Statistical significance was set at .05.
All analyses were performed using R statistical software version 3.5.1 (R Project for Statistical Computing), with packages gee, Hmisc, lme4, MuMIn, rms, pROC, and ROCR, as well as the clusteredROC function. Data analysis was performed from February 1, 2019, to December 5, 2019.
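The Brier score and the Spiegelhalter z statistic named above can be illustrated with a few lines of code. This is a sketch of the standard textbook formulas in Python, not the authors' R code; the function names are hypothetical, and it treats nodules as independent observations, which ignores the within-participant clustering that the study otherwise accounts for.

```python
import math


def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    n = len(outcomes)
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / n


def spiegelhalter_z(outcomes, probabilities):
    """Spiegelhalter z statistic; approximately standard normal under good calibration.

    Assumption: observations are independent (no cluster adjustment).
    """
    numerator = sum((y - p) * (1 - 2 * p) for y, p in zip(outcomes, probabilities))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in probabilities)
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided P value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

With this sign convention, predicted probabilities below 0.5 that systematically exceed the observed rates give a negative z. As a consistency check, `two_sided_p(-1.081)` evaluates to approximately .28.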
During 5 screening rounds, 1182 participants in the LUSI LDCT arm showed at least 1 noncalcified pulmonary nodule. A total of 62 participants were diagnosed with lung cancer up to 12 months after their last LDCT screening participation: 56 cancers detected via screening and 6 interval cancers.5 For 54 of 56 cancers detected via screening, malignancy was linked to 1 (51 participants) or more (3 participants) nodules. For 2 cancers detected via screening without nodules, the decision of referral to further diagnostic work was based on other pulmonary abnormalities. For 2 of 6 interval cases, no nodules were observed on LDCT screenings; for the remaining 4 cases with nonsuspicious nodules, no information was available to unequivocally link any of these to malignancy. Nodules with benign-appearing calcification and individuals with lung cancer whose lung tumors could not be linked back to a specific nodule were excluded from statistical analyses.
After removing participants with a lung cancer diagnosis made later than 1 year after the last CT scan or for whom the malignant nodule(s) could not be identified in earlier CT images, data were available for 3903 pulmonary nodules from 1159 individuals (median [range] age, 57.63 [50.34-71.89] years; 763 [65.8%] men) (eFigure 1 in the Supplement). Detailed nodule counts by diameter, type (solid vs subsolid), malignancy status, and screening round of first observation (prevalence or incidence) are presented in eTable 3 in the Supplement. Of 3903 nodules observed, 2883 nodules (73.9%; 32 of these [1.1%] identified as malignant) were first seen during the prevalence screening, whereas 1020 nodules (26.1%; 31 of these [3.0%] identified as malignant) were first observed in 1 of 4 incidence screenings. Irrespective of screening round, more than 70% of malignant nodules (prevalence round: 25 of 32 nodules [78.1%]; incidence rounds: 25 of 31 nodules [80.6%]) had a diameter of 8 mm or greater. For these nodules 8 mm or greater, the malignancy rate was higher among nodules first observed in the prevalence round (12.8% [95% CI, 8.6%-18.5%]) than among those first observed in the incidence round (8.6% [95% CI, 5.7%-12.5%]); conversely, mean malignancy rate was higher for nodules smaller than 8 mm first observed in incidence rounds (0.8% [95% CI, 0.3%-1.9%]) than those first observed in the prevalence round (0.3% [95% CI, 0.1%-0.6%]).
Irrespective of screening round of first detection, compared with benign nodules, malignant nodules were significantly more often located in the left upper lobe (561 nodules [14.6%] vs 23 nodules [36.5%]; P < .001), were larger in terms of diameter (median [range], 4.8 [2.2-65.6] mm vs 11.6 [3.5-107.7] mm; P < .001) and volume (median [range], 41.4 [5.0-28947] mm3 vs 357.1 [19.3-17466] mm3; P < .001) and more often had spiculated borders (193 nodules [5.0%] vs 23 nodules [36.5%]; P < .001) (Table 1). Compared with patients with benign nodules, patients with malignant nodules were statistically significantly more likely to be older (median [range] age, 57.38 [50.34-71.89] years vs 59.88 [51.90-69.98] years; P < .001), have emphysema (474 patients [42.9%] vs 35 patients [64.8%]; P = .002), and have lower FEV1 (median [range], 2.88 [0.66-6.11] L vs 2.66 [1.26-4.24] L; P = .009); whereas sex, smoking status at randomization, smoking duration and intensity, self-reported history of extrathoracic cancer, years since smoking cessation, presence of asthma or bronchitis, and FVC showed no significant association with malignancy diagnosis in the LUSI data (Table 2). Data on family history of cancer were not available in our data set.
For nodules detected at the participants’ first screening (prevalence round), all PanCan models showed high discrimination accuracy. Among the PanCan variants, the comprehensive PanCan-2b model achieved only marginally better discrimination (AUC, 0.94 [95% CI, 0.89-0.99]) than the parsimonious PanCan-1b model (AUC, 0.93 [95% CI, 0.87-0.99]), and there was no improvement in discrimination with the use of volume in the PanCan-VOL model (AUC, 0.94 [95% CI, 0.90-0.98]) or 2-dimensional perpendicular mean diameter in the PanCan-MD model (AUC, 0.94 [95% CI, 0.91-0.98]). The discrimination by UKLS was poor (AUC, 0.58 [95% CI, 0.46-0.70]). A reduced UKLS model (eTable 2 in the Supplement) that ignored variables with definitions or prevalence markedly differing from LUSI performed better in our data (prevalence round: AUC, 0.79 [95% CI, 0.68-0.89]; incidence rounds: AUC, 0.60 [95% CI, 0.49-0.70]) compared with the original version. For comparison, the models originally developed in clinical settings showed moderately good discrimination (VA: AUC, 0.84 [95% CI, 0.76-0.92]; Mayo Clinic: AUC, 0.89 [95% CI, 0.82-0.97]; PKUPH: AUC, 0.87 [95% CI, 0.79-0.995]) (Figure).
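The AUC values reported here can be read as the probability that a randomly chosen malignant nodule receives a higher predicted risk than a randomly chosen benign nodule. A minimal sketch of that rank-based definition follows; the function name is hypothetical, and the estimate is unadjusted, whereas the study used cluster-adjusted receiver operating characteristic curves to account for multiple nodules per participant.

```python
def rank_based_auc(outcomes, scores):
    """Unadjusted AUC: share of malignant-benign pairs ranked correctly (ties count 0.5)."""
    malignant = [s for y, s in zip(outcomes, scores) if y == 1]
    benign = [s for y, s in zip(outcomes, scores) if y == 0]
    if not malignant or not benign:
        raise ValueError("AUC needs at least one malignant and one benign nodule")
    concordant = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                concordant += 1.0
            elif m == b:
                concordant += 0.5
    return concordant / (len(malignant) * len(benign))
```

For instance, with outcomes `[0, 0, 1, 1]` and predicted risks `[0.1, 0.4, 0.35, 0.8]`, 3 of the 4 malignant-benign pairs are ranked correctly, giving an AUC of 0.75.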
Compared with the prevalence round, all models (except for UKLS) showed useful, although reduced, discrimination when applied to nodules first noticed in any of the incidence rounds (PanCan-1b: AUC, 0.93 [95% CI, 0.87-0.99]; PanCan-2b: AUC, 0.94 [95% CI, 0.89-0.99]; PanCan-MD: AUC, 0.94 [95% CI, 0.91-0.98]; PanCan-VOL: AUC, 0.94 [95% CI, 0.90-0.98]; UKLS: AUC, 0.58 [95% CI, 0.46-0.70]; VA: AUC, 0.84 [95% CI, 0.76-0.92]; Mayo Clinic: AUC, 0.89 [95% CI, 0.82-0.97]; PKUPH: AUC, 0.87 [95% CI, 0.79-0.97]) (eFigure 2 in the Supplement). This difference may be explained by reduced variability in nodule size (Table 1; eTable 3 in the Supplement).
Regarding model calibration, for nodules detected at the prevalence screen, visual inspection of predicted absolute probabilities of malignancy and observed malignancy rates within categories defined by nodule size suggested best agreement for PanCan-1b (Table 3; eTable 4 in the Supplement). Additional comparisons within deciles of the predicted probability scores (eTable 5 and eTable 6 in the Supplement) combined with Hosmer-Lemeshow (HL) tests for deviance between predicted and observed rates showed acceptable calibration for PanCan-1b (HL = 7.71; P = .56), PanCan-2b (HL = 7.23; P = .61), and PanCan-VOL (HL = 10.89; P = .28) but not for PanCan-MD (HL = 30.53; P < .001) or UKLS (HL = 158.99; P < .001) (eTable 7 in the Supplement). Alternatively, using BSs and Spiegelhalter z tests, acceptable calibration was found for PanCan-1b (BS = 0.009; Spiegelhalter z = −1.081; P = .28), PanCan-2b (BS = 0.009; Spiegelhalter z = 0.436; P = .67), and UKLS (BS = 0.012; Spiegelhalter z = −1.076; P = .28) but not for PanCan-VOL (BS = 0.009; Spiegelhalter z = 1.978; P = .05) or PanCan-MD (BS = 0.009, Spiegelhalter z = 3.888; P < .001) (eTable 7 in the Supplement). Malignancy probabilities estimated by models developed in clinical contexts were all strongly overestimated (VA: Spiegelhalter z = −25.24; P < .001; Mayo Clinic: Spiegelhalter z = −12.63; P < .001; PKUPH: Spiegelhalter z = −19.35; P < .001) (Table 3) and were poorly calibrated according to all tests performed (VA: BS = 0.063; HL = 774.38; P < .001; Mayo Clinic: BS = 0.014; HL = 162.32; P < .001; PKUPH: BS = 0.094; HL = 1119.9; P < .001). None of the models showed acceptable calibration on nodules first observed in incidence screens.
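The Hosmer-Lemeshow comparison across deciles of predicted risk can be sketched as follows. This is an illustration of the usual form of the statistic, assuming equal-sized groups formed by rank of predicted risk and independent observations; the function name is hypothetical, and the grouping rule and degrees of freedom used by the R packages in the study may differ. Converting the statistic to a P value requires a χ2 distribution function, which is omitted here.

```python
def hosmer_lemeshow_statistic(outcomes, probabilities, groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * mean_p * (1 - mean_p)).

    Assumption: groups are equal-sized slices of the nodules sorted by predicted risk.
    """
    # Stable sort on predicted risk only, so ties keep their input order.
    pairs = sorted(zip(probabilities, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / len(chunk)
        denominator = len(chunk) * mean_p * (1 - mean_p)
        if denominator > 0:
            statistic += (observed - expected) ** 2 / denominator
    return statistic
```

Larger values indicate greater disagreement between observed and predicted malignancy counts within the risk groups.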
Table 4 shows estimated sensitivity, specificity, and positive and negative predictive values for malignancy at 2%, 5%, or 10% model probability thresholds for nodules observed in the prevalence screening for the PanCan and UKLS models. PanCan-1b yielded the highest sensitivities at the 2% risk threshold (0.81 [95% CI, 0.67-0.94]) and the 10% risk threshold (0.52 [95% CI, 0.32-0.71]) but lower specificities (2%: 0.90 [95% CI, 0.89-0.92]; 10%: 0.99 [95% CI, 0.98-0.99]) than the other models. The highest specificities at the 2% and 10% risk thresholds were observed for PanCan-MD (2%: 0.96 [95% CI, 0.95-0.97]; 10%: 0.99 [95% CI, 0.99-1.00]) but were accompanied by lower sensitivities (2%: 0.72 [95% CI, 0.56-0.86]; 10%: 0.43 [95% CI, 0.25-0.61]). The lowest positive predictive values at 2% or 5% risk thresholds were for PanCan-1b (2%: 0.08 [95% CI, 0.05-0.12]; 5%: 0.19 [95% CI, 0.11-0.27]), while the highest were for PanCan-MD (2%: 0.17 [95% CI, 0.10-0.24]; 5%: 0.33 [95% CI, 0.20-0.45]). On our data, PanCan-1b showed very similar estimates for sensitivity and specificity compared with those from the original PanCan study data.15 However, the recent PanCan-VOL and PanCan-MD updates in PanCan showed higher sensitivity and lower specificity in our study but similar PPV.16 External validation of PanCan-2b in the NLST also showed higher sensitivity, lower specificity, and comparable PPV18 compared with estimates in LUSI (Table 4). The UKLS model showed inferior sensitivity (2%: 0.25 [95% CI, 0.11-0.39]; 5%: 0.20 [95% CI, 0.07-0.33]; 10%: 0.14 [95% CI, 0.01-0.27]) and specificity (2%: 0.83 [95% CI, 0.81-0.85]; 5%: 0.92 [95% CI, 0.91-0.94]; 10%: 0.97 [95% CI, 0.96-0.98]) compared with PanCan models.
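The operating characteristics at a given probability threshold follow from a simple cross-classification of predicted risk against malignancy status. The sketch below computes crude proportions only; the function name is hypothetical, a nodule is assumed to be flagged when its predicted probability is at or above the threshold, and the study's estimates and CIs were instead obtained with generalized estimating equations to account for multiple nodules per participant.

```python
def threshold_metrics(outcomes, probabilities, threshold):
    """Crude sensitivity, specificity, PPV, and NPV at one probability threshold.

    Assumption: a nodule is test-positive when its predicted probability >= threshold.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(outcomes, probabilities):
        flagged = p >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Evaluating the function at 0.02, 0.05, and 0.10 for each model reproduces the layout of the comparison described above, although not the cluster-adjusted intervals.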
Logistic regression models fitted via generalized estimating equations on LUSI data combined with model selection via backward elimination based on the quasi–Akaike Information Criterion resulted in a model retaining age (β coefficient = 0.06 [95% CI, 0.01 to 0.11]; P = .02), years since quitting smoking (β coefficient = 0.83 [95% CI, −0.15 to 1.82]; P = .10), bronchitis (β coefficient = −1.23 [95% CI, −2.29 to −0.17]; P = .03), nodule mean diameter (β coefficient = 0.14 [95% CI, 0.09 to 0.19]; P < .001), nodule location (β coefficient = 1.23 [95% CI, 0.35 to 2.11]; P = .01), and spiculation (β coefficient = 1.72 [95% CI, 1.02 to 2.42]; P < .001) as significant variables (eTable 8 in the Supplement), whereas sex, self-reported history of extrathoracic cancer, smoking duration, emphysema (based on CT results), FVC, nodule type, and nodule count per scan showed no further association with malignancy. This final model yielded discrimination AUC of 0.90 (95% CI, 0.83-0.93)24 (bootstrap AUC, 0.88 [95% CI, 0.84-0.92]) for nodules detected in the prevalence round and 0.81 (95% CI, 0.71-0.90) (bootstrap AUC, 0.81 [95% CI, 0.73-0.87]) for nodules first detected in the incidence round.
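Cluster bootstrapping, as used for the bootstrap AUC above, resamples participants rather than individual nodules so that all nodules from the same person stay together. The following sketch shows one common variant, a percentile interval for an unadjusted AUC over B replicates. The function names and the tuple layout are hypothetical, and the text does not state whether the authors used a percentile interval, an optimism correction, or another bootstrap summary.

```python
import random


def _auc_or_none(rows):
    """Unadjusted rank-based AUC for (outcome, score) rows; None if a class is missing."""
    malignant = [s for y, s in rows if y == 1]
    benign = [s for y, s in rows if y == 0]
    if not malignant or not benign:
        return None
    concordant = sum(
        1.0 if m > b else 0.5 if m == b else 0.0 for m in malignant for b in benign
    )
    return concordant / (len(malignant) * len(benign))


def cluster_bootstrap_auc_interval(records, replicates=1000, seed=0):
    """Percentile 95% interval for the AUC, resampling whole participants.

    records: iterable of (participant_id, outcome, predicted_probability) tuples.
    """
    by_participant = {}
    for participant_id, outcome, score in records:
        by_participant.setdefault(participant_id, []).append((outcome, score))
    ids = list(by_participant)
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        # Draw participants with replacement; each draw brings all of that person's nodules.
        sample = [row for pid in rng.choices(ids, k=len(ids)) for row in by_participant[pid]]
        value = _auc_or_none(sample)
        if value is not None:  # skip replicates without both outcome classes
            estimates.append(value)
    if not estimates:
        raise ValueError("no bootstrap replicate contained both outcome classes")
    estimates.sort()
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return lower, upper
```

Resampling at the participant level keeps the within-person correlation of nodules intact in every replicate, which is the reason for preferring it over resampling nodules independently.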
Using data from the German LUSI trial, this diagnostic study found that 4 variants of the PanCan prediction model, developed in the context of the Pan-Canadian Early Detection of Lung Cancer Study, each provided good discrimination between malignant and nonmalignant pulmonary nodules observed at individuals’ first (prevalence) LDCT screening, whereas more moderate discrimination was observed for 3 models originally developed on the basis of patient data from the Mayo Clinic and VA and PKUPH clinics. By contrast, discrimination for the model recently developed in the context of the UKLS was poor.
To our knowledge, these analyses are the first to provide external validation of the recent UKLS model17 and a complete external validation for PanCan-MD and PanCan-VOL models.16 Our analyses showed no clear superiority for the PanCan-VOL or PanCan-MD models over the older PanCan model versions, confirming observations in PanCan16 but not those from Horeweg et al.30 In LUSI, diameters were retrieved by software, thus based on a 3-dimensional estimation, which possibly led to a more accurate estimation of nodule diameter compared with manual diameter measurements used in other studies.31 Overall, our observations are in line with those from other validation studies, in context of incidentally or symptomatically detected pulmonary nodules13,32-38 or in LDCT screening settings,14,17-19,39,40 which mostly have also shown discrimination indices of AUCs of approximately 0.80 and higher, and similar ranking by discrimination capacity when comparing performance of the PanCan, VA, Mayo Clinic, and PKUPH models.
Aside from good discrimination, PanCan-1b, PanCan-2b, and PanCan-VOL, but not PanCan-MD or UKLS, also showed acceptable calibration of estimated absolute malignancy probabilities when applied to data from the prevalence screening. In the original study leading to development of the initial PanCan models, acceptable calibration was reported on validation on data from a chemoprevention trial by the British Columbia Cancer Agency,15 and for PanCan-1b and PanCan-2b in NLST data14,18 and a pilot LDCT screening trial in Queensland, Australia.39,40 For the more recent PanCan-MD and PanCan-VOL models, by contrast, Tammemägi et al16 reported lower calibration accuracy by such tests in NLST data. The PanCan-1b model showed similar sensitivity, specificity, and positive predictive values at selected (2%, 5%, or 10%) probability thresholds in LUSI and PanCan data,15 while among LUSI, PanCan, and NLST data, these estimates were somewhat more variable for PanCan-2b, PanCan-VOL, and PanCan-MD.16,18 For the models developed in clinical settings, we found overestimated malignancy risk estimates, which were also observed for the VA and Mayo Clinic models on NLST data.14 None of the models (including the PanCan variants) showed acceptable calibration when applied to nodules first observed in incidence screens.
Depending on the context (ie, clinical or screening) in which they were developed, malignancy prediction models include different sets of predictor variables. While age is an established factor for lung cancer risk, it was not retained as a predictor in most PanCan models (PanCan-1b, PanCan-MD, or PanCan-VOL) or in those fitted in LUSI. However, in the UKLS model, age appeared together with smoking duration. With regard to smoking, while lifetime smoking status (ever vs never) is not relevant in LUSI or other screening contexts, including UKLS and PanCan, as screening generally is targeted to long-term smokers, our data and the analyses for the development of the UKLS model17 show that in multivariable models that include age and detailed nodule characteristics, additional smoking-related information may improve malignancy prediction in combination with CT imaging, even among individuals at high risk who are eligible for lung cancer screening.
Sex, associated with increased risk of malignancy for women based on the PanCan and UKLS models, showed no predictive value in our data and was found to be associated with lower malignancy risk in the Danish Lung Cancer Screening Trial.19 The female-to-male sex ratio for lung cancer incidence varies across populations and by age, in relation to variable prevalence of and changing trends in smoking habits among women,41 which could explain population differences in the predictive value of sex for nodule malignancy. Also, among patients with lung cancer, women more often have adenocarcinomas as opposed to other histologic tumor subtypes.5,42 Adenocarcinomas develop from slowly growing nodules with longer lead time until clinical relevance, which makes a benefit from lung cancer screening in terms of mortality reduction more likely.2,5,42 Thus, it may be relevant to refine malignancy prediction by sex and tailor models to specific screening populations.
The PanCan and UKLS models, which were fitted on data from a first prevalence screening, may be less applicable to new nodules in follow-up screenings.16 In NLST, the malignancy risk of new nodules measuring 4 to 6 mm or 6 to 8 mm was higher than that of nodules found at the baseline screening, and malignancies associated with new nodules were significantly less likely to be adenocarcinomas.43 We could not confirm these observations in LUSI data because of small numbers (eTable 3 in the Supplement).
Our study has some strengths, including that, with only 2 exceptions, malignancy was linked to specific nodules, as opposed to NLST, in which malignancy was assumed to be always linked to the largest nodule observed.14,16,18 Also, our study is the first, to our knowledge, to compare and document differences in the performance of malignancy risk predictions for models applied to nodules detected in prevalence or incidence screenings and for models originally developed in screening vs clinical settings.
Our study also has some limitations. One limitation is the missing information in LUSI for several risk factors, including family history of lung or extrathoracic cancer and possible differences in CT-based assessment of emphysema as compared with PanCan.
A limitation of all models examined in this analysis is that they do not incorporate measures of longitudinal nodule growth as a predictor of malignancy. For nodules of intermediate size, current diagnostic criteria for determining malignancy include early recall examinations for the determination of volume doubling time of nodules observed at an individual’s first screening or a direct assessment of longitudinal growth for nodules already noted at earlier visits when screening participants return for annual follow-up screenings. Detection algorithms are now being developed that integrate relevant nodule and nonnodule features on repeated CT screening examinations over time to predict the presence of lung cancer.30,44
Variables that are difficult to standardize across populations, such as lung disease diagnosis and nodule type categories, may hinder model transferability. In fact, a reduced UKLS model (eTable 2 in the Supplement) that ignored variables with definitions or prevalence markedly differing from LUSI performed better in our data than the original version. A more general limitation, not only of LUSI but also of other studies, including PanCan and UKLS, is the relatively small study size, with 2028 participants at baseline screenings in UKLS, 2029 participants at baseline screenings in LUSI, and 2537 participants at baseline screenings in PanCan, and limited numbers of malignant nodules observed. Accuracy of variable selection and model calibration may be improved in data from larger studies (eg, based on pooled data worldwide).
Our findings indicate good discrimination of the PanCan models and provide confirmatory evidence for the calibration accuracy of their predicted malignancy risks when applied to nodules observed in an individual’s first (prevalence) screening examination, suggesting that such models may become useful tools for optimizing nodule management in population screening settings. Estimates of current models seem most applicable to nodules detected on a first screening examination, the specific context in which PanCan and UKLS were developed. For individuals screened at regular intervals, as in organized screening programs, models may be further developed to incorporate estimates of nodule volume doubling times, determined on early recall follow-up CT or determined directly for nodules already noted on an earlier visit when individuals return for annual incidence screenings.44
Accepted for Publication: December 16, 2019.
Published: February 14, 2020. doi:10.1001/jamanetworkopen.2019.21221
Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2020 González Maldonado S et al. JAMA Network Open.
Corresponding Author: Rudolf Kaaks, PhD, Division of Cancer Epidemiology, German Cancer Research Centre, Im Neuenheimer Feld 581, Heidelberg 69120, Germany (email@example.com).
Author Contributions: Dr Kaaks had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: González Maldonado, Delorme, Kauczor, Heussel, Kaaks.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: González Maldonado, Delorme, Kaaks.
Critical revision of the manuscript for important intellectual content: González Maldonado, Delorme, Hüsing, Motsch, Kauczor, Heussel.
Statistical analysis: González Maldonado, Hüsing, Kaaks.
Obtained funding: Kauczor.
Administrative, technical, or material support: González Maldonado, Delorme, Motsch, Kauczor, Heussel, Kaaks.
Supervision: Delorme, Motsch, Kauczor, Heussel, Kaaks.
Conflict of Interest Disclosures: Dr Delorme reported receiving grants from German Research Foundation (DFG) and Dietmar Hopp Foundation during the conduct of the study and serving as a member of the German Radiation Protection Commission and as the chair of its Committee on Radiation Protection in Medicine. Dr Kauczor reported receiving grants from DFG and Dietmar Hopp Foundation during the conduct of the study; grants, personal fees, and nonfinancial support from Philips; grants and personal fees from Siemens; nonfinancial support from Bayer; and personal fees from Boehringer Ingelheim, Merck Sharp & Dohme, and AstraZeneca outside the submitted work. Dr Heussel reported serving as a member of the German Center for Lung Research, and founding member of the Working Team in Infections in Immunocompromised Hosts of the German Society of Hematology/Oncology, faculty member of European Society of Thoracic Radiology, European Respiratory Society, and member in European Imaging Biomarkers Alliance; owning stock in GlaxoSmithKline and a patent for a method and device for representing the microstructure of the lungs (IPC8 Class: AA61B5055FI; PAN: 20080208038), and serving as a consultant or receiving personal fees from the European Conference on Infections in Leukemia, European Congress of Clinical Microbiology and Infectious Diseases, European Organization for Research and Treatment of Cancer Mycose Study Group, Schering-Plough, Pfizer, Basilea Pharmaceutica, Boehringer Ingelheim, Novartis, Roche Holding, Astellas Pharma, Gilead Sciences, Merck Sharp & Dohme, Eli Lilly and Company, InterMune, and Fresenius Medical Care; receiving research funding from Siemens, Pfizer, MeVis Medical Solutions, Boehringer Ingelheim, and the German Center for Lung Research; receiving lecture fees from Gilead Sciences, Essex Pharma, Schering-Plough, AstraZeneca, Eli Lilly and Company, Roche, MSD, Pfizer, Bracco, Meda, InterMune, Chiesi Farmaceutici, Siemens, Covidien, Pierre Fabre, Boehringer Ingelheim, Grifols, Novartis, Basilea, and Bayer; serving on committees for the Chest Working Group of the German Roentgen Society, National Guidelines for Bronchial Carcinoma, Mesothelioma, Chronic Obstructive Pulmonary Disease, Screening for Bronchial Carcinoma, Computed Tomography and Magnetic Resonance Imaging of the Chest, Pneumonia, Hospital-Acquired Pneumonia; and serving as editor of Medizinische Klinik, Intensivmedizin und Notfallmedizin at Springer publishing. No other disclosures were reported.
Funding/Support: The study was funded by the Dietmar Hopp Foundation together with the German Research Foundation (BE 2486/2-1) from 2007 to 2010, and by the German Research Foundation (BE 2486/2-2) from 2010 to 2013. Dr Hüsing was supported by grants from the German Center for Lung Research.
Role of the Funder/Sponsor: The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Additional Contributions: From 2007 to 2011, Marie-Louise Gross, MD, carried out recruitment with initial patient information and randomization. Kirsten Lenner-Fertig, BSc, analyzed blood samples and organized freezer storage. Andrea Albrecht, BSc, and Ulrike Beckhaus, BSc, mailed the annual questionnaires, performed the scanning of the filled-in questionnaires and data entry into the database, and kept telephone contact in case of doubtful answers or missing feedback. The low-dose computed tomography scans were performed by Jessica Engelhardt, BSc, and Martina Jochim, BSc. Jan Tremper, MD; Monica Eichinger, MD; Daiva Elzbieta Optazaite, MD; Michael Puderbach, MD; and Mark Wielpütz, MD, provided radiologic evaluations of the low-dose computed tomography images. They were not compensated for their contributions outside of their normal salaries.