Abstract
Background An internal validation substudy compares an imperfect measurement of a variable with a gold standard measurement in a subset of the study population. Validation data permit calculation of a bias-adjusted estimate, expected to equal the association that would have been observed had the gold standard measurement been available for the entire study population. Guidance on optimal sampling of participants to include in validation substudies has not considered monitoring validation data as they accrue. In this paper, we develop and apply the framework of Bayesian monitoring to determine when sufficient validation data have been collected to yield a bias-adjusted estimate of association with a prespecified level of precision.
Methods We demonstrate the utility of this method using the Study of Transition, Outcomes and Gender, a cohort study of transgender and gender non-conforming children and adolescents. Transmasculine or transfeminine status was determined from the gender code in the electronic medical record at cohort enrollment. This status is known to be subject to misclassification because the code can indicate either gender identity or sex recorded at birth. Our interest is in the association between transmasculine versus transfeminine status and self-inflicted injury. To address possible exposure misclassification, we demonstrate the method's ability to determine when sufficient validation data have been collected to calculate a bias-adjusted estimate of association whose imprecision is less than 80% greater than that of the conventional estimate.
Results In the conventional age-adjusted analysis, we observed that transmasculine children and adolescents were 1.80-fold more likely to inflict self-harm than transfeminine youths (95% CI 1.27, 2.55). Using the adaptive validation approach, 200 cohort members were required for validation to yield a bias-adjusted estimate of OR = 3.03 (95% CI 1.76, 5.56), which was similar to the bias-adjusted estimate using complete validation data (OR = 2.63, 95% CI 1.67, 4.23).
Conclusions Our method provides a novel approach to effective and efficient estimation of classification parameters as validation data accrue. This method can be applied within the context of any parent epidemiologic study design, and modified to meet alternative criteria given specific study or validation study objectives.
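As a rough illustration of the monitoring idea described in the abstract, the following Python sketch updates a Beta posterior for a single classification parameter (for example, the proportion of recorded gender codes confirmed by the gold standard) as validation batches accrue, and stops once the posterior interval is narrow enough. It is not the authors' implementation; their example code is in the GitHub repository named under Data Availability. The function names, the flat Beta(1, 1) prior, the batch counts, and the interval-width threshold are all assumptions made for illustration. The stopping rule here is also a simplification: it targets the width of the interval for one classification parameter, whereas the paper's criterion concerns the precision of the bias-adjusted estimate of association.

```python
import random


def beta_credible_interval(a, b, level=0.95, draws=20000, seed=0):
    """Monte Carlo equal-tailed credible interval for a Beta(a, b) posterior."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


def monitor_validation(batches, max_width, prior_a=1.0, prior_b=1.0):
    """Update a Beta posterior batch by batch and stop when it is precise enough.

    batches: iterable of (n_confirmed, n_not_confirmed) counts, one pair per
             validation batch (hypothetical data format).
    max_width: stop once the 95% credible interval is at most this wide.
    Returns (number validated, posterior mean, (lower, upper), stopped_early).
    """
    a, b = prior_a, prior_b
    validated = 0
    lower, upper = beta_credible_interval(a, b)
    for confirmed, not_confirmed in batches:
        # Conjugate Beta-binomial update with the newly validated records.
        a += confirmed
        b += not_confirmed
        validated += confirmed + not_confirmed
        lower, upper = beta_credible_interval(a, b)
        if upper - lower <= max_width:
            return validated, a / (a + b), (lower, upper), True
    return validated, a / (a + b), (lower, upper), False


if __name__ == "__main__":
    # Hypothetical batches of 50 validated records each (not study data).
    example_batches = [(45, 5)] * 10
    n, mean, (lo, hi), stopped = monitor_validation(example_batches, max_width=0.10)
    print(f"validated={n} mean={mean:.3f} interval=({lo:.3f}, {hi:.3f}) stopped={stopped}")
```

Checking the interval after every batch rather than after every record keeps the sketch short; a real application would choose the batch size and the stopping threshold to match the objectives of the validation substudy.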
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported in part by the US National Cancer Institute (F31CA239566, awarded to Lindsay J Collin, and R01CA234538, awarded to Timothy L Lash) and the US National Library of Medicine (R01LM013049, awarded to Timothy L Lash). Thomas P Ahern was supported by an award from the US National Institute of General Medical Sciences (P20 GM103644). STRONG cohort data were collected with support from Contract AD-12-11-4532 from the Patient-Centered Outcomes Research Institute and by NICHD grant R21HD076387, awarded to Michael Goodman.
Author Declarations
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript.
Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Ethics Statement: The study was conducted in accordance with the Declaration of Helsinki and was approved by the Emory University IRB (#00062742). Participant consent was not required because the study used de-identified data obtained from the Kaiser Permanente sites in Georgia, Northern California, and Southern California. Each Kaiser Permanente site received its own IRB approval.
Data Availability
Due to patient confidentiality, data are only available upon IRB approval from the research institution in collaboration with Dr. Michael Goodman and the STRONG research team. Example code used to perform the adaptive validation study is available from GitHub (https://github.com/lcolli5/Adaptive-Validation).