Articles

Performance Bias Due to Test Data Reuse in Post-market Retraining of a Bone Scintigram Diagnostic Support System and Its Bias Correction

Haruto YAMANAKA, Ryusuke NAKAOKA, Akinobu SHIMIZU
Vol. 15 (2026) p. 76-84

Post-market retraining of AI-based software as a medical device (SaMD) has attracted attention since Japan introduced a new approval process, the “Improvement Design within Approval for Timely Evaluation Notice (IDATEN),” which facilitates post-market performance improvement of SaMD. However, repeatedly reusing the same test data raises concerns about overfitting to the test dataset, which can introduce performance bias. In this study, performance bias was evaluated with a deep learning model that supports hotspot detection in bone scintigraphy, by simulating the selection and integration of multiple post-market models while repeatedly using the same test data. Performance bias was observed in both ensemble learning approaches examined, which sequentially aggregate pretrained models in a bagging-inspired and a boosting-inspired manner, respectively. In addition, ThresholdoutAUC, a method based on the principles of differential privacy, was shown to reduce this bias. These findings are expected to be useful in SaMD development aimed at continuous performance improvement.
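The two technical ideas in the abstract, the optimistic bias that arises when one test set is reused to choose among many models, and a Thresholdout-style guard on AUC queries, can be illustrated with a small Python sketch. It assumes synthetic data in which labels are independent of the features (so every model's true AUC is 0.5) and hypothetical candidate models that are random linear scorers; it is not the authors' bone scintigram model or their simulation. `ThresholdoutAUCSketch` is a hypothetical name: it applies the original Thresholdout mechanism of Dwork et al. (2015) to AUC, with illustrative values for the threshold `threshold`, noise scale `sigma`, and query budget `budget`. The paper's ThresholdoutAUC may calibrate its noise and threshold differently.

```python
import random


def auc(scores, labels):
    """Empirical AUC (Mann-Whitney statistic): the fraction of positive/negative
    pairs ranked correctly, with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def make_data(rng, n, d):
    # Hypothetical synthetic data: labels are independent of the features,
    # so every model's true (population) AUC is 0.5.
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys


def evaluate(w, data):
    # A hypothetical "model" is a linear scorer with weight vector w.
    xs, ys = data
    return auc([sum(wi * xi for wi, xi in zip(w, x)) for x in xs], ys)


def naive_reuse(n_models=200, n_test=60, n_fresh=2000, d=10, seed=0):
    """Pick the best of many candidate models on one reused test set, then
    re-evaluate the winner on fresh data from the same distribution."""
    rng = random.Random(seed)
    test, fresh = make_data(rng, n_test, d), make_data(rng, n_fresh, d)
    candidates = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_models)]
    best = max(candidates, key=lambda w: evaluate(w, test))
    return evaluate(best, test), evaluate(best, fresh)


class ThresholdoutAUCSketch:
    """Hypothetical sketch: the Thresholdout mechanism of Dwork et al. (2015)
    with AUC as the query. Not the authors' ThresholdoutAUC implementation."""

    def __init__(self, train, holdout, threshold=0.04, sigma=0.01, budget=50, seed=0):
        self.train, self.holdout = train, holdout
        self.threshold, self.sigma, self.budget = threshold, sigma, budget
        self.rng = random.Random(seed)
        self.t_hat = threshold + laplace(self.rng, 2 * sigma)  # noisy threshold

    def query(self, w):
        if self.budget < 1:
            return None  # Budget exhausted: no further answers from the holdout.
        auc_train, auc_hold = evaluate(w, self.train), evaluate(w, self.holdout)
        if abs(auc_hold - auc_train) > self.t_hat + laplace(self.rng, 4 * self.sigma):
            # Estimates disagree (likely overfitting to the training data):
            # spend one unit of budget, redraw the noisy threshold, and
            # release only a Laplace-perturbed holdout AUC.
            self.budget -= 1
            self.t_hat = self.threshold + laplace(self.rng, 2 * self.sigma)
            return auc_hold + laplace(self.rng, self.sigma)
        # Estimates agree: answer with the training AUC; the holdout value is not released.
        return auc_train
```

In this toy setting, `naive_reuse()` returns the winning candidate's AUC on the reused 60-case test set, which typically lands well above 0.5, and its AUC on a fresh 2,000-case set, which stays near the true value of 0.5; the gap is the kind of optimistic bias the study examines. The Thresholdout sketch releases a holdout-based AUC only when the training and holdout estimates disagree, and then only with Laplace noise and at the cost of one unit of budget, which limits how much the holdout can leak into repeated model selection.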
