Development of a New Method to Trace Patient Data Using the National Database in Japan

Tomoya Myojin, Tatsuya Noda, Shinichiro Kubo, Yuichi Nishioka, Tsuneyuki Higashino, Tomoaki Imamura
Vol. 11 (2022) pp. 203-217

The National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB) is a comprehensive database containing health insurance claim information. The structure of the NDB complicates the construction of long-term cohorts for two main reasons. First, NDB data are stored on a per-claim basis. Second, the NDB has a billing-focused record structure. The NDB uses two primary keys (ID1 and ID2), made from hash values that mask personally identifiable information. ID0 is a key we recently developed from ID1 and ID2; it improves patient-matching efficiency and shows excellent long-term tracing performance. The objective of this study was to use ID0 to restructure the data so that long-term cohorts become possible, provided that the data volume does not increase and the runtime stays within one month per data year. Using claim data with filing dates between April 2013 and March 2016, we traced hospitalizations of one month or longer, together with outpatient care, in three steps. In Step 1, claims were converted to a CD-record format. Because some diagnosis procedure combination (DPC) claim records contain a mixture of overlapping comprehensive and piece-rate data, we sorted and reorganized them. In Step 2, pharmacy and medical outpatient claims were integrated using the ID0 key, the code of the medical institution that issued the prescription, and the prescription issue date. In Step 3, the converted data were combined, and consecutive hospitalization days were converted into sequences based on ID0, the medical institution code, and the hospital ward classification. Consequently, the originally extracted comma-separated values (CSV) dataset for three years (approximately 10.5 TB) was reduced to an approximately 6 TB main database file that was usable for processing. The process took approximately three months, or about one month per data year. With similar conventional methods, the data were 30 times larger, and processing a single year's worth of data took more than seven months.
In addition, to demonstrate the application of this method, we conducted a six-year mortality cohort study covering all Japanese citizens. Our technique makes it easy to perform follow-up and longitudinal cohort studies while accurately tracing patient data in large-scale medical claims databases.
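The linkage logic of Steps 2 and 3 can be illustrated with a minimal Python sketch over small in-memory records. This is not the authors' implementation, which processes terabyte-scale files. The record classes, field names, and functions (OutpatientClaim, PharmacyClaim, InpatientDay, link_pharmacy_to_outpatient, collapse_stays) are hypothetical simplifications rather than the NDB's actual layout, and ID0 is assumed to have been computed already.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import groupby

# Hypothetical, simplified record layouts (assumption: real NDB claims have many more fields).
@dataclass(frozen=True)
class OutpatientClaim:
    id0: str
    institution_code: str      # institution that issued the prescription
    prescription_date: date
    claim_id: str

@dataclass(frozen=True)
class PharmacyClaim:
    id0: str
    prescribing_institution_code: str
    prescription_date: date
    claim_id: str

@dataclass(frozen=True)
class InpatientDay:
    id0: str
    institution_code: str
    ward_class: str
    day: date

def link_pharmacy_to_outpatient(outpatient, pharmacy):
    """Step 2 sketch: pair pharmacy claims with outpatient claims that share
    (ID0, prescribing institution code, prescription issue date)."""
    index = {}
    for claim in outpatient:
        key = (claim.id0, claim.institution_code, claim.prescription_date)
        index.setdefault(key, []).append(claim.claim_id)
    links = []
    for rx in pharmacy:
        key = (rx.id0, rx.prescribing_institution_code, rx.prescription_date)
        for out_id in index.get(key, []):
            links.append((out_id, rx.claim_id))
    return links

def collapse_stays(days):
    """Step 3 sketch: merge consecutive inpatient days that share
    (ID0, institution code, ward class) into (start, end) episodes."""
    def stay_key(d):
        return (d.id0, d.institution_code, d.ward_class)

    # Deduplicate days reported by several claims, then sort by key and date.
    ordered = sorted(set(days), key=lambda d: (stay_key(d), d.day))
    episodes = []
    for k, group in groupby(ordered, key=stay_key):
        start = prev = None
        for d in group:
            if start is None:
                start = prev = d.day
            elif d.day == prev + timedelta(days=1):
                prev = d.day  # still the same continuous stay
            else:
                episodes.append((*k, start, prev))  # gap: close the episode
                start = prev = d.day
        episodes.append((*k, start, prev))
    return episodes
```

The hash index keeps the Step 2 join linear in the number of claims. Deduplicating days before collapsing means that overlapping claims for the same day yield one episode day. At NDB scale, a sort-based or database join would likely replace the in-memory dictionary.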