The Role of Dual-step Linguistic Structuring in Self-paced Reading Rhythms
Ryutaro KASEDO, Atsuhiko IIJIMA, Kiyoshi NAKAHARA, Yusuke ADACHI, Fumitaka HOMAE, Ryu-ichiro HASHIMOTO, Isao HASEGAWA
Vol. 13 (2024) p. 343-353
Humanlike text-to-speech synthesis has not yet been achieved by machines. Reading speed is a critical parameter determining natural prosody and is used to assess the quality of text-to-speech synthesis. In reading, sublexical units are consecutively combined into minimal lexical units (sublexical structuring), which are further structured into phrases, clauses, and sentences (lexical structuring). Most psychological research on linguistic structuring has examined sublexical and lexical processes separately, and it remains unclear whether spontaneous reading rhythms depend on dual-step (sublexical and lexical) linguistic structuring processes. To address this question, we introduced a self-paced sequential letter-string reading task, in which Japanese kana letters were presented one at a time and participants pressed a button at their own pace to proceed to the next letter. This task allowed us to estimate the timing of self-paced linguistic structuring from the reaction time to each letter. We found that letter-by-letter changes in reaction time can be explained by dual-step linguistic structuring processes. Reaction time decreased as the number of unstructured sublexical and lexical units accumulated, and then transiently increased as the number of structured sublexical and lexical units increased at the boundaries of lexical units, phrases, and sentences. By comparing the relative prediction errors of different linguistic models using the Akaike information criterion and the Bayesian information criterion, we found that the reaction time to each letter was best explained when sublexical structuring and lexical structuring were considered simultaneously. Our finding that dual-step linguistic structuring affects self-paced reading rhythms provides useful information for the development of more humanlike text-to-speech synthesis.
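The model comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not give the model equations, so everything here is an assumption made for illustration: reaction time is modeled as a linear function of unit counts with Gaussian noise, the predictor names (`sublexical`, `lexical`) and the helper `gaussian_aic_bic` are hypothetical, and the data are synthetic rather than the authors' measurements. The sketch only shows how AIC and BIC can rank a sublexical-only, a lexical-only, and a combined model.

```python
import numpy as np

def gaussian_aic_bic(y, X):
    """Fit y ~ intercept + X by least squares; return (AIC, BIC).

    Assumes independent Gaussian errors, so the maximized log-likelihood
    depends only on the residual sum of squares.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1                        # coefficients + noise variance
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    aic = 2 * k - 2 * log_lik
    bic = k * np.log(n) - 2 * log_lik
    return aic, bic

# Synthetic data (NOT from the study): reaction time in ms falls with the
# number of accumulated units at both levels, plus Gaussian noise.
rng = np.random.default_rng(0)
n = 400
sublexical = rng.integers(0, 5, n).astype(float)   # hypothetical unit counts
lexical = rng.integers(0, 4, n).astype(float)
rt = 500 - 20 * sublexical - 30 * lexical + rng.normal(0, 25, n)

# Three candidate models: each level alone, and both together.
models = {
    "sublexical": sublexical[:, None],
    "lexical": lexical[:, None],
    "dual-step": np.column_stack([sublexical, lexical]),
}
scores = {name: gaussian_aic_bic(rt, X) for name, X in models.items()}

# Lower values indicate a better trade-off between fit and complexity.
best_aic = min(scores, key=lambda m: scores[m][0])
best_bic = min(scores, key=lambda m: scores[m][1])

for name, (aic, bic) in scores.items():
    print(f"{name:>10}: AIC={aic:.1f}  BIC={bic:.1f}")
print("best by AIC:", best_aic, "| best by BIC:", best_bic)
```

Because the synthetic reaction times are generated from both predictors, the combined model is expected to score lowest on both criteria here; with real data the ranking is an empirical question. BIC penalizes each extra parameter more heavily than AIC once the sample exceeds about eight observations, so agreement between the two criteria is a stronger indication than either alone.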