Investigation of the Cross-frequency Coupling Characteristics during Attentive Listening in a Two-speaker Paradigm
Mai TANAKA, Masato SUGINO, Reo OTSUKI, Kenta SHIMBA, Kiyoshi KOTANI, Yasuhiko JIMBO
Vol. 14 (2025) p. 89-100
The neural signals of a subject listening attentively to one of two simultaneously presented speech streams can be used to decode the auditory source that the subject is focusing on. Many auditory attention decoding (AAD) studies have decoded the attended speech envelope from the listener's delta (1-4 Hz) and theta (4-8 Hz) band activity. However, other features that define the auditory source, such as the spatial location of the target speech relative to the listener, have not been extensively studied in the context of AAD. In this study, we systematically investigated the cross-frequency coupling (CFC) characteristics during attentive listening in a two-speaker paradigm. An open-access electroencephalography (EEG) database of sixteen subjects listening attentively to one of two concurrently presented speech streams was analyzed. We evaluated CFC in the form of phase-amplitude coupling using a modified time-series signal of the normalized modulation index. CFC was observed in several of the frequency pairs examined, with many of these pairs involving delta and theta phase frequencies. In terms of the amplitude frequency, the 24-28 Hz range, which lies within the beta band, was coupled with the 1-4 Hz, 2-6 Hz and 4-8 Hz phase frequencies. Decoding of the attended speech envelope significantly exceeded the chance level when using isolated neural frequencies and CFC measures as inputs. As reported in the literature, the isolated delta and theta frequencies contributed the most to the success of decoding the attended speech envelope, suggesting that the prominent features of the attended speech envelope are encoded in the amplitude and phase of these lower frequencies. On the other hand, the directional auditory attention decoding performance of the isolated frequencies or their combinations did not exceed the chance level, but decoding of speaker location from the CFC measures was highly successful.
These results indicate that the speaker’s position relative to the listener is encoded in multiple CFC frequency pairs but is not encoded in any isolated frequency band, suggesting that CFC may serve as a method to enhance the neural representation of the attended speaker position.
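
As an illustration of the kind of phase-amplitude coupling measure referred to above, the following is a minimal sketch and not the authors' implementation. The abstract does not define the normalized modulation index or its time-series modification, so the sketch assumes a mean-vector-length index normalized by the amplitude energy, which bounds it between 0 and 1. The function name `normalized_pac_index` and the synthetic signals are hypothetical, and the phase and amplitude series are assumed to have been extracted beforehand (for example, by band-pass filtering and a Hilbert transform).

```python
import cmath
import math


def normalized_pac_index(phase, amplitude):
    """Mean-vector-length coupling index, normalized by amplitude energy.

    phase:     instantaneous phase (radians) of the low-frequency band
    amplitude: instantaneous amplitude of the high-frequency band
    Returns a value in [0, 1]; 0 means no phase preference of the amplitude.
    NOTE: this normalization is an assumption, not the paper's stated formula.
    """
    if len(phase) != len(amplitude) or len(phase) == 0:
        raise ValueError("phase and amplitude must be non-empty and equal in length")
    n = len(phase)
    # Sum of amplitude-weighted unit vectors pointing along the phase.
    vector_sum = sum(a * cmath.exp(1j * p) for p, a in zip(phase, amplitude))
    energy = sum(a * a for a in amplitude)
    if energy == 0:
        return 0.0
    # Cauchy-Schwarz guarantees |vector_sum| <= sqrt(n * energy).
    return abs(vector_sum) / math.sqrt(n * energy)


# Synthetic example (hypothetical data): five full cycles of a slow rhythm.
n_samples = 1000
slow_phase = [2 * math.pi * 5 * k / n_samples for k in range(n_samples)]

# Amplitude that peaks at phase 0 (coupled) versus a flat amplitude (uncoupled).
coupled_amp = [1.0 + math.cos(p) for p in slow_phase]
flat_amp = [1.0] * n_samples

coupled = normalized_pac_index(slow_phase, coupled_amp)
uncoupled = normalized_pac_index(slow_phase, flat_amp)
print(f"coupled: {coupled:.3f}, uncoupled: {uncoupled:.3f}")
```

In this toy case the coupled signal gives an index of about 0.41, while the flat amplitude gives a value that is zero up to floating-point error. Computing such an index in sliding windows would be one way to obtain a time series suitable as a decoder input, although the abstract does not say whether the authors did this.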