Introduction

Emotion is a complex psychological and physiological state that drives subjective experiences, physiological responses, and expressive behaviours1. According to constructionist theories of emotion, emotion categories are not discrete biological entities: there is no consistent mapping between specific emotion categories and dedicated biological mechanisms, and emotions are instead constructed by more general brain networks2. As constructed phenomena, emotions involve allostasis, abstract categorisation, and social learning3,4,5. Humans typically use symbolic language to support the recognition, understanding, and expression of emotions6, integrating complex physiological and psychological responses into specific emotion categories. Access to these language-supported categories can give rise to different experiences drawn from a multidimensional feature space (emotional states, physiological responses, autonomic nervous system activity, and facial behaviours)7,8,9. Jackson et al. used a colexification approach (colexification being the phenomenon in which a language uses the same word to name semantically related concepts) to reveal extensive differences in emotional semantics across 20 global language families: the networks of emotion concept colexification show different patterns of association across language families10.

Russell's circumplex model of affect provides a systematic representation of how external information is perceived, processed cognitively, and responded to emotionally. It holds that affect is essentially a neurophysiological state composed of two dimensions: valence (pleasure vs. displeasure) and arousal (physiological activation)11. These two dimensions combine to form core affect12, which underlies emotional responses and represents a fundamental, intrinsic emotional experience. Any emotional state can thus be mapped as a point in the two-dimensional space defined by valence and arousal and interpreted as a specific emotion concept.
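The geometry of the circumplex can be made concrete with a short Python sketch. It assumes that valence and arousal are expressed as signed coordinates centred on a neutral origin (the text does not specify a scaling) and converts a point to polar form: the angle indicates which kind of affect the point represents, and the distance from the origin indicates its intensity. The function names and the eight-sector split are illustrative assumptions, not part of Russell's model as described here.

```python
import math

def to_polar(valence: float, arousal: float) -> tuple[float, float]:
    """Return (angle_deg, intensity) for a point in valence-arousal space.

    The angle is measured counter-clockwise from the positive valence axis,
    so 90 degrees is high arousal and 180 degrees is pure displeasure.
    """
    angle = math.degrees(math.atan2(arousal, valence)) % 360.0
    intensity = math.hypot(valence, arousal)
    return angle, intensity

def octant(valence: float, arousal: float) -> int:
    """Index (0-7) of the 45-degree sector centred on each axis and diagonal."""
    angle, _ = to_polar(valence, arousal)
    return int(((angle + 22.5) % 360.0) // 45)
```

Splitting the circle into eight sectors mirrors the eight-descriptor layout used later for soundscapes, but any number of sectors could be used, since the model itself treats the space as continuous.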
Because valence and arousal arise from independent neurophysiological systems shaped by biological evolution, different cultures may universally distinguish emotional states along these dimensions10.

Sound is a powerful elicitor of emotion. In urban spaces, soundscapes are composed of multiple dynamic sound sources that vary over space and time13,14,15. Soundscape perception is a continuous, dynamically evolving process involving both conscious and unconscious evaluations16. Through this process, individuals interpret the acoustic environment within a given context that encompasses physiological, psychological, social, and cultural dimensions. Because perception is complex, even a single sound source can elicit multidimensional emotional experiences. To measure these intertwined emotional cues, soundscape researchers took the circumplex model of affect attributed to environments as a starting point17 and developed an effective tool, the Soundscape Affective Quality (SAQ) model, to identify and quantify the emotional responses elicited by specific soundscapes. This model defines eight Soundscape Affective Descriptors (SADs) within a two-dimensional circular space spanned by Pleasantness and Eventfulness18,19,20. Analogous to valence, Pleasantness reflects the positive or negative affect elicited by a soundscape; analogous to arousal, Eventfulness reflects the state of alertness or energy it induces. Figure 1a compares the circumplex model of affect with the SAQ model. Compared with the general circumplex model of affect, emotion measurement instruments developed for specific domains (such as this one for urban soundscape studies) can capture the complex emotional experiences associated with particular stimuli more accurately. The value of domain-specific instruments has also been demonstrated in music and olfaction21,22.

Fig. 1: Background framework. a Frameworks of affective quality in different domains. The green framework represents the environmental affective quality framework and its constructed model, also known as the structure of core affect12. The red framework represents the soundscape affective quality framework and its constructed model18. b Soundscape Affective Quality (SAQ) assessment results for the 13 languages that showed "high confidence" results. Each point represents a sample, and the shaded areas are contours enclosing 50% of the samples. Geographically close European countries showed similar distributions. (Data source: Soundscape Attributes Translation Project (SATP) dataset26.)

Culture shapes individuals' emotional experiences and expressions in different contexts through distinct frameworks for understanding emotions and norms for expressing them23. Likewise, cultural differences affect the way individuals perceive, express, and evaluate SAQ. Careful consideration therefore needs to be given to "transplanting" specific measurement instruments or theoretical frameworks from one cultural context to another, i.e., to the "imposed-etic" approach. In a recent global collaboration, the Soundscape Attributes Translation Project (SATP)24,25, a list of standardised soundscape affective descriptors was translated into 18 languages, and native speakers from various countries evaluated the same soundscape stimuli using their respective language versions of the SAQ scale. The results, however, revealed significant differences in soundscape affective quality ratings between Chinese participants and those from other countries, as illustrated by the density plot of soundscape quality ratings26 (Fig. 1b).
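Ratings collected with an eight-descriptor SAQ scale are commonly reduced to a single point in the Pleasantness–Eventfulness plane before being plotted, as in the density plot of Fig. 1b. The sketch below follows the projection given in ISO/TS 12913-3 for the eight English attributes of ISO/TS 12913-2, assuming 5-point ratings from 1 to 5. The text does not say which projection produced Fig. 1b, so treat this as an illustration rather than the SATP procedure; the function name is hypothetical. Each attribute contributes along its position on the circle (the four diagonal attributes are weighted by cos 45°), and dividing by 4 + √32 rescales both coordinates to [-1, 1].

```python
import math

# The axis attributes (pleasant/annoying, eventful/uneventful) count fully;
# the diagonal ones (calm, chaotic, vibrant, monotonous) are weighted by cos 45°.
COS45 = math.cos(math.radians(45))
NORM = 4 + math.sqrt(32)  # largest raw magnitude when ratings run from 1 to 5

def iso_coordinates(r: dict[str, float]) -> tuple[float, float]:
    """Project eight attribute ratings (1-5) onto (Pleasantness, Eventfulness)."""
    pleasant = ((r["pleasant"] - r["annoying"])
                + COS45 * (r["calm"] - r["chaotic"])
                + COS45 * (r["vibrant"] - r["monotonous"])) / NORM
    eventful = ((r["eventful"] - r["uneventful"])
                + COS45 * (r["chaotic"] - r["calm"])
                + COS45 * (r["vibrant"] - r["monotonous"])) / NORM
    return pleasant, eventful
```

A translated scale keeps these angular positions and swaps in the target-language descriptors, which is why differences in how a culture's descriptors actually arrange themselves can distort the resulting coordinates.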
The divergence of the Chinese ratings prompted closer consideration of two possible explanations. First, Chinese soundscape emotion may differ structurally from that of other cultures, which may stem from differences in the soundscape affective semantic space. Second, such differences in affective semantic space may cause imported instrument models to produce biased assessments of SAQ.

From an emic perspective, it is essential to explore the emotional salience and motivational forces within cultural scripts rather than assuming them. This involves adopting an insider's view of culture, presenting human behaviour and subjective experiences from the perspectives of actors, intenders, and subjects of attention, and exploring "what life is like there—what features are salient to its inhabitants"27,28.

Herein, through the emic approach, we explore the Chinese soundscape affective semantic space, grounded in specific cultural phenomena and experiences, using local input items (Chinese SADs) and representative soundscape excerpts (SEs) from Chinese urban open spaces, and we validate its robustness by further grouping the data into sub-datasets. The structure of Chinese SAQ is then established from this semantic space, and assessments made with the indigenously derived scale (indigenous Chinese SAQ scale, ICSAQ scale) and the imported scale (translated global SAQ scale, TGSAQ scale) are compared. The results reveal significant structural differences in soundscape emotional experiences between Chinese and Western cultures, confirming the need for culturally appropriate environmental affective assessment tools.

Results

Construction of semantic space of soundscape affective quality in the Chinese context

To carry out the experiment from an indigenous perspective, SEs and a list of SADs had to be established locally. The methodological framework is shown in Fig. 2.
To capture the diverse and characteristic soundscapes of China, we recorded 424 SEs in 13 Chinese provinces, each comprising a 30-second binaural audio recording and a panoramic image of the surrounding visual context. After screening, 132 high-quality SEs representing 17 urban forms were selected for subsequent experiments. The indigenous SAD list was then created through adjective extraction, focus group interviews, and online screening experiments, yielding 108 potential SADs suitable for describing Chinese soundscapes (see Table S9 for explanations of all descriptors).

Fig. 2: Process flow of the research method. a Collection and analysis of indigenous soundscape excerpts (SEs); b solicitation and screening of indigenous soundscape descriptors; c listening experiment.

With the indigenous SEs and SADs in hand, a listening experiment was carried out. Participants rated each SE-SAD pair according to how well the SAD matched their perception of the soundscape. The SADs assessed comprised the 108 developed here and the eight translated in the SATP project.

Principal component analysis (PCA) was then applied to the listening-experiment dataset. The Kaiser‒Meyer‒Olkin value was 0.847 (> 0.6). The results of Bartlett's test of sphericity were significant (χ² = 21695.1, p 0.85, where Va denotes the variance of each SAD that can be interpreted by the corresponding PC. SADs in Zone 1 are not well explained by the PCs.

Fig. 3: Soundscape semantic space models. a Soundscape semantic space of China: loadings of the 108 Chinese soundscape affective descriptors (SADs) on components 1 and 2.
The figure is divided into three zones according to the length of the component loading vectors of the 108 Chinese SADs (Va: the distance to the origin): Zone 1 (light red squares), Zone 2 (red squares), and Zone 3 (dark red squares), where Va² represents the variance of each attribute explained by the corresponding components. b Soundscape semantic spaces of Europe19 and China, showing the loadings of 116 English SADs and 108 Chinese SADs on components 1 and 2. Green circles represent English SADs; red squares denote Chinese SADs. c Soundscape semantic space of the sub-datasets, showing the loadings of the 16 clustering descriptors on components 1 and 2.

The two PCs each embody a different dimension of soundscape emotional experience, an interpretation based on the similarity of the results to established emotion constructs12,30. In the circular space, the SADs yirende/宜人的, shushide/舒适的, shuxinde/舒心的, xuannaode/喧闹的 and hunluande/混乱的 (in descending order of their loadings on PC 1) were best explained by PC 1. When a soundscape is described as yirende/宜人的 or shushide/舒适的, this may indicate that it has a positive effect on the individual. Conversely, if it is described as hunluande/混乱的 or similar, this may indicate that it is causing negative affect such as stress and anxiety. Together, these SADs describe the positive or negative affective state elicited by the soundscape; PC 1 was therefore labelled 舒适感 (Comfort). The SADs loading most strongly on PC 2 include fengfude/丰富的, renaode/热闹的, dandiaode/单调的, wubianhuade/无变化的 and chenmende/沉闷的. fengfude/丰富的 and renaode/热闹的 indicate the variety and activity of the soundscape, which tends to stimulate physiological responses and mental alertness. dandiaode/单调的, wubianhuade/无变化的 and chenmende/沉闷的, on the other hand, may reflect low levels of physiological and psychological response.
Together, these adjectives describe how far a soundscape triggers physiological activity and mental alertness; PC 2 was therefore labelled 丰富感 (Richness).

We name the model presented in Fig. 3a the semantic space of SAQ of China. Plotting the affective states in the Cartesian space formed by the two underlying dimensions suggests a circular rather than a simple linear structure. In this circular space, the Chinese SADs are meaningfully arranged, attributing the various affective states of soundscapes to different combinations of Comfort and Richness. The first quadrant contains soundscapes generally perceived as comfortable and rich, while the opposite quadrant contains various annoying soundscapes.

Soundscape affective quality comparisons between regions and within China

The soundscape perception scale currently adopted worldwide originated from the soundscape semantic space model developed by Axelsson et al. in Europe19. In that model, PC 1 (Pleasantness) explained 50% of the variance and was best represented by "uncomfortable", "comfortable", "appealing", "disagreeable" and "inviting"; PC 2 (Eventfulness) explained 18% of the variance and was best represented by "eventful", "lively", "uneventful", "full of life" and "mobile".

The arrangement of SADs in the circumplex space reveals cross-cultural differences in the composition of soundscape affective states. Specifically, the second and fourth quadrants of the Chinese soundscape semantic space contain fewer SADs. In addition, SADs such as hunluande/混乱的, xuannaode/喧闹的, chaonaode/吵闹的, shushide/舒适的, pinghede/平和的, wenhede/温和的 and xianshide/闲适的 lie close to the axes, which is not the case in the European space (Fig. 3b). Caution is needed in interpreting the specifics of the PCs.
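The quantities compared across the European and Chinese spaces (component loadings and variance explained), together with the KMO and Bartlett checks reported for each PCA, can be sketched with NumPy. This is a minimal sketch, not the authors' analysis code: it assumes a ratings matrix with one row per observation and one column per SAD, extracts unrotated components from the correlation matrix, and computes Va as the length of each descriptor's loading vector on the first two components. The function names are hypothetical, and a p-value for Bartlett's statistic would come from a chi-square distribution (for example via `scipy.stats.chi2.sf`).

```python
import numpy as np

def kmo(R: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)            # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)      # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + p2))

def bartlett_sphericity(R: np.ndarray, n: int) -> tuple[float, int]:
    """Bartlett's chi-square statistic and df for H0: R is an identity matrix."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)           # log|R|, numerically stable
    chi2 = -((n - 1) - (2 * p + 5) / 6) * logdet
    return float(chi2), p * (p - 1) // 2

def pca_loadings(X: np.ndarray, k: int = 2):
    """Unrotated loadings, variance explained and Va on the first k components.

    X has one row per observation and one column per descriptor. Loading signs
    are arbitrary, so a component may need flipping before it is labelled.
    """
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)         # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    loadings = eigvec[:, :k] * np.sqrt(eigval[:k])
    explained = eigval[:k] / eigval.sum()      # share of total variance
    va = np.sqrt(np.sum(loadings ** 2, axis=1))  # va**2 = variance explained per descriptor
    return loadings, explained, va
```

Because each descriptor's squared loadings over all components sum to 1 in a correlation-matrix PCA, Va on two components always lies between 0 and 1, which is what makes a zone split by vector length meaningful.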
Although the underlying dimensions of the two semantic spaces carry different labels, Comfort and Pleasantness both reflect the positive or negative tendency (valence) of soundscape emotional experience, whereas Richness and Eventfulness both represent the intensity of soundscape affective activation (arousal).

Based on regional distribution and urban spatial type, we split the data into pairs of sub-datasets in two ways: South vs. North, and natural vs. artificial. Each pair together makes up the complete dataset. Cluster analysis of all samples identified 16 SAD clusters, and PCA was performed on these 16 clusters in each of the four sub-datasets. Each PCA yielded two PCs, and all sub-datasets passed the KMO and Bartlett tests. In all four sub-datasets, the two PCs can be labelled Comfort and Richness. The variance explained by the two PCs ranges from 40.9% to 50.3% and from 26.1% to 39.8%, respectively, and together they explain 76.3–80.7% of the total variance. In addition, the SADs show very similar distributions in the soundscape semantic space across the four types of context (Fig. 3c).

Construction of an indigenous Chinese soundscape affective quality scale

Figure 4a summarises the proposed Chinese SAQ structure, in which eight SADs are placed in circular order at approximately 45° intervals, forming an equally spaced circular structure. This structure provides a network of testable propositions to which all SADs are related: the affective quality attributed to a particular soundscape can fall at any point in the space, and any SAD can be represented as a vector originating from the centre of the circle. The model does not aim to capture all emotion; rather, it aims to provide a means of describing or conceptualising people's perceptions of SAQ at the most general level.
Empirical evidence supports circumplex models of this kind, showing, for example, that their affective space is bipolar and that the two principal axes are independent and of comparable significance in the semantic space31,32. The SAQ structure discussed here is limited to eight SADs, but the circumplex model in principle allows unlimited subdivision, and any SAD in the soundscape semantic space can be used for measurement. Fig. 4a shows the order in which these SADs form the circumplex; the eight SADs are pingdande/平淡的, youqude/有趣的, wuliaode/无聊的, renaode/热闹的, shushide/舒适的, hunluande/混乱的, fengfude/丰富的 and dandiaode/单调的. The ring structure can also be defined by two independent dimensions: "shushide/舒适的-hunluande/混乱的" and "fengfude/丰富的-dandiaode/单调的".

Fig. 4: Structure of soundscape affective quality scales and correlation heatmap. a1 Indigenous Chinese soundscape affective quality scale (ICSAQ scale), showing the eight descriptors and their circular arrangement; a2 translated global SAQ scale (TGSAQ scale)25, showing the eight descriptors and their circular arrangement; b analysis of the indigenous SAQ structure: heatmap of the correlations and clustering among the eight SADs; c correlation analysis of the indigenous and imported scales: heatmaps of the correlations between the two PCs and the results of cluster analysis.

If the eight SADs were exactly 45° apart and measured without error, their correlations would conform to a specific pattern in the correlation matrix33. In practice, however, psychological data are not geometry: owing to the complexity of mental structures, they do not necessarily follow a strict circumplex structure34,35. As shown in Fig. 4b, the observed correlation matrix is very close to the pattern expected from the circular ordering of the variables.
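A common idealisation makes that pattern concrete: if the eight SADs sit exactly 45° apart and share the same communality h², the implied correlation between descriptors i and j is r_ij = h² cos(θ_i − θ_j). Adjacent descriptors then correlate at about 0.71h², descriptors 90° apart at zero, and opposite descriptors at −h². The sketch below assumes that idealisation (ref. 33 may specify the pattern differently), builds the expected matrix for variables listed in circular order, and reports the root-mean-square off-diagonal deviation of an observed matrix from it. The function names are hypothetical.

```python
import math

def expected_circumplex(n: int = 8, communality: float = 1.0) -> list[list[float]]:
    """Correlation matrix implied by n variables equally spaced on a circle.

    Variables must be listed in their circular order; off-diagonal entries are
    communality * cos(angle between the two variables).
    """
    step = 2 * math.pi / n
    return [[1.0 if i == j else communality * math.cos((i - j) * step)
             for j in range(n)]
            for i in range(n)]

def rmse_off_diagonal(observed: list[list[float]], expected: list[list[float]]) -> float:
    """Root-mean-square difference between two matrices, ignoring the diagonal."""
    n = len(expected)
    sq = [(observed[i][j] - expected[i][j]) ** 2
          for i in range(n) for j in range(n) if i != j]
    return math.sqrt(sum(sq) / len(sq))
```

A small deviation indicates that the observed correlations follow the circular ordering; the communality parameter absorbs measurement error, which lowers all off-diagonal correlations in proportion.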
The eight SADs formed four bipolar scales whose Cronbach's alpha estimates were adequate (Supplementary Information Table S1), and the correlations between the scales were roughly in line with the expected pattern. The reliability of the pairs ranged from 0.70 to 0.84 (values of 0.70 or above are conventionally regarded as acceptable).

Measurement bias of the translated global SAQ scale

When SAQ measurement instruments are applied in another culture, measurement bias and equivalence become important issues36,37. Here we compared the developed ICSAQ scale with the translated scale (TGSAQ scale). The affective quality of each of the 132 SEs was scored with both scales (Fig. 4a). Pearson correlation (Fig. 4c) revealed high correlation coefficients between the Comfort and Pleasantness dimensions (0.887) and between the Richness and Eventfulness dimensions (0.843), suggesting that both instruments effectively measure the same construct. This does not, however, mean that the two scales are equivalent. The underlying dimensions of the two scales carry different labels, and structural bias may occur when the constructs being measured do not overlap exactly across cultures38. In addition, some particular SADs differ, and different emotional response styles may give rise to instrument bias (method bias) and item bias. Any combination of construct, method, or item bias present in the TGSAQ scale can bias test interpretation39. These potential biases are assessed below.

The Shapiro‒Wilk test was performed on the whole dataset, and no variable deviated significantly from a normal distribution (p > 0.05). On this basis, a paired-samples t test (ttest_rel) was used to compare the results of the two measurement instruments on the same samples, and significant differences (p 0.05).
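The reliability and paired-comparison statistics named above can be sketched in plain Python. The sketch assumes that each bipolar scale pairs a descriptor with its opposite (for example shushide/舒适的 with hunluande/混乱的), reverse-keyed by a sign change, which leaves Cronbach's alpha independent of the rating range; the helper names are hypothetical. The t statistic is the same one SciPy's `ttest_rel` computes (SciPy also returns the p-value), and a normality check such as Shapiro‒Wilk is left to a statistics library, for example `scipy.stats.shapiro`.

```python
import math
from statistics import mean, stdev, variance

def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items holds one score list per item, same observations."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # scale score per observation
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def reverse(scores: list[float]) -> list[float]:
    """Reverse-key an item by sign change (its variance is unchanged)."""
    return [-s for s in scores]

def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for the differences a - b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1
```

For example, `cronbach_alpha([comfortable, reverse(chaotic)])` would give the alpha of one bipolar scale, and `paired_t(icsaq_scores, tgsaq_scores)` the t statistic for one dimension scored on the same 132 SEs by both instruments (all variable names hypothetical).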
The LAeq of natural spaces ranged from 44.6 to 70.5 dB (mean = 57.9 dB, SD = 6.5 dB), whereas that of artificial spaces ranged from 46.3 to 77.7 dB (mean = 63.4 dB, SD = 7.6 dB); the difference in LAeq between the two space types is statistically significant (F = 19.903, p = 0.000