Introduction

The Tanka people, or boat people (水上人), predominantly reside in Southeast China. Officially classified as part of the Southern Han, they have traditionally settled across Guangdong, Guangxi, Fujian, Hainan, the coastal regions along the Pearl River Delta, and the Special Administrative Regions of Hong Kong and Macau (Anderson, 1970; Chan, 2012; Huang and Xu, 2018; Yang, 2019; Yao, 2022; Zhuang, 2009). A small population of Tanka people can also be found in Vietnam, where they are recognized as Đàn, a subgroup of the Ngai people (Kani, 1967). The Tanka carry a distinctive maritime heritage, and their language functions not merely as a communication tool but as a repository of ancestral knowledge and ritual and as a source of social cohesion. While many Tanka people have transitioned to a land-based lifestyle, the older generations continue to preserve their maritime traditions. This shift has, at times, led to their marginalization, and the community has been labeled “water gypsies” (Chen et al., 2022; He and Faure, 2016; He et al., 2022).

The Tanka people are believed to be among the earliest inhabitants of South China, though their origins remain a subject of ongoing research and debate (Cheung, 1991; Fung, 2013; He et al., 2022; Kani, 1967; Luo, 1929; Luo et al., 2020; McCoy, 1965). While previous studies have primarily focused on the cultural practices of the Tanka, their language, particularly in the Hong Kong context, has been largely overlooked. This gap matters because the erosion of the Tanka language represents more than linguistic loss. It threatens the transmission of cultural memory, undermines intergenerational bonds, and risks severing the community’s connection to its historical identity. In Hong Kong, the Tanka community is predominantly distributed across the New Territories, with significant populations in areas such as Tai Po, Sai Kung, Tai O, Tap Mun, and Cheung Chau (see Fig. 1).
Although the Tanka language is one of Hong Kong’s original cultural languages, it is now endangered, with only around 1,125 speakers remaining (see Fig. 2). This decline is attributed to sustained contact with Cantonese, one of Hong Kong’s official languages. That contact has also shaped distinctive features of the language (Census and Statistics Department, 2022; Wang et al., 2024; Ward, 1986).

Fig. 1: The distribution of the Tanka communities in Hong Kong.

Fig. 2: Population of Tanka people over time (1961–2021). The data for this figure were sourced from the Census and Statistics Department of the Government of the Hong Kong Special Administrative Region.

This paper is organized as follows:

- Section “Tanka people and their language in Hong Kong” reviews existing literature on the Tanka people’s transition from a maritime to a land-based lifestyle and its impact on their language.
- Section “Theoretical framework” introduces the theoretical framework of this study.
- Section “The present study” outlines the research questions and objectives.
- Section “Methods” details the methodology, including informant selection, data collection, and the tools used for phonological and lexical analysis.
- Section “Results” presents the results, focusing on the Tanka phonological system and lexical convergence with Cantonese.
- Section “Discussion” discusses the findings within historical phonological and lexical contexts.
- Finally, Section “Conclusion” summarizes the key findings, highlights implications for future research, and emphasizes the importance of preserving the Tanka language as part of Hong Kong’s cultural heritage.

Tanka people and their language in Hong Kong

The Tanka people are one of Hong Kong’s four original ethnic groups, alongside the Hoklo, Hakka, and Wai Tau communities. Historically, they have resided in the coastal areas of the New Territories (Wang et al., 2024).
However, the urbanization of Hong Kong in the 1960s severely affected the local fishing industry, which had been central to the livelihood of the Tanka people (Lo, 2019). In response, the government implemented initiatives encouraging Tanka fishermen to leave their traditional maritime lifestyle and integrate into the broader Hong Kong community. One of the immediate challenges they faced was the need to learn Cantonese, which led to a rapid decline in the use of the Tanka language. Today, it is rarely spoken in formal contexts and is almost entirely absent from school curricula (Wang et al., 2024). As a result, concerns have emerged over the potential extinction of both the Tanka language and the distinct cultural identity it represents (Cheung, 2020; Wang et al., 2024).

Despite these concerns, research on the Tanka language remains limited, with only a few studies describing its phonology and lexicon (Chang and Zhuang, 2003; Dai and Wang, 2024; Fung, 2015; Zhuang, 2009). Together, these studies provide a foundation for understanding Tanka’s phonological and lexical systems, but they are largely descriptive. By comparing previous research on Tanka phonology with our recent fieldwork data, we observed that Tanka varieties spoken in different regions of Hong Kong share several phonological features. For instance, the Middle Chinese dental nasal initial n- and retroflex nasal initial nr- have both merged into /l-/ in many Tanka varieties, including those spoken in Cheung Chau (Lok, 2010) and Shek Pai Wan (Fung, 2013). This is evident in 難 “difficult”, realized as [lan33] in Cheung Chau (Lok, 2010, p. 13), and in 南 “south”, realized as [lam21] in Shek Pai Wan (Fung, 2013, p. 11). Furthermore, in the Tanka varieties spoken in Cheung Chau, Tai O, and Shek Pai Wan, the Cantonese vowel /y/ regularly corresponds to /i/.
This correspondence is illustrated by the following examples:

- 魚 “fish” is realized as [jwi35] in Tai O (Chan, 2012, p. 13).
- 鼠 “mouse” is realized as [ʃi35] in Cheung Chau (Lok, 2010, p. 17).
- 豬 “pig” is realized as [tʃi55] in Shek Pai Wan (Fung, 2013, p. 12).

Our recent survey shows that the Tanka variety spoken in Sam Mun Tsai exhibits the same patterns (see Section “Historical phonological comparison of Tanka and Cantonese” for details).

Despite these shared features, there are also notable differences between the Sam Mun Tsai variety and the other Hong Kong Tanka varieties. For instance, the final stops -p, -t, and -k have been entirely lost in Sam Mun Tsai Tanka. In the Tai O, Cheung Chau, and Shek Pai Wan varieties, however, the codas -t and -k are still preserved (Chan, 2012; Fung, 2013; Lok, 2010).

The Sam Mun Tsai variety itself has also been described before. A comparison between our findings and those of Zhuang (2009) suggests that the Tanka language spoken in Sam Mun Tsai is undergoing rapid phonological change under the influence of Cantonese. Zhuang’s description of the phonological system diverges from our data, which reflect a more pronounced Cantonese influence (see Section “Historical phonological comparison of Tanka language and Cantonese” for more details).

Fung (2015) lists some examples of Tanka vocabulary with their IPA transcriptions and Chinese glosses. These include 天臭 [thin55 tʃhɐu33]: 天色不佳 “bad or gloomy weather” (p. 283) and 他魚 [thai35 ji21–35]: 下桿釣魚 “to start fishing with a rod” (p. 288; see footnote 1). However, his study does not systematically examine the culture-specific Tanka lexicon in Hong Kong. Nor does it investigate the cultural implications, morphological features, or semantic characteristics of these lexical items.
Additionally, no existing study has used a quantitative lexical comparison to explore the relationship among Hong Kong Tanka, Hong Kong Cantonese, and Mandarin. This limits our understanding of the linguistic dynamics at play. Prior research has also largely overlooked intergenerational differences in lexical usage, which are essential for understanding language contact and linguistic change over time.

By addressing these gaps, this study aims to offer a more comprehensive and nuanced understanding of the evolving Tanka vocabulary in Hong Kong. In doing so, it contributes to broader discussions of language contact, change, and cultural preservation. Given the increasing interaction between Tanka and Cantonese speakers, further research on contact between the two languages is essential. Moreover, effective preservation of the Tanka language and culture requires comprehensive documentation and analysis of the Tanka language in Hong Kong.

Theoretical framework

This study adopts language contact theory (Thomason and Kaufman, 1988) as its primary framework for explaining phonological and lexical changes in the endangered Tanka language under sustained Cantonese influence. Building on the premise that prolonged contact can drive change in both languages, the framework adds complementary perspectives. Each of these addresses a different dimension of the research questions.

Historical linguistics provides diachronic context by tracing the Tanka language and Cantonese to their shared Middle Chinese origins. This makes it possible to distinguish contact-induced innovations from inherited or independent developments. The resulting diachronic baseline directly informs the contact-based analysis of present-day variation.
Lexicostatistics, in turn, adds quantitative rigor by measuring lexical convergence (e.g., loans between Cantonese and Tanka) and divergence. This allows contact theory’s predictions about the intensity of change to be tested empirically.

The framework further integrates socio-cultural dimensions through two lenses. Endangered language studies (Crystal, 2000) place the linguistic findings within the urgent need for preservation, so that academic inquiry also serves as documentation of cultural heritage. Complementing this ethical focus, cultural linguistics shows how Tanka’s maritime lexicon encodes cultural resilience. It identifies domains of high cultural salience that resist external influence, which underscores the urgency of preservation.

By integrating these perspectives, the study addresses the phonological and lexical aspects of the Tanka language and also situates them within broader socio-historical and cultural contexts. This integrated structure allows four kinds of evidence to be examined together: structural shifts, quantitative patterns, historical trajectories, and cultural identity. Taken together, they illuminate contact dynamics within an endangered language ecosystem.

The present study

The Tanka people, a distinctive group in Hong Kong, possess a unique culture, language, and way of life deeply connected to the sea. Despite their significant contributions to Hong Kong’s cultural heritage, their language remains underexplored. This is particularly true of the phonological and lexical influence of Cantonese. The gap is concerning because the language faces an imminent threat of extinction, driven by the dominance of Cantonese in Hong Kong and the community’s shift away from its traditional fishing lifestyle. The decline of the Tanka language represents not only the loss of a means of communication but also the erosion of a key aspect of Hong Kong’s cultural diversity.
Preserving this language is therefore crucial to maintaining the cultural diversity of the region.

In light of this, the present study addresses the following research questions (RQs):

RQ1: What are the phonological features that distinguish the Tanka language from Cantonese in Hong Kong?
RQ2: How does the Tanka lexicon reflect the community’s unique cultural and historical background, and how does it differ from Cantonese?
RQ3: What is the extent of linguistic influence between the Tanka language and Cantonese in phonology and vocabulary, and what insights can be drawn from this contact?

Methods

Informants

To be eligible, informants were required to meet the following criteria.

(1) Aged 55 or above. This threshold ensured that all informants had acquired and used the Tanka language during its final period of community-wide vitality. Both preliminary fieldwork and previous sociolinguistic research (Wang and Dai, 2024a; Wang et al., 2024) indicate that fluent native speakers of the Tanka language under the age of 55 are exceedingly rare. This is due to a rapid shift toward Cantonese that began in the 1960s and 1970s. The shift coincided with increased school enrollment, urban migration, and the sociopolitical marginalization of Tanka identity (Wang and Dai, 2024b). By the late 20th century, the Tanka language had been largely excluded from public domains and survived primarily among older generations in domestic and ritual contexts (Fung, 2015). Including younger participants risked introducing speakers with incomplete acquisition, passive competence, or Cantonese-influenced speech. This could compromise the phonological and lexical authenticity essential to language documentation.

(2) Lifelong residency in Hong Kong and upbringing in Tanka-speaking families.
This criterion ensured consistent exposure to the sociolinguistic environment of the Hong Kong Tanka community. It minimized interference from other dialects and maximized the capture of locally situated, authentic language use.

(3) The Tanka language as their first language (L1), with continued active use. Selecting informants who acquired the Tanka language natively ensures early, natural acquisition during the critical period of language development. Continued daily use further indicates that the language remains cognitively accessible and fluently produced. This is crucial for eliciting spontaneous, naturalistic speech rather than fossilized or semi-passive forms.

(4) Both parents being native speakers of the Tanka language. This requirement ensured uninterrupted intergenerational transmission of the language within the household. It reduced the likelihood of early exposure to dominant languages such as Cantonese or Mandarin and enhanced the internal consistency and linguistic purity of the data.

In addition to these core criteria, two supplementary factors were considered to enhance the ethnolinguistic representativeness of the sample. First, limited or no formal schooling was preferred, given the dominance of Cantonese, Mandarin, and English in Hong Kong’s education system. Minimizing exposure to formal instruction in other languages helped reduce potential lexical borrowing or phonological interference in Tanka speech. Second, informants with occupational backgrounds in fishing were prioritized, because fishing is a core domain of Tanka livelihood and cultural identity. These individuals were more likely to retain traditional vocabulary related to seafaring, weather, and ecological knowledge. Socio-economic status was not used as an inclusion or exclusion criterion, since the primary objective was to maximize linguistic authenticity and dialectal fidelity.
Table 1 presents some socio-demographic variables, such as birthplace and the languages used with different interlocutors, as additional context on the informants’ linguistic environments.

Table 1: Detailed information about the four informants.

To ensure the reliability and validity of the linguistic data, a two-stage screening protocol was conducted before formal data collection.

First, informants underwent a brief cognitive screening using the Mini-Cog, a clinically validated instrument for rapid assessment of cognitive impairment (Borson et al., 2003). This tool was selected for its efficiency, cultural adaptability, and minimal reliance on literacy. These qualities made it particularly suitable for elderly Tanka speakers, many of whom had received little or no formal education. The Mini-Cog includes two tasks:

(1) A three-item recall, in which participants were asked to repeat and later recall three culturally familiar words (e.g., “fish”, “boat”, “rice”).
(2) A clock-drawing task, in which participants drew a clock face showing a specified time (e.g., “11:10”).

Together, these tasks assess short-term memory, executive function, and visuospatial abilities, all of which are needed for coherent participation in extended linguistic interviews. The Mini-Cog thus allowed the research team to screen efficiently for cognitive deficits. It also confirmed that all informants had the cognitive capacity to engage meaningfully in the study and contribute linguistically valid data.

Second, language proficiency was assessed through semi-structured ethnographic interviews conducted entirely in the Tanka language by two native-speaking members of the research team. Informants were encouraged to narrate personal experiences, describe traditional fishing practices, and discuss kinship terms, foodways, rituals, and other culturally salient topics.
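The Mini-Cog screen described above reduces to a simple score. The paper does not report its scoring rule or cut-off. The sketch below therefore assumes the commonly used scheme: one point per word recalled without cues, and two points for a normal clock drawing (zero otherwise), giving a total of 0 to 5. A total below 3 is treated as a positive screen. The names `MiniCogResult`, `screens_positive`, and `eligible`, the default `cutoff`, and the example candidates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MiniCogResult:
    """One candidate's Mini-Cog screen (assumed standard scoring)."""
    words_recalled: int  # words recalled without cues, 0-3 (e.g. "fish", "boat", "rice")
    clock_normal: bool   # True if the clock drawing (e.g. "11:10") is judged normal

    def score(self) -> int:
        if not 0 <= self.words_recalled <= 3:
            raise ValueError("words_recalled must be between 0 and 3")
        # 1 point per recalled word, plus 2 points for a normal clock (0 otherwise).
        return self.words_recalled + (2 if self.clock_normal else 0)

    def screens_positive(self, cutoff: int = 3) -> bool:
        """True when the total falls below the (assumed) cutoff."""
        return self.score() < cutoff


def eligible(results: dict[str, MiniCogResult]) -> list[str]:
    """Names of candidates whose screen does not suggest impairment."""
    return [name for name, r in results.items() if not r.screens_positive()]


if __name__ == "__main__":
    # Hypothetical candidates, for illustration only.
    candidates = {
        "Candidate A": MiniCogResult(words_recalled=2, clock_normal=True),   # 4 points
        "Candidate B": MiniCogResult(words_recalled=2, clock_normal=False),  # 2 points
    }
    print(eligible(candidates))  # ['Candidate A']
```

Under this assumed scheme, Candidate B would be flagged and go no further. Passing the screen is only the first stage; the proficiency interview still follows.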
The proficiency interviews were audio-recorded and evaluated for fluency, lexical richness, grammatical accuracy, and overall comprehensibility. Informants who showed signs of cognitive impairment or inconsistent language production were excluded. All four selected informants demonstrated robust cognitive function and a strong command of the Tanka language. This made them reliable and representative speakers for the purposes of this study.

Ethical approval for the study was obtained from the university’s Human Research Ethics Program (Ref. no. 2021-2022-0262) on March 25, 2022, before data collection began. The approval process ensured compliance with ethical guidelines for research involving human participants. All informants were informed of the study’s objectives and procedures and of their rights, including the right to withdraw at any stage without consequences. To ensure informed consent, a Chinese-language consent form was provided and explained verbally to all informants. Confidentiality of personal data and interview content was strictly maintained throughout the research process. As a token of appreciation for their time and contributions, informants were offered a HKD 30 coupon for each hour of participation.

Instruments

Interviews were the primary data collection method. All informants were interviewed using a predetermined outline consisting of 1000 words and 9247 vocabulary items commonly used in linguistic fieldwork. Their responses were analyzed to identify phonological and lexical features of the Tanka language and the changes they are undergoing.

Phonology and lexical survey

Our study of the Tanka language’s phonological and lexical systems covered 1000 words and 9247 lexical items. The 1000 words were sourced from the Handbook for the Fieldwork of Language Resources in China (Language and Text Information Management Department of the Ministry of Education of China & China Language Resources Protection Research Center, 2015).
We added 8000 vocabulary items from multiple research resources:

- 1200 fundamental lexical items from the same handbook.
- 6800 vocabulary items from various dialectological studies (Dai and Wang, 2023; Huang, 2015; Institute of Linguistics in Chinese Academy of Social Sciences, 2003; Zhan, 2002/2004).

We also incorporated 1247 technical and cultural terms unique to the Tanka. These came from the Categorized Vocabulary List of Nanning Tanka language (Huang, 2015), the research literature, native Tanka speakers, and interviews. Together with the 8000 items above, they bring the vocabulary list to 9247 items.

Natural language survey

Our study collected two hours of natural language data on the discourse and oral culture of the Tanka people.

For the discourse section, data were collected through storytelling and dialogs on eight Tanka-related topics: local conditions, customs, foods, traditional holidays, personal experiences, work experiences, hobbies, and family. Participants first discussed one of four topics: local conditions, customs, foods, or traditional holidays. This was followed by group interviews in which any of the eight topics could be discussed freely. During this phase, speakers were given full freedom to express themselves. Their speech was recorded verbatim, with no intervention in its content, to allow maximum spontaneity.

In the oral culture section, data were primarily collected through “visual storytelling” tasks. Participants viewed a picture storybook, Frog, Where Are You? (Mayer, 1969). They also watched two silent films, The Pear Story (Chafe, 1980) and the traditional Chinese tale The Cowherd and the Weaver Girl. They then retold the stories in the Tanka language.
Videos and pictures, rather than text, were used as prompts for narrating the stories. This kept participants’ storytelling from being influenced by the written form of Chinese.

Data collection

The research was conducted over eight months, from July 2023 to February 2024, in two Hong Kong villages: Luen Yick Fisherman Village and Sam Mun Tsai San Tsuen. These locations were selected for the following reasons:

(1) Their large Tanka populations, which provided access to native speakers and ensured authentic data.
(2) Their rich cultural setting, in which the Tanka language and traditions remain integral to daily life, offering an in-depth understanding of the language’s cultural context.
(3) The notable lack of research on the Tanka language in these areas, which presented an urgent need for thorough exploration.
(4) The absence of significant differences between the Tanka varieties spoken in the two locations, apart from generational variation.

The interview sessions covered word pronunciation, vocabulary usage, sentence construction, storytelling, and spontaneous conversation. The aim was to capture the Tanka language in its most natural form and in its practical use in daily life. To prevent external influence, informants were not shown any written items from the interview list. Nor were they asked to translate from a commonly spoken language into the Tanka language. Instead, flexible elicitation techniques were used:

- nonverbal prompts for nouns;
- acting for verbs and adjectives;
- description for more complex linguistic elements.

The interview data were recorded, imported into an online database, and categorized with Mandarin annotations for efficient transcription and analysis.
All interviews were conducted in a quiet room with only the informant present. Recordings were made with Sony PCM-A10 recorders and wireless microphones at a sampling rate exceeding 48,000 Hz. These conditions were chosen to minimize environmental distractions and equipment interference.

Data transcription and analysis

The primary objective of our linguistic data collection was to provide phonetic transcriptions of words, phrases, and sentences in the International Phonetic Alphabet (IPA), following the symbols and conventions of its 2020 revision. This task was carried out by three researchers experienced in investigating the Tanka language and trained in linguistic transcription. The data were divided into three categories, with each researcher responsible for one. Each set of transcriptions was then peer-reviewed by the other two researchers. Once this process was complete, the first author conducted a final review of the transcriptions.

All four informants participated in the word transcription, whereas the vocabulary transcription relied mainly on data from Informants 1 and 3. We transcribed 1000 words and 9247 vocabulary items in order to establish the phonological system of the Tanka language spoken in Luen Yick Fisherman Village and Sam Mun Tsai San Tsuen, Hong Kong.

Following transcription, we identified the consonant, vowel, and tone phonemes by comparing minimal pairs. The resulting phonological system served as the basis for two comparative studies. The first comparison was generational, made possible by the different ages of our four informants. It revealed generational variation in the phonological system, which we hypothesize may be due to the influence of Cantonese.
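The generational comparison described above can be illustrated with a small sketch. It splits each form into onset, rhyme, and tone, then reports the slots in which a younger speaker’s form departs from an older speaker’s attested variants. This is an illustrative assumption, not the study’s actual procedure. The onset list and the functions `split_syllable` and `compare_speakers` are hypothetical. The data are the Informant 2 and Informant 4 forms of 熬, 去, and 取 reported in the Results.

```python
# Forms as reported in the Results: Informant 2 (older) vs. Informant 4 (youngest).
# Slash-separated variants in the text become lists here.
FORMS = {
    "熬 'endure'": {"Informant 2": ["au21"], "Informant 4": ["ŋau21"]},
    "去 'go, leave'": {"Informant 2": ["hœi32", "hui32"], "Informant 4": ["hœy32"]},
    "取 'take'": {"Informant 2": ["tʃ‘œi35", "tʃ‘ui35"], "Informant 4": ["tʃ‘œy35"]},
}

# Hypothetical onset inventory for splitting. Longer symbols come first, so that
# 'tʃ‘' is not read as 't'. Glides are left inside the rhyme.
ONSETS = ["tʃ‘", "ts‘", "tʃ", "ts", "t‘", "k‘", "p‘",
          "ŋ", "p", "t", "k", "m", "n", "l", "f", "s", "ʃ", "h"]


def split_syllable(form: str) -> tuple[str, str, str]:
    """Split a form such as 'hœy32' into (onset, rhyme, tone)."""
    body = form.rstrip("0123456789")
    tone = form[len(body):]
    onset = next((o for o in ONSETS if body.startswith(o)), "")
    return onset, body[len(onset):], tone


def compare_speakers(forms: dict[str, dict[str, list[str]]],
                     older: str, younger: str) -> dict[str, list[str]]:
    """For each item, list the slots (onset, rhyme, tone) in which the younger
    speaker's forms differ from the older speaker's attested variants."""
    report = {}
    for item, by_speaker in forms.items():
        old_variants = [split_syllable(f) for f in by_speaker[older]]
        slots = set()
        for form in by_speaker[younger]:
            if form in by_speaker[older]:
                continue  # shared form: no generational difference
            new = split_syllable(form)
            for old in old_variants:
                for name, a, b in zip(("onset", "rhyme", "tone"), old, new):
                    if a != b:
                        slots.add(name)
        if slots:
            report[item] = sorted(slots)
    return report


if __name__ == "__main__":
    print(compare_speakers(FORMS, "Informant 2", "Informant 4"))
```

On these three items, the sketch locates the 熬 difference in the onset (the new initial /ŋ/). It locates the 去 and 取 differences in the rhyme ([œy] against [œi]/[ui]). A fuller version would need special handling for syllabic nasals such as 五 [m24], which this simple onset split would misread.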
The second comparison positioned the Tanka language within the Sinitic languages by comparing its phonological system with that of Middle Chinese using diachronic data.

The transcribed vocabulary list is useful not only for establishing the phonological system but also for highlighting key vocabulary in the everyday use of the Tanka people. We divided our vocabulary collection into 14 subgroups and transcribed 500 unique Tanka lexical items. For specialized vocabulary less likely to be of Sinitic origin, we attempted to trace its etymology. We also used core vocabulary, specifically the Swadesh 207 list, for a synchronic comparison. Comparing the Swadesh 207 Tanka vocabulary with its Cantonese counterparts allowed us to assess the relationship between the two languages.

Results

Phonology of Tanka language

As mentioned in Section “Introduction”, previous studies have examined the Hong Kong Tanka language, including the phonological system of the Sam Mun Tsai variety documented by Zhuang (2009). However, comparing those data with our firsthand findings reveals several differences that reflect a stronger Cantonese influence in our data.

For example, according to Zhuang (2009), the three Cantonese rhymes /ɔi/, /œy/, and /ui/ had all merged into /ui/ in the Tanka language spoken in Sam Mun Tsai. Our study indicates that this pattern is changing. In the recordings of the elderly speakers (Informants 1, 2, and 3), words corresponding to Cantonese /ɔi/, /œy/, and /ui/ have two pronunciations, [œi] and [ui]. This suggests that the main vowel [œ] is beginning to emerge under Cantonese influence. For the youngest speaker (Informant 4), the rhyme [œy] is well established and aligns with its Cantonese counterparts. Zhuang (2009) also noted that the vowel system of the Tanka language in Sam Mun Tsai lacked the main vowel /y/.
However, our findings show that /y/ has started to appear in speakers’ pronunciation. For instance, 魚 “fish” can now be pronounced either [ji24] or [jy24]. Our documentation of the current phonological system of the Sam Mun Tsai Tanka language captures these significant changes over a relatively short period and highlights the substantial influence of Cantonese. This documentation supports the preservation of the Tanka language. It also draws the academic community’s attention to the language’s ongoing transformation and the need to preserve it.

This section therefore outlines the phonological system of the Tanka language in detail. The examination of the 1000 words and the vocabulary list yielded a phonological system of 16 consonants, 8 vowels, and 9 tones (including the entering tones).

Syllable structure

As in many Sinitic languages, the Tanka syllable can be divided into an initial and a final (or rhyme). The final, in turn, typically consists of a medial, a nucleus, and a coda. The initial position, when filled, is occupied by a single consonant. The medial is a glide, whereas the nucleus always consists of a vowel. The coda may be either a vowel or a consonant. The tone is associated with the entire syllable (see Fig. 3).

Fig. 3: Syllable structure in Tanka language.

A Tanka syllable may consist of an initial and a final that includes a medial, a nucleus, and a coda, as in 聲 [sieŋ44] “sound”. Syllables without an initial are also possible, as in 用 [joŋ32] “use”. A syllable can also consist of a single consonant in the initial position, as in 五 [m24] “five”.
Further patterns include syllables with only a medial and a nucleus (腰 [jiu44] “waist”), a nucleus and a coda (安 [ɔŋ44] “safe”), or a single nucleus (烏 [wu44] “dark”).

Consonants

The Tanka language has 16 consonants, listed in Table 2.

Table 2: Consonants in Tanka language.

All consonants except the glottal stop /ʔ/ can occur in initial position. The consonants /ʔ/, /m/, /n/, and /ŋ/ can also occur as codas. Notably, the coda /m/ occurs only in the speech of Informant 4, the youngest informant. Even there, it is not used consistently in the positions where it would be expected.

As in many Sinitic languages, the Tanka language has an aspiration contrast, as in 多 [tɔ44] “many” versus 拖 [t‘ɔ44] “pull”. The alveolar affricates /ts/ and /ts‘/ and the fricative /s/ vary freely with [tʃ], [tʃ‘], and [ʃ], respectively.

The phoneme /ŋ/ occurs as an initial only in the speech of the youngest informant (Informant 4), as in 熬 “endure”: Informant 2 [au21] – Informant 4 [ŋau21]. Even so, it does not appear consistently in all the positions where diachronic phonological development would predict it.

The initial [ʋ] is found only in the speech of one older informant (Informant 1), who replaces [w] with [ʋ] in certain words, as in 橫 “horizontal”: Informant 3 [wan21] – Informant 1 [ʋan21]. The glides [j] and [w] occur in initial and medial positions.
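The aspiration contrast and free variants described above show why minimal-pair testing, the method named in the analysis section, works best after free variants are normalized. Otherwise, [tʃ] and [ts] would look like a contrast. The sketch below is an illustration under stated assumptions, not the authors’ procedure.

- The segment inventory `UNITS`, the `FREE_VARIANTS` table, and the functions `segment` and `minimal_pairs` are hypothetical.
- Treating Informant 1’s [ʋ] as a variant of [w] is an assumption.
- Only unconditioned consonant variants are handled. Conditioned patterns, such as /o/ and /ɔ/ varying freely everywhere except before /ŋ/, are not modeled.

The test data are the 多/拖 pair above and Informant 4’s forms reported under Vowels.

```python
from itertools import combinations

# Multi-character segments, longest first; '‘' marks aspiration.
UNITS = ["tʃ‘", "ts‘", "tʃ", "ts", "t‘", "k‘", "p‘"]

# Free variants mapped to the phoneme they realise (assumed normalisation;
# [ʋ] is Informant 1's substitute for [w] in some words).
FREE_VARIANTS = {"tʃ‘": "ts‘", "tʃ": "ts", "ʃ": "s", "ʋ": "w"}


def segment(form: str) -> list[str]:
    """Split e.g. 't‘ɔŋ21' into phonemic segments plus a final tone token."""
    form = form.replace("’", "‘")  # accept either apostrophe for aspiration
    body = form.rstrip("0123456789")
    tone = form[len(body):]
    segs, i = [], 0
    while i < len(body):
        unit = next((u for u in UNITS if body.startswith(u, i)), body[i])
        segs.append(FREE_VARIANTS.get(unit, unit))
        i += len(unit)
    return segs + [tone]


def minimal_pairs(lexicon: dict[str, str]) -> list[tuple[str, str, str, str]]:
    """Return (word1, word2, segment1, segment2) for every pair of forms that
    differ in exactly one slot (a segment or the tone) after normalisation."""
    segmented = {word: segment(form) for word, form in lexicon.items()}
    pairs = []
    for (w1, s1), (w2, s2) in combinations(segmented.items(), 2):
        if len(s1) != len(s2):
            continue
        diffs = [(a, b) for a, b in zip(s1, s2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, *diffs[0]))
    return pairs


if __name__ == "__main__":
    print(minimal_pairs({"多": "tɔ44", "拖": "t‘ɔ44"}))  # [('多', '拖', 't', 't‘')]
    informant4 = {"裝": "tʃɔŋ44", "中": "tʃoŋ44", "張": "tʃœŋ44",
                  "肝": "kɔŋ44", "公": "koŋ44", "薑": "kœŋ44"}
    for pair in minimal_pairs(informant4):
        print(pair)
```

For Informant 4’s six forms, the sketch yields six vowel contrasts among /ɔ/, /o/, and /œ/. It also yields three initial contrasts. After normalization, the initial [tʃ] of 裝, 中, and 張 is reported as /ts/.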
In phonological transcription, we treat [j] and [w] as variants of the vowels /i/ and /u/, respectively.

Vowels

The Tanka language has eight vowel phonemes (see Table 3).

Table 3: Vowels in Tanka language.

The close-mid back rounded vowel /o/ and the open-mid back rounded vowel /ɔ/ contrast before the velar nasal coda /ŋ/, as the following examples show:

- 銅 [t‘oŋ21] “copper” – 糖 [t‘ɔŋ21] “sugar”
- 公 [koŋ44] “male” – 薑 [kɔŋ44] “ginger”
- 進 [tʃoŋ32] “enter” – 像 [tʃɔŋ32] “resemble”

In other environments, /o/ and /ɔ/ are in free variation.

The close front unrounded vowel /i/, the close-mid front unrounded vowel /e/, and the near-open central vowel /ɐ/ have the free variants [ɪ], [ɛ], and [ə], respectively.

The status of the open-mid front rounded vowel /œ/ is more complex. In the speech of the older informant (Informant 2), [œ] is a free variant of /u/ before the coda /i/. Examples are 去 [hœi32/hui32] “go, leave” and 娶 [tʃ‘œi35/tʃ‘ui35] “marry (a woman)”, although [ui] is the more common realization. In the speech of the younger informant (Informant 4), by contrast, /œ/ is a distinct phoneme. This is shown by minimal sets such as 裝 [tʃɔŋ44] “hold” versus 中 [tʃoŋ44] “middle” versus 張 [tʃœŋ44] (a surname), and 肝 [kɔŋ44] “liver” versus 公 [koŋ44] “male” versus 薑 [kœŋ44] “ginger”.

Notably, words with /œŋ/ in Informant 4’s speech correspond to words with /ɔŋ/ in Informant 2’s speech. Informant 1 uses a mix of pronunciations, with /ɔŋ/ in some words and /œŋ/ in others. This suggests a possible shift from /ɔŋ/ to /œŋ/ within the Tanka community. It should be emphasized that this shift concerns only words belonging to the 宕攝 (dàngshè, “the Category Dang”) of Middle Chinese.
Other words containing /ɔŋ/ remain unchanged.

The vowels /i/, /u/, and /y/ occur in coda position, but the coda /y/ is found only in the speech of Informant 4, the youngest informant. For example, Informant 2 pronounces 去 “go, leave” as [hœi32/hui32], whereas Informant 4 pronounces it as [hœy32]. Similarly, Informant 2 pronounces 取 “take” as [tʃ‘œi35/tʃ‘ui35], while Informant 4 pronounces it as [tʃ‘œy35].

Table 4 presents the possible rhymes of the Tanka language. When a syllable begins with a semivowel, the initial is represented as [j] or [w]. Rhymes in parentheses are attested in Informant 4’s speech and are closer to their Cantonese counterparts. Rhymes in square brackets occur only in Informant 1’s speech.

Table 4: Potential rhymes in Tanka language.

Tones

The Tanka language has nine tones. We treat the entering tones as distinct categories, even though some of them share pitch values with tones in other categories. Following Chinese linguistic tradition, we represent tones on Chao’s (1930) five-level scale, where 1 is the lowest pitch and 5 the highest. The non-entering tones are as follows:

- high-level: /44/ (with a variant [43])
- mid-level: /33/
- low-falling: /21/
- mid-rising: /35/
- low-rising: /24/
- mid-falling: /32/

Entering tones are relatively short and are accompanied by a glottal stop coda. There are three entering tones: low /2/, high /4/, and middle /32/. Interestingly, the low and high entering tones are falling tones. Each is noted with a single number because of its shorter duration. Tone sandhi also occurs. For instance, 門 “door” is pronounced [mun21] as a monosyllabic word but [mun24] after a word with the mid-falling tone /32/.
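The tone inventory above and its single sandhi example can be encoded as a lookup table plus one rewrite rule. This is a sketch under stated assumptions.

- The names `TONES`, `ENTERING_TONES`, `TONE_VARIANTS`, `classify`, and `apply_sandhi` are hypothetical.
- Entering tones are identified by the glottal-stop coda the text mentions. This is what separates middle entering /32/ from the pitch-identical mid-falling /32/.
- The sandhi rule is generalized from the 門 [mun21] to [mun24] example alone, so its real scope may differ.
- The two-syllable sequence in the usage example is hypothetical, not an attested form. It pairs 大 [tai32] from the vocabulary list with 門.

```python
# Non-entering and entering tone categories as listed in the text (Chao numerals).
TONES = {"44": "high-level", "33": "mid-level", "21": "low-falling",
         "35": "mid-rising", "24": "low-rising", "32": "mid-falling"}
ENTERING_TONES = {"2": "low entering", "4": "high entering", "32": "middle entering"}
TONE_VARIANTS = {"43": "44"}  # [43] is a variant of the high-level tone


def split_tone(syllable: str) -> tuple[str, str]:
    """Split e.g. 'mun21' into ('mun', '21'), normalising tone variants."""
    body = syllable.rstrip("0123456789")
    tone = syllable[len(body):]
    return body, TONE_VARIANTS.get(tone, tone)


def classify(syllable: str) -> str:
    """Name the tone category; a glottal-stop coda marks an entering tone."""
    body, tone = split_tone(syllable)
    return ENTERING_TONES[tone] if body.endswith("ʔ") else TONES[tone]


def apply_sandhi(syllables: list[str]) -> list[str]:
    """Hypothetical rule generalised from the single example in the text:
    a low-falling /21/ syllable surfaces as /24/ after a non-entering
    mid-falling /32/ syllable. Conditions are checked on the input forms."""
    out = list(syllables)
    for i in range(1, len(syllables)):
        prev_body, prev_tone = split_tone(syllables[i - 1])
        body, tone = split_tone(syllables[i])
        if prev_tone == "32" and not prev_body.endswith("ʔ") and tone == "21":
            out[i] = body + "24"
    return out


if __name__ == "__main__":
    print(classify("lɔʔ2"))    # low entering (落 in 落罟)
    print(classify("siʔ32"))   # middle entering (雪 in 雪船)
    print(classify("joŋ32"))   # mid-falling (用)
    print(apply_sandhi(["tai32", "mun21"]))  # ['tai32', 'mun24'] (hypothetical sequence)
```

Keying entering tones on the glottal-stop coda, rather than on pitch, is what keeps the two /32/ categories apart. This mirrors the decision to treat entering tones as distinct.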
Due to space limitations, tone sandhi will be explored further in subsequent research.

Vocabulary of Tanka language

Tanka culture-specific vocabulary

We surveyed a total of 9,247 lexical items of the Tanka language, spanning 29 semantic categories: astronomy, geography, seasons and time, agriculture, plants, animals, houses and buildings, tools and utensils, titles and appellations, relatives and kinship, body parts, diseases and medical care, clothing and accessories, food and drink, major life events (such as weddings and funerals), daily life, legal matters, social interaction, business and transportation, culture and education, leisure activities, actions and movements, positions and locations, pronouns, adjectives, adverbs and prepositions, quantifiers and measures, additional components, and numbers.

The results of the survey show that the Tanka people’s distinctive water-borne lifestyle has given rise to a specialized lexicon, which serves as a rich repository of their cultural heritage, covers their fishing practices in detail, and vividly reflects their boat-dwelling way of life. The following sections examine distinctive Tanka vocabulary relating to fishing, sea-related knowledge, weather patterns, and songs. Each category offers insight into the Tanka people’s symbiotic relationship with the sea, their socio-cultural dynamics, and their enduring cultural legacy.

Fishing-related vocabulary

The Tanka community has developed a livelihood centered mainly on fishing, which has produced a diverse range of lexical items relating to fishing methods and equipment.
These lexical items offer a window into the Tanka people’s deep connection with the sea, their diverse fishing techniques, the multifunctionality of their boats, a self-reliant community structure on the water, a focus on practical living, and the economic challenges in their lives. Some examples include:

做海 [tsou32 hoi35]: fishing for a living at sea
打山口 [ta35 saŋ43 hɐu35]: fishing near a recognizable mountain without using a buoy, to prevent theft
捕風尾魚 [pɔʔ32 foŋ43 mei24 ji21]: fishing after a strong wind and catching fish in the wind’s wake
撈火 [lau44 fɔ35]: attracting fish with light, then scooping them up
飛叉 [fei44 ts’a43]: harpooning fish quickly by hand
拖網 [t‘ɔ43 mɔŋ24]: trawling
圍網 [wɐi21 mɔŋ24]: surrounding fish with a net
落罟 [lɔʔ2 ku44]: putting the fishing net into the sea
火艇 [fɔ35 t’ieŋ24]: boat used for illumination
雪船 [siʔ32 sin21]: boat for selling and transporting ice
酒艇 [tsɐu21 t’ieŋ24]: boat for hosting banquets
雜貨艇 [tsaʔ2 fɔ33 t’ieŋ24]: boat selling groceries
水艇 [sui35 t’ieŋ24]: boat selling fresh water
鮮艇 [sin44 t’ieŋ24]: boat for selling seafood
大眼機 [tai32 an24 kei44]: boat with eyes painted on it
生艙 [san44 ts‘ɔŋ44]: small cabin on the boat for raising fish (with four holes at the bottom for sea water)
大櫃面 [tai32 kɐi32 min32-35]: dining area on the boat
屎坑板 [si35 haŋ44 pan35]: toilet area on the boat

The lexical item 做海 [tsou32 hoi35] “fishing for a living at sea” indicates the Tanka people’s strong connection to the sea, suggesting that fishing and marine activities are central to their way of life. This reliance on the sea for a livelihood is a defining characteristic of their culture.
Distinctive phrases such as 打山口 [ta35 saŋ43 hɐu35] “fishing near a recognizable mountain without using a buoy, to prevent theft” and 捕風尾魚 [pɔʔ32 foŋ43 mei24 ji21] “fishing after a strong wind and catching fish in the wind’s wake” suggest that the Tanka people possess deep knowledge of the marine environment, honed through generations of living and working on the water: they understand the sea and its relation to the land, as well as the behavior of fish in different weather conditions. The variety of fishing methods, such as 撈火 [lau44 fɔ35] “attracting fish with light”, 飛叉 [fei44 ts’a43] “harpooning fish quickly by hand”, 拖網 [t‘ɔ43 mɔŋ24] “trawling”, and 圍網 [wɐi21 mɔŋ24] “surrounding fish with a net”, shows the Tanka people’s adaptability and ingenuity in developing techniques to harvest marine resources efficiently. The existence of specialized boats such as 火艇 [fɔ35 t’ieŋ24] “boat used for illumination”, 雪船 [siʔ32 sin21] “boat for selling and transporting ice”, and 酒艇 [tsɐu21 t’ieŋ24] “boat for hosting banquets” suggests that boats in Tanka culture serve many purposes beyond fishing. They act as commercial vessels, means of transportation, and even social gathering places, highlighting the centrality of boats in Tanka society. The variety of boat types for different commercial activities, such as 雜貨艇 [tsaʔ2 fɔ33 t’ieŋ24] “boat selling groceries”, 水艇 [sui35 t’ieŋ24] “boat selling fresh water”, and 鮮艇 [sin44 t’ieŋ24] “boat for selling seafood”, implies a self-sufficient community with a complex social and economic structure, capable of meeting most of its needs within its maritime environment. The practice of painting eyes on boats, as denoted by 大眼機 [tai32 an24 kei44] “boat with eyes painted on it”, is rooted in the Tanka people’s spiritual or superstitious beliefs.
The painted eyes are meant to “see” through the dangers of the sea, offering protection and guidance; the practice reflects the Tanka people’s respect for and reliance on the ocean, as well as their desire to live in harmony with the natural and spiritual world. Lexical items like 生艙 [san44 ts‘ɔŋ44] “small cabin on the boat for raising fish” and 大櫃面 [tai32 kɐi32 min32-35] “dining area on the boat” reflect the functional design of living spaces on the boats. The Tanka people have adapted to limited space and resources, prioritizing practicality and efficiency in their living conditions. The existence of vocabulary for basic and limited facilities (e.g., 屎坑板 [si35 haŋ44 pan35] “toilet area on the boat”) suggests that their lifestyle, while culturally rich, was marked by a lack of modern amenities and comforts, hinting at the economic hardships faced by the Tanka people.

Sea-related vocabulary

This set of specialized sea-related vocabulary underscores the Tanka people’s close interaction with, and extensive understanding of, the ocean, as well as their strategic adaptation to living in a marine environment. It reflects how their lifestyle, work, and even survival are intricately tied to the sea and its conditions. Some important examples follow:

獨洲 [tuʔ2 tsɐu43]: islet
石排 [sieʔ2 p’ai21]: reef by the shore
暗排 [ɐn33 p’ai21]: submerged reef (not visible in the water)
漁排 [ji21 p’ai21]: fish farm enclosed in the sea
風□ (Footnote 2) [foŋ44 po32]: seawall for wind protection
避風塘 [pei21 foŋ44 t‘ɔŋ21]: harbor for sheltering from the wind and typhoon
水大 [sui35 tai32]: high tide
水乾 [sui35 koŋ43]: low tide
冚浪 [k‘ɐn35 lɔŋ32]: waves covering over

Lexical items such as 獨洲 [tuʔ2 tsɐu43] “islet”, 石排 [sieʔ2 p’ai21] “reef by the shore”, and 暗排 [ɐn33 p’ai21] “submerged reef” reflect an intimate knowledge of coastal and marine topography. This knowledge is crucial for navigation, fishing, and safety at sea, indicating the Tanka people’s close relationship with their maritime environment.
The lexical item 漁排 [ji21 p’ai21] “fish farm enclosed in the sea” suggests that the Tanka people engage in aquaculture, using enclosed areas of the sea for fish farming. This shows a diversification of their livelihood beyond fishing, adapting to and using the sea in different ways. Words like 風□ [foŋ44 po32] “seawall for wind protection” and 避風塘 [pei21 foŋ44 t‘ɔŋ21] “harbor for sheltering from the wind and typhoon” indicate the Tanka people’s strategic adaptation to natural hazards and reflect the development of infrastructure and practices that protect against the harsh conditions often encountered at sea. Terms such as 水大 [sui35 tai32] “high tide”, 水乾 [sui35 koŋ43] “low tide”, and 冚浪 [k‘ɐn35 lɔŋ32] “waves covering over” demonstrate the Tanka people’s understanding of tides and waves, which is essential for fishing, navigation, and managing their floating homes and structures.

Weather-related vocabulary

The weather-related vocabulary highlights the Tanka people’s intimate connection with their environment, their long experience with and adaptation to maritime weather conditions, the impact of weather on their livelihood, and the possible cultural and spiritual dimensions of natural phenomena in their lives. Some representative examples follow:

[t‘in43 siʔ32]: lightning
打西北□ [ta35 sɐi44 paʔ4 ts’ai43]: northwest wind blowing
長命雨 [ts‘ɔŋ21 mieŋ33 ji24]: long-lasting rain
扯波 [ts’ie35 pɔ44]: a typhoon is hitting
打水氣 [ta35 sui35 hei33]: thunderstorm

Specialized expressions like [t’in43 siʔ2] “lightning”, 打西北□ [ta35 sɐi44 paʔ4 ts’ai43] “northwest wind blowing”, and 長命雨 [ts‘ɔŋ21 mieŋ33 ji24] “long-lasting rain” indicate the Tanka people’s deep connection to the natural world and their keen observation of weather patterns. Living close to the sea, the Tanka people are likely to be highly attuned to changes in the weather, which directly affect their livelihood and daily activities.
The lexical items 扯波 [ts’ie35 pɔ44] “a typhoon is hitting” and 打水氣 [ta35 sui35 hei32] “thunderstorm” reflect the Tanka people’s familiarity with and resilience to the severe weather conditions common in maritime regions. Such conditions play a crucial role in their fishing activities: understanding and predicting the weather is vital for planning fishing trips, navigating the seas, ensuring safety, and maximizing the catch.

Song-related vocabulary

Songs are an important part of the Tanka people’s cultural heritage. Their song-related lexicon reveals a world in which music and song are not merely artistic expressions but vital components of daily life, social ceremonies, and emotional communication. The following lexical items, relating to different song genres, provide a glimpse into how the Tanka people use music for storytelling, historical preservation, and the expression of complex social and personal themes.

鹹水歌 [han21 sui35 kɔ44]: saltwater songs, a type of folk song originating from the Tanka people
歎歌 [t’an32 kɔ44]: lament songs, sung mainly by Tanka women at various ceremonies
歎梅娘 [t’an32 mui21 lɔŋ21]: a kind of lament song sung mainly by brides at weddings to express gratitude and other emotions to their parents and relatives
回嘆 [wui21 t’an32]: songs sung in response to lament songs
花歌 [fa44 kɔ44]: flower songs, sung mainly by Tanka men, especially when courting a girl

These song-related lexical items reveal that music and singing play a vital role in the Tanka people’s cultural expression, social rituals, and personal relationships. 鹹水歌 [han21 sui35 kɔ44] “saltwater songs” often tell stories or convey messages, serving as a form of oral history that passes down stories, legends, and knowledge from generation to generation. Beyond entertainment, these songs have served practical purposes, such as coordinating work on ships, lifting spirits during long voyages, and expressing collective emotions.
The existence of different types of songs for various occasions and purposes (such as 歎歌 [t’an32 kɔ44] “lament songs” at ceremonies, 歎梅娘 [t’an32 mui21 lɔŋ21] “lament songs of gratitude at weddings”, and 花歌 [fa44 kɔ44] “flower songs for courtship”) underscores the ritual and ceremonial significance of music in Tanka culture and reveals its importance as a means of emotional expression and social communication within the community. The differentiation between songs sung by women (e.g., 歎歌 [t’an32 kɔ44] “lament songs”) and by men (e.g., 花歌 [fa44 kɔ44] “flower songs for courtship”) points to gender-specific roles and practices, offering insight into the social structure and gender dynamics of the Tanka people. The variety of songs, each with its own style and purpose, highlights the artistic creativity and aesthetic values of the Tanka people. These songs are not just forms of entertainment and art but also key elements in preserving and transmitting the Tanka people’s cultural heritage and collective identity.

In summary, the diverse vocabulary of the Tanka language serves as a living archive of the Tanka people’s rich cultural heritage. These lexical items do far more than describe; they tell the story of a people whose lives are deeply bound up with the rhythms and challenges of the sea. The distinctive Tanka vocabulary is not only a linguistic treasure but also a cultural bridge connecting the past with the present and preserving a unique way of life for future generations.

Lexical relatedness

In this research, the Swadesh 207-word list was used to analyze quantitatively the lexical relationship of the Tanka language with Cantonese and Mandarin.
All 207 words on the list were compared across the three languages using a lexicostatistical technique.

For this comparison, the Tanka data were drawn primarily from the oldest informant, Informant 3, who is 92 years old. This choice reflects the recognition that linguistic variation exists among speakers of different ages, largely because of language contact. Older speakers such as Informant 3 are often regarded as repositories of a more traditional form of the language; here, they preserve a greater number of Tanka words that may have been lost or altered in younger generations. This approach acknowledges the dynamic nature of language while aiming to capture the most traditional form spoken in the Tanka community.

The Swadesh lists for Cantonese and Standard Mandarin were collected from one native speaker of each language and transcribed by the researchers, to ensure that the lists were reliable for comparison. A rigorous standard was applied: only word pairs that are completely identical in form (the same morphemes), meaning, and morpheme order were recognized as cognates, while regular differences in pronunciation were disregarded. For instance, the word for “animal” is 動物 [toŋ21 mɐʔ2] in the Tanka language and 動物 [tʊŋ22 mɐt22] in Cantonese, which counts as a precise match. By contrast, cases in which a single expression in one language corresponds to several expressions in another, or in which two languages share a root but differ in affixes, were not counted as cognates; these cases will be explored in future research. It is worth noting that the comparison is strictly synchronic: it examines only the present-day usage of words.
For example, 翼 “wing” was used in Old Chinese but has since fallen out of regular use in modern spoken Standard Mandarin, where 翅膀 is now the common term for “wing”; such diachronic changes in usage are not taken into account in this comparison. Lexical relatedness (LR) is computed using the formula proposed by Keraf (1984, p. 172):

$$\mathrm{LR} = \mathrm{IW} / \mathrm{G} \times 100\%$$

where LR = lexical relatedness, IW = number of identical words, and G = number of glosses.

A detailed comparison of the 207 words in the Tanka language and Cantonese showed that 176 words are identical (see Appendix I). The degree of lexical relatedness between these two languages is therefore:

$$\mathrm{LR} = \mathrm{IW} / \mathrm{G} \times 100\% = 176/207 \times 100\% = 85.02\%$$

This high percentage reflects a considerable degree of linguistic kinship between the Tanka language and Cantonese, consistent with the findings of previous studies (Bai, 2007; Huang, 1991; Zhang and Huang, 1988). Notably, Tanka vocabulary varies across generations. The words used by a younger informant, Informant 4, aged 57, differ markedly from those of the older generation: of the 207 words analyzed, Informant 4’s vocabulary diverged from Cantonese in only two instances, □□ [la44 ha21] “who” and □□ [la44 lei44] “where”. This yields a lexical relatedness of 205/207 × 100% = 99.03% with Cantonese. The trend points to a considerable influence of Cantonese on the younger Tanka generation and a notable shift toward assimilation with mainstream Cantonese usage.

In the lexical comparison between the Tanka language and Mandarin, 102 of the 207 words analyzed were found to be identical (see Appendix II).
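The Keraf (1984) formula is simple enough to check directly. The following minimal Python sketch applies LR = IW / G × 100% to the counts reported in this section (176 and 102 identical words for the Tanka comparisons with Cantonese and Mandarin, and 207 − 2 = 205 for Informant 4 against Cantonese), rounding to two decimal places as the text does; the function name `lexical_relatedness` is illustrative, not part of the original study.

```python
def lexical_relatedness(identical: int, glosses: int) -> float:
    """LR = IW / G x 100 (Keraf, 1984), rounded to two decimal places."""
    if glosses <= 0:
        raise ValueError("the number of glosses must be positive")
    if not 0 <= identical <= glosses:
        raise ValueError("identical words must lie between 0 and the number of glosses")
    return round(identical / glosses * 100, 2)


G = 207  # glosses in the Swadesh list used in this study

# Counts of identical words reported in the text.
comparisons = {
    "Tanka vs Cantonese": 176,
    "Tanka (Informant 4) vs Cantonese": G - 2,  # diverges only for "who" and "where"
    "Tanka vs Mandarin": 102,
}

for label, identical in comparisons.items():
    print(f"{label}: {identical}/{G} = {lexical_relatedness(identical, G):.2f}%")
# Tanka vs Cantonese: 176/207 = 85.02%
# Tanka (Informant 4) vs Cantonese: 205/207 = 99.03%
# Tanka vs Mandarin: 102/207 = 49.28%
```

Rounding only at the end keeps the three percentages identical to those obtained by hand, and the range checks guard against swapping IW and G.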
The degree of lexical relatedness between the Tanka language and Mandarin is therefore:

$$\mathrm{LR} = \mathrm{IW} / \mathrm{G} \times 100\% = 102/207 \times 100\% = 49.28\%$$

This percentage indicates a moderate level of lexical correspondence, showing that the Tanka language is considerably less closely related to Mandarin than to Cantonese.

In conclusion, the analysis reveals substantial lexical similarity between the Tanka language and Cantonese, with even greater congruence among younger Tanka speakers, which indicates a strong influence of Cantonese on the Tanka language over generations. By contrast, the lexical overlap between the Tanka language and Mandarin is much smaller. These findings underscore the dynamic nature of linguistic change within the Tanka community and shed light on the relationships among these languages.

Discussion

Historical phonological comparison of Tanka language and Cantonese

This section explores how the Middle Chinese phonological system is reflected in the Tanka language, drawing comparisons with Cantonese (Footnote 3) to illuminate their relationship. It begins with the initial consonants, then examines the rhymes, and concludes with a comparison of the tonal systems of both languages.

Comparison of initial consonants

In traditional analyses of the diachronic phonological evolution of Sinitic languages, the synchronic phonological system is typically compared with the reconstructed Middle Chinese phonological system. We therefore give a concise overview of Middle Chinese before turning to the diachronic analysis. Middle Chinese (hereafter MC) was the official language in China from the 5th to the 12th century.
With the exception of the Min group (閩語 mǐnyǔ), which clearly preserves phonological distinctions older than MC, Sinitic languages mainly reflect the phonological distinctions of MC (Baxter, 1992, pp. 14-15). The phonological evolution from MC to present-day Sinitic languages is therefore an important criterion for classifying them into groups (Yuan, 2001 [1960]). Since the Tanka language of Luen Yick Fisherman Village and Sam Mun Tsai San Tsuen clearly does not belong to the Min group, its phonological distinctions reflect the MC system, which is why we compare the MC and Tanka phonological systems.

Based on the rhyme book Qièyùn (601) and its revised version Guǎngyùn (1008), researchers have identified 37 initials in Early MC (approximately 5th to 9th century; cf. Baxter, 1992, p. 14; Mai, 2009, pp. 58-59, etc.). Table 5 displays Baxter’s (1992) reconstructions of these initials, together with their reflexes in the Tanka language and Cantonese.

Table 5 Comparison of initials in Tanka language and Cantonese.

Table 5 shows that, in terms of initial consonants, the Tanka language and Cantonese have undergone nearly identical diachronic developments, except for two sets of initials. The first difference involves two MC nasal initials. In the Tanka language, the dental nasal initial n- and the retroflex nasal initial nr- of MC have merged into /l/, whereas in Cantonese they have merged into /n/.
This difference is exemplified by:

娘 “mother” MC nrjang: Tanka [lɔŋ21] - Cantonese [nœŋ21]
泥 “mud” MC nej: Tanka [lɐi21] - Cantonese [nɐi21]
年 “year” MC nen: Tanka [lin21] - Cantonese [nin21]
鬧 “noisy” MC nraewH: Tanka [lau32] - Cantonese [nau22]

The development of MC n- and nr- into /l/ in the Tanka language may be associated with the “lazy pronunciation” (懶音 lǎnyīn) phenomenon observed in Cantonese, in which /l/ and /n/ frequently alternate as free variants. This “lazy pronunciation” reflects phonetic reduction motivated by the principle of linguistic economy.

The second difference concerns the MC velar stops. In Cantonese, certain words that had these initials in MC now begin with labiovelar consonants, while in the Tanka language they retain the plain velar pronunciation. In our analysis, we treat these labiovelar consonants as a velar consonant followed by the medial [w]; the difference therefore lies in the rhymes and is discussed in Section “Comparison of rhymes”.

Comparison of rhymes

According to the Qièyùn, Middle Chinese has 193 finals (rhymes), grouped into sixteen categories (攝 shè) on the basis of their nuclear vowels and codas. The finals are also divided into four divisions: Division I comprises finals with a back vowel -u-, -ɑ-, or -o-; Division II, finals with a non-high front vowel -æ- or -ɛ-; Division III, finals with a medial -j-; and Division IV, finals with a mid-high front vowel -e-. A further distinction is that between Kaikou and Hekou finals: a Kaikou final has no medial -w-, whereas a Hekou final has the medial -w-.
Below, we compare the finals of the Tanka language with those of MC according to their rhyme categories.

While certain MC categories, such as 通 tōng (-uwng, -owng, -juwng, and -jowng), 效 xiào (-aw, -æw, -j(i)ew, -ew), 果 guǒ (-(w)a, -j(w)a), 假 jiǎ (-(w)æ, -jæ), 流 liú (-uw, -juw, -(j)iw), and 深 shēn (-(j)im), show similar developments in both the Tanka language and Cantonese, clear differences emerge in the development of other categories. As summarized in Table 6, these differences involve the nucleus, the medial glide, and the coda.

Table 6 Rhyme category divergences in Tanka language and Cantonese.

Some differences concern the reflexes of the nucleus. Specifically, the rhymes 陽 yáng -jang and 藥 yào -jak of the Category 宕 dàng have become /-(j)ɔŋ/ and /-(j)ɔʔ/, respectively, in the Tanka language, whereas in Cantonese they typically became /-(j)œŋ/ and /-(j)œk/, as in 藥 “medicine”: Tanka [jɔʔ2] - Cantonese [jœk22]; 癢 “itch”: Tanka [jɔŋ24] - Cantonese [jœŋ13]; 搶 “rob”: Tanka [tʃ‘ɔŋ35] - Cantonese [tʃ‘œŋ35]. Nevertheless, as discussed in Section “Phonology of the Tanka language”, the younger informant has adopted the Cantonese development for words with the MC rhymes 陽 yáng -jang and 藥 yào -jak.

Another difference in the nucleus involves the Category 遇 yù (rhymes 模 mú -u, 魚 yú -jo, 虞 yú -ju), as well as the Hekou rhymes 仙 xiān -j(w)(i)en, 先 xiān -(w)en, and 元 yuán -j(w)on of the Category 山 shān. In the Tanka language, certain words from these rhymes now have /i/ as the nucleus, whereas the corresponding Cantonese words have the nucleus /y/.
A few examples are given below: 樹 “tree”: Tanka [si32] - Cantonese [ʃy22]; 魚 “fish”: Tanka [ji21] - Cantonese [jy21]; 權 “power”: Tanka [k’in21] - Cantonese [k’yn21].

A third notable difference in rhyme development between the Tanka language and Cantonese is found in certain words from the rhymes 青 -(w)eng, 清 -j(w)(i)eng, and 庚 -j(w)æng. In Cantonese, these words often have /ɪŋ/ or /ɛŋ/, while in the Tanka language they consistently have /ieŋ/; for example, 柄 “handle”: Tanka [pieŋ33] - Cantonese [pɪŋ33]; 鏡 “mirror”: Tanka [kieŋ33] - Cantonese [kɛŋ33].

The MC medial -w- has developed differently in the Tanka language and Cantonese. In Tanka, certain words from the categories 止 zhǐ, 蟹 xiè, and 臻 zhēn with a velar stop initial have lost the medial -w-, which Cantonese retains. For instance, 鬼 “ghost” is pronounced [kɐi35] in the Tanka language and [kwɐi35] in Cantonese; 掛 “hang” is [ka33] in Tanka and [kwa33] in Cantonese; and 軍 “army” is [kɐn44] in Tanka and [kwɐn55] in Cantonese.

Regarding codas, we have identified two main differences between the Tanka language and Cantonese. First, words from the MC Category 咸 xián typically end in a bilabial nasal coda /m/ in Cantonese but in an alveolar nasal coda /n/ in the Tanka language. The following words, for example, are pronounced almost identically except for the coda: 三 “three”: Tanka [san44] - Cantonese [sam55]; 喊 “shout”: Tanka [han33] - Cantonese [ham33]. Notably, the younger informant (Informant 4) pronounces some of these words with a bilabial nasal coda /m/. This may be due to the influence of Cantonese, which preserves the MC bilabial nasal coda.
On this basis, we propose the following hypothesis: in the Tanka language, the MC bilabial nasal coda -m merged with the alveolar nasal coda -n, but under the influence of Cantonese, the bilabial nasal coda has reappeared in the speech of younger speakers.

Second, the MC nasal codas -n and -ŋ alternate in the Tanka language. Certain words with the MC rhyme 寒 -(w)an have /ɔŋ/ in the Tanka language but keep the alveolar nasal coda in Cantonese, such as 安 “safe”: Tanka [ɔŋ44] - Cantonese [ŋɔn55] (Footnote 4); 汗 “sweat”: Tanka [hɔŋ32] - Cantonese [hɔn33]. Conversely, some words from the Categories 梗 gěng (-(w)æng, -(w)ɛng, -j(w)æng, -j(w)(i)eng, -(w)eng) and 曾 zēng (-(w)ong, -(w)ing), which typically have a velar nasal coda in Cantonese, have an alveolar nasal coda in the Tanka language, as in 更 “update”: Tanka [kɐn44] - Cantonese [kɐŋ55]; 冷 “cold”: Tanka [lan24] - Cantonese [laŋ13]; 清 “clear”: Tanka [ts‘ɐn43] - Cantonese [ts‘ɪŋ55]; 升 “rise”: Tanka [sɐn44] - Cantonese [sɪŋ55], etc.

The comparison of rhymes in MC, the Tanka language, and Cantonese thus reveals several distinct differences between the Tanka language and Cantonese, which can be grouped into three main aspects: the nucleus, the medial, and the coda.
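The segmental correspondences described above can be summarized as a rough Python sketch that rewrites a toneless Cantonese syllable into the form expected in the older Tanka speakers’ speech. It assumes the transcription conventions used in this section and covers only four correspondences: initial /n/ to /l/, loss of the medial [w] after velar stops, coda /m/ to /n/, and the replacement of the stop codas -p, -t, -k by a glottal stop, which is discussed under tone. The function `older_tanka_form` and its rule list are hypothetical; they apply every rule unconditionally, whereas the text restricts several of them to certain words or MC categories, and nucleus changes are not modeled at all.

```python
import re

# Hypothetical, simplified correspondences between Cantonese and older Tanka
# forms (toneless syllables). They are applied to every syllable, although the
# text restricts several of them to particular words or MC categories.
RULES = [
    (re.compile(r"^n"), "l"),             # initial n- (from MC n-, nr-) -> l-
    (re.compile(r"^(k[’‘']?)w"), r"\1"),  # kw-/k’w- -> k-/k’-: medial w lost
    (re.compile(r"(?<=.)m$"), "n"),       # coda -m -> -n (syllabic m untouched)
    (re.compile(r"[ptk]$"), "ʔ"),         # stop codas -p, -t, -k -> glottal stop
]


def older_tanka_form(cantonese: str) -> str:
    """Rewrite a toneless Cantonese syllable with the correspondences above."""
    form = cantonese
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form


# Toneless Cantonese forms of words cited in this section.
examples = {"年": "nin", "三": "sam", "鬼": "kwɐi", "軍": "kwɐn", "物": "mɐt"}
for char, syllable in examples.items():
    print(char, syllable, "->", older_tanka_form(syllable))
```

The five examples yield lin, san, kɐi, kɐn, and mɐʔ, matching the Tanka forms cited above. Applied to 娘 Cantonese [nœŋ], however, the sketch yields lœŋ rather than Tanka [lɔŋ21], because the nucleus correspondence of the 宕 category is lexically restricted and left out; such mismatches mark where category information from Table 6 would be needed.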
Furthermore, differences between the speech of the older and younger informants underscore the influence of Cantonese on the Tanka language.

Comparison of tone

According to the rhyme dictionaries (for example, 切韻 Qièyùn), Middle Chinese has four tones: the level tone (平聲 píngshēng), the rising tone (上聲 shǎngshēng), the departing tone (去聲 qùshēng), and the entering tone (入聲 rùshēng). The difference in tone development between the Tanka language and Cantonese lies in the treatment of the MC entering-tone codas -p, -t, and -k, which are replaced by a glottal stop /ʔ/ in the Tanka language (Table 7; Footnote 5).

Table 7 Tone evolution from Middle Chinese to Tanka language and Cantonese.

Lexical comparison of Tanka language with Cantonese and Mandarin

Morpheme characteristics of the vocabulary of Tanka language

Morphemes are the smallest meaningful units of word formation. Speakers of different languages may choose different morphemes to express the same concept or thing, add or omit morphemes from a shared set, or order the morphemes differently. These variations can result in words with the same meaning but different forms. In the Tanka language, the differences in morphemes are manifested mainly in the following aspects:

(1) Morpheme selection

When faced with the same concept or thing, speakers may choose different morphemes because of different habits of thought, which directly causes lexical differences between languages. Some Tanka lexical items use morphemes different from those of both Cantonese and Mandarin (see Table 8).

Table 8 Tanka lexical items with morphemes distinct from their Cantonese and Mandarin equivalents.

(2) Morpheme order

The same concept, thing, action, or state can be expressed in different languages with the same morphemes in a different order, creating variations in word formation.
Some Tanka lexical items share the morpheme order of Cantonese but differ from Mandarin, while others follow a morpheme order different from both Cantonese and Mandarin (see Table 9).

Table 9 Comparison of morpheme order in Tanka, Cantonese, and Mandarin vocabulary.

As illustrated in the first two rows of the table, both the Tanka language and Cantonese place gender-marking modifiers after the head noun, whereas Mandarin places them before it. Rows 3–5 show other examples of the “headword + modifier” structure shared by the Tanka language and Cantonese. This structure is also prevalent in several other southern Sinitic languages, such as Hakka and Pinghua. Scholars such as Cen (1953), Huang (2015), and Yue-Hashimoto (1976) attribute this similarity to the historical habitation of the southern regions by the Baiyue, an ancient group of various ethnicities. Through cohabitation and interaction in daily life and production, Chinese came into contact with minority languages, especially the Zhuang-Dong languages, leading to mutual borrowing. Since the “headword + modifier” structure is common in Zhuang-Dong compound words, these scholars hold that the similar expressions in the Tanka language, Cantonese, Hakka, and Pinghua were likely influenced by Zhuang-Dong languages. However, scholars such as Zhang (1989) contend that this word formation is a vestige of the postpositive attribute in ancient Chinese and is unrelated to Zhuang-Dong languages. A second type of word formation in which the Tanka language differs from Mandarin but resembles Cantonese in morpheme order is the “juxtaposition style”, in which the two morphemes of a word are parallel and have the same or similar meanings, as in 怪責 [kai33 tsaʔ32] “blame” and 宵夜 [siu44 je32] “late-night snack”.
Notably, some Tanka words show a morpheme order reversed relative to both Cantonese and Mandarin, such as 麻肉 [ma21 juʔ2] “cheesy, cringy”, which appears in the last row of the table.

Affixes

The Tanka language has abundant monosyllabic affixes, many of which are not found in Mandarin, and these greatly enrich Tanka vocabulary. In addition, some verbs, adjectives, and nouns in the Tanka language can combine with repeated syllables to describe different properties, states, or states of action. The monosyllabic and repetitive affixes of the Tanka language are analyzed below.

(1) Monosyllabic affixes

A. Prefixes

The main prefixes in the Tanka language are “阿 [a33]” and “細 [sɐi33]”.

a. 阿 [a33]

The prefix 阿 is also used in Mandarin, but mostly in a few words such as the word for “aunt”. In the Tanka language, it is used more frequently and in a wider range of contexts. It commonly precedes surnames, first names, and kinship terms to express endearment and intimacy. For example:

阿 [a33] + surname: 阿何 [a33 ho21], 阿黎 [a33 lai21]
阿 [a33] + first name: 阿梅 [a33 mui21], 阿花 [a33 fa44]
阿 [a33] + kinship term: 阿爺 [a33 je21] “grandpa”, 阿媽 [a33 ma44] “mom”, 阿伯 [a33 paʔ32] “uncle”

b. 細 [sɐi33]

In the Tanka language, “細 [sɐi33]” indicates “young” or “small”. For example:

細叔 [sɐi33 suʔ4] “the youngest uncle”
細佬 [sɐi33 lou35] “younger brother”
細路仔 [sɐi33 lou32 tsɐi35] “children”
細蘋果 [sɐi33 p’iŋ21 kɔ35] “small apple”

B. Suffixes

The main suffixes in the Tanka language are “佬 [lou35]”, “公 [koŋ44]”, “婆 [p‘ɔ21]”, “仔 [tsɐi35]”, and “妹 [mui32]”.

a. 佬 [lou35]

“佬 [lou35]” is a noun suffix denoting individuals associated with a particular profession or characteristic, indicating their role or key traits; its use often carries connotations of disdain or contempt.
For example:
大機佬 [tai32 kei44 lou35] “mechanic of the boat”
漁民佬 [ji21 mɐn21 lou35] “fisherman”
咬臍佬 [au24 ts’i21 lou35] “a child whose umbilical cord was bitten off by his/her parents at birth”
癲佬 [tin43 lou35] “madman”
□佬 [pei43 lou35] “lame person”
b. 公 [koŋ44]
In the Tanka language, “公 [koŋ44]” originally refers to an older male, as in 阿公 [a33 koŋ44] “grandfather” and 舅公 [k’ɐu33 koŋ44] “mother’s brother”. As a noun suffix, it follows nouns and denotes older males engaged in a certain profession or possessing certain characteristics. For example:
火頭公 [fɔ35 t’au21 koŋ44] “male cook”
壽星公 [sou21 siŋ33 koŋ44] “the birthday man”
c. 婆 [p’ɔ21]
“婆 [p’ɔ21]” originally refers to an elderly woman. As a noun suffix, it denotes women engaged in a certain profession or having certain features, analogous to the use of “公 [koŋ44]” for older men. For example:
火頭婆 [fɔ35 t’au21 p’ɔ21] “female cook”
駝仔婆 [t’ɔ21 tsɐi35 p’ɔ21] “pregnant woman”
d. 仔 [tsɐi35]
“仔 [tsɐi35]” originally means “son”, as in 大仔 [tai32 tsɐi35] “eldest son” and 細仔 [sɐi33 tsɐi35] “youngest son”. Over time, however, “仔 [tsɐi35]” came to be attached to nouns and adjectives, typically following the word it accompanies. Its lexical meaning gradually weakened and became more abstract until it evolved into a suffix denoting diminutiveness, endearment, or familiarity. Specific uses are as follows:
(a) animal + 仔 [tsɐi35] indicates smallness and cuteness: 魚仔 [ji21 tsɐi35] “small fish”, 老鼠仔 [lou24 si35 tsɐi35] “small mouse”
(b) object + 仔 [tsɐi35] indicates smallness: 刀仔 [tou43 tsɐi35] “small knife”, 涌仔 [ts’oŋ44 tsɐi35] “very small and narrow river or stream”
(c) substance + 仔 [tsɐi35] refers to a type of young man: 白粉仔 [paʔ2 fɐn35 tsɐi35] “young man addicted to drugs”
(d) adjective + 仔 [tsɐi35] refers to a young man with that characteristic: 肥仔 [fei21 tsɐi35] “fat boy or young man”
e. 妹 [mui32]
As a noun suffix, “妹 [mui32]” denotes girls or young women engaged in a certain profession or possessing certain characteristics, similar to the use of “仔 [tsɐi35]” for young men. It commonly follows nouns or adjectives denoting profession or status. For example:
水妹 [sui35 mui32] “girl or young woman from the Tanka community”
鬼妹 [kɐi35 mui32] “foreign girl or young woman”
肥妹 [fei21 mui32] “chubby girl or fat young woman”
These prefixes and suffixes are uncommon in Mandarin but frequently used in Cantonese (Huang, 2015), highlighting the resemblance between the Tanka language and Cantonese.
(2) Repetitive affixes (ABB)
In the Tanka language, many words can take two repeated syllables to intensify a trait or to add further meaning and emotional coloring. For example:
青□□ [ts’ieŋ43 koŋ24 koŋ21] “vibrantly green”
紅□□ [hoŋ21 pɔʔ4 pɔʔ4] “blushing red”
烏□□ [wu44 tsuiʔ4 tsuiʔ4] “very dark”
眼光光 [an24 kɔŋ44 kɔŋ43] “glowing-eyed, very fresh”
眼濛濛 [an24 moŋ44 moŋ44] “lifeless-eyed”
木□□ [muʔ4 k’a21 k’a21] “stale”
濛□□ [mun32 ts’a21 ts’a21] “dull-looking”
Word classes
A word may inherently have multiple grammatical functions that do not depend on the surrounding linguistic environment (Guo, 2002). The following usages of Tanka words are similar to those in Cantonese but quite different from those in Mandarin.
For example:
a. 車 [ts’ie44] is a noun in Mandarin. In the Tanka language, as in Cantonese, it also serves as a verb meaning “to drive”, as in 車船 [ts’ie44 sin21] “to drive a boat”.
b. 安 [ɔŋ44] can be a verb in the Tanka language, meaning “to name”, a use it does not have in Mandarin.
c. 遮 [tsie44] is a verb in Mandarin meaning “to cover”, but in the Tanka language it also serves as a noun meaning “umbrella”.
d. 鬼 [kɐi35] is a noun in Mandarin, but in the Tanka language it also serves as an adverb intensifying the adjective that follows, e.g., 好鬼熱 [hou35 kɐi35 jiʔ2] “very hot”.
Word combination patterns
In the Tanka language, some words combine with a different range of elements than they do in Mandarin and Cantonese. For example:
a. 拋 [p’au44] can be combined with 風 [foŋ44] “wind”, 灣 [wan44] “bay”, and 船 [sin21] “boat”, meaning to dock a boat in the harbor.
b. 埋 [mai21] can mean “to return” and combines with 街 [kai43] “street” to mean returning to the land.
c. 扯 [ts’ie35] can be combined with 波 [pɔ44] “wave” and 風球 [foŋ44 k’ɐu21] “typhoon signal”, indicating the onset of a typhoon.
d. 排 [p’ai21] can be combined with 石 [sieʔ2] “rock”, 暗 [ɐn33] “hidden”, and 漁 [ji21] “fish” to refer to different maritime features: 石排 [sieʔ2 p’ai21] means “reef”, 暗排 [ɐn33 p’ai21] “submerged reef”, and 漁排 [ji21 p’ai21] “fish farm enclosed in the sea”.
e. 落 [lɔʔ4] in Mandarin usually combines with nouns such as “flower”, “tear”, and “pen”. In the Tanka language, it can also combine with 雪 [siʔ32] “ice”, 網 [mɔŋ24] “net”, and 錨 [lau21] “anchor”, with meanings related to maritime activities.
落雪 [lɔʔ4 siʔ32] means “to buy ice blocks and move them onto the boat at sea”, 落網 [lɔʔ4 mɔŋ24] means “to drop a net into the sea to fish”, and 落錨 [lɔʔ4 lau21] means “to drop an anchor into the sea”.
The Tanka language thus exhibits distinctive word combination patterns that reflect its maritime cultural context and distinguish it from both Mandarin and Cantonese. Understanding these patterns offers valuable insight into the linguistic diversity of the Sinitic language family and the cultural identity of the Tanka community.
Semantic characteristics
Differences in word meaning are another important aspect of lexical variation. Word meanings reflect speakers’ understanding of the objective world and are closely tied to its complexity. Compared with phonology and grammar, word meanings are relatively unstable. The semantic characteristics of the Tanka language are described below.
(1) Words with the same forms but different meanings
A small number of Tanka words share their form with Mandarin and Cantonese words but carry partly different meanings. Typical examples are listed in Table 10.
Table 10 Tanka lexical items sharing the same form with Cantonese and Mandarin but conveying some different meanings.
(2) Words with different forms but the same meanings
Some Tanka words differ in form from their Cantonese and Mandarin counterparts but convey identical meanings. These are listed in Table 11.
Table 11 Tanka lexical items with distinct forms compared to Cantonese and Mandarin but sharing equivalent meanings.
The comprehensive quantitative and qualitative analysis demonstrates that the Tanka lexicon has numerous distinctive features. These features show a significant degree of similarity to Cantonese but diverge markedly from Mandarin.
This contrast underscores the distinctive linguistic identity of the Tanka language: lexically, it shows close affinity with Cantonese and clear separation from Mandarin.
Conclusion
This study presents an in-depth investigation of the Tanka language, one of the languages spoken by the boat people of Hong Kong. It focuses on the language's phonology and vocabulary and on its relationship with Cantonese and Mandarin.
A comparison of the phonological systems of the Tanka language and Cantonese from both synchronic and diachronic perspectives reveals a largely shared phonological framework. Differences in the initial consonants are minimal, and the main distinctions lie in the rhyme inventories. For instance, /œŋ/ is a distinct rhyme in Cantonese but not in the Tanka language. Additionally, the Middle Chinese medial -w- is retained after velar stops in Cantonese, whereas in the Tanka language it is absent in certain words. The two languages also differ somewhat in how their tones developed from Middle Chinese. Despite these phonological differences, native speakers of the Tanka language and Cantonese can generally understand each other with relative ease. Thus, the differences appear insufficient to classify the Tanka language of Luen Yick Fisherman Village and Sam Mun Tsai San Tsuen as a language distinct from Cantonese.
We also examined the rich maritime lexicon of the Tanka language in contrast with Cantonese and Mandarin. This analysis highlights the unique cultural and linguistic identity of the Tanka people. The vocabulary serves as a living archive of a people whose story is intricately intertwined with the sea, providing a linguistic and cultural bridge from the past to the present and into the future.
Crucially, this lexicon encodes irreplaceable knowledge of marine ecosystems, traditional navigation, and communal rituals. Its attrition would not only impoverish linguistic diversity but also erase centuries of adaptive wisdom and dissolve a core pillar of Tanka identity. The comparative analysis reveals a notable lexical affinity with Cantonese (85.02%), especially among younger speakers (99.03%), which points to significant Cantonese influence on the Tanka people over generations. In contrast, the much lower lexical similarity with Mandarin (49.28%) highlights the Tanka language’s distinct trajectory of linguistic evolution. This exploration not only uncovers the deep maritime connections embedded in the Tanka vocabulary but also enriches our understanding of the dynamic interplay between language, culture, and community within the broader Sinitic linguistic landscape.
This study has several limitations that call for further research. While we have focused on the phonology and vocabulary of the Tanka language, grammar is a crucial component that requires more in-depth investigation. This is a substantial undertaking given its complexity. However, we have already collected preliminary grammatical data and established methodologies for grammatical analysis (Comrie and Norval, 1977; Xia and Tang, 2021). Our future work will involve a systematic analysis of 711 grammatical sentences to examine changes induced by contact with Cantonese. Another vital direction is the urgent need to preserve the endangered Tanka language, for which we will prioritize the development of actionable strategies. Furthermore, the urgency of preservation extends beyond linguistics: reviving the Tanka language is integral to reclaiming cultural autonomy, reinforcing collective memory, and empowering a historically marginalized community.
Building on our preliminary data and established methodologies, we plan to design and implement community-led language documentation projects. These initiatives will train and empower native Tanka speakers to record oral histories, narratives, and everyday speech. At the same time, we will explore effective ways to integrate elements of the Tanka language into local school curricula. This could include pilot programs that develop culturally relevant teaching materials in collaboration with community elders and educators. Together, these initiatives are designed to foster intergenerational transmission, raise broader awareness, and ultimately contribute to the long-term vitality of the Tanka language.
The implications of this study for future research are significant. First, it introduces the Tanka language to the global scholarly community, paving the way for further studies on language preservation, documentation, and revitalization. Such efforts are inseparable from preserving intangible cultural heritage and supporting indigenous resilience. Second, a deeper understanding of the Tanka language will provide insights into the population history of Southeastern China. Finally, exploring the relationship between the Tanka language and Cantonese will help uncover the origins of the Tanka people and deepen our understanding of the typological features of Sinitic languages in Southeastern China.