Multimodal world construals in English translations of Hongloumeng: a cognitive stylistic and systemic functional linguistic analysis

Introduction

First published in 1792, Hongloumeng (A Dream of Red Mansions, or The Story of the Stone) has been considered “the greatest of all Chinese novels” (Hsia, 1968, p. 225). It has enjoyed enduring fame for its vivid and realistic reflection of almost every facet of eighteenth-century China (Chou and Liu, 2023, p. 1). Its detailed description of the luxurious mansions and gardens of the aristocratic Jia family prompts readers to create a lifelike text world with their imagination (Zhou, 2021, p. 1).

Enjoying widespread popularity in China, the novel has spread rapidly worldwide (Hou, 2014, p. 5). So far, it has appeared in English in various abridged versions and two full translations. Some literary critics have pointed out that reading Hongloumeng in English is a different experience from reading it in Chinese (Starr, 2012, p. 116). Previous studies have shown how its English translations created different reading experiences for target readers in rendering metaphors (Han, 2020), culturally loaded words (J. Liu, 2014), narrative style (Chou and Liu, 2023), and characterization (Yao, 2017). Nevertheless, research on text worlds in these translations has been scarce (Zhou, 2021).

The text world is a concept from Text World Theory that is frequently explored in stylistics (e.g., Alarcon-Hermosilla, 2021; Lugea, 2016; Lugea and Walker, 2023; McLoughlin, 2020). It describes how language constructs a world in readers’ minds (Gavins, 2020, p. 3). However, the cognitive mechanism behind this mental world-building is yet to be fully unraveled. Besides, the exploration of text worlds in previous stylistic and translation studies is mainly restricted to monomodal discourse.

To address these gaps, the present study designs a systematic framework to analyze multimodal world construals in literary translation. Building on Text World Theory (Gavins, 2007, p. 36), the proposed framework integrates the construal model of cognitive stylistics (Giovanelli and Harrison, 2018, pp.
33, 123; Langacker, 2008, p. 55; Simpson, 1993, p. 47) and the multimodal model of Systemic Functional Linguistics (Halliday and Matthiessen, 2014, p. 310; Kress and van Leeuwen, 1996/2006, p. 149). The construal model demonstrates how the discourse conditions readers’ understanding of the text world, while the multimodal model offers a fine-grained apparatus for multimodal discourse analysis. The proposed framework is tested by a case study of the English translations of David Hawkes and the Yangs.

The study explored the following questions:

1. To what extent did the translations offer different construals of the text world compared with the ST?
2. How did the translators employ verbal elements in texts to shape readers’ conceptualizations of the text world?
3. How did the editors and publishers employ visual elements in cover designs to facilitate readers’ conceptualizations of the text world?

Literature review

Text world

The text world is a relatively new stylistic topic (Nørgaard et al., 2017, p. 198). It refers to “a world of discourse that is instantiated by text” (Stockwell, 2020, p. 157). The canonical framework for the analysis is Text World Theory (Lahey, 2023, p. 300), proposed by Paul Werth (1999) and then expanded, most prominently by Gavins (2007, 2020). This theory has been applied extensively to studying literary texts (e.g., Alarcon-Hermosilla, 2021; Gibbons and Whiteley, 2021; McLoughlin, 2020; Norledge, 2020).

The “process of text world construction is text-driven” (Lahey, 2023, p. 305). Key elements in world-building include deictic reference and modality. Deictic references relate events to the spatio-temporal environment and the speaker’s perspective (Lugea, 2016, p. 1). Modality, on the other hand, conveys the speaker’s attitude toward the text world (Gavins, 2007, p. 91).

Text World Theory has provided basic linguistic tools for analyzing text-world creation (Lahey, 2023, p. 305). However, it primarily focused on monolingual discourse (Lugea, 2016, p. 189).
Besides, the world-building elements in this theory are insufficient to explain how discourse shapes readers’ conceptualizations and interpretations of the text world (Nørgaard et al., 2017, p. 200). To address these gaps, the current study integrates the construal model of cognitive stylistics with Text World Theory to clarify the cognitive mechanisms involved in text-world construals. Additionally, it incorporates the multimodal model of Systemic Functional Linguistics, enhancing the framework’s applicability to the analysis of multimodal discourse.

The construal model of cognitive stylistics

“Construal is an important concept in cognitive grammar” (Giovanelli and Harrison, 2018, p. 33), which was initially devised by Ronald Langacker (2008). According to Langacker (p. 43), “a meaning consists of both conceptual content and a particular way of construing that content.”

Construal refers to the way readers conceive and understand the content (p. 43). Generally, how readers construe a scene depends on how closely they examine it, which elements they pay the most attention to, and the vantage point they take (Langacker, 2008, p. 55). Langacker (p. 55) proposed three construal dimensions: specificity, focusing, and perspective. Specificity refers to the degree of detail with which a scene is depicted. Focusing involves choosing the conceptual domains or scope for scene presentation. Perspective describes the relationship between the viewers and the scene they are observing. The construal model presented fresh, psychologically informed insights into the experience of reading texts, enriching our understanding of how readers might interpret texts (Harrison, 2023, p. 315).

Stockwell (2002) introduced the cognitive linguistic model of construal to stylistics, highlighting its potential for literary analysis.
Since this foundational work, several exploratory applications of this model have emerged in stylistic analysis, particularly in areas like mind style and point of view (e.g., Giovanelli, 2022; Giovanelli and Harrison, 2018; Nuttall, 2018; Rundquist, 2020). The construal model has proved effective in cognitive stylistic analysis and is considered “a new and valuable addition to the stylistic toolkit” (Harrison, 2023, p. 315). However, the application of this model in analyzing text worlds, especially in translation and multimodal discourse, remains relatively unexplored, with few exceptions (e.g., Nuttall, 2018), and thus presents an area for further research.

The multimodal model of Systemic Functional Linguistics

The term “multimodality” dates back to the 1920s (van Leeuwen, 2011, p. 668). It encompasses information received through various senses (p. 668). Throughout the twentieth century, this notion gained traction and was further developed by several prominent linguistic schools, including the Prague School, the Structuralist Semiotics of the Paris School, Conversation Analysis, and Systemic Functional Linguistics. These schools expanded the term’s application, exploring its relevance in analyzing verbal and visual elements within multimodal texts. Kress and van Leeuwen’s (1996/2006) theoretical approach, based on Systemic Functional Linguistics, is the most influential in multimodal discourse analysis (Li, 2021, p. 192). Their model provides a detailed account of the functions played by ideational, interpersonal, and textual meanings in visual design. The ideational function, which hinges on the depiction of participants, processes, and circumstances, represents “objects and their relations in a world” (Kress and van Leeuwen, 1996/2006, p. 47). The interpersonal function concerns the interaction between the image and its viewer. It is manifested through elements such as contact, distance, and color (p. 114).
Lastly, the textual function focuses on the arrangement of an image, organizing its elements in terms of information value, salience, and framing (p. 177).

Being a useful tool for analyzing multimodal elements, this model has garnered attention from scholars in the field of translation, leading to its application in the analysis of translation paratexts (e.g., Chen, 2021; Li et al., 2019; Zhao, 2023). The term “paratexts” was coined by French literary theorist Gérard Genette (1997). Translation paratexts are materials added to the translated text, including prefaces, covers, and illustrations, that support the translation’s production, reception, and dissemination (Liu, 2024, p. 138). They serve as crucial guides for readers, shaping their understanding and interpretation of the translated work (Cui and Bai, 2023, p. 726). Furthermore, paratexts reflect the intentions and interpretations of editors and publishers (Genette, 1997, p. 11). The manipulation of editors and publishers, which is hardly identifiable through direct analysis of the translated text, can be tracked through the analysis of paratexts (Liu, 2024, p. 139), such as cover design.

This model offers translation studies a multimodal framework, allowing for a detailed analysis of how editors and publishers manipulate paratexts to influence readers’ interpretation of the translated work. However, it falls short of elucidating the cognitive mechanisms behind the construction of multimodal meanings. To bridge this gap, the current study integrates concepts from cognitive stylistics into the multimodal model. It examines the visual elements in translation paratexts, with a special focus on book cover design, and how these elements shape readers’ conceptualization of the text world.

Proposed framework for multimodal text-world construals

The proposed framework consists of verbal and visual models.
The verbal model describes how the translators employed verbal elements in the TTs to shape readers’ conceptualizations of the text world. On the other hand, the visual model analyzes how the editors and publishers employed visual elements in the cover designs to influence readers’ conceptualizations.

Verbal model for text-world construals

The verbal model (see Fig. 1) integrates Text World Theory (Gavins, 2007, p. 36) with concepts of cognitive stylistics (Giovanelli and Harrison, 2018, pp. 33, 123; Langacker, 2008, p. 55; Simpson, 1993, p. 47) and Systemic Functional Linguistics (Halliday and Matthiessen, 2014, p. 310).

Fig. 1: Verbal model for text-world construals (adapted from Evans, 2019, p. 411; Gavins, 2007, p. 36; Giovanelli and Harrison, 2018, pp. 33, 123; Halliday and Matthiessen, 2014, p. 310; Langacker, 2008, p. 55; Simpson, 1993, p. 47). This verbal model is structured around three main components: perspective, scope, and specificity.

World construal varies in three dimensions: perspective, focusing, and specificity (Evans, 2019, p. 411). The first dimension, perspective, refers to the vantage point from which the world is presented. It is considered subjective when language indicates the presence of an observing consciousness, while it becomes objective when the conceptualizer is backgrounded (Stockwell, 2020, p. 91). Subjective markers include figural deictics, such as deictic expressions, kinship terms and marked linguistic styles, and modality, which reflects the conceptualizer’s attitude toward the potentiality of an event (Pallarés-Garcia, 2012, p. 172). Modality is categorized into five distinct types (adapted from Giovanelli and Harrison, 2018, p. 123; Simpson, 1993, p. 47), each reflecting a unique aspect of human experience. Epistemic modality captures the cognitive processes involved in evaluating the likelihood of an event occurring. Perception modality, on the other hand, deals with how we perceive the world around us.
Deontic modality examines the levels of obligation and permission. Boulomaic modality delves into the intensity of an individual’s desires. Lastly, dynamic modality focuses on assessing a person’s capabilities. Together, these modalities show the various ways in which we interpret our environment and ourselves.

The second dimension, scope, is a facet of focusing. It refers to the selection of an expression’s coverage for linguistic presentation (Langacker, 2008, p. 62). The scope provides readers with a viewing frame delimiting what they can visually encompass when perceiving the world. Scope falls into two categories: maximal and immediate. An expression’s maximal scope is the full extent of its coverage, while the immediate scope is a specific portion (p. 63). The two are often manifested in the use of non-progressive and progressive aspects in language. Non-progressive aspects (went) construe a maximal scope, allowing readers to see the entire event from start to finish (Giovanelli and Harrison, 2018, p. 39). In contrast, progressive aspects (going) construe an immediate scope. They narrow the focus to the ongoing part, leaving out the inception and completion of the event (p. 39).

The third dimension, specificity, refers to the level of detail at which the world is described (Nuttall, 2018, p. 38). Since readers need more contextual information to become more engaged in the text world (Sanford and Emmott, 2012, p. 200), we examine the specificity of circumstantial expressions of locations in texts. Such expressions, including “left,” “right,” and “from,” describe an event’s spatial and temporal locations (Halliday and Matthiessen, 2014, p. 311).

Visual model for text-world construals

The visual model (see Fig. 2) integrates cognitive stylistics (Evans, 2019, p. 411; Langacker, 2008, p. 55) and Systemic Functional Linguistics (Kress and van Leeuwen, 1996/2006, p. 149).
Consistent with the verbal model, it categorizes three dimensions of text-world construals: perspective, scope, and specificity.

Fig. 2: Visual model for text-world construals (adapted from Kress and van Leeuwen, 1996/2006, p. 19; Langacker, 2008, p. 55). This visual model is structured around three main components: perspective, scope, and specificity.

The first dimension, perspective, refers to the point of view imposed on the viewer toward the represented text world. When participants in the world are portrayed from a frontal angle, it creates a sense of involvement (Kress and van Leeuwen, 1996/2006, p. 136). Furthermore, a participant’s gaze at the viewer demands that viewers form an imaginary relationship with the participants, encouraging a subjective perspective (p. 118). In contrast, when participants are shown from an oblique angle without eye contact, they are offered as mere information, leading viewers to adopt a more detached and objective perspective (p. 146).

The second dimension, scope, concerns the selection of the size of the viewing frame. The scope can be adjusted by distance. Long distance provides a maximal scope, allowing viewers to perceive the entire text world and its various participants. Conversely, close and medium distances create a more immediate scope, revealing only a segment of the world.

The third dimension, specificity, refers to the level of detail in a depiction. This can be adjusted by the articulation of color and detail. Typically, high degrees of color saturation, differentiation, and modulation, along with rich pictorial details, create a concrete image that enhances the sense of realism (p. 160).

Methodology

Data selection

The case study examined two English versions of Hongloumeng (see Table 1). One translation was done by David Hawkes and John Minford, while the other was completed by Xianyi Yang and Gladys Yang. These translations are regarded as the “two most authoritative translations” (Chou and Liu, 2023, p.
6).

Table 1 The two complete English translations of Hongloumeng.

Since the ST editions of the two translations differ (Feng, 2018, p. 196), this article followed previous research (Chen, 2015; Yao, 2017) and created two parallel corpora (see Table 2). TT1 refers to the translation by David Hawkes, and TT2 refers to the translation by Xianyi and Gladys Yang.

Table 2 Parallel corpora.

There are two reasons for building the sample corpora for analysis. One reason is the substantial length of the novel. “In any study of a literary work, especially one of significant length, it is necessary to be selective when undertaking a detailed analysis of the text” (Liu, 2024, p. 215). Inevitably, the samples selected for analysis may not fully represent Hawkes’s and the Yangs’ translation styles. However, the sample corpora can help to identify translation patterns that are not accessible to our intuitions. The second reason for creating sample corpora is the necessity for extensive manual annotations. There is no standard group of search words to identify implicit stylistic features such as figural deictics and modalized expressions. Therefore, a sample corpus is more suitable for a sophisticated and detailed analysis compared to a large corpus that relies on automatic techniques.

The criteria for data selection were established to optimize reliability and transparency. First, the sample should be representative of the text-world construal features in the novel. The second section of Chapter Three, titled “Old Lady Jia extends a compassionate welcome to the motherless child” (Cao, 1973, p. 84), was selected due to its multiple instances of world depictions. This section describes various people and decorations that the protagonist, Lin Daiyu, perceived upon her initial visit to her grandmother, Lady Jia’s mansion. Second, the sample size should be sufficient to reflect the distinctive patterns of text-world construals. Biber’s (1990, p.
261) research indicates that linguistic features stabilize in samples exceeding 1000 words, making the size of the sample adequate for revealing the features pertinent to this study.

Procedures for analyzing verbal and visual elements

The research employs a two-phase design. In the first phase, a corpus-based analysis is conducted to examine the verbal elements of text-world construals in both the STs and the TTs. The methodology is as follows: First, two parallel corpora were established by aligning the STs and TTs at the sentence level using the corpus tool ParaConc.

Then, the texts were subjected to preliminary annotation. The STs were segmented and tagged with part-of-speech information through the Corpus Word Parser software. The TTs were also tagged using the CLAWS5 Tagset designed by Lancaster University. The purpose of the initial tagging is to facilitate the annotation of stylistic features in the subsequent step.

Next, the texts underwent further annotation. The STs and TTs were uploaded to UAM CorpusTool, software specifically designed for research that requires detailed, context-specific manual annotations. All the stylistic features of text-world construals, including modalized expressions, figural deictics, verb aspects, and circumstantial expressions of location, were tagged in alignment with the verbal model. The annotation process was carried out twice, with a 1-month interval in between, to produce reliable data.

Finally, the statistics were calculated. After annotating the stylistic features with UAM CorpusTool, the software generated the frequency of each category. Then, we compared the observed frequency of each category between the STs and TTs to see how significant the differences were. To achieve this, we performed a log-likelihood (LL) test utilizing the Wmatrix LL calculator.

The second phase is a multimodal analysis, focusing on the visual elements that facilitate text-world construals on the cover designs of the TTs.
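
As an illustration of the log-likelihood comparison in the first phase, the following Python sketch computes the statistic for one feature category in two corpora. It assumes the standard two-corpus log-likelihood formulation widely used in corpus linguistics, which we take to be what an LL calculator of this kind implements; the function names are hypothetical, the star thresholds are the chi-square critical values for one degree of freedom, and the frequencies in the usage lines are invented for illustration and are not the study's data.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood for one feature (assumed standard formulation)."""
    if size_a <= 0 or size_b <= 0:
        raise ValueError("corpus sizes must be positive")
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the feature were spread evenly across both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2.0 * ll


def significance_stars(ll):
    """Map an LL value to stars via chi-square critical values (1 degree of freedom)."""
    if ll >= 10.83:
        return "***"  # p < 0.001
    if ll >= 6.63:
        return "**"   # p < 0.01
    if ll >= 3.84:
        return "*"    # p < 0.05
    return ""


def direction(freq_a, size_a, freq_b, size_b):
    """Return '+' if the feature is relatively more frequent in corpus A, '-' if less, '=' if equal."""
    rate_a, rate_b = freq_a / size_a, freq_b / size_b
    if rate_a > rate_b:
        return "+"
    if rate_a < rate_b:
        return "-"
    return "="


# Hypothetical counts for illustration only (not taken from the study):
# a feature occurring 30 times in a 1,000-word TT sample and 10 times
# in a 1,000-word ST sample.
ll = log_likelihood(30, 1000, 10, 1000)
print(round(ll, 2), significance_stars(ll), direction(30, 1000, 10, 1000))
```

Under this reading, a higher LL value means the observed frequencies depart further from what an even distribution across ST and TT would predict, and the sign marks whether the TT overuses or underuses the feature relative to the ST.
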
The elements were classified and calculated according to the visual model.

Analysis of verbal elements in text-world construals

This section analyzes how the translators employed verbal elements in texts to facilitate readers’ conceptualizations of the text world.

Translational shifts in perspective

Tables 3 and 4 and Fig. 3 summarize the patterns of translational shifts in perspective. The higher the LL value, the more significant the translational shifts.

Table 3 Translational shifts in the observed frequency of subjective perspective.

Table 4 Translational shifts in the frequency distribution of modality.

Fig. 3: Translational shifts in perspective. This bar chart illustrates that translational shifts in perspective are more frequent in TT1 than in TT2.

In Table 3 and Fig. 3, TT1 (average LL = 19.84) shows more translational shifts in perspective than TT2 (average LL = 13.23). This suggests that Hawkes was less inclined than the Yangs to adopt a foreignizing approach to translating perspective.

The frequency of subjective perspectives in the TTs is significantly higher than in the STs (total frequencies in TT1: sig. = 0.000***+; TT2: sig. = 0.000***+). Both translators increased the use of subjective markers, including modality (modality in TT1: sig. = 0.000***+; TT2: sig. = 0.003**+) and figural deictics (figural deictics in TT1: sig. = 0.000***+; TT2: sig. = 0.000***+). Notably, TT1 used more modal expressions, such as perception modality and epistemic modality, compared to TT2.

By adding modal expressions and figural deictics, the TTs put the subject of conceptualization on stage. These stylistic choices craft an illusion that we are perceiving the world through the eyes of a character (Rundquist, 2020, p. 7).
A comparison of the two translations shows that TT1 offers deeper access to the character’s mind, granting readers a richer experience of the character’s thoughts and perception of the surroundings. An example is given below. Markers of modality are underlined, and those of figural deictics are in bold.

Example (1)

In Example (1), the ST may be regarded as a narratorial description of objective fact since it does not contain any expression of figural subjectivity. This objective narration is prevalent in ancient Chinese novels (Chen, 2010, p. 59). Before the influx of Western novels in the early twentieth century, Chinese novelists tended to employ an external perspective in their storytelling (p. 59). The narrator primarily focuses on depicting scenes, rarely offering subjective evaluations or delving into the character’s psychological activities (p. 59). In the TTs, the presence of subjective markers (seemed, this) foregrounds the character’s subjectivity in the scene depiction, providing readers with an internal perspective of the text world.

Compared with TT2, TT1 makes the focalizing character’s subjectivity more explicit through a broader array of subjective markers, such as indefinite expression (some), deictic expression (presently), and marked linguistic style. These expressions orient readers to an internal perspective, likely enhancing their sense of subjective involvement in the text world (Sanford and Emmott, 2012, p. 167).

Translational shifts in scope

Table 5 and Fig. 4 show the patterns of translational shifts in scope.

Table 5 Translational shifts in the frequency distribution of scope.

Fig. 4: Translational shifts in scope. This bar chart illustrates that translational shifts in scope are more frequent in TT1 than in TT2.

In Table 5 and Fig.
4, TT1 (average LL = 23.32) exhibits more translational shifts in scope than TT2 (average LL = 13.4). This indicates that Hawkes preferred a translation strategy that leans more toward domestication, in contrast to the Yangs, who showed a greater inclination toward preserving the ST’s style of scope adjustments.

The extensive use of non-progressive aspect in the STs creates a maximal scope (maximal scope in ST1: 94.46%; ST2: 94.49%), positioning readers at a distance from the scene and allowing them to perceive it in its entirety (Evans, 2019, p. 260).

The TTs narrow the scope (immediate scope in TT1: LL = 36.03, sig. = 0.000***+; TT2: LL = 24.06, sig. = 0.000***+) by using progressive aspects more frequently. Compared with the Yangs, Hawkes showed more preference for progressive aspects, profiling the ongoing portion of the action (Giovanelli and Harrison, 2018, p. 39). This stylistic choice construes the action as if in progress, thereby drawing readers closer to the event and inviting them to project themselves into the text world (Giovanelli, 2022, p. 414). An example is given below. Markers of maximal scope are underlined, and those of immediate scope are in bold.

Example (2)

In Example (2), non-progressive aspects occur frequently in the STs. For instance, the combined use of the verb “转 zhuan” with the clitic “过 guo” forms a verbal group denoting a non-progressive aspect. The Yangs translated “转过 zhuanguo” as “gave access to,” retaining the non-progressive aspect in TT2. This aspect provides a maximal scope that presents the entire process of the action (Zhang, 2010, p. 785). It creates an impression of the scene being observed from a distance (Giovanelli, 2022, p. 412).

In TT1, Hawkes translated the verbal group into the present participial verb “passing,” casting it in a progressive aspect. The use of the progressive aspect narrows the scope, highlighting the action’s current unfolding.
This stylistic choice fosters a feeling of immediacy and closeness to the scene, potentially enhancing the reader’s embodied experience (p. 414).

Translational shifts in specificity

Table 6 and Fig. 5 show the patterns of translational shifts in specificity by calculating the frequency of circumstantial expressions of location.

Table 6 Translational shifts in the observed frequency of specificity.

Fig. 5: Translational shifts in specificity. This bar chart illustrates that translational shifts in specificity are more frequent in TT1 than in TT2.

In Table 6 and Fig. 5, TT1 (average LL = 40.2) exhibits more translational shifts in specificity than TT2 (average LL = 19.01). This suggests that Hawkes favored a more domesticating translation strategy in conveying circumstantial details than the Yangs.

The mansion described in the ST is intricately designed, featuring a myriad of gardens, rooms, and decorative elements that can bewilder even readers familiar with Chinese culture. To address the problem, the translators increased the circumstantial descriptions in the TTs (circumstantial expressions of location in TT1: sig. = 0.000***+; TT2: sig. = 0.000***+), making it easier for target readers to visualize the text world.

Compared with TT2, TT1 offers more information on the spatio-temporal setting (circumstantial expressions of location in TT1: LL = 40.2; TT2: LL = 19.01), potentially easing the cognitive load on readers as they navigate through the text world (Sanford and Emmott, 2012, p. 16). Moreover, the richer, more vivid descriptions in TT1 may evoke a stronger embodied experience, enhancing the reader’s immersion in the text world (p. 157). Below is an example with circumstantial expressions of location marked in bold.

Example (3)

In Example (3), the location of the couplet is underspecified in the ST.
Only two circumstantial cues are provided: “座上 on the seat” and “堂前 in front of the hall.” TT2 adds a prepositional phrase “above these” in the circumstantial description, providing readers with more details of the couplet’s location.

Compared with TT2, TT1 presents more circumstantial cues, including phrases like “above the chairs” and “on each side.” These details not only ease the cognitive load for readers unfamiliar with Chinese culture, enabling them to make inferences more effortlessly, but also amplify embodiment effects.

Moreover, the use of deictic expressions, such as “on the right-hand one” and “on the left-hand one,” provides readers with an internal perspective (p. 166), positioning them alongside the character Lin Daiyu’s deictic center and allowing them to perceive the text world through her eyes. This approach initiates sensory representation and may enhance readers’ immersion in the text world (p. 167).

Analysis of visual elements of text-world construals

This section analyzes how the editors and publishers employed visual elements in cover designs to facilitate readers’ conceptualizations of the text world. Figures 6 and 7 present the cover designs.

Fig. 6: Cover design of TT1 Hawkes (Cao, 1973). This picture displays the cover of the English translation of Hongloumeng published by Penguin Press.

Fig. 7: Cover design of TT2 Yangs (Cao and Gao, 1987). This picture displays the cover of the English translation of Hongloumeng published by Foreign Languages Press.

Perspective of the cover designs

Table 7 compares the perspective of cover designs for TT1 and TT2.

Table 7 Perspective of the cover designs.

The book cover of TT1 depicts a female character in the foreground. The character is captured from a frontal angle (involvement: frontal angle).
This angle fosters a connection between the reader and the character, since the “frontal angle is the angle of maximum involvement” (Kress and van Leeuwen, 1996/2006, p. 145).

In addition, the character engages the reader with a direct gaze (demand: gaze at the viewer), which “creates a visual form of direct address” (p. 117). This direct address draws readers into a more intimate interaction with the character, inducing them to form a subjective perspective of the text world.

The cover of TT2 presents the grand setting of the mansion, with an array of characters scattered in the background. The characters are portrayed from different angles, predominantly from an oblique angle (detachment: oblique angle). This angle positions readers as objective observers, prompting them to construe the text world from a detached perspective (p. 146).

By manipulating the perspective of the cover designs, the editor and publisher of TT1 foster an intimate connection between the character and readers. This connection deepens readers’ sense of involvement, inducing them to adopt a subjective perspective of the text world. In contrast, the editor and publisher of TT2 position readers as observers, allowing them to view the text world from a more objective perspective.

Scope of the cover designs

Table 8 compares the scope of the cover designs for TT1 and TT2.

Table 8 Scope of the cover designs.

The cover of TT1 features a close-up of the female character (immediate scope: close distance), showcasing only the upper part of her body. Her figure dominates the image, filling half of the frame. The choice of distance creates an immediate scope of the text world. It brings readers into the scene, enabling them to closely observe the character’s facial expressions and the details of the setting.
Such an approach may enhance readers’ embodied experience and deepen their involvement in the text world.

The cover of TT2 presents the characters and the setting at a long distance (maximal scope: long distance), providing a maximal scope that allows readers to view the panorama of the text world. The characters appear as distant figures. Their small size makes it difficult to discern their facial expressions. This choice of distance fosters a sense of detachment. Readers may perceive the world as objective observers without feeling a strong emotional connection to the characters (p. 126).

By manipulating the scope of the cover designs, the editor and publisher of TT1 invited readers to engage with the character in the text world as intradiegetic participants. In contrast, the editor and publisher of TT2 distanced readers from the characters, creating a sense of detachment that leads them to observe the world objectively as extradiegetic participants.

Specificity of the cover designs

Table 9 compares the specificity of the cover designs for TT1 and TT2.

Table 9 Specificity of the cover designs.

The book cover of TT1 depicts the character and the setting in detail (specific: high articulation of detail). The female character is meticulously portrayed, showcasing her facial expression, the light blush on her cheeks, the delicate folds of her clothing, and the exquisite ornaments in her hair. In addition, readers can see the details inside and outside her room, from the chess set and fan on the desk to the bamboo and rockeries in the garden. Moreover, the color is modulated (specific: high articulation of color modulation). For example, the rockeries are painted in different shades of brown. These details in depiction allow readers to feel a stronger sense of realism, enriching their embodied experience (p. 161).

The cover of TT2 portrays the characters and the setting schematically (abstract: low articulation of detail).
The characters are drawn with rough lines, omitting specific features of their faces and clothing. Similarly, the setting is depicted in a sketchy style, with few details presented inside the rooms. Additionally, the color used is unmodulated (abstract: low articulation of color modulation). This simplicity in visual representation results in an abstract construal of the text world, potentially reducing the reader’s embodied experience.

The editor and publisher of TT1 provided a specific construal of the text world, fostering the realistic effect that deepens readers’ immersion. Conversely, the editor and publisher of TT2 opted for a more abstract representation, which may reduce the immersive experience for readers.

Conclusion

This study analyzed the multimodal world construal in the English translations of Hongloumeng. The main findings are as follows.

The text world created in TT1 deviates more significantly from the ST than that in TT2 (average LL in TT1 = 27.79; TT2 = 15.21). Hawkes tended to adopt a domesticating translation strategy, while the Yangs leaned more toward preserving the ST’s world-building style. In the introduction to his translation, Hawkes (1973, p. 45) expressed his difficulty in adhering strictly to the ST and acknowledged that he made some modifications of his own. In contrast, Yang (2011, p. 4) emphasized in an interview that a translator should minimize explanations and strive to remain faithful to the ST without exaggerating or adding content.

The translators employed verbal elements in the texts to shape readers’ conceptualizations of the text world. TT1 is more likely to deepen readers’ engagement and immersion in the text world than TT2.

In terms of perspective, Hawkes provided a more subjective perspective of the text world compared to the Yangs. 
By greatly increasing modalized expressions and figural deictics in TT1, he drew readers into the character’s deictic center and invited them to be involved emotionally in the text world.

Regarding scope, Hawkes showed a stronger preference for the progressive aspect than the Yangs. This stylistic choice construes actions as if in progress, thereby enhancing readers’ immersion in the text world.

In terms of specificity, by adding circumstantial expressions, Hawkes set up a more specific spatio-temporal setting than the Yangs. The more detailed orientation information provided in TT1 enhances readers’ embodied experience in visualizing the text world.

Verbal elements interact with visual elements in text-world construals. The editors and publishers employed visual elements in cover designs to facilitate readers’ conceptualizations of the text world. The cover of TT1 is more likely to immerse readers in its text world than that of TT2.

In terms of perspective, the character on the cover of TT1 is depicted from a frontal angle, gazing directly at the reader. This invites readers to engage with the character and adopt a subjective perspective of the text world. Conversely, on the cover of TT2, the characters’ oblique angle and lack of direct gaze prompt readers to observe the text world from a more objective perspective.

Regarding scope, the cover of TT1 offers a close-up view of the text world, forming an immediate scope that may enhance readers’ immersive experience. In contrast, the cover of TT2 presents the text world from a greater distance, creating a maximal scope that encourages readers to observe the world from a more detached viewpoint.

In terms of specificity, the cover of TT1 portrays the text world in detail, enhancing the realistic effect that draws readers in. 
Conversely, the editor and publisher of TT2 opted for a more abstract representation, which may reduce the embodied experience for readers.

This study fills a gap in existing research by focusing on world construals in the English translations of the classic Chinese novel Hongloumeng, offering insights into how different translation and publishing strategies affect readers’ conceptualizations of the text world. Furthermore, this study proposed a theoretical framework that integrates cognitive stylistics and Systemic Functional Linguistics into Text World Theory, providing an effective tool for analyzing multimodal elements in translation.

However, this article does not examine the stylistic effects on readers within real-world contexts. Further empirical studies are needed to investigate readers’ emotional responses to text-world construals. In addition, when analyzing literature, especially lengthy works, it is crucial to take a selective approach. While the sampling method allows for a detailed examination, it may not fully represent the translators’ styles. Therefore, future quantitative studies with larger datasets could provide insights that better represent the translators’ stylistic choices.