Enduring constraints on grammar revealed by Bayesian spatiophylogenetic analyses


Main

Human languages are strikingly diverse. They vary in almost every way1, from the sounds they use to the ordering of words and other grammar rules. However, such diversity does not preclude the existence of regular patterns and structured variation. A central goal of linguistics has been to describe the patterns and structure of human linguistic diversity and identify the constraints on that diversity2,3,4,5,6,7,8. In Hjelmslev’s2 words, the aim of linguistic typology/theory “must be to show which structures are possible, in general, and why it is just those structures, and not others, that are possible”.

Some argue that languages all face the same pressures for communicating and encoding information, leading to convergence towards structurally good solutions9,10,11. For example, take verb agreement, the phenomenon where verbs are marked for the person and number features of their grammatical arguments, such as ‘-es’ in ‘Alex catch-es the ball’. Verb agreement in English is very limited; in other languages, it is far more extensive or non-existent altogether. One explanation for this cross-linguistic variation is that verb agreement ‘trades off’ with word order. Speakers of all languages need a way to differentiate argument relations (to identify and mark subjects, objects and other syntactic arguments as such), and languages with subject–verb–object word order (as in English) do not ‘need’ verb agreement because subjects are on one side of the verb and objects on the other.

Others argue that all languages are shaped by our human cognitive capacity for online production, comprehension and acquisition5,12,13,14,15,16,17,18,19,20,21,22. Word order patterns, especially those related to the order of object and verb, have been explained in terms of efficient online processing.
These have been claimed to be rooted in principles where the order of the ‘head’ (the most important element of a phrase that determines its type and syntactic behaviour) and its ‘dependents’ (other elements in the phrase) match each other across different types of phrases13,17,23. For example, if adpositions (heads of adpositional phrases) precede nouns (dependents in adpositional phrases) in a given language, we may expect the same ‘matched’ pattern where verbs (heads of verb phrases) also precede objects (dependents in verb phrases).

It is also possible that all languages are shaped by general pathways of diachronic language change11,24,25,26,27,28. For example, the association between the order of adposition and noun and the order of object and verb has been explained through the common process of grammaticalization where adpositions develop from verbs29. Here, adpositions such as ‘for’ may arise from verbs meaning ‘give’ (as in ‘Anna gave John a flower’), and if the word order is such that verbs come before objects, then these forms will be prepositions rather than postpositions (‘for John’ rather than ‘John for’).

The nature of universals and the extent to which the types of constraints mentioned above contribute to their emergence are of direct relevance for understanding the nature of human language and human cognition and, hence, a matter of some dispute across various approaches to linguistics30,31. Within formal approaches, universals such as those invoked in X-bar structure (simplistically, the idea that phrases of any type consist of specifiers, heads and their complements32,33) are seen as absolute rules of human language, tied to robust, innate grammatical constraints5,7,8,32,34. Evans and Levinson1 argue against absolute universals of all types, emphasizing the diversity of the world’s languages and the complex interactions of multiple potential constraints in any non-trivial generalizations about human grammars.
Generative replies to Evans and Levinson1, such as Freidin35, Pesetsky36 and Rizzi37, argue, among other critiques, that their account does not hold because generative analysis operates at a deep level, whereas the analysis in Evans and Levinson1 and other work in functional typology remains at the surface level. Hence, the generative study of universals is distinct from the typological one, both in its methods and in terms of explanations.

The current study is rooted in the field of linguistic typology, initially pioneered by Greenberg38, which has subsequently generated a large body of research on linguistic universals (see, among other works, ref. 6). This approach focuses on identifying patterns of grammatical feature co-occurrence that need not be exceptionless (‘statistical universals’), in many cases using them to inform theories that invoke the aforementioned types of constraints (communication, cognition and language change). For example, Greenberg’s38 universal number 4 claims that “with overwhelmingly greater than chance frequency, languages with normal SOV order are postpositional” (where SOV is subject–object–verb order), which follows the earlier example. Dryer23 finds evidence for a large set of word order associations using a large language sample and proposes an explanation for them rooted in ease of online language processing. Similarly, Bickel et al.39 have demonstrated that a strong cognitive preference to identify the first base-form noun phrase as the agent (the ‘do-er’ of an action) leads to a persistent bias against so-called ergative languages, which can explain the rarity of this pattern.

These types of explanations for universals are often grounded in the so-called competing motivations account, which can both account for variation across languages and provide (cognitive) grounds for the universal itself. A famous example of a formal application of this model is Aissen’s40 optimality theory account of differential object marking (DOM).
DOM is a common cross-linguistic pattern in which some direct objects in a language are marked (for example, with case or adpositions), while other direct objects remain unmarked. For example, in Spanish, direct objects that are definite and denote human referents are preceded by the marker ‘a’ (lit. ‘to’), whereas other objects are left unmarked. Aissen demonstrates that cross-linguistic variation in DOM can be explained by two interacting principles (motivations): (1) iconicity, which implies that more prominent objects (those that are definite, human or higher animates) are more likely to receive marking and (2) economy, which implies that marking should be avoided altogether. Her analysis suggests that these two constraints interact in such a way that the outcome is the observed cross-linguistic variation in differential object marking: in DOM languages, highly prominent objects always receive marking, but languages vary on the ‘cut-off’ regarding less prominent objects. Underlyingly, the iconicity motivation may be grounded in the communicative need to distinguish prominent objects from subjects, which also tend to be definite, human or higher animates.

Such accounts form the theoretical groundwork for why we may find universals in the first place. However, not all linguists are convinced of the importance of such constraints: Dunn et al.41 argue that statistical word order universals arise in “an evolutionary landscape with channels and basins of attraction that are specific to linguistic lineages”. They claim that word order correlations do not emerge in response to functional constraints, but rather are a consequence of particular diachronic changes unique to particular language families.
A common view among generative linguists is that universal grammar does not (and should not aim to) explain statistical universals34.

Resolving the complex problem of which grammatical relationships are ‘universals’ and what they mean for understanding language or cognition has been hindered by several fundamental challenges. First, the lack of a comprehensive grammatical dataset has meant previous work has tended to consider a relatively small subset of the world’s ~7,000 languages, limiting statistical power and the ability to test for strong associations rigorously. Second, shared linguistic ancestry and the diffusion of features between neighbouring populations mean linguistic data do not constitute independent data points, violating the independence assumption of many statistical tests and potentially generating spurious statistical associations between features42. Finally, raw correlations between features (or traits) tell us little about the historical causal relationships between them.

Here, we overcome these challenges by analysing a comprehensive database of grammatical features, Grambank43, which covers more languages than previous work (Supplementary Text 1), and employing sophisticated methodologies to handle non-independence and test for co-evolution. We test 191 putative ‘linguistic universals’ extracted from the Universals Archive44 (Supplementary Text 2 and 3). These are all so-called implicational universals, like Greenberg’s38 universal number 4 above (SOV ⇒ postpositions): they relate characteristics of the world’s languages in an ‘if X, then Y’ structure. First, we apply a Bayesian generalized linear mixed effects model to evaluate the support for each hypothesis while controlling for genealogical and geographical relations. We then apply a Bayesian phylogenetic method to infer the underlying evolutionary dynamics (see Methods and Supplementary Text 5 for explanations and rationale).
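To make the ‘if X, then Y’ structure concrete, the following is a minimal Python sketch that tabulates an implicational universal over a handful of languages. The language names, feature values and function names are all invented for illustration; they are not drawn from Grambank or the Universals Archive, and this raw tabulation is not the method used in this study, since it ignores exactly the genealogical and geographical non-independence that the models here are designed to control for.

```python
from collections import Counter

# Hypothetical toy data: (has_SOV_order, has_postpositions) per language.
# All values are invented for illustration only.
languages = {
    "lang_a": (True, True),
    "lang_b": (True, True),
    "lang_c": (True, False),   # a counterexample to SOV => postpositions
    "lang_d": (False, True),
    "lang_e": (False, False),
}

def tabulate_implication(data):
    """Count the four X/Y combinations for an 'if X, then Y' universal."""
    return Counter(data.values())

def counterexample_rate(table):
    """Share of languages with X that lack Y; None if no language has X."""
    with_x = table[(True, True)] + table[(True, False)]
    if with_x == 0:
        return None
    return table[(True, False)] / with_x

table = tabulate_implication(languages)
rate = counterexample_rate(table)
print(dict(table), rate)
```

Only the cell where X holds and Y does not counts against the implication; languages without X are compatible with it whatever their value for Y.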
We see this work as rising to the challenge spelt out by Piantadosi and Gibson45, who state that “claims about linguistic universals should be accompanied by some measure of the strength of evidence in favour of such a universal”. They propose that any hypothesized universal should be compared with the corresponding null hypothesis using a Bayes factor or similar, and point out that such measures are critical given the relevance of universals to debates on the nature of human language.

To examine differences in the reasoning behind universals we divided the generalizations from the Universals Archive into four types, reflecting recurrent themes in typological literature10,12,46: (1) narrow word order, (2) broad word order, (3) hierarchical universals and (4) other. Narrow word order universals link the order of words in two or more constructions in ways that generally relate to where the most important words occur, such as Greenberg’s38 universal 4 above. Broad word order universals correlate a word order feature with a morphosyntactic feature unrelated to word order, such as “non-accusative alignment may be associated with verb–initial order”47. Hierarchical universals are chains of implicational universals with the most frequently attested traits on the left and the rarest ones on the right, as defined within the same domain or paradigm. An example is Greenberg’s38 claim that “no language has a dual unless it has a plural”; this type of universal has also been called scalar or ‘scale’ in the literature. The remaining universals are captured under ‘other’, but in practice often correlate two morphological features (see Methods and Supplementary Text 2, 7 and 8 for further details).

Results

Phylogenetic and spatial correlation

We constructed Bayesian generalized linear mixed effects models (GLMMs) for all universals using brms48 implemented in R49.
We find that, without controlling for genealogical and geographical relations, the vast majority of the proposed universals are supported by our regression models—that is, the fixed effect of the predictor variable (the second part of the universal) has posterior estimates whose 95% credible interval (CI) excludes zero (Fig. 1, Supplementary Data 1 and Supplementary Table 3). In these naive models, 174 (91%) of the 191 universals are found to be supported.

Fig. 1: Bar chart showing the proportion of supported universals under the naive model and the spatiophylogenetic model. Support implies, for the naive model, that the 95% CI of posterior coefficient estimates does not straddle zero. For the spatiophylogenetic model (with genealogical and geographical relations controlled for), it means that the median of the 95% CI of the main fixed effect estimates does not straddle zero. Universals identified to be supported are coloured in blue, while non-supported universals are coloured in grey. a, The overall universals. b–e, The universals by subset: hierarchy (b), broad word order (c), narrow word order (d) and other (e).

However, when we do control for spatial and phylogenetic non-independence this number decreases substantially to 89 of 191 (47%). Here, we conduct the analysis over 100 phylogenetic trees50 (Methods) and hence report on the median of posterior estimates and their 95% CIs (Supplementary Fig. 4, Supplementary Data 1 and Supplementary Table 4). We find marked differences between the strength of support for the four types of universals (Fig. 1). There is strong support for hierarchical universals with 24 of 30 (80%) having posterior estimates that exclude zero. The narrow word order universals are also relatively well supported, with 36 of 65 (58%) confirmed.
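The support criterion used above (a 95% CI that excludes zero) can be sketched in a few lines. This is an illustrative Python example under stated assumptions, not the brms/R pipeline used in this study: the posterior draws are simulated with invented means and spreads, the function names are hypothetical, and the interval is a simple equal-tailed one taken from a single set of draws rather than a summary over 100 phylogenetic trees.

```python
import random

def credible_interval(draws, mass=0.95):
    """Equal-tailed interval from posterior draws, using linear
    interpolation between neighbouring order statistics."""
    ordered = sorted(draws)
    n = len(ordered)

    def quantile(q):
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return ordered[lo] * (1 - frac) + ordered[hi] * frac

    tail = (1 - mass) / 2
    return quantile(tail), quantile(1 - tail)

def is_supported(draws, mass=0.95):
    """True when the interval lies entirely above or entirely below zero."""
    lower, upper = credible_interval(draws, mass)
    return lower > 0 or upper < 0

# Simulated draws for two hypothetical coefficients (assumed values).
rng = random.Random(0)
strong_effect = [rng.gauss(1.0, 0.3) for _ in range(4000)]
weak_effect = [rng.gauss(0.1, 0.5) for _ in range(4000)]

print(is_supported(strong_effect), is_supported(weak_effect))
```

Under this rule a universal counts as supported whichever side of zero the interval falls on, so the sign of the estimate still has to be inspected separately to confirm that the association runs in the predicted direction.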
In contrast, there is weaker support for the broad word order (18 of 72 supported, 25%) and the ‘other’ universals (7 of 24, 32%).

Evolutionary dynamics

To infer the evolutionary (in the sense of diachronic) pathways behind the statistically supported universals identified in the spatiophylogenetic brms analyses, we performed co-evolution analyses using the BayesTraits program51,52. Again, we conducted the analyses over 100 phylogenetic trees, calculated Bayes factors (BF) by comparing the dependent and independent models and calculated the 95% highest density interval (HDI) to summarize BF support for each universal (Methods and Supplementary Texts 5 and 6). We took the lower bound of the 95% HDI >10 as indicating support for the dependent model of trait co-evolution over the independent model (Fig. 2, Supplementary Data 1 and Supplementary Table 4). On this criterion, 60 of the 89 universals supported in the spatiophylogenetic model were also supported in the co-evolution analyses. The remainder of the discussion focuses on this set of 60 universals. Across the different types of universals, we observe the same pattern as for the spatiophylogenetic correlations. First, the strongest evidence can be found among the hierarchical universals (evidence for all of the 24 universals supported in the spatiophylogenetic analysis, 80% of all hierarchical universals (n = 30)). Second, the word order universals show a more mixed pattern: 24 of the 36 narrow word order universals supported in the spatiophylogenetic analysis hold, which is less than half of all narrow word order universals (37%, n = 65), and a much smaller fraction of the broad word order universals is supported (8 of 18, 11% of all broad word order universals, n = 72). Third, we find that only four of the seven ‘other’ universals supported in the spatiophylogenetic model hold (17% in total, n = 24).

Fig.
2: Median natural log BF and their 95% HDI from the BayesTraits analyses showing support for co-evolutionary models. Universals with the lower bound of the 95% HDI on the distribution of BF >10 are considered supported and are coloured blue. The relationships are ranked from strongest to weakest, top to bottom, per category. The universals are given in short form, where the formula X ⇒ Y means ‘if X, then Y’; the full citations, sources and all abbreviations can be found in Supplementary Table 2. Sample sizes can be found in Supplementary Data 1 and Supplementary Table 4.

Robustness

Since both studies are dependent on a global phylogeny of languages50 (Methods), we conducted additional tests with categorical control for language family. Our results also hold with this approach (Supplementary Texts 5 and 6). Of the 89 statistically supported universals identified in the spatiophylogenetic analysis using the global phylogeny, 67 (75%) are supported using a categorical control for language family instead (Supplementary Fig. 6, Supplementary Data 1 and Supplementary Table 3). The main effect estimates of the two studies (the spatiophylogenetic brms models and the categorical control for language family brms models) are highly correlated (Spearman’s r = 0.93 (189 degrees of freedom), P