Language Contact & Historical Linguistics

© 1998-2008 Rick Horowitz All Rights Reserved Worldwide
This article is copyrighted! Enjoy, but do not steal my work!


The common methods of historical linguistics for the reconstruction of previous stages or states of languages are the Comparative Method (CM) and Internal Reconstruction (IR). Both of these methods break down in cases involving contact between differing languages. This is due to the primary assumptions requisite to each of these methodologies.

According to Jeffers and Lehiste, CM depends upon assumptions of relatedness and regularity.

The relatedness hypothesis tries to explain obvious similarities between words belonging to different languages or dialects by assuming that these languages are related. It assumes that the languages and dialects are descended from a common ancestor language or protolanguage. The regularity hypothesis... assume[s] that sound changes are regular. It assumes that each sound...will be changed similarly at every occurrence in like circumstances, if it is changed at all. (1979: 17)

CM thus compares languages which might be related in an attempt to reconstruct the protolanguage from hypotheses about the similarities in the posited descendants.
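The regularity assumption can be pictured with a minimal Python sketch. It tabulates which initial sound in one language answers to which initial sound in another across a short list of cognates, and flags any sound with more than one counterpart. The word pairs are the familiar Latin and English textbook cognates, not data from the sources cited here; the sound labels are rough orthographic stand-ins; and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# (Latin word, English word, Latin initial sound, English initial sound)
# Illustrative textbook cognates; the sound labels are rough stand-ins.
COGNATES = [
    ("pater", "father", "p", "f"),
    ("pes", "foot", "p", "f"),
    ("piscis", "fish", "p", "f"),
    ("tres", "three", "t", "th"),
    ("tenuis", "thin", "t", "th"),
    ("cornu", "horn", "k", "h"),
    ("centum", "hundred", "k", "h"),
]

def correspondences(pairs):
    """Count how often each Latin initial sound matches each English one."""
    table = defaultdict(Counter)
    for _, _, latin_sound, english_sound in pairs:
        table[latin_sound][english_sound] += 1
    return table

def irregular(table):
    """Return the sounds that correspond to more than one counterpart."""
    return {s: dict(c) for s, c in table.items() if len(c) > 1}

# Inherited vocabulary alone: every sound has a single counterpart.
print(irregular(correspondences(COGNATES)))  # {}

# Add a word English borrowed rather than inherited ("paternal"):
with_loan = COGNATES + [("pater", "paternal", "p", "p")]
print(irregular(correspondences(with_loan)))  # {'p': {'f': 3, 'p': 1}}
```

On this toy list a single loan is enough to make the p : f correspondence look irregular, which is the kind of disturbance that contact introduces into comparative work.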

Internal reconstruction likewise depends upon an assumption of regularity. The significant difference is that IR depends upon only one language; since it assumes regularity, it looks to alternations of sounds within a single language to reconstruct previous stages. (Arlotto 1981: 98-99)

Lexicostatistics is likewise limited with respect to the basic assumptions noted above. As Teeter (1963) notes, lexicostatistics is

the statistical manipulation of lexical similarities for the purpose of making inferences as to genetic relationship and subrelationship among languages, including the attempts, known as glottochronological studies, to fix the dates of so-called language splits. (638)

Lexicostatistics breaks down under language contact situations in two ways. Depending as it does on the idea that language changes at a more or less constant rate, the phenomena to be noted below could make it appear that the clock has actually sped up. This is because not only pidgins and creoles, but other "mixed" languages exist which "cannot be classified genetically at all." (Thomason 1988: 3)[1] Language contact thus creates its own difficulties for lexicostatistics. (Jeffers and Lehiste 1979: 135)
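To make the "clock" concrete, here is a minimal Python sketch of the classic glottochronological estimate, t = ln c / (2 ln r), where c is the share of cognates on a basic word list and r is the assumed retention rate per millennium. The formula, the 0.86 retention rate (a figure commonly quoted for a 100-item list), and the function name are assumptions brought in for illustration and are not taken from the sources cited here; the cognate percentages are invented.

```python
import math

def divergence_millennia(shared_cognates, retention_rate=0.86):
    """Estimate the time since two languages split, in millennia.

    shared_cognates: fraction (0 < c <= 1) of list items judged cognate.
    retention_rate: assumed fraction of the list kept per millennium.
    """
    if not 0 < shared_cognates <= 1:
        raise ValueError("shared_cognates must be in (0, 1]")
    if not 0 < retention_rate < 1:
        raise ValueError("retention_rate must be in (0, 1)")
    return math.log(shared_cognates) / (2 * math.log(retention_rate))

# Hypothetical pair sharing 74% of the list through inheritance alone.
undisturbed = divergence_millennia(0.74)

# The same pair after contact has replaced inherited words in one of
# them, leaving only 55% of the list recognizably shared.
after_contact = divergence_millennia(0.55)

print(round(undisturbed, 1))    # 1.0
print(round(after_contact, 1))  # 2.0
```

Under these assumptions, replacement through contact roughly doubles the apparent age of the split although no additional time has passed, which is the sense in which the clock appears to have sped up.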

As noted, certain problems, occurring at all levels of language, cannot be solved by any of these methods. A quick sampling of examples will serve to demonstrate.

As noted, none of the standard methods of historical linguistics can provide explanations for the above. We are forced to look to hypotheses, then, which do not depend upon the assumptions inherent in those methods. Some relief from the above conundrums may thus be garnered by positing another hypothesis. Let us assume that some innovations may be explained by understanding the sociohistorical context of the languages in question. Specifically, we might consider the possibility that the changes may be explained by the interaction of speakers with their neighbors who do not speak the same language.

Indeed, this hypothesis leads to the dissolution of the above problems. The replacement of /h/ by /k/ noted above is done "al por de millones de italianos que hablan español." (Rosenblat, quoted in Cassano 1977: 262) Sapir's (1921: 198) example of nasalization of vowels in Swabian is explained by the interaction of Swabisch with neighboring French. Rickford's (1986: 270-271) example requires a slightly more complex explanation, depending upon the influence from Hiberno-English upon a South Sea Island creole to get "Sometimes you does be..." followed by an incomplete process of decreolization which leaves [z] as residue. Likewise, the example of estar displacing ser is explained as occurring under the influence of English and an autonomous language-internal cause resulting in simplification. (Silva-Corvalán 1986: 588) Finally, the Pomo languages get their word for "broken glass fragments" from the Russian word butilka, "bottle" (Oswalt 1971: 48), and Asia Minor Greek acquires seksenia and doksania from Turkish (Thomason 1988: 216). A look at Hui, located in the same Sprachbund as the Baonan, reveals homophony between their word for "day" and that for "thousand." Obviously, the Baonan speakers have modeled the semantic extension of their term for "day" on a confusion between homophony and polysemy. (Li 1985: 328)

These examples show that the basic methods of historical linguistics are unable to explain all of the data. In particular, when foreign interference occurs, an understanding of CM (and presumably IR and lexicostatistics as well) as a theoretical, rather than a methodological, principle leaves us unable to explain enigmas of the type noted above. (Thomason 1988: 3)


We have seen some examples of problems which cannot be solved by the standard comparative methods[2] of historical linguistics, along with the final steps in the solution to the problems posed by the data presented.

However, we still have not clearly explained why the data should have appeared as it did in the first place; we have not explained how these elements entered their respective languages. Einar Haugen (1950: 210) notes that "Hermann Paul pointed out that all borrowing by one language from another is predicated on some minimum of bilingual mastery of the two languages." When individuals communicate (in any language), they use their knowledge of language to transmit their ideas. Bilingual speakers will have more linguistic patterns to draw from to meet their needs. Some of these patterns will come from languages which differ from the base language in lexicon, grammar, or both. Nevertheless, these speakers will, unconsciously or consciously, draw upon whatever resources are available to them, regardless of where those resources come from. The result for the language used amounts to some kind of interference.

Various writers (Haugen 1950, Thomason 1988, Jeffers and Lehiste 1979, Whinnom 1971, Cassano 1977, and even, implicitly, Heath 1978) have distinguished between the phenomena known as "borrowing" and various other forms of interference, primarily "substratum" interference. Thomason claims (1988: 4) that the two "differ sharply in their linguistic results." A characteristic definition of "borrowing" is that given by Arlotto "as the process by which one language or dialect takes and incorporates some linguistic element from another" (1981: 184; emphasis his); while Cassano (1977) defines "substratum interference":

When the language of a group of people (usually conquerors) is superimposed upon and ultimately displaces the language of the autochthonous people in any given area, some habits or features characteristic of the indigenous language are retained and transferred to the new language as spoken by the indigenous people. As time advances, however, these habits or features become characteristic of the language of all speakers and are attributable to the substratum interference of the indigenous language. (239-40)

To this definition, we might add that the substrata effects are the result of imperfect learning of the target language (TL). (Haugen 1950: 216; Thomason 1988: 38-39)

Interestingly, when speakers of subordinate, or less prestigious, languages acquire facility in a superordinate TL but continue to use their native languages, the TL can begin to effect changes at various levels in the native language. Thus Silva-Corvalán (1986: 604) notes that Los Angeles Spanish shows a preference for estar + -ndo (be + -ing) constructions over other constructions to express the progressive. This kind of effect is known as "superstratum interference," and, in spite of Silva-Corvalán's example, is typically found in the case where a conquered group linguistically assimilates its conquerors. Superstratum interference, more so than substratum interference, includes the borrowing of lexical items. (Thomason 1988: 116)

In the field of historical linguistics, there is some overlap and even confusion of the terms relating to borrowing phenomena. This terminology may best be organized by making explicit what is implicit in all the writers cited.[3]

The phenomena of interference can best be understood by the recognition of two parallel continua. The first of these is the unconscious-conscious; the second, phonological-lexical. These continua are parallel because phonological (and perhaps to a slightly lesser degree morphosyntactic) "borrowings" are unconscious. Lexical borrowings are nearly always conscious. The reasons for this should be obvious. Probably no speaker ever sets out to "borrow a phoneme." At best, depending on the prestige of the donor language and the speaker's skill, the bilingual will, with a greater or lesser degree of success, borrow a word with a pronunciation closely approximating that of a native speaker of the donor language.[4]

More morphemic elements, such as loan words, hybrids, blends, translations (calques) and shifts, and semantic loans, move in the direction from more to less conscious.

This is not at all surprising if the impetus for the changes is taken into account. For loan words to enter the language

all that is necessary is that the concept be within the borrowing speech community or at least within a subgroup that desires to speak about something "foreign." Thus, while neither a lama (Tibetan monk) nor a llama (Peruvian beast of burden) has become a part of the culture of any English-speaking group, both are acceptable English words....When the first English speakers reached the New World, they borrowed into their language many words related to the native cultures whose references they may never have dreamed of using in their own societies: wigwam, wampum, pemmican. And even today we can speak freely of a junta, without necessarily wanting one as our government. (Arlotto 1981: 185)

Further evidence for this idea of the unconscious-conscious/phonological-lexical correlative continua comes from the study of pidgins and creoles. If, as Hall (1962: 152) asserts, every contact situation requiring communication results in the development of a pidgin, it is unavoidable that conscious efforts will be made to "adopt" lexical elements and a "hypothesized" grammar. Successful acquisition of words is a distinct possibility. The grammar, however, will likely be some version from the speaker's own language; for all her effort, she will not be able to purge herself of unconscious grammatical constructions. A fortiori, it is not surprising that, disabusing ourselves of Hall and turning to the more paradigmatic cases of pidginization/creolization, we nevertheless find speakers both consciously and successfully achieving relexification, while patterning grammatical structures on previous models, thoroughly learned at some earlier time, and thus not readily available to conscious manipulation. (DeCamp 1971: 13)[5]

The near-irrefrangibility of substrate phonology and grammar is largely due then to its unconscious character. So Arlotto, following his material quoted above, somewhat incorrectly asserts, "that the borrowed words are assimilated into the phonemic (or sound) system of the borrowing language" (1981: 185); at least if he fails to recognize the possibility that bilinguals, in attempting to reproduce the phonological patterns of the donor language, will do so "in a phonetic form as near that of the model language as he can," (Haugen 1950: 216) and may thereby either inadvertently import a new phoneme (as when veal, imported into English, created a minimal pair with feel, resulting in the new phoneme /v/), or cause a shift in the phonological system of the recipient language through incomplete assimilation of the borrowed words. Thus Heath points out that loanwords with special phonological features may result in the development of two co-existing systems, and quotes Mathesius:

...on peut ordinairement constater que le son dont il s'agit ne participe pas aux différences phonologiques et, partant, n'appartient pas au système phonologique....montre bien qu'il existe une sorte de conscience d'une différence entre les éléments indigènes et étrangers. (Mathesius, quoted in Heath 1978: 27)

As mentioned above, Thomason claims that substratum interference and borrowing have sharply different impacts on languages. Her view is not incongruent with what I have presented here. Substrata effects are typically unconscious and affect the phonological, morphological, and syntactic structures of languages. Indirect, and to an even greater degree, direct borrowing is a more deliberate process; if agents are not aware of having borrowed a word or phrase (if it "just happens"), they nevertheless are capable of recognizing, even appreciating, what they have done. This helps to account for the fact that in language acquisition by children

Adults can provide instruction in vocabulary, perhaps, but little else; young children don't really respond to grammatical correction. (Gleitman and Wanner 1982, cited in Jackendoff 1987: 86)

Presumably, the same is true in the acquisition of other languages, most notably pidgins and creoles, but likely also to some extent for second-language acquisition (although here there is normally a supposition on the part of the learner that the grammar and phonology must be attended to in addition to the lexicon). That the lexicon is so amenable to correction is evidence for a greater degree of consciousness in this area.


The understanding of the influence of unconscious and conscious processes in language contact situations points up a fact mentioned earlier. It is essential to recognize that languages are used in communicative efforts between individuals. In the period leading to his 1859 publication, Darwin's recognition of the importance of the individual within a system provided him with a breakthrough in understanding the process of natural selection. (Mayr 1991: 74, 80) Before Darwin, most natural philosophers (or, as we might say, biologists and other scientists) believed in natural kinds, or "species." Whewell had incorrectly noted that it was at the level of species that one must work. (Giere and Westfall 1973: 127) Whinnom (1971) points out that early conceptions of the hybridization of languages were similarly misunderstood. Simply looking at language systems without attending to the fact that language systems are constituted by language speakers leads to egregious errors of analysis. As Whinnom (1971: 91) expresses it, "all linguistic changes start with individual speakers."

Nevertheless, it would be equally wrongheaded to fail to recognize the impact of the system in which the individual finds herself. This is just as true of linguistic systems as it is of ecosystems. In his attention to the individual in a species, Darwin recognized a dialectic between the individual and the system. (Mayr 1991: 86-87) In the case of languages, we may wish to view a lect as the mold which shapes the production of the speakers; the form which is thus produced, the individual speech act, will be the result of the interaction of both individual and system.

This dialectic further complicates the task of historical linguistics. Even though language contact may facilitate diffusion, it is not sufficient for speakers of one language to be in contact with speakers of another; the impact of system will also be an important factor.

Consider: We may note in Spanish the presence of the following three comparative forms.

  1. The comparative of equality (Compeq).
  2. The comparative of superiority (Compsup).
  3. The comparative of inferiority (Compinf).

Yaqui has borrowed both lexical items and grammatical patterns from Spanish for the production of both Compeq and Compsup. However, it has failed to do so for the Compinf form, preferring to use a construction which negates the Compeq form. Since Yaqui speakers were not averse to borrowing lexemes for than and more from Spanish, why not also borrow menos (less)? The answer is that

Were the Spanish lexeme menos que introduced into the Yaqui language, a restructuring of the whole comparison system would have to take place on the model of the Spanish system. (Lindenfeld 1971: 13-14)

Thus, Lindenfeld states, the semantic component of the system acted as a filter constraining grammatical borrowing.

Another kind of example is provided by Householder (1983):

Among the settlers in America or Australia, say, someone starts up a new plumbing business or tanning business with only limited knowledge of the ins and outs. When he has to talk...about something he is doing, he may make up new words for old ideas for which [there had been] well-established kyriolexias. (13)

This shows that not only does the linguistic system impact the individual speaker with respect to borrowing, but social systems must be taken into account. In the example above, there may exist no community from which to borrow, because a particular part of the social order has become extinct.

The enforcement of constraints on language use, and thus on the possibility of importation of forms from other languages, may be driven by literary or artistic institutions. (Levin 1987: 167)

Political systems also act either to facilitate or constrain interlingual influence and development, e.g. the case of language reform in China. (Cheng 1979) As with literature and art, the determination of prestige is the cause, for even when political (or chauvinistic) forces are at work, they are prestige-driven, at least in the minds of the governments or educational institutions responsible for their enforcement, if not always for the speakers forced to adopt them. Language status is often determined by social or political superiority. (Grimshaw 1971: 434; Thomason 1988: 117-18; Arlotto 1981: 204) In addition, governments may proclaim one or another language as the official language for a geographical area. When this happens, there may be attempts to purge the language of "outside influences," as is famously occurring in Modern French. Additionally, educational institutions will begin to teach only these languages. The result is that the native languages are negatively impacted, while the officially sanctioned language is positively impacted both by law and, later, by the fact that it is the language of the well-educated. (Levin 1987: 165, 170-71)


We have seen examples of situations in which comparative reconstructive methods are incapable of explaining varieties of linguistic residue. The solution to the difficulties engendered a discussion of the behavior of speakers, specifically with respect to the unconscious and conscious substratal and borrowing phenomena. It was noted that these phenomena must be understood in light of the individual speakers of languages in their social, historical, and political settings, or systems. The dialectic of individual psychology and the formative influences of systems is what is responsible for the idiosyncratic nature, especially from the point of view of historical development, of various languages. From this, a case could be developed for the necessity of interdisciplinary work in understanding systems and individuals in a dialectic, but that is the topic for another paper.


Arlotto, Anthony. 1981. Introduction to historical linguistics. Lanham: University Press of America, Inc.

Cassano, Paul Vincent. 1977. Substratum hypotheses concerning American Spanish. Word 28: 239-274.

Cheng, Chin-Chuan. 1979. Language reform in China in the seventies. Word 30: 45-57.

DeCamp, David. 1971. The study of pidgin and creole languages. In Hymes, ed., 13-39.

Giere, Ronald N., and Richard S. Westfall, eds. 1973. Foundations of scientific method: The nineteenth century. Bloomington: Indiana University Press.

Grimshaw, Allen D. 1971. Some social forces and some social functions of pidgin and creole languages. In Hymes, ed., 427-445.

Hall, Robert A., Jr. 1962. The life cycle of pidgin languages. Lingua 11: 151-156.

Haugen, Einar. 1950. The analysis of linguistic borrowing. Language 26: 210-231.

Heath, Jeffrey. 1978. Linguistic diffusion in Arnhem Land. (Australian Aboriginal Studies: Research and Regional Studies #13.) Canberra: Australian Institute of Aboriginal Studies.

Householder, F. W. 1983. Kyriolexia and language change. Language 59: 1-17.

Hymes, Dell H., ed. 1971. Pidginization and creolization of languages. Cambridge: Cambridge University Press.

Jackendoff, Ray S. 1987. Consciousness and the computational mind. Cambridge: MIT Press.

Jeffers, Robert J., and Ilse Lehiste. 1979. Principles and methods for historical linguistics. Cambridge: MIT Press.

Levin, Saul. 1987. The perennial "language question" among the Greeks. General Linguistics 27: 162-172.

Li, Charles. 1985. Contact-induced semantic change and innovation. In Jacek Fisiak, ed., Historical semantics: Historical word-formation. Berlin: Mouton Publishers, 325-337.

Lindenfeld, Jacqueline. 1971. Semantic categorization as a deterrent to grammatical borrowing: A Yaqui example. International Journal of American Linguistics 37: 6-14.

Mayr, Ernst. 1991. One long argument: Charles Darwin and the genesis of modern evolutionary thought. Cambridge: Harvard University Press.

Oswalt, Robert. 1971. The case of the broken bottle. International Journal of American Linguistics 37: 48-49.

Rickford, John R. 1986. Social contact and linguistic diffusion: Hiberno-English and New World Black English. Language 62: 245-289.

Sapir, Edward. 1921. Language: An introduction to the study of speech. New York: Harcourt, Brace and World, Inc.

Silva-Corvalán, Carmen. 1986. Bilingualism and language change: The extension of estar in Los Angeles Spanish. Language 62: 587-608.

Teeter, Karl V. 1963. Lexicostatistics and genetic relationship. Language 39: 638-648.

Thomason, Sarah Grey, and Terrence Kaufman. 1988. Language contact, creolization, and genetic linguistics. Berkeley: University of California Press.

Whinnom, Keith. 1971. Linguistic hybridization and the "special case" of pidgins and creoles. In Hymes, ed., 91-115.


  1. A good example of a difficulty for glottochronology specifically shows up in Leicester and Stafford English as they underwent norsification. The "rule" in effect seemed to eliminate the English words in favor of words sounding very similar, but of Norse origin. (See Thomason 1988: 291ff for a more thorough discussion.)
  2. CM, IR, and lexicostatistics are all "comparative" because they make comparisons: CM compares different languages towards hypotheses regarding protolanguages; IR compares forms within one language to reconstruct earlier stages; and lexicostatistics compares lists of words, usually for glottochronological determinations.
  3. Concerning the continuum from the unconscious to the conscious, some are nearly explicit, e.g., Heath, but Whinnom (1971) is typical of the confusion even where he should note a distinction: with respect to an "ethological barrier" to linguistic diffusion.
  4. Or dialect, cf. Householder 1983: 8; Haugen 1950: 216; for an example of difficulties incorporating "foreign" phonemes, see especially Cassano 1977: 244 ff.
  5. His comments on relexification occur on p. 25, while the remarks about the phonological aspects appear on p. 22.

© Copyright 2003, 2004, 2005, 2006, 2007, 2008 Unspun™. All Rights Reserved Worldwide.

