Formulaicity and Second Language Acquisition

Info: 8540 words (34 pages) Dissertation
Published: 9th Dec 2019

Abstract

The current paper discusses formulaicity within the field of second language acquisition (SLA), with a particular focus on adult second language (L2) learners and their spoken production. It begins with a summary of Pawley and Syder’s (1983) seminal article, which proposed that formulaic language (termed memorized and lexicalized sentence stems) is largely responsible for the ease with which idiomatic expressions are selected by native speakers and subsequently produced in a fluent manner. They proposed that these linguistic elements are the key to these abilities, and that the complex nature of these stems (contemporarily referred to as formulaic sequences (FS)) partly explains why L2 learners have difficulty in achieving nativelike usage of the target language. The paper explores the status and processing advantages of FSs in non-native speakers as revealed by psycholinguistic experimentation, as well as the role of FSs in learner speech production; the evidence from these studies largely corroborates Pawley and Syder’s original hypothesis. Important and recurring themes include the type and amount of exposure, and the evolution and refinement of methodologies used in formulaicity research to date, which are explored within the framework of usage-based theories and through a closer look at a recent study conducted by Serrano, Stengers, and Housen (2015). The paper concludes with some pedagogical implications in light of what research can tell us about the exposure needed to acquire FSs, and how the explicit teaching and inclusion of FSs in the language curriculum may help bridge that gap. It also discusses persistent conceptual and methodological concerns which must be addressed in future formulaicity research, particularly research involving the adult L2 learner population.

A Summary of Pawley and Syder (1983)

Thirty-five years ago, Andrew Pawley and Frances Syder (1983) published a ground-breaking paper titled “Two puzzles for linguistic theory: nativelike selection and nativelike fluency”. Their linguistic theory was an attempt to explain what previous theories of grammar (e.g., Chomsky, 1957; 1965, as cited in Pawley & Syder, 1983) had not entirely captured (Pawley, 2007). They proposed that “memorized sentences and lexicalized and semi-lexicalized sentence stems” as elements of linguistic knowledge are key to understanding both the puzzle of nativelike selection: the ability of native speakers to select an idiomatic subset of constructions from the (nearly) infinite possible grammatically sound options, and the puzzle of nativelike fluency: the ability to produce and maintain fluent stretches of speech under the constraints of real-time interaction. As this paper will show, subsequent research has largely confirmed their hypotheses.

Memorized sentences, coming ready-made, or prefabricated, are those sentences which belong to an individual’s idiolect, whereas a lexicalized stem is a convention and belongs to a community of speakers. Furthermore, they lie on a cline, or continuum. That is to say, these stems remain open to a certain degree of creative generation. Where a lexicalized stem is inflexible (e.g., an idiom, such as kick the bucket), a semi-lexicalized stem contains obligatory categories, in addition to permissible variations termed inflections, with slots amenable to creative variation, named expansions. For example, the grammatical frame “NP be-TENSE SORRY to keep-TENSE you waiting” can vary its inflections (She’s, Mr. X, I’m) and allows expansions (so, all this time; e.g., I’m so sorry to have kept you waiting all this time). Thus, stems can be conceptualized as frames which are either completely fixed (the former) or subject to variation (the latter).

Pawley and Syder proposed that lexicalized stems are what enable nativelike selection, which is not the language speaker’s ability to learn and know the rules of the language system which generate all possible grammatical sentences, but instead the ability to distinguish which of these well-formed utterances are actually used by speakers. “Native speakers do not exercise the creative potential of syntactic rules to anywhere near their full extent” (Pawley & Syder, 1983 p.193), because to do so would include a slew of unidiomatic, odd, and “off” sounding phrases. Further, given the limitations of human memory and attention, time-bonding constraints in speech-exchange, and other aspects of communication attended to when conversing (e.g., the planning of larger units of discourse, sensitivity to the social context, register), Pawley and Syder propose it is unrealistic to posit that native speakers produce language from a purely rule-based system.

Thus, lexicalized stems are also the key to nativelike fluency. Pawley and Syder postulate that native control of language is reliant on the store of many hundreds of thousands of lexicalized sentence stems, stored whole. In fluent production, they are rapidly accessed from long-term memory, compensating for the limitations of working memory and the other demands that online communication entails.

Their paper concluded with a discussion of the implications for the theory of grammar in light of their hypothesis. They argue that the internal structure of many complex items must be specified twice in the description of linguistic knowledge in order to account for their dual-status in the language, because a lexicalized stem includes specification of its productive rules of syntax and semantics in addition to its special status as a culturally authorized concept, which sets it apart from other possible grammatical strings to express a similar meaning.

Pawley and Syder’s hypotheses have implications for second language (L2) learning and teaching. They argue that the great difficulty L2 learners have in acquiring language lies in the fact that permissible variation in nativelike use of language is far more restricted than traditional grammatical and syntactic rules alone would suggest. In addition to these constraints, the language learner is tasked with learning the unique mini-grammar of the lexicalized stem, where “each one is subject to a somewhat different range of phrase structure and transformational restrictions” (p. 215). The complexity of these mini-grammars is why even proficient learners may fail to combine words in the way native speakers do.

Post-Pawley and Syder: Methodologies

Studies have shown that formulaicity is “all-pervasive” in language (Wray, 2002), with research estimating that it makes up as much as 80% of general language (Altenberg, as cited in Wray, 2002), and 58.6% of spoken language (Erman & Warren, 2000). The important assumption here is that prefabricated language must have vital consequences for how language is acquired and processed because it is so ubiquitous in discourse (Nattinger & DeCarrico, 1992; Pawley & Syder, 1983).

There has been an explosion of research into formulaicity since Pawley and Syder’s (1983) paper, particularly in the last 15 years. Wray (2002) has identified some fifty terms used in the research to describe these phenomena (e.g., collocations, chunks, conventionalized forms, multiword units, lexical phrases, and more). One of the most commonly used definitions is Wray’s (2002) “formulaic sequence: A sequence, continuous or discontinuous, of words or meaning elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than subject to generation or analysis by the language grammar” (p. 9). This definition is comprehensive and emblematic of the commonalities across definitions. As such, the term formulaic sequence (FS) will be used in the current discussion.

The methods used in FS research are intrinsically tied to the conceptual definitions and research goals of the studies, and are thus varied. Native speaker (NS) intuition (as in Boers et al., 2006; Stengers et al., 2011; Wood, 2006), corpus measures (Shin & Nation, 2007), and taxonomies of formulaic language (both descriptive and corpus-derived; Biber, Conrad, Reppen, Byrd, & Helt, 2002; De Cock, 2004) are used by researchers to identify and study FSs. The current paper concentrates on a specific population: adult L2 learners in both second and foreign language contexts, with a focus on FSs in speech production, with the exception of a brief mention of a few psycholinguistic studies concerned with FS processing.

Corpus linguistics has played an important role in FS research, particularly in that it has helped demonstrate how widespread the regularity of phrases is in language by allowing massive amounts of language to be systematically analyzed (Schmitt & Carter, 2004). As Wray (2002) puts it, “corpus linguistics has upped the ante for the traditional accounts [Chomsky], revealing formulaicity, in its widest sense, to be all pervasive in language” (p. 13). It has been useful in the classification of FSs and in describing the differences and similarities in NS language across registers and modalities (Biber et al., 2002; Biber, Conrad & Cortes, 2004; Biber & Barbieri, 2007), learner language (Paquot & Granger, 2012), and the differences between the two (De Cock, 2004). Furthermore, it has helped address some problematic consequences of relying solely on intuition to identify FSs. For instance, Sinclair (2004) argues that speaker intuitions of language alone are fallible, fail to capture certain regularities in language, and must be accompanied by corpus data. While this strong view holds truth, and corpus measures have proven to be a useful and essential tool, their limitations must not be ignored.

Corpus-based research is driven by frequency, which cannot be the sole defining criterion of formulaicity, as frequency measures may fail to capture formulaic sequences that have a relatively low frequency of occurrence (Wray, 2000). Conversely, certain strings extracted on the basis of frequency alone will differ widely in perceptual salience from other strings identified via intuition. For instance, most lexical bundles are generally not idiomatic or structurally complete in the ways of other FSs, although they serve as important building blocks in discourse (Biber & Barbieri, 2007). In sum, no one method of identifying FSs is foolproof, and a triangulation of methods is needed in order to obtain valid results (Schmitt, Grandage, & Adolphs, 2004; Read & Nation, 2004), along with more innovative and refined corpus-driven statistical measures which serve to better capture psychologically salient FSs (e.g., Appel & Trofimovich, 2017; Simpson-Vlach & Ellis, 2010).

Simpson-Vlach and Ellis (2010) address the need for improved measurements and suggest that the mutual information measure (MI), which assesses the degree to which the words in a phrase occur together more often than would be expected by chance, is essential in the identification of FSs, as most psychologically salient formulas cohere far more than chance predicts. More recent FS studies (e.g., Serrano, Stengers, & Housen, 2015, discussed in detail later in this paper) aim to incorporate MI for a more precise selection of FSs.
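
To make the idea of “occurring together more often than would be expected by chance” concrete, the following is a minimal sketch of a pointwise MI score for adjacent word pairs. It is not the procedure used in any of the studies cited here: the function name bigram_mi, the toy sentence, and the simplification of estimating both word and pair probabilities against the same token count are assumptions made purely for illustration.

```python
import math
from collections import Counter


def bigram_mi(tokens):
    """Return {(w1, w2): MI score} for every adjacent word pair in tokens.

    MI = log2(observed pair count / pair count expected by chance),
    where the expected count is count(w1) * count(w2) / N.
    Simplifying assumption: N (the number of tokens) is used as the
    sample size for both single words and pairs.
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), observed in pair_counts.items():
        expected = word_counts[w1] * word_counts[w2] / n
        scores[(w1, w2)] = math.log2(observed / expected)
    return scores


# Hypothetical toy input; a real analysis would use a large corpus.
toy = "i am so sorry to keep you waiting so sorry to keep you".split()
ranked = sorted(bigram_mi(toy).items(), key=lambda item: item[1], reverse=True)
for pair, score in ranked[:3]:
    print(pair, round(score, 2))
```

A positive score means the pair co-occurs more often than its individual word frequencies would predict. Because pairs seen only once can also receive high scores, MI is usually combined with a minimum frequency threshold, which is consistent with the argument above that no single measure suffices.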

FSs and Processing Advantage

One of the key tenets of Pawley and Syder’s argument was that FSs are retrieved whole from memory (i.e., stored holistically) and so afford the speaker a processing advantage. Although it is widely asserted that formulaicity offers NS a processing advantage, the state of affairs is not so clear in non-native populations, with some researchers believing that FSs likely operate very differently in adult L2 learners (e.g., Wray, 2002). While some earlier studies found little or no support for the claim that FSs afford L2 learners a processing advantage (Schmitt & Underwood, 2004; Schmitt, Grandage, & Adolphs, 2004) and others were inconclusive (e.g., Underwood, Schmitt, & Galpin, 2004), some more recent studies have successfully established a link.

Conklin and Schmitt (2008) compared self-paced reading times of FSs (operationalized as idioms, which could have either a figurative or a literal meaning, depending on the context) in native and non-native (NN) graduate students in the UK. Faster reading times (indicating faster processing) of FSs were recorded for both NN and native speakers, showing that NN speakers enjoy a processing advantage from certain FSs as well. Furthermore, this advantage was sustained for both literal and figurative idioms compared to equivalent non-formulaic sequences, suggesting that idioms are likely processed whole by default, but that their constituents can be dismantled and analyzed separately to access the literal meaning as needed, determined by the preceding context (text) in which they are found. This finding lends support to theories that propose a dual-processing route (Wray & Perkins, 2000; Wray, 2002), in that once a formulaic sequence has been registered and stored in long-term memory, the fastest and most efficient way of processing it and retrieving it is in its whole form, but it can also be broken down into its constituents and analyzed sequentially when needed.

Another study that reported similar findings in a comparable population of participants is Jiang and Nekrasova (2007). They used grammaticality judgement tasks to investigate native and NN speakers’ reaction times (RTs) and error rates when reading formulaic, non-formulaic, and ungrammatical strings. The FSs were selected from other corpus-based studies, and included formulas such as on the contrary and to tell the truth. Shorter RTs and lower error rates were reported for FSs in both NS and NNS groups, which was taken as evidence of holistic storage and processing.

Although there is some evidence that both intuition-based and corpus-based FSs offer a processing advantage for the adult L2 learner population, other studies (i.e., those mentioned above) were inconclusive or found no such advantage. In a recent article, Myles and Cordier (2017) offer some insights as to why this may be, in light of the way FSs have been conceptualized and identified in studies which aim to understand how FSs are processed by adult L2 learners. This will be discussed in more depth in the concluding section of the paper.

Another way of assessing the status of FSs and the advantage they are thought to provide is through the investigation of their effect on fluency, in learner speech production. The following section will review studies which have linked the use of FSs with fluency.

Formulaic Language in L2 Speech Production

Another tenet of Pawley and Syder’s (1983) paper was that the processing advantage FSs afford speakers leads to greater (oral) fluency. Studies which have directly investigated this claim have approached the issue in very different ways and in different contexts. The first two studies (Wood, 2006; Stengers, Boers, Housen, & Eyckmans, 2011) included no specific intervention of explicit instruction of formulaic sequences; therefore, it was assumed that any learning of them had been implicit. Additionally, neither used corpus measures to identify FSs, and instead employed native-speaker intuition. However, they differ in some important ways. First, Wood’s study was small-scale (11 participants) and longitudinal in an ESL context, whereas Stengers et al. (2011) took place in a foreign language context (60 Dutch L1 participants), with half learning Spanish as a second language and the other half learning English. In addition to investigating how FSs contribute to oral proficiency, Stengers et al. were interested in seeing if these associations were equally strong for two typologically different languages. Moreover, Wood’s FS identification criteria allowed for the inclusion of non-target idiosyncratic FSs (e.g., thanks God, what’s happened) whereas Stengers et al. (2011) limited their analysis to target nativelike forms.

Wood (2006) investigated the role of FSs in the development of speech fluency over time. Data were collected over the course of six months in a repeated measures oral narrative retell task of three silent films from 11 intermediate international students enrolled in an intensive ESL program at a Canadian university. Fluency gain was measured by four temporal variables in speech (phonation/time ratio, speech rate, articulation rate, mean length of run) as well as a variable linking mean length of run (MLR: syllables uttered between pauses) to the amount of FSs used by participants, the formula/run ratio (FRR). The FRR was key to Wood’s hypothesis, as it was the quantitative measure used to test the assumption that a learner’s growing repertoire of FSs positively influenced the growth of MLR. Three trained NS judges identified the FSs in the participants’ speech according to several criteria, the most important being phonological coherence (e.g., no internal pausing) and reduction (e.g., reduction of syllables), and greater length and complexity than other runs. Wood found that the participants’ use of FSs played a clear role in the development of speech fluency over time, having both strategic and performance-related uses such as use of FSs to extend a run, and repeated reliance on one formula. Furthermore, FSs served pragmatic and functional purposes such as self-talk (e.g., I don’t know, I think so) and rhetorical devices (e.g., at the beginning, the end of the story).
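
The temporal variables named above can be illustrated with a short sketch. The formulas below are common textbook operationalizations and are assumptions here, not a reproduction of Wood’s (2006) exact procedure; in particular, computing the FRR as the number of FSs divided by the number of runs is one plausible reading, and the Run record and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One stretch of speech between pauses (hypothetical record layout)."""
    syllables: int
    seconds: float
    formulas: int  # number of FSs the judges identified in this run


def fluency_measures(runs, pause_seconds):
    """Compute temporal fluency measures from runs and pause durations.

    Assumed definitions (rates are per minute):
      phonation/time ratio = speaking time / total time
      speech rate          = syllables / total time
      articulation rate    = syllables / speaking time
      mean length of run   = syllables / number of runs
      formula/run ratio    = identified FSs / number of runs
    """
    speaking = sum(run.seconds for run in runs)
    total = speaking + sum(pause_seconds)
    syllables = sum(run.syllables for run in runs)
    return {
        "phonation_time_ratio": speaking / total,
        "speech_rate": syllables / total * 60,
        "articulation_rate": syllables / speaking * 60,
        "mean_length_of_run": syllables / len(runs),
        "formula_run_ratio": sum(run.formulas for run in runs) / len(runs),
    }


# Hypothetical sample: two runs separated by one two-second pause.
sample = [Run(syllables=10, seconds=4.0, formulas=1),
          Run(syllables=6, seconds=2.0, formulas=0)]
for name, value in fluency_measures(sample, pause_seconds=[2.0]).items():
    print(name, round(value, 2))
```

Under these assumed definitions, a learner who strings more FSs into each run would be expected to show a rising formula/run ratio alongside a rising mean length of run, which is the relationship the FRR was designed to test.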

Although no NS group was included for comparison, overuse of certain FSs was observed in some participants. They seemed to rely on particular FSs by using the same ones repeatedly, but this overuse is seen as a positive and valuable strategy as it contributes to fluent length of runs. The strategy is in place, and the overuse can eventually be overcome if learners are able to expand the variety of FSs in their repertoires, either through more exposure and implicit acquisition of them or via explicit instruction.

Stengers, Boers, Housen, and Eyckmans (2011) investigated whether the use of formulaic sequences was positively associated with oral proficiency (fluency, range of expression, and accuracy) in a story retell task. Oral proficiency and formulaic sequence counts were assessed by experienced language teachers and by NS trained in linguistics, respectively (three for each language). Results revealed positive correlations between the number of formulaic sequences and oral proficiency measures as judged by experienced teachers for both L2 groups, but this relationship was stronger for the English L2 cohort. The English L2 group used both more types and more tokens than the Spanish L2 group, and the Spanish L2 group made significantly more inflectional errors overall. It was concluded that the greater importance of inflection in Spanish (a synthetic language) may make the acquisition of formulaic sequences more difficult under real-time conditions, and that it may take more time to acquire them in such languages compared with an analytic language such as English.

Another study which explored formulaicity and oral production did so within a heavily pedagogic framework. Boers, Eyckmans, Kappel, Stengers, and Demecheleer (2006) conducted a study in an English as a Foreign Language (EFL) context with two goals in mind: first, to investigate whether the use of collocations and idiomatic expressions contributes to perceived oral proficiency (fluency, range of expression, and accuracy), and second, to test whether awareness-raising techniques contribute to the growth of FSs in the participants’ repertoires over time. Findings show that the experimental group used more FSs overall, and was judged to be more orally proficient than the control group taught with the traditional grammar-lexis approach, for both fluency and range of expression, but not for accuracy.

Thus, research to date indicates FSs may contribute to L2 oral fluency; however, they may also be underused, overused, or misused by L2 speakers. The next section addresses this issue.

Underuse, Overuse, and Misuse

FSs perform important pragmatic functions and help pave the way for acceptance into the speech community (Boers & Lindstromberg, 2009; Lundell & Erman, 2012; Mugford, 2017). In early stages of second language acquisition, formulaic sequences are beneficial to learners in that they provide them with a means to start and sustain the verbal interactions necessary for language growth (R. Ellis, 1984; Ortega, 2013). However, FSs pose difficulties for even very advanced L2 speakers, and studies have shown that their use is still far from that of NS in that they do not seem to use as many, or with as much variability (Siyanova & Schmitt, 2007; Serrano et al., 2015). Learners may overuse certain forms (referred to in the literature as “islands of safety”, coined by Dechert, 1985, as cited in Boers & Lindstromberg, 2009; “islands of reliability”, Granger, 1998; and “phrasal teddy bears”, Ellis, 2012), underuse others, and misuse others still (De Cock, 2004; Siyanova & Schmitt, 2007).

With the goal of comparing native and NN speakers’ frequency of FS use, De Cock (2004) conducted a large-scale corpus analysis of L1 French EFL learners and NS spoken data. FSs were extracted according to frequency measures, and further analyzed qualitatively by the researcher. Findings show that NNS use differs considerably from that of NS. For example, NNS significantly underuse markers of vagueness (e.g., sort of, and things like that), which play a significant role in informal spoken discourse. In fact, the NNS in De Cock’s study used approximately half as many markers of vagueness in their speech compared with NS. Additionally, the ones NNS did use tended to be more formal, typically reserved for written discourse, and thus less appropriate for speech. NNS overuse was also observed; for example, learners over-relied on yes, of course or of course. These patterns of use are important to address, as they may negatively affect learners in that they may be perceived as sounding pedantic or formal on the one hand, and distant or even rude on the other.

Lundell and Erman (2012) found patterns of underuse in advanced Swedish L1 English and French speakers who were living and fully immersed in the target language communities. In order to investigate the extent to which L2 speech is conventionalized in terms of formulaic language, participants took part in a role-play. Although participants were found to have a significant repertoire of FSs, the findings showed that NNS FS use did not overlap with NS use, and that they used less conventionalized forms when making requests. Overall, both NNS groups used approximately half as many FSs as the NS comparison group did. In particular, they underused lexical and syntactic downgraders (e.g., a little bit, I was wondering). These patterns of underuse conspire to make the NNS seem more direct to their interlocutors, where the directness may be perceived as rude or inappropriate for the context of the conversation.

Erman, Denke, Fant, and Lundell (2015) compared multi-word sequences (MWS) in two oral tasks, one dialogic (role-play) and one monologic (online retelling of a silent movie) across three groups of L1 Dutch long-residency L2 English, French, and Spanish speakers. A native-speaker benchmark group was also included. Although findings indicated that the L2 groups approached nativelike use of MWS as social routines in the role-play, highly restricted collocations were underrepresented in all three L2 groups. The results of this study and those reported above indicate FSs continue to be problematic for learners even after significant exposure and immersion in the target language, with evidence to support this coming from not only L2 English but other languages as well. The next section will expand on the issue of exposure within the framework of contemporary usage-based theories.

Usage-based Learning and Exposure

In line with Pawley and Syder’s (1983) rejection of top-down models of acquisition, usage-based theories (e.g., Ellis, 2008; Ellis, 2003) view first and second language acquisition as a sequentially bottom-up process, where the human language system emerges from the language experience we gain from our relevant interaction with others. Ellis (2017) stresses that human language demonstrates remarkable regularity, and any model of acquisition must include an account of these usage distributions, which likely develop out of formulaic language. Ellis posits that highly frequent formulas carry important functional and semantic cues which allow us to then extract low-scope patterns or limited-scope slot-and-frame patterns, which in turn gradually become a productive schema encoded in memory (Ellis, 2003; 2012). Thus, acquisition is considered to be a bottom-up process, with formulaic language at the bottom and creative rule formation processes at the top (e.g., Pawley & Syder, 1983; Nattinger & DeCarrico, 1992). In other words, according to usage-based theories, language is learned through our ability to extract patterns from the input (Schmitt & Carter, 2004). Usage-based models put formulaic language at the center of language acquisition (Weinert, 2010), and echo and expand on Pawley and Syder’s (1983) hypothesis that memorized sentences are the normal “building blocks” of fluent spoken language.

In second language learning, these processes and the acquisition of formulaic sequences likely operate very differently from how they do in L1 acquisition, due to a long list of variables, e.g., cross-linguistic influence and language typology (Stengers et al., 2011; Erman et al., 2015), acculturation and aptitude (Dörnyei, Durow, & Zahran, 2004), and motivation (Fitzpatrick & Wray, 2006). Arguably one of the most important variables accounting for the difficulties adult (and particularly late) L2 learners have in acquiring FSs is the amount of exposure they are afforded. Studies post-Pawley and Syder (1983) have largely shown that while FSs offer L2 learners a processing advantage and contribute to fluent speech production, they are often underused, overused, or misused in these populations. However, only recently have researchers begun to look at what kind of exposure might lead to successful acquisition of these linguistic building blocks.

Serrano, Stengers, and Housen (2015)

In a recent study, Serrano, Stengers, and Housen (2015) directly address whether two types of exposure facilitate the implicit learning of formulaic sequences (there was no treatment, i.e., no explicit teaching of formulaic sequences). Their goal was to explore if and how the time distribution and concentration of a total of 110 hours of instruction (intensive = four and a half weeks vs. regular = seven months) affect the acquisition of formulaic sequences (FSs) in the EFL classroom context across three adult groups of undergraduate students and young professionals of different proficiencies: beginner (intensive N=14; regular N=21), intermediate (intensive N=22; regular N=22), and advanced (intensive N=22; regular N=23). Additionally, a native English speaker (NES) comparison group (N=12) was included in order to observe how advanced learners differed from NS in attainment of FSs overall and under the different instruction schedules.

The study was conducted in Catalonia, Spain, and as such the participants shared either Catalan or Spanish as an L1. The learners’ use of FSs was examined through the analysis of their L2 performance in an oral narrative six-picture serial description task. In a pre-test / post-test design, the intermediate and advanced participants completed the test twice, once directly before and once after the courses (the beginner groups were exempt as they were true beginners, and the NS were included as a benchmark group for comparison to the advanced learners). FSs were identified according to corpus-based lexical frequency information from the COCA, in addition to the verb collocations’ Mutual Information scores (a statistical measure of the “strength of association” between words) and recurrent and frequent structures of VP, VN, VPN, VPP. They were further classified according to their function (discourse-structuring and fluency devices), informed by the taxonomies of Nattinger and DeCarrico (1992) and Biber, Johansson, Leech, Conrad, and Finegan (1999), although due to the relatively low number of sequences in each category, the total number of formulas was analyzed, regardless of their classification. Certain very high frequency verbs such as make and have frequently collocate with other words, which lowers their MI score. In such cases, where certain verb sequences had a low MI score but the researchers intuited that they were nonetheless formulaic, they were checked against the Oxford Collocations Dictionary to confirm their FS status. Pauses and word fillers such as uhm were ignored, and only target, error-free sequences were analyzed. Both tokens (number) and types (range) were considered, and a ratio of FSs as opposed to raw scores was used in order to control for differing narrative lengths.
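
The reason for using a ratio rather than raw counts can be shown with a brief sketch. The normalization below (FS tokens and types per 100 words of narrative) is an assumption chosen for illustration; the study’s exact ratio is not specified in this summary, and the function name fs_ratios and the sample sequences are hypothetical.

```python
def fs_ratios(identified_fs, narrative_word_count, per=100):
    """Return (tokens, types) of FSs per `per` words of narrative.

    Tokens count every occurrence of an FS; types count distinct FSs.
    Dividing by narrative length keeps longer narratives from looking
    richer in FSs simply because they contain more words.
    """
    if narrative_word_count <= 0:
        raise ValueError("narrative_word_count must be positive")
    tokens = len(identified_fs)
    types = len({fs.lower() for fs in identified_fs})
    scale = per / narrative_word_count
    return tokens * scale, types * scale


# Hypothetical learner narrative of 50 words with three identified FSs,
# one of which is a repeat.
tokens_per_100, types_per_100 = fs_ratios(
    ["and then", "and then", "take a picture"], narrative_word_count=50
)
print(tokens_per_100, types_per_100)
```

A gap between the token ratio and the type ratio, as in this sample, is the kind of pattern that would point to repeated reliance on a small set of familiar sequences.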

Results show that under certain conditions, the concentration of time distribution (intensive) promotes the acquisition of FSs, but not in all circumstances. In the case of the beginner groups, the intensive students produced more FSs than the regular students, although this difference reached significance for tokens but not for types. The researchers conclude that this could be an indication of beginner students overusing familiar, safe language, which serves as “islands of reliability” (Granger, 1998). The most robust results were observed with the intermediate learners in the intensive program, who used more FSs than those in the regular program at post-test. The intensive time distribution condition is most beneficial for learners at the intermediate level, especially when compared to the advanced groups. In contrast, the advanced learners in the regular program outperformed those under the intensive condition, showing that they simply do not benefit from intensive courses as the intermediate learners appear to. In the comparison of advanced learners and NES, no ceiling effects were found at pre-test, meaning the advanced speakers were not nativelike in their production of FSs to begin with, while post-test results demonstrate that regardless of the program type, advanced learners do not produce nearly as many FSs as NS do.

The criteria used for the selection of FSs in Serrano et al. (2015) are particularly refined in that the authors apply an eclectic approach in their identification and selection of the sequences. Ellis, Simpson-Vlach, and Maynard (2008), in a series of psycholinguistic experiments, demonstrated that statistically defined and extracted formulaic sequences from large corpora do in fact have clear educational and psycholinguistic validity, for both native and NN speakers. Importantly, they found that native and NN speakers are sensitive to different measures, and that these sensitivities directly impacted the processing of the formulas. Where NN speakers were more attuned to the frequency of the string, the MI score was the major determinant for native speakers. They attribute these disparate sensitivities to the fact that the NNS have not yet sampled enough of the language, and require much more exposure.

Serrano et al.’s (2015) methodological approach of triangulation demonstrates how far research concerning FSs has evolved since Pawley and Syder’s (1983) original hypothesis.

Pedagogical implications

The type of exposure that the learners are afforded is an important component in the ordering and emphasis of which constructions are to be taught. Ellis (2012) distinguishes between formulas that are seeds of learning and those that are targets for learning, suggesting that readily learnable, high frequency, prototypically functional phrases are not only what learners are likely to latch on to, the phrasal teddy bears, but also serve as construction seeds. The best targets for learning, depending on the context of exposure, are the more challenging less frequent, less prototypical, and often non-transparent formulas that are not readily picked up from limited exposure to the language. Thus, models which focus on lexical phrases that fulfill important pragmatic and discourse functions may be more relevant for learners who are immersed in the target language, whereas the lexical approach (Lewis, 1993; 1997; 2000, as cited in Boers & Lindstromberg, 2009), which involves strategy development, may be more relevant to classroom-taught learners in the foreign language environment (Boers & Lindstromberg, 2009).

Beyond the exposure challenge, one of the most important ways in which L1 and L2 acquisition processes differ, according to Wray (2002), is that adult learners have largely abandoned their primary holistic mode of learning in exchange for a more analytic mode. Namely, she proposes that the adult language learner is prone to decomposing formulaic sequences into their individual constituents due to biological and cultural factors, and has moved away from the bottom-up processes that are thought to be sufficient for L1 acquisition (Boers & Lindstromberg, 2009). When the adult learner breaks up the patterns found in the input, the resulting parts are subject to analysis. This makes the sequences difficult to remember as wholes; thus, when they are recombined in production, the learner’s interlanguage rules are applied to the string, which compromises the accuracy of the reproduction (Fitzpatrick & Wray, 2006). However, others (e.g., Boers & Lindstromberg, 2009; Myles & Cordier, 2017) do not believe the gap between how L1 learners and adult L2 learners acquire language (and, more specifically, FSs) to be so large. For instance, Boers and Lindstromberg (2009) argue that adult L2 learners continue to acquire large stores of lexical phrases with continued exposure to and practice of the language. Furthermore, they propose that the dearth of these phrases compared with NSs is in fact due to limited opportunities to learn these sequences, a direct consequence of limited exposure. Although limited exposure is a difficult barrier to acquisition, the explicit teaching of FSs may help learners overcome this hurdle. Implications for L2 pedagogy include the importance of regular and varied exposure to the second language, as well as explicit teaching and awareness-raising of FSs and their functions.

Conclusions: From Then Till Now

The evolution of robust methodologies since the publication of Pawley and Syder’s (1983) paper has allowed the way FSs contribute to language learning and retrieval to be better understood. Psycholinguistic methodologies and corpus-based linguistics have allowed more precise investigation into how FSs are processed, and improvements in measures of fluency have allowed FSs’ contributions to oral production to be tested empirically. Serrano et al.’s (2015) approach is an exemplary case of the triangulation of identification measures, which casts a wider net to capture formulaic sequences. This is a necessary step toward obtaining valid and generalizable evidence of how FSs are acquired, processed, and used by L2 learners.

However, although the methodologies have evolved, they are still problematic, which is perhaps best demonstrated in the mixed picture of how FSs are processed in adult L2 learners. Conflicting and inconclusive evidence gathered from psycholinguistic experiments such as those mentioned earlier in this paper (i.e., Schmitt & Underwood, 2004; Schmitt, Grandage, & Adolphs, 2004; Underwood, Schmitt, & Galpin, 2004) lends support to the view that adult L2 learners have largely abandoned their holistic bottom-up processes (Wray, 2002). Myles and Cordier (2017) argue that the most difficult aspect of the way in which the processing of FSs has been studied in L2 learners lies in the conceptual views of formulaicity and the methodologies used to investigate it. The crux of their argument is that the conflation of definitions of what FSs are, and the fact that they have been lumped together under one umbrella term, likely has researchers comparing apples and oranges, particularly with advanced adult L2 learners. Crucially, they argue that the conceptual and methodological frameworks used to date in much of the research have varied so much that it has been unclear precisely which aspects of formulaicity were under investigation. They explicitly draw a distinction between “learner-internal” and “learner-external” definitions (a distinction originally put forth by Wray, 2008). Learner-external definitions center around aspects of the language itself, such as semantically or syntactically irregular forms, or those that have been conventionalized by a speech community due to high frequency of co-occurrence. Learner-internal definitions, on the other hand, are those that afford an individual speaker a processing advantage either because they are stored whole, or have become highly automatized.
They stress that it is the latter type of sequence that needs to be studied in more depth for adult L2 learners, because the external and internal FSs found in NS data are not likely to be present in the L2 context, and research which conflates the two is really just measuring how much L2 learners have appropriated (or not) externally-defined FSs (this is nicely illustrated by the notion of overuse, underuse, and misuse discussed earlier). They argue that NNSs do use chunking processes, and that these processes must be better understood, especially if certain FSs function as “seeds” which contribute to more complex, abstract constructions, as is the view in recent usage-based theories (Ellis, 2012; Myles & Cordier, 2017).

Although research on FSs has evolved considerably since Pawley and Syder’s (1983) seminal article, there are still issues which need to be addressed on both the conceptual and methodological planes. More research with adult L2 learners is still needed in order to better understand how FSs are acquired and processed, and how their teaching can be effectively incorporated into language classrooms.

References

Appel, R., & Trofimovich, P. (2015). Transitional probability predicts native and non-native use of formulaic sequences. International Journal of Applied Linguistics, 27(1), 24-43. https://doi.org/10.1111/ijal.12100

Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for Specific Purposes, 26(3), 263-286. https://doi.org/10.1016/j.esp.2006.08.003

Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371-405. https://doi.org/10.1093/applin/25.3.371

Biber, D., Conrad, S., Reppen, R., Byrd, P., & Helt, M. (2002). Speaking and writing in the university: A multidimensional comparison. TESOL Quarterly, 36(1), 9-48. https://doi.org/10.2307/3588359

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman Grammar of Spoken and Written English. Harlow, Essex: Pearson Education Ltd.

Boers, F., Eyckmans, J., Kappel, J., Stengers, H., & Demecheleer, M. (2006). Formulaic sequences and perceived oral proficiency: Putting a lexical approach to the test. Language Teaching Research, 10(3), 245-261. https://doi.org/10.1191/1362168806lr195oa

Boers, F., & Lindstromberg, S. (2009). Optimizing a lexical approach to instructed second language acquisition. Basingstoke, UK: Palgrave Macmillan. https://doi.org/10.1057/9780230245006

Conklin, K., & Schmitt, N. (2008). Formulaic sequences: Are they processed more quickly than nonformulaic language by native and nonnative speakers? Applied Linguistics, 29, 72-89. https://doi.org/10.1093/applin/amm022

De Cock, S. (2004). Preferred sequences of words in NS and NNS speech. Belgian Journal of English Language and Literatures, New Series, 2, 225-46.

Dörnyei, Z., Durow, V., & Zahran, K. (2004). Individual differences and their effects on formulaic sequence acquisition. In N. Schmitt (Ed.), Formulaic sequences (pp. 87-106). Amsterdam: John Benjamins. https://doi.org/10.1075/lllt.9.06dor

Ellis, N. C. (2003). Constructions, chunking, and connectionism: The emergence of second language structure. In C. Doughty & M. H. Long (Eds.), Handbook of second language acquisition (pp. 63-103). Oxford: Blackwell. https://doi.org/10.1002/9780470756492.ch4

Ellis, N. C. (2008a). Optimizing the input: Frequency and sampling in usage-based and form-focussed learning. In M. H. Long & C. Doughty (Eds.), Handbook of second and foreign language teaching. Oxford: Blackwell. https://doi.org/10.1002/9781444315783.ch9

Ellis, N. C. (2012). Formulaic language and second language acquisition: Zipf and the phrasal Teddy Bear. Annual Review of Applied Linguistics, 32, 17-44. https://doi.org/10.1017/S0267190512000025

Ellis, N. C. (2017, March). Usage-based approaches to language, language acquisition, and language processing. Paper presented at the Institut des Sciences Cognitives, UQAM, Montreal, QC.

Ellis, R. (1984). Formulaic speech in early classroom second language development. In J. R. Handscombe, R. A. Orem, & B. P. Taylor (Eds.), On TESOL ’83 (pp. 53-65). Washington DC: TESOL.

Erman, B., Denke, A., Fant, L., & Lundell, F. F. (2015). Nativelike expression in the speech of long-residency L2 users: A study of multiword structures in L2 English, French, and Spanish. International Journal of Applied Linguistics, 25(2), 160-182. https://doi.org/10.1111/ijal.12061

Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text-Interdisciplinary Journal for the Study of Discourse, 20(1), 29-62. https://doi.org/10.1515/text.1.2000.20.1.29

Fitzpatrick, T., & Wray, A. (2006). Breaking up is not so hard to do: Individual differences in L2 memorization. Canadian Modern Language Review, 63(1), 35-57. https://doi.org/10.3138/cmlr.63.1.35

Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae. In A.P. Cowie (Ed.), Phraseology, theory, analysis and applications (pp. 145-60). Oxford: OUP.

Jiang, N., & Nekrasova, T. M. (2007). The processing of formulaic sequences by second language speakers. The Modern Language Journal, 91, 433-445. https://doi.org/10.1111/j.1540-4781.2007.00589.x

Lundell, F., & Erman, B. (2012). High-level requests: A study of long residency L2 users of English and French and native speakers. Journal of Pragmatics, 44(6), 756-775. https://doi.org/10.1016/j.pragma.2012.02.010

Mugford, G. (2017). Formulaic language and EFL requests: Sensitive wording at the right time. Profile Issues in Teachers Professional Development, 19(2), 29-39. https://doi.org/10.15446/profile.v19n2.57428

Myles, F., & Cordier, C. (2017). Formulaic sequence (FS) cannot be an umbrella term in SLA: Focusing on psycholinguistic FSs and their identification. Studies in Second Language Acquisition, 39, 3-28. https://doi.org/10.1017/s027226311600036x

Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford, UK: Oxford University Press.

Ortega, L. (2009). Understanding second language acquisition. London: Hodder Education. https://doi.org/10.4324/9780203777282

Paquot, M., & Granger, S. (2012). Formulaic language in learner corpora. Annual Review of Applied Linguistics, 32, 130-149. https://doi.org/10.1017/s0267190512000098

Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 191-226). New York: Longman.

Pawley, A. (2007). Developments in the study of formulaic language since 1970: A personal view. In P. Skandera (Ed.), Phraseology and culture in English (pp. 3-34). Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110197860.3

Read, J., & Nation, P. (2004). Measurement of formulaic sequences. In N. Schmitt (Ed.), Formulaic sequences (pp. 23-35). Amsterdam: John Benjamins. https://doi.org/10.1075/lllt.9.03rea

Schmitt, N. (Ed.). (2004). Formulaic sequences. Amsterdam: John Benjamins. https://doi.org/10.1093/applin/ami018

Schmitt, N., & Carter, R. (2004). Formulaic sequences in action: An introduction. In N. Schmitt (Ed.), Formulaic sequences (pp. 1-22). Amsterdam: John Benjamins. https://doi.org/10.1075/lllt.9.02sch

Schmitt, N., Grandage, S., & Adolphs, S. (2004). Are corpus-derived recurrent clusters psycholinguistically valid? In N. Schmitt (Ed.), Formulaic sequences (pp. 127-151). Amsterdam: John Benjamins. https://doi.org/10.1075/lllt.9.08sch

Schmitt, N., & Underwood, G. (2004). Exploring the processing of formulaic sequences through a self-paced reading task. In N. Schmitt (Ed.), Formulaic sequences (pp. 173-189). Amsterdam: John Benjamins. https://doi.org/10.1075/lllt.9.10sch

Serrano, R., Stengers, H., & Housen, A. (2015). Acquisition of formulaic sequences in intensive and regular EFL programmes. Language Teaching Research, 19, 89-106. https://doi.org/10.1177/1362168814541748

Shin, D., & Nation, P. (2007). Beyond single words: The most frequent collocations in spoken English. ELT Journal, 62(4), 339-348. https://doi.org/10.1093/elt/ccm091

Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list (AFL). Applied Linguistics, 31, 487-512. https://doi.org/10.1093/applin/amp058

Sinclair, J. (2004). Intuition and annotation-the discussion continues. Language and Computers, 49, 39-60. https://doi.org/10.1163/9789004333710_004

Siyanova, A., & Schmitt, N. (2007). Native and nonnative use of multi-word vs. one-word verbs. International Review of Applied Linguistics, 45, 119-139. https://doi.org/10.1515/iral.2007.005

Stengers, H., Boers, F., Housen, A., & Eyckmans, J. (2011). Formulaic sequences and L2 oral proficiency: Does the type of target language influence the association? IRAL-International Review of Applied Linguistics in Language Teaching, 49(4), 321-343. https://doi.org/10.1515/iral.2011.017

Underwood, G., Schmitt, N., & Galpin, A. (2004). The eyes have it: An eye-movement study into the processing of formulaic sequences. In N. Schmitt (Ed.), Formulaic sequences (pp. 153-172). Amsterdam: John Benjamins. https://doi.org/10.1075/lllt.9.09und

Weinert, R. (2010). Formulaicity and usage-based language: linguistic, psycholinguistic and acquisitional manifestations. In D. Wood (Ed.), Perspectives on formulaic language: Acquisition and communication (pp. 1-20). London/New York: Continuum.

Wood, D. (2006). Uses and functions of formulaic sequences in second language speech: An exploration of the foundations of fluency. Canadian Modern Language Review, 63(1), 13-33. https://doi.org/10.3138/cmlr.63.1.13

Wray, A. (2000). Formulaic sequences in second language teaching: Principle and practice. Applied Linguistics, 21(4), 463-489. https://doi.org/10.1093/applin/21.4.463

Wray, A. (2002). Formulaic language and the lexicon. Cambridge, UK: Cambridge University Press.

Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford, UK: Oxford University Press.

Wray, A., & Perkins, M. (2000). The functions of formulaic language. Language and Communication, 20, 1-28. https://doi.org/10.1016/s0271-5309(99)00015-4