Think-aloud Verbal Protocols to Explore Foreign Language Anxiety

Think-aloud verbal protocols (TAPs) have been used in a number of language research studies to investigate thought processes and emotions related to L2 learning. However, there have been major criticisms of the use of TAPs in the literature. A case for the use of TAPs is made here, particularly when exploring language learning factors which cannot be observed fully using more traditional research instruments. For instance, foreign language anxiety (FLA) is being explored as a fundamental variable to contend with when learning a foreign language in relation to students’ perception of the virtual learning environment (VLE). This study thus seeks to deepen our understanding of learners’ emotions and thoughts, and the experience of learning pronunciation at a distance. As technology continues to undergo rapid change, so do pedagogical applications to language learning. The qualitative findings from the present study thus offer an invaluable resource for providing insights into what it is like to learn pronunciation outside the classroom.


Think-aloud verbal protocols (TAPs) have already been used in a number of language research studies to investigate thought processes related to L2 learning. For instance, Hurd mentions a dozen language research studies in her literature review on TAPs (2007a, p. 244). More recently, Woore (2010) conducted an exploratory study to investigate strategies used by participants when pronouncing unknown French words. One distinctive advantage of TAPs is that they are applied in real time, whereas other forms of self-report such as diaries and surveys are retrospective. The data is therefore potentially more accurate as it has not been subject to mediation. However, there have been major criticisms of the use of TAPs in the literature (see Bowles, 2010), the three main ones concerning automaticity, reactivity and addressivity, which will all be discussed in this paper.

The aim of this study is two-fold. First, a case for the use of TAPs will be made, particularly when exploring language learning features which cannot be observed fully through other research instruments. The second aim is to look at foreign language anxiety (FLA) as a fundamental variable to contend with when learning a foreign language in relation to students’ perception of the virtual learning environment (VLE). The paper will start with a review of FLA and the VLE as two variables affecting language learning and then open up the TAPs debate.

In 2011, a study involving distance learning students on a Beginners’ French course (n = 590) explored the relationship between their phonological attainment, FLA, learning strategies and the VLE. A correlation between FLA and phonological attainment was reported in a previous paper (————- & Hurd, 2016), but the study also yielded a large body of qualitative data, and it is the latter that will be used to address the following research question: ‘To what extent can TAPs be used as a valid research tool to investigate FLA that distance learners may experience when practising pronunciation in the VLE?’ The contribution of this study to the distance language learning field lies in its emphasis on TAPs, not only as a valid research tool, but also as a unique and more useful way to obtain data on internal processes. Learning pronunciation is also an under-researched area, particularly when examined in relation to the added difficulty of the VLE. The data obtained in this study will help researchers and practitioners alike to make sense of what learners go through when coming to terms with affective issues in a non-traditional learning environment.

Literature Review

Foreign Language Anxiety

Foreign Language Anxiety (FLA) has been identified as one of the key affective factors having an effect on second language acquisition. As Dörnyei states, citing Arnold and Brown (1999), ‘anxiety is quite possibly the affective factor that most pervasively obstructs the learning process’ (2005, p. 198). Young (1990) adds that students feel the highest level of anxiety when speaking the foreign language. This is one of the reasons why FLA has gained more prominence in SLA research over the last 30 years. Instruments have been developed for the sole purpose of measuring FLA in the foreign language class, such as the French Use and French Classroom Anxiety scales (Gardner, 1985), the Input-Processing-Output scale (MacIntyre and Gardner, 1994) and, perhaps the best known, the Foreign Language Classroom Anxiety Scale (FLCAS) from Horwitz et al. (1986, p. 129).

Koch and Terrell (1991) also remark that many studies of language learning have shown a relationship between FLA and language proficiency (see also Mak, 2011; Rodriguez and Abreu, 2003). However, FLA can be detected not just in speaking but also in other receptive and productive skills (see Zhang, 2013, for a possible causal relationship between foreign language listening anxiety and listening performance, and Woodrow, 2011, on the relationship between writing performance and anxiety). Nevertheless, fears about oral communication, including sub-skills such as pronunciation, are a more salient feature of using a foreign language than fears about reading and writing. For instance, ———– and Hurd (2016) found a correlation between pronunciation attainment and FLA. The present study aims to shed more light on this relationship through qualitative data obtained using think-aloud protocols (see MacIntyre and Gregersen, 2012, for a recent evaluation of FLA), thereby adding to the body of literature showing TAPs to be a valid and useful research instrument.

According to MacIntyre and Gregersen (2012), language anxiety conveys the idea of negative emotions related to fear and worry during language use. However, anxiety in language learning can be distinguished from trait anxiety (a more or less permanent personality characteristic), state anxiety (a temporary emotion felt at a particular moment in a given situation) and situation-specific anxiety (anxiety felt in specific and isolated events such as exams or oral presentations) (Ellis, 2008, p. 691).

Horwitz et al. argue that, given the distinctive context of a language classroom and the learning process itself, the anxiety felt by students when learning a language will be different in terms of the interplay between emotions, beliefs and the way learners view themselves (1986, p. 128). We agree with Hampel et al. (2005), who acknowledge that it is difficult to distinguish between all these types of anxiety unless one uses an extremely intricate research methodology. On the other hand, Arnold (2007) investigated the impact of computer-mediated communication (CMC) on L2 communication apprehension, a construct which, as Arnold notes, is regarded as conceptually synonymous with FLA by some (MacIntyre et al., 2002; MacIntyre and Charos, 1996) and as a component of FLA by others (Horwitz et al., 1986).

Moreover, Scovel (1991) quotes Kleinmann’s study, which contended that anxiety cannot be merely qualified as high or low but that a distinction should be made between a temporary state of anxiety and a personality trait. De Los Arcos, Coleman and Hampel (2009), in their study of the impact of online settings on this emotion, came to the conclusion that FLA is no longer a psychological state but a social construct. The interrelationship of the virtual environment with learners’ affect, in particular their FLA, emerged as an important aspect of the present study, but it has to be noted that we are also investigating FLA in relation to learning in solo mode. Adopting the situation-specific aspect of FLA would therefore be too complex for the scope of this study. The MacIntyre et al. (2002) view is thus adopted here, i.e. the two concepts are taken to be similar, and state anxiety is the type of anxiety referred to when discussing FLA. The exploration of emotions and perception of the VLE through TAPs will thus help deepen our understanding of what is happening in students’ minds when studying at a distance. This leads us to the question of how students’ perception of the VLE can affect oral production.


The VLE is the second variable investigated in the present study through TAPs in relation to pronunciation learning and includes both CALL when learning in solo mode and CMC when attending online tutorials. Warschauer (1998) provides us with a useful review of CALL, from behaviouristic to integrative, in his overview of computers and language learning, which then moved on to CMC. For the purpose of this research, the succinct definitions that Levy and Herring provided were adopted, namely that CALL is ‘the search for and study of applications of the computer in language teaching and learning’ (Levy, 1997, p. 1). CMC, on the other hand, is viewed as ‘communication that takes place between human beings via the instrumentality of computers’ (Herring, 1996, p. 1). CMC was first limited to text, until audio-conferencing became available in the mid-nineties, followed by video-conferencing, which is now commonly available and used in language learning.

When investigating language learning supported by CALL, some studies have addressed the use of computer software to teach pronunciation to adult learners, and these software packages were particularly useful when teaching and giving immediate feedback on the prosodic aspects of the language (Moyer, 2004; see also Blake, 2011 and Demmans-Epp and McCalla, 2011 for the latest developments). However, Weinberg and Knoer (2003) remind us that there are very few studies which combine the exploration of French as a second language and the efficiency of these pieces of software.

Given the significant changes in the delivery and support of language learning with an increased use of CMC over the last two decades (Jauregi et al., 2012), it is essential to include the learning environment (virtual with others or on one’s own) as having a potential impact on learners’ emotions and how these can affect their learning of pronunciation. It is interesting to note that researchers have already linked some aspects of affect, such as motivation, with the learning environment. For example, Dörnyei and Ushioda stress the impact of motivation on classroom learning and the importance this variable acquired in studies published in the 1990s (2009, p. 29). In the same vein, Benson and Nunan (2004) state, when discussing learners’ diverse learning outcomes, that the context in which the learning process is situated should always be considered in language research.

Ellis (2004) talks about the need to acknowledge the situated nature of L2 learning, which is influenced by the setting in which the learning event takes place. It is therefore essential to find out whether the virtual learning environment and practice on one’s own have an impact on the pronunciation of learners by causing or alleviating high levels of anxiety. To be more specific, we need to find out whether the medium of a computerised environment adds or even generates stress (e.g. in real time) or decreases it (e.g. when listening to instruction in a recorded session). Indeed, in Hurd’s study it was found that ‘the distance factor was associated with additional specific anxiety-provoking elements’ such as, for instance, ‘the complexity of all the technology’ (2007b, pp. 495–497; see also McInerney et al., 1999 for their Computer Anxiety and Learning Measure). A salient question would be whether learners are actually learning anything or whether they are too busy coming to terms with the medium because of lack of training (Levy and Stockwell, 2006; see also Ko, 2012).

Hurd also found that anxiety can sometimes be reduced thanks to the relative anonymity that online learning can give (2007b; see also Ko, 2012; Murphy et al., 2012; Rice and Markey, 2008; Hauck and Hurd, 2005), although it is unclear at this stage whether this reduction in anxiety would also carry over to other, non-virtual settings where communication takes place (Arnold, 2007). Other studies have not found a reduction in nervousness when interacting through video-conferencing and other online environments (Eneau and Develotte, 2012; Jauregi et al., 2012; Hampel, 2006). Hampel et al. (2005) found in their study that the loss of embodiment in the VLE and the unfamiliar medium induced anxiety for some students but that not being physically in front of students reduced anxiety for others. Moreover, Bañados (2006) found in her study on EFL learning that an online multimedia environment used as part of blended learning resulted in students improving all their language skills, including pronunciation of the foreign language, as shown in the results obtained by her students in the diagnostic and final assessment tests. The findings from TAPs in the present study may temper such a positive view of the use of the VLE in language learning.

Think-aloud Protocols

The use of TAPs is particularly appropriate to investigate FLA related to an aspect of language performance and is in line with MacIntyre and Gregersen who contend that there is ‘something missing’ in qualitative research to date, namely ‘the need to describe the underlying mechanisms that connect affect in general, or anxiety in particular, to language performance’ (2012, p. 108), which is precisely what this study is about. We are looking here at the processes underlying affective responses to a language performance activity in a new learning environment, so a process-oriented research tool such as TAPs is needed. Self-reporting tools such as questionnaires are useful, but only give partial insights, and interviews can be marred by bias. TAPs are also particularly relevant to the sample made up of distance learners practising pronunciation in solo mode as it would be difficult to observe those processes otherwise. Gathering data through TAPs can also be validated by other studies in which they were used.

For instance, Baralt and Gurzynski-Weiss (2011) mention as a limitation of their study on state anxiety in CMC that using a method such as TAPs to explore students’ processing would have deepened the understanding of the variables they had investigated. Moreover, Young (2005) states that another area of research that she believes will benefit from the use of TAPs is the processes a student goes through when learning in the VLE. She adds that it is one of the research methods she selects when exploring learning in a web-based context. TAPs are thus a very useful tool to explore cognitive and affective processes which can add depth to a piece of research.

Admittedly, there are methodological criticisms that have been made of TAPs. The three major criticisms are reactivity (Bowles, 2010; Dörnyei, 2007; Leow and Morgan-Short, 2004), automaticity (Russo et al., 1989) and addressivity (Sasaki, 2003; Ericsson and Simon, 1998; Smagorinsky, 1998). These criticisms cannot be ignored as Bowles makes clear: ‘It is critical to determine whether (or to what extent) verbalising while completing a language task actually reflects (or alters) natural thought processes’ (2010, p. 2; see also Deschambault, 2012, for a review of the possible effects of TAPs on information processing). For these reasons, it is important to ensure validity and avoid the reactivity that a TAP may produce.

In his study of the cognitive processes of business auditors where he used TAPs to collect data, Russo defined automaticity as ‘a term applied to sequences of observable task behaviours that are performed without cognitive mediation’ (1999, p. 5). Participants in TAP studies are likely to find it difficult to actively notice processes they are engaged in if these are automatic, and in these circumstances are not able to articulate what they are doing and how they are doing it. However, as Hurd (2008) states, quoting Ericsson and Simon (1980), this potential adverse effect can be reduced by making sure that the TAP activity has a good degree of complexity, so that it will not be completed automatically but will involve actual thinking by the student. This was the case in the present study because the participants were doing something entirely new (pronouncing words totally unknown to beginners) with an entirely new instrument (the French Phonemic Chart), so automaticity was not found to be an issue here.

The third methodological criticism involves addressivity. Indeed, referring to Bakhtin’s (1984) notions of ‘dialogicality’ and ‘addressivity’, Smagorinsky (1998) comments that any speech event is socially grounded, in other words, that it must be addressed to someone else, in this case the researcher, whether absent or present in the room. In his study examining the social nature of verbal reports, Sasaki (2003) observes that protocols are socially and interactively constituted, and this fact has to be taken into consideration when analysing Think-aloud (TA) data. He cites the example of specific forms of address in Japanese which denote who the participants thought they were talking to. As the TAPs for the present study were recorded in English, it is less easy to pinpoint issues of addressivity in the transcript as this is not apparent through a grammatical structure but through content words.

On the other hand, Smagorinsky, citing Ericsson and Simon (1998), emphasises that verbal protocols should not be elicited as an ‘act of communication’ (1998, p. 166), thereby minimising addressivity. There are also other steps that can be taken to achieve this. According to Ericsson and Simon (1993, p. xiv), the experimental situation should be in a ‘nonreactive setting’ and should be arranged in a way that would signal to the participant that no interaction with the researcher is intended. In the present study, the experimenter was not even physically present as TAPs were recorded by the participants at home, although a further literature review might show that because distance learners are already familiar with talking to a computer screen, the issue of addressivity might reappear in a somewhat different form. By carefully controlling the interaction, i.e. not making it into a communicative event proper, addressivity, if any, was considerably reduced.

At this point in the discussion, a distinction needs to be made between two sorts of TAPs. Citing Ericsson and Simon (1993), Bowles distinguishes between TAPs where the participants merely talk about what they are thinking or doing at the time of recording and those where they are asked to articulate further information, such as clarifications and explanations (2010). The former are referred to as non-metalinguistic (or non-metacognitive when dealing with non-verbal tasks) and the latter as metalinguistic (or metacognitive). One of the first issues that arose was deciding which of these was going to best address the research questions for the present study. Given Bowles’s contention that metacognitive reports may actually impact on cognitive processing because of greater reactivity, it was decided to use non-metacognitive TAPs as explained in the next section.


The research was conducted with a cohort of students from the Open University, United Kingdom, in a particular context: a Beginners’ French course with blended tuition (L192) written for distance language learners. The study investigated the possible relationships between FLA, learning strategies, the VLE and phonological attainment. This paper reports on some of the qualitative elements of the study related to FLA and the VLE, which addressed the following research question arising from a review of the literature: ‘To what extent can TAPs be used as a valid research tool to investigate FLA that distance learners may experience when practising pronunciation in the VLE?’

Part of this study depicted a descriptive model using a questionnaire which yielded detailed information from students about learning pronunciation both in the virtual classroom setting and when practising on one’s own. The online questionnaire was sent to half the cohort of L192 students chosen at random by the Open University Student Research Project Panel (n = 590). The sample for this stage and all subsequent phases of the study was self-selected, and 87 students chose to respond. Out of the 53 respondents who had completed the questionnaire and agreed to take part in the next phases of this project, a total of 9 completed the TAP activity, 5 of which were usable. No incentive to complete the TAP was offered, although it was hoped that students would gain some enjoyment and further learning through expressing their own learning strategies and emotions in relation to learning the pronunciation of these new words. They also obtained access to the French Phonemic Chart, a useful learning tool not yet available elsewhere (see Appendix 1). The representativeness of the whole study sample was established through a correlation of demographic values between the sample and the whole course cohort, which was found to be .983, showing that the sample was representative of the population studying that course (n = 1272). The 5 student TAPs used in this paper were part of this representative sample.

For the TAP activity, which participants recorded themselves, 10 French words were chosen above their level to encourage learners to ‘engage in strategic reasoning, either describing their thought processes in advance of pronouncing a word, or retrospectively justifying their pronunciation of it’ (Woore, 2010, p. 13), thus minimising automaticity. The phonemic transcription appeared next to each word as it does in the dictionaries used by students. In addition, the Phonemic Chart designed as a distance learning tool was given to the participants to be used as an aid to overcoming difficulties with some phonemes contained in the list of words (see Appendix 1). Students were able to identify the phonemes they found difficult by looking at the phonemic transcription next to the 10 given words. The participants were asked to record their thought processes and emotions as they endeavoured to pronounce words from the list with the help of the interactive Phonemic Chart.

As think-aloud techniques are not a natural process, Dörnyei recommends training participants well before they engage in the task. In the present study this was carried out via a small recorded demonstration session using a web-conferencing system, Elluminate (now Blackboard Collaborate) used by the Open University, for all participants to listen to before they engaged with the task. A demonstration also showed them how easy it was to do and how useful it was for them in terms of encouraging greater self-awareness, and ability to monitor and self-evaluate their performance, all very important metacognitive skills. Very clear instructions similar to those given in Bowles (2010, p. 114) and Hurd (2007a, p. 247) were given to participants (see Appendix 1) following Bowles’s advice to provide them with clear instructions and a rationale for taking part in the project.

A warm-up task was also provided to familiarise the participants with the procedure, in the form of an arithmetical problem. A verbal task more similar to the actual task could have been chosen but there are advantages and disadvantages in both cases (Bowles, 2010). Bowles maintains that the advantage of an arithmetic problem is that the nature of the practice does not have any bearing on the TAP activity itself, but the disadvantage is that it may be difficult to go from a numerical warm-up to a verbal activity. On the other hand, the advantage of a verbal warm-up is its similarity to the actual activity, so it may be easier to go from one to the other, but the disadvantage is that the verbal warm-up may prime the participants for what is being investigated if not designed carefully. The instructions and the warm-up task were both essential preparation for the participants in order to make it easier for them to talk freely and to understand that they were not talking directly to the researcher, as this would result in issues of addressivity confusing the data (see above), but just talking, saying whatever came into their heads as they tackled the pronunciation activity. The return consisted of 9 recordings, 4 of which were unusable as the participants had not provided what can be called a TAP; the remaining 5 were then transcribed for the reasons given above.

Data Analysis

The TAP recordings were transcribed and the transcripts were uploaded onto NVivo 9 and coded to basic pre-defined nodes. A snapshot of nodes on an NVivo 9 screen can be seen below in Fig. 1.

Figure 1 – Snapshot of nodes on an NVivo 9 screen

In a similar way to what was done in Hurd’s study (2007), the coding process was informed by various paradigms, with affective factors and learners’ perception of the VLE replacing Hurd’s metacognitive knowledge in the present study. The comments were then organised against pre-determined tree-nodes covering the main categories: two variables and pronunciation knowledge (see Fig. 2). These in turn were divided into sub-nodes to reflect more closely what was being said. However, further nodes had to be created to reflect the range and depth of the data. There was thus an element of a top-down approach where the data fitted the categories determined by the two variables, fitting the post-positivist stance adopted here, but the coding was data-driven in some cases, a good example of the interpretivist aspect of some features in the present study. For instance, emotions and thoughts can be both positive and negative, so some of the data obtained dictated the focus of this project through some elements of Grounded Theory (GT) (Charmaz, 2006) as well as through a top-down, concept-driven approach.

Glaser argues in GT for the delay of a literature review until after the analysis of the data in order to avoid contamination and force-fitting the data into pre-existing codes. However, Thornberg states that it is possible ‘to appreciate extant theories and concepts without imposing them on the data’, adding that any event observed will be influenced by the researcher’s prior knowledge of what is being observed (2012), which is the case here where the concept of FLA in a conventional classroom is extended to the observation of it in the VLE. The present study follows the precepts of abduction where theory is the driver of inspiration to enable the researcher to find patterns in the data (Thornberg, 2012). Only aspects of GT have been applied here, such as obtaining multiple viewpoints of the same learning event, as was the case for the 5 TAPs used here.

The analysis procedure thus involved assigning segments of the protocols to a standard set of variable categories and sub-categories as well as considering a broad set of new ones emerging from the data (Conrad et al., 1999).

Figure 2 – Coding scheme for the TAPs


As a reminder, the sample was made up of five participants who had various levels of pronunciation ability and who carried out a task as described in Appendix 1.

Results from TAPs

The TAPs yielded extremely valuable data which was organised into the following categories using NVivo. Nodes relating to FLA accounted for 52.1% of the coded data, a sizeable proportion indicating the pertinence of choosing this variable as one of the main strands of this study. Table 1 is from a screenshot of an entire TAP transcript coverage and shows emotions and evaluative comments:

Table 1 – Node structure of TAPs analysis

FLA was evident from the start of the task as evidenced by negative assumptions about the difficulty of the task for the participants:

‘Let’s start, mmh, right, it all looks very complicated’.

Participants sometimes expressed their concern which may also have indicated anxiety by using the following terms:

‘Ah, a squiggle with a top on, oh, dear’.

‘Ooh that’s tricky, I am going to listen to that again’.

‘Oh gosh I’m looking through here and I can’t see the ‘c’ now, must be there somewhere, I can see the inverted ‘c’ but I can’t see the ‘c’ the correct way round unless it’s, I missed it somewhere in all of this. Oh Lord, ssss, that’s backwards! …, no, no, flicked over the wrong one there (clicks)’.

Non-verbal interjections such as sighing or nervous laughter also suggested a degree of FLA:

‘No it isn’t that, no, I have gone for the wrong one, er, there, haven’t I? (sighs)’.

In addition, anxiety was sometimes evident in their reluctance to have a go or in their hesitation:

‘[…] although I’m very tempted to pronounce the ‘n’ in the spelling’

‘Right, looks (sighs) … right, looks as if, mmh’.

Foreign Language Anxiety

Anxiety was sometimes apparent at the very moment students had to pronounce the words after having tried to learn the correct pronunciation using the Phonemic Chart:

‘[…] ‘gwafra’ for the third one although I am very doubtful (laugh) about that so let’s move on’.

‘I’m not sure if am I speaking properly or not’.

They sometimes used a lot of hedging and thus ‘prepared’ the listener for the approximate pronunciation that was coming:

‘Next one, (sighs) to someone English it looks really funny like ‘gwafrez’’.

‘Mmm, well this is going to come out as ‘museau’’.

Apologies were heard, though not always explicitly, after their performance for its perceived or genuinely poor quality:

‘So I think that’s about it although I do apologise for taking so long to actually, er, complete the activity’.

Perception of the VLE

The TAPs were more useful in yielding data relevant to the first research question concerning FLA. However, the few comments from the five participants relevant to the VLE supported the quantitative data from the questionnaire, which will be reported on in a separate paper: technological problems, not being able to hear properly, or the flash system on the chart which highlights the phoneme when the participants point their mouse at it, as follows:

‘Right, well I’ve finished now. And, that was quite difficult, er, and I would prefer to listen to somebody speaking so that I could actually repeat it’.

‘Every time I move my pointer it brings up something else on the computer screen’.

‘OK, I’m just having more problems with the sound, I am going to check that it is still recording and it is signalling that it is recording so I can go back’.

These negative reactions to the VLE and electronic tools appeared to be related largely to difficulty using the tools and did sometimes change with time:

‘Having looked at it a little bit longer… it has been quite useful, so, thank you!  I’m about to print the phonetic chart, if my printer works, right, ok, pause recording’.


This project explored learning pronunciation from two different perspectives: emotions and thoughts, and the VLE. An analysis of the data enabled a comprehensive picture to emerge which has not only considerably added to our understanding of the interaction between FLA and the VLE as a situated context of studying, but has also established TAPs as a very useful research tool. The study was conducted with distance learners who study pronunciation outside the classroom, both on their own and online. It investigated a new learning environment which is relatively absent from previous research, and pronunciation, an under-researched sub-skill in language learning.

FLA featured in data from the TAPs, perhaps not so much in the number of comments or reactions but in their intensity. Some mixed findings resulted from those instruments, for instance, that the online environment helped the learners who did not like to speak in a conventional classroom setting. This is probably one reason why learning on one’s own was also mentioned, an environment where a marked decrease in anxiety outweighed the disadvantage of a lack of feedback. FLA was experienced not so much when having to pronounce words in front of others, but more in terms of accurate performance. Dealing with uncertainty and the lack of feedback on pronunciation as a distance learner also frequently featured in the protocols. The TAPs thus added substance to the questionnaire answers, as FLA could be observed in action, not just as self-reported data.

Other types of emotions were also displayed in both quantitative data and open-ended responses, some of them being positive, such as excitement (39.1%), contentment resulting from positive self-evaluation of their performance (72.4%) and appreciation of the Phonemic Chart as a useful learning tool. Some of the participants showed negative emotions such as great frustration, either with the chart or with the online environment itself, embarrassment (37.9%) and lack of motivation resulting from time-wasting in online tutorials, all of which could be usefully explored in a follow-up study.

Given that more than 60% of the data shows that participants displayed some form of FLA, practitioners could perhaps help their learners to apply affective strategies in order to lower their anxiety and embarrassment during tutorials. Affective strategies in the form of self-encouragement and metacognitive strategies, such as resourcing help, organising one’s learning and self-evaluation, were also used but to a lesser degree. Metacognitive strategies were only noticeable when participants evaluated their performance, either to reassure themselves, perhaps as an unconscious affective strategy, or because there was no teacher feedback available to them. The TAPs thus yielded a wealth of extremely useful data which it would not have been possible to obtain through a different instrument, as the data reflected what actually happens when students learn on their own.

It had been planned to use a control group, taking account of Leow and Morgan-Short (2004), who argue for including a group who would not be performing a TAP whilst doing the activity in order to measure the possible impact of thinking aloud on reactivity. On reflection, and given that the task was of a non-metacognitive nature (i.e. participants were not being asked to justify or expand on their thoughts), there was no need to use a control group, as non-metacognitive verbalisations have not been found to shape cognitive processes (Bowles, 2010). As found on a first listening of the TAPs, participants’ utterances tend to show latency in terms of the time it takes them to complete the activity, as the further information required increases the overall time taken by the participants (Bowles, 2010). Indeed, four participants whose TAPs were not used in the study merely read the words and completed the task much faster. However, latency was not an issue for this task because participants were not being marked or timed for their performance.


The present study has identified and explored various concepts relating to the learning of pronunciation outside the classroom. It has done so through a mixed-method research approach which made for a more robust triangulation of data, as suggested by Horwitz (2001; see also Benson, 2004). FLA was confirmed as an essential variable to investigate in relation to the pronunciation of a foreign language. Although the VLE provoked strong reactions, these were not altogether negative and some advantages to the medium were mentioned by learners who had stated that they did not enjoy using the VLE for pronunciation learning.

This study fills a gap in language learning research on several grounds. First, the acquisition of pronunciation of a second language has been and continues to be under-researched. From a learner’s point of view, good pronunciation is important as intelligibility is natural not only as a goal per se but also in terms of the perception of these L2 speakers when attempting to use the language in a non-pedagogical context (Dlaska and Krekeler, 2013; for a review of the intelligibility construct, see Munro, 2011; see also Hahn and Watts, 2011; Levis, 2011). This view was alluded to in the TAPs. Secondly, the learning environments highlighted in this research are relatively new, and the type of learners who took part, i.e. distance learners studying pronunciation in the VLE and on their own, are not, to the best of our knowledge, the object of any studies. Previous studies have covered some of the topics explored here, but not together and not in relation to pronunciation learning (see, among others, MacIntyre and Gregersen, 2012 and Horwitz, 2001 on FLA; Kenning, 2010 and Hampel, 2003 on the VLE). Some studies have covered some of the variables investigated in the present study together: for instance, studies with distance language learners have covered FLA and strategies but not dealt explicitly with pronunciation (Hurd 2008, 2007a, 2007b; White, 2006; Hauck and Hurd, 2005). Studies with conventional classroom students have dealt with pronunciation learning but not in a distance setting (Moyer, 2007, 2004; Flege, 2003; Flege et al., 1995). In addition, FLA as a variable has often been linked to the acquisition of speaking skills, but not with pronunciation as a sub-skill.

Although OU Associate Lecturers are provided with full teaching resources, there is a degree of freedom as to how they choose to use the materials. It has always been left to the tutors to adapt the suggested use of these materials to address learners’ needs with regard to FLA or other issues their learners may have. This study has highlighted that the VLE has many advantages for learning pronunciation: some learners prefer the anonymity it allows, its practicality and the opportunity it gives them to practise pronunciation and obtain immediate feedback. There was also a marked preference for tools which help learners to learn pronunciation on their own, which is not surprising given that some learners will have chosen to study a language at a distance because a conventional setting did not suit them. Language course designers would therefore do well to devise tools, exercises and activities which explicitly seek to help distance learners to practise pronunciation outside the classroom.

In conclusion, the present study achieved its aims on several grounds. First, the use of TAPs was original and relevant to the sample and the variables being investigated. Second, it offered a unique contribution to the field in the form of an investigation of the learning of pronunciation in the distance context from different perspectives. Third, the research questions were prompted directly by the changing learning environment for distance language learners, clearly linking this study to current educational practice. Indeed, this research had a direct application in the design of part of a course to meet the needs of distance learners: following a preliminary analysis of the results, some of the findings were applied to the writing of the online pronunciation manual for the new edition of the Beginners French module L192, including some strategies to address FLA issues and improve the learning of pronunciation in the VLE.

Through an exploratory investigation, this study sought to deepen our understanding of learners’ emotions and thoughts and the experience of learning pronunciation at a distance. As technology continues to undergo rapid change, so do pedagogical applications to language learning. The findings from the present study offer an invaluable resource for providing insights into what it is like to learn pronunciation outside the classroom, and what learners do to cope affectively and cognitively with a challenging sub-skill in a learning environment fast becoming the norm in language learning.

TAPs are a unique way to tap into students’ perspective on learning events. They allow the researcher to delve into both cognitive and emotional experiences, as was done in the present study. To use a simile from Stenhouse in which he contrasts quantitative and qualitative analysis: ‘The contrast is between the breakdown of questionnaire responses of 472 women respondents who have had affairs with men other than their husbands (quantitative) and the novel Madame Bovary’ (1983, p.6). The data obtained through TAPs sheds a far more nuanced light on the student’s experience than the statistics obtained through questionnaires, which tend to yield featureless data. Although there may be some limitations to the generalizability of data obtained through TAPs, the information thus obtained is rich, meaningful and grounded in the student’s learning experience.


Appendix 1: Think-aloud Protocol Activity

Cette activité a pour but d’explorer vos sentiments, pensées et stratégies d’apprentissage pendant que vous essayez de prononcer tous les mots sur la liste (ne vous inquiétez pas si vous ne les comprenez pas, ce ne sont pas des mots pour les débutants!) Pour vous aider, utilisez le tableau phonémique de la langue française (qui vous a été envoyé dans un fichier zippé) et cliquez sur les sons dont vous avez besoin pour prononcer les mots de la liste. Au commencement de l’activité, n’oubliez pas de vous enregistrer sur Audacity. Pensez à voix haute, prononcez les mots, aidez-vous du tableau phonémique, parlez de vos émotions, de vos sentiments pendant l’activité, de vos stratégies, de ce qui fonctionne ou pas. MAIS SURTOUT, CONTINUEZ DE PARLER!

The aim of this activity is to explore your emotions, thoughts and learning strategies whilst you are trying to pronounce all the words on the list (don’t worry if you don’t understand them or if you find them difficult to pronounce, these are not ‘beginners’ words!) To help you, use the French Phonemic Chart – showing sounds, not letters (the chart was sent to you in a separate zipped file) – and CLICK on the sounds shown in /blue/ that you need in order to pronounce the words on the list. Think aloud, that is, SAY OUT LOUD WHAT YOU ARE DOING and all that is going through your MIND (in other words, everything that you would say to yourself silently while you think) as you try to pronounce the words with the help of the interactive Phonemic Chart. Apart from the French words, speak in English. Talk about anything and everything that is going through your mind; nothing is too trivial or negative. Talk about your emotions, your feelings as you work your way through the activity, saying what is working and what isn’t working for you and what you are doing to help yourself, etc. Just act as if nobody would listen to what you say and don’t try to explain your thoughts. You can leave a comment at the end on what you thought about using the chart and pronouncing these words. Start recording yourself on Audacity at the beginning of the activity and above all, KEEP TALKING THROUGHOUT!

It might be a good idea to print this page so that you can read the words and open the French Phonemic Chart on your screen to use it. To open it, extract the files from the zipped file you received and click on

Once you start recording, downsize Audacity so that you can see the Phonemic Chart on the screen. It should look like this:

quignon /k i ɲ ɔ̃/
giclée /ʒ i k l e/
goinfrez /g w ɛ̃ f R e/
guingois /g ɛ̃ g w a/
caleçon /k a l s ɔ̃/
museau /m y z o/
ferreux /f ɛ R ø/
huilage /ɥ i l a ʒ/
poignard /p w a ɲ a R/
veilleur /v ɛ j ə R/

Take as much time as you need. Then, if you recorded it in Audacity, export it with your initials in the file name as an MP3 as you would do for the oral part of a TMA, and send it back as per instructions. Thank you so much for taking part!

