

Research on working memory can contribute greatly to interpreting studies because it offers important clues to various cognitive issues in simultaneous interpreting. It has been established that the interpreting task is significantly related to the Listening Span Task and that interpreting performance is influenced by working memory (Osaka 1994). Recent contributions to interpreting studies by researchers of working memory (e.g., see the papers of ASCONA II conferences in the journal Interpreting Vol. 5 No. 2, 2000/01) are a very promising sign for further research on the cognitive aspects of interpreting. This paper tries to shed light on some of the cognitive constraints of simultaneous interpreting based on recent developments in working memory research.

Articulatory Suppression and Simultaneous Interpreting

Simultaneous interpreting is often referred to as ‘listening and speaking concurrently’ or ‘holding the spoken message while simultaneously formulating and articulating the translated message.’ In experimental psychology, the method that requires subjects to vocalize a single word such as ‘the’ or ‘bla’ repeatedly while reading a text or listening to a speech is called ‘articulatory suppression’ or ‘concurrent articulation.’ Articulatory suppression is known to interfere with comprehension or recall by preventing subvocal rehearsal (Baddeley et al. 1981). In interpreting studies, producing the target language while listening to the source language is considered to be a kind of articulatory suppression, which may exert a negative influence on the recall and comprehension of interpreters. According to Hulme (2000), simultaneous interpreting ‘amounts almost exactly to what is referred to as articulatory suppression in studies of short-term memory.’ Many researchers have focused their attention on this aspect of simultaneous interpreting (Daro and Fabbro 1994; Padilla, Bajo, Canas, and Padilla 1995; Isham 1994 and 2000; Chincotta and Underwood 1998; Hulme 2000; Bajo, Padilla, and Padilla 2000; Shlesinger 2000).

Indeed, articulatory suppression does have a negative impact on verbatim recall (Isham 1994; Daro 1994). Additionally, as Shlesinger (2000) points out, although some form of rehearsal may be possible even when subvocalization is prevented (Vallar and Baddeley 1982), additional cognitive demands such as retrieval and inference may deprive interpreters of the opportunity for covert rehearsal. However, ‘the consequences of articulatory suppression are not catastrophic in the sense that input material is stored long enough for a translation equivalent to be constructed’ (Chincotta and Underwood 1998). In his recent article, Baddeley (2000) reports that articulatory suppression does have a significant effect, but that it is by no means devastating: auditory memory span is reduced from 7 to 5 digits, not more. Furthermore, he indicates that patients with grossly impaired short-term phonological memory and with an auditory memory span of only one digit can typically recall about four digits with visual presentation. Martin (1990) also suggests that ‘a great deal of sentence processing can be carried out despite very impaired articulatory and phonological memory capacities’ and that ‘the phonological memory abilities of an adult may represent the residual of a system that was once vital to language processing but that only comes into play in exceptional situations in adult language.’

These findings and the very fact that simultaneous interpretation is somehow possible lead us to the following hypotheses: (1) subvocal rehearsal may not be of much importance to interpreters; (2) interpreters can circumvent the consequences of articulatory suppression by developing some skills or strategies. As Bajo, Padilla, Muñoz, Padilla, Gómez, Puerta, Gonzalvo, and Macizo (2001) suggest, ‘interpreters develop their ability to process information in the working memory in a general way, while their articulatory loop is occupied.’

In either case, simultaneous interpreters must be able to retain information as long as necessary without the help of the articulatory control process (subvocal rehearsal), and professional interpreters seem to be able to do so. However, one should not forget the interference caused by the ‘irrelevant speech’ effect (Gupta and MacWhinney 1993): it is one thing for rehearsal to be prevented by articulating the target language, but quite another for the phonological store to be partially occupied by the interpreters’ own speech. It seems plausible that interpreters counter this effect by maintaining robust phonological representations in their phonological store.

The issue of articulatory suppression in simultaneous interpreting merits continued investigation. However, the central issue of the process model for simultaneous interpreting may reside not in concurrent articulation but in other areas.

Memory System and Central Executive

Simultaneous interpreting is a demanding and complex task that makes use of the working memory to its extreme (Osaka 2002). In order to perform this feat, interpreters must undertake various tasks such as listening and comprehension, information retention, retrieval, production, and monitoring almost concurrently. These tasks cannot be handled by the working memory alone. Of these tasks, listening and comprehension are mainly dealt with in the language comprehension system and production is dealt with in the language production system. Both systems are supported by the working memory in normal language processing, with the central executive and memory system serving as a ‘working space.’ Language conversion or translation is dealt with by the central executive with the support of the long-term memory. Various kinds of information, including the intermediate products of simultaneous interpreting, are maintained in the storage system of the working memory. It should be noted here that what matters for information retention in simultaneous interpreting is not recall performance (immediate serial recall or understanding of the contents), which is what studies of the working memory often measure.

It is true that simultaneous interpreting is similar to, although more complex than, the extrinsic load task in which subjects read several sentences while retaining words or digits for later recall (McDonald and Christiansen 2002). It is also similar to the reading span test, which is essentially identical to the extrinsic load task since ‘the two tasks require the participants to simultaneously comprehend language while retaining the load of words or digits for later recall’ (McDonald and Christiansen 2002). The major difference between simultaneous interpreting and the two memory tasks lies in the fact that interpreters retain information (semantic, phonological and contextual) only as long as it is necessary for interpreting; once they have produced the translation, the retention of that information is no longer required. Interpreters are not, in usual circumstances, required to recall it later.

Although there are many models proposed for working memory (see Miyake and Shah 1999), the most suitable and promising models that have the potential to explain and account for simultaneous interpreting would be those of Alan D. Baddeley and Nelson Cowan. However, as Baddeley’s recent proposal of adding ‘episodic buffer’ to the existing model indicates, with his tripartite model he has difficulty in explaining the significant but not-so-devastating effect of articulatory suppression as cited above and the data on the recall of prose. Contrary to the expectation of his model, in a recall of a meaningful sentence, a span of 16 or more is possible (Baddeley 2000). While the addition of the new fourth component of ‘episodic buffer’ can provide a better explanation for the concurrent processing of information of different codes, it is still underspecified (e.g., the capacity of the episodic buffer) as Baddeley himself admits (Baddeley 2000) and ‘the relationship between the central executive and the episodic buffer remains sketchy’ (Andrade 2001).

Embedded Processes Model of Working Memory by Cowan

Cowan’s model of working memory is an ‘embedded processes’ model that consists of (1) central executive, (2) long-term memory, (3) active memory: subset of memory in a temporarily heightened state of activation, and (4) the focus of attention, which are represented in Figure 2. It involves all information accessible for a task: (a) memory in the focus of attention; (b) memory out of the focus but nevertheless temporarily activated; and (c) inactive elements of memory with pertinent retrieval cues. Active memory is a subset of long-term memory and the focus of attention is a subset of the active memory. The direction of the attentional focus is controlled by the central executive (Cowan 1999).

To put it differently, “some of the necessary information may be in the focus of attention; some may be in an especially active state, ready to enter the focus as needed; and some may simply have the appropriate contextual coding in long-term memory that allows it to be made available quickly” (Cowan 1999). Cowan called his model a “virtual” short-term memory. This working memory has some limits. The evidence suggests that memory activation is time-limited and fades within about 10 to 20 seconds unless it is reactivated. The focus of attention, on the other hand, is capacity-limited to about four unrelated items, though chunking can raise the effective limit (Cowan 1999; 2001). Any information that is deliberately recalled is restricted to this limit in the focus of attention, and only the information in the focus is available to conscious awareness and report (Cowan 2001). As the focus of attention is capacity-limited, if information exceeds the capacity, the earlier items in the focus have a higher chance of being deactivated and displaced from the focus of attention (Haarmann and Usher 2001). This displacement type of capacity limit is shown in Figure 1.

Figure 1
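
The displacement type of capacity limit described above can be illustrated with a small sketch. It is only a toy illustration, not part of Cowan’s model or of the model proposed in this paper: the function name hold_in_focus, the fixed capacity of four, and the rule that the earliest item is always the one displaced are simplifying assumptions (the text says only that earlier items have a higher chance of being displaced).

```python
from collections import deque

# Assumed toy value: "about four unrelated items" (Cowan 1999; 2001).
FOCUS_CAPACITY = 4


def hold_in_focus(chunks, capacity=FOCUS_CAPACITY):
    """Feed chunks one by one into a capacity-limited focus of attention.

    Returns (items still in the focus, items displaced from it).
    Simplifying assumption: when the focus is full, the earliest item
    is always the one displaced (a deterministic stand-in for "higher
    chance of being displaced").
    """
    focus = deque()
    displaced = []
    for chunk in chunks:
        if len(focus) == capacity:
            displaced.append(focus.popleft())
        focus.append(chunk)
    return list(focus), displaced


# Hypothetical input: six unrelated items arriving in sequence.
focus, displaced = hold_in_focus(["A", "B", "C", "D", "E", "F"])
print(focus)      # ['C', 'D', 'E', 'F']
print(displaced)  # ['A', 'B']
```

Chunking can be mimicked in this sketch by passing a multi-word phrase as a single list element, which is one way to see why it raises the effective limit.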


Divided Attention or Attention Switching

‘Divided attention or attention switching’ has been one of the contentious issues in cognitive science, and the controversy has significant implications for constructing the process model of simultaneous interpreting. It also concerns the training of interpreters because if the dual task is possible only through the practice of divided attention (“Practice makes perfect”), seemingly irrelevant training such as doing mental arithmetic while listening to speech would be justified. In reference to simultaneous interpreting, Cowan (2000/01), based on some evidence, suggests that ‘interpreters are unlikely to share attention adequately between listening and speaking.’ Instead, he argues, interpreters may succeed because (a) part of one task may become automatic, and (b) interpreters may learn to switch attention between the tasks in a more efficient manner. In other words, concurrent tasks are made possible by (a) automatization and/or (b) attention-switching between tasks (see also Cowan 1995).

Other Features of Cowan’s Working Memory

Cowan’s model reserves the place for slave systems of Baddeley’s working memory model. The activated elements in the memory roughly correspond to the passive stores (phonological store) and the focus of attention reflects the storage ability of the central executive of Baddeley’s model (Cowan 1995), though Baddeley abandoned the storage capacity of the central executive (Baddeley 1993). Baddeley’s articulatory control process is one type of memory reactivation process and the memory reactivation routines are initiated by the central executive (Cowan 1999). Subvocal rehearsal ‘may serve to reactivate information by recirculating it through the focus of attention’ (Cowan 1999). In a comment on Cowan’s ‘alternative approach,’ Baddeley suggests Cowan’s model is not incompatible with his multi-component model (Baddeley 2003). Taken as a whole, Cowan’s working memory model is to some extent compatible with Baddeley’s model.

Lastly, in Cowan’s model, ‘retrieval means entering the correct item into the focus of attention’ (Cowan 1999). While retrieval from long-term memory is time-limited because it must be done within the time frame of an assigned task (e.g., retrieval of an equivalent expression), retrieval from activated memory ‘must occur quickly’ because the memory will disappear in 10 to 20 seconds (Cowan 1999). Put differently, the transfer of activated information into the focus of attention is rate-limited. Cowan emphasizes the importance of the rapidity of processing in achieving more successful results in working memory span tasks (Cowan 2000/01). The implication for simultaneous interpreting is clear. For example, when interpreters have difficulty in retrieving the corresponding target language for some lexical items, or in understanding some segment of the source language, the resulting delayed response would induce an unfavorable outcome, such as the accumulation of unprocessed information, disruption or deterioration of the processing of an otherwise easier segment of the source language at a distance (Gile 1995), or total failure of the interpreting task. If that is the case, it would be desirable for interpreters to keep the delay time as short as possible, and that may call for interpreting strategies or processing strategies of some kind.
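
The knock-on effect of a single delayed response can be made concrete with a rough sketch. This is a toy illustration under stated assumptions, not an empirical model: the function simulate_backlog, the fixed arrival interval, the processing times, and the rule that a segment is lost once it has waited longer than the decay window are all hypothetical, and the decay window is simply set to the lower bound of the 10 to 20 seconds cited above.

```python
# Assumed toy value: lower bound of the 10-20 s activation estimate.
DECAY_WINDOW_S = 10.0


def simulate_backlog(arrival_interval_s, processing_times_s,
                     decay_window_s=DECAY_WINDOW_S):
    """Process source segments strictly one at a time.

    Segment i arrives at i * arrival_interval_s and waits in activated
    memory until the interpreter is free. Simplifying assumption: a
    segment that has waited longer than decay_window_s is lost and is
    never processed.

    Returns a list of (segment index, waiting time, survived).
    """
    results = []
    free_at = 0.0  # time at which the interpreter can start a new segment
    for i, duration in enumerate(processing_times_s):
        arrival = i * arrival_interval_s
        start = max(arrival, free_at)
        wait = start - arrival
        survived = wait <= decay_window_s
        if survived:
            free_at = start + duration
        results.append((i, wait, survived))
    return results


# Hypothetical run: one hard segment (9 s) among easy ones (2 s each),
# with a deliberately short decay window so that the loss is visible.
for index, wait, survived in simulate_backlog(
        2.0, [2.0, 2.0, 9.0, 2.0, 2.0, 2.0], decay_window_s=5.0):
    print(index, wait, "kept" if survived else "lost")
```

In this run the hard segment itself is processed, but the easy segment that follows it has waited too long and is lost, and the later segments are processed only after a delay. This is the pattern of disruption ‘at a distance’ described above.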

I would argue that since an attention-switching hypothesis instead of a tenuous assumption of divided attention is adopted and the functions of slave systems of Baddeley’s model are retained, Cowan’s working memory model has the potential to provide a foundation for formulating an information-processing model for simultaneous interpreting.

Enlarged Embedded Processes Model for Simultaneous Interpreting

The process model for simultaneous interpreting I propose is an enlarged embedded processes model in which the working memory system and the language comprehension/production systems constitute indispensable parts.

Figure 2

FOA: focus of attention LTM: long-term memory


As shown in Figure 2, the central executive and the long-term memory constitute a part of the language comprehension system and the language production system. Long-term memory includes the lexicon of both source and target languages and automatized conversion (translation) procedures. This graphic representation is quite simple and may seem indistinguishable from the normal language processing system, but it is sufficient for the present purpose.

Current models of working memory rarely distinguish or specify the relationships between the language comprehension/ production system and working memory. However, Saito (2000) suggests that the function of the phonological loop is a part of the language perception and production process. In other words, the function of the phonological loop stems from the interaction between parts of the language perception process and the language production process. He cites Gathercole and Martin (1996) who argue that the phonological store is a pseudo-memory system that makes use of the language perception system. Watanabe (1998) also says that the maintenance and switching of attention as well as the selection of appropriate action and behaviour are required even when there is no requirement for the temporary retention of information. That is, the central executive includes more than the functions intrinsic to working memory. These suggestions seem to support the view that working memory and language processing systems are partially overlapping and closely related.

The central executive is involved in the control of the focus of attention and coordination of the working memory system (Cowan 1995) and does not itself have storage capacity (Baddeley 1993). As indicated above, the central executive structure is also an indispensable component of the language processing system. If attention switching or coordination of tasks takes a long time to complete, interpreters have to hold memory items longer, risking the loss of information altogether (Towse and Houston-Price 2001). Similarly, if parsing of incoming speech in the language comprehension system or speech planning in the language production system takes longer, attention is drawn away from other activities that should be completed in a timely manner, risking the breakdown of the overall task (e.g., failure of simultaneous interpreting). If two or more tasks compete with each other in the central executive due to poor coordination, that may cause interference and degrade efficiency and behavior. The results of this ‘task-length effect’ may be quite similar to those of the processing capacity saturation described in the Effort Model by Gile (1995).

Nature of Code of Information in Working Memory

The code of information in the activated memory includes both phonological (verbatim) and semantic representations. While phonological information in the activated long-term memory decays unless it is refreshed by entering the focus of attention, semantic information (i.e., word meanings and propositions) is actively retained much longer (Haarmann and Usher 2001; Martin 1990). Semantic short-term memory stores word meanings that are actively maintained until they can be integrated into a meaningful relationship with words later in the sentence. The area where meanings are maintained is ‘in or near the focus of awareness’ (Haarmann, Davelaar, and Usher 2003). Citing strong evidence for separate phonological and semantic memory in the working memory, Haarmann, Davelaar, and Usher claim that semantic memory is involved in the rapid computation of information, whereas phonological memory is used as a backup system. The semantic STM component of the working memory also supports the maintenance of concepts associated with words. Furthermore, semantic memory or semantic representation includes not only propositions explicitly expressed in a speech but also propositions inferred by interpreters or macropropositions produced by the integration of propositions (Muramoto 1998). Mental models (Johnson-Laird 1983) or situational representations (van Dijk and Kintsch 1983) are also produced through the interaction between the verbal information and the knowledge of interpreters and remain within the working memory (Glenberg et al. 1987). The mental model or situational representations will be renewed throughout the interpreting task until a new mental model becomes necessary. However, one should not assume that these representations are always constructed (Zwaan and van Oostendorp 1993).

Haarmann and Usher (2001) claim that context information needs to be maintained in an active state (and across intervening items) in order to be used in the control (or biasing) of information processed later on. However, if by ‘an active state’ they mean the focus of attention, it would be impossible to hold contextual information or a mental model in the focus of attention, where many items compete for entry. It is very likely that the mental model or situational representations are constructed within the activated portion of long-term memory and enter the focus of attention when needed.

Accordingly, working memory contains multi-modal representations, which include phonological (verbatim) representations of the source language, lexical semantic representation, propositional representation, products of inferences, situational representation or mental model, and surface form of the target language. Working memory thus provides a buffer for language comprehension and production. The buffer might be used as a means of maintaining subsequent words in a sentence while the analysis of an earlier portion is going on. Or it might retain the filler until it is integrated with the gap. In auditory comprehension, the remaining words of a sentence continue to arrive at the ear of the listener even though the analysis of the earlier portion may still be in progress. Thus, such a buffer would be useful whenever sentence processing lags behind the input. The number of words that have to be maintained in such a buffer would depend on how long various processes take. (Martin 1990)

Cognitive constraints involving simultaneous interpreting clearly indicate the importance of proper resource management and task scheduling, because simultaneous interpreting must be performed within the limits of the resources of the working memory and language processing system. Specifically, interpreters should take care not to overload the focus of attention, and language processing and retrieval should be completed before the activated memory fades away.

A Perspective from SI between English and Japanese

Just as language comprehension does not proceed on-line as successive words occur (Engle and Conway 1998: 75), simultaneous interpreting is not an on-line operation either. It includes various kinds of reversals and modifications such as reversing the order of lexical items to form a grammatically correct expression in the target language and retaining the earlier portion until the grammar of the target language allows its translation during the translation of the successive portions. In some instances, on the other hand, interpreters might produce some target language in anticipation of the following lexical units. These manipulations apply to simultaneous interpreting of all language combinations to a certain degree.

However, simultaneous interpreting between Japanese and English seems more difficult than interpreting between structurally similar languages. The difficulty arises mainly from the difference in language structure rather than from differences in culture and other elements, though these should not be underestimated. In simultaneous interpreting, the difference in language structure often taxes the working memory capacity. Mazuka (1998) succinctly summarizes the typological features of the Japanese language as follows.

Typologically, Japanese is a S(ubject) – O(bject) – V(erb) word order, left branching (LB), and head-final (HF) language. The head of a phrase (e.g., NP, VP, AP, and PP) generally comes at its end. In addition, in complex sentences, a subordinate clause precedes a main clause, and in complex NPs, a relative clause precedes its head noun. Thus, when clauses are embedded recursively, the language branches out leftward.

Mazuka 1998

The typological features of the Japanese language mean that if interpreters try to seek a formal correspondence in simultaneous interpreting between English and Japanese, they are required to reverse the word order in almost every grammatical unit. This puts a heavier burden on the working memory of interpreters than structurally similar language combinations do.

Example 1, which is a part of a small corpus of simultaneous interpreting by four professional interpreters (the speech rate was 184 words per minute), clearly shows the difficulties caused by these constraints. However, interpreters circumvent many of the difficulties by using ‘translation strategies.’ In the first sentence, the subject noun phrase with a relative clause (a second historic transformation that is now going on) was translated linearly by either repeating some of the lexical items or adopting a different sentence pattern from the original. None of the four interpreters tried to seek a formal correspondence. These are common strategies to avoid the accumulation of untranslated information in the working memory. However, in the latter half of the same sentence, all four interpreters failed to render complete translations, possibly due to the head-final and verb-final word order of the Japanese language and the limited capacity of the focus of attention. In order to achieve a formal correspondence between English and Japanese, the verb phrase (will enhance) must be retained toward the end of the sentence and the three nouns (the status, the power, and the responsibility) cannot be translated before the translation of the head noun (countries). In addition, the head noun (countries) can be translated only after the translation of the PP (with relatively greater economic capability). The four interpreters used some coping strategies, but they seem to have failed to avoid overloading the focus of attention. In all probability, they accumulated the items – [will enhance] [the status] [the power] [and the responsibility] [of countries] – in the focus of attention, displacing some of the items from the limited-capacity focus.

This may explain the omission of the lexical items [the status] and/or [the power] from the translations and the appearance of non-correspondence or semantic dilution such as jyuyosei (the importance) and yakuwari (the role) instead of [the status, the power and the responsibility]. However, the fact that the verb phrase (will enhance) was translated by all interpreters in spite of the long delay indicates the likelihood that interpreters may have used a special encoding strategy – possibly semantic encoding or conceptualizing encoding (Funayama, Kasahara, and Nishimura 2002).

Translation strategies are applied either consciously or unconsciously (automatically). The strategies most frequently used would be load-reduction strategies. But interpreters cannot prepare strategies for every syntactic pattern, which is confirmed by Example 2. All four interpreters failed to produce the translation of the beginning portion of the sentence [there is a strong reason to believe].

Interpreters must have put the phrase [there is a strong reason to believe] into the focus of attention, because there are no translation strategies used for this pattern. Then, by adding successive elements (can; in fact; work out cooperative management arrangement) they might have overloaded the focus and displaced the representation of the beginning segment of the sentence from the focus. These examples might support the “tightrope hypothesis” of the Effort Model that claims interpreters are working close to processing capacity saturation (Gile 1999).

Funayama, Kasahara, and Nishimura (2002) analyzed the delay of translation in simultaneous interpreting from English into Japanese and tried to account for the long delay of translation. Based on the assumption that semantic memory or propositional memory lasts longer than phonological memory, they suggest that interpreters can somehow hold conceptualized items irrespective of time distance or number of words interpolated between listening and production. It should be noted that the translation delay of the examples they cite invariably falls within 10 seconds, which might be accommodated by the activated long-term memory in Cowan’s model. However, some of their examples could be accounted for by the manipulation of information in the focus of attention.

In Example 3 above, Funayama et al. argue that what should be noted is that in spite of the time distance of 7 seconds and the incomplete correspondence between ‘related’ and ‘tunagaru’ due to possible decay in the short-term memory, the translation ‘tunagaru’ retains the meaning of ‘related.’ However, this example could be explained differently. When the interpreter began to utter ‘toiunowa’ (because), she/he retained only one chunk (it’s closely related) in the focus of attention. Thereafter, the interpreter continued to translate the consecutive segments immediately after she/he listened to them, in each instance retaining in the focus of attention the same item (it’s closely related) and the phrase that would be immediately translated. Since the interpreter held only two items in the focus of attention throughout this process, she/he might not have had much difficulty in retaining the item ‘related’ or its meaning for 7 seconds.

While conceptualization or semantic encoding seems to provide a good account of some aspects of simultaneous interpreting, it might be argued that conceptualization or semantic encoding, or for that matter pragmatic and contextual information alone, cannot circumvent difficulties arising from simultaneous interpreting between structurally different languages such as Japanese and English. Even if interpreters conceptualize some items and put them into the focus, they may be obliged for structural reasons to retain in the focus lexical items that defy conceptualization or categorization, thus risking overloading the focus of attention and the consequent loss or semantic dilution of the conceptualized item. The processing of verb phrases in simultaneous interpreting from Japanese into English is a typical case which reveals the limitation of conceptualization. Anticipatory rendition of verb phrases before the corresponding source-language verb phrases appear means interpreters are mobilizing a load-reduction strategy to avoid overloading their memory capacity. Example 4 indicates that the interpreter’s translation of a verb phrase (have been discussing) precedes the corresponding source language utterance [gironga nasarete mairimasita]. If the interpreter had waited for the verb phrase [gironga nasarete mairimasita] which appears at the end of the sentence, she/he would have accumulated the phrases [kaihatuto bunkano mondaini tuite samazamana] in the focus. (The passive voice would not be considered a good choice for stylistic reasons.)

Liu (2000) and Liu, Schallert, and Carroll (2004) report that professional interpreters interpreted ‘continuation sentences’ (i.e., sentences that immediately follow the test sentences, the first three or four words of which were essential for establishing the correct meaning of the sentences) more accurately than student interpreters. They conclude that professional interpreters have a domain-specific skill to allocate their working memory resources efficiently and shift attention at the right time. However, in light of the enlarged embedded processes model, their results can be interpreted differently. Though we agree that professional interpreters in their experiments had the ability to allocate their resources efficiently and shift attention properly, the first few words of the continuation sentences may have entered first into the activated memory area in a phonological form while the interpreters were translating the preceding sentences. After finishing the translation of preceding sentences, the interpreters may have focused their attention on these few words that were retained in the phonological memory and continued processing so as not to overburden the limited-capacity focus of attention. Perhaps the domain-specific skills for interpreters may be translation strategies that reduce the processing and memory load, rather than or in addition to the skills of allocating resources efficiently and switching attention properly.

Interpreters make use of a variety of strategies to overcome the task-specific constraints (Kalina 1992). Some of the strategies may be applied irrespective of language combinations and some may be used only in specific language pairs. These strategies are only heuristics, but their accumulation will contribute greatly to the improvement of the performance of simultaneous interpreting and to the teaching of interpreting.

Concluding Remarks

This paper has presented a rough sketch of the theoretical framework for the process model of simultaneous interpreting, drawing on the research on working memory. The model proposed is merely an interpretive hypothesis and needs further specification and elaboration in many respects. The model should also be described in relation to other existing proposals in terms of similarities and differences. The author hopes that in due course the model will be refined so that it can be tested empirically.