Shared attention and skilled performance

One of the most interesting questions in human information processing is whether a number of sensory inputs can be processed at the same time, or whether the only way to cope with more than one input is to switch rapidly from one input to the other (Broadbent, 1958). In normal conversation, unlike simultaneous interpretation, the vocalization of one speaker usually precludes that of another and, as a consequence, people rarely talk at the same time. Miller (1963) suggests that this turn-taking phenomenon may be a universal of language behaviour, but that the reciprocity between talking and listening

“is not a necessary consequence of an auditory or physiological inability to speak and hear simultaneously; […] perhaps there is some limit imposed by agility and attention, perhaps some critical component of the speech apparatus must be actively involved in the process of understanding speech.”

Miller 1963, pp. 417-418

In early studies on attention, it appeared that consciousness, or attention, could only be directed to a single activity at a time. Conscious attention to two simultaneously performed tasks was possible only if they were coordinated into a single higher-order activity (James, 1890); or if they were attended to in rapid alternation (Paulhan, 1887; Jaffe, Feldstein and Cassota, 1967); or if at least one of the two tasks was being carried out automatically, without conscious control (Solomons and Stein 1897; Hirst, Spelke, Reaves, Caharack and Neisser, 1980).

In most experiments on selective listening, subjects are asked to attend to one of two verbal messages by shadowing it, and to ignore the other.

Several studies have required subjects to perform two simultaneous tasks (Allport, Antonis and Reynolds, 1972; Shaffer, 1975; Welford, 1968). Allport et al. (1972) reported experiments in which subjects performed two tasks concurrently without any reduction in performance in either task: their subjects were asked to attend to and repeat back continuous speech at the same time as taking in complex, unrelated visual scenes, or even while sight reading piano music. Allport et al. suggest that when the messages or tasks to be performed are highly dissimilar, both tasks can be performed simultaneously. The main difference between Allport et al.’s study and other experimental paradigms (Moray 1969) is that shadowing was one of the concurrent tasks; in other words, one verbal input was externally generated.

To explain this effect, Allport et al. (1972) suggest

“that the difficulty derives not from exceeding the limited capacity of a single general-purpose central processor, but more simply from the difficulty of keeping separate (i.e., of not confusing or confounding) two closely similar but unrelated messages.”

Allport et al. 1972, p. 226

Shaffer (1975) found that a very skilled copy-typist could successfully type at high speed from a visual text while doing another verbal task, such as shadowing prose or reciting, without any impairment of performance. However, since she had great difficulty combining auditory typing with shadowing, Shaffer suggested that interference was greater when response units rather than stimulus units were similar.

Spelke, Hirst and Neisser (1976) had two subjects read short stories while writing lists of words in dictation. After several weeks of practice, they were able to write words, discover relations among dictated words, and categorize words for meaning while reading for comprehension at normal speed. At the beginning of the experiment, when the subjects failed to notice sentences and categories in the dictated lines, it appeared that they were copying the words without processing them to any extent. In this sense, writing might be called ‘automatic.’ But as the demands of the experiment changed, and after the subjects had been given additional practice, they gradually learned to analyze the dictated words semantically as well as detect simple sentential relationships between them. Finally, both subjects succeeded in categorizing dictated words with no loss of reading speed or comprehension, and, according to the authors’ definition, writing was no longer ‘automatic.’ In a limited sense, they had achieved a true division of attention in that they were able to extract meaning simultaneously from what they read and from what they heard.

A more plausible multi-channel processor could deal with two or more tasks at once provided that there is no competition between the tasks (i.e., that the tasks are dissimilar) for the use of one channel, and that subordinate channels have been established through sufficient practice. When, in his experiments involving skilled typists, subjects were unable to combine auditory typing with shadowing, reading aloud or reciting, Shaffer (1975) suggested three hypotheses to account for this inability, namely the pacing factor in auditory typing, the similarity of codes in the auditory tasks, and finally the possibility that the vocal output in the competing task was masking the auditory typing text.

Other researchers offered the following hypotheses: Brooks (1968) found that concurrent vocal activity may be the source of conflict. Crowder (1970) claimed that although there may be some special advantage in receiving auditory input over a channel as familiar as one’s own voice, this active vocalization may in fact make special demands on the subject, demands which are not present during passive or covert vocalization. Finally, Jaffe et al. (1967) pointed out the difficulty of speaking and listening simultaneously in that, although subjects may be able to attend to two voices simultaneously, they will encounter greater difficulty when one of the two voices is their own.

In any discussion pertaining to simultaneous listening and speaking, the automaticity factor cannot be overlooked. A general rule appears to be that once a skill is highly learned, it gradually requires less conscious attention or little allocation of mental effort. Furthermore, highly skilled tasks seem to become automated and thereby not susceptible to disruption when attention is withdrawn (Norman 1976). With sufficient practice, responses can become ‘pre-attentive’, or what are referred to as ‘automatisms’ (Neisser 1967).

Although experience and practice may indeed enable a subject to perform two tasks simultaneously, the feat is still considered unnatural. The simultaneity of listening and speaking imposes a severe strain on human channel capacity, which may explain in part why professional interpreters normally ask to work for 20-minute periods only. To avoid the strain of continuous processing in this fashion, it has been suggested that simultaneous interpreters, even with years of experience, make good use of the brief silences in the source language’s input.

“The intermittent silence between chunks of speech in the speaker’s utterance is a very valuable commodity for the simultaneous interpreter, for the more of his own output he can crowd into his source’s pause, the more time he has to listen without interference from his own output”.

Goldman-Eisler, 1968, p. 128

Poulton (1955) compared simultaneous with alternate listening and speaking and found that a significantly greater percentage of words was omitted or incorrectly repeated in the simultaneous condition than in the alternate condition. Barik (1973) investigated the notion put forward by Goldman-Eisler in 1968 and analyzed the temporal characteristics of recordings of source language speakers’ and interpreters’ speech. He concluded that simultaneous interpreters do, in fact, make greater use of source language pauses than would be expected on the assumption that the interpreter’s delivery is independent of intervals of speaking and pausing in the source language speaker’s delivery. However, Barik also noted that source language pauses occur in between units of meaning, and since interpreters are concerned with translating units of meaning as opposed to words, they might be more likely to begin interpreting during such a pause in the source language input. Since interpreters make greater use of source language pauses, they also reduce the extent to which they have to both speak and listen at the same time, which undoubtedly represents very complex processing behaviour (Barik 1973). In the author’s own words,

“It is apparent that in order to achieve any kind of performance level, the T (translator) has to consider units of meaning rather than perform on the basis of a more mechanical word-by-word process. It is thus more appropriate for the T to listen while the meaning unit is being formulated by S (speaker), and undertake to translate it once it is completed”.

Barik 1973, p. 263

In a similar vein, earlier research carried out by the present author, on ‘depth-of-processing’ and interpretation-related tasks such as listening, shadowing, interpreting simultaneously and interpreting consecutively, indicated that recall results were higher after listening and consecutive interpretation – conditions where subjects were not vocalizing and where it was assumed they were focusing their undivided attention on processing the incoming message – than were recall scores following shadowing and simultaneous interpretation. One possible interpretation was that simultaneous vocalization on the part of the subjects interfered in some way with their ability to process material to any great depth, making it a possible source of conflict (Lambert 1988).

Divided attention and simultaneous interpretation

Simultaneous interpretation is a classical case of divided attention in that it involves several different cognitive tasks carried out more or less concurrently. Attention is divided when an interpreter carries out two or more tasks – listening to a verbal message in the source language, and translating it into a target language – while simultaneously monitoring his or her own output and, on occasion, reading portions of a written version of the original message for clues as to the best match of specific words in the working languages.

Padilla, Bajo, Cañas & Padilla (1995) provide a substantive description of all the ongoing activities:

“In other words, the interpreter must be able to hold the new meaning unit in his/her working memory, to access the meaning of the words involved, to connect this new information to information already stored in the long-term memory, at the same time as s/he is vocalizing the translation of the previous meaning unit. This highly demanding task must be performed during a relatively long period of time […] during which time the interpreter must be able to load and unload his/her working memory at a very high speed.”

Padilla et al. 1995, p. 62

Studies in experimental psychology have indicated that after a minimum of six months of intensive training in tasks involving divided attention, some human beings can indeed acquire particular procedural skills enabling them to carry out several overlapping and/or concurrent, independent tasks (Spelke et al. 1976; Hirst et al. 1980). Paradoxically, if professional interpreters are asked to consciously focus their attention either on the input or on the output, and thus revert to behaviour expected of beginners, their performance deteriorates significantly (Lambert et al. 1995).

The ability to have one’s attention divided between different synchronous tasks has been explained by three hypotheses:

  • the extra-effort hypothesis: the increased resources needed to carry out concurrent tasks require an increased effort on the part of the subject;

  • the alternation-of-attention hypothesis: subjects do not carry out the different tasks in a rigorously concurrent way; instead, they learn how to rapidly shift back and forth from the processing of one task to the processing of another;

  • the automatic-mental-activities hypothesis: after acquiring the ability to carry out a task involving divided attention, there is no longer the need to monitor every single mental activity through a central processing system, since some of these activities can be carried out automatically.

Gran and Fabbro (1995) found that for verbal tasks requiring divided attention, and in particular during simultaneous interpretation, untrained subjects tended to alternate their attention by focussing it mainly either on the incoming message or on their own output, at the same time as they increased their voice level, both detrimental to an interpreter’s performance. Hence one reason for the present experiment was to measure beginning interpreters’ ability to interpret simultaneously both with and without visual support. In other words, would the inclusion of a written version of the speech help or hinder future interpretation candidates?

The tasks under study

The present study set out to determine the types of processing involved when subjects perform a) sight translation compared to b) sight interpretation or to c) simultaneous interpretation.

a) Sight translation involves the transposition of a message written in one language into a message delivered orally in another language. Since both oral and visual forms of information processing are involved, sight translation can be defined as a specific type of written translation as well as a variant of oral interpretation.

From a human processing perspective, sight translation appears to have more in common with simultaneous interpretation (Moser, personal communication), given the number of variables involved – time stress, anticipation, reading for idea closure, not to mention the oral nature of the task – factors that are either absent in written translation, or present only to a limited degree.

It is important to define what type of sight translation is involved and to distinguish sight translation from sight interpretation (described below). For example, sight translation can be rendered more or less challenging: an unstressful form of sight translation is where the candidate is allowed approximately ten minutes to read a 300-word passage and prepare the vocabulary. A more stressful variation of sight translation would be where preparation time is eliminated altogether and the candidate is asked to begin translating immediately, without even having the chance to read the document. [As challenging as this may sound, candidates should be trained to perform unrehearsed sight translation in preparation for work as a court interpreter, for example, where documents may need to be translated on the spot before a judge].

b) Sight interpretation – also known as ‘simultaneous interpretation with text’ – is one facet of simultaneous interpretation that is now part of the interpreter-training program at the University of Ottawa and is also used as a selection tool for admission into the interpreter-training program (see Lambert 1991). Sight interpretation – as opposed to sight translation – is one step closer to simultaneous interpretation in that the message is presented both aurally and visually. In this case, candidates are given five to ten minutes to prepare the written version of the message. Then, candidates are asked to deliver a sight interpretation of the text as it is being read to them through headphones. Candidates are urged to follow what the speaker is saying, given that the speaker may depart from the original text from time to time, and not to simply read from the passage as though it were a sight translation exercise.

The use of sight interpretation as a selection device for admission into the interpretation program is somewhat controversial. Some colleagues feel that sight interpretation, as opposed to sight translation, is a difficult task requiring weeks, if not months, to master properly, and that therefore it should be used to train interpreters during the course of the one-year training program, but not as a selection tool. Others feel that if the subject matter is not overly difficult, and if the pace at which the speech is presented aurally to the students is slow, candidates may benefit from having the chance to read the text during the preparation time, and still have the option to read from the written text when interpreting (for those who may be more visually oriented), or to simply ignore the written material altogether (for those who may find it distracting).[1]

c) In this condition, to mark a clear distinction from the two other ‘visual’ conditions, simultaneous interpretation was straightforward interpretation, presented only through headphones, with no visual input of any kind (i.e., no written speech and no videotape).

A mere handful of studies in earlier research have examined sight translation per se: Moser-Mercer (in press), Viezzi (1989), Howard (1986), Weber (1990), and Just and Carpenter (1987), only two of which will be discussed here.

Based on her own experience as a teacher of sight translation and as an observer of colleagues sight translating professionally, Moser-Mercer concluded that for sight translation,

“beginners tend to assign a semantic and referential interpretation to each word as soon as possible as the words are encountered from left to right. More experienced interpreters adopt a non-linear approach, gathering semantic information on subject, predicate and object, for example, before beginning with their translation and supplementing the initial information as they go along, their approach being meaning-driven.”

Moser-Mercer, in press

Moser-Mercer found that the speed of delivery was about 60 words per minute for beginners, and 115 words per minute for professionals. Professional interpreters shifted quite easily from the written to the oral medium, whereas beginners continuously felt confined by the original written material. While students rarely added a word to their translation, professionals made certain additions to the original text, such as qualifiers or connectives not present in the source language message. With regard to mistakes made, professionals hardly ever misread the text, whereas students did so regularly. Furthermore, Moser-Mercer speculated that, contrary to simultaneous interpretation, sight translation operates on distinct visual (input) and oral (output) channels and that the two are separate enough to prevent interference, thereby corroborating Shaffer’s (1975) earlier suggestion that interference is greater when response units rather than stimulus units are similar.

Maurizio Viezzi (1989) used information retention as a means of determining the mental processes activated during sight translation. In Viezzi’s experiment, information retention rates were measured following four tasks: listening to a text in a foreign language, reading a text in a foreign language, sight translation from a foreign language into Italian, and simultaneous interpretation from a foreign language into Italian. The foreign languages in this experiment were French and English.

In Viezzi’s experiment, retention rates after sight translation were lower than retention rates after simultaneous interpretation. According to the author, this unexpected finding may perhaps be explained by Craik and Lockhart’s (1972) depth-of-processing theory, which claims that information retention is a function of processing time and depth. In sight translation, information is constantly available to the interpreter who does not need to process the incoming information chunks, storing them for some time before articulating the translation. In simultaneous interpretation, the form in which the message to be translated is presented imposes on the interpreter a heavier storage burden leading to longer and deeper information processing. This is not the case in sight translation, which may explain the different retention rates observed in the two translation tests.

Of the three tasks, namely sight translation, sight interpretation and simultaneous interpretation, the question is whether performance is enhanced or hindered by the visual presentation of the material to be interpreted.

If the visual input (typed speech) matches what the interpreter hears (i.e., if the speaker does not deviate too significantly from the written version of the speech), one would hypothesize that for most interpreters, the visual input available for sight interpretation would actually enhance the translation performance. But the opposite could also be expected: if interpreters are juggling with two tasks during simultaneous interpretation (namely listening to the speaker and monitoring their own output), one can imagine what a third task (visual processing) could do in terms of interference. But then again, visual processing does not conflict with the aural/oral dual processing of interpreters during simultaneous interpretation. Thus, there may be more interference when the tasks are similar (listening to the speaker and listening to one’s own performance during simultaneous interpretation) than when one has to tap on different processing organs in sight interpretation.

The experimental study

In this experiment, an interpreter’s performance was measured during sight translation (ST), sight interpretation (SIT), and simultaneous interpretation (SIM). It was hypothesized that performance scores for the second condition, SIT, would be higher, albeit not necessarily significantly so, than for SIM. Furthermore, given the absence of any concurrent activities during ST, it was hypothesized that the highest performance scores would be obtained for this condition. In other words, the best performance scores would result during ST, followed by SIT, followed, in turn, by SIM.


To minimize inter-text variance and provide continuity, one 20-minute speech that made no demands on subjects’ knowledge of specialized or technical vocabulary was used for the experiment and broken down into three equal parts as follows (see Appendix for text material):

  1. ST: The first few minutes were used as a warm-up and not recorded; part A was used for sight translation for all 14 subjects.

  2. SIT: For part B, subjects were given a typed text and ten minutes to prepare the speech, by simply reading it. Following the preparation time, subjects were asked to interpret the speech and to focus their attention on the speaker’s presentation rather than on the written text.

  3. SIM: For part C, immediately following the sight interpretation, students were asked to interpret the third part of the speech. The speech was simply read to them via headphones and no videotape of the House of Commons speech was provided as visual support.


All 14 subjects were enrolled in a three-month “introduction to human information processing” course offered at the University of Ottawa to fourth-year translation students; the selected ones were considered desirable candidates for the interpretation program (for curriculum content, see Lambert 1989). All students were at the same level of translation training and all had three months’ exposure to simultaneous interpretation; all claimed French as their B language, and English as their A.


Subjects interpreted from their B language (French) into their A language (English) only, and all subjects were recorded simultaneously to ensure uniformity. Thus, all subjects were given a five-minute warm-up period during which they could either listen to, shadow – i.e., simply repeat in the same language – or interpret the introductory paragraphs of the speech. Most students opted to listen to the warm-up section silently to familiarize themselves with the topic.

All students performed the three experimental tasks in the same order, namely ST, followed by SIT, followed by SIM. The whole experiment lasted approximately 40 minutes, with only 15 minutes of actual performance being recorded, which is well below the 20-minute fatigue limit of most interpreters. The rest of the 40 minutes was either a warm-up or quiet preparation time, and no overt demands on the subjects were made.

To facilitate correction, each subject’s output was transcribed and matched against the translation of the original speech (published by R. Hansard; House of Commons Debates Official Report) by three judges working independently. Between-judge correlation across 14 subjects for the coding of performance was .93; all judges were kept blind as to the conditions and purpose of the experiment. A more detailed description of the procedure followed to arrive at such a categorization can be found elsewhere (Kraushaar and Lambert 1987).


Since this is a pilot study in a different research domain, and given the limited number of subjects involved, the following results should be considered mainly as suggestive trends for further research. The main information on subjects’ performance under the three different conditions is presented in Table 1.

Table 1

Performance Rates Following Three Conditions

The data presented in Table 1 indicate that both Sight Translation (ST) (x̄ = 82.43) and Sight Interpretation (SIT) (x̄ = 82.00) performance scores were significantly higher than Simultaneous Interpretation (SIM) (x̄ = 69.57) (t = 2.43; d.f. = 26; p < .05), and (t = 2.29; d.f. = 26; p < .05), respectively.
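
The reported degrees of freedom (26) are what a two-sample Student’s t test with pooled variance yields for two sets of 14 scores (14 + 14 − 2). As a minimal sketch of how such a statistic can be computed from summary values, the following Python function may be helpful. The function name and the standard deviations in the usage line are illustrative assumptions, not figures from this study; only the means and group sizes are taken from the text.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df) for Student's two-sample t test with pooled variance."""
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their own d.f.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    # Standard error of the difference between the two means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Means and group sizes are those reported for ST and SIM; the standard
# deviations (15.0 and 13.0) are HYPOTHETICAL placeholders for illustration.
t_value, df = pooled_t(82.43, 15.0, 14, 69.57, 13.0, 14)
print(f"t = {t_value:.2f}, d.f. = {df}")
```

Because the same 14 subjects took part in all three conditions, a paired comparison (d.f. = 13) would be another possible analysis; the sketch simply mirrors the degrees of freedom reported above.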

Table 2

Mean Scores and Standard Deviations Following Three Conditions

Performance rates following sight translation (82.43) and sight interpretation (82.00) were virtually identical, suggesting that the added feature of having to interpret an aurally presented message may not interfere. The significantly lower performance scores for SIM may also lend support to Shaffer’s (1975) notion that interference is greater when response units rather than stimulus units are similar.

The high performance scores obtained following Condition I (ST) came as no surprise, since in earlier studies (Gerver 1974, Lambert 1988, Viezzi 1989) it was found that the more attention subjects can devote to input processing – without having to share their attention between multiple tasks as in Condition III (SIM) – the more deeply the information is processed and hence, the better the recall following the condition. But how does one explain the fact that scores for SIT were as high as those for ST?

The fact that subjects did not score as highly in SIM came as no surprise either. Research in experimental psychology, combined with didactic experience in interpreter training, has shown that at least some of the subjects undergoing specific training for more than six months in tasks that require divided attention succeed in acquiring particular procedural skills, which enable them to carry out different concurrent tasks, while at the same time they are still capable of deciding whether to consciously check one or more task components by resorting to their systems of awareness (Darò 1995).

So if students were weak in SIM due to lack of experience, how does one explain their high scores for SIT, which provided additional processing for the subjects, namely the visual aspect? If the voice delivering the text closely matches the actual text under the subject’s eyes, little or no interference should occur. Problems begin to develop, however, when the speaker, who may be in a rush for example, decides to stray from the script, as invariably occurs in real-life situations; the risk of interference for the interpreter then increases. Even so, interpreters are free to ignore the written text and focus their entire attention on the incoming message; some, in fact, even turn the written text face-down. And even if the speaker were to stray from the written text, the interpreters have had the preparation time to read the speech from beginning to end, highlight certain terms, untangle the embedded syntax, and know more or less where the speaker is going; this additional rehearsal should only enhance an interpreter’s performance.

Some differences between beginners in this experiment and more experienced students in Viezzi’s (1989) experiment are worth mentioning. Viezzi (1989), who tested students in their 4th year at the University of Trieste, found that retention rates after ST were lower than after SIM. In ST, information is constantly available to the interpreter who does not need to process the incoming information into chunks and store them for some time before articulating their translation. In SIM, the form in which the message to be translated is presented imposes on the interpreter a cognitive challenge that leads to longer and deeper information processing. This is not the case in ST, which may explain the different retention rates observed in the two translation tests (Viezzi 1989).

Still according to Viezzi (1989), another possibility is that the language used in the experimental study may be relevant: in translation from English into Italian, the morphosyntactic differences between the two languages require a considerable effort on the part of the interpreter to transform the surface structure of the message to be translated into the form required by the target language.

According to Viezzi, information (retention) processing is inversely proportional to the extent to which morphosyntactic transformations are necessary. That interpretation is consonant with the results of an earlier study carried out by this author (Lambert 1989), where the subjects were required to translate from French to English, two languages with considerable morphosyntactic differences. The cost in terms of information retention was similar to the cost recorded in the translations from English to Italian in Viezzi’s study.

In other words, still according to Viezzi, translation, whether it be ST or SIM, implies a ‘cost’ in terms of information retention, and hence processing. The cost appears to depend on the degree of morphosyntactic transformations rendered necessary by the passage from the source language to the target language. Information retention is not, or not only, limited by the translation process as such, but also by the structure of the language to which the process is applied.

Finally, Viezzi claims that the processes of sight translation and simultaneous interpretation are by no means parallel. The different forms in which the message to be translated is presented in the two cases impose on the interpreter different strategies, affecting the way in which information is processed, with obvious consequences on information retention rates.

Several experimental studies on attention (Allport et al. 1972, Spelke et al. 1976, Hirst et al. 1980) suggest that during the initial acquisition stages, attention has to be consciously activated and devoted to the different skill components. It is during these initial stages that beginners tend to make more mistakes in their output. Later on, students learn to develop a certain degree of “automaticity” to reach a point where, like professional interpreters, they need not concentrate so intently on the procedural (mechanical) aspects of the task and are able to concentrate on the incoming and outgoing messages.

A theoretical explanation for this kind of typical development during the acquisition stages of simultaneous interpretation is now available: since the basic strategies of simultaneous interpretation are procedures, they are very likely to be stored and organized in implicit memory systems. Thus the activation of conscious attention tends to hamper their smooth functioning by calling into action other systems, which are unnecessary and disturbing (Darò 1995a, 1995b).

In conclusion, sight interpretation could be effectively used as an intermediate step, as if it involved ‘training wheels’ (Dejean-Leféal 1997), before weaning students off the visual support and letting them try simultaneous interpretation without text.