
Introduction: The Absence of Sound from the Study of Audiovisual Translation

While many approaches to audiovisual translation address the relationship between the visual and the verbal, little importance has been attached to the role of the aural dimension in translating multimodal wholes such as films. This article starts from the assumption that the limited attention paid to sound in research on audiovisual translation may be indicative of conceptual and methodological constraints that limit the range within which we can think about multimodal translation as a phenomenon.

In her book Enlarging Translation, Empowering Translators, Maria Tymoczko discusses the epistemological aspect of translation research and argues that the current concept of translation, which is based on the Western notion of translation emphasizing literacy practices and written texts, fails to account for certain types of translation and leads to their marginalization or even exclusion from current theorization of translation (2007, p. 310). As an example of marginalized translation types, Tymoczko mentions translation in various oral contexts, in which the primary criterion of what constitutes a good translation is “a story well and truly told, rather than a close verbal or cultural fidelity, largely because there is generally little or no value accorded to a fixed text per se as there is in literate cultures” (ibid., p. 61). Tymoczko calls for a reconceptualization of translation, an enlarged concept that allows for examination of all translation types (ibid., pp. 55-58) and argues that “research that changes the concept of translation will change how translators, translation scholars, and translation teachers habitually act with reference to that concept” (ibid., p. 314).

Following the ideas presented by Tymoczko, this article sets out to rethink, epistemologically and methodologically, the concept of film translation by discussing the limitations of the current concept and by proposing an enlarged concept of film translation that fully considers both sound and image. Currently, there is no methodology that adequately addresses sound from a translational perspective, which has led to its marginalization as an object of study within translation studies. The present article addresses this methodological gap by proposing a phenomenologically informed method of audiovisual analysis, developed within film studies by Michel Chion (1994), as a tool for subtitlers, translation students and scholars. Chion’s model was originally developed for analyzing films, not language or translation, and it emphasizes the audio and visual dimensions of film. Its primary objective has been to enable research on film sound, and it does not discuss the verbal dimension of film in any significant detail. However, the study of audiovisual translation has a plethora of existing theories, methodologies and established practices for dealing with the verbal element of film, as well as a growing body of knowledge on the relationship between the visual and the verbal, and in time, this knowledge can be integrated into the larger audiovisual framework developed here. Chion’s model has been chosen for this study in order to draw scholarly attention to the ways in which sound and image create filmic worlds to engage the viewer physically, emotionally, aesthetically and intellectually, and to demonstrate how subtitles influence the way these large audiovisual structures are experienced.

In addition to producing knowledge about the topic of inquiry itself—the role of sound in film translation—this article has an epistemological goal of discussing how knowledge of audiovisual translation is acquired. This goal involves casting aside, temporarily, the dominant notion of text and textuality as the centre of translational thought and approaching sound and image as categories in their own right, as forms of expression that are not reducible to texts. This does not, however, mean restricting sound and image to non-verbal purposes or denying the possibility of analyzing them as texts or codes by semiotic means. Rather, the idea is introduced that a more profound understanding of the role of the aural and the visual in film translation might be accessible by some other means.

Chion’s audiovisual analysis emphasizes the relationship between sound and image, and thus, this article does not address sound in isolation, but in relation to the multimodal whole in which it is embedded. Applying Chion’s method to the context of film translation requires adopting a phenomenological attitude towards the object of study, i.e. casting aside, temporarily, every preconceived idea of what film translation is and concentrating on what is actually seen, heard and experienced in a film. In order to develop an in-depth understanding of how subtitles fit into the film as a multimodal whole, it is necessary to cast aside preconceptions such as “translation is predominantly a verbal activity,” or “dialogue is the most important sound element in a film for the subtitler” (or any other similar preconceptions that translators or translation scholars might have), since such ideas predetermine what we will see and hear in a film and prevent us from seeing the possibilities beyond them. Instead, we should open up towards the experienceable and approach image, sound and word as equal forms of expression forming an audiovisual instrumentarium that can be used to create a whole range of meanings and effects. In this way, it becomes possible to analyze all film styles, including marginal productions employing less conventional audiovisual strategies, from a translational perspective.

From Text to Experience

Translators are increasingly dealing with material that is quintessentially multimodal, i.e. consisting of combinations of different modalities: verbal, audio and visual. Yet the audio and visual dimensions of multimodal texts continue to receive less attention than the verbal dimension in research on multimodality and translation. Yves Gambier notes:

Although many kinds of texts with different types of signs are dealt with in Translation Studies (AV, advertising, theatre, songs, comics), the focus tends to be limited to their linguistic features. There is a strong paradox: we are ready to acknowledge the interrelations between the verbal and the visual, between language and non-verbal, but the dominant research perspective remains largely linguistic. The multisemiotic blends of many different signs are not ignored but they are usually neglected or not integrated into a framework.

2007, pp. 6-7

Patrick Zabalbeascoa, who has developed a model for analyzing multimodal audiovisual texts and argues for “the importance of considering non-verbal items as part of a text rather than part of its context” (2008, p. 37), also adopts a critical stance towards the current patterns of knowledge construction within translation studies by pointing out several obstacles to theory formation within audiovisual translation (AVT):

Another problem in making general claims about translation is lack of awareness of the existence of other text-types or, similarly, when there is an attempt to shove square pegs through round holes. Sometimes the theory is built around a single text or text type, e.g. the Bible. Another damaging practice, especially for a unified account of audiovisual translation (AVT) is to isolate literary theories of translation and non-literary theories, the implication being that it is not interesting or possible to theorise about both at the same time. A similarly problematic attitude is to start at the core, whatever that may be (usually novels, religious texts, legal documents, scientific papers, news reporting), and then use that as an excuse to put off studying more peripheral instances of translation, whatever they may be (e.g. poems, songs, small talk).

ibid., p. 23

Given the limited scope of this article, it is not possible to make generalizations about the treatment of image and sound in research on multimodality and translation, but the above quotations from research literature on translation studies illustrate that approaches to multimodality and translation are being criticized for privileging certain text types or certain aspects of multimodal texts at the expense of others. This suggests that the current concept of multimodal translation is too narrow to include audio and visual materials fully. In my view, the subject of multimodality has not been, to date, fully broached within translation studies, which might be due to the fact that the methodologies employed are not adequate for analysis of audio and visual dimensions of multimodal texts. This more general methodological question is related to the theme of this article in that it sheds light on the potential reasons why the role of sound has been absent from the study of film translation. I argue that one reason for the absence of sound is that translation studies is immersed in the paradigm of textuality, which tends to see its object of study in terms of language and reading. This means that, in the semiotic sense, multimodal entities such as films or advertisements are seen as texts that can be read. The expressions text and reading are used metaphorically, implying that encountering a multimodal entity is an activity that, to some extent, resembles reading a written text. This view also underlies semiotics-based approaches to multimodality and translation, e.g. Christopher Taylor’s multimodal transcription model developed for subtitling (2003, pp. 191-205; see also Taylor, 2009). To what extent, however, do we read images? How about sounds? Is an encounter with moving images that are co-presented with synchronous sound an act of reading?
In my view, a methodology that centres on the concepts of text and reading reflects a narrow conceptualization of multimodal translation that overlooks the essential differences between reading words, viewing images and listening to sounds. Films are complex, dynamic audiovisual entities that include moving images and synchronous sound. Watching a film is an experience fundamentally different from reading a written text. As Carl Plantinga puts it,

[t]he phrase “reading a film” mischaracterizes the viewing process as literary, with the effect of distracting us from the medium’s sometimes disavowed quality, namely that film is a powerful sensual medium. Film gains its particular power from its direct appeal to sight and hearing.

2009, p. 112

The textual orientation of translation studies partly explains why the role of sound in film translation has been virtually ignored. Sound has been equated with the verbal content (dialogue, song lyrics) it mediates (sound-as-text) and it has been examined from the perspective of semiotics. What has been ignored is the materiality of the speaking voice, music, noise and silence (sound-as-experience), which belongs to the domain of phenomenology. The materiality of sound (e.g. volume, texture, continuity, speed) always tells more about the sound event than the verbal message the sound mediates. When concentrating on the verbal message sound conveys, some other important functions of sound, such as emotion elicitation, are downplayed. Another reason why research on film translation has largely reduced film to interplay between image and word might be connected to the historical development of film studies. Film studies was long dominated by the study of the visual, which together with the dialogue propelled the narrative. Sound was treated as a mere add-on and there was no vocabulary with which to talk about it, let alone theorize it (Chion, 1994, p. 143). In recent decades, interest in the study of film sound and the related embodied aspect of cinema has increased rapidly, and nowadays the wealth of research literature on film sound provides a fascinating resource for the interested translation scholar. It is these resources that the present article draws on, in the hope of shedding some light on the role of sound in film translation.

In order to be able to theorize the affective, sensory, and embodied dimensions of the film viewing experience described by Plantinga above, this article adopts a phenomenologically informed film studies approach to subtitling that tries to reach beyond the text metaphor and the related representationalist view of translation based on semiotics. Without suggesting that the semiotic approach should be replaced, the objective of this article is to develop an embodied approach to film translation that is intended to be complementary to it. The notion of embodied experience will be discussed in greater detail later in this article.

Translation from a Film Studies Perspective

There has been surprisingly little interdisciplinary activity between translation studies and film studies despite the fact that the film industry and translation industry are closely interrelated at the practical level. However, there have been some valuable recent contributions towards bridging the gap between translation studies and film studies, among them the notion of accessible filmmaking by Pablo Romero-Fresco (2013), an approach that integrates translation into the filmmaking process itself, and Carol O’Sullivan’s book Translating Popular Film (2011).

One reason for the lack of interdisciplinary co-operation between translation studies and film studies might be that the two disciplines work with different underlying concepts of translation, so that translation scholars and film scholars/filmmakers focus on different aspects of film. While the methodologies translation scholars usually apply to films (e.g. approaches based on semiotics) emphasize the nature of film as a text and focus on the verbal and textual aspect, filmmakers and film scholars, when they mention translation, tend to talk about its influence on larger audiovisual wholes of the film instead of seeing it as predominantly a verbal or textual activity. The cinematic concept of translation will be illustrated below by examples from film studies research literature.

The subject of translation is not usually discussed explicitly within film studies, and people working in the film industry are usually not familiar with theories of translation developed within translation studies. However, there are some works by film studies scholars and filmmakers on translation, for example Subtitles: On the Foreignness of Film, edited by Atom Egoyan and Ian Balfour (2004), and Beyond the Subtitle: Remapping European Art Cinema by Mark Betz (2009). Ethnographic filmmaker David MacDougall devotes a chapter to the problematic of subtitling ethnographic films in his book Transcultural Cinema (1998), and he sees subtitling as a part of the filmmaking process. The subject of translation is mentioned in several other film studies publications (e.g. Naficy, 2001; Marks, 2000). Writing on translation by film scholars is more audiovisually oriented than writing within translation studies, and translation is seen as affecting many aspects of the film. Translation itself is also used as a filmmaking strategy. In particular, independent filmmakers are often involved in the translation of their own films and employ unconventional translation strategies to further their goals. Sometimes, as is often the case with filmmakers from marginalized cultures, these goals can be more political and ideological, and sometimes, as with art film, they can be oriented towards examining philosophical questions and aesthetic innovation. As Laura Marks points out, one translation strategy used by filmmakers is “to eschew full translation into English (or the language of the original culture)” (2000, p. 37). This strategy is known as partial translation and refers to presenting films without translation or with a clearly incomplete translation. This is, however, not to be confused with omissions that are inevitable due to the space and time constraints of the medium.
The purpose of partial translation is to defy the viewer’s conventional expectations that “the image is a window onto the culture” (ibid., p. 39) and to work against “the impression that it is possible to know others without effort—that the whole world is inherently knowable and accessible” (MacDougall, 1998, p. 175). In her film Surname Viet Given Name Nam, Trinh T. Minh-ha uses audio/visual disjunction as a strategy that undermines the sense of authenticity of documentary film by revealing its representational character. The film depicts Vietnamese women speaking, but the speech is out of sync with the images, and likewise, the subtitles are deliberately out of sync with the speech (Naficy, 2001, p. 122). Atom Egoyan’s film Calendar, the story of a Canadian photographer of Armenian descent who, alienated from his ancestral culture, travels to Armenia to take photographs of churches to make a calendar, also problematizes the notion of translation. Marks argues that Calendar is “structured around the losses that take place in acts of translation” (2000, p. 39). The film includes several lengthy scenes in foreign languages that are not subtitled. Moreover, the main character does not speak Armenian, and his wife acts as an interpreter between him and their Armenian driver during the photography trip, and thus, large portions of the film consist of instances of interpretation within the film.

The above indicates that filmmakers participate actively in discussion on translation and are aware of the fact that translation can influence the reception of their work. The importance of translation to filmmakers is also illustrated by the fact that, in film studies research literature, filmmakers sometimes criticize the translations of their own work. The two quotations below illustrate situations in which filmmakers have expressed their wishes regarding translation of their work, although they were not fulfilled:

(1) Experimental filmmaker Petri Kuljuntausta writes about the subtitling of his film Texas Scramble (1995):

The only detail which could be considered a concession was the addition of subtitles to the TV1 Finnish television broadcast. The film includes an English language text, which the Finnish Broadcasting Corporation wanted translated into Finnish. This was contrary to our wishes since the primary purpose of the text was not to provide denotative information. The text in question runs in three levels across the screen. It would be impossible for the spectator to read everything on the screen and this was the intention.

2007, p. 78

Here, the filmmaker’s wish was to present words running on screen in a way that emphasizes their materiality over the content they convey. However, instead of employing the unconventional subtitling strategy proposed by the filmmaker, the Finnish Broadcasting Corporation decided to include subtitles in the scene.

(2) In the example below, Claire Denis discusses a scene from her film Friday Night in an interview with Atom Egoyan. The scene depicts a woman, the film’s main character, sitting in her car outside a café, observing a man who is inside, talking with another woman. Their discussion is not heard clearly on the soundtrack. The purpose of the scene is to create a sense of exclusion felt by the main character by limiting her access to the dialogue that goes on inside the café, and that is why the dialogue has been made inaudible for her. The problem is that the subtitles make clear what the characters inside the café are talking about. This is not consistent with the point of audition of the main character who is outside and cannot possibly hear the conversation that is going on inside.

EGOYAN: It is interesting that you bring this scene up, because this is exactly the scene that got me most excited. In that scene you can barely hear the dialogue in French. But last night as I watched the film, the subtitle made absolutely clear what was being said.
DENIS: I was actually against that. I asked the guy who did the subtitles if we could perhaps print them with one letter missing or one word missing – as artists, you know… And he said that that doesn’t exist in subtitles. Either we have subtitles or we don’t have subtitles.
EGOYAN: So why did we need to subtitle that scene?
DENIS: I don’t know, I was too weak to say.

Denis, 2004, pp. 74-75

Díaz Cintas and Remael suggest that people from the film industry “feel that the more literal a translation is, content-wise and formally, the better it is” (2007, p. 57). This might be the case with filmmakers and scholars who do not deal with translation in their work, but if we consider the above quotations and the translation strategies employed by the filmmakers mentioned earlier in this section, they reflect the idea that the translation unit these filmmakers are concerned with is not the word or the line of dialogue, but a larger audiovisual unit that transcends the verbal level of the film. Instead of seeking to mediate the verbal message itself, the act of translation is targeted at larger audiovisual structures and more abstract themes of the film, and non-verbal materials can be emphasized over verbal ones. In the examples from Friday Night and Texas Scramble, the act of subtitling has been targeted at the verbal level of the film without considering the function of the verbal within the audiovisual whole. This has clearly altered the original audio/visual relationship, and thus, the intended experience of the scene.

While the above examples do not make clear whether these were individual translators’ decisions or dictated by the subtitling guidelines of the company they were working for (Denis refers to “the guy who did the subtitles” and Kuljuntausta mentions “the Finnish Broadcasting Corporation”), the subtitling activity has nevertheless produced a result significantly different from what was originally intended by the filmmakers. In my view, this reflects the idea that, in translation studies, the verbal dimension (as language) is seen as the core of film translation, and it is foregrounded over the audio and visual dimensions. Admittedly, making dialogue clear, audible and accessible to the viewer is characteristic of most filmmaking, and as Chion points out, directors who wish to deviate from this norm often face difficulties, because intentional lack of intelligibility can easily be confused with technical ineptitude by professionals and audiences (1999, p. 81). However, the use of filmic devices depends to a great extent on the film style employed. As David Bordwell points out, the classical narrative is based on unproblematic access to reality and does not address questions of representation as such, whereas art film narration sees the notion of representation itself as problematic and seeks new ways of representing the world and subjective experiences of characters (1985, p. 212). Art films are characterized by a degree of uncertainty and ambiguity, and the notion of truth they represent is relativistic. The intelligibility ideal of classical narration does not apply to art film; in fact, to some extent, art film works against the classical tradition. Thus, these two film styles require different subtitling strategies. For example, the scenes from the films by Denis and Kuljuntausta discussed above clearly deviate from classical narration and thus call for subtitling strategies other than those suited to classical film.
And it is not only art films that require different translation strategies. National cinemas (in the absence of a better term) around the world have their own styles that, each in their own way, challenge the Western world view by presenting the world through different audio, visual and verbal strategies (see e.g. Mottahedeh, 2004; Longfellow, 2004). These examples indicate that not only the verbal but also the visual and the aural as forms of expression are culture-specific and meaningful in translation, and classical narration is only one way of representing the world.

Phenomenology and Embodied Experience

Simply put, phenomenology studies experience from the perspective of the individual in a situation. This is done by casting aside “natural attitudes,” i.e. taken-for-granted assumptions and usual ways of perceiving things. Phenomenologically informed approaches work well in bringing issues that normally remain hidden to the surface. In the words of Don Ihde, phenomenology is “a probing for what is genuinely discoverable and potentially there, but not often seen” (1986, p. 26). The ability of phenomenology to discover what is potentially there makes it a suitable method for examining the role of sound in the context of film translation. Sound is a phenomenon that has been, to date, largely ignored in film translation because it has been overshadowed by the language that it mediates. If we cast aside the preconception of translation as a predominantly verbal activity, we might arrive at new understandings of the different functions of sound in a film. Sound itself is closely related to the experiential (affective, sensory, embodied) aspect of film, and within film studies, it is often examined through a phenomenologically informed method. Phenomenology offers an excellent pathway into the non-verbal dimension of film. It opens up a new perspective on film translation, a way to discuss the non-verbal film elements to a larger extent than before and to understand their function as embodied structures of seeing and hearing in which all signs are embedded.

Film scholars working within research traditions as diverse as phenomenology (see e.g. Sobchack, 1992; Chion, 1994; Marks, 2000) and cognitive film studies (see e.g. Grodal, 2009; Plantinga, 2009; Fahlenbrach, 2007) share the view that film viewing is based on what is referred to as embodiment. As Marks argues, “film is grasped not solely by an intellectual act but by the complex perception of the body as a whole” (2000, p. 145). The notion of embodiment has been commonly associated with phenomenology, particularly Maurice Merleau-Ponty’s work on the phenomenology of perception arguing that the body is an existential ground for all meaning (2002 [1962]). This means that the body is not an object or a mere instrument of the rational mind, but the source of meaning. We can only be in the world through the body, and having a body means being in a certain position (e.g. a certain angle of vision) with regard to the objects we perceive.

In the field of cognitive science, Lakoff (1987) and Johnson (1987) have pointed out that human thought is by nature embodied, i.e. knowledge of abstract phenomena is structured metaphorically through concrete, physical, embodied experiences. In order to make sense of more abstract concepts (e.g. time), we use concrete metaphors (e.g. “Time is money”). Thus, the expressions “to waste time,” “to invest one’s time in something,” “to spend time” and “to save time” are all based on the “time is money” metaphor. Different languages use different metaphors. Similarly, audiovisual media are based on embodiment. Applying Merleau-Ponty’s phenomenology to the context of film studies, Vivian Sobchack refers to film as “an expression of experience by experience” (1992, p. 3). By this she means that the structures of cinema imitate our embodied experience of being in the world, i.e. film uses our sight and hearing to create an audiovisual structure that resembles human experience of space and time, perception of movement, relations to objects in space, and physical and social encounters with people in the world. In the broadest sense, films can be seen as worlds constructed by the filmmaker that the viewer can relate to in the same way as he or she relates to the real world. Films, therefore, are not only texts that we can read but also entities that we can experience through our senses and bodies. Film viewing, then, cannot be sufficiently characterized as looking at the film as an object, but rather resembles Merleau-Ponty’s “looking with, or looking according to” (voir selon) an image (2012 [1964], p. 425), an act whereby a sense of reciprocity is created between the viewer and the film. The cinematic apparatus expresses abstract ideas by imitating concrete experiences of seeing, hearing and being situated in the world.
For example, in Claire Denis’ film Friday Night discussed earlier, the main character’s position as an outsider is expressed by concretely limiting her hearing experience in order to make the viewer identify with her position. Film is especially effective in mediating such physical and emotional experiences through concrete experiences of seeing, hearing and being situated in space. These are things that are difficult, if not impossible, to mediate by language to the same extent.

Materiality and Meaning

In addition to embodied experience, another key concept in the phenomenologically informed approach to film sound is materiality. Michel Chion describes the material, sensory aspect of cinema as follows:

Cinema is not solely a show of sounds and images; it also generates rhythmic, temporal, tactile, and kinetic sensations that make use of both the auditory and visual channels. And as each technical revolution brings a sensory surge to cinema it revitalizes the sensations of matter, speed, movement and space.

1994, p. 152

Materiality refers to the physical characteristics of sound and image that the film viewer always encounters before arriving at anything that can be referred to as “meaning.” Every sign has a material form. According to David MacDougall, “[f]irst and foremost, a film is a collection of materials of which it is made” (2006, p. 270). Therefore, before a film tells the viewer anything, it puts the viewer in a particular relation to the audiovisual materials, creating a starting point from which to interpret and understand the film’s themes. The materiality of signs inevitably influences the way in which their meanings are interpreted. Thus, before arriving at any interpretations about what a film might “mean,” the translator, too, encounters the film at the material level. Of course, film translators do engage with the materiality of films every time they watch and translate films, but this is a step that has to be made visible in translation studies in order to create concrete tools for the analysis of the visual and the aural dimensions of audiovisual texts.

In the following section, sound as an object of study in the context of film translation is discussed, and Michel Chion’s model of phenomenologically informed audiovisual analysis is proposed as a tool for analyzing the visual, the aural and the verbal in the context of film translation.

Sound as an Object of Study in the Context of Film Translation

Defining sound as an object of study in the context of film translation is a complex task for several reasons. I will briefly discuss the roots of the problematic of sound as an object of academic study in general and end this section by presenting a dynamic concept of film text that allows for examining the role of sound from the perspective of film translation, as well as the role of translation in shaping the film as a multimodal whole. For the sake of clarity, I use the term “film text” here to refer to the multimodal whole the film translator is working with, even though the enlarged concept includes the embodied and experiential aspects not normally treated under the notion of textuality.

Sound as an object of study is by its very nature elusive, which might be one of the reasons why it has long been ignored in many disciplines, despite its clear relevance to them. In the words of Walter Ong, “Sound only exists when it is going out of existence” (2002 [1982], p. 70). Due to its evanescence, sound as an object of study is impossible to pin down without countering its own essence. For example, a sound represented as static data, such as a spectrogram, cannot be seen as an adequate representation of the sound as a phenomenon, because, firstly, it is a sound that we can no longer hear and become immersed in, and secondly, the sound is isolated from the spatial and temporal context in which it was produced. It is, therefore, of vital importance to analyze sound in a way that preserves its dynamic development and the original context of signal propagation (Augoyard and Torgue, 2005, p. 9).

To be able to grasp sound as an object of study, scholars (e.g. Chion, 1994; Augoyard and Torgue, 2005) have opted for a phenomenologically informed approach to sound that allows for examination of sounds as events that unfold in space and time and are, therefore, inextricably intertwined with the physical characteristics of a specific context of signal propagation and with human experience in that context. So far, translation studies approaches to film have tended to focus on the interplay between the visual and the verbal, and film has not been studied as a total experience that includes the aural dimension.

According to Chion, film viewing is based on cross-modal perception, i.e. synchronous sound and image are experienced as a unit, a “synchresis” (1994, p. 63). Chion argues that filmic image and sound transform each other at the moment of perception, producing added value (ibid., p. 5). He points out that cinematic image and sound are dependent on each other and that their meanings change considerably if they are separated. He uses the distinction between “onscreen” and “offscreen” sound to illustrate the point: If the image is temporarily removed and we are listening to sound only, there is no longer any such thing as onscreen or offscreen sound, since it is the image that defines whether a sound is onscreen or offscreen. In Chion’s definition, film consists of a place for images (the frame) which is shaped by sound (ibid., pp. 66-69).

When theorizing about the role of sound in the context of film translation, it is important to note that spoken words, too, are sounds in the sense that they are mediated by sound. In addition to speech, a film soundtrack can have music, noises as well as silences. It is, of course, necessary to take the verbal into account as a means of expression in its own right, in its function as a mediator of conceptual meanings, but this should be done without forgetting that words as sounds relate to other sounds—music, noise and silence—and thus, a film soundtrack always has a consistency that defines which sounds are the most salient (ibid., p. 189). Spoken words exist as sound, which, in its materiality, evokes certain effects, emotions and impressions in the listener. It is this materiality that Chion seeks to explain through his method of audiovisual analysis. Chion describes the objective of audiovisual analysis as follows:

audiovisual analysis is descriptive analysis; it should avoid any symbolizing interpretations of a psychoanalytic, psychological, social, or political nature. Interpretation may of course follow, based on the findings of the analysis. Here, for example, it is not the symbolism of water and waves that interests us, but rather the wave as a dynamic model.

(ibid., pp. 197-198)

For the purposes of analyzing the role of sound in the context of film translation, I propose an enlarged concept of film text. According to this new definition, a film is:

  • (1) A multimodal whole consisting of the visual, the aural, and the verbal as forms of expression in their own right, capable of expressing partly similar meanings but also ones that the other forms cannot convey. The relevance of each element of the audiovisual whole within a film sequence can be determined only by analyzing the relations between the elements. There is no fixed order of importance between the elements, even though certain combinations are more common than others.

  • (2) Dynamic and event-like, because it is of crucial importance to preserve image, sound and word in their original context in order to be able to examine their relationship on the basis of their materiality (physical characteristics) with the objective of understanding their role in forming the total experience.

Adopting a dynamic concept of text and an analysis method based on human experience is by no means uncomplicated. The main obstacle is that when discussing films in this way, concrete examples—film clips—become indispensable. To be able to provide proof of the total experience produced by co-presentation of sound, image and word, the researcher has to have access to the original film, along with permission to present it in a research or teaching context. However, when making decisions about how to translate films on the basis of the physical characteristics of sound and image, the translator has concrete proof that supports certain decisions. For example, if dialogue in a scene is only partly audible, and on the basis of the audiovisual analysis, the translator concludes that the dialogue was not meant to be heard fully, the translator might render only the parts that are actually audible on the soundtrack and refer to the partial inaudibility (as a physical fact) as grounds for leaving some parts out. This approach makes explicit the way in which the aural and the visual are treated in film translation.

The rest of this article presents an analysis of Aki Kaurismäki’s film Lights in the Dusk and sets out to show how the filmmaker has used various embodied aural strategies to give expression to some underlying abstract themes of the film (loneliness, social isolation), and how effectively these are conveyed via the English and German subtitled versions of the film.

Subtitling Lights in the Dusk

Lights in the Dusk (Laitakaupungin valot, 2006) is a story of a socially isolated security guard, Koistinen, who works night shifts at a shopping mall in Helsinki. He is constantly being bullied at work by his colleagues and even by his supervisor, and his social life is virtually non-existent. He lives in a modest basement apartment and, due to his work schedules, has to sleep during the day, which further exacerbates his isolation. He is represented as an outsider who exists at the fringes of society. Koistinen meets an attractive woman and becomes infatuated with her. The woman, however, is a gangster’s moll and Koistinen unwittingly becomes part of a plan to rob a jewellery store that is located at the shopping mall where he works and to which he holds keys. The rest of the film depicts the downward spiral that begins when Koistinen is imprisoned, then released and tries to get back on his feet. It is important to note that the crime storyline, which constitutes the film’s principal plot, is only one aspect of the film. Large portions of the film concentrate solely on characterization in the form of lengthy scenes depicting Koistinen leading his estranged existence. The film repeatedly depicts situations in which Koistinen experiences social rejection (peer rejection, romantic rejection), which results in a social isolation cycle that affects all aspects of his life. Although parenthetical to what could be considered the film’s main action, these scenes are crucially important in creating the impression—in embodied terms—of the existence of a socially isolated individual and his daily struggle for survival in the margins of society. The main function of these scenes is to make the viewer experience the film’s events from the subjective perspective of Koistinen in order to feel what it is like to lead such an existence. 
In general, elicitation of subjective feelings in the viewer becomes possible through limiting the viewer’s access to a scene to the perspective of a character instead of giving the viewer an unimpeded access to the scene in its entirety. In Lights in the Dusk, this effect is achieved by using different embodied audiovisual strategies that accentuate the feeling of being on the outside.

Lights in the Dusk makes ample use of aural strategies that Michel Chion refers to as relativized speech (1994, pp. 178-183). In general, the function of these strategies is to present speech in a film in such a way that the content of the dialogue is displaced from the center of the scene, whereas non-verbal elements and the materiality of speech are emphasized. Chion distinguishes six strategies of relativized speech. Three of them are discussed below, illustrated by examples from Lights in the Dusk. The strategies discussed below can have other functions in different contexts, and here they are discussed to the extent that they are relevant to this particular film.

(1) Multilingualism and use of a foreign language: Speech is relativized by use of a foreign language that is not understood by most viewers. Blocking the viewer’s access to content of dialogue in this way creates an effect of being on the outside.

The main function of the opening sequence of the film is to introduce Koistinen to the viewer as an isolated and displaced individual. He is depicted doing his duties in the shopping mall where he works as a security guard. Carlos Gardel’s Volver is being played from the loudspeakers of the shopping mall. The transmission of the music is not entirely smooth, which accentuates subjective experience, and the music is intertwined with the monotonous grinding of the escalator and the cold, metallic sounds of Koistinen’s keys. The images are dark, dominated by concrete buildings and shades of grey. All visual and aural elements work together to establish an impression of Koistinen’s dismal existence. Some moments later Koistinen is shown outside the shopping mall in the dark night as male voices speaking Russian are heard coming from offscreen. The image implies that Koistinen has heard the voices, since he looks in their direction and takes a step back as if in reaction to them. Then the image reveals the sound source: a group of three men approaching. They are immersed in a spirited conversation, gesticulating vigorously to give emphasis to their words. Two of the men walk arm in arm on friendly terms, which underlines their togetherness. They pass Koistinen by, very close, without paying attention to him. Their voices are still heard from offscreen when they have disappeared from the frame. Koistinen is shown standing next to the wall, lighting a cigarette.

Audiovisual analysis of the development of the image and sound in terms of their physical qualities helps open up the function of this scene. Importantly, the Russians are depicted from Koistinen’s point of view and audition. In other words, the events are depicted as he hears and sees them, which means that the viewer is invited to identify with him. This is rendered visually by framing Koistinen all the time in medium shot or medium close-up, looking in the direction (offscreen) of the voices, and then turning his head towards the direction in which the sound source moves. Presenting alternating shots of Koistinen’s face and the group of Russians creates the impression that he is observing the passers-by. The development of the sound also reflects Koistinen’s point of audition. The sound comes from offscreen, becomes gradually slightly louder, then moves onscreen, and as the men pass, goes offscreen again and diminishes somewhat in volume. Both the sound and the image develop dynamically and indicate movement with regard to Koistinen’s static position. When the Russians first enter the image, they are shown from the front in long shot; then, as they near Koistinen, they are shown in profile in medium shot. They pass Koistinen by and we see their backs in medium shot, and finally, they go offscreen again and are only heard as voices. The framing indicates that they are rather close to Koistinen when they pass him by, but they do not pay any attention to him. Importantly, despite the slight changes in volume, the dialogue is clear and audible throughout the scene. The effects of translation on this scene are discussed below.

The film’s original language is Finnish, but the dialogue of the scene discussed here is in Russian. The original version of the film has no subtitles in this scene, and subtitles have been omitted from the German dubbed version as well. The English and German subtitled versions, however, have subtitles for this scene. The translators’ decision to include subtitles changes the scene profoundly. The absence of subtitles in the original film implies that Koistinen does not speak or understand Russian and he is, therefore, excluded from the conversation altogether. The warm togetherness of the group of Russians is juxtaposed with Koistinen’s loneliness. Without subtitles, the foreign tongue becomes one of the sounds of the nocturnal city and immerses the viewer in the non-verbal dimension of the scene. In this way, the viewer can sense and experience what it is like to be on the outside. However, when subtitles are added, the dynamics of the entire scene change. The viewer concentrates on following the conversation between the Russians—the subtitles indicate that they are talking about Russian authors and other cultural figures—and the focus shifts onto the verbally conveyed content of the scene at the expense of the non-verbal storytelling. As a result, the viewer no longer identifies with Koistinen’s subjective position but rather, becomes an all-knowing outside observer who has unimpeded access to the conversation between the Russians. What is problematic here is that, with subtitles, the scene does not manage to convey the experience of Koistinen’s loneliness—which is the core of the scene—but is rather reduced to a conversation between random passers-by. This example indicates that the translator does not translate only the verbal element but influences the viewer’s total experience of the scene. To illustrate the difference, some frames from the scene (left column without subtitles, right column with English subtitles) are presented below.

[Figure: frames from the scene, without subtitles (left column) and with English subtitles (right column)]

Most viewers do not have competence in Russian, and for them the main character’s sense of isolation and alienation is conveyed by non-verbal, embodied means. However, it is important to note that the meaning of this scene is experienced differently by a viewer who is a native speaker of Russian or has competence in the Russian language and culture. Although the conversation might seem unrelated to the film’s main themes, it is not a coincidence that the passers-by are talking about Tchaikovsky, Tolstoy, Pushkin, Chekhov and Gogol. In my interpretation, the scene conveys similar meanings both non-verbally and verbally: While the non-Russian-speaking viewer is offered the opportunity to identify with the existential condition of the main character in embodied terms through impeded or limited access to the dialogue, the viewer who speaks Russian and is familiar with Russian culture may conclude that the lives and works of the persons mentioned in the dialogue reflect in one way or another an existential condition of alienation and exile similar to that of the protagonist of Lights in the Dusk. Thus, a similar overall meaning can potentially be found with and without competence in the Russian language and culture. Then again, the scene might not be experienced in the same way by a viewer who has some competence in Russian but is not familiar with Russian culture to a degree that would allow general conclusions to be drawn regarding the lives and works of the cultural figures mentioned in the dialogue. The complexity of this scene is a good example of the subjective nature of film experience and of the fact that when making subtitling choices, the translator must make an effort to see beyond his or her own immediate experience and adopt a reflexive attitude that seeks to analyze and understand the experience from different perspectives.

(2) Proliferation and ad libs: The sheer quantity, speed or other characteristics of speech are so overwhelming that the message cannot be processed in a meaningful way (Chion, 1994, pp. 178-183).

In this scene, Koistinen is depicted attending a lecture about founding a limited company. He is sitting in the lecture room with the other participants. The lecturer is smoking a cigarette and reading aloud a passage from a legal text. The text is taken straight from the Finnish Limited Liability Companies Act (Osakeyhtiölaki), and due to its sheer complexity, the course participants are faced with the impossible task of taking notes. The lecturer’s disregard for the professional standards expected of a lecturer is evident, and the situation appears all the more unfair when he is shown collecting money from the participants after the lecture. The learning situation depicted is unreasonable from the perspective of the participants.

As the images below indicate, one sentence stretches over a large part of the scene. The subtitler has rendered the content accordingly, as one long and complex sentence. On the one hand, the scene depicts Koistinen as a person striving for self-development, and on the other hand, it depicts society, represented by the authority figure of the lecturer, hindering this development. Both the English and the German subtitled versions have included subtitles that are as full and complicated as possible for this scene in order to illustrate the absurdity of the learning situation depicted. Here, again, it is not the information content of the words that is of primary importance from the perspective of translation, but the feeling of helplessness that the course participants experience in the face of the sheer complexity of the lecturer’s speech. Thus, the subtitling norms regarding, for example, reading speed are not the central concern here. On the contrary, the translator should make sure that the subtitles cannot be read at a glance so that the viewer can identify with Koistinen’s not being in control of the situation. While the scene discussed earlier blocked access to the content of dialogue, here the content is presented to the viewer but is too complex to grasp fully. Once more, the voice becomes just a sound, and the scene produces an effect of speech going in one ear and right out of the other.

[Figure: frames from the lecture scene, showing one subtitled sentence stretching over a large part of the scene]

(3) Loss of intelligibility: Speech is fully or partly drowned in ambient sounds (Chion, 1994, pp. 178-183).

This scene portrays Koistinen entering a restaurant. He orders a drink and tries to start a conversation with a woman at the bar but is rejected. A man who is in the company of the woman comes over to Koistinen and tells him to leave. Koistinen goes near the restroom entrance and stands there next to the wall, holding his drink. After this, we see a colleague of Koistinen sitting at the bar with an unknown man, having a conversation and looking at Koistinen. The colleague points his finger in the direction of Koistinen, which makes it clear that they are talking about him. Alternating shots of Koistinen and the two men create a strong impression that they are looking at each other. Koistinen stands far away from the men and cannot hear their conversation fully, because the dialogue is partly drowned in the ambient noises of the restaurant. Consistent with this, only a fragment of the first sentence is heard on the soundtrack: “Toi jätkä on meillä yövuorossa…” which in English translation is “That guy works night shifts [at our company]…” The English subtitled version (see images and subtitles below, column on the left) renders what is actually heard on the soundtrack, and what the film implies is heard by Koistinen. However, the subtitler of the German version (see images and subtitles below, column on the right) has made a different decision and included an additional subtitle for speech that cannot be heard on the soundtrack at all:

  • - Ist er verheiratet? - Nein.

  • - Da haben wir einen perfekten Mann.

Direct translation from German:

  • - Is he married? - No.

  • - He is a perfect man for us.

While the English translation reproduces the original relationship between image, sound and word in this scene, the German version manipulates the relationship by adding a large amount of material that was originally not audible on the soundtrack, thus undermining the element of suspense that arises from the dialogue being unintelligible in the original version of the film. In addition, since listening to the soundtrack gives us absolutely no proof of what was really said between the two men, the translator of the German version has taken excessive liberties in manipulating the story. Moreover, the part of the invented subtitle in which the man answers “No” to the other one’s question does not agree with what is simultaneously seen in the image. There are no lip movements coinciding with the invented subtitle: The man is shaking his head. This scene, like many other scenes in the film, relies on information conveyed non-verbally. The verbal here is clearly secondary, buried in the ambient sounds of the setting. In the original version there is a strong sense of foreboding on which the scene rests.

[Figure: frames from the restaurant scene, with English subtitles (left column) and German subtitles (right column)]


The objective of this article was to enlarge the concept of film translation beyond the paradigm of textuality and to propose an analysis method developed by Michel Chion within film studies that allows for addressing film sound from the perspective of film translation. Due to the context-bound nature of sound, it is not possible to analyze film sound in isolation, but only in relation to the audiovisual whole in which it is embedded. Therefore, an approach to the role of sound in film translation is inevitably an approach to film as a multimodal whole. An approach that treats the aural and the visual as equals to the verbal has epistemological implications for translation studies. These involve rethinking the current verbally oriented categories of translational knowledge and enable a shift from verbally based translational thought to audiovisually based translational thought. This does not, however, mean polarizing the verbal and the non-verbal or giving preference to one over the other. Rather, it must be acknowledged that images, sounds and words have their place in cinematic expression, and an audiovisually oriented approach to translation, which treats all these elements as equals, allows us to see how the verbal fits into the audiovisual whole of the film. When co-presented, images, sounds and words transform each other, and as was shown with the example scenes from Lights in the Dusk, different subtitles can radically change the focal core of a scene. Thus, film subtitling is not an additive process in which the image and sound remain intact, but should be seen as a transformation that leads to a (degree of) qualitative change in the audiovisual whole. Therefore, the translation unit a film subtitler works with is not a purely verbal unit but an audiovisual unit (in which the verbal is always embedded). 
Changes brought about by subtitling can operate at higher levels as well, for example by influencing the larger themes of the film, as was the case with the perception of loneliness and social isolation in Lights in the Dusk.

The visual and the aural are always (often implicitly) analyzed to some extent when translating an audiovisual text, but this step has not been made sufficiently visible in research on audiovisual translation. The phenomenologically informed approach, which is based on things that are experienceable, is a step towards making the film translator’s decision-making process more transparent. Chion’s audiovisual analysis is an effective tool in teaching translation students how cinematic expression works and how different translation decisions can influence the film experience as a whole. Learning to perceive audiovisual combinations effectively and systematically requires practical exercises. Based on the individual’s experience of film, the phenomenologically informed approach can also empower the film translator in that it sees the translator’s experience and embodied situatedness in the world as a valuable starting point from which to understand translation, cinema and the world.