

The year 2015 marks the 40th anniversary of the first academic record[1] of what is now known as audio description or AD (Piety, 2004), “a service that makes audiovisual products accessible and enjoyable for the blind and visually impaired by transferring images and unclear sounds into a verbal narration that interacts with the dialogues and sounds of the original text with which it forms a coherent whole” (Reviers and Vercauteren, 2013, n.p.). In those 40 years, audio description—which was only a theoretical possibility back in 1975—has developed into an established practice in many territories and various domains such as cinema and television (recorded AD), theatre and opera (live AD), and museums and other cultural heritage sites (both recorded and live AD). Research into the different modalities of this access service is developing at an ever faster pace, too, and although the topics covered become ever more diversified, most of the research undertaken is related to two issues that are of key importance for the practice of AD:

  • (1) A first question that receives a lot of attention is that of content selection, or what to describe. Indeed, the intersemiotic dimension that characterizes the process of audio description raises various problems related to the content of AD. A lot of visual information is often presented simultaneously, making it difficult for describers to decide what is relevant and what is less so (cf. Yeung, 2007; Remael and Vercauteren, 2007). Furthermore, audio description should as a general rule only be inserted in the target text where there is no dialogue and there are no other significant sound effects. Therefore, even when describers know what they want to include in their description, there may not be sufficient time to describe everything and they are forced to set priorities and discard some of the information they initially wanted to include;

  • (2) A second question that is covered in many studies is that of the form of the AD, or how to render specific visual information verbally. One particular problem in this respect is that, to a high degree, images seem pre-filled with meaning for us and are therefore—particularly on the concrete level—much more specific than words. When we look at Figure 1, we all agree that it is a picture of trees. Most of us will recognize them as deciduous trees and a fair number will be able to identify the characteristic trunk of birches. However, when we ask people to draw a tree, chances are relatively small that they will draw a birch unless we specifically ask them to do so. In other words, there does not seem to be in visual language an equivalent of the generalizing superordinate terms in natural languages; a tree in visual languages is always a very specific tree. This means that meaning-activation in verbal communication is always a much more personal process: the more general the verbal term used, the more options the reader or listener will have to make a mental representation of it. As such, the question of how to describe is to a large extent open-ended.

Figure 1




The present article intends to contribute to the first area of research in audio description and presents a structured approach to one dimension of the question of content selection in AD, namely the audio description of characters[2]. First, however, it will suggest that AD research can be framed within the broader field of translation studies (TS), a theoretical framework that has been absent from most AD research, despite the fact that audio description is generally considered a new form of translation.

1. Audio description and the broader field of translation studies

Given the complex communicative and multimodal nature of the source text (ST), it is not surprising that audio description research is to a large extent multi- and interdisciplinary in nature. In her overview of audio description research, Braun (2008) names semiotics, film studies, discourse analysis and pragmatics, narratology, relevance theory and multimodality research as some of the frameworks that can be applied to AD research. A quick overview of introductions to books or special issues of journals dealing with audiovisual translation (AVT) shows that they all incorporate audio description and other forms of media accessibility in the field of AVT and TS (cf. Chiaro et al., 2008; Díaz Cintas, 2008; Gambier, 2003). But if we claim that audio description is a form of translation, it only seems reasonable that we try to apply paradigms used for other forms of translation to audio description, too. In light of the multimodal source text, the intersemiotic translation process, and the status of the target text, it is important that the paradigm approach the concepts of text and translation in a very broad way: text should include those that are not only linguistic but also multimodal, while translation should at least include forms of intersemiotic translation and partial translation. A paradigm that meets all these criteria is the functionalist approach to translation studies, founded in the 1970s and 1980s by Reiss and Vermeer and later developed by Nord, amongst others. One of the characteristics that sets functionalism apart from earlier models in translation studies is that it moves away from the central, yet highly controversial (cf. Pym, 2009), concept of equivalence that formed the basis of most earlier models, positing instead that the purpose or intended function of the target text should take precedence in the translation process. Defining the concept of equivalence, which was already difficult enough for interlingual translation (cf. Kenny, 1998, for an overview of some of the typologies developed to define the concept of equivalence), would most probably be impossible for the intersemiotic translation process of audio description, given the completely different logics governing verbal and visual communication (cf. Kress and Van Leeuwen, 2006; Vercauteren, 2012).

By making the purpose or intended function of the target text the essence of the translation process, the target audience becomes more important, too. As Nord put it:

one of the most important factors determining the purpose of the translation is the addressee, who is the intended receiver or audience of the target text with their culture-specific world-knowledge, their expectations and their communicative needs (1997, p. 12)

Translated to the context of audio description, the purpose is to make the story told in a film accessible to a visually impaired audience, while the intended function of the target text is to tell “the same” story as the one told in the original. Furthermore, as explained by Vermeer (cited in Nord, 1997), the translation process is at least partly verbal, but can include other dimensions. Defining the concept in such a broad way makes it applicable to audio description, too, as it takes into account the intersemiotic dimension of the AD process. Finally, the translation is based on a source text, described by Nord as “some kind of text” (1997, p. 25), a very broad concept “combining verbal and nonverbal elements, situational clues and ‘hidden’ or presupposed meaning” (ibid.). This means that the audiovisual products that make up the source texts of audio descriptions are considered texts within this paradigm. Probably even more important with regard to audio description is that in the functionalist approach, the source text is just one source of information—along with, for example, the target audience and its background knowledge—that the translator can use to create a target text. Rather than striving for equivalence, the translator uses the source text as an offer of information from which to choose the elements to be included in the translation, based on the intended function of the target text. This is particularly relevant for AD, where only part of the source text can be translated and the describer will often be forced to choose what elements to include and what elements to leave out. One very important remark must be made here: the fact that the source text is just an offer of information does not mean it can be completely disconnected from the target text. According to Nord a functional target text must always maintain a relationship with the source text (1991, p. 32). 
This relationship is partly determined by the function of the target text, but Nord further acknowledges that the only way to avoid the translator’s interpretation of the source text being completely dependent on individual conditions is “to control ST reception by a strict model of analysis which covers all the relevant text features or elements” (ibid., p. 19).

In other words, translation within the functionalist paradigm is very much a controlled activity, which starts with strict translating instructions—often referred to as the translation brief within this paradigm—which determine why the ST must be translated (purpose of the translation) and what function the TT must serve (intended function of the TT). Once the purpose and function have been defined, the translator proceeds to carry out a detailed analysis of the source text to get a complete understanding of all its explicit and implicit meaning-making elements. Based on this analysis, the translator can then determine which of these ST elements can or must be included in the translation to create a functional target text.

Starting from these explanations and using the diagram presented by Nord (1991, p. 39), a functionalist approach to audio description could be represented schematically as follows:

Diagram 1

The AD process according to the functionalist paradigm


According to the functionalist paradigm, any audio description process should start with the reception and interpretation of the audio description brief to determine the purpose, i.e. why the audiovisual source product must be described. While there can be different purposes, we can assume that the basic one is to make a given audiovisual source product accessible to a visually impaired audience. From this general purpose, we can then deduce the intended function of the audio described target text, namely to ensure that the visually impaired audience knows what the audiovisual product is about. In the specific case of feature films, the function of the TT can be described as telling the story told by the original in such a way that the visually impaired audience is able to follow and enjoy it as a sighted audience does.

Next, the audio describer will compare the audio description brief with the source text and its function to determine whether audio description is possible, desirable or necessary. In the case of a feature film in which there is little dialogue and a large part of the story is told by means of visual information, there is a clear need for audio description and enough time to provide it given the relative absence of dialogue[3]. The describer will then conduct a detailed analysis of the audiovisual source text, in this case the feature film. While Nord (1991) focuses on linguistic elements for a full understanding of the implicit and explicit, denotative and connotative meanings of the text, in audio description the focus of source text analysis will shift slightly and should cover two aspects: the narrative elements of the story and the way in which the story is told, i.e. the narrative content and the narrative style. Based on this analysis, the describer will decide which ST elements are relevant, i.e. which elements the visually impaired audience does not have access to but needs in order to follow the story. Those elements will then have to be adapted to create a suitable target text, based on strategies chosen by the describer. A few major adaptations requiring specific strategies can readily be identified: the describer will have to translate visual information into verbal information, decide how to formulate these descriptions to create a certain description style, and, often, decide what information to include and what information to leave out of the description, given the omnipresent time constraints imposed by the dialogue.

Once the relevant ST elements have been selected and the strategies to transfer them to the TT have been determined, the describer will start producing a functional audio description, i.e. a description that, together with the dialogue and other information the visually impaired audience has access to, tells a similar story to that told in the original, in an entertaining way. Finally the describer will test whether the audio description really serves its function, i.e. whether a visually impaired audience is able to follow and enjoy the story.[4]

While this is only a very general attempt at framing audio description within the functionalist paradigm and further research is definitely needed[5], the outline sketched above clearly shows that AD can be fitted into existing TS paradigms. Moreover, integrating AD in the functionalist paradigm and its well-structured translation process may contribute to systematizing research in AD by tracing all the different steps and substeps a describer should follow.

In the remainder of this article, I will try to contribute to this systematization of the audio description process, more particularly the first step, namely the analysis of the source text. In the case of audio description, one of the advantages of a systematic analysis of the source text is that it can offer the describer clear guidance on what information to include. Indeed, Sonali et al. consider “what to describe - what to include - what not to include - in what order - and how to prioritize the given information” (2010, p. 4) one of the major difficulties audio describers have to deal with. Existing AD guidelines agree that describers should include, among other elements, information on characters, actions taking place, the time and space within which these actions take place, and significant sounds that are hard to interpret without any clarification. In the next part, I will focus on one of these elements, namely characters.

2. Existing literature on the audio description of characters

Given that “identifying and describing characters is vital to effective AD” (Sonali et al., 2010, p. 5), there is already some literature on the audio description of this element. In addition to the different national guidelines, which give a general overview of the main features to include when describing characters, the subject has been discussed in a number of academic publications. Igareda (2011) and Mazur (2014) both look at facial expressions. The former analyzes the way in which five Spanish-language films describe six different categories of facial expressions and emotions. The latter, drawing on insights from the study of gestures and facial expressions, proposes a categorization for AD purposes that first explains the different types of gestures and facial expressions that can be encountered in films, and then shows how describers can prioritize between them. It thus offers a tool to decide which gestures and facial expressions should be described and which can be left out, and discusses various strategies for describing them. Fresno (2012) describes an experiment in which she looks at how audiences understand and remember characters based on the description of personality traits, physical traits, and characterization. Although the experiment seems to indicate that not all the information presented in the description is used equally by the audience when re-constructing the fictional character, the traits analyzed are fairly anecdotal and more comprehensive data on what traits should get priority in the description should be gathered, as acknowledged by the author (ibid.). Finally, Benecke (2014) focuses on the naming of characters and the description of their physical appearance. While his Audio Description Evolution Model (ADEM) equips audio describers with a very useful tool to determine when they can include certain bits of information, the guidance his model offers for decision-making with respect to the different aspects of this physical appearance is also anecdotal; a more systematic approach with regard to this particular component is called for. This brief overview of the existing literature shows that audio describing characters is a complex undertaking: describers must give information on the characters’ physical appearance, actions, emotions and gestures; decide when to name them, etc. In the following paragraphs I will suggest an approach to analyze all these different aspects, which so far have mostly been treated individually, in a more comprehensive way that allows describers to determine what elements to describe and offers them some initial guidance on prioritizing all this information. The framework underlying that approach is that of narratology.

3. A narratological approach to the audio description of characters

Using narratological principles for source text analysis and target text creation in the field of audio description seems to make sense. Most films tell stories, and audio describers who have better insight into how stories are told and how they are processed by the audience will be better equipped to recreate these stories for visually impaired audiences. Starting from this postulate, Vercauteren (2012) suggests a narratological strategy for analyzing and rendering the temporal organization of events that take place in stories. Likewise, Vercauteren and Remael (2014) use a narratological approach to develop a strategy for describing spatio-temporal settings. In addition to time and settings, characters have received a lot of attention in narratology.

An early model for character analysis was developed by E.M. Forster (1927), who distinguished between so-called flat characters and round characters (Herman and Vervaeck, 2005). Flat characters are portrayed as having only one or several character traits and show little or no development throughout the narrative. Round characters present a rich variety of character traits and considerable development. While this theory can be useful for content selection in AD—round characters will probably require more AD than flat characters—its main drawback, as pointed out by Herman and Vervaeck, is its rigidity: there are characters with only a few traits who develop considerably, while there are multi-faceted characters who hardly develop throughout the narrative. One of the most influential structuralist contributions to the study of character was the actantial model developed by Greimas (1966). He proposed a system of six different roles to categorize characters, namely the sender, the receiver, the subject, the object, the helper and the opponent. The main advantage of this system is that it is straightforward and can basically be applied to all narratives (Herman and Vervaeck, 2005) and all agents who are identified as characters. However, various authors have pointed out the flaws in Greimas’ model. It allows for a one-to-many and many-to-one mapping of roles to characters (cf. Herman and Vervaeck, 2005; Herman, 2002): in other words, one role can be ascribed to various characters, while one character can embody different roles. Moreover, the fact that the model is so general may result in “different readers coming up with different actantial structures for the same story” (Herman and Vervaeck, 2005, p. 54), which makes it difficult to use the theory as a general and unequivocal model for character analysis. 
Finally—with a particular focus on character analysis with a view to audio description—this model allows analysts to identify various general roles, but does not allow them to identify any concrete physical or mental characteristics that can be rendered in audio description. As the models outlined above indicate, developing a systematic and widely applicable narratological model for character analysis has proven very difficult. One possible explanation is the nearly unlimited diversity of characters in narratives. Just like people in real life, all literary characters look different, act and behave differently, and have different personalities, which makes it hard to incorporate them in a systematic model. In what follows, I will discuss three contributions from the field of narratology that, together, can offer audio describers a general yet sufficiently systematic and comprehensive model for analysing narrative characters.

A first contribution was presented by Margolin, who states that authors have to endow their characters with different properties in order for audiences to recognize them and set them apart from each other. These properties can essentially be classified under three separate headings (Margolin, 2007, pp. 72-73):

  • (1) First, there are the character’s physical properties: age, sex, posture, hair, eye and skin colour, haircut, and features, but also clothing style, possible physical defects, etc.—in short, any exterior characteristic that allows the reader to first identify and later recognize that character on the basis of external appearance;

  • (2) Second, there are the character’s communicative and behavioural properties. The communicative properties essentially refer to a character’s verbal interactions with others and contribute to that character’s identification in a number of ways: they can help identify a character’s gender (at least in narratives comprising an aural dimension); they can hint at a character’s inner personality (e.g. the way a character talks can reveal whether he or she is calm or nervous and how he or she treats others); they can reveal a character’s cultural background (for example through use of a specific register, language variant, or dialect). The behavioural dimension refers to the actions a character performs and to that character’s (physical) reactions;

  • (3) Third, there are the character’s mental properties, which are further subdivided into perceptual, emotive, volitional and cognitive properties (ibid., p. 72). These properties refer to what characters see, hear and feel, what their emotions are, what they wish or desire, and what they think and know.

Authors combine properties from these three different dimensions to create unique characters, while audiences use them to create general macro-structures relating to the different characters and to “map out the total landscape of the storyworld in terms of the entities it contains” (ibid., p. 74). Translated to the context of ST analysis, audio describers must pay attention to these three dimensions when analyzing the characters in the source text and determine what information the target audience needs—and cannot derive from the semiotic channels it has access to, such as dialogue—to create these general macro-structures of the characters present in the narrative.

A second contribution that may be useful for audio describers, particularly when it comes to deciding how much character information should be included in the description, explores the so-called fullness of characters—to use Rimmon-Kenan’s term (1983, p. 40)—meaning that characters can be more or less developed by authors. In a sense, fullness is related to Forster’s distinction between round and flat characters mentioned above. But as was already pointed out, Forster’s model is too rigid and, as Rimmon-Kenan states, “the dichotomy is highly reductive, obliterating the degrees and nuances found in actual works of fiction” (ibid., pp. 40-41). In order to overcome this reductiveness, Rimmon-Kenan suggests representing a character’s fullness by means of three sliding scales or axes:

  • (1) The first axis is complexity, which indicates the number of traits a character has, ranging—in theory—from one to an infinite number. One important remark to be made here is that these traits do not necessarily have to be expressed directly in the narrative; the author can choose to render them in an implicit way, for example by showing a character’s reaction to something or describing a habit that is characteristic of a certain trait. Anyone analysing narrative characters will have to take this into account;

  • (2) The second axis is development, which ranges from no development at all to continuous development;

  • (3) The third axis refers to the access the audience has to a character’s inner life. Again, this ranges from no access at all to a profound presentation of a character’s consciousness, feelings, moods, thoughts, mental reactions, ideas, dreams, fantasies, etc.

While the use of these axes partly overlaps with the principles set forth by Margolin, particularly when it comes to analyzing character traits, it also enriches and refines the analysis: by clearly mapping out where characters stand on the three sliding scales, the analyst or describer gets an indication of who the most important characters are, what kind of information is given about them, and whether they are static, i.e. remain the same throughout the narrative, or dynamic, i.e. develop and change as the story unfolds.

A third contribution from the field of narratology that may be valid for character analysis by audio describers is Bal’s (1997) detailed discussion of character construction in narrative. Her account is particularly useful here, as it describes both how authors create characters and how audiences recreate them by analyzing the source text according to specific principles.[6] Bal distinguishes five principles audiences use when recreating characters from a given text:

  • (1) The first principle is what she calls determination (ibid., p. 125). When the audience meets a character for the first time, it does not know anything about that character and will try to determine some general characteristics on the basis of which this character can be recognized later on;

  • (2) This recognition is made easier by means of the second principle, called repetition (ibid.). Throughout the narrative, the typical features of a character will be repeated—in the same or a different way—allowing the audience to confirm what it already knew or inferred.

  • (3) In addition to the repetition of known features, new traits will be presented as the narrative develops. Indeed, not all the information identifying a particular character is presented at the same time, nor is that possible, so through the principle of accumulation new character traits are presented. These traits can be compatible with earlier traits, different, or even contradictory (Margolin, 2007, p. 73).

  • (4) A fourth principle used in (re-)creating and analyzing characters is that of transformation or change, referring to the distinction between static and dynamic characters mentioned above. While characters can remain entirely the same throughout the narrative, they usually undergo some kind of change. This can be a change in a character’s personality or physical appearance.

  • (5) Finally, an important principle in character (re-)construction is that of relations. Characters never appear in isolation; they are related to others and to the spatiotemporal settings in which they appear. These relations can tell us a lot about the characters, their background, the people they are friends with, and their beliefs and opinions. These relations can even be symbolic, for example when a dilapidated house in which a character lives symbolizes his or her mental decay.

4. A narratological model for the analysis and audio description of characters

The three contributions described above can be applied to audio description, all serving different yet complementary functions. Margolin’s three dimensions are useful when it comes to deciding what to describe. Bal’s model gives audio describers a tool to track the evolution of character information throughout the narrative, and finally Rimmon-Kenan’s three axes or sliding scales can help audio describers determine how much information to include in the description. The following diagram presents a possible approach for analyzing and audio describing characters, based on these three contributions.

Diagram 2

A narratological model for analyzing and audio describing characters


When analyzing the characters in a given source text, the audio describer first must determine whether a character shown on screen is new or already known. If the character is new, the describer looks for information relating to Margolin’s three different dimensions and, according to Bal’s first principle, first provides general information that will help the audience recognize the character later on in the film.[7] Visual recognition is by definition based on external, physical traits, so these will probably receive most attention in the first description(s). Moreover, these physical traits are usually relatively stable—except for example when the character is presented as both a child and an old man, or has an accident that leaves permanent physical traces—and therefore, once this information is presented, the describer does not have to focus on it anymore and can instead devote more attention to the usually more variable behavioural and mental dimension.

If the character is already known, there are various possibilities.

  • (1) The information we already know from when the character was previously presented is repeated. In this case the describer can use (part of) the same description (“the tall man,” “the lady in the red dress,” the character’s name) or, if the same information is given but in a different way (for example, a nervous character chain-smoking in one scene and restlessly pacing up and down a room in another), adapt the description accordingly.

  • (2) A new character trait is presented. This can be a physical or mental feature or an action (behavioural and communicative dimension) reflecting a certain mental feature. The describer can reflect the accumulation by including the new information in the description, possibly in combination with a general bit of description identifying the character the new information relates to.[8]

  • (3) The information that is presented signals a physical or mental change in the character. Often these transformations indicate a character’s dynamic nature and drive the story forward. As such, whenever possible the describer should try to include them in the description, not least because they can turn out to be important later on in the narrative, even if they seem trivial at first.

  • (4) Finally, and probably most importantly, information about characters is almost invariably related to other elements in the story, as characters are related to each other, to their actions and to the spatiotemporal settings in which they appear. For the audio describer, these relations are crucial, as they can be considered the backbone of the mental models audiences create when processing narratives. These models are built on so-called contextual frames (Emmott, 1997), frames that create a narrative context for all the actions that take place and that contain (basic) information on the characters and the spatiotemporal setting related to actions. It is beyond the remit of this article to discuss the usefulness of contextual frames in depth,[9] but what makes them interesting for audio description is that a reference in the story to one element of a certain contextual frame will trigger the entire content of that frame. For AD, this would mean that, for example, in the film The Hours (Daldry, 2002), the description “Back in New York” implicitly tells us we are back with the character Clarissa Vaughan in the year 2001, while the description “Virginia opens her eyes and stares at the ceiling,” in addition to naming the character, indicates that we are back in Richmond, England, in 1923. In other words, once relations between characters and the spatiotemporal settings they inhabit are established, descriptions can convey a lot of information by referring to only one of the constituents of a contextual frame. This frees up considerable time to add new information.

While the diagram discussed above mainly relies on the models by Bal and Margolin, Rimmon-Kenan’s three axes or sliding scales can be used to determine what kind of information needs to be conveyed in the AD: more access to a character’s inner life means more information on mental traits. At the same time, the axes allow the describer to determine how much information is needed: more character traits mean more description, both to name the traits and to repeat them for confirmation; more development means more description to account for changes. Taken together, the three models offer tools to carry out a comprehensive analysis of all the characters present in a given source text, allowing describers to determine what and how much information can and ideally should be included.
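For readers who find it helpful to see the decision path spelled out step by step, the first three possibilities discussed above, together with the case of a new character, can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions, not a tool proposed or tested in this article: the names `Trait`, `CharacterRecord` and `classify` are hypothetical, each trait is reduced to a label and a value, and the fourth principle (relations and contextual frames) is deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Dimension(Enum):
    """Margolin's three headings for character properties."""
    PHYSICAL = "physical"
    BEHAVIOURAL = "communicative/behavioural"
    MENTAL = "mental"


class Principle(Enum):
    """Bal's principles, minus relations (not modelled in this sketch)."""
    DETERMINATION = "determination"
    REPETITION = "repetition"
    ACCUMULATION = "accumulation"
    TRANSFORMATION = "transformation"


@dataclass(frozen=True)
class Trait:
    # Hypothetical simplification: a trait is a labelled value in one dimension.
    dimension: Dimension
    label: str   # e.g. "hair"
    value: str   # e.g. "grey"


@dataclass
class CharacterRecord:
    name: str
    # Maps (dimension, label) to the value most recently shown on screen.
    traits: dict = field(default_factory=dict)


def classify(known: dict, name: str, trait: Trait) -> Principle:
    """Decide which principle a piece of character information falls under,
    updating the describer's running record of known characters."""
    key = (trait.dimension, trait.label)
    record = known.get(name)
    if record is None:
        # New character: first, identifying information.
        known[name] = CharacterRecord(name, {key: trait.value})
        return Principle.DETERMINATION
    if key not in record.traits:
        # Known character, new trait.
        record.traits[key] = trait.value
        return Principle.ACCUMULATION
    if record.traits[key] == trait.value:
        # Known character, known trait shown again.
        return Principle.REPETITION
    # Known character, known trait with a different value: a change.
    record.traits[key] = trait.value
    return Principle.TRANSFORMATION


# Usage with an invented character, purely for illustration.
known_characters = {}
first = Trait(Dimension.PHYSICAL, "hair", "dark")
print(classify(known_characters, "Character A", first).value)
```

A sketch of this kind cannot capture repetition "in a different way" (the nervous character who chain-smokes in one scene and paces in another), since recognizing that two different actions express the same trait remains a matter of interpretation by the describer.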


As indicated in the introduction, the aim of the present article was twofold. First, an attempt was made to frame the highly multi- and interdisciplinary field of audio description within translation studies. While the functionalist paradigm seems perfectly applicable to audio description, the suggestion outlined above is of a very general nature. Follow-up research could focus on, for example, the different intratextual and extratextual parameters that must be taken into account during source text analysis, or the strategies that can be used in target text creation. Second, the article offered describers a tool to analyze (filmic) source texts focusing on one particular constituent thereof, namely characters. Earlier contributions already demonstrated the usefulness of narratology for source text analysis and content selection in audio description (cf. Kruger, 2010; Vercauteren, 2012; Vercauteren and Remael, 2014). The strategy discussed above seems to confirm that it can also be used for the analysis and audio description of characters. Again, however, more research is needed: while the present article looks at what to describe and how much information to give, it does not look at how this information on characters can be rendered, i.e. at formulation. Given the long-standing debate on objective versus subjective description and the often implicit nature of information on narrative characters, particularly mental information (see for example Vercauteren and Orero, 2013), this area needs further examination. Moreover, the strategy presented here is purely theoretical and tests are needed to see (1) whether it really helps describers with source text analysis and content selection and (2) whether descriptions created using this strategy improve the target audience’s understanding of the story told by the filmic source text.