
0. Introduction

The sound designer’s goal of making his or her work blend in with the rest of a theatrical performance often obscures the constructedness of the finished product. In fact, the construction of this product has strong similarities to the micro and macro structures of linguistic communication. Even when a sound cue is composed entirely of stock recordings, these recordings are likely to have been manipulated and modified to create a sequence whose meaning hinges on a specific selection and temporal presentation of sonic elements. To take one example, Best Picture, a touring fringe show which debuted in 2014, used the squealing of car tires to signpost an abrupt change of tone during a transition from a parody of In the Heat of the Night to a parody of Midnight Cowboy (see Fitzpatrick 2014 : 13). The basic building blocks of this sound cue are “Everybody’s Talkin’” (the song which serves as Midnight Cowboy’s principal musical motif) and the sound of a car braking suddenly. The first mix of this sound cue used in rehearsal faded in the song approximately one second before the beginning of the first verse, and it played for another eleven seconds before a three-second fade-out under one track of car brakes and two staggered tracks of car collisions abstracted from different recordings. It was determined that the collision sounds were likely to lead the audience to read them as an accident involving pedestrians (the actors onstage) rather than a near miss (as was intended in the script), so a second mix eliminated the collisions and added an extra track of the same brake skid, staggered slightly to enhance its volume and forcefulness. The altered cue, however, proved to have a slower pace than was necessary to keep the onstage action moving at the breakneck speed demanded by the script. To speed up the transition, the third and final mix of the cue shortened the music lead-in considerably – from approximately twelve to five seconds. The skid began around the three-second mark, and covered a hard cut rather than a fade-out of the music.

Language and sound design, then, both rely on selection and timing for their communicative effects : the listener’s interpretation of the meaning of words and sounds depends on the sequence in which specifically chosen words or sounds are presented. I would further argue that not only is sound design similar to language in this superficial fashion, but it also possesses deeper structures which allow it to be analyzed in some of the same ways as language. An analysis of how a variety of the structural elements of language come into play in creating semiotic systems for sound design is perhaps most fruitfully begun by studying theatrical praxes which present a minimum of semiotic systems to complement and compete with one another. The world of touring fringe theatre offers one such starting point. In shows created for the intimate spaces common to fringe theatre, with small casts and a bare minimum of costuming, set, props, and lighting effects, sound design is not merely one of several onstage elements which work in concert to produce meaning, but often carries the bulk of the burden for giving an audience the sense of what a performance text is saying.

What follows is a preliminary survey of the parallels that can be drawn between sound design and spoken language in a performative setting. It begins with a look at the difficulties that can arise when attempting to isolate sound from other semiotic elements within the context of a traditional theatrical aesthetic, and offers a supplementary interpretive paradigm drawn from a performance mode outside the mainstream of contemporary theatrical praxis – radio theatre. As an elaboration of Peircean semiosis, which forms the foundation of one major stream of radio theatre analysis, I propose that Roman Jakobson’s models of the structure of phonemes, words, sentences, and utterances can shed light on how sound design offers not only symbolic analogues to real-world objects and occurrences, but also identifiable counterparts and counterpoints to dialogue and other language-based signifiers in a performance text. By way of illustrating these ideas, I will draw from three touring fringe shows for which I have designed the sound – Best Picture, Cathedral City, and How To Be A Better Audience – as well as two shows that I witnessed during their 2015 North American fringe tours – Zack Zultana, Space Gigolo and Inescapable.

1. The Problem : How to Separate Sound Design from its Surroundings

Because the overall idea behind this discussion is more in the nature of a proposition than a full-fledged theory, it does call to mind Marco De Marinis’ (1993 : 1) caveat against “arbitrary or metaphorical extensions of the ‘linguistic model’ to the field of nonverbal semiotics”. Even so, there are key features of sound design which call for it to be understood as having a more systematic origin than Erika Fischer-Lichte’s (1992 : 117) reading of it as an entity “not produced on the basis of an intention”. All sound, whether designed or not, shares with the spoken word three features which have been deemed essential for the comprehension of linguistic structures : “intensity, time, and frequency” (Waugh 1976 : 58; emphasis hers). Moreover, a shared reliance on the dimension of time for the creation of meaning establishes an affinity between sound design and the performative elements of mise en scène – not just dialogue, but also gesture and movement. Since the preponderance of designed sound is pre-recorded and edited to fit a specific performative context, it has one more important thing in common with what has been described as the “total speech-sound” of linguistic utterances : the ability to be designated as “an artifact” constructed out of raw materials, for a purely communicative purpose, and with no specific signifying content of their own (Waugh 2002 : 265).

This is not to say that it is always easy to deconstruct such an artifact to uncover the artifice behind it. One of the chief troubles which bedevil analyses of sound design as a signifying system in its own right concerns the question of taxonomic boundaries. Because sound is ubiquitous, omnidirectional, and to some extent unavoidable, each theatrical production involves a negotiation as to which of the sounds an audience is likely to hear during the course of a performance should be read as the direct result of specific design choices. These negotiations rarely take the form of formal discussions, and are governed by heuristics, emotion and instinct, the result being that the categorization of sounds in a performance setting tends to involve a surprising degree of fluidity and amorphousness. A recorded voice, for example, can be considered part of both a show’s dialogue and its sound design. This dual status can set up uncertainty and ambiguity in an audience’s mind about whether the voice is intended to be a recording, functioning in its literal sense, or a mediated simulation of a live voice. The opening of my 2007 solo fringe show, How To Be A Better Audience, toyed with this uncertainty by having a stentorian announcer intone an introduction over the opening passage of “The Great Gate of Kiev” from Mussorgsky’s orchestral suite Pictures at an Exhibition. As the introduction unfolds, the announcer’s relative consistency of volume and lack of mispronunciations or other vocal anomalies invite the notion that his voice is coming from the same source as the music, rather than from a microphone concealed backstage or behind onstage masking. Moments after the announcer introduces my character by name, a pregnant pause followed by repetitions of the name briefly reopens the question of whether a live announcer is present after all, frantically trying to cue an actor who has missed his entrance (see Cousins 2007 : 1).

My entrance line in How To Be A Better Audience, in which I say “It’s only a recording, anyway. You can turn that off anytime you like” (Cousins 2007 : 1), is not only a comedic reversal of the announcer’s role, but also speaks to the mixed feelings that sound designers can have about drawing attention to their work. Veteran British designer Ross Brown (2010 : 340) is of the opinion that “the conventional wisdom is that sound design should not be noticed”. In the world created by this view, sound, if it has to be heard at all, is heard as a distraction; it is nothing more than so much noise. Noise, however, as Brown remarks, is an integral part not just of the world of the theatre but of the world outside it :

The maxim is true that to notice the sound design is to be distracted, but at this cultural moment, perhaps, it is appropriate to subject audiences to distracting circumstances. Since sound is a phenomenon of surrounding atmosphere, one might think of noise as a meteorological condition. As the ether within which the communications of daily life are transacted becomes more opaque, as the weather of noise and competing sign-systems becomes more energised, we cope with and derive meaning from distraction and obfuscation.

2010 : 341, emphasis his

The insidious nature of sound makes for a good deal of this obfuscation, onstage as well as off. No matter how much it may represent a distillation of the outside world, the world of the theatre constantly presents sounds which cannot be ignored, but whose presence begs a full categorization of their content. A practical prop can be constructed to make a specific sound, but this sound may not necessarily be part of a sound design – frequently, design decisions concerning ‘practicals’ are made by a production’s props department, with very little if any input from the sound designer. Actors may be cast based on their voices, and rehearsed in vocal characterizations which enhance their embodiment of characters, but such choices are typically guided by the director. Music may fall entirely, partially, or not at all under the aegis of the sound designer. A designer may double as a composer and/or selector of a production’s music, may have some hand in these decisions, or may concentrate exclusively on effets sonores while a separate member of the production team takes charge of the music. Costume elements (especially shoes) make noises of their own, but when should these noises be taken as the outcome of design choices with respect to their sonic impact?

The blurriness of the boundary between ambient sound and designed sound has meant that, from the standpoint of analysis, sound has occupied whatever territory best fits the overall paradigmatic focus of any given theory of theatre. In the initial version of his questionnaire of elements that an audience should consider when analyzing a performance, Patrice Pavis asked his respondents to think about the “(f)unction of music and sound effects” (qtd. in Aston and Savona 1991 : 110-111) as belonging to the same conceptual category. The crossovers between music and sound as both abstract concepts and areas of theatrical praxis most certainly have a longer and more well-documented tradition than sound design as a Ding an sich : at least as far as the English-speaking world is concerned, 1959 appears to be the first time a sound designer was credited under that title in a mainstream theatrical production (see Brown 2010 : 343). An older view of sound as a product of technical craftsmanship rather than artistic intention has informed many a discussion of its relative importance in theatre semiotics. Emphasizing the replicative and literal, rather than the poetic and rhetorical, connotations of designed sound, Fischer-Lichte (1992 : 117) goes so far as to state that “(t)he primary sign function of sound in the theater consists of signifying a sound”.

Such literal readings of content, which have been described by De Marinis (1993 : 13) as typical of “an institutional theater which continues to conceive of and produce theatrical performance unreflectively (and commercially) simply as the illustration of a dramatic text”, can certainly dominate a production’s approach to sound just as they can dominate its approach to dialogue. I suggest, however, that since theatrical texts tend to comprise far less sound than dialogue, a sound designer must always be aware of the potential for alternative and multiple readings supplied by audience members. Eli Rozik’s (2014 : 163) characterization of a fictional onstage world as “a set of clues for the spectator to activate [...] competences and mechanisms” drawn from individual and shared cultural experiences holds just as true for sound design as for any other part of a performance text.

To provide one example of this, a gunshot almost always carries with it a primary meaning of a firearm being discharged with deadly intent. Performative context and the audience’s store of experiences, however, can layer other strata of meaning onto this principal connotation. Best Picture uses a gunshot to make a transition into a scene which lampoons Richard Attenborough’s Gandhi. The two most literal connotations of this sound – its status as a replication of the gunshot which occurs near the beginning of Attenborough’s film and its status in quo as a gunshot – only become apparent after a line of dialogue which follows the shot : “Gandhi, are you okay?” (Fitzpatrick 2014 : 46). Up until then, the unseen gun appears to have been fired as a means of ridding the stage of a trio of dancers who have been imitating the slow soft-shoe shuffle seen during the various versions of the opening title sequence of the long-running TV sitcom The Cosby Show. Taken in its full context, this short simple burst of sound can be seen as fulfilling at least four of the six functions which Roman Jakobson (1980 : 81) identifies in his schema of verbal communication. In a referential sense, the gunshot evokes not only its generic meaning, but also its significance as both the end point of Gandhi’s life and the beginning of Attenborough’s film biography. The shot’s metalingual function – the one concerned with codes and their operation – consists principally in marking the temporal, spatial, and thematic boundaries of the two scenes it links. Its poetic and emotive functions are intertwined : one potential reading of the message and the affective background behind the shot is that an unseen authority has deemed that the dancing has gone on longer than is necessary for its informational or entertainment value, and views harsh measures as the surest way of ending it.

2. The Semiotics of Radio as a Preliminary Tool in Analyzing Sound Design

One thought that emerges from a survey of the varying ways that sound has been dealt with in theatre semiotics is that sound in a performative context often defies straightforward analysis, due to its ever-shifting position in a matrix of theatrical signifiers. Phrased in the language of the audio technician, the signal-to-noise ratio is unfavourable for clear reception of the message. The ‘noise’ in this metaphor is the predominance of visual signifiers in Western culture – and the culture of Western performing arts – which makes a delicate balancing act out of what Ross Brown (2010 : 342) describes as the “continuous cultural dialectic between focused perception and omnidirectional awareness” necessary for making sense out of what we hear.

This dialectic, and its function in bridging the gap between a production team’s intended meanings and the actual meanings an audience draws from these intentions, are more easily identified and studied if designed sound can be isolated from other production elements. As a performative medium which employs sound alone to generate meaning, radio drama offers a useful case study for a reimagined semiotics of theatre sound design. Of the semiotic theories which have been advanced by scholars of radio drama, the most immediately applicable to live theatre comes from Andrew Crisell (1994 : 42), who has chosen to “borrow some rudimentary distinctions from what is in fact a highly sophisticated classification of signs devised by the American philosopher, C.S. Peirce” as the foundation of an audience reception theory. Drawing on Peirce’s trichotomy of icons, indexes, and symbols, Crisell (1994 : 44) notes that “sounds, whether in the world or on the radio, are generally indexical”, and later elaborates on this basic concept by saying that the preponderance of the manufactured sound effects which define both the radio drama producer’s and the theatrical sound designer’s art “are best described as iconic indexes”. Though the “iconic index” is not a class of sign for Peirce, he did recognize that an index often involves an icon. Crisell goes on to state that “They [the sounds] might also be described as ‘non-literal signifiers’…but in radio such signifiers must approximate rather more closely to that which they signify than signifiers in the visual media” (Crisell 1994 : 47).

3. From Peirce to Jakobson : A Framework for Analysis of Designed Sound as Language

My own experience designing sound for both live theatre and radio has led me to believe that neither discipline precludes a combination of indexes and symbols, but that radio soundscapes and effects are a bit more likely than their theatrical counterparts to feature the straightforward and “direct dual relation of the sign to the object independent of the mind using the sign” which Peirce (1992 : 226) uses to define indexes. Even though the analytical standpoint I advocate deviates somewhat from Crisell’s application of Peirce’s theories, a Peircean view on signs goes a long way towards preparing the ground for the study of designed sound not only as an independently functioning semiotic system, but as a system approximating verbal language in many of its particulars. Jakobson notes that :

in Peirce’s view it was wrong both to confine semiotic work to language and, on the other hand, to exclude language from this work. His program was to study the particular features of language in comparison with the specifics of other sign systems and to define the common features that characterize signs in general.

1980 : 35

Taking this general notion, and adding to it Jakobson’s (1980 : 87) later description of Peirce’s “view of meaning as translatability of a sign into a network of other signs”, the basis for a study of the parallels and analogues between language and other potential forms of communication begins to emerge. Although the specifics of their processes may differ, sound designers play up one element in the final mix of a sound cue while playing down another, select a certain version of a given sound for the connotative aspects it possesses, while eliminating other sounds which may be part of a real-world experience on the grounds that they supply too little relevant information, or too much that is irrelevant, and arrange these elements in a specific order. In doing so, they shape their finished product into something that can be seen to function in the same way as Jakobson sees language to function, as “a relatively autonomous, self-contained, goal-directed, dynamic, creative semiotic entity in which there is dialectic tension between selection on the one hand and combination on the other” (Waugh 1976 : 37). Furthermore, in preparing sound for the benefit of an audience, the designer becomes, as Jakobson (1980 : 81) would put it, “an ADDRESSER [who] sends a message to [an] ADDRESSEE”, and has therefore established two of the essential “constitutive factors in any speech event”.

Jakobson finishes his thumbnail sketch of the speech event as follows :

To be operative the message requires a CONTEXT referred to […] seizable by the addressee, and either verbal or capable of being verbalized; a CODE fully, or at least partially, common to the addresser and addressee (or in other words, to the encoder and decoder of the message); and, finally, a CONTACT, a physical channel and psychological connection between the addresser and the addressee, enabling both of them to enter and stay in communication.

1980 : 81

The very act of performing for an audience implies contact; a sound designer’s contact with an audience is generally at one remove, but retains its physical channel through the theatre’s sound system. A psychological connection underlies the designer’s entire editing and mixing process : he or she will constantly use his or her own ears, and the ears of others, as stand-ins for those of the audience, to determine how clearly a sound cue is conveying its intended meaning. Although no two of them might use the same words to describe it, each member of the audience has the means to verbalize a description or an approximation of a sound cue, thus creating whatever context is not already being supplied by the more immediate portions of the live performance.

4. Analogues to Phonemes in Sound Design

If designed sound can be seen to fulfill Jakobson’s basic conditions for speech events, it also demonstrates similarities with the smaller component parts of Jakobson’s overall mechanism of language. Seen from a designer’s perspective, short sounds can appear as adhering to the Jakobsonian definition of the phoneme – the basic building block of speech – as “a complex, simultaneous construct of a set of elementary concurrent units” (Jakobson and Waugh 2002 : 29) which can be combined with other such constructs into larger units of meaning. To take the example which figured earlier in this discussion, a gunshot, while distinctive, does not constitute an iconic or an indexical representation in and of itself. Used as it was in the transition into the Gandhi parody from Best Picture, it was meant to be taken as the sonic product of an act of aggression. The very same sound clip could be used with quite a different intention – if, for example, it were followed by onstage actors running in straight lines and clearly-defined lanes, it would serve just as clearly as the starter’s pistol for a race.

Versatile as they are, neither phonemes nor sound clips are self-sufficient or self-contained vessels of signification. Jakobson and Waugh (2002 : 9) stress the importance of additional semantic content in clarifying meaning when a single phonemic grouping carries multiple unrelated connotations : “(b)y differing in sound, shape and meaning, the surrounding words as a rule serve for the disambiguation of homonyms and thus for their correct semantic interpretation by the listener”. This same principle also appears to govern the sound designer’s craft. Anecdotal evidence has long suggested that, in the absence of other clear contextual cues, a gunshot can be taken for the sound of a firecracker or a car backfiring. Indeed, this particular sonic event requires enough disambiguation that the sound most commonly associated with it is composed of multiple phonemic elements. Since the firing of a pistol generally produces a comparatively dull-sounding ‘pop’, it has traditionally been augmented (in technical parlance, ‘sweetened’) by adding echo or reverberation to its decay. For this reason, the onset of the gunshot and its decay are best interpreted as two separate bundles of sound, each one analogous to a phoneme, and both working together to differentiate the potential meaning of their combined sound from other meanings which may be derived from the sound’s onset alone.

Another similarity between phonemes and the smallest units of designed sound is the capacity for creating meaning in combination with sonic and non-sonic contextual elements. As Jakobson and Waugh (2002 : 8) point out, “the verbal context of the words at issue, and the situation which surrounds the given utterance prompt the hearer’s apprehension” of the potential meaning of individual phonemes and the potential for grouping multiple phonemes into individual semantic entities. The onstage situation which followed the gunshot in Best Picture, with one actor slumped to the stage floor and another actor crouched over him, was sufficient to create the appropriate context for a reading of the shot as the assassination of Gandhi. A hypothetical alternative staging of this same section of text may have required additional sound to supply context in the absence of visual cues. If, for example, the stage had gone dark just before the gunshot, and remained dark for some considerable time afterwards, it might have been necessary to pair the gunshot with the sound of a body falling to the ground to make it clearer that the shot had hit its intended target. A recording of a scream or general commotion following the shot may prove equally useful in such circumstances as a way of giving the hearer a sense of the broader context of the unseen shot.

Given the proper contextual cues, a scream or the sound of a body falling may be enough to indicate that a fatal shooting has taken place. Just as a speaker’s active vocabulary implies a range of options for expressing a single thought, the repertoire of sounds available to a designer grants him or her the ability to phrase the same conceptual entity in a variety of different ways. In principle, the sheer variety of sounds within the range of human hearing means that the potential for creating the sonic equivalent of phonemes, and from them, a working vocabulary, is much greater than in spoken language, where each individual linguistic code “has already established all the possibilities which may be utilized” (Jakobson and Halle 1971 : 74). The vastness of this sonic repertoire does not automatically render it impossible to manage; indeed, the very existence of onomatopoeia implies that some degree of rapport exists between the sounds our voices make and the sounds the rest of the world makes, on the level of both phonemes and basic vocabulary.

As building blocks of meaning, then, phonemes and non-vocalized sounds are both flexible and tricky. The key to managing this tricky flexibility lies, as Jakobson suggests, in a mastery of the role of sequencing : “it is possible from a part of the sequence to predict with greater or lesser accuracy the succeeding features, to reconstruct the preceding ones, and finally to infer from some features in a bundle the other concurrent features” (Jakobson and Halle 1971 : 16). The effect of phonemic units on our understanding of words, and therefore of larger units of meaning, depends on when they appear relative to one another.

The difference made by changing the position of even the shortest and seemingly most insignificant sound in a sequence comes into sharp relief when the sound in question turns out to be the product, not of a single event, but of a composition of sonic elements. In a Jakobsonian view, this is one of the key elements of language : “A speech message carries information in two dimensions. On the one hand, distinctive features are superposed on each other, i.e., act concurrently (lumped into phonemes), and, on the other, they succeed each other in a time sequence” (Jakobson, Fant and Halle 1969 : 3). This is just as true for designed sound as it is for the spoken word. Even a simple and unambiguous sound source like a door reveals a superposition and succession of discrete features which are distinctive in nature. Close analysis of a recording of a door being opened reveals much the same information as carefully listening to a door being opened slowly. What emerges is a superposition of basic sonic elements, arranged in an overlapping sequence : the sound of a doorknob turning or a door handle’s lever mechanism being engaged is followed by the clicking of the door latch, and then the sound of friction (often emphasized for its distinctive creaking quality) coming to bear on the hinges of the door as it rotates open. The proposition that the listener intuitively breaks this sound down into its constituent parts before piecing them together again into a gestalt gains further credence when one considers that swapping the relative positions of the latch and the hinges in this sequence is all that is required to change the sound of a door opening to the sound of a door closing. This is also a conclusion that a sound designer may arrive at through experience with the expedient of having to construct the sound of a single door out of parts of recordings of several doors, when no single recording offers the precise combination of sonic elements necessary to convey an intended meaning.
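
The point about sequencing can be made concrete with a minimal sketch. It assumes, purely for illustration, that the door cue reduces to three labelled events; the labels ("knob", "latch", "hinge") and the two helper functions are hypothetical shorthand of my own rather than terms drawn from Jakobson or from any actual recording, and the toy "listener" stands in very crudely for the far richer process of audience interpretation.

```python
# Illustrative sketch only: event labels and helpers are hypothetical
# shorthand, not taken from any recording or from Jakobson's terminology.

# Order of events for a door opening, as described in the text.
DOOR_OPENING = ["knob", "latch", "hinge"]


def swap_events(sequence, first, second):
    """Return a copy of `sequence` with the positions of two events exchanged."""
    swapped = list(sequence)
    i, j = swapped.index(first), swapped.index(second)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


def read_door(sequence):
    """Toy 'listener': read the cue from the relative order of latch and hinge."""
    if sequence.index("latch") < sequence.index("hinge"):
        return "door opening"
    return "door closing"


# Swapping latch and hinge is the only change made to the sequence.
door_closing = swap_events(DOOR_OPENING, "latch", "hinge")

print(read_door(DOOR_OPENING))   # door opening
print(read_door(door_closing))   # door closing
```

The same elements, left otherwise untouched, yield a different reading once their order changes, which is the sense in which succession as well as superposition carries the meaning of the cue.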

5. Translatability between Sounds and Language as a Determinant of the Design Process

The question that proceeds from this idea is whether groupings of sounds can also be apprehended as larger units which parallel the semantic structures of language. As far as sound that has been designed for a performance setting is concerned, I would argue that this appears to be the case. Moreover, the parallels between sound design and conventional language structures begin at the inception of the design process, since sound design generally involves the act of translating verbal language into sound, using statements drawn from the text of origin for a performance. When designing sound for a show, the designer relies on a combination of statements contained in the source text and statements made by the director, writer, cast, and other members of the production team during the rehearsal process. In fringe theatre specifically, even a solo performer who designs the sound for a show he or she has written works from language-based instructions, in the form of draft scripts and production notes. These writings act as a translation guide to help transform rough mental images of the sounds intended for use in the show into a playable finished product.

Touring fringe theatre has recently added a new wrinkle to this translation process. The North American fringe festival circuit brings together theatre artists from around the world; these artists stay in touch with one another year-round via e-mail and social media. Increasingly, artistic collaborations among fringe theatre artists traverse great distances and cross international borders. To take one example, Best Picture brought together a writer-performer based in Brooklyn, New York with four Canadian collaborators – two other performers, based in Winnipeg and Vancouver respectively, a director based in London, Ontario, and a sound designer (the present author) based in Ottawa. From the beginning of the preparatory process for Best Picture until the weekend of its final rehearsals, my communication with the rest of the production team consisted entirely of e-mails with draft versions of sound cues for the show attached.

Because these e-mails comprise a complete archive of the discussion surrounding a creative process, they offer useful insight into the dialogue, helping to delineate the finished version of a sound design with a thoroughness not usually offered by traditional production notes. One interesting aspect of these conversations is the degree to which they involve a comparison of personal vocabularies – not just those involving spoken and written language, but also vocabularies of sounds and cultural references. For instance, the scene in Best Picture preceding the gunshot which introduced the Gandhi parody was initially referred to in the script as the show’s cast “doing Cosby dances to The Cosby Show theme music” (Fitzpatrick 2014 : 46) as part of a parody of the 2011 film The Artist. Director Jeff Culbert (“Re : Best Picture” 6 July 2014 11:07 AM e-mail) felt that a musical reference more appropriate to The Artist’s broader historical setting and soundtrack, yet still evocative of the title sequence of The Cosby Show, was called for; his instincts told him that “about 5 seconds of big band dance music – something like ‘Sing Sing Sing’ would be good”. In response to this suggestion, I sent Culbert an e-mail to which I attached an MP3 version of “Sing, Sing, Sing” and MP3s of the Count Basie standards “Tickle-Toe” and “Jumpin’ at the Woodside”. My feeling was that the second of the two Basie songs might create an extra level of comedic recognition by inviting comparisons between the style of the “Cosby dances” and the inexplicably silly “Gene, Gene, The Dancing Machine” segments from the 1970s cult TV program The Gong Show (Cousins “Re : Best Picture” 6 July 2014 12:51 PM e-mail). Culbert’s reply (“Re : Best Picture” 6 July 2014 2:29 PM e-mail) was in line with my sentiments, and finalized the decision on the music cue : “I’ll go with the first Jumpin’ at the Woodside I think – it has a nice lead-in – great for multiple-Cosby mugging – and I’m a big (B)asie fan anyhow”.

This mediated process of verification and re-verification of the communicative content of a sound cue anticipated the processes which were expected to occur in the minds of the audience as they heard the sound during the performance, drawing on what Rozik (2014 : 169) describes as the production team’s “learned intuition of the synchronic real spectator” and the realistic expectations that can be drawn from this educated guess. If a production team’s intuition results in an accurate thumbnail sketch of its audience, any member of that audience should be able to reconstitute a reliable, coherently phrased description of the sounds constructed for the performance they attended. The conversation between the sound designer and the rest of the production team, then, centres not only on what Jakobson (1980 : 86) describes as the need of all concerned “to check up whether they use the same code”, but also on the need to come to a workable consensus on the sorts of codes an audience can be expected to have at its disposal. Even when it takes place via e-mail, this comparison of codes is a necessary first step towards an activity that De Marinis (1993 : 99) identifies as an essential feature of all facets of the mise en scène : “distinguishing between the codes of the sender (meaning the codes of textual production) and the codes of the addressee (meaning the codes of textual reception and interpretation)”.

6. Linguistic Discourses as an Interpretive Paradigm for Sound Design

So far, this discussion has centred on the ways in which shorter bundles of recorded sound can contribute to the discourse of a performance text by calling up lightning-fast free-associations, without necessarily constituting discourses in their own right. If one widens the focus of analysis from what has been termed the feature level, “concerned with simple and complex units” of sound, to the semantic level “involving both simple and complex meaningful units from the morpheme to the utterance and discourse” (Jakobson and Halle 1971 : 14), the sound designer’s craft narrows the linguistic gap between speech sounds and non-speech sounds to a considerable degree. A sound cue can present enough information to indicate subject and predicate, and thus can be interpreted and parsed as a full sentence; as with spoken language, “grammatical units and their boundaries exist for the speaker and listener even if they are not expressed” (Jakobson and Waugh 2002 : 43). Even a short, simple sound cue such as the gunshot from Best Picture can supply sufficient semantic content to admit multiple sentence-length encapsulations. An audience member drawing additional cues from the ongoing action onstage could, for example, successively translate the sound into language as “somebody shot the people doing the ‘Cosby dance’”, “somebody shot some other unknown person”, and “somebody shot Gandhi”.

More complex sonic compositions lend themselves even more easily than single effets sonores to verbal recapitulations which adopt formal grammatical patterns. The composition need not be immediately clear in its meaning, either : indeed, the greater the ambiguity of the sonic content, the greater appears to be the listener’s need to impose logic and structure on the content’s potential meaning. In How To Be A Better Audience, a fringe theatre mock-seminar based on receptor theory, a bogus professor (played by the present author) makes an attempt to guide the unwitting ‘students’ in the audience towards the kind of active listening which helps to ensure that, in the words of De Marinis, “the spectator’s reception of the performance text is an interpretive activity much greater than a simple act of decoding” (1993 : 99, emphasis by the author), by playing a deliberately disorienting sequence of sounds. The sequence lasts seven seconds, and consists of the grunting of a gorilla laid over the sound of a wooden door breaking while several piles of metal junk crash and clatter earthward, capped off by an elephant trumpeting and the scream of (presumably) another gorilla. The audience is then asked to describe in their own words what they have just heard. The responses have been widely varied and imaginative, ranging from single words to full-sentence encapsulations. The subsequent ‘correct’ answer given – “the sound of a lowland gorilla falling down the stairs to a church basement while carrying a tray full of salad and crocheted tea cosies for a Women’s Auxiliary potluck supper and bazaar, and colliding with a male Indian elephant in rut halfway down” (Cousins 2007 : 4) – is no less fanciful than any the audience has just supplied. Both the ‘correct’ and ‘incorrect’ verbal approximations of this sound cue illustrate that the spoken word and designed sound are alike in the sense that “in the combination of sentences into utterances, the action of compulsory syntactical rules ceases, and the freedom of any individual speaker to create novel contexts increases substantially” (Jakobson and Halle 1971 : 74).

7. Sound Design and Analogues to Copula Verbs : Cathedral City

Novel or not, the contexts which help to create meaning for sounds tend to accompany or evoke actions; in a sense, every sound implies action, insofar as something must move through space to produce the vibrations that reach our ears. Nonetheless, sounds can also be arranged and designed to give the impression of ongoing states of being which are best translated into word form using sentences whose main predicate is an adjective. “Sound sentences” like this can be a little more difficult to parse on the fly than their action-based counterparts. The audience's ability to apperceive an extended sound cue as a single phenomenon, rather than a series of separate events, is quite often helped along by foregrounded onstage action on which the audience focusses the bulk of its attention. The timing of the sonic backdrop and the foregrounded action goes a long way in helping to establish and clarify their functions and their relative importance to the narrative. A sonic backdrop which runs too long before anything happens onstage, or which is left running too long behind onstage action, enters into competition with the enacted portion of the performance text as a focus for the audience’s attention.

Understanding this conventional use of background sound allows a production team to tinker with it, exploiting the relative timing of sound and onstage action so that the two can alternate as the driving force of a section of narrative. During Kurt Fitzpatrick’s stream-of-consciousness-based personal memoir Cathedral City, the narrative shifts from Kurt’s verbal recapitulation of the events leading to his real-life back surgery to a sonic re-creation of the surgery itself from the patient’s point of view. As it does so, the burden of indicating time, place, and action shifts from the performer to the sound designer. While Kurt goes from being an onstage narrator to a silent, motionless figure, slowly crumpling from his standing position to a supine posture flat on the floor, the sound steps in to fill the breach, beginning with the agonized moan of a patient in severe physical distress. This attention-getting device helps to shift the audience’s focus to the sound, as well as cueing them to the idea that the sound is meant both to describe a specific action and evoke its specific location. The perspective of the sonic backdrop evoking the location is somewhat forced, in the sense that the sounds of a hospital corridor, with the clattering of rolling gurneys and PA announcements paging doctors, would not be heard in an operating room. Under the circumstances, though, a certain amount of impressionistic license is allowable, since Kurt is describing the experience of lapsing out of consciousness while under anaesthetic. The indication of the activity in the operating room itself is another piece of impressionistic aural shorthand : Kurt’s back operation is sketched in with the constant beeping of a heart monitor and the wheezy sigh of a respirator (see Fitzpatrick 2013 : 8 et passim).

One’s heartbeat and respiration alone are not usually signs of any great or noteworthy activity; however, in the context in which they are introduced, they give notice that something significant is happening. The initial linkage of sound and activity is strong enough that, whenever the surgery is subsequently referenced during the course of Cathedral City, only the duet of the heart monitor and respirator plays through the house speakers. Whether a statement is made in words or in sound, the elision of extraneous information is very common and, as Jakobson reminds us, is no impediment to the full comprehension of an intended meaning :

Some of the signals omitted by the speaker as well as by the hearer are restored and quickly identified by the latter thanks to the context which he succeeds in picking up…Familiar words or even larger wholes are easily recognized in spite of emissive or perceptual ellipsis and hence enable the recipient of the message to apprehend immediately and directly many higher units without the need for prior attention to their components.

Jakobson and Waugh 2002 : 11

8. Sound Design as Part of the Macro-Speech Act of the Performance Text : Examples from Zack Zultana, Space Gigolo and Inescapable

No matter how many elisions they contain, smaller statements are commonly woven together in a performance text to form an integrated thread of narrative and/or argument. Applying a wider analytical lens to the role of individual rhetorical devices, De Marinis (1993 : 152) has postulated that we can “conceive of a performance text as a macro-speech act (a macro-sign act) in which serious micro-acts are placed in a given hierarchy”. The wide view of this interpretive lens can actually aid in focussing on sound design as a communicative signifying system in theatre : it can sometimes be easier to pick up on the role of designed sound in the macro-speech act of a performance text when the sound stands out by virtue of its infrequency.

The choice to use sound sparingly is often made in small-venue touring fringe shows, partly as a concession to the paucity of time for technical rehearsal, and partly to take advantage of the dramatic effect made by such interpolations in a performative discourse dominated by voice, movement, and gesture. A pair of shows which debuted during the 2015 North American fringe festival season provide good illustrations of this point – Jeff Leard’s Zack Zultana, Space Gigolo and Ribbit RePublic’s production of Martin Dockery’s Inescapable.

To begin with the first of the two shows just mentioned, Zack Zultana is a one-person onstage enactment of a familiar film genre : the sci-fi adventure movie. As well as acting out the multiple roles of this movie’s cast of characters, its narrator, director, and cinematographer, Leard also serves as the composer of the film’s score, using his voice to create the kind of emotive musical ‘stings’, ‘stabs’ and background which are standard elements of the filmgoer’s auditory lexicon. Leard’s decision to eschew technology to underscore Zack Zultana makes the one occasion during the main body of the narrative when he chooses to let recorded music help him tell his story all the more striking. The setting is a bar at a space station : Zack is in the early, flirtatious stages of a sexual liaison with a woman new to the station when Elton John’s Rocket Man (by the time the story takes place, not just an ‘oldies’ pop tune, but one of a few centuries’ vintage) starts to play over the bar’s loudspeakers, becoming a sonic punchline for a sequence of dialogue about awkward moments ruining a mood.

Opting for a recording of Rocket Man rather than a live a cappella version as a comedic payoff acknowledges the complexity of the array of sounds that function as a single unit of meaning in this particular pop tune. The instrumental arrangement backing Elton John’s vocals features a distinctive swell of strings at the beginning of the chorus; this unmistakable sound not only gives Rocket Man a great deal of its ‘catchiness’ but has also come to serve as an icon for the song in its entirety. At the performance of Zack Zultana that I attended, ripples of laughter began at the same instant as Rocket Man’s chorus, briefly drowning out the strings and creating a secondary delayed reaction to the joke from other members of the audience.

Leard’s use of Rocket Man is also a commentary on the structure that underlies his own work. Creating a sudden break from unmediated polyvalent solo performance, it reminds the audience of the genre expectations surrounding such performance, by introducing a distinctive theatrical convention unique to the ongoing text, one whose function and purpose is to “act on particular conventions and sometimes even on general codes, transgressing and subverting them to various degrees” (De Marinis 1993 : 115).

Even when it is not part of the performance proper, sound design can have a strong influence on the audience’s understanding of the meaning of a piece of theatre. The role of preshow music in conditioning audience reception is often overlooked : production teams and audiences alike tend to treat it as little more than a necessary evil, useful mostly for papering over gaps in conversations among audience members. Ribbit RePublic’s touring production of Martin Dockery’s Inescapable takes advantage of this conventional thinking, and uses it to define the structure and themes of the performance as a whole. On entering the performance space, the audience is greeted by an insouciant rendition of Let It Snow by crooner Dean Martin. Through both the content of the song and the style of its performance, the audience for Inescapable receives valuable clues concerning setting (the Christmas holiday season, probably in the present; if not, likely no earlier than the late 1950s), tone (light and comedic, at least initially) and subject matter (clandestine sexual encounters of the kind hinted at in the song’s lyrics may feature in either the text or the subtext). As they settle into their seats to await the performance, they receive another more vital piece of information, as the same recording of Let It Snow plays over and over again.

Playing one song on a continuous loop presages the action of Inescapable, during which two old friends are forced to endlessly replay the same tense scene due to the effects of a mysterious object, eventually revealed to be a homemade time machine. Even so, there are potential alternate readings of Inescapable’s preshow music, readings which the production team has taken into account. Repeating the same song could be interpreted as the result of either a technical fault or sloppiness on the part of the theatre’s sound operator. There are a number of ways of eliminating this idea as a valid interpretation : two methods which are familiar enough to have attained the status of sonic clichés are crossfading between the music tracks and mixing in the sound of a CD or an LP skipping in between them. The element of Inescapable’s preshow music mix which leads the audience towards an understanding of the repetition as a choice rather than an error is not a sound, however, but silence. The silence between repetitions of Let It Snow is a pause rather than a full stop; it’s the length that a modern audience’s ears have been conditioned to read as the space between tracks on a CD or cuts of piped-in music in a public place. Almost before the conscious mind can process its meaning, this gap in sound feels like something programmed, and not merely a chance occurrence. Further repetitions of the gap reinforce this feeling, and make the function of the silence within the statement produced by the repetition of the music all the more clear.

9. The Way Forward…? Linguistics in Sound Design as the Basis of a New Theatrical Aesthetic

When recorded sound is used infrequently, or only once, during the course of an entire theatrical event, its signification in toto is not likely to become apparent until the end of the performance. Viewed as part of the macro-speech act of the performance, the sound is one of an array of signifiers whose simultaneity or succession helps to generate both structure and meaning. Whether it stands alone, or comprises part of a larger design, each individual sound cue also possesses elements which can be recognized as identifiable constituents of speech acts. As with the spoken word, a sound cue involves “the CONCURRENCE of simultaneous entities and the CONCATENATION of successive entities” (Jakobson and Halle 1971 : 73), selected so that their arrangement across time serves as a usable guide to interpretation. Just as a composition of sounds reveals an intricacy of structure congruent with that of language, the smaller entities that make up designed sound possess every bit of the complexity noted in the basic sounds which constitute language. If “the speech sound is a multilayered, hierarchized signal with a variety of components which are invested with a variety of functions” (Waugh 2002 : 260), this is no less true of the individual sonic signals which are combined by a designer to create sound cues.

From a purely theoretical perspective, discussion of the subject could end there, for the time being. A practitioner’s focus, however, must be on finding opportunities for the application of theory. As a site for future conversations between sound designers and audiences, touring fringe theatre provides one such opportunity, through the way that it combines aspects of the laboratory setting and field research. Fringe audiences contain a significant proportion of attendees whose chief exposure to live theatre happens at fringe festivals. Because of this, a fringe artist must always be aware of the greater degree to which this audience’s “‘cultural baggage’, which is the vast set of associations that reflects familiarity with culturally-established contexts such as socio-political reality, history, philosophy, religion, ideology and the arts” (Rozik 2014 : 167) is influenced by non-theatrical performative aesthetics and traditions. The prevalence of the soundtrack as a signifying system in film and television means that a sound designer for a fringe show must account for the possibility that the audience may expect it to be as prevalent – and as meaningful – in a live performance as in a mediatized one.

One way of accomplishing this is to design shows so that both the sound and its point of origin are in the foreground of the performance. To end this discussion on a personal note, the fringe shows which I have written and performed in have strongly foregrounded the sound design, in order to call attention to its status as an integral part of the performance text, one whose specific construction contributes to the text’s overall meaning no less than the specific arrangement of the words in the dialogue. Uncle Fun and Sparky’s Real Live Cartoon Radio Show replicated the conditions of production of radio comedy programs recorded live for broadcast by creating many of its sound effects in full view of the audience; Off My Wavelength presented a jerry-built nomadic pirate radio station whose pre-recorded program content was played through an onstage ‘ghetto blaster’ type CD-and-cassette player; How To Be A Better Audience (later reworked as The Best Audience Ever), in keeping with its putative fictional status as a classroom lecture, had all of its recorded sound and music cued from onstage by a performer.

One thing I have learned from the experience of creating these shows, and assisting in the creation of other shows as a sound designer, is that touring fringe theatre involves a re-examination of one’s aesthetic approach to theatre with each new production. Faced with continual uncertainties concerning the specifics of performance spaces and audience composition, the fringe theatre creator is best served by a careful interrogation of each element of the performance text, to determine how much it contributes to the discourse between text and audience. Discarding extraneous statements from both the performative and scenographic portions of the text places a greater communicative onus on the statements that remain. Even when this means that sound cues are shortened, simplified, or even eliminated, it also means that what reaches an audience’s ears is easier to decipher as a form of communication, and not just an incidental bit of noise or music. By emphasizing the communicative, rather than the merely illustrative, nature of sound, fringe theatre has the potential to demonstrate the viability of a linguistic model for sound design. In doing so, it also demonstrates what Ross Brown (2010 : 341) proposes as the ultimate communicative goal of the designer : “(w)ithin, but also without, the mise-en-scène, theatre sound design shows things about aural culture, about the ways in which we relate to the world around us”.