When considering QUALITY in the output and processes of a game audio department, three key areas usually demand attention. All three put the user experience first and foremost.


The first area is that of quality source materials. In audio, everything is part of a chain (or signal path), and if you put bad-quality sound in, you tend to get bad-quality sound out. A high-quality, organized recording process is critical to maintaining high-quality source assets. A clean, undistorted signal path is essential in capturing the best possible source material, recorded at the highest resolutions (e.g. 24-bit/96 kHz, to allow for sample manipulation) and organized into an easily accessible, catalogued, searchable library. Originally recorded material, gathered especially for the requirements of the project, will often yield the finest results. Consideration should always be given to the context in which something will be heard, i.e. outdoors, indoors, distant, close, wet, dry etc. This applies equally to voices and sound effects. Another essential element is the I/O signal path itself: easily re-configurable mixer hierarchies, parametrization of sound, and controllable, carefully measurable, predictable and trackable output levels.


Second is the consideration of quality audio creation & implementation tools. This can be measured, and made more efficient, by scrutinizing the time it takes a sound designer to iterate on a sound implementation (create a system flow diagram like this one to find out where your inefficiencies are). The time it takes from creating a source sound asset to hearing it in the game must be as short as possible, and the tools must offer the least resistance to the designer through ease of use and stability. Improvement of tool & game-engine UX should be made a focus: all frustrations should be noted, targeted and removed from the software and pipelines. The more a sound designer is able to iterate on a sound, the closer the experience will get to something that is tuned and satisfying for the end listener / player.
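To make this concrete, here is a minimal sketch (in Python) of auditing one edit-to-in-game iteration loop to surface the dominant bottleneck. The stage names and timings are entirely hypothetical examples, not measurements from any real pipeline:

```python
# Log how long each pipeline stage takes for one iteration, then find
# the stage that dominates the loop. All names/timings are invented.

def worst_bottleneck(stage_times):
    """Return the (stage, seconds) pair that dominates iteration time."""
    return max(stage_times.items(), key=lambda kv: kv[1])

# Hypothetical timings (seconds) for one edit -> in-game iteration:
iteration = {
    "edit asset in DAW": 120,
    "export / convert": 30,
    "import into audio tool": 45,
    "rebuild soundbanks": 600,              # the obvious target here
    "launch game & reach test area": 240,
}

stage, seconds = worst_bottleneck(iteration)
total = sum(iteration.values())
print(f"Total iteration: {total}s; bottleneck: {stage} ({seconds}s)")
```

Even a back-of-the-envelope audit like this makes the argument above measurable: halving the slowest stage buys more iterations per day than polishing any other step.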


Finally, the quality of interactions between team members, both within the sound team and across the wider multi-discipline team, is critical to the quality of feature development and execution. If a sound team member can interact in a free-flowing, professional and respectful way with the rest of the team, without constantly having to push through barriers or fight against bureaucracy, the better for the implementation of features at a high quality, and the better for innovation and the development of emergent, opportunistic sound design and cross-discipline influences. Communication must be unclouded, efficient and clear, as must the studio culture that supports the team members and the development process in this regard.

A secondary part of quality collaboration is good audio TONE TARGETS, as these will play a key role in establishing direction and resolving conflicting ideas. Having a central place for high-quality, easy-to-understand documentation, as well as video and audio inspiration, is essential to creating and maintaining a healthy decision-making process inside the team. Key questions for tone target material: ‘How should the player feel?’ What are the key adjectives (e.g. Hard, Digital, Harsh, Distorted, Cold, Dark vs. Warm, Safe, Soft, Protected etc.)? In conflict resolution: stay focussed on which proposal delivers best on the tone target.


With these three areas in place and receiving consistent attention, tuning and tweaking, team audio can begin/continue to fulfil its role as a key collaborator in studio culture and development process and always focus on what is the most important thing: delivering a high quality experience for the player.

note: this is a re-write of some ideas first floated around in this earlier blogpost.

(first published on Gamasutra)

A tweet from sound designer Kelly Pieklo about making the transition from linear to non-linear sound design, and about how sound designers get to determine the parameters that can drive, control and transform the sound elements in a game, got me thinking.

For ease of writing, I’m taking the term ‘parameter’ to refer to all the various elements of game data that can be mapped onto audio – including states, triggers, switches and variables.

In film sound, there isn’t really a concept of parametric data from the other departments that the sound designers can use to drive their sounds. Perhaps the closest analogy would be an OMF of temp picture cuts which the sound editors can import into their sessions to keep up to date with scene and shot changes during post-production. Often the ‘parameters’ are supplied by the director, and are not tangible programmatic variables that alter over the course of the movie, but ideas that need to be interpreted by the sound designer, and implemented through more abstract methods.

Imagining some well-known movie plots with parameters that control their overall sound is a fun proposition. How would we plot the movement towards Kurtz in Apocalypse Now and have it affect the sound, perhaps via a ‘Distance to Kurtz’ parameter? In The Conversation, we could have a parameter for Harry Caul’s ‘paranoia level’. I’m sure these are too high-level to function, but there is something we could do with those ideas once parametrized, and it is a great start for thinking about the main thread of a plot or narrative and breaking it down into more interactive ways that the sound can be affected overall.

I was wondering how feasible it might be to not only have the technical and obvious parameters that we deal with most of the time in game sound, but also a whole new group of more abstracted parameters that reflected things like how the player felt (gathering biometric data from players is something that has been discussed a fair bit recently) or, in terms of more narrative game experiences, how the character ‘feels’.

I recall a feature that we implemented in the open-world Scarface game which kind of did this: Tony Montana’s ‘Rage Meter’. If you built this meter up enough, you unlocked the ability to enter ‘Rage Mode’, at which point Tony was able to go into a blind rage in first person for a limited amount of time. Now, this wasn’t really a fully scalable parameter with many gradual nuances, but more a switch mechanism for a gameplay mode – but the interesting thing is that it was directly mapped onto how the character felt and behaved, and as the emotional state of the main character changed, altering his point of view, so too did the sound, music and dialogue employed during that mode. Sounds were pitched down and filtered, with weapon sounds pushed forward in the mix, dialogue switched to utterly insane swearing (as opposed to the regular conversational and relaxed swearing that denoted ‘normal’ gameplay), and music switched to the atonal Giorgio Moroder synth washes that occurred in the same scenes in the motion picture.
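As a thought experiment, a mode switch of this kind might be sketched as a simple state-to-settings mapping. The mode names, parameter names and values below are illustrative assumptions, not the actual Scarface implementation:

```python
# A game-state switch re-maps pitch, filtering and mix levels across
# sound, music and dialogue. All names and values are made up.

MODES = {
    "normal": {"pitch_semitones": 0,  "lowpass_hz": 20000,
               "weapon_gain_db": 0,   "dialogue_set": "conversational",
               "music_set": "score"},
    "rage":   {"pitch_semitones": -3, "lowpass_hz": 4000,
               "weapon_gain_db": 6,   "dialogue_set": "insane_swearing",
               "music_set": "synth_washes"},
}

def apply_mode(mode):
    """Return the mix/DSP settings for a given gameplay mode."""
    return MODES[mode]

settings = apply_mode("rage")
print(settings["lowpass_hz"])   # everything gets filtered down in rage mode
```

The point is that one narrative-level switch fans out into many low-level audio decisions at once, which is exactly what made the feature feel like a point-of-view change.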

Narrative, emotional, or point-of-view parameters might be challenging to figure out, but I think there are lots of opportunities to think more abstractly, and less technically, about game parameters. This recent talk by Randy Thom at the Mix Magazine Immersive Sound Conference gives plenty of nourishment for thoughts in this direction, particularly about point-of-view.

Game parameters and switches are mostly the servants of reality, believability and simulation: time of day, relative distances, footstep surface type, speed, height, density etc. I think these technical parameters, while entirely necessary, are really just the foundation of believability for sound integration and synchronization with the game engine. In an open-world or simulation title, there are likely to be many more of these kinds of ‘reality’-based parameters and switches.
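Here is a minimal sketch of one of these ‘reality’ parameters, assuming an RTPC-style piecewise-linear curve mapping distance to attenuation. The curve points are invented for illustration:

```python
# Piecewise-linear parameter curve, of the kind used for distance
# attenuation in most game audio tools. Curve points are illustrative.

def rtpc(curve, x):
    """Look up x on a sorted list of (input, output) curve points."""
    if x <= curve[0][0]:
        return curve[0][1]
    if x >= curve[-1][0]:
        return curve[-1][1]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)        # position between the two points
            return y0 + t * (y1 - y0)       # linear interpolation

# distance (metres) -> attenuation (dB)
distance_curve = [(0, 0.0), (10, -6.0), (50, -24.0), (100, -60.0)]
print(rtpc(distance_curve, 30))   # halfway between the 10 m and 50 m points
```

A handful of hand-placed points like this is usually all the sound designer authors; the engine interpolates everything in between, every frame.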


(above – all of the parameters in my current project are technical, or based on simulation)

It may not be practical to parametrize the emotional vectors of a game narrative, or even necessary. Perhaps music ‘states’ are the best example of something already a little more abstracted and closer to the emotional pulse of the game – states are the most likely to drive music or atmospheric transitions in a game, and as such offer some tantalizing ways in which to start thinking about also affecting sound and dialogue. Perhaps when a music state changes from ‘calm’ to ‘fear’ there are a great many more opportunities to alter the way the sound and dialogue are presented to the player too. Maybe, without realizing it, music states are mapping the narrative and emotional beats of the game for us, and maybe tapping into these states to make changes in the rest of the soundtrack is one of the biggest opportunities for much deeper game sound integration.
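One hedged way to picture ‘tapping into’ music states is as a broadcast: when the music state changes, other soundtrack systems are notified and can react however they like. Everything here (class name, states, handlers) is hypothetical:

```python
# A tiny publish/subscribe sketch: music state changes are broadcast so
# ambience, dialogue etc. can react to the same transition.

class MusicStateBus:
    def __init__(self):
        self.state = "calm"
        self.listeners = []

    def subscribe(self, fn):
        self.listeners.append(fn)

    def set_state(self, new_state):
        old, self.state = self.state, new_state
        for fn in self.listeners:
            fn(old, new_state)       # notify every interested system

log = []
bus = MusicStateBus()
# Ambience and dialogue both piggyback on the music state transition:
bus.subscribe(lambda old, new: log.append(f"ambience: {old}->{new}"))
bus.subscribe(lambda old, new: log.append(f"dialogue: {old}->{new}"))
bus.set_state("fear")
print(log)
```

The design choice worth noting is that the music system doesn’t need to know who is listening – which is what makes the states reusable as a shared emotional map.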

In June 2008 I was fortunate enough to sit in and observe a couple of days’ mixing at Skywalker Ranch with Randy Thom and Tom Myers. I recently found these notes and thought it would be good to post them here.

Theatrical Mix in Dolby EX (6.1) for re-released print of the film. (Now Available as a Blu-Ray)


FX Mixer, Randy Thom

Music and Foley, Tom Myers

The dialogue mix (already done by the clients) is usually given to the more experienced mixer, as the best chops are needed to make production dialogue sound its best. This is less of an issue on animated features.


Each section – dialogue, music, foley and fx – runs on a separate computer (back in the machine room), all slaved via timecode to the reel. Prior to getting to the mix, an editor, whether dialogue editor, sound editor, foley editor etc., will build the ‘pres’ or ‘pre-dubs’ as Pro Tools sessions. These are then all brought onto the mix stage for the final mix.

The mixers work through the feature a reel at a time: first listening through the whole reel, then going back and mixing what needs addressing. Each mixer takes turns while the other breaks, because otherwise they would step on each other’s toes and end up missing something the other wanted to write. The work was a mix of simply setting levels for tracks and writing automation for more involved ducking etc., all done at the desk. The assistants assign sounds to channels for the mixers, and the mixer then watches the scene and rides the faders. Every sound is on a separate track.

The mixers bounce ideas off one another, and the editors also chip-in with suggestions. This makes the whole mix process quite democratic and conversational. It makes good sense to ‘test’ mix ideas and suggestions by bouncing them off the other people in the room. Which is why one person alone doing a mix does not make much sense. Although this does happen.

The mixers make notes on timecode areas (measured in feet) that they wish to revisit, and they punch in and roll back to these spots themselves. They then record the automation to a master. Notes are also received from the client about particular areas they wish to revisit, and these were also addressed in the mix.

A surprising amount of sound effects design and sound replacement also happens all the way through the mix. Sections are extended by the editors, under the direction of the mixer, and new or replacement sounds are found and dropped in. While I was there, several scenes had sounds added to them, from additional tire squeals to subtle background additions like ship horns and distant car horns in scenes where there was a suitable gap in the dialogue. The editors either go offline to find the sounds in Soundminer, or copy sounds from elsewhere in the session. When a client is present, a lot more of this kind of thing happens.


Randy talked about the rule of 100%, whereby everyone who works on the soundtrack of a film assumes it is 100% of their job to provide the content for the feature. So the composer will go for all the spots they can, as will the dialogue, and the same with the sound editors. When it comes to the mix, this often means there is little room for any one particular element to shine, which means more mixing decisions have to be made, and often the result sounds like the music, for example, has simply been turned down. In more aesthetically successful movies, collaboration is present earlier, and composers decide that it is fine to simply drop certain cues. When Randy is mixing, he wears the mixer’s hat and is at the service of the story and the film, and he often makes decisions to get rid of sounds that he has personally worked hard on.

Sometimes ideas about particular key scenes and mix ideas are talked about early with the director, at the script stage. Randy works this way with Robert Zemeckis. However, not enough directors consider sound in pre-production, and they often end up with the 100% situation and a lot more things to ‘fix’ in the final mix – lots of messy and chaotic sound to figure out.

In Ghost in the Shell there is very little music, and because of this, where it is used it has a very powerful and meaningful effect on the story and audience. This left a lot of great opportunities and space for sound design – some of it very musical, such as the ambient ship horns – without offending the composer (adding musical sounds, i.e. sounds of a particular pitch, can be perceived by the audience as part of the music, particularly if they are in ‘tune’ with the underscore). A lot of the backgrounds in the feature are also very musical. The foley is very soft, clean and rich. Randy made a point about foley: they tend not to use shoes that are very clicky, as they sound too much like ‘foley’, so they use trainers, soft shoes, even moccasins and slippers; this way the foley stays out of the way and doesn’t jump out as obviously foley. Randy also said that pink noise can be used for foley – just have a track with pink noise on it and ride and EQ the fader so that it matches the movement. A little film trick!

Dialogue and sounds in the movie were panned very unusually for a feature film. Dialogue remained positional to the characters, even when they were off screen, often meaning that the sound would jump to a rear speaker on a visual cut. Quite original and brave, I thought, although these mix decisions were made by the clients in Japan – so if a theatre has the rear speakers turned off for whatever reason, the audience may miss some dialogue. The music soundtrack had been re-mastered in surround, and the film was mixed in Dolby EX for a theatrical re-release.

Randy discussed mixing as being a series of choices about what to hear at any particular moment; it is the graceful blending from one mix moment to the next that constitutes the actual mix. These decisions come from the story – what is important at any particular moment, what the audience needs to hear and focus on. He mentioned that cinema with deep-focus photography often makes things easier to ‘focus’ on with sound. In action scenes, particularly longer ones, it becomes difficult to go from one thing to another constantly, especially if the script allows no brief let-up of action for the sound to take a break. We talked about the extended chase scene in The Bourne Ultimatum as a good example of handling this well: a scene with no music, dropping out various things at various times. The scene is well written for sound and well mixed. He also cited Spielberg movies as good examples of how to use sound and mixing well. The arrival of the T-Rex in Jurassic Park is often mentioned to him as an effect that a director wants to emulate, yet there is no music in that scene; however, directors often go to music first to try to achieve the emotional effect. Saving Private Ryan is also cited a lot as an effect directors want to achieve – again, there is no music in the opening scene. Knowing when not to use music seems to be a decision to take at the writing stage of development, though deciding to drop cues can also work at a final mix.

There is a quote that is often thrown around in film and game sound circles about the rule of 100%. I believe the idea originates from Ben Burtt, but is often repeated and conveyed by various respected sound designers, especially in film. I’m paraphrasing, but it goes something like this…

“Everyone on a film assumes it is 100% of their job to tell the story, the composer will write music that hits all the major plot points and moods, the writers cover everything in dialogue telling 100% of the story, and the sound designers will cover every single moment with effects to carry 100% of the movie/game/whatever” – I actually found a better reference to this in Randy Thom’s Designing a Movie for Sound essay found here (section: opening a door for sound)

At the end of a production this feels very true, and it feels especially true when you are sitting in a final mix, trying to figure out what the heck you are going to get rid of in the moment-to-moment mix. What is important at any given moment? This, in film, is where the collaboration with the director kicks into high gear and the ‘audience’s experience of the story’ really gets into the veins of the soundtrack – a final mix is, if you like, the ‘implementation’ of the story via the soundtrack. The decisions about what has prominence at any moment will be made through discussion (certainly easier in film due to the linearity of the medium): sometimes music is foregrounded, sometimes sound fx, sometimes (most often) dialogue. In video game mixes, the experience can be completely different depending on the team involved, the size of that team, and the scope of the project. Sometimes it is one person mixing the game, making all these decisions, but at least with the knowledge of what the game design and experience need to convey. On bigger projects it might be a small directorial multidiscipline group of leads who sit together and talk through the decisions – either way, the process is complicated by technology and workflow.

I like the idea of sound, music and fx contributing to the storytelling in equal measure. This is certainly more appealing than thinking that each of these elements will attempt to create a logjam by providing 100% each, and leaving it to the final mix to sort out the priorities at each moment. I’ve heard of sound editors in film even providing more than 100% coverage in having multiple different ‘options’ available on the dubbing stage.

Now, this is an idealized and utopian scenario, and every project makes different demands of each of our three main threads of sound. But perhaps, at least as a starting point and a way of thinking about what will be important in your project from sound, breaking these areas down into three chunks that are ‘ideally’ responsible for 33% each will work better.

33% of the soundtrack will be about music moments.

33% will be about sound moments.

33% will be about dialogue moments.

It is an oversimplification, and perhaps the practicalities of budgeting and rework make this a difficult proposition, but it is a better starting point than the 100% rule, which creates that logjam at the back end. Thinking about these numbers at the beginning of a project, rather than the 100%, is a more realistic guideline for everyone involved. It should encourage more forethought and planning as to ‘whose moment’ is required up front, and it might kick into gear some early mapping of a project in terms of FX, Music and VO. All of these elements simply can’t be going all the time, so these kinds of decisions do need to be made.

Perhaps an even more simple pre-check before commissioning any sound work should be along these lines…

Should it make a sound? (Yes / No)

Should it have a music cue? (Yes / No)

Should it be conveyed through dialogue? (Yes / No)

The emphasis here is on a reduction of overall sound, rather than an increase in overall sound content.
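The three questions above could even be encoded as a tiny commissioning gate whose default answer is silence. This is a sketch only, with invented event names:

```python
# A pre-check before commissioning any sound work: only the elements
# explicitly justified for an event get made. Event names are invented.

def commission(event, sound=False, music=False, dialogue=False):
    """Return only the work actually justified for this event.

    `event` is just a label for readability at the call site.
    """
    work = []
    if sound:
        work.append("fx")
    if music:
        work.append("music cue")
    if dialogue:
        work.append("dialogue")
    return work or ["silence"]       # the default outcome is nothing

print(commission("door opens", sound=True))
print(commission("quiet corridor"))
```

Defaulting every flag to `False` is the whole point: content has to argue its way in, rather than silence having to argue its way in at the final mix.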

Leaving the ‘what plays and what doesn’t play’ decisions to a final mix makes a lot of work for yourself in those crucial few weeks at the end, and the finished project will, more often than not, sound like ‘music was turned down here’ and ‘sound effects were turned down here’, rather than the coordinated orchestration of specifically written and implemented music, VO and FX to be found in, for example, The Last of Us. In that game, no one element feels as though it is trying to overpower the others; they seem to be very much working together, and the more you think about this (because it isn’t something you notice when you are enjoying and playing the experience), the more you realize that it has all been very carefully thought out in advance and didn’t just happen to ‘come together’ at the last moment.

I like the idea of a composer setting out with the knowledge that their contribution is going to be only a third of the entire soundtrack. Similarly I like the idea that writers are starting out with the notion that one third of the experience is going to involve spoken dialogue. I like the idea that, as content creators, we can fully expect, from the outset, to throw away 66% of the responsibility to carry everything on our shoulders. It is also just good common-sense editorial.

Being the sole audio developer at an indie studio, and having a background as an audio director, I tend to think immediately of any project in terms of it being my responsibility to cover 100% of the soundtrack (foley, fx, ambience, music, vo). But it is only when I start to think about actually creating the content that I realize it isn’t anywhere close to 100% of my own sound or music work that is going to do this, but the work of many collaborators. It is very important, I realize, to define the scope of what is needed at the point of delegating out the work, as well as a schedule for its completion and integration. On any project where I have contributed sound or music myself, I have a hard time ‘removing’ things at the mixing stage – just too attached to it. I can see the amount of work that has gone into things, and it is natural to resist decisions whereby that content is effectively removed or demoted, even though it is for the good of the project.

This is why I believe we have so much to learn from watching and listening to mixers. There is a useful, Eno-like idea that in attempting to mix a project you wear the ‘mixer’s hat’ – not the sound designer’s hat, not the friend-of-dialogue (writer’s) hat, and not the composer’s hat. That is no longer your role. It is in wearing the mixer’s hat that you are allowed to remove yourself from the work done up to that point on every element of the content, and effectively make cold, hard decisions about what is needed, what can be pushed to the foreground, and what can be removed. Mixing is a very subtle art in that decisions don’t need to be black and white (“either there is music or there isn’t music”); several things can co-exist up to a point – music can be ducked out of the way yet still be audible, as can backgrounds and fx. A massive part of that subtle art is also political (though it doesn’t really need to be). However, it is at this point of ducking things that you realize a far better method would have been to design the music to get out of the way at that particular moment in the first place. Predicting these moments where possible will enhance the interrelationship between the three major food groups of a soundtrack (leading to a more cohesive and telepathic whole), and it will also make for a better experience for the audience/player. And it will make ‘mixing’ so much easier… another way to think of good planning is as “mixing in pre-production”.

This is something I’m trying out when initiating new projects. I’m hoping to be thinking about the final effect, and the final mix decisions, long before we actually arrive there – and in reality, the closer we get towards a final mix, the closer we get to determining exactly what is required of each of the three components of the soundtrack. Some of the most useful ways of doing this I’ve found are narrative, or gameplay, dynamics maps (detailed here http://www.gamasutra.com/view/feature/132531/dynamics_of_narrative.php ) – these will give an idea of what is needed from each of the three elements, though they are like graphic scores that allow a great deal of interpretation from the artist charged with creation. At least understanding the fundamentals of the dynamics involved in a project will give rise to healthy discussions about whose responsibility it is to, say, carry action scenes, as opposed to ambient scenes or exploration and moments of ambiguity. Shifting the focus of sound work to understanding the interrelationships between the three main threads of a soundtrack much earlier in a project is where I see so much scope and opportunity in development right now, no matter what the technology or delivery mechanism for the game.

It is Monday morning, so I thought I’d put something together that I’ve been meaning to do for a while: a process document which details some of the high-level decision making and processes that go into the creation and implementation of sound for a game, from the asset to the code.


Audio Iteration Process [Click to Enlarge]

Doing this highlights the importance of a generalist skill set in game audio (for those either looking to get into game audio, or those looking to improve and grow skill-set areas). Not only do you have groups of very specific processes, like the recording and editing block at the top of the document (in RED) and the implementation block (in GREEN) towards the bottom, but you also need a complete interconnectedness that involves social relations and collaboration in order for the model to work at all (decisions, reviews, communication).

Now, this document was put together with sound design and implementation in mind, but I think it is every bit as applicable to MUSIC and VOICE production. I also think that viewing the processes and decision making like this makes it very clear how our production and collaboration processes can be improved (e.g. fewer implementation steps using separate software is always a goal). A voice workflow, for example, often works iteratively at the RECORDING stage (getting many takes of the same lines in different ways to give more choices later on), rather than at the REVIEW stage (although callbacks and re-writes have become more commonplace), meaning that hearing voice IN CONTEXT and making review and direction decisions is less based on a context-led rationale than it is in sound FX design. There are many industrial reasons why this is different, but opening up the FX iteration path visually certainly allows us to see where we might innovate and improve some of the more rigid industrial structures that are imposed, rather than designed.

Another area I wanted this to highlight is the ITERATION process. This is the most fundamental part of the whole model: in fact, it is the REVIEW & ITERATION cycle that drives everything. Until you get a sound into the game, triggering and playing back, you can never know if it is doing its job or not. Chances are, 9 times out of 10, that it is not, or that it could be improved in some way with a tweak of some kind. There is always something that needs to be done. Sometimes it is the re-recording of new material, which results in a journey back to the beginning of the process. Sometimes it is re-visiting assets in the sound library, and sometimes it is down to tweaking in the run-time realm of the game and audio engine. The more this process is repeated, the fewer times you should have to revisit the areas nearer the beginning of the process, and the more time you can spend refining the run-time game parameter side. All iteration processes aim to refine what is there, and the sooner you can get ANYTHING into the game, the sooner you can start the process of getting closer to the run-time.
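The cycle described above can be caricatured as a loop in which each review sends the work back to one of three places. The review outcomes below are scripted assumptions, purely to illustrate the shape of the process:

```python
# Each review outcome routes the work back to one of three stages,
# until a review comes back 'good'. Outcomes here are invented.

def iterate(reviews):
    """Walk a list of review outcomes and count where the work went."""
    counts = {"re-record": 0, "library": 0, "runtime-tweak": 0}
    for outcome in reviews:
        if outcome == "good":
            break                    # the cycle only ends when review passes
        counts[outcome] += 1
    return counts

# Early iterations tend to hit the expensive early stages; later ones
# should mostly stay in the cheap run-time realm:
print(iterate(["re-record", "library", "runtime-tweak",
               "runtime-tweak", "good"]))
```

The healthy trend the text describes is visible in the counts: over time, `runtime-tweak` should dominate, while trips back to `re-record` become rare.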

Another thing to note is that there is no ‘finished?’ or ‘complete?’ stage in this process. That is simply because I don’t think the process ever really ends until the game is ripped out of your hands; it constantly gets ‘closer’ to finished, particularly the more time you can spend in the latter trigger stages of the flow… but it never really ends. Another reason for this is that the game itself is changing underneath your feet, and so sounds & implementations are often required to change to ‘keep up’ with the current architectural and optimization snapshot of the game.

I was also writing about a hybrid procedural audio model on Friday. This is not accommodated in this flow, but it would either be a new path of procedural sound object creation and testing (replacing the RED path), or become part of the implementation (GREEN) path – ideally replacing the recording and editing stage entirely and shifting heavily towards a more implementation- and iteration-based flow.

I made the document in Lucid Chart. It is awesome, free and very easy to use.

Sample-based vs Procedural: it’s not quite as dramatic as an all-out death match between these two approaches and philosophies, even though the temptation is to see things in either/or, black-and-white terms.

One thought is that procedural audio, even though it has been around for a while now, is still fledgling, and even though there are inherent ‘cost’ savings to using this method for sound generation and propagation (particularly in games with huge amounts of content), finding a home in a largely risk-averse entertainment software industry is a big ask while the applicable approaches still feel fundamentally ‘experimental’. The thing I’ve come to realize, perhaps somewhat later than everyone else (and perhaps because of the ‘either/or’ polemics), is that a lot of the techniques and tools we are using are already in transition to a more procedural status.

This is just a quick categorization attempt that I wanted to get down before it evaporates with the rest of my thoughts and doodles on a Friday morning…

The Sample-Based Approach.

Relying entirely on streaming or preloaded sample-based assets sitting on a disc.

(Most games of the PS2/PS3 generation and some mobile games today)

Re-triggering of pre-recorded material, usually wave file assets.

The Procedural Approach

Moving the sound generation effort from the disc (and the streaming throughput bandwidth) to the processor.

Synthesis-based sound objects, acoustic models, grain players, noise shaping and DSP-intensive processing – in essence, everything is generated at run-time, based on (hopefully) elegant, efficient and simple real-time models.

(currently fringe aesthetic games, some music based games)

For me, the process of just writing these two (admittedly loose) definitions down, made me realize that any proposal to exclusively use either of these models would need to be either a) aesthetically niche or b) technically or artistically challenged in some way. And, even though I tried to say definitively which games used these approaches, I think I’m on unsafe ground in my generalizations. It also made me realize that, of course, there is already a ton of crossover in these categories in most proprietary sound engines, and certainly inside middleware audio solutions. A purely sample-based approach is probably getting quite rare these days. So, are we in the midst of a hybrid approach without even really realizing it?

Hybrid Procedural Approach

(Most console games today)

A fundamentally sample-based approach, but one that goes much further towards the implementation side of things than simple triggers: breaking down sounds into constituent molecules (granular) or even small recognizable chunks (automatic weapons); parametrization of sound; sound ‘shaping’ in the form of procedural DSP used for ‘additional layers’ like reverbs, filters and flutter; some SoundSeed Air implementation in Wwise, but as a subtly mixed-in ‘layer’ rather than to supply the overall effect.
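As a toy illustration of the hybrid idea – a ‘sample’ layer (faked here as a plain sine wave) mixed with a procedurally generated, low-pass-shaped noise layer – here is a sketch with entirely illustrative numbers:

```python
import math
import random

# Hybrid block: a stand-in 'recorded sample' layer plus a seeded
# procedural noise layer, crudely shaped by a one-pole low-pass filter.

def hybrid_block(n=64, noise_gain=0.2, seed=1):
    rng = random.Random(seed)    # seeded: the 'procedural' layer is repeatable
    out, lp = [], 0.0
    for i in range(n):
        sample_layer = math.sin(2 * math.pi * 440 * i / 48000)  # fake sample
        noise = rng.uniform(-1.0, 1.0)                          # synthesis source
        lp += 0.1 * (noise - lp)       # one-pole low-pass 'shaping'
        out.append(sample_layer + noise_gain * lp)
    return out

block = hybrid_block()
print(len(block))
```

The sample layer carries the recognizable identity of the sound while the procedural layer adds run-time variation – which is exactly the ‘subtly mixed-in layer’ role described above, rather than the procedural part supplying the overall effect.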

We are using procedural techniques and technologies more and more in the form of reverbs and DSP effects. But also, in our implementation, we are thinking more procedurally about sound, even if still using sample-based playback material as the starting point and raw material. My feeling is that we have moved towards this often without even realizing the big picture. Could this slow-bleed approach eventually end up with interactive sound designers working completely with acoustic models and unique sound-object-based propagation? Perhaps for certain genres and platforms. But it is difficult to imagine a move away from a hybrid position into exclusivity. I can, though, see certain projects leaning one way or the other.

Chances are if you work in game audio, you are already working in a hybrid procedural audio world.


While the arms-race of each successive console generation tantalizes consumers with higher quality entertainment experiences, defining quality itself has started to get more and more tricky. Is it simply a case of (for sound at least) higher sample rates? More fidelity in the surround field? Playing back more voices simultaneously? Higher resolution DSP effects? Consistency? Fewer glitches and bugs? More convincing (and convincingly captured) performances from actors?

It does begin to blur around the edges as you realize that this is perhaps one of the broadest and most subjective categories to talk about. Yet it is fundamental to how we navigate, describe (and judge) increasingly expensive (and often complex) entertainment experiences within our industry. Quality, you soon realize, doesn’t apply exclusively to big budget games, but also to much smaller titles, and even down to simple interfaces. Perhaps it helps to think not about the end result, the objective final output of the game, but about the overall experience; to that end, perhaps the ‘quality’ of the processes that go into creating the experiences themselves requires more examination and investment (beyond the unsatisfactory notion of ‘quality’ simply being a shaded area occupying the intersection of features, budget and time).

I’ve been thinking a lot about this lately (too much, hence the overflow into the written word) and my own ad-hoc definition of “QUALITY”, in a game production context, might shed some light (or maybe raise more questions) on how to evaluate (and produce) the ‘intangible’ notion of ‘quality’ (note: this is not really about ‘polish’, which I consider to be an endeavor almost exclusively achieved and performed in post-production) – it is actually informed and tracked across several quite different areas.

1) Quality of Interaction (Communication and Collaboration): Ensuring collaboration is happening at the high level (between leadership/studio culture/project management) and at the low level (between coders and implementers), and that it is happening both vertically (intra-discipline) and horizontally (inter-discipline).

2) Quality of Implementation: use of, and access to, material; ease and speed of implementation (tools & pipelines); expertise; iteration time (refinement and enrichment).

3) Quality of Input (Source Assets) and Output (Signal/Data Path): Correctly isolated (or environment-specific) recordings (or synthesis) at the highest sample rates and bit depths + I/O signal path (easily re-configurable mixer hierarchies and parametrization of sound), with controllable, carefully measurable, predictable and trackable output levels. Having this I/O in place allows both upwards and downwards SCALABILITY to different (or newly emerging) platforms.

In combination, I reckon these three areas invariably allow the delivery of refined ‘high-quality’ features and experiences. I’d also suggest that these areas are not limited to console development (although that is the source of current questions about what ‘next-gen’ actually is/means), but can apply to any technical system whereby the delivery devices are cyclical and incremental.

Perhaps quality is more simply about how well we are able to convey an idea and an experience to a user, and making that distance between the user and the experience as small as possible, such that, in the end, the technology all but disappears completely.

The day-to-day work of audio can be very detail-oriented, and it is easy to get lost in this forest of sound molecules. Solutions to many day-to-day issues often rely on decision making of a broader kind, and audio work can often be as much political as it is creative, social or technical. Wrangling resources, ensuring that important production information and risks are on everyone’s radar, selling features and ideas, haggling for more time or budget, and communicating across disciplinary voids can require a fair degree of entrepreneurial flair.

I’ve been thinking a lot about some general audio pillars within game development. I thought I’d have a go at throwing together some very high-level pillars for game audio which read, to all intents and purposes, like a kind of manifesto promise. The thought here is to provide high-level, transparent goals for the audio department within a development environment, and to serve as a series of checks and balances by having a longer-term strategic outlook (without one we are marooned in the reactionary and short-term, arguably heading in no direction in particular). This also serves to hold audio accountable to some tangible realities and deliverables: if things aren’t moving in the direction outlined, then, during regular check-ins, course correction can be applied.

Four Strategic Long-Term Audio Pillars

A Focus on Polish is a Focus on Solid Communication.

Whatever the project, polish is one of the most fundamental areas of audio work (it is the reason we focus so much on having good quality source assets and geek out about microphones, and also why we focus so much on the idea of a ‘signal path’). Call it post-production, or mixing, or whatever: the process of removing any unwanted jagged corners, cuts, glitches, or sounds that grab the attention at the wrong time or don’t help the experience is universal to every single sound project. This can mean being visible about, and scheduling, audio post-production time, or a familiar and contributory appearance at scrums. But in order for sound to actually polish something effectively, the work in other areas of production (animation, scripting, world building etc.) has to have been somewhat ‘locked down’. This is an increasingly difficult subject in today’s fast-moving, ‘never-finished’ digital production domain, but one thing these changes have emphasized above all others is that communication is critical. Iteration, visibility and ‘connectedness’ to the team’s thinking and planning are important to providing polish in the digital production domain. Continual verbal, visual and written comms are absolutely essential to keeping everyone in the loop on what is happening. Polish is as much co-ordination as it is technical or aesthetic choices, and co-ordination is a political endeavor.

Grow, Nurture, and Invest in the Audio Team

Audio teams are often the smallest in the building. They are outnumbered by Art, Design and Tech departments. They can appear to others to be a black box, where no-one understands the processes and voodoo that go on in sound-proofed rooms. But we are just like any other department. There is nothing special about team audio; we may see things differently, have different connections to the team, different needs and different skill-sets, but fundamentally we are exactly the same. In the early days of game development (which these still are), audio often needs to shout that bit louder for equality and representation on the team, and to get a seat at the table as a ‘principal collaborator’ rather than an end-of-production ‘service provider’. Everyone on the team will be trained and versed in the language of collaboration and innovation. They will know who to go to, how to present, how to prototype an idea and set goals; they will have resources at their disposal, and they will be encouraged to push forward and improve every aspect of their craft and process – removing every element of drag, friction and resistance from their work. Career paths will be clear, transparent and on par with other disciplines in the studio culture. Members of the team will have the autonomy to control their own growth and path. The audio budget will always be discussed and adjusted to fit the requirements of the project, with a focus on VALUE.

Early (and Continued) Involvement for Audio

Involvement in the earliest genesis discussions of a project. Early involvement with script development, pre-vis work and prototyping, as well as with early scheduling and budgeting. Simply put, “Audio is another Art Department.” The sound team will be able to participate in design discussions, or be empowered to create those opportunities and discussions where they do not yet exist.

Tools & Tech: Put Designer/Implementer UX before Player UX. (The Player comes 2nd! – the only way to truly put the player 1st)

Push the technology and pipelines in a meaningful, useful and positive direction. Alleviate the designer/implementer’s struggle. The primary goal is to support the person using the tools and give them a frictionless experience (alleviating enormously fatiguing or repetitive heavy-lifting tasks) when integrating audio into the game. (From small standalone batching scripts and tools, to game-engine and audio-engine tools & pipelines, the experience of integrating sound should be simple, straightforward, painless and easy to communicate to others.) Focusing tools and processes on the user, allowing audio designers to quickly implement assets, switch them and tune them at run-time, is a priority for changing the collaborative nature of review sessions etc. This in turn allows the audio designers to focus more clearly on the ‘player’s experience’ rather than wrestling with their own technical issues.

Every studio culture is different, and has a unique approach that solves design and production problems for a unique product line-up. For some audio departments these are problems that were long ago solved, while at others the problems are so much worse (no audio tools, no audio programmer support or resources, and woefully underdeveloped pipelines) – yet every time, audio finds a way to struggle on, smash through that which resists, and make things work and happen. This is really a hopeful push for a broader, more long-term strategic vision – to build resourceful and confident teams with an elevated view of what is in front of them (and behind them), rather than teams fixated on the short-term problems immediately in front.


“Towers. Open Fire.” – Burroughs

I’ve been meaning to post on an, admittedly experimental, method of Agile development that I’ve been working with for the last few years, but have never quite got around to documenting. It’s not for everyone, but then neither is Agile. I want to see if anyone else is working with, and attempting to formalize, adaptive iterative methods like this; and secondly, I want to see if there are audio designers or directors who are working with Agile methods in a more ‘by the book’ manner and finding them successful. Writing (and re-writing!) this post has led me to some quite deep pondering on the Agile process and how to re-think it specifically for audio, and I’ve come to the conclusion (for now) that by thinking about tasks differently, by scaling them, you can have structure, trackability and freedom. My guess is that most audio folks work this way anyway, though perhaps think of it in different terms.


Obviously, different teams, cultures and software products have different interpretations of Agile development techniques, and implement them differently based on what works well for that particular team. I’ve always found by-the-book Scrum and Kanban techniques to be dry and process-driven, where the most important thing always seems to be following the rules of the game, rather than working collaboratively and openly in more informal discussion groups.

Kanban seemed like a better philosophical approach, more geared towards open, collaborative x-discipline game development; however, I’ve found both to be relatively short-sighted, without enough focus on ‘the big picture’. There is also a big question of who is running the show: the scrum master, PM or game director can often wield quite a lot of power simply by cutting user stories based on priority, rather than on a collective gut-feel for whether something is truly worth doing or not. This, I guess, is why these methods work well for more structured, incremental software development, and perhaps not so well in the awkward, uneven and chaotic world of game development.

This brings me specifically to audio in an agile world, where you may suddenly find yourself in the following situation, without much planning or thought about the approach… Let’s say we have a large team. From my own experience, this usually looks like this: the team is broken down into cells, all doing their daily morning stand-ups. As an audio team, or even a single audio resource on a project, there is instantly a problem here when several cells meet at the same time, and the audio resources are spread so thinly as to not be able to show up in the different cells. Shift the timing of the cell meetings to accommodate this, and you have the problem of being in meetings all morning. These morning stand-ups DO work well when the team is small, around 10–15 people, so that a single morning stand-up can happen and audio can be at the table to update and be updated without cell-meeting burnout. One of the things you quickly learn is to keep these things short, entertaining and meaningful. Audio updates tend to be just that, maybe without the entertaining and meaningful parts, but we do our best. So why does this work better at the small team level? On larger teams, the ways in which different members of the audio department work within a cell may be quite different, and may even vary at different times during production. Some may choose to detail all their work down to the smallest element, while others may simply work with broad strokes at a higher level. I find agile methods like Scrum tend to focus most heavily on the small details, which is where I can find them dry and tedious, and I guess the part I like least is that I tend to see teams and groups getting ‘lost in the details’. By the same token, they also tend to ‘leave out’ the bigger-picture stuff… so I’ve always been trying to find a way of simultaneously having a good handle on both. Scrums, for me at least, can often feel like something is missing in terms of focus.

Perhaps now is also a good moment to take a step back from the problems of that ‘scrum daily stand-up’ moment, and think about what is going on at a higher structural level. I should mention the actual scheduling process. This is critical in determining when the team, and when audio, will be allowed to be ‘agile’ and when they have to morph into a beast of delivery and rigidity. I’ve found working backwards from a ship date to be the absolute best way of knowing where you are and where you should be right now. Milestones are still very necessary and meaningful from an audio viewpoint, perhaps one of the few barometers for where the entire project needs to be. Audio, being last in the production dependency baton-race, will typically indicate where things need to be complete by, and by scheduling things like a final mix, or sound beta and sound alpha periods, all the relevant ground spikes are hammered into the schedule and you can begin to build up a pretty decent skeleton of where things need to be and by when.
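The "working backwards" skeleton can be sketched in a few lines. The milestone names and week offsets below are purely illustrative assumptions (every project's spikes will differ); the point is only that each spike is derived from the ship date rather than scheduled forward.

```python
from datetime import date, timedelta

def back_schedule(ship: date):
    """Plant the audio 'ground spikes' by working backwards from a
    ship date. Offsets (in weeks before ship) are illustrative,
    not a recommended template."""
    offsets = {
        "final mix": 2,
        "sound beta": 6,
        "sound alpha": 12,
        "dialogue record complete": 16,
    }
    return {name: ship - timedelta(weeks=w) for name, w in offsets.items()}

milestones = back_schedule(date(2025, 11, 1))
```

Move the ship date and the whole skeleton moves with it, which is exactly what makes this kind of schedule honest about where you should be right now.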

Project Level View

Example Schedule of High-Level Tasks

Certain audio tasks may not appear to be very compatible with agile methodology at all, at least not in the widely accepted ways they are done inside most development studios. Dialogue production, for one example, has a myriad of dependencies and conditions that need to be met at various milestones: narrative script, cut-scene script, AI script writing, filename and production script preparation, casting, booking, placeholder recording, implementing, recording, editing, implementing, reviews, re-writes, re-records, re-edits, mastering, mixing – these dependencies are so waterfall-like (Gantt-like) that it is almost impossible to see where agile development can fit into that process once it gets started. It lives outside that whole ‘fast iteration’ cycle – even more so with mo-cap. Writing tends to be agile, as it turns in tandem with the twists of the game design – but once you get into the solidified waters of the ‘script’, things get locked down pretty tight. One small but significant area where I have found fast iteration can be applied to dialogue is during the actual session, where actors can improvise and provide re-writes on the fly. This is not agile development, but straight-up improv within the confines of the ‘dialogue line’ or ‘event’. It is a philosophy you have to be prepared for and ready to embrace, but it can be awesome and create some really compelling performances. So, in that tiny ‘dialogue recording’ window, you have an opportunity to be agile. (As an aside, I think the further we can get away from actors ‘reading scripts’ in video games, and towards a place where they ‘learn lines and improv scenes’, the more compelling the performances will ‘read’ to audiences – iterative and improv-based dialogue sessions are one way to do this, but the window of opportunity to capture this is very small, precisely because it is one of the only periods of freedom inside an industrialized Gantt production process.)

So, dialogue production is not really something that plays nice with a by-the-book agile methodology. It would be awesome if it could, and if there were systems whereby actors could always be available to come in and instantly change content on a weekly basis, but it is so industrial a process that this is probably not something we’ll see for a while, and certainly less likely at the AAA end of development.

Music production, similarly, can either be a process of industrial production, based on several iterative stages from pre-production sketches to delivering finished pieces mission by mission, milestone by milestone, with feedback provided at various stages and a brief post-production phase (of mixing and mastering); or it can be a more ongoing iterative process for smaller titles with less music. The ongoing iterative process can produce music of finished quality almost at the beginning of the process, and constantly change throughout production right up until the end. The methods of working, and the production pipelines implied by different styles of music (electronic over orchestral, for example), largely determine the process and the agility inside that process – so here, agility and iteration times are influenced more by the chosen production style than by a team’s desired way of working. But, as with dialogue, these kinds of productions are likely to include periods, or levels, of both agile and waterfall.

Depending on the scale of production (and here is where I finally get to my idea of a different way of thinking about and scheduling agile tasks), items such as ‘music production’ and ‘sound effects production’ are, at the highest level, often better represented on a schedule by several very long-term time boxes. Essentially these huge time-boxes are large spaces of time dedicated entirely to producing and iterating on content until time either runs out, or a part of the process feels finished. The advantages of these ‘large, broad iteration cycles’ (or timeboxes) are many. They simplify the audio production task definitions for non-audio PMs and other disciplines. They show clearly where the agile tasks rigidify into drop-dead delivery dates (so we don’t open the audio up to endless noodling when the clock is running out). They also free up the producers of this content to iterate and experiment with these broad categories in their own way, at their own pace; they can procrastinate, knuckle down, explore new ideas, and essentially have complete ownership over those areas of the game sound. Part of this is to never mark anything in the game ‘final’ or ‘finished’ until the final mix. This leaves flexibility and honesty in the schedule, and also allows big decisions to be made at the final mix (where very often sounds are replaced based on new, unforeseen contexts that have arisen late in development) when major stakeholders are present for the audio sign-off.

I’ve broken this down elsewhere, but as an audio lead, my high-level audio schedules usually consist of two kinds of tasks: long-term iterative tasks, and short-term tasks. Short-term tasks may be things like developing specific audio features for the game on the programmer side, or the waterfall tasks described for dialogue above, such as ‘casting’ and ‘recording’. Long-term iterative tasks are those ongoing areas that will always be changing as development trundles along, yet they remain as high-level as possible in terms of description, purely to allow the PMs and scrum tasks to have insight without worrying about every single piece of minutiae in the work.


So, there is a difference here that needs to be addressed, and that is one of TASK SCALE.

I’ve found a useful way of thinking about tasks, and the resolution of tasks, is as a kind of atomic scale, like zooming in on that photo in Blade Runner. I reckon there are (at least) three useful levels at which to consider tasks…

1) The Molecular/Atomic Level: the detailed, nitty-gritty individual wave files, event triggers, volume and pitch attenuations, all the tweaking-knob-twiddling detail of any particular task; the smallest components.

2) The Object Level: the level at which you can give things names that other disciplines can relate to – weapon x, music cue y, location z.

3) The Project (or ‘Feature’) Level: as described above, the large, high-level PM way to look at and break down tasks – music, ambience, prop effects, UI sound, mix – big things that you can talk about holistically; the ‘Features’ that make up the back of the box, exec-summary-type stuff.

The Molecular/Atomic Level is not something that I think works well being tracked at all, and so doesn’t really fit with Agile methods; it is too granular (and, in fact, too agile) and is always the kind of stuff that people who start a discussion about it usually need to take ‘off-line’ and collaborate on elsewhere. This is the level of detail I hinted at earlier as being something an individual has ownership and autonomy over – but the ability to ‘scale up’ and talk about and update on those details at the Object Level IS important…

The Object Level. This is a category of tasks that I believe can be tracked nicely with Agile methodology (or any other tracking method), and lends itself well to x-discipline group exposure. Discussions are tangible and not too technical, details can be figured out offline, and progress can be tracked, e.g. ‘The sound for 10 weapon classes is on schedule for the end of the week’, ‘the music cues for missions x, y and z are now implemented and ready for feedback/testing’.

The Feature/Project Level. This is a PM, or Audio Director / Lead / Exec level perspective and often requires a solid knowledge of where everything is on the object level (depending on your role). When things change on this level, they ripple down to the levels below in a big way, via either extra polish time, additional levels, objects or animations, or reduced scope.
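One way to see the three scales as a data structure: Project-level features own Object-level tasks, while Molecular detail deliberately stays untracked inside each object. This is a toy model with invented names, purely to show how Object-level status can roll up to the Project level without exposing the minutiae.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """Toy task-scale model: a Project-level feature owns
    Object-level tasks; Molecular detail is intentionally
    not represented here (it stays with the owner)."""
    name: str
    done: bool = False
    children: list = field(default_factory=list)

    def progress(self):
        # Roll Object-level status up to the Project level.
        if not self.children:
            return 1.0 if self.done else 0.0
        return sum(c.progress() for c in self.children) / len(self.children)

# Object level: names other disciplines can relate to.
weapons = Task("Weapons", children=[
    Task("pistol", done=True),
    Task("rifle"),
])
# Project level: the big, holistic features.
project = Task("Audio", children=[weapons, Task("Music", done=True)])
```

A PM asking "how is audio doing?" gets `project.progress()`; the sound designer tweaking pitch attenuations on the rifle never has to surface that detail.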

All these TASK levels relate to one another, but depending on another SCALE (the scale of production), the amount of sound resources required to handle them effectively changes a lot.

There are two ways a project and its resources can be scaled. A project can either have DEPTH, or BREADTH.

A racing-genre game with very few tracks but a massive number of vehicle types could be considered a game with shallow feature sets and DEEP content. A third-person linear adventure game, with lots of variety in the different locations and player activities throughout the experience (I’m thinking Uncharted, Arkham Asylum etc.), could be said to be a game with a BREADTH of features and content, but only one or two DEEP mechanics. An open world game such as GTA, Assassin’s Creed or Saints Row could be said to have both BREADTH (tons of features like driving, shooting, hand-to-hand combat, navigation) and DEPTH (tons of variety in vehicles, weapons, mission types and locations). A mobile title such as Angry Birds or Candy Crush would, by contrast, have comparatively shallow Depth and Breadth. (There is also the factor of TIME, which for now I am not considering, but which will make a large impact on the amount of sound personnel required.)

This is where sound personnel matter a lot, and how they are organized to handle TASK SCALE levels becomes important. An open world title might require several sound designers and implementers with ownership over several broad areas of the game, all handling both Molecular and Object Level tasks, and require an Audio Lead to monitor and connect that team to the Project Level tasks. In a small mobile game studio, one person may be handling all of the task levels themselves.

Attacking a project on all of these levels is kind of what I mean by “Attack on All Fronts”. When a painter or sculptor works, it is often in a quick and meditative (subconscious) interplay between the molecular-level detail and the big-picture detail, going back and forth very quickly and rapidly iterating using the material they work with. This is how working on a mobile title can feel when you are a single-person audio department. The Object Level completely disappears, and there feels like little need to track anything, or communicate anything, because the work is so fluid. This can lead to problems in communication, and obviously game development is very different from painting in that there is a team involved in the creation process. For larger teams, that middle step of thinking about Object Level tasks becomes a pivotal part of the process, where high-level and detail-level thinking can interface, and I believe that level is one of the best, and most collaborative, places to discuss the project. Sometimes it is easy to lose sight of the process and the levels of tasks and responsibility…

At certain points in full production, for long, long periods, we often find ourselves simply attacking stuff. As it gets added, we attack it. As it changes, our sounds become out of date, and we attack it again. This is a full-frontal assault on all aspects of the game from all angles – weapons, HUD, foley, ambience, music, voice, UI – and that ongoing, iterative approach can become the fog of war that is “Attacking on All Fronts”. This can be a long period of iteration, and I think it can be crucial in producing good work, provided that relationship between personnel and task scale is maintained. Through iteration, the more something changes, the better it gets; sometimes it gets worse before it gets better, and sometimes it just gets cut. In almost all cases, it is better for the project. Up at the Project Level it is an incredibly open and deliberately anti-detail process. This allows you to attack the project in clever ways, via scope for example. If you have ownership over weapons and UI at the Object Level, then your long-term task, for the entirety of production, is to attack those areas as they change at the x-discipline level. The freedom in how to approach that iteration lives at the Molecular Level. Molecular-level scope is also left totally open and free for the person running that section. In the end, a focus on the end user at all levels provides the incentive for ways to either add to or remove from the work required. (Removal of sound is a huge area of sound design and iteration, and not one that you’d instantly think of as something you’d schedule for. It’s almost like scheduling anti-work.) Attack on all fronts at all task levels is closer to the process of a painter slowly building up paint on a canvas, or a sculptor working constantly on a sculpture: the feedback is very instant, and as things change, the overall ‘work’ begins to form. It changes as it forms, and it is important that the big picture is tracked as much as the details.

Up at the Project Level, these long-term tasks will eventually turn into short-term polish tasks, or post production tasks, whereby they are pre-mixed, finalized and everything starts to solidify and get ready for the final mix. As long as these dates are clear, I find you can work in a totally flexible and free manner until you need to switch gears.

Once post-production, or sound alpha, or sound beta is reached (terminology differs here between developers), the Attack on All Fronts approach continues, but it moves into a different gear: polish and removal. Removal, again, is as big a part of the iterative process, particularly in terms of Object Level areas like weapons, but when all these different areas come together and are presented in context with one another, a new level of removal needs to take place. This is where things again really get focused on the player and the end-user experience. Any clutter or sound that is getting in the way of the experience is removed, or diminished, or mixed in such a way that the experience becomes more honed and focused. This is essentially the job of the final mix, but a process of sound-effects replacement, premixing, polish and cutting, all informed by a period of intense scrutiny and reviews, also takes the foreground.

Defining the periods of rigidity and the periods of agility at the Project Level seems to be very important in being able to both control and let go of the game development process for audio. To deliver, and also to maintain a degree of freedom and opportunity inside various tasks, feels important. I think that most tasks, even those as rigid and industrial as dialogue production, can be broken up into short-term and long-term tasks, with corresponding levels of freedom, experimentation, detail and overview.

Big-picture tracking is a fairly difficult thing to quantify and relate to; it is based on constant review and constant iteration, and a great deal on a ‘feel’ for when something is right in x-discipline context, rather than just running out of time (although that can put an end to the process too) or being buried in checking small tasks off a checklist. The structures I’ve tried to pin down here are ways of having STRUCTURE via the schedule and tasks at the Project Level, TRACKABILITY at the Object Level, and FREEDOM via the open nature of tasks at the Molecular Level.

As already stated, game development projects tend to be very uneven and even ‘chaotic’. I like some of the agile systems as loose frameworks for certain kinds of tasks and for inter-departmental awareness and communication. Finding a balance that works for the project, culture and personnel you have often means a lot of mixing and matching, and a high degree of seeing what does and doesn’t work. But I think one step towards that balance is understanding this notion of task scale, something it can be easy to get lost inside given the complexity and constantly changing elements of production.

(excerpt from the afterword of ‘Game Audio Culture’)

It is no longer enough to simply have a good-sounding game. It is no longer enough to be able to produce great sounds, or great music, or great speech. This is the basic starting position that I believe sound has explored for the majority of the 20th century, and from which our industry now has the opportunity to grow. Sound, indeed any discipline, should now be approached from a completely fresh starting point: from day one, as an integral part of the design process. There are no excuses; if this is not how your organization is set up, then it is up to you to start the process as soon as you possibly can. In the 21st century the sound artist, no matter what kind of game or product they are working on, is to be a true multidimensional problem solver and innovator. This imperative is everywhere we look today: the primary thrust of technology is to enable collaboration, visibility and transparency, and clearly it is trying to fix something that is broken. With mixing, it is no longer enough to simply mix a game; by the final physical act of moving faders, the opportunities for mix decisions are almost all closed off. In order to truly influence mix decisions, to nurture mix moments and strategies from concept through to final, you need to be there at the beginning. With sound design, with composition, with dialogue, every area under the sound umbrella works in exactly the same way. Sound is a by-product of design decision making, and there is little room afterwards for maneuverability; it is the opportunities for amazing sound design that are most lamented under this segregation/waterfall approach. Every area of specialization will need to undergo this transformation. Sound, art, design and technology all form the moving and interrelated parts of a user experience (is it almost time to rebrand ourselves as UX designers?); this context is how sound must be able to think of itself, and all the interactions of the sound designer must fulfill and resonate among these inter-dependencies and interrelationships. Our responsibility is to be mutually accountable for all the other disciplines’ successes and/or failures, and they similarly for ours. The incredible sound and co-ordination in ‘The Last of Us’ wouldn’t have been possible without the opportunities provided for the sound team by design, art and the creative director, but it was the sound team’s opportunism and ability to rise to the challenges that made this one particular example (as I write this) shine out above all others so far in 2013.

How do we do this, as sound designers, as audio directors, as freelance content creators? It isn’t something that any of us have much experience in, because every single game, every single team and every single opportunity is completely new and different, and it really should be approached in that way. Though I do believe the way to start down this path is simple. It is all about the relationships and trust that we have with other people within a team, creative or otherwise. These relationships are entirely defined by trust; this is never about talent, and rarely about experience (unless it is the experience of letting go), and the collaborative motivation is one which we can foreground above all else, and learn to foreground on a daily basis. The sooner we become as integrated as possible, as early as possible, into the veins of the process, and as trusted a design collaborator as possible in the development process, the better for not just the craft of sound, but for the craft of interactive design as a whole. Being a sound designer isn’t about making great sound; it is about making great games, simply by using sound to help solve design problems.