

I quickly made this Venn diagram on the train while trying to define the broad subject areas that useful game audio skills come from… It stems from a frustration at not being able to explain, succinctly, what it is we do… and it also has a seed of inspiration in an article by Ariel Gross (Joining Team Audio) about hiring for game audio teams: if you had only music production experience, you'd be outgunned; if you had only film sound experience, you'd be outgunned; but if you had the right combination of wide-ranging skills, you'd be far more likely to land an interview and thrive inside an audio-focussed game development environment.

So, drawing from the three broad groups of audio production, software development and the creative arts, and listing out some of the subject areas commonly used or discussed in those groups on a day-to-day basis, we begin to build up a solid picture – and at the intersection between them is the thing we know as 'game audio'.

Having splashed it on the page and stepped back to look at it, it becomes clear that those of us who have chosen this career, or are in the process of moving into a career in game audio, should be incredibly proud and thrilled – all these incredible things we get to do every day – and the list expands all the time to include new emerging areas (AR, VR etc).

I've made the diagram available here for those who wish to download it – please feel free to use it wherever you feel it can help highlight and celebrate, at a glance, the complexities of a career in game audio.



I've read a few of these types of columns in the past, and even written one myself back in… 2008, something like that. Looking back, the advice is either specific to a particular time period (“Yo, getting a job in game sound is way easier than getting a job in film sound!”) and… out of date, or… woefully out of date (“Yo, always have a CD-ROM copy of your reel ready to throw like a frisbee at anyone who looks remotely important”). It bothers me that these articles are still lingering around on the internetz and that real live human beings who actually need help with this might happen to read them and think they offer good advice. So, here is a simplified, stripped-down, and soon-to-be-out-of-date list of ways to get a job in game audio…

  1. Be wary of the snake-oil. Spending money won’t help you get into the game audio business. Everything you need in terms of networking is accessible (attend GDC, Develop or MIGS – you don’t even need the pass; network around the fringes of the various meet-ups happening around each conference). More importantly, attend, or even start, your own local game-audio-focussed meet-ups and events. You don’t need to join any organizations; everything you need in terms of social networking is right there for free (#GameAudio on Twitter to start). You don’t need to spend tons of money on hardware or software (Wwise and FMOD are free to download, and there are tons of free video tutorials online). There is an enormous amount of serious reading material available free online. There are some very good value courses available (such as the School of Video Game Audio), which means you do not need to get into debt, or stretch yourself financially, to do this. Need a DAW? Reaper is cheaper! … and independent SFX libraries have almost completely made the expensive SFX library monopoly of the past redundant.
  2. Everyone has an origin story, and it is unique to them. It represents a unique entry point into the industry that closed behind them. You need to find your own way in, your own origin story, and you need to understand that it is very unlikely that what worked for one person will work for another. Everyone is unique, every opportunity is unique, every contract is unique, every game and every game developer is unique. By all means listen to what people have to say, but make up your own mind, take what works for you and follow your own path. There are multiple entry points. The people you should really be listening to about getting into the industry are the ones who made it in during the last year or two – technology, attitudes and requirements change incredibly fast, and you should be listening to people who are connected to the industry day to day, at ground level.
  3. Demo reel. This is my own personal advice, and what I look for (so be wary!), but in a demo reel I like to see three things… technical proficiency with the tools and processes, artistic and creative ideas, and the attitude and personality of the person behind the work. A demo reel that is five minutes of you demoing the implementation of a particular sound project, or passion project, in Wwise, FMOD or Unity, with your webcam in the top corner as you explain what is being presented, would be perfect. If I get a flavour of who you are, your technical skills and your inspirational spark, and if you match what I’m looking for, then we’ll definitely be talking at some point.
  4. Be a part of the game audio community. This is its own reward: you get out what you put in. And hey, if you land a job by being an active member of the community and making connections, that’s all well and good, but do this because you are passionate about and engaged in this community … and not because you just want a job. That is an easy thing to sniff out, and you don’t want that smell on you. The game audio community is something you can shape, and ultimately it is what we want it to be. Get involved, help curate the relevant information!
  5. Do not participate in a culture that crunches. If you allow your talent to be abused by a culture that crunches, then you are condoning the practice by participating in it, and as a result making the industry worse for the people who come after you. Great games can be made under extraordinarily poor working conditions, but those games could have been so much better (fewer bugs, fewer design irritants) with a healthy, rested workforce at optimal energy. Understand the C word and watch Coray Sieffert’s presentation here to get well-armed with the good business case against it. If YOU wish to work extra hours to add QUALITY, I believe that can be planned and achieved on a time-limited basis: provided you have discussed the length of the push with EVERYONE (your colleagues and your family), made sure you have their support, and planned how to ramp up and down ahead of time. In the end, it needs to be YOUR decision, not one you are making because you are being forced (threatened or bullied), or being guilted into it by peers or managers (still bullying). Working sustainably is the only way to have a long and fulfilling career in the games industry without spreading bitterness or negativity.

That’s all for now.


Developer Scale Categories

I am often asked what the difference is between working at a small indie studio and a large triple-A development studio. The difference is scale (and with scale come various layers of communication and management challenges). I put together this categorization of different studio and team sizes/structures, inspired by the movie ‘Pacific Rim’ and its categorization of various sized Kaiju threats.

Blue sky out here this morning, so I’m having a Blue Sky day. *Update at foot of the article…! For the last couple of years now I’ve been using Wwise to mix games. The run-time environment and routing is fast, intuitive and extremely flexible. Bus hierarchies and behaviours are some of the best control mechanisms I’ve found anywhere in any toolset for live mixing, and the suite of debug tools, like the profiler and the Voice Monitor, offers essential visualization of data flow: memory, voice resources, loudness etc. The Voice Monitor window in combination with the capture log is fantastic; however, with a few extra features, it could really change video game mixing workflows. When you start a capture while connected to a live game, the Voice Monitor visualizes, in the form of a timeline, all the events that are triggered, and it is this timeline visualization of events that is the key to further developing this window’s functionality…


The resulting timeline is a history of all the events being stopped or started: when they were triggered, how long they lasted, whether they faded in or out, as well as info on their voice volume at that specific time. It is essentially a timeline recording of the game sound; as you play through a connected game, every event the audio engine triggers is reported and shown in the window. Let’s imagine for a second that Audiokinetic extended this feature and allowed users to ‘record’, save (and even edit) these as ‘capture sessions’ (a new folder in the ‘Sessions’ tab could be home to these). This would allow multiple users to exchange these files and play them back in their Wwise project. A saved ‘capture session’ would become a kind of ‘piano roll’ style event recording of a game playthrough, almost like a MIDI file, and here is where the real benefits for mixing come in: different people inside (or outside) the developer could play through the game using different play styles from the one the audio implementer or mixer has, and a mixer could then mix the game effectively for all of those styles by playing their performances back inside Wwise and tweaking accordingly. This is a kind of ‘performance capture’ of the player.
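To make the idea concrete, here is a minimal sketch of what such a ‘capture session’ data structure might look like. To be clear, none of this is Audiokinetic’s actual API – the event names and fields are hypothetical, and Wwise does not expose capture logs this way today – it’s just the ‘piano roll’ idea as code:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(order=True)
class EventEntry:
    time_ms: int      # when the event fired, relative to capture start
    name: str         # hypothetical event name, e.g. "Play_Footstep"
    action: str       # "start" or "stop"
    volume_db: float  # voice volume reported at trigger time

@dataclass
class CaptureSession:
    entries: List[EventEntry] = field(default_factory=list)

    def record(self, entry: EventEntry) -> None:
        self.entries.append(entry)
        self.entries.sort()  # keep the log ordered by trigger time

    def events_between(self, start_ms: int, end_ms: int) -> List[EventEntry]:
        # "scrub" the timeline: everything triggered inside a time window
        return [e for e in self.entries if start_ms <= e.time_ms <= end_ms]
```

A file like this, saved per playthrough, is what could be exchanged between mixers and replayed against the project, much like passing a MIDI file between musicians.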

Player Styles

I used to see the impact different play styles had on a mix quite often while working on open-world games such as [Prototype], Scarface and [Prototype 2]. We’d spend a lot of time mixing the game with the audio team, slowly creeping around inside the game environment, making sure everything that needed a sound had a sound and that it was all balanced. We also made sure that we were following what we thought were all the correct mission objectives and getting to all the waypoints as directed to complete the missions. The game started to sound pretty good and pretty polished. Enter our QA lead to drive the game for a mix review with the leads, or for a demo of the game to journalists. This guy played totally differently to us, moved through levels completely differently, had a completely unique, insane combat style, and every time he played he’d miss out all the low-level subtle stuff we’d spent a lot of time on and go for these big, flamboyant, flourish-filled ways of showing off the game. Jaws dropped. We could never play the game this way AND mix at the same time. This highlighted that mixing open-world games (or really *any* game) wasn’t going to be as straightforward as we’d first thought – sure, we *knew* there wasn’t just one path through the experience, but we didn’t know just how different that path could be. I’ve often wondered how we could best capture and replicate different play styles without having to constantly pester people to come in and play for us while we tune the mix.
Now, imagine we could spend a day getting 10–15 different players with varying play styles to go through levels in the game while we recorded their playthroughs using an event-based capture session – one that allowed us to later replay these event logs in real time, see and hear exactly what events were triggering, scrub back and forth, spend time zooming in on details and zooming out again to consider the big picture of the mix, re-engineer and re-mix the audio. It would enable us to mix faster, mix for more interactive, player-focussed outcomes, and not have that odd feeling in your gut that the next time you see your shipped game in a Twitch stream, it might sound kind of ‘not good’.

Video Sync

Another missing piece of this proposed capture system would be the ability to drop a video file of the same captured gameplay onto the Voice Monitor timeline and slide it around to sync up with the event log. Saving out these capture sessions would then also include a reference to a video file, so that whoever was tasked with mixing or pre-mixing the level or project would have both the visual reference from the game and all the audio inside the Wwise project to work with and tweak. (Video capture is now easier to achieve with iOS output via the updates in the latest OS X, and capturing from console, or screen capturing via a computer, is also fairly trivial these days – I don’t anticipate Wwise being able to capture video itself, but it would be great to be able to drop in and sync up a video on the Voice Monitor timeline.) Another thing this kind of capture-session playback mixing would enable is for developers to hand over capture sessions to outside studios which, with access to the relevant Wwise project, could hear the game in a calibrated environment – particularly useful if those facilities are not available at the development location. Allowing fresh sets of ears to assess a game’s mix is an essential aspect of game mixing, and these forms of captured gameplay would be an extremely cool way to achieve it.

There are also obvious benefits for extremely large teams who need to co-ordinate a final mix across multiple studios; this could perhaps be achieved more easily by passing around these Wwise event capture sessions. There are certainly proprietary solutions that already take this record-and-playback approach, and not just for audio but for all debugging – alas, these only benefit people inside those organizations. Similar reporting and visualization exists in other third-party engines too, and as those systems are already mostly developed along the right lines, with some extra functionality they could accomplish some of what I am thinking about here.

So, I don’t think it will be long before these kinds of log recording, playback and editing features become a reality within interactive mixing and game development. *Update: unknown to me, literally while I was writing this blog post, this kind of feature became available in FMOD Studio. From Brett… “This is possible with FMOD Studio 1.06 (released 10th April 2015). You can connect to the game, do a profile capture session, and save it. To begin with, the mixed output (sample-accurate PCM dump) is available, and you can scrub back and forth with it while looking at all events and their parameter values. The next bit is the bit you’re after though. You can click the ‘api’ button in the profiler, which plays back the command list instead of the PCM dump, so you get a command playback (MIDI style). If you want, you can take that to another machine that doesn’t have the game, and change the footsteps for a character to duck quacks if you want. You can do anything with it.”

* Added support for capturing API command data via the profiler, when connected to game/sandbox
* Added support for playback of API captures within the profiler, allowing you to re-audition the session against the currently built banks

When considering QUALITY in the output and processes of a game audio department, there are, more often than not, three key areas that need to be considered. These three areas all put user experience first and foremost.


The first area is quality source material. In audio, everything is part of a chain (or signal path), and if you put bad-quality sound in, you tend to get bad-quality sound out. A high-quality, organized recording process is critical to maintaining high-quality source assets. A clean, undistorted signal path is essential for gathering the best possible source sounds, recorded at the highest resolutions, i.e. 24-bit/96kHz (to allow for sample manipulation), and organized into an easily accessible, catalogued, searchable library. Originally recorded material, gathered especially for the requirements of the project, will often yield the finest results. Consideration should always be given to the context something needs to be recorded in, i.e. is it outdoors, indoors, distant, close, wet, dry etc. This applies equally to voices and sound effects. Another essential element is the signal path/output itself: easily re-configurable mixer hierarchies and parametization of sound, with controllable, carefully measurable, predictable and trackable output levels.


Second is the consideration of quality audio creation & implementation tools. This can be measured, and made more efficient, by scrutinizing the time it takes a sound designer to iterate on a sound implementation (create a system flow diagram like this one to find out where your inefficiencies are). The time from creating a source sound asset to hearing it in the game must be as short as possible, and the tools should offer the least resistance to the designer through ease of use and stability. Improvement of tool & game-engine UX should be made a focus: all frustrations should be noted, targeted and removed from the software and pipelines. The more a sound designer is able to iterate on a sound, the closer the experience will get to something that is tuned and satisfactory for the end listener / player.
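As a rough illustration of measuring iteration time, you could instrument the pipeline stages themselves and let the numbers point at the bottleneck. A minimal sketch – the stage names here are hypothetical placeholders, and this simply times whatever work you wrap, not any particular tool:

```python
import time
from contextlib import contextmanager

# accumulated seconds per pipeline stage, across many iterations
timings = {}

@contextmanager
def stage(name):
    # time one step of the source-asset-to-in-game pipeline
    t0 = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - t0

def slowest_stage():
    # the bottleneck is the first place to spend tooling effort
    return max(timings, key=timings.get)
```

Wrapping each step (export, bank build, deploy, game reload) in `stage(...)` over a week of normal work gives you real data on where the friction lives, rather than a gut feeling.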


Finally, the quality of interactions between team members, both within the sound team and with the rest of the multi-discipline team, is critical to the quality of feature development and execution. The more a sound team member can interact in a free-flowing, professional and respectful way with the rest of the team, without constantly pushing through barriers or fighting bureaucracy, the better for implementing features at high quality – and the better for innovation and the development of emergent, opportunistic sound design and cross-discipline influences. Communication must be unclouded, efficient and clear, as must the studio culture that supports the team members and the development process in this regard.

As a secondary part of quality collaboration, good audio TONE TARGETS play a key role in establishing direction and resolving conflicting ideas. Having a central place for high-quality, easy-to-understand documentation, as well as video and audio inspiration, is essential to creating and maintaining a healthy decision-making process inside the team. Key things to consider for tone target material: ‘How should the player feel?’ What are the key adjectives (e.g. hard, digital, harsh, distorted, cold, dark vs warm, safe, soft, protected etc)? In conflict resolution: stay focussed on which proposal delivers best on the tone target.


With these three areas in place and receiving consistent attention, tuning and tweaking, team audio can begin/continue to fulfil its role as a key collaborator in studio culture and development process and always focus on what is the most important thing: delivering a high quality experience for the player.

note: this is a re-write of some ideas first floated around in this earlier blogpost.

(first published on Gamasutra)

A tweet from sound designer Kelly Pieklo about making the transition from linear to non-linear sound design, and about how sound designers get to determine the parameters that can drive, control and transform the sound elements in a game, got me thinking.

For ease of writing, I’m taking the term ‘parameter’ to refer to all the various elements of game data that can be mapped onto audio – including states, triggers, switches and variables.

In film sound, there isn’t really a concept of parametric data from the other departments that the sound designers can use to drive their sounds. Perhaps the closest analogy would be an OMF of temp picture cuts which the sound editors can import into their sessions to keep up to date with scene and shot changes during post-production. Often the ‘parameters’ are supplied by the director, and are not tangible programmatic variables that alter over the course of the movie, but ideas that need to be interpreted by the sound designer, and implemented through more abstract methods.

Imagining some well-known movie plots with parameters that control their overall sound is a fun proposition. How would we plot the movement towards Kurtz in Apocalypse Now and affect the sound, perhaps via a ‘Distance to Kurtz’ parameter? In The Conversation, we could have a parameter for Harry Caul’s ‘paranoia level’. I’m sure these are too high-level to function, but there is something we could do with those ideas once parametized, and it is a great start for thinking about the main thread of a plot or narrative and breaking it down into more interactive ways that the sound can be affected overall.

I was wondering how feasible it might be to not only have the technical and obvious parameters that we deal with most of the time in game sound, but also a whole new group of more abstracted parameters that reflected things like how the player felt (gathering biometric data from players is something that has been discussed a fair bit recently) or, in terms of more narrative game experiences, how the character ‘feels’.

I recall a feature we implemented in the open-world Scarface game which kind of did this: Tony Montana’s ‘Rage Meter’. If you built this meter up enough, you unlocked the ability to enter ‘Rage Mode’, at which point Tony was able to go into a blind rage, in first person, for a limited amount of time. Now, this wasn’t really a fully scalable parameter with many gradual nuances, but more a switch mechanism for a gameplay mode – but the interesting thing is that it was directly mapped onto how the character felt and behaved, and as the emotional state of the main character changed, altering his point of view, so too did the sound, music and dialogue employed during that mode. Sounds were pitched down and filtered, with weapon sounds pushed forward in the mix; dialogue switched to utterly insane swearing (as opposed to the regular conversational and relaxed swearing that denoted ‘normal’ gameplay); and music switched to atonal Giorgio Moroder synth washes that occurred in the same scenes in the motion picture.
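As a sketch of the idea – the values and names below are illustrative, not the actual Scarface data – the Rage Meter behaved less like a continuous parameter and more like a two-position switch that swapped a whole bundle of mix settings at once:

```python
from dataclasses import dataclass

@dataclass
class MixSettings:
    pitch_cents: int        # global pitch offset applied to world SFX
    lowpass_cutoff_hz: int  # filter clamping the world down
    weapon_gain_db: float   # push weapons forward in the mix
    dialogue_set: str       # which pool of lines the character draws from
    music_state: str        # which score variant plays

def settings_for_rage(rage_active: bool) -> MixSettings:
    # a two-position switch driven by the character's emotional state;
    # all numbers here are made up for illustration
    if rage_active:
        return MixSettings(pitch_cents=-300, lowpass_cutoff_hz=2000,
                           weapon_gain_db=4.0, dialogue_set="rage_insane",
                           music_state="atonal_synth_wash")
    return MixSettings(pitch_cents=0, lowpass_cutoff_hz=20000,
                       weapon_gain_db=0.0, dialogue_set="conversational",
                       music_state="normal")
```

A fully scalable emotional parameter would interpolate these fields continuously instead of snapping between two presets – which is exactly the gap between a mode switch and a true point-of-view parameter.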

Narrative, emotional, or point-of-view parameters might be challenging to figure out, but I think there are lots of opportunities to think more abstractly, and less technically, about game parameters. This recent talk by Randy Thom at the Mix Magazine Immersive Sound Conference gives plenty of nourishment for thoughts in this direction, particularly about point-of-view.

Game parameters and switches are mostly the servants of reality, believability and simulation: time of day, relative distances, footstep surface type, speed, height, density etc. I think these technical parameters, while entirely necessary, are really just the foundation of believability for sound integration and synchronization with the game engine. In an open-world or simulation title, there are likely to be many more of these kinds of ‘reality’-based parameters and switches.


(above – all of the parameters in my current project are technical, or based on simulation)

It may not be practical to parametize the emotional vectors of a game narrative, or even necessary. Perhaps music ‘states’ are the best example of something already a little more abstracted and closer to the emotional pulse of the game – states are the most likely to drive music or atmospheric transitions, and as such offer some tantalizing ways to start thinking about also affecting sound and dialogue. Perhaps when a music state changes from ‘calm’ to ‘fear’, there are a great many more opportunities to alter the way the sound and dialogue are presented to the player too. Maybe, without realizing it, music states are mapping the narrative and emotional beats of the game for us, and maybe tapping into these states to make changes in the rest of the soundtrack is one of the biggest opportunities for much deeper game sound integration.
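To sketch what ‘tapping into music states’ might look like in practice – this is a toy state machine, not any engine’s or middleware’s API – sound and dialogue changes could simply subscribe to the transitions the music system already makes:

```python
# listeners notified whenever the music state moves (hypothetical mechanism)
_listeners = []

def on_music_state_change(fn):
    # decorator registering a callback for music state transitions
    _listeners.append(fn)
    return fn

class MusicStateMachine:
    def __init__(self, initial="calm"):
        self.state = initial

    def set_state(self, new_state):
        old, self.state = self.state, new_state
        for fn in _listeners:
            fn(old, new_state)

# sound and dialogue piggyback on the beats the music already tracks;
# mix_log stands in for real mixer/VO calls
mix_log = []

@on_music_state_change
def retune_soundtrack(old, new):
    if new == "fear":
        mix_log.append("duck ambience, widen reverb, switch VO to whispers")
    elif new == "calm":
        mix_log.append("restore ambience, dry up reverb, normal VO")
```

The point of the sketch: the music system has already done the hard narrative work of deciding we are in ‘fear’, so the rest of the soundtrack gets that emotional mapping for free.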

In June 2008 I was fortunate enough to sit in and observe a couple of days’ mixing at Skywalker Ranch with Randy Thom and Tom Myers. I recently found these notes and thought it would be good to post them here.

Theatrical Mix in Dolby EX (6.1) for re-released print of the film. (Now Available as a Blu-Ray)


FX Mixer, Randy Thom

Music and Foley, Tom Myers

The dialogue mix (already done by the clients) is usually given to the more experienced mixer, as the best chops are needed to make production dialogue sound its best. This is less of an issue on animated features.


Each section – dialogue, music, foley and fx – runs on a separate computer (back in the machine room), all slaved via timecode to the reel. Prior to the mix, an editor (dialogue editor, sound editor, foley editor etc) builds the ‘pres’ or ‘pre-dubs’ as Pro Tools sessions. These are then all brought onto the mix stage for the final mix.

The mixers work through the feature a reel at a time: first listening through the whole reel, then going back and mixing what needs addressing. Each mixer takes turns while the other breaks, because otherwise they would step on each other’s toes and end up missing something the other wanted to write. It was a mix of simply setting levels for tracks and writing automation for more involved ducking etc, all done at the desk. The assistants assign sounds to channels for the mixers, and the mixer then watches the scene and rides the faders. Every sound is on a separate track.

The mixers bounce ideas off one another, and the editors also chip in with suggestions, which makes the whole mix process quite democratic and conversational. It makes good sense to ‘test’ mix ideas and suggestions by bouncing them off the other people in the room – which is why one person doing a mix alone does not make much sense, although this does happen.

The mixers make notes on timecode areas (measured in feet) that they wish to revisit, and they punch in and roll back to these spots themselves, then record the automation to a master. Notes are also received from the client about particular areas they wish to revisit, and these are also addressed in the mix.

A surprising amount of sound effects design and sound replacement also happens all the way through the mix. Sections are extended, under the direction of the mixer, by the editors, and new or replacement sounds are found and dropped in. While I was there, several scenes had sounds added to them, from additional tire squeals to subtle background additions like ship horns and distant car horns in scenes where there was a suitable gap in the dialogue. The editors either go offline to find the sounds in Soundminer, or copy sounds from elsewhere in the session. When a client is present, a lot more of this kind of thing happens.


Randy talked about the rule of 100%, whereby everyone who works on the soundtrack of a film assumes it is 100% of their job to provide the content for the feature. So the composer will go for all the spots they can, as will the dialogue, and the same with the sound editors. When it comes to the mix, this often means there is little room for any one particular element to shine, which means more mixing decisions have to be made – and often the result sounds as if the music, for example, has simply been turned down. In more aesthetically successful movies, collaboration happens earlier; composers decide it is fine to just drop certain cues, etc. When Randy is mixing, he wears the mixer’s hat and is at the service of the story and the film, and he often decides to get rid of sounds that he has personally worked hard on.

Sometimes ideas about particular key scenes and mix approaches are discussed early with the director, at the script stage; Randy works this way with Robert Zemeckis. However, not enough directors consider sound in pre-production, and they often end up with the 100% situation and a lot more things to ‘fix’ in the final mix – lots of messy, chaotic sound to figure out.

In Ghost in the Shell, there is very little music, and because of this, where it is used it has a very powerful, meaningful effect on the story and the audience. This leaves a lot of great opportunities and space for sound design – some very musical sound design, such as the ambient ship horns, was able to occur without offending the composer (musical sounds, i.e. sounds of a particular pitch, can be perceived by the audience as part of the music, particularly if they are in ‘tune’ with the underscore). A lot of the backgrounds in the feature are also very musical. Foley is very soft, clean and rich. Randy made the point that they tend not to use shoes that are very clicky, as they sound too much like ‘foley’; they use trainers, soft shoes, even moccasins and slippers, so the foley stays out of the way and doesn’t jump out as obviously foley. Randy also said that pink noise can be used for foley: just have a track with pink noise on it, and ride and EQ the fader so that it matches the movement! A little film trick!

Dialogue and sounds in the movie were panned very unconventionally for a feature film. Dialogue remained positional to the characters even when they were off screen, often meaning the sound would jump to a rear speaker with a visual cut – quite original and brave, I thought, although those mix decisions were made by the clients in Japan. The music soundtrack had been re-mastered in surround, and the film was mixed in Dolby EX for a theatrical re-release. So if a theatre has its rear speakers turned off for whatever reason, the audience may miss some dialogue.

Randy discussed mixing as a series of choices about what to hear at any particular moment; it is the graceful blending from one mix moment to the next that constitutes the actual mix. These decisions come from the story: what is important at any particular moment, what the audience needs to hear and focus on. He mentioned that cinema with deep-focus photography often makes things easier to ‘focus’ on with sound. In action scenes, particularly longer ones, it becomes difficult to move from one thing to another constantly, especially if the script allows no brief let-up of action for the sound to take a break. We talked about the extended chase scene in The Bourne Ultimatum as a good example of handling this well: a scene with no music, dropping various things out at various times. The scene is well written for sound and well mixed. He also cites Spielberg movies as good examples of how to use sound and mixes well. The arrival of the T-Rex in Jurassic Park is often mentioned to him as an effect a director wants to emulate, yet there is no music in that scene; directors often go to music first to try to achieve the emotional effect. Saving Private Ryan is also cited a lot as an effect directors want to achieve – again, no music in the opening scene. Knowing when not to use music seems to be a decision best taken at the writing stage of development, though deciding to drop cues can also work at the final mix.

There is a quote that is often thrown around in film and game sound circles about the rule of 100%. I believe the idea originates from Ben Burtt, but is often repeated and conveyed by various respected sound designers, especially in film. I’m paraphrasing, but it goes something like this…

“Everyone on a film assumes it is 100% of their job to tell the story, the composer will write music that hits all the major plot points and moods, the writers cover everything in dialogue telling 100% of the story, and the sound designers will cover every single moment with effects to carry 100% of the movie/game/whatever” – I actually found a better reference to this in Randy Thom’s Designing a Movie for Sound essay found here (section: opening a door for sound)

At the end of a production this feels very true, and especially so when you are sitting in a final mix, trying to figure out what on earth you are going to get rid of in the moment-to-moment mix. What is important at any given moment? In film, this is where collaboration with the director kicks into high gear and the ‘audience’s experience of the story’ really gets into the veins of the soundtrack – a final mix is, if you like, the ‘implementation’ of the story via the soundtrack. The decisions about what has prominence at any moment are made through discussion, which is certainly easier in film thanks to the linearity of the medium: sometimes music is foregrounded, sometimes sound effects, sometimes (most often) dialogue. In video game mixes, the experience can be completely different depending on the team involved, the size of that team, and the scope of the project. Sometimes one person mixes the game and makes all these decisions, at least armed with the knowledge of what the game design and experience needs to convey. On bigger projects it might be a small, multidisciplinary group of leads who sit together and talk through the decisions – either way, the process is complicated by technology and workflow.

I like the idea of sound, music and FX contributing to the storytelling in equal measure. This is certainly more appealing than thinking that each of these elements will attempt to create a logjam by providing 100% each, leaving it to the final mix to sort out the priorities at each moment. I’ve heard of sound editors in film even providing more than 100% coverage, by having multiple different ‘options’ available on the dubbing stage.

Now, this is an idealized, utopian scenario, and every project makes different demands of each of the three main threads of sound. But as a starting point, and as a way of thinking about what will be important in your project, breaking the soundtrack down into three chunks that are each ‘ideally’ responsible for 33% may work better.

33% of the soundtrack will be about music moments.

33% will be about sound moments.

33% will be about dialogue moments.

It is an oversimplification, and perhaps the practicalities of budgeting and rework make it a difficult proposition, but it is a better starting point than the 100% rule, which creates that logjam at the back end. Thinking about these numbers at the beginning of a project, rather than the 100%, is a more realistic guideline for everyone involved. It should encourage more forethought and planning as to ‘whose moment’ each moment is, and it might kick into gear some early mapping of a project in terms of FX, music and VO. All of these elements simply can’t be going all the time, so these decisions do need to be made.

Perhaps an even simpler pre-check before commissioning any sound work should run along these lines…

Should it make a sound? (Yes / No)

Should it have a music cue? (Yes / No)

Should it be conveyed through dialogue? (Yes / No)

The emphasis here is on reducing the overall amount of sound, rather than increasing the overall sound content.
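
The three-question pre-check above can be captured as a trivial data structure that flags moments calling on every thread at once. This is a hypothetical sketch, not any real pipeline tool; the event names, the `SoundBrief` class and the `coverage` helper are all my own invention:

```python
# Hypothetical pre-commission check: before any asset is made, record
# which of the three threads (FX, music, dialogue) a moment actually needs.
from dataclasses import dataclass

@dataclass
class SoundBrief:
    moment: str
    needs_fx: bool
    needs_music: bool
    needs_dialogue: bool

    def coverage(self) -> int:
        """How many of the three threads this moment calls on."""
        return sum([self.needs_fx, self.needs_music, self.needs_dialogue])

briefs = [
    SoundBrief("door_open", needs_fx=True, needs_music=False, needs_dialogue=False),
    SoundBrief("boss_reveal", needs_fx=True, needs_music=True, needs_dialogue=False),
    SoundBrief("quiet_exploration", needs_fx=True, needs_music=False, needs_dialogue=False),
]

# Moments where all three threads are 'on' are candidates for reduction --
# the point of the pre-check is to say 'no' early, not at the final mix.
overloaded = [b.moment for b in briefs if b.coverage() == 3]
```

Even a throwaway list like this, made at the commissioning stage, forces the ‘whose moment is it?’ conversation to happen up front rather than on the dubbing stage.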

Leaving the ‘what plays and what doesn’t play’ decisions to a final mix makes a lot of work for yourself in those crucial few weeks at the end, and, more often than not, the finished project will sound like ‘music was turned down here’ and ‘sound effects were turned down there’, rather than the coordinated orchestration of specifically written and implemented music, VO and FX found in, for example, The Last of Us. In that game, no one element feels as though it is trying to overpower the others; they seem to be very much working together, and the more you think about this (because it isn’t something you notice while you are playing and enjoying the experience), the more you realize it was all very carefully thought out in advance and didn’t just happen to ‘come together’ at the last moment.

I like the idea of a composer setting out with the knowledge that their contribution is going to be only a third of the entire soundtrack. Similarly, I like the idea that writers are starting out with the notion that one third of the experience is going to involve spoken dialogue. I like the idea that, as content creators, we can fully expect, from the outset, to throw away 66% of the responsibility to carry everything on our shoulders. It is also just good, common-sense editorial practice.

Being the sole audio developer at an indie studio, with a background as an audio director, I tend to think of any project immediately in terms of it being my responsibility to cover 100% of the soundtrack (foley, FX, ambience, music, VO). But it is only when I start to think about actually creating the content that I realize it isn’t anywhere close to 100% of my own sound or music work that will do this, but the work of many collaborators. It is very important, I’ve realized, to define the scope of what is needed at the point of delegating out the work, along with a schedule for its completion and integration. On any project where I have contributed sound or music myself, I always find I have a hard time ‘removing’ things at the mixing stage. I’m just too attached. I can see the amount of work that has gone into things, and it is natural to resist decisions whereby that content is effectively removed or demoted, even when it is for the good of the project.

This is why I believe we have so much to learn from watching and listening to mixers. There is a useful, Eno-like idea that in attempting to mix a project you wear the mixer’s hat: not the sound designer’s hat, not the friend-of-dialogue (writer’s) hat, and not the composer’s hat. Those are no longer your roles. It is in wearing the mixer’s hat that you are allowed to detach yourself from the work done up to that point on every element of the content, and make cold, hard decisions about what is needed, what can be pushed to the foreground, and what can be removed. Mixing is a subtle art in that decisions don’t need to be black and white (‘either there is music or there isn’t’); several things can co-exist up to a point. Music can be ducked out of the way yet still be audible, as can backgrounds and FX. A massive part of that subtle art is also political (though it doesn’t really need to be). However, it is at this point of ducking things that you realize a far better approach would have been to design the music to get out of the way at that particular moment in the first place. Predicting these moments where possible enhances the interrelationship between the three major food groups of a soundtrack (leading to a more cohesive and telepathic whole), makes for a better experience for the audience/player, and makes ‘mixing’ so much easier. Another way to think of good planning is as ‘mixing in pre-production’.
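
Ducking of the kind described above reduces, at its simplest, to a side-chain gain rule: when the foregrounded element is active, the ducked bus is attenuated but not silenced. This is an illustrative toy, not the behaviour of Wwise or any specific engine; the function names and the -9 dB figure are my own assumptions:

```python
# Toy ducking rule: while dialogue is active, the music bus is pulled
# down by a fixed amount in dB -- quieter, but still audible underneath.
def duck_gain_db(dialogue_active: bool, base_music_db: float = 0.0,
                 duck_amount_db: float = -9.0) -> float:
    """Return the music bus gain in dB for the current mix moment."""
    return base_music_db + (duck_amount_db if dialogue_active else 0.0)

def db_to_linear(db: float) -> float:
    """Convert a dB gain to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

# With dialogue playing, music sits ~9 dB down: present, not removed.
music_gain = db_to_linear(duck_gain_db(dialogue_active=True))
```

The point of the article stands even in this toy: a rule like this is a run-time patch, whereas music composed to leave space at that moment needs no rule at all.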

This is something I’m trying out when initiating new projects: thinking about the final effect, and the final mix decisions, long before we actually arrive there. In reality, the closer we get to a final mix, the closer we get to determining exactly what is required of each of the three components of the soundtrack. Some of the most useful tools I’ve found for this are narrative, or gameplay, dynamics maps (detailed here) – these give an idea of what is needed from each of the three elements, though they are like graphic scores that leave a great deal of interpretation to the artist charged with creation. At the very least, understanding the fundamentals of the dynamics involved in a project gives rise to healthy discussions about whose responsibility it is to, say, carry action scenes, as opposed to ambient scenes, exploration, and moments of ambiguity. Shifting the focus of sound work towards understanding the interrelationships between the three main threads of a soundtrack much earlier in a project is where I see so much scope and opportunity in development right now, no matter what the technology or delivery mechanism for the game.

It is Monday morning. So I thought I’d put something together that I’ve been meaning to do for a while: a process document detailing some of the high-level decision-making and processes that go into the creation and implementation of sound for a game, from the asset to the code.


Audio Iteration Process [Click to Enlarge]

Doing this highlights the importance of a generalist skill set in game audio (for those either looking to get into game audio, or looking to improve and grow their skill-set areas). Not only are there groups of very specific processes, like the recording and editing block at the top of the document (in RED) and the implementation block (in GREEN) towards the bottom, but there is also the need for a complete interconnectedness, involving social relations and collaboration, in order for the model to work at all (decisions, reviews, communication).

Now, this document was put together with sound design and implementation in mind, but I think it is every bit as applicable to MUSIC and VOICE production. I also think that viewing the processes and decision-making like this makes it very clear how our production and collaboration processes can be improved (e.g. fewer implementation steps using separate software is always a goal). A voice workflow, for example, often works iteratively at the RECORDING stage (getting many takes of the same lines in different ways to give more choices later on) rather than at the REVIEW stage (although callbacks and re-writes have become more commonplace), meaning that hearing voice IN CONTEXT and making review and direction decisions is less context-led than it is in sound FX design. There are many industrial reasons why this is different, but opening up the FX iteration path visually certainly allows us to see where we might innovate and improve some of the more rigid industrial structures that are imposed, rather than designed.

Another area I wanted this to highlight is the ITERATION process. This is the most fundamental part of the whole thing; in fact, it is the REVIEW & ITERATION cycle that drives the whole model. Until you get a sound into the game, triggering and playing back, you can never know whether it is doing its job. Chances are, nine times out of ten, that it is not, or that it could be improved with a tweak of some kind. There is always something that needs to be done. Sometimes it is the re-recording of new material, which means a journey back to the beginning of the process. Sometimes it is re-visiting assets in the sound library, and sometimes it is down to tweaking in the run-time realm of the game and audio engine. The idea is that the more this process is repeated, the fewer times you have to revisit the areas nearer the beginning, and the more time you can spend refining the run-time, game-parameter side. All iteration processes aim to refine what is there, and the sooner you can get ANYTHING into the game, the sooner you can start getting closer to the run-time.

Another thing to note is that there is no ‘finished?’ or ‘complete?’ stage in this process. That is simply because I don’t think the process ever really ends until the game is ripped out of your hands; it constantly gets ‘closer’ to finished, particularly the more times you can go through the latter trigger stages of the flow, but it never really ends. Another reason is that the game itself is changing underneath your feet, so sounds and implementations are often required to change to ‘keep up’ with the current architectural and optimization snapshot of the game.

I was also writing about a hybrid procedural audio model on Friday, and this is not accommodated in this flow, but would either be a new path of procedural sound object creation and testing (to replace the RED path), or become a part of the implementation (GREEN) path – ideally replacing the recording and editing stage entirely and shifting heavily towards a more implementation and iteration-based flow.

I made the document in Lucidchart. It is awesome, free and very easy to use.

Sample-based vs Procedural: it’s not quite as dramatic as an all-out death match between these two approaches and philosophies, even though the temptation is to see things in either/or, black/white terms.

One thought is that procedural audio, even though it has been around for a while now, is still fledgling. Even though there are inherent ‘cost’ savings to this method of sound generation and propagation (particularly in games with huge amounts of content), finding a home in a largely risk-averse entertainment software industry is a big ask while the applicable approaches still feel fundamentally ‘experimental’. The thing I’ve come to realize, perhaps somewhat later than everyone else (and perhaps because of the ‘either/or’ polemics), is that many of the techniques and tools we are using are already in transition to a more procedural status.

This is just a quick categorization attempt that I wanted to get down before it evaporates with the rest of my thoughts and doodles on a Friday morning…

The Sample-Based Approach

Relying entirely on streaming or preloaded sample-based assets sitting on a disc.

(Most games of the PS2/PS3 generation and some mobile games today)

Re-triggering of pre-recorded material, usually wave file assets.

The Procedural Approach

Moving the sound generation effort from the disc (and the streaming throughput bandwidth) to the processor.

Synthesis-based sound objects, acoustic models, grain players, noise-shaping and DSP-intensive techniques – in essence, everything is generated at run-time, based on (hopefully) elegant, efficient and simple real-time models.

(Currently fringe aesthetic games and some music-based games)
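
To give a flavour of what ‘noise-shaping’ means in practice, a minimal procedural wind-like texture can be built from white noise run through a one-pole low-pass filter whose cutoff is driven by a run-time parameter. This is a toy sketch under my own assumptions (the `intensity` mapping and cutoff range are invented for illustration, not taken from any engine):

```python
import math
import random

def one_pole_lowpass(samples, cutoff_hz, sample_rate=44100):
    """Shape a signal with the one-pole recurrence y[n] = y[n-1] + a*(x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, out = 0.0, []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out

def wind(duration_s=0.1, intensity=0.5, sample_rate=44100, seed=1):
    """Generate a wind-like buffer; 'intensity' is a run-time parameter
    mapped to filter brightness (a stand-in for, say, wind speed)."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(int(duration_s * sample_rate))]
    cutoff = 200.0 + 1800.0 * intensity  # brighter as intensity rises
    return one_pole_lowpass(noise, cutoff, sample_rate)

buf = wind()
```

The appeal is clear from the sketch: no asset on disc, and the sound responds continuously to a game parameter instead of cross-fading between pre-recorded variations.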

For me, just writing these two (admittedly loose) definitions down made me realize that any proposal to use either of these models exclusively would need to be either a) aesthetically niche or b) technically or artistically challenged in some way. And even though I tried to say definitively which games use these approaches, I think I’m on unsafe ground with my generalizations. It also made me realize that, of course, there is already a ton of crossover between these categories in most proprietary sound engines, and certainly inside middleware audio solutions. A purely sample-based approach is probably getting quite rare these days. So, are we in the midst of a hybrid approach without even really realizing it?

Hybrid Procedural Approach

(Most console games today)

A fundamentally sample-based approach, but one that goes much further on the implementation side than simple triggers. Breaking sounds down into constituent molecules (granular) or small recognizable chunks (automatic weapons). Parameterization of sound. Sound ‘shaping’ in the form of procedural DSP used for ‘additional layers’ such as reverbs, filters and flutter. Some SoundSeed Air implementation in Wwise, but as a subtly mixed-in ‘layer’ rather than supplying the overall effect.

We are using procedural techniques and technologies more and more in the form of reverbs and DSP effects. And in our implementation we are thinking more procedurally about sound, even if we still use sample-based playback material as the starting point and raw material. My feeling is that we have moved towards this often without even realizing the big picture. Could this slow-bleed approach eventually end up with interactive sound designers working completely with acoustic models and unique sound-object-based propagation? Perhaps for certain genres and platforms. But it is difficult to imagine a move away from a hybrid position into exclusivity. I can, though, see certain projects leaning one way or the other.
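
The ‘subtly mixed-in layer’ idea above reduces, at its simplest, to summing a quiet procedural signal under a sample-based asset. A toy illustration (the buffers here are random stand-ins for a real wav asset and a synthesized flutter layer; the 0.15 gain is an arbitrary assumption):

```python
import random

def mix_hybrid(sample_buf, procedural_buf, layer_gain=0.15):
    """Sum a low-level procedural layer under a sample-based asset,
    so the synthesis colours the sound rather than supplying it."""
    return [s + layer_gain * p for s, p in zip(sample_buf, procedural_buf)]

rng = random.Random(7)
sample_layer = [rng.uniform(-0.5, 0.5) for _ in range(1024)]  # stands in for a wav asset
proc_layer = [rng.uniform(-1.0, 1.0) for _ in range(1024)]    # stands in for synthesized flutter
out = mix_hybrid(sample_layer, proc_layer)
```

In a real engine this summing happens on a bus with the procedural layer driven by game parameters, but the principle is the same: the recording remains the raw material, and the procedural element is a shaping layer on top.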

Chances are if you work in game audio, you are already working in a hybrid procedural audio world.