Blue sky out here this morning, so I’m having a Blue Sky day. *Update at the foot of the article! For the last couple of years now I’ve been using Wwise to mix games. The run-time environment and routing is fast, intuitive and extremely flexible. Bus hierarchies and behaviours are some of the best control mechanisms I’ve found in any toolset for live mixing, and the suite of debug tools, like the profiler and the Voice Monitor, offers essential visualization of data flow: memory, voice resources, loudness and so on. The Voice Monitor window in combination with the capture log is fantastic; with a few extra features, however, it could really change video game mixing workflows. When you start a capture, the Voice Monitor visualizes, in the form of a timeline, all the events that are triggered while connected to a live game, and it is this timeline visualization of events that is the key to further developing this window’s functionality…
The resulting timeline is a history of all the events being stopped or started: when they were triggered, how long they lasted, whether they faded in or out, as well as their voice volume at that specific time. It is essentially a timeline recording of the game sound, and as you play through a connected game, every event the audio engine triggers is reported and shown in the window. Let’s imagine for a second that Audiokinetic extended this feature and allowed users to ‘record’, save (and even edit) these as ‘capture sessions’ (a new folder in the ‘Sessions’ tab could be home to these). Multiple users could then exchange these files and play them back in their Wwise project. This would turn a saved ‘capture session’ into a kind of ‘piano roll’ event recording of a game playthrough, almost like a MIDI file, and here is where the real benefits for mixing come in: different people inside (or outside) the developer could play through the game using play styles different from the audio implementer’s or mixer’s own, and the mixer could then mix the game effectively for all of these styles by playing their performances back inside Wwise and tweaking accordingly. This is a kind of ‘performance capture’ of the player.
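To make the ‘piano roll’ idea concrete: at its simplest, a capture session is just a timestamped event log that can be saved, exchanged and scrubbed. Here is a minimal sketch of what such a data structure might look like — all the names here are hypothetical illustrations, not actual Wwise API calls:

```python
from dataclasses import dataclass

@dataclass
class CapturedEvent:
    timestamp: float   # seconds from the start of the capture
    event_name: str    # e.g. "Play_Footstep"
    action: str        # "start" or "stop"
    volume_db: float   # voice volume reported at trigger time

class CaptureSession:
    """A 'piano roll' of game audio events that can be saved and replayed."""
    def __init__(self):
        self.events = []

    def record(self, timestamp, event_name, action, volume_db=0.0):
        self.events.append(CapturedEvent(timestamp, event_name, action, volume_db))

    def events_between(self, t0, t1):
        """Scrubbing: return the events that fire within a window of the timeline."""
        return [e for e in sorted(self.events, key=lambda e: e.timestamp)
                if t0 <= e.timestamp < t1]

# A tiny playthrough: two footsteps, then combat music starts
session = CaptureSession()
session.record(0.0, "Play_Footstep", "start", -12.0)
session.record(0.5, "Play_Footstep", "start", -12.0)
session.record(1.2, "Play_Music_Combat", "start", -6.0)

# Scrub through the first second of the capture
print([e.event_name for e in session.events_between(0.0, 1.0)])
```

Replaying such a log against the project’s current banks is what would let a mixer re-audition a playthrough and tweak the mix without the player present.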
I used to see the impact different play styles had on a mix quite often while working on open-world games such as [Prototype], Scarface and [Prototype 2]. We’d spend a lot of time mixing the game with the audio team, slowly creeping around the game environment, making sure everything that needed a sound had a sound and that it was all balanced. We also made sure we were following what we thought were the correct mission objectives and hitting all the waypoints as directed to complete the missions. The game started to sound pretty good and pretty polished. Enter our QA lead to drive the game for a mix review with the leads or for a demo to journalists. This guy played totally differently to us, moved through levels completely differently, had a completely unique, insane combat style, and every time he played he’d miss out all the low-level subtle stuff we’d spent so much time on and go for big, flamboyant, flourish-filled ways of showing off the game. Jaws dropped. We could never play the game this way AND mix at the same time. This highlighted that mixing open-world games (or really *any* game) wasn’t going to be as straightforward as we’d first thought – sure, we *knew* there wasn’t just one path through the experience, but we didn’t know just how different that path could be. I’ve often wondered how we could best capture and replicate different play styles without having to constantly pester people to come in and play for us while we tune the mix.
Now, imagine we could spend a day getting 10–15 players with varying play styles to go through levels in the game while we recorded their playthroughs using an event-based capture session – one that allowed us to later replay these event logs in real time, see and hear exactly which events were triggering, scrub back and forth, zoom in on details and back out again to consider the big picture of the mix, then re-engineer and re-mix the audio. It would enable us to mix faster, mix for more interactive, player-focussed outcomes, and avoid that odd feeling in your gut that the next time you see your shipped game in a Twitch stream, it might sound kind of ‘not good’.
Another missing piece of this proposed capture system is the ability to drop a video file of the same captured gameplay onto the Voice Monitor timeline and slide it around to sync up with the event log. Saved capture sessions would then also include a reference to a video file, so that whoever was tasked with mixing or pre-mixing the level or project would have both the visual reference from the game and all the audio inside the Wwise project to work with and tweak. (Video capture can now be achieved more easily via iOS output with the updates in the latest OS X, and capturing from console or screen capturing via a computer is also fairly trivial these days – I don’t anticipate Wwise being able to capture video, but it would be great to be able to drop in and sync up a video on the Voice Monitor timeline.) This kind of capture-session playback mixing would also allow developers to hand capture sessions over to outside studios which, with access to the relevant Wwise project, could hear the game in a calibrated environment, particularly if those facilities are not available at the development location. Allowing fresh sets of ears to assess a game’s mix is an essential aspect of game mixing, and these forms of captured gameplay would be an extremely cool way to achieve it.
There are also obvious benefits for extremely large teams who need to co-ordinate a final mix across multiple studios; this could perhaps be achieved more easily by passing around these Wwise event capture sessions. There are certainly proprietary solutions that already take this record-and-playback approach, and not just for audio but for all debugging, alas these only benefit those with access inside their organizations. Similar reporting and visualization exists in other third-party engines too, and as these systems are already mostly developed along the right lines, with some extra functionality they could accomplish some of what I am thinking about here.
So, I don’t think it will be long before these kinds of log recording, playback and editing features become a reality within interactive mixing and game development. * Unknown to me, literally while I was writing this blog post, this kind of feature became available in FMOD Studio. From Brett… “This is possible with FMOD Studio 1.06 (released 10th April 2015). You can connect to the game, do a profile capture session, and save it. To begin with, the mixed output (sample accurate pcm dump) is available, and you can scrub back and forth with it while looking at all events and their parameter values. The next bit is the bit you’re after though. You can click the ‘api’ button in the profiler which plays back the command list instead of the pcm dump, so you get a command playback (midi style). If you want, you can take that to another machine that doesn’t have the game, and change the footsteps for a character to duck quacks if you want. You can do anything with it.”
* Added support for capturing API command data via the profiler, when connected to game/sandbox
* Added support for playback of API captures within the profiler, allowing you to re-audition the session against the currently built banks
When considering QUALITY in the output and processes of a game audio department, there are, more often than not, three key areas to consider. All three put user experience first and foremost.
QUALITY SOURCE & SIGNAL PATH
The first area is quality source material. In audio, everything is part of a chain (or signal path), and if you put bad-quality sound in, you tend to get bad-quality sound out. A high-quality, organized recording process is critical to maintaining high-quality source assets. A clean, undistorted signal path is essential for gathering the best possible source sounds, recorded at the highest resolutions, i.e. 24-bit 96kHz (to allow for sample manipulation), and organized into an easily accessible, catalogued, searchable library. Originally recorded material, gathered especially for the requirements of the project, will often yield the finest results. Consideration should always be given to the context in which something needs to be recorded, i.e. is it outdoors, indoors, distant, close, wet, dry etc. This applies equally to voices and sound effects. The other essential element is the signal path on the output side: an I/O signal path with easily re-configurable mixer hierarchies and parametization of sound, and controllable, carefully measurable, predictable and trackable output levels.
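The ‘sample manipulation’ point about 24-bit 96kHz can be illustrated with a little Nyquist arithmetic – a quick back-of-the-envelope sketch (the function name is just for illustration):

```python
# Back-of-the-envelope: why 96 kHz source material survives pitch-shifting.
# Nyquist: a recording at sample rate sr can represent frequencies up to sr / 2.
def bandwidth_after_pitch_down(sample_rate_hz, semitones_down):
    """Highest frequency remaining after slowing playback to pitch the sound down."""
    ratio = 2 ** (-semitones_down / 12)   # playback speed multiplier
    return (sample_rate_hz / 2) * ratio

# Pitch a 96 kHz recording down a full octave (12 semitones):
print(bandwidth_after_pitch_down(96_000, 12))   # 24000.0 Hz -- still the full audible range
# The same trick on a 44.1 kHz recording:
print(bandwidth_after_pitch_down(44_100, 12))   # 11025.0 Hz -- audibly dull
```

In other words, an octave of downward pitch manipulation leaves a 96kHz recording with content right up to the edge of human hearing, while a 44.1kHz source loses the whole top end.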
QUALITY TOOLS / PIPELINES
Second is the consideration of quality audio creation & implementation tools. This can be measured, and made more efficient, by scrutinizing the time it takes a sound designer to iterate on a sound implementation (create a system flow diagram like this one to find out where your inefficiencies are). The time from creating a source sound asset to hearing it in the game must be as short as possible, and the path must offer the least resistance to the designer through ease of use and stability. Improving tool and game-engine UX should be a focus: all frustrations should be noted, targeted and removed from the software and pipelines. The more a sound designer is able to iterate on a sound, the closer the experience will get to something tuned and satisfying for the end listener/player.
QUALITY COLLABORATION / INTERACTION
Finally, the quality of interactions between team members, both within the sound team and across the rest of the multi-discipline team, is critical to the quality of feature development and execution. The more a sound team member can interact in a free-flowing, professional and respectful way with the rest of the team, without constantly pushing through barriers or fighting bureaucracy, the better for implementing features at high quality, and the better for innovation and the development of emergent, opportunistic sound design and cross-discipline influences. Communication must be unclouded, efficient and clear, as must the studio culture that supports the team members and development process in this regard.
A secondary part of quality collaboration is good audio TONE TARGETS, as these play a key role in establishing direction and resolving conflicting ideas. A central place for high-quality, easy-to-understand documentation, as well as video and audio inspiration, is essential to creating and maintaining a healthy decision-making process inside the team. Key things to consider for tone target material: How should the player feel? What are the key adjectives (e.g. Hard, Digital, Harsh, Distorted, Cold, Dark vs Warm, Safe, Soft, Protected etc.)? In conflict resolution: stay focussed on which proposal delivers best on the tone target.
With these three areas in place and receiving consistent attention, tuning and tweaking, team audio can begin (or continue) to fulfil its role as a key collaborator in studio culture and the development process, always focussed on the most important thing: delivering a high-quality experience for the player.
note: this is a re-write of some ideas first floated around in this earlier blogpost.
(first published on Gamasutra)
A tweet from sound designer Kelly Pieklo about making the transition from linear to non-linear sound design – and about how sound designers get to determine the parameters that can drive, control and transform the sound elements in a game – got me thinking.
For ease of writing, I’m taking the term ‘parameter’ to refer to all the various elements of game data that can be mapped onto audio – including states, triggers, switches and variables.
In film sound, there isn’t really a concept of parametric data from the other departments that the sound designers can use to drive their sounds. Perhaps the closest analogy would be an OMF of temp picture cuts which the sound editors can import into their sessions to keep up to date with scene and shot changes during post-production. Often the ‘parameters’ are supplied by the director, and are not tangible programmatic variables that alter over the course of the movie, but ideas that need to be interpreted by the sound designer, and implemented through more abstract methods.
Imagining some well known movie plots with parameters that control their overall sound is a fun proposition. How would we plot the movement in Apocalypse Now towards Kurtz and affect the sound, perhaps via a ‘Distance to Kurtz’ parameter? In The Conversation, we could have a parameter for Harry Caul’s ‘paranoia level’. I’m sure these are too high-level to function, but there is something we could do with those ideas once parametized, and it is a great start for thinking about the main thread of a plot or narrative and breaking it down into more interactive ways of affecting the sound overall.
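Mechanically, a parameter like ‘Distance to Kurtz’ would just be a value mapped through a curve onto audio properties – the way an RTPC works in Wwise. A hypothetical sketch (the parameter, ranges and mapping are all invented for illustration):

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def map_parameter(value, in_min, in_max, out_min, out_max, curve=1.0):
    """Map a game parameter onto an audio property, RTPC-style.
    curve > 1 biases the change toward the far end of the input range."""
    t = max(0.0, min(1.0, (value - in_min) / (in_max - in_min)))
    return lerp(out_min, out_max, t ** curve)

# Hypothetical 'Distance to Kurtz' (km of river remaining) driving a low-pass
# filter: the closer the boat gets, the darker and more oppressive it sounds.
distance_km = 10.0
cutoff_hz = map_parameter(distance_km, in_min=0.0, in_max=75.0,
                          out_min=400.0, out_max=20000.0, curve=2.0)
```

The same shape of mapping would serve a ‘paranoia level’ just as well – the interesting design work is in choosing what the parameter means and which properties it touches, not in the maths.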
I was wondering how feasible it might be to not only have the technical and obvious parameters that we deal with most of the time in game sound, but also a whole new group of more abstracted parameters that reflected things like how the player felt (gathering biometric data from players is something that has been discussed a fair bit recently) or, in terms of more narrative game experiences, how the character ‘feels’.
I recall a feature we implemented in the open-world Scarface game which kind of did this: Tony Montana’s ‘Rage Meter’. If you built this meter up enough, you unlocked the ability to enter ‘Rage Mode’, at which point Tony was able to go into a blind rage, in first person, for a limited amount of time. Now, this wasn’t really a fully scalable parameter with many gradual nuances, more a switch mechanism for a gameplay mode – but the interesting thing is that it was directly mapped onto how the character felt and behaved, and as the emotional state of the main character changed, altering his point of view, so too did the sound, music and dialogue employed during that mode. Sounds were pitched down and filtered, with weapon sounds pushed forward in the mix; dialogue switched to utterly insane swearing (as opposed to the regular, conversational and relaxed swearing that denoted ‘normal’ gameplay); and music switched to the atonal Giorgio Moroder synth washes that occurred in the same scenes in the motion picture.
Narrative, emotional, or point-of-view parameters might be challenging to figure out, but I think there are lots of opportunities to think more abstractly, and less technically, about game parameters. This recent talk by Randy Thom at the Mix Magazine Immersive Sound Conference gives plenty of nourishment for thoughts in this direction, particularly about point-of-view.
Game parameters and switches are mostly the servants of reality, believability and simulation: time of day, relative distances, footstep surface type, speed, height, density etc. I think these technical parameters, while entirely necessary, are really just the foundation of believability for sound integration and synchronization with the game engine. In an open-world or simulation title, there are likely to be many more of these kinds of ‘reality’-based parameters and switches.
(above – all of the parameters in my current project are technical, or based on simulation)
It may not be practical to parametize the emotional vectors of a game narrative, or even necessary. Perhaps music ‘states’ are the best example of something already a little more abstracted and closer to the emotional pulse of the game – states are the most likely to drive music or atmospheric transitions in a game, and as such offer some tantalizing ways in which to start thinking about also affecting sound and dialogue. Perhaps when a music state changes from ‘calm’ to ‘fear’ there are a great many more opportunities to alter the way the sound and dialogue are presented to the player too. Maybe, without realizing it, music states are mapping the narrative and emotional beats of the game for us, and maybe tapping into these states to make changes in the rest of the soundtrack is one of the biggest opportunities for much deeper game sound integration.
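The plumbing for this is simple enough to sketch: the sound and dialogue systems subscribe to music state changes and pull their own settings from a shared profile. Everything below – the state names, the mix values, the listener mechanism – is a hypothetical illustration, not any engine’s actual API:

```python
# Sketch: letting music states drive the rest of the soundtrack too.
# State names and mix values are invented for illustration.
MIX_PROFILES = {
    "calm": {"ambience_db": 0.0,  "dialogue_style": "conversational", "lpf_hz": 20000},
    "fear": {"ambience_db": -6.0, "dialogue_style": "panicked",       "lpf_hz": 2500},
}

class SoundtrackState:
    """Broadcasts music state changes to any interested sound/dialogue systems."""
    def __init__(self):
        self.state = "calm"
        self.listeners = []

    def on_change(self, callback):
        self.listeners.append(callback)

    def set_state(self, new_state):
        if new_state == self.state:
            return
        self.state = new_state
        profile = MIX_PROFILES[new_state]
        for listener in self.listeners:   # ambience and dialogue follow the music
            listener(new_state, profile)

applied = {}
music = SoundtrackState()
music.on_change(lambda state, profile: applied.update(profile))
music.set_state("fear")   # dialogue and ambience settings change with the music
```

The point of the sketch is the direction of flow: the music system already knows where the emotional beats are, so the rest of the soundtrack can listen in rather than duplicating that logic.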
In June 2008 I was fortunate enough to sit in and observe a couple of days’ mixing at Skywalker Ranch with Randy Thom and Tom Myers. I recently found these notes and thought it would be good to post them here.
Theatrical mix in Dolby EX (6.1) for a re-released print of the film (now available on Blu-ray).
FX Mixer, Randy Thom
Music and Foley, Tom Myers
The dialogue mix (already done by the clients) is usually given to the more experienced mixer, as the best chops are needed to make production dialogue sound its best. This is less of an issue on animated features.
Each section – dialogue, music, foley and FX – runs on a separate computer (back in the machine room), all slaved via timecode to the reel. Prior to the mix, an editor – whether dialogue editor, sound editor, foley editor etc. – builds the ‘pres’ or ‘pre-dubs’ as Pro Tools sessions. They are then all brought onto the mix stage for the final mix.
The mixers work through the feature a reel at a time: first listening through the whole reel, then going back and mixing what needs addressing. Each mixer takes turns while the other breaks, because otherwise they would step on each other’s toes and end up missing something the other wanted to write. It was a mix of simply setting levels for tracks and automation for more involved ducking etc., all done at the desk. The assistants assign sounds to channels for the mixers, and the mixer then watches the scene and rides the faders. Every sound is on a separate track.
The mixers bounce ideas off one another, and the editors also chip in with suggestions. This makes the whole mix process quite democratic and conversational. It makes good sense to ‘test’ mix ideas and suggestions by bouncing them off the other people in the room, which is why one person doing a mix alone does not make much sense – although this does happen.
The mixers make notes on timecode areas (measured in feet) that they wish to revisit, and they punch in and roll back to these spots themselves, then record the automation to a master. Notes also come in from the client about particular areas they wish to revisit, and these were also addressed in the mix.
A surprising amount of sound effects design and sound replacement also happens all the way through the mix. Sections are extended by the editors under the direction of the mixer, and new or replacement sounds are found and dropped in. While I was there, several scenes had sounds added to them, from additional tire squeals to subtle background additions like ship horns and distant car horns, in scenes where there was a suitable gap in the dialogue. The editors either go offline to find the sounds in Soundminer, or copy sounds from elsewhere in the session. When a client is present, a lot more of this kind of thing happens.
Randy talked about the rule of 100%, whereby everyone who works on the soundtrack of a film assumes it is 100% of their job to provide the content for the feature. So the composer will go for all the spots they can, as will the dialogue, and the same with the sound editors. When it comes to the mix, this often means there is little room for any one particular element to shine, which means more mixing decisions have to be made, and often the result sounds like music, for example, has simply been turned down. In more aesthetically successful movies, collaboration is present earlier, and composers decide that it is fine to simply drop certain cues. When Randy is mixing, he wears the mixer’s hat and is at the service of the story and the film; he often makes decisions to get rid of sounds he has personally worked hard on.
Sometimes ideas about particular key scenes and mix approaches are talked about early with the director, at the script stage; Randy works this way with Robert Zemeckis. However, not enough directors consider sound in pre-production, and they often end up with the 100% situation and a lot more things to ‘fix’ in the final mix – lots of messy, chaotic sound to figure out.
In Ghost in the Shell there is very little music, and because of this, where it is used it has a very powerful, meaningful effect on the story and audience. This left a lot of great opportunities and space for sound design – some very musical sound design, such as the ambient ship horns, was able to occur without offending the composer (adding musical sounds, i.e. sounds of a particular pitch, can be perceived by the audience as part of the music, particularly if they are in ‘tune’ with the underscore). A lot of the backgrounds are also very musical in the feature. The foley is very soft, clean and rich. Randy made a point about foley: they tend not to use shoes that are very clicky, as they sound too much like ‘foley’, so they use trainers, soft shoes, even moccasins and slippers; this way the foley stays out of the way and doesn’t jump out as obviously foley. Randy also said that pink noise can be used for foley – just have a track with pink noise on it, and ride and EQ the fader so that it matches the movement! A little film trick!
Dialogue in the movie, and sounds, were panned very originally for a feature film. Dialogue remained positional to the characters even when they were off screen, often meaning that the sound would jump to a rear speaker with a visual cut – so if a theatre has the rear speakers turned off for whatever reason, the audience may miss some dialogue. Quite original and brave, I thought, although these mix decisions were made by the clients in Japan. The music soundtrack had been re-mastered in surround, and the film was mixed in Dolby EX for the theatrical re-release.
Randy discussed mixing as being a series of choices about what to hear at any particular moment; it is the graceful blending from one mix moment to the next that constitutes the actual mix. These decisions come from the story: what is important at any particular moment, what the audience needs to hear and focus on. He mentioned that cinema with deep-focus photography often makes things easier to ‘focus’ on with sound. In action scenes, particularly longer ones, it becomes difficult to go from one thing to another constantly, especially if the script allows no brief let-up in the action for the sound to take a break. We talked about the extended chase scene in The Bourne Ultimatum as a good example of handling this well: a scene with no music, dropping out various things at various times. The scene is well written for sound and well mixed. He also cites Spielberg movies as good examples of how to use sound and mix well. The arrival of the T-Rex in Jurassic Park is often mentioned to him as an effect a director wants to emulate, yet there is no music in that scene; directors often go to music first to try to achieve the emotional effect. Saving Private Ryan is also cited a lot as an effect directors want to achieve – again, no music in the opening scene. Knowing when not to use music seems to be a decision to take at the writing stage of development, though deciding to drop cues can also work at a final mix.
There is a quote often thrown around in film and game sound circles about the rule of 100%. I believe the idea originates with Ben Burtt, but it is often repeated and conveyed by various respected sound designers, especially in film. I’m paraphrasing, but it goes something like this…
“Everyone on a film assumes it is 100% of their job to tell the story, the composer will write music that hits all the major plot points and moods, the writers cover everything in dialogue telling 100% of the story, and the sound designers will cover every single moment with effects to carry 100% of the movie/game/whatever” – I actually found a better reference to this in Randy Thom’s Designing a Movie for Sound essay found here (section: opening a door for sound)
At the end of a production this feels very true, especially when you are sitting in a final mix trying to figure out what the heck you are going to get rid of in the moment-to-moment mix. What is important at any given moment? This, in film, is where the collaboration with the director kicks into high gear and the ‘audience’s experience of the story’ really gets into the veins of the soundtrack – a final mix is, if you like, the ‘implementation’ of the story via the soundtrack. Decisions about what has prominence at any moment are made through discussion (certainly easier in film, due to the linearity of the medium): sometimes music is foregrounded, sometimes sound FX, sometimes (most often) dialogue. In video game mixes, the experience can be completely different depending on the team involved, the size of that team, and the scope of the project. Sometimes it is one person mixing the game, making all these decisions, but at least with the knowledge of what the game design and experience needs to convey. On bigger projects it might be a small, multidiscipline directorial group of leads who sit together and talk through the decisions – either way, the process is complicated by technology and workflow.
I like the idea of sound, music and dialogue contributing to the storytelling in equal measure. This is certainly more appealing than thinking that each of these elements will attempt to create a logjam by providing 100% each, leaving it to the final mix to sort out the priorities at each moment. I’ve heard of sound editors in film even providing more than 100% coverage, having multiple different ‘options’ available on the dubbing stage.
Now, this is an idealized, utopian scenario, and every project makes different demands of each of our three main threads of sound, but perhaps, at least as a starting point and a way of thinking about what will be important from sound in your project, breaking it down into three chunks that are ‘ideally’ responsible for 33% each will work better.
33% of the soundtrack will be about music moments.
33% will be about sound moments.
33% will be about dialogue moments.
It is an oversimplification, and perhaps the practicalities of budgeting and rework make this a difficult proposition, but it is a better starting point than the 100% rule, which creates that logjam at the back end. Thinking about these numbers at the beginning of a project, rather than the 100%, is a more realistic guideline for everyone involved. It should even encourage more forethought and planning as to ‘whose moment’ each is up front; it might kick into gear some early mapping of a project in terms of FX, music and VO. All of these elements simply can’t be going all the time, so these kinds of decisions need to be made.
Perhaps an even more simple pre-check before commissioning any sound work should be along these lines…
Should it make a sound? (Yes / No)
Should it have a music cue? (Yes / No)
Should it be conveyed through dialogue? (Yes / No)
The emphasis here being on a reduction of overall sound, rather than an increase of overall sound content.
Leaving the ‘what plays and what doesn’t play’ decisions to a final mix makes a lot of work for yourself in those crucial few weeks at the end, and the finished project will sound, more often than not, like ‘music was turned down here’ and ‘sound effects were turned down here’, rather than the coordinated orchestration of specifically written and implemented music, VO and FX found in, for example, The Last of Us. In that game, no one element feels as though it is trying to overpower the others; they seem to be very much working together, and the more you think about it (because it isn’t something you notice while you are playing and enjoying the experience), the more you realize it has all been very carefully thought out in advance and didn’t just happen to ‘come together’ at the last moment.
I like the idea of a composer setting out with the knowledge that their contribution is going to be only a third of the entire soundtrack. Similarly I like the idea that writers are starting out with the notion that one third of the experience is going to involve spoken dialogue. I like the idea that, as content creators, we can fully expect, from the outset, to throw away 66% of the responsibility to carry everything on our shoulders. It is also just good common-sense editorial.
Being the sole audio developer at an indie studio, with a background as an audio director, I tend to think of any project immediately in terms of it being my responsibility to cover 100% of the soundtrack (foley, FX, ambience, music, VO). But it is only when I start to think about actually creating the content that I realize it isn’t anywhere close to 100% of my own sound or music work that will be doing this, but the work of many collaborators. It is very important, I realize, to define the scope of what is needed when delegating out the work, along with a schedule for its completion and integration. On any project where I have contributed sound or music myself, I find I have a hard time ‘removing’ things at the mixing stage – I’m just too attached. I can see the amount of work that has gone into things, and it is natural to resist decisions whereby that content is effectively removed or demoted, even when it is for the good of the project.
This is why I believe we have so much to learn from watching and listening to mixers. There is a useful, Eno-like idea that in attempting to mix a project you wear the ‘mixer’s hat’ – not the sound designer’s hat, not the friend-of-dialogue (writer’s) hat, and not the composer’s hat. That is no longer your role. It is in wearing the mixer’s hat that you are allowed to remove yourself from the work done on every element of the content up to that point, and effectively make cold, hard decisions about what is needed, what can be pushed to the foreground, and what can be removed. Mixing is a subtle art in that decisions don’t need to be black and white (‘either there is music or there isn’t music’); several things can co-exist up to a point – music can be ducked out of the way yet still be audible, as can backgrounds and FX. A massive part of that subtle art is also political (though it doesn’t really need to be). However, it is at this point of ducking things that you realize a far better approach would have been to design the music to get out of the way at that particular moment in the first place. Predicting these moments where possible will enhance the interrelationship between the three major food groups of a soundtrack (leading to a more cohesive and telepathic whole), make for a better experience for the audience/player, and make ‘mixing’ so much easier… another way to think of good planning is as ‘mixing in pre-production’.
This is something I’m trying out when initiating new projects. I’m hoping to be thinking about the final effect, and the final mix decisions, long before we actually arrive there – and in reality, the closer we get to a final mix, the closer we get to determining exactly what is required of each of the three components of the soundtrack. Some of the most useful tools I’ve found for this are narrative, or gameplay, dynamics maps (detailed here: http://www.gamasutra.com/view/feature/132531/dynamics_of_narrative.php). These give an idea of what is needed from each of the three elements, though they are like graphic scores that allow a great deal of interpretation by the artist charged with creation. At the very least, understanding the fundamentals of the dynamics involved in a project will give rise to healthy discussions about whose responsibility it is to, say, carry action scenes, as opposed to ambient scenes, exploration, and moments of ambiguity. Shifting the focus of sound work to understanding the interrelationships between the three main threads of a soundtrack much earlier in a project is where I see so much scope and opportunity in development right now, no matter what the technology or delivery mechanism for the game.
It is Monday morning. So I thought I’d put something together that I’ve been meaning to do for a while: a process document which details some of the high-level decision making and processes that go into the creation and implementation of sound for a game, from the asset to the code.
Doing this highlights the importance of a generalist skill set in game audio (for those either looking to get into game audio, or those looking to improve/grow skill-set areas). Not only do you have distinct groups of very specific processes, like the recording and editing block at the top of the document (in RED) and the implementation block (in GREEN) towards the bottom, but you also need a complete interconnectedness that involves social relations and collaboration in order for the model to work at all (decisions, reviews, communication).
Now, this document was put together thinking of sound design and implementation, but I think this is every bit as applicable to MUSIC and VOICE production. I also think that viewing the processes and decision making like this makes it very clear how our production and collaboration processes can be improved (e.g. fewer implementation steps using separate software is always a goal). A voice workflow, for example, often works in an iterative way at the RECORDING stage (getting many takes of the same lines in different ways to give more choices later on), rather than at the REVIEW stage (although callbacks and re-writes have become more commonplace), meaning that hearing voice IN CONTEXT and making review and direction decisions is less based on a context-led rationale than it is in sound FX design. There are many industrial reasons why this is different, but opening up the FX iteration path visually certainly allows us to see where we might innovate and improve some of the more rigid industrial structures that are imposed, rather than designed.
Another area I wanted this to highlight is the ITERATION process. This is the most fundamental part of the whole process; in fact, it is the REVIEW & ITERATION cycle that drives the whole model. Until you get a sound into the game, triggering and playing back, you can never know if it is doing its job or not. Chances are, nine times out of ten, it is not, or it could be improved in some way with a tweak of some kind. There is always something that needs to be done. Sometimes it is the re-recording of new material, which results in a journey back to the beginning of the process. Sometimes it is re-visiting assets in the sound library, and sometimes it is down to tweaking in the run-time realm of the game and audio engine. The more this process is repeated, the fewer times you should have to revisit the areas nearer the beginning of the process, and the more time you can spend refining the run-time game parameter side of things. All iteration processes aim to refine what is there, and the sooner you can get ANYTHING into the game, the sooner you can start the process of getting closer to the run-time.
Another thing to note is that there is no ‘finished?’ or ‘complete?’ stage in this process. That is simply because I don’t think the process ever really ends until the game is ripped out of your hands; it constantly gets ‘closer’ to finished, particularly the more times you can go through the latter trigger stages of the flow… but it never really ends. Another reason for this is that the game itself is changing underneath your feet, and so sounds & implementations are often required to change to ‘keep up’ with the current architectural and optimization snapshot of the game.
I was also writing about a hybrid procedural audio model on Friday, and this is not accommodated in this flow, but would either be a new path of procedural sound object creation and testing (to replace the RED path), or become a part of the implementation (GREEN) path – ideally replacing the recording and editing stage entirely and shifting heavily towards a more implementation and iteration-based flow.
I made the document in Lucidchart. It is awesome, free and very easy to use.
Sample-based vs Procedural: It’s not quite as dramatic as an all-out death match between these two approaches and philosophies, even though the temptation is to see things in either/or, black/white terms.
One thought is that procedural audio, even though it has been around for a while now, is still fledgling, and even though there are inherent ‘cost’ savings to using this method for sound generation and propagation (particularly in games with huge amounts of content), finding a home in a largely risk-averse entertainment software industry is a big ask while the applicable approaches still feel fundamentally ‘experimental’. The thing I’ve come to realize, perhaps somewhat later than everyone else (and perhaps because of the ‘either/or’ polemics), is that a lot of the techniques and tools we are using are already in transition to a more procedural status.
This is just a quick categorization attempt that I wanted to get down before it evaporates with the rest of my thoughts and doodles on a Friday morning…
The Sample-Based Approach.
Relying entirely on streaming or preloaded sample-based assets sitting on a disc.
(Most games of the PS2/PS3 generation and some mobile games today)
Re-triggering of pre-recorded material, usually wave file assets.
The Procedural Approach
Moving the sound generation effort from the disc (and the streaming throughput bandwidth) to the processor.
Synthesis-based sound objects, acoustic models, grain players, noise-shaping and DSP-intensive work – in essence, everything is generated at run-time, based on (hopefully) elegant, efficient and simple real-time models.
(currently fringe-aesthetic games, and some music-based games)
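To make the contrast concrete, here is a minimal, hypothetical sketch of the two approaches side by side (plain Python; the buffer-of-floats representation and all names are my own inventions, not any engine’s API): the sample-based path simply re-triggers stored frames, while the procedural path computes its frames at run-time from a simple noise-shaping model.

```python
import math
import random

def sample_based(asset_buffer, num_frames):
    """Sample-based approach: re-trigger (loop) a pre-recorded buffer.
    No generation cost -- the content lives on disc/in memory."""
    return [asset_buffer[i % len(asset_buffer)] for i in range(num_frames)]

def procedural_wind(num_frames, cutoff=0.05, gust_rate=0.001, seed=1):
    """Procedural approach (toy 'wind' model): one-pole low-passed noise,
    shaped by a slow sinusoidal 'gust' envelope -- everything is
    generated at run-time, nothing is read from an asset."""
    rng = random.Random(seed)
    state, out = 0.0, []
    for i in range(num_frames):
        noise = rng.uniform(-1.0, 1.0)
        state += cutoff * (noise - state)          # one-pole low-pass
        gust = 0.5 + 0.5 * math.sin(2 * math.pi * gust_rate * i)
        out.append(state * gust)
    return out
```

The trade-off the post describes falls out directly: the first function costs storage and streaming bandwidth, the second costs CPU per frame plus the effort of finding an elegant model.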
For me, the process of just writing these two (admittedly loose) definitions down, made me realize that any proposal to exclusively use either of these models would need to be either a) aesthetically niche or b) technically or artistically challenged in some way. And, even though I tried to say definitively which games used these approaches, I think I’m on unsafe ground in my generalizations. It also made me realize that, of course, there is already a ton of crossover in these categories in most proprietary sound engines, and certainly inside middleware audio solutions. A purely sample-based approach is probably getting quite rare these days. So, are we in the midst of a hybrid approach without even really realizing it?
Hybrid Procedural Approach
(Most console games today)
A fundamentally sample-based approach, but one that goes much further on the implementation side of things than simple triggers: breaking down sounds into constituent molecules (granular) or even small recognizable chunks (automatic weapons); parameterization of sound; sound ‘shaping’ in the form of procedural DSP used for ‘additional layers’ like reverbs, filters and flutter; some SoundSeed Air implementation in Wwise, but as a subtly mixed-in ‘layer’, rather than to supply the overall effect.
We are using procedural techniques and technologies more and more in the form of reverbs and DSP effects. But also in our implementation, we are thinking more procedurally about sound, even if still using sample-based playback material as the starting point and raw material. My feeling is that we have moved towards this often without even realizing the big picture. Could this slow-bleed approach eventually end up with interactive sound designers working completely with acoustic models and unique sound object based propagation? Perhaps for certain genres and platforms. But it is difficult to imagine a move away from a hybrid position into exclusivity. I can, though, see certain projects leaning one way or the other.
Chances are if you work in game audio, you are already working in a hybrid procedural audio world.
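A tiny sketch of the hybrid idea, under my own assumptions (plain Python, invented names): take sample-based raw material as the starting point, but treat it procedurally by chopping it into grains and re-sequencing them with randomized offsets, so the run-time result varies per trigger.

```python
import random

# Hybrid sketch: sample-based raw material, procedural treatment.
# Chop a source buffer into short grains and re-sequence them with
# randomized read offsets, so each trigger sounds subtly different
# (think automatic-weapon chunks or granular debris loops).

def granular_resequence(source, grain_len=64, num_grains=32, jitter=16, seed=7):
    """Build an output stream by repeatedly picking a grain start near a
    moving read position, with +/- `jitter` frames of randomness."""
    rng = random.Random(seed)
    out = []
    for g in range(num_grains):
        base = (g * grain_len) % max(1, len(source) - grain_len)
        start = max(0, min(len(source) - grain_len,
                           base + rng.randint(-jitter, jitter)))
        out.extend(source[start:start + grain_len])
    return out
```

Every output frame still comes from the recorded source; only the sequencing is computed at run-time, which is exactly the “sample-based starting point, procedural thinking” position described above.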
While the arms race of each successive console generation tantalizes consumers with the promise of higher-quality entertainment experiences, defining quality itself has started to get more and more tricky. Is it simply a case (for sound at least) of higher sample rates? More fidelity in the surround field? Playing back more voices simultaneously? Higher-resolution DSP effects? Consistency? Fewer glitches and bugs? More convincing (and convincingly captured) performances from actors?
It does begin to blur around the edges as you realize that this is perhaps one of the broadest and most subjective categories to talk about. Yet it is fundamental to how we navigate, describe (and judge) increasingly expensive (and often complex) entertainment experiences within our industry. Quality is also something that, you soon realize, doesn’t apply exclusively to big-budget games, but also to much smaller titles, and even down to simple interfaces. Perhaps it helps to think not about the end result, the objective final output of the game, but about the overall experience; to that end, perhaps the ‘quality’ of the processes that go into creating the experiences themselves requires more examination and investment (beyond the unsatisfactory notion of ‘quality’ simply being a shaded area occupying the intersection of features, budget and time).
I’ve been thinking a lot about this lately (too much, hence the overflow into the written word) and, my own ad-hoc definition of “QUALITY”, in a game production context, might shed some light (or maybe raise more questions) on how to evaluate (and produce) the ‘intangible’ notion of ‘quality’ (note: this is not really about ‘polish’ which I consider to be an endeavor almost exclusively achieved and performed in post-production) – and it is actually informed and tracked across several quite different areas.
1) Quality of Interaction (Communication and Collaboration): ensuring collaboration is happening at the high level (between leadership/studio culture/project management) and at the low level (between coders and implementers), and that it is happening both vertically (intra-discipline) and horizontally (inter-discipline).
2) Quality of Implementation: use of, and access to, material; ease and speed of implementation (tools & pipelines); expertise; iteration time (refinement and enrichment).
3) Quality of Input (Source Assets) and Output (Signal/Data Path): correctly isolated (or environment-specific) recordings (or synthesis) at the highest sample rates and bit depths, plus an I/O signal path (easily re-configurable mixer hierarchies and parameterization of sound) with controllable, carefully measurable, predictable and trackable output levels. Having this I/O in place allows both upwards and downwards SCALABILITY to different (or newly emerging) platforms.
In combination, I reckon these three areas invariably allow the delivery of refined ‘high-quality’ features and experiences. I’d also suggest that these areas are not limited to console development (although that is the source of current questions about what ‘next-gen’ actually is/means), but can apply to any technical system whereby the delivery devices are cyclical and incremental.
Perhaps quality is more simply about how well we are able to convey an idea and an experience to a user, and making that distance between the user and the experience as small as possible, such that, in the end, the technology all but disappears completely.
The day to day work of audio can be very detail oriented, and it is easy to get lost in this forest of sound molecules. Solutions to many day to day issues often rely on decision making of a broader kind, and often audio work can be as much political as it is creative, social or technical. Wrangling resources, ensuring that important production information and risks are on everyone’s radar, selling features, ideas, haggling for more time or budget and communicating across disciplinary voids can require a fair degree of entrepreneurial flair.
I’ve been thinking about some general audio pillars within game development a lot. I thought I’d have a go at throwing together some very high-level pillars for game audio which read, to all intents and purposes, like a kind of manifesto promise. The thought here is to provide high-level transparent goals for the audio department within a development environment, and to serve as a series of checks and balances by having a longer term strategic outlook (without that we are marooned in the reactionary, short-term and arguably heading in no direction in particular). This also serves to hold audio accountable to some tangible realities and deliverables, if things aren’t moving in the direction outlined, then, during regular check-ins, course correction can be applied.
Four Strategic Long-Term Audio Pillars
A Focus on Polish is a Focus on Solid Communication.
Whatever the project, polish is one of the most fundamental areas of audio work (it is the reason we focus so much on having good-quality source assets and geek out about microphones, and also why we focus so much on the idea of a ‘signal path’). Call it post-production, or mixing, or whatever: the process of removing any unwanted jagged corners, cuts, glitches, or sounds that grab the attention at the wrong time or don’t help the experience is universal to every single sound project. This can mean being visible about, and scheduling, audio post-production time, or a familiar and contributory presence at scrums. But in order for sound to actually polish something effectively, the work in other areas of production (animation, scripting, world building etc.) has to have been somewhat ‘locked down’. This is an increasingly difficult subject in today’s fast-moving, ‘never-finished’ digital production domain, but one thing these changes have emphasized above all others is that communication is critical. Iteration, visibility and ‘connectedness’ to the team’s thinking and planning are important to providing polish in the digital production domain. Using continual verbal, visual, and written comms is absolutely essential to keeping everyone in the loop on what is happening. Polish is as much co-ordination as it is technical or aesthetic choice, and co-ordination is a political endeavor.
Grow, Nurture, and Invest in the Audio Team
Audio teams are often the smallest in the building. They are outnumbered by Art, Design and Tech departments. They can appear to others to be a black box, where no-one understands the processes and voodoo that go on in sound-proofed rooms. But we are just like any other department. There is nothing special about team audio; we may see things differently, have different connections to the team, and have different needs and skill-sets, but fundamentally we are exactly the same. In the early days of game development (which these still are), audio often needs to shout that bit louder for equality and representation on the team, and to get a seat at the table as a ‘principal collaborator’ rather than an end-of-production ‘service provider’. Everyone on the team will be trained and versed in the language of collaboration and innovation. They will know who to go to, how to present, how to prototype an idea and set goals; they will have resources at their disposal, and they will be encouraged to push forward and improve every aspect of their craft and process – removing every element of drag, friction and resistance from their work. Career paths will be clear, transparent and on par with other disciplines in the studio culture. Members of the team will have the autonomy to control their own growth and path. The audio budget will always be discussed and adjusted to fit the requirements of the project, with a focus on VALUE.
Early (and Continued) Involvement for Audio
Involvement in the earliest genesis discussions of a project. Early involvement with script development, pre-vis work and prototyping, as well as with early scheduling and budgeting. Simply put, “Audio is another Art Department.” The sound team will be able to participate in design discussions, or be empowered to create those opportunities and discussions where they do not yet exist.
Tools & Tech: Put Designer/Implementer UX before Player UX (Player comes 2nd! – the only way to truly put the player 1st)
Push the technology and pipelines in a meaningful, useful and positive direction. Alleviate the designer/implementer’s struggle. The primary goal is to support the person using the tools and give them a frictionless experience (alleviating enormously fatiguing or repetitive heavy-lifting tasks) when integrating audio into the game. (From small standalone batching scripts and tools, to game engine and audio engine tools & pipelines – the experience of integrating sound should be simple, straightforward, painless and easy to communicate to others.) Focusing tools and processes on the user – allowing audio designers to quickly implement assets, switch them and tune them at run-time – is a priority for changing the collaborative nature of review sessions etc. This in turn allows the audio designers to focus more clearly on the ‘player’s experience’, rather than wrestling with their own technical issues.
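As an example of the kind of small standalone batching script mentioned above, here is a hypothetical asset-audit sketch in Python (the naming rules, 48 kHz rate and channel limit are invented placeholders – every pipeline will have its own spec): it scans a folder of .wav sources and flags anything that breaks the spec, the sort of repetitive checking that shouldn’t be done by hand.

```python
import wave
from pathlib import Path

# Hypothetical batch audit for a folder of .wav source assets.
# The delivery spec encoded here (lowercase space-free names, 48 kHz,
# max stereo) is an assumption for illustration, not a real standard.

def audit_wav_assets(folder, required_rate=48000, max_channels=2):
    """Return a list of (filename, problem) pairs for non-conforming files."""
    problems = []
    for path in sorted(Path(folder).glob("*.wav")):
        # Naming convention check: lowercase, no spaces.
        if " " in path.name or path.name != path.name.lower():
            problems.append((path.name, "name not lowercase/space-free"))
        try:
            with wave.open(str(path), "rb") as w:
                if w.getframerate() != required_rate:
                    problems.append((path.name, f"rate {w.getframerate()}"))
                if w.getnchannels() > max_channels:
                    problems.append((path.name, f"{w.getnchannels()} channels"))
        except wave.Error:
            problems.append((path.name, "unreadable wav"))
    return problems
```

Running something like this on every check-in is one low-cost way of taking heavy lifting off the implementer, in the spirit of the pillar above.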
Every studio culture is different, and has a unique approach that solves design and production problems for a unique product line-up. For some audio departments these are problems that were solved long ago, while at others the problems are much worse (no audio tools, no audio programmer support or resources, and woefully underdeveloped pipelines) – yet every time, audio finds a way to struggle on, smash through whatever resists, and make things work and happen. This is really a hopeful push for a broader, more long-term strategic vision – to build resourceful and confident teams with an elevated view of what is in front of them (and behind them), rather than teams fixated on the short-term problems immediately in front.