Loudness - The Problems With R128 - An Alternative View

03-06-2014 01:55 PM

Loudness is an issue I have been going on about for a couple of years now, as I believe it will bring real benefits to the consumer. It would be great not to have to dive for the remote every time the ads or other audio between programmes comes on. Recently I covered it in my piece about the Sound Matters series being held round the UK, and we discussed it some more towards the end of Podcast 106.
Jake Knott describes himself as a Manchester-based television sound supervisor, recordist, boom operator and gram op with 29 years of broadcast experience. He builds shows, mixes them as loud as you want them, and makes things work. Jake is no armchair expert; he has been doing the right thing, trying to mix the programmes he works on to the new standards, and finding it hard. He has just posted a comment piece on his blog and with his permission we post it here too. Over to you, Jake…
The Problems With R128

First off, a statement: I have no problem with the principle of mixing to a constant loudness, because it’s how I was trained at Evesham 30 years ago and it’s how I’ve mixed shows for 20 years. The idea is not new, but in recent years, due to lack of training, it’s fallen out of practice and people have become slaves to the PPM without using their ears.
Second statement: I do not consider myself to be the greatest mixer in the world. I could be wrong about all this. But I DO mix 10+ hours of network TV every week, and I have lots of chances to measure and experiment. There are some major problems I have found, and nobody has yet been able to answer them; there are many “Heads of Technology” who are quoting me chapter and verse on R128, but few craft mixers who have actually done it.
I’ll start on the small bits and then work up.
1. Why are we doing this?

I thought the biggest complaints from viewers were about adverts and promos, yet these are not covered by R128! There WILL be a spec for them, but why are we going to all this trouble when all we needed to do was tame the commercials? Incidentally, the last year has seen a reduction in advert level to the point where there doesn’t appear to be a major discrepancy any more.
2. The measuring algorithm.

This whole thing is based around a metering system which is meant to tell you how loud something sounds, rather than how much it meters. Hang on, didn’t we already have something that could do that? Like a pair of EARS? Wouldn’t it have been a lot simpler to just train people to use the PPM and LISTEN, like I was taught? Then we could have kept the system people were familiar with. I even had an acronym for this.
P.E.T.: PPM. Ears. Training.
But if you’re going to replace humans with machines, then you will have made sure that your ear-measuring meter actually matches what a human ear hears? When we moved to our new home at MediaCity our studio gained a loudness meter, and I started measuring everything we did. I found that between shows, with a fixed monitor gain, speech which sounded constant to me was reading DIFFERENTLY on the loudness meter. Even within a run of the same show, there were inconsistencies. It seems that the faster the speech, the more it “fills up” the integration, even though it doesn’t sound louder to the ear. I’ve seen errors of up to 2 LUs between shows; this is as much as the variance allowed on the DPP spec!
Now I’m aware that my opinion alone may be incorrect, so I’ve asked a few people, both sound and non-sound staff, what they thought, taking care to word the question in a way that doesn’t influence them. They seem to agree with me. This is, of course, not a scientific study, but it does seem to confirm what I think I’m hearing.
Maybe hearing is such a subjective thing that any algorithm to measure it should take into account a lot more than just the “area under the curve” of an integrator?
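A rough sketch of the “area under the curve” idea, in Python. It assumes the two-stage gated energy average used for an integrated reading (an absolute gate at -70 LUFS, then a relative gate 10 LU below the mean of what is left) and skips the frequency weighting and block overlap entirely. The block levels are invented for illustration: fast_speech fills every block, slow_speech has every other block only partly filled. Nothing here says how loud either would sound to a listener; it only shows how the arithmetic responds.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0  # blocks at or below this are ignored outright
RELATIVE_GATE_LU = 10.0     # second gate sits this far below the first-pass mean


def energy_mean(block_lufs):
    """Average loudness values in the power domain and return the result in LUFS."""
    powers = [10 ** (level / 10) for level in block_lufs]
    return 10 * math.log10(sum(powers) / len(powers))


def integrated(block_lufs):
    """Two-stage gated average over per-block loudness values (weighting omitted)."""
    above_absolute = [l for l in block_lufs if l > ABSOLUTE_GATE_LUFS]
    relative_gate = energy_mean(above_absolute) - RELATIVE_GATE_LU
    kept = [l for l in above_absolute if l > relative_gate]
    return energy_mean(kept)


# Invented block levels, for illustration only.
fast_speech = [-23.0] * 20
slow_speech = [-23.0, -28.0] * 10
with_silences = [-23.0, -23.0, -80.0] * 10  # true silence is gated out

for name, blocks in [("fast", fast_speech), ("slow", slow_speech), ("silences", with_silences)]:
    print(f"{name:9s} {integrated(blocks):6.2f} LUFS")
```

With these made-up numbers the partly filled blocks pull the reading down by a little under 2 LU, while complete silences make no difference because the gates discard them.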
3. The selection of the “Integrated” value as the criterion for program acceptance.

The idea that the viewer cares about the overall level of a program is, I think, flawed. What the listener cares about is having to turn their TV up or down, and the thing that causes this is dialogue variation, whether the wrong level or with too much dynamic range. The EBU say that this allows for greater dynamics within a show, but with no restrictions on short-term level variation it is possible to mix a program that has so much dynamic range it’s unlistenable even though it hits the numbers. This was, of course, possible with the old system, but I thought the new one was meant to sort this out? I’ve taken show segments, made them unlistenable with massive level variance, and then submitted them for testing to QC. All of them passed! How can this possibly benefit the home viewer? Again, if the new system is no better, why not stick with the old one and train people to use it properly?
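To see why an integrated target alone cannot catch this, here is a minimal sketch under simplifying assumptions: block levels are averaged in the power domain, frequency weighting is ignored, and gating is left out because every block in the example sits within 10 LU of the mean. The levels, the helper names and the “spread” figure (simply maximum minus minimum, not the EBU Loudness Range descriptor) are all illustrative.

```python
import math

TARGET_LUFS = -23.0


def to_power(lufs):
    return 10 ** (lufs / 10)


def energy_mean(block_lufs):
    """Power-domain average of block loudness values, returned in LUFS."""
    return 10 * math.log10(sum(to_power(l) for l in block_lufs) / len(block_lufs))


def loud_level_needed(quiet_lufs, quiet_fraction, target=TARGET_LUFS):
    """Level the remaining blocks must sit at for the energy mean to land on target."""
    loud_power = (to_power(target) - quiet_fraction * to_power(quiet_lufs)) / (1 - quiet_fraction)
    return 10 * math.log10(loud_power)


steady = [TARGET_LUFS] * 100
loud = loud_level_needed(quiet_lufs=-30.0, quiet_fraction=0.5)
swinging = [-30.0, loud] * 50  # half the blocks quiet, half loud

for name, mix in (("steady", steady), ("swinging", swinging)):
    spread = max(mix) - min(mix)
    print(f"{name:9s} integrated {energy_mean(mix):6.2f} LUFS, spread {spread:4.1f} LU")
```

Both lists give the same integrated figure even though one of them swings by more than 9 LU from block to block.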
4. There is a major problem with only specifying integrated as a delivery requirement.

This is a major practical issue for me, and so far I have conflicting replies from anyone I’ve asked. Bear with me, this may take a while to explain.
First, a question, the answer to which is very important: “Across a network, should the average loudness of normal presenter speech be constant between shows, and if so what should it be?”
The official answer I had from the DPP is “Yes, -23 LUFS”, as is that of Hugh Robjohns in this month’s Sound On Sound article on loudness. There is NO information on this in the delivery requirements.
To understand the problem, we must follow through the implications of a Yes or No answer to the above. We must also look at the basic components of a TV show mix, and how they interact.
A TV studio sound mix typically has three components: Speech, Music and Applause/FX. These are balanced to sound correct relative to each other, and the idea is to maintain an appropriate dynamic range for the home listener.
The problem is very simple. For some shows, the speech is the loudest part of the show, for others the quietest, and for some it’s in the middle.

  1. Speech is loudest: Newsnight
  2. Speech is in the middle: Countdown
  3. Speech is quietest: Jeremy Kyle

I know these are on different networks, but the principle is valid regardless.
Consider the YES case:

We wish speech to be constant between shows. The DPP recommend -23 LUFS, so that is what we do.
Example 1 - Newsnight

This is easy; there is mostly speech and very little else, so that works giving us a final integrated value of -23 LUFS.
Example 2 - Countdown

This is also easy; there are quiet sections (the clock bed) and loud sections (the applause and music) around the speech which more or less cancel out to give us -23 Integrated again.
Example 3 - Kyle

This is a problem. The speech is the QUIETEST element, and there is much shouting, music and applause which are all louder. So if we put the speech at -23 LUFS then the final integrated reading will be well over -23 because of the louder bits. Typically the show sits at around -19 LUFS using this method.
Result For The YES Case: The viewer is happy, the mix sounds right but we cannot conform to the integrated DPP spec for R128.
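The arithmetic behind the YES case can be sketched as follows. The running-time shares and the -17 LUFS figure for the louder material are assumptions picked purely for illustration (they are not measurements of any real show), and the calculation is a plain power-domain average with no frequency weighting or gating.

```python
import math


def integrated_from_shares(segments):
    """segments: (share_of_running_time, level_lufs) pairs whose shares sum to 1."""
    power = sum(share * 10 ** (level / 10) for share, level in segments)
    return 10 * math.log10(power)


# Hypothetical show: half the running time is speech at -23 LUFS,
# the other half is shouting, music and applause averaging -17 LUFS.
speech_only = [(1.0, -23.0)]
speech_is_quietest = [(0.5, -23.0), (0.5, -17.0)]

print(f"speech only:        {integrated_from_shares(speech_only):6.1f} LUFS")
print(f"speech is quietest: {integrated_from_shares(speech_is_quietest):6.1f} LUFS")
```

Under these assumed proportions the louder half dominates the average, and the integrated figure lands roughly 4 LU above the speech level.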
Consider the NO case:

This is easier from our point of view. If we don’t have to match speech levels, then we can keep our mixes the same, and simply offset the level to hit the DPP spec.
Result For The NO Case: The DPP spec has been satisfied, the mix sounds right but the viewer is still left with inconsistent dialogue levels and STILL has to turn their TV up or down between shows. Nothing has been gained.
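The NO case in numbers, as a small sketch. The as-mixed figures are the ones used in the three examples above (speech at -23 LUFS in each, with the third show integrating at around -19 LUFS); the labels and function name are illustrative only.

```python
TARGET_LUFS = -23.0

# (speech level as mixed, integrated level as mixed), both in LUFS.
shows = {
    "speech loudest": (-23.0, -23.0),
    "speech in the middle": (-23.0, -23.0),
    "speech quietest": (-23.0, -19.0),
}


def delivered_speech_level(speech_lufs, integrated_lufs, target=TARGET_LUFS):
    """Apply a single gain offset so the whole mix hits the target, then
    report where the speech ends up after that offset."""
    offset_db = target - integrated_lufs
    return speech_lufs + offset_db


for name, (speech, integrated) in shows.items():
    print(f"{name:22s} speech delivered at {delivered_speech_level(speech, integrated):6.1f} LUFS")
```

Every show now hits -23 LUFS integrated, but the speech in the third one is delivered 4 LU lower than in the other two.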
There is a third case:

We remove all dynamics from our audience shows so everything is at -23 LUFS.
Result For The Third Case: The DPP spec has been satisfied, the viewer has constant loudness, but the whole experience of the show is lost, and everything sounds like Radio 1. As a craft mixer, I don’t consider this an acceptable solution.
Although the DPP recommend the “Yes” case, in practice the “No” case is what is happening. Mixers can set their speech as they wish to allow higher or lower levels, and it all seems fine if you consider a show in isolation and NOT as part of a greater network. We end up with what we had before: inconsistent levels between shows, which defeats the entire object of R128.
All of this results from the decision to use the “Integrated” value only, which as you see causes major problems. It’s interesting that our European cousins have all picked different things to measure from the original EBU spec. The American networks seem to have realised the problems, and often have rules for dialogue levels as well as the overall show level.
This last question in particular is the one that I have never had properly answered. Every time I raise this, I get chapter and verse of R128 quoted at me, but the questions are never answered and eventually everyone goes quiet.
I am not trying to be awkward; I believe these are genuine concerns, and I want them to be answered in a satisfactory manner. The new spec is only acceptable if it works for all genres, and it clearly does not.
I believe we are throwing out a standard which had flaws but worked fairly well for 80 years, and we are replacing it with one that won’t even solve the problems it was meant to.
Why did nobody ask the people at the sharp end?
So what do you think? Does Jake have some genuine concerns? Please continue the discussion in the comments below.