
Check-up: Freeview Loudness
December 2015

For the last three years, a voluntary standard based on EBU guidelines for audio loudness has been in place for the major New Zealand broadcasters using the Freeview platform. The audio standard BS.1770 is referenced in 'Transmission Rules for the DVBT (Terrestrial) Network' (pdf) and also in 'Technical Standards and Documentation Guide for the Delivery of Television Commercials', to which Maori TV, MediaWorks and TVNZ are signatories. The audio level rules within these documents are closely aligned to the OP-59 standard used for TV sound in Australia.

In early 2013, measurements were made of N.Z. Freeview audio levels. Read the 2013 article here. At that time, many stations were still too loud on average, and there were too many loud excursions well above the target loudness. It is time to revisit the measurements.

2015 Measurements

Recordings were made of seven Freeview stations on the DVBT UHF network. Each recording is 1 hour in length, taken at random times from May to August during 2015. The measurements utilize the free Orban meter version 2.8. First, a table of numbers, showing the true-peak levels, loudness and loudness range for each station.

[Table: true-peak levels, loudness and loudness range for each station]

The long-term target for loudness is -24dB LKFS and the result for this is in the 6th column labelled Median loudness (green). A graphical representation of the median levels follows:

[Graph: median loudness error for each station, relative to -24dB LKFS]

In the plot above, the correct -24dB LKFS loudness is at zero; quieter is to the left and louder is to the right. Only TV4, Maori and Choice came close to the correct long-term average loudness, while TV1, TV2, TV3 and Prime are gaming the system a little. An error of +1.6dB (TV1) may not seem too great; however, it is more than the 1dB recommended in the standards, and may force the viewer to reach for the volume control when switching between a loud and a quiet station.

A better illustration of the range of loudness for each station is by the histograms below. The first one is an idealized illustration to show what a 'good' distribution of loudness might look like, while allowing a little tolerance both sides. The horizontal segments are 1/2 dB steps, centered at -24dB LKFS, while the vertical scale is simply the number of measurements within each 1/2 dB 'bucket'.

[Histograms 1 to 8: the idealized distribution, followed by the loudness histograms for the stations]

Now these are quite revealing. We note first, that no two stations are the same. We have to bear in mind here that these recordings are 1 hour continuous segments, which essentially means one third of the time is ads and two thirds programme. The technical standards are mainly to do with normalising loudness of advertisements (short-form programming), but also recognise that over an extended time, the reference loudness must also apply to long-form programming - the bits between ad-breaks. It is harder to closely control long-form audio levels because the distribution of levels depends on the type of programme. A movie with many quiet parts and programmes that are mostly dialogue will have a skew of distribution toward lower levels. In all cases though, exceeding 2dB above the reference has to be avoided. Accordingly, it is not appropriate to be too critical of a long tail to the left in the plots above, however, a tail to the right and a centring point above the reference loudness are rightfully censured. For a consolidated test like this, most loudness readings should remain within +2dB and -5dB of reference.
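The half-dB bucketing and the +2dB/-5dB window described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Excel workbook used to produce the plots; the function names and the sample readings are hypothetical.

```python
import math
from collections import Counter

TARGET_LKFS = -24.0  # reference loudness quoted in the article
BUCKET_DB = 0.5      # histogram step used in the plots


def bucket_index(reading, target=TARGET_LKFS, step=BUCKET_DB):
    """Signed bucket number; bucket 0 is centred on the target."""
    return math.floor((reading - target) / step + 0.5)


def histogram(readings):
    """Count readings per half-dB bucket, keyed by bucket centre (LKFS)."""
    counts = Counter(bucket_index(r) for r in readings)
    return {TARGET_LKFS + i * BUCKET_DB: n for i, n in sorted(counts.items())}


def fraction_in_window(readings, above=2.0, below=5.0):
    """Share of readings within +above/-below dB of the target."""
    low, high = TARGET_LKFS - below, TARGET_LKFS + above
    return sum(low <= r <= high for r in readings) / len(readings)


# Hypothetical readings, not taken from the recordings.
sample = [-24.1, -23.9, -24.2, -22.6, -21.0, -27.5, -30.0, -24.0]
print(histogram(sample))
print(fraction_in_window(sample))
```

A station with a well-behaved distribution would show most of its count in the buckets near -24 and a window fraction close to 1.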

Observations on the histograms:

Is a one hour test representative?

In anticipation of the question about whether a one hour test is representative, further checks have been made. The differences found are quite minor compared to the presented results. Peak values of histograms and the spread of loudness on the high side of the median value form similar shapes for each station after less than 15 minutes. What does vary is the spread of values on the low side. Although lengthy periods of dialogue and quiet scenes without underscoring do tend to pull down the median value, around 26% of every hour comprises ads, and these tend to hold the long-term median close to the values shown.

Are ads responsible for the high values in the histograms?

The answer is certainly yes, but ads are not the only factor. Of importance is that after having isolated several examples of ads within these 1 hour recordings, most are in fact closely controlled in loudness as per the transmission rules.

Below are some loudness vs time plots for some ad-breaks selected at random:

[Plot: TV1 ad-break, loudness vs time]

For the TV1 ad-break shown above, the median value is -22.8dB LKFS, which is actually 0.4dB lower than the median of the full 1 hour recording. The levels are closely controlled and there are no periods greater than 2dB above the median loudness of the ad-break. There are short dips but these are of no consequence in the overall scheme. The larger dip at 150 seconds is actually the bit in the Nova ad where the homeowner and salesperson look at a picture of the cliffs of Dover, pause, then say "England".

[Plot: TV3 ad-break, loudness vs time]

For this 4 minute TV3 ad-break, the median loudness is -24.6dB LKFS and would be slightly lower except for the loud programme promotion bit at the start. TV3's ad-breaks are fairly well controlled and actually have lower loudness on average than the long-term programme.

[Plot: Prime ad-break, loudness vs time]

This nearly 4 minute ad-break on Prime has a median loudness of -21.9dB LKFS, which is 2.1dB higher than the standard and around 1/2 dB higher than their long-term median loudness. The range is moderately well controlled, with only 8 seconds of this clip more than 2dB above the median value. Prime TV, owned by SKY, is not actually a party to the technical standards cited above, and their audio level control appears 'looser' than the other broadcasters.

What have we found?

For the most part, TV ads do have their loudness range controlled to the recommendations. Companies exist, for example 'Ebus' (now owned by the IMD Group), which specialise in ad processing. Such companies are used by many ad-makers to normalise quality of ads, including their loudness. The use of such facilities is entirely voluntary and it is not known just how rigorous the broadcasters actually are when it comes to accepting technical standards of ads for broadcast. The programme segments that are the loudest are not now so much the ads, but station promotions of programmes, short station identifiers, underscoring music (particularly the grinding racket that commonly underscores sports programme promos), applause tracks and sometimes sting music within sit-coms and such. Broadcasters are not applying the same stringent standards to their in-house material as they claim to do for externally produced short-form programmes (ads). Accuracy of overall station loudness could also be improved, particularly for TV1, TV2, TV3 and Prime.

Some of the issues are always going to be hard to control. Although we have a method to normalise the short-form programmes and can see that entire ad-breaks more or less comply with the standards, the transition points going into and leaving the ad-breaks are still problematic. Although a long-form programme might comply with the correct loudness on an overall basis, the points at which the ad-breaks are inserted may well contain higher or (usually) lower audio loudness. Some prior planning by the operational people might mitigate this, if they can manage to better tune insert points to match loudness from short-form to long-form. There are further issues around how long-form programming is normalised, with decisions having to be made whether loudness should be anchored to dialogue only or to the whole mix.

Are the loudness standards helping?

There are several parts to this question. First, is the BS.1770 loudness standard good enough to define loudness as a human would hear it? Secondly, has the implementation of the loudness standard actually reduced complaints and viewer annoyance? Finally, are annoyance and irritation related to loudness anyway?

Acceptance of BS.1770

The BS.1770 standard is now widely adopted by broadcasters internationally. There is debate about its effectiveness in imitating perceived loudness for all types of audio. This was briefly discussed in the 2013 article. A document from Orban (pdf) relating the merits of BS.1770 against the CBS Laboratories loudness algorithms can be found here. I will say that the concept of audio levelling based on a loudness metric is invaluable. It may not be a complete answer, but is certainly an improvement over the old peak level systems.

Have viewer complaints about loud ads reduced in three years?

In the United States, government enacted the 'CALM' (Commercial Advertisement Loudness Mitigation) act in 2011. This forces broadcasters (and cable-TV operators) to normalise the loudness of ads to be the same as the programmes. The standard to be applied was called ATSC A/85, which in turn references BS.1770 as the methodology. The U.S. government enacted this law to try and reduce the number of viewer complaints about loud ads to the FCC (Federal Communications Commission).

There was a problem. Early versions of BS.1770 had a flaw: silence was allowed to be included in the averaging algorithm. Strictly speaking, a broadcaster that put a completely silent commercial to air (however unlikely) would be non-compliant with the act, which allows a plus or minus 2dB loudness range based on the entire spot. Furthermore, advertisers were making use of this loophole. They would make an ad with a few seconds of really quiet audio, either at the start or at the end, punctuated by blasts of extremely loud audio, well above the normal levels. These would average out to be within the 2dB window and hence be compliant. This would do nothing to lessen viewer complaints of course, but they were within the law. Later versions of BS.1770, from BS.1770-2 onward, have fixed this. There is a gating system in use so that audio levels lower than 10dB below the mean level are ignored, meaning that low audio levels do not modify the loudness values returned. Unfortunately in the U.S., the original standard remained until the ATSC A/85 committee revisited the problem in late 2013 and started enforcing the newer gated standard from June 2015.
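The effect of the gate can be illustrated with a minimal Python sketch. It assumes the per-block loudness values (in LKFS) have already been measured, so the K-weighting filter and the overlapping 400ms blocks of the real standard are left out. The -70 LKFS absolute gate and the -10 relative gate follow BS.1770-2 onward; the function names and the made-up spot are hypothetical.

```python
import math

ABSOLUTE_GATE_LKFS = -70.0  # absolute gate, BS.1770-2 onward
RELATIVE_GATE_LU = -10.0    # relative gate, below the absolute-gated mean


def energy_mean(block_loudness):
    """Average block loudness values in the power domain; result in LKFS."""
    if not block_loudness:
        return float("-inf")
    power = sum(10 ** (l / 10) for l in block_loudness) / len(block_loudness)
    return 10 * math.log10(power)


def ungated_loudness(blocks):
    """Original behaviour: every block counts, silence included."""
    return energy_mean(blocks)


def gated_loudness(blocks):
    """Two-stage gating: drop near-silence, then blocks 10 below the mean."""
    above_abs = [l for l in blocks if l > ABSOLUTE_GATE_LKFS]
    threshold = energy_mean(above_abs) + RELATIVE_GATE_LU
    return energy_mean([l for l in above_abs if l > threshold])


# Hypothetical spot: 60% near-silence, 40% loud blocks at -20 LKFS.
spot = [-90.0] * 6 + [-20.0] * 4
print(round(ungated_loudness(spot), 2))  # close to the -24 target
print(round(gated_loudness(spot), 2))    # the loud part on its own
```

Without the gate, the quiet blocks drag the average of this made-up spot down to about -24; with the gate, the measurement reports the -20 that the viewer actually hears.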

Three years after the CALM act originally came into force, the FCC say they have received fewer complaints. Around 20,000 complaints were still filed in 2013 but that has now lessened to about one third as many. The conclusion about the CALM act is that there is no conclusion yet. People may have stopped complaining for many reasons; finding reduced annoyance of ads is only one explanation. Perhaps when the later standard becomes effective?

Here, in New Zealand, there is no CALM act, nor any law on the matter at all. Neither do we have any FCC-equivalent organisation who would either collate complaints or enforce any law, should one ever exist. Viewers have to complain to individual broadcasters, who in turn, would rather do nothing, because that would antagonise the advertisers, who are of course the primary income source for the broadcaster. No complaint data is forthcoming from the broadcasters, but the fact that some have voluntarily compiled a technical standard suggests that they would rather avoid complaints.

Does controlling loudness stop annoyance?

Certainly it does up to a point, but it is not a complete answer, nor will there be a complete answer. Annoyance is a human subjective response, variable in the extreme, and a technical solution can perhaps only mitigate the problem, helping some people but not others. The full topic is well above my pay grade; however, an illustration is below: three ads which have exactly the same loudness overall. They have been trimmed a little to preserve your valuable time (and keep the filesize down). Graphs have an audio slider below. Clips are less than 30 seconds in length.

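[Plot: UKpension ad, loudness and running loudness range vs time]
[Plot: FlyBuys ad, loudness and running loudness range vs time]
[Plot: ElectricianAd ad, loudness and running loudness range vs time]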

From the plots, the green lines are the loudness values. Median loudness is the same in each case and the range is narrow. These would all be fully compliant with the standard. The orange dotted lines are the running value of loudness range. Note the differences. The first one has a low loudness range, meaning highly compressed. The second plot starts with a moderate loudness range but which lowers as the singer goes into screech mode. For this clip, the transition to highly compressed is shorter than indicated because the LRA integration time is long. The third plot carries a moderate loudness range throughout. Although not presented here, another indication of degree of compression is to plot rate of change of the minimum value of momentary loudness (400ms integration) from one logged data point to the next. A plot with rapidly changing values is audio with some variation of levels on a momentary basis; such audio tends to be less annoying.
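For readers unfamiliar with loudness range (LRA), here is a minimal Python sketch of the usual definition from EBU Tech 3342: gate the short-term loudness values (absolute gate at -70, relative gate 20 below the power-domain mean), then take the spread between the 10th and 95th percentiles. It assumes the short-term values are already available as a list. The interpolated percentile, the function names and the two sample lists are my own simplifications, and the Orban meter may differ in detail.

```python
import math


def percentile(sorted_vals, pct):
    """Linearly interpolated percentile of an already sorted list."""
    pos = (len(sorted_vals) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def loudness_range(short_term):
    """LRA in LU from a list of short-term loudness values."""
    gated = [l for l in short_term if l > -70.0]  # absolute gate
    if not gated:
        return 0.0
    mean_power = sum(10 ** (l / 10) for l in gated) / len(gated)
    threshold = 10 * math.log10(mean_power) - 20.0  # relative gate
    kept = sorted(l for l in gated if l > threshold)
    return percentile(kept, 95) - percentile(kept, 10)


# Hypothetical clips: one heavily compressed, one with some dynamics.
compressed = [-24.0] * 20
dynamic = [float(x) for x in range(-30, -19)]  # -30.0 up to -20.0
print(loudness_range(compressed))
print(loudness_range(dynamic))
```

The flat clip returns a range of zero, which is the numerical signature of the heavy compression heard in the first ad.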

The fact that these three examples would all be compliant with the loudness regulations is a clear indication that controlling loudness is not the complete answer to reducing loud ad annoyance.

Dolby surround vs stereo

TV1, TV2 and TV3 all broadcast a Dolby AC3 5.1 channel surround audio stream in addition to the compulsory stereo stream (AAC encoded). You (or your receiver) choose which stream to use. Most receivers, if left at default settings, will switch to using the surround track if one is present. This means that TV1, TV2 and TV3 audio will be on the Dolby stream, while every other station will be on stereo. Some receivers allow you to set a preference for stereo and therefore will not automatically switch. It is a shame they do not all have this facility.

The reason that this is a problem is that the volume you get can be noticeably different between streams. This is not always the fault of the station, but does generate complaints. The Dolby stream can have embedded metadata which is used by the receiver to set audio gain. There is a default setting which the receiver will apply if metadata is missing. The primary metadata value (dialnorm) is set by the programme producer to relate to the loudness of the dialogue only. For short form material (ads), the full mix is used to generate the dialnorm value instead. This can create another level jump at the transition between movie and ad-break. The Dolby stream is meant for surround capable home-theatre systems. It has no value if you listen using the TV speakers, or with only two speakers or with headphones. The broadcasting rules do allow for a greater loudness range and peak level on the Dolby tracks although the broadcasters don't always make use of that. When the Dolby 5-channel stream is mixed down to 2 channels, level differentials from the stereo stream can be exacerbated.

Possible reasons for a variation of volume between the Dolby surround and AAC stereo streams are listed below in no particular order.

  1. You are feeding the digital audio to your home theatre receiver and do not have the balance of speakers set correctly.
  2. You are feeding the digital audio to your home theatre receiver, have only front speakers connected while choosing a surround mode.
  3. As for above but the receiver does not correctly mix down the channels to stereo.
  4. You listen on the TV set speakers but the receiver does not correctly mix down to stereo or does not implement Dolby TB11 to boost levels by 11dB.
  5. The same issue as (3) or (4) above may exist if you feed the analogue stereo outputs of a receiver to your home-theatre.
  6. The broadcaster has set an incorrect, or inappropriate value of dialnorm to the Dolby stream metadata.
  7. The broadcaster has placed down-mixed stereo audio on the surround tracks and not adjusted levels for the situation.
  8. The receiver uses an inappropriate gain setting if only stereo is present on the surround tracks.
  9. You have a compressed audio mode chosen in the receiver settings.

Because of the large number of possible reasons for differential audio volume between streams, no formal evaluation has been made of these differences. A brief check was made of the first 20 minutes of the same TV2 recording used above. The Orban loudness meter was set to 5.1 mode and correctly sums the channels according to the standard to create the rolling loudness values and loudness range. It was noted that the Dolby stream median loudness on this segment was 3dB below the stereo stream. However, the loudness range (LRA) was identical. 3dB is enough to be noticed but not generally enough change to need to rush for the remote.
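As a rough guide to what summing the channels according to the standard means, BS.1770 weights the left, right and centre channels at 1.0, the two surround channels at 1.41 (about +1.5dB) and leaves the LFE channel out, before converting the total to a single loudness figure. The Python sketch below assumes the mean-square value of each channel has already been measured after K-weighting; the function name and the sample values are hypothetical, and this is not the Orban meter's code.

```python
import math

# BS.1770 channel weights; the LFE channel is excluded from the sum.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}


def programme_loudness(mean_square):
    """Loudness in LKFS from per-channel K-weighted mean-square values."""
    total = sum(
        CHANNEL_WEIGHTS[ch] * z
        for ch, z in mean_square.items()
        if ch in CHANNEL_WEIGHTS
    )
    if total <= 0:
        return float("-inf")
    return -0.691 + 10 * math.log10(total)


# Hypothetical mean-square values for one measurement block.
stereo = {"L": 0.001, "R": 0.001}
surround = {"L": 0.001, "R": 0.001, "C": 0.001,
            "Ls": 0.0005, "Rs": 0.0005, "LFE": 0.01}
print(round(programme_loudness(stereo), 2))
print(round(programme_loudness(surround), 2))
```

Because the result depends on how much energy sits in each channel, a 5.1 stream and a stereo stream carrying the same programme need not measure the same.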

If you do regularly have a large differential and have dealt with the possibilities above as far as you are able, then the least-worst option for you may be to engage the audio compression options on your receiver.

About this test

To make these measurements, I used a Hauppauge WinTV HVR-3300 tuner which receives local DVB-T stations. The tuner produces a transport stream output, which is saved as a .ts file. The file is played back using VLC. Audio measurements on playback are made by the Orban Loudness meter V2.8 using the WASAPI interface on a Windows 7 PC. The EBU calibration files verify that the VLC player is responding correctly. The csv log from the Orban meter is imported into Excel to do the summaries and produce the plots and histograms.

A big thanks to Bob Orban of Orban Industries and the team who created the meter software, graciously making it available free of charge.
