Solid overview of applied color theory for video, so worth watching.
As to what was to be debunked, the presentation not only fails to set out a thesis in the introduction, it doesn't even pose a question, so you've got to watch hours to get to the point: SDR and HDR are two measurement systems which, when used correctly, must for most cases (legacy and conventional content) produce the same visual result. The increased fidelity of HDR makes it possible to expand the sensory response and achieve some very realistic new looks that were impossible with SDR, but the significance and value of any look is still up to the creativity of the photographer.
This point could be more easily conveyed by the presentation if the author explained that, throughout the history of reproduction technology, human visual adaptation exposes a moment-by-moment contrast window of about 100:1, which constantly adjusts over time based on average luminance to create a much larger window of perception of billions:1(+) that allows us to operate under the luminance conditions found on earth. But until recently we haven't expected electronic display media to be used in every condition on earth, and even if it can work, you don't pick everywhere as your reference environment for system alignment.
(+)Regarding the difference between numbers such as 100 and billions, don't let your common sense about big or small values faze your thinking about differences: perception is logarithmic; it's the degree of the ratios that matters more than the absolute magnitude of the numbers. As a famous acoustics engineer (Paul Klipsch) said about where to focus design optimization of the response traits of reproduction systems: "If you can't double it or halve it, don't worry about it."
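To put those ratios in photographic terms (my own back-of-the-envelope framing, not from the presentation), express both windows as stops, i.e. doublings:

import math

# Contrast windows expressed as photographic stops (powers of two)
instantaneous_stops = math.log2(100)   # ~6.6 stops visible at any one moment
adapted_stops = math.log2(1e9)         # ~30 stops across the eye's full adaptation range
print(round(instantaneous_stops, 1), round(adapted_stops, 1))

Read as ratios, the Klipsch rule is just saying that anything under about one stop (a factor of two) is not worth optimizing for.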
It's hard to boil it down to a simple thesis because the problem is complicated. He admits this in the presentation and points to it being part of the problem itself; there are so many technical details that have been met with marketing confusion and misunderstanding that it's almost impossible to adequately explain the problem in a concise way. Here's my takeaway:
- It was clearly a mistake to define HDR transfer functions using absolute luminance values. That mistake has created a cascade of additional problems
- HDR is not what it was marketed to be: it's not superior in many of the ways people think it is, and in some ways (like efficiency) it's actually worse than SDR
- The fundamental problems with HDR formats have resulted in more problems:
  - proprietary formats like Dolby Vision attempting to patch over some of the issues (while being more closed and expensive, yet failing to fully solve the problem)
  - consumer devices that are forced to render things worse than they might be in SDR, because it's literally impossible to implement the spec 100% (they have to make assumptions that can be very wrong)
  - endless issues with format conversions, leading to inaccurate color representation and/or color banding
  - lower-quality streaming at a given bit rate, due to HDR's reliance on higher bit depths to achieve the same tonal gradation as SDR
- Not only is this a problem for content delivery, but it's also challenging in the content creation phase as filmmakers and studios sometimes misunderstand the technology, changing their process for HDR in a way that makes the situation worse
Being somewhat of a film nerd myself and dealing with a lot of this first-hand, I completely agree with the overall sentiment and really hope it can get sorted out in the future with a more pragmatic solution that gives filmmakers the freedom to use modern displays more effectively, while not pretending that they should have control over things like the absolute brightness of a person's TV (when they have no idea what environment it might be in).
Regardless of whether it is HDR or SDR, when processing raw data for display spaces one must throw out 90%+ of the information captured by the sensor (which is itself often a small fraction of what was available at the scene). There can simply be no objectivity; it is always about what you saw and what you want others to see, an inherently creative task.
90%, really? What color information gets ejected exactly? For the sensor part, are you talking about the fact that the photosites don't cover the whole surface? Or that we only capture a short band of wavelengths? Or that the lens only focuses rays onto specific exact points and makes the rest blurry, so we lose the 3D?
Third blind man touching the elephant here: the other commenters are wrong! It's not about bit depth or linear-to-gamma; it's the fact that the human eye can detect way more "stops" (the word doesn't quite make sense, you just have to look it up) of brightness (I guess you could say "a wider range of brightness", but photography people all say "stops") than the camera, and the camera can detect more stops of brightness than current formats can properly represent!
So you have to decide whether to lose the darker parts of the image or the brighter parts of the image you're capturing. Either way, you're losing information.
(In reality we're all kind of right)
Cameras capture linear brightness data, proportional to the number of photons that hit each pixel. Human eyes (film cameras too) basically process the logarithm of brightness data. So one of the first things a digital camera can do to throw out a bunch of unneeded data is to take the log of the linear values it records, and save that to disk. You lose a bunch of fine gradations of lightness in the brightest parts of the image. But humans can't tell.
Gamma encoding, which has been around since the earliest CRTs, was a very basic solution to this. Nowadays it's silly for any high-dynamic-range image recording format not to encode data in a log format, because log is so much more representative of human vision.
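A toy sketch of the idea (a plain 2.2 gamma as a stand-in; real camera log curves differ): squeeze normalized 12-bit linear sensor values into 8 bits directly versus through a perceptual curve, and count how many codes the shadows get.

import numpy as np

# Normalized 12-bit linear light, quantized to 8 bits two ways
linear = np.linspace(0, 1, 4096)
direct8 = np.rint(linear * 255)              # straight linear quantization
gamma8 = np.rint(linear ** (1 / 2.2) * 255)  # gamma/log-style curve first

shadows = linear < 0.01                      # darkest 1% of scene light
print(len(np.unique(direct8[shadows])))      # a handful of codes
print(len(np.unique(gamma8[shadows])))       # dozens of codes: far finer shadow gradation

Equivalently, the linear encoding spends most of its codes on highlight gradations the eye can't distinguish, which is exactly what gets thrown away.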
Well, technically there's a bunch of stuff that happens after the sensor gets raw data (also excluding the fact that normal sensors do not capture light phase).
Demosaicing is a first point of data loss: there is a tiling of small monochrome photosites, and you reconstruct color from little bunches of them with various algorithms (a toy sketch follows below).
There is also a mapping to a color space of your choosing (probably mentioned in the OP video, apologies, I have not watched it yet...). The sensor's color space does not need to match the rendered color space...
A note of interest: sensors actually capture some infrared light (modulo physical filters to remove it). So if you count that as color, it gets removed. (Infrared photography is super cool!)
Then there is denoising/sharpening etc. that mess with your image.
There might be more stuff I am not aware of too; I have very limited knowledge of the domain...
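Here is the toy sketch mentioned above, not any real camera pipeline: the crudest possible reconstruction just bins each 2x2 RGGB tile into one RGB pixel, which makes the information loss obvious (real demosaicing interpolates the missing samples instead).

import numpy as np

def naive_demosaic_rggb(bayer):
    """Collapse each 2x2 RGGB tile into a single RGB pixel (half resolution).
    Every photosite measured only one of the three channels, so the other
    two are always reconstructed, never captured."""
    r  = bayer[0::2, 0::2]
    g1 = bayer[0::2, 1::2]
    g2 = bayer[1::2, 0::2]
    b  = bayer[1::2, 1::2]
    return np.dstack([r, (g1 + g2) / 2.0, b])

mosaic = np.random.rand(8, 8)                 # stand-in for raw sensor values
print(naive_demosaic_rggb(mosaic).shape)      # (4, 4, 3)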
The amount of captured sensor data thrown out when editing heavily depends on the scene and shooting settings, but as I wrote it is probably almost always 90%+ even with the worst cameras and widest possible dynamic range display technology available today.
In a typical scene shot with existing light outdoors it is probably 98%+.
A 4K 30fps video sensor capturing an 8-bit-per-photosite (Bayer pattern) image is capturing about 2 gigabits per second. That same 4K 30fps video on YouTube will be 20 megabits per second or less.
Luckily, it turns out relatively few people need to record random noise, so when we lower the data rate by 99% we get away with it.
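The arithmetic behind those figures, assuming 4K means 3840x2160:

width, height, fps, bits_per_photosite = 3840, 2160, 30, 8
sensor_rate = width * height * fps * bits_per_photosite   # raw bits per second off the sensor
stream_rate = 20e6                                         # ~20 Mbit/s delivered stream
print(sensor_rate / 1e9)                # ~1.99 Gbit/s
print(1 - stream_rate / sensor_rate)    # ~0.99, i.e. a ~99% reduction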
Oh, that's normal then. There are mandatory steps of dynamic range reduction in the video editing / color grading pipeline (like a compressor in audio production). So the information is not wholly lost, but the precision / details can be, yes. But that's a weird definition: there are so many photons in a daylight capture that you could just as easily say we really need a minimum of 21 bits per channel (light intensity of sun / light intensity of moon).
> So the information is not wholly lost, but the precision / details can be, yes.
That does not seem a meaningful statement. Information, and by far most of it, is necessarily discarded. The creative task of the photographer is in deciding what is to be discarded (both at shooting time and at editing time) and shaping the remaining data to make optimal use of the available display space. Various ways of compressing dynamic range are often part of this process.
> like a compressor in audio production
Audio is a decent analogy and an illustration of why it is a subjective and creative process. You don't want to just naively compress everything into a wall of illegible sound; you want to make some things pop at the expense of other things, which is a similar task in photography. As with photography, you must lose a lot of information along the way, because if you preserved all the finest details, no one would be able to hear much in real-life circumstances.
But that range isn't seen at the sensor, at least not all at once. Look at the sun and then immediately at the moon in a dark sky (if that were possible): the only reason you get the detail on the moon is the aperture in front adjusting. You couldn't see the same detail if they were next to each other. What matters is the precision from the darkest to the brightest thing in the scene, as opposed to the darkest to the brightest possible. That's the difference.
I’m 14 minutes into this 2 hour 15 minute presentation that hinges on precision in terminology, and Yedlin is already making oversimplifications that hamper delivery of his point. First of all, he conflates the actual RGB triplets with the colorspace coordinates they represent. He chooses a floating point representation where each value of the triplet corresponds to a coordinate on the normalized axes of the colorspace, but there are other equally valid encodings of the same coordinates. Integers are very common.
Secondly, Rec. 2100 defines more than just a colorspace. A coordinate triple in the Rec. 2100 colorspace does not dictate both luminance and chromaticity. You need to also specify a _transfer function_, of which Rec. 2100 defines two: PQ and HLG. They have different nominal maximum luminance: 10,000 nits for PQ and 1,000 nits for HLG. Without specifying a transfer function, a coordinate triple merely identifies chromaticity. This is true of _all_ color spaces.
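For concreteness, here is the PQ encoding side (the ST 2084 inverse EOTF as I understand it), which maps absolute luminance in nits to a 0..1 signal; HLG is a different, relative curve and isn't shown:

import math

# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_encode(nits):
    """Absolute luminance in cd/m^2 -> PQ signal in 0..1 (peak 10,000 nits)."""
    y = min(max(nits / 10000.0, 0.0), 1.0)
    p = y ** M1
    return ((C1 + C2 * p) / (1 + C3 * p)) ** M2

for nits in (0.1, 100, 1000, 10000):
    print(nits, round(pq_encode(nits), 3))   # roughly 0.06, 0.51, 0.75, 1.0

The fact that the input is an absolute number of nits, rather than a fraction of the display's own range, is exactly the design decision criticized elsewhere in this thread.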
On the other hand his feet/meters analogy is excellent and I’m going to steal it next time I need to explain colorspace conversion to someone.
He's also inaccurate when he states that digital cinema is encoded in the P3 color space. It's actually encoded in the XYZ colorspace, but in most cases the code values delivered are limited to those that fall within the P3 color gamut. The XYZ colorspace that's delivered, however, can encode any color the human eye can see plus many others.
If you watch a little further, until about 20 minutes, what follows is an explanation of what the primaries represent (described by you as "colorspace coordinates") along with a reasonable simplification of what a transfer function is, describing it as part of the colorspace. I believe that's fair? He merely explains Rec. 2100 as if using the PQ transfer function were innate. It definitely all seems appropriate and well presented for the target audience.
I wasn’t able to resume watching, but if he never describes HLG I would call that a miss for his stated goal.
I don’t want to criticize too much, though. Like I said I’ve only watched 15 minutes, and IIRC this is also the guy who convinced a lot of cinematographers that digital was finally good enough.
The presentation could surely be condensed, but also depends on prior knowledge and familiarity with the concepts.
Unrelated to the video content, the technical delivery of the video is stunningly good. There is no buffering time, and clicking at random points in time on the seek bar gives me a result in about 100 ms. The minimal UI is extremely fast - and because seek happens onmousedown, oftentimes the video is already ready by the time I do onmouseup on the physical button. This is important to me because I like to skip around videos to skim the content to look for anything interesting.
Meanwhile, YouTube is incredibly sluggish on my computer, with visible incremental rendering of the page UI, and seeking in a video easily takes 500~1000 ms. It's an embarrassment that the leading video platform, belonging to a multi-billion-dollar company, has a worse user experience than a simple video file with only the web browser's built-in UI controls.
I just want to emphasize parent's point: this is a simple video file with web browser controls. And it's an excellent user experience! You don't need youtube hosting! You don't need javascript! Here is all the code necessary:
<video id="DebunkingHDR" width="100%" height="auto" controls="" autoplay="" preload="preload" bgcolor="black" onended="backtopage()" poster="www.yedlin.net/images/DebunkingHDR_Poster.png">
<source src="https://yedsite.sfo2.cdn.digitaloceanspaces.com/Debunking_HDR_v102.mp4">
Your browser does not support HTML5 video.
</video>
It looks like he's using DigitalOcean's CDN though. This isn't an mov file thrown on an Apache vhost. And it's probably not gone viral.
HTTP range requests [1] are enabled out-of-the-box on both Apache and NGINX for static content. If you slap an fMP4 [2] onto a vhost it will work. No CDN needed.
Going viral is a separate technical challenge, but probably not a concern in most use cases.
[1]: https://http.dev/range-request
[2]: https://cloudinary.com/glossary/fragmented-mp4
To save readers a "View Source", this is the typical progressive file download user experience with CDNs that support byte-range requests.
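If you want to check whether a host honors byte-range requests, a quick probe (standard library only, using the MP4 URL from the snippet above) looks something like this; a 206 Partial Content response with a Content-Range header is the out-of-the-box behavior being described:

import urllib.request

url = "https://yedsite.sfo2.cdn.digitaloceanspaces.com/Debunking_HDR_v102.mp4"
req = urllib.request.Request(url, headers={"Range": "bytes=0-1023"})
with urllib.request.urlopen(req) as resp:
    print(resp.status)                        # 206 means partial content, i.e. ranges honored
    print(resp.headers.get("Content-Range"))  # e.g. "bytes 0-1023/<total size>"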
That said, I appreciate thumbnails while scrubbing, and critically need playback speed multipliers. And key controls to skip ten or twenty seconds. Plenty of room for browsers to improve.
Out of all of those features, Firefox only seems to lack thumbnails while scrubbing (and those are very expensive to calculate on the fly, which is why YouTube serves up tiny JPEGs when you scrub).
It's not that surprising that a massive page would have to compromise quality for scalability (to decrease server load and storage) compared to a smaller page with fewer visitors.
Most of the perceptual latency on YouTube derives from the massive front-end application, not inherent latency in the backend offering adaptive quality codecs for each video.
> It's an embarrassment that the leading video platform, belonging to a multi-billion-dollar company, has a worse user experience than a simple video file with only the web browser's built-in UI controls.
You're surprised because you view Youtube as a video platform. It was that once but now it's an advertising platform that happens to show videos.
Luckily, for now you can pay for YT Premium. It makes the experience so much better: no checking for ads every time you skip.
An excellent video. I've admired Yedlin's past work debunking the need for film cameras over digital when you're going after a 'film look'.
I wish he shared his code though. Part of the problem is he can't operate like a normal scientist when all the best color grading tools are proprietary.
I think it would be really cool to make an open source color grading software that simulates the best film looks. But there isn't enough information on Yedlin's website to exactly reproduce all the research he's done with open source tools.
His previous stuff is so interesting and it's very refreshing to see a Hollywood professional able to dig so deep into those topics and teach us about it https://yedlin.net/NerdyFilmTechStuff/index.html
I think the point that SDR input (to a monitor) can be _similar_ to HDR input, on monitors that have high dynamic range, is obvious if you look at the maths involved. Higher dynamic range gives you more precision in the information; you can choose what to do with it: higher maximum luminosity, better blacks with less noise, more details in the middle, etc.
Of course we should also see "HDR" as a social movement, a new way to communicate between engineers, manufacturers and consumers, it's not "only" a math conversion formula.
I believe we could focus first on comparing SDR and HDR black-and-white images, to see how a higher dynamic range in luminosity alone is in itself very interesting to experience.
But at the beginning he is saying the images look similar on both monitors. Surely we could find counterexamples, and that only applies to his cinema stills? If he can show this is true for all images, then indeed he can show that "SDR input to an HDR monitor" is good enough for all human vision. I'm not sure this is true: as I do psychedelic animation, I like to use the whole gamut of colors I have at hand, and I don't care about representing scenes from the real world; I just want maximum color p0rn to feed my acid brain. 30 bits per pixel surely improves that, as do a wider color gamut and new LED wavelengths not used before.
As far as I know, most real screens behave differently depending on whether the input is HDR or SDR.
Mostly because on HDR they are more willing to do very high brightness for a very small part of the screen.
Most displays have the ability to simulate their HDR range on SDR input, I believe, by dynamically inferring the contrast and seeing if they can punch up small local bright areas.
I just skimmed through parts of the video as I'm about to head to bed, but at least the bits I listened to sounded more like arguing for why 24-bit audio isn't necessary for playback; 16-bit will do just fine.
Back in the day I made ray tracers and such, and going from an internal SDR representation to an internal HDR representation was a complete game changer, especially for multiple reflections. That was a decade or more before any consumer HDR monitors were released, so it was all tonemapped to SDR before displaying.
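For readers who haven't done this: tonemapping here just means squashing unbounded linear radiance into the display's 0..1 range. A minimal sketch using the classic Reinhard curve as a stand-in (not necessarily what any particular renderer used):

import numpy as np

def tonemap_reinhard(hdr_rgb, exposure=1.0):
    """Squash unbounded linear radiance into 0..1 for an SDR display:
    L / (1 + L), followed by a crude display gamma."""
    x = np.asarray(hdr_rgb) * exposure
    return (x / (1.0 + x)) ** (1 / 2.2)

radiance = np.array([0.01, 1.0, 10.0, 1000.0])  # linear values straight out of a renderer
print(tonemap_reinhard(radiance))               # everything lands in 0..1, highlights compressed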
That said, I would really like to see his two monitors display something with really high dynamic range. From the stills I saw in the video, they all seemed quite limited.
Anyway, something to watch fully tomorrow, perhaps he addresses this.
I think he argues more that, if 24-bit audio had been brought to market the way HDR was, it would have been advertised as being able to be louder. The new stereos would lack a volume control, since 24-bit audio would encode absolute volume levels, and older formats would play quietly on the amazing new hardware. Songs would be mastered with bigger dynamic range swings so that the sound would pop up to loud volumes every now and then. This would be attributed to the higher bit depth of the signal format even though it would just be a different style of producing music. To play old songs loud on the amazing new stereo you would need to buy a remastered version, which would not simply play louder but would be quieter overall, to emphasize the volume spikes added in remastering.
For pretty much any graphics, you shouldn't be using a capped display space at all. You should be doing all of your math with 32 bit floats (or integers) in a linear color space (wrt photons or human color perception depending on what type of effect you're performing). You should only be converting to a display color space at the end.
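A small illustration of why the linear-light part matters (sRGB transfer function per the published spec; the black/white blend is just an example operation):

def srgb_to_linear(s):
    """sRGB-encoded 0..1 -> linear light."""
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4

def linear_to_srgb(l):
    """Linear light 0..1 -> sRGB-encoded."""
    return 12.92 * l if l <= 0.0031308 else 1.055 * l ** (1 / 2.4) - 0.055

# Average a black pixel and a white pixel, e.g. while downscaling an image.
a, b = 0.0, 1.0
naive = (a + b) / 2                                                    # averaging encoded values
correct = linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)  # averaging actual light
print(naive, round(correct, 3))   # 0.5 vs ~0.735: the naive blend is visibly too dark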
Curiously, when I play this back on my Android phone, after entering full screen I cannot exit/close the tab or go to another tab. I can also no longer reach the system launcher.
I had to kill Chrome via system prefs to make it stop.
Seems like this video not only exposes the issues with HDR but also a rather weird bug on Chrome/Android.
I read a whole book about this last year and it made me furious. Well, technically the book was about ACES but it was also about HDR and the problems overlap tremendously. I emphatically agree with this entire video and I will be sharing it widely.
The switch from 24-bit color to 30-bit color is very similar to the move from 15-bit color on old computers to 16-bit color.
You didn’t need new displays to make use of it. It wasn’t suddenly brighter or darker.
The change from 15- to 16-bit color was at least visible, because per-channel precision was so low that you could see the color banding improve, but it wasn't some new world of color, the way HDR is sold.
Manufacturers want to keep the sales boom that large cheap TVs brought when we moved away from CRTs. That was probably a “golden age” for screen makers.
So they went from failing to sell 3D screens to semi-successfully getting everyone to replace their SDR screen with an HDR screen, even though almost no one can see the difference in those color depths when shown with everything else being equal.
What really cheeses me on things like this is that TV and monitor manufacturers seem to gate the “blacker blacks” and “whiter whites” behind HDR modes and disable those features for SDR content. That is indefensible.
> Manufacturers want to keep the sales boom that large cheap TVs brought when we moved away from CRTs. That was probably a “golden age” for screen makers.
IMO the difference between LCD and OLED is massive and "worth buying a new tv" over.
I've never tried doing an 8-bit vs 10-bit-per-color "blind" test, but I think I'd be able to see it?
> What really cheeses me on things like this is that TV and monitor manufacturers seem to gate the “blacker blacks” and “whiter whites” behind HDR modes and disable those features for SDR content. That is indefensible.
This 100%. The hackery I have to regularly perform just to get my "HDR" TV to show an 8-bit-per-color "SDR" signal with its full range of brightness is maddening.
> I've never tried doing an 8-bit vs 10-bit-per-color "blind" test, but I think I'd be able to see it?
In my tests with assorted 24-bit sRGB monitors, a difference of 1 in a single channel is almost always indistinguishable (and this might be a matter of monitor tuning); even a difference of 1 simultaneously in all three channels is only visible in a few places along the lerps. (Contrast all those common shitty 18-bit monitors. On those, even with temporal dithering, the contrast between adjacent colors is always glaringly distracting.)
(If testing yourself, note that there are 8 corners of the color cube, so 8×7÷2=28 unique pairs. You should use blocks of pixels, not single pixels - 16x16 is nice even though it requires scrolling or wrapping on most monitors, since 16×256 = 4096. 7 pixels wide will fit on a 1920-pixel-wide screen naturally.)
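A quick way to generate that kind of test image, as a sketch of the recipe above (Pillow assumed for writing the PNG; the 16-pixel block height is arbitrary):

import numpy as np
from PIL import Image

# 28 strips, one per pair of color-cube corners; 256 lerp steps,
# each step a 7px-wide block, so a strip is 1792px wide (fits in 1920).
corners = [(r, g, b) for r in (0, 255) for g in (0, 255) for b in (0, 255)]
pairs = [(a, b) for i, a in enumerate(corners) for b in corners[i + 1:]]

strips = []
for a, b in pairs:
    t = np.linspace(0, 1, 256)[:, None]
    lerp = np.rint((1 - t) * np.array(a) + t * np.array(b))        # 256 steps x 3 channels
    strips.append(np.repeat(np.repeat(lerp[None, :, :], 16, axis=0), 7, axis=1))

img = np.concatenate(strips, axis=0).astype(np.uint8)              # 448 x 1792 x 3
Image.fromarray(img, "RGB").save("banding_test.png")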
So HDR is only a win if it adds to the "top". But frankly, most people's monitors are too bright and cause strain to their eyes anyway, so maybe not even then.
More likely the majority of the gain has nothing to do with 10-bit color channels, and much more to do with improving the quality ("blacker blacks" as you said) of the monitor in general. But anybody who is selling something must necessarily be dishonest, so they will never help you get what you actually want.
(For editing of course, using 16-bit color channels is a good idea to prevent repeated loss of precision. If also using separate alpha per channel, that gives you a total of 96 bits per pixel.)
> In my tests with assorted 24-bit sRGB monitors, a difference of 1 in a single channel is almost always indistinguishable (and this might be a matter of monitor tuning); even a difference of 1 simultaneously in all three channels is only visible in a few places along the lerps. (Contrast all those common shitty 18-bit monitors. On those, even with temporal dithering, the contrast between adjacent colors is always glaringly distracting.)
Now swap the sRGB primaries for the Rec.2020 primaries. This gives you redder reds, greener greens, and slightly bluer blues (sRGB blue is already pretty good)
This is why Rec.2020 specifies a minimum of 10-bit per channel colour. It stretches out the chromaticity space and so you need additional precision.
This is "just" Wide Colour Gamut, not HDR. But even retaining the sRGB gamma curve, mapping sRGB/Rec.709 content into Rec.2020 without loss of precision requires 10-bit precision.
Swap out the gamma curve for PQ or HLG and then you have extended range at the top. Now you can go super bright without "bleeding" the intensity into the other colour channels. In other words: you can have really bright things without them turning white.
Defining things in terms of absolute brightness was a bit of a weird decision (probably influenced by how e.g. movie audio is mixed assuming the 0dBFS = 105dB(SPL) reference level that theaters are supposed to be calibrated to), but pushing additional range above the SDR reference levels is reasonable, especially if you expect that range to be used judiciously and/or you do not expect displays to be able to hit their maximum values across the whole screen continuously.
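Going back to the 10-bit precision point above, here is a sketch of that argument, deriving the sRGB-to-Rec.2020 conversion from the published primary chromaticities (the "256 shades of sRGB red" test is just my own illustration):

import numpy as np

def rgb_to_xyz(primaries_xy, white_xy):
    """Standard derivation of an RGB->XYZ matrix from xy chromaticities."""
    cols = np.array([[x / y, 1.0, (1 - x - y) / y] for x, y in primaries_xy]).T
    xw, yw = white_xy
    white = np.array([xw / yw, 1.0, (1 - xw - yw) / yw])
    return cols * np.linalg.solve(cols, white)   # scale columns so white maps to white

SRGB    = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
REC2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]
D65     = (0.3127, 0.3290)

srgb_to_2020 = np.linalg.inv(rgb_to_xyz(REC2020, D65)) @ rgb_to_xyz(SRGB, D65)

# 256 linear-light shades of pure sRGB red, re-expressed as Rec.2020 values,
# then quantized: at 8 bits some shades collapse into the same code, at 10 bits none do.
reds = np.stack([np.linspace(0, 1, 256), np.zeros(256), np.zeros(256)])
in_2020 = srgb_to_2020 @ reds
for bits in (8, 10):
    codes = np.rint(in_2020 * (2**bits - 1)).astype(int)
    print(bits, "bits:", np.unique(codes, axis=1).shape[1], "distinct codes out of 256")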
On my 8-bit-per-channel monitor, I can easily see banding, though it is mostly obvious in the darker areas in a darkened room. Where this commonly manifests itself is "bloom" from a light object on a dark background.
I can no longer see banding if I add dither, though, and the extra noise is imperceptible when done well, especially at 4k and with a temporal component.
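What that looks like in practice, as a minimal sketch (triangular-noise dither of about one code value before rounding is one common choice; the gradient values are arbitrary):

import numpy as np

def quantize(x, bits=8, dither=False):
    """Quantize 0..1 values to the given bit depth, optionally adding
    triangular (TPDF) noise spanning about +/-1 code value before rounding."""
    levels = 2**bits - 1
    v = x * levels
    if dither:
        v = v + np.random.rand(*x.shape) - np.random.rand(*x.shape)
    return np.clip(np.rint(v), 0, levels) / levels

ramp = np.linspace(0.10, 0.12, 1920)                  # a subtle dark gradient across the screen
print(len(np.unique(quantize(ramp))))                 # only a handful of levels: visible bands
print(len(np.unique(quantize(ramp, dither=True))))    # noise spreads values across more levels, hiding the bands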
> I've never tried doing an 8-bit vs 10-bit-per-color "blind" test, but I think I'd be able to see it?
It's only really visible on subtle gradients on certain colours, especially sky blue, where 8 bits isn't sufficient and would result in visible "banding".
In older SDR footage this is hidden using film grain, which is essentially a type of spatial & temporal dithering.
HDR allows smooth gradients without needing film grain.
As long as the right content was displayed, I instantly saw the upgrade on HDR screens (the first one I saw was on a smartphone, less than 10 years ago I believe); I knew something was new.
The same way I could instantly tell when I saw a screen showing footage at more than 40 fps. And I constantly see wrongly converted footage on YouTube, from 24 fps to 25 fps, where one frame jumps or is duplicated every second.
At a minimum we should start from something captured close to reality, and then get creative from that point.
SDR is like black-and-white movies (not quite, but close). We can get creative with it, but can we just see the original natural look?
HDR (and the wider color space associated) has a fighting chance to look real, but looking real seems far away from what movie makers are doing.
I have a Nikon Z8, which can capture 60 fps 8K footage with (roughly speaking) the equivalent of 1000-nit HDR. The video data rate is something absurd, measured in gigabits per second.
If you look at the raw footage on an OLED HDR monitor, it's like looking out of a window! You get a feeling that you could just reach out and touch the objects behind the panel.
I've seen a few modern HDR productions that have the same "realism aesthetic" and I rather liked it. I've also enjoyed HDR used as a "special effect" with gleaming highlights everywhere.
Both styles have their place.
Tbh so should the excessive messing with colours that's in fashion these days.