There are two things I’m talking about here. One is that I think the warring audio factions might be talking about two very different things (although the FR people seem to think there’s only one thing?). The other is which of the two I think is more important. It’s a wall of words, and in the end I’m not sure I truly understand it myself, so I’m probably gonna get torn to shreds for suggesting it.

I probably should use the word “timing” instead of “time domain”.

I think I personally value the timing realm more than the frequency (pitch) realm. The audio engineers are right: you can only discern so much in terms of pitch. The range is 20 Hz to 20,000 Hz, and even that’s generous, considering 16,000 Hz is already the limit for lots of older listeners. They’re also right that there are psychoacoustic effects at play. BUT I wonder if they forget about timing, because from what I can tell all the common ‘measurements’ in audio relate to frequency response (pitch), not timing. A visual equivalent: frequency response is the color spectrum, and timing is the frame rate, “Frames Per Second”.

Maybe all the in-fighting over the topic comes down to this misunderstanding? On one side you have the equivalent of the FR people focusing on ‘color reproduction’, saying “You can’t even see infrared light!” or “If you adjust the color, the two pictures are exactly the same.” Meanwhile, team “timing” is talking about resolution and motion fidelity, not color reproduction.

For example: how do we determine the location of sounds? By the difference in timing between when a sound reaches the left and right ears. That difference can be as small as 10 microseconds, according to this article:

https://www.sciencefocus.com/science/why-is-there-left-and-right-on-headphones

Another article mentions that humans can detect even less than 10 microseconds (3 - 5 microseconds?) of timing difference:

https://phys.org/news/2013-02-human-fourier-uncertainty-principle.html
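To put those numbers in context, here’s a rough sketch using the classic Woodworth spherical-head approximation for interaural time difference. The head radius and speed of sound are my own assumed round numbers, not figures from either article:

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (seconds) for a distant
    source, using the Woodworth spherical-head model:
    ITD = (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

for azimuth_deg in (1, 5, 30, 90):
    print(f"{azimuth_deg:3d} deg -> {itd_woodworth(azimuth_deg) * 1e6:6.1f} us")
```

By this model, a source just 1 degree off center already produces an ITD of roughly 9 μs, which lines up with the ~10 μs figure: sensitivity at that scale is exactly what lets us resolve small angles.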

So many things can be explained by this: spatial cues like staging and imaging. Transients and textures depend on the speed of changes in frequency, not the frequencies themselves. I think those same things drive how detailed and resolving gear seems, and relate to micro- and macro-dynamics. It’s known that if you compare a piano note to a guitar note, it’s the brief attack characteristics, the pluck vs. the hammer, that clue us into which sound comes from which instrument. I think all of the “life-like” qualities are mostly timing-dependent rather than frequency- or pitch-dependent.
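That pluck-vs-hammer point is easy to demonstrate. Here’s a toy sketch (assuming NumPy; the two synthetic notes and their time constants are made up for illustration) that measures how long an amplitude envelope takes to rise from 10% to 90% of its peak, a common rough definition of attack time:

```python
import numpy as np

def attack_time(signal, sample_rate, lo=0.1, hi=0.9):
    """Estimate attack time: how long the amplitude envelope takes to
    rise from 10% to 90% of its peak value."""
    env = np.abs(signal)
    win = max(1, int(0.001 * sample_rate))   # ~1 ms moving-average smoothing
    env = np.convolve(env, np.ones(win) / win, mode="same")
    peak = env.max()
    t_lo = np.argmax(env >= lo * peak)       # first crossing of 10% of peak
    t_hi = np.argmax(env >= hi * peak)       # first crossing of 90% of peak
    return (t_hi - t_lo) / sample_rate

# Two made-up 440 Hz notes with identical pitch and decay, different attack:
# a fast ~1 ms "pluck" and a slower ~20 ms swell.
sample_rate = 48_000
t = np.linspace(0, 0.5, sample_rate // 2, endpoint=False)
decay = np.exp(-t / 0.05)
pluck = np.sin(2 * np.pi * 440 * t) * decay * (1 - np.exp(-t / 0.001))
swell = np.sin(2 * np.pi * 440 * t) * decay * (1 - np.exp(-t / 0.020))
print(f"pluck attack: {attack_time(pluck, sample_rate) * 1000:.1f} ms")
print(f"swell attack: {attack_time(swell, sample_rate) * 1000:.1f} ms")
```

Both toy notes have the same pitch and the same decay; only the attack differs, and that difference is exactly what the 10%-to-90% number captures.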

From what I can tell, the things that make Hi-Fi gear stand out from the cheapest gear with good EQ applied are tied to timing. I’ve been lucky enough to go to a CanJam before and listened to very expensive gear and everything below it in terms of price. To my ears, there IS a difference, and it didn’t matter what the price tag said; I wasn’t gonna buy the expensive stuff anyway, I just wanted to hear the differences for myself.

I’ve listened to things that “measure perfectly”, like the near-perfect Dan Clark Stealth and Dan Clark Expanse. Dan Clark uses metamaterials to help dampen and “shape” the sound, and coincidentally those headphones measure almost exactly to the Harman curve. I’ve listened to many Chi-Fi DACs and amps that also measure perfectly (they all use mounds of negative feedback). And to my ears, those are some of the most boring and lifeless things to listen to.

So in my opinion, faithful reproduction of frequency is NOT the holy grail. You can EQ things any way you like, and I agree that EQ is excellent! It changes the sound more than most things. But good FR performance is cheap, and that’s great. What’s not widely available is gear that performs well in the timing realm. From what I can tell, that’s what people pay up for.

I’d be interested to see the industry start creating ways to measure time-domain performance one day. In my analogy above I used the metaphor of “Frames Per Second”, but timing resolution can also be expressed in Hz. In the first article, humans can use timing cues as small as 10 microseconds (μs) to position a sound source, which equates to 100,000 Hz. In the second article, humans detected changes as small as 3 μs. That article mentions time-difference detection 10x to 13x better than expected, so if 3 μs is the extreme 13x case, the other participants were closer to 4 μs, the 10x figure. Going by the 4 μs figure, that equates to 250,000 Hz of resolution. It’s not about pitch; it’s about changes in the audio.
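To make the arithmetic explicit (treating each just-detectable timing difference Δt as one “frame”, so the equivalent rate is 1/Δt):

```python
# equivalent "frame rate" for a just-detectable timing difference
for dt_us in (10, 4, 3):
    print(f"{dt_us:2d} us -> {1 / (dt_us * 1e-6):,.0f} Hz")
# 10 us -> 100,000 Hz; 4 us -> 250,000 Hz; 3 us -> 333,333 Hz
```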

  • wagningerB · 11 months ago

    Maybe you know more about this or have a source for how this works, but your comment reminded me of something that is a bit of a mystery to me: if the ear is a pressure detector, how does stuff like staging work in headphones, when there are just 2 membranes for output and 2 ears for input?

    I get how it works that you hear sounds more on the left than on the right; that’s just a difference in volume… but precise positioning on something like a virtual stage?

    • SupOrSaladB · 11 months ago

      There are the timing and volume differences between the ears, which have already been commented on. But on top of that, if you were to listen to a source in real life from different locations, the response at your ear would be different as well.

      Here’s an example from a KEMAR head-and-torso simulator, with free-field frequency response measurements at different positions. This is just the left ear, showing how the response changes at different positions and distances from the head: https://imgur.com/a/Lj8Di0R

      So as a source changes its location, not only do the timing and volume change for each ear, but the sound itself changes at each ear too. Our brain can interpret and compare all the information from both responses to pinpoint the location of the source. It’s really interesting, because even though the sound is changing, your brain still hears it as the same sound.

      With headphones, it is a little different, since the sound is coming from “nowhere”. But with binaural recordings or certain mixing, it does seem possible to simulate some of the localization effects in the recording itself.
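      To make that concrete, here’s a toy sketch (assuming NumPy; the delay and level values are made-up illustrations, and real binaural rendering would convolve with measured HRIRs like the KEMAR data above) of the two crudest localization cues, interaural time and level differences:

      ```python
      import numpy as np

      def pan_binaural(mono, sample_rate, itd_s, ild_db):
          """Toy binaural panning: delay the far ear by the interaural time
          difference (ITD) and attenuate it by the interaural level
          difference (ILD). itd_s > 0 places the source toward the left
          ear. Real spatialization would use measured HRIRs instead."""
          delay = int(round(abs(itd_s) * sample_rate))
          gain = 10 ** (-abs(ild_db) / 20)
          near = mono
          far = np.concatenate([np.zeros(delay), mono])[: len(mono)] * gain
          left, right = (near, far) if itd_s >= 0 else (far, near)
          return np.stack([left, right], axis=1)  # (samples, 2) stereo

      sample_rate = 48_000
      t = np.linspace(0, 1.0, sample_rate, endpoint=False)
      tone = 0.2 * np.sin(2 * np.pi * 440 * t)
      # ~300 us ITD plus ~4 dB ILD: the image shifts noticeably left of center
      stereo = pan_binaural(tone, sample_rate, itd_s=300e-6, ild_db=4.0)
      ```

      Even this crude delay-plus-gain trick moves the image off center, which is a hint of how much further measured per-ear responses (the “sound itself changes” part) can take it.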