HQPlayer vs SOX

Discussion in 'Computer Audiophile: Software, Configs, Tools' started by Woland, Aug 7, 2021.

  1. Garns

    Garns Friend

    Pyrate
    Joined:
    Jul 9, 2016
    Likes Received:
    2,484
    Trophy Points:
    93
    Location:
    Sydney, AUS
    ./configure --with-flac will attempt to compile in flac support. It needs libFLAC to link dynamically against. You could also make a static compile of flac and pass it in to the linker. There are some instructions here.
     
  2. Scott Kramer

    Scott Kramer Friend

    Pyrate
    Joined:
    May 3, 2016
    Likes Received:
    1,455
    Trophy Points:
    93
    Thanks! ./configure --with-flac was it.

    Screen Shot 2021-11-29 at 12.41.38 AM.jpg
     
  3. Woland

    Woland Friend

    Pyrate
    Joined:
    Jan 13, 2021
    Likes Received:
    1,322
    Trophy Points:
    93
    Location:
    a friendly land
  4. ohshitgorillas

    ohshitgorillas Friend

    Pyrate
    Joined:
    Nov 27, 2015
    Likes Received:
    685
    Trophy Points:
    93
    Location:
    Sacramento, CA
    The talk around town here has got me curious about HQPlayer and upsampling in general. Since my main DAC is an Android DAP which can't run HQP, and my other DAC is a Bifrost 2 which is 'limited' to 16-bit / 192 kHz, I think SoX is my best bet. Also, I'm a huge nerd and I almost always prefer the DIY route over pre-paid solutions, and I've been looking for an excuse to get back into learning Linux.

    Right now my goal is to create upsampled versions of a handful of albums (offline), upsampled to 16X, to defeat the interpolation filters on my Shanling M8.

    I've downloaded the modified version from this thread but can't compile it:
    Code:
    checking for pkg-config... /usr/bin/pkg-config
    checking pkg-config is at least version 0.9.0... yes
    ./configure: line 10252: syntax error near unexpected token `-fstack-protector-strong'
    ./configure: line 10252: `AX_APPEND_COMPILE_FLAGS(-fstack-protector-strong)'
    I've also downloaded a modified version with DSD support (https://github.com/mansr/sox) and applied @Garns's modifications to the files in src to raise the number of taps, then recompiled, but for some reason I'm still capped at 32767 taps.

    Also, using the formula above, I'm successfully able to create 4X upsampled versions but I'm getting an error when I try to create 16X upsampled versions at 705.6kHz (changing the sample rate to 705600 and "upsample 16"):

    Code:
    /home/viserion/bin/sox-dsd/src/.libs/sox FAIL formats: can't open output file `test16x.flac': FLAC__STREAM_ENCODER_IO_ERROR
    edit: I am able to create WAV files at 705.6kHz, just not FLACs... apparently FLAC is limited to 655,350 Hz.
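
    For anyone planning target rates, the arithmetic behind that error can be sketched in a few lines of Python. The 655,350 Hz ceiling is simply the figure quoted in the edit above and has not been checked against any particular libFLAC version; `fits_flac` is a made-up helper name.

```python
# Sample rates for integer upsampling of 44.1 kHz material, checked against
# the FLAC ceiling quoted in the post above (655,350 Hz). The limit value is
# taken from that post; fits_flac is a hypothetical helper name.
FLAC_MAX_RATE_HZ = 655_350
BASE_RATE_HZ = 44_100


def fits_flac(rate_hz, limit_hz=FLAC_MAX_RATE_HZ):
    """Return True if rate_hz does not exceed the assumed FLAC limit."""
    return rate_hz <= limit_hz


rates = {factor: BASE_RATE_HZ * factor for factor in (4, 8, 16)}

for factor, rate in rates.items():
    container = "FLAC" if fits_flac(rate) else "WAV (rate exceeds the FLAC limit)"
    print(f"{factor:>2}x -> {rate} Hz: {container}")
```

    Under that assumption, 4x and 8x still fit in FLAC while 16x (705.6 kHz) has to go to WAV, which matches what was observed.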

    edit2: figured out why I couldn't increase the tap size; I had missed the changes to the rate and sinc files.
     
    Last edited: Dec 29, 2021
  5. fastfwd

    fastfwd Friend

    Pyrate
    Joined:
    Aug 29, 2019
    Likes Received:
    1,010
    Trophy Points:
    93
    Location:
    Silicon Valley
    24/192
    If you're using a VERY old version of GCC (like 4.83), you might have to change that to just "-fstack-protector".
     
  6. ohshitgorillas

    ohshitgorillas Friend

    Pyrate
    Joined:
    Nov 27, 2015
    Likes Received:
    685
    Trophy Points:
    93
    Location:
    Sacramento, CA
    It's definitely capable of accepting 24 bits. I was referring to the fact that Schiit claims their multibit DACs are capable of a certain "real" bit depth below 24 like how people were resampling to 21bit earlier in the thread. I recall reading that the Bifrost 2 has 16 "real bits" although maybe I'm misunderstanding.
     
  7. ohshitgorillas

    ohshitgorillas Friend

    Pyrate
    Joined:
    Nov 27, 2015
    Likes Received:
    685
    Trophy Points:
    93
    Location:
    Sacramento, CA
    Here are some additional resources I've found helpful for following along with this discussion, for anyone else who is curious:

    DAC Digital Filters and their impact in the time and frequency domains
    DAC Digital Filters part 2: Deeper dive into the AK4490 and AK4493 filters
    These posts demonstrate the trade-offs in oversampling filters. I'm not sure I agree with all of his opinions or interpretations (difficult for most, if not all, to hear? it's simple if you know what to listen for...), but otherwise it's a great explanation.


    This is a very clear explanation of FIR design and window functions, plus it demonstrates the custom filter designer for MATLAB/octave referenced earlier.

    There are still a few aspects of this that I don't quite understand yet but I'm getting there.

    Interesting that many people consider sharp filters to be superior... with headphones, I usually prefer the sense of depth and separation offered by slow filters over the relative 'in your face' aggressiveness of sharp filters. I don't mind sharp filters on speakers, though I haven't really done any A/B testing on speakers.

    I will also say after testing that the files I've upsampled using @Garns's 64-million-tap sinc filter at 4x, 8x, and 16x do sound better than their 16-bit / 44.1 kHz counterparts. The upsampled files make the originals sound somewhat compressed, constrained, congested. What I hear from my Shanling M8 is, in particular, less bloat and better separation in the low end during busy tracks; more realistic and natural textures; improved plankton retrieval, staging, and spatial cues; and generally cleaner sound. The benefits are subtle, e.g. if you can't hear the difference between a slow and sharp filter then it's probably a waste of time... but for those audiophiles dropping major cash chasing 1-2% improvements, this one is solid and free.

    I'm still trying to decide where the sweet spot is for me--4X, 8X, or 16X. Probably depends on the quality of the original recording, but so far I don't hear a ton of difference between 8X and 16X... at least not anything that justifies the massive increase in file size from FLAC to WAV. I also have yet to try upsampling to DSD.
     
    Last edited: Jan 2, 2022
  8. soumya

    soumya Acquaintance

    Joined:
    Jun 24, 2018
    Likes Received:
    42
    Trophy Points:
    18
    Location:
    Mordor, Middle Earth
    Happy New Year to all friends here!

    Returning here after a while - tight deadlines on work front meant more Java and less (nearly 0) DSP fiddling in Python/Octave.
    No updates from Henrik for a new camilladsp release. There are about 16 items open - so quite some items on his plate.
    I will wait for some more time, else will start the project using sox as intermediate pipeline for up-sampling.
     
  9. soumya

    soumya Acquaintance

    Joined:
    Jun 24, 2018
    Likes Received:
    42
    Trophy Points:
    18
    Location:
    Mordor, Middle Earth
    Perfect, on the right track !
    Yes it will look daunting at first. But it's also more rewarding in long run.

    Some quick responses -
    1. Steep filters won't sound bad if designed correctly, being cognizant of the taps (computation resources available), the optimal transition width, and the sampling rate. However, with long, steep filters you do start hearing flaws of downstream components more vividly.
    2. The sweet spot IMO will vary greatly depending on whether it's an oversampling or NOS DAC. For Delta Sigma, there is no other option but to defeat the initial 8x digital interpolation filters. You still have zero-order hold or IIR filters after that to take it to the MHz region before the modulator comes into play.
    Or convert to DSD, but then the modulator of the DSD encoder followed by the modulator of the DAC chip will still determine how things finally sound.
    For NOS R2R DACs, 4x is a very good sweet spot, beyond which improvements become more subtle, if perceptible at all. Speaking from my experience with Holo Spring via IIS. At every sampling rate make sure you are using the same transition width. Assuming other parameters are optimal, it's the transition width (steepness) that determines how much transient information gets recovered.
    3. There is more to the Kaiser beta parameter than I talked about in this thread.
    Here is the thing - using a Rectangular Window function not only takes an insane amount of resources for similar attenuation of side-lobes, which others have figured out; the tones will also sound way too soft and fuzzy, and won't convey as much subtle detail and room/ambient information.
    We have to, after all, give the dominant energy in a window its own space to distinguish it from other tones + noise. But we should not attenuate all other energy so much that it makes it sound too thin.

    In DSP, it's all about balance. We just can't take one extreme.
    The other issue is our hearing takes a while to acclimatize and only after an extended run we understand if the changes are good or hurting. Digital filters are not supposed to wow us but present a more subtle improvement which we need to evaluate with as much variety of content before reaching a conclusion.

    Coming back to Kaiser Window - there seems to be a sweet spot for the attenuation of side-lobes (beta parameter).
    Below this value, the tones sound thick, peaks sound blunted, soft, fuzzy, and the background is more grey than black.
    Higher than this value, it begins sounding thin and importantly grainy.
     
  10. audiofool

    audiofool New

    Joined:
    Jan 19, 2022
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    moon


    I tried your coefficients with sox for 44.1 to 176 - very impressive! I need to upsample to 705/768 at 32 bits from each of 44.1, 48, 96, 176, 192. Is it possible you could post the coefficient files? I noticed the volume is lower than using sox rate, is there a gain reduction built in? I have my own workflow adjusting the gain in advance after converting to 64 bit float so I don't need any gain reduction.
    Thanks
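
    One common reason for a level drop after integer upsampling, offered here only as a possibility since the coefficient files themselves are not shown: inserting zeros between samples divides the average level by the upsampling factor, so the interpolation filter needs a passband gain equal to that factor. A toy sketch in plain Python (4x, with a crude 4-tap averaging filter standing in for a real design):

```python
# Toy illustration: zero-stuffing by `factor` followed by a unity-gain lowpass
# leaves the signal at 1/factor of its original level. The 4-tap moving
# average is only a stand-in for a real interpolation filter.
factor = 4
x_in = [1.0] * 16                        # constant (DC) input at full level

# Insert factor-1 zeros after every input sample.
stuffed = []
for sample in x_in:
    stuffed.append(sample)
    stuffed.extend([0.0] * (factor - 1))


def fir(x, taps):
    """Direct-form FIR filter; output has the same length as x."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * x[n - k]
        out.append(acc)
    return out


unity_gain_taps = [1.0 / factor] * factor   # DC gain of 1
scaled_taps = [1.0] * factor                # DC gain of `factor`

low = fir(stuffed, unity_gain_taps)
full = fir(stuffed, scaled_taps)

print("steady-state level with unity-gain filter:", low[-1])
print("steady-state level with gain-of-4 filter: ", full[-1])
```

    If the posted coefficients were normalised to unity gain, a drop of about 12 dB at 4x would be expected from this effect alone; any extra headroom built into the files would come on top of that.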
     
  11. audiofool

    audiofool New

    Joined:
    Jan 19, 2022
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    moon

    Thanks for the info, I am starting down the rabbit hole with Octave.
    Trying to create an apodizing linear phase filter upsampling 16x.
    Found some code to start with, not sure if I have this right. Corner frequency set at 21 kHz, which I think makes it apodizing?

    Thanks for any input.

    fn=352800 % Nyquist freq. (Hz)
    fc=21000 % Corner freq. (Hz)
    tbw=331800 % Transition band width (Hz)
    attn=300 % Stopband attenuation (dB)

    % Make filter:
    d=10^(-attn/20)
    [n, w, beta, ftype] = kaiserord ([fc-tbw/2, fc+tbw/2], [1, 0], [d d], fn*2);
    b = fir1 (n, w, kaiser (n+1, beta), ftype, "noscale");

    % Plot magnitude response:
    [h f] = freqz(b,1,2^18); plot(f/pi*fn, 20*log10(abs(h))); grid; pause
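
    A quick sanity check on the numbers above, in plain Python with the same values as the Octave snippet: with tbw=331800 the lower band edge fc-tbw/2 comes out negative, which is probably not what was intended. Note that tbw equals fn-fc here, which suggests the transition was meant to run from fc up to Nyquist rather than being centred on fc. This only checks the arithmetic; how kaiserord reacts to such edges is not tested.

```python
# Band edges implied by the Octave snippet above: [fc - tbw/2, fc + tbw/2].
fn = 352_800      # Nyquist frequency (Hz)
fc = 21_000       # corner frequency (Hz)
tbw = 331_800     # transition bandwidth (Hz)

lower_edge = fc - tbw / 2
upper_edge = fc + tbw / 2

print(f"band edges: {lower_edge:.0f} Hz to {upper_edge:.0f} Hz")
if lower_edge < 0 or upper_edge > fn:
    print("edges fall outside 0..Nyquist; a narrower tbw is needed")

# Widest transition band, centred on fc, that keeps both edges in 0..Nyquist.
max_tbw = 2 * min(fc, fn - fc)
print(f"largest usable tbw for fc={fc} Hz: {max_tbw} Hz")
```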
     
  12. audiofool

    audiofool New

    Joined:
    Jan 19, 2022
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    moon


    I have the following code working:
    fs=705600 % Sample Rate (Hz)
    fc=21000 % Corner freq. (Hz)
    tbw=80 % Transition band width (Hz)
    attn=300 % Stopband attenuation (dB)

    % Make filter:
    bands=[fc-tbw/2, fc+tbw/2]
    mag=[1,0]
    d=10^(-attn/20)
    dev=[d,d]
    [n, w, beta, ftype] = kaiserord(bands, mag, dev, fs);
    b = fir1 (n, w, ftype, kaiser (n+1, beta), "noscale");

    I had to delete the last coefficient, seems to mess it up for some reason?
    Creates about 180k coefficients. Does it look OK?
    You mentioned something about a 2-stage filter, any example code you can share?
    Thank you for the reference to camilladsp - looks promising, will test soon.
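
    The coefficient count is consistent with Kaiser's empirical length estimate, N ≈ (A - 7.95) / (2.285 · Δω) + 1, where A is the attenuation in dB and Δω the transition width in radians per sample. A plain-Python check with the parameters above (this is the textbook approximation; Octave's kaiserord may round slightly differently):

```python
import math

fs = 705_600       # sample rate (Hz)
tbw = 80           # transition bandwidth (Hz)
attn_db = 300      # stopband attenuation (dB)

# Transition width in radians per sample.
delta_omega = 2 * math.pi * tbw / fs

# Kaiser's empirical estimate of the number of taps.
num_taps = math.ceil((attn_db - 7.95) / (2.285 * delta_omega) + 1)

print(f"estimated taps: {num_taps}")
```

    The estimate lands a little under 180,000, in line with the "about 180k" reported.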
     
  13. soumya

    soumya Acquaintance

    Joined:
    Jun 24, 2018
    Likes Received:
    42
    Trophy Points:
    18
    Location:
    Mordor, Middle Earth

    Hey there,
    So when you say apodizing - do you mean smoothing out the transition from 1 to 0?
    In general, applying a window function to a convolution kernel is itself often referred to as apodization. The underlying notion is the same.
    So, if you are using a Kaiser Window, first determine, for a given number of taps and transition bandwidth, how much attenuation is possible. IIRC, Octave doesn't have such a util. SciPy does.
    https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.kaiser_atten.html

    Say you want to design a steep low pass filter for sampling rate of 176400 Hz (up-sampled 4x from 44100Hz), with transition bandwidth of 32 Hz for 64K taps (65536)

    In SciPy you can do
    nyquist_limit of target_fs = 176400 / 2 = 88200
    transition_bandwidth / nyquist_limit = 32 / 88200 = 0.0003628117913832199546485

    kaiser_atten(65536, 0.0003628117913832199546485)
    gives 178.63319903986257 dB attenuation, which is well below even the 24-bit noise floor.

    Next use these values to feed in to kaiserord to compute the optimal beta value
    numtaps, beta = kaiserord( 178.63319903986257, transition_bandwidth / (0.5 * target_fs) )

    This returns a beta of 18.72663853419286 :)

    Now we have all it takes to create an FIR filter using a Kaiser Window as the apodizing function applied to the convolution kernel.
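
    For reference, the two SciPy calls above boil down to Kaiser's empirical formulas, so the numbers can be reproduced without SciPy. The sketch below restates those formulas in plain Python, with function names chosen to mirror SciPy's; treat it as an illustration, not a replacement for the library.

```python
import math


def kaiser_atten(numtaps, width):
    """Attenuation (dB) for a given tap count and transition width.

    width is the transition width as a fraction of the Nyquist frequency.
    """
    return 2.285 * (numtaps - 1) * math.pi * width + 7.95


def kaiser_beta(atten_db):
    """Kaiser window beta for a given stopband attenuation (dB)."""
    if atten_db > 50:
        return 0.1102 * (atten_db - 8.7)
    if atten_db > 21:
        return 0.5842 * (atten_db - 21) ** 0.4 + 0.07886 * (atten_db - 21)
    return 0.0


target_fs = 176_400
transition_bandwidth = 32
width = transition_bandwidth / (0.5 * target_fs)   # fraction of Nyquist

atten = kaiser_atten(65_536, width)
beta = kaiser_beta(atten)
print(f"width={width:.6e}  atten={atten:.2f} dB  beta={beta:.4f}")
```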

    Happy learning!
     
  14. soumya

    soumya Acquaintance

    Joined:
    Jun 24, 2018
    Likes Received:
    42
    Trophy Points:
    18
    Location:
    Mordor, Middle Earth
    I might want to move this in to a separate thread later... just apprising others of what I have been up to.

    I have been optimizing the coefficients of different lengths and different up-sampling ratios ranging from 4Fs to 16Fs.
    Last weekend I had a major breakthrough for 16Fs and sub 1 Million length coefficients.

    While at it, I couldn't help appreciating the similarities in the philosophies of the Schiit Closed Form filter and Rob Watts's WTA filter. If you think of it, they are both trying to achieve the same thing - a steep filter that has very good time domain performance too.
    Striking this balance is not easy.

    From my subjective listening experiences, this is what I observed:
    I. Kaiser Window is the best (conventional) window when it comes to excellent stop band rejection. With the right number of taps and beta values, it comes close to a theoretical brickwall LPF.
    This is a 256K-tap Kaiser Window offering full 32-bit dynamic range for 4x up-sampling. Looks great in the frequency domain!
    [IMG]

    Now look at its performance in the time domain:
    [IMG]

    And therein lies the problem - only a tiny centre region actually has sinc coefficients. This has several side effects:
    1. The loudest sound will overpower other sounds.
    2. Time domain inaccuracy will translate to poor micro and macro dynamics, especially since a large number of the coefficients are close to 0.
    3. Because of poor temporal performance, the sound-stage also takes a hit in comparison to, say, listening on a NOS R2R DAC.
    4. Natural music can sound thin - again, refer to point 1.

    II. Rectangular Window has the best time domain performance. Of course :p
    But the abrupt transition from 1 to 0 causes it to sound soft, and importantly the transition bandwidth is huge. No brick-wall-like steepness. Leakage (due to the Gibbs phenomenon) will be high.
    [IMG]
    Time domain
    [IMG]
    Notice how the rectangular window never touches 0 within the window. This lack of smoothing to 0, or in other words the abrupt transition, does not make it a good candidate by itself for steep filters or interpolation in general, despite having the best temporal performance.

    So the question that I posed some time back is: can we arrive at some form of trade-off - sacrificing a little at the beginning and end of the window in the time domain and some steepness in the frequency domain to get an overall great sounding filter?

    It turns out we can :)
    So first, amongst the conventional windows, take a look at the Tukey Window.
    https://en.wikipedia.org/wiki/Window_function#Tukey_window
    It's a convolution of a rectangular window with a tapered cosine function. While this in itself sounded better, it still lacks Kaiser's excellent frequency domain performance and steepness.

    The ideal solution would be to have as many sinc coefficients as possible in between, but smoothed by Kaiser coefficients at the beginning and end of the window to the desired attenuation levels.

    And this is what I got so far :)
    I am sacrificing a tiny bit of steepness and allowing just a hint more leakage, so little that the difference from Kaiser is barely perceptible.
    [IMG]
    But now look at its time domain performance:
    [IMG]


    More to continue ....
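
    The trade-off described in this post (sinc coefficients left largely untouched in the middle, Kaiser-shaped smoothing at the two ends) can be sketched roughly as follows. This is only one reading of the idea and not the actual filter shown in the plots: the name `flat_top_kaiser`, the taper length, the beta value and the tap count are all illustrative assumptions. Structurally it is a Tukey-style window with the cosine ramps swapped for halves of a Kaiser window.

```python
import math


def bessel_i0(x):
    """Modified Bessel function I0(x) via its power series."""
    total = term = 1.0
    k = 1
    while term > 1e-17 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total


def kaiser(m, beta):
    """Symmetric Kaiser window of length m."""
    denom = bessel_i0(beta)
    window = []
    for n in range(m):
        r = 2.0 * n / (m - 1) - 1.0
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / denom)
    return window


def flat_top_kaiser(m, taper, beta):
    """Ones in the middle, Kaiser-shaped ramps over `taper` samples per end."""
    edge = kaiser(2 * taper + 1, beta)
    return edge[:taper] + [1.0] * (m - 2 * taper) + edge[taper + 1:]


def windowed_sinc(m, cutoff, window):
    """Lowpass taps; cutoff is in cycles per sample (0 to 0.5)."""
    centre = (m - 1) / 2.0
    taps = []
    for n in range(m):
        x = 2.0 * cutoff * (n - centre)
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        taps.append(2.0 * cutoff * sinc * window[n])
    return taps


# Illustrative values only: 4x rate, 22.05 kHz cutoff, short filter.
num_taps, taper, beta = 1025, 128, 10.0
window = flat_top_kaiser(num_taps, taper, beta)
taps = windowed_sinc(num_taps, 22_050 / 176_400, window)

flat_fraction = (num_taps - 2 * taper) / num_taps
print(f"flat region: {flat_fraction:.0%} of the window, DC gain {sum(taps):.4f}")
```

    Shrinking `taper` moves the result towards the rectangular case and growing it towards a plain Kaiser window, which is the dial the post is describing.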
     
    Last edited: Mar 2, 2022
  15. audiofool

    audiofool New

    Joined:
    Jan 19, 2022
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    moon
    SciPy looks good, how do I get the FIR coefficients out of it? Can I avoid Octave and do everything in SciPy?

    I like either the min phase short or linear long filters from hqplayer in apodizing form. I don't have a full understanding of apodizing since the term is used to mean different things by different companies. What I am trying to do is what hqplayer does when defining apodizing, i.e. replace the original ringing with the new filter's ringing - can be any phase of filter. I originally thought simply reducing the bandwidth would accomplish this, but I think it is more complicated.
     
  16. audiofool

    audiofool New

    Joined:
    Jan 19, 2022
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    moon
    Started experimenting with SciPy. Goal is to create linear phase brick wall Chord type filter.
    This seems to work but creates messy stuff on each end of the window with high tap numbers, maybe it's a precision issue? Probably better ways to do it in SciPy?
    taps = signal.firwin(1025233, 20000, width=14, window='kaiser', pass_zero='lowpass', scale=True, fs=705600)
    scipy.io.wavfile.write('coef.wav', 705600, taps.astype(np.float32))
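
    On the precision question, a rough rule of thumb (not a diagnosis of the artifacts described above): coefficients stored with a p-bit fraction carry relative rounding errors of about 2^-p, which puts a floor of roughly 20·log10(2^-p) dB under the largest coefficients. The short check below compares that floor for float32 and float64.

```python
import math
import sys

# Machine epsilon: spacing between 1.0 and the next representable float.
eps_float32 = 2.0 ** -23                 # IEEE 754 single precision
eps_float64 = sys.float_info.epsilon     # IEEE 754 double precision (2**-52)


def to_db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(ratio)


floor32 = to_db(eps_float32)
floor64 = to_db(eps_float64)
print(f"float32 rounding floor: about {floor32:.0f} dB")
print(f"float64 rounding floor: about {floor64:.0f} dB")
```

    By this measure a 300 dB design target sits close to the float64 figure and far beyond the float32 one, so writing the taps out as float32 gives up most of that attenuation on paper.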

    I like your idea of multiple windows, I think WTA is using a combination of rectangular and kaiser. I haven't figured out how to combine them yet.

    My other issue is workflow related - I use ffmpeg to convert formats to 64-bit float, then sox to integer upsample and apply the FIR. I think sox (and maybe also ffmpeg) will run into limits with large tap numbers, so I will use camilladsp. It would be nice if I could upsample using ffmpeg to avoid sox, but I haven't found a way to do it.
     
  17. audiofool

    audiofool New

    Joined:
    Jan 19, 2022
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    moon
    Trying to duplicate what you suggested using Tukey and Kaiser, this seems to work but probably there are more precise ways to do this?

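    import numpy as np
    import scipy.io.wavfile
    from scipy import signal
    from scipy.signal import firwin, kaiserord

    sample_rate = 705600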

    # The Nyquist rate of the signal.
    nyq_rate = sample_rate / 2.0

    # The desired width of the transition from pass to stop,
    # relative to the Nyquist rate. We'll design the filter
    # with a 3 Hz transition width.
    width = 3.0/nyq_rate

    # The desired attenuation in the stop band, in dB.
    ripple_db = 300.0

    # Compute the order and Kaiser parameter for the FIR filter.
    N, beta = kaiserord(ripple_db, width)

    # The cutoff frequency of the filter.
    cutoff_hz = 20000.0

    # Use firwin with a Tukey window to create a lowpass FIR filter.
    taps1 = firwin(N, cutoff_hz/nyq_rate, window='tukey')
    # Use firwin with a Kaiser window to create a lowpass FIR filter.
    taps2 = firwin(N, cutoff_hz/nyq_rate, window=('kaiser', beta))
    # Cascade the two filters by convolving their taps
    taps3 = signal.fftconvolve(taps1, taps2, mode='same')

    # output
    scipy.io.wavfile.write('coef.wav', 705600, taps3.astype(np.float64))
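
    A note on what the convolution step does, with a small plain-Python check: convolving two sets of filter taps cascades the two filters, so the combined frequency response is the product of the individual responses. That is different from multiplying two window shapes together before applying them to a single sinc. The tap values below are arbitrary toy numbers.

```python
import cmath


def convolve(a, b):
    """Full linear convolution of two tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def response(taps, omega):
    """Frequency response of an FIR filter at omega (radians per sample)."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(taps))


toy_taps1 = [0.25, 0.5, 0.25]            # toy lowpass
toy_taps2 = [0.2, 0.2, 0.2, 0.2, 0.2]    # toy moving average
cascade = convolve(toy_taps1, toy_taps2)

omega = 1.0
product = response(toy_taps1, omega) * response(toy_taps2, omega)
combined = response(cascade, omega)
print(len(cascade), abs(combined), abs(product))
```

    The full convolution of two N-tap filters has 2N-1 taps; with mode='same', fftconvolve keeps only the central N of them.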
     
  18. ohshitgorillas

    ohshitgorillas Friend

    Pyrate
    Joined:
    Nov 27, 2015
    Likes Received:
    685
    Trophy Points:
    93
    Location:
    Sacramento, CA
    I am trying to mess around with a version of SoX that is modded to convert to DSD: https://github.com/mansr/sox with the goal of (offline) upsampling files to DSD using a sinc filter with a stupidly high number of taps.

    I've followed these instructions:

    I was able to edit fft4g.c, but fft4g.h doesn't contain the FFT4G_MAX_SIZE parameter:

    Code:
    /* This library is free software; you can redistribute it and/or modify it
     * under the terms of the GNU Lesser General Public License as published by
     * the Free Software Foundation; either version 2.1 of the License, or (at
     * your option) any later version.
     *
     * This library is distributed in the hope that it will be useful, but
     * WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser
     * General Public License for more details.
     *
     * You should have received a copy of the GNU Lesser General Public License
     * along with this library; if not, write to the Free Software Foundation,
     * Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
     */
    
    void lsx_cdft(int, int, double *, int *, double *);
    void lsx_rdft(int, int, double *, int *, double *);
    void lsx_ddct(int, int, double *, int *, double *);
    void lsx_ddst(int, int, double *, int *, double *);
    void lsx_dfct(int, double *, double *, int *, double *);
    void lsx_dfst(int, double *, double *, int *, double *);
    
    void lsx_cdft_f(int, int, float *, int *, float *);
    void lsx_rdft_f(int, int, float *, int *, float *);
    void lsx_ddct_f(int, int, float *, int *, float *);
    void lsx_ddst_f(int, int, float *, int *, float *);
    void lsx_dfct_f(int, float *, float *, int *, float *);
    void lsx_dfst_f(int, float *, float *, int *, float *);
    
    #define dft_br_len(l) (2 + (1 << (int)(log(l / 2 + .5) / log(2.)) / 2))
    #define dft_sc_len(l) (l / 2)
    
    /* Over-allocate h by 2 to use these macros */
    #define LSX_PACK(h, n)   h[1] = h[n]
    #define LSX_UNPACK(h, n) h[n] = h[1], h[n + 1] = h[1] = 0;
    Unfortunately, the modifications to fft4g.c aren't enough as running the command 'sox input.flac -r 2822400 -b 1 output.dsf sinc -22050 -n 1000000 rate -u 2822400' tells me that the number of taps must be between 11 and 32767.

    Any ideas how I can hack the DSD-modded SoX to experiment with DSD upsampling? I unfortunately only speak Matlab, so this is way out of my wheelhouse.
     
  19. fastfwd

    fastfwd Friend

    Pyrate
    Joined:
    Aug 29, 2019
    Likes Received:
    1,010
    Trophy Points:
    93
    Location:
    Silicon Valley
    Search all the source code for the text of that error message, excluding the 11 and 32767 (i.e., search for "The number of taps must be between" or whatever the exact error message is). If you're lucky, the statement you find will be something like:
    Code:
    printf("The number of taps must be between %u and %u.\n", MINTAPS, MAXTAPS);
    And then you can search the code for MAXTAPS (or whatever the actual name is). If you're lucky enough to find "#define MAXTAPS 32767" or "MAXTAPS = 32767" -- and especially if you find only one line that looks like that -- change the 32767 to 50000 and see whether you can successfully use 50000 taps.

    If 50000 works, try 66000. Then if that works, go ahead and try 1000000 or 16777216 or whatever.
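
    If a scripted search is handier than an editor's find-in-files, the same hunt can be sketched in Python. The directory name `src`, the search string and the helper name `find_in_tree` are placeholders, not anything taken from the SoX sources.

```python
import os


def find_in_tree(root, needle, extensions=(".c", ".h")):
    """Return (path, line number, text) for every line containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Placeholder arguments: point these at the real source tree and message.
    for path, lineno, text in find_in_tree("src", "taps"):
        print(f"{path}:{lineno}: {text}")
```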
     
  20. ohshitgorillas

    ohshitgorillas Friend

    Pyrate
    Joined:
    Nov 27, 2015
    Likes Received:
    685
    Trophy Points:
    93
    Location:
    Sacramento, CA
    Thanks, it wasn't that easy but 'grep -r "taps"' allowed me to find it in src/sinc.c:

    Code:
    GETOPT_NUMERIC(optstate, 'n', num_taps[1], 11, 1000000)
    Now I just need to learn how to actually use the SoX cli... somehow the first round of files that I made came out with a 352.8 kHz sampling rate at 1 bit... and they did not sound great.
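
    A possible explanation for the 352.8 kHz figure, offered as a guess rather than a diagnosis: DSD64 runs at 64 × 44.1 kHz = 2.8224 MHz with 1-bit samples, and when those bits are counted in bytes (8 samples per byte) the rate comes out as 352.8 kHz. Whether the tool that reported the rate counts it that way is an assumption.

```python
BASE_RATE_HZ = 44_100

dsd64_rate_hz = 64 * BASE_RATE_HZ        # 1-bit samples per second
bytes_per_second = dsd64_rate_hz // 8    # 8 one-bit samples per byte

print(dsd64_rate_hz)      # 2822400
print(bytes_per_second)   # 352800
```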

    Edit: After some further digging around, I've discovered that this sox implementation has its own filters, although any documentation I can find (in the form of forum posts from the author) is out of date.

    According to the man page,

    Code:
    sdm [-f filter] [-t order] [-n num] [-l latency]
                  Apply a 1-bit sigma-delta modulator producing DSD output.  The input should be previously upsampled, e.g. with the rate effect, to a high rate, 2.8224MHz for DSD64.  The -f option selects the noise-shaping filter from the following list where the number indicates
                  the order of the filter:
                     clans-4      sdm-4
                     clans-5      sdm-5
                     clans-6      sdm-6
                     clans-7      sdm-7
                     clans-8      sdm-8
    
                  The noise filter may be combined with a partial trellis/viterbi search by supplying the following options:
    
                  -t     Trellis order, max 32.
    
                  -n     Number of paths to consider, max 32.
    
                  -l     Output latency, max 2048.
    
                  The result of using these parameters is hard to predict and can include high noise levels or instability.  Caution is advised.
    however,
    Code:
    sox input.flac -b 1 -r 2822400 output.dsf sdm -f clans-8
    yields the error
    Code:
    /home/adam/sox/src/.libs/sox FAIL sdm: invalid filter name 'clans-8'
    . Fack.
     
    Last edited: Oct 19, 2023
