
My Program Sucks!
Tsoding Daily
Overview
This video explores why a self-made music visualizer "sucks" compared to professional examples, diving deep into the technical side of Fast Fourier Transforms (FFT) and signal processing. The creator identifies several shortcomings in their implementation, including how frequency data is processed and displayed and the lack of proper "windowing". Through experimentation and external references, they systematically improve the visualizer. Along the way they show how logarithmic frequency scaling, using the magnitude of complex numbers, and applying windowing functions (like the Hann window) significantly improve the visualization's accuracy and appearance, bridging the gap between theoretical FFT knowledge and practical application.
Chapters
- The creator's music visualizer, while functional, produces a 'janky' and unappealing visualization compared to online examples.
- Key issues include flickering frequency bars and notes that are less distinct than in other visualizers.
- The goal is to investigate and fix these problems by understanding the underlying signal processing techniques.
- A comparison is made with a superior visualization from 'bottleofbeats.com', which clearly shows individual musical notes.
- Audio is captured as samples in a buffer, which is then processed by the sound thread.
- The Fast Fourier Transform (FFT) converts time-domain audio samples into the frequency domain, showing how strongly each frequency is present in the signal.
- Each FFT output value is a complex number, which encodes both the magnitude and the phase of that frequency component.
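- Frequencies are usually displayed on a logarithmic scale because the ear perceives pitch roughly logarithmically: each octave doubles the frequency, so equal musical intervals correspond to equal frequency ratios, not equal differences.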
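- The creator's initial approach, averaging the values within each frequency band and starting the display at 20 Hz, proved problematic.
- Taking the maximum value within each band instead of the average gives a better representation, since averaging dilutes a single strong peak with the quieter bins around it, especially in the wide high-frequency bands.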
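- Displaying the frequencies on a linear scale reveals mirroring. This is not a bug but a property of the FFT of real-valued signals: the second half of the output is the complex-conjugate mirror image of the first half, so it carries no new information and can be ignored.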
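- Adjusting the starting frequency ('low_f') and reducing the number of samples also affect visual clarity: fewer samples make the display react faster but widen each FFT bin (bin width = sample rate / number of samples), which coarsens the frequency resolution.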
- Using 'ceil' when computing each next band edge on the logarithmic scale guarantees that every band spans at least one FFT bin. This avoids empty bands (gaps) at the low end and gives a smoother progression.
- The initial method of calculating amplitude, taking the larger of the real and imaginary parts, is suboptimal because it discards one of the two components and so underestimates the amplitude whenever both are significant.
- A more accurate approach is to use the magnitude (length) of the complex number, √(re² + im²), calculated with `hypot` or `cabsf`.
- While this change didn't immediately fix flickering, it's a more correct representation of the frequency's amplitude.
- The flickering might stem from other issues, such as how the FFT window interacts with the signal.
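- The FFT implicitly treats its input as one period of an infinitely repeating signal, but real-world audio is processed in finite chunks (windows).
- Abruptly cutting off a signal creates 'tearing' at the window edges: when the end of a window does not line up with its beginning, the repeated signal has a discontinuity, which introduces spurious frequencies (artifacts).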
- Windowing functions, like the Hann window, are applied to the signal *before* FFT to smooth these edges.
- The Hann window tapers the signal towards zero at the beginning and end of the window, reducing spectral leakage and phantom frequencies.
- Applying the Hann window significantly cleans up the visualization, removing unwanted flickering and artifacts.
- Taking the logarithm of the *power* (square of amplitude) of each frequency, rather than using the raw amplitude, compresses the dynamic range and further improves the visualization by preventing powerful frequencies from overwhelming quieter ones.
- A potential issue was identified with the audio library (Rea) incorrectly handling mono input, always treating it as stereo, which distorted the FFT results.
- Correcting the mono/stereo interpretation resolved significant visualization errors, particularly at the end of the display.
- The characteristic 'ramp' seen in visualizations of square waves (common in chiptune music) is a direct consequence of their harmonic content: an ideal square wave contains only odd harmonics, whose amplitudes fall off as 1/n.
Key takeaways
- Practical audio visualization requires more than just a basic understanding of FFT; techniques like logarithmic scaling, correct amplitude calculation, and windowing are essential.
- Windowing functions (e.g., Hann window) are critical for mitigating spectral leakage and artifacts caused by processing finite audio segments.
- The human ear's roughly logarithmic perception of pitch makes a logarithmic frequency axis the natural choice for a representation that matches what we hear.
- Apparent FFT artifacts, such as mirroring (a property of real-valued input) and phantom frequencies (spectral leakage), can often be explained and resolved by understanding the underlying mathematics and applying appropriate signal processing techniques.
- Accurate interpretation of audio input (e.g., mono vs. stereo) by the processing library is fundamental for correct FFT analysis.
- The visual signature of specific waveforms (like square waves) can be directly observed in their frequency domain representation.
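The mono/stereo mix-up can be illustrated with a generic deinterleaving helper. This sketch does not reproduce the audio library's actual API. `take_first_channel` is a hypothetical name, and the sketch assumes the audio callback delivers `frames` frames of interleaved `float` samples with `channels` samples per frame. Reading a mono buffer as if it were stereo skips every other sample, so at the original sample rate each tone appears at twice its real frequency. This is one way such a mistake can corrupt the spectrum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the first channel of `frames` interleaved frames
 * into out[]. channels must match the real stream: 1 for mono, 2 for stereo.
 * interleaved[] must hold at least frames * channels samples. */
static void take_first_channel(const float interleaved[], size_t frames,
                               size_t channels, float out[])
{
    assert(channels >= 1);
    for (size_t i = 0; i < frames; ++i) {
        out[i] = interleaved[i * channels];
    }
}
```

Passing the wrong channel count does not crash here, which is part of why the bug is easy to miss. The samples are simply the wrong ones.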
Test your understanding
- Why is averaging frequency chunks an inferior method for visualization compared to using the maximum value?
- How does windowing (e.g., the Hann window) improve FFT-based visualizations, and what problem does it solve?
- Explain why displaying frequencies on a logarithmic scale is generally preferred for audio visualization.
- What is spectral leakage, and how can it manifest visually in an audio spectrum analyzer?
- Why is it important for an audio processing library to correctly distinguish between mono and stereo input when performing FFT?