My Program Sucks!

Tsoding Daily

Overview

This video explores why a self-made music visualizer "sucks" compared to professional examples, diving into the technical details of the Fast Fourier Transform (FFT) and signal processing. The creator identifies several shortcomings in their implementation, including how frequency data is processed and displayed and the lack of a proper windowing step. Through experimentation and external references, they improve the visualizer step by step. Logarithmic frequency scaling, using the magnitude of the complex FFT output, and applying a window function (such as the Hann window) each make the visualization more accurate and better looking, bridging the gap between theoretical FFT knowledge and practical application.

Chapters

The creator's music visualizer, while functional, produces a 'janky' and unappealing visualization compared to online examples.
Key issues include flickering frequencies and less distinct note representations.
The goal is to investigate and fix these problems by understanding the underlying signal processing techniques.
A comparison is made with a superior visualization from 'bottleofbeats.com' which clearly shows musical notes.

Understanding the specific flaws in a current implementation is the first step toward effective improvement and learning advanced concepts.

The creator contrasts their visualizer's 'janky' output with the clear, responsive visualization seen on bottleofbeats.com for the same song.

Audio is captured as samples in a buffer, which is then processed by the sound thread.
The Fast Fourier Transform (FFT) converts time-domain audio samples into the frequency domain, representing the amplitude of different frequencies.
Each FFT output value is a complex number, which encodes both the magnitude and the phase of one frequency.
Frequencies are often displayed logarithmically because the human ear perceives pitch roughly logarithmically: each octave is a doubling of frequency.
The creator's initial approach of averaging frequency chunks and starting from 20Hz proved problematic.

A foundational understanding of how FFT works is crucial for diagnosing and fixing visualization issues related to frequency representation.

The creator explains that their visualizer uses an array where each element represents a frequency, unlike the original audio samples which represent amplitude over time.

Switching from averaging frequency chunks to taking the maximum value within a chunk provides a better representation.
Displaying frequencies linearly reveals mirroring, an artifact of running the FFT on real-valued signals: the second half of the output mirrors the first, so it can be ignored.
Adjusting the starting frequency ('low_f') and reducing the number of samples can impact visual clarity.
Using 'ceil' when calculating the next frequency in a logarithmic scale guarantees that each step advances by at least one FFT bin, which avoids gaps and gives a smoother progression.

These adjustments directly address how raw FFT data is interpreted and presented, leading to a more accurate and visually coherent output.

The creator demonstrates how changing the calculation from averaging to taking the maximum value within a frequency band affects the visualizer's responsiveness.

The initial method of calculating amplitude by taking the maximum of the real and imaginary parts is suboptimal.
A more accurate approach is to use the magnitude (length) of the complex number representing the frequency, calculated using `hypot` or `cabsf`.
While this change didn't immediately fix flickering, it's a more correct representation of the frequency's amplitude.
The flickering might stem from other issues, such as how the FFT window interacts with the signal.

Correctly calculating the amplitude of each frequency is essential for accurately representing the intensity of sound components.

The creator replaces the manual max of real/imaginary parts with a call to `cabsf` (complex absolute value) to get the true magnitude of the frequency component.

The FFT implicitly treats each finite chunk (window) of audio as one period of an infinitely repeating signal.
When the chunk does not contain a whole number of cycles, the repeated copies do not line up, and the resulting jump ('tearing') at the window edges introduces spurious frequencies (artifacts).
Windowing functions, like the Hann window, are applied to the signal *before* FFT to smooth these edges.
The Hann window tapers the signal towards zero at the beginning and end of the window, reducing spectral leakage and phantom frequencies.
Applying the Hann window significantly cleans up the visualization, removing unwanted flickering and artifacts.

Windowing is a fundamental signal processing technique that dramatically improves the accuracy of FFT analysis by mitigating artifacts caused by finite signal segments.

The creator demonstrates how applying a Hann window function to the input signal before FFT eliminates the 'phantom frequencies' that caused flickering in their visualizer.

Applying a logarithmic scale to the *power* (square of amplitude) of frequencies, rather than just amplitude, can further improve visualization by preventing powerful frequencies from overwhelming quieter ones.
A potential issue was identified with how the audio library (raylib) handles mono input: it always treats the audio as stereo, which distorted the FFT results.
Correcting the mono/stereo interpretation resolved significant visualization errors, particularly at the end of the display.
The characteristic 'ramp' seen in visualizations of square waves (common in chiptune music) is a direct consequence of their harmonic content: an ideal square wave contains only odd harmonics, whose amplitudes fall off in proportion to 1/n.

These final adjustments address subtle but important aspects of audio visualization, including dynamic range compression and correct audio input interpretation, leading to a much more polished result.

The creator identifies that the visualizer was misinterpreting mono audio as stereo, leading to incorrect FFT data, and fixes this by recompiling with the correct assumption about the audio API.
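
To illustrate why the channel count matters, here is a minimal sketch that assumes the audio arrives as interleaved float frames (L0 R0 L1 R1 ... for stereo), a layout many audio callbacks use. The function name `take_channel` and its parameters are hypothetical and do not reflect any specific library's API.

```c
#include <assert.h>
#include <stddef.h>

// Copies one channel out of interleaved frames into out.
// With channels == 2 the layout is L0 R0 L1 R1 ...; reading that buffer as
// if it were mono would feed alternating left/right samples to the FFT.
void take_channel(const float *frames, size_t frame_count, size_t channels,
                  size_t channel, float *out)
{
    assert(channels > 0 && channel < channels);
    for (size_t i = 0; i < frame_count; ++i) {
        out[i] = frames[i * channels + channel];
    }
}
```

If the code's assumed channel count does not match what the audio API actually delivers, the FFT input is either a mix of two channels or only part of the intended signal, and the spectrum is distorted accordingly.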

Key takeaways

1. Practical audio visualization requires more than just a basic understanding of FFT; techniques like logarithmic scaling, correct amplitude calculation, and windowing are essential.
2. Windowing functions (e.g., Hann window) are critical for mitigating spectral leakage and artifacts caused by processing finite audio segments.
3. The human ear's non-linear perception of frequency necessitates logarithmic scaling for accurate visual representation.
4. FFT artifacts like mirroring and phantom frequencies can often be resolved by understanding the underlying mathematics and applying appropriate signal processing techniques.
5. Accurate interpretation of audio input (e.g., mono vs. stereo) by the processing library is fundamental for correct FFT analysis.
6. The visual signature of specific waveforms (like square waves) can be directly observed in their frequency domain representation.

Key terms

Fast Fourier Transform (FFT), Time Domain, Frequency Domain, Amplitude, Frequency, Complex Numbers, Logarithmic Scale, Windowing, Hann Window, Spectral Leakage, Artifacts, Magnitude, Mono/Stereo

Test your understanding

1. Why is averaging frequency chunks an inferior method for visualization compared to using the maximum value?
2. How does windowing (e.g., the Hann window) improve FFT-based visualizations, and what problem does it solve?
3. Explain why displaying frequencies on a logarithmic scale is generally preferred for audio visualization.
4. What is spectral leakage, and how can it manifest visually in an audio spectrum analyzer?
5. Why is it important for an audio processing library to correctly distinguish between mono and stereo input when performing FFT?