How Pitch Shifting Works

A long-form, technical but readable explanation of resampling, phase vocoders, and why retuning changes duration.

If you have ever sped up a vinyl record, you already understand the core of pitch shifting. The sound becomes higher and shorter at the same time. Slow it down and it becomes lower and longer. That is the simplest and most transparent form of pitch shifting: resampling, or changing playback rate.

But modern audio tools often promise a different result: change pitch without changing duration, or change duration without changing pitch. That is where time stretching and algorithms like the phase vocoder come in.

This article walks through the main methods and why your retuned files sound the way they do.

Method 1: Resampling (playback rate)

Resampling changes the speed of audio playback. When the playback rate increases, each waveform cycle happens faster, so pitch rises. Because the cycles happen faster, the audio finishes sooner, so duration decreases.

This is the most direct and predictable method. It does not synthesize new content; it simply plays the existing signal at a different rate. In audio engineering terms, it is a sample-rate conversion with no compensating time-stretch step.
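
To make "plays the existing signal at a different rate" concrete, here is a minimal sketch of a playback-rate change on a raw sample array. It assumes plain linear interpolation, and the function name resampleLinear is hypothetical; production audio engines use their own interpolators, so treat this as an illustration of the idea rather than what any particular tool runs.

```javascript
// Hypothetical helper: read `input` with a fractional step of `rate` samples
// per output sample, linearly interpolating between neighboring samples.
function resampleLinear(input, rate) {
  if (!(rate > 0)) throw new RangeError("rate must be positive");
  const outLength = Math.ceil(input.length / rate);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * rate; // fractional read position in the input
    const i0 = Math.min(Math.floor(pos), input.length - 1);
    const i1 = Math.min(i0 + 1, input.length - 1); // hold the last sample at the end
    const frac = pos - i0;
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}
```

A rate above 1 reads through the input faster, so the output is shorter and higher in pitch; a rate below 1 does the opposite.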

The result:

Pitch changes in proportion to the playback rate
Duration changes by the inverse of the same ratio
No time stretching artifacts
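
The two proportionalities above can be checked with a few lines of arithmetic. This is a sketch only; applyPlaybackRate is a hypothetical helper name, and the 210-second track length is an arbitrary example value.

```javascript
// Hypothetical helper: effect of a playback-rate change on one pitch
// and on the overall length of a recording.
function applyPlaybackRate(frequencyHz, durationSec, rate) {
  return {
    frequencyHz: frequencyHz * rate, // pitch scales with the rate
    durationSec: durationSec / rate, // duration scales with 1 / rate
  };
}

// Example: a 3:30 track tuned to A = 440 Hz, played at 432 / 440.
const rate = 432 / 440;
const result = applyPlaybackRate(440, 210, rate);
console.log(result.frequencyHz.toFixed(2)); // "432.00"
console.log(result.durationSec.toFixed(2)); // "213.89"
```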

This is the method used by our tool. It is simple, transparent, and reliable for retuning to a new reference frequency.

Method 2: Time stretching and the phase vocoder

Time stretching aims to change duration without changing pitch. A common approach uses the Short-Time Fourier Transform (STFT). The signal is split into overlapping windows, transformed into the frequency domain, and then re-synthesized with modified time spacing.

A phase vocoder tracks the phase relationships between successive frames to reduce discontinuities during reconstruction. It is powerful but can introduce artifacts such as smearing, transient blurring, or a watery texture. These artifacts are why extreme time stretching often sounds unnatural.
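
The phase tracking described above comes down to a small piece of bookkeeping per frequency bin. The sketch below shows only that step, under the assumption that the STFT phases for each frame are already available; it omits windowing, the FFT itself, and overlap-add. All function and parameter names are hypothetical.

```javascript
// Wrap a phase value into the range [-π, π].
function wrapPhase(phase) {
  return phase - 2 * Math.PI * Math.round(phase / (2 * Math.PI));
}

// Estimate the true frequency (Hz) present in one STFT bin from the phase
// that bin had in two successive analysis frames `hopSize` samples apart.
function estimateBinFrequency(prevPhase, currPhase, bin, fftSize, hopSize, sampleRate) {
  // Phase advance a sinusoid exactly at the bin center would show per hop.
  const expectedAdvance = (2 * Math.PI * bin * hopSize) / fftSize;
  // How far the measured advance deviates from that, wrapped to [-π, π].
  const deviation = wrapPhase(currPhase - prevPhase - expectedAdvance);
  const trueAdvance = expectedAdvance + deviation; // radians per analysis hop
  return (trueAdvance * sampleRate) / (2 * Math.PI * hopSize);
}

// Advance the output phase over a synthesis hop that may differ from the
// analysis hop; a different hop is what changes the duration.
function nextSynthesisPhase(prevOutPhase, frequencyHz, synthesisHop, sampleRate) {
  return wrapPhase(prevOutPhase + (2 * Math.PI * frequencyHz * synthesisHop) / sampleRate);
}
```

Because each bin's phase is re-derived from an estimated frequency rather than copied from the source, small estimation errors accumulate across frames, which is one source of the smearing described above.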

Many commercial tools pair time stretching with resampling to shift pitch while keeping duration fixed, but the tradeoff is that more processing tends to mean more artifacts.

Method 3: Hybrid and modern algorithms

Modern pitch shifting tools often use hybrid methods that preserve transients or separate harmonic and percussive components. Some are excellent. Some are not. The key point is that every algorithm makes choices about what to preserve and what to alter.

For simple retuning, a direct playback rate change is often the clearest choice. It keeps the signal consistent, avoids algorithmic artifacts, and produces an output that is faithful to the source recording.

Why duration changes are expected

When you retune by changing playback rate, duration changes. This is not a bug. It is the expected physical relationship between time and pitch in a sampled signal. If you want to preserve duration, you need additional processing, and that processing will inevitably alter the signal in other ways.

So if you notice that your retuned file is a little shorter or longer, that is normal and correct for a transparent retune.

Practical takeaway

If your goal is a clean, consistent retune to a specific frequency standard, resampling is the most honest method. If your goal is to preserve duration at all costs, then time stretching tools may be useful, but be aware of the artifacts they can introduce.

How the Song Re-Tuner implements resampling (in plain code terms)

If you’re curious about exactly how the retune is performed, here’s the pipeline in plain English. The browser provides a built-in audio engine — the Web Audio API — and the tool uses it directly. No custom DSP code is required for the actual pitch shift; that’s the elegant part.

Decode the source file. The browser’s AudioContext.decodeAudioData() takes the raw bytes of your MP3 / WAV / M4A and resolves to an AudioBuffer of 32-bit float samples, resampled to the AudioContext’s sample rate if the file’s own rate differs.
Calculate the playback-rate ratio. For 432 Hz, the ratio is 432 / 440 ≈ 0.9818. For other Solfeggio targets, the tool uses ratios that map nearby semitones to the target, minimizing the disruption to the music.
Create an OfflineAudioContext. This is a Web Audio context that renders into memory instead of playing back. Its length is set to ceil(originalDuration / playbackRate * sampleRate) so the rendered output has the right number of samples for the longer/shorter retuned version.
Apply the rate change. An AudioBufferSourceNode holding the decoded buffer is connected to the offline context’s destination with playbackRate.value = ratio. The Web Audio engine handles the resampling internally by interpolating between samples; the exact interpolation method is up to the browser.
Render to PCM. offlineCtx.startRendering() returns a Promise that resolves to a new AudioBuffer containing the retuned audio. This buffer is then serialized to a WAV file (16-bit signed integer PCM) and, in parallel, encoded to MP3 using lamejs, a pure-JavaScript port of the LAME encoder.
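
The steps above can be sketched in a few lines of Web Audio code. This is an illustration under stated assumptions, not the tool's actual source: the names outputLength and renderRetuned are hypothetical, and the OfflineCtx parameter is added here only so the function can be exercised outside a browser.

```javascript
// Number of output samples for a given source duration and playback rate.
function outputLength(durationSec, playbackRate, sampleRate) {
  return Math.ceil((durationSec / playbackRate) * sampleRate);
}

// Render `sourceBuffer` (an AudioBuffer) at `ratio` and return a Promise
// that resolves to the retuned AudioBuffer.
function renderRetuned(sourceBuffer, ratio, OfflineCtx = globalThis.OfflineAudioContext) {
  const { numberOfChannels, duration, sampleRate } = sourceBuffer;
  const length = outputLength(duration, ratio, sampleRate);
  const offlineCtx = new OfflineCtx(numberOfChannels, length, sampleRate);

  const source = offlineCtx.createBufferSource(); // an AudioBufferSourceNode
  source.buffer = sourceBuffer;
  source.playbackRate.value = ratio;
  source.connect(offlineCtx.destination);
  source.start();

  return offlineCtx.startRendering();
}

// Browser usage (assumes `fileBytes` is an ArrayBuffer read from the file):
//   const decoded = await new AudioContext().decodeAudioData(fileBytes);
//   const retuned = await renderRetuned(decoded, 432 / 440);
```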

The total amount of custom code involved is small. Most of the heavy lifting is delegated to the browser’s audio engine, which has been heavily optimized by the Chrome, Firefox, and Safari teams. That delegation is also why the retune sounds clean: you’re using the same DSP code that powers web games, music apps, and audio editors.
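
One of the few pieces that does need custom code is the conversion from the rendered 32-bit float samples to the 16-bit signed integers stored in the WAV file. A minimal sketch follows; the helper name floatTo16BitPCM and the exact scaling convention are assumptions, and a real encoder may also add dithering or write the WAV header, both of which are left out here.

```javascript
// Convert float samples in [-1, 1] to 16-bit signed integers,
// clamping anything outside that range to avoid wrap-around.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}
```

Clamping before scaling matters because resampling can produce peaks slightly above full scale, and an unclamped value would wrap to the opposite sign when stored as an integer.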

Why we don’t use a phase vocoder for retuning

A frequent question: why not preserve duration? Two reasons:

Honesty. A pure playback-rate change is a transparent transformation that is reversible in principle, up to interpolation error. You can mathematically describe exactly what happened. Phase vocoder time-stretching is lossy in a subtle way: transients smear, room ambience can develop a “watery” character, and the result is no longer a pristine version of the source.
Solfeggio framing. The whole point of retuning to 432 Hz or another reference is to shift the pitch reference of the entire recording. That’s what resampling does perfectly. Adding a separate time-stretch step on top to “fix” duration introduces processing artifacts in service of a side-effect (duration) rather than the actual goal (tuning).

If you want to preserve duration, that’s a separate (and more invasive) operation. There are open-source phase vocoder libraries you can run in a browser, but they trade audible artifacts for the duration property. We made the editorial choice to keep the retune transparent.

A useful debug tool: spectrograms

If you want to see the effect of a retune, a spectrogram makes it visible immediately. Tools like Sonic Visualiser or even online spectrogram viewers will show every harmonic in a track. When you compare an original and a 432 Hz retune side by side:

Every horizontal harmonic line in the spectrogram moves down by the same proportional amount (~32 cents for 432 Hz).
The track is slightly longer (≈ 1.85% for 440 → 432).
The relative spacing and structure of the harmonics is preserved — i.e., it’s still the same song, just transposed.
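
The numbers in this list follow directly from the ratio between the two reference frequencies. The sketch below reproduces them; the helper names are hypothetical and the harmonic series used in the test values is just an example.

```javascript
// Pitch change in cents when moving from one reference frequency to another.
function centsBetween(fromHz, toHz) {
  return 1200 * Math.log2(toHz / fromHz);
}

// Change in duration (percent) caused by the matching playback-rate change.
function durationChangePercent(fromHz, toHz) {
  return (fromHz / toHz - 1) * 100;
}

// Every harmonic is scaled by the same ratio, so their spacing is preserved.
function shiftHarmonics(harmonicsHz, fromHz, toHz) {
  return harmonicsHz.map((h) => (h * toHz) / fromHz);
}

console.log(centsBetween(440, 432).toFixed(1));          // "-31.8"
console.log(durationChangePercent(440, 432).toFixed(2)); // "1.85"
```

When checking a spectrogram, the ratio between any two harmonics should therefore be identical before and after a genuine retune.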

If a “432 Hz” track you found online doesn’t show this pattern — for example, if the harmonics look identical to the original but with adjusted volume in certain bands — it wasn’t actually retuned, just EQ’d. The spectrogram is the cleanest way to verify.
