Groovebox, chapter 2


One of my June 2022 ideas was to build a groovebox. I’ve made some progress on the software side, and I’d like to talk about some of the interesting parts of the work.

First: I’m developing in Rust. Choosing Rust was actually a no-brainer; it’s the right choice for new projects intended to compile directly to native machine instructions. So far, it’s been a great experience. I can see why people become Rust zealots. The language has many excellent features, such as enum variants, type inference, matching, tuples as first-class function parameters and return types, and traits. The tooling and support have a level of consistency and polish that makes onboarding easy; in particular, I like the well-integrated package-management system, the astonishingly helpful build-time error messages, the ample documentation, and the integration with LSP-aware editors like VSCode. (I know I’m giving some credit to Rust for things like LSP and type inference that aren’t unique to the Rust ecosystem, but the ecosystem does deserve a lot of credit for being opinionated in the right places, so that there tends to be one of each cool thing, rather than bombarding new developers with choices.)
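To give a flavor of how those features fit together, here's a tiny illustration (with made-up names, not code from my project):

```rust
// An enum with data-carrying variants, exhaustive matching, a tuple
// return type, and a trait, all in a dozen lines.
trait Describe {
    fn describe(&self) -> String;
}

enum Waveform {
    Sine,
    Square { duty: f32 }, // variants can carry data
}

impl Describe for Waveform {
    fn describe(&self) -> String {
        // The compiler rejects this match if a variant is ever left out.
        match self {
            Waveform::Sine => "sine".to_string(),
            Waveform::Square { duty } => format!("square ({duty:.0}% duty)"),
        }
    }
}

// Tuples as first-class return types: no wrapper struct needed.
fn amplitude_range() -> (f32, f32) {
    (-1.0, 1.0)
}

fn main() {
    let (lo, hi) = amplitude_range(); // type inference figures out f32
    for wave in [Waveform::Sine, Waveform::Square { duty: 50.0 }] {
        println!("{} in [{lo}, {hi}]", wave.describe());
    }
}
```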

This doesn’t mean that learning Rust has been easy. Rust takes entire phases of an engineering career and transforms them into compile-time errors. Those 18 months you spent becoming the company expert on your product’s threading model and the race conditions lurking in it? With Rust, that’s the job of the borrow checker and the Send/Sync traits, which prevent you from writing whole classes of memory-safety and synchronization bugs. All of which sounds wonderful. But the price you pay is confronting deep design questions just to get your project’s Hello World app to build for the first time. There is no “yes, I know this is wrong, but it’s OK for now” in Rust (unless you want to wrap everything in unsafe).
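For the skeptical, here’s a minimal sketch of what that buys you. Sharing a plain mutable integer across threads simply won’t compile; the compiler steers you toward something like this instead (illustrative, not my project’s code):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc gives shared ownership across threads; Mutex gives synchronized
    // mutation. Drop either one and the program is a compile error, not a
    // race condition discovered 18 months into the job.
    let count = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let count = Arc::clone(&count);
            thread::spawn(move || {
                *count.lock().unwrap() += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(*count.lock().unwrap(), 4);
}
```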

Once you get past this up-front mental cost of doing nearly anything in Rust, it becomes second nature, and you become a better coder because of it. But it’s a special kind of firehose to drink from. Of the five months I’ve spent on my music project, I estimate I spent two weeks learning Rust syntax and two months figuring out how to express my design in the language. That sounds expensive, but if I’d written this prototype in C++, I’m sure I’d have spent close to the same amount of time tracking down memory leaks, memory smashers, and threading bugs. Moreover, I can already tell that my productivity in Rust is rapidly increasing, whereas even experienced C++ developers continually deal with errors arising from the language’s unfettered power.

I’m happy with the decision to invest in Rust, and I’m looking forward to wider adoption of the language.

Second: Digital sounds are easy. Good digital sounds are hard. One problem is “aliasing.” If you’ve ever seen video of a helicopter flying with seemingly stationary rotor blades, then you understand what aliasing is. In the same way a series of video frames can give the false impression that moving blades are standing still, digital audio can misrepresent a sound. And unlike video, where aliasing can look cool or at least not annoying, audio aliasing almost always sounds awful, like a robotic dog whistle or general harshness. When recording real-world sounds, people deal with aliasing by filtering the input to keep at-risk frequencies (anything above the Nyquist limit, half the sample rate) out of the system. But if you’re generating sounds in software, you can unwittingly create mathematically perfect waveforms containing frequencies that alias like crazy. The solution is complicated; in simple terms, your square wave can’t actually be square.
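To make “can’t actually be square” concrete, here’s a minimal sketch of one classic band-limiting approach: build the square wave additively from its odd harmonics, and stop before the Nyquist frequency so nothing in the signal can alias. (Illustrative code, not from my project.)

```rust
const SAMPLE_RATE: f64 = 44_100.0;

/// One sample of a band-limited square wave at time `t` seconds.
fn bandlimited_square(freq: f64, t: f64) -> f64 {
    assert!(freq > 0.0);
    let nyquist = SAMPLE_RATE / 2.0;
    let mut sample = 0.0;
    let mut harmonic = 1.0; // a square wave contains only odd harmonics
    while freq * harmonic < nyquist {
        // Fourier series of a square wave: sum of sin(2π·k·f·t)/k, odd k.
        sample += (2.0 * std::f64::consts::PI * freq * harmonic * t).sin() / harmonic;
        harmonic += 2.0;
    }
    4.0 / std::f64::consts::PI * sample // scale so the ideal wave peaks at ±1
}

fn main() {
    // One second of a 440Hz square wave; a real synth would write these
    // samples into an audio buffer.
    let samples: Vec<f64> = (0..SAMPLE_RATE as usize)
        .map(|n| bandlimited_square(440.0, n as f64 / SAMPLE_RATE))
        .collect();
    println!("generated {} samples", samples.len());
}
```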

And in case you were still clinging to sin(x) as the way to generate a sine wave, what should x be? There are lots of problems.

  • If x is a float, how do you round it properly to calculate the Nth sample in a 44.1kHz series? You can compute x as N times a fixed per-sample increment, but you still have to decide whether x represents the left edge of the Nth sample, the right edge, or somewhere in the middle. Or you can add a constant increment for each sample, then eventually run out of floating-point precision or accumulate enough error to hear it. TL;DR: floating-point is hard.
  • Does x restart at zero for each note-down event? If not, then your waveform will start at a semi-random amplitude each time.
  • If your oscillator’s pitch is automated, it will sound choppy if you recompute sin(x) from the varying frequency directly. You actually want only the rate of change to change, so that the amplitude varies smoothly. That’s harder than just looking up a sine value in a table. (The phase-accumulator sketch below addresses all three of these.)
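One common fix for all three bullets is a phase accumulator: keep a running phase, advance it by a per-sample increment, reset it on note-down, and let pitch automation change only the increment. Here’s a minimal sketch (illustrative names, not my project’s code):

```rust
use std::f64::consts::TAU;

const SAMPLE_RATE: f64 = 44_100.0;

struct SineOscillator {
    phase: f64,     // current position in the cycle, in radians [0, TAU)
    phase_inc: f64, // radians to advance per sample
}

impl SineOscillator {
    fn new(freq: f64) -> Self {
        Self { phase: 0.0, phase_inc: TAU * freq / SAMPLE_RATE }
    }

    /// Restart the cycle on note-down so every note begins at amplitude zero.
    fn note_on(&mut self) {
        self.phase = 0.0;
    }

    /// Pitch automation changes only the rate of change; the phase itself
    /// keeps advancing smoothly, so there's no chop.
    fn set_freq(&mut self, freq: f64) {
        self.phase_inc = TAU * freq / SAMPLE_RATE;
    }

    fn next_sample(&mut self) -> f64 {
        let sample = self.phase.sin();
        self.phase += self.phase_inc;
        // Wrapping each cycle keeps the accumulator small, so floating-point
        // error can't grow without bound over a long note.
        if self.phase >= TAU {
            self.phase -= TAU;
        }
        sample
    }
}

fn main() {
    let mut osc = SineOscillator::new(440.0);
    osc.note_on();
    osc.set_freq(441.0); // smooth: only the increment changed
    let first: Vec<f64> = (0..4).map(|_| osc.next_sample()).collect();
    println!("first samples: {first:?}");
}
```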

It’s tempting to rathole on any of these problems to produce specific kinds of high-quality sounds. Eventually I’ll have to do that. But I’m just one person, and I don’t have infinite time to produce something other people can see. I have to prioritize, and for now, I’m choosing a feature-complete 1.0 over a high-quality 0.1.

Third: Debugging sound is a learned skill. Anyone who’s played with synthesizers knows how a sawtooth sounds compared to a square wave. But what’s the difference between a synced and an unsynced second oscillator? Turns out it’s why a didgeridoo sounds different from an alien bagpipe: the synced one shapes the primary oscillator’s timbre, and the unsynced one contributes a second tone. And what’s the difference between a low-pass filter automated on a linear scale and one on a logarithmic scale? It’s the difference between a silly bloop and a French horn. In retrospect, I could have used better unit tests and more time in Audacity to get these basic building blocks correct. But I didn’t realize at the time how important they were.
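As an example of that last pair, here’s a minimal sketch of the two cutoff mappings (the range and names are illustrative):

```rust
const MIN_HZ: f64 = 20.0;
const MAX_HZ: f64 = 20_000.0;

/// Linear mapping: half the knob's travel is spent above 10kHz, where the
/// ear barely hears any change. This is the "silly bloop."
fn cutoff_linear(t: f64) -> f64 {
    MIN_HZ + t * (MAX_HZ - MIN_HZ)
}

/// Logarithmic mapping: each equal knob movement multiplies the cutoff by
/// the same factor, matching how we perceive pitch. This is the sweep that
/// sounds like an instrument.
fn cutoff_log(t: f64) -> f64 {
    MIN_HZ * (MAX_HZ / MIN_HZ).powf(t)
}

fn main() {
    // Halfway up the knob: linear lands around 10,010 Hz, log around 632 Hz.
    println!("linear: {:.0} Hz, log: {:.0} Hz", cutoff_linear(0.5), cutoff_log(0.5));
}
```

At the knob’s midpoint, the linear mapping is already up at about 10kHz, while the logarithmic one sits near 632Hz, right in the musically useful range.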