SawSing, Singing Synthesis, DDSP
Here's the paper:
https://arxiv.org/abs/2208.04756
They refer to the technique as "subtractive synthesis". It's not wrong, but it's potentially misleading. The way they are using it, you could call reverberation a form of subtractive synthesis too.
Bibliography is very self-congratulatory. Most of it consists of citations to their own papers from the past two years. No classical research in singing or speech other than a small paper by Perry Cook. Disappointing.
SawSing, Singing Synthesis, DDSP
This singing synthesis paper for a technique called "SawSing" popped up. It's a source-filter model that takes a saw wave and shapes the spectrum using an FIR filter designed by a neural network.
The whole technique basically just sounds like LPC with extra steps, and we've had that for decades.
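As a rough sketch of the source-filter idea: a harmonically rich sawtooth runs through an FIR filter that carves away spectrum. In SawSing the filter coefficients are predicted per-frame by a neural network; in this toy version they are a fixed, made-up moving average, so everything below is illustrative rather than the paper's actual method.

```python
def saw(freq, sr, n):
    """Naive sawtooth oscillator: the harmonically rich source signal."""
    return [2.0 * ((i * freq / sr) % 1.0) - 1.0 for i in range(n)]

def fir(signal, taps):
    """Apply an FIR filter by direct convolution: the 'subtractive' stage.
    In SawSing a neural network would predict these taps per frame;
    here they are fixed placeholder coefficients."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * signal[i - j]
        out.append(acc)
    return out

sr = 44100
src = saw(220.0, sr, 1024)
taps = [1.0 / 8.0] * 8   # placeholder lowpass; a real system learns these
voiced = fir(src, taps)
```

Swap the moving average for LPC-derived coefficients and you get something very close to the classic vocoder pipeline, which is the point of the comparison above.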
Vocal Synthesis in Rick and Morty s5e2, minor spoilers
The wooden puppets in s5e2 of R+M employ something reminiscent of vintage speech synthesis techniques. Specifically, they make use of a lo-fi sibilance detector. Every time the input signal makes a sibilant or plosive sound, it injects noise into the output of the speech synthesizer. The thing is, the noise is so lo-fi and dark that the result is an artificial lisp. It's a nice detail.
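A guess at how such a detector might work (this is my reconstruction, not the show's actual pipeline): measure the ratio of high-frequency energy to total energy per frame, and when it crosses a threshold, swap in heavily lowpassed noise.

```python
import random

def sibilance_detector(frame, threshold=0.2):
    """Crude sibilance detector: ratio of high-frequency energy
    (via first difference, a cheap highpass) to total energy."""
    total = sum(x * x for x in frame) + 1e-9
    high = sum((frame[i] - frame[i - 1]) ** 2 for i in range(1, len(frame)))
    return (high / total) > threshold

def dark_noise(n, seed=0):
    """Heavily lowpassed ('dark') noise: one-pole smoothing of white noise."""
    rng = random.Random(seed)
    y, out = 0.0, []
    for _ in range(n):
        y += 0.05 * (rng.uniform(-1, 1) - y)  # one-pole lowpass
        out.append(y)
    return out

def resynthesize(frames):
    """Inject dark noise whenever the detector fires."""
    out = []
    for frame in frames:
        if sibilance_detector(frame):
            out.extend(dark_noise(len(frame)))
        else:
            out.extend(frame)
    return out
```

The lisp comes for free: the one-pole lowpass is far too dark to sound like a real "s".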
Why I'm interested in vocal synthesis
1. It sounds funny. Seriously. Vocal synthesis is one of the rare instances where computer music can have a sense of humor. I find that compelling.
2. It encourages new ways to think about computer generated music. Getting a computer to "sing" often benefits from building new tools.
Singing synthesis tools to make at some point
VoxLathe: this will be software that allows one to "sculpt" vocal tract shapes to produce phonemes. Emphasis on craft and not realism.
VoxBox: the vocal synthesizer itself. It loads states generated by VoxLathe and is controlled via GestVM, my gesture sequencer. Sequencing control is done via Uxn, and VoxBox will be able to plug into it as a virtual device.
Vocal Synthesizer Updates
Fricatives can wait. Vowels can get you very far in singing synthesis. And I know how to do that.
Once again, I find myself leafing through the Mrayati paper on the distinctive region model (DRM), which seems to be one of the few ways of parametrically controlling tract waveguides in a meaningful way. It was a very hard paper to find. In theory, meaningful phoneme shapes can be made with only 8 parameters, instead of trying to control, say, 44 individual parameters.
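A sketch of why 8 parameters can suffice: each DRM region value fans out to a run of adjacent waveguide sections. This toy version splits the tract into equal-length regions, which the real DRM does not do (its region boundaries come from formant sensitivity analysis), and the shape values here are made up.

```python
def drm_to_areas(regions, n_sections):
    """Expand a handful of DRM region values into per-section
    waveguide areas. Simplified: equal-length regions, unlike
    the actual DRM's unequal region lengths."""
    areas = []
    for i in range(n_sections):
        r = i * len(regions) // n_sections  # which region this section falls in
        areas.append(regions[r])
    return areas

# 8 control parameters instead of 44 per-section areas
shape = [1.5, 1.2, 0.8, 0.5, 0.9, 1.3, 1.6, 2.0]  # made-up region areas
areas = drm_to_areas(shape, 44)
```

With interpolation between regions instead of nearest-region lookup, the expanded area function would also be smooth, which waveguides tend to prefer.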
Vocal Synthesizer Updates
Ugh. This sampling rate issue feels like a deal breaker. I've yet to make this sound good. I already have an implementation that sounds better (it uses a higher-order filter, which I think makes a difference).
Vocal Synthesizer Updates
The waveguide model I'm using to represent the vocal tract is sample-rate dependent. The original shapes I was using as a starting point were done with the model working at 20kHz. I'm wanting the model to work at 44.1kHz. So I need new shapes. I'm going to need to build a real time interface that works well enough to "sculpt" shapes by ear. I think I need this built in order to move forward.
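The sample-rate dependence follows from the geometry: in a Kelly-Lochbaum style waveguide, each cylindrical section spans the distance sound travels in half a sample period, so the section count, and with it the shape data, scales with the sample rate. A back-of-the-envelope sketch (the 17 cm tract length and 343 m/s speed of sound are my assumptions, not values from the model above):

```python
def tract_sections(length_m, sr, speed_of_sound=343.0):
    """Number of cylindrical sections in a Kelly-Lochbaum style
    waveguide: each section spans the distance sound travels in
    half a sample period, so the count scales with sample rate."""
    return round(length_m * 2.0 * sr / speed_of_sound)

print(tract_sections(0.17, 20000))   # -> 20 sections
print(tract_sections(0.17, 44100))   # -> 44 sections
```

So a shape tuned for the 20kHz model has roughly 20 areas, and the 44.1kHz model wants roughly 44; there's no exact one-to-one mapping, hence sculpting new shapes by ear.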
Hoping to use my Grid/Arc for this. Yak shaving is in my future.
Various thoughts related to #gesture in the context of music https://pbat.ch/brain/gesture/
I think if you're following me here instead of at @paul, expect more things related to composition, music, and creative process. I may also talk more about DSP related to vocal and singing synthesis.
For croissants and herons, see other account.
#introduction Hi I'm Paul, aka @paul
I'm a computer music composer and researcher, currently interested in what it means for computers to "sing".
I've been a big fan of the Merveilles community for years, and I'm excited to be amongst you all now :)
I teach computers how to sing.