
DiTTo-TTS: The TTS System That Doesn't Need Your Phonemes (And Why That's a Big Deal)
Text-to-speech has always been that one AI domain where you couldn't just throw data at the problem and call it a day. “Data is the moat” is straight up not a thing here. Want to build a TTS system? Better get comfortable with phonemizers, forced aligners, duration predictors,