Coding Speech through Vocal Tract Kinematics
Audio Samples
Resynthesis Samples (LibriTTS-R)
| Ground Truth | Resynthesized |
Resynthesis Samples (VCTK)
| Ground Truth | Resynthesized |
Resynthesis Samples (Multilingual)
| Language | Ground Truth | Resynthesized (English-Only-Trained) | Resynthesized (Fine-Tuned) |
|---|---|---|---|
| German | | | |
| Dutch | | | |
| Portuguese | | | |
| Italian | | | |
| Polish | | | |
| Spanish | | | |
| French | | | |
| Korean | | | |
| Japanese | | | |
| Chinese | | | |
Controllability Demo 1: Interpolating Tongue Traces to Manipulate Place of Articulation
Interpolation Samples ("lock-rock")
| Mixing Ratio | Transcription | Vocal Tract Visualization |
|---|---|---|
| 100% lock + 0% rock | lock | |
| 80% lock + 20% rock | lock | |
| 60% lock + 40% rock | lock | |
| 40% lock + 60% rock | lock | |
| 20% lock + 80% rock | rock | |
| 0% lock + 100% rock | rock | |
| -20% lock + 120% rock | rock | |
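The mixing ratios in the table above amount to a weighted sum of two tongue traces, with the last row using weights outside the 0 to 1 range (extrapolation). A minimal sketch in plain Python follows. It assumes the two utterances are already time-aligned and stored as frames-by-channels lists; the helper `mix_traces`, the toy values, and the alignment assumption are illustrative and not taken from the original system.

```python
from typing import List, Sequence

Trace = List[List[float]]  # frames x channels


def mix_traces(trace_a: Sequence[Sequence[float]],
               trace_b: Sequence[Sequence[float]],
               weight_b: float) -> Trace:
    """Linearly mix two time-aligned traces frame by frame.

    weight_b = 0.0 returns trace_a and 1.0 returns trace_b. Values
    outside [0, 1] extrapolate, e.g. 1.2 gives -20% a + 120% b.
    """
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must have the same number of frames")
    weight_a = 1.0 - weight_b
    return [
        [weight_a * a + weight_b * b for a, b in zip(frame_a, frame_b)]
        for frame_a, frame_b in zip(trace_a, trace_b)
    ]


# Toy 2-frame, 2-channel tongue traces (made-up numbers, not real data).
lock = [[0.0, 1.0], [2.0, 3.0]]
rock = [[1.0, 1.0], [4.0, 1.0]]

# Share of "rock" in each mix, matching the rows of the table.
ratios = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
mixed = {r: mix_traces(lock, rock, r) for r in ratios}
```

Each entry of `mixed` would then be passed to the synthesizer in place of the original tongue channels, leaving the remaining features untouched.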
Controllability Demo 2: Translating "Loudness" Trace to Manipulate Voice Onset Time ("may-bay-pay")
| Shift in Loudness (ms) | Transcription | Simulated Sample |
|---|---|---|
| -100 | may | |
| -80 | may | |
| -60 | may | |
| -40 | may | |
| -20 | may | |
| 0 | bay | |
| 20 | bay | |
| 40 | bay | |
| 60 | pay | |
| 80 | pay | |
| 100 | pay | |
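The manipulation in this demo can be pictured as sliding the loudness trace along the time axis while the other features stay fixed. The sketch below is one hedged way to do that in plain Python. The helper `shift_trace`, the 50 Hz frame rate, the edge-value padding, and the convention that a positive shift delays the trace are all assumptions made for illustration, not details confirmed by the page.

```python
from typing import List, Sequence


def shift_trace(trace: Sequence[float], shift_ms: float,
                frame_rate_hz: float) -> List[float]:
    """Translate a 1-D trace in time, padding with the edge value.

    Positive shift_ms delays the trace and negative advances it.
    The shift is rounded to a whole number of frames.
    """
    n = len(trace)
    if n == 0:
        return []
    shift_frames = round(shift_ms * frame_rate_hz / 1000.0)
    shifted = []
    for i in range(n):
        # Clamp the source index so out-of-range frames reuse the edge value.
        src = min(max(i - shift_frames, 0), n - 1)
        shifted.append(trace[src])
    return shifted


# Toy loudness contour (made-up numbers, not real data).
loudness = [0.0, 0.0, 0.1, 0.6, 1.0, 1.0, 0.8, 0.3]
FRAME_RATE_HZ = 50.0  # assumed: 20 ms per frame

delayed = shift_trace(loudness, 40, FRAME_RATE_HZ)    # two frames later
advanced = shift_trace(loudness, -40, FRAME_RATE_HZ)  # two frames earlier
```

Under the assumed sign convention, the positive shifts in the table delay the rise in loudness relative to the articulatory release, which is consistent with the transcription moving from "bay" toward "pay".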