Speech After Gender: A Trans-Feminine Perspective on Next Steps for Speech Science and Technology

Abstract

As experts in voice modification, trans-feminine gender-affirming voice teachers have unique perspectives on voice that confound current understandings of speaker identity. To demonstrate this, we present the Versatile Voice Dataset (VVD), a collection of three speakers modifying their voices along gendered axes. The VVD illustrates that current approaches in speaker modeling, based on categorical notions of gender and a static understanding of vocal texture, fail to account for the flexibility of the vocal tract. Using publicly available speaker embeddings, we demonstrate that gender classification systems are highly sensitive to voice modification, and that speaker verification systems increasingly fail to recognize voices as coming from the same speaker as voice modification becomes more drastic. As one path towards moving beyond categorical and static notions of speaker identity, we propose modeling individual qualities of vocal texture such as pitch, resonance, and weight.

Samples

Below we provide examples from the Versatile Voice Dataset, ranging from a High Pitch, High Resonance, Low Weight configuration to a Low Pitch, Low Resonance, High Weight configuration, with intermediate configurations in between. For each configuration, we report its L1-Distance from the High Pitch, High Resonance, Low Weight configuration.
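The reported L1-Distances are consistent with treating each of the three qualities as an ordinal value and summing the absolute differences from the reference configuration. The sketch below illustrates that reading; the numeric encoding (Low = 0, Medium = 1, High = 2) and the names `LEVELS`, `encode`, and `l1_distance` are assumptions made for illustration and are not taken from the dataset itself.

```python
# Assumed ordinal encoding of the three levels (not stated on this page).
LEVELS = {"Low": 0, "Medium": 1, "High": 2}


def encode(pitch, resonance, weight):
    """Map a (pitch, resonance, weight) configuration to an integer triple."""
    return (LEVELS[pitch], LEVELS[resonance], LEVELS[weight])


def l1_distance(a, b):
    """Sum of absolute per-quality differences between two configurations."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Reference configuration: High Pitch, High Resonance, Low Weight.
REFERENCE = encode("High", "High", "Low")

# The seven configurations, in the order the samples are listed.
CONFIGS = [
    ("High", "High", "Low"),
    ("High", "Medium", "Low"),
    ("Medium", "Medium", "Low"),
    ("Medium", "Medium", "Medium"),
    ("Low", "Medium", "Medium"),
    ("Low", "Low", "Medium"),
    ("Low", "Low", "High"),
]

distances = [l1_distance(encode(*c), REFERENCE) for c in CONFIGS]
print(distances)  # [0, 1, 2, 3, 4, 5, 6]
```

Under this encoding, each successive sample changes exactly one quality by one level, so the distance from the reference grows by 1 at every step.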

High Pitch, High Resonance, Low Weight (L1-Distance = 0)

Sentence 001 002 003
The blue spot is on the key again.
How hard did he hit him?

High Pitch, Medium Resonance, Low Weight (L1-Distance = 1)

Sentence 001 002 003
The blue spot is on the key again.
How hard did he hit him?

Medium Pitch, Medium Resonance, Low Weight (L1-Distance = 2)

Sentence 001 002 003
The blue spot is on the key again.
How hard did he hit him?

Medium Pitch, Medium Resonance, Medium Weight (L1-Distance = 3)

Sentence 001 002 003
The blue spot is on the key again.
How hard did he hit him?

Low Pitch, Medium Resonance, Medium Weight (L1-Distance = 4)

Sentence 001 002 003
The blue spot is on the key again.
How hard did he hit him?

Low Pitch, Low Resonance, Medium Weight (L1-Distance = 5)

Sentence 001 002 003
The blue spot is on the key again.
How hard did he hit him?

Low Pitch, Low Resonance, High Weight (L1-Distance = 6)

Sentence 001 002 003
The blue spot is on the key again.
How hard did he hit him?