This is the demo page for the paper “KARASINGER: SCORE-FREE SINGING VOICE SYNTHESIS WITH VQ-VAE USING MEL-SPECTROGRAMS”

Abstract

In this paper, we propose a novel neural network model called KaraSinger for a less-studied singing voice synthesis (SVS) task named score-free SVS, in which the prosody and melody are spontaneously decided by machine. KaraSinger comprises a vector-quantized variational autoencoder (VQ-VAE) that compresses the Mel-spectrograms of singing audio to sequences of discrete codes, and a language model (LM) that learns to predict the discrete codes given the corresponding lyrics. For the VQ-VAE part, we employ a Connectionist Temporal Classification (CTC) loss to encourage the discrete codes to carry phoneme-related information. For the LM part, we use location-sensitive attention for learning a robust alignment between the input phoneme sequence and the output discrete code. We keep the architecture of both the VQ-VAE and LM light-weight for fast training and inference speed. We validate the effectiveness of the proposed design choices using a proprietary collection of 550 English pop songs sung by multiple amateur singers. The result of a listening test shows that KaraSinger achieves high scores in intelligibility, musicality, and the overall quality.
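
To make the "discrete codes" in the abstract concrete, here is a minimal sketch of the vector-quantization step of a generic VQ-VAE: each latent frame is replaced by the index of its nearest codebook vector. This is not KaraSinger's implementation. The function name `quantize`, the toy codebook, and the latent values are all made up for illustration, and a real model would learn the codebook and operate on encoder outputs computed from Mel-spectrograms.

```python
from typing import List, Sequence


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each latent frame, the index of the nearest codebook vector."""
    codes = []
    for z in latents:
        # Squared Euclidean distance from this frame to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        # The discrete code is the position of the closest entry.
        codes.append(dists.index(min(dists)))
    return codes


# Toy example (hypothetical values): 2-D latents and a 3-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.4, 0.4]]
codes = quantize(latents, codebook)
print(codes)
```

In the setup the abstract describes, a sequence of such code indices is what the language model learns to predict from the lyrics.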


Audio Samples

We provide short samples from the subjective evaluation described in the paper, as well as longer samples.

Short samples

Lyrics:

  1. Just a small town girl living in a lonely world
  2. She took the midnight train going anywhere
  3. It goes on and on and on and on
  4. She’s got a smile that it seems to me
  5. Take a sad song and make it better

Systems: KaraSinger, 3-level, noCTC
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5

Long samples with accompaniments

Lyrics:
In this paper we propose
a novel neural network model
called Karaoke singer for a less studied
singing voice synthesis task
named score-free SVS
in which the prosody and melody are spontaneously decided by machine.

KaraSinger
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5

Lyrics:
台灣人工智慧實驗室 (Taiwan AI Labs)
is a privately funded
research organization based in Taipei.
Our goal is to leverage
unique advantages in Taiwan
to build AI solutions
to solve the world's problems.

KaraSinger
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5

Contact

Chien-Feng Liao: jerrygood0703@gmail.com

This project is developed and supported by Taiwan AI Labs.