Hands on Voice Conversion
Tomoki TODA
Nagoya University, Japan
tomoki@icts.nagoya-u.ac.jp
July 26th, 2018
[Figure: Result of Voice Conversion Challenge 2018 (VCC2018) [Lorenzo-Trueba; '18a] — MOS on naturalness (1–5) vs. similarity score (0–100%)]
Let's develop this baseline system!
Baseline system:
• Naturalness score = 3.5
• Speaker similarity score = 70%
Let’s Start VC Research & Development!
• Purpose: Understand the overall procedure of statistical VC and the current baseline level of statistical VC techniques so that you can start VC research or develop VC systems.
• Goal: Learn to use open‐source VC software to develop a basic VC system 
for speaker conversion using a parallel speech dataset.
• Contents
• Let’s use open‐source VC software, sprocket!
• Let’s develop a traditional GMM‐based VC system!
• Let’s develop a vocoder‐free GMM‐based VC system!
• Let’s learn tips on VC system development!
Outline
Let’s use sprocket!
[Kobayashi; ’18a]
K. Kobayashi, T. Toda,
“sprocket: open‐source voice conversion software,”
Proc. Odyssey 2018, pp. 203—210, June 2018.
https://www.isca‐speech.org/archive/Odyssey_2018/pdfs/47.pdf
sprocket
Open‐Source VC Software: sprocket
• Developed by Dr. Kazuhiro Kobayashi of Nagoya University, Japan
• Motivation: provide an environment for both expert and non-expert users to easily use a statistical VC framework
• Simply developed using existing libraries
• Implemented know‐how accumulated through our VC research (> 15 years)
• Freely available for both research and industrial purposes (MIT license)
• Used as a baseline system for Voice Conversion Challenge 2018 (VCC2018) [Lorenzo-Trueba; '18a]
• Features:
• Traditional VC method based on GMM
• Vocoder‐free VC method based on DIFFGMM
• Python3 VC library
• What can we do using sprocket?
• Can easily reproduce converted voices using the VCC2016 & VCC2018 datasets [Toda; '16][Lorenzo-Trueba; '18b]
• Can develop VC systems using other parallel speech datasets
sprocket: 1
Download
• Freely available from GitHub
• Directory structure of sprocket
sprocket‐master/
docs/ # documentation for running an example
example/ # framework (for running an example)
sprocket/ # sprocket libraries
README.md # README file
LICENSE.txt # license file
requirements.txt # list of required libraries
setup.py # setup script
Other files
https://github.com/k2kobayashi/sprocket
sprocket: 2
Install Procedure
Note: You need to use Python3 instead of Python2!
• First, install required libraries by executing the following commands:
$  pip3  install  numpy
$  pip3  install  ‐r  requirements.txt
• Then, install sprocket by executing the following command:
$  python3  setup.py  install
NOTE: This install procedure has already been completed on your computer.
You may execute these commands to confirm it.
sprocket: 3
Let’s Run an Example Script!
• Instructions for running an example script are described in
docs/vc_example.md.
• An example script can be run under the working directory, example/.
$  cd example/
example/
conf/ # directory for configure files
data/ # data directory
initialize.py # command script
list/ # directory for list files
run_f0_transformation.py # command script
run_sprocket.py # command script
src/ # source codes
Others
sprocket: 4
Let’s develop traditional
GMM‐based VC system!
[Toda; ’07]
T. Toda, A.W. Black, K. Tokuda,
“Voice conversion based on maximum likelihood estimation of spectral parameter trajectory,”
IEEE Transactions on Audio, Speech, and Language Processing,
Vol. 15, No. 8, pp. 2222—2235, 2007.
GMM-based VC
Overall Procedure of Statistical VC
[Flowchart: source voices & target voices are processed through the following pipeline]
0. Parallel data preparation & parameter configurations (F0 & power histograms → parameter configurations)
1. Speech analysis (→ source & target speech parameter sequences)
2. Statistics calculation (→ speaker-dependent statistics)
3. Joint feature development (→ time warping function, joint features)
4. Model training (→ conversion models)
5. Conversion (→ converted mel-cepstrum, converted feature sequence → converted voices)
GMM-based VC: 1
Procedure: Preparation Step
[Flowchart as above, with step 0 (parallel data preparation & parameter configurations) highlighted]
GMM-based VC: 2
Preparation of Speech Dataset
• Speech waveform files of a parallel dataset between source and target 
speakers need to be prepared.
• Only the following wav file format is supported.
• Sampling rate: 16, 22.05, 44.1, or 48 kHz
• Quantization bit: 16 (signed‐integer)
• Number of channels: 1
• Each utterance (e.g., around 5 seconds) needs to be stored in one wav file.
• Wav files need to be put in data/wav/ directory, e.g., 
data/wav/speakerA/*.wav
data/wav/speakerB/*.wav
• A download script for the VCC datasets (download_speech_corpus.py) is also available to automatically set up wav files, e.g.,
$  python3  download_speech_corpus.py  downloader_conf/vcc2016.yml
GMM-based VC: 3
Example of Speech Dataset
• In this session, let's use part of the VCC2018 database [Lorenzo-Trueba; '18b].
• Sampling frequency: 22.05 kHz
• Only speakers for parallel training
• Only 60 out of 81 training utterances and 20 out of 35 evaluation utterances
• Note: The following wav files have already been set in your computer.
data/wav/{SF1, SF2, SM1, SM2}/ # 4 source speakers
data/wav/{TF1, TF2, TM1, TM2}/ # 4 target speakers
10001.wav – 10060.wav # 60 utterances for training
30001.wav – 30020.wav # 20 utterances for evaluation
• Let’s check each speaker’s voice by listening to some wav files, e.g.,
data/wav/SF1/10001.wav & data/wav/TF1/10001.wav
data/wav/SM1/30001.wav & data/wav/TM1/30001.wav
: :
GMM-based VC: 4
You can download the VCC2018 database by executing
$  python3  download_speech_corpus.py  downloader_conf/vcc2018.yml
and use it.  Note that the speaker names differ slightly from those shown in these slides.
Procedure: Initialization Step
[Flowchart as above, with step 0 (parameter configurations) highlighted]
GMM-based VC: 5
Initialization 1: List File Generation
• Select a source & target speaker pair for the same‐gender conversion
• Source speakers: females = SF1 & SF2,  males = SM1 & SM2
• Target speakers: females = TF1 & TF2,  males = TM1 & TM2 
• Generate list files for your selected speaker‐pair by executing 
$  python3  initialize.py  ‐1  SourceSpeaker TargetSpeaker SamplingRate
e.g., if setting source = SF1, target = TF1, sampling = 22.05 kHz,
$  python3  initialize.py  ‐1  SF1  TF1  22050
• 4 list files will be generated under list/ directory.
list/SF1_train.list # training data list for the source speaker, SF1
list/TF1_train.list # training data list for the target speaker, TF1
list/SF1_eval.list # evaluation data list for SF1
list/TF1_eval.list # evaluation data list for TF1
• Modify each list file to define training & evaluation utterance pairs.
list/{SF1,TF1}_train.list # keep only utterances 10001 – 10060
list/{SF1,TF1}_eval.list # keep only utterances 30001 – 30020
GMM-based VC: 6
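• For convenience, the list files can also be trimmed with a short script instead of manual editing. A minimal Python sketch (the trim_list helper is hypothetical, not part of sprocket; paths and ranges follow this slide):
from pathlib import Path

def trim_list(path, lo, hi):
    # keep only entries whose utterance number falls in [lo, hi]
    lines = Path(path).read_text().splitlines()
    kept = [l for l in lines if lo <= int(l.rsplit("/", 1)[-1]) <= hi]
    Path(path).write_text("\n".join(kept) + "\n")

for spk in ("SF1", "TF1"):
    trim_list(f"list/{spk}_train.list", 10001, 10060)  # training utterances
    trim_list(f"list/{spk}_eval.list", 30001, 30020)   # evaluation utterances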
Example of Modified List Files
• Contents of list/SF1_train.list
SF1/10001
SF1/10002
SF1/10003
:
SF1/10060
• Contents of list/SF1_eval.list
SF1/30001
SF1/30002
SF1/30003
:
SF1/30020
• Contents of list/TF1_train.list
TF1/10001
TF1/10002
TF1/10003
:
TF1/10060
• Contents of list/TF1_eval.list
TF1/30001
TF1/30002
TF1/30003
:
TF1/30020
• The length and order of entries must be consistent between the source & target speakers' lists (they should be parallel); the listed data are regarded as a parallel dataset.
GMM-based VC: 7
Initialization 2: Configure File Generation
• Generate configure files for your selected speaker‐pair by executing 
$  python3  initialize.py  ‐2  SourceSpeaker TargetSpeaker SamplingRate
e.g.,
$  python3  initialize.py  ‐2  SF1  TF1  22050
• 3 configure files will be generated under conf/ directory.
Speaker‐dependent settings are shown in the following YML files:
conf/speaker/SF1.yml # configure for SF1
conf/speaker/TF1.yml # configure for TF1
Speaker‐pair‐dependent setting is shown in the following YML file:
conf/pair/SF1‐TF1.yml # configure for the speaker‐pair, SF1‐TF1
GMM-based VC: 8
Example of Speaker‐Dependent YML File
• Contents of the speaker‐dependent YML file: conf/speaker/SF1.yml
wav:
fs: 22050 # sampling frequency [Hz]
bit: 16 # quantization bit [bit]
fftl: 1024 # FFT length [points]
shiftms: 5 # shift length [msec]
f0:
minf0: 40 # minimum F0 [Hz]
maxf0: 700 # maximum F0 [Hz]
mcep:
dim: 34 # order of mel‐cepstrum
alpha: 0.455 # all-pass filter parameter for mel-frequency warping
power:
threshold: ‐15 # power threshold to remove silence frames
analyzer: world # speech analysis method
GMM-based VC: 9
Example of Pair‐Dependent YML File
• Contents of the speaker‐pair‐dependent YML file: conf/pair/SF1‐TF1.yml
jnt:
n_iter: 3 # number of iterative time alignments
GMM:
mcep: # GMM settings for mel‐cepstrum conversion
n_mix: 32 # number of mixture components
n_iter: 100 # number of iterations of GMM training
covtype: full # covariance type of GMM
cvtype: mlpg # conversion method
codeap: # GMM settings for aperiodicity conversion
n_mix: 16 # number of mixture components
: # (these lines are the same as the mcep part)
GV:
morph_coeff: 1.0 # GV postfilter parameter
GMM-based VC: 10
Initialization 3: Manual Settings
• Perform speech analysis for your selected speaker‐pair by executing 
$  python3  initialize.py  ‐3  SourceSpeaker TargetSpeaker SamplingRate
e.g.,
$  python3  initialize.py  ‐3  SF1  TF1  22050
• The following message will be printed out.
### 3. create figures to define parameters ###
Extract: data/wav/SF1/10001.wav
Extract: data/wav/SF1/10002.wav
:
• Finally, 4 PNG files will be generated under conf/figure/ directory.
conf/figure/SF1_f0histogram.png # F0 histogram of SF1
conf/figure/SF1_npowhistogram.png # normalized power histogram of SF1
conf/figure/TF1_f0histogram.png # F0 histogram of TF1
conf/figure/TF1_npowhistogram.png # normalized power histogram of TF1
GMM-based VC: 11
Example of F0 Histogram
• It is very effective to adjust the F0 search range to each speaker to reduce F0 extraction errors, such as half-F0 and double-F0 errors.
• NOTE: This is a very important process, as explained later.
Source speaker SF1 (conf/figure/SF1_f0histogram.png): a proper F0 search range might be 140 – 400 Hz; the lower lobe is supposed to be half-F0 errors.
Target speaker TF1 (conf/figure/TF1_f0histogram.png): a proper F0 search range might be 140 – 340 Hz; the lower lobe is supposed to be half-F0 errors.
GMM-based VC: 12
Example of Normalized Power Histogram
• It is also effective to adjust the normalized power threshold to each speaker to improve time alignment accuracy by removing silence frames.
• NOTE: This is also an important process, as explained later.
Source speaker SF1 (conf/figure/SF1_npowhistogram.png): silence frames form the lower mode, speech frames the upper; a proper threshold might be –30 dB.
Target speaker TF1 (conf/figure/TF1_npowhistogram.png): likewise; a proper threshold might be –40 dB.
GMM-based VC: 13
Let’s Modify Speaker‐Dependent YML Files
• Contents of conf/speaker/SF1.yml
wav:
fs: 22050
bit: 16
fftl: 1024
shiftms: 5
f0:
minf0: 140
maxf0: 400
mcep:
dim: 34
alpha: 0.455
power:
threshold: ‐30
analyzer: world
• Contents of conf/speaker/TF1.yml
wav:
fs: 22050
bit: 16
fftl: 1024
shiftms: 5
f0:
minf0: 140
maxf0: 340
mcep:
dim: 34
alpha: 0.455
power:
threshold: ‐40
analyzer: world
Revise these 3 values (minf0, maxf0, threshold) in each file based on your observation!
GMM-based VC: 14
Also Modify Pair‐Dependent YML File
• Contents of conf/pair/SF1‐TF1.yml
jnt:
n_iter: 3
GMM:
mcep:
n_mix: 16
n_iter: 100
covtype: full
cvtype: mlpg
codeap:
n_mix: 16
:
GV:
morph_coeff: 1.0
Let's set the number of mixture components for mel-cepstrum conversion to 16 to reduce training time!
GMM-based VC: 15
Procedure: Training Step
[Flowchart as above, with step 1 (speech analysis) highlighted]
GMM-based VC: 16
Training 1: Speech Analysis
• Let’s perform speech analysis by executing
$  python3  run_sprocket.py  ‐1  SourceSpeaker TargetSpeaker
e.g.,
$  python3  run_sprocket.py  ‐1  SF1  TF1
• The following message will be printed out.
### 1. Extract acoustic features ###
Extract acoustic features: data/wav/SF1/10001.wav
Extract acoustic features: data/wav/SF1/10002.wav
:
• Speech parameter files and analysis‐synthesized wav files will be generated 
utterance by utterance under data/pair/ directory.
data/pair/SF1‐TF1/h5/{SF1,TF1}/100*.h5 # parameter HDF5 files
data/pair/SF1‐TF1/anasyn/{SF1,TF1}/100*.wav # analysis‐synthesized wav files
GMM-based VC: 17
Speech Analysis Processing
• Source code: src/extract_features.py
• WORLD [Morise; '16] is used as the speech analysis-synthesis method.
• Spectral envelope is parameterized into mel‐cepstrum [Tokuda; ’94].
[Diagram: wav file → speech waveform → F0 sequence (f0 seq), spectral envelope sequence (spc seq), aperiodicity sequence (ap seq); spc seq → mel-cepstrum sequence (mcep seq) & normalized power sequence (npow seq); ap seq → coded aperiodicity sequence (codeap seq); f0, mcep, npow, codeap → parameter HDF5 file]
*NOTE: F0 is used to accurately estimate the spectral envelope by removing the effects of the periodicity of excitation.  Therefore, F0 estimation errors adversely affect the estimation of the other parameters.
GMM-based VC: 18
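• The analysis chain on this slide can be reproduced roughly as follows; a minimal sketch assuming the pyworld and pysptk packages (which sprocket also builds on) are installed, with parameter values taken from conf/speaker/SF1.yml.  sprocket's own implementation (src/extract_features.py) may differ in details such as the F0 estimator and the power computation.
import numpy as np
import pyworld as pw
import pysptk
from scipy.io import wavfile

fs, x = wavfile.read("data/wav/SF1/10001.wav")
x = x.astype(np.float64)

# WORLD analysis: F0, spectral envelope, aperiodicity
f0, t = pw.harvest(x, fs, f0_floor=140.0, f0_ceil=400.0, frame_period=5.0)
spc = pw.cheaptrick(x, f0, t, fs, fft_size=1024)  # spectral envelope seq
ap = pw.d4c(x, f0, t, fs)                         # aperiodicity seq
codeap = pw.code_aperiodicity(ap, fs)             # coded aperiodicity seq
mcep = pysptk.sp2mc(spc, order=34, alpha=0.455)   # mel-cepstrum seq
npow = 10 * np.log10(spc.mean(axis=1))            # rough per-frame power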
Check If Speech Analysis Works Well
• Let’s check quality of analysis‐synthesized speech (e.g., 1st utterance) of 
the source and the target speakers by listening to them.
data/wav/SF1/10001.wav # original source wav files
data/pair/SF1‐TF1/anasyn/SF1/10001.wav # its analysis‐synthesis wav file
Similar to each other?
data/wav/TF1/10001.wav # original target wav files
data/pair/SF1‐TF1/anasyn/TF1/10001.wav # its analysis‐synthesis wav file
Similar to each other?
• If they sound similar to the original natural speech, speech analysis works well.  If the F0 of the analysis-synthesized speech sounds quite different from that of the original natural speech, the F0 search range needs to be revised.
GMM-based VC: 19
Procedure: Training Step
[Flowchart as above, with step 2 (statistics calculation) highlighted]
GMM-based VC: 20
Training 2: Statistics Calculation
• Calculate speaker‐dependent statistics by executing
$  python3  run_sprocket.py  ‐2  SourceSpeaker TargetSpeaker
e.g.,
$  python3  run_sprocket.py  ‐2  SF1  TF1
• Speaker‐dependent statistics files will be generated under data/pair/
directory.
data/pair/SF1-TF1/stats/SF1.h5 # statistics HDF5 file for SF1
data/pair/SF1-TF1/stats/TF1.h5 # statistics HDF5 file for TF1
GMM-based VC: 21
Statistics Calculation Processing
• Source code: src/estimate_feature_statistics.py
• Speaker‐dependent statistics to be used for F0 conversion & GV postfilter 
[Toda; ’12] are extracted.
[Diagram: parameter HDF5 files → f0 seqs → log F0 sequences → mean & variance values (f0stats); mcep seqs → mean & variance vectors (gv); both stored in the statistics HDF5 file (f0stats, gv)]
GMM-based VC: 22
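• A minimal sketch of what these statistics are, assuming f0/mcep arrays from the analysis step (function names here are illustrative, not sprocket's API):
import numpy as np

def f0_statistics(f0s):
    # mean & variance of log F0 over voiced frames (f0 > 0) of all utterances
    lf0 = np.log(np.concatenate([f[f > 0] for f in f0s]))
    return lf0.mean(), lf0.var()

def gv_statistics(mceps):
    # global variance (GV): per-utterance variance of each mcep dimension,
    # summarized by its mean & variance across training utterances
    gvs = np.array([m.var(axis=0) for m in mceps])
    return gvs.mean(axis=0), gvs.var(axis=0)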
Procedure: Training Step
[Flowchart as above, with step 3 (joint feature development) highlighted]
GMM-based VC: 23
Training 3: Joint Feature Development
• Develop joint features by executing
$  python3  run_sprocket.py  ‐3  SourceSpeaker TargetSpeaker
e.g.,
$  python3  run_sprocket.py  ‐3  SF1  TF1
• The following message will be printed out.
### 3. Estimate time warping function and jnt ###
## Alignment mcep w/o 0‐th and silence ##
1‐th joint feature extraction starts.
distortion [dB] for 1‐th file: …..
:
• Finally, a joint feature file and time warping function files will be generated 
under data/pair/ directory.
data/pair/SF1‐TF1/jnt/it3_jnt.h5 # joint feature HDF5 file
data/pair/SF1‐TF1/twf/it3_100*.h5 # time warping function HDF5 files
GMM-based VC: 24
Joint Feature Development Processing
• Source code: src/estimate_twf_and_jnt.py
[Diagram: source & target parameter HDF5 files → mcep seqs & codeap seqs → mcep & codeap feature seqs; DTW between the source & target mcep feature seqs → time warping functions → joint mcep & joint codeap feature seqs → joint feature HDF5 file (mcep, codeap) & time warping function HDF5 files (twf).  A GMM for mcep conversion produces converted mcep (feature) seqs used in the iterative, utterance-by-utterance refinement.]
GMM-based VC: 25
GMM-based VC: 26
Dynamic Time Warping (DTW)
• Source code: src/estimate_twf_and_jnt.py
[Diagram as above, with the DTW block (mcep feature seqs → time warping functions) highlighted]
Feature Extraction for DTW
• It is very important to align source frames to target frames so that they 
share the same linguistic contents.
• There are several tips to robustly perform time alignment!
• Joint static and dynamic mcep features are used.
• Power differences between source & target voices are ignored.
• Silence frames are discarded (automatically by normalized power, npow).  
This process is effective to deal with mismatches of short pause positions.
[Diagram: mcep seq → remove the 0th coefficient → append dynamic features → mcep feature seq; npow seq → remove silence frames]
*NOTE: Power is NOT converted in sprocket.  As the conversion model is developed without using silence frames, conversion accuracy at those frames significantly degrades.  However, this will not cause significant issues, as the power of those frames is too small to be perceived.
GMM-based VC: 27
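• A minimal sketch of this feature preparation, assuming a simple frame-difference delta (sprocket's delta window may differ); the threshold comes from the speaker YML file:
import numpy as np

def delta(x):
    # first-order dynamic features as simple frame-to-frame differences
    d = np.zeros_like(x)
    d[1:] = x[1:] - x[:-1]
    return d

def dtw_feature(mcep, npow, power_threshold=-30.0):
    mc = mcep[:, 1:]                     # remove the 0th (power) coefficient
    feat = np.hstack([mc, delta(mc)])    # joint static + dynamic features
    return feat[npow > power_threshold]  # discard silence frames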
DTW Process
• Time warping function is determined by minimizing a distance measure 
between aligned feature sequences.
• Joint feature sequence is generated by concatenating the source and target mcep features at each aligned frame.
[Diagram: time warping function aligning the source mcep feature seq with the target mcep feature seq; each joint feature frame = source part + target part]
GMM-based VC: 28
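• A minimal DTW sketch using a Euclidean local distance; sprocket ships its own DTW implementation, so treat this as illustrative only:
import numpy as np

def dtw(src, tgt):
    T, U = len(src), len(tgt)
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=2)
    cost = np.full((T, U), np.inf)
    cost[0, 0] = dist[0, 0]
    for t in range(T):
        for u in range(U):
            if t == 0 and u == 0:
                continue
            prev = min(cost[t - 1, u] if t > 0 else np.inf,
                       cost[t, u - 1] if u > 0 else np.inf,
                       cost[t - 1, u - 1] if t > 0 and u > 0 else np.inf)
            cost[t, u] = dist[t, u] + prev  # accumulate minimal path cost
    path, (t, u) = [], (T - 1, U - 1)
    while (t, u) != (0, 0):                 # backtrack: time warping function
        path.append((t, u))
        moves = [(t - 1, u - 1), (t - 1, u), (t, u - 1)]
        t, u = min((m for m in moves if m[0] >= 0 and m[1] >= 0),
                   key=lambda m: cost[m])
    path.append((0, 0))
    return path[::-1]

def joint_features(src, tgt, twf):
    # concatenate source & target features at each aligned frame
    return np.array([np.concatenate([src[t], tgt[u]]) for t, u in twf])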
GMM-based VC: 29
Iterative Time‐Alignment Refinement 
• Source code: src/estimate_twf_and_jnt.py
[Diagram as above, with the iterative processing loop highlighted]
Iterative DTW Process
• Time alignment determined by using source & target feature seqs suffers 
from acoustic differences between source & target voices.
• To improve accuracy of time alignment, iterative DTW process is usually 
used for refining the time warping functions [Abe; ’90]. 
[Diagram: joint mcep feature seqs → GMM for mcep conversion → converted mcep seqs & feature seqs, which share the source's time structure but are acoustically more similar to the target; these are used for determining the time warping functions in the next iteration, and the refined time warping functions are used for developing the joint features]
GMM-based VC: 30
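• A conceptual sketch of this refinement loop (n_iter in the pair YML); dtw and joint_features come from the sketches above, while train_joint_gmm and convert are placeholders for sprocket's internals:
def iterative_alignment(src_feats, tgt_feats, n_iter=3):
    # initial alignment directly between source & target features
    twfs = [dtw(s, t) for s, t in zip(src_feats, tgt_feats)]
    for _ in range(n_iter):
        joint = [joint_features(s, t, w)
                 for s, t, w in zip(src_feats, tgt_feats, twfs)]
        gmm = train_joint_gmm(joint)                      # placeholder
        converted = [convert(gmm, s) for s in src_feats]  # placeholder
        # re-align: converted features are acoustically closer to the target
        twfs = [dtw(c, t) for c, t in zip(converted, tgt_feats)]
    return twfs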
GMM Training & Conversion Process
• Joint GMM training [Kain; ’98]
• Joint probability density function (p.d.f.) of the source & target mcep features 
is modeled by a joint GMM.
• Trajectory‐based conversion [Toda; ’07]
• The source mcep seq is converted into the target one by maximum likelihood 
parameter generation using a conditional p.d.f. derived from the joint GMM.
[Diagram: joint mcep feature seqs → joint GMM (maximum likelihood estimation using the EM algorithm); conversion: source mcep seq → remove the 0th coefficient → append dynamic features → conditional p.d.f. of the joint GMM → converted mcep seq w/o the 0th coefficient]
GMM-based VC: 31
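• A per-frame sketch of the conversion mapping, using the simpler conditional-mean (MMSE) form E[y|x] rather than sprocket's trajectory-based MLPG (cvtype: mlpg); the conditional p.d.f. pieces are the same:
import numpy as np

def convert_frame(x, weights, means, covs):
    # means[k] = [mu_x; mu_y], covs[k] = [[Sxx, Sxy], [Syx, Syy]]
    D = len(x)
    post, cond_means = [], []
    for w, mu, S in zip(weights, means, covs):
        mu_x, mu_y = mu[:D], mu[D:]
        Sxx, Syx = S[:D, :D], S[D:, :D]
        diff = x - mu_x
        # responsibility of component k given x (unnormalized Gaussian)
        g = w * np.exp(-0.5 * diff @ np.linalg.solve(Sxx, diff)) \
            / np.sqrt(np.linalg.det(2 * np.pi * Sxx))
        post.append(g)
        # conditional mean: mu_y + Syx Sxx^{-1} (x - mu_x)
        cond_means.append(mu_y + Syx @ np.linalg.solve(Sxx, diff))
    post = np.array(post) / np.sum(post)
    return sum(p * m for p, m in zip(post, cond_means))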
GMM-based VC: 32
• Source code: src/estimate_twf_and_jnt.py
DTW for codeap
[Diagram as above, with the codeap path highlighted: the time warping functions estimated from the mcep feature seqs are also applied to the codeap feature seqs to develop the joint codeap feature seqs]
Procedure: Training Step
[Flowchart as above, with step 4 (model training) highlighted]
GMM-based VC: 33
Training 4: Model Training
• Develop conversion models by executing
$  python3  run_sprocket.py  ‐4  SourceSpeaker TargetSpeaker
e.g.,
$  python3  run_sprocket.py  ‐4  SF1  TF1
• The following message will be printed out.
### 4. Train GMM and converted GV ###
:
• Finally, conversion model files will be generated under data/pair/ directory.
data/pair/SF1‐TF1/model/GMM_mcep.pkl # GMM PKL file for mcep
data/pair/SF1‐TF1/model/GMM_codeap.pkl # GMM PKL file for codeap
data/pair/SF1‐TF1/model/cvgv.h5 # GV postfilter HDF5 file
GMM-based VC: 34
GMM Training & GV Postfilter Calculation
• Source code: src/train_GMM.py
• Joint GMM training [Kain; '98]
• Joint GMMs for mcep features and for codeap features are trained separately.
• GV calculation [Toda; '07][Toda; '12]
• Statistics of converted mcep seqs are calculated for the GV postfilter.
[Diagram: joint feature HDF5 file (mcep, codeap) → joint mcep feature seqs → joint GMM for mcep → GMM PKL file (mcep); joint codeap feature seqs → joint GMM for codeap → GMM PKL file (codeap); parameter HDF5 files → mcep seqs → converted mcep seqs (via the joint GMM for mcep) → mean & variance vectors → GV statistics HDF5 file (cvgv)]
GMM-based VC: 35
Procedure: Conversion Step
[Flowchart as above, with step 5 (conversion) highlighted]
GMM-based VC: 36
Conversion: Converted Speech Generation
• Perform voice conversion by executing  
$  python3  run_sprocket.py  ‐5  SourceSpeaker TargetSpeaker
e.g.,
$  python3  run_sprocket.py  ‐5  SF1  TF1
• The following message will be printed out.
### 5. Conversion based on the trained models ###
GMM for mcep conversion mode: None
data/pair/SF1‐TF1/test/SF1/30001_VC.wav
:
• Converted wav files will be generated utterance by utterance under the data/pair/ directory.
data/pair/SF1‐TF1/test/SF1/300*_VC.wav # converted wav files by VC
data/pair/SF1‐TF1/test/SF1/300*_DIFFVC.wav # converted wav files by DIFFVC
GMM-based VC: 37
Converted Speech Generation by VC
• Source code: src/convert.py
• WORLD [Morise; '16] is used as the speech analysis-synthesis method.
[Diagram: source wav file → speech waveform → f0 seq, mcep seq, ap seq; f0 seq → linear transformation of log-scaled F0 seq (statistics HDF5 file: f0stats) → converted F0 seq; mcep seq → trajectory-based conversion (GMM PKL file for mcep) → converted mcep (cvmcep) seq → GV postfiltering (GV statistics HDF5 file: cvgv) → GV postfiltered cvmcep seq → power adjusted cvmcep seq (w/o power conversion); ap seq is used w/o conversion → converted waveform → converted wav file (VC)]
GMM-based VC: 38
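• A minimal sketch of the F0 conversion above: a linear transformation of log-scaled F0 using the speaker-dependent statistics from training step 2 (means & standard deviations of log F0 over voiced frames):
import numpy as np

def convert_f0(f0, src_mean, src_std, tgt_mean, tgt_std):
    cvf0 = np.zeros_like(f0)
    voiced = f0 > 0                      # unvoiced frames (f0 = 0) unchanged
    lf0 = np.log(f0[voiced])
    cvf0[voiced] = np.exp(tgt_std / src_std * (lf0 - src_mean) + tgt_mean)
    return cvf0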
Already Developed 
Vocoder‐Free VC based on 
DIFFGMM as well!
[Kobayashi; '18b]
K. Kobayashi, T. Toda, S. Nakamura,
“Intra‐gender statistical singing voice conversion with direct waveform modification 
using log‐spectral differential,”
Speech Communication, Vol. 99, pp. 211—220, 2018.
https://doi.org/10.1016/j.specom.2018.03.011
DIFFGMM-based VC
Converted Speech Generation by DIFFVC
• Source code: src/convert.py
• MLSA filter [Tokuda; '94] is used to directly convert the source waveform.
[Diagram: source wav file → speech waveform → f0 seq & mcep seq; mcep seq → trajectory-based conversion (DIFFGMM for mcep, derived from the GMM PKL file) → differential mcep (diffmcep) seq → GV postfiltering (statistics HDF5 file: gv; GV statistics HDF5 file: cvgv) → GV postfiltered diffmcep seq → power adjusted diffmcep seq (w/o power conversion) → MLSA filtering of the source waveform → converted waveform → converted wav file (DIFFVC)]
*NOTE: DIFFVC can generate much higher-quality converted speech than VC, but F0 is NOT converted.
Thus, DIFFVC is very effective for a speaker pair with a similar F0 range, e.g., in same-gender conversion.
DIFFGMM-based VC: 1
Let’s Listen to Converted Speech Samples!
• Converted wav files by VC
data/pair/SF1‐TF1/test/SF1/30001_VC.wav
data/pair/SF1‐TF1/test/SF1/30002_VC.wav
:
• Converted wav files by DIFFVC
data/pair/SF1‐TF1/test/SF1/30001_DIFFVC.wav
data/pair/SF1‐TF1/test/SF1/30002_DIFFVC.wav
:
• Original source wav files
data/wav/SF1/30001.wav
data/wav/SF1/30002.wav
:
• Original target wav files
data/wav/TF1/30001.wav
data/wav/TF1/30002.wav
:
DIFFGMM-based VC: 2
Let’s Develop Vocoder‐Free 
VC based on DIFFGMM
with F0 Modification!
[Kobayashi; ’16]
K. Kobayashi, T. Toda, S. Nakamura,
“F0 transformation techniques for statistical voice conversion with direct waveform 
modification with spectral differential,”
Proc. IEEE SLT, pp. 693—700, Dec. 2016.
DIFFGMM-based VC w/ F0 transformation
DIFFGMM-based VC w/ F0 transformation: 1
Overall Procedure
[Flowchart: source voices → 1. F0 transformed source voice generation → F0 transformed source voices; these and the target voices are used as a new parallel dataset, to which the same procedure is applied: 0. parallel data preparation & parameter configurations (F0 & power histograms → parameter configurations), 1. speech analysis, 2. statistics calculation, 3. joint feature development, 4. model training, 5. conversion]
Procedure: Dataset Generation Step
[Flowchart as above]
DIFFGMM-based VC w/ F0 transformation: 2
Initialization Steps
• Select source and target speakers for cross-gender conversion!
• Source speakers: females = SF1 & SF2,  males = SM1 & SM2
• Target speakers: females = TF1 & TF2,  males = TM1 & TM2 
• Generate list files for your selected speaker‐pair,                                              
e.g., if setting source = SF2, target = TM2, & sampling = 22.05 kHz,
$  python3  initialize.py  ‐1  SF2  TM2  22050
• Modify list files to select training and evaluation utterance pairs
list/{SF2,TM2}_train.list # keep only utterances 10001 – 10060
list/{SF2,TM2}_eval.list # keep only utterances 30001 – 30020
• Generate configure files for your selected speaker‐pair, e.g.,
$  python3  initialize.py  ‐2  SF2  TM2  22050
• Perform speech analysis for your selected speaker‐pair, e.g.,
$  python3  initialize.py  ‐3  SF2  TM2  22050
• Modify speaker‐dependent YML files based on histograms.
conf/speaker/{SF2,TM2}.yml # revise minf0, maxf0, threshold values
DIFFGMM-based VC w/ F0 transformation: 3
Example of Histograms
Source speaker SF2 (conf/figure/SF2_f0histogram.png): min 120 Hz, max 340 Hz
Target speaker TM2 (conf/figure/TM2_f0histogram.png): min 60 Hz, max 270 Hz
Normalized power (conf/figure/SF2_npowhistogram.png & conf/figure/TM2_npowhistogram.png): threshold –30 dB for both speakers
DIFFGMM-based VC w/ F0 transformation: 4
Procedure: Dataset Generation Step
[Flowchart as above, with step 1 (F0 transformed source voice generation) highlighted]
DIFFGMM-based VC w/ F0 transformation: 5
F0 Transformed Waveform Generation
• Perform F0 transformation based on waveform modification by executing
$  python3  run_f0_transformation.py  SourceSpeaker TargetSpeaker
e.g.,
$  python3  run_f0_transformation.py  SF2  TM2
• The following message will be printed out.
### 1. F0 transformation of original waveform ###
Extract F0: data/wav/SF2/10001.wav
:
• Finally, F0 transformed wav files will be generated under data/wav/ 
directory as a new source speaker, SourceSpeaker_F0TransformationRatio.
data/wav/SF2_0.73/*.wav # F0 transformation ratio = 0.73
• Let's check the F0 transformed source speaker's voices by listening to some wav files.  Note that not only F0 but also voice quality is transformed.
DIFFGMM-based VC w/ F0 transformation: 6
F0 Transformation Process 
• Source code: src/f0_transformation.py
• A constant F0 transformation ratio, calculated from the source & target mean F0 values over the training data, is applied to all source speech wav files.
[Diagram: training data: source wav files → source f0 seqs, target wav files → target f0 seqs → F0 transformation ratio → F0 transformed source wav files; evaluation data: source wav files → (same ratio) → F0 transformed source wav files]
DIFFGMM-based VC w/ F0 transformation: 7
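• A minimal sketch of the ratio computation, assuming mean F0 values over the voiced training frames of each speaker (sprocket's exact rounding/averaging may differ):
import numpy as np

def f0_transformation_ratio(src_f0s, tgt_f0s):
    src_mean = np.mean(np.concatenate([f[f > 0] for f in src_f0s]))
    tgt_mean = np.mean(np.concatenate([f[f > 0] for f in tgt_f0s]))
    return tgt_mean / src_mean   # e.g., about 0.73 for SF2 -> TM2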
F0 Transformed Waveform Generation
• Duration conversion w/ WSOLA [Verhelst; '93] and waveform resampling are used to generate the F0 transformed waveform.
e.g., if setting F0 transformation ratio to 2 (i.e., 100 Hz to 200 Hz),
1.  Make duration of input waveform double w/ WSOLA while keeping F0 values
2.  Resample the modified waveform to make its duration half
[Diagram: input waveform → (1.1 extract frames by windowing, 1.2 find the best concatenation point, 1.3 overlap and add) → duration modified waveform → (2. deletion or downsampling) → F0 modified waveform]
DIFFGMM-based VC w/ F0 transformation: 8
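• A conceptual sketch of the two steps above; wsola_stretch is a placeholder for the WSOLA step (sprocket implements WSOLA itself), and scipy is used for the resampling step:
from scipy.signal import resample

def transform_f0(x, f0rate):
    # 1. stretch duration by a factor of f0rate w/ WSOLA, keeping F0 values
    stretched = wsola_stretch(x, duration_ratio=f0rate)  # placeholder
    # 2. resample back to the original length: duration is restored while
    #    all frequencies, including F0, are scaled by f0rate
    return resample(stretched, len(x))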
DIFFGMM-based VC w/ F0 transformation: 9
Procedure: Parallel VC Steps
[Flowchart as above; the same procedure is applied to the new parallel dataset: 0. initialization (parameter configurations), 1. training (speech analysis), 2. training (statistics calculation), 3. training (joint feature development), 4. training (model training), 5. conversion]
Initialization for F0 Transformed Speaker
• Perform initialization steps by setting the F0 transformed source speaker to 
a new source speaker
• Generate list files for a new speaker‐pair, e.g.,
$  python3  initialize.py  ‐1  SF2_0.73  TM2  22050
• Modify list files to select training and evaluation utterance pairs
list/SF2_0.73_train.list # keep only utterances 10001 – 10060
list/SF2_0.73_eval.list # keep only utterances 30001 – 30020
• Generate configure files for the new speaker‐pair, e.g.,
$  python3  initialize.py  ‐2  SF2_0.73  TM2  22050
• Perform speech analysis for the new speaker‐pair, e.g.,
$  python3  initialize.py  ‐3  SF2_0.73  TM2  22050
• Modify a speaker‐dependent YML file based on histograms.
conf/speaker/SF2_0.73.yml # revise minf0, maxf0, threshold values
conf/pair/SF2_0.73‐TM2.yml # set n_mix for mcep to 16
DIFFGMM-based VC w/ F0 transformation: 10
Training & Conversion Steps
• Perform training and conversion steps for converting the F0 transformed 
source speaker into the target speaker, e.g., by executing
$  python3  run_sprocket.py  ‐1  ‐2  ‐3  ‐4  ‐5  SF2_0.73  TM2
• Finally, the converted voices will be generated under data/pair/ directory.
data/pair/SF2_0.73-TM2/test/SF2_0.73/300*_DIFFVC.wav # converted wav files
NOTE: converted wav files by VC (*_VC.wav) will also be generated but they can 
be ignored…
• NOTE: it is also possible to perform only the conversion step, e.g.,
• List data to be converted in list/{SF2,TM2}_eval.list
• Generate F0 transformed source wav files given the F0 transformation ratio:
$  python3  run_f0_transformation.py  ‐‐ev ‐‐f0rate  0.73  SF2_0.73  TM2
• Generate converted wav files:
$  python3  run_sprocket.py  ‐5  SF2_0.73  SF2  TM2
DIFFGMM-based VC w/ F0 transformation: 11
How to Develop
VCC2018 Baseline System?
VCC2018 Baseline
Reproduce Baseline Results of VCC2018!
• You can develop a baseline system of VCC2018 Hub task [Kobayashi; ’18a] by 
using sprocket!
[Figure: Results of VCC2018 [Lorenzo-Trueba; '18a] — MOS on naturalness (1–5) vs. similarity score (0–100%), with the sprocket baseline marked]
• Hub task: parallel training task
• Source: 2 female & 2 male speakers
• Target: 2 female & 2 male speakers
• 81 utterances for training
• 35 utterances for evaluation
• Baseline system development
• DIFFVC w/o F0 transformation            
for the same‐gender pairs
MOS ≅ 4.0, similarity ≅ 70%
• VC for the cross‐gender pairs
MOS ≅ 3.0, similarity ≅ 70%
• In total, MOS ≅ 3.5, similarity ≅ 70%
VCC2018 Baseline: 1
Download VCC2018 Dataset
• Automatically set up wav files of the VCC2018 datasets [Lorenzo-Trueba; '18b] by
executing a download script (download_speech_corpus.py) as follows:
$  python3  download_speech_corpus.py  downloader_conf/vcc2018.yml
• The following files will be generated.
data/wav/VCC2{SF1, SF2, SM1, SM2}/ # 4 source speakers for parallel data
data/wav/VCC2{TF1, TF2, TM1, TM2}/ # 4 target speakers for parallel data
10001.wav – 10081.wav # 81 utterances for training
30001.wav – 30035.wav # 35 utterances for evaluation
These files will be used in the baseline system development.
On the other hand, the following files will NOT be used. 
data/wav/VCC2{SF3, SF4, SM3, SM4}/ # 4 source speakers for SPOKE task
VCC2018 Baseline: 2
Develop Baseline System
• Just execute initialize.py & run_sprocket.py for each speaker‐pair.
• Use 81 training utterances and 35 evaluation utterances
• Use default settings of the pair‐dependent YML file (32 mixture components)
• May use the following configurations in speaker‐dependent YML files:
• Use DIFFVC.wav for same‐gender pairs, and VC.wav for cross‐gender pairs
Speaker | Minimum F0 (minf0) [Hz] | Maximum F0 (maxf0) [Hz] | Power threshold [dB]
Source speakers:
VCC2SF1 | 100 | 450 | –31
VCC2SF2 | 110 | 350 | –31
VCC2SM1 | 50 | 200 | –31
VCC2SM2 | 70 | 300 | –40
Target speakers:
VCC2TF1 | 140 | 350 | –45
VCC2TF2 | 100 | 400 | –30
VCC2TM1 | 60 | 200 | –23
VCC2TM2 | 50 | 280 | –31
[Kobayashi; ’18a]
VCC2018 Baseline: 3
Let’s Learn Tips on
VC System Development!
Tips
Tips 1: Target Voice Recording
• If you want to develop a good VC system, use a good parallel dataset!    
How can we develop such a parallel dataset?
• Target voices should be high‐quality!
• Desirable to record them in a high‐quality sound environment
• Quality of target waveforms directly affects that of converted voices.
• The use of noisy target waveforms will generate noisy converted voices.
• Target voices should have desired voice characteristics!
• Not only speaker identity but also a speaking style strongly affects voice 
characteristics.
• If the target speaker's speaking style is special (e.g., a specific character's voice), it is often more useful to record special utterances suited to that style than controlled ones (e.g., phonetically balanced sentences).
Tips: 1
Tips 2: Source Voice Recording
• How about source voices?
• Minimize acoustic mismatches between training and conversion!
• Acoustic mismatches easily cause quality degradation of converted voices.
• It would be better to record source voices in the same environment as used in 
the VC system (e.g., if using the VC system in your room, it would be better to 
record source voices there).
• Ask a source speaker to imitate the target speaker’s speaking style!
• Only some of the speech parameters are converted in some VC techniques.
• It is better to ask the source speaker to imitate the target speaker's voices by controlling prosody, such as duration and F0 pattern.  It is fine to use a special way of speaking (e.g., falsetto) to do so.  If F0 transformation then becomes unnecessary, DIFFVC will be available!
• In recording, it will be helpful for the source speaker to listen to the target 
voice sample just before uttering the corresponding utterance.
Tips: 2
Tips 3: Parameter Adjustment
• Which parameter can be adjusted?
• Training step
• The number of mixture components for mcep conversion needs to be changed according to the amount of training data: a larger number of mixture components is effective for improving the converted speech quality, but easily suffers from over-fitting.
• You may try values such as 8, 16, 32, 64, and 128.
• Conversion step
• The GV postfilter is effective for significantly improving the converted speech quality, but it also tends to cause artifact sounds.  These artifacts are alleviated by setting the GV postfilter parameter (a value from 0 to 1) in the pair-dependent YML file to a smaller value, as follows:
GV:
morph_coeff: 0.7 # from 0 (no effect) to 1 (full effect)
• There is a tradeoff between the converted voice quality and the artifacts.
Tips: 3
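• A sketch of GV postfiltering with the morphing coefficient, assuming the standard formulation: scale each dimension's deviation from the utterance mean toward the target GV, with morph_coeff interpolating between no effect (0) and full effect (1); sprocket's exact implementation may differ:
import numpy as np

def gv_postfilter(cvmcep, gv_target, morph_coeff=0.7):
    mean = cvmcep.mean(axis=0)    # per-dimension utterance mean
    gv_conv = cvmcep.var(axis=0)  # GV of the converted trajectory
    scale = (gv_target / gv_conv) ** (0.5 * morph_coeff)
    return mean + scale * (cvmcep - mean)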
That’s all!
Acknowledgement:
I am grateful to Dr. Kazuhiro Kobayashi of Nagoya 
University, Japan, for the development of sprocket.
I hope now you can start
your own VC research and
VC system development!
[Abe; ’90]  M. Abe, S. Nakamura, K. Shikano, H. Kuwabara.  Voice conversion through vector quantization.  J. 
Acoust. Soc. Jpn (E), Vol. 11, No. 2, pp. 71–76, 1990.
[Kain; ’98]  A. Kain, M.W. Macon.  Spectral voice conversion for text‐to‐speech synthesis.  Proc. IEEE ICASSP, 
pp. 285–288, 1998.
[Kobayashi; ’16]  K. Kobayashi, T. Toda, S. Nakamura.  F0 transformation techniques for statistical voice 
conversion with direct waveform modification with spectral differential.  Proc. IEEE SLT, pp. 693–700, 2016.
[Kobayashi; ’18a]  K. Kobayashi, T. Toda.  sprocket: open‐source voice conversion software.  Proc. Odyssey, 
pp. 203–210, 2018.
[Kobayashi; ’18b]  K. Kobayashi, T. Toda, S. Nakamura.  Intra‐gender statistical singing voice conversion with 
direct waveform modification using log‐spectral differential.  Speech Commun., Vol. 99, pp. 211–220, 2018.
[Lorenzo‐Trueba; ’18a]  J. Lorenzo‐Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, Z. Ling.  
The voice conversion challenge 2018: promoting development of parallel and nonparallel methods.  Proc. 
Odyssey, pp. 195–202, 2018. 
[Lorenzo‐Trueba; ’18b]  J. Lorenzo‐Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, Z. Ling.  
The Voice Conversion Challenge 2018: database and results.  The Centre for Speech Technology Research, 
The University of Edinburgh, UK, 2018. < http://dx.doi.org/10.7488/ds/2337 >
[Morise; ’16]  M. Morise, F. Yokomori, K. Ozawa.  WORLD: a vocoder‐based high‐quality speech synthesis 
system for real‐time applications. IEICE Trans. Inf. & Syst., Vol. E99‐D, No. 7, pp. 1877–1884, 2016.
[Toda; ’07]  T. Toda, A.W. Black, K. Tokuda.  Voice conversion based on maximum likelihood estimation of 
spectral parameter trajectory.  IEEE Trans. Audio, Speech & Lang. Process., Vol. 15, No. 8, pp. 2222–2235, 
2007.
References
References: 1
[Toda; ’12]  T. Toda, T. Muramatsu, H. Banno.  Implementation of computationally efficient real‐time voice 
conversion.  Proc. INTERSPEECH, 4 pages, 2012.
[Toda; ’16]  T. Toda, L.‐H. Chen, D. Saito, F. Villavicencio, M. Wester, Z. Wu, J. Yamagishi.  The Voice 
Conversion Challenge 2016.  University of Edinburgh, School of Informatics, Centre for Speech Technology 
Research, 2016. < http://dx.doi.org/10.7488/ds/1430 >
[Tokuda; ’94]  K. Tokuda, T. Kobayashi, T. Masuko, S. Imai.   Mel‐generalized cepstral analysis – a unified 
approach to speech spectral estimation.   Proc. ICSLP, pp. 1043–1045, 1994.
[Verhelst; ’93]  W. Verhelst, M. Roelands.  An overlap‐add technique based on waveform similarity (WSOLA) 
for high quality time‐scale modification of speech.  Proc. IEEE ICASSP, Vol. 2, pp. 554–557, 1993.
References: 2
Weakly-Supervised Sound Event Detection with Self-AttentionWeakly-Supervised Sound Event Detection with Self-Attention
Weakly-Supervised Sound Event Detection with Self-Attention
 
音素事後確率を利用した表現学習に基づく発話感情認識
音素事後確率を利用した表現学習に基づく発話感情認識音素事後確率を利用した表現学習に基づく発話感情認識
音素事後確率を利用した表現学習に基づく発話感情認識
 
空気/体内伝導マイクロフォンを用いた雑音環境下における自己発声音強調/抑圧法
空気/体内伝導マイクロフォンを用いた雑音環境下における自己発声音強調/抑圧法空気/体内伝導マイクロフォンを用いた雑音環境下における自己発声音強調/抑圧法
空気/体内伝導マイクロフォンを用いた雑音環境下における自己発声音強調/抑圧法
 
Deep Neural Networkに基づく日常生活行動認識における適応手法
Deep Neural Networkに基づく日常生活行動認識における適応手法Deep Neural Networkに基づく日常生活行動認識における適応手法
Deep Neural Networkに基づく日常生活行動認識における適応手法
 
CTCに基づく音響イベントからの擬音語表現への変換
CTCに基づく音響イベントからの擬音語表現への変換CTCに基づく音響イベントからの擬音語表現への変換
CTCに基づく音響イベントからの擬音語表現への変換
 
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
 
喉頭摘出者のための歌唱支援を目指した電気音声変換法
喉頭摘出者のための歌唱支援を目指した電気音声変換法喉頭摘出者のための歌唱支援を目指した電気音声変換法
喉頭摘出者のための歌唱支援を目指した電気音声変換法
 
実環境下におけるサイレント音声通話の実現に向けた雑音環境変動に頑健な非可聴つぶやき強調
実環境下におけるサイレント音声通話の実現に向けた雑音環境変動に頑健な非可聴つぶやき強調実環境下におけるサイレント音声通話の実現に向けた雑音環境変動に頑健な非可聴つぶやき強調
実環境下におけるサイレント音声通話の実現に向けた雑音環境変動に頑健な非可聴つぶやき強調
 
ケプストラム正則化NTFによるステレオチャネル楽曲音源分離
ケプストラム正則化NTFによるステレオチャネル楽曲音源分離ケプストラム正則化NTFによるステレオチャネル楽曲音源分離
ケプストラム正則化NTFによるステレオチャネル楽曲音源分離
 
音声信号の分析と加工 - 音声を自在に変換するには?
音声信号の分析と加工 - 音声を自在に変換するには?音声信号の分析と加工 - 音声を自在に変換するには?
音声信号の分析と加工 - 音声を自在に変換するには?
 

Recently uploaded

Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Sean Meyn
 
Mohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxMohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxKISHAN KUMAR
 
Tachyon 100G PCB Performance Attributes and Applications
Tachyon 100G PCB Performance Attributes and ApplicationsTachyon 100G PCB Performance Attributes and Applications
Tachyon 100G PCB Performance Attributes and ApplicationsEpec Engineered Technologies
 
Vertical- Machining - Center - VMC -LMW-Machine-Tool-Division.pptx
Vertical- Machining - Center - VMC -LMW-Machine-Tool-Division.pptxVertical- Machining - Center - VMC -LMW-Machine-Tool-Division.pptx
Vertical- Machining - Center - VMC -LMW-Machine-Tool-Division.pptxLMW Machine Tool Division
 
Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
Engineering Mechanics  Chapter 5  Equilibrium of a Rigid BodyEngineering Mechanics  Chapter 5  Equilibrium of a Rigid Body
Engineering Mechanics Chapter 5 Equilibrium of a Rigid BodyAhmadHajasad2
 
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS Bahzad5
 
Basic Principle of Electrochemical Sensor
Basic Principle of  Electrochemical SensorBasic Principle of  Electrochemical Sensor
Basic Principle of Electrochemical SensorTanvir Moin
 
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...soginsider
 
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....santhyamuthu1
 
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docx
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docxSUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docx
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docxNaveenVerma126
 
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdfSummer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdfNaveenVerma126
 
UNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptxUNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptxrealme6igamerr
 
News web APP using NEWS API for web platform to enhancing user experience
News web APP using NEWS API for web platform to enhancing user experienceNews web APP using NEWS API for web platform to enhancing user experience
News web APP using NEWS API for web platform to enhancing user experienceAkashJha84
 
me3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Ame3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Akarthi keyan
 
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchrohitcse52
 
How to Write a Good Scientific Paper.pdf
How to Write a Good Scientific Paper.pdfHow to Write a Good Scientific Paper.pdf
How to Write a Good Scientific Paper.pdfRedhwan Qasem Shaddad
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxSAJITHABANUS
 
Carbohydrates principles of biochemistry
Carbohydrates principles of biochemistryCarbohydrates principles of biochemistry
Carbohydrates principles of biochemistryKomakeTature
 

Recently uploaded (20)

Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
 
Mohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxMohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptx
 
Tachyon 100G PCB Performance Attributes and Applications
Tachyon 100G PCB Performance Attributes and ApplicationsTachyon 100G PCB Performance Attributes and Applications
Tachyon 100G PCB Performance Attributes and Applications
 
Vertical- Machining - Center - VMC -LMW-Machine-Tool-Division.pptx
Vertical- Machining - Center - VMC -LMW-Machine-Tool-Division.pptxVertical- Machining - Center - VMC -LMW-Machine-Tool-Division.pptx
Vertical- Machining - Center - VMC -LMW-Machine-Tool-Division.pptx
 
Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
Engineering Mechanics  Chapter 5  Equilibrium of a Rigid BodyEngineering Mechanics  Chapter 5  Equilibrium of a Rigid Body
Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
 
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
 
Lecture 2 .pdf
Lecture 2                           .pdfLecture 2                           .pdf
Lecture 2 .pdf
 
Basic Principle of Electrochemical Sensor
Basic Principle of  Electrochemical SensorBasic Principle of  Electrochemical Sensor
Basic Principle of Electrochemical Sensor
 
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
 
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
 
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docx
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docxSUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docx
SUMMER TRAINING REPORT ON BUILDING CONSTRUCTION.docx
 
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdfSummer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
 
UNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptxUNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptx
 
News web APP using NEWS API for web platform to enhancing user experience
News web APP using NEWS API for web platform to enhancing user experienceNews web APP using NEWS API for web platform to enhancing user experience
News web APP using NEWS API for web platform to enhancing user experience
 
Présentation IIRB 2024 Chloe Dufrane.pdf
Présentation IIRB 2024 Chloe Dufrane.pdfPrésentation IIRB 2024 Chloe Dufrane.pdf
Présentation IIRB 2024 Chloe Dufrane.pdf
 
me3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Ame3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part A
 
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai searchChatGPT-and-Generative-AI-Landscape Working of generative ai search
ChatGPT-and-Generative-AI-Landscape Working of generative ai search
 
How to Write a Good Scientific Paper.pdf
How to Write a Good Scientific Paper.pdfHow to Write a Good Scientific Paper.pdf
How to Write a Good Scientific Paper.pdf
 
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptxIT3401-WEB ESSENTIALS PRESENTATIONS.pptx
IT3401-WEB ESSENTIALS PRESENTATIONS.pptx
 
Carbohydrates principles of biochemistry
Carbohydrates principles of biochemistryCarbohydrates principles of biochemistry
Carbohydrates principles of biochemistry
 

Hands on Voice Conversion