29. References
[Stylianou, ‘96] Y. Stylianou, “Harmonic plus noise models for speech, combined with statistical methods,
for speech and speaker modification,” Ph.D. thesis, Ecole Nationale Superieure des Telecommunications,
1996.
[Kawahara+, ‘99] H. Kawahara et al., “Restructuring speech representations using a pitch-adaptive
time–frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a
repetitive structure in sounds,” Speech Communication, vol. 27, no. 3–4, pp. 187–207, 1999.
[Morise+, ‘16] M. Morise et al., “WORLD: A vocoder-based high-quality speech synthesis system for
real-time applications,” IEICE Transactions on Information and Systems, vol. E99-D, no. 7, pp. 1877–1884,
2016.
[Oord+, ‘16] A. van den Oord et al., “WaveNet: A Generative Model for Raw Audio,” arXiv preprint
arXiv:1609.03499, 2016.
[Mehri+, ‘17] S. Mehri et al., “SampleRNN: An Unconditional End-to-End Neural Audio Generation Model,”
in Proc. ICLR, 2017.
[Kalchbrenner+, ‘18] N. Kalchbrenner et al., “Efficient Neural Audio Synthesis,” in Proc. ICML, 2018.
[Valin+, ‘19] J.-M. Valin et al., “LPCNet: Improving Neural Speech Synthesis through Linear
Prediction,” in Proc. ICASSP, 2019.
[Prenger+, ‘19] R. Prenger et al., “WaveGlow: A Flow-Based Generative Network for
Speech Synthesis,” in Proc. ICASSP, 2019.
[Wang+, ‘19] X. Wang et al., “Neural Source-Filter-Based Waveform Model for Statistical
Parametric Speech Synthesis,” in Proc. ICASSP, 2019.
30. References
[Desai+, ‘10] S. Desai et al., “Spectral mapping using artificial neural networks for voice conversion,” IEEE
Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 954–964, 2010.
[Sun+, ‘16] L. Sun et al., “Phonetic posteriorgrams for many-to-one voice conversion without parallel data
training,” in Proc. ICME, 2016.
[Kaneko+, ‘16] T. Kaneko et al., “CycleGAN-VC: Parallel-Data-Free Voice Conversion Using
Cycle-Consistent Adversarial Networks,” in Proc. EUSIPCO, 2018.
[Saito+, ‘18] Y. Saito et al., “Non-parallel voice conversion using variational autoencoders conditioned by
phonetic posteriorgrams and d-vectors,” in Proc. ICASSP, 2018.
[Kameoka+, ‘18] H. Kameoka et al., “StarGAN-VC: Non-parallel many-to-many voice conversion with star
generative adversarial networks,” arXiv preprint arXiv:1806.02169, 2018.
[Miyoshi+, ‘17] H. Miyoshi et al., “Voice Conversion Using Sequence-to-Sequence Learning of Context
Posterior Probabilities,” in Proc. INTERSPEECH, 2017.
[Polyak+, ‘19] A. Polyak et al., “Attention-Based WaveNet Autoencoder for Universal Voice Conversion,” in
Proc. ICASSP, 2019.
[Tobing+, ‘19] P. Lumban Tobing et al., “Voice Conversion with Cyclic Recurrent Neural
Network and Fine-Tuned WaveNet Vocoder,” in Proc. ICASSP, 2019.
[Tanaka+, ‘19] K. Tanaka et al., “ATTS2S-VC: Sequence-to-Sequence Voice Conversion with
Attention and Context Preservation Mechanisms,” in Proc. ICASSP, 2019.
[Zhang+, ‘19] J. Zhang et al., “Improving Sequence-to-Sequence Voice Conversion by
Adding Text-Supervision,” in Proc. ICASSP, 2019.