DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
IMPROVING VOICE SEPARATION BY INCORPORATING
END-TO-END SPEECH RECOGNITION
Hiroshi Sekiguchi, Morikawa Lab
Bibliographic information
• “IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION”,
Naoya Takahashi (1, 2), Mayank Kumar Singh (3), Sakya Basak (4), Parthasaarathy Sudarsanam (5), Sriram Ganapathy (4), Yuki Mitsufuji (1)
(1) Sony Corporation, Japan; (2) University of Tsukuba, Japan; (3) Indian Institute of Technology Bombay, India; (4) Indian Institute of Science, India; (5) Sony India Software Centre, India
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 41-45, doi: 10.1109/ICASSP40776.2020.9053845.
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 29, 2021
https://ieeexplore.ieee.org/document/9053845
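Overview
• Transfer an end-to-end speech recognition neural network (E2EASR), trained on large-scale speech data, to improve the performance of overlapped-speech separation
• The phonetic and linguistic information that E2EASR learned from speech data is effective for overlapped-speech separation
• Transfer learning from E2EASR can improve the downstream task even when only a small amount of training data is available for fine-tuning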
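Motivation
• Conventional overlapped-speech separation techniques rely on acoustic-signal-level features (for example, spectral information); the presenter was interested in what is gained by also using phonetic and language-level information
• The separation function of human hearing is thought to use phonetic and linguistic information to predict the semantic content that comes next, which made this paper interesting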
Agenda
• Background
• Proposed method
• Evaluation
• Summary
• Impressions
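Background-1
• The field of speech enhancement
– Overlapped-speech separation (voice separation)
– Singing voice separation
– Background noise removal
– Room dereverberation
• Conventional overlapped-speech separation methods → separate by processing acoustic-level features
– Spectral clustering
– Computational auditory scene analysis (a model of auditory object analysis)
– Non-negative matrix factorization (NMF)
– Deep learning
• Use cases that need better separation performance
– Scenes with severe noise
– Domains where only a small amount of speech data is available for training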
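Background-2
• New trend: use additional information on top of the acoustic level to improve noise-removal performance
– Combining lip movement (video information) → the system is expensive and suffers from the occlusion problem
– Using the phonetic and linguistic information of the target speech → the topic of this paper
Pre-trained speech recognition + transfer learning
→ improved separation performance in the overlapped-speech separation domain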
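Proposed method
• Transfer learning from a speech recognition neural network to speech separation
– Case where speech quality differs greatly between the source domain and the target domain
• Speech separation (target domain) uses speech recorded in uncontrolled environments (with background noise),
cf. speech recognition (source domain), which uses speech recorded in controlled environments (in a studio)
• Separation performance improves even in severe noise
– Case of speech separation in a domain with little training data, e.g., singing voice separation
• Transfer learning yields good separation performance even in a domain with little training data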
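[Figure: transferring the E2EASR network. Source domain: large-scale speech data → E2EASR (supervised learning) → words and sentences. Target domain: overlapped speech data → E2EASR + speech separation (supervised learning) → target speech.]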
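End-to-end speech recognition (E2EASR)
• E2EASR: ESPnet, using a hybrid CTC/attention-based E2E architecture
• Input: acoustic features
• Output: character sequence
• Deep features taken from the source network: the BLSTM encoder output h_t
– Phonetic and linguistic features
[Figure: E2EASR overview block diagram and E2EASR detailed block diagram; CTC stands for connectionist temporal classification]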
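For orientation, a hybrid CTC/attention architecture of the kind named above is commonly trained with a multi-task objective that interpolates the two losses. The slides do not give the objective or the weight, so the symbol \lambda below is only a placeholder and not a value from the paper.

```latex
\mathcal{L}_{\mathrm{E2EASR}}
  = \lambda \, \mathcal{L}_{\mathrm{CTC}}
  + (1 - \lambda) \, \mathcal{L}_{\mathrm{att}},
  \qquad 0 \le \lambda \le 1
```

Both terms share the same BLSTM encoder, which is why its output h_t is a natural place to read off the deep features.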
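Transfer learning to speech separation
• The speech separation network uses the deep-feature output of E2EASR
– Domain translation
• Aligns the timing and format of the E2EASR output with the output of the speech separation encoder
• 6 x 1-D Conv with 256 filters
• Speech separation uses the well-established Conv-TasNet:
– Loss function: scale-invariant SDR (SI-SDR)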
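The domain translation step can be pictured with a small NumPy sketch. Only the "6 x 1-D Conv with 256 filters" figure and the concatenation come from the slides; the kernel size of 3, the ReLU activations, the linear interpolation used to match frame rates, the random (untrained) weights, and every function name are assumptions made for illustration.

```python
import numpy as np

def resample_time(x, n_out):
    """Linearly interpolate x of shape (T_in, C) to (n_out, C) along time (assumed alignment)."""
    t_in = np.linspace(0.0, 1.0, x.shape[0])
    t_out = np.linspace(0.0, 1.0, n_out)
    return np.stack([np.interp(t_out, t_in, x[:, c]) for c in range(x.shape[1])], axis=1)

def conv1d_same(x, w, b):
    """1-D convolution over time with zero 'same' padding.
    x: (T, C_in), w: (K, C_in, C_out) with odd K, b: (C_out,)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return sum(xp[i:i + x.shape[0]] @ w[i] for i in range(k)) + b

def init_domain_translation(c_in, n_layers=6, n_filters=256, kernel=3, seed=0):
    """Random weights for the conv stack; a trained model would learn these."""
    rng = np.random.default_rng(seed)
    layers = []
    for _ in range(n_layers):
        w = rng.standard_normal((kernel, c_in, n_filters)) * np.sqrt(2.0 / (kernel * c_in))
        layers.append((w, np.zeros(n_filters)))
        c_in = n_filters
    return layers

def domain_translation(asr_feats, n_frames, layers):
    """Match the ASR features to the separator's frame count, then apply the conv stack."""
    x = resample_time(asr_feats, n_frames)
    for w, b in layers:
        x = np.maximum(conv1d_same(x, w, b), 0.0)  # ReLU is an assumption
    return x

def fuse(sep_feats, asr_feats, layers):
    """Concatenate separator encoder features with translated ASR features on the channel axis."""
    translated = domain_translation(asr_feats, sep_feats.shape[0], layers)
    return np.concatenate([sep_feats, translated], axis=1)
```

With hypothetical shapes of 50 ASR frames by 64 channels and 200 separator frames by 128 channels, `fuse` returns a 200 by 384 array: the separator features unchanged, followed by 256 translated channels.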
[Figure: speech separation (Conv-TasNet) using the transferred E2EASR; the domain translation outputs are concatenated (c) into the separation network]
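The scale-invariant SDR named as the loss can be sketched as follows. This is the commonly used definition (project the estimate onto the target, then compare projection energy with residual energy); the zero-mean normalization and the `eps` stabilizer are assumptions, and the slides do not say which exact variant the paper uses.

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Optimal scaling of the target that best explains the estimate.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)

def si_sdr_loss(estimate, target):
    """Negative SI-SDR, so that minimizing the loss maximizes separation quality."""
    return -si_sdr(estimate, target)
```

Because the estimate is compared against its own projection onto the target, rescaling the estimate leaves the value essentially unchanged, which is the property that gives the metric its name.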
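Transfer learning to speech separation
• When training the speech separation network:
① Feed clean speech into the trained E2EASR
② Treat the E2EASR feature output for the clean speech as the oracle output and feed it into the domain translation of the separation network
③ At the same time, feed the noisy speech data into the separation network
④ Use the SDR between the separator output and the clean speech as the loss function, and learn the separator's network weights by backpropagation
• When testing speech separation:
① No clean speech exists, only the overlapped speech, so separate the overlapped speech with another speech separation system to obtain a provisional estimate of the clean speech
② This other separation system is the Conv-TasNet separation block with the domain translation input set to zero → the deep features are sparse to begin with, so setting them to zero introduces little approximation error
③ Feed the output of that separation system into E2EASR to obtain the deep features that correspond to the provisionally separated speech
④ Feed the overlapped speech into the audio input of the separation network and obtain the separated speech
⑤ During training, the phonetic and linguistic features (deep features) from E2EASR come from oracle clean data, but at test time they come from pseudo-clean data that another separation system merely estimated, so the predicted separated speech contains errors → to close this gap, repeat deep-feature extraction with E2EASR and speech separation
[Figure: block diagrams of the training-time flow (steps ① to ④) and the test-time flow (steps ① to ⑤)]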
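The test-time steps above amount to a short loop. The sketch below is only a control-flow illustration: `separator` and `asr_encoder` are hypothetical callables standing in for the trained Conv-TasNet with domain translation and for the E2EASR encoder, and the iteration count `n_iters` is an assumption because the slides do not state how many repetitions are used.

```python
import numpy as np

def separate_iteratively(mixture, separator, asr_encoder, feat_shape, n_iters=2):
    """Iterate between separation and deep-feature extraction at test time.

    separator(mixture, feats) -> estimated clean speech   (hypothetical interface)
    asr_encoder(audio)        -> deep features of shape feat_shape (hypothetical interface)
    """
    # Steps 1 and 2: no clean speech exists, so start from all-zero deep features.
    feats = np.zeros(feat_shape)
    estimate = separator(mixture, feats)
    for _ in range(n_iters):
        # Step 3: extract deep features from the provisional estimate.
        feats = asr_encoder(estimate)
        # Step 4: separate the original mixture again, now conditioned on those features.
        estimate = separator(mixture, feats)
    # Step 5 is the repetition itself: each pass should bring the features
    # closer to what oracle clean speech would have produced.
    return estimate
```

Note that every pass separates the original mixture again rather than the previous estimate; only the conditioning features change between iterations.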
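Transfer to singing voice separation
• Singing voice separation uses Multi-scale MDenseNet
– Loss function: MSE on the Mel spectrogram
• The singing voice separation network uses the deep-feature output of E2EASR
[Figure: singing voice separation (Multi-scale MDenseNet) using the transferred E2EASR; the domain translation output is concatenated (c) into the MDenseNet blocks; input and output are Mel spectrograms]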
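A Mel-spectrogram MSE of the kind named as the loss can be sketched with a hand-built triangular filterbank. The HTK-style Mel formula, the number of Mel bands, the FFT size, and the sample rate are all assumptions chosen for illustration; the slides give none of these settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1), spaced evenly on the Mel scale."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_mse(est_mag, ref_mag, fb):
    """MSE between Mel spectrograms of two magnitude spectrograms of shape (n_bins, n_frames)."""
    return float(np.mean((fb @ est_mag - fb @ ref_mag) ** 2))
```

Comparing in the Mel domain weights low-frequency detail more heavily than a plain linear-frequency MSE would, since the filters are narrower at low frequencies.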
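Evaluation method
• Evaluated on two tasks
① Separating the target speech from overlapped speech (multiple speakers) + noise: severe background noise
② Separating the singing voice from songs: transfer learning to a domain with little training data
• Datasets
① Speech recognition training
◼ Speech data:
◼ LibriSpeech dataset: 960 hours of speech
◼ Speech recorded in controlled environments
② Separating the target speech from overlapped speech (multiple speakers) + noise
◼ Speech data
◼ AVSpeech dataset: a subset of 4,700 hours of YouTube video audio
◼ Speech recorded in uncontrolled environments
◼ Training data: 100 hours; test data: 15 hours
◼ Noise data
◼ AudioSet dataset: 10-second noise clips from YouTube videos
◼ Generating overlapped speech and adding noise:
◼ Randomly select speech from multiple speakers in AVSpeech, overlap it, and add AudioSet noise (energy ratio 3:1)
③ Separating the singing voice from songs
◼ Song data
◼ MUSDB dataset: 100 songs for training (6.7 hours: a small amount of data), 50 songs for testing
◼ Three kinds of data: song mixture (accompaniment + singing voice), accompaniment only, singing voice only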
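The mixture generation for the speech task can be sketched as below. The slide only says "energy ratio 3:1"; reading that as overlapped speech energy to noise energy is an assumption, as are the function and parameter names.

```python
import numpy as np

def mix_with_noise(speech_signals, noise, speech_to_noise_energy=3.0):
    """Overlap several speakers and add noise at a fixed energy ratio.

    speech_signals: array of shape (n_speakers, n_samples)
    noise:          array of shape (n_samples,)
    Returns the mixture and the gain that was applied to the noise.
    """
    speech_mix = np.sum(speech_signals, axis=0)
    speech_energy = np.sum(speech_mix ** 2)
    noise_energy = np.sum(noise ** 2)
    if noise_energy == 0.0:
        raise ValueError("noise clip is silent; cannot set an energy ratio")
    # Scale the noise so that speech_energy / (gain**2 * noise_energy) equals the ratio.
    gain = np.sqrt(speech_energy / (speech_to_noise_energy * noise_energy))
    return speech_mix + gain * noise, gain
```

An energy ratio of 3:1 corresponds to roughly 4.8 dB, which is consistent with the slide's description of the background noise as severe.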
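Evaluation method
• Baselines
① Separating the target speech from overlapped speech (multiple speakers) + noise: severe background noise
◼ Original Conv-TasNet: the loss function uses permutation invariant training (PIT)
◼ Conv-TasNet augmented with learned lip-movement features:
◼ Lip-movement network: autoencoder with 3 conv layers + 2 linear layers + 3 transposed conv layers
◼ Lip-movement training data: lip regions cropped into 96x96-pixel patches
◼ Lip-movement deep features: activations of the autoencoder's bottleneck layer
② Separating the singing voice from songs: transfer learning to a domain with little training data
◼ Original Conv-TasNet
◼ Note: the aim is to show that E2EASR features are effective, not to achieve SOTA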
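Permutation invariant training, used by the Conv-TasNet baseline, scores every assignment of estimated sources to reference sources and keeps the best one. The sketch below uses a plain MSE as the pairwise loss purely for brevity; that choice is an assumption, and a real Conv-TasNet setup would typically plug in an SDR-based loss instead.

```python
from itertools import permutations

import numpy as np

def pit_mse(estimates, targets):
    """Permutation-invariant MSE.

    estimates, targets: arrays of shape (n_sources, n_samples).
    Returns (best_loss, best_perm), where estimates[best_perm[i]] is matched to targets[i].
    """
    n_sources = targets.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(n_sources)):
        loss = float(np.mean((estimates[list(perm)] - targets) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

The exhaustive search grows factorially with the number of sources, which is acceptable for the two or three speakers typical of this task.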
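Evaluation results
• Separating the target speech from overlapped speech (multiple speakers) + noise: severe background noise
• Results
– Even in the adverse condition of overlapped speech + background noise, using E2EASR features gave separation performance that is robust to severe interference
– The proposed method also outperformed Conv-TasNet augmented with features learned from lip video clips
– Even compared with feeding E2EASR the clean speech (oracle) that is not actually available at test time (Oracle E2EASR features), the gap is only 0.2 dB
→ Incorporating E2EASR features makes overlapped-speech separation robust to interfering sounds other than the target speech
[Figure: results for the speech separation task, with the proposed method marked]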
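Evaluation results
• Separating the singing voice from songs
• Results
– The proposed method outperforms the baseline
• E2EASR was trained on speech that differs from singing voice, yet the proposed method still outperformed the baseline
– Even compared with feeding E2EASR the clean speech (oracle) that is not actually available at test time (Oracle E2EASR features), the gap is only 0.2 dB
→ Incorporating E2EASR features makes singing voice separation robust to the background music
[Figure: results for the singing voice separation task, with the proposed method marked]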
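Summary
• Proposed a transfer learning method for using an end-to-end speech recognition network (E2EASR) in speech separation
• Simulations confirmed that using E2EASR features performs speech separation and speech enhancement (noise reduction) at the same time
• Outperformed even the AV method that adds lip movement
• Delivers effective performance when fine-tuning in a domain with little training data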
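Impressions
• There is a theory that human hearing uses semantic knowledge it has already built to predict the words that come next and uses that prediction for separation. This paper felt like one way of realizing that idea in machine learning. What impressed me was how semantic features such as phonetic and linguistic information, obtained through speech recognition, are tied to better speech separation by means of transfer learning.
• Which feature map of the speech recognition network should serve as the deep features is an interesting open question.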
END
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

[DL Reading Group] IMPROVING VOICE SEPARATION BY INCORPORATING END-TO-END SPEECH RECOGNITION