23. Experimental Settings
Dataset
CIFAR-10 (60,000 32 x 32 color images, 10 classes)
- 50,000 images for training
- 10,000 images for testing
Network architecture and parameters
- WideResNet [4] (N = 1, k = 4)
- SGD with Nesterov momentum
- cross-entropy loss
- initial learning rate = 0.1
- weight decay = 5.0 x 10^-4
- momentum = 0.9
- minibatch size = 64
- λ = 0.01 (see the sketch below)
Watermark: 256 bits (T = 256)
Embedding target: conv2 group
[4] S. Zagoruyko and N. Komodakis. Wide residual networks. In Proc. of ECCV, 2016.
[Figure: WideResNet architecture (conv1 → conv2 group → conv3 group → conv4 group → avg-pool → fc); the embedding target in the conv2 group has M = 36864 (3 x 3 x 64 x 64) parameters]
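As a concrete reference for how T, M, and λ fit together, here is a minimal NumPy sketch of the embedding regularizer and detector described in the paper (the code and its names are illustrative, not the authors' implementation; see the repository linked at the end for that):

```python
# Minimal sketch of the embedding regularizer (illustrative, not the authors'
# code): the watermark b is embedded into the flattened weights w of the conv2
# group via a secret random projection X, by adding
# E(w) = E_task(w) + lambda * E_R(w), with lambda = 0.01 in the experiments.
import numpy as np

T, M = 256, 36864                              # watermark length, target-layer parameters
rng = np.random.default_rng(0)
b = rng.integers(0, 2, size=T).astype(float)   # secret watermark bits
X = rng.standard_normal((T, M))                # secret projection matrix
w = rng.standard_normal(M) * 0.01              # stand-in for the layer weights

def embedding_loss(w):
    """E_R(w): binary cross-entropy between b and sigma(Xw)."""
    y = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(b * np.log(y + 1e-12) + (1.0 - b) * np.log(1.0 - y + 1e-12))

def extract_watermark(w):
    """Detection: bit j is 1 iff sigma(X_j . w) >= 0.5, i.e. X_j . w >= 0."""
    return (X @ w >= 0).astype(int)

print(embedding_loss(w), (extract_watermark(w) == b).mean())
```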
27. Robustness: fine-tuning
! Does fine-tuning a watermarked model erase the watermark?
" Fine-tuning in the same domain (CIFAR-10 → CIFAR-10)
" Fine-tuning across domains (Caltech-101 → CIFAR-10)
! In neither case does fine-tuning erase the watermark (a toy check is sketched below)
Test error is also comparable to that of a model without embedding (8.04%)
Note: images in the Caltech-101 dataset were resized to 32 x 32 for compatibility with the
CIFAR-10 dataset, though their original size is roughly 300 x 200.
[Figure: embedding loss before and after fine-tuning]
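A toy end-to-end version of this robustness check (the small dimensions and the random perturbation standing in for fine-tuning are my own assumptions, not the paper's experiment):

```python
# Embed by gradient descent on E_R alone, "fine-tune" by perturbing the
# weights, and measure the bit error rate (BER) of the extracted watermark.
import numpy as np

rng = np.random.default_rng(0)
T, M = 32, 512
b = rng.integers(0, 2, size=T).astype(float)
X = rng.standard_normal((T, M))
w = np.zeros(M)

for _ in range(200):                           # d E_R / d w = X^T (sigma(Xw) - b) / T
    y = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.01 * (X.T @ (y - b)) / T

w_ft = w + 0.01 * rng.standard_normal(M)       # stand-in for fine-tuning drift
ber = np.mean(((X @ w_ft) >= 0).astype(float) != b)
print(f"bit error rate after 'fine-tuning': {ber:.3f}")   # expected: 0.000
```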
28. Robustness: model compression
! Does model compression erase the watermark?
" lossless: Huffman coding [5]
" lossy: weight quantization [5, 6], parameter pruning [5, 6] (sketch below)
[5] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained
quantization and Huffman coding. In Proc. of ICLR, 2016.
[6] S. Han, J. Pool, J. Tran, and W. J. Dally. Learning both weights and connections for efficient neural networks.
In Proc. of NIPS, 2015.
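For the pruning case, watermark survival can be checked with a sketch like the following (the prune helper and the rates are illustrative, not from the paper):

```python
# Zero the smallest-magnitude weights (Han et al. style magnitude pruning),
# then re-extract the watermark bits and measure the bit error rate.
import numpy as np

def prune(w, rate):
    """Set the fraction `rate` of smallest-|w| entries of w to zero."""
    thresh = np.quantile(np.abs(w), rate)
    return np.where(np.abs(w) < thresh, 0.0, w)

# With b, X, w from the embedding sketch above:
# for rate in (0.5, 0.8, 0.95):
#     ber = np.mean(((X @ prune(w, rate)) >= 0).astype(float) != b)
#     print(rate, ber)
```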
31. Why Did Our Approach Work So Well?
! It is well-known that deep neural networks have many local
minima, and all local minima are almost optimal [7, 8].
[7] A. Choromanska et al. The loss surfaces of multilayer networks. In Proc. of AISTATS, 2015.
[8] Y. Dauphin et al. Identifying and attacking the saddle point problem in high-dimensional non-convex
optimization. In Proc. of NIPS, 2014.
[Figure: loss surface over the parameter space; standard SGD converges to one of many nearly optimal local minima]
32. Why Did Our Approach Work So Well?
! It is well-known that deep neural networks have many local
minima, and all local minima are almost optimal [7, 8].
! Our embedding regularizer guides the model parameters toward
a local minimum that carries the desired watermark.
! Let us assume that we want to embed the watermark “11”… (a toy sketch follows below)
[Figure: loss surface over the parameter space, with the detected watermark (00, 01, 10, 11) at each local minimum; standard SGD can land at any of them, while SGD with the embedding loss is guided to the minimum whose detected watermark is 11]
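A toy numerical version of this picture (the 2-D task loss, the identity projection, and λ = 3 are my own simplifications for illustration; the experiments used λ = 0.01 on a real network):

```python
# Four equally good task minima at (+-1, +-1), one per 2-bit watermark;
# the embedding loss selects the minimum whose detected bits are "11".
import numpy as np

def grad_task(w):
    # gradient of the task loss (w1^2 - 1)^2 + (w2^2 - 1)^2
    return 4.0 * w * (w * w - 1.0)

def grad_embed(w, b):
    # gradient of BCE between b and sigmoid(w) (projection X = identity here)
    return 1.0 / (1.0 + np.exp(-w)) - b

b = np.array([1.0, 1.0])                  # desired watermark "11"
w0 = np.array([-0.1, 0.1])                # init in the basin of the "01" minimum
for lam in (0.0, 3.0):                    # plain GD vs. GD with embedding loss
    w = w0.copy()
    for _ in range(2000):
        w -= 0.01 * (grad_task(w) + lam * grad_embed(w, b))
    print(f"lambda={lam}: w={np.round(w, 2)}, detected bits={(w >= 0).astype(int)}")
    # lambda=0.0 lands at (-1, 1), detected "01"; lambda=3.0 lands near (1, 1), "11"
```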
33. Future Work
! Limitations
" Watermark overwriting
" Robustness against distillation and model transformations
! Alternatives to the watermarking approach
" Digital fingerprinting
…and many other things remain as future work. (see paper!)
35. Our code is available at https://github.com/yu4u/dnn-watermark .
Thank you!
For more details, please refer to…
Y. Uchida, Y. Nagai, S. Sakazawa, and S. Satoh,
“Embedding Watermarks into Deep Neural Networks,”
in Proc. of International Conference on Multimedia Retrieval 2017.