1. Theory of Domain Adaptation, Mark Chang, 2019/09/09
2. Outline
• Generalization Bound of Learning from a Single Domain
• Problem of Domain Adaptation
• Generalization Bound of Domain Adaptation
• Domain Adaptation Example
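3. Generalization Bound of Learning from a Single Domain
[Figure: training data and testing data are both sampled from the same data. The training algorithm changes the hypothesis $h$ to minimize the training error $\hat{\epsilon}(h)$; the same hypothesis $h$ is then evaluated on the testing data, giving the testing error $\epsilon(h)$.]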
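4. Generalization Bound of Learning from a Single Domain
• Learning is feasible when a small training error $\hat{\epsilon}(h)$ implies a small testing error $\epsilon(h)$.
• With probability at least $1-\delta$, the following inequality holds:
$$\epsilon(h) \le \hat{\epsilon}(h) + \sqrt{\frac{8}{n}\log\left(\frac{4(2n)^{d}}{\delta}\right)}$$
where $n$ is the number of training instances and $d$ is the VC dimension (model complexity).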
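To see how quickly the bound tightens as $n$ grows, the slack term can be evaluated numerically. This is a minimal sketch, assuming the natural logarithm for "log"; the helper names `vc_slack` and `vc_upper_bound` and the demo values (training error 0.05, $d = 10$, $\delta = 0.05$) are hypothetical choices for illustration.

```python
import math


def vc_slack(n: int, d: int, delta: float) -> float:
    """Slack term sqrt(8/n * log(4 * (2n)^d / delta)) of the VC bound.

    Natural log is assumed. The log is expanded as
    log 4 + d * log(2n) - log(delta) so (2n)^d never has to be formed.
    """
    log_term = math.log(4) + d * math.log(2 * n) - math.log(delta)
    return math.sqrt(8.0 / n * log_term)


def vc_upper_bound(train_error: float, n: int, d: int, delta: float) -> float:
    """Upper bound on the testing error eps(h), valid with probability >= 1 - delta."""
    return train_error + vc_slack(n, d, delta)


# Illustrative numbers only: training error 0.05, VC dimension 10, delta 0.05.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, round(vc_upper_bound(0.05, n, d=10, delta=0.05), 4))
```

The slack grows with $d$ and shrinks roughly like $\sqrt{d \log n / n}$, which is the formal version of "learning is feasible when the model is not too complex relative to the data".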
5. Problem of Domain Adaptation
Training data: MNIST. Testing data: MNIST with gray-scale words and background.
(Image source: https://www.semanticscholar.org/paper/Attribute-Based-Synthetic-Network-(ABS-Net)%3A-more-Lu-Li/2c3138782317a97526a83a7ce264c0c772ddf7e3)
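6. Problem of Domain Adaptation
[Figure: data from the source domain are sampled into training data and testing data; data from the target domain supply only testing data.]
On the source domain the single-domain bound still applies:
$$\epsilon_S(h) \le \hat{\epsilon}_S(h) + \sqrt{\frac{8}{n}\log\left(\frac{4(2n)^{d}}{\delta}\right)}$$
It relates the source training error $\hat{\epsilon}_S(h)$ to the source testing error $\epsilon_S(h)$, but says nothing about the target testing error $\epsilon_T(h)$. Bounding $\epsilon_T(h)$ requires a Generalization Bound of Domain Adaptation.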
7. Problem of Domain Adaptation
• Distance between source features and target features
[Figure: target domain 1 lies at a small distance from the source domain; target domain 2 lies at a large distance.]
8. Problem of Domain Adaptation
• Distance between the source labeling function and the target labeling function (same five features in every domain)
Source domain labels: 1 0 1 0 1
Target domain 1 labels: 1 0 1 1 0 (small distance)
Target domain 2 labels: 1 1 0 1 0 (large distance)
9. Generalization Bound of Domain Adaptation
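10. Generalization Bound of Domain Adaptation
Source domain data: $D_S$, $f_S$, with error $\epsilon_S(h)$. Target domain data: $D_T$, $f_T$, with error $\epsilon_T(h)$.
$$\epsilon_T(h) \le \epsilon_S(h) + d_1(D_S, D_T) + \min\left(\mathbb{E}_{D_S}\left[|f_S(x) - f_T(x)|\right], \mathbb{E}_{D_T}\left[|f_S(x) - f_T(x)|\right]\right)$$
• $d_1(D_S, D_T)$: the distance between the source feature distribution $D_S$ and the target feature distribution $D_T$.
• The $\min(\cdot)$ term: the distance between the source labeling function $f_S$ and the target labeling function $f_T$.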
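11. The Distance between Source Feature $D_S$ and Target Feature $D_T$
$$d_1(D_S, D_T) = 2\sup_{B \in \mathcal{B}} \left|\Pr_{D_S}[B] - \Pr_{D_T}[B]\right|$$
where $\mathcal{B}$ is the collection of measurable subsets of the feature space.
[Figure: a region $B = B_1 \cup B_2$ drawn under the densities of $D_S$ and $D_T$; the distance compares $\Pr_{D_S}[B]$ with $\Pr_{D_T}[B]$.]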
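12. The Distance between Source Feature $D_S$ and Target Feature $D_T$
$$d_1(D_S, D_T) = 2\sup_{B \in \mathcal{B}} \left|\Pr_{D_S}[B] - \Pr_{D_T}[B]\right|$$
[Figure: the probability mass of $D_S$ and of $D_T$ inside and outside region $B$; the total mass of each distribution is 1.]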
13. The Distance between Source Feature $D_S$ and Target Feature $D_T$
• Searching for the supremum: $d_1(D_S, D_T) = 2\sup_{B \in \mathcal{B}} \left|\Pr_{D_S}[B] - \Pr_{D_T}[B]\right|$
[Figure: several candidate regions $B = B_1 \cup B_2$ under $D_S$ and $D_T$; the region with the largest gap gives the supremum.]
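For distributions over a finite set of points the search can be done by brute force over every subset $B$. The supremum is attained at $B = \{x : \Pr_{D_S}(x) > \Pr_{D_T}(x)\}$, so $d_1$ reduces to the $L_1$ distance $\sum_x |\Pr_{D_S}(x) - \Pr_{D_T}(x)|$. A minimal sketch of both computations; the five-point distributions and the function names are made up for illustration.

```python
from itertools import combinations


def d1_bruteforce(p_s, p_t):
    """2 * sup_B |Pr_S[B] - Pr_T[B]|, enumerating every subset B of a finite domain."""
    n = len(p_s)
    best = 0.0
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            gap = abs(sum(p_s[i] for i in subset) - sum(p_t[i] for i in subset))
            best = max(best, gap)
    return 2 * best


def d1_closed_form(p_s, p_t):
    """Same value via the maximizing set B = {x : p_s(x) > p_t(x)}: the L1 distance."""
    return sum(abs(a - b) for a, b in zip(p_s, p_t))


# Made-up distributions over five points (each sums to 1).
p_s = [0.4, 0.3, 0.2, 0.1, 0.0]
p_t = [0.1, 0.2, 0.2, 0.2, 0.3]
print(d1_bruteforce(p_s, p_t), d1_closed_form(p_s, p_t))  # both approximately 0.8
```

The brute-force loop costs $2^n$ subset evaluations, which is why it only works for tiny finite domains.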
14. The Distance between Source Labeling Function $f_S$ and Target Labeling Function $f_T$
Source labels: 1 0 1 0 1; target labels: 1 0 1 1 0. $\mathbb{E}_{D_S}[|f_S(x) - f_T(x)|] = 0.4$
Source labels: 1 0 1 0 1; target labels: 1 1 0 1 0. $\mathbb{E}_{D_S}[|f_S(x) - f_T(x)|] = 0.8$
The bound uses $\min\left(\mathbb{E}_{D_S}[|f_S(x) - f_T(x)|], \mathbb{E}_{D_T}[|f_S(x) - f_T(x)|]\right)$.
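The two expectations can be reproduced in a few lines, assuming (as the numbers 0.4 and 0.8 suggest) that the five instances are equally likely; `labeling_distance` is a hypothetical helper name.

```python
def labeling_distance(f_s, f_t, weights=None):
    """E_D[|f_S(x) - f_T(x)|] over a finite list of instances.

    weights are the instance probabilities under D; uniform if omitted
    (an assumption matching the slide's 0.4 and 0.8).
    """
    if weights is None:
        weights = [1.0 / len(f_s)] * len(f_s)
    return sum(w * abs(a - b) for w, a, b in zip(weights, f_s, f_t))


source = [1, 0, 1, 0, 1]
target_1 = [1, 0, 1, 1, 0]
target_2 = [1, 1, 0, 1, 0]
print(labeling_distance(source, target_1))  # approximately 0.4
print(labeling_distance(source, target_2))  # approximately 0.8
```

Passing the instance probabilities under $D_S$ or $D_T$ as `weights` gives the two expectations inside the $\min(\cdot)$.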
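15. Problem of $d_1(D_S, D_T)$
• Hard to estimate from finite samples
• Can overestimate the distance
$$d_1(D_S, D_T) = 2\sup_{B \in \mathcal{B}} \left|\Pr_{D_S}[B] - \Pr_{D_T}[B]\right|$$
[Figure: $D_S$ and $D_T$ with $B = B_1 \cup B_2 \cup \cdots$, a union of many small regions.]
Because $\mathcal{B}$ contains arbitrary unions of small regions, on a finite sample $B$ can wrap each source point individually, so the estimate is large even when $D_S = D_T$.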
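A minimal sketch of the overestimation, assuming both samples come from the same standard normal distribution (so the true $d_1$ is 0) and using a hypothetical plug-in estimator `empirical_d1`. The empirical distributions put their mass on distinct points, so the set $B$ of source sample points separates them completely and the estimate reaches 2, the largest possible value.

```python
import random
from collections import Counter


def empirical_d1(u_s, u_t):
    """Plug-in estimate of d1 from two finite samples.

    The empirical distributions are discrete, so the supremum over B is the
    L1 distance between them (B = points weighted more heavily by U_S).
    """
    c_s, c_t = Counter(u_s), Counter(u_t)
    support = set(c_s) | set(c_t)
    return sum(abs(c_s[x] / len(u_s) - c_t[x] / len(u_t)) for x in support)


random.seed(0)
# Both samples come from the SAME distribution N(0, 1), so the true d1 is 0.
u_s = [random.gauss(0.0, 1.0) for _ in range(1000)]
u_t = [random.gauss(0.0, 1.0) for _ in range(1000)]
print(empirical_d1(u_s, u_t))  # 2 up to float rounding
```

Adding more samples does not help: as long as the points are distinct, the estimate stays at 2.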
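16. The HΔH-Distance
$$d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D_T) = 2\sup_{h', h'' \in \mathcal{H}} \left|\Pr_{x \sim D_S}[h'(x) \ne h''(x)] - \Pr_{x \sim D_T}[h'(x) \ne h''(x)]\right|$$
[Figure: two hypotheses $h'$ and $h''$ split the feature space into regions where $h'(x) = h''(x)$ and a region $B$ where $h'(x) \ne h''(x)$; the distance compares the mass of $D_S$ and $D_T$ inside $B$.]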
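17. The HΔH-Distance
• Searching for the supremum (training): $d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D_T) = 2\sup_{h', h'' \in \mathcal{H}} \left|\Pr_{x \sim D_S}[h'(x) \ne h''(x)] - \Pr_{x \sim D_T}[h'(x) \ne h''(x)]\right|$
[Figure: several candidate pairs $(h', h'')$ and their disagreement regions $B$ under $D_S$ and $D_T$; the pair with the largest gap gives the supremum.]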
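The search can be made concrete for a hypothetical one-dimensional class of thresholds $h_t(x) = 1[x \ge t]$: two thresholds $t_1 < t_2$ disagree exactly on $[t_1, t_2)$, so searching over pairs $(h', h'')$ becomes searching over intervals. The sketch below brute-forces that search on samples. The class, the sample sizes, and the Gaussian data are assumptions for illustration, not the slides' setup.

```python
import bisect
import random


def empirical_hdh_thresholds(u_s, u_t):
    """Empirical H-delta-H distance for the hypothetical class h_t(x) = 1[x >= t].

    Thresholds t1 < t2 disagree exactly on [t1, t2), so the search over
    hypothesis pairs (h', h'') is a brute-force search over intervals.
    """
    s_sorted, t_sorted = sorted(u_s), sorted(u_t)
    # Thresholds at every sample value, plus +inf, reach every distinct interval.
    cands = sorted(set(u_s) | set(u_t)) + [float("inf")]

    def frac_below(xs, thr):
        # Fraction of the sorted sample xs lying strictly below thr.
        return bisect.bisect_left(xs, thr) / len(xs)

    best = 0.0
    for i, t1 in enumerate(cands):
        for t2 in cands[i + 1:]:
            p_s = frac_below(s_sorted, t2) - frac_below(s_sorted, t1)  # Pr_{U_S}[t1 <= x < t2]
            p_t = frac_below(t_sorted, t2) - frac_below(t_sorted, t1)  # Pr_{U_T}[t1 <= x < t2]
            best = max(best, abs(p_s - p_t))
    return 2 * best


random.seed(1)
u_s = [random.gauss(0.0, 1.0) for _ in range(200)]
u_same = [random.gauss(0.0, 1.0) for _ in range(200)]   # same distribution as u_s
u_shift = [random.gauss(2.0, 1.0) for _ in range(200)]  # mean shifted by 2
same = empirical_hdh_thresholds(u_s, u_same)
shifted = empirical_hdh_thresholds(u_s, u_shift)
print(same, shifted)  # the first is small, the second much larger
```

Because the candidate regions are limited to intervals, two samples from the same distribution no longer look maximally far apart. In practice the "training" step is an optimization over a richer class; the quadratic brute force here only suits small samples.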
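18. The HΔH-Distance
• $d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D_T)$ can be estimated from finite samples. Draw $m$ training samples $U_S$ from the source domain data $D_S$ and $m$ training samples $U_T$ from the target domain data $D_T$. With probability at least $1-\delta$,
$$d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D_T) \le \hat{d}_{\mathcal{H}\Delta\mathcal{H}}(U_S, U_T) + 4\sqrt{\frac{1}{m}\log\left(\frac{2(2m)^{2d}}{\delta}\right)}$$
where $d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D_T)$ is the distance between $D_S$ and $D_T$, $\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(U_S, U_T)$ is the distance between $U_S$ and $U_T$, and $d$ is the VC dimension of $\mathcal{H}$.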
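The gap between the true and the empirical distance can be evaluated numerically. This is a minimal sketch, assuming the natural logarithm; `hdh_sample_slack` is a hypothetical name, and $d = 10$, $\delta = 0.05$ are illustrative values.

```python
import math


def hdh_sample_slack(m: int, d: int, delta: float) -> float:
    """Slack 4 * sqrt(log(2 * (2m)^(2d) / delta) / m); natural log assumed."""
    # Expanded log avoids forming (2m)^(2d) explicitly.
    log_term = math.log(2) + 2 * d * math.log(2 * m) - math.log(delta)
    return 4.0 * math.sqrt(log_term / m)


# Illustrative only: a class with VC dimension d = 10, delta = 0.05.
for m in (1_000, 10_000, 100_000, 1_000_000):
    print(m, round(hdh_sample_slack(m, d=10, delta=0.05), 4))
```

The exponent $2d$ reflects that the disagreement regions of $\mathcal{H}\Delta\mathcal{H}$ form a richer class than $\mathcal{H}$ itself, so the slack shrinks more slowly than in the single-domain bound.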
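19. The HΔH-Distance
• $d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D_T)$ can alleviate the problem of overestimation: the region $B$ is restricted to where two hypotheses $h', h'' \in \mathcal{H}$ disagree, instead of being any measurable set.
[Figure: $D_S$, $D_T$, and the region $B$ between $h'$ and $h''$.]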
20. The Distance between Source Labeling Function $f_S$ and Target Labeling Function $f_T$
Source labels: 1 0 1 0 1; target labels: 1 0 1 1 0; $h^*(x)$: 1 0 1 0 0. $\lambda = 0.2 + 0.2 = 0.4$
Source labels: 1 0 1 0 1; target labels: 1 1 0 1 0; $h^*(x)$: 1 0 0 0 0. $\lambda = 0.4 + 0.4 = 0.8$
where $\lambda = \epsilon_S(h^*) + \epsilon_T(h^*)$ such that $h^* = \arg\min_{h \in \mathcal{H}} \epsilon_S(h) + \epsilon_T(h)$.
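$\lambda$ can be found by brute force when $\mathcal{H}$ is small. The sketch below assumes a hypothetical $\mathcal{H}$ containing every 0/1 labeling of the five instances, with equal instance weights, and uses `fractions.Fraction` so the errors are exact. Several hypotheses tie for the minimum; `min` returns the first one in lexicographic order, which here coincides with the $h^*$ shown on the slide.

```python
from fractions import Fraction
from itertools import product


def lambda_term(f_s, f_t, hypotheses):
    """Return (lambda, h*) with lambda = eps_S(h*) + eps_T(h*), h* = argmin_h eps_S(h) + eps_T(h).

    Errors are exact fractions of mismatched labels over the listed instances
    (equal instance weights assumed).
    """
    def err(h, f):
        return Fraction(sum(a != b for a, b in zip(h, f)), len(f))

    h_star = min(hypotheses, key=lambda h: err(h, f_s) + err(h, f_t))
    return err(h_star, f_s) + err(h_star, f_t), h_star


source = (1, 0, 1, 0, 1)
# Hypothetical H: every 0/1 labeling of the five instances, in lexicographic order.
H = list(product((0, 1), repeat=5))
lam_1, h_1 = lambda_term(source, (1, 0, 1, 1, 0), H)
lam_2, h_2 = lambda_term(source, (1, 1, 0, 1, 0), H)
print(lam_1, h_1)  # 2/5 (1, 0, 1, 0, 0)
print(lam_2, h_2)  # 4/5 (1, 0, 0, 0, 0)
```

With an unrestricted $\mathcal{H}$, $\lambda$ equals the fraction of instances where $f_S$ and $f_T$ disagree; a smaller $\mathcal{H}$ can only make it larger.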
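21. Generalization Bound of Domain Adaptation
$$\epsilon_T(h) \le \epsilon_S(h) + \frac{1}{2} d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D_T) + \lambda$$
compared with the earlier bound
$$\epsilon_T(h) \le \epsilon_S(h) + d_1(D_S, D_T) + \min\left(\mathbb{E}_{D_S}\left[|f_S(x) - f_T(x)|\right], \mathbb{E}_{D_T}\left[|f_S(x) - f_T(x)|\right]\right)$$
• $\frac{1}{2} d_{\mathcal{H}\Delta\mathcal{H}}(D_S, D_T)$ replaces $d_1(D_S, D_T)$ as the distance between source feature $D_S$ and target feature $D_T$.
• $\lambda$ replaces the $\min(\cdot)$ term as the distance between source labeling function $f_S$ and target labeling function $f_T$, to be estimated by a hypothesis $h^*$.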
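As a sanity check, the bound can be verified exactly on a toy problem. The sketch assumes a hypothetical setup not taken from the slides: six points, threshold hypotheses $h_t(x) = 1[x \ge t]$, and made-up exact distributions and labeling functions, with `fractions.Fraction` keeping the arithmetic exact. It computes $d_{\mathcal{H}\Delta\mathcal{H}}$ by brute force over all hypothesis pairs and $\lambda$ by brute force over $\mathcal{H}$, then compares both sides for every $h \in \mathcal{H}$.

```python
from fractions import Fraction as F


def check_bound(p_s, p_t, f_s, f_t):
    """Check eps_T(h) <= eps_S(h) + d_HdH / 2 + lambda for every h in a toy class.

    Hypothetical setup: domain {0, ..., n-1}; H = thresholds h_t(x) = 1[x >= t]
    for t = 0..n; p_s, p_t are exact distributions; f_s, f_t are 0/1 labels.
    """
    n = len(p_s)
    H = [[1 if x >= t else 0 for x in range(n)] for t in range(n + 1)]

    def err(p, h, g):
        # Pr_{x ~ p}[h(x) != g(x)]; g is a labeling function or another hypothesis.
        return sum((px for px, hx, gx in zip(p, h, g) if hx != gx), F(0))

    d_hdh = 2 * max(abs(err(p_s, h1, h2) - err(p_t, h1, h2)) for h1 in H for h2 in H)
    lam = min(err(p_s, h, f_s) + err(p_t, h, f_t) for h in H)
    rows = [(err(p_t, h, f_t), err(p_s, h, f_s) + d_hdh / 2 + lam) for h in H]
    return d_hdh, lam, rows


# Made-up distributions: the target puts more mass on larger x, and f_T flips x = 2.
p_s = [F(3, 10), F(3, 10), F(2, 10), F(1, 10), F(1, 10), F(0)]
p_t = [F(0), F(1, 10), F(1, 10), F(2, 10), F(3, 10), F(3, 10)]
f_s = [0, 0, 0, 1, 1, 1]
f_t = [0, 0, 1, 1, 1, 1]

d_hdh, lam, rows = check_bound(p_s, p_t, f_s, f_t)
print("d_HdH =", d_hdh, "lambda =", lam)
for t, (lhs, rhs) in enumerate(rows):
    print(f"t={t}: eps_T={lhs} <= bound={rhs}: {lhs <= rhs}")
```

The inequality holds for every row because the bound follows from the triangle inequality on disagreement probabilities, with $h$ and $h^*$ both in $\mathcal{H}$; the check is an illustration, not a proof.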