Domain Adaptation

Theory of Domain Adaptation
https://www.alexkulesza.com/pubs/adapt_mlj10.pdf

  1. Theory of Domain Adaptation. Mark Chang, 2019/09/09
  2. Outline
     • Generalization Bound of Learning from Single Domain
     • Problem of Domain Adaptation
     • Generalization Bound of Domain Adaptation
     • Domain Adaptation Example
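  3. Generalization Bound of Learning from Single Domain
     [Diagram: training data and testing data are both sampled from the same data. The training algorithm changes the hypothesis h to minimize the training error $\hat{\epsilon}(h)$; the same hypothesis h is then applied to the testing data, giving the testing error $\epsilon(h)$.]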
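  4. Generalization Bound of Learning from Single Domain
     • Learning is feasible when a small training error $\hat{\epsilon}(h)$ implies a small testing error $\epsilon(h)$.
     • With probability $1-\delta$, the following inequality is satisfied:
       $\epsilon(h) \le \hat{\epsilon}(h) + \sqrt{\frac{8}{n} \log\frac{4(2n)^d}{\delta}}$
       where n is the number of training instances and d is the VC dimension (model complexity).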
  5. Problem of Domain Adaptation
     • Training data: MNIST
     • Testing data: MNIST with gray-scale words and background
     • Image source: https://www.semanticscholar.org/paper/Attribute-Based-Synthetic-Network-(ABS-Net)%3A-more-Lu-Li/2c3138782317a97526a83a7ce264c0c772ddf7e3
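  6. Problem of Domain Adaptation
     [Diagram: training data and testing data are sampled from the source domain data; further testing data is sampled from the target domain data. The hypothesis has training error $\hat{\epsilon}_S(h)$ and testing error $\epsilon_S(h)$ on the source domain, and testing error $\epsilon_T(h)$ on the target domain.]
     • Single-domain bound on the source: $\epsilon_S(h) \le \hat{\epsilon}_S(h) + \sqrt{\frac{8}{n} \log\frac{4(2n)^d}{\delta}}$
     • The generalization bound of domain adaptation relates $\epsilon_S(h)$ to $\epsilon_T(h)$.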
  7. Problem of Domain Adaptation
     • Distance between source feature and target feature
     [Figure: a source domain compared with two target domains. Target domain 1 is at a small distance from the source; target domain 2 is at a large distance.]
  8. Problem of Domain Adaptation
     • Distance between source labeling function and target labeling function
     • Source domain labels: 1 0 1 0 1
     • Target domain 1 labels: 1 0 1 1 0 (small distance)
     • Target domain 2 labels: 1 1 0 1 0 (large distance)
  9. Generalization Bound of Domain Adaptation
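  10. Generalization Bound of Domain Adaptation
      • Source domain data: $D_S$, $f_S$, with error $\epsilon_S(h)$. Target domain data: $D_T$, $f_T$, with error $\epsilon_T(h)$.
      • $\epsilon_T(h) \le \epsilon_S(h) + d_1(D_S, D_T) + \min\left( E_{D_S}[|f_S(x) - f_T(x)|],\ E_{D_T}[|f_S(x) - f_T(x)|] \right)$
      • $d_1(D_S, D_T)$ is the distance between source feature $D_S$ and target feature $D_T$.
      • The $\min(\cdot)$ term is the distance between source labeling function $f_S$ and target labeling function $f_T$.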
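  11. The Distance between Source Feature DS and Target Feature DT
      • $d_1(D_S, D_T) = 2 \sup_{B \in \mathcal{B}} \left| \Pr_{D_S}[B] - \Pr_{D_T}[B] \right|$
      [Figure: $D_S$ and $D_T$ with a set $B = B_1 \cup B_2$, comparing $\Pr_{D_S}[B]$ with $\Pr_{D_T}[B]$.]
  12. The Distance between Source Feature DS and Target Feature DT
      • $d_1(D_S, D_T) = 2 \sup_{B \in \mathcal{B}} \left| \Pr_{D_S}[B] - \Pr_{D_T}[B] \right|$
      [Figure: the definition illustrated with regions of $D_S$ and $D_T$ relative to B; the regions of each distribution sum to 1.]
  13. The Distance between Source Feature DS and Target Feature DT
      • Searching for the supremum: $d_1(D_S, D_T) = 2 \sup_{B \in \mathcal{B}} \left| \Pr_{D_S}[B] - \Pr_{D_T}[B] \right|$
      [Figure: several candidate sets B (for example $B = B_1$ and $B = B_1 \cup B_2$) over $D_S$ and $D_T$; one of them is marked as the supremum.]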
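
For intuition, the supremum over sets B can be searched exhaustively when the domain is a small finite set. The sketch below assumes two hypothetical discrete distributions `p` and `q` over three outcomes (not taken from the slides) and checks that the brute-force value of $d_1$ agrees with the L1 distance between the two probability vectors.

```python
from itertools import combinations

def d1_bruteforce(p, q):
    """2 * sup over all subsets B of |Pr_p[B] - Pr_q[B]| for discrete p, q."""
    n = len(p)
    best = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            gap = abs(sum(p[i] for i in subset) - sum(q[i] for i in subset))
            best = max(best, gap)
    return 2 * best

def l1_distance(p, q):
    """Sum of absolute differences between the two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical source and target distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(d1_bruteforce(p, q), l1_distance(p, q))
```

The search visits all 2^n subsets, which is only feasible for tiny domains; for continuous features the family of sets B is far too rich to search this way.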
  14. The Distance between Source Labeling Function fS and Target Labeling Function fT
      • $\min\left( E_{D_S}[|f_S(x) - f_T(x)|],\ E_{D_T}[|f_S(x) - f_T(x)|] \right)$
      • Example 1: source labels 1 0 1 0 1, target labels 1 0 1 1 0, so $E_{D_S}[|f_S(x) - f_T(x)|] = 0.4$
      • Example 2: source labels 1 0 1 0 1, target labels 1 1 0 1 0, so $E_{D_S}[|f_S(x) - f_T(x)|] = 0.8$
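
The two values on slide 14 can be reproduced with a few lines, assuming the five feature points are equally likely under $D_S$ (an assumption; the slide does not state the weights). The function name is illustrative.

```python
def labeling_distance(source_labels, target_labels):
    """Mean of |f_S(x) - f_T(x)| over equally weighted feature points."""
    diffs = [abs(s - t) for s, t in zip(source_labels, target_labels)]
    return sum(diffs) / len(diffs)

source = [1, 0, 1, 0, 1]
target_1 = [1, 0, 1, 1, 0]  # differs from the source on two of five points
target_2 = [1, 1, 0, 1, 0]  # differs from the source on four of five points

print(labeling_distance(source, target_1))
print(labeling_distance(source, target_2))
```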
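  15. Problem of d1(DS, DT)
      • Hard to estimate from finite samples
      • Can overestimate the distance
      • $d_1(D_S, D_T) = 2 \sup_{B \in \mathcal{B}} \left| \Pr_{D_S}[B] - \Pr_{D_T}[B] \right|$
      [Figure: $D_S$ and $D_T$ with a set built from many pieces, $B = B_1 \cup B_2 \cup \cdots$.]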
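  16. The HΔH-Distance
      • $d_{H\Delta H}(D_S, D_T) = 2 \sup_{h', h'' \in H} \left| \Pr_{x \sim D_S}[h'(x) \ne h''(x)] - \Pr_{x \sim D_T}[h'(x) \ne h''(x)] \right|$
      [Figure: two hypotheses h' and h'' drawn over $D_S$ and $D_T$. The regions are labeled with the values of h'(x) and h''(x) (0 or 1), and hence with where $h'(x) = h''(x)$ and where $h'(x) \ne h''(x)$.]
  17. The HΔH-Distance
      • Searching for the supremum (training): $2 \sup_{h', h'' \in H} \left| \Pr_{x \sim D_S}[h'(x) \ne h''(x)] - \Pr_{x \sim D_T}[h'(x) \ne h''(x)] \right|$
      [Figure: several candidate pairs h', h'' over $D_S$ and $D_T$, each with its disagreement region B; one pair is marked as the supremum.]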
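
As a toy illustration of searching for the supremum, the sketch below assumes H is the class of one-dimensional threshold classifiers h(x) = 1 if x >= t else 0 (a choice made here for simplicity, not one from the slides). Two thresholds disagree exactly on the interval between them, so the empirical distance can be found by trying every pair of thresholds that behaves differently on the pooled sample.

```python
def disagreement_rate(xs, t1, t2):
    """Fraction of points on which thresholds t1 and t2 give different labels."""
    lo, hi = min(t1, t2), max(t1, t2)
    return sum(lo <= x < hi for x in xs) / len(xs)

def empirical_hdh_distance(source, target):
    """2 * max over threshold pairs of |disagreement on source - on target|."""
    # Sample values plus +inf cover every distinct labeling of the pooled data.
    thresholds = sorted(set(source) | set(target)) + [float("inf")]
    best = 0.0
    for i, t1 in enumerate(thresholds):
        for t2 in thresholds[i + 1:]:
            gap = abs(disagreement_rate(source, t1, t2)
                      - disagreement_rate(target, t1, t2))
            best = max(best, gap)
    return 2 * best

# Hypothetical 1-D feature samples: disjoint supports versus identical samples.
src = [0.0, 0.1, 0.2, 0.3]
tgt = [1.0, 1.1, 1.2, 1.3]
print(empirical_hdh_distance(src, tgt))
print(empirical_hdh_distance(src, src))
```

For richer hypothesis classes the exhaustive search is not available, and the supremum is typically approximated by training, which matches the "(Training)" remark on slide 17.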
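  18. The HΔH-Distance
      • $d_{H\Delta H}(D_S, D_T)$ can be estimated from finite samples:
        $d_{H\Delta H}(D_S, D_T) \le \hat{d}_{H\Delta H}(U_S, U_T) + 4\sqrt{\frac{1}{m} \log\frac{2(2m)^{2d}}{\delta}}$
      • $U_S$ consists of m training samples drawn from the source domain data $D_S$; $U_T$ consists of m training samples drawn from the target domain data $D_T$.
      • $d_{H\Delta H}(D_S, D_T)$ is the distance between $D_S$ and $D_T$; $\hat{d}_{H\Delta H}(U_S, U_T)$ is the distance between $U_S$ and $U_T$.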
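
A short sketch of how the slack term on slide 18 behaves as m grows, assuming δ sits in the denominator inside the logarithm and natural logarithms are used. The name `hdh_slack` and the chosen values of m, d and δ are illustrative.

```python
import math

def hdh_slack(m, d, delta):
    """Slack 4 * sqrt((1/m) * log(2 * (2m)^(2d) / delta)).

    m: samples per domain, d: VC dimension of H,
    delta: allowed failure probability (assumed placement, see lead-in).
    """
    # Expand the logarithm so (2m)^(2d) is never formed explicitly.
    log_term = math.log(2) + 2 * d * math.log(2 * m) - math.log(delta)
    return 4 * math.sqrt(log_term / m)

for m in (1_000, 100_000, 10_000_000):
    print(m, round(hdh_slack(m, d=2, delta=0.05), 4))
```

Since the distance itself is at most 2, the slack only becomes informative once m is large, which gives a feel for how many unlabeled samples from each domain the estimate needs.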
  19. The HΔH-Distance
      • $d_{H\Delta H}(D_S, D_T)$ can alleviate the problem of overestimation
      [Figure: $D_S$ and $D_T$ with two hypotheses h' and h'' and the region B between them.]
  20. The Distance between Source Labeling Function fS and Target Labeling Function fT
      • $\lambda = \epsilon_S(h^*) + \epsilon_T(h^*)$, such that $h^* = \arg\min_{h \in H} \epsilon_S(h) + \epsilon_T(h)$
      • Example 1: source labels 1 0 1 0 1, target labels 1 0 1 1 0, $h^*(x)$: 1 0 1 0 0, so $\lambda = 0.2 + 0.2 = 0.4$
      • Example 2: source labels 1 0 1 0 1, target labels 1 1 0 1 0, $h^*(x)$: 1 0 0 0 0, so $\lambda = 0.4 + 0.4 = 0.8$
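
The two sums on slide 20 can be checked by brute force, assuming H contains every binary labeling of the five feature points and that the points are equally weighted in both domains (both are assumptions made for this sketch). Several hypotheses tie for the minimum, so the minimizer found here need not be the $h^*$ shown on the slide, but the minimum combined error is the same.

```python
from itertools import product

def error(h, labels):
    """Fraction of points on which hypothesis h disagrees with the labels."""
    return sum(a != b for a, b in zip(h, labels)) / len(labels)

def ideal_joint_error(source_labels, target_labels):
    """Return (h*, error_S(h*) + error_T(h*)) over all binary labelings."""
    def joint(h):
        return error(h, source_labels) + error(h, target_labels)
    best_h = min(product([0, 1], repeat=len(source_labels)), key=joint)
    return best_h, joint(best_h)

source = [1, 0, 1, 0, 1]
for target in ([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]):
    h_star, joint_error = ideal_joint_error(source, target)
    print(h_star, round(joint_error, 2))
```

Every point where the two labelings disagree costs 1/5 in exactly one domain no matter how h labels it, which is why the minimum equals the fraction of disagreeing points.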
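  21. Generalization Bound of Domain Adaptation
      • Bound with $d_1$: $\epsilon_T(h) \le \epsilon_S(h) + d_1(D_S, D_T) + \min\left( E_{D_S}[|f_S(x) - f_T(x)|],\ E_{D_T}[|f_S(x) - f_T(x)|] \right)$
      • Bound with terms to be estimated by hypothesis: $\epsilon_T(h) \le \epsilon_S(h) + \frac{1}{2} d_{H\Delta H}(D_S, D_T) + \lambda$, where $\lambda = \epsilon_S(h^*) + \epsilon_T(h^*)$
      • $\frac{1}{2} d_{H\Delta H}(D_S, D_T)$ takes the place of $d_1(D_S, D_T)$, the distance between source feature $D_S$ and target feature $D_T$.
      • $\lambda$ takes the place of the $\min(\cdot)$ term, the distance between source labeling function $f_S$ and target labeling function $f_T$.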
  22. Domain Adaptation Example
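  23. Domain Adaptation Example
      • $\epsilon_T(h) \le \epsilon_S(h) + \frac{1}{2} d_{H\Delta H}(D_S, D_T) + \lambda$
      [Figure: example model with two "reduce" annotations, one of them on $d_{H\Delta H}(D_S, D_T)$.]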
  24. About the Speaker
      Mark Chang
      • Email: ckmarkoh at gmail dot com
      • Facebook: https://www.facebook.com/ckmarkoh.chang
