This document presents a lab seminar on semi-supervised learning. It begins with background on semi-supervised learning and examples of applications. It then discusses common semi-supervised learning methods like EM with generative models, co-training, transductive SVMs, and graph-based methods. Next, it covers assumptions of semi-supervised learning, noting the utility of unlabeled data depends on problem structure matching model assumptions. Finally, it proposes future work on multi-edge graph-based semi-supervised learning.
Semi-Supervised Learning Lab Presentation
1. Nov. 23rd, 2009
{ On SSL, and beyond }
- Theories, Methods, and a Possible Suggestion on Semi-Supervised Learning -
Lab Seminar Presentation
Eunjeong Park
6. Background: The Question (1/2)
• Statistical learning methods require LOTS of training data
– But since we only have a limited amount of labeled data,
– Can we figure out a way for our learning algorithms to take
advantage of all the unlabeled data?
[Figure: a few labeled examples alongside many unlabeled examples]
9. Semi-Supervised Learning: Methodology [1]
• Generative models
– Unlabeled data is used to either modify or reprioritize hypotheses obtained from labeled data alone
– Given the Bayesian formula:
P(y | x) = p(x | y) P(y) / p(x)
we can see that p(x) influences P(y | x) (see the numeric sketch below)
– Mixture models with EM are in this category, and to some extent self-training, too
• Discriminative models
– Standard discriminative training cannot exploit unlabeled data, since p(y | x) is
estimated while ignoring p(x)
– To address this, p(x)-dependent terms are often brought into the objective
function, which amounts to assuming p(y | x) and p(x) share parameters
– Transductive SVM, Gaussian processes, information regularization, graph-based
methods are in this category
※ For more on GM vs. DM, refer to Appendix 1.
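To make the p(x) dependence concrete, here is a small numeric sketch, not from the slides; the means, variance, and priors are illustrative assumptions. For a two-component Gaussian mixture, P(y|x) is computed via Bayes' rule, so the same parameters that shape p(x) = Σ_y p(x|y) P(y) also shape the posterior; re-estimating them from unlabeled data moves the decision boundary.

```python
# Minimal sketch: posterior P(y|x) for a two-Gaussian mixture.
# All numbers below are illustrative assumptions, not from the slides.
from scipy.stats import norm

mu = {0: -1.0, 1: 1.0}    # class-conditional means (assumed)
sigma = 1.0               # shared standard deviation (assumed)
prior = {0: 0.5, 1: 0.5}  # class priors P(y)

def posterior(x, y):
    """P(y|x) = p(x|y) P(y) / p(x), with p(x) the mixture density."""
    px = sum(norm.pdf(x, mu[c], sigma) * prior[c] for c in (0, 1))
    return norm.pdf(x, mu[y], sigma) * prior[y] / px

print(posterior(0.3, 1))  # shifting mu or prior changes this value
```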
10. Semi-Supervised Learning: Previous Methods
• EM w/ Generative Mixture Models (Nigam et al., 2000; Miller & Uyar, 1997)
• Self-Training
• Co-Training and Multiview Learning (Blum & Mitchell, 1998; Goldman & Zhou, 2000)
• TSVMs (Bennett et al., 1999; Joachims, 1999)
• Gaussian Processes
• Information Regularization
• Entropy Minimization
• Graph-based methods (Blum & Chawla, 2001)
Reorganized from Refs. [1] and [2]
※ For more on the use of above methods, refer to Appendix 2.
11. Previous Methods: EM w/ Generative Models (1/3)
[Figure: the basic EM algorithm incorporating unlabeled data [3]]
12. Previous Methods: EM w/ Generative Models (2/3)
• In a binary classification problem, if we assume each
class has a Gaussian distribution, then we can use
unlabeled data to help parameter estimation. [1]
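A minimal sketch of this idea, assuming two one-dimensional Gaussian classes with shared unit variance; this is an illustrative simplification, not the exact algorithm of Nigam et al. [3]. Labeled points keep hard class responsibilities, unlabeled points receive soft responsibilities in the E-step, and both drive the M-step updates.

```python
# EM sketch for a two-Gaussian mixture with labeled + unlabeled data.
# Shared unit variance assumed for simplicity; yl takes values in {0, 1}.
import numpy as np
from scipy.stats import norm

def em_ssl(xl, yl, xu, n_iter=50):
    mu = np.array([xl[yl == 0].mean(), xl[yl == 1].mean()])  # init from labels
    pi = np.array([np.mean(yl == 0), np.mean(yl == 1)])      # class priors
    x = np.concatenate([xl, xu])
    r = np.zeros((len(x), 2))
    r[np.arange(len(xl)), yl] = 1.0       # labeled responsibilities stay fixed
    for _ in range(n_iter):
        lik = pi * norm.pdf(xu[:, None], mu, 1.0)           # E-step (unlabeled)
        r[len(xl):] = lik / lik.sum(axis=1, keepdims=True)
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)   # M-step: means
        pi = r.mean(axis=0)                                 # M-step: priors
    return mu, pi
```

With only a few labeled points per class and many unlabeled points from the same mixture, the mean estimates tighten; if the Gaussian assumption is wrong, the same updates can pull them away, which is the failure mode discussed on slide 22.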
13. Previous Methods: EM w/ Generative Models (3/3)
14. Previous Methods: Co-Training (1/4)
[Figure: a web page for "Professor Cho", linked elsewhere via the anchor text "My Advisor"]
15. Previous Methods: Co-Training (2/4)
• Key Idea: Classifier1 and Classifier2 must…
– Correctly classify labeled examples
– Agree on the classification of unlabeled examples
[Figure: Classifier 1 sees hyperlink (anchor) text only; Classifier 2 sees page text only; both applied to the "Professor Cho" / "My Advisor" example]
16. Previous Methods: Co-Training (3/4) [4]
• Given: labeled data L, unlabeled data U
• Loop:
– Train g1 (hyperlink classifier) using L
– Train g2 (page classifier) using L
– Allow g1 to label p positive, n negative examples from U
– Allow g2 to label p positive, n negative examples from U
– Add these self-labeled examples to L
[Figure: Classifier1 and Classifier2 each produce an answer for the "Professor Cho" / "My Advisor" example]
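A compact sketch of this loop, assuming scikit-learn-style classifiers with fit/predict_proba and two aligned feature matrices per example (one per view); the values of p, n, the round count, and the tie handling are illustrative choices, not fixed by [4].

```python
# Co-training sketch: each view's classifier self-labels its most
# confident positives/negatives from U; these are added to L for both.
import numpy as np

def co_train(g1, g2, X1, X2, y, X1u, X2u, p=1, n=3, rounds=10):
    X1, X2, y = X1.copy(), X2.copy(), y.copy()
    for _ in range(rounds):
        if len(X1u) < p + n:
            break
        g1.fit(X1, y)
        g2.fit(X2, y)
        idx, labels = [], []
        for g, Xu in ((g1, X1u), (g2, X2u)):
            proba = g.predict_proba(Xu)
            idx += list(np.argsort(proba[:, 1])[-p:])  # confident positives
            labels += [1] * p
            idx += list(np.argsort(proba[:, 0])[-n:])  # confident negatives
            labels += [0] * n
        idx = np.array(idx)
        # (a fuller version would resolve duplicate or conflicting picks)
        X1 = np.vstack([X1, X1u[idx]])
        X2 = np.vstack([X2, X2u[idx]])
        y = np.concatenate([y, labels])
        mask = np.ones(len(X1u), dtype=bool)
        mask[idx] = False
        X1u, X2u = X1u[mask], X2u[mask]
    return g1, g2
```

For the web-page task on slide 17, g1 and g2 might be, for instance, naive Bayes classifiers over anchor-text and page-text bags of words.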
17. Previous Methods: Co-Training (4/4)
• Experimental Settings:
– begin with 12 labeled web pages (academic course)
– provide 1,000 additional unlabeled web pages
– average error, learning from labeled data only: 11.1%
– average error, co-training: 5.0%
19. Previous Methods: Graph-Based Methods
• Key idea: Define a graph where…
– nodes are labeled and unlabeled examples in the dataset, and
– edges (may be weighted) reflect the similarity of examples
– Then, nodes connected by a large-weight edge tend to have the
same label, and labels can propagate throughout the graph
• Note: Graph-based methods enjoy nice properties from spectral
graph theory
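A minimal sketch of this propagation, in the spirit of the iterative harmonic-function-style methods surveyed in [1]; the initialization, clamping, and 0.5 threshold are illustrative choices. Labeled nodes are clamped, and each node's score is repeatedly replaced by the weighted average of its neighbors' scores.

```python
# Label-propagation sketch on a weighted similarity graph W.
import numpy as np

def propagate(W, y, labeled, n_iter=200):
    """W: (n, n) symmetric nonnegative weights; y: (n,) labels in {0, 1}
    (values at unlabeled positions are ignored); labeled: boolean mask."""
    f = np.where(labeled, y.astype(float), 0.5)  # init unlabeled nodes at 0.5
    P = W / W.sum(axis=1, keepdims=True)         # row-normalize the weights
    for _ in range(n_iter):
        f = P @ f                                # average over the neighbors
        f[labeled] = y[labeled]                  # clamp the labeled nodes
    return (f > 0.5).astype(int)
```

Since the updates only mix scores along edges, the similarity function that defines W carries the entire modeling burden, exactly the design effort that slide 21 warns about.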
21. Assumptions on SSL: The Utility of Unlabeled Data
• Many SSL papers start with an introduction like…
“labeled data…is often very difficult and expensive to obtain, and
thus…unlabeled data holds significant promise in terms of vastly
expanding the applicability of learning methods [5]”
…but is this necessarily true?
– No! Do not take it for granted!
– Even though you don’t have to spend as much time labeling
training data, you still need to put substantial effort into designing
good models / features / kernels / similarity functions for SSL!
• A good match between problem structure and model assumptions is
necessary to use unlabeled data effectively
– A bad match can degrade classifier performance
22. Assumptions on SSL: An Example (1/2)
• Unlabeled data can degrade the classification performance of generative classifiers [6]
[Figure: Naive Bayes classifiers trained on data generated from a Naive Bayes model (left) and a TAN model (right). Each point summarizes 10 runs on test data; bars cover the 30th to 70th percentiles.]
23. Assumptions on SSL: An Example (2/2)
[Figure: class-conditional distributions of the count of the word ‘Loan’ for Spam=0 vs. Spam=1]
Q1: Is this e-mail spam?
Q2: Was this e-mail written on a Sunday?
25. Future Work: Multi-Edge Graph-Based SSL
• Aside from semi-supervised classification, there are more settings…
– Semi-Supervised Clustering
– Semi-Supervised Regression
• There are also very similar methods such as…
– Active learning
• Based on the theories noted above, here’s my question:
[Diagram: a single target f: x → y, where each example i has multiple feature views <x1i>, <x2i>, <x3i>, <x4i>]
26. Future Work: Multi-Edge Graph-Based SSL
[Figures: two worked examples, Ex1 and Ex2]
28. Appendix 1: GM vs. DM
• Discriminative models
– Methodology: introduce a decision boundary
– From the 1950s, when pattern recognition (PR) was first applied to radar signal
interpretation, until the mid-1990s, this was effectively the sole approach representing PR
– Rosenblatt’s Perceptron (1958) and the PDP school’s MLP (1986) were likewise
proposed along this line
• Generative models
– First introduced in 1996 by Geoffrey Hinton, then a core member of the PDP school (Hinton, G., Using
Generative Models for Handwritten Digit Recognition, tPAMI, 1996)
– As a result, unsupervised learning, which had been regarded as little more than clustering, came
back into the spotlight, and soon gained an ally in subspace analysis (e.g., PCA) and developed rapidly
– That is, there is no guarantee that different classes lie apart from one another; the view is that the
data should instead be described by central distributions that capture them well, i.e., by a mixture of
bases (e.g., a Fourier series)
29. Appendix 2: The Use of SSL Methods [1]
• Do the classes produce well clustered data?
– EM w/ generative mixture models
• Is the existing supervised classifier complicated and hard to modify?
– Self-training
• Do the features naturally split into two sets?
– Co-training
• Already using SVM?
– TSVMs
• Is it true that two points with similar features tend to be in the same class?
– Graph-based methods
30. References
[1] Zhu, X., (2005). Semi-Supervised Learning Literature Survey, Computer Sciences,
University of Wisconsin-Madison.
[2] Seeger, M., (2001). Learning with labeled and unlabeled data (Technical Survey).
[3] Nigam, K., McCallum, A. K., Mitchell, T. M., (2000). Text Classification from
Labeled and Unlabeled Documents using EM, Machine Learning 39, 103-134.
[4] Mitchell, T. M., (1999). The Role of Unlabeled Data in Supervised Learning, Sixth
International Colloquium on Cognitive Science.
[5] Raina, R., Battle, A., Packer, B., Ng, A. Y., (2007). Self-taught Learning: Transfer
Learning from Unlabeled Data, 24th International Conference on Machine Learning.
[6] Cozman, F. G., Cohen, I., Cirelo, M., (2002). Unlabeled Data Can Degrade
Classification Performance of Generative Classifiers, FLAIRS-02.
[7] Balcan, M., Blum, A., Choi, P. P., Lafferty, J., Pantano, B., Rwebangira, M. R.,
Zhu, X., (2005). Person Identification in Webcam Images: An Application of Semi-
Supervised Learning, Proc. of the 22nd ICML Workshop on Learning with Partially
Classified Training Data, Bonn, Germany.