Satoshi Hara, Takanori Maehara. Convex Hull Approximation of Nearly Optimal Lasso Solutions. In Proceedings of 16th Pacific Rim International Conference on Artificial Intelligence, Part II, pages 350--363, 2019.
2. Background: Lasso and Enumeration
■ Lasso: a typical approach for feature selection
$\min_{\beta} \; \tfrac{1}{2}\|X\beta - y\|_2^2 + \rho\|\beta\|_1 =: f(\beta), \quad (X, y) \in \mathbb{R}^{n \times d} \times \mathbb{R}^n$
■ Enumeration for feature selection [Hara & Maehara, AAAI'17]
• Helpful for gaining more insight into the data.
Ordinary Lasso
• One global optimum, i.e.,
one feature set, is obtained.
Enumeration of Lasso
• Several possible solutions, i.e.,
multiple feature sets, are obtained.
I found one feature set that is
helpful for predicting energy
consumption.
Found:
{Wall Area, Glazing Area}
I found several feature sets
that are helpful for predicting
energy consumption.
Found:
{Wall Area, Glazing Area},
{Wall Area, Overall Height},
{Roof Area, Glazing Area}, …
3. Background: Lasso and Enumeration
■ Example: Lasso enumeration on the 20 Newsgroups data
• Identifying relevant words for article classification.
Selected words in the Lasso solution:
adb apple bios bus cable com controller
dos drivers duo fpu gateway ibm ide
mac motherboard simm vlb vram windows
4. Background: Lasso and Enumeration
■ Example: Lasso enumeration on the 20 Newsgroups data
• Identifying relevant words for article classification.
[Figure: enumerated models Model1–Model9, each obtained from the Lasso solution by removing a few words such as motherboard, cable, adb, and drivers]
5. Background: Lasso and Enumeration
■ Example: Lasso enumeration on the 20 Newsgroups data
• Identifying relevant words for article classification.
[Figure: the enumerated models Model1–Model9 again]
Drawback of Enumeration
Enumerated models can be mere combinations of a few representative patterns:
exponentially many combinations of similar models may be found,
and such similar models are not helpful for gaining insight.
6. Goal of This Study
■ Goal
Find a small number of diverse models
(rather than a large number of similar models).
■ Overview of the Proposed Approach
• Define a set of good models: $C(\delta) := \{\beta : f(\beta) \le \delta\}$.
• Find the vertices of $C(\delta)$.
Vertices = sparse models.
Vertices are distinct → diversity.
7. Outline
■ Background and Overview
■ Problem Formulation
■ Proposed Method
■ Experiments
■ Summary
8. Properties of $C(\delta)$
■ $C(\delta) := \left\{\beta : f(\beta) := \tfrac{1}{2}\|X\beta - y\|_2^2 + \rho\|\beta\|_1 \le \delta\right\}$
• The set of models with sufficiently small Lasso objective.
1. $C(\delta)$ consists of smooth boundaries and non-smooth vertices.
• Smooth boundaries = dense models
• Non-smooth vertices = sparse models
2. The convex hull of the set of vertices $V$ can approximate $C(\delta)$ well:
• $\mathrm{conv}(V) \approx C(\delta)$
9. Problem: Approximation of $C(\delta)$
■ Our Approach
Approximate $C(\delta)$ by a set of $K$ points $P = \{p_k\}_{k=1}^{K}$.
■ To attain a good approximation, the vertices $V$ of $C(\delta)$ should be selected as $P$.
10. Problem: Approximation of $C(\delta)$
■ Our Approach
Approximate $C(\delta)$ by a set of $K$ points $P = \{p_k\}_{k=1}^{K}$.
■ Question: How do we measure the approximation quality?
→ We use the Hausdorff distance.
[Figure: $C(\delta)$ and the point set $P$, asking how similar they are]
11. Problem: Approximation of $C(\delta)$
■ Def. Hausdorff distance between two sets
• The maximum margin of the non-overlapping region.
$d_H(S, S') := \max\left\{\sup_{x \in S} \inf_{x' \in S'} \|x - x'\|, \; \sup_{x' \in S'} \inf_{x \in S} \|x - x'\|\right\}$
■ We measure the approximation quality by $d_H$.
Problem: Minimization of the Hausdorff distance
$\min_{P} \; d_H(\mathrm{conv}(P), C(\delta)), \quad \text{s.t.} \; |P| \le K$
[Figure: measuring $d_H$ between $C(\delta)$ and $\mathrm{conv}(P)$]
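For finite point sets, the sup and inf in the Hausdorff distance reduce to max and min, so it can be computed directly. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def hausdorff_distance(A, B):
    """Hausdorff distance between two finite point sets.

    A, B: arrays of shape (m, d) and (n, d).
    Returns the larger of the two directed distances
    max_a min_b ||a - b|| and max_b min_a ||a - b||.
    """
    # Pairwise Euclidean distances, shape (m, n).
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

# Example: a unit square vs. its lower-left triangle.
square = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
triangle = np.array([[0, 0], [1, 0], [0, 1]], dtype=float)
print(hausdorff_distance(square, triangle))  # 1.0: corner (1,1) is 1 away from the triangle
```

The corner $(1,1)$ is at distance 1 from both $(1,0)$ and $(0,1)$, while every triangle point lies in the square, so the distance is 1.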
12. Outline
■ Background and Overview
■ Problem Formulation
■ Proposed Method
■ Experiments
■ Summary
13. Method: Sampling + Greedy Selection
■ Step 1: Sample points from the boundary of $C(\delta)$.
■ Step 2: Greedily select $K$ points to minimize $d_H$.
14. Step 1: Sampling
■ Note: We want to sample as many vertices as possible.
■ Proposed Sampling Method
• Take a random direction.
• Find an "edge" of $C(\delta)$ in that direction.
This method samples vertices with high probability.
15. Step 1: Sampling
■ Finding an "edge":
$\max_{\beta} \; c^\top \beta, \quad \text{s.t.} \; \beta \in C(\delta)$ ($c$: random direction)
■ Finding an "edge" by binary search
• Dual problem (solvable with Lasso solvers):
$\min_{\lambda \ge 0} \max_{\beta} \; c^\top \beta - \lambda (f(\beta) - \delta)$
• Find $\beta$ satisfying $f(\beta) = \delta$ by finding the optimal $\lambda$ with binary search.
[Figure: boundary points obtained for large, small, and optimal $\lambda$]
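As a concrete illustration, the binary-search sampling step might look like the sketch below. This is not the authors' code: the inner maximization (a Lasso-type problem for fixed $\lambda$) is solved with a hand-rolled ISTA loop standing in for a generic Lasso solver, and all function names, iteration counts, and the $\lambda$ bracket are our assumptions.

```python
import numpy as np

def ista(X, y, rho, linear, n_iter=2000):
    """ISTA for min_b 0.5*||X b - y||^2 - linear^T b + rho*||b||_1
    (the inner maximization of the dual, for a fixed lambda)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) - linear)               # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - step * rho, 0.0)  # soft-threshold
    return b

def lasso_objective(X, y, rho, b):
    return 0.5 * np.sum((X @ b - y) ** 2) + rho * np.sum(np.abs(b))

def sample_boundary_point(X, y, rho, delta, c, lam_lo=1e-3, lam_hi=1e3):
    """Find beta with f(beta) ~= delta that is extremal in direction c,
    by binary search on the dual variable lambda: f(beta(lambda))
    decreases monotonically as lambda grows."""
    for _ in range(50):
        lam = np.sqrt(lam_lo * lam_hi)  # geometric midpoint
        b = ista(X, y, rho, c / lam)
        if lasso_objective(X, y, rho, b) > delta:
            lam_lo = lam                # objective too large: grow lambda
        else:
            lam_hi = lam
    return b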
16. Method: Sampling + Greedy Selection
■ Step 1: Sample points from the boundary of $C(\delta)$.
■ Step 2: Greedily select $K$ points to minimize $d_H$.
17. Step 2: Greedy Selection
■ Original problem:
$\min_{P} \; d_H(\mathrm{conv}(P), C(\delta)), \quad \text{s.t.} \; |P| \le K$
■ Approximate $C(\delta)$ with the sampled points $S$: $C(\delta) \approx \mathrm{conv}(S)$.
$\min_{P \subseteq S} \; d_H(\mathrm{conv}(P), \mathrm{conv}(S)), \quad \text{s.t.} \; |P| \le K$
• Remark:
$d_H(\mathrm{conv}(P), \mathrm{conv}(S)) = \max_{q \in S} \min_{q' \in \mathrm{conv}(P)} \|q - q'\|$
18. Step 2: Greedy Selection
■ The problem is NP-hard in general.
• $\min_{P \subseteq S} \; d_H(\mathrm{conv}(P), \mathrm{conv}(S)), \quad \text{s.t.} \; |P| \le K$
■ Our Approach: Greedy Selection
• Initialization step
Select one point $q \in S$;
$P_1 \leftarrow \{q\}$, $S \leftarrow S \setminus \{q\}$, and $t \leftarrow 1$.
• While $t < K$:
$\hat{q} \in \arg\max_{q \in S} \min_{q' \in \mathrm{conv}(P_t)} \|q - q'\|$;
$P_{t+1} \leftarrow P_t \cup \{\hat{q}\}$, $S \leftarrow S \setminus \{\hat{q}\}$, and $t \leftarrow t + 1$.
[Figure: greedily add to $P$ the one point that minimizes the objective]
22. Step 2: Greedy Selection
■ Details of computing $\hat{q} \in \arg\max_{q \in S} \min_{q' \in \mathrm{conv}(P_t)} \|q - q'\|$
■ 1. Computing the min: Quadratic Programming (QP)
$\min_{q' \in \mathrm{conv}(P_t)} \|q - q'\| \;\Leftrightarrow\; \min_{\alpha} \left\|q - \sum_j \alpha_j p_j\right\|, \quad \text{s.t.} \; \alpha \ge 0, \; \sum_j \alpha_j = 1$
■ 2. Computing the max: Lazy Update
• A naive implementation requires searching over all $q \in S$.
• By using the monotonicity of the Hausdorff distance, we can skip redundant computations and accelerate the search.
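The two computations above can be sketched as follows. This is an illustration, not the authors' implementation: scipy's SLSQP is used as a generic stand-in for a dedicated QP solver, the greedy max is computed naively (the lazy-update acceleration is omitted), and all names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def dist_to_hull(q, P):
    """Distance from point q to conv(P) (rows of P), via the QP
    min_a ||q - P^T a||  s.t.  a >= 0, sum(a) = 1."""
    m = P.shape[0]
    obj = lambda a: np.sum((q - a @ P) ** 2)
    cons = ({"type": "eq", "fun": lambda a: np.sum(a) - 1.0},)
    res = minimize(obj, np.full(m, 1.0 / m), method="SLSQP",
                   bounds=[(0.0, None)] * m, constraints=cons)
    return np.sqrt(max(res.fun, 0.0))

def greedy_select(S, K):
    """Farthest-point greedy: repeatedly add the sample farthest
    from the hull of the points chosen so far."""
    S = [np.asarray(q, dtype=float) for q in S]
    chosen = [S.pop(0)]                      # arbitrary initial point
    while len(chosen) < K and S:
        P = np.vstack(chosen)
        d = [dist_to_hull(q, P) for q in S]  # naive search over all q in S
        chosen.append(S.pop(int(np.argmax(d))))
    return chosen

# Example: from the square's corners plus its center, the greedy picks
# spread-out corners and never the interior center.
pts = [[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]]
print(greedy_select(pts, 3))
```

Points inside the current hull get distance 0, so interior (redundant) samples are never selected, which is exactly what yields diverse vertices.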
23. Method: Sampling + Greedy Selection
■ Step 1: Sample points from the boundary of $C(\delta)$.
• Random directions + Lasso + binary search
■ Step 2: Greedily select $K$ points to minimize $d_H$.
• Greedy selection
24. Outline
■ Background and Overview
■ Problem Formulation
■ Proposed Method
■ Experiments
■ Summary
25. Synthetic Experiment: Visualization of $C(\delta)$ and $P$
■ Synthetic Problems
• 2D ver.: $X = \begin{pmatrix} 1 & 1 \\ 1 & 1 + 1/40 \end{pmatrix}$, $y = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$
• 3D ver.: $X = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 + 1/40 & 1 \\ 1 & 1 & 1 + 2/40 \end{pmatrix}$, $y = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$
■ Results
[Figure: $C(\delta)$ and the selected points for the 2D and 3D versions, with the resulting Hausdorff distances]
26. Synthetic Experiment: High-dimensional Data
■ Synthetic data
• $y = X\beta^* + \varepsilon$
• $\beta^* \sim \mathcal{N}(0, \Sigma)$, $\Sigma_{jk} = \exp(-0.1|j - k|)$
• dimensionality of $\beta$: 100
■ Result
• The Hausdorff distance decreases as $K$ increases.
• The Hausdorff distance decreases as the number of sampled points increases.
The effect is marginal, though; in practice, sampling around 1,000 points would suffice.
27. Real-Data Experiment: Diversity Verification
■ Data: 20 Newsgroups
• Classification of news articles into two categories (ibm or mac).
• Feature selection = identification of important words.
$x \in \mathbb{R}^d$: tf-idf weighted bag-of-words
$y \in \{0, 1\}$: categories of articles
Number of data points: 1168
■ Model
• Linear logistic regression + $\ell_1$ penalty
■ Baseline Methods [Hara & Maehara, AAAI'17]
• Enumeration: exact enumeration of the top-$K$ models
• Heuristic: skip similar models during enumeration.
28. Real-Data Experiment: Diversity Verification
■ Comparison of the 500 models found by each method
■ Visualization with PCA
• Projected the found models with PCA.
• The proposed method attained the largest diversity.
Number of distinct words found:
  Enumeration: 39
  Heuristic:   63
  Proposed:    889
Word found?    apple  macs  macintosh
  Enumeration    ✓     ✘      ✘
  Heuristic      ✓     ✓      ✘
  Proposed       ✓     ✓      ✓
Baseline methods found only combinations of a few representative patterns,
and missed some important words.
29. Summary
■ Our Goal
• Find a small number of diverse models for Lasso.
■ Our Method
• Find the "vertices" of the set of models $C(\delta) := \{\beta : f(\beta) \le \delta\}$.
• Problem: Hausdorff distance minimization.
• Method: Sampling + Greedy Selection
■ We verified the effectiveness of the proposed method.
• The proposed method can find points that approximate $C(\delta)$ well,
and obtains more diverse models than the existing enumeration methods.
GitHub: /sato9hara/LassoHull