Graphs are a commonly used data structure for representing real-world relationships, e.g., molecular structures, knowledge graphs, and social and communication networks. Effective encoding of graph information is essential to the success of such applications. In this talk I'll first describe a general deep learning framework, structure2vec, for end-to-end graph feature representation learning. Then I'll present direct applications of this model to graph problems at different scales, including community detection and molecular graph classification/regression. We then extend the embedding idea to temporally evolving user-product interaction graphs for recommendation. Finally, I'll present our latest work on leveraging reinforcement learning for graph combinatorial optimization, including the vertex cover problem for social influence maximization and the traveling salesman problem for scheduling management.

Unlike the previous scenario, here we get a single gigantic graph, so we apply stochastic training with truncated backpropagation through time, which is commonly used in recurrent neural networks.
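
The windowing step behind truncated backpropagation through time can be sketched as follows. This is a minimal illustration under stated assumptions, not the talk's implementation: the event layout `(timestamp, user_id, item_id)`, the function names, and the `step` callback are all hypothetical, and no gradients are computed here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical event layout: (timestamp, user_id, item_id).
Event = Tuple[float, int, int]


def split_into_windows(events: Sequence[Event], window: int) -> List[List[Event]]:
    """Sort events by time and cut them into consecutive fixed-size windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    ordered = sorted(events, key=lambda e: e[0])
    return [ordered[i:i + window] for i in range(0, len(ordered), window)]


def run_truncated(events: Sequence[Event], window: int,
                  step: Callable, state):
    """Carry `state` across windows while processing one window at a time."""
    for chunk in split_into_windows(events, window):
        # In an autograd framework, `state` would be detached here so that
        # backpropagation never reaches events from earlier windows.
        for event in chunk:
            state = step(state, event)
    return state
```

Each window plays the role of one mini-batch: the state flows forward across window boundaries, while the backward pass would be cut at each boundary.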

- 1. Graph Representation Learning with Deep Embedding Approach • Hanjun Dai, Ph.D. student, School of Computational Science & Engineering, Georgia Institute of Technology
- 2. Graph applications • Drug discovery: Dai et al., structure2vec, ICML 2016 • Recommendation: Dai et al., DeepCoevolve, RecSys DLRS 2016 • Knowledge graph: Dai et al., VRN, in submission; Trivedi et al., Know-Evolve, ICML 2017 • TSP, Maxcut, Vertex Cover: Dai et al., S2V-DQN, NIPS 2017
- 3. Outline • Review of traditional approaches • Our architecture • Experiments on RNA and molecules • Extension to social network and recommendation • Application in graph combinatorial optimization
- 4. Review: RNA / Molecule property prediction • Application: high-throughput virtual screening • Pipeline: structured data (e.g., the RNA sequence C U U C A G) → handcrafted feature (bag-of-words counts over substrings such as AA, AC, AU, AG, …, GGGU) → target • Targets: Power Conversion Efficiency (PCE), a regression problem; binding affinity, a binary classification problem
- 5. Review: RNA / Molecule property prediction: problem with handcrafted features • Stage 1: build a kernel matrix with entries such as $k(\chi_2, \chi_3)$, or a high-dimensional explicit bag-of-words feature map (counts over AA, AC, AU, AG, …, GGGU) • Stage 2: train a classifier on top • Not scalable to millions of data points • Constructed features are not aware of the task
- 6. Review: Temporal Recommendation • Who will do what, and when? • Example users: Christine, Alice, David, Jacob; example items: Towel, Shoe, Book • Matrix factorization: user-item matrix $R \approx UV$ • Epoch division over time $t$
- 7. Review: Graph Combinatorial Optimization • NP-hard problems: minimum vertex/set cover • Application for advertisers: influence maximization • 2-approximation for minimum vertex cover: repeat until all edges are covered: select the uncovered edge with the largest total degree • This is a manually designed rule. Can we learn from data?
- 8. Outline • Review of traditional approaches • Our architecture • Experiments on RNA and molecules • Extension to social network and recommendation • Application in graph combinatorial optimization
- 9. Intuitive understanding: local filters • Image: the filter applies to each local patch • Graph: the filter applies to each 1-hop neighborhood
- 10. End-to-end learning • Graphical model for input $\chi$ with observed node features $X_1, \dots, X_4$ and latent variables $H_1, \dots, H_4$ • Iteration 1: node embeddings $\mu_i^{(0)}$ are updated to $\mu_i^{(1)}$ • Iteration $T$: node embeddings $\mu_i^{(T)}$ • Aggregate: $\mu_a(W, \chi) = \mu_1^{(T)} + \mu_2^{(T)} + \cdots$ • Label $y$: classification/regression with parameter $V$
- 11. Outline • Review of traditional approaches • Our architecture • Experiments on RNA and molecules • Extension to social network and recommendation • Application in graph combinatorial optimization
- 12. Experiment on RNA sequences [Dai et al., Bioinformatics 2017] • Dataset: MITOMI 2.0 • 28 TFs • Regression problem • [Figure: two bar charts of MAE (axis 0 to 0.12) per TF, comparing PWM, LM, SVR, DNN, CNN, FS, and S2V. TFs: Ace2, Aft1, Aft2, Bas1, Cad1, Cbf1, Cin5, Cup9, Dal80, Gat1, Gcn4, Mata2, Mcm1, Met31, Met32, Msn1, Msn2, Nrg2, Pdr3, Pho4, Reb1, Rox1, Rpn4, Sko1, Stb5, Yap1, Yap3, Yap7]

  | Metric | S2V | Cmp |
  | --- | --- | --- |
  | RMSE | 0.035 | 0.039 |
  | PCC | 0.62 | 0.45 |
  | SCC | 0.29 | 0.26 |
- 13. Experiment on Molecules • Dataset: Harvard Clean Energy Project; size 2.3 million; average 28 nodes and 33 edges per molecule • Task: predict the Power Conversion Efficiency (PCE, 0-12%) of organic solar panel materials

  | Method | Test MAE | Test RMSE | # parameters |
  | --- | --- | --- | --- |
  | Mean predictor | 1.986 | 2.406 | 1 |
  | WL level-3 | 0.143 | 0.204 | 1.6M |
  | WL level-6 | 0.096 | 0.137 | 1378M |
  | DE-MF | 0.091 | 0.125 | 0.1M |
  | DE-LBP | 0.085 | 0.117 | 0.1M |
- 14. Experiment on Molecules • [Figure: MAE (0.085 to 0.280) versus number of parameters (0.1M to 1000M) for Embedded MF, Embedded BP, Weisfeiler-Lehman Level 6, and Hashed WL Level 6] • Embedding reduces model size by 10,000x! [Dai, Dai & Song 2016]
- 15. Interpretable results • Sequence motifs • Molecule fragments • Examples shown: effective (> 10) and ineffective (< 0.5)
- 16. Outline • Review of traditional approaches • Our architecture • Experiments on RNA and molecules • Extension to social network and recommendation • Application in graph combinatorial optimization
- 17. Extension to network analysis • $\mathcal{L} = \sum_{v \in \mathcal{V}} \mathrm{loss}(y_v, y(\mu_v))$ • One loss term per node
- 18. Experiment on network analysis: document classification in citation networks • [Figure: classification accuracy on citation networks (axis 40% to 90%) on Citeseer, Cora, and Pubmed for DeepWalk, node2vec, and structure2vec]

  | | Citeseer | Cora | Pubmed |
  | --- | --- | --- | --- |
  | # nodes | 3,327 | 2,708 | 19,717 |
  | # edges | 4,732 | 5,429 | 44,338 |
  | # classes | 6 | 7 | 3 |
  | Label rate | 0.036 | 0.052 | 0.003 |
  | Features | 3,703 | 1,433 | 500 |
- 19. Experiment on network analysis • [Figure: two plots, Blogcatalog (axis 15 to 30) and Wikipedia (axis 5 to 21), versus fraction of training data (0.1 to 0.9), comparing deepwalk, node2vec, gcn, and s2v]

  | | Blogcatalog | Wikipedia |
  | --- | --- | --- |
  | # nodes | 10,312 | 4,777 |
  | # edges | 333,983 | 184,812 |
  | # classes | 39 | 40 |
  | Task | Group membership | POS tag |
- 20. Dynamic Graphs for Recommendation [Dai et al. 2016] • Represent the interactions as a latent variable model (LVM) on the graph $G = (\mathcal{V}, \mathcal{E})$, with user/item raw features $X_1, \dots, X_6$, latent variables $H_1, \dots, H_9$, and interaction time/context at times $t_0, t_1, t_2, t_3$ • Unroll the interaction along the timeline, using (1) the bipartite interaction graph and (2) the temporal ordering of events • Mini-batch training using truncated backpropagation through time (BPTT)
- 21. Experiment on recommendation • Favors "recurrent" events: watching TV programs, discussing on forums, visiting restaurants
- 22. Outline • Review of traditional approaches • Our architecture • Experiments on RNA and molecules • Extension to social network and recommendation • Application in graph combinatorial optimization
- 23. Learning graph opt: Motivation • Problem: minimum vertex/set cover • Branch and bound: exponential time complexity • Constructive approximation: results too weak • These approaches cannot learn from solved instances!
- 24. Learning graph opt: Motivation • Data from the same or a similar distribution: social network (Barabási–Albert); road network (fixed graph with evolving edge weights). Images taken from Wikipedia • Supervised learning? No such supervision • Reinforcement learning! Learning by trial and error
- 25. Learning graph opt: RL background [Mnih et al., Nature 2015] • State $S$: current screen • Action $i$: move your board left / right • Reward $R(t)$: score you earned at the current step • Action value function $Q(S, i)$: your predicted future total rewards • Policy $\pi(s)$: how to choose your action • Greedy policy: $i^* = \arg\max_i Q(S, i)$
- 26. Learning graph opt: RL on graphs [Dai et al., NIPS 2017] • Problem: $\min_{x_i \in \{0,1\}} \sum_{i \in \mathcal{V}} x_i \quad \text{s.t.} \quad x_i + x_j \ge 1,\ \forall (i, j) \in \mathcal{E}$ • State $S$: current partial solution • Reward: $r_t = -1$ • Action value function $Q(S, i)$: expected negative future loss • Greedy policy $\pi(s)$: add the best node
- 27. Learning graph opt: action-value function • Parameterize the action value function $Q(S, i)$ with structure2vec • Train with Deep Q-Learning (DQN) • Bellman optimality equation
- 28. Learning graph opt: quantitative comparison • Approximation ratio ≈ 1 • A distribution of scale-free networks • Optimal solution approximated by running CPLEX for 1 hour
- 29. Learning graph opt: quantitative comparison • [Figure: generalization to large instances; approximation ratio axis from 1 to 1.007] • Train on small graphs with 50-100 nodes • Generalizes not only to graphs from the same distribution but also to larger graphs • Approximation ratio < 1.007
- 30. Learning graph opt: time-solution tradeoff • [Figure: time-solution tradeoff for Embedded MF, RNN, CPLEX 1st, CPLEX 2nd, CPLEX 3rd, CPLEX 4th, 2-approx, and 2-approx +] • Generate 200 Barabási-Albert networks with 300 nodes • Let CPLEX produce its 1st, 2nd, 3rd, and 4th feasible solutions • Embedding produces an algorithm with a good tradeoff!
- 31. Learning graph opt: real-world data • MemeTracker graph (http://snap.stanford.edu/netinf/#data): 960 nodes and 5000 edges • Learning from sampled subgraphs

  | Method | Approximation ratio | Solution size (nodes) |
  | --- | --- | --- |
  | Optimal | 1.00 | 473 |
  | S2V-DQN | 1.002 | 474 |
  | MVCApprox-Greedy | 1.222 | 578 |
  | MVCApprox | 1.408 | 666 |
- 32. Learning graph opt: learned strategy • Learned a greedy algorithm which is different from known ones
- 33. Learning graph opt: other problems • Maximum cut • Traveling Salesman Problem [Figure: optimal tour versus solution found] • Set cover: learn with a bipartite graph (set cover image taken from Wikipedia)
- 34. Thanks to my collaborators in this project • Advisor: Le Song • Collaborators (alphabetical order): Bo Dai, Bistra Dilkina, Elias Khalil, Rakshit Trivedi, Yichen Wang, Yuyu Zhang
- 35. Q&A
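
The iterative embedding on slide 10 can be illustrated with a short NumPy sketch. The slides do not give the exact update rule, so the form below is an assumption: each node embedding is recomputed from its own features plus the sum of its neighbors' previous embeddings, passed through a ReLU, and the graph embedding is the sum of the final node embeddings. The function name `s2v_embed` and the weight names `W1`, `W2` are hypothetical.

```python
import numpy as np


def s2v_embed(adj, X, W1, W2, T=3):
    """Run T rounds of neighborhood aggregation and sum-pool the result.

    adj: (n, n) adjacency matrix, X: (n, p) node features,
    W1: (d, p) feature weights, W2: (d, d) neighbor weights (assumed form).
    Returns the (n, d) node embeddings and the (d,) graph embedding.
    """
    n = adj.shape[0]
    mu = np.zeros((n, W1.shape[0]))           # mu^(0) = 0 for every node
    for _ in range(T):
        neighbor_sum = adj @ mu                # sum of neighbors' embeddings
        mu = np.maximum(0.0, X @ W1.T + neighbor_sum @ W2.T)  # ReLU update
    return mu, mu.sum(axis=0)                  # aggregate by summation
```

Because the pooling is a sum over nodes, the graph embedding does not depend on the order in which nodes are listed. In an end-to-end setting, `W1` and `W2` would be trained jointly with the classifier or regressor on top.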
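
The hand-designed rule on slide 7 can be sketched in a few lines of Python. Two details are assumptions, since the slide does not state them: "total degree" is taken to be the sum of the two endpoint degrees in the original graph, and both endpoints of the selected edge are added to the cover (which is what yields the factor-2 guarantee, because the selected edges form a matching). The function name is hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def greedy_vertex_cover(edges: Iterable[Tuple[Hashable, Hashable]]) -> Set[Hashable]:
    """Pick the uncovered edge with the largest endpoint-degree sum, repeatedly."""
    uncovered = [tuple(e) for e in edges]
    degree = {}
    for u, v in uncovered:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    cover: Set[Hashable] = set()
    while uncovered:
        # Assumed tie-breaking: the first edge in input order wins.
        u, v = max(uncovered, key=lambda e: degree[e[0]] + degree[e[1]])
        cover.update((u, v))                   # add both endpoints
        uncovered = [e for e in uncovered
                     if e[0] not in cover and e[1] not in cover]
    return cover
```

For example, on the path `0-1-2-3` the middle edge has the largest degree sum, so the sketch returns `{1, 2}`, which happens to be optimal for that graph.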
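
The approximation ratios reported for the MemeTracker graph on slide 31 follow directly from the solution sizes, taking the ratio to be solution size divided by the optimal size of 473 nodes. The short check below reproduces them; the variable names are illustrative.

```python
# Solution sizes (number of nodes in the cover) as listed on slide 31.
optimal_size = 473
solution_sizes = {
    "S2V-DQN": 474,
    "MVCApprox-Greedy": 578,
    "MVCApprox": 666,
}

# Approximation ratio = solution size / optimal size, rounded to 3 decimals.
ratios = {name: round(size / optimal_size, 3)
          for name, size in solution_sizes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

In other words, the learned policy uses one node more than the optimal cover, while the two hand-designed baselines use 105 and 193 more.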
