License: CC Attribution-ShareAlike License


- 1. Natalino Busa Head of Applied Data Science
- 2. Data Scientist, Big and Fast Data Architect. Currently at Teradata. Previously: Enterprise Data Architect at ING; Senior Researcher at Philips Research. Interests: Spark, Flink, Cassandra, Akka, Kafka, Mesos; Anomaly Detection, Time Series, Deep Learning
- 3. Data Science: approaches Supervised: - you know what the outcome must be Unsupervised: - you don’t know what the outcome must be Semi-Supervised: - you know the outcome only for some samples
- 4. Popularity of Neural Networks: “The cat neuron” Andrew Ng, Jeff Dean et al: 1000 Machines 10 Million images 1 Billion connections Train for 3 days http://research.google.com/archive/unsupervised_icml2012.html
- 5. Popularity of Neural Networks: “AI at facebook” Yann LeCun Director of AI research at Facebook Ask the AI what it sees in the image “Is there a baby?” Facebook’s AI: “Yes.” “What is the man doing?” Facebook’s AI: “Typing.” “Is the baby sitting on his lap?” Facebook’s AI: “Yes.” http://www.wired.com/2015/11/heres-how-smart-facebooks-ai-has-become/
- 6. Data Science: approaches Supervised: - you know what the outcome must be Unsupervised: - you don’t know what the outcome must be Semi-Supervised: - you know the outcome only for some samples
- 7. Unsupervised Learning - Clustering, Feature extraction Imaging, Medical data, Genetics, Crime patterns, Recommender systems, Climate hot spots analysis, anomaly detection … Given a set of items, it answers the question “how can we efficiently describe the collection?” It defines a measure of “similarity” between items.
- 8. Supervised Learning - Classification Marketing Churn, Credit Loan, Success rate Insurance Defaulting, Health conditions and pathologies Categorization of wine, real estate, … Given the values of some properties, it answers the question “to which class/group does this item belong?”
- 9. Classification: Dimensionality matters - Number of dimensions or features of your input data - Statistical relations, smoothness of the data - Embedded space input : 784 dimensions output: 10 classes input : 4 dimensions output: 3 classes 28x28 pixels
- 10. AI, complexity and models Does it do well on Training Data? Does it do well on Test Data? Bigger Neural Network (rocket engine) More Data (rocket fuel) yes yes no no Done? Different Architecture (new rocket) no https://www.youtube.com/watch?v=CLDisFuDnog
- 11. Evolution of Machine Learning Input Hand Designed Program Rule-based System Output Prof. Yoshua Bengio - Deep Learning https://youtu.be/15h6MeikZNg
- 12. Evolution of Machine Learning Input Hand Designed Program Input Rule-based System Output Hand Designed Features Mapping from features Output Classic Machine Learning Prof. Yoshua Bengio - Deep Learning https://youtu.be/15h6MeikZNg
- 13. Evolution of Machine Learning Input Hand Designed Program Input Input Rule-based System Output Hand Designed Features Mapping from features Output Learned Features Mapping from features Output Classic Machine Learning Representational Machine Learning Prof. Yoshua Bengio - Deep Learning https://youtu.be/15h6MeikZNg
- 14. Evolution of Machine Learning Input Hand Designed Program Input Input Rule-based System Output Hand Designed Features Mapping from features Output Learned Features Mapping from features Output Classic Machine Learning Input Learned Features Learned Complex features Output Mapping from features Representational Machine Learning Deep Learning Prof. Yoshua Bengio - Deep Learning https://youtu.be/15h6MeikZNg
- 15. “dendrites” Axon’s response Activation function From Biology to a Mathematical Model
- 16. Logit model: Perceptron 1 Layer Neural Network Takes n input features and maps them to a soft “binary” space ∑ x1 x2 xn f
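The mapping above can be sketched in plain Python: a weighted sum of the inputs squashed by a sigmoid into the soft “binary” space (0, 1). The weights and inputs below are made-up illustrative values, not from the slides:

```python
import math

def perceptron(x, w, b):
    """One logit unit: weighted sum of inputs, squashed by a sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear combination
    return 1.0 / (1.0 + math.exp(-z))             # logistic activation in (0, 1)

# Illustrative values: 2 input features, 2 weights, zero bias
output = perceptron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
```

A zero weighted sum lands exactly in the middle of the soft binary space, at 0.5.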
- 17. Multiple classes: Softmax From soft binary space to predicting probabilities: Take n inputs, Divide by the sum of the predicted values ∑ x1 x2 xn f ∑ f softmax Cat: 95% Dog: 5% Values between 0 and 1 Sum of all outcomes = 1 It behaves like a probability, but it’s just an estimate!
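The “divide by the sum” step above is softmax: exponentiate each input, then normalize so the outputs are in (0, 1) and add up to 1. A minimal sketch (the two logits are illustrative values):

```python
import math

def softmax(logits):
    """Exponentiate, then divide by the sum so the outputs add up to 1."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Two classes, e.g. cat vs dog; the first logit is larger,
# so it gets the larger estimated probability
probs = softmax([2.0, -1.0])
```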
- 18. Cost function: Supervised Learning The actual outcome differs from the desired outcome We measure the difference! The difference can be measured in various ways: - Mean absolute error (MAE) - Mean squared error (MSE) - Categorical Cross-Entropy Compares estimated probability vs actual probability
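The last option, categorical cross-entropy, compares the estimated probabilities against the one-hot target: a confident, correct estimate yields a small loss, a poor one a large loss. A sketch with made-up probability vectors:

```python
import math

def categorical_cross_entropy(p_true, p_est, eps=1e-12):
    """-sum(t * log(q)) over the classes; eps guards against log(0)."""
    return -sum(t * math.log(max(q, eps)) for t, q in zip(p_true, p_est))

# Target class is the first one (one-hot vector [1, 0, 0])
loss_good = categorical_cross_entropy([1, 0, 0], [0.9, 0.05, 0.05])  # confident, correct
loss_bad = categorical_cross_entropy([1, 0, 0], [0.3, 0.4, 0.3])     # hesitant, wrong-ish
```

The worse estimate gets the larger loss, which is exactly the signal the training procedure will push against.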
- 19. Minimize cost: How to Learn? The cost function depends on: - Parameters of the model - How the model “composes” Goal : modify the parameters to reduce the error! Vintage math from last century
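The “vintage math” is the chain rule plus gradient descent: compute the derivative of the cost with respect to each parameter, then nudge the parameter against it. A one-parameter sketch on a toy quadratic cost (illustrative, not the slides’ model):

```python
def cost(w):
    return (w - 3.0) ** 2      # toy cost, minimized at w = 3

def grad(w):
    return 2.0 * (w - 3.0)     # its derivative

w = 0.0                        # initial parameter
lr = 0.1                       # learning rate
for _ in range(100):
    w -= lr * grad(w)          # step against the gradient to reduce the cost
```

After 100 steps `w` sits very close to the minimizer 3.0; backpropagation is this same idea applied through every layer of the network via the chain rule.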
- 20. Build deeper networks Stack layers of perceptrons - “Sequential Network” - Back propagate the error SOFTMAX Input parameters Classes (estimated probabilities) Feed-forward Cost function supervised : actual output Correct parameters
- 21. Some problems - Calculating the derivative of the Cost function - can be error prone - Automation would be nice! - Complex network graph = complex derivative - Dense Layers (Fully connected) - Harder to converge - Number of parameters grows fast! - Overfitting and Parsimony - Learn “well”, generalization capacity - Be efficient in the number of parameters
- 22. Some Solutions - Calculating the derivative of the Cost function - Software libraries - GPU support for computing vectorial and tensorial data - New Layer Types - Convolution Layers 2D/3D - Dropout layer - Fast activation functions - Faster learning methods - Derived from Stochastic Gradient Descent (SGD) - Weight initializations with Auto-Encoders and RBM
- 23. Convolutional Networks Idea 1: reuse the same weights while scanning the image Idea 2: subsample the results from layer to layer
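Both ideas can be sketched in a few lines of NumPy (a toy illustration, not the slides’ network): one small kernel is reused at every image position (weight sharing), and max-pooling then subsamples the resulting feature map:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide ONE shared kernel over the image: weight sharing (idea 1)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Keep the max of each size x size patch: subsampling (idea 2)."""
    h, w = x.shape
    return (x[:h - h % size, :w - w % size]
            .reshape(h // size, size, w // size, size).max(axis=(1, 3)))

img = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
feat = conv2d(img, np.ones((2, 2)))              # 3x3 feature map from only 4 shared weights
pooled = max_pool(feat)                          # 1x1 after 2x2 subsampling
```

Note the parameter count: the whole feature map is produced by just 4 weights, instead of one weight per input-output pair as in a dense layer.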
- 24. Fast Activation Functions Idea: don’t use complex exponential functions, piecewise-linear functions are fast to compute, and easy to differentiate!
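The standard example is the ReLU: no exponentials anywhere, and the derivative is trivially 0 or 1. A minimal sketch:

```python
def relu(z):
    """max(0, z): cheap to compute, piecewise linear."""
    return z if z > 0 else 0.0

def relu_grad(z):
    """The derivative is simply 0 or 1 (undefined at exactly 0)."""
    return 1.0 if z > 0 else 0.0
```

Compare this with the sigmoid, whose forward pass needs an `exp` and whose gradient shrinks towards zero for large inputs, slowing down learning in deep stacks.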
- 25. Dropout Layer, Batch Weight Normalization Dropout: Randomly set some of the inputs to zero. It improves generalization and makes the network function more robust to errors. Batch Weight Normalization: Normalize the activations of the previous layer at each batch.
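Dropout at training time can be sketched in NumPy. This uses the common “inverted dropout” convention, where the surviving activations are scaled by 1/(1-p) so the expected activation is unchanged (the data and drop rate below are illustrative):

```python
import numpy as np

def dropout(x, p, rng):
    """Zero each input with probability p; scale survivors by 1/(1-p)
    so the expected activation stays the same (inverted dropout)."""
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(1000)              # toy activations
y = dropout(x, p=0.1, rng=rng) # roughly 10% of entries zeroed, rest scaled up
```

At test time the layer is simply disabled, which is why the scaling is applied during training rather than at inference.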
- 26. Efficient Symbolic Differentiation There are good libraries which symbolically calculate the derivatives of an arbitrary number of stacked layers ● efficient symbolic differentiation ● dynamic C code generation ● transparent use of a GPU (e.g. Theano, TensorFlow, CNTK)
- 27. Efficient Symbolic Differentiation (2) There are good libraries which symbolically calculate the derivatives of an arbitrary number of stacked layers ● efficient symbolic differentiation ● dynamic C code generation ● transparent use of a GPU
  >>> import theano
  >>> import theano.tensor as T
  >>> from theano import pp
  >>> x = T.dscalar('x')
  >>> y = x ** 2
  >>> gy = T.grad(y, x)
  >>> f = theano.function([x], gy)
  >>> pp(f.maker.fgraph.outputs[0])
  '(2.0 * x)'
- 28. Higher Abstraction Layer: Keras Keras: Deep Learning library for Theano and TensorFlow - Easier to stack layers - Easier to train and test - More ready-made blocks http://keras.io/
- 29. Example 1: Iris classification Categorize Iris flowers based on - Sepal length/width - Petal length/width 3 classes, Dataset is quite small (150 samples) - Iris Setosa - Iris Versicolour - Iris Virginica input : 4 dimensions output: 3 classes
- 30. Iris classification: Network
  model = Sequential()
  model.add(Dense(15, input_shape=(4,)))
  model.add(Activation('relu'))
  model.add(Dropout(0.1))
  model.add(Dense(10))
  model.add(Activation('relu'))
  model.add(Dropout(0.1))
  model.add(Dense(nb_classes))
  model.add(Activation('softmax'))
  Layers: Dense(relu) → Dropout 10% → Dense(relu) → Dropout 10% → Dense(softmax) Classes: Setosa, Versicolour, Virginica Train-test split: 80% - 20% Test accuracy: 96%
- 31. Example 2: telecom customer marketing Semi-synthetic dataset The "churn" data set was developed to predict telecom customer churn based on information about their account. The data files state that the data are "artificial based on claims similar to real world". These data are also contained in the C50 R package. 1 class (churn) Dataset is quite small (about 3000 samples) 17 input dimensions: State, account length, area code, phone number,international plan,voice mail plan,number vmail messages,total day minutes,total day calls,total day charge,total eve minutes,total eve calls,total eve charge,total night minutes,total night calls,total night charge,total intl minutes,total intl calls,total intl charge,number customer service calls
- 32. Churn telecom: Network
  model = Sequential()
  model.add(Dense(50, input_shape=(17,)))
  model.add(Activation("hard_sigmoid"))
  model.add(BatchNormalization())
  model.add(Dropout(0.1))
  model.add(Dense(10))
  model.add(Activation("hard_sigmoid"))
  model.add(BatchNormalization())
  model.add(Dropout(0.1))
  model.add(Dense(1))
  model.add(Activation("sigmoid"))
  Layers: Dense(hard_sigmoid) → BatchNorm → Dropout 10% → Dense(hard_sigmoid) → BatchNorm → Dropout 10% → Dense(sigmoid) Output: Churn / No-Churn Train-test split: 80% - 20% Test accuracy: 82%
- 33. Models: Small Data, Big Data - Not all domains have large amounts of data - Think of Clinical Tests, or Lengthy/Costly Experiments - Small specialized data sets and Neural Networks - Good for complex non-linear separation of classes Interesting Read: https://medium.com/@ShaliniAnanda1/an-open-letter-to-yann-lecun-22b244fc0a5a#.ngpal1ojx
- 34. Conclusions - Neural Networks can be used for small data as well - Other methods might be more efficient in these scenarios - Neural Networks are an extension of GLMs and linear regression - Learn Linear Regression, GLM, SVM as well - Random Forests and Boosted Trees are an alternative - More data = Bigger and better Neural Networks - We have some tools to jump start analysis
- 35. Connect on Twitter and Linkedin ! Thanks!
