Are we reaching a data science singularity?

This year I have been kindly invited to give a keynote talk at Big Data Spain, which will be held in Madrid on 17-18 November. This time, rather than diving into a specific technology or tool, I am reflecting on the state of data analytics, and on how cloud, technology and data science are brewing a possible recipe for analytics at scale, towards AI, prescriptive analytics and cognitive processing.

Although it might be a bit ahead of the current state of development in analytical solutions and databases, I am starting to see clear early signals that something amazing is hatching in the realm of data processing, and I would like to share some of these facts and elements with the audience of Big Data Spain. I would like to stay grounded in current technology developments, but also let the imagination soar by showing that in data analytics today the whole is much more than the sum of its parts.
Are we reaching a Data Science Singularity? - How Cognitive Computing is emerging from Machine Learning Algorithms, Big Data Tools, and Cloud Services

Prescriptive analytics is the ultimate analytical step: it goes beyond predictions into the realm of goal-oriented recommendations. As such, we could consider prescriptive analytics a particular sort of cognitive computing. In 2016, how far are we actually from cognitive computing? In this talk, I will describe the latest advances in machine learning algorithms, big data tools and cloud engineering practices.
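To make the step from prediction to prescription concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the linear demand model `predict_demand`, the `revenue` goal, and the grid of candidate prices are all hypothetical. The point is only that a prescriptive step wraps a predictive model in a search for the action that best serves a goal.

```python
from typing import Callable, Iterable, Tuple


def predict_demand(price: float) -> float:
    """Hypothetical predictive model: expected units sold at a given price."""
    return max(0.0, 100.0 - 10.0 * price)


def revenue(price: float, units: float) -> float:
    """Hypothetical goal: the quantity we want to maximize."""
    return price * units


def prescribe(
    actions: Iterable[float],
    predict: Callable[[float], float],
    goal: Callable[[float, float], float],
) -> Tuple[float, float]:
    """Return the action with the highest goal value, together with that value."""
    best = max(actions, key=lambda a: goal(a, predict(a)))
    return best, goal(best, predict(best))


# Candidate actions: prices from 1.0 to 12.0 in steps of 0.5 (an assumption).
candidate_prices = [p / 2 for p in range(2, 25)]

# Prediction answers "what happens at price p?";
# prescription answers "which price should we choose?".
best_price, best_revenue = prescribe(candidate_prices, predict_demand, revenue)
print(best_price, best_revenue)
```

In a real setting the predictive model would be learned from data and the action space would be far larger, but the division of labour between the two steps stays the same.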

These are the ingredients which are blended together to brew modern AI, prescriptive analytics and cognitive processing solutions. As data and algorithms are made available in large cloud computing clusters, higher-level, cognitive-like services will solve real-world, complex and often ambiguous cases.

Finally, I will touch on the topic of meta-data science and how automated data science could (re)define the role of the data scientist in the coming years.
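As a rough illustration of what automated data science could mean in practice, the sketch below automates one small decision a data scientist normally makes by hand: choosing a hyperparameter by cross-validation. It is a toy, not the AutoML system mentioned in the slides. The 1-D nearest-neighbour classifier, the names `knn_predict`, `loo_accuracy` and `auto_select`, and the data in `toy_data` are all assumptions invented for this example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, int]  # (feature, label)


def knn_predict(train: Sequence[Point], x: float, k: int) -> int:
    """Majority vote among the k nearest training points (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def loo_accuracy(data: Sequence[Point], k: int) -> float:
    """Leave-one-out cross-validated accuracy for a given k."""
    points = list(data)
    hits = 0
    for i, (x, y) in enumerate(points):
        train = points[:i] + points[i + 1:]  # hold out point i
        hits += int(knn_predict(train, x, k) == y)
    return hits / len(points)


def auto_select(data: Sequence[Point], grid: List[int]) -> Tuple[int, Dict[int, float]]:
    """Score every candidate k and return the best one with all scores."""
    scores = {k: loo_accuracy(data, k) for k in grid}
    best_k = max(scores, key=lambda k: scores[k])
    return best_k, scores


# Hypothetical toy data: mostly label 0 on the left and 1 on the right, with noise.
toy_data: List[Point] = [
    (1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 0),
    (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 0), (10.0, 1),
]
grid = [1, 3, 5]

best_k, scores = auto_select(toy_data, grid)
print(best_k, scores)
```

A full automated pipeline would search over model families and preprocessing steps as well, but the loop has the same shape: propose a configuration, score it on held-out data, keep the best.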


Why is AI so difficult?

Videos on AI

Yann LeCun:
Andrej Karpathy:
Nando de Freitas:
Richard Socher:

for more info see:

Are we reaching a data science singularity?

  1. Are we reaching a data science singularity? Natalino Busa (@natbusa), Head of Data Science, Teradata
  6. What about (data) science? Technologies and tools are driving innovation in data analytics.
  7. Man - Machine as integrated cognitive systems
  8. Learning: The Scientific Method. Hans Christian Ørsted, "First Introduction to General Physics" (1811). Diagram labels: observation, hypothesis, deduction, synthesis, experiment. Icons made by Gregor Cresnar, licensed under CC 3.0 BY
  9. Innovation in Data Analytics: Cloud, Community, AI & ML
  10. Cloud
  11. “we live in an age of open source datacenters, so we can stack all these things together and we have open source from the ground to ceiling.” Sam Ramji, CEO of Cloud Foundry
  12. Analytics in the cloud. Bare Metal: Physical Machines; IaaS: Virtual Resources; CaaS: Containers; dPaaS: Datastores, Data Engines; iPaaS: Tools Integration, Flows & Processes; DAaaS: Data Analytics as a Service
  13. DAaaS: AI and ML APIs. Cloud Computing for Deep Neural Networks: Models, Compute (Train, Score), and Data. AI and ML models for: Speech (audio); Language (text); Vision (images/video); Data (classification, regression, clustering, anomaly detection)
  14. Ephemeral Computing Clusters on a Cloud. Timeline (data): create, load, compute, store, destroy
  15. dPaaS: Analytical clusters. Ephemeral: Short-Lived, Data Exploration, Isolated, Personal, Simple Access Management. vs Permanent: Long-Lived, Production / Operations, Co-Ordinated, Complex Access Management
  16. GPUs and Distributed Computing. GPU support is coming in Kubernetes, Mesos, Spark. Diagram labels: out, up, CPU, R, Python, Spark, TensorFrames
  17. Community
  18. Community: Develop - Use - Share
  19. Sharing is caring … speed. Jupyter notebooks: share ideas, code, and data; share innovation and scientific results
  20. Artificial Intelligence, Machine Learning
  21. Google open-sources NLP parser scoring 95% in grammar accuracy
  22. Deep Learning in Language Parsing
  23. Semantic Search: TDA + NNs. Word2Vec, Par2Vec, Doc2Vec
  24. Lip reading: LipNet achieves 93.4% accuracy on the GRID corpus.
  25. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Caiming Xiong, Stephen Merity, Richard Socher
  26. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. Local context, wider context. NLP, Attention Masks. Semantic Embeddings from Text, Images
  27. Network Traffic Patterns Classification
  28. Network Intrusion Detection. The dataset contains 130 million flow records involving 12,027 distinct computers over 36 days (not the full 58 days claimed for the entire data release). Each record consists of: time (to the nearest second), duration, source and destination computer ids, source and destination ports, protocol, number of packets and number of bytes. Techniques: TDA, Dimensionality Reduction
  29. Approaching (Almost) Any Machine Learning Problem - Abhishek Thakur, Kaggle Grandmaster. Diagram labels: data, labels; raw data: tables, files; useful data; data munging; feature engineering; tabular data ready for ML
  30. AutoML challenge: based on scikit-learn; 15 classifiers; 14 feature preprocessing methods; 4 data preprocessing methods; 110 hyperparameters. Supervised classification challenge: 100 different datasets
  31. Artificial + Human Intelligence
  32. Human cognitive biases: Too much information; Not enough meaning; What should we remember?; Need to act fast
  33. Man vs Machine cognitive limits: Model generation, Explanation, Unsupervised Planning; Too much information, Not enough meaning, Need to act quickly, Memory limits
  34. "Theorems often tell us complex truths about the simple things, but only rarely tell us simple truths about the complex ones." Marvin Minsky, K-Lines: A Theory of Memory (1980)
  35. Data Science: wear the AI/ML Lenses. We are entering a new era of intelligent machines. Boost our understanding of data. Focus on higher-level analyses
  36. Intelligent Data Systems: Long live the “database”. Wikipedia: A database is an organized collection of data. Diagram labels: DATA, New-SQL, ML, AI, SQL, Python - Scala - R, NLP, UX, Speech, COG
  37. The Database is never going to be the same.
  38. Thank you. @natbusa
  39. Credits. Cover: courtesy of Big Data Spain. Pictures: ,_designed_by_John_Manoogian_III_(jm3).jpg Visualizations: Icons: Icons made by Gregor Cresnar, licensed under CC 3.0 BY
  40. Bonus slides
  41. AI & ML: curated list of links. Applications. Why is AI so difficult? YouTube, great videos on AI: Yann LeCun, Andrej Karpathy, Nando de Freitas, Richard Socher
  42. AI & ML: curated list of links. NLP; Video, Images; Hybrid Deep Learning Networks; Topological Data Analysis (TDA), Dim Reduction; Meta Learning
  43. Curated list of links. Cognitive sciences; Cloud: The Making of a Cloud Native Application Platform - Sam Ramji; GPU and distributed Computing; Collaborative coding and research
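
The ephemeral cluster lifecycle from slide 14 (create, load, compute, store, destroy) maps naturally onto a context manager. The sketch below is a minimal Python illustration under stated assumptions: `EphemeralCluster` is a hypothetical in-memory stand-in, not the API of any cloud provider, and `durable_store` is a plain dictionary standing in for storage that outlives the cluster.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


class EphemeralCluster:
    """Hypothetical in-memory stand-in for a short-lived analytics cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: List[float] = []
        self.alive = True

    def load(self, records: List[float]) -> None:
        self.data.extend(records)

    def compute_mean(self) -> float:
        return sum(self.data) / len(self.data)

    def destroy(self) -> None:
        # Cluster-local data is discarded together with the cluster.
        self.data.clear()
        self.alive = False


@contextmanager
def ephemeral_cluster(name: str, log: List[str]) -> Iterator[EphemeralCluster]:
    """Create a cluster, hand it to the caller, and always destroy it afterwards."""
    cluster = EphemeralCluster(name)
    log.append("create")
    try:
        yield cluster
    finally:
        cluster.destroy()
        log.append("destroy")


durable_store: Dict[str, float] = {}  # stands in for storage outside the cluster
timeline: List[str] = []

with ephemeral_cluster("exploration", timeline) as cluster:
    cluster.load([3.0, 5.0, 10.0])
    timeline.append("load")
    result = cluster.compute_mean()
    timeline.append("compute")
    durable_store["mean"] = result  # persist results before the cluster goes away
    timeline.append("store")

print(timeline, durable_store)
```

The `finally` clause is the design choice that matters here: the cluster is torn down even if the computation fails, so only what was written to durable storage survives.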