Introduction to Data Science and R language
13 August 2013
Anju Gahlawat
Index
– Introduction to Data Science
– Hidden skills of Data Scientist
– Failure of current statistical tools like SAS and Excel
– Introduction to R language
– R Basic Commands
– Running SQL server with R
– Visualizing Data with R
– Introduction to Shiny
– Future of R
Data Science
Data Science is all about telling a STORY from the data.
Data Science deals with….
5 Hidden Skills for Data Scientists
– Be Clear: Is Your Problem Really a Big Data Problem?
– Communicating About Your Data
– Invest in Interactive Analytics, not Reporting
– Understand the Role and Quality of Human Evaluations of Data
– Spend Time on the Plumbing
Difference between Data Science and Big Data
Big data is mostly concerned with the engineering side of data, answering questions such as:
– How do you store it?
– How do you manipulate it?
– How do you run parallelized computations on it?
– How do you access it?
– How do you mine it?
But data science is more than that.
– It looks at the algorithmic and mathematical aspects of extracting knowledge from data.
– It applies advanced analytical tools and algorithms to generate predictive insights and new product innovations that follow directly from the data.
Shortcomings of current visualization and statistical tools
– The most commonly used statistical software tools either fail completely or are too slow to be useful on huge data sets
– Limited scalability
– Limited flexibility to adopt new, fast, scalable algorithms
– Problems printing charts in Excel: legend data, or sometimes the x or y axis, goes missing
– If there is a value in the upper-left corner of the data set (cell A1, for example), Excel fails to chart the data correctly.
Introduction to R
– R is a programming language and run-time environment used for data manipulation, statistics, and graphics.
– Base R comes with a wide range of standard statistical and graphical analyses built in; user-developed extension packages add more.
– R is an expression-based language.
– Procedures written in C, C++, or Fortran can be interfaced for efficiency, and additional primitives can be written.
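To make "expression-based" concrete, here is a minimal sketch. The variable and function names (sign_label, square, total) are made up for illustration and do not come from the slides. It shows that if/else, function bodies, and brace blocks all evaluate to a value that can be assigned.

```r
# In R, control-flow constructs are expressions that return a value.
x <- -3
sign_label <- if (x > 0) "positive" else "non-positive"

# A function returns the value of its last evaluated expression.
square <- function(v) v^2

# A brace block also evaluates to its last expression.
total <- {
  a <- square(3)
  b <- square(4)
  a + b
}

print(sign_label)
print(total)
```

Because everything is an expression, results can be assigned or passed on directly without a separate "return variable" step.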
R, and the Rise of the Best Software Money Can’t Buy
R users rely on functions that have been developed for them by statistical researchers, but they can also create their own or modify the existing ones to suit their needs.
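As a rough illustration of creating your own function and adapting an existing one, the sketch below defines two hypothetical helpers, coef_var and summary_with_sd. Neither is part of base R or of the original slides, and the input values are made up.

```r
# A user-defined function: coefficient of variation (hypothetical helper).
coef_var <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Adapting an existing function: wrap base summary() so the result
# also reports the standard deviation.
summary_with_sd <- function(x) {
  c(unclass(summary(x)), SD = sd(x))
}

values <- c(12.5, 14.1, 13.8, 15.2, 14.9)  # made-up illustrative values
print(coef_var(values))
print(summary_with_sd(values))
```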
Why R?
Getting started with R
▪ Latest version (as of August 2013): 3.0.1 for Windows
▪ Link to download the R setup: http://cran.r-project.org/bin/windows/base/
▪ 51.5 MB setup file
▪ GUI for R: RStudio. Latest version: 0.97.551
▪ Link to download RStudio: http://www.rstudio.com/ide/download/desktop
▪ 32.5 MB exe file
RStudio
Sample R code
– Read a data set into R (from a local file or network URL):
• bse <- read.csv("bse_table.csv", header = TRUE, sep = ",")
– Examine the basic structure of the data
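The two steps above can be sketched as follows. Since bse_table.csv is not available here, the example reads a few made-up rows from an in-memory string instead. The rows and every column other than Date and Open are assumptions for illustration only.

```r
# Stand-in for bse_table.csv: made-up rows read from an in-memory string.
csv_text <- "Date,Open,High,Low,Close
2013-08-01,19400,19570,19300,19320
2013-08-02,19350,19400,19080,19160
2013-08-05,19200,19250,19100,19180
2013-08-06,19150,19160,18700,18730"

bse <- read.csv(text = csv_text, header = TRUE, sep = ",")

# Examine the basic structure of the data.
str(bse)           # column names, types, and first values
print(head(bse, 3))  # first three rows
print(summary(bse$Open))
print(dim(bse))    # number of rows and columns
```

With a real file, replacing the text = csv_text argument with the file path (as in the slide) should be the only change needed.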
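Running SQL Server with R
Install and load the RODBC package: install.packages("RODBC"); library(RODBC)
Create an ODBC connection (replace the placeholder with your own data source name):
channel <- odbcConnect("[ODBC Name]")
Tab1 <- sqlQuery(channel, "SELECT * FROM TabName")
odbcClose(channel)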
R code - Plotting graph
• bse$Date <- as.Date(bse$Date, format = "%Y-%m-%d")
• plot(x = bse$Date, y = bse$Open, type = "l", main = "BSE Data", col = "blue",
xlab = "Periods", ylab = "Index", lwd = 2)
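For readers without the BSE file, here is a self-contained sketch of the same plotting call. The data frame values are made up, and writing to a PDF in the temporary directory is an assumption so that the script also runs without a display.

```r
# Made-up stand-in for the BSE data (hypothetical values).
bse <- data.frame(
  Date = c("2013-08-01", "2013-08-02", "2013-08-05", "2013-08-06", "2013-08-07"),
  Open = c(19400, 19350, 19200, 19150, 18800),
  stringsAsFactors = FALSE
)

# Convert the text dates to Date objects so the x axis is a time axis.
bse$Date <- as.Date(bse$Date, format = "%Y-%m-%d")

# Draw the line chart into a PDF file.
out_file <- file.path(tempdir(), "bse_open.pdf")
pdf(out_file)
plot(x = bse$Date, y = bse$Open, type = "l", main = "BSE Data",
     col = "blue", xlab = "Periods", ylab = "Index", lwd = 2)
invisible(dev.off())
```

In an interactive session the pdf() and dev.off() lines can be dropped, and the chart appears in the plot window instead.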
Stock Analysis - Sample graph
Packages in R….
Some graphs made using R:
Introduction to Shiny – R web UI
• The R package Shiny, from RStudio, provides:
– Interactive web applications / dynamic HTML pages written in plain R
– A GUI tailored to your own needs
– A website served by R itself
What makes Shiny so special?
– Very simple: ready-to-use components
– Shiny is slick, producing interactive and pleasant-looking web UIs.
– Event-driven (reactive programming): input <-> output, without requiring a reload of the browser
– Shiny user interfaces can be built entirely in R, or written directly in HTML, CSS, and JavaScript for more flexibility.
– A highly customizable slider widget with built-in support for animation.
– Pre-built output widgets for displaying plots, tables, and printed output of R objects.
– Fast bidirectional communication between the web browser and R using the websockets package.
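A minimal sketch of the reactive input-to-output flow described above, assuming a current release of the shiny package is installed. The input ID "n", the output ID "scatter", and the app itself are made up for illustration and are not the stock analysis app from the slides.

```r
library(shiny)

# UI built entirely in R: a slider input and a plot output.
ui <- fluidPage(
  titlePanel("Slider demo"),
  sliderInput("n", "Number of points:", min = 10, max = 200, value = 50),
  plotOutput("scatter")
)

# Server logic: the plot re-renders whenever input$n changes,
# without reloading the browser page.
server <- function(input, output) {
  output$scatter <- renderPlot({
    plot(rnorm(input$n), rnorm(input$n),
         main = paste(input$n, "random points"))
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch only in an interactive session.
if (interactive()) {
  runApp(app)
}
```

Moving the slider changes input$n, which invalidates the renderPlot expression and triggers a redraw; that dependency tracking is what the slide calls reactive programming.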
Stock Analysis - Using Shiny
Current Market Trend of Statistical Languages
Stats related to R - Google hits
R is the most powerful and flexible statistical programming language in the world…
Job trends in Statistical Market
Software      2012    2013   Difference   Ratio
SAS          13234   12272         -961    0.93
SPSS          3299    3289          -10    1
R             1196    1693          497    1.42
Minitab       1769    1615         -154    0.91
Stata          842     898           56    1.07
JMP            644     619          -25    0.96
Statistica      61      71           10    1.17
Systat          14      15            1    1.07
BMDP             6      10            3    1.53
Trend of Jobs on Indeed.com in March 2012 and 2013
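The Difference and Ratio columns can be reproduced for a few of the rows with the sketch below. It assumes Difference = 2013 count minus 2012 count and Ratio = 2013 count divided by 2012 count, rounded to two decimals. The slides do not state the formulas, and the column names Y2012 and Y2013 are made up for this example.

```r
# Three rows copied from the job-count table (R, Stata, Minitab).
jobs <- data.frame(
  Software = c("R", "Stata", "Minitab"),
  Y2012 = c(1196, 842, 1769),
  Y2013 = c(1693, 898, 1615)
)

# Assumed definitions of the derived columns.
jobs$Difference <- jobs$Y2013 - jobs$Y2012
jobs$Ratio <- round(jobs$Y2013 / jobs$Y2012, 2)

print(jobs)
```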
Final Words of Warning
• “Using R is a bit akin to smoking. The beginning is difficult, one may get headaches and even gag the first few times. But in the long run, it becomes pleasurable and even addictive. Yet, deep down, for those willing to be honest, there is something not fully healthy in it.” -- Francois Pinard
Visualization is only one slice of the R cake…
R deals with:
• Machine Learning
• Social Media Analytics
• Sentiment Analysis
• Predictive Modeling
• Network Analysis
• Visualization
• Time Series Analysis
• Simulation
• And a lot more
To be continued…
