This presentation discusses the following topics:
Basic features of R
Exploring R GUI
Data Frames & Lists
Handling Data in R Workspace
Reading Data Sets & Exporting Data from R
Manipulating & Processing Data in R
3. (CentreforKnowledgeTransfer)
institute
Basic features of R
1. Open-source
2. Strong Graphical Capabilities
3. Highly Active Community
4. A Wide Selection of Packages
5. Comprehensive Environment
6. Can Perform Complex Statistical Calculations
7. Distributed Computing
8. Running Code Without a Compiler
9. Interfacing with Databases
10. Data Variety
11. Machine Learning
12. Data Wrangling
13. Cross-platform Support
14. Compatible with Other Programming Languages
15. Data Handling and Storage
16. Vector Arithmetic
17. Compatibility with Other Data Processing Technologies
18. Generates Reports in Any Desired Format
Some Unique Features of R Programming
Due to a large number of packages available, there are many other handy features as well:
Since R can perform operations directly on vectors, it doesn’t require too much looping.
R can pull data from APIs, servers, SPSS files, and many other formats.
R is useful for web scraping.
It can perform multiple complex mathematical operations with a single command.
Using R Markdown, it can create attractive reports that combine plain text with code
and visualizations of the results.
Due to a large number of researchers and statisticians using it, new ideas and
technologies often appear in the R community first.
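The point about vector operations can be illustrated with a minimal sketch. The variable names and values here (prices, quantities) are hypothetical and chosen only for illustration; the explicit loop is shown purely for comparison.

```r
# Vectorized arithmetic: the operation applies to every element at once,
# so no explicit loop is needed. All values are hypothetical.
prices     <- c(10, 20, 30)
quantities <- c(1, 2, 3)

totals      <- prices * quantities   # element-wise multiplication
grand_total <- sum(totals)           # a single call reduces the whole vector

# The same computation written as an explicit loop, for comparison only.
totals_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  totals_loop[i] <- prices[i] * quantities[i]
}

print(totals)
print(grand_total)
print(identical(totals, totals_loop))
```

Both versions produce the same vector; the vectorized form is shorter and is the idiomatic choice in R.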
Exploring R GUI
R is a command-line-driven program.
The user enters commands at the prompt (> by default),
and each command is executed one at a time.
Perhaps the most stable, full-blown GUI is R Commander,
which runs under Windows, Linux, and macOS.
Data Frames & Lists
Data frames are generic data objects in R
that are used to store tabular data.
They are two-dimensional,
heterogeneous data structures.
A list in R, however, comprises elements
such as vectors, data frames, variables,
or other lists that may belong to different data
types.
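As a rough illustration of the difference, the sketch below builds one data frame and one list. The object names (students, record) and all values are made up for this example.

```r
# A data frame: columns may have different types,
# but every column has the same number of rows.
students <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  age    = c(21L, 23L, 22L),
  passed = c(TRUE, FALSE, TRUE)
)
print(dim(students))   # number of rows, then number of columns
str(students)          # one line per column, showing its type

# A list: elements may differ in both type and length,
# and may themselves be data frames or other lists.
record <- list(
  id     = 101L,
  scores = c(88.5, 92.0),
  roster = students
)
str(record, max.level = 1)
print(record$scores[2])   # access an element by name, then index into it
```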
Handling Data in R Workspace
The workspace is your current R working
environment and includes any user-defined
objects (vectors, matrices, data frames, lists,
functions).
At the end of an R session, the user can save an
image of the current workspace that is
automatically reloaded the next time R is
started.
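A few base R functions for inspecting and saving the workspace are sketched below. The objects are hypothetical, and the image is written to a temporary file so the example does not touch a real project directory; called with no arguments, save.image() writes to .RData in the working directory, which is the file R looks for at startup.

```r
# Create a couple of user-defined objects (hypothetical examples).
x     <- 1:5
greet <- function() "hello"

print(ls())       # list the objects currently defined
rm(x)             # remove one object
print(getwd())    # the current working directory

# Save an image of the workspace. A temporary file is used here;
# save.image() with no arguments would write ".RData" instead.
image_file <- tempfile(fileext = ".RData")
save.image(file = image_file)
print(file.exists(image_file))
```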
Functions for Reading Data into R
There are a few very useful functions for reading data into R.
read.table() and read.csv() are two popular functions used for reading tabular
data into R.
readLines() is used for reading lines from a text file.
source() is a very useful function for reading in R code files from another R
program.
dget() function is also used for reading in R code files, specifically R objects written out as text by dput().
load() function is used for reading in saved workspaces.
unserialize() function is used for reading single R objects in binary format.
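A minimal sketch of three of these functions follows. The input files are created in a temporary location with writeLines() so the example is self-contained; the file contents and variable names are hypothetical.

```r
# Set up a small CSV file to read (hypothetical contents).
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,90", "2,85"), csv_file)

scores    <- read.csv(csv_file)    # header = TRUE and sep = "," by default
raw_lines <- readLines(csv_file)   # one character string per line of the file

print(scores)
print(raw_lines)

# source() evaluates the R code stored in a file.
script_file <- tempfile(fileext = ".R")
writeLines("answer <- 6 * 7", script_file)
source(script_file)                # defines `answer`
print(answer)
```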
Functions for Writing Data to Files
There are similar functions for writing data to files:
write.table() is used for writing tabular data to text files (e.g. CSV).
writeLines() function is useful for writing character data line-by-line to a file or
connection.
dump() is a function for dumping a textual representation of multiple R objects.
dput() function is used for outputting a textual representation of an R object.
save() is useful for saving an arbitrary number of R objects in binary format to a file.
serialize() is used for converting an R object into a binary format for outputting to a
connection (or file).
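Each writing function pairs naturally with a reading function. The sketch below round-trips one small data frame through several of these pairs; the object name (items) and its contents are hypothetical, and all files are temporary.

```r
items <- data.frame(id = 1:3, label = c("a", "b", "c"))

# write.table() with sep = "," produces CSV-style text; read.csv() reads it back.
csv_file <- tempfile(fileext = ".csv")
write.table(items, csv_file, sep = ",", row.names = FALSE)
items_csv <- read.csv(csv_file)

# dput() writes a textual representation of one object; dget() reads it back.
dput_file <- tempfile()
dput(items, file = dput_file)
items_text <- dget(dput_file)

# save() stores named objects in binary format; load() restores them by name.
rda_file <- tempfile(fileext = ".rda")
save(items, file = rda_file)
restored <- new.env()
load(rda_file, envir = restored)

# serialize() with connection = NULL returns a raw vector instead of writing out.
bytes     <- serialize(items, connection = NULL)
items_bin <- unserialize(bytes)

print(isTRUE(all.equal(items_text, items)))
print(identical(restored$items, items))
print(identical(items_bin, items))
```

The text formats (dput, dump) are readable and diff-friendly, while the binary formats (save, serialize) are more compact and preserve objects exactly.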
Reading Data Files with read.table()
The read.table() function is one of the most commonly used functions for reading data in R.
To get the help file for read.table(), just type ?read.table in the R console.
The read.table() function has a few important arguments:
file, the name of a file, or a connection
header, logical indicating if the file has a header line
sep, a string indicating how the columns are separated
colClasses, a character vector indicating the class of each column in the dataset
nrows, the number of rows in the dataset. By default read.table() reads an entire file.
comment.char, a character string indicating the comment character. This defaults to “#”. If there are no commented
lines in your file, it’s worth setting this to be the empty string “”.
skip, the number of lines to skip from the beginning
stringsAsFactors, should character variables be coded as factors? Before R 4.0.0 this defaulted to TRUE because back
in the old days, if you had data that were stored as strings, it was because those strings represented levels of a
categorical variable. Since R 4.0.0 the default is FALSE.
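The arguments above can be combined in a single call. The following is a sketch only: the file layout (one junk line, a semicolon separator, three columns) and the column names are assumptions made up for the example.

```r
# Create a small semicolon-separated file with one line to skip (hypothetical data).
data_file <- tempfile(fileext = ".txt")
writeLines(c(
  "exported by a hypothetical tool",
  "id;name;score",
  "1;Asha;90.5",
  "2;Ben;85.0",
  "3;Chen;77.25"
), data_file)

scores <- read.table(
  data_file,
  header           = TRUE,     # the first line read contains column names
  sep              = ";",      # columns are separated by semicolons
  skip             = 1,        # ignore the first line of the file
  nrows            = 2,        # read only the first two data rows
  colClasses       = c("integer", "character", "numeric"),
  comment.char     = "",       # the file has no comment lines
  stringsAsFactors = FALSE     # keep strings as character vectors
)

str(scores)
```

Supplying colClasses and nrows spares read.table() from guessing column types and sizes, which can make reading large files noticeably faster.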
Manipulating and Processing Data in R
Data structures provide the way to represent data in data analytics.
One of the most important aspects of computing with data in R is its ability to manipulate data and enable
its subsequent analysis and visualization. Let us look at a few basic data structures in R:
a. Vectors in R: These are ordered containers of primitive elements and are used for one-dimensional data.
Element types: integer, numeric, logical, character, complex.
b. Matrices in R: These are rectangular collections of elements and are useful when all data is of a single
class, such as numeric or character. A matrix has two dimensions; arrays extend the same idea to three or more.
c. Lists in R: These are ordered containers for arbitrary elements and are used for nested or irregular data,
like the customer information of an organization. When data cannot be represented as an array or a
data frame, a list is the best choice, because lists can contain all kinds of other objects, including
other lists or data frames, and in that sense they are very flexible.
d. Data frames: These are two-dimensional containers for records (rows) and variables (columns) and are used for
representing data from spreadsheets and similar sources. A data frame is similar to a single table in a database.
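The four structures can be sketched side by side, together with a few basic manipulations. All names and values below are hypothetical and serve only as illustration.

```r
# Vector: one-dimensional, all elements of the same type.
ages <- c(21L, 23L, 22L)
print(class(ages))
ages_next_year <- ages + 1L            # arithmetic applies to every element

# Matrix: two-dimensional, single type; filled column by column by default.
m <- matrix(1:6, nrow = 2, ncol = 3)
print(dim(m))
print(m[2, 3])                         # element in row 2, column 3
print(t(m))                            # transpose

# List: arbitrary elements of differing type and length.
customer <- list(name = "Asha", orders = c(120.5, 80), active = TRUE)
print(customer$orders)

# Data frame: rows are records, columns are variables.
people <- data.frame(name = c("Asha", "Ben", "Chen"), age = ages)
adults_over_21 <- people[people$age > 21, ]   # filter rows by a condition
people$age_next_year <- people$age + 1L       # add a derived column
print(adults_over_21)
print(people)
```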