SlideShare a Scribd company logo
1 of 4
Download to read offline
International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169
Volume: 5 Issue: 7 749 – 752
_______________________________________________________________________________________________
749
IJRITCC | July 2017, Available @ http://www.ijritcc.org
_______________________________________________________________________________________
Elementary Concepts of Big Data and Hadoop
Hasmukh B. Domadiya
Assistant Professor,
National Computer College,
Jamnagar.
Dr. Girish C. Bhimani
Head of Department,
Department of Statistics,
Saurashtra University, Rajkot
Abstract: This paper is an effort to present the basic importance of Big Data and also its importance in an organization from its performance
point of view. The term Big data, refers the data sets, whose volume, complexity and also rate of growth make them more difficult to capture,
manage, process and also analyzed. For such type of data –intensive applications, the Apache Hadoop Framework has newly concerned a lot of
attention. Hadoop is the core platform for structuring Big data, and solves the problem of making it helpful for analytics idea. Hadoop is an open
source software project that enables the distributed processing of enormous data and framework for the analysis and transformation of very large
data sets using the MapReduce paradigm. This paper deals with the architecture of Hadoop with its various components.
Keywords: Big Data, Hadoop, MapReduce, HBase, Avro, Pig, Hive, YARN, Sqoop, ZooKeeper, Mahout
__________________________________________________*****_________________________________________________
I. Introduction:
Big data is a collection of enormous dataset that not
only processed by any traditional computer techniques. Big
data is not only a term , but it has various tools, techniques
and also framework. The term Big data, refers the data sets,
whose volume, complexity and also rate of growth make
them more difficult to capture, manage, process and also
analyzed. Generally, in Big data the data in it will be of
three types: Structured Data –Relational data, Semi-
structured data-XML data and also Unstructured data-
Word, PDF, Text, Media Logs (1).
II. Big Data:
Current age is in the data age. According to Tom,
It’s not easy to measure the total volume of data stored
electronically, but an IDC estimate put the size of the digital
universe at 4.4 zettabytes in 2013 and is forecasting a
tenfold growth by 2020 to 44 zettabytes. A zettabyte is
10^21 bytes, or equivalent one thousand exabytes, one
million petabytes, or one billion terabytes. That is more than
one disk drive for every person in the world (2).
The data in big data is a big no matter how we
serving it. Current era is totally a society of sharers, but in
just sixty seconds of different sharing and searching
activities on internet. In the below figure, we can see that
millions of items are accessed, searched, transferred and
also shared. It’s a shocking amount of data to think about.
(3)
Challenges of Big Data:
1. Volume:
Volume refers to amount or sizes of data. The
storage data size may be represented in Terabytes
or zeta bytes.
2. Variety:
Variety refers different types of data and also
different sources of data. Data may be in
structured, unstructured and also sometimes it is in
semi-structured.
3. Velocity:
Velocity refers the speed of data processing. The
data comes at very high speed.
4. Veracity:
Veracity refers to noise, biases and abnormality
when we dealing with high volume, velocity and
variety of data, the all of data are not going to
100% correct, there will be dirty data (4).
International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169
Volume: 5 Issue: 7 749 – 752
_______________________________________________________________________________________________
750
IJRITCC | July 2017, Available @ http://www.ijritcc.org
_______________________________________________________________________________________
Hadoop: The solution of Big data:
Hadoop is open-source software for reliable,
scalable and also distributed computing. The Apache
Hadoop library is a framework that allows distributed
processing of enormous data sets across clusters of
computers using a simple programming model.
Hadoop was derived from Google’s Map Reduce
and Google File System (GFS) (5). Currently, Hadoop
consists of Hadoop kernel, MapReduce, HDFS and different
various components like Apache Hive, Base etc.
Fig: Architecture of Hadoop (6)
A. HDFS (Hadoop Distributed File System)
A HDFS is designed to hold a large amount of
information like terabytes or petabytes and also provide
access to this data to many clients distributed across a
network. HDFS has master/slave architecture. Trough
HDFS, data will be written once and then read several times.
Fig: Architecture of HDFS (7)
HDFS is a block-structured file system. Individual
files are divided into the chunks of a fixed size. These all
small chunks are stored across a group of one or more
machines with data storage capacity. HDFS stores three
copies of each file by copying each piece to three different
servers. Size of each block is 64 MB. HDFS architecture is
divided into three different nodes, which are Name node,
Data Node and also HDFS client node (8).
Name Node:
Name node is centrally placed node, which
contains whole information about Hadoop file system. The
main work of this node is it records all the metadata and
attributes and also specific location of files and data blocks
in the data nodes. This node is also called as a master node
because in this node stores all the details about the system
and also provides newly added, modified and also removed
details from any data nodes.
Data Node:
Data node manages the data storage (9) of their
system. Data nodes perform read and write operations on the
file systems, as per client request. This node perform
operations like chunk/block create, delete and also
reproduction according to the instructions of the name node.
HDFS Clients:
An HDFS client is also known as Edge node. It is
basically a link node between name node and data node.
B. MapReduce Architecture
Solanke poona et al. MapReduce is a programming
model and related execution for processing and also
generating enormous data with a parallel processing. This
architecture poised of a two different functions. Map
function used for filtering and sorting data and reduce
function that performs a summary operation (10).
Fig: MapReduce Algorithm (11)
Map stage:
This stage is used to process the input data.
Generally the inputted data is in the form of file or directory
and is also stored in the Hadoop Distributed File System
(HDFS). The inputted file is passing to the (11) map
function through line by line. The map processes that data
and also creates multiple small chunks of data.
International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169
Volume: 5 Issue: 7 749 – 752
_______________________________________________________________________________________________
751
IJRITCC | July 2017, Available @ http://www.ijritcc.org
_______________________________________________________________________________________
Reduce stage:
This stage is combination of the shuffle stage and
(11) also the Reduce stage. The main job of this stage is to
process the data that comes from the map stage. After
processing, it generates a new set of output, which will be
stored in HDFS.
III. Literature Review:
S. Vikram Phaneendra et al., mentioned that in
previous days the data was very few and anyone can handle
it very easily through the RDBMS but currently it is too
difficult to handle enormous data through that tools. This
major problem to handle huge data is solved through the Big
Data. In this paper, they explain the different dimensions of
Big Data like Volume, Velocity, Variety, Value and also
complexity (Veracity). In this paper, they also illustrated
that big data can be found at anywhere like in Finance,
Retail Industry, Healthcare, Insurance etc (12).
According to Albert Bifet et al., discussed that streaming
examination in current era is now becoming the fastest and
most proficient way to attain meaningful and useful
knowledge, allowing any industry to respond quickly when
any type of issues can appear or detect to improve the
performance. In this paper, they also discussed different
tools used for mining big data are Apache Hadoop, mahout
etc (13).
Bernice Purcell, discusses that in Big Data includes
structured data, semi-structured and also unstructured data.
In this paper, they also discussed that Hadoop architecture is
used to process different unstructured and semi-structured
using MapReduce to locate all related data then select only
the data directly answering the query (14).
Manyika et al. (2011), describes the major contributions of
big data can make to businesses: Transparency creation,
performance improvement, population segmentation,
decision making support and also innovative business
models, products and services (15).
Y. Lee et al., describes that Most of the MapReduce
applications on Hadoop are developed to analyze large texts,
web contents or log files. In this paper, the author specially
highlighted on that first packet processing method of
Hadoop that always analyzes any packet trace files in
sequential manner through the reading all packets across
from multiple HDFS chunks (16).
IV. Hadoop Components:
1). HBase:
HBase is a non-relational column oriented
distributed database designed to run always on top of the
Hadoop Distributed File System (HDFS). HBase is an open-
source and distributed based on the Google’s BigTable. This
is an example of NoSQL data store and it’s written in Java
(17).
2). Avro:
Avro is data serialization format which brings data
interoperability among multiple components of Apache
Hadoop.
3). Pig:
Pig was initially developed at Yahoo Research
around 2006 but it moved into the Apache software
foundation list in 2007(18). It is a platform for analyzing
enormous data sets which is consists of high level query
language like SQL for expressing data analysis and also
processing. Pig can easily process tera bytes of data with
very little lines of code and through this platform writing
and maintaining data processing jobs very easy (19).
4). Hive:
Hive is Data Warehousing application that provides
the SQL interface and also relational model. It is also
structures data into the well-known database concepts like
tables, columns, rows and also partitions. It supports
primitive as well as complex data types like maps, lists and
structs (18).
5). YARN:
Yet Another Resource Negotiator (YARN), is
included in the latest Hadoop release and its goal is to allow
the system to serve as a general data processing framework.
It supports programming models other than MapReduce,
while also improving scalability and resource utilization
(20).
6). Sqoop:
Sqoop is tool which can be used to transfer the data
from relational database environments like Oracle, mysql
into Hadoop environment. This is a command line interface
platform that is used to transferring data between relational
databases and Hadoop (4).
7). ZooKeeper:
Zookeeper is a distributed coordination and also
governing service for Hadoop cluster. It is a centralized
service that provides distributed synchronization and also
maintaining the configuration information(4).
8). Mahout:
Mahout is a simple and extensible programming
environment and framework for building scalable
algorithms very quickly(21).
V. Conclusion:
In this paper, the author provided an overview of
the Big Data and Hadoop. As an author, this paper
considering the five main characteristics of Big Data and
International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169
Volume: 5 Issue: 7 749 – 752
_______________________________________________________________________________________________
752
IJRITCC | July 2017, Available @ http://www.ijritcc.org
_______________________________________________________________________________________
discussing different components of Hadoop. I also believe
that the information proposed here can help to discuss
technologies already existing and also newly emerging in
the opportunity, classify them and efficient mechanism and
also guide them.
References
[1] http://www.tutotialspoint.com/hadoop/hadoop_big_data_ov
erview.htm
[2] Tom White ,“Hadoop: the Definitive Guide” 4th
edition,
O’Reilly publication
[3] https://ricoheuropebusinessdriver.com/category/big-data
[4] Varsha B. Bobade (Jan-2016), “Survey paper on Big Data
and Hadoop”, International Research Journal of
Engineering and Technology (IRJET), Vol-3, Issue-1, pp.
861- 863, e-ISSN:- 2395-0056, p-ISSN:-2395-0072.
[5] Apache Hadoop, http://hadoop.apache.org
[6] https://www.mssqltips.com/sqlservertip/3140/big-data-
basics--part-3--overview-of-hadoop
[7] Hadoop Architecture:
https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
[8] Hadoop Tutorial:
http://developer.yahoo.com/hadoop/tutotial/module1.html
[9] HDFSTutorial
:https://www.tutorialspoint.com/hadoop/hadoop_hdfs_over
view.htm
[10] Solanke Poonam G, B.M.Patil (2-2016), “An Efficient
Approach for Processing Big Data with Inceremental
MapReduce”, International Journal for Scientific Research
& Development, Vol-4, Issue-2, pp. 53-57, ISSN (online)
2321-0613.
[11] MapReduce Tutorial:
https://www.tutorialspoint.com/hadoop/hadoop_mapreduce
.htm
[12] S.Vikram Phaneendra & E.Madhusudhan Reddy (Apr 19-
23, 2013) “Big Data- solutions for RDBMS problems- A
survey” In 12th IEEE/IFIP Network Operations &
Management Symposium (NOMS 2010) (Osaka, Japan).
[13] Albert Bifet, “Mining Big Data in Real Time”, Informatica
37 (2013) 15-20 Dec-2012.
[14] Bernice Purcell, “The emergence of “big data” technology
and analytics”, Journal of Technology Research.
15). Manyika, J., Chui,. M., Brown, B., Bughin, J., Dobbs, R.,
Roxburgh, C., & Byers, A. H. (2011, June). “Big data: The
next frontier for innovation, competition, and productivity”.
McKinsey Global Institute. Retrieved from
http://www.mckinsey.com/Insights/MGI/Research/Technol
ogy_and_Innovation/Big_data_The_next_frontier_for_
innovation
[16] Y. Lee, W.Kang, and Y.Lee (April-2011), “A Hadoop
based packet trace processing Tool”, International
Workshop on Traffic Monitoring and Analysis (TMA
2011).
[17] Apache HBase, https://hbase.apache.org/
[18] Ashish Thusoo, Joydeep Sen Sama, Namit Jain, Zhend
Shao, Prasad Chakka, Ning Zhang, Suresh Antony, Hao
Liu and Raghotham Murthy, “Hive- A Petabyte Scale Data
Warehouse Using Hadoop”, By Facebook Data
Infrastructure Team.
[19] Apache Pig, https://pig.apache.org
[20] Poonam S. Patil and Rajesh N. Phursule (Oct-2014),
“Survey paper on Big Data Processing and Hadoop
Components”, International Journal of Science and
Research (IJSR), Vol-3, Issue-10, pp. 585-590, ISSN
(Online): 2319-7064.
[21] Apache mahaout, http://mahout.apache.org/

More Related Content

What's hot

A Novel Framework for Big Data Processing in a Data-driven Society
A Novel Framework for Big Data Processing in a Data-driven SocietyA Novel Framework for Big Data Processing in a Data-driven Society
A Novel Framework for Big Data Processing in a Data-driven SocietyAnthonyOtuonye
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
Modern Database Systems - Lecture 00
Modern Database Systems - Lecture 00Modern Database Systems - Lecture 00
Modern Database Systems - Lecture 00Michael Mathioudakis
 
Modern association rule mining methods
Modern association rule mining methodsModern association rule mining methods
Modern association rule mining methodsijcsity
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data miningPolash Halder
 
Managing Big data using Hadoop Map Reduce in Telecom Domain
Managing Big data using Hadoop Map Reduce in Telecom DomainManaging Big data using Hadoop Map Reduce in Telecom Domain
Managing Big data using Hadoop Map Reduce in Telecom DomainAM Publications
 
Data resource management
Data resource managementData resource management
Data resource managementNirajan Silwal
 
Thinking Outside the Table
Thinking Outside the TableThinking Outside the Table
Thinking Outside the TableOntotext
 
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)Toshiyuki Shimono
 
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...shivz3
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsPetr Novotný
 
Representing Non-Relational Databases with Darwinian Networks
Representing Non-Relational Databases with Darwinian NetworksRepresenting Non-Relational Databases with Darwinian Networks
Representing Non-Relational Databases with Darwinian NetworksIJERA Editor
 
Optimizing Bigdata Processing by using Hybrid Hierarchically Distributed Data...
Optimizing Bigdata Processing by using Hybrid Hierarchically Distributed Data...Optimizing Bigdata Processing by using Hybrid Hierarchically Distributed Data...
Optimizing Bigdata Processing by using Hybrid Hierarchically Distributed Data...IJCSIS Research Publications
 
It Don’t Mean a Thing If It Ain’t Got Semantics
It Don’t Mean a Thing If It Ain’t Got SemanticsIt Don’t Mean a Thing If It Ain’t Got Semantics
It Don’t Mean a Thing If It Ain’t Got SemanticsOntotext
 
Types of data bases
Types of data basesTypes of data bases
Types of data basesJanu Jahnavi
 
A Survey on Graph Database Management Techniques for Huge Unstructured Data
A Survey on Graph Database Management Techniques for Huge Unstructured Data A Survey on Graph Database Management Techniques for Huge Unstructured Data
A Survey on Graph Database Management Techniques for Huge Unstructured Data IJECEIAES
 

What's hot (20)

A Novel Framework for Big Data Processing in a Data-driven Society
A Novel Framework for Big Data Processing in a Data-driven SocietyA Novel Framework for Big Data Processing in a Data-driven Society
A Novel Framework for Big Data Processing in a Data-driven Society
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
Modern Database Systems - Lecture 00
Modern Database Systems - Lecture 00Modern Database Systems - Lecture 00
Modern Database Systems - Lecture 00
 
Modern association rule mining methods
Modern association rule mining methodsModern association rule mining methods
Modern association rule mining methods
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data mining
 
Lecture 04 data resource management
Lecture 04 data resource managementLecture 04 data resource management
Lecture 04 data resource management
 
Managing Big data using Hadoop Map Reduce in Telecom Domain
Managing Big data using Hadoop Map Reduce in Telecom DomainManaging Big data using Hadoop Map Reduce in Telecom Domain
Managing Big data using Hadoop Map Reduce in Telecom Domain
 
Data resource management
Data resource managementData resource management
Data resource management
 
Thinking Outside the Table
Thinking Outside the TableThinking Outside the Table
Thinking Outside the Table
 
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)
 
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
Influence of-structured--semi-structured--unstructured-data-on-various-data-m...
 
ITGS - Data And Databases
ITGS - Data And DatabasesITGS - Data And Databases
ITGS - Data And Databases
 
Managing data resources
Managing  data resourcesManaging  data resources
Managing data resources
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
 
Representing Non-Relational Databases with Darwinian Networks
Representing Non-Relational Databases with Darwinian NetworksRepresenting Non-Relational Databases with Darwinian Networks
Representing Non-Relational Databases with Darwinian Networks
 
Optimizing Bigdata Processing by using Hybrid Hierarchically Distributed Data...
Optimizing Bigdata Processing by using Hybrid Hierarchically Distributed Data...Optimizing Bigdata Processing by using Hybrid Hierarchically Distributed Data...
Optimizing Bigdata Processing by using Hybrid Hierarchically Distributed Data...
 
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
[IJCT-V3I2P32] Authors: Amarbir Singh, Palwinder Singh
 
It Don’t Mean a Thing If It Ain’t Got Semantics
It Don’t Mean a Thing If It Ain’t Got SemanticsIt Don’t Mean a Thing If It Ain’t Got Semantics
It Don’t Mean a Thing If It Ain’t Got Semantics
 
Types of data bases
Types of data basesTypes of data bases
Types of data bases
 
A Survey on Graph Database Management Techniques for Huge Unstructured Data
A Survey on Graph Database Management Techniques for Huge Unstructured Data A Survey on Graph Database Management Techniques for Huge Unstructured Data
A Survey on Graph Database Management Techniques for Huge Unstructured Data
 

Similar to Elementary Concepts of Big Data and Hadoop

Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutionsaciijournal
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutionsaciijournal
 
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONSBIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONSaciijournal
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutionsaciijournal
 
28 15141Secure Data Sharing with Data Partitioning in Big Data33289 24 12-2017
28 15141Secure Data Sharing with Data Partitioning in Big Data33289 24 12-201728 15141Secure Data Sharing with Data Partitioning in Big Data33289 24 12-2017
28 15141Secure Data Sharing with Data Partitioning in Big Data33289 24 12-2017rahulmonikasharma
 
Big Data A Review
Big Data A ReviewBig Data A Review
Big Data A Reviewijtsrd
 
Hadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An OverviewHadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An Overviewrahulmonikasharma
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformIRJET Journal
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challengesijcisjournal
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewIRJET Journal
 
IRJET- A Scenario on Big Data
IRJET- A Scenario on Big DataIRJET- A Scenario on Big Data
IRJET- A Scenario on Big DataIRJET Journal
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringIRJET Journal
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...IJSRD
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...IJSRD
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewIJERA Editor
 
Memory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective ViewMemory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective Viewijtsrd
 
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and ToolsIRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and ToolsIRJET Journal
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCESURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCEAM Publications,India
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Scienceijtsrd
 

Similar to Elementary Concepts of Big Data and Hadoop (20)

Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONSBIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
BIG DATA SUMMARIZATION: FRAMEWORK, CHALLENGES AND POSSIBLE SOLUTIONS
 
Big Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible SolutionsBig Data Summarization : Framework, Challenges and Possible Solutions
Big Data Summarization : Framework, Challenges and Possible Solutions
 
28 15141Secure Data Sharing with Data Partitioning in Big Data33289 24 12-2017
28 15141Secure Data Sharing with Data Partitioning in Big Data33289 24 12-201728 15141Secure Data Sharing with Data Partitioning in Big Data33289 24 12-2017
28 15141Secure Data Sharing with Data Partitioning in Big Data33289 24 12-2017
 
Big Data A Review
Big Data A ReviewBig Data A Review
Big Data A Review
 
Hadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An OverviewHadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An Overview
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
 
IRJET- A Scenario on Big Data
IRJET- A Scenario on Big DataIRJET- A Scenario on Big Data
IRJET- A Scenario on Big Data
 
Big Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and StoringBig Data with Hadoop – For Data Management, Processing and Storing
Big Data with Hadoop – For Data Management, Processing and Storing
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
 
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
Big Data Mining, Techniques, Handling Technologies and Some Related Issues: A...
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
 
Memory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective ViewMemory Management in BigData: A Perpective View
Memory Management in BigData: A Perpective View
 
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and ToolsIRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCESURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
 
A Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data ScienceA Review Paper on Big Data and Hadoop for Data Science
A Review Paper on Big Data and Hadoop for Data Science
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 

More from rahulmonikasharma

Data Mining Concepts - A survey paper
Data Mining Concepts - A survey paperData Mining Concepts - A survey paper
Data Mining Concepts - A survey paperrahulmonikasharma
 
A Review on Real Time Integrated CCTV System Using Face Detection for Vehicle...
A Review on Real Time Integrated CCTV System Using Face Detection for Vehicle...A Review on Real Time Integrated CCTV System Using Face Detection for Vehicle...
A Review on Real Time Integrated CCTV System Using Face Detection for Vehicle...rahulmonikasharma
 
Considering Two Sides of One Review Using Stanford NLP Framework
Considering Two Sides of One Review Using Stanford NLP FrameworkConsidering Two Sides of One Review Using Stanford NLP Framework
Considering Two Sides of One Review Using Stanford NLP Frameworkrahulmonikasharma
 
A New Detection and Decoding Technique for (2×N_r ) MIMO Communication Systems
A New Detection and Decoding Technique for (2×N_r ) MIMO Communication SystemsA New Detection and Decoding Technique for (2×N_r ) MIMO Communication Systems
A New Detection and Decoding Technique for (2×N_r ) MIMO Communication Systemsrahulmonikasharma
 
Broadcasting Scenario under Different Protocols in MANET: A Survey
Broadcasting Scenario under Different Protocols in MANET: A SurveyBroadcasting Scenario under Different Protocols in MANET: A Survey
Broadcasting Scenario under Different Protocols in MANET: A Surveyrahulmonikasharma
 
Sybil Attack Analysis and Detection Techniques in MANET
Sybil Attack Analysis and Detection Techniques in MANETSybil Attack Analysis and Detection Techniques in MANET
Sybil Attack Analysis and Detection Techniques in MANETrahulmonikasharma
 
A Landmark Based Shortest Path Detection by Using A* and Haversine Formula
A Landmark Based Shortest Path Detection by Using A* and Haversine FormulaA Landmark Based Shortest Path Detection by Using A* and Haversine Formula
A Landmark Based Shortest Path Detection by Using A* and Haversine Formularahulmonikasharma
 
Processing Over Encrypted Query Data In Internet of Things (IoTs) : CryptDBs,...
Processing Over Encrypted Query Data In Internet of Things (IoTs) : CryptDBs,...Processing Over Encrypted Query Data In Internet of Things (IoTs) : CryptDBs,...
Processing Over Encrypted Query Data In Internet of Things (IoTs) : CryptDBs,...rahulmonikasharma
 
Quality Determination and Grading of Tomatoes using Raspberry Pi
Quality Determination and Grading of Tomatoes using Raspberry PiQuality Determination and Grading of Tomatoes using Raspberry Pi
Quality Determination and Grading of Tomatoes using Raspberry Pirahulmonikasharma
 
Comparative of Delay Tolerant Network Routings and Scheduling using Max-Weigh...
Comparative of Delay Tolerant Network Routings and Scheduling using Max-Weigh...Comparative of Delay Tolerant Network Routings and Scheduling using Max-Weigh...
Comparative of Delay Tolerant Network Routings and Scheduling using Max-Weigh...rahulmonikasharma
 
DC Conductivity Study of Cadmium Sulfide Nanoparticles
DC Conductivity Study of Cadmium Sulfide NanoparticlesDC Conductivity Study of Cadmium Sulfide Nanoparticles
DC Conductivity Study of Cadmium Sulfide Nanoparticlesrahulmonikasharma
 
A Survey on Peak to Average Power Ratio Reduction Methods for LTE-OFDM
A Survey on Peak to Average Power Ratio Reduction Methods for LTE-OFDMA Survey on Peak to Average Power Ratio Reduction Methods for LTE-OFDM
A Survey on Peak to Average Power Ratio Reduction Methods for LTE-OFDMrahulmonikasharma
 
IOT Based Home Appliance Control System, Location Tracking and Energy Monitoring
IOT Based Home Appliance Control System, Location Tracking and Energy MonitoringIOT Based Home Appliance Control System, Location Tracking and Energy Monitoring
IOT Based Home Appliance Control System, Location Tracking and Energy Monitoringrahulmonikasharma
 
Thermal Radiation and Viscous Dissipation Effects on an Oscillatory Heat and ...
Thermal Radiation and Viscous Dissipation Effects on an Oscillatory Heat and ...Thermal Radiation and Viscous Dissipation Effects on an Oscillatory Heat and ...
Thermal Radiation and Viscous Dissipation Effects on an Oscillatory Heat and ...rahulmonikasharma
 
Advance Approach towards Key Feature Extraction Using Designed Filters on Dif...
Advance Approach towards Key Feature Extraction Using Designed Filters on Dif...Advance Approach towards Key Feature Extraction Using Designed Filters on Dif...
Advance Approach towards Key Feature Extraction Using Designed Filters on Dif...rahulmonikasharma
 
Alamouti-STBC based Channel Estimation Technique over MIMO OFDM System
Alamouti-STBC based Channel Estimation Technique over MIMO OFDM SystemAlamouti-STBC based Channel Estimation Technique over MIMO OFDM System
Alamouti-STBC based Channel Estimation Technique over MIMO OFDM Systemrahulmonikasharma
 
Empirical Mode Decomposition Based Signal Analysis of Gear Fault Diagnosis
Empirical Mode Decomposition Based Signal Analysis of Gear Fault DiagnosisEmpirical Mode Decomposition Based Signal Analysis of Gear Fault Diagnosis
Empirical Mode Decomposition Based Signal Analysis of Gear Fault Diagnosisrahulmonikasharma
 
Short Term Load Forecasting Using ARIMA Technique
Short Term Load Forecasting Using ARIMA TechniqueShort Term Load Forecasting Using ARIMA Technique
Short Term Load Forecasting Using ARIMA Techniquerahulmonikasharma
 
Impact of Coupling Coefficient on Coupled Line Coupler
Impact of Coupling Coefficient on Coupled Line CouplerImpact of Coupling Coefficient on Coupled Line Coupler
Impact of Coupling Coefficient on Coupled Line Couplerrahulmonikasharma
 
Design Evaluation and Temperature Rise Test of Flameproof Induction Motor
Design Evaluation and Temperature Rise Test of Flameproof Induction MotorDesign Evaluation and Temperature Rise Test of Flameproof Induction Motor
Design Evaluation and Temperature Rise Test of Flameproof Induction Motorrahulmonikasharma
 

More from rahulmonikasharma (20)

Data Mining Concepts - A survey paper
Data Mining Concepts - A survey paperData Mining Concepts - A survey paper
Data Mining Concepts - A survey paper
 
A Review on Real Time Integrated CCTV System Using Face Detection for Vehicle...
A Review on Real Time Integrated CCTV System Using Face Detection for Vehicle...A Review on Real Time Integrated CCTV System Using Face Detection for Vehicle...
A Review on Real Time Integrated CCTV System Using Face Detection for Vehicle...
 
Considering Two Sides of One Review Using Stanford NLP Framework
Considering Two Sides of One Review Using Stanford NLP FrameworkConsidering Two Sides of One Review Using Stanford NLP Framework
Considering Two Sides of One Review Using Stanford NLP Framework
 
A New Detection and Decoding Technique for (2×N_r ) MIMO Communication Systems
A New Detection and Decoding Technique for (2×N_r ) MIMO Communication SystemsA New Detection and Decoding Technique for (2×N_r ) MIMO Communication Systems
A New Detection and Decoding Technique for (2×N_r ) MIMO Communication Systems
 
Broadcasting Scenario under Different Protocols in MANET: A Survey
Broadcasting Scenario under Different Protocols in MANET: A SurveyBroadcasting Scenario under Different Protocols in MANET: A Survey
Broadcasting Scenario under Different Protocols in MANET: A Survey
 
Sybil Attack Analysis and Detection Techniques in MANET
Sybil Attack Analysis and Detection Techniques in MANETSybil Attack Analysis and Detection Techniques in MANET
Sybil Attack Analysis and Detection Techniques in MANET
 
A Landmark Based Shortest Path Detection by Using A* and Haversine Formula
A Landmark Based Shortest Path Detection by Using A* and Haversine FormulaA Landmark Based Shortest Path Detection by Using A* and Haversine Formula
A Landmark Based Shortest Path Detection by Using A* and Haversine Formula
 
Processing Over Encrypted Query Data In Internet of Things (IoTs) : CryptDBs,...
Processing Over Encrypted Query Data In Internet of Things (IoTs) : CryptDBs,...Processing Over Encrypted Query Data In Internet of Things (IoTs) : CryptDBs,...
Processing Over Encrypted Query Data In Internet of Things (IoTs) : CryptDBs,...
 
Quality Determination and Grading of Tomatoes using Raspberry Pi
Quality Determination and Grading of Tomatoes using Raspberry PiQuality Determination and Grading of Tomatoes using Raspberry Pi
Quality Determination and Grading of Tomatoes using Raspberry Pi
 
Comparative of Delay Tolerant Network Routings and Scheduling using Max-Weigh...
Comparative of Delay Tolerant Network Routings and Scheduling using Max-Weigh...Comparative of Delay Tolerant Network Routings and Scheduling using Max-Weigh...
Comparative of Delay Tolerant Network Routings and Scheduling using Max-Weigh...
 
DC Conductivity Study of Cadmium Sulfide Nanoparticles
DC Conductivity Study of Cadmium Sulfide NanoparticlesDC Conductivity Study of Cadmium Sulfide Nanoparticles
DC Conductivity Study of Cadmium Sulfide Nanoparticles
 
A Survey on Peak to Average Power Ratio Reduction Methods for LTE-OFDM
A Survey on Peak to Average Power Ratio Reduction Methods for LTE-OFDMA Survey on Peak to Average Power Ratio Reduction Methods for LTE-OFDM
A Survey on Peak to Average Power Ratio Reduction Methods for LTE-OFDM
 
IOT Based Home Appliance Control System, Location Tracking and Energy Monitoring
IOT Based Home Appliance Control System, Location Tracking and Energy MonitoringIOT Based Home Appliance Control System, Location Tracking and Energy Monitoring
IOT Based Home Appliance Control System, Location Tracking and Energy Monitoring
 
Thermal Radiation and Viscous Dissipation Effects on an Oscillatory Heat and ...
Thermal Radiation and Viscous Dissipation Effects on an Oscillatory Heat and ...Thermal Radiation and Viscous Dissipation Effects on an Oscillatory Heat and ...
Thermal Radiation and Viscous Dissipation Effects on an Oscillatory Heat and ...
 
Advance Approach towards Key Feature Extraction Using Designed Filters on Dif...
Advance Approach towards Key Feature Extraction Using Designed Filters on Dif...Advance Approach towards Key Feature Extraction Using Designed Filters on Dif...
Advance Approach towards Key Feature Extraction Using Designed Filters on Dif...
 
Alamouti-STBC based Channel Estimation Technique over MIMO OFDM System
Alamouti-STBC based Channel Estimation Technique over MIMO OFDM SystemAlamouti-STBC based Channel Estimation Technique over MIMO OFDM System
Alamouti-STBC based Channel Estimation Technique over MIMO OFDM System
 
Empirical Mode Decomposition Based Signal Analysis of Gear Fault Diagnosis
Empirical Mode Decomposition Based Signal Analysis of Gear Fault DiagnosisEmpirical Mode Decomposition Based Signal Analysis of Gear Fault Diagnosis
Empirical Mode Decomposition Based Signal Analysis of Gear Fault Diagnosis
 
Short Term Load Forecasting Using ARIMA Technique
Short Term Load Forecasting Using ARIMA TechniqueShort Term Load Forecasting Using ARIMA Technique
Short Term Load Forecasting Using ARIMA Technique
 
Impact of Coupling Coefficient on Coupled Line Coupler
Impact of Coupling Coefficient on Coupled Line CouplerImpact of Coupling Coefficient on Coupled Line Coupler
Impact of Coupling Coefficient on Coupled Line Coupler
 
Design Evaluation and Temperature Rise Test of Flameproof Induction Motor
Design Evaluation and Temperature Rise Test of Flameproof Induction MotorDesign Evaluation and Temperature Rise Test of Flameproof Induction Motor
Design Evaluation and Temperature Rise Test of Flameproof Induction Motor
 

Recently uploaded

University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 

Elementary Concepts of Big Data and Hadoop

  • 1. International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169 Volume: 5 Issue: 7 749 – 752 _______________________________________________________________________________________________ 749 IJRITCC | July 2017, Available @ http://www.ijritcc.org _______________________________________________________________________________________ Elementary Concepts of Big Data and Hadoop Hasmukh B. Domadiya Assistant Professor, National Computer College, Jamnagar. Dr. Girish C. Bhimani Head of Department, Department of Statistics, Saurashtra University, Rajkot Abstract: This paper is an effort to present the basic importance of Big Data and also its importance in an organization from its performance point of view. The term Big data, refers the data sets, whose volume, complexity and also rate of growth make them more difficult to capture, manage, process and also analyzed. For such type of data –intensive applications, the Apache Hadoop Framework has newly concerned a lot of attention. Hadoop is the core platform for structuring Big data, and solves the problem of making it helpful for analytics idea. Hadoop is an open source software project that enables the distributed processing of enormous data and framework for the analysis and transformation of very large data sets using the MapReduce paradigm. This paper deals with the architecture of Hadoop with its various components. Keywords: Big Data, Hadoop, MapReduce, HBase, Avro, Pig, Hive, YARN, Sqoop, ZooKeeper, Mahout __________________________________________________*****_________________________________________________ I. Introduction: Big data is a collection of enormous dataset that not only processed by any traditional computer techniques. Big data is not only a term , but it has various tools, techniques and also framework. The term Big data, refers the data sets, whose volume, complexity and also rate of growth make them more difficult to capture, manage, process and also analyzed. Generally, in Big data the data in it will be of three types: Structured Data –Relational data, Semi- structured data-XML data and also Unstructured data- Word, PDF, Text, Media Logs (1). II. Big Data: Current age is in the data age. According to Tom, It’s not easy to measure the total volume of data stored electronically, but an IDC estimate put the size of the digital universe at 4.4 zettabytes in 2013 and is forecasting a tenfold growth by 2020 to 44 zettabytes. A zettabyte is 10^21 bytes, or equivalent one thousand exabytes, one million petabytes, or one billion terabytes. That is more than one disk drive for every person in the world (2). The data in big data is a big no matter how we serving it. Current era is totally a society of sharers, but in just sixty seconds of different sharing and searching activities on internet. In the below figure, we can see that millions of items are accessed, searched, transferred and also shared. It’s a shocking amount of data to think about. (3) Challenges of Big Data: 1. Volume: Volume refers to amount or sizes of data. The storage data size may be represented in Terabytes or zeta bytes. 2. Variety: Variety refers different types of data and also different sources of data. Data may be in structured, unstructured and also sometimes it is in semi-structured. 3. Velocity: Velocity refers the speed of data processing. The data comes at very high speed. 4. Veracity: Veracity refers to noise, biases and abnormality when we dealing with high volume, velocity and variety of data, the all of data are not going to 100% correct, there will be dirty data (4).
  • 2. International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169 Volume: 5 Issue: 7 749 – 752 _______________________________________________________________________________________________ 750 IJRITCC | July 2017, Available @ http://www.ijritcc.org _______________________________________________________________________________________ Hadoop: The solution of Big data: Hadoop is open-source software for reliable, scalable and also distributed computing. The Apache Hadoop library is a framework that allows distributed processing of enormous data sets across clusters of computers using a simple programming model. Hadoop was derived from Google’s Map Reduce and Google File System (GFS) (5). Currently, Hadoop consists of Hadoop kernel, MapReduce, HDFS and different various components like Apache Hive, Base etc. Fig: Architecture of Hadoop (6) A. HDFS (Hadoop Distributed File System) A HDFS is designed to hold a large amount of information like terabytes or petabytes and also provide access to this data to many clients distributed across a network. HDFS has master/slave architecture. Trough HDFS, data will be written once and then read several times. Fig: Architecture of HDFS (7) HDFS is a block-structured file system. Individual files are divided into the chunks of a fixed size. These all small chunks are stored across a group of one or more machines with data storage capacity. HDFS stores three copies of each file by copying each piece to three different servers. Size of each block is 64 MB. HDFS architecture is divided into three different nodes, which are Name node, Data Node and also HDFS client node (8). Name Node: Name node is centrally placed node, which contains whole information about Hadoop file system. The main work of this node is it records all the metadata and attributes and also specific location of files and data blocks in the data nodes. This node is also called as a master node because in this node stores all the details about the system and also provides newly added, modified and also removed details from any data nodes. Data Node: Data node manages the data storage (9) of their system. Data nodes perform read and write operations on the file systems, as per client request. This node perform operations like chunk/block create, delete and also reproduction according to the instructions of the name node. HDFS Clients: An HDFS client is also known as Edge node. It is basically a link node between name node and data node. B. MapReduce Architecture Solanke poona et al. MapReduce is a programming model and related execution for processing and also generating enormous data with a parallel processing. This architecture poised of a two different functions. Map function used for filtering and sorting data and reduce function that performs a summary operation (10). Fig: MapReduce Algorithm (11) Map stage: This stage is used to process the input data. Generally the inputted data is in the form of file or directory and is also stored in the Hadoop Distributed File System (HDFS). The inputted file is passing to the (11) map function through line by line. The map processes that data and also creates multiple small chunks of data.
  • 3. International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169 Volume: 5 Issue: 7 749 – 752 _______________________________________________________________________________________________ 751 IJRITCC | July 2017, Available @ http://www.ijritcc.org _______________________________________________________________________________________ Reduce stage: This stage is combination of the shuffle stage and (11) also the Reduce stage. The main job of this stage is to process the data that comes from the map stage. After processing, it generates a new set of output, which will be stored in HDFS. III. Literature Review: S. Vikram Phaneendra et al., mentioned that in previous days the data was very few and anyone can handle it very easily through the RDBMS but currently it is too difficult to handle enormous data through that tools. This major problem to handle huge data is solved through the Big Data. In this paper, they explain the different dimensions of Big Data like Volume, Velocity, Variety, Value and also complexity (Veracity). In this paper, they also illustrated that big data can be found at anywhere like in Finance, Retail Industry, Healthcare, Insurance etc (12). According to Albert Bifet et al., discussed that streaming examination in current era is now becoming the fastest and most proficient way to attain meaningful and useful knowledge, allowing any industry to respond quickly when any type of issues can appear or detect to improve the performance. In this paper, they also discussed different tools used for mining big data are Apache Hadoop, mahout etc (13). Bernice Purcell, discusses that in Big Data includes structured data, semi-structured and also unstructured data. In this paper, they also discussed that Hadoop architecture is used to process different unstructured and semi-structured using MapReduce to locate all related data then select only the data directly answering the query (14). Manyika et al. (2011), describes the major contributions of big data can make to businesses: Transparency creation, performance improvement, population segmentation, decision making support and also innovative business models, products and services (15). Y. Lee et al., describes that Most of the MapReduce applications on Hadoop are developed to analyze large texts, web contents or log files. In this paper, the author specially highlighted on that first packet processing method of Hadoop that always analyzes any packet trace files in sequential manner through the reading all packets across from multiple HDFS chunks (16). IV. Hadoop Components: 1). HBase: HBase is a non-relational column oriented distributed database designed to run always on top of the Hadoop Distributed File System (HDFS). HBase is an open- source and distributed based on the Google’s BigTable. This is an example of NoSQL data store and it’s written in Java (17). 2). Avro: Avro is data serialization format which brings data interoperability among multiple components of Apache Hadoop. 3). Pig: Pig was initially developed at Yahoo Research around 2006 but it moved into the Apache software foundation list in 2007(18). It is a platform for analyzing enormous data sets which is consists of high level query language like SQL for expressing data analysis and also processing. Pig can easily process tera bytes of data with very little lines of code and through this platform writing and maintaining data processing jobs very easy (19). 4). Hive: Hive is Data Warehousing application that provides the SQL interface and also relational model. It is also structures data into the well-known database concepts like tables, columns, rows and also partitions. It supports primitive as well as complex data types like maps, lists and structs (18). 5). YARN: Yet Another Resource Negotiator (YARN), is included in the latest Hadoop release and its goal is to allow the system to serve as a general data processing framework. It supports programming models other than MapReduce, while also improving scalability and resource utilization (20). 6). Sqoop: Sqoop is tool which can be used to transfer the data from relational database environments like Oracle, mysql into Hadoop environment. This is a command line interface platform that is used to transferring data between relational databases and Hadoop (4). 7). ZooKeeper: Zookeeper is a distributed coordination and also governing service for Hadoop cluster. It is a centralized service that provides distributed synchronization and also maintaining the configuration information(4). 8). Mahout: Mahout is a simple and extensible programming environment and framework for building scalable algorithms very quickly(21). V. Conclusion: In this paper, the author provided an overview of the Big Data and Hadoop. As an author, this paper considering the five main characteristics of Big Data and
  • 4. International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169 Volume: 5 Issue: 7 749 – 752 _______________________________________________________________________________________________ 752 IJRITCC | July 2017, Available @ http://www.ijritcc.org _______________________________________________________________________________________ discussing different components of Hadoop. I also believe that the information proposed here can help to discuss technologies already existing and also newly emerging in the opportunity, classify them and efficient mechanism and also guide them. References [1] http://www.tutotialspoint.com/hadoop/hadoop_big_data_ov erview.htm [2] Tom White ,“Hadoop: the Definitive Guide” 4th edition, O’Reilly publication [3] https://ricoheuropebusinessdriver.com/category/big-data [4] Varsha B. Bobade (Jan-2016), “Survey paper on Big Data and Hadoop”, International Research Journal of Engineering and Technology (IRJET), Vol-3, Issue-1, pp. 861- 863, e-ISSN:- 2395-0056, p-ISSN:-2395-0072. [5] Apache Hadoop, http://hadoop.apache.org [6] https://www.mssqltips.com/sqlservertip/3140/big-data- basics--part-3--overview-of-hadoop [7] Hadoop Architecture: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html [8] Hadoop Tutorial: http://developer.yahoo.com/hadoop/tutotial/module1.html [9] HDFSTutorial :https://www.tutorialspoint.com/hadoop/hadoop_hdfs_over view.htm [10] Solanke Poonam G, B.M.Patil (2-2016), “An Efficient Approach for Processing Big Data with Inceremental MapReduce”, International Journal for Scientific Research & Development, Vol-4, Issue-2, pp. 53-57, ISSN (online) 2321-0613. [11] MapReduce Tutorial: https://www.tutorialspoint.com/hadoop/hadoop_mapreduce .htm [12] S.Vikram Phaneendra & E.Madhusudhan Reddy (Apr 19- 23, 2013) “Big Data- solutions for RDBMS problems- A survey” In 12th IEEE/IFIP Network Operations & Management Symposium (NOMS 2010) (Osaka, Japan). [13] Albert Bifet, “Mining Big Data in Real Time”, Informatica 37 (2013) 15-20 Dec-2012. [14] Bernice Purcell, “The emergence of “big data” technology and analytics”, Journal of Technology Research. 15). Manyika, J., Chui,. M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011, June). “Big data: The next frontier for innovation, competition, and productivity”. McKinsey Global Institute. Retrieved from http://www.mckinsey.com/Insights/MGI/Research/Technol ogy_and_Innovation/Big_data_The_next_frontier_for_ innovation [16] Y. Lee, W.Kang, and Y.Lee (April-2011), “A Hadoop based packet trace processing Tool”, International Workshop on Traffic Monitoring and Analysis (TMA 2011). [17] Apache HBase, https://hbase.apache.org/ [18] Ashish Thusoo, Joydeep Sen Sama, Namit Jain, Zhend Shao, Prasad Chakka, Ning Zhang, Suresh Antony, Hao Liu and Raghotham Murthy, “Hive- A Petabyte Scale Data Warehouse Using Hadoop”, By Facebook Data Infrastructure Team. [19] Apache Pig, https://pig.apache.org [20] Poonam S. Patil and Rajesh N. Phursule (Oct-2014), “Survey paper on Big Data Processing and Hadoop Components”, International Journal of Science and Research (IJSR), Vol-3, Issue-10, pp. 585-590, ISSN (Online): 2319-7064. [21] Apache mahaout, http://mahout.apache.org/