1. What is Apache Hadoop?
Apache Hadoop is an open-source framework for managing big data storage and
processing. Hadoop distributes large datasets across clusters of computers,
enabling parallel processing for faster results.
2. What is Big Data?
Big Data refers to collections of data so large in volume, and growing so
quickly over time, that traditional data management tools cannot store or
process them efficiently.
3. Why do we need Big Data analytics?
Big data analytics helps organizations harness their data and use it
to identify new opportunities. That, in turn, leads to smarter business
moves, more efficient operations, higher profits and happier
customers.
4. What is Hadoop?
Hadoop is an open-source software framework for storing data and running
applications on clusters of commodity hardware. It provides massive storage
for any kind of data, enormous processing power and the ability to handle
virtually limitless concurrent tasks or jobs.
5. What is Hadoop used for?
Apache Hadoop is an open source framework that is used to efficiently store
and process large datasets ranging in size from gigabytes to petabytes of
data. Instead of using one large computer to store and process the data,
Hadoop allows clustering multiple computers to analyze massive datasets in
parallel more quickly.
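The parallel model Hadoop uses is MapReduce. As a rough sketch of the idea
(not the actual Hadoop API, which is Java-based and runs across a cluster),
the classic word-count job can be simulated locally; the function names
map_phase, shuffle, and reduce_phase below are illustrative only:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between the
    # map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big cluster", "big data"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result)  # {'big': 3, 'data': 2, 'cluster': 1}
```

In real Hadoop, the map and reduce steps run on many machines at once, each
processing the slice of data stored on its own node.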
6. What is DFS?
A Distributed File System (DFS), as the name suggests, is a file system
whose storage is spread across multiple file servers or locations.
It lets programs access and store remote files just as they do local
ones, so programmers can reach files from any computer on the network.
7. What is HDFS?
HDFS is designed to reliably store very large files across machines in a large
cluster. It stores each file as a sequence of blocks; all blocks in a file except
the last block are the same size.
The blocks of a file are replicated for fault tolerance. The block size and
replication factor are configurable per file.
8. Advantages
● Scalable. Hadoop is a highly scalable storage platform: it can store and
distribute very large datasets across hundreds of inexpensive servers
operating in parallel.
● Cost effective. Because it runs on commodity hardware, Hadoop offers an
economical storage solution for businesses' rapidly growing datasets.
● Flexible. Hadoop can store any kind of data, structured or unstructured,
without requiring a schema up front.
● Fast. Processing runs on the same nodes where the data is stored, so
large jobs avoid moving data across the network.
● Resilient to failure. Data is replicated across nodes, so processing
continues even when individual machines fail.