SlideShare a Scribd company logo
1 of 18
Download to read offline
Apache Pig
Extract Transform Load
Introduction
Apache Pig:
● is an abstraction over MapReduce
● has structure which is amenable for substantial parallelization
● is generally used with Hadoop
● does lazy evaluation
● uses Pig Latin, which provides
○ Ease of Programming
○ Optimization opportunities
■ Focus on semantics rather than efficiency
○ Extensibility
■ Can create UDFs
Architecture
Pig Latin Data Model
● Fully nested & complex data model
● Atom
● Tuple
● Bag
● Map
● Relation
Pig Environment
Establishing Pig environment contains following three major steps:
● Installation
○ https://pig.apache.org/docs/r0.7.0/setup.html
● Grunt shell
○ Pig interactive shell which accepts Pig-latin based statements.
● Execution
○ Two modes of execution, Mapreduce mode and Local mode, we’ll see both ahead.
Pig Latin
statements
relations
expressions schemas
Data Types
Basic Complex
int, long, biginteger Tuple : (raju, 5)
float, double, bigdecimal Bag: {(raju,5),(ranu,6)}
chararray Map: [‘name’#’raju’,’age’:40]
bytearray
boolean
datetime
Operators and Constructs
● All standard Arithmetic and Relational operator
● Constructs
○ Tuple: ()
○ Bag: {}
○ Map: []
● Relational Operators
○ Load & Storing: LOAD & STORE
○ Filtering: FILTER, DISTINCT, FOREACH, GENERATE, STREAM
○ Grouping & Joining: JOIN, COGROUP, CROSS, GROUP
○ Sorting: ORDER, LIMIT
○ Combining & Splitting: UNION, SPLIT
○ Diagnostic & Debugging: DUMP, DESCRIBE, EXPLAIN, ILLUSTRATE
Sample Queries
student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt'
USING PigStorage(',')
as ( id:int, firstname:chararray, lastname:chararray, phone:chararray,
city:chararray );
STORE student INTO ' hdfs://localhost:9000/pig_Output/ ' USING PigStorage (',');
Dump student
describe student;
explain Relation_name;
illustrate student
group_data = GROUP student_details by age
cogroup_data = COGROUP student_details by age, employee_details by age;
result = JOIN relation1 BY columnname, relation2 BY columnname;
cross_data = CROSS customers, orders;
student = UNION student1, student2;
SPLIT student_details into student_details1 if age<23, student_details2 if (22<age and age>25);
filter_data = FILTER student_details BY city == 'Chennai';
distinct_data = DISTINCT student_details;
foreach_data = FOREACH student_details GENERATE id,age,city;
order_by_data = ORDER student_details BY age DESC;
limit_data = LIMIT student_details 4;
Sample Queries (contd.)
Eval & Load/Store Functions
Eval:
AVG() BagToString() CONCAT() COUNT() COUNT_STAR()
DIFF() IsEmpty() MAX() MIN() SIZE() SUM()
Load/Store:
PigStorage()
TextLoader()
BinStorage()
UDFs(User Defined Functions)
● UDF ?
● Piggybanks?
● Types of UDFs:
○ Filter Functions
○ Eval Functions
○ Algebric Functions
● Writing UDFs
● Registering UDFs
● Defining Alias
● Using UDFs
documentation
UDF’s (contd.)
What is UDF?
In addition to the built-in functions, Pig provides excellent support for User Defined
Functions(UDF’s), where we can define our own functions and use them along with
different Pig Latin built-in functions. Complete support to write UDF’s is provided in
Java and partially in other languages, viz. Jython, Python, JS, Ruby and Groovy.
Piggybanks?
Apache Pig provides a central repository for Java based UDF’s using which you can
use the UDF’s written by other people and you can also contribute there.
UDF Example
● This example is written for converting given string to Uppercase,
● You can download the java code from link.
● Extract it and create the jar file from the source code or download from here.
● Now assume we have the jar file at /home/hduser/hadoop/exampleJARs/udfEx1.jar and some sample data at
/hduser/sampleData/sampledata.txt
● Now on grunt shell
REGISTER /home/hduser/hadoop/exampleJARs/udfEx1.jar;
DEFINE udf_eval SampleEval();
data = load '/hduser/sampleData/sampledata.txt' using PigStorage(',') as (id:int, name:chararray, age:int, city:chararray);
upper_case_data = foreach data generate udf_eval(name);
store upper_case_data into 'hdfs://localhost:9000/hduser/outputudf15';
Pig Scripts
● Modes of running Pig Scripts
○ Mapreduce mode
○ Local mode (-x parameter)
pig -x <mode> <pig_script_path>
Pig → MapReduce
Author:
Pathan Mudassirkhan
Twitter Linkedin

More Related Content

Viewers also liked

Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
Mitsuharu Hamba
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Hadoop User Group
 

Viewers also liked (14)

Practical Hadoop using Pig
Practical Hadoop using PigPractical Hadoop using Pig
Practical Hadoop using Pig
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)Hadoop and pig at twitter (oscon 2010)
Hadoop and pig at twitter (oscon 2010)
 
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
Hadoop Ecosystem at Twitter - Kevin Weil - Hadoop World 2010
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | EdurekaPig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
Pig Tutorial | Twitter Case Study | Apache Pig Script and Commands | Edureka
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Hadoop et son écosystème
Hadoop et son écosystèmeHadoop et son écosystème
Hadoop et son écosystème
 
Big Data: Concepts, techniques et démonstration de Apache Hadoop
Big Data: Concepts, techniques et démonstration de Apache HadoopBig Data: Concepts, techniques et démonstration de Apache Hadoop
Big Data: Concepts, techniques et démonstration de Apache Hadoop
 
Pig and Python to Process Big Data
Pig and Python to Process Big DataPig and Python to Process Big Data
Pig and Python to Process Big Data
 
Un introduction à Pig
Un introduction à PigUn introduction à Pig
Un introduction à Pig
 
Apache Cassandra - Concepts et fonctionnalités
Apache Cassandra - Concepts et fonctionnalitésApache Cassandra - Concepts et fonctionnalités
Apache Cassandra - Concepts et fonctionnalités
 

Similar to Apache pig

Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
Adam Kawa
 
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labsApache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Viswanath Gangavaram
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
soujavajug
 

Similar to Apache pig (20)

Introduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUGIntroduction To Apache Pig at WHUG
Introduction To Apache Pig at WHUG
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latin
 
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018Beyond Wordcount  with spark datasets (and scalaing) - Nide PDX Jan 2018
Beyond Wordcount with spark datasets (and scalaing) - Nide PDX Jan 2018
 
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labsApache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
Apache pig power_tools_by_viswanath_gangavaram_r&d_dsg_i_labs
 
Big Data Hadoop Training
Big Data Hadoop TrainingBig Data Hadoop Training
Big Data Hadoop Training
 
Introducing Datawave
Introducing DatawaveIntroducing Datawave
Introducing Datawave
 
Practical pig
Practical pigPractical pig
Practical pig
 
Pig latin
Pig latinPig latin
Pig latin
 
Go. Why it goes
Go. Why it goesGo. Why it goes
Go. Why it goes
 
Apache Airflow in Production
Apache Airflow in ProductionApache Airflow in Production
Apache Airflow in Production
 
Unit V.pdf
Unit V.pdfUnit V.pdf
Unit V.pdf
 
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
Hadoop - Introduction to map reduce programming - Reunião 12/04/2014
 
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
 
Dart the Better JavaScript
Dart the Better JavaScriptDart the Better JavaScript
Dart the Better JavaScript
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
A super fast introduction to Spark and glance at BEAM
A super fast introduction to Spark and glance at BEAMA super fast introduction to Spark and glance at BEAM
A super fast introduction to Spark and glance at BEAM
 
A fast introduction to PySpark with a quick look at Arrow based UDFs
A fast introduction to PySpark with a quick look at Arrow based UDFsA fast introduction to PySpark with a quick look at Arrow based UDFs
A fast introduction to PySpark with a quick look at Arrow based UDFs
 
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
BDX 2015 - Scaling out big-data computation & machine learning using Pig, Pyt...
 
Node.js Course 2 of 2 - Advanced techniques
Node.js Course 2 of 2 - Advanced techniquesNode.js Course 2 of 2 - Advanced techniques
Node.js Course 2 of 2 - Advanced techniques
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 

Recently uploaded

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Recently uploaded (20)

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 

Apache pig

  • 2. Introduction Apache Pig: ● is an abstraction over MapReduce ● has structure which is amenable for substantial parallelization ● is generally used with Hadoop ● does lazy evaluation ● uses Pig Latin, which provides ○ Ease of Programming ○ Optimization opportunities ■ Focus on semantics rather than efficiency ○ Extensibility ■ Can create UDFs
  • 4. Pig Latin Data Model ● Fully nested & complex data model ● Atom ● Tuple ● Bag ● Map ● Relation
  • 5. Pig Environment Establishing Pig environment contains following three major steps: ● Installation ○ https://pig.apache.org/docs/r0.7.0/setup.html ● Grunt shell ○ Pig interactive shell which accepts Pig-latin based statements. ● Execution ○ Two modes of execution, Mapreduce mode and Local mode, we’ll see both ahead.
  • 7. Data Types Basic Complex int, long, biginteger Tuple : (raju, 5) float, double, bigdecimal Bag: {(raju,5),(ranu,6)} chararray Map: [‘name’#’raju’,’age’:40] bytearray boolean datetime
  • 8. Operators and Constructs ● All standard Arithmetic and Relational operator ● Constructs ○ Tuple: () ○ Bag: {} ○ Map: [] ● Relational Operators ○ Load & Storing: LOAD & STORE ○ Filtering: FILTER, DISTINCT, FOREACH, GENERATE, STREAM ○ Grouping & Joining: JOIN, COGROUP, CROSS, GROUP ○ Sorting: ORDER, LIMIT ○ Combining & Splitting: UNION, SPLIT ○ Diagnostic & Debugging: DUMP, DESCRIBE, EXPLAIN, ILLUSTRATE
  • 9. Sample Queries student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',') as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray ); STORE student INTO ' hdfs://localhost:9000/pig_Output/ ' USING PigStorage (','); Dump student describe student; explain Relation_name; illustrate student group_data = GROUP student_details by age cogroup_data = COGROUP student_details by age, employee_details by age; result = JOIN relation1 BY columnname, relation2 BY columnname;
  • 10. cross_data = CROSS customers, orders; student = UNION student1, student2; SPLIT student_details into student_details1 if age<23, student_details2 if (22<age and age>25); filter_data = FILTER student_details BY city == 'Chennai'; distinct_data = DISTINCT student_details; foreach_data = FOREACH student_details GENERATE id,age,city; order_by_data = ORDER student_details BY age DESC; limit_data = LIMIT student_details 4; Sample Queries (contd.)
  • 11. Eval & Load/Store Functions Eval: AVG() BagToString() CONCAT() COUNT() COUNT_STAR() DIFF() IsEmpty() MAX() MIN() SIZE() SUM() Load/Store: PigStorage() TextLoader() BinStorage()
  • 12. UDFs(User Defined Functions) ● UDF ? ● Piggybanks? ● Types of UDFs: ○ Filter Functions ○ Eval Functions ○ Algebric Functions ● Writing UDFs ● Registering UDFs ● Defining Alias ● Using UDFs documentation
  • 13. UDF’s (contd.) What is UDF? In addition to the built-in functions, Pig provides excellent support for User Defined Functions(UDF’s), where we can define our own functions and use them along with different Pig Latin built-in functions. Complete support to write UDF’s is provided in Java and partially in other languages, viz. Jython, Python, JS, Ruby and Groovy. Piggybanks? Apache Pig provides a central repository for Java based UDF’s using which you can use the UDF’s written by other people and you can also contribute there.
  • 14. UDF Example ● This example is written for converting given string to Uppercase, ● You can download the java code from link. ● Extract it and create the jar file from the source code or download from here. ● Now assume we have the jar file at /home/hduser/hadoop/exampleJARs/udfEx1.jar and some sample data at /hduser/sampleData/sampledata.txt ● Now on grunt shell REGISTER /home/hduser/hadoop/exampleJARs/udfEx1.jar; DEFINE udf_eval SampleEval(); data = load '/hduser/sampleData/sampledata.txt' using PigStorage(',') as (id:int, name:chararray, age:int, city:chararray); upper_case_data = foreach data generate udf_eval(name); store upper_case_data into 'hdfs://localhost:9000/hduser/outputudf15';
  • 15. Pig Scripts ● Modes of running Pig Scripts ○ Mapreduce mode ○ Local mode (-x parameter) pig -x <mode> <pig_script_path>
  • 17.