SlideShare a Scribd company logo
1 of 30
Download to read offline
DATA ENGINEER CERTIFICATION
Austin Sun
June 27th 2017
• Introduction
• Skill set
• How to prepare
• Register test
• Reference & etc.
OUTLINE
• Introduction
• Skill set
• How to prepare
• Register test
• Reference & etc.
OUTLINE
• Data Scientist:
a person employed to analyze and interpret complex digital data,
such as the usage statistics of a website, especially in order to assist
a business in its decision-making.
• Data Engineer:
a worker whose primary job responsibilities involve preparing data for
analytical or operational uses. Data engineers enable data scientists
to do their jobs more effectively.
DATA ENGINEER & DATA SCIENTIST
“An experienced open-source developer who earns
the Cloudera Certified Data Engineer credential is
able to perform core competencies required to
ingest, transform, store, and analyze data in
Cloudera's CDH environment. The credential is
earned after successfully passing the CCP Data
Engineer Exam (DE575).” -- Cloudera
CCP DATA ENGINEER
• Introduction
• Skill set
• How to prepare
• Register test
• Reference & etc.
OUTLINE
• Data Ingest
• Transform, Stage, Store
• Data Analysis
• Workflow
SKILL SET
The skills to transfer data between external systems and
your cluster. This includes the following:
• Import and export data between an external RDBMS and your cluster, including
the ability to import specific subsets, change the delimiter and file format of
imported data during ingest, and alter the data access pattern or privileges.
• Ingest real-time and near-real time (NRT) streaming data into HDFS, including the
ability to distribute to multiple data sources and convert data on ingest from one
format to another.
• Load data into and out of HDFS using the Hadoop File System (FS) commands.
DATA INGEST
Convert a set of data values in a given format stored in HDFS
into new data values and/or a new data format and write
them into HDFS or Hive/HCatalog. This includes the following
skills:
• Convert data from one file format to another
• Write your data with compression
• Convert data from one set of values to another (e.g., Lat/Long to Postal Address
using an external library)
• Change the data format of values in a data set
TRANSFORM, STAGE, STORE
TRANSFORM, STAGE, STORE
• Purge bad records from a data set, e.g., null values
• Deduplication and merge data
• De-normalize data from multiple disparate data sets
• Evolve an Avro or Parquet schema
• Partition an existing data set according to one or more partition keys
• Tune data for optimal query performance
Filter, sort, join, aggregate, and/or transform one or more data
sets in a given format stored in HDFS to produce a specified
result. All of these tasks may include reading from Parquet, Avro,
JSON, delimited text, and natural language text. The queries will
include complex data types (e.g., array, map, struct), the
implementation of external libraries, partitioned data,
compressed data, and require the use of metadata from
Hive/HCatalog.
DATA ANALYSIS
• Write a query to aggregate multiple rows of data
• Write a query to calculate aggregate statistics (e.g., average or sum)
• Write a query to filter data
• Write a query that produces ranked or sorted data
• Write a query that joins multiple data sets
• Read and/or create a Hive or an HCatalog table from existing data in HDFS
DATA ANALYSIS
The ability to create and execute various jobs and actions that
move data towards greater value and use in a system. This
includes the following skills:
• Create and execute a linear workflow with actions that include Hadoop jobs, Hive
jobs, Pig jobs, custom actions, etc.
• Create and execute a branching workflow with actions that include Hadoop jobs,
Hive jobs, Pig jobs, custom action, etc.
• Orchestrate a workflow to execute regularly at predefined times, including
workflows that have data dependencies
WORK FLOW
• Introduction
• Skill set
• How to prepare
• Register test
• Reference & etc.
OUTLINE
• Familiar with all related command tools
• Use Cloudera quickstart VW to practice
• Take sample test
• Hive, Impala, Sqoop, Spark, Crunch, Pig, Kite, Avro,
Parquet, Cloudera HUE, oozie, Flume, DataFu, JDK 7
API Docs, Python 2.7 , Python 3.4 , Scala
Only the above documentation are accessible during the exam.
FAMILIAR WITH ALL RELATED
COMMAND TOOLS
DOWNLOAD QUICKSTART VM
SAMPLE EXAM QUESTION
• Introduction
• Skill set
• How to prepare
• Register test
• Reference & etc.
OUTLINE
• Create an account at www.examslocal.com.
• Select the exam
• Choose a date and time
• Select a time slot for exam
• Pass the compatibility tool and install the screen
sharing Chrome Extension
STEPS TO SCHEDULE EXAM
• The exam is remote, it takes less then 2 hour
• Partly open book exam.
• Some documentation are available online during
the exam
• All other websites, including Google/search
functionality is disabled. No notes or other exam
aids.
WHEN EXAM
• Introduction
• Skill set
• How to prepare
• Register test
• Reference & etc.
OUTLINE
• Apache & Cloudera official documents
• My website:
https://godataengineer.wordpress.com/
USEFUL LINKS
• Data Warehouse for Machine Learning app
• Using Flume, Hive, HDFS, Spark and Phoenix
USE CASE
THANK YOU

More Related Content

What's hot

GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 
The Plan Cache Whisperer - Performance Tuning SQL Server
The Plan Cache Whisperer - Performance Tuning SQL ServerThe Plan Cache Whisperer - Performance Tuning SQL Server
The Plan Cache Whisperer - Performance Tuning SQL ServerJason Strate
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveIBM Cloud Data Services
 
How Adobe uses Structured Streaming at Scale
How Adobe uses Structured Streaming at ScaleHow Adobe uses Structured Streaming at Scale
How Adobe uses Structured Streaming at ScaleDatabricks
 
Hadoop vs Java Batch Processing JSR 352
Hadoop vs Java Batch Processing JSR 352Hadoop vs Java Batch Processing JSR 352
Hadoop vs Java Batch Processing JSR 352Armel Nene
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkDatabricks
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Databricks
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastDatabricks
 
Accelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraAccelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraDatabricks
 
Lessons from Driverless AI going to Production
Lessons from Driverless AI going to ProductionLessons from Driverless AI going to Production
Lessons from Driverless AI going to ProductionSri Ambati
 
ETL Practices for Better or Worse
ETL Practices for Better or WorseETL Practices for Better or Worse
ETL Practices for Better or WorseEric Sun
 
Introduction to Google BigQuery
Introduction to Google BigQueryIntroduction to Google BigQuery
Introduction to Google BigQueryCsaba Toth
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseGrant Fritchey
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lakeMykola Zerniuk
 
MongoDB - An Agile NoSQL Database
MongoDB - An Agile NoSQL DatabaseMongoDB - An Agile NoSQL Database
MongoDB - An Agile NoSQL DatabaseGaurav Awasthi
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkDatabricks
 
ETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure DatabricksETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure DatabricksDatabricks
 

What's hot (20)

GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
The Plan Cache Whisperer - Performance Tuning SQL Server
The Plan Cache Whisperer - Performance Tuning SQL ServerThe Plan Cache Whisperer - Performance Tuning SQL Server
The Plan Cache Whisperer - Performance Tuning SQL Server
 
SQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The MoveSQL To NoSQL - Top 6 Questions Before Making The Move
SQL To NoSQL - Top 6 Questions Before Making The Move
 
How Adobe uses Structured Streaming at Scale
How Adobe uses Structured Streaming at ScaleHow Adobe uses Structured Streaming at Scale
How Adobe uses Structured Streaming at Scale
 
Hadoop vs Java Batch Processing JSR 352
Hadoop vs Java Batch Processing JSR 352Hadoop vs Java Batch Processing JSR 352
Hadoop vs Java Batch Processing JSR 352
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache Spark
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
 
Accelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraAccelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & Privacera
 
Lessons from Driverless AI going to Production
Lessons from Driverless AI going to ProductionLessons from Driverless AI going to Production
Lessons from Driverless AI going to Production
 
ETL Practices for Better or Worse
ETL Practices for Better or WorseETL Practices for Better or Worse
ETL Practices for Better or Worse
 
Introduction to Google BigQuery
Introduction to Google BigQueryIntroduction to Google BigQuery
Introduction to Google BigQuery
 
R in Power BI
R in Power BIR in Power BI
R in Power BI
 
Data virtualization using polybase
Data virtualization using polybaseData virtualization using polybase
Data virtualization using polybase
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
 
MongoDB - An Agile NoSQL Database
MongoDB - An Agile NoSQL DatabaseMongoDB - An Agile NoSQL Database
MongoDB - An Agile NoSQL Database
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache Spark
 
ETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure DatabricksETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure Databricks
 

Similar to How to obtain the Cloudera Data Engineer Certification

Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorBlueData, Inc.
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightTillmann Eitelberg
 
Jethro data meetup index base sql on hadoop - oct-2014
Jethro data meetup    index base sql on hadoop - oct-2014Jethro data meetup    index base sql on hadoop - oct-2014
Jethro data meetup index base sql on hadoop - oct-2014Eli Singer
 
A lap around Azure Data Factory
A lap around Azure Data FactoryA lap around Azure Data Factory
A lap around Azure Data FactoryBizTalk360
 
Sap bods Training in Hyderabad | Sap bods Online Training
Sap bods Training in Hyderabad | Sap bods  Online Training Sap bods Training in Hyderabad | Sap bods  Online Training
Sap bods Training in Hyderabad | Sap bods Online Training CHENNAKESHAVAKATAGAR
 
Sap bods training in hyderabad
Sap bods training in hyderabadSap bods training in hyderabad
Sap bods training in hyderabadRajitha D
 
Big Data Certification
Big Data CertificationBig Data Certification
Big Data CertificationAdam Doyle
 
Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019Adam Doyle
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadooplarsgeorge
 
Introduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsIntroduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsCloudera, Inc.
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurgeRTTS
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empowerDurga Gadiraju
 
Pankaj_Kumar_~3 yr exp.docx
Pankaj_Kumar_~3  yr exp.docxPankaj_Kumar_~3  yr exp.docx
Pankaj_Kumar_~3 yr exp.docxKumar Pankaj
 
Cnam azure ze cloud resource manager
Cnam azure ze cloud  resource managerCnam azure ze cloud  resource manager
Cnam azure ze cloud resource managerAymeric Weinbach
 

Similar to How to obtain the Cloudera Data Engineer Certification (20)

Taming the shrew Power BI
Taming the shrew Power BITaming the shrew Power BI
Taming the shrew Power BI
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data WarehousingDatastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
 
Solution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab AcceleratorSolution Brief: Big Data Lab Accelerator
Solution Brief: Big Data Lab Accelerator
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 
Jethro data meetup index base sql on hadoop - oct-2014
Jethro data meetup    index base sql on hadoop - oct-2014Jethro data meetup    index base sql on hadoop - oct-2014
Jethro data meetup index base sql on hadoop - oct-2014
 
A lap around Azure Data Factory
A lap around Azure Data FactoryA lap around Azure Data Factory
A lap around Azure Data Factory
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 
Sap bods Training in Hyderabad | Sap bods Online Training
Sap bods Training in Hyderabad | Sap bods  Online Training Sap bods Training in Hyderabad | Sap bods  Online Training
Sap bods Training in Hyderabad | Sap bods Online Training
 
Sap bods training in hyderabad
Sap bods training in hyderabadSap bods training in hyderabad
Sap bods training in hyderabad
 
An intro to Azure Data Lake
An intro to Azure Data LakeAn intro to Azure Data Lake
An intro to Azure Data Lake
 
Big Data Certification
Big Data CertificationBig Data Certification
Big Data Certification
 
Pallavi_Resume
Pallavi_ResumePallavi_Resume
Pallavi_Resume
 
Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019Big Data Retrospective - STL Big Data IDEA Jan 2019
Big Data Retrospective - STL Big Data IDEA Jan 2019
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Introduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsIntroduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data Applications
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
 
Saurabh's_profile
Saurabh's_profileSaurabh's_profile
Saurabh's_profile
 
Pankaj_Kumar_~3 yr exp.docx
Pankaj_Kumar_~3  yr exp.docxPankaj_Kumar_~3  yr exp.docx
Pankaj_Kumar_~3 yr exp.docx
 
Cnam azure ze cloud resource manager
Cnam azure ze cloud  resource managerCnam azure ze cloud  resource manager
Cnam azure ze cloud resource manager
 

More from elephantscale

Building a Big Data Team
Building a Big Data TeamBuilding a Big Data Team
Building a Big Data Teamelephantscale
 
Petrophysics and Big Data by Elephant Scale training and consultin
Petrophysics and Big Data by Elephant Scale training and consultinPetrophysics and Big Data by Elephant Scale training and consultin
Petrophysics and Big Data by Elephant Scale training and consultinelephantscale
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dwelephantscale
 
Oil & Gas Big Data use cases
Oil & Gas Big Data use casesOil & Gas Big Data use cases
Oil & Gas Big Data use caseselephantscale
 
Machine Learning with Spark
Machine Learning with SparkMachine Learning with Spark
Machine Learning with Sparkelephantscale
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Thingselephantscale
 

More from elephantscale (8)

AI for Kids
AI for KidsAI for Kids
AI for Kids
 
Building a Big Data Team
Building a Big Data TeamBuilding a Big Data Team
Building a Big Data Team
 
Petrophysics and Big Data by Elephant Scale training and consultin
Petrophysics and Big Data by Elephant Scale training and consultinPetrophysics and Big Data by Elephant Scale training and consultin
Petrophysics and Big Data by Elephant Scale training and consultin
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
 
Oil & Gas Big Data use cases
Oil & Gas Big Data use casesOil & Gas Big Data use cases
Oil & Gas Big Data use cases
 
Machine Learning with Spark
Machine Learning with SparkMachine Learning with Spark
Machine Learning with Spark
 
Reference architecture for Internet Of Things
Reference architecture for Internet Of ThingsReference architecture for Internet Of Things
Reference architecture for Internet Of Things
 
Hadoop to spark_v2
Hadoop to spark_v2Hadoop to spark_v2
Hadoop to spark_v2
 

Recently uploaded

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 

Recently uploaded (20)

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 

How to obtain the Cloudera Data Engineer Certification

  • 2. • Introduction • Skill set • How to prepare • Register test • Reference & etc. OUTLINE
  • 3. • Introduction • Skill set • How to prepare • Register test • Reference & etc. OUTLINE
  • 4. • Data Scientist: a person employed to analyze and interpret complex digital data, such as the usage statistics of a website, especially in order to assist a business in its decision-making. • Data Engineer: a worker whose primary job responsibilities involve preparing data for analytical or operational uses. Data engineers enable data scientists to do their jobs more effectively. DATA ENGINEER & DATA SCIENTIST
  • 5.
  • 6. “An experienced open-source developer who earns the Cloudera Certified Data Engineer credential is able to perform core competencies required to ingest, transform, store, and analyze data in Cloudera's CDH environment. The credential is earned after successfully passing the CCP Data Engineer Exam (DE575).” -- Cloudera CCP DATA ENGINEER
  • 7. • Introduction • Skill set • How to prepare • Register test • Reference & etc. OUTLINE
  • 8. • Data Ingest • Transform, Stage, Store • Data Analysis • Workflow SKILL SET
  • 9. The skills to transfer data between external systems and your cluster. This includes the following: • Import and export data between an external RDBMS and your cluster, including the ability to import specific subsets, change the delimiter and file format of imported data during ingest, and alter the data access pattern or privileges. • Ingest real-time and near-real time (NRT) streaming data into HDFS, including the ability to distribute to multiple data sources and convert data on ingest from one format to another. • Load data into and out of HDFS using the Hadoop File System (FS) commands. DATA INGEST
  • 10. Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS or Hive/HCatalog. This includes the following skills: • Convert data from one file format to another • Write your data with compression • Convert data from one set of values to another (e.g., Lat/Long to Postal Address using an external library) • Change the data format of values in a data set TRANSFORM, STAGE, STORE
  • 11. TRANSFORM, STAGE, STORE • Purge bad records from a data set, e.g., null values • Deduplication and merge data • De-normalize data from multiple disparate data sets • Evolve an Avro or Parquet schema • Partition an existing data set according to one or more partition keys • Tune data for optimal query performance
  • 12. Filter, sort, join, aggregate, and/or transform one or more data sets in a given format stored in HDFS to produce a specified result. All of these tasks may include reading from Parquet, Avro, JSON, delimited text, and natural language text. The queries will include complex data types (e.g., array, map, struct), the implementation of external libraries, partitioned data, compressed data, and require the use of metadata from Hive/HCatalog. DATA ANALYSIS
  • 13. • Write a query to aggregate multiple rows of data • Write a query to calculate aggregate statistics (e.g., average or sum) • Write a query to filter data • Write a query that produces ranked or sorted data • Write a query that joins multiple data sets • Read and/or create a Hive or an HCatalog table from existing data in HDFS DATA ANALYSIS
  • 14. The ability to create and execute various jobs and actions that move data towards greater value and use in a system. This includes the following skills: • Create and execute a linear workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom actions, etc. • Create and execute a branching workflow with actions that include Hadoop jobs, Hive jobs, Pig jobs, custom action, etc. • Orchestrate a workflow to execute regularly at predefined times, including workflows that have data dependencies WORK FLOW
  • 15. • Introduction • Skill set • How to prepare • Register test • Reference & etc. OUTLINE
  • 16. • Familiar with all related command tools • Use Cloudera quickstart VW to practice • Take sample test
  • 17. • Hive, Impala, Sqoop, Spark, Crunch, Pig, Kite, Avro, Parquet, Cloudera HUE, oozie, Flume, DataFu, JDK 7 API Docs, Python 2.7 , Python 3.4 , Scala Only the above documentation are accessible during the exam. FAMILIAR WITH ALL RELATED COMMAND TOOLS
  • 19.
  • 20.
  • 21.
  • 23. • Introduction • Skill set • How to prepare • Register test • Reference & etc. OUTLINE
  • 24. • Create an account at www.examslocal.com. • Select the exam • Choose a date and time • Select a time slot for exam • Pass the compatibility tool and install the screen sharing Chrome Extension STEPS TO SCHEDULE EXAM
  • 25. • The exam is remote, it takes less then 2 hour • Partly open book exam. • Some documentation are available online during the exam • All other websites, including Google/search functionality is disabled. No notes or other exam aids. WHEN EXAM
  • 26. • Introduction • Skill set • How to prepare • Register test • Reference & etc. OUTLINE
  • 27. • Apache & Cloudera official documents • My website: https://godataengineer.wordpress.com/ USEFUL LINKS
  • 28.
  • 29. • Data Warehouse for Machine Learning app • Using Flume, Hive, HDFS, Spark and Phoenix USE CASE