Get Shark Running Locally
Farzad Nozarian
4/25/15 @AUT
Purpose
This guide describes how to get Shark running locally. It creates a small Hive
installation on one machine and allows you to execute simple queries.
The only prerequisite for this guide is that you have Java and Scala
2.9.3 installed on your machine. If you don't have Scala 2.9.3, you can
download it by running:
$ wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
$ tar xvfz scala-2.9.3.tgz
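Before going further, it may help to confirm that the prerequisites are visible to your shell. The function below is a minimal sketch; check_prereqs is a hypothetical name, not part of Shark or Scala, and it only reports whether each command is on your PATH. The scala binary from the archive above lives in scala-2.9.3/bin and will not be on your PATH unless you add it. This guide points Shark at Scala through SCALA_HOME later on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each named command is on the PATH.
# Returns 0 if every command is found, 1 if any is missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (scala may need scala-2.9.3/bin added to PATH first):
#   check_prereqs java scala
```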
Running Shark In Other Modes
• You can also run Shark in one of three other supported modes:
• Running Shark on EC2
• Running Shark on a Cluster
• Running Shark with Tachyon
Let’s Start…(1/3)
• Download the binary distribution of Shark 0.8.
• The package contains two folders, shark-0.8.0 and hive-0.9.0-shark-0.8.0-bin.
$ wget https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-hadoop1.tgz # Hadoop 1/CDH3
# - or -
$ wget https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-cdh4.tgz # Hadoop 2/CDH4
$ tar xvfz shark-*-bin-*.tgz
$ cd shark-*-bin-*
• The Shark code is in the shark-0.8.0/ directory.
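The only decision in this step is which of the two tarballs to fetch. The helper below is an illustration (shark_url is a hypothetical name, not something shipped with Shark). It maps a build name, taken from the file names above, to the matching release URL so that a script can download the right one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Shark 0.8.0 release URL for a given build.
#   hadoop1 -> Hadoop 1/CDH3 build
#   cdh4    -> Hadoop 2/CDH4 build
shark_url() {
  local base="https://github.com/amplab/shark/releases/download/v0.8.0"
  case "$1" in
    hadoop1) echo "$base/shark-0.8.0-bin-hadoop1.tgz" ;;
    cdh4)    echo "$base/shark-0.8.0-bin-cdh4.tgz" ;;
    *)
      echo "unknown build '$1' (expected hadoop1 or cdh4)" >&2
      return 1
      ;;
  esac
}

# Usage:
#   wget "$(shark_url cdh4)" && tar xvfz shark-0.8.0-bin-cdh4.tgz
```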
Let’s Start…(2/3)
• To set up your environment to run Shark locally, set the HIVE_HOME and SCALA_HOME environment variables in the file shark-0.8.0/conf/shark-env.sh so that they point to the folders you just downloaded.
• Shark comes with a template file shark-env.sh.template that you can
copy and modify to get started:
$ cp shark-0.8.0/conf/shark-env.sh.template shark-0.8.0/conf/shark-env.sh
• Now edit the following two lines in shark-env.sh:
export HIVE_HOME=/path/to/hive-0.9.0-shark-0.8.0-bin
export SCALA_HOME=/path/to/scala-2.9.3
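If you would rather script this edit than open an editor, one option is a small sed-based helper. This is a sketch under assumptions: set_shark_env is a hypothetical name, the exact contents of shark-env.sh.template are not shown here, so the helper replaces any existing (possibly commented-out) `export NAME=` line and otherwise appends one. It also assumes the paths contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "export NAME=VALUE" in a shell env file.
# Replaces an existing (possibly commented-out) "export NAME=" line,
# otherwise appends a new one. Assumes VALUE is a path without spaces.
set_shark_env() {
  local file="$1" name="$2" value="$3" esc
  # Escape characters that are special in a sed replacement using '|' as delimiter.
  esc=$(printf '%s' "$value" | sed 's/[&|\\]/\\&/g')
  if grep -Eq "^[#[:space:]]*export ${name}=" "$file"; then
    # Write to a temporary file and move it back (avoids GNU/BSD sed -i differences).
    sed -E "s|^[#[:space:]]*export ${name}=.*|export ${name}=${esc}|" "$file" > "$file.tmp" \
      && mv "$file.tmp" "$file"
  else
    printf 'export %s=%s\n' "$name" "$value" >> "$file"
  fi
}

# Usage (placeholder paths, as in the lines above):
#   set_shark_env shark-0.8.0/conf/shark-env.sh HIVE_HOME /path/to/hive-0.9.0-shark-0.8.0-bin
#   set_shark_env shark-0.8.0/conf/shark-env.sh SCALA_HOME /path/to/scala-2.9.3
```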
Let’s Start…(3/3)
• Next, create the default Hive warehouse directory. This is where Hive will
store table data for native tables:
$ sudo mkdir -p /user/hive/warehouse
$ sudo chmod 0777 /user/hive/warehouse # Or make your username the owner
• You can now start the Shark CLI (run this from the shark-0.8.0 directory):
$ ./bin/shark
• In addition to the Shark CLI, there are several executables in shark-0.8.0/bin:
bin/shark-withdebug: Runs Shark CLI with DEBUG level logs printed to the console.
bin/shark-withinfo: Runs Shark CLI with INFO level logs printed to the console.
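The warehouse step above mentions making your username the owner as an alternative to chmod 0777. A minimal sketch of that alternative follows; prepare_warehouse is a hypothetical helper, not part of Shark or Hive. It uses sudo only when the directory cannot be created or written without it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Shark): create the Hive warehouse directory
# and, if the current user cannot write to it, make that user the owner instead
# of opening it up with chmod 0777. sudo is used only when needed.
prepare_warehouse() {
  local dir="${1:-/user/hive/warehouse}"
  # Try without privileges first; fall back to sudo (e.g. for /user at the root).
  if ! mkdir -p "$dir" 2>/dev/null; then
    sudo mkdir -p "$dir" || return 1
  fi
  # Hand ownership to the current user if the directory is not writable yet.
  if [ ! -w "$dir" ]; then
    sudo chown -R "$(id -un)" "$dir" || return 1
  fi
  # Succeeds only if table data can now be written here.
  [ -w "$dir" ]
}

# Usage:
#   prepare_warehouse            # uses /user/hive/warehouse
```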
Lab Assignment
1. Launch the Shark shell.
2. Create a table called book … .
3. List all the columns of the table book.
4. Load the book table from the file books in the local filesystem.
5. Create a table called novel, containing those records from table book … .
6. Print out the list of available tables.
7. Count the number of records from the table book.
8. Print out the total cost of the books with authors who have the same last name.
9. Count the number of distinct last names.
10. Drop the tables.
Lab Assignment 5 (1/5)
1. Launch the Shark shell.
2. Create a table called book whose schema includes book's title,
description, author's first name, last name, and cost.
3. List all the columns of the table book.
$ ./bin/shark
create table book(title string, description string, firstname string, lastname string, cost int)
row format delimited fields terminated by '\t';
describe book;
Lab Assignment 5 (2/5)
4. Load the book table from the file books in the local filesystem. The books file has the following format (one record per line, with a single tab between the fields title, description, firstname, lastname, and cost; columns are aligned here only for readability):
Speed love    Long book about love    Brian   Dog    10
Long day      Story about Monday      Emily   Blue   20
Flying Car    Novel about airplanes   Phil    High   5
Short day     Novel about a day       Phil    Dog    30
load data local inpath 'books' into table book;
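The book table is declared as tab-delimited (fields terminated by a tab, written '\t' in HiveQL), so each line of books needs a single tab between fields, while spaces inside a title or description are fine. If you do not have the file yet, one way to create it is with printf, whose \t escape emits a real tab and whose format is reused for each group of five arguments. The function name write_books is illustrative only.

```shell
#!/usr/bin/env bash
# Write the sample books file: title, description, firstname, lastname, cost,
# separated by single tab characters to match the tab-delimited table.
write_books() {
  local out="${1:-books}"
  # printf repeats the format for every 5 arguments, producing 4 lines.
  printf '%s\t%s\t%s\t%s\t%s\n' \
    "Speed love" "Long book about love"  "Brian" "Dog"  10 \
    "Long day"   "Story about Monday"    "Emily" "Blue" 20 \
    "Flying Car" "Novel about airplanes" "Phil"  "High" 5 \
    "Short day"  "Novel about a day"     "Phil"  "Dog"  30 > "$out"
}

# Usage: write_books books
#   then, in the Shark CLI: load data local inpath 'books' into table book;
```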
Lab Assignment 5 (3/5)
As an alternative, you can create an external table. The external keyword lets you create a table and provide a location, so that Hive does not use its default warehouse location for this table. This is useful if the data has already been generated.
create external table
exbook(title string, description string, firstname string, lastname string, cost int)
row format delimited fields terminated by '\t'
location '<file location, excluding the name of the file>';
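The location clause names a directory rather than a file, and Hive treats the files inside that directory as the table's data. A minimal sketch of staging the books file into its own directory, assuming Shark runs in local mode so that the location is a path on the local filesystem (on a cluster it would normally be an HDFS directory instead). The helper name and the directory in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a data file into a dedicated directory and print
# the directory's absolute path, for use in the external table's location clause.
stage_external_data() {
  local src="$1" dir="$2"
  mkdir -p "$dir" || return 1
  cp "$src" "$dir/" || return 1
  # Print the absolute path of the directory (not the file).
  (cd "$dir" && pwd)
}

# Usage (placeholder directory):
#   stage_external_data books "$HOME/shark-data/exbook"
```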
5. Create a table called novel containing those records from table book that have the keyword “novel” in their description, and cache it in memory. LIKE is case-sensitive, so the pattern below matches the capitalized “Novel” used in the sample data.
create table novel TBLPROPERTIES('shark.cache'='MEMORY_ONLY')
as select * from book where description like "%Novel%";
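To know in advance what novel should contain, the same case-sensitive substring test can be run over the local file with awk. This is only a cross-check on the tab-separated input, not how Shark evaluates the query; with the sample rows it should select Flying Car and Short day. The function name novel_titles is illustrative.

```shell
#!/usr/bin/env bash
# Print the titles (field 1) whose description (field 2) contains "Novel",
# mirroring the case-sensitive filter: description like "%Novel%".
novel_titles() {
  awk -F'\t' 'index($2, "Novel") > 0 { print $1 }' "${1:-books}"
}

# Usage: novel_titles books
```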
Lab Assignment 5 (4/5)
6. Print out the list of available tables.
show tables;
7. Count the number of records in the table book.
select count(*) from book;
8. Print out the total cost of the books for each author last name (books whose authors share a last name are summed together).
select lastname, sum(cost) from book group by lastname;
9. Count the number of distinct last names.
select count(distinct lastname) from book;
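The aggregate queries in steps 7 to 9 can be cross-checked against the raw file with awk. These functions are illustrative sketches over the tab-separated books file, not part of Shark. With the four sample rows they should give 4 records, totals of 20 for Blue, 40 for Dog (10 + 30) and 5 for High, and 3 distinct last names. The group-by output is sorted here for stable comparison; Hive does not guarantee row order without an order by.

```shell
#!/usr/bin/env bash
# Cross-check the aggregate queries on the tab-separated books file with awk.
# Fields: 1=title 2=description 3=firstname 4=lastname 5=cost.

# select count(*) from book;
count_books() { awk 'END { print NR }' "${1:-books}"; }

# select lastname, sum(cost) from book group by lastname;
cost_by_lastname() {
  awk -F'\t' '{ total[$4] += $5 } END { for (n in total) print n "\t" total[n] }' "${1:-books}" | sort
}

# select count(distinct lastname) from book;
distinct_lastnames() {
  awk -F'\t' '!seen[$4]++ { n++ } END { print n + 0 }' "${1:-books}"
}

# Usage: count_books books; cost_by_lastname books; distinct_lastnames books
```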
Lab Assignment 5 (5/5)
10. Drop the tables.
drop table book;
drop table novel;
References:
• https://github.com/amplab/shark/wiki/Running-Shark-Locally