Shark - Lab Assignment

A tutorial presentation based on github.com/amplab/shark documentation.
I gave this presentation at Amirkabir University of Technology as a teaching assistant for Dr. Amir H. Payberah's Cloud Computing course in the spring semester of 2015.


  1. Farzad Nozarian, 4/25/15 @AUT
  2. Purpose
     This guide describes how to get Shark running locally. It creates a small Hive installation on one machine and lets you execute simple queries. The only prerequisite is that Java and Scala 2.9.3 are installed on your machine. If you don't have Scala 2.9.3, you can download it by running:

       $ wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
       $ tar xvfz scala-2.9.3.tgz
  3. Running Shark in Other Modes
     You can also run Shark in one of the three other supported modes:
     • Running Shark on EC2
     • Running Shark on a Cluster
     • Running Shark with Tachyon
  4. Let’s Start… (1/3)
     • Download the binary distribution of Shark 0.8. The package contains two folders, shark-0.8.0 and hive-0.9.0-shark-0.8.0-bin.

       $ wget https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-hadoop1.tgz  # Hadoop 1/CDH3
       - or -
       $ wget https://github.com/amplab/shark/releases/download/v0.8.0/shark-0.8.0-bin-cdh4.tgz  # Hadoop 2/CDH4
       $ tar xvfz shark-*-bin-*.tgz
       $ cd shark-*-bin-*

     • The Shark code is in the shark-0.8.0/ directory.
  5. Let’s Start… (2/3)
     • To set up your environment to run Shark locally, set the HIVE_HOME and SCALA_HOME environment variables in the file shark-0.8.0/conf/shark-env.sh so that they point to the folders you just downloaded.
     • Shark comes with a template file, shark-env.sh.template, that you can copy and modify to get started:

       $ cp shark-0.8.0/conf/shark-env.sh.template shark-0.8.0/conf/shark-env.sh

     • Now edit the following two lines in shark-env.sh:

       export HIVE_HOME=/path/to/hive-0.9.0-shark-0.8.0-bin
       export SCALA_HOME=/path/to/scala-2.9.3
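A wrong path in shark-env.sh is an easy mistake to make at this step. The sketch below shows one way to check both variables before launching the CLI. The helper name `check_shark_env` is hypothetical and is not part of Shark; it only tests that the two directories exist.

```shell
# check_shark_env HIVE_DIR SCALA_DIR
# Hypothetical helper (not shipped with Shark): returns 0 only if both
# arguments are existing directories, and prints a message otherwise.
check_shark_env() {
  hive_dir=$1
  scala_dir=$2
  status=0
  if [ ! -d "$hive_dir" ]; then
    echo "HIVE_HOME is not a directory: $hive_dir" >&2
    status=1
  fi
  if [ ! -d "$scala_dir" ]; then
    echo "SCALA_HOME is not a directory: $scala_dir" >&2
    status=1
  fi
  return "$status"
}

# Example use, after sourcing the edited file:
#   . shark-0.8.0/conf/shark-env.sh
#   check_shark_env "$HIVE_HOME" "$SCALA_HOME" && echo "environment looks fine"
```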
  6. Let’s Start… (3/3)
     • Next, create the default Hive warehouse directory. This is where Hive will store table data for native tables:

       $ sudo mkdir -p /user/hive/warehouse
       $ sudo chmod 0777 /user/hive/warehouse  # Or make your username the owner

     • You can now start the Shark CLI:

       $ ./bin/shark

     • In addition to the Shark CLI, there are several executables in shark-0.8.0/bin:
       bin/shark-withdebug : Runs Shark CLI with DEBUG level logs printed to the console.
       bin/shark-withinfo  : Runs Shark CLI with INFO level logs printed to the console.
  7. Lab Assignment
     1. Launch the Shark shell.
     2. Create a table called book … .
     3. List all the columns of the table book.
     4. Load the book table from the file books in the local filesystem.
     5. Create a table called novel, containing those records from table book … .
     6. Print out the list of available tables.
     7. Count the number of records from the table book.
     8. Print out the total cost of the books with authors who have the same last name.
     9. Count the number of distinct last names.
     10. Drop the tables.
  8. Lab Assignment 5 (1/5)
     1. Launch the Shark shell.

          $ ./bin/shark

     2. Create a table called book whose schema includes the book's title, description, author's first name, last name, and cost.

          create table book(title string, description string, firstname string, lastname string, cost int)
          row format delimited fields terminated by '\t';

     3. List all the columns of the table book.

          describe book;
  9. Lab Assignment 5 (2/5)
     4. Load the book table from the file books in the local filesystem.

          load data local inpath 'books' into table book;

        The books file has the following format (one record per line, fields separated by tabs):

          Speed love    Long book about love     Brian    Dog     10
          Long day      Story about Monday       Emily    Blue    20
          Flying Car    Novel about airplanes    Phil     High    5
          Short day     Novel about a day        Phil     Dog     30
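Because the table is declared with tab-delimited fields, the books file has to contain real tab characters, which are easy to lose when copying from a slide. The following sketch writes the four sample records with explicit tabs. The split of each record into title, description, first name, last name, and cost is an assumption read off the slide.

```shell
# Write the four sample records to ./books, one per line.
# printf reuses the format string for every group of five arguments,
# so each record becomes five fields joined by tab characters.
# The field boundaries below are an assumption based on the slide.
printf '%s\t%s\t%s\t%s\t%s\n' \
  'Speed love' 'Long book about love' 'Brian' 'Dog' '10' \
  'Long day' 'Story about Monday' 'Emily' 'Blue' '20' \
  'Flying Car' 'Novel about airplanes' 'Phil' 'High' '5' \
  'Short day' 'Novel about a day' 'Phil' 'Dog' '30' > books

# Sanity check: every line should have exactly five tab-separated fields.
awk -F'\t' 'NF != 5 { bad = 1 } END { exit bad + 0 }' books && echo "books: OK"
```

Run it from the directory in which you start the Shark CLI, since the load statement refers to the relative path 'books'.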
  10. Lab Assignment 5 (3/5)
      As an alternative solution, you can create an external table. The external keyword lets you create a table and provide a location, so that Hive does not use the default location for this table. This is useful if you already have data generated.

          create external table exbook(title string, description string, firstname string, lastname string, cost int)
          row format delimited fields terminated by '\t'
          location '<file location, excluding the name of the file>';

      5. Create a table called novel, containing those records from table book that have the keyword “novel” in their description, and cache it in memory.

          create table novel TBLPROPERTIES('shark.cache'='MEMORY_ONLY') as
          select * from book where description like "%Novel%";
  11. Lab Assignment 5 (4/5)
      6. Print out the list of available tables.

          show tables;

      7. Count the number of records from the table book.

          select count(*) from book;

      8. Print out the total cost of the books with authors who have the same last name.

          select lastname, sum(cost) from book group by lastname;

      9. Count the number of distinct last names.

          select count(distinct lastname) from book;
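To check the answers to steps 7 to 9 without Shark, the same aggregates can be computed directly on the sample data with standard shell tools. This is only a cross-check sketch: it assumes the tab-separated field layout (title, description, first name, last name, cost) and writes its own scratch copy of the four sample records.

```shell
# Scratch copy of the sample records (tab-separated; layout is an assumption).
data=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
  'Speed love' 'Long book about love' 'Brian' 'Dog' '10' \
  'Long day' 'Story about Monday' 'Emily' 'Blue' '20' \
  'Flying Car' 'Novel about airplanes' 'Phil' 'High' '5' \
  'Short day' 'Novel about a day' 'Phil' 'Dog' '30' > "$data"

# Step 7: select count(*) from book;
count=$(awk 'END { print NR }' "$data")

# Step 8: select lastname, sum(cost) from book group by lastname;
# Field 4 is the last name, field 5 the cost.
totals=$(awk -F'\t' '{ sum[$4] += $5 }
                     END { for (name in sum) print name "\t" sum[name] }' "$data" | sort)

# Step 9: select count(distinct lastname) from book;
distinct=$(cut -f4 "$data" | sort -u | wc -l | tr -d ' ')

echo "records: $count"
echo "$totals"
echo "distinct last names: $distinct"
rm -f "$data"
```

With the four sample records, this reports 4 records, totals of 20 for Blue, 40 for Dog, and 5 for High, and 3 distinct last names, which is what the Shark queries should return as well.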
  12. Lab Assignment 5 (5/5)
      10. Drop the tables.

          drop table book;
          drop table novel;
  13. References
      • https://github.com/amplab/shark/wiki/Running-Shark-Locally
