Oracle Database 12c - Features for Big Data

A quick introduction to Big Data, followed by the features in Oracle Database 12c that make it Big Data-ready.

  1. Oracle Database 12c Features for Big Data
     Disclaimer: The information presented here is based on my own views and on information gathered from online sources. This presentation is intended only to create awareness of these features; it does not describe a real solution.
     Presented by Abishek V S
  2. Agenda
     • What is Big Data
     • Big Data versus RDBMS
     • Oracle In-Memory Column Store
     • JSON support in Oracle Database
     • Oracle Database and Hadoop
  3. What is Big Data
  4. What is Big Data
     Big data is simply data that breaks traditional architectures due to its sheer volume, velocity and variety.
     • Structured, unstructured and semi-structured
     • Multiple sources
     • Large volumes
  5. Characterization of Big Data
     • Volume
     • Variety
     • Velocity
     (From "Understanding Big Data" by IBM)
     Further Vs are sometimes added: Veracity, Validity, Volatility.
  6. Characterization of Big Data
     "From the dawn of civilization until 2003, humankind generated five exabytes of data. Now we produce five exabytes every two days… and the pace is accelerating."
     – Eric Schmidt, Executive Chairman, Google
  7. Characterization of Big Data [image slide]
  8. Characterization of Big Data [image slide]
  9. Big Data: Driving Factors and Motivation
     • Exponential growth of the internet
     • Widespread acceptance of e-commerce
     • Growth of social networks
     • Commoditization of computing resources
       – Per-GB storage cost is far more affordable now than ten years ago.
       – Commodity computers have become more powerful.
       – Popularity of clusters built from commodity computers
     • IoT (Internet of Things): day by day, the devices we own are getting smarter and are learning about us.
  10. Big Data: Technologies and Tools
     • Distributed computing
       – Distributed servers and storage (cloud based)
       – Distributed processing, e.g. MapReduce with Hadoop
     • Schema-free databases
       – NoSQL databases
     • In-memory stores
     • Semi-structured formats
       – JSON
       – Key-value pairs
     • Columnar databases
     • Big Data operations
     • Analytic/semantic processing (e.g. R, OWLIM)
  11. Big Data versus RDBMS
  12. Big Data versus RDBMS
     • RDBMS
       – Data is stored in defined structures (tables)
       – Transactional in nature
       – Data consistency is a primary consideration
       – Drives operational systems
       – Response time is crucial
     • Big Data
       – Data comes in all shapes and sizes
       – Behavioral data
       – Prone to rapid change
       – Useful in value-added services (VAS), identifying patterns not exposed by operational systems
       – The value derived is of prime importance
  13. Big Data versus RDBMS
     RDBMS
     • Captures business transactions
     • Ensures operational efficiency
     • Operational decision support
     • Analytics is very limited
     • Integrating external data is expensive
     • Typical stack: ERP, BI, ETL, data warehouse
     Big Data
     • Captures user behavioral data (system logs, social data)
     • Acts as feedback to the business
     • New opportunity exploration
     • Analytics is the key focus
     • The technology aims at integration
     • Typical stack: user activity logs, web analytics, social media streaming APIs, Hadoop MapReduce, NoSQL data stores optimized for analytics
  14. Big Data versus RDBMS [comparison diagram]
  15. Oracle In-Memory Column Store
  16. Oracle In-Memory Column Store
     • A column-format database stores each attribute of a transaction or record in a separate column structure.
     • A column format is ideal for analytics: it allows faster data retrieval when only a few columns are selected but the query accesses a large portion of the data set.
     • A column format is not as efficient at processing row-wise DML: to insert or delete a single record, all of the table's columnar structures must be changed.
     • Until now you were forced to pick just one format and accept either suboptimal OLTP or suboptimal analytics performance.
  17. Oracle In-Memory Column Store
     • Oracle Database In-Memory provides the best of both worlds: data is kept in both the row format and the in-memory column format.
     • The IM column store should be sized to fit the objects that must be stored in memory; expect less than 20% overhead in total memory requirements.
     • Database In-Memory uses an In-Memory column store (IM column store), a new component of the Oracle Database System Global Area (SGA) called the In-Memory Area, sized by the INMEMORY_SIZE parameter.
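     As a minimal sketch of enabling the IM column store (the 2G size is purely illustrative; INMEMORY_SIZE is a static parameter, so an instance restart is required):

       ALTER SYSTEM SET INMEMORY_SIZE = 2G SCOPE=SPFILE; -- size the In-Memory Area
       SHUTDOWN IMMEDIATE
       STARTUP
       SHOW PARAMETER inmemory_size                       -- confirm the new value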
  18. Oracle In-Memory Column Store
     • Tablespace level:
       ALTER TABLESPACE ts_data INMEMORY;
     • Table level (all columns except prod_id):
       ALTER TABLE sales INMEMORY NO INMEMORY (prod_id);
     • Partition level:
       ALTER TABLE sales MODIFY PARTITION SALES_Q1_1998 NO INMEMORY;
     • Objects are populated into the IM column store either in a prioritized list immediately after the database is opened, or after they are scanned (queried) for the first time:
       ALTER TABLE customers INMEMORY PRIORITY CRITICAL;
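     To verify which objects have actually been populated, one sketch is to query the V$IM_SEGMENTS view (the column selection here is illustrative):

       -- POPULATE_STATUS shows COMPLETED once an object is fully in memory
       SELECT segment_name, populate_status, bytes_not_populated
       FROM   v$im_segments;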
  19. Oracle In-Memory Column Store
     • In-Memory compression
       – Compression is typically considered only a space-saving mechanism. However, data populated into the IM column store is compressed using a new set of compression algorithms that not only save space but also improve query performance.
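     For illustration, the compression level can be chosen per object with the MEMCOMPRESS clause (the sales table is carried over from the earlier slide):

       ALTER TABLE sales INMEMORY MEMCOMPRESS FOR QUERY LOW;     -- default: fastest scans
       ALTER TABLE sales INMEMORY MEMCOMPRESS FOR CAPACITY HIGH; -- densest: best space savings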
  20. Oracle In-Memory Column Store
     • In-Memory scans
       – Analytic queries typically reference only a small subset of the columns in a table.
       – Oracle Database In-Memory scans only the columns needed by a SQL statement, and applies any WHERE-clause filter predicates to these columns directly, without decompressing them.
     • In-Memory storage indexes
       – Provide a further reduction in the amount of data accessed.
       – Automatically created and maintained on each of the columns in the IM column store.
       – Storage indexes allow data pruning based on the filter predicates in a SQL statement.
  21. Oracle In-Memory Column Store
     • SIMD vector processing
       – Database In-Memory uses SIMD (Single Instruction, Multiple Data) vector processing.
       – SIMD vector processing allows a set of column values to be evaluated together in a single CPU instruction.
     • In-Memory joins
       – SQL statements that join multiple tables can also be processed very efficiently in the IM column store, as they can take advantage of Bloom filters.
       – A Bloom filter transforms a join into a filter that can be applied as part of the scan of the larger table.
     • In-Memory aggregation
       – Analytic-style queries often require complex aggregations and summaries.
       – A new optimizer transformation called Vector Group By, introduced in Oracle Database 12.1.0.2, ensures that more complex analytic queries can be processed with CPU-efficient algorithms.
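     A sketch of how to see whether the transformation kicks in (the sales/prod_id/amount_sold names assume the SH sample schema used on the earlier slide): check the execution plan for VECTOR GROUP BY operations.

       EXPLAIN PLAN FOR
         SELECT prod_id, SUM(amount_sold)
         FROM   sales
         GROUP  BY prod_id;

       SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
       -- When chosen by the optimizer, the plan shows VECTOR GROUP BY
       -- and KEY VECTOR operations instead of a plain HASH GROUP BY.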
  22. JSON support in Oracle Database
  23. JSON support in Oracle Database
     • JSON (JavaScript Object Notation) is a fast-growing data format often used in web and mobile applications.
     • JSON is also used as a data interchange format:
       – More lightweight
       – Less bandwidth-intensive
     • JSON integrates easily into web pages, since JavaScript can consume a JSON object directly.
  24. JSON support in Oracle Database
     • JSON is gaining popularity in:
       – APIs (application programming interfaces)
         • Most social network providers offer JSON-based data service APIs.
         • Web services: RESTful (Representational State Transfer)
       – Big Data
         • Many NoSQL databases use JSON as the storage format: MongoDB, CouchDB and Riak.
       – Internet of Things (IoT)
         • With more personal devices and appliances getting smart and connecting to the internet, JSON is becoming the format of choice, as it is lightweight and adapts well to these devices.
  25. JSON support in Oracle Database
     • JSON in Oracle Database 12c R1 (12.1.0.2):
       – Creating Tables to Hold JSON
       – Querying JSON Data
         • Dot notation
         • IS JSON
         • JSON_EXISTS
         • JSON_VALUE
         • JSON_QUERY
         • JSON_TABLE
         • JSON_TEXTCONTAINS
       – Identifying Columns Containing JSON
       – Loading JSON Files Using External Tables
  26. JSON support in Oracle Database
     • Creating Tables to Hold JSON
       – No new data type has been added to support JSON. Instead, it is stored in regular VARCHAR2 or CLOB columns.
       – The IS JSON constraint indicates that the column contains valid JSON data.

         CREATE TABLE json_documents (
           id    RAW(16) NOT NULL,
           data  CLOB,
           CONSTRAINT json_documents_pk PRIMARY KEY (id),
           CONSTRAINT json_documents_json_chk CHECK (data IS JSON)
         );

       – Checking can be lax (the default) or strict: CHECK (data IS JSON (STRICT)).
       – The [USER|ALL|DBA]_JSON_COLUMNS views can be used to identify tables and columns containing JSON data.
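     For example, a quick sketch to identify the JSON column just created, using the USER_JSON_COLUMNS view mentioned above:

       -- lists tables/columns that carry an IS JSON check constraint
       SELECT * FROM user_json_columns;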
  27. INSERT INTO json_documents (id, data)
      VALUES (SYS_GUID(),
              '{
                 "FirstName"      : "John",
                 "LastName"       : "Doe",
                 "Job"            : "Clerk",
                 "Address"        : {
                   "Street"   : "99 My Street",
                   "City"     : "My City",
                   "Country"  : "UK",
                   "Postcode" : "A12 34B"
                 },
                 "ContactDetails" : {
                   "Email"   : "john.doe@example.com",
                   "Phone"   : "44 123 123456",
                   "Twitter" : "@johndoe"
                 },
                 "DateOfBirth"    : "01-JAN-1980",
                 "Active"         : true
               }');
  28. COLUMN FirstName FORMAT A15
      COLUMN LastName FORMAT A15
      COLUMN Postcode FORMAT A10
      COLUMN Email FORMAT A25

      SELECT a.data.FirstName,
             a.data.LastName,
             a.data.Address.Postcode AS Postcode,
             a.data.ContactDetails.Email AS Email
      FROM   json_documents a
      ORDER BY a.data.FirstName, a.data.LastName;

      FIRSTNAME       LASTNAME        POSTCODE   EMAIL
      --------------- --------------- ---------- -------------------------
      Jayne           Doe             A12 34B    jayne.doe@example.com
      John            Doe             A12 34B    john.doe@example.com
  29. • IS JSON
       – The IS JSON condition can be used to test whether a column contains JSON data:
         SELECT JSON_VALUE(a.data, '$.FirstName') AS first_name
         FROM   json_documents_no_constraint a
         WHERE  a.data IS JSON;
     • JSON_EXISTS
       – Checks whether an element exists at the specified JSON path (loosely comparable to an IS NOT NULL test).
     • JSON_VALUE
       – Returns a scalar element from the JSON document, based on the specified JSON path.
     • JSON_QUERY
       – Returns a JSON fragment representing one or more values.
     • JSON_TABLE
       – Incorporates all the functionality of JSON_VALUE, JSON_EXISTS and JSON_QUERY.
       – Used to make JSON data look like relational data, which is especially useful when creating relational views over JSON data.
     • JSON_TEXTCONTAINS
       – Works with JSON indexes and enables faster text searching through the JSON data.
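     A few examples against the json_documents table created earlier may make the differences clearer (the paths and column sizes are illustrative):

       -- JSON_VALUE: returns a scalar at the given path
       SELECT JSON_VALUE(a.data, '$.Address.City') AS city
       FROM   json_documents a;

       -- JSON_EXISTS: filters rows on the existence of a path
       SELECT COUNT(*)
       FROM   json_documents a
       WHERE  JSON_EXISTS(a.data, '$.ContactDetails.Twitter');

       -- JSON_QUERY: returns a JSON fragment, not a scalar
       SELECT JSON_QUERY(a.data, '$.Address') AS address_fragment
       FROM   json_documents a;

       -- JSON_TABLE: projects JSON into relational columns
       SELECT jt.first_name, jt.city
       FROM   json_documents a,
              JSON_TABLE(a.data, '$'
                COLUMNS (first_name VARCHAR2(50) PATH '$.FirstName',
                         city       VARCHAR2(50) PATH '$.Address.City')) jt;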
  30. JSON support in Oracle Database
     Loading JSON Files Using External Tables
     • Create the directory objects for use with the external table:

       CREATE OR REPLACE DIRECTORY order_entry_dir AS
         '/u01/app/oracle/product/12.1.0.2/db_1/demo/schema/order_entry';
       GRANT READ, WRITE ON DIRECTORY order_entry_dir TO test;

       CREATE OR REPLACE DIRECTORY loader_output_dir AS '/tmp';
       GRANT READ, WRITE ON DIRECTORY loader_output_dir TO test;

     • Create the external table and query it to check that it works:

       CREATE TABLE json_dump_file_contents (json_document CLOB)
       ORGANIZATION EXTERNAL
         (TYPE ORACLE_LOADER
          DEFAULT DIRECTORY order_entry_dir
          ACCESS PARAMETERS
            (RECORDS DELIMITED BY 0x'0A'
             DISABLE_DIRECTORY_LINK_CHECK
             BADFILE loader_output_dir:'JSONDumpFile.bad'
             LOGFILE order_entry_dir:'JSONDumpFile.log'
             FIELDS (json_document CHAR(5000)))
          LOCATION (order_entry_dir:'PurchaseOrders.dmp'))
       PARALLEL
       REJECT LIMIT UNLIMITED;
  31. JSON support in Oracle Database

       SELECT COUNT(*) FROM json_dump_file_contents;

         COUNT(*)
       ----------
            10000

     • You can now load the database table with the contents of the external table:

       TRUNCATE TABLE json_documents;

       INSERT /*+ APPEND */ INTO json_documents
       SELECT SYS_GUID(), json_document
       FROM   json_dump_file_contents
       WHERE  json_document IS JSON;

       COMMIT;
  32. Oracle Database and Hadoop
  33. Oracle Database and Hadoop
     • A Big Data discussion is incomplete without mentioning Hadoop.
     • Hadoop is a distributed computing framework.
     • It runs batch operations (MapReduce) on distributed clusters built from commodity computers.
     • It stores data in a distributed, clustered file system (HDFS).
     • Hadoop clusters follow a shared-nothing paradigm.
  34. Oracle Database and Hadoop
     • MapReduce paradigm [diagram]
  35. Oracle Database and Hadoop
     • In-Database MapReduce
       – Avoids shipping data residing in the RDBMS to an external infrastructure.
       – Database security can be applied to the processed data.
       – Shorter learning curve for both developers and DBAs.
       – Mixes SQL with MapReduce processing for flexibility and efficiency.
       – Uses PL/SQL or Java pipelined functions, as sketched below.

       INSERT INTO OUTTABLE
       SELECT *
       FROM   TABLE(Word_Count_Reduce(:ConfKey,
                CURSOR(SELECT *
                       FROM TABLE(Word_Cursor_Map(:ConfKey,
                         CURSOR(SELECT * FROM InTable))))));
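     A simplified sketch of what such a pipelined map/reduce pair might look like for a word count. This is not the exact code behind the slide's query: the :ConfKey configuration parameter is omitted, and in_table with its line column is a hypothetical input.

       -- Object and collection types for the (word, count) pairs.
       CREATE TYPE word_count_t AS OBJECT (word VARCHAR2(100), cnt NUMBER);
       /
       CREATE TYPE word_count_tab AS TABLE OF word_count_t;
       /

       -- Mapper: reads lines from the input cursor, emits (word, 1) pairs.
       CREATE OR REPLACE FUNCTION word_cursor_map (p_cur SYS_REFCURSOR)
         RETURN word_count_tab PIPELINED
       IS
         l_line VARCHAR2(4000);
         l_word VARCHAR2(100);
         l_occ  PLS_INTEGER;
       BEGIN
         LOOP
           FETCH p_cur INTO l_line;
           EXIT WHEN p_cur%NOTFOUND;
           l_occ := 1;
           LOOP
             l_word := REGEXP_SUBSTR(l_line, '[[:alpha:]]+', 1, l_occ);
             EXIT WHEN l_word IS NULL;
             PIPE ROW (word_count_t(LOWER(l_word), 1));
             l_occ := l_occ + 1;
           END LOOP;
         END LOOP;
         CLOSE p_cur;
         RETURN;
       END;
       /

       -- Reducer: sums the mapper's counts per word in an associative array.
       CREATE OR REPLACE FUNCTION word_count_reduce (p_cur SYS_REFCURSOR)
         RETURN word_count_tab PIPELINED
       IS
         TYPE count_map_t IS TABLE OF NUMBER INDEX BY VARCHAR2(100);
         l_counts count_map_t;
         l_word   VARCHAR2(100);
         l_cnt    NUMBER;
         l_key    VARCHAR2(100);
       BEGIN
         LOOP
           FETCH p_cur INTO l_word, l_cnt;
           EXIT WHEN p_cur%NOTFOUND;
           IF l_counts.EXISTS(l_word) THEN
             l_counts(l_word) := l_counts(l_word) + l_cnt;
           ELSE
             l_counts(l_word) := l_cnt;
           END IF;
         END LOOP;
         CLOSE p_cur;
         l_key := l_counts.FIRST;
         WHILE l_key IS NOT NULL LOOP
           PIPE ROW (word_count_t(l_key, l_counts(l_key)));
           l_key := l_counts.NEXT(l_key);
         END LOOP;
         RETURN;
       END;
       /

       -- Chained just like the slide's query (hypothetical in_table):
       SELECT *
       FROM   TABLE(word_count_reduce(
                CURSOR(SELECT * FROM TABLE(word_cursor_map(
                  CURSOR(SELECT line FROM in_table))))));

     For real parallelism one would add PARALLEL_ENABLE (PARTITION ... BY HASH(word)) to the reducer, which requires a strongly typed REF CURSOR; the next slide touches on these options.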
  36. Oracle Database and Hadoop
     • Pipelined functions can return a stream of rows and can also take one as input.
     • They can be parallelized with a partition key.
     • They can be implemented in PL/SQL, Java or C.
     • The pattern uses two pipelined functions: one for the mapper, the other for the reducer.
     • The mapper's input source could be an external table, and the reducer's output may be placed in a database table or written out to a file system file.
     • They can leverage external tables and DBFS, and use Java or C to write to files.
     • The opportunities are endless when coupled with other database features and options:
       – The database scheduler can be used to schedule the MapReduce jobs.
       – Scale out across distributed databases using DB links.
       – Add fault tolerance and scalability with RAC.
  37. Oracle Database and Hadoop
     • Oracle In-Database Hadoop
     • We will look at this in a future discussion …
  38. Oracle Database and Hadoop [image slide]
  39. The Road Ahead
     • Big Data/NoSQL databases WILL NOT replace RDBMS databases.
     • Oracle's roadmap favors single-vendor solutions.
     • Reuse available resources: both technology and human resources.
     • Oracle is building more appliance-based solutions.
  40. The Road Ahead
     • Oracle Big Data products:
       – Oracle Big Data Management
         • Oracle Big Data Appliance
         • Oracle Big Data SQL
         • Oracle NoSQL Database
       – Oracle Big Data Integration
         • Oracle GoldenGate
         • Oracle Data Integration
         • Oracle Event Processing
       – Big Data Analytics
         • Oracle Big Data Discovery
         • Oracle Advanced Analytics
         • Oracle Business Intelligence Foundation
  41. Please mail me at abishek.vidyashanker@in.unisys.com
