HBaseCon, May 2012

HBase Filters
Lars George, Solutions Architect
Agenda

1    Introduction
2    Comparison Filters
3    Dedicated Filters
4    Decorating Filters
5    Combining Filters
6    Custom Filters




2                   ©2012 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction
                         or redistribution without written permission is prohibited.
About Me

    •  Solutions Architect @ Cloudera
    •  Apache HBase & Whirr Committer
    •  Author of
           HBase – The Definitive Guide
    •  Working with HBase since end
       of 2007
    •  Organizer of the Munich OpenHUG
    •  Speaker at Conferences (Fosdem,
       Hadoop World)

Introduction to Filters

    •  Used in combination with get() and scan()
       API calls
    •  Steps:
      –  Create Filter instance
      –  Create Get or Scan instance
      –  Assign Filter to Get or Scan
      –  Call API and enjoy
    •  More fine-grained control over what is
       returned to the client

Filter Features

    •  Allow client to further narrow down what is
       retrieved
      –  Not just per row or column key, or per column
         family
    •  Predicate Pushdown
      –  Move filtering from client to server to reduce
         network traffic
    •  Varying performance implications,
       dependent on the use-case


Filter Pushdown




Filter Features (cont.)

    •  Filters have access to the entire row to
       decide its fate
      –  Access to KeyValue instances to check row keys,
         column qualifiers, timestamps, or values
    •  Scan batching might conflict with the above
       and might trigger an “Incompatible Filter”
       exception
      –  Example: DependentColumnFilter
    •  There is no cross invocation state
      –  Cannot filter rows based on dependent rows


Available Filters

    •  Many filters are supplied by HBase
      –  Based on row key, column family, or column
         qualifier
      –  Paging through rows and columns
      –  Based on dependencies

    •  Write your own filters
      –  Use FilterBase class to get a no-op
         skeleton and fill in the gaps


Comparison Filters

 •  Based on CompareFilter class
 •  Adds the compare() method to
    FilterBase
 •  Takes operator that defines how the
    comparison is performed
     –  Predefined by client API
 •  Also needs a comparator to do the actual
    check
     –  HBase supplies a large set

Comparison Operators




Comparators




Comparison Filters (cont.)

 •  Not all combinations of operator and
    comparator make sense
     –  For example, the SubstringComparator
        returns only 0 (match) or 1 (no match)
     –  Only EQUAL and NOT_EQUAL are useful
     –  Using other operators is allowed but will most
        likely yield unexpected results
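
A minimal plain-Java sketch of why a 0/1 comparator only pairs well with EQUAL and NOT_EQUAL. It does not use the HBase classes: `OperatorSketch`, `Op`, `substringCompare`, and `matches` are hypothetical names, and the sign convention (the operator is evaluated on the comparator result relative to 0) is an assumption made for illustration.

```java
class OperatorSketch {
    // Hypothetical stand-in for the comparison operators; not the HBase enum.
    enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // Models a substring-style comparator: 0 when the value contains the needle, 1 otherwise.
    static int substringCompare(String value, String needle) {
        return value.contains(needle) ? 0 : 1;
    }

    // Assumed convention: the operator is evaluated on the comparator result relative to 0.
    static boolean matches(Op op, int result) {
        switch (op) {
            case LESS: return result < 0;
            case LESS_OR_EQUAL: return result <= 0;
            case EQUAL: return result == 0;
            case NOT_EQUAL: return result != 0;
            case GREATER_OR_EQUAL: return result >= 0;
            default: return result > 0; // GREATER
        }
    }

    public static void main(String[] args) {
        // Print the outcome of every operator for one matching and one non-matching value.
        for (String value : new String[] {"val-05", "other"}) {
            int r = substringCompare(value, "val");
            for (Op op : Op.values()) {
                System.out.println(value + " " + op + " -> " + matches(op, r));
            }
        }
    }
}
```

Under this assumed convention LESS can never match and GREATER_OR_EQUAL always matches, so the ordering operators carry no information beyond what EQUAL and NOT_EQUAL already give.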




Comparison Filters (cont.)

 •  HBase filters are usually filtering data out
 •  Comparison filters work in reverse as they
    include matching data
     –  Be mindful when selecting the comparison
        operator!




Available Comparison Filters

 •  Row Filter
     –  Based on row key comparisons
 •  Family Filter
     –  Based on column family names
 •  Qualifier Filter
     –  Based on column names, aka qualifiers
 •  Value Filter
     –  Based on the actual value of a column


Available Comparison Filters (cont.)

 •  Dependent Column Filter
     –  Based on a timestamp of a reference column
     –  Includes all columns that have the same
        timestamp
     –  Implies that the entire row is accessible, since
        batching will not have access to the reference
        column
        •  No scanner batching allowed!
     –  Useful for loading interdependent changes
        within a row
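
The timestamp dependency can be sketched in plain Java. This is a simplified model, not the HBase filter: `DependentColumnSketch`, `Cell`, and `dependentColumns` are hypothetical names, and each column is assumed to have a single version. It also shows why the whole row must be visible: the reference column has to be found before any other column can be judged.

```java
import java.util.ArrayList;
import java.util.List;

class DependentColumnSketch {
    // Hypothetical minimal cell: qualifier plus timestamp (values omitted).
    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Returns the qualifiers of all cells in one row that share the timestamp
    // of the reference column; empty when the row has no reference column.
    static List<String> dependentColumns(List<Cell> row, String referenceQualifier) {
        Long referenceTs = null;
        for (Cell c : row) {
            if (c.qualifier.equals(referenceQualifier)) {
                referenceTs = c.timestamp;
                break;
            }
        }
        List<String> kept = new ArrayList<>();
        if (referenceTs == null) {
            return kept;
        }
        for (Cell c : row) {
            if (c.timestamp == referenceTs) {
                kept.add(c.qualifier);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("a", 1L));
        row.add(new Cell("ref", 2L));
        row.add(new Cell("b", 2L));
        row.add(new Cell("c", 3L));
        System.out.println(dependentColumns(row, "ref"));
    }
}
```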


Example Code
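import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

// "table" is assumed to be an already opened table instance.
Scan scan = new Scan();
// Only return the column colfam1:col-0.
scan.addColumn(Bytes.toBytes("colfam1"),
  Bytes.toBytes("col-0"));
// Keep rows whose key sorts at or before "row-22".
Filter filter = new RowFilter(
  CompareFilter.CompareOp.LESS_OR_EQUAL,
  new BinaryComparator(Bytes.toBytes("row-22")));
scan.setFilter(filter);

ResultScanner scanner = table.getScanner(scan);
for (Result res : scanner) {
  System.out.println(res);
}
scanner.close();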

Example Output
 keyvalues={row-1/colfam1:col-0/1301043190260/Put/vlen=7}
 keyvalues={row-10/colfam1:col-0/1301043190908/Put/vlen=8}
 keyvalues={row-100/colfam1:col-0/1301043195275/Put/vlen=9}
 keyvalues={row-11/colfam1:col-0/1301043190982/Put/vlen=8}
 keyvalues={row-12/colfam1:col-0/1301043191040/Put/vlen=8}
 keyvalues={row-13/colfam1:col-0/1301043191172/Put/vlen=8}
 keyvalues={row-14/colfam1:col-0/1301043191318/Put/vlen=8}
 keyvalues={row-15/colfam1:col-0/1301043191429/Put/vlen=8}
 keyvalues={row-16/colfam1:col-0/1301043191509/Put/vlen=8}
 keyvalues={row-17/colfam1:col-0/1301043191593/Put/vlen=8}
 keyvalues={row-18/colfam1:col-0/1301043191673/Put/vlen=8}
 keyvalues={row-19/colfam1:col-0/1301043191771/Put/vlen=8}
 keyvalues={row-2/colfam1:col-0/1301043190346/Put/vlen=7}
 keyvalues={row-20/colfam1:col-0/1301043191841/Put/vlen=8}
 keyvalues={row-21/colfam1:col-0/1301043191933/Put/vlen=8}
 keyvalues={row-22/colfam1:col-0/1301043191998/Put/vlen=8}
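
The output contains row-100 because row keys are compared byte by byte, not numerically. The following plain-Java sketch reproduces that ordering without any HBase dependency; `RowKeyOrder`, `compareBytes`, and `lessOrEqual` are hypothetical helper names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

class RowKeyOrder {
    // Compares two byte arrays lexicographically, treating each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return a.length - b.length;
    }

    // True when rowKey sorts at or before bound.
    static boolean lessOrEqual(String rowKey, String bound) {
        return compareBytes(rowKey.getBytes(StandardCharsets.UTF_8),
                            bound.getBytes(StandardCharsets.UTF_8)) <= 0;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"row-100", "row-22", "row-3", "row-220"}) {
            System.out.println(key + " <= row-22 ? " + lessOrEqual(key, "row-22"));
        }
    }
}
```

Zero-padding numeric parts of a row key (for example row-022) is one common way to make byte order agree with numeric order.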



Dedicated Filters

 •  Based directly on FilterBase class
 •  Often less useful for get() calls, since
    entire rows are filtered




Available Dedicated Filters

 •  Single Column Value Filter
     –  Filter rows based on one specific column
     –  Extra features
       •  “Filter if missing”
       •  “Get latest version only”
     –  Column must be part of the scan selection
       •  Or else it is all or nothing
     –  Also needs compare operation and an
        optional comparator
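
The "filter if missing" switch is easiest to see in a small model. This plain-Java sketch is not the HBase filter: `SingleColumnValueSketch` and `keepRow` are hypothetical names, a row is reduced to a qualifier-to-value map, and only an equality check is modelled.

```java
import java.util.HashMap;
import java.util.Map;

class SingleColumnValueSketch {
    // Decides whether a row is kept, based on an equality check on one column.
    // When the column is absent, the row is kept unless filterIfMissing is set.
    static boolean keepRow(Map<String, String> row, String column,
                           String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        Map<String, String> withColumn = new HashMap<>();
        withColumn.put("status", "active");
        Map<String, String> withoutColumn = new HashMap<>();
        withoutColumn.put("name", "n/a");

        System.out.println(keepRow(withColumn, "status", "active", false));
        System.out.println(keepRow(withoutColumn, "status", "active", false));
        System.out.println(keepRow(withoutColumn, "status", "active", true));
    }
}
```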


Available Dedicated Filters (cont.)

 •  Single Column Value Exclude Filter
     –  Same as the one before but excludes the
        selection column
 •  Prefix Filter
     –  Based on prefix of row keys
     –  Can early out the scan!
       •  Combine with start row for best performance
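
A rough plain-Java model of the early out and of the benefit of a start row. It is not HBase code: `PrefixScanSketch`, `ScanResult`, and `scanPrefix` are hypothetical names, and a sorted `TreeSet` of strings stands in for the sorted row keys of a table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class PrefixScanSketch {
    // Matching keys plus the number of keys the scan had to look at.
    static final class ScanResult {
        final List<String> hits = new ArrayList<>();
        int examined;
    }

    // Scans sorted keys for a prefix; optionally starts at the prefix instead of the first key.
    static ScanResult scanPrefix(TreeSet<String> sortedKeys, String prefix, boolean useStartRow) {
        ScanResult result = new ScanResult();
        NavigableSet<String> keys = useStartRow ? sortedKeys.tailSet(prefix, true) : sortedKeys;
        for (String key : keys) {
            result.examined++;
            if (key.startsWith(prefix)) {
                result.hits.add(key);
            } else if (key.compareTo(prefix) > 0) {
                break; // keys are sorted, so no later key can carry the prefix
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(
            Arrays.asList("a-1", "a-2", "b-1", "b-2", "c-1", "c-2"));
        ScanResult full = scanPrefix(keys, "b-", false);
        ScanResult seeded = scanPrefix(keys, "b-", true);
        System.out.println(full.hits + " examined=" + full.examined);
        System.out.println(seeded.hits + " examined=" + seeded.examined);
    }
}
```

Both scans stop right after the last matching key; the start row additionally avoids reading the keys that sort before the prefix.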




Available Dedicated Filters (cont.)
 •  Page Filter
     –  Allows pagination through rows
     –  Needs to be combined with setting the start row on
        subsequent scans
     –  Can early out the scan when limit is reached
 •  Key Only Filter
     –  Drop the value for every column
 •  First Key Only Filter
     –  Return only the first column key
     –  Useful for row counters, or for "get newest post" type
        applications
     –  Can early out rest of row scan
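
Row pagination with a moving start row can be sketched without HBase as follows. `PagingSketch` and `nextPage` are hypothetical names, and a sorted `TreeSet` stands in for the table. A scan start row in HBase is inclusive, so client code has to move just past the last row it has seen; `tailSet(lastRow, false)` plays that role here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class PagingSketch {
    // Returns up to pageSize keys that sort strictly after lastRow
    // (lastRow == null means "start at the beginning").
    static List<String> nextPage(TreeSet<String> sortedKeys, String lastRow, int pageSize) {
        NavigableSet<String> keys =
            (lastRow == null) ? sortedKeys : sortedKeys.tailSet(lastRow, false);
        List<String> page = new ArrayList<>();
        for (String key : keys) {
            if (page.size() == pageSize) {
                break; // early out once the page is full
            }
            page.add(key);
        }
        return page;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(Arrays.asList(
            "row-1", "row-2", "row-3", "row-4", "row-5", "row-6", "row-7"));
        String lastRow = null;
        while (true) {
            List<String> page = nextPage(keys, lastRow, 3);
            if (page.isEmpty()) {
                break;
            }
            System.out.println(page);
            lastRow = page.get(page.size() - 1); // remember where the next scan resumes
        }
    }
}
```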


Available Dedicated Filters (cont.)

 •  Inclusive Stop Filter
     –  As opposed to the exclusive stop row, this
        filter will include the final row
 •  Timestamp Filter
     –  Takes list of timestamps to include in result
 •  Column Count Get Filter
     –  Used to limit number of columns returned by a
        get() call


Available Dedicated Filters (cont.)

 •  Column Pagination Filter
     –  Allows paginating through columns within a
        row
     –  Skips the first offset columns and returns
        at most limit columns
 •  Column Prefix Filter
     –  Analogous to PrefixFilter, here for matching
        column qualifiers
 •  Random Row Filter

Decorating Filters

 •  Extend filters to gain additional control
    over the returned data
 •  Skip Filter
     –  Skip entire row when a column is filtered
     –  Not all filters are compatible
 •  While Match Filter
     –  Aborts entire scan once the wrapped filter
        indicates a row or column is omitted
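
The difference between the two decorators, modelled in plain Java. This is a simplification and not the HBase implementation: `DecoratorSketch`, `skip`, and `whileMatch` are hypothetical names, cells are reduced to integers, and only whole rows are returned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

class DecoratorSketch {
    // Skip semantics: a row is returned only if every one of its cells passes the wrapped check.
    static List<String> skip(Map<String, int[]> rows, IntPredicate cellPasses) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, int[]> row : rows.entrySet()) {
            boolean allPass = true;
            for (int v : row.getValue()) {
                if (!cellPasses.test(v)) {
                    allPass = false;
                    break;
                }
            }
            if (allPass) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    // While-match semantics: the scan ends at the first cell that fails the wrapped check.
    static List<String> whileMatch(Map<String, int[]> rows, IntPredicate cellPasses) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, int[]> row : rows.entrySet()) {
            for (int v : row.getValue()) {
                if (!cellPasses.test(v)) {
                    return kept; // abort the whole scan
                }
            }
            kept.add(row.getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, int[]> rows = new LinkedHashMap<>();
        rows.put("r1", new int[] {1, 2});
        rows.put("r2", new int[] {3, 0});
        rows.put("r3", new int[] {4, 5});
        IntPredicate nonZero = v -> v != 0;
        System.out.println("skip:       " + skip(rows, nonZero));
        System.out.println("whileMatch: " + whileMatch(rows, nonZero));
    }
}
```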


Combining Filters

 •  Implemented by the FilterList class
     –  Wraps list of filters into a Filter compatible
        class
     –  Takes optional operator to decide how to
        handle the results of each wrapped filter
        (default: MUST_PASS_ALL)




Combining Filters

 •  Filter lists can contain other filter lists
 •  Operator is fixed per list, but a hierarchy
    of lists allows combinations
 •  Using the proper List implementation
    helps control filter execution order




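import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

List<Filter> filters = new ArrayList<Filter>();

// Lower bound on the row key.
Filter filter1 = new RowFilter(
   CompareFilter.CompareOp.GREATER_OR_EQUAL,
   new BinaryComparator(Bytes.toBytes("row-03")));
filters.add(filter1);
// Upper bound on the row key.
Filter filter2 = new RowFilter(
   CompareFilter.CompareOp.LESS_OR_EQUAL,
   new BinaryComparator(Bytes.toBytes("row-06")));
filters.add(filter2);
// Only columns whose qualifier matches the regular expression.
Filter filter3 = new QualifierFilter(
   CompareFilter.CompareOp.EQUAL,
   new RegexStringComparator("col-0[03]"));
filters.add(filter3);
// Default operator: MUST_PASS_ALL.
FilterList filterList1 = new FilterList(filters);
…
FilterList filterList2 = new FilterList(
   FilterList.Operator.MUST_PASS_ONE, filters);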
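
How the two operators and nested lists behave can be modelled in plain Java. This sketch is not the HBase `FilterList`: `FilterListSketch`, its `Operator` enum, `combine`, and `select` are hypothetical, and only the two row key bounds from the example are modelled, not the qualifier filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class FilterListSketch {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Combines predicates: all must pass, or at least one must pass.
    static <T> Predicate<T> combine(Operator op, List<Predicate<T>> filters) {
        return item -> {
            for (Predicate<T> f : filters) {
                boolean pass = f.test(item);
                if (op == Operator.MUST_PASS_ALL && !pass) {
                    return false;
                }
                if (op == Operator.MUST_PASS_ONE && pass) {
                    return true;
                }
            }
            return op == Operator.MUST_PASS_ALL;
        };
    }

    // Applies a combined filter to a list of row keys.
    static List<String> select(List<String> rows, Predicate<String> filter) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            if (filter.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row-01", "row-02", "row-03", "row-04",
                                          "row-05", "row-06", "row-07", "row-08");
        Predicate<String> from = r -> r.compareTo("row-03") >= 0;
        Predicate<String> to = r -> r.compareTo("row-06") <= 0;
        Predicate<String> last = r -> r.equals("row-08");

        Predicate<String> all = combine(Operator.MUST_PASS_ALL, Arrays.asList(from, to));
        Predicate<String> one = combine(Operator.MUST_PASS_ONE, Arrays.asList(from, to));
        // A list may contain another list: (from AND to) OR last.
        Predicate<String> nested = combine(Operator.MUST_PASS_ONE, Arrays.asList(all, last));

        System.out.println("ALL:    " + select(rows, all));
        System.out.println("ONE:    " + select(rows, one));
        System.out.println("nested: " + select(rows, nested));
    }
}
```

With MUST_PASS_ONE the two bounds together accept every row, since each row satisfies at least one of them; that is the kind of surprise to watch for when switching operators on an existing list.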


Custom Filter

 •  Allows users to add missing filters
 •  Either implement Filter interface or use
    FilterBase skeleton
 •  Provides hooks called at different stages
    of the read process




Filter Interface
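 public interface Filter extends Writable {
   public enum ReturnCode {
     INCLUDE, SKIP, NEXT_COL, NEXT_ROW,
     SEEK_NEXT_USING_HINT }
   // Resets the filter state between rows.
   public void reset();
   // Returns true to drop the row based on its key alone.
   public boolean filterRowKey(byte[] buffer,
     int offset, int length);
   // Returns true to end the scan.
   public boolean filterAllRemaining();
   // Decides what to do with a single KeyValue.
   public ReturnCode filterKeyValue(KeyValue v);
   // May modify the list of KeyValues collected for the row.
   public void filterRow(List<KeyValue> kvs);
   public boolean hasFilterRow();
   // Last chance to drop the entire row (true = drop).
   public boolean filterRow();
   // Seek target used with SEEK_NEXT_USING_HINT.
   public KeyValue getNextKeyHint(KeyValue
     currentKV);
 }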


Filter Return Codes




Merge Reads




Filter Flow

 •  Filter hooks are called at
    different stages
 •  Seeks are done initially to
    find the next KeyValue
     –  Hint from previous filter
        invocation might help
 •  Early out checks improve
    performance
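
A reduced plain-Java model of how a scan could drive the filter hooks for each row. It is a sketch under assumptions, not HBase's read path: `FilterFlowSketch`, `MiniFilter`, `scan`, and `ValueMatchFilter` are hypothetical, cells are plain strings, and the hook order used here (reset, row key check, per-cell check, row-level check) is a simplification that leaves out seeks, hints, and the "filter all remaining" check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FilterFlowSketch {
    // Hypothetical, reduced set of hooks; not the HBase Filter interface.
    interface MiniFilter {
        void reset();                         // called before each row
        boolean filterRowKey(String rowKey);  // true = drop the row without reading its cells
        boolean includeValue(String value);   // true = keep this cell
        boolean filterRow();                  // true = drop the row after all cells were seen
    }

    // Drives the hooks row by row and returns the surviving rows and cells.
    static Map<String, List<String>> scan(Map<String, List<String>> table, MiniFilter filter) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : table.entrySet()) {
            filter.reset();
            if (filter.filterRowKey(row.getKey())) {
                continue; // early out on the row key
            }
            List<String> cells = new ArrayList<>();
            for (String value : row.getValue()) {
                if (filter.includeValue(value)) {
                    cells.add(value);
                }
            }
            if (!filter.filterRow() && !cells.isEmpty()) {
                out.put(row.getKey(), cells);
            }
        }
        return out;
    }

    // Keeps a row only if at least one of its cells equals the wanted value.
    static final class ValueMatchFilter implements MiniFilter {
        private final String wanted;
        private boolean dropRow = true;

        ValueMatchFilter(String wanted) { this.wanted = wanted; }

        public void reset() { dropRow = true; }
        public boolean filterRowKey(String rowKey) { return false; }
        public boolean includeValue(String value) {
            if (wanted.equals(value)) {
                dropRow = false;
            }
            return true;
        }
        public boolean filterRow() { return dropRow; }
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("r1", Arrays.asList("a", "b"));
        table.put("r2", Arrays.asList("c", "x"));
        table.put("r3", Arrays.asList("d"));
        System.out.println(scan(table, new ValueMatchFilter("x")));
    }
}
```

The per-row state (`dropRow`) only works because `reset()` is called for every row, which is the reason filters cannot carry decisions from one row to the next.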


Example Code
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomFilter extends FilterBase {
  private byte[] value = null;
  private boolean filterRow = true;

  public CustomFilter() { super(); }
  public CustomFilter(byte[] value) { this.value = value; }

  // Start every row in the "drop" state.
  @Override
  public void reset() { this.filterRow = true; }

  // Keep the row as soon as one value matches.
  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    if (Bytes.compareTo(value, kv.getValue()) == 0) {
      filterRow = false;
    }
    return ReturnCode.INCLUDE;
  }

  // Drop the row if no value matched.
  @Override
  public boolean filterRow() { return filterRow; }
  ...
}
Deploying Custom Filters

 •    Need to provide JAR file with filter class
 •    Deploy JAR to RegionServers
 •    Add JAR to HBASE_CLASSPATH
 •    Restart RegionServers

 •  Tip: Testing on cluster more involved, test
    on local machine first


Summary





More Related Content

What's hot

Impala presentation
Impala presentationImpala presentation
Impala presentationtrihug
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...Cloudera, Inc.
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.Cloudera, Inc.
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureDataWorks Summit
 
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsTuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsDataWorks Summit
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017larsgeorge
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardMatthew Blair
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014larsgeorge
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clustersenissoz
 
HBase Backups
HBase BackupsHBase Backups
HBase BackupsHBaseCon
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsDataWorks Summit
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera, Inc.
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clustermas4share
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala InternalsDavid Groozman
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBaseCon
 

What's hot (20)

Impala presentation
Impala presentationImpala presentation
Impala presentation
 
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
 
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
HBaseCon 2013: Apache HBase, Meet Ops. Ops, Meet Apache HBase.
 
High Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and FutureHigh Availability for HBase Tables - Past, Present, and Future
High Availability for HBase Tables - Past, Present, and Future
 
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsTuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
 
HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017HBase and Impala Notes - Munich HUG - 20131017
HBase and Impala Notes - Munich HUG - 20131017
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Cloudera impala
Cloudera impalaCloudera impala
Cloudera impala
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clusters
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Cloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for HadoopCloudera Impala: A Modern SQL Engine for Hadoop
Cloudera Impala: A Modern SQL Engine for Hadoop
 
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix clusterFive major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
HBase: Where Online Meets Low Latency
HBase: Where Online Meets Low LatencyHBase: Where Online Meets Low Latency
HBase: Where Online Meets Low Latency
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 

Viewers also liked

HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
A successful Git branching model
A successful Git branching model A successful Git branching model
A successful Git branching model abodeltae
 
Git branching-model
Git branching-modelGit branching-model
Git branching-modelAaron Huang
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...DataWorks Summit/Hadoop Summit
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBaseCloudera, Inc.
 
Getting Git Right
Getting Git RightGetting Git Right
Getting Git RightSven Peters
 
HBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBaseHBaseCon 2013: Rebuilding for Scale on Apache HBase
HBaseCon 2013: Rebuilding for Scale on Apache HBaseCloudera, Inc.
 
HBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBaseHBaseCon 2012 | Building Mobile Infrastructure with HBase
HBaseCon 2012 | Building Mobile Infrastructure with HBaseCloudera, Inc.
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...Cloudera, Inc.
 
HBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashHBaseCon 2013: Apache HBase on Flash
HBaseCon 2013: Apache HBase on FlashCloudera, Inc.
 
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...HBaseCon 2013:  Evolving a First-Generation Apache HBase Deployment to Second...
HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second...Cloudera, Inc.
 
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...
HBaseCon 2012 | Leveraging HBase for the World’s Largest Curated Genomic Data...Cloudera, Inc.
 
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARNHBaseCon 2015: DeathStar - Easy, Dynamic,  Multi-tenant HBase via YARN
HBaseCon 2015: DeathStar - Easy, Dynamic, Multi-tenant HBase via YARNHBaseCon
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics Cloudera, Inc.
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...Cloudera, Inc.
 
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUponHBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUpon
HBaseCon 2012 | Unique Sets on HBase and Hadoop - Elliot Clark, StumbleUponCloudera, Inc.
 
HBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 MinutesHBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 MinutesCloudera, Inc.
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterCloudera, Inc.
 
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon 2015: Trafodion - Integrating Operational SQL into HBase
HBaseCon 2015: Trafodion - Integrating Operational SQL into HBaseHBaseCon
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera FieldHBaseCon
 

Viewers also liked (20)

HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
A successful Git branching model
A successful Git branching model A successful Git branching model
A successful Git branching model
 
Git branching-model
Git branching-modelGit branching-model
Git branching-model
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
HBaseCon 2012 | HBase Filtering - Lars George, Cloudera

  • 1. HBaseCon, May 2012
      HBase Filters
      Lars George, Solutions Architect
  • 2. Agenda
      1  Introduction
      2  Comparison Filters
      3  Dedicated Filters
      4  Decorating Filters
      5  Combining Filters
      6  Custom Filters
      ©2012 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
  • 3. About Me
      •  Solutions Architect @ Cloudera
      •  Apache HBase & Whirr Committer
      •  Author of HBase – The Definitive Guide
      •  Working with HBase since end of 2007
      •  Organizer of the Munich OpenHUG
      •  Speaker at Conferences (Fosdem, Hadoop World)
  • 4. Introduction to Filters
      •  Used in combination with get() and scan() API calls
      •  Steps:
        –  Create Filter instance
        –  Create Get or Scan instance
        –  Assign Filter to Get or Scan
        –  Call API and enjoy
      •  More fine-grained control over what is returned to the client
  • 5. Filter Features
      •  Allow client to further narrow down what is retrieved
        –  Not just per row or column key, or per column family
      •  Predicate Pushdown
        –  Move filtering from client to server to reduce network traffic
      •  Varying performance implications, dependent on the use-case
  • 6. Filter Pushdown
  • 7. Filter Features (cont.)
      •  Filters have access to the entire row to decide its fate
        –  Access to KeyValue instances to check row keys, column qualifiers, timestamps, or values
      •  Scan batching might conflict with the above and might trigger an “Incompatible Filter” exception
        –  Example: DependentColumnFilter
      •  There is no cross invocation state
        –  Cannot filter rows based on dependent rows
  • 8. Available Filters
      •  Many filters are supplied by HBase
        –  Based on row key, column family, or column qualifier
        –  Paging through rows and columns
        –  Based on dependencies
      •  Write your own filters
        –  Use FilterBase class to get a no-op skeleton and fill in the gaps
  • 9. Agenda
      1  Introduction
      2  Comparison Filters
      3  Dedicated Filters
      4  Decorating Filters
      5  Combining Filters
      6  Custom Filters
  • 10. Comparison Filters
      •  Based on CompareFilter class
      •  Adds the compare() method to FilterBase
      •  Takes operator that defines how the comparison is performed
        –  Predefined by client API
      •  Also needs a comparator to do the actual check
        –  HBase supplies a large set
  • 11. Comparison Operators
  • 12. Comparators
  • 13. Comparison Filters (cont.)
      •  Not all combinations of operator and comparator make sense
        –  For example, the SubstringComparator returns only 0 (match) or 1 (no match)
        –  Only EQUAL and NOT_EQUAL are useful
        –  Using other operators is allowed but will most likely yield unexpected results
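The point about operator and comparator combinations can be illustrated with a small plain-Java model. This is only a sketch and does not use the HBase classes: `OperatorSketch`, `Op`, `substringCompare`, and `include` are hypothetical names, and the rule that an operator is checked against the sign of the comparator result is a simplifying assumption whose orientation may differ from the real implementation.

```java
final class OperatorSketch {
    // Hypothetical stand-in for the comparison operators named on the slide.
    enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // Models a substring-style comparator: 0 when value contains needle, 1 otherwise.
    static int substringCompare(String needle, String value) {
        return value.contains(needle) ? 0 : 1;
    }

    // Assumption: a cell is included when the comparator result r relates to 0 as op says.
    static boolean include(Op op, int r) {
        switch (op) {
            case LESS: return r < 0;
            case LESS_OR_EQUAL: return r <= 0;
            case EQUAL: return r == 0;
            case NOT_EQUAL: return r != 0;
            case GREATER_OR_EQUAL: return r >= 0;
            default: return r > 0; // GREATER
        }
    }
}
```

Because the modeled comparator never returns a negative number, LESS never matches and GREATER_OR_EQUAL always matches in this sketch, while the remaining two operators merely duplicate EQUAL and NOT_EQUAL.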
  • 14. Comparison Filters (cont.)
      •  HBase filters usually filter data out
      •  Comparison filters work in reverse as they include matching data
        –  Be mindful when selecting the comparison operator!
  • 15. Available Comparison Filters
      •  Row Filter
        –  Based on row key comparisons
      •  Family Filter
        –  Based on column family names
      •  Qualifier Filter
        –  Based on column names, aka qualifiers
      •  Value Filter
        –  Based on the actual value of a column
  • 16. Available Comparison Filters (cont.)
      •  Dependent Column Filter
        –  Based on a timestamp of a reference column
        –  Includes all columns that have the same timestamp
        –  Implies that the entire row is accessible, since batching will not have access to the reference column
          •  No scanner batching allowed!
        –  Useful for loading interdependent changes within a row
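The timestamp dependency described on this slide can be sketched in plain Java. This is a model of the selection rule only, not the HBase `DependentColumnFilter` itself: the class `DependentColumnSketch`, the method `select`, and the representation of a row as a map from qualifier to timestamp (one version per column) are all assumptions made for illustration, as is the choice to drop the row when the reference column is absent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DependentColumnSketch {
    // Returns the qualifiers whose timestamp equals that of the reference column.
    // Assumption: a missing reference column yields an empty result (row dropped).
    static List<String> select(Map<String, Long> row, String reference) {
        List<String> kept = new ArrayList<>();
        Long refTs = row.get(reference);
        if (refTs == null) {
            return kept;
        }
        for (Map.Entry<String, Long> cell : row.entrySet()) {
            if (cell.getValue().equals(refTs)) {
                kept.add(cell.getKey());
            }
        }
        return kept;
    }
}
```

The sketch needs the whole row in hand before it can decide anything, which is the reason the slide gives for ruling out scanner batching.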
  • 17. Example Code
      // Restrict the scan to a single column.
      Scan scan = new Scan();
      scan.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("col-0"));

      // Include only rows whose key sorts at or before "row-22".
      Filter filter = new RowFilter(
          CompareFilter.CompareOp.LESS_OR_EQUAL,
          new BinaryComparator(Bytes.toBytes("row-22")));
      scan.setFilter(filter);

      // Run the scan and print every row that passes the filter.
      ResultScanner scanner = table.getScanner(scan);
      for (Result res : scanner) {
          System.out.println(res);
      }
      scanner.close();
  • 18. Example Output
      keyvalues={row-1/colfam1:col-0/1301043190260/Put/vlen=7}
      keyvalues={row-10/colfam1:col-0/1301043190908/Put/vlen=8}
      keyvalues={row-100/colfam1:col-0/1301043195275/Put/vlen=9}
      keyvalues={row-11/colfam1:col-0/1301043190982/Put/vlen=8}
      keyvalues={row-12/colfam1:col-0/1301043191040/Put/vlen=8}
      keyvalues={row-13/colfam1:col-0/1301043191172/Put/vlen=8}
      keyvalues={row-14/colfam1:col-0/1301043191318/Put/vlen=8}
      keyvalues={row-15/colfam1:col-0/1301043191429/Put/vlen=8}
      keyvalues={row-16/colfam1:col-0/1301043191509/Put/vlen=8}
      keyvalues={row-17/colfam1:col-0/1301043191593/Put/vlen=8}
      keyvalues={row-18/colfam1:col-0/1301043191673/Put/vlen=8}
      keyvalues={row-19/colfam1:col-0/1301043191771/Put/vlen=8}
      keyvalues={row-2/colfam1:col-0/1301043190346/Put/vlen=7}
      keyvalues={row-20/colfam1:col-0/1301043191841/Put/vlen=8}
      keyvalues={row-21/colfam1:col-0/1301043191933/Put/vlen=8}
      keyvalues={row-22/colfam1:col-0/1301043191998/Put/vlen=8}
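The output may look surprising at first, since row-100 appears although 100 is larger than 22. Row keys are compared as byte arrays, not as numbers, and the following plain-Java sketch reproduces that ordering without any HBase dependency. The class `RowKeyOrder` and its methods are hypothetical helpers, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RowKeyOrder {
    // Compares two row keys byte by byte, treating each byte as unsigned.
    static int compare(String a, String b) {
        return Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8),
            b.getBytes(StandardCharsets.UTF_8));
    }

    // True when rowKey sorts at or before bound, mirroring LESS_OR_EQUAL.
    static boolean lessOrEqual(String rowKey, String bound) {
        return compare(rowKey, bound) <= 0;
    }
}
```

With this comparison, `lessOrEqual("row-100", "row-22")` is true because the keys first differ at the character `1` versus `2`, whereas `lessOrEqual("row-3", "row-22")` is false. Zero-padding numeric parts of a key (for example row-003) is a common way to make byte order agree with numeric order.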
  • 19. Agenda
    1    Introduction
    2    Comparison Filters
    3    Dedicated Filters
    4    Decorating Filters
    5    Combining Filters
    6    Custom Filters
  • 20. Dedicated Filters
    •  Based directly on FilterBase class
    •  Often less useful for get() calls, since entire rows are filtered
  • 21. Available Dedicated Filters
    •  Single Column Value Filter
      –  Filter rows based on one specific column
      –  Extra features
        •  “Filter if missing”
        •  “Get latest version only”
      –  Column must be part of the scan selection
        •  Or else it is all or nothing
      –  Also needs compare operation and an optional comparator
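The effect of the "filter if missing" switch can be shown with a small plain-Java model. The names `SingleColumnValueSketch` and `includeRow` are hypothetical, the row is reduced to a map from qualifier to value, and only an equality comparison is modeled. The sketch assumes that a row without the column is kept unless the flag is set.

```java
import java.util.HashMap;
import java.util.Map;

public class SingleColumnValueSketch {
    // Decides whether a row survives, based on one column and the "filter if missing" flag.
    static boolean includeRow(Map<String, String> row, String column,
                              String expected, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            // Column absent: the row is kept unless the flag asks to drop it.
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    public static void main(String[] args) {
        Map<String, String> withColumn = new HashMap<>();
        withColumn.put("status", "active");
        Map<String, String> withoutColumn = new HashMap<>();
        withoutColumn.put("name", "x");

        System.out.println(includeRow(withColumn, "status", "active", false));    // true
        System.out.println(includeRow(withoutColumn, "status", "active", false)); // true
        System.out.println(includeRow(withoutColumn, "status", "active", true));  // false
    }
}
```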
  • 22. Available Dedicated Filters (cont.)
    •  Single Column Value Exclude Filter
      –  Same as the one before but excludes the selection column
    •  Prefix Filter
      –  Based on prefix of row keys
      –  Can early out the scan!
        •  Combine with start row for best performance
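Why the start row matters for the Prefix Filter can be seen by counting how many rows a scan has to look at. The sketch below is a plain-Java model over a sorted set of keys; `PrefixScanSketch` and `rowsExamined` are hypothetical names, and ASCII keys are assumed so that Java string order matches byte order.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class PrefixScanSketch {
    // Counts the rows a prefix scan touches, with or without seeking to the prefix first.
    static int rowsExamined(TreeSet<String> rows, String prefix, boolean useStartRow) {
        Iterable<String> scan = useStartRow ? rows.tailSet(prefix, true) : rows;
        int examined = 0;
        for (String key : scan) {
            examined++;
            // Keys are sorted, so the first non-matching key past the prefix ends the scan.
            if (!key.startsWith(prefix) && key.compareTo(prefix) > 0) {
                break;
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
            Arrays.asList("a-1", "a-2", "b-1", "b-2", "c-1", "c-2"));
        System.out.println(rowsExamined(rows, "b-", false)); // prints 5
        System.out.println(rowsExamined(rows, "b-", true));  // prints 3
    }
}
```

The early out saves the rows after the prefix range; the start row saves the rows before it. Only the combination avoids both.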
  • 23. Available Dedicated Filters (cont.)
    •  Page Filter
      –  Allows pagination through rows
      –  Needs to be combined with setting the start row on subsequent scans
      –  Can early out the scan when limit is reached
    •  Key Only Filter
      –  Drops the value for every column
    •  First Key Only Filter
      –  Returns only the first column key
      –  Useful for row counters, or “get newest post” type applications
      –  Can early out rest of row scan
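The pagination loop for the Page Filter (limit per scan, new start row for each subsequent scan) can be sketched in plain Java. `PageSketch`, `page`, and `allPages` are hypothetical names, and the table is modeled as a sorted set of ASCII row keys. Appending a zero byte to the last key seen yields the smallest key that sorts strictly after it, so the next page does not repeat a row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PageSketch {
    // One "scan": at most pageSize rows, beginning at startRow (inclusive).
    static List<String> page(TreeSet<String> rows, String startRow, int pageSize) {
        List<String> out = new ArrayList<>();
        for (String key : rows.tailSet(startRow, true)) {
            if (out.size() == pageSize) {
                break; // limit reached: early out
            }
            out.add(key);
        }
        return out;
    }

    // Client-side loop: re-issue the scan with an updated start row until nothing is left.
    static List<List<String>> allPages(TreeSet<String> rows, int pageSize) {
        List<List<String>> pages = new ArrayList<>();
        String start = "";
        while (true) {
            List<String> current = page(rows, start, pageSize);
            if (current.isEmpty()) {
                break;
            }
            pages.add(current);
            start = current.get(current.size() - 1) + "\0"; // just past the last row seen
        }
        return pages;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
            Arrays.asList("row-1", "row-2", "row-3", "row-4", "row-5"));
        System.out.println(allPages(rows, 2)); // prints [[row-1, row-2], [row-3, row-4], [row-5]]
    }
}
```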
  • 24. Available Dedicated Filters (cont.)
    •  Inclusive Stop Filter
      –  As opposed to the exclusive stop row, this filter will include the final row
    •  Timestamp Filter
      –  Takes list of timestamps to include in result
    •  Column Count Get Filter
      –  Used to limit number of columns returned by a get() call
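The difference between the regular exclusive stop row and the Inclusive Stop Filter comes down to whether the upper bound of the key range is open or closed. A minimal plain-Java model, with the hypothetical names `StopRowSketch` and `scan` and a sorted set standing in for the table:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class StopRowSketch {
    // Returns the keys from start (inclusive) up to stop, which is included only on request.
    static List<String> scan(TreeSet<String> rows, String start, String stop,
                             boolean inclusiveStop) {
        return new ArrayList<>(rows.subSet(start, true, stop, inclusiveStop));
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
            Arrays.asList("row-1", "row-2", "row-3", "row-4", "row-5"));
        System.out.println(scan(rows, "row-2", "row-4", false)); // prints [row-2, row-3]
        System.out.println(scan(rows, "row-2", "row-4", true));  // prints [row-2, row-3, row-4]
    }
}
```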
  • 25. Available Dedicated Filters (cont.)
    •  Column Pagination Filter
      –  Allows paginating through the columns within a row
      –  Skips to the offset parameter and returns up to limit columns
    •  Column Prefix Filter
      –  Analogous to PrefixFilter, but matches column qualifiers
    •  Random Row Filter
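The offset and limit semantics of column pagination amount to taking a window out of the sorted list of qualifiers in a row. The following plain-Java sketch uses the hypothetical names `ColumnPageSketch` and `pageColumns`; the parameter order (limit first, then offset) is an assumption chosen for this model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnPageSketch {
    // Skips the first `offset` qualifiers of a row and returns at most `limit` of the rest.
    static List<String> pageColumns(List<String> qualifiers, int limit, int offset) {
        int from = Math.min(offset, qualifiers.size());
        int to = Math.min(from + limit, qualifiers.size());
        return new ArrayList<>(qualifiers.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("col-0", "col-1", "col-2", "col-3", "col-4");
        System.out.println(pageColumns(row, 2, 1)); // prints [col-1, col-2]
        System.out.println(pageColumns(row, 2, 4)); // prints [col-4]
    }
}
```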
  • 26. Agenda
    1    Introduction
    2    Comparison Filters
    3    Dedicated Filters
    4    Decorating Filters
    5    Combining Filters
    6    Custom Filters
  • 27. Decorating Filters
    •  Wrap other filters to gain additional control over the returned data
    •  Skip Filter
      –  Skips the entire row when any of its columns is filtered
      –  Not all filters are compatible
    •  While Match Filter
      –  Aborts the entire scan once the wrapped filter indicates that a row or column is omitted
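The two decorators change what happens after the wrapped filter rejects something: Skip drops the rest of the row, While Match drops the rest of the scan. The plain-Java sketch below models both. All names (`DecoratorSketch`, `skipRows`, `whileMatch`) are hypothetical, cell values are reduced to integers, and the While Match part is simplified to work on whole row keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

public class DecoratorSketch {
    // Skip semantics: a row is returned only if the wrapped check keeps every one of its cells.
    static List<String> skipRows(Map<String, int[]> rows, IntPredicate keepCell) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, int[]> row : rows.entrySet()) {
            boolean allKept = true;
            for (int value : row.getValue()) {
                if (!keepCell.test(value)) {
                    allKept = false;
                    break;
                }
            }
            if (allKept) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    // While-match semantics: the scan stops at the first row the wrapped check rejects.
    static List<String> whileMatch(List<String> rowKeys, Predicate<String> keepRow) {
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            if (!keepRow.test(key)) {
                break;
            }
            kept.add(key);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, int[]> rows = new LinkedHashMap<>();
        rows.put("row-1", new int[] {1, 2});
        rows.put("row-2", new int[] {3, 0});
        rows.put("row-3", new int[] {4, 5});
        System.out.println(skipRows(rows, v -> v != 0)); // prints [row-1, row-3]

        List<String> keys = Arrays.asList("row-1", "row-2", "row-3", "row-4");
        System.out.println(whileMatch(keys, k -> !k.equals("row-3"))); // prints [row-1, row-2]
    }
}
```

Without the While Match wrapper the same row check would also return row-4; with it, everything after the first rejection is cut off.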
  • 28. Agenda
    1    Introduction
    2    Comparison Filters
    3    Dedicated Filters
    4    Decorating Filters
    5    Combining Filters
    6    Custom Filters
  • 29. Combining Filters
    •  Implemented by the FilterList class
      –  Wraps a list of filters into a Filter-compatible class
      –  Takes an optional operator that decides how to combine the results of the wrapped filters (default: MUST_PASS_ALL)
  • 30. Combining Filters
    •  Filter lists can contain other filter lists
    •  Operator is fixed per list, but nesting lists allows arbitrary combinations
    •  Using the proper List implementation helps control filter execution order
  • 31.
    List<Filter> filters = new ArrayList<Filter>();
    Filter filter1 = new RowFilter(
      CompareFilter.CompareOp.GREATER_OR_EQUAL,
      new BinaryComparator(Bytes.toBytes("row-03")));
    filters.add(filter1);
    Filter filter2 = new RowFilter(
      CompareFilter.CompareOp.LESS_OR_EQUAL,
      new BinaryComparator(Bytes.toBytes("row-06")));
    filters.add(filter2);
    Filter filter3 = new QualifierFilter(
      CompareFilter.CompareOp.EQUAL,
      new RegexStringComparator("col-0[03]"));
    filters.add(filter3);
    FilterList filterList1 = new FilterList(filters);
    …
    FilterList filterList2 = new FilterList(
      FilterList.Operator.MUST_PASS_ONE, filters);
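The difference between MUST_PASS_ALL and MUST_PASS_ONE, and the effect of nesting one list inside another, can be modeled with plain Java predicates. This is a sketch only: `FilterListSketch`, `mustPassAll`, `mustPassOne`, and `slideFilters` are hypothetical names, a cell is reduced to a two-element array of row key and qualifier, and the regular expression is applied with `find()` on the assumption that a partial match counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class FilterListSketch {
    // Models MUST_PASS_ALL: every wrapped check has to accept the cell.
    static <T> Predicate<T> mustPassAll(List<Predicate<T>> filters) {
        return x -> filters.stream().allMatch(f -> f.test(x));
    }

    // Models MUST_PASS_ONE: a single accepting check is enough.
    static <T> Predicate<T> mustPassOne(List<Predicate<T>> filters) {
        return x -> filters.stream().anyMatch(f -> f.test(x));
    }

    // The three checks from the slide; cell[0] is the row key, cell[1] the qualifier.
    static List<Predicate<String[]>> slideFilters() {
        Pattern qualifier = Pattern.compile("col-0[03]");
        List<Predicate<String[]>> filters = new ArrayList<>();
        filters.add(cell -> cell[0].compareTo("row-03") >= 0);
        filters.add(cell -> cell[0].compareTo("row-06") <= 0);
        filters.add(cell -> qualifier.matcher(cell[1]).find());
        return filters;
    }

    public static void main(String[] args) {
        Predicate<String[]> all = mustPassAll(slideFilters());
        Predicate<String[]> one = mustPassOne(slideFilters());
        String[] inside = {"row-04", "col-03"};
        String[] outside = {"row-07", "col-01"};
        System.out.println(all.test(inside));  // true
        System.out.println(all.test(outside)); // false
        System.out.println(one.test(outside)); // true: row-07 >= row-03 already passes

        // Nested lists: (row range) OR (qualifier match).
        Predicate<String[]> rowRange = mustPassAll(slideFilters().subList(0, 2));
        Predicate<String[]> nested = mustPassOne(Arrays.asList(rowRange, slideFilters().get(2)));
        System.out.println(nested.test(new String[] {"row-07", "col-03"})); // true
    }
}
```

In this model MUST_PASS_ONE over the same three checks accepts every cell, because any row key is either at least row-03 or at most row-06. Nesting the two row checks in their own MUST_PASS_ALL list is what makes the row range meaningful inside an OR.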
  • 32. Agenda
    1    Introduction
    2    Comparison Filters
    3    Dedicated Filters
    4    Decorating Filters
    5    Combining Filters
    6    Custom Filters
  • 33. Custom Filter
    •  Allows users to add missing filters
    •  Either implement Filter interface or use FilterBase skeleton
    •  Provides hooks called at different stages of the read process
  • 34. Filter Interface
    public interface Filter extends Writable {
      public enum ReturnCode {
        INCLUDE, SKIP, NEXT_COL, NEXT_ROW,
        SEEK_NEXT_USING_HINT }
      public void reset();
      public boolean filterRowKey(byte[] buffer,
        int offset, int length);
      public boolean filterAllRemaining();
      public ReturnCode filterKeyValue(KeyValue v);
      public void filterRow(List<KeyValue> kvs);
      public boolean hasFilterRow();
      public boolean filterRow();
      public KeyValue getNextKeyHint(KeyValue
        currentKV);
    }
  • 35. Filter Return Codes
  • 36. Merge Reads
  • 37. Filter Flow
    •  Filter hooks are called at different stages
    •  Seeks are done initially to find the next KeyValue
      –  Hint from previous filter invocation might help
    •  Early out checks improve performance
  • 38. Example Code
    public class CustomFilter extends FilterBase {
      private byte[] value = null;
      private boolean filterRow = true;

      public CustomFilter() { super(); }
      public CustomFilter(byte[] value) { this.value = value; }

      @Override
      public void reset() { this.filterRow = true; }

      @Override
      public ReturnCode filterKeyValue(KeyValue kv) {
        if (Bytes.compareTo(value, kv.getValue()) == 0) {
          filterRow = false;
        }
        return ReturnCode.INCLUDE;
      }

      @Override
      public boolean filterRow() { return filterRow; }
      ...
    }
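The order in which the hooks of this custom filter fire can be traced with a plain-Java model that needs no HBase classes. Everything here is a hypothetical stand-in: `CustomFilterFlowSketch`, the nested `ValueMatchFilter`, and `scan` are illustrative names, cell values are plain strings, and the driver loop is a simplified assumption about how a scan calls `reset()`, `filterKeyValue()`, and `filterRow()` for each row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CustomFilterFlowSketch {
    // Stand-in for the slide's CustomFilter: keeps a row only if some cell holds the value.
    static class ValueMatchFilter {
        private final String value;
        private boolean filterRow = true;

        ValueMatchFilter(String value) { this.value = value; }

        void reset() { filterRow = true; }            // called before every new row

        void filterKeyValue(String cellValue) {       // called once per cell
            if (value.equals(cellValue)) {
                filterRow = false;
            }
        }

        boolean filterRow() { return filterRow; }     // called after the last cell; true = drop
    }

    // Simplified driver: reset, visit every cell, then ask whether to drop the row.
    static List<String> scan(Map<String, List<String>> rows, String value) {
        ValueMatchFilter filter = new ValueMatchFilter(value);
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            filter.reset();
            for (String cellValue : row.getValue()) {
                filter.filterKeyValue(cellValue);
            }
            if (!filter.filterRow()) {
                kept.add(row.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("row-1", Arrays.asList("a", "b"));
        rows.put("row-2", Arrays.asList("c", "hit"));
        rows.put("row-3", Arrays.asList("d", "e"));
        System.out.println(scan(rows, "hit")); // prints [row-2]
    }
}
```

If `reset()` did not restore the flag, the match in row-2 would leak into row-3 and that row would be returned as well, which is why the per-row state has to be cleared in that hook.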
  • 39. Deploying Custom Filters
    •  Need to provide JAR file with filter class
    •  Deploy JAR to RegionServers
    •  Add JAR to HBASE_CLASSPATH
    •  Restart RegionServers
    •  Tip: testing on a cluster is more involved, so test on a local machine first
  • 40. Summary
  • 41. Summary (cont.)