The document describes Apache Hive hooks, which allow intercepting function calls or events during query execution in Hive. It provides details on the different hook points in Hive, including pre-execution, post-execution, and failure hooks. It also explains how to configure hooks by setting hook properties and the jar paths for hook implementations. Finally, it outlines the interfaces and contexts provided to hooks at each stage of query processing in Hive.
2. Apache Hive Hook
• I made this because Ryan asked me about Hive hooks, and I
couldn’t find any information about hooks in the Hive wiki.
• I hope it helps when you develop applications on Hive and
want to get extra information while a query is executing.
• This document was written based on release-0.11 tag
• Source:
- https://github.com/apache/hive (mirror of apache hive)
3. What is a hook?
• As you probably know, this is a general programming technique,
but to recap:
• Hooking
- Techniques for intercepting function calls or
messages or events in an operating system, applications,
and other software components.
• Hook
- Code that handles intercepted function calls, events or
messages
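To make the two definitions concrete, here is a minimal sketch in plain Java. It is not Hive code: the class HookedAction and its methods are invented for this illustration. It only shows the general idea that registered callbacks (the hooks) are called before and after an intercepted action.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Toy illustration of hooking: callbacks intercept an action before and after it runs. */
class HookedAction {
    private final List<Consumer<String>> preHooks = new ArrayList<>();
    private final List<Consumer<String>> postHooks = new ArrayList<>();

    void addPreHook(Consumer<String> hook) { preHooks.add(hook); }
    void addPostHook(Consumer<String> hook) { postHooks.add(hook); }

    /** Runs every pre hook, then the action, then every post hook. */
    <T> T run(String name, Supplier<T> action) {
        for (Consumer<String> hook : preHooks) {
            hook.accept(name);
        }
        T result = action.get();
        for (Consumer<String> hook : postHooks) {
            hook.accept(name);
        }
        return result;
    }

    public static void main(String[] args) {
        HookedAction hooked = new HookedAction();
        hooked.addPreHook(n -> System.out.println("before " + n));
        hooked.addPostHook(n -> System.out.println("after " + n));
        Integer sum = hooked.run("add", () -> 1 + 2);
        System.out.println("result " + sum);
    }
}
```

The caller of run() does not need to know which hooks are registered, which is what makes the technique useful for adding behavior to existing software.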
4. Hive provides some hooking points
• pre-execution
• post-execution
• execution-failure
• pre- and post-driver-run
• pre- and post-semantic-analyze
• metastore-initialize
5. How to set up hooks in Hive
hive-site.xml
Setting the hook property
<property>
  <name>hive.exec.pre.hooks</name>
  <value></value>
  <description>
    Comma-separated list of pre-execution hooks to be invoked for each statement.
    A pre-execution hook is specified as the name of a Java class which implements
    the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.
  </description>
</property>
Setting the path of the jars that contain implementations of the hook interfaces or abstract classes
<property>
  <name>hive.aux.jars.path</name>
  <value></value>
</property>
You can use hive.added.jars.path instead of hive.aux.jars.path
6. Hive hook properties and interfaces
Property                      Interface or abstract class
hive.exec.pre.hooks           org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext (PreExecute is deprecated)
hive.exec.post.hooks          org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext (PostExecute is deprecated)
hive.exec.failure.hooks       org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext
hive.metastore.init.hooks     org.apache.hadoop.hive.metastore.MetaStoreInitListener
hive.exec.driver.run.hooks    org.apache.hadoop.hive.ql.HiveDriverRunHook
hive.semantic.analyzer.hook   org.apache.hadoop.hive.ql.parse.AbstractSemanticAnalyzerHook
7. When do those hooks fire?
• You can submit a query on Hive through the
following entry points
- CLIDriver main method (called by shell script)
- HCatCli main method (called by shell script)
- HiveServer (called by thrift client)
- HiveServer2 (called by thrift client or beeline)
26. HiveServer2.main() ➔ HiveServer2.start()
➔ CLIService.start() ➔ new HiveMetaStoreClient() ➠
➔ HiveSession.getMetaStoreClient()
➔ new HiveMetaStoreClient() ➠
➠ new HiveMetaStoreClient()
➔ HiveMetaStore.newHMSHandler()
➔ RetryingHMSHandler.getProxy()
➔ new RetryingHMSHandler()
➔ new HMSHandler() ➔ HMSHandler.init()
➔ HiveMetaStore.init()
CLIService.executeStatement()
⇒
METASTORE-INIT
SemanticAnalyzer ↝ Hive ↝ getMSC() is invoked by many other methods in Hive object
Hive.getMSC() ➔ Hive.createMetaStoreClient() ➔ RetryingHMSHandler.getProxy() ➠
GetColumnsOperation.run()
GetSchemasOperation.run()
GetTablesOperation.run()
27. How Hive executes hooks
• Hive can execute multiple hooks at each hook point.
ex. Driver.runInternal()
List<HiveDriverRunHook> driverRunHooks;
try {
  driverRunHooks = getHooks(HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS,
      HiveDriverRunHook.class);
  // every configured hook is invoked in turn
  for (HiveDriverRunHook driverRunHook : driverRunHooks) {
    driverRunHook.preDriverRun(hookContext);
  }
} catch (Exception e) {
  // ... (error handling is cut off on the slide)
}
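The slide does not show what getHooks() does internally. As a rough sketch of the idea, the code below splits a comma-separated property value into class names, instantiates each class by reflection, and returns the hooks in configured order. HookLoader, DemoHook, AuditHook and TimingHook are names invented for this example, and the real Hive implementation may differ in details such as class loading.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of loading hooks from a comma-separated class-name list (not Hive's own code). */
class HookLoader {
    /** Hypothetical stand-in for a Hive hook interface. */
    public interface DemoHook {
        String fire(String command);
    }

    /** Two sample hooks used only for the demo. */
    public static class AuditHook implements DemoHook {
        @Override public String fire(String command) { return "audit:" + command; }
    }

    public static class TimingHook implements DemoHook {
        @Override public String fire(String command) { return "timing:" + command; }
    }

    /** Splits the property value, instantiates each class, and keeps the configured order. */
    static <T> List<T> loadHooks(String csv, Class<T> type) {
        List<T> hooks = new ArrayList<>();
        if (csv == null || csv.trim().isEmpty()) {
            return hooks; // an empty property means no hooks
        }
        for (String name : csv.split(",")) {
            try {
                Class<?> cls = Class.forName(name.trim());
                hooks.add(type.cast(cls.getDeclaredConstructor().newInstance()));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot load hook " + name.trim(), e);
            }
        }
        return hooks;
    }

    public static void main(String[] args) {
        String value = AuditHook.class.getName() + ", " + TimingHook.class.getName();
        for (DemoHook hook : loadHooks(value, DemoHook.class)) {
            System.out.println(hook.fire("SELECT 1"));
        }
    }
}
```

Because the classes are looked up by name at run time, the jar that contains them has to be on the classpath, which is why the jar path property matters.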
28. 1. MetaStoreInitListener
public abstract class MetaStoreInitListener implements Configurable {
  private Configuration conf;

  public MetaStoreInitListener(Configuration config) {
    this.conf = config;
  }

  public abstract void onInit(MetaStoreInitContext context) throws MetaException;

  @Override
  public Configuration getConf() {
    return this.conf;
  }

  @Override
  public void setConf(Configuration config) {
    this.conf = config;
  }
}
30. What MetaStoreInitContext has
• It has nothing!
- This hook just notifies you when the metastore initializes.
(but you can, of course, get the configuration by calling getConf())
public class MetaStoreInitContext {
}
31. 2. HiveDriverRunHook
• preDriverRun
- Invoked before Hive begins any processing of a command in the Driver,
before compilation
• postDriverRun
- Invoked after Hive performs any processing of a command,
just before a response is returned to the entity calling the Driver.run()
public interface HiveDriverRunHook extends Hook {
  public void preDriverRun(
      HiveDriverRunHookContext hookContext) throws Exception;

  public void postDriverRun(
      HiveDriverRunHookContext hookContext) throws Exception;
}
32. What HiveDriverRunHookContext has
• You can get the command string from this hook context.
- Apart from the Configurable accessors, this is the only thing HiveDriverRunHookContext has.
public interface HiveDriverRunHookContext extends Configurable {
  public String getCommand();
  public void setCommand(String command);
}
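As an illustration of how the two callbacks and the command string fit together, here is a self-contained sketch. RunContext and RunHook are simplified stand-ins written for this example that copy only the method names from the slides; they are not the Hive types, so the code runs without Hive on the classpath. CommandLogHook and SimpleContext are likewise invented names.

```java
import java.util.ArrayList;
import java.util.List;

class DriverRunHookSketch {
    /** Stand-in for HiveDriverRunHookContext, reduced to the command accessors. */
    interface RunContext {
        String getCommand();
        void setCommand(String command);
    }

    /** Stand-in for HiveDriverRunHook with the same two callbacks. */
    interface RunHook {
        void preDriverRun(RunContext ctx) throws Exception;
        void postDriverRun(RunContext ctx) throws Exception;
    }

    /** Records each command before it runs and after it completes. */
    static class CommandLogHook implements RunHook {
        final List<String> log = new ArrayList<>();

        @Override public void preDriverRun(RunContext ctx) {
            log.add("start: " + ctx.getCommand());
        }

        @Override public void postDriverRun(RunContext ctx) {
            log.add("done: " + ctx.getCommand());
        }
    }

    /** Simple mutable context used only for this demo. */
    static class SimpleContext implements RunContext {
        private String command;

        SimpleContext(String command) { this.command = command; }

        @Override public String getCommand() { return command; }
        @Override public void setCommand(String command) { this.command = command; }
    }

    public static void main(String[] args) {
        CommandLogHook hook = new CommandLogHook();
        RunContext ctx = new SimpleContext("SELECT COUNT(*) FROM t");
        hook.preDriverRun(ctx);
        // ... the driver would compile and execute the command here ...
        hook.postDriverRun(ctx);
        hook.log.forEach(System.out::println);
    }
}
```

A real hook would implement org.apache.hadoop.hive.ql.HiveDriverRunHook and be registered through hive.exec.driver.run.hooks.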
33. 3. AbstractSemanticAnalyzerHook
• You can get
- HiveSemanticAnalyzerHookContext and ASTNode (root node of the
abstract syntax tree) before analysis.
- HiveSemanticAnalyzerHookContext and List<Task> after analysis.
public abstract class AbstractSemanticAnalyzerHook implements
    HiveSemanticAnalyzerHook {
  public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context,
      ASTNode ast) throws SemanticException {
    return ast;
  }

  public void postAnalyze(HiveSemanticAnalyzerHookContext context,
      List<Task<? extends Serializable>> rootTasks) throws SemanticException {
  }
}
34. What HiveSemanticAnalyzerHookContext has
• Hive object
- contains information about a set of data in HDFS organized for query
processing. (from the source comment)
• ReadEntity, WriteEntity
- the inputs and outputs of the query, returned by getInputs() and getOutputs()
• The update method is invoked after the semantic analyzer completes.
public interface HiveSemanticAnalyzerHookContext extends Configurable {
  public Hive getHive() throws HiveException;
  public void update(BaseSemanticAnalyzer sem);
  public Set<ReadEntity> getInputs();
  public Set<WriteEntity> getOutputs();
}
35. How Hive executes analyzer hooks
List<AbstractSemanticAnalyzerHook> saHooks =
    getHooks(HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK, AbstractSemanticAnalyzerHook.class);
// ~ ellipsis ~
HiveSemanticAnalyzerHookContext hookCtx = new HiveSemanticAnalyzerHookContextImpl();
hookCtx.setConf(conf);
for (AbstractSemanticAnalyzerHook hook : saHooks) {
  tree = hook.preAnalyze(hookCtx, tree);
}
sem.analyze(tree, ctx);
hookCtx.update(sem);
for (AbstractSemanticAnalyzerHook hook : saHooks) {
  hook.postAnalyze(hookCtx, sem.getRootTasks());
}
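The loop above has two consequences worth spelling out: each preAnalyze call can hand a replaced tree to the next hook, and a hook can stop the statement by throwing before analysis begins. The sketch below imitates that flow with stand-ins so that it runs without Hive: a String plays the role of the ASTNode, IllegalArgumentException stands in for SemanticException, and AnalyzerHook, NoDropHook and compile() are names invented for this example.

```java
import java.util.List;

class AnalyzerHookSketch {
    /** Stand-in for AbstractSemanticAnalyzerHook; a String plays the role of the ASTNode. */
    static abstract class AnalyzerHook {
        String preAnalyze(String tree) { return tree; }
        void postAnalyze(List<String> rootTasks) { }
    }

    /** Rejects DROP statements before analysis starts. */
    static class NoDropHook extends AnalyzerHook {
        @Override
        String preAnalyze(String tree) {
            if (tree.trim().toUpperCase().startsWith("DROP")) {
                throw new IllegalArgumentException("DROP is not allowed: " + tree);
            }
            return tree;
        }
    }

    /** Mirrors the loop on the slide: pre hooks, analysis, post hooks. */
    static List<String> compile(String tree, List<AnalyzerHook> hooks) {
        for (AnalyzerHook hook : hooks) {
            tree = hook.preAnalyze(tree); // each hook may return a replaced tree
        }
        // placeholder for sem.analyze(tree, ctx) and sem.getRootTasks()
        List<String> rootTasks = List.of("task for: " + tree);
        for (AnalyzerHook hook : hooks) {
            hook.postAnalyze(rootTasks);
        }
        return rootTasks;
    }

    public static void main(String[] args) {
        List<AnalyzerHook> hooks = List.of(new NoDropHook());
        System.out.println(compile("SELECT * FROM t", hooks));
        try {
            compile("DROP TABLE t", hooks);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checks of this kind belong in preAnalyze, because an exception thrown there prevents the statement from being analyzed at all.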
40. 4. ExecuteWithHookContext
• Can be used with the following properties
- hive.exec.pre.hooks
- hive.exec.post.hooks
- hive.exec.failure.hooks
public interface ExecuteWithHookContext extends Hook {
  /**
   * @param hookContext
   *          The hook context passed to each hook.
   * @throws Exception
   */
  void run(HookContext hookContext) throws Exception;
}
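Because one interface serves all three properties, a single class can be registered as a pre, post and failure hook at the same time. The sketch below shows that idea with stand-ins so that it runs without Hive: Context, Stage, ExecHook and execute() are invented for this example and are not the Hive HookContext API. In particular, the Stage enum is only an assumption about how a hook could tell the three situations apart.

```java
import java.util.ArrayList;
import java.util.List;

class ExecHookSketch {
    /** Hypothetical marker for the point at which the hook fired. */
    enum Stage { PRE, POST, FAILURE }

    /** Stand-in for the hook context, reduced to a stage and a query string. */
    static class Context {
        final Stage stage;
        final String query;

        Context(Stage stage, String query) {
            this.stage = stage;
            this.query = query;
        }
    }

    /** Stand-in for ExecuteWithHookContext: a single run() method. */
    interface ExecHook {
        void run(Context ctx) throws Exception;
    }

    /** One class that handles all three situations. */
    static class StageLogger implements ExecHook {
        final List<String> seen = new ArrayList<>();

        @Override public void run(Context ctx) {
            seen.add(ctx.stage + " " + ctx.query);
        }
    }

    /** Invokes the hook and converts a checked failure into an unchecked one. */
    static void fire(ExecHook hook, Stage stage, String query) {
        try {
            hook.run(new Context(stage, query));
        } catch (Exception e) {
            throw new IllegalStateException(stage + " hook failed", e);
        }
    }

    /** Fires PRE, runs the work, then fires POST on success or FAILURE on an exception. */
    static void execute(String query, Runnable work, ExecHook hook) {
        fire(hook, Stage.PRE, query);
        try {
            work.run();
        } catch (RuntimeException e) {
            fire(hook, Stage.FAILURE, query);
            throw e;
        }
        fire(hook, Stage.POST, query);
    }

    public static void main(String[] args) {
        StageLogger hook = new StageLogger();
        execute("SELECT 1", () -> { }, hook);
        try {
            execute("SELECT broken", () -> { throw new UnsupportedOperationException("boom"); }, hook);
        } catch (UnsupportedOperationException expected) {
            // the failure hook has already fired at this point
        }
        hook.seen.forEach(System.out::println);
    }
}
```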
42. How to fire hooks without physically executing a query
• The following statement has the effect of causing the pre/post execute hooks to fire.
ALTER TABLE table_name TOUCH [PARTITION partitionSpec];
43. MetaStore Event Listeners
Property                                Abstract class
hive.metastore.pre.event.listeners      MetaStorePreEventListener
hive.metastore.end.function.listeners   MetaStoreEndFunctionListener
hive.metastore.event.listeners          MetaStoreEventListener
package: org.apache.hadoop.hive.metastore
• I think these listeners look a lot like hooks.
• From a quick look, I couldn’t find any particular difference between listeners and hooks.
The only thing I found is that listeners can’t affect query processing. They can only read.
• Anyway, they look useful for being notified when a metastore does something.
44. MetaStoreEventListener
• The following callbacks are invoked when the corresponding event occurs on a
metastore.
- onCreateTable
- onDropTable
- onAlterTable
- onDropPartition
- onAlterPartition
- onCreateDatabase
- onDropDatabase
- onLoadPartitionDone
If you need more details, see org.apache.hadoop.hive.metastore.MetaStoreEventListener.
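The callback list above follows the usual observer pattern: the base class supplies empty callbacks and a subclass overrides only the events it cares about. Here is a minimal sketch of that pattern. EventListener, AuditListener and ToyMetastore are invented for this example, and plain strings stand in for the arguments that the real Hive callbacks receive.

```java
import java.util.ArrayList;
import java.util.List;

class ListenerSketch {
    /** Stand-in for MetaStoreEventListener: no-op callbacks that subclasses override. */
    static abstract class EventListener {
        void onCreateTable(String table) { }
        void onDropTable(String table) { }
    }

    /** Keeps an audit trail of table events without changing them. */
    static class AuditListener extends EventListener {
        final List<String> events = new ArrayList<>();

        @Override void onCreateTable(String table) { events.add("created " + table); }
        @Override void onDropTable(String table) { events.add("dropped " + table); }
    }

    /** Toy metastore that notifies listeners after each change has been applied. */
    static class ToyMetastore {
        private final List<EventListener> listeners = new ArrayList<>();
        private final List<String> tables = new ArrayList<>();

        void register(EventListener listener) { listeners.add(listener); }

        void createTable(String name) {
            tables.add(name);
            for (EventListener listener : listeners) {
                listener.onCreateTable(name);
            }
        }

        void dropTable(String name) {
            if (tables.remove(name)) { // notify only if the table existed
                for (EventListener listener : listeners) {
                    listener.onDropTable(name);
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyMetastore metastore = new ToyMetastore();
        AuditListener audit = new AuditListener();
        metastore.register(audit);
        metastore.createTable("logs");
        metastore.dropTable("logs");
        audit.events.forEach(System.out::println);
    }
}
```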
45. Be careful!
• Hooks
- can be a critical failure point!
(you had better catch runtime exceptions)
- are performed synchronously.
- can affect query processing time.
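One way to act on these warnings is to wrap the body of a hook so that a runtime exception is logged instead of propagating, and so that the time spent inside the hook is visible. The sketch below shows the idea in plain Java. SafeHook and guarded() are names invented for this example, and swallowing the exception is a design choice that suits logging or auditing hooks, not hooks that are supposed to block a query.

```java
import java.util.function.Consumer;

class SafeHook {
    /** Wraps a hook body so a RuntimeException is logged instead of failing the caller. */
    static <C> Consumer<C> guarded(String name, Consumer<C> body) {
        return ctx -> {
            long start = System.nanoTime();
            try {
                body.accept(ctx);
            } catch (RuntimeException e) {
                System.err.println("hook " + name + " failed: " + e);
            } finally {
                // hooks run synchronously, so this time is added to the query
                long micros = (System.nanoTime() - start) / 1_000;
                System.err.println("hook " + name + " took " + micros + " us");
            }
        };
    }

    public static void main(String[] args) {
        Consumer<String> hook = guarded("audit", q -> {
            throw new IllegalStateException("audit store is down");
        });
        hook.accept("SELECT 1");
        System.out.println("query continues");
    }
}
```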
46. Let's try it out
• Demo
- Don’t be surprised if it doesn’t work.
- That’s the way the demo is...