1. David Gloyn-Cox : @dreffed : https://ca.linkedin.com/in/dreffed : edited date: 2015-09-10
Google BigQuery
UDFs
User Defined Functions
What is it
Google BigQuery’s UDFs[1] are:
● JavaScript-based functions
● A way to extend a query with “map”-like functionality (MapReduce[2])
● Able to return a schema different from the input schema
● Run against every row in the input dataset
See the presentation[3] by Thomas Park[4] & Felipe Hoffa[5]
[1] https://cloud.google.com/bigquery/user-defined-functions
[2] https://en.wikipedia.org/wiki/MapReduce
[3] http://www.slideshare.net/BigDataSpain/thomas-park-hands-on-with-big-query-javascript-udfs-bigdataspain-2014
[4] no link found
[5] https://twitter.com/felipehoffa
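The "map"-like behaviour can be tried outside BigQuery with a few lines of plain JavaScript. This is only a sketch: runUdf is a hypothetical local harness (not part of BigQuery), and wordLength and its sample rows are made up for illustration. It shows a function being run against every input row and emitting records whose fields differ from the input.

```javascript
// Hypothetical local harness (not a BigQuery API): applies a UDF-style
// function to every row and collects whatever it passes to emit().
function runUdf(rows, udf) {
  const out = [];
  rows.forEach(function (row) {
    udf(row, function emit(record) {
      out.push(record);
    });
  });
  return out;
}

// Made-up UDF: the input rows have {word}; the output rows have
// {word, length}, so the output schema differs from the input schema.
function wordLength(row, emit) {
  emit({word: row.word, length: row.word.length});
}

const mapped = runUdf([{word: 'big'}, {word: 'query'}], wordLength);
console.log(mapped);
```

Each input row is handled independently of the others, which is what makes the step "map"-like.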
Where can I find this UDF?
[Screenshot with a callout: "Click here for UDFs"]
Parts of a UDF
A UDF has 3 parts:
1. Emitter Function
○ A function that accepts a row and an emit function, and runs the helper function(s) on parts of the row
2. Helper Function
○ Defines a working function that can be called from the emitter function. Requires robust error handling.
3. Registration Function
○ Links the emitter function, the input / output schemas and the labels together
Emitter Function
function urlDecode(row, emit) {
  emit({title: functionHelper(row.title), requests: row.num_requests});
}
● urlDecode: function name
● row: defined by the input column schema
● emit: takes an object defined by the output schema
● functionHelper: helper function(s)
● row.title: element from the row, passed as input to the helper function
Helper Function
function functionHelper(s /* , additional parameters if needed */) {
  try {
    // additional statements if needed
    return decodeURI(s);
  } catch (ex) {
    // return a term on failure too
    return s;
  }
}
● Function name: must include one term (parameter); additional parameters can be added if needed.
● Error handling: very important.
● Body: additional statements can be added if needed.
● Return a single term.
● Return a term on failure too!
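To see why the error handling matters, the helper can be exercised in any JavaScript console. The sketch below uses the single-parameter form of functionHelper from this slide; the two input strings are made-up examples, one well formed and one deliberately truncated.

```javascript
// The helper from the slide, in its single-parameter form.
function functionHelper(s) {
  try {
    return decodeURI(s);
  } catch (ex) {
    // decodeURI throws URIError on malformed escapes; fall back to the input
    return s;
  }
}

// Made-up inputs for illustration.
const decoded = functionHelper('Fran%C3%A7ais'); // well-formed escape sequence
const fallback = functionHelper('%E7');          // truncated sequence, decodeURI throws
console.log(decoded, fallback);
```

Without the try/catch, the second call would raise a URIError instead of returning a value. Note also that decodeURI leaves escapes for reserved characters such as %2F untouched; decodeURIComponent decodes those as well, if that is what the data needs.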
How do I test my code...
Google has provided a beautiful page to test out your functions:
http://storage.googleapis.com/bigquery-udf-test-tool/testtool.html
This allows you to enter data (JSON) and functions (emitter and helper) and see their output.
Silly example...
Input Data:
[{a: "Enter your query data", b:"a", c:"_"},
{a: "here, as JSON,", b:",", c:"*"},
{a: "and click 'Evaluate' to run your function over the data rows!", b:"!", c:"^"}]
User Defined Function:
function(r, emit) {
  emit({len: r.a.length, fc: replaceHelper(r.a, r.b, r.c)});
}
function replaceHelper(s, t, r) {
  try {
    return s.replace(t, r);
  } catch (ex) {
    return s;
  }
}
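The same example can also be run locally as a rough check before pasting it into the test tool. This is a sketch under assumptions: the anonymous emitter is given a hypothetical name (sillyUdf) so it is valid as a standalone statement, and emit is replaced by a plain callback that collects the records.

```javascript
// Input data from the slide.
const rows = [
  {a: 'Enter your query data', b: 'a', c: '_'},
  {a: 'here, as JSON,', b: ',', c: '*'},
  {a: "and click 'Evaluate' to run your function over the data rows!", b: '!', c: '^'}
];

function replaceHelper(s, t, r) {
  try {
    return s.replace(t, r);
  } catch (ex) {
    return s;
  }
}

// sillyUdf is a hypothetical name; the slide uses an anonymous function.
function sillyUdf(r, emit) {
  emit({len: r.a.length, fc: replaceHelper(r.a, r.b, r.c)});
}

// Stand-in for BigQuery's emit: collect each emitted record.
const results = [];
rows.forEach(function (row) {
  sillyUdf(row, function (record) {
    results.push(record);
  });
});
console.log(results);
```

Because the search term is passed as a string, replace changes only the first occurrence: the first row yields "Enter your query d_ta" and the second "here* as JSON,".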
What about the Registration Function?
Once the Emitter and Helper Functions have been tested, it is now time to apply them to an actual query...
bigquery.defineFunction(
  'urlDecode',                       // Verb used to call the function from SQL
  ['title', 'num_requests'],         // Input column names
  [{name: 'title', type: 'string'},  // JSON output schema
   {name: 'requests', type: 'integer'}],
  urlDecode                          // Internal function reference
);
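Outside BigQuery there is no bigquery object, so the registration call cannot be run as is. One way to rehearse it locally is a small stand-in, sketched below. The bigquery stub, registry and emittedFieldsMatchSchema are all hypothetical names invented for this illustration; only the defineFunction argument order is taken from the slide. The check confirms that the emitter produces exactly the fields named in the output schema.

```javascript
// Hypothetical stand-in for the bigquery object, so this file runs locally.
const registry = {};
const bigquery = {
  defineFunction: function (name, inputColumns, outputSchema, fn) {
    registry[name] = {inputColumns: inputColumns, outputSchema: outputSchema, fn: fn};
  }
};

function functionHelper(s) {
  try {
    return decodeURI(s);
  } catch (ex) {
    return s;
  }
}

function urlDecode(row, emit) {
  emit({title: functionHelper(row.title), requests: row.num_requests});
}

bigquery.defineFunction(
  'urlDecode',
  ['title', 'num_requests'],
  [{name: 'title', type: 'string'},
   {name: 'requests', type: 'integer'}],
  urlDecode
);

// Hypothetical check: every emitted record has exactly the schema's field names.
function emittedFieldsMatchSchema(name, sampleRow) {
  const reg = registry[name];
  const expected = reg.outputSchema.map(function (f) { return f.name; }).sort();
  let ok = true;
  reg.fn(sampleRow, function (record) {
    const actual = Object.keys(record).sort();
    ok = ok && JSON.stringify(actual) === JSON.stringify(expected);
  });
  return ok;
}

// Made-up sample row using the registered input column names.
const schemaOk = emittedFieldsMatchSchema('urlDecode', {title: 'Fran%C3%A7ais', num_requests: 42});
console.log(schemaOk);
```

A mismatch between the emitted keys and the output schema (for example emitting num_requests instead of requests) would make the check return false.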
So how can I use it?
Enter the 3 UDF parts in the UDF Editor, then click the Query Editor and enter the query:
SELECT requests, title
FROM
  urlDecode(
    SELECT
      title, sum(requests) AS num_requests
    FROM
      [fh-bigquery:wikipedia.pagecounts_201504]
    WHERE language = 'fr'
    GROUP EACH BY title
  )
WHERE title LIKE '%ç%'
ORDER BY requests DESC
LIMIT 100
Break this down for me...
SELECT requests, title
FROM
  urlDecode(
    SELECT
      title, sum(requests) AS num_requests
    FROM
      [fh-bigquery:wikipedia.pagecounts_201504]
    WHERE language = 'fr'
    GROUP EACH BY title
  )
WHERE title LIKE '%ç%'
ORDER BY requests DESC
LIMIT 100
● requests, title (outer SELECT): field names defined in the output schema
● urlDecode: SQL label for the UDF
● title, sum(requests) AS num_requests: input columns
● Inner SELECT statement: the UDF is applied to each row returned by this query
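The order of operations in the query can be mimicked in plain JavaScript. This is a rough sketch, not how BigQuery executes it: the table rows are made-up sample data standing in for the pagecounts table, and the filter, grouping and sorting are hand-written equivalents of the SQL clauses.

```javascript
function functionHelper(s) {
  try {
    return decodeURI(s);
  } catch (ex) {
    return s;
  }
}

function urlDecode(row, emit) {
  emit({title: functionHelper(row.title), requests: row.num_requests});
}

// Made-up sample rows standing in for the pagecounts table.
const table = [
  {language: 'fr', title: 'Fran%C3%A7ais', requests: 5},
  {language: 'fr', title: 'Fran%C3%A7ais', requests: 7},
  {language: 'fr', title: 'Paris', requests: 20},
  {language: 'en', title: 'Gar%C3%A7on', requests: 9}
];

// Inner SELECT: WHERE language = 'fr', GROUP BY title, sum(requests) AS num_requests.
const sums = {};
table
  .filter(function (r) { return r.language === 'fr'; })
  .forEach(function (r) { sums[r.title] = (sums[r.title] || 0) + r.requests; });
const inputRows = Object.keys(sums).map(function (t) {
  return {title: t, num_requests: sums[t]};
});

// urlDecode(...): the UDF is applied to each row of the inner result.
const udfOutput = [];
inputRows.forEach(function (row) {
  urlDecode(row, function (record) { udfOutput.push(record); });
});

// Outer query: WHERE title LIKE '%ç%', ORDER BY requests DESC, LIMIT 100.
const answer = udfOutput
  .filter(function (r) { return r.title.indexOf('ç') !== -1; })
  .sort(function (a, b) { return b.requests - a.requests; })
  .slice(0, 100);
console.log(answer);
```

The outer filter sees the decoded titles, so it can match 'ç' even though the stored titles only contain the escape %C3%A7.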
Questions:
Any questions? Please tweet me or +dreffed
Thank You