Tempto is a product test framework that lets developers write and execute tests for SQL databases running on Hadoop. Individual test requirements, such as data generation, copying generated data to HDFS, and schema creation, are expressed declaratively and fulfilled automatically by the framework. Developers can write tests in Java (using a TestNG-like paradigm and AssertJ-style assertions) or by providing query files with expected results. We will show how we use it for Presto product tests.
Benchto is a benchmark framework that provides an easy and manageable way to define, run and analyze macro benchmarks in a clustered environment. Understanding the behavior of distributed systems is hard and requires good visibility into the state of the cluster and the internals of the tested system. This project was developed for repeatable benchmarking of Hadoop SQL engines, most importantly Presto.
4. What is Tempto?
- End-to-end product testing framework
- Targeted to software engineers
- For automation
- Tests easy to define
- Focus on test code
- Focus on database systems
- So far used for testing
  - Presto
  - internal projects
5. How is a test defined?
- Java
- SQL convention based
6. Example - Java based test
public class SimpleQueryTest extends ProductTest {

    // Declares what the test needs; the framework fulfills it before the test runs.
    private static class SimpleTestRequirements implements RequirementsProvider {
        public Requirement getRequirements(Configuration config) {
            // Requires the NATION table to be available in Hive.
            return new ImmutableHiveTableRequirement(NATION);
        }
    }

    @Inject
    Configuration configuration;

    @Test(groups = {"smoke", "query"})
    @Requires(SimpleTestRequirements.class)
    public void selectCountFromNation()
    {
        // The query returns a single row whose only value is the row count, 25.
        assertThat(query("select count(*) from nation"))
                .hasRowsCount(1)
                .hasRows(row(25));
    }
}
7. Example - Convention based test
allRows.sql:
-- database: hive; tables: sample_table
SELECT * FROM sample_table
allRows.result:
-- delimiter: |; ignoreOrder: false; types: BIGINT,VARCHAR
1|A|
2|B|
3|C|
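To make the result-file convention concrete, here is a minimal sketch of how such a file could be read: the first line carries `key: value` properties separated by `;`, and each following line is one row split on the declared delimiter. The class and method names (`ResultFileSketch`, `parseHeader`, `parseRow`) are hypothetical and this is not Tempto's own parser; it also assumes the delimiter is never `;` or `:`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: hypothetical names, not Tempto's actual implementation.
class ResultFileSketch {

    // Parses "-- key: value; key: value" into an ordered property map.
    static Map<String, String> parseHeader(String headerLine) {
        Map<String, String> properties = new LinkedHashMap<>();
        String body = headerLine.replaceFirst("^--\\s*", "");
        for (String entry : body.split(";")) {
            String[] keyValue = entry.split(":", 2);
            if (keyValue.length == 2) {
                properties.put(keyValue[0].trim(), keyValue[1].trim());
            }
        }
        return properties;
    }

    // Splits one data line on the delimiter. String.split drops trailing
    // empty fields, so the delimiter that ends each row adds no extra column.
    static List<String> parseRow(String line, String delimiter) {
        return Arrays.asList(line.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "-- delimiter: |; ignoreOrder: false; types: BIGINT,VARCHAR",
                "1|A|",
                "2|B|",
                "3|C|");
        Map<String, String> header = parseHeader(lines.get(0));
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            rows.add(parseRow(line, header.get("delimiter")));
        }
        System.out.println(header);
        System.out.println(rows);
    }
}
```

A real comparison step would additionally convert each field to the type listed under `types` and honor `ignoreOrder` when matching rows against the query output.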
14. Executable runner
$ java -jar target/presto-product-tests-0.120-SNAPSHOT-executable.jar --help
usage: Presto product tests
    --config-local <arg>      URI to Test local configuration YAML file.
    --report-dir <arg>        Test reports directory
    --groups <arg>            Test groups to be run
    --excluded-groups <arg>   Test groups to be excluded
    --tests <arg>             Test patterns to be included
 -h,--help                    Shows help message
- All dependencies embedded
- User provides cluster details through a YAML config
17. Goals
- Easy and manageable way to define benchmarks
- Run and analyze macro benchmarks in a clustered environment
- Repeatable benchmarking of Hadoop SQL engines, most importantly Presto
  - also used for Hive, Teradata components
- Transparent, trusted framework for benchmarking
28. Defining benchmarks - descriptor
- The descriptor is a YAML configuration file with various properties and user-defined variables
$ cat benchmarks/presto/concurrency.yaml
datasource: presto
query-names: presto/linear-scan/selectivity-${selectivity}.sql
schema: tpch_100gb_orc
database: hive
concurrency: ${concurrency_level}
runs: ${concurrency_level}
prewarm-runs: 3
before-benchmark: drop-caches
variables:
  1:
    selectivity: 10, 100
    concurrency_level: 10
  2:
    selectivity: 10, 100
    concurrency_level: 20
  3:
    selectivity: 10, 100
    concurrency_level: 50
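As a rough illustration of how the `variables` section could turn into concrete benchmark runs, the sketch below assumes that each numbered variable set is expanded independently and that a comma-separated value such as `10, 100` yields one run per value (a cartesian product across keys). That expansion rule is an assumption made for this example, and `VariableExpansionSketch` and `expand` are hypothetical names, not Benchto's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names and an assumed expansion rule.
class VariableExpansionSketch {

    // Expands one variable set into every combination of its comma-separated values.
    static List<Map<String, String>> expand(Map<String, String> variableSet) {
        List<Map<String, String>> combinations = new ArrayList<>();
        combinations.add(new LinkedHashMap<>());
        for (Map.Entry<String, String> variable : variableSet.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : combinations) {
                for (String value : variable.getValue().split(",")) {
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(variable.getKey(), value.trim());
                    next.add(extended);
                }
            }
            combinations = next;
        }
        return combinations;
    }

    public static void main(String[] args) {
        // The three variable sets from concurrency.yaml, written out by hand.
        String[] concurrencyLevels = {"10", "20", "50"};
        for (String level : concurrencyLevels) {
            Map<String, String> variableSet = new LinkedHashMap<>();
            variableSet.put("selectivity", "10, 100");
            variableSet.put("concurrency_level", level);
            for (Map<String, String> run : expand(variableSet)) {
                System.out.println(run);
            }
        }
    }
}
```

Under this assumption the descriptor above describes six combinations: two selectivity values for each of the three concurrency levels.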
29. Defining benchmarks - SQL file templating
- SQL files can use keys defined in the YAML configuration file; templates are based on FreeMarker
$ cat sql/presto/tpch/q14.sql
SELECT 100.00 * sum(CASE
                    WHEN p.type LIKE 'PROMO%'
                      THEN l.extendedprice * (1 - l.discount)
                    ELSE 0
                    END) / sum(l.extendedprice * (1 - l.discount)) AS promo_revenue
FROM
  "${database}"."${schema}"."lineitem" AS l,
  "${database}"."${schema}"."part" AS p
WHERE
  l.partkey = p.partkey
  AND l.shipdate >= DATE '1995-09-01'
  AND l.shipdate < DATE '1995-09-01' + INTERVAL '1' MONTH
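The effect of the templating on a query like this can be sketched with plain `${key}` substitution. The example below deliberately uses a regular expression from the Java standard library as a stand-in for FreeMarker, so it covers only simple placeholders and none of FreeMarker's directives or expressions; `TemplateSketch` and `render` are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a regex stand-in for FreeMarker, with hypothetical names.
class TemplateSketch {

    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Replaces every ${key} with its value; fails fast on an undefined key.
    static String render(String template, Map<String, String> variables) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer rendered = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            if (!variables.containsKey(key)) {
                throw new IllegalArgumentException("No value for ${" + key + "}");
            }
            // quoteReplacement keeps '$' and '\' in values from being treated as group references.
            matcher.appendReplacement(rendered, Matcher.quoteReplacement(variables.get(key)));
        }
        matcher.appendTail(rendered);
        return rendered.toString();
    }

    public static void main(String[] args) {
        // Keys taken from the descriptor shown earlier (database, schema).
        Map<String, String> variables = new LinkedHashMap<>();
        variables.put("database", "hive");
        variables.put("schema", "tpch_100gb_orc");
        String template = "SELECT count(*) FROM \"${database}\".\"${schema}\".\"lineitem\"";
        System.out.println(render(template, variables));
    }
}
```

Failing on an undefined key, rather than leaving the placeholder in the query, is a design choice of this sketch: it surfaces a descriptor typo before the benchmark sends malformed SQL to the engine.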
30. Future work
- (Tempto) Support for complex concurrent test execution
- (Benchto) Automatic regression detection
- (Benchto) Customized dashboards (e.g. overall performance analysis)
- (Benchto) Hardware and configuration awareness
- (Benchto) More complex benchmarking scenarios
- (Benchto) Support for complex concurrency scenarios
- (Benchto) Scheduling mechanism
32. Benchto GUI
- Visualization of benchmark results
- Linking between tools (Grafana, Presto UI)
- Comparison of multiple benchmarks
33. Grafana monitoring
- We use a Grafana dashboard with Graphite
- Benchmark and execution life-cycle events are shown on dashboards
- Provides good visibility into the state of the cluster