4. Confidential and Proprietary to Daugherty Business Solutions
What is the Interest Group?
We are a community of Saint Louis-based data and technology professionals interested in Trino (formerly PrestoSQL), an ANSI SQL-compliant query engine that works with BI and analytics tools such as R, Tableau, Power BI, Superset, and others.
Our purpose is to explore the application of Trino in the data ecosystem.
What is our Purpose?
1. Explore the technology and its various uses in the modern organization’s tech stack
2. Discuss the innovative things happening in the industry
3. Stay up to date with the forces shaping our digital world
4. All the while, create an environment primed for networking across industries and business functions
What are our Goals?
1. Connect with current Trino users
2. Connect with people interested in becoming Trino users
3. Connect with people interested in simply learning about Trino
4. Create a network of Trino users
How will we Accomplish These Goals?
1. Get to Know Trino
2. Trino on Ice
3. Hands-on workshop
4. Success stories
Daugherty Business Solutions
For nearly 40 years, Daugherty has been a go-to partner and preferred strategic advisor to large corporations. Our practical and flexible approach, paired with proprietary tools, techniques, and custom frameworks, allows us to tailor solutions that help our customers achieve better business outcomes.
Starburst
Starburst provides a modern solution to the problems of data silos and slow data access. Starburst helps companies harness the value of open-source Trino, which it describes as the fastest distributed analytics engine available today, by adding the connectors, security, and 24×7 support needed for fast data access at scale.
Dan Acheson, Senior Consultant
Dan is an information technology and business expert with an MBA focused on Business Intelligence from Webster University in Saint Louis. He is a Google Cloud certified Associate Cloud Engineer. Dan is passionate about working in the space where data and business meet.
• Business Analysis and Strategy
• Financial Analysis
• Econometrics
• Implementation
• Dashboard Development
• ETL
• Team Leadership
Matthew Boyett, Senior Consultant
An exceptional analyst with experience in data analytics and business-IT communication: proficient with databases, capable in design, especially strong at communicating data details to business colleagues, meticulous in analyzing data and code, articulate in navigating challenges, and well equipped to provide solutions.
• Data and business analyst (BA) roles (16 years)
• Senior Consultant
• Daugherty Business Solutions (DBS), 6 years
• Bayer Crop Science data governance
• Father of 2
History of Trino
• 2008: Facebook makes Hive public.
• 2012: Presto is conceived. Hive at Facebook is reaching its limits: 250 PB of data, thousands of queries, and hundreds of daily users.
• 2013: Presto is released as open source; Netflix, LinkedIn, and Treasure Data begin using it.
• 2015: Teradata contributes 20 engineers for security and integration work; Amazon adds Presto to AWS Elastic MapReduce (EMR).
• 2016: Amazon creates Athena from Presto.
• 2017: Starburst is founded as a company dedicated to the success of Presto.
• 2019: The Presto Software Foundation is started around the PrestoSQL codebase, with the goal of keeping the project collaborative and independent.
• 2020: PrestoSQL is renamed Trino to distinguish it from PrestoDB, and the mascot Commander Bun Bun is born.
• 2024: The first Trino User Group in St. Louis.
Trino and Hadoop Hive
• Trino speeds up queries over data managed by Hive
• Hive gives an SQL-like interface to query data stored in the various databases and file systems that integrate with Hadoop
• Hive traditionally executes those queries as MapReduce batch jobs
• Trino accepts similar SQL queries, but it does not need to write intermediate results to disk to achieve fault tolerance
• Trino and its Hive connector do not use the Hive runtime. Trino is a high-performance replacement for it, suited to interactive queries, and reads the data files directly instead of going through Hive's execution engine.
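To make the disk-write contrast concrete, here is a toy Python analogy; it is not how Trino or MapReduce are actually implemented. The staged version writes each stage's intermediate output to a temporary file before the next stage reads it back, the way MapReduce persists intermediate data, while the pipelined version streams rows from stage to stage through a generator without touching disk. The function names and sample order rows are hypothetical.

```python
import json
import os
import tempfile

# Toy rows standing in for data read from storage (hypothetical sample).
ORDERS = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 25},
    {"region": "east", "amount": 5},
    {"region": "west", "amount": -1},  # bad row, filtered out by stage 1
]

def staged_total_by_region(rows):
    """MapReduce-like: stage 1 output is written to disk, then re-read by stage 2."""
    with tempfile.TemporaryDirectory() as tmp:
        intermediate = os.path.join(tmp, "stage1.json")
        # Stage 1: filter, then persist the intermediate result.
        with open(intermediate, "w") as f:
            json.dump([r for r in rows if r["amount"] > 0], f)
        # Stage 2: read the intermediate result back and aggregate it.
        with open(intermediate) as f:
            filtered = json.load(f)
        totals = {}
        for r in filtered:
            totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
        return totals

def pipelined_total_by_region(rows):
    """Pipelined: rows stream from the filter stage to the aggregate stage in memory."""
    filtered = (r for r in rows if r["amount"] > 0)  # lazy generator, nothing written to disk
    totals = {}
    for r in filtered:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals

print(staged_total_by_region(ORDERS))     # {'east': 15, 'west': 25}
print(pipelined_total_by_region(ORDERS))  # {'east': 15, 'west': 25}
```

Both produce the same answer. The staged version could restart stage 2 from the saved file if it failed, which is the fault-tolerance benefit of writing to disk, paid for with extra I/O and latency.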
MapReduce and Distributed Compute
• Perhaps the most common use case of Trino is using the Hive connector to read data from distributed storage such as HDFS or cloud storage.
• The Hive connector lets Trino connect to HDFS or cloud object storage. It uses the table metadata in the Hive Metastore Service (HMS) to locate the data, then queries and processes the stored files directly.
• Intermediate results do not have to be written to disk
• This reduces latency and increases performance
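A toy Python analogy of that division of labor: a dictionary plays the role of the metastore, mapping a table name to its storage location and file format, and the reader uses that metadata to open the data files directly, with no Hive runtime involved. The metastore layout, the table name `web.visits`, and the CSV files are hypothetical; real HMS entries carry much more (schemas, partitions, serialization details), and Trino commonly reads columnar formats such as ORC and Parquet.

```python
import csv
import os
import tempfile

def read_table(metastore, table):
    """Look up a table's location in the toy metastore, then read its files directly."""
    entry = metastore[table]
    if entry["format"] != "csv":
        raise ValueError("this sketch only reads CSV files")
    rows = []
    # Each file in the table's directory holds part of the table's data.
    for name in sorted(os.listdir(entry["location"])):
        with open(os.path.join(entry["location"], name), newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows

with tempfile.TemporaryDirectory() as storage:
    # A throwaway "distributed storage" directory holding two data files.
    for name, body in [("part-0.csv", "id,city\n1,St. Louis\n"),
                       ("part-1.csv", "id,city\n2,Chicago\n3,Denver\n")]:
        with open(os.path.join(storage, name), "w", newline="") as f:
            f.write(body)

    # Hypothetical metastore entry: table name -> location and format.
    metastore = {"web.visits": {"location": storage, "format": "csv"}}
    rows = read_table(metastore, "web.visits")

print([r["city"] for r in rows])  # ['St. Louis', 'Chicago', 'Denver']
```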
What is Trino?
Open-Source Model
Trino is open source and is overseen by the Trino Software Foundation, which supports a community of contributors who continually research and improve the technology and publish their work. This model fosters continuous innovation and helps drive adoption of the platform.
Use Case Example
Web App
• Trino can live inside an application as its data processing engine
• Nodes can be scaled to suit the client's requests and workload
• Trino is easily containerized and deployed as part of an application
Use Case Example
E-commerce website + Analytics
• Trino is integrated as the central query engine that reaches all of the disparate data sources.
• Unified analytics: analysts can use Trino to query every source without needing a separate interface for each.
• The data is stored in multiple systems, such as MySQL for transactional data and MongoDB for user profiles.
• Apache Kafka or Cloud Pub/Sub serves as the messaging bus that coordinates order receipts, inventory updates, and shipments.
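In Trino, every table is addressed as `catalog.schema.table`, so one query can join, for example, `mysql.shop.orders` with `mongodb.users.profiles`, assuming catalogs named `mysql` and `mongodb` have been configured (those names are hypothetical). The runnable sketch below only imitates that idea: SQLite's `ATTACH DATABASE` gives two separate in-memory databases their own prefixes, standing in for the two catalogs. It is an analogy, not Trino; the table layouts and sample rows are made up.

```python
import sqlite3

# Analogy only: each attached SQLite database stands in for a separate Trino catalog.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS shop")      # stands in for the MySQL source
conn.execute("ATTACH DATABASE ':memory:' AS profiles")  # stands in for the MongoDB source

conn.execute("CREATE TABLE shop.orders (order_id INTEGER, user_id INTEGER, total REAL)")
conn.executemany("INSERT INTO shop.orders VALUES (?, ?, ?)",
                 [(1, 10, 25.0), (2, 11, 55.0), (3, 10, 15.0)])
conn.execute("CREATE TABLE profiles.users (user_id INTEGER, region TEXT)")
conn.executemany("INSERT INTO profiles.users VALUES (?, ?)",
                 [(10, "Midwest"), (11, "West")])

# One SQL statement spans both "sources", the way a Trino query spans catalogs.
FEDERATED_SQL = """
SELECT u.region, SUM(o.total) AS revenue
FROM shop.orders AS o
JOIN profiles.users AS u ON o.user_id = u.user_id
GROUP BY u.region
ORDER BY u.region
"""
rows = conn.execute(FEDERATED_SQL).fetchall()
print(rows)  # [('Midwest', 40.0), ('West', 55.0)]
```

The difference in Trino is that each catalog is backed by a connector talking to a different system, so analysts get this single-query view without copying the data into one database first.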
Demo
• Deploying Trino via Docker
• Querying data with SQL from the command-line interface
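The two demo steps correspond to the Trino Docker quick start: start the `trinodb/trino` image, then run the Trino CLI that ships inside the container. The sketch below just assembles and prints those commands (a dry run) so they can be reviewed; it executes them only if `dry_run=False` and Docker is installed. The container name `trino` and the sample query against the bundled `tpch` catalog are choices made for this sketch.

```python
import shlex
import subprocess

CONTAINER = "trino"  # container name chosen for this sketch
START_SERVER = ["docker", "run", "--name", CONTAINER, "-d", "-p", "8080:8080", "trinodb/trino"]
SAMPLE_QUERY = "SELECT count(*) FROM tpch.sf1.nation"
# Run the CLI inside the container non-interactively with --execute.
RUN_QUERY = ["docker", "exec", CONTAINER, "trino", "--execute", SAMPLE_QUERY]

def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False (requires Docker)."""
    print("$", shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)

for cmd in (START_SERVER, RUN_QUERY):
    run(cmd)
```

The server needs a little time to start before the CLI can connect. For an interactive prompt instead of a one-off query, `docker exec -it trino trino` opens the CLI; the TPC-H `nation` table queried above has 25 rows.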
Limitations
• Bare-bones SQL engine
• You must manage logging and monitoring
• You must manage security
• You must create any new connections yourself
• You must scale nodes and provision compute based on your needs; Trino is not a fully managed solution
Starburst
• Provides a fully managed wrapper for Trino
• Abstracts away much of the work DevOps teams would otherwise do to provision the compute and load balancing needed to manage a cluster
• Fast access to siloed data at scale
• Provides fine-grained access control
• Follows a role-based access control (RBAC) model built on:
• Groups
• Roles
• Privileges
• Objects
• Reported to be up to 10x faster than Hive and Impala
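The four RBAC building blocks listed above fit together as a chain: users belong to groups, groups are granted roles, and roles hold privileges on objects. The toy Python model below shows that lookup chain. It is a hypothetical illustration of RBAC in general, not Starburst's actual API or configuration format; the user, group, role, and table names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class AccessModel:
    """Toy RBAC: user -> groups -> roles -> (privilege, object) grants."""
    user_groups: Dict[str, Set[str]] = field(default_factory=dict)
    group_roles: Dict[str, Set[str]] = field(default_factory=dict)
    role_privileges: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def is_allowed(self, user: str, privilege: str, obj: str) -> bool:
        # Walk every group the user is in, and every role each group holds.
        for group in self.user_groups.get(user, set()):
            for role in self.group_roles.get(group, set()):
                if (privilege, obj) in self.role_privileges.get(role, set()):
                    return True
        return False  # deny by default

model = AccessModel(
    user_groups={"ana": {"analysts"}, "raj": {"engineers"}},
    group_roles={"analysts": {"sales_reader"}, "engineers": {"sales_reader", "sales_writer"}},
    role_privileges={
        "sales_reader": {("SELECT", "hive.sales.orders")},
        "sales_writer": {("INSERT", "hive.sales.orders")},
    },
)
print(model.is_allowed("ana", "SELECT", "hive.sales.orders"))  # True
print(model.is_allowed("ana", "INSERT", "hive.sales.orders"))  # False
print(model.is_allowed("raj", "INSERT", "hive.sales.orders"))  # True
```

Granting privileges to roles rather than to individual users is what keeps fine-grained control manageable: adding someone to a group gives them exactly that group's access.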
What is Trino?
• Not a database
• Distributed query processing engine
• Open source
• Supports many data types
• Semi-structured
• Structured
• Blob
• ANSI SQL
• Enables more complex ETL
• MapReduce
• Uses coordinator (manager) and worker nodes for massively parallel processing (MPP) of requests
• Highly scalable
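The coordinator/worker split behind MPP can be sketched in a few lines: the coordinator (what the slide calls the manager) divides the input into splits, each worker computes a partial aggregate over its own split in parallel, and the coordinator merges the partial results. In this toy Python version, threads stand in for worker nodes and the function names are hypothetical; real Trino schedules splits across separate machines and pipelines data between query stages.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def make_splits(rows, n):
    """Coordinator: divide the input into n splits (round-robin)."""
    return [rows[i::n] for i in range(n)]

def partial_count(split):
    """Worker: count keys within its own split only."""
    return Counter(split)

def count_by_key(rows, workers=3):
    splits = make_splits(rows, workers)
    # Threads stand in for worker nodes; each processes one split concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_count, splits))
    # Coordinator: merge the partial aggregates into the final result.
    total = Counter()
    for p in partials:
        total.update(p)
    return dict(total)

states = ["MO", "IL", "MO", "KS", "IL", "MO"]
print(sorted(count_by_key(states).items()))  # [('IL', 2), ('KS', 1), ('MO', 3)]
```

Because each worker only sends back a small partial count instead of its raw rows, adding workers spreads the scanning work while the final merge stays cheap; that is the property that lets this design scale out.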
Thank you
• Thanks to Daugherty for the space
• Thanks to Starburst for the support
• Most of all, thank you for attending
What’s on the horizon? *
• Next event
– April 23rd
– Trino on Ice
– Guest from Starburst
• Trino Community Broadcast
– Thursday, Feb 22, 2024
– Thursday, Mar 14, 2024
– Watch the YouTube channel at the time of the event to see the show live: https://www.youtube.com/c/TrinoDB
– Podcast: Trino Community Broadcast (an alternative to YouTube)
* Find these at https://trino.io under “Community,” then “Events.”