Paco Nathan
bit.ly/pxnnews
@pacoid
“Hadoop and Beyond”
[Figure: conceptual map with nodes First Principles, Topologies, Languages, Modeling, Attention, Clusters, Algorithms, Trendlines, Organization, Architecture, Culture, Business, Personality, Learning Curves]
[Figure: example workflow with nodes labeled employee, quarterly sales, leads, Join, Count, PMML classifier, bonus allocation, Failure Traps]
Contents:
1. Conceptual Map
2. Design Patterns
First Principles
we are taught to think of computing resources
in terms of Von Neumann architecture
in other words, we characterize the computing
resources by CPU, RAM, I/O
First Principles
back in the day, all the tables required for a
given database could fit onto one computer,
with one memory space, and one file space
• okay, maybe the CPU was multi-core…
• okay, maybe RAM paged out to virtual memory…
• okay, maybe the disks were in a RAID config…
or there were extra caches, or separate busses, etc.
but essentially those were incremental extensions
to a Von Neumann architecture…
a machine created in his image, if you will
NB: credit should go to Eckert and Mauchly, inventors of the ENIAC
First Principles
a generation of computer scientists has been
taught to think “relational” – data on a DB server
RDBMS made sense, with their indexes, b-trees,
normal forms, etc.
Q: need to query bigger data?
A: simple, buy or lease a bigger DB server
however, that all changed…
some of the issues encountered in large-scale
data teams are, to put it politely, obscure
starting from first principles, let’s explore a
map of some important points to consider
Topologies
largely due to the rapid rise of machine data, circa late 1990s,
we use distributed systems
because the data won’t fit on one computer anymore
AMZN, EBAY, YHOO, GOOG leveraged horizontal scale-out,
based on commodity hardware
practices at LinkedIn, Apple, Facebook, Twitter, etc., followed
from those early successes
algorithmic modeling, applied to the aggregation of machine
data, allowed for Big Data to become monetized
a feedback loop evolved – refining aggregate social interactions
into data products, which in turn made web apps become
more intelligent
[Diagram, “Circa 2001: post- big e-commerce successes”: Web Apps, Middleware (servlets, models), customer transactions, RDBMS (SQL Query, result sets), Logs (event history), DW/ETL aggregation, dashboards, Algorithmic Modeling, recommenders + classifiers marked as “data products”; roles shown: Customers, Stakeholder, Product, Engineering, UX]
Topologies
Hadoop and other topologies arose from a need for fault-
tolerant workloads, leveraging horizontal scale-out based
on commodity hardware
because the data won’t fit on one computer anymore
a variety of Big Data technologies has since emerged,
which can be categorized in terms of topologies and
the CAP Theorem
Apache
Wikipedia
Hadoop, as a topology
components which implement MapReduce:
• name node / data node
• job tracker / task tracker
• submit queue
• task slots
• distributed cache
• HDFS
Some Other Topologies…
Spark (iterative/interactive)
Titan (graph database)
Redis (in-memory data grid)
Zookeeper (distributed metadata)
HBase (columnar data objects)
Cassandra (key-value store)
Storm (real-time streams)
ElasticSearch (search index)
MongoDB (document store)
Greenplum (MPP)
SciDB (array database)
CAP Theorem
“You can have at most two of these properties for any shared-data
  system… the choice of which feature to discard determines the
  nature of your system.” – Eric Brewer, 2000 (Inktomi/YHOO)
[Diagram: C = strong consistency, A = high availability, P = partition tolerance; also labeled: eventual consistency]
cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
julianbrowne.com/article/viewer/brewers-cap-theorem
Access → Frameworks → CAP Theorem Forfeits (x = forfeited)
access                               framework                   C A P
financial transactions               general ledger in RDBMS     C A x
ad-hoc queries                       RDS (hosted MySQL)          C A x
reporting, dashboards                like Pentaho                C A x
log rotation/persistence             like Riak, Cassandra        x x P
search indexes                       like ElasticSearch, Solr    x A P
static content, archives             S3 (durable storage)        x A P
key/value data objects               like HBase                  C x P
data prep, ETL, modeling at scale    like Hadoop/Cascading       C x P
graph queries                        like Titan                  C x P
[Diagram, “Circa 2013: clusters everywhere”, Use Cases Across Topologies: Web Apps, Mobile, etc. (transactions, content, social interactions); Log Events; History; RDBMS; In-Memory Data Grid; Hadoop, etc.; Cluster Scheduler; DW; Workflow (batch, near time); services; Data Products; Customers; Planner; dashboard metrics; business process; optimized capacity; taps; roles shown: Data Scientist (data science: discovery + modeling), App Dev (s/w dev), Ops, Domain Expert, Prod, Eng; introduced capability alongside existing SDLC; annotated “optimize topologies”]
In their own words…
Amazon
“Early Amazon: Splitting the website” – Greg Linden
glinden.blogspot.com/2006/02/early-amazon-splitting-website.html
eBay
“The eBay Architecture” – Randy Shoup, Dan Pritchett
addsimplicity.com/adding_simplicity_an_engi/2006/11/you_scaled_your.html
addsimplicity.com.nyud.net:8080/downloads/eBaySDForum2006-11-29.pdf
Inktomi (YHOO Search)
“Inktomi’s Wild Ride” – Eric Brewer (0:05:31 ff)
youtu.be/E91oEn1bnXM
Google
“Underneath the Covers at Google” – Jeff Dean (0:06:54 ff)
youtu.be/qsan-GQaeyk
perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx
MIT Media Lab
“Social Information Filtering for Music Recommendation” – Pattie Maes
pubs.media.mit.edu/pubs/papers/32paper.ps
ted.com/speakers/pattie_maes.html
Modeling
back in the day, we worked with practices based on
data modeling
1. sample the data
2. fit the sample to a known distribution
3. ignore the rest of the data
4. infer, based on that fitted distribution
that served well with ONE computer, ONE analyst,
ONE model… just throw away annoying “extra” data
circa late 1990s: machine data, aggregation, clusters, etc.
algorithmic modeling displaced data modeling
because the data won’t fit on one computer anymore
Two Cultures
“A new research community using these tools sprang up. Their goal
was predictive accuracy. The community consisted of young computer
scientists, physicists and engineers plus a few aging statisticians.
They began using the new tools in working on complex prediction
problems where it was obvious that data models were not applicable:
speech recognition, image recognition, nonlinear time series prediction,
handwriting recognition, prediction in financial markets.”
Statistical Modeling: The Two Cultures
Leo Breiman, 2001
bit.ly/eUTh9L
in other words, seeing the forest for the trees…
this paper chronicled a sea change from data modeling practices
(silos, manual process) to the rising use of algorithmic modeling
(machine data for automation/optimization)
Algorithmic Modeling
“The trick to being a scientist is to be open to using
a wide variety of tools.” – Breiman
circa 2001: Random Forest, bootstrap aggregation, etc.,
yield dramatic increases in predictive power over earlier
modeling such as Logistic Regression
major learnings from the Netflix Prize: the power of
ensembles, model chaining, etc.
the problems at hand have become simply too big and too
complex for ONE distribution, ONE model, ONE team…
stanford.edu/~lmackey/papers/netflix_story-nas11-slides.pdf
Attention
impromptu survey:
• how many people say they practice some kind of “Agile” process at work?
• how many people say that they DON’T practice “Agile” ?
• how many people say they are in a lean startup?
Q:
with respect to Big Data practices,
how is that working out?
Abby Fichtner vimeo.com/27797408
Agile Data?
some people see a reconciliation of Agile process and Big Data…
Agile Data
Russell Jurney, 2013
amazon.com/dp/1449326269
“Run like a studio, not an assembly line.”
Perhaps Not
great values, wrong domain…
that worked when we were building features in web apps
Agile represents industrialization of software engineering,
codifying social interactions, compartmentalizing attention
meanwhile, Data Science is inherently multi-disciplinary:
• teams of people with complementary skill sets
• actionable insights require weeks/months, not hours
• variance and statistical thinking are foreign to CS
LinkedIn-style problems circa 2011 required certain skills…
manipulating the Newtonian physics of data… that money
may be mostly off the table by now
Big Data opportunities ahead require different math?
Business Disruption
Geoffrey Moore
Mohr Davidow Ventures, author of Crossing the Chasm / Hadoop Summit, 2012:
what Amazon did to the retail sector… has put the entire Global 1000
on notice over the next decade… data as the major force… mostly
through apps – verticals, leveraging domain expertise
Michael Stonebraker
INGRES, PostgreSQL, Vertica, VoltDB, Paradigm4, etc. / XLDB, 2012:
complex analytics workloads are now displacing SQL as the basis
for Enterprise apps
Larry Page
CEO, Google / Wired, 2013:
create products and services that are 10 times better than the
competition… thousand-percent improvement requires rethinking
problems entirely, exploring the edges of what’s technically possible,
and having a lot more fun in the process
Business Drivers
algorithmic modeling + machine data
+ curation, metadata + Open Data
data products, as feedback into automation
evolution of feedback loops
less about “bigness”, more about complexity
internet of things + A/D conversion
+ complex analytics
accelerated evolution, additional feedback loops
orders of magnitude higher data rates
Internet of Things accelerates this process of disruption
“A kind of Cambrian explosion”
source: National Geographic
Internet of Things
A Thought Exercise
consider that when a company like Caterpillar moves
into data science, they won’t be building the world’s
next search engine or social network
they will most likely be optimizing supply chain,
optimizing fuel costs, automating data feedback
loops integrated into their equipment…
that’s a $50B company,
in a market segment worth $250B
upcoming: tractors as drones –
guided by complex, distributed data apps
Operations Research –
crunching amazing amounts of data
Alternatively…
climate.com
Algorithms
many algorithm libraries used today are based on implementations
back when people used DO loops in FORTRAN, 30+ years ago
MapReduce is Good Enough?
Jimmy Lin, UMD
umiacs.umd.edu/~jimmylin/publications/Lin_BigData2013.pdf
astrophysics and genomics are light years ahead in sophisticated
algorithms work – as Breiman suggested in 2001 – which may take
a while to percolate into industry
other game-changers:
• streaming algorithms, sketches, probabilistic data structures
• significant “Big O” complexity reduction (e.g., skytree.net)
• better architectures and topologies (e.g., GPUs and CUDA)
• partial aggregates – parallelizing workflows
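To make the last bullet concrete, here is a minimal sketch of partial aggregates in Java: each partition is reduced to a small (sum, count) pair, and the pairs merge associatively, so partitions can be processed in parallel and in any order. The names `PartialAggregates`, `Partial`, `summarize`, and `combine` are hypothetical, chosen for illustration; they do not come from any library mentioned in these slides.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: computing a mean from per-partition partial aggregates.
class PartialAggregates {
    // A partial aggregate: enough state to merge, small enough to ship between tasks.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Merging is associative and commutative, so merge order does not matter.
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }

        double mean() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    // "Map side": reduce one partition to its partial aggregate.
    static Partial summarize(double[] partition) {
        double sum = 0.0;
        for (double x : partition) {
            sum += x;
        }
        return new Partial(sum, partition.length);
    }

    // "Reduce side": merge the partials from all partitions, here in parallel.
    static Partial combine(List<double[]> partitions) {
        return partitions.parallelStream()
                .map(PartialAggregates::summarize)
                .reduce(new Partial(0.0, 0), Partial::merge);
    }

    public static void main(String[] args) {
        List<double[]> partitions = Arrays.asList(
                new double[] {1.0, 2.0},
                new double[] {3.0},
                new double[] {4.0, 5.0, 6.0});
        System.out.println(combine(partitions).mean());
    }
}
```

The design point is that only the tiny `Partial` values cross partition boundaries, never the raw data.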
How much does it cost you to earn $1B?
also, take a moment to check this out…
(IMHO most interesting algorithm work recently)
QR factorization of a “tall-and-skinny” matrix
• used to solve many data problems at scale,
e.g., PCA, SVD, etc.
• numerically stable with efficient implementation
on large-scale Hadoop clusters
suppose that you have a sparse matrix of customer
interactions where there are 100MM customers,
with a limited set of outcomes…
cs.purdue.edu/homes/dgleich
stanford.edu/~arbenson
github.com/ccsevers/scalding-linalg
David Gleich, slideshare.net/dgleich
Tristan Jehan
distributed algorithms for high ROI
use cases on cost-effective clustered
resources…
we’re learning how to do it right
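As a sketch of why the tall-and-skinny case parallelizes well (standard linear algebra, not taken from these slides; the two-block split and the symbols $A_1, A_2, Q_i, R_i, \tilde{Q}$ are assumptions for illustration): split the rows of an $m \times n$ matrix $A$ with $m \gg n$ into blocks, factor each block independently, then factor the small stacked $R$ factors once more.

```latex
A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}
  = \begin{bmatrix} Q_1 R_1 \\ Q_2 R_2 \end{bmatrix}
  = \begin{bmatrix} Q_1 & 0 \\ 0 & Q_2 \end{bmatrix}
    \begin{bmatrix} R_1 \\ R_2 \end{bmatrix},
\qquad
\begin{bmatrix} R_1 \\ R_2 \end{bmatrix} = \tilde{Q} R
\;\Longrightarrow\;
A = \underbrace{\begin{bmatrix} Q_1 & 0 \\ 0 & Q_2 \end{bmatrix} \tilde{Q}}_{Q} \, R
```

Each $R_i$ is only $n \times n$, so the per-block factorizations can run as independent tasks and only small matrices move between them. Because $Q$ has orthonormal columns, the singular values of $A$ equal those of $R$, which is what makes the factorization useful for SVD and PCA at scale.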
Personality
we have perhaps built computers (once named “electronic
brains”) in the image of John von Neumann, et al.: standalone
genius, Aristotelian uber-geek, incredible capacity for memory
and logic, overbearing, not particularly cooperative…
one can almost imagine a war-time dialogue, “Get one of these
guys in the room, they’ll solve anything!” … as a result, decades
of mutually assured destruction for global strategy
Q:
have we created software engineering practices which selected for
this kind of personality? selecting for “lone wolf” guys, socially
awkward, ONE person who can understand an entire code base,
able to out-logic and out-argue the rest of the room… charming
fellow, really
have we enabled software process to box these personalities
into something resembling teams? along with overtly described
rules for social conventions… silos, in other words
Chasing Unicorns
silos… but didn’t that all change?
because the data won’t fit on one computer anymore
leverage with data science teams is where organizations
tear down internal silos, socializing hard problems
data won’t fit on one computer anymore, problems won’t
fit in one department anymore, the code base won’t fit
into one uber-geek’s memory recall anymore…
so we embrace distributed systems for solutions
Q:
“Why aren’t there more women in engineering?”
IMHO, we’re trying to select for a personality which
doesn’t exist, and would not resolve current challenges;
meanwhile, my data science teams run about 50/50
Clusters
a little secret: people like me make a good living by
leveraging high ROI apps based on clusters, and so
the execs agree to build out more data centers…
clusters for Hadoop/Hive/HBase, clusters for Memcached,
for Cassandra, for MySQL, for Storm, for Nginx, etc.
this becomes expensive!
a single class of workloads on a given cluster is simpler
to manage; but terrible for utilization
leveraging VMs and various notions of “cloud” helps
Cloudera, Hortonworks, probably EMC soon: sell a notion
of “Hadoop as OS”: All your workloads are belong to us
regardless of how architectures change, death and taxes
will endure: servers fail, and data must move
Google Data Center, Fox News
~2002
Operating Systems, redux
meanwhile, GOOG is 3+ generations ahead,
with much improved ROI on data centers
John Wilkes, et al.
Borg/Omega: “10x” secret sauce
youtu.be/0ZFMlO98Jkc
[Charts: CPU load (0–100%) over time t for Rails, Memcached, and Hadoop, each shown separately, followed by the combined CPU load (Rails, Memcached, Hadoop)]
Florian Leibert, Chronos/Mesos @ Airbnb
Mesos, open source cloud OS – like Borg
incubator.apache.org/mesos
Trendlines
Big Data? we’re just getting started:
• ~12 exabytes/day, jet turbines on commercial flights
• Google self-driving cars, ~1 Gb/s per vehicle
• National Instruments initiative: Big Analog Data™
• 1m resolution satellites skyboximaging.com
• open resource monitoring reddmetrics.com
• Sensing XChallenge nokiasensingxchallenge.org
consider the implications of Jawbone, Nike, etc.,
plus the secondary/tertiary effects of Google Glass
7+ billion people, instrumented better than … how we
have Nagios instrumenting our web servers right now
plus the business implications given that much of the
Global 1000 is positioned to be disrupted technologyreview.com/...
Three Laws, or more?
meanwhile, architectures evolve toward much, much larger data…
pistoncloud.com/ ...
Rich Freitas, IBM Research
Q:
what kinds of evolution in topologies could
this imply?
Languages
JVM-based languages became popular for Big Data open source
technologies:
• partly because YHOO adopted Hadoop, etc.
• partly because Enterprise IT shops have J2EE expertise
• partly because of functional languages: Clojure, Scala
JVM has its drawbacks, especially for low-latency use cases
ample use of languages such as Python and Erlang in Big Data
practices, plus keep in mind that Google uses C++
Functional Thinking
Neal Ford
youtu.be/plSZIkLodDM
a hunch: issues about current programming languages are
secondary to culture
Functional Programming for Big Data
WordCount with token scrubbing…
Apache Hive: 52 lines HQL + 8 lines Python (UDF)
compared to
Scalding: 18 lines Scala/Cascading
functional programming languages help reduce
software engineering costs at scale, over time
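For illustration only, here is a minimal sketch of the same task, word count with token scrubbing, written in a functional style with plain Java streams on a single machine. It is not the Hive or Scalding code that the line counts above refer to. The scrubbing rule (lowercase, then strip anything outside a-z and 0-9) and the names `WordCount`, `scrub`, and `count` are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical single-machine sketch of word count with token scrubbing.
class WordCount {
    // Scrub one token: lowercase it, then drop every character outside a-z and 0-9.
    static String scrub(String token) {
        return token.toLowerCase().replaceAll("[^a-z0-9]", "");
    }

    // Split lines on whitespace, scrub tokens, discard empties, count occurrences.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .map(WordCount::scrub)
                .filter(token -> !token.isEmpty())
                .collect(Collectors.groupingBy(
                        token -> token, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "A rain shadow is a dry area,",
                "on the lee side of a mountainous area.");
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

The pipeline reads as a chain of small, side-effect-free steps (split, scrub, filter, group), which is the property that carries over to distributed functional APIs.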
references…
“Scalable and Flexible Machine Learning With Scala @ LinkedIn”
Vitaly Gordon [ especially see slide #9 ]
slideshare.net/VitalyGordon/scalable-and-flexible-machine-learning-with-scala-linkedin
Elements Of Functional Programming
Chris Reade
amazon.com/dp/0201129159
Organization
How Do Committees Invent?
Melvin Conway, 1968
melconway.com/research/committees.html
Manu Cornet bonkersworld.net
“Any organization that designs a system
(defined more broadly here than just
information systems) will inevitably
produce a design whose structure is a
copy of the organization’s communication
structure.”
Q:
• does this fit with software process?
• does this fit with distributed apps?
see also:
haacked.com/archive/2013/05/13/applying-conways-law.aspx
Cooperation
perhaps we have selected for the wrong
personality to idealize…
linkedin.com/today/post/article/20130520190305-110300724-why-nothing-not-even-software-can-eat-the-world
All long-term success depends on eliciting
the voluntary support of an ecosystem.
As the African proverb says, “If you want
to go fast, go alone; if you want to go far,
go with others.” – Geoffrey Moore
Team Composition: Needs × Roles
[Diagram: matrix of needs (discovery, modeling, integration, apps, systems) against roles (Domain Expert, Data Scientist, App Dev, Ops); row descriptions: business process, stakeholder; data prep, discovery, modeling, etc.; software engineering, automation; systems engineering, access; data science marked as introduced]
Architecture
Rich Hickey, Nathan Marz, Stuart Sierra, et al.:
functional programming to help reduce
costs over time
1. technical debt? this is how an organization
builds a culture to avoid it
2. Conway's Law corollary: model teams and
communication based on properties of the
desired architecture
3. also consider Mesos/Borg: schedule data
to be located where [CPU, RAM, I/O, surety]
will become available
Rich Hickey, infoq.com/presentations/Simple-Made-Easy
Lambda Architecture
Big Data
Nathan Marz, James Warren
manning.com/marz
• batch layer (immutable data, idempotent ops)
• serving layer (to query batch)
• speed layer (transient, cached “real-time”)
• combining results
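The last bullet, combining results, can be sketched as a query that adds a precomputed batch view to the speed layer's recent increments. This is a toy, in-memory illustration under assumed names (`LambdaQuery`, `batchView`, `speedView`, with page-view counts as the metric); it is not code from the book cited above, and it simplifies by assuming each batch run has absorbed every event seen so far.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of a Lambda Architecture query path.
class LambdaQuery {
    // Serving layer: view precomputed by the batch layer from immutable master data.
    private final Map<String, Long> batchView = new HashMap<>();
    // Speed layer: transient counts for events the last batch run has not absorbed yet.
    private final Map<String, Long> speedView = new HashMap<>();

    // A batch run replaces the batch view wholesale and expires the speed view.
    // Simplification: assumes the run covered every event seen so far.
    void loadBatchView(Map<String, Long> recomputed) {
        batchView.clear();
        batchView.putAll(recomputed);
        speedView.clear();
    }

    // The speed layer increments its view as each new event arrives.
    void onEvent(String key) {
        speedView.merge(key, 1L, Long::sum);
    }

    // Combining results: a query sums the batch view and the speed view.
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + speedView.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        LambdaQuery views = new LambdaQuery();
        Map<String, Long> recomputed = new HashMap<>();
        recomputed.put("/home", 100L);
        views.loadBatchView(recomputed);
        views.onEvent("/home");
        views.onEvent("/about");
        System.out.println(views.query("/home") + " " + views.query("/about"));
    }
}
```

Because the batch view is always recomputed from immutable data, an error in the speed layer is transient: it disappears the next time a batch run replaces the view.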
Pattern Language
structured method for solving large, complex design
problems, where the syntax of the language ensures
the use of best practices – i.e., conveying expertise
[Diagram: example workflow with nodes labeled employee, quarterly sales, leads, Join, Count, PMML classifier, bonus allocation, Failure Traps]
A Pattern Language
Christopher Alexander, et al.
amazon.com/dp/0195019199
Culture
Notes from the Mystery Machine Bus
Steve Yegge, Google
goo.gl/SeRZa
consider these perspectives
in light of Conway’s Law…
“conservatism”                         “liberalism”
(mostly) Enterprise                    (mostly) Start-Up
risk management                        customer experiments
assurance                              flexibility
well-defined schema                    schema follows code
explicit configuration                 convention
type-checking compiler                 interpreted scripts
wants no surprises                     wants no impediments
Java, Scala, Clojure, etc.             PHP, Ruby, Python, etc.
Cascading, Scalding, Cascalog, etc.    Hive, Pig, Hadoop Streaming, etc.
Two Avenues to the App Layer…
scale ➞
complexity➞
Enterprise: must contend with
complexity at scale everyday…
incumbents extend current practices and
infrastructure investments – using J2EE,
ANSI SQL, SAS, etc. – to migrate
workflows onto Apache Hadoop while
leveraging existing staff
Start-ups: crave complexity and
scale to become viable…
new ventures move into Enterprise space
to compete using relatively lean staff,
while leveraging sophisticated engineering
practices, e.g., Cascalog and Scalding
What is needed?
approximately 80% of the costs for data-related projects
get spent on data preparation – mostly on cleaning up
data quality issues: ETL, log files, etc., generally by socializing
the problem
unfortunately, data-related budgets tend to go into
frameworks which can only be used after clean up
most valuable skills:
‣ learn to use programmable tools that prepare data
‣ learn to understand the audience and their priorities
‣ learn to generate compelling data visualizations
‣ learn to estimate the confidence for reported results
‣ learn to automate work, making analysis repeatable
d3js.org
Learning Curves
difficulties in the commercial use of distributed systems
often get represented as issues of managing complexity
much of the risk in managing a data science team is about
budgeting for learning curve: some orgs practice a kind of
engineering “conservatism”, with highly structured process
and strictly codified practices – people learn a few things
well, then avoid having to struggle with learning many new
things perpetually…
that approach leads to enormous teams and low ROI
[Chart axes: scale ➞, complexity ➞]
ultimately, the challenge is about
managing learning curves within
a social context
Management
ultimately, the challenge is about managing
learning curves within a social context
est. cost of individual learning, initial impl
est. cost of team re-learning, lifecycle
some technologies constrain the
need to learn, others accelerate
re-learning prior business logic…
choose the latter, FTW!
IMHO, the “agile” part was intended to be
about shared learnings; while the “lean” part
was about how much you have on your plate
at any one time
blogs.hbr.org/johnson/2012/09/throw-your-life-a-curve.html
Throw Your Life a Curve
Whitney Johnson
Aggressively Pro-Active Learning
• deconstruction of the cognitive bias One Size Fits All
• “makes a compelling case for personal disruption”
• “plan your career around learning curves”
• hire people who learn/re-learn efficiently
Summary
to be competitive globally with Big Data
requires learning many technologies –
then learning the nuances of a code base for
which the team is responsible, learning the
ever-changing surprises and insights which
are hidden deep within mountains of data,
plus the ever-evolving mathematics needed
to grapple with these conditions effectively
because the data won’t fit on one computer anymore
[Figure: the conceptual map, marked “you are here”]
Cascading: Workflow Abstraction
[Diagram: example workflow with nodes labeled employee, quarterly sales, leads, Join, Count, PMML classifier, bonus allocation, Failure Traps]
Design Patterns for Workflows,
Across Departments
90
Anatomy of an Enterprise app
Definition a typical Enterprise workflow which crosses through
multiple departments, languages, and technologies…
ETL
data
prep
predictive
model
data
sources
end
uses
91
Anatomy of an Enterprise app
Definition a typical Enterprise workflow which crosses through
multiple departments, languages, and technologies…
ETL
data
prep
predictive
model
data
sources
end
uses
ANSI SQL for ETL
92
Anatomy of an Enterprise app
Definition a typical Enterprise workflow which crosses through
multiple departments, languages, and technologies…
ETL
data
prep
predictive
model
data
sources
end
usesJ2EE for business logic
93
Anatomy of an Enterprise app
Definition a typical Enterprise workflow which crosses through
multiple departments, languages, and technologies…
ETL
data
prep
predictive
model
data
sources
end
uses
SAS for predictive models
94
Anatomy of an Enterprise app
Definition: a typical Enterprise workflow which crosses through
multiple departments, languages, and technologies…
[diagram labels: ETL, data prep, predictive model, data sources, end uses]
SAS for predictive models
ANSI SQL for ETL
most of the licensing costs…
95
Anatomy of an Enterprise app
Definition: a typical Enterprise workflow which crosses through
multiple departments, languages, and technologies…
[diagram labels: ETL, data prep, predictive model, data sources, end uses]
J2EE for business logic
most of the project costs…
96
[diagram labels: ETL, data prep, predictive model, data sources, end uses]
Lingual: DW → ANSI SQL
Pattern: SAS, R, etc. → PMML
business logic in Java, Clojure, Scala, etc.
sink taps for Memcached, HBase, MongoDB, etc.
source taps for Cassandra, JDBC, Splunk, etc.
Anatomy of an Enterprise app
Cascading allows multiple departments to combine their workflow components
into an integrated app – one among many, typically – based on 100% open source
a compiler sees it all…
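One way to read "a compiler sees it all": once every department's component is declared in a single plan, mismatches can be caught before any data moves. The sketch below is a plain-Java toy and not Cascading's API. The class `PlanCheck`, its `Step` type, and the field names are all hypothetical, chosen only to illustrate a plan-time check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: hypothetical names, not part of Cascading.
public class PlanCheck {

    /** One workflow component: the fields it reads and the fields it adds. */
    public static final class Step {
        final String name;
        final Set<String> requires;
        final Set<String> produces;

        public Step(String name, List<String> requires, List<String> produces) {
            this.name = name;
            this.requires = new LinkedHashSet<>(requires);
            this.produces = new LinkedHashSet<>(produces);
        }
    }

    /**
     * Walks the steps in order and reports every field a step needs that no
     * source or earlier step provides. An empty list means the plan is consistent.
     */
    public static List<String> validate(List<String> sourceFields, List<Step> steps) {
        Set<String> available = new LinkedHashSet<>(sourceFields);
        List<String> problems = new ArrayList<>();
        for (Step step : steps) {
            for (String field : step.requires) {
                if (!available.contains(field)) {
                    problems.add(step.name + " needs missing field '" + field + "'");
                }
            }
            // fields produced here become visible to later steps
            available.addAll(step.produces);
        }
        return problems;
    }

    public static void main(String[] args) {
        // assumed field names, for illustration
        List<String> source = Arrays.asList("employee_id", "quarterly_sales");
        List<Step> steps = Arrays.asList(
            new Step("etl",
                Arrays.asList("employee_id", "quarterly_sales"),
                Arrays.asList("sales_total")),
            new Step("classifier",
                Arrays.asList("sales_total", "region"),
                Arrays.asList("label")));
        // the classifier step asks for "region", which nothing upstream provides
        System.out.println(validate(source, steps));
    }
}
```

The point of the toy is the timing: the missing field is reported while the plan is being assembled, not hours into a cluster job.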
cascading.org
97
a compiler sees it all…
Anatomy of an Enterprise app
Cascading allows multiple departments to combine their workflow components
into an integrated app – one among many, typically – based on 100% open source
// bind the named sources and sink to their taps
FlowDef flowDef = FlowDef.flowDef()
  .setName( "etl" )
  .addSource( "example.employee", emplTap )
  .addSource( "example.sales", salesTap )
  .addSink( "results", resultsTap );

// hand the SQL statement to the Lingual planner
SQLPlanner sqlPlanner = new SQLPlanner()
  .setSql( sqlStatement );

flowDef.addAssemblyPlanner( sqlPlanner );
cascading.org
98
a compiler sees it all…
Anatomy of an Enterprise app
Cascading allows multiple departments to combine their workflow components
into an integrated app – one among many, typically – based on 100% open source
// bind the named source and sink to their taps
FlowDef flowDef = FlowDef.flowDef()
  .setName( "classifier" )
  .addSource( "input", inputTap )
  .addSink( "classify", classifyTap );

// hand the exported PMML model to the Pattern planner
PMMLPlanner pmmlPlanner = new PMMLPlanner()
  .setPMMLInput( new File( pmmlModel ) )
  .retainOnlyActiveIncomingFields();

flowDef.addAssemblyPlanner( pmmlPlanner );
99
cascading.org
Anatomy of an Enterprise app
Cascading allows multiple departments to combine their workflow components
into an integrated app – one among many, typically – based on 100% open source
visual collaboration for the business logic is a great
way to improve how teams work together:
Literate Programming, Don Knuth
www-cs-faculty.stanford.edu/~uno/lp.html
[workflow diagram labels: Failure Traps, bonus allocation, employee, PMML classifier, quarterly sales, Join, Count, leads]
100
Anatomy of an Enterprise app
Cascading allows multiple departments to combine their workflow components
into an integrated app – one among many, typically – based on 100% open source
[workflow diagram labels: Failure Traps, bonus allocation, employee, PMML classifier, quarterly sales, Join, Count, leads]
visual collaboration for the business logic is a great
way to improve how teams work together:
Literate Programming, Don Knuth
www-cs-faculty.stanford.edu/~uno/lp.html
multiple departments, working in their respective
frameworks, integrate results into a combined app,
which runs at scale on a cluster… business process
combined in a common space (DAG) for flow
planners, compiler, optimization, troubleshooting,
exception handling, notifications, security audit,
performance monitoring, etc.
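To make the "common space (DAG)" idea concrete, here is a minimal plain-Java sketch of a workflow graph that a planner could order before running anything. It is not Cascading code. The class `WorkflowDag` and its methods are hypothetical, and the edges in `example()` are guesses at how the steps labelled on the slide might connect, not something the slide states.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: hypothetical names, not part of Cascading.
public class WorkflowDag {
    // step name -> downstream step names (insertion order kept for determinism)
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    public WorkflowDag addStep(String name) {
        edges.computeIfAbsent(name, k -> new ArrayList<>());
        return this;
    }

    public WorkflowDag connect(String from, String to) {
        addStep(from);
        addStep(to);
        edges.get(from).add(to);
        return this;
    }

    /** Kahn's algorithm: returns the steps so that every step follows its inputs. */
    public List<String> planOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String step : edges.keySet()) {
            inDegree.put(step, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String step = ready.remove();
            order.add(step);
            for (String target : edges.get(step)) {
                if (inDegree.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("workflow contains a cycle");
        }
        return order;
    }

    /** Assumed wiring of the steps named on the slide. */
    public static WorkflowDag example() {
        return new WorkflowDag()
            .connect("employee", "Join")
            .connect("quarterly sales", "Join")
            .connect("Join", "PMML classifier")
            .connect("leads", "Count")
            .connect("Count", "PMML classifier")
            .connect("PMML classifier", "bonus allocation");
    }

    public static void main(String[] args) {
        System.out.println(example().planOrder());
    }
}
```

Because the whole business process sits in one graph, the same structure can be walked for the other concerns the slide lists, such as troubleshooting or monitoring, without each department wiring those up separately.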
cascading.org
101
[diagram: Use Cases Across Topologies. Labels: Workflow; RDBMS; near time; batch; services; transactions, content; social interactions; Web Apps, Mobile, etc.; History; Data Products; Customers; RDBMS; Log Events; In-Memory Data Grid; Hadoop, etc.; Cluster Scheduler; Prod; Eng; DW; s/w dev; data science; discovery + modeling; Planner; Ops; dashboard metrics; business process; optimized capacity; taps; Data Scientist; App Dev; Ops; Domain Expert; introduced capability; existing SDLC]
Circa 2013: clusters everywhere – Four-Part Harmony
102
Circa 2013: clusters everywhere – Four-Part Harmony
1. End Use Cases, the drivers
103
Circa 2013: clusters everywhere – Four-Part Harmony
2. A new kind of team process
104
Circa 2013: clusters everywhere – Four-Part Harmony
3. Abstraction layer as optimizing
middleware, e.g., Cascading
105
Circa 2013: clusters everywhere – Four-Part Harmony
4. Distributed OS, e.g., Mesos
106
Enterprise Data Workflows
with Cascading
O’Reilly, 2013
amazon.com/dp/1449358721
references…
107
blog, dev community, code/wiki/gists, maven repo,
commercial products, career opportunities, newsletter:
cascading.org
zest.to/group11
github.com/Cascading
conjars.org
goo.gl/KQtUL
concurrentinc.com
bit.ly/pxnnews
drill-down…
108

More Related Content

What's hot

VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUG IT
 
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Jonathan Seidman
 
Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011Jonathan Seidman
 
a9TD6cbzTZotpJihekdc+w==.docx
a9TD6cbzTZotpJihekdc+w==.docxa9TD6cbzTZotpJihekdc+w==.docx
a9TD6cbzTZotpJihekdc+w==.docxVasimMemon4
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionChetan Khatri
 
Advanced analytics with sap hana and r
Advanced analytics with sap hana and rAdvanced analytics with sap hana and r
Advanced analytics with sap hana and rSAP Technology
 
How Salesforce.com uses Hadoop
How Salesforce.com uses HadoopHow Salesforce.com uses Hadoop
How Salesforce.com uses HadoopNarayan Bharadwaj
 
Introduction to Spark: Data Analysis and Use Cases in Big Data
Introduction to Spark: Data Analysis and Use Cases in Big Data Introduction to Spark: Data Analysis and Use Cases in Big Data
Introduction to Spark: Data Analysis and Use Cases in Big Data Jongwook Woo
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopDataWorks Summit
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04Ted Dunning
 
How pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureHow pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureKovid Academy
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill Carol McDonald
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 
Big Data Analysis and Industrial Approach using Spark
Big Data Analysis and Industrial Approach using SparkBig Data Analysis and Industrial Approach using Spark
Big Data Analysis and Industrial Approach using SparkJongwook Woo
 

What's hot (20)

VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware Hadoop
 
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
 
Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011
 
a9TD6cbzTZotpJihekdc+w==.docx
a9TD6cbzTZotpJihekdc+w==.docxa9TD6cbzTZotpJihekdc+w==.docx
a9TD6cbzTZotpJihekdc+w==.docx
 
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
 
Advanced analytics with sap hana and r
Advanced analytics with sap hana and rAdvanced analytics with sap hana and r
Advanced analytics with sap hana and r
 
Resume - Narasimha Rao B V (TCS)
Resume - Narasimha  Rao B V (TCS)Resume - Narasimha  Rao B V (TCS)
Resume - Narasimha Rao B V (TCS)
 
How Salesforce.com uses Hadoop
How Salesforce.com uses HadoopHow Salesforce.com uses Hadoop
How Salesforce.com uses Hadoop
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Introduction to Spark: Data Analysis and Use Cases in Big Data
Introduction to Spark: Data Analysis and Use Cases in Big Data Introduction to Spark: Data Analysis and Use Cases in Big Data
Introduction to Spark: Data Analysis and Use Cases in Big Data
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
 
Treasure Data: Big Data Analytics on Heroku
Treasure Data: Big Data Analytics on HerokuTreasure Data: Big Data Analytics on Heroku
Treasure Data: Big Data Analytics on Heroku
 
Steve Watt Presentation
Steve Watt PresentationSteve Watt Presentation
Steve Watt Presentation
 
How pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureHow pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architecture
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
r4
r4r4
r4
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Big Data Analysis and Industrial Approach using Spark
Big Data Analysis and Industrial Approach using SparkBig Data Analysis and Industrial Approach using Spark
Big Data Analysis and Industrial Approach using Spark
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 

Viewers also liked

Books a Love Story (pdf with notes)
Books a Love Story (pdf with notes)Books a Love Story (pdf with notes)
Books a Love Story (pdf with notes)Tim O'Reilly
 
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking OSCON Byrum
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphsDavid Gleich
 
Seattle Data Geeks: Hadoop and Beyond
Seattle Data Geeks: Hadoop and BeyondSeattle Data Geeks: Hadoop and Beyond
Seattle Data Geeks: Hadoop and BeyondPaco Nathan
 
How we built our community using Github - Uri Cohen
How we built our community using Github - Uri CohenHow we built our community using Github - Uri Cohen
How we built our community using Github - Uri CohenOSCON Byrum
 
Introduction to Slideshare at Barcamp Hyderabad
Introduction to Slideshare at Barcamp HyderabadIntroduction to Slideshare at Barcamp Hyderabad
Introduction to Slideshare at Barcamp HyderabadKapil Mohan
 
Comment le picture marketing permet de développer ses ventes en ligne et en b...
Comment le picture marketing permet de développer ses ventes en ligne et en b...Comment le picture marketing permet de développer ses ventes en ligne et en b...
Comment le picture marketing permet de développer ses ventes en ligne et en b...Emilie Marquois
 
OPEN Silcon Valley - Clean-tech is Main-tech: How do you fit in the Green Ec...
OPEN Silcon Valley - Clean-tech is Main-tech:  How do you fit in the Green Ec...OPEN Silcon Valley - Clean-tech is Main-tech:  How do you fit in the Green Ec...
OPEN Silcon Valley - Clean-tech is Main-tech: How do you fit in the Green Ec...Shuja Keen
 
SlideShare's Lean Startup Journey: Lessons Learnt
SlideShare's Lean Startup Journey: Lessons LearntSlideShare's Lean Startup Journey: Lessons Learnt
SlideShare's Lean Startup Journey: Lessons LearntKapil Mohan
 
DSSG Speaker Series: Paco Nathan
DSSG Speaker Series: Paco NathanDSSG Speaker Series: Paco Nathan
DSSG Speaker Series: Paco NathanPaco Nathan
 
Open Data: From the Information Age to the Action Age (Keynote File)
Open Data: From the Information Age to the Action Age (Keynote File)Open Data: From the Information Age to the Action Age (Keynote File)
Open Data: From the Information Age to the Action Age (Keynote File)Tim O'Reilly
 
Aspen ideas Festival Talk on Gov20
Aspen ideas Festival Talk on Gov20Aspen ideas Festival Talk on Gov20
Aspen ideas Festival Talk on Gov20Tim O'Reilly
 
Creating actionable marketo reports july, 2013
Creating actionable marketo reports   july, 2013Creating actionable marketo reports   july, 2013
Creating actionable marketo reports july, 2013Inga Romanoff
 
Traffic Signal Movie Preview
Traffic Signal Movie PreviewTraffic Signal Movie Preview
Traffic Signal Movie PreviewKapil Mohan
 
Columbia Law School - Decentralized Ledgers Presentation on 4/7/2014
Columbia Law School - Decentralized Ledgers Presentation on 4/7/2014Columbia Law School - Decentralized Ledgers Presentation on 4/7/2014
Columbia Law School - Decentralized Ledgers Presentation on 4/7/2014Ldger, Inc
 
Larry's Free Culture
Larry's Free CultureLarry's Free Culture
Larry's Free CultureKapil Mohan
 
What we can take for granted in online communities
What we can take for granted in online communitiesWhat we can take for granted in online communities
What we can take for granted in online communitiesChris Messina
 
Just Because You Can Doesn't Mean You Should
Just Because You Can Doesn't Mean You ShouldJust Because You Can Doesn't Mean You Should
Just Because You Can Doesn't Mean You ShouldOReillyWhere20
 

Viewers also liked (20)

Bilan de mobilité
Bilan de mobilitéBilan de mobilité
Bilan de mobilité
 
Books a Love Story (pdf with notes)
Books a Love Story (pdf with notes)Books a Love Story (pdf with notes)
Books a Love Story (pdf with notes)
 
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphs
 
Seattle Data Geeks: Hadoop and Beyond
Seattle Data Geeks: Hadoop and BeyondSeattle Data Geeks: Hadoop and Beyond
Seattle Data Geeks: Hadoop and Beyond
 
How we built our community using Github - Uri Cohen
How we built our community using Github - Uri CohenHow we built our community using Github - Uri Cohen
How we built our community using Github - Uri Cohen
 
Introduction to Slideshare at Barcamp Hyderabad
Introduction to Slideshare at Barcamp HyderabadIntroduction to Slideshare at Barcamp Hyderabad
Introduction to Slideshare at Barcamp Hyderabad
 
Comment le picture marketing permet de développer ses ventes en ligne et en b...
Comment le picture marketing permet de développer ses ventes en ligne et en b...Comment le picture marketing permet de développer ses ventes en ligne et en b...
Comment le picture marketing permet de développer ses ventes en ligne et en b...
 
OPEN Silcon Valley - Clean-tech is Main-tech: How do you fit in the Green Ec...
OPEN Silcon Valley - Clean-tech is Main-tech:  How do you fit in the Green Ec...OPEN Silcon Valley - Clean-tech is Main-tech:  How do you fit in the Green Ec...
OPEN Silcon Valley - Clean-tech is Main-tech: How do you fit in the Green Ec...
 
SlideShare's Lean Startup Journey: Lessons Learnt
SlideShare's Lean Startup Journey: Lessons LearntSlideShare's Lean Startup Journey: Lessons Learnt
SlideShare's Lean Startup Journey: Lessons Learnt
 
DSSG Speaker Series: Paco Nathan
DSSG Speaker Series: Paco NathanDSSG Speaker Series: Paco Nathan
DSSG Speaker Series: Paco Nathan
 
Open Data: From the Information Age to the Action Age (Keynote File)
Open Data: From the Information Age to the Action Age (Keynote File)Open Data: From the Information Age to the Action Age (Keynote File)
Open Data: From the Information Age to the Action Age (Keynote File)
 
Aspen ideas Festival Talk on Gov20
Aspen ideas Festival Talk on Gov20Aspen ideas Festival Talk on Gov20
Aspen ideas Festival Talk on Gov20
 
Creating actionable marketo reports july, 2013
Creating actionable marketo reports   july, 2013Creating actionable marketo reports   july, 2013
Creating actionable marketo reports july, 2013
 
Traffic Signal Movie Preview
Traffic Signal Movie PreviewTraffic Signal Movie Preview
Traffic Signal Movie Preview
 
Columbia Law School - Decentralized Ledgers Presentation on 4/7/2014
Columbia Law School - Decentralized Ledgers Presentation on 4/7/2014Columbia Law School - Decentralized Ledgers Presentation on 4/7/2014
Columbia Law School - Decentralized Ledgers Presentation on 4/7/2014
 
Larry's Free Culture
Larry's Free CultureLarry's Free Culture
Larry's Free Culture
 
What we can take for granted in online communities
What we can take for granted in online communitiesWhat we can take for granted in online communities
What we can take for granted in online communities
 
Just Because You Can Doesn't Mean You Should
Just Because You Can Doesn't Mean You ShouldJust Because You Can Doesn't Mean You Should
Just Because You Can Doesn't Mean You Should
 
Word clouds
Word cloudsWord clouds
Word clouds
 

Similar to Hadoop and Beyond

Functional programming
 for optimization problems 
in Big Data
Functional programming
  for optimization problems 
in Big DataFunctional programming
  for optimization problems 
in Big Data
Functional programming
 for optimization problems 
in Big DataPaco Nathan
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?samthemonad
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataPaco Nathan
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Jim Dowling
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonDremio Corporation
 
Kellogg XML Holland Speech
Kellogg XML Holland SpeechKellogg XML Holland Speech
Kellogg XML Holland SpeechDave Kellogg
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapePaco Nathan
 
The modern analytics architecture
The modern analytics architectureThe modern analytics architecture
The modern analytics architectureJoseph D'Antoni
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsScyllaDB
 
Big data berlin
Big data berlinBig data berlin
Big data berlinkammeyer
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesJon Meredith
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduceJ Singh
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Anna Shymchenko
 
DWH & big data architecture approaches
DWH & big data architecture approachesDWH & big data architecture approaches
DWH & big data architecture approachesLuxoft
 
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)GeeksLab Odessa
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...Rittman Analytics
 
Agile data warehousing
Agile data warehousingAgile data warehousing
Agile data warehousingSneha Challa
 

Similar to Hadoop and Beyond (20)

Functional programming
 for optimization problems 
in Big Data
Functional programming
  for optimization problems 
in Big DataFunctional programming
  for optimization problems 
in Big Data
Functional programming
 for optimization problems 
in Big Data
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
Kellogg XML Holland Speech
Kellogg XML Holland SpeechKellogg XML Holland Speech
Kellogg XML Holland Speech
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
The modern analytics architecture
The modern analytics architectureThe modern analytics architecture
The modern analytics architecture
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»Владимир Слободянюк «DWH & BigData – architecture approaches»
Владимир Слободянюк «DWH & BigData – architecture approaches»
 
DWH & big data architecture approaches
DWH & big data architecture approachesDWH & big data architecture approaches
DWH & big data architecture approaches
 
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Agile data warehousing
Agile data warehousingAgile data warehousing
Agile data warehousing
 

More from Paco Nathan

Human in the loop: a design pattern for managing teams working with ML
Human in the loop: a design pattern for managing  teams working with MLHuman in the loop: a design pattern for managing  teams working with ML
Human in the loop: a design pattern for managing teams working with MLPaco Nathan
 
Human-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLHuman-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLPaco Nathan
 
Human-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLHuman-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLPaco Nathan
 
Humans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIHumans in a loop: Jupyter notebooks as a front-end for AI
Humans in a loop: Jupyter notebooks as a front-end for AIPaco Nathan
 
Humans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryHumans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryPaco Nathan
 
Computable Content
Computable ContentComputable Content
Computable ContentPaco Nathan
 
Computable Content: Lessons Learned
Computable Content: Lessons LearnedComputable Content: Lessons Learned
Hadoop and Beyond