The @ExperianDataLab hosts a #DataTalk on Thursdays at 5 p.m. ET on Twitter. Join us.
This week, we talked about data preparation, model evaluation, testing effectiveness of predictive analytics, challenges, and trends in predictive analytics.
We learned from Michael Beygelman, Co-founder and CEO of Joberate; Berry Diepeveen, Partner and Enterprise Intelligence Leader at EY in South Africa; and Chuck Robida, Chief Scientist for Experian Decision Analytics.
Learn about past and upcoming chats at:
http://experian.com/datatalk
2. Join our #DataTalk on Thursdays at 5 p.m. ET
This week, we tweeted with Michael Beygelman, Co-founder and
CEO of Joberate; Berry Diepeveen, Partner and Enterprise
Intelligence Leader at EY; and Chuck Robida, Chief Scientist for
Experian Decision Analytics.
Check out all tweets from this Twitter chat:
ex.pn/predictive
4. Michael Beygelman
CEO, Joberate
@beygelman @joberate ex.pn/datatalk
#DataTalk
Predictive analytics is extracting
information from data sets to determine
patterns, predict outcomes and trends.
5. Chuck Robida
Chief Scientist, Experian
@ExperianDA ex.pn/datatalk
#DataTalk
Predictive analytics is the ability
to use data to predict future behavior
based on past behavior.
6. Berry Diepeveen
Partner, EY
@Berry_Diepeveen ex.pn/datatalk
#DataTalk
I think it is as old as business.
Nobody can perfectly predict the future,
but you want to be more accurate about
what is likely to happen.
7. Chuck Robida
Chief Scientist, Experian
@ExperianDA ex.pn/datatalk
#DataTalk
It’s the ability to analyze data in a way
that can scale, be reproduced, and
provide unbiased results.
8. Michael Beygelman
CEO, Joberate
@beygelman @joberate ex.pn/datatalk
#DataTalk
Clever marketers are redefining
predictive analytics into whatever suits
them today, so we need to beware.
9. Chuck Robida
Chief Scientist, Experian
@ExperianDA ex.pn/datatalk
#DataTalk
Some techniques get more attention
than others like machine learning,
but all are used to solve business problems.
10. Berry Diepeveen
Partner, EY
@Berry_Diepeveen ex.pn/datatalk
#DataTalk
It is about being able to intervene.
What is the point of finding out we
lost a customer after he left?
We need to prevent losing one before it happens.
11. Michael Beygelman
CEO, Joberate
@beygelman @joberate ex.pn/datatalk
#DataTalk
I’ve always said that predictive analytics
needs to be actionable like a brake system
in a car. When you press, it does something.
12. Chuck Robida
Chief Scientist, Experian
@ExperianDA ex.pn/datatalk
#DataTalk
Predictive analytics isn’t a crystal ball,
but the value comes in identifying
the propensity of certain behaviors.
21. What type of data do companies
use for predictive analytics?
22. Depends on the business goal.
Generally a mix of fit-for-purpose internal
and external data types, structured
or unstructured data.
#DataTalk
ex.pn/datatalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
23. Sometimes companies start by using
internal data. In my world, payroll data,
promotions, performance reviews, etc.
#DataTalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate ex.pn/datatalk
24. Depending on how much success
they have with internal data, and how quickly,
they’ll usually broaden out to third-party data.
#DataTalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate ex.pn/datatalk
25. For lenders: asset evaluations for loans,
address change for collections -- all good data,
in compliance with regulation.
#DataTalk
ex.pn/datatalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
26. For marketing: social, contact history,
profile data, all good data.
#DataTalk
ex.pn/datatalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
27. If we go back to the fraud detection
use case, you’d have to rely on
internal, external, structured and
unstructured data.
Berry Diepeveen
Partner, EY
@Berry_Diepeveen
#DataTalk
ex.pn/datatalk
28. The beauty is that there is no limit on
what data sources you want to tap into.
It’s always driven by the business and use case,
not the other way around.
Berry Diepeveen
Partner, EY
@Berry_Diepeveen
#DataTalk
ex.pn/datatalk
29. The only limitations are legal, compliance
and your imagination.
#DataTalk
ex.pn/datatalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
30. How much data preparation
needs to be done before
executing predictive analytics?
31. It requires a very tight collaboration between
business and data science in order
to determine the iterations.
Berry Diepeveen
Partner, EY
@Berry_Diepeveen ex.pn/datatalk
#DataTalk
32. Data preparation is arguably as
important as the rest of the process.
ex.pn/datatalk
#DataTalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate
33. Garbage in, garbage out.
Data preparation is the most important step.
Incorrect or insufficient data equals
bad business decisions.
ex.pn/datatalk
#DataTalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
34. We see three phases in any predictive analytics
program: 1) strict data management, 2) building
and applying advanced analytics models, and
3) using data visualization to bring the
insights back to the end user.
Berry Diepeveen
Partner, EY
@Berry_Diepeveen ex.pn/datatalk
#DataTalk
35. If the data set is massive, it might
be more practical to sample the data;
otherwise you can use the whole data set.
ex.pn/datatalk
#DataTalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate
36. Without strict and rigorous data management,
you should question your investments
in data science.
Berry Diepeveen
Partner, EY
@Berry_Diepeveen ex.pn/datatalk
#DataTalk
37. Decide what to do with incomplete data:
discard it, or take guesses at missing data points
by looking at other data in the sample.
ex.pn/datatalk
#DataTalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate
38. Be careful before tossing any data. Bias!
ex.pn/datatalk
#DataTalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
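The two options for incomplete data mentioned above (discard it, or guess from other data in the sample) can be sketched as follows. This is only an illustration, not a method described by the panelists: it assumes missing values are stored as None, uses the column median as the "guess", and the field names and records are hypothetical. Note that discarding rows shrinks and reshapes the sample, which is one place bias can creep in.

```python
from statistics import median

# Hypothetical records; None marks a missing data point.
records = [
    {"tenure_years": 2.0, "salary": 50000},
    {"tenure_years": None, "salary": 62000},
    {"tenure_years": 5.0, "salary": None},
    {"tenure_years": 3.0, "salary": 58000},
]

def drop_incomplete(rows):
    """Option 1: discard every row that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_median(rows):
    """Option 2: fill each missing value with the median of the
    observed (non-missing) values in the same column."""
    columns = list(rows[0].keys())
    medians = {
        c: median(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else medians[c]) for c in columns}
        for r in rows
    ]

complete_only = drop_incomplete(records)  # keeps only fully populated rows
filled = impute_median(records)           # keeps every row, with guesses filled in
```

Comparing summary statistics of complete_only against filled is a quick way to see how much the choice changes the sample.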
39. Many activities like selecting, combining,
and aggregating data are important,
especially when defining the form for training.
ex.pn/datatalk
#DataTalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate
41. It’s more of a business decision.
If your data is updated quarterly,
no point in updating a model
more often than that.
ex.pn/datatalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate
#DataTalk
42. Frequent model evaluation or validation
is critical + results should be taken
in context of other solutions
and external factors.
ex.pn/datatalk
#DataTalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
43. Building good models is the science.
It involves experimentation,
sufficient quality data
and is time consuming.
Berry Diepeveen
Partner, EY
@Berry_Diepeveen
#DataTalk
ex.pn/datatalk
44. If data is updated daily, and you
choose to update the model quarterly,
you might have to live with some
bad assumptions.
ex.pn/datatalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate
#DataTalk
45. Expect a model to naturally deteriorate
over time. Predictive analytics
needs to be continually validated for
fit for purpose.
ex.pn/datatalk
#DataTalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
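One simple way to act on the idea that models deteriorate is to compare accuracy in each monitoring period against the accuracy measured at build time. The sketch below is an assumption-laden illustration: the function name, the 0.05 tolerance, and the accuracy figures are all hypothetical, and real programs typically track several metrics rather than one.

```python
def needs_revalidation(baseline_accuracy, period_accuracies, tolerance=0.05):
    """Return the periods whose accuracy fell more than `tolerance`
    below the accuracy measured when the model was built."""
    return [
        period
        for period, accuracy in period_accuracies.items()
        if baseline_accuracy - accuracy > tolerance
    ]

# Hypothetical monitoring results, keyed by period label.
monitoring = {"Q1": 0.81, "Q2": 0.79, "Q3": 0.72}
flagged = needs_revalidation(0.80, monitoring)
```

A flagged period is a prompt to investigate and possibly retrain, not proof on its own that the model is no longer fit for purpose.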
46. Regardless of the use case, you need
to update models regularly and
structurally, but additional ad hoc
updates depend on use case.
Berry Diepeveen
Partner, EY
@Berry_Diepeveen
#DataTalk
ex.pn/datatalk
47. Models are fit-for-purpose and consider
things like economy, home values...
Tests + benchmarks exist to ensure
models are robust.
ex.pn/datatalk
#DataTalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
48. What is often forgotten is that new
models have to be retrained
with the updated data sets -
and results verified.
Berry Diepeveen
Partner, EY
@Berry_Diepeveen
#DataTalk
ex.pn/datatalk
49. What are the best ways to
test the effectiveness of
predictive analytics?
50. There are many scientific ways to test,
but the real question is did the analytics
provide you with actionable insights,
at the right time.
#DataTalk
ex.pn/datatalk
Berry Diepeveen
Partner, EY
@Berry_Diepeveen
51. Splitting data at the outset could be
a good idea so you’re not accidentally
creating a super model that only
works on one set.
#DataTalk
ex.pn/datatalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate
52. Deploy them in a manner where
their impact can be measured in a
controlled environment like
champion-challenger testing.
#DataTalk
ex.pn/datatalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
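Champion-challenger testing can be sketched as routing a small share of cases to the new model and comparing outcomes between the two groups. The code below is a minimal illustration under stated assumptions: the function names, the 10% challenger share, and the seed are hypothetical choices, not details from the chat.

```python
import random

def assign_model(account_id, challenger_share=0.1, seed=7):
    """Route an account to 'champion' or 'challenger'.
    Seeding on the account id makes the assignment repeatable,
    so the same account always sees the same model."""
    rng = random.Random(f"{seed}-{account_id}")
    return "challenger" if rng.random() < challenger_share else "champion"

def success_rates(results):
    """results: iterable of (model_name, succeeded) pairs.
    Returns the share of successful outcomes per model."""
    totals, wins = {}, {}
    for model, succeeded in results:
        totals[model] = totals.get(model, 0) + 1
        wins[model] = wins.get(model, 0) + (1 if succeeded else 0)
    return {model: wins[model] / totals[model] for model in totals}

# Hypothetical observed outcomes after deployment.
observed = [("champion", True), ("champion", False), ("challenger", True)]
rates = success_rates(observed)
```

In practice the comparison would also need enough volume in the challenger group to tell a real difference from noise.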
53. Use a majority of the data (say 65% or so)
for the build of the model, and
use the remaining 35% of the data for the
test of the model.
#DataTalk
ex.pn/datatalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate
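A build/test split along the lines of the 65/35 suggestion above might look like this. It is a minimal sketch: the function name and the fixed seed are assumptions made for illustration, and the rows here are just placeholder integers.

```python
import random

def split_build_test(rows, build_fraction=0.65, seed=42):
    """Shuffle a copy of the rows, then cut it into a build set
    and a held-out test set. The input list is left unchanged."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * build_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # placeholder for real records
build, test = split_build_test(rows)
```

Splitting once at the outset, before any modeling, keeps the test set from leaking into decisions made during the build.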
54. There are numerous ways to test
models, and some people swear
by some approaches almost like religion.
#DataTalk
ex.pn/datatalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate
55. Test models by using data not
used during development.
Validation won’t yield same results,
so benchmarking plays a big role.
#DataTalk
ex.pn/datatalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
56. One can use lift charts or decile tables;
some people like to use target shuffling.
#DataTalk
ex.pn/datatalk
Michael Beygelman
CEO, Joberate
@beygelman @joberate
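As a rough illustration of two of the techniques named above, the sketch below builds a decile table with lift and then applies target shuffling to the top-decile lift. The data is synthetic, the function names are hypothetical, and the code assumes the number of records is a multiple of the number of bins.

```python
import random

def decile_table(scores, outcomes, bins=10):
    """Rank records by score (highest first), cut them into equal-size
    bins, and report each bin's response rate and its lift over the
    overall response rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    overall_rate = sum(outcomes) / len(outcomes)
    size = len(ranked) // bins  # assumes len(ranked) is a multiple of bins
    table = []
    for i in range(bins):
        chunk = ranked[i * size:(i + 1) * size]
        rate = sum(outcome for _, outcome in chunk) / len(chunk)
        table.append({"decile": i + 1, "rate": rate, "lift": rate / overall_rate})
    return table

def target_shuffle_p_value(scores, outcomes, trials=1000, seed=0):
    """Target shuffling: randomly permute the outcomes many times and
    count how often chance alone matches or beats the real top-decile lift."""
    rng = random.Random(seed)
    actual = decile_table(scores, outcomes)[0]["lift"]
    shuffled = list(outcomes)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if decile_table(scores, shuffled)[0]["lift"] >= actual:
            hits += 1
    return hits / trials

# Synthetic example: 20 records, the 6 highest-scored ones responded.
scores = [i / 20 for i in range(1, 21)]
outcomes = [1 if i >= 15 else 0 for i in range(1, 21)]
table = decile_table(scores, outcomes)
p_value = target_shuffle_p_value(scores, outcomes)
```

A lift chart is simply the lift column of this table plotted by decile; a small shuffled p-value suggests the ranking is unlikely to be a product of chance.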
57. What are ways companies can
use predictive analytics
in new ways?
58. Possibilities are endless,
but business focus is key.
ex.pn/datatalk
Berry Diepeveen
Partner, EY
@Berry_Diepeveen
#DataTalk
59. Predictive analytics is used to scientifically
predict anything from the future state of the
economy + weather to the spread
+ cures for disease.
#DataTalk
ex.pn/datatalk
Chuck Robida
Chief Scientist, Experian
@ExperianDA
60. Newest technologies allow you
to quite efficiently translate
unstructured into structured
such that it can be included in models.
ex.pn/datatalk
Berry Diepeveen
Partner, EY
@Berry_Diepeveen
#DataTalk
63. What are the challenges when
working in predictive analytics?
64. Michael Beygelman
CEO, Joberate
@beygelman @joberate ex.pn/datatalk
#DataTalk
Challenges? Too many :)
But making sure you have ample
relevant data is important,
and making sure you have tested models.
71. Michael Beygelman
CEO, Joberate
@beygelman @joberate ex.pn/datatalk
#DataTalk
In more mature markets, the uptake
is “simpler” while in other markets
less so, which creates challenges
for global organizations.
73. Michael Beygelman
CEO, Joberate
@beygelman @joberate ex.pn/datatalk
#DataTalk
In terms of trends, machine learning
to automate the analytics process
itself is certainly one of the
bigger trends.
76. Michael Beygelman
CEO, Joberate
@beygelman @joberate ex.pn/datatalk
#DataTalk
Another trend that is hard to ignore is the
datafication of our lives:
from basketballs to tennis rackets, and
Lumo Lift to help you stop slouching.
77. Michael Beygelman
CEO, Joberate
@beygelman @joberate ex.pn/datatalk
#DataTalk
Along the datafication continuum,
data privacy laws are severely lagging
and will need attention.