2. Brevitaz Overview
● Founded in 2014
● Small team of technocrats delivering Big Data Solutions
● Global client base in Europe and the Asia-Pacific region
● Expertise
○ Full-text search
○ Real-time analytics
○ Log analytics
○ Big Data analytics
○ IoT based solutions
○ Machine learning
○ Big Data warehousing
● Technologies
○ Spark, Hadoop, Kafka, Flume, Storm
○ Elasticsearch, Logstash, Kibana
○ MongoDB, Cassandra, HBase, Titan
○ Impala, Spark SQL, HAWQ
○ Java & Spring stack, Typesafe stack (Scala, Akka, Spray, Slick)
○ AngularJS
4. Agenda
➔ Big Data & Analytics
➔ Full-text search
➔ Log analytics
➔ Big Data Analytics
➔ Real-time Analytics
➔ IoT Analytics
➔ Machine Learning on Big Data
➔ Big Data Warehousing
➔ Big Data is for Everyone
8. “It’s all about being able to spot
the right information at the right time”
9. ◎ Relevance search in near real-time
○ Find results matching “iphone”. Please don’t show me
iPhone chargers on the first page.
◎ Fuzzy search and search suggestions
○ Find results matching “iphne”
◎ Faceted search
○ Filters on Amazon after searching for a keyword
◎ Complex search with multiple criteria
○ Find me products matching “iphone” within the price range
30000 INR to 50000 INR and with color “Space grey”
◎ Geo-spatial search
○ Find restaurants within a 10 km radius of my current
location. And yes, I want to see closer ones on top.
Full-text search - What is it?
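The fuzzy search bullet above can be illustrated with a small sketch. It assumes a plain Levenshtein edit distance and a tiny in-memory catalog; the names `edit_distance`, `fuzzy_search` and `catalog` are made up for this example, and a real search engine would use its own, far more efficient matching.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete from a
                            curr[j - 1] + 1,     # insert into a
                            prev[j - 1] + cost)) # substitute (or match)
        prev = curr
    return prev[-1]


def fuzzy_search(query: str, terms: list[str], max_distance: int = 1) -> list[str]:
    """Return terms within max_distance edits of the query, closest first."""
    q = query.lower()
    scored = [(edit_distance(q, term.lower()), term) for term in terms]
    return [term for distance, term in sorted(scored) if distance <= max_distance]


# Hypothetical catalog, for illustration only.
catalog = ["iPhone", "iPad", "phone charger"]
print(fuzzy_search("iphne", catalog))  # prints ['iPhone']
```

The misspelled “iphne” is one insertion away from “iphone”, so it matches, while the other entries are too far away to be returned.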
12. ◎ Crawl third-party websites
◎ Aggregate and classify the data
◎ Develop custom application on top of classified
data
Use-case - Information Aggregator
13. ◎ Google’s “Did you mean?”
◎ Search suggestions as you type
◎ Text analytics
◎ McGraw-Hill - Transform textbooks into digital
learning resources
◎ SoundCloud - Help users quickly find music that
interests them
Other use-cases
16. ● Use machine-generated logs to get operational
insights
● Sensors, application servers, web servers or any
IoT device logs
To interactively answer questions like...
◎ How many users signed up this week?
◎ How are users using your website / mobile app?
◎ How successful is our advertising campaign?
◎ Why is the database slow?
◎ Which website categories is my team
spending the most time on?
◎ Which employees are likely to resign next?
Log Analytics - What is it?
18. Use-case - Network Logs Analysis
◎ High velocity
◎ High volume
◎ Collect, analyze and improve
20. ◎ Analyze click stream data to provide
personalized offers and user experience
◎ Interactive drill-down analysis
◎ Compliance reporting through interactive
dashboard
◎ Real-time alerts on invalid login attempts
◎ Detect outages
◎ Multi-channel funnel reporting for your
advertising campaigns to find out which
channels contribute the most to conversions
Other use-cases
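As a rough sketch of the "real-time alerts on invalid login attempts" idea above, the snippet below keeps a sliding time window of failed logins per user and raises an alert once a threshold is reached. The event format, the function name `failed_login_alerts` and the default threshold and window are assumptions made for this illustration, not something prescribed by the slides.

```python
from collections import defaultdict, deque


def failed_login_alerts(events, threshold=3, window_seconds=60):
    """Yield-style helper returning (timestamp, user) alerts.

    events: iterable of (timestamp_seconds, user, success) tuples,
    assumed to be sorted by timestamp.
    """
    recent_failures = defaultdict(deque)  # user -> timestamps of recent failures
    alerts = []
    for timestamp, user, success in events:
        if success:
            continue
        window = recent_failures[user]
        window.append(timestamp)
        # Drop failures that fell out of the time window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((timestamp, user))
    return alerts


# Hypothetical parsed log events, for illustration only.
events = [
    (0, "alice", False),
    (10, "bob", False),
    (20, "alice", False),
    (30, "alice", False),
    (200, "bob", False),
]
print(failed_login_alerts(events))  # prints [(30, 'alice')]
```

In a production pipeline the same logic would typically run inside a streaming engine rather than over an in-memory list.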
22. “Combine all sources of data to
uncover hidden patterns and
unknown relations in your data”
23. ● Take your transactional data from various
sources
● Take operational and user behaviour log data
● Collect social data
● Combine data collected from various sources
To interactively answer questions like...
◎ What is the increase or decrease in sales over the
years?
◎ How many unique customers were acquired this
year?
◎ Which products are trending disproportionately
this year?
Big Data Analytics - What is it?
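To make the sales question above concrete, here is a minimal sketch that totals transactions per year and reports the percentage change from one year to the next. The `(year, amount)` input format and the function name `yearly_sales_change` are assumptions for this example; at real Big Data volumes the same aggregation would be expressed in an engine such as Spark SQL.

```python
from collections import defaultdict


def yearly_sales_change(transactions):
    """Return {year: percent change in total sales vs. the previous year}.

    transactions: iterable of (year, amount) pairs (assumed format).
    Years whose previous-year total is zero are skipped.
    """
    totals = defaultdict(float)
    for year, amount in transactions:
        totals[year] += amount

    years = sorted(totals)
    changes = {}
    for previous, current in zip(years, years[1:]):
        if totals[previous] == 0:
            continue  # percentage change is undefined
        changes[current] = (totals[current] - totals[previous]) * 100 / totals[previous]
    return changes


# Made-up figures, for illustration only.
transactions = [(2013, 100.0), (2013, 100.0), (2014, 250.0), (2015, 200.0)]
print(yearly_sales_change(transactions))  # prints {2014: 25.0, 2015: -20.0}
```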
24. Use-case - Supply chain management
◎ RFID labels can indicate which product is where
at what time
◎ Get more accurate business insights
◎ Theft detection
25. ◎ Social media sentiment analysis to get end-user
feedback on launched products
◎ Identify market trends
◎ Predict employee attrition
◎ Customer churn analysis
◎ Influencer analysis
◎ Lead generation
◎ Proactive issues monitoring
◎ For insurance companies, identify potential
customers by combining birth, marriage and
health data
Other use-cases
28. ● Ingest streaming data, possibly at high velocity
● Analyse and react immediately
To solve problems like...
◎ Identify changing trends in real-time
◎ Detect fraud
◎ Analyse policy violations and react immediately
◎ Reduce downtimes
◎ Provide better and quicker business decisions
Real-time Analytics - What is it?
29. Use-case - Enrich Customer Experience
◎ Get real-time feeds about customer location or
products being browsed
◎ Combine with historical user behaviours
◎ Roll out offers in real-time
30. ◎ Hospitality Industry
○ Bad weather reduces travel, which then
reduces overnight lodging
○ Combine weather data with flight
cancellation to identify stranded travellers
○ Offer hotel coupons based on nearby
location.
Other use-cases
31. ◎ Fraud detection
◎ Predict and enrich customer experience based on
location, lifestyle
◎ Real-time process visibility across an enterprise
◎ Suggest optimal routes based on current traffic
data
◎ Get player performance metrics in real-time to
substitute players at the right time
Other use-cases
33. ● Use sensors to detect low-level data
● Report the captured data to a server
● Analyse and get back to the user
To provide smart alerts and suggestions like
◎ Schedule maintenance of machines
◎ Your pulse rate is disproportionately increasing
◎ Medicines manufactured in a batch are not
complying with standards
IoT Based Smart Solutions - What is it?
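The sense, report, analyse loop above can be sketched for the pulse-rate alert. This is only an illustration under assumed values: the class name `PulseMonitor`, the five-reading window and the 1.3x ratio are invented for the example and are not medical guidance.

```python
from collections import deque
from statistics import mean


class PulseMonitor:
    """Flags a reading that is far above the average of recent readings."""

    def __init__(self, window=5, max_ratio=1.3):
        self.readings = deque(maxlen=window)  # most recent sensor readings
        self.max_ratio = max_ratio            # assumed alert threshold

    def add(self, bpm):
        """Record a reading; return an alert message or None."""
        alert = None
        # Only compare once a full window of baseline readings exists.
        if len(self.readings) == self.readings.maxlen:
            baseline = mean(self.readings)
            if bpm > baseline * self.max_ratio:
                alert = (f"Pulse {bpm} bpm is well above "
                         f"recent baseline {baseline:.0f} bpm")
        self.readings.append(bpm)
        return alert


# Simulated sensor readings, for illustration only.
monitor = PulseMonitor()
for bpm in [70, 72, 71, 69, 73, 74, 110]:
    message = monitor.add(bpm)
    if message:
        print(message)
```

Only the final reading of 110 bpm triggers a message here, because it exceeds 1.3 times the average of the five readings before it.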
35. ◎ In agriculture, sensors can detect crop health
along with geo data, so alerts can be sent to
farmers about where they need to focus
◎ In retail, smart-shelves can detect and send
alerts on when to replenish
◎ Smart home can analyze the patterns of each
family member and optimize energy usage
Other use-cases
37. What is machine learning?
◎ Machine learning is not programming a machine to
do stuff
◎ Machine learning is making the machine learn and
adapt based on the observed data
38. Where is machine learning used?
● Identify similarities between products, users
● Predict values from past data
● Classify items into categories, such as whether an email
is spam or not spam
in order to ...
◎ Predict expected outcome
◎ Categorize large amounts of data
◎ Optimize algorithms or paths
◎ Find similarities
◎ Improve quality of predictions continuously
40. Use-case - Recommending Products
◎ Compare thousands of
users/products with each other
to find similar “clusters”
◎ Content-based filtering -
Recommend similar products
to what customer has already
bought
◎ Find customers similar to the
current customer and
recommend what they
have bought
◎ Apply what are known as
clustering algorithms in
machine learning on Big Data
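The "find similar customers and recommend what they bought" step can be sketched as follows. Note that this uses a simple Jaccard similarity between purchase sets as a neighbour-based shortcut, not a full clustering algorithm; the customer names, products and function names are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of items two purchase sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(customer: str, purchases: dict, top_n: int = 3) -> list:
    """Recommend items bought by similar customers but not by this one."""
    mine = purchases[customer]
    scores = {}
    for other, items in purchases.items():
        if other == customer:
            continue
        similarity = jaccard(mine, items)
        if similarity == 0:
            continue  # ignore customers with nothing in common
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Made-up purchase history, for illustration only.
purchases = {
    "asha": {"phone", "case", "charger"},
    "ravi": {"phone", "case", "earphones"},
    "meera": {"laptop", "mouse"},
}
print(recommend("asha", purchases))  # prints ['earphones']
```

Comparing every customer with every other one does not scale to thousands of users, which is where clustering on a Big Data platform comes in.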
41. Use-case - Optimise team combination in Sports
◎ Choose best performing team with limited
budget
◎ It was first applied in Baseball, now many
professional games use these techniques
◎ Choose a team consisting of players who could
win at least enough games to make it to the
play-offs
◎ Use data analysis techniques to find undervalued
players
43. What did they achieve?
◎ An average of 90 wins each
season on less than $30M
◎ The same number of wins as
another team on one-third
of the budget
◎ 20 more wins than
another team with similar
budget
44. Other use-cases
◎ Fraud detection in banking and other sectors
◎ Fine-grained customer segmentation for targeted
products
◎ Predicting next product failure and sending a
replacement part in advance
◎ Predict best candidates
46. Why modernize Data Warehouse with Big Data?
A traditional Enterprise Data Warehouse (EDW)
◎ Stores only structured data
◎ Has an extremely expensive license cost per TB of storage
◎ Is capacity-constrained by ETL and query workloads
Big Data will help to...
◎ Store unstructured, semi-structured data
◎ Combine your structured data with other sources
◎ Run interactive SQL queries on big data
◎ Offload ETL workload from your EDW
◎ Offload less frequently used data from your EDW
◎ Save licensing costs
47. Use-case - Modernizing Data Warehouse
◎ Low cost storage for years of data
◎ Data lake for structured, unstructured and semi-
structured data
◎ Interactive queries on historic data
48. ◎ Online archival with reporting
○ Make years of data available
◎ ETL off-loading
○ Spark jobs to reduce ETL job time from hours
to minutes
◎ Batch reports off-loading
○ Reduce load on your warehouse by off-
loading batch reports
◎ Big Data Discovery
○ Proactively find patterns guided by the
system
Other use-cases
52. ◎ Identify sources of your unused data
○ like server logs
○ social streams
◎ Collect and store on cloud to minimize initial
investment
◎ Many cloud options like Amazon EC2,
Databricks, Altiscale...
◎ Use open-source analytics engines like
Elasticsearch and Kibana. They are free to use.
◎ Experience the success
◎ Automate using sensors or IoT devices to add
more sources of useful data
Start small and then scale