To learn more, visit pivotal.io/big-data/pivotal-greenplum-database.
Daum Communications Case Study
DAUM COMMUNICATIONS
Using big data analytics to understand and predict user behavior
ESSENTIALS
Industry
Telecommunications
Company Size
2,000+ employees
Business Challenges
• Reduced responsiveness due to inability to perform real-time analysis
• Increased complexity from NoSQL database management systems
• Reliance on resource-intensive data analysis
• Reduced capability to make ad-hoc queries on unstructured data
Solution
• EMC VNX unified storage
• Pivotal Greenplum Database
OVERVIEW
Daum Communications (Daum) is one of the leading providers of Korean-language
online services, including the news and information portal Daum.net, web-based email
service Hanmail.net, and the Daum Cafe online community. Headquartered on Jeju
Island, the company provides mobile web services, search marketing, and electronic
mapping. It also sells online advertising products through Daum.net. Daum is the
second largest web portal service provider in terms of daily visits in Korea and has
operating centers in Seoul and on Jeju Island.
Through its extensive range of Internet services and sale of online advertising
products, Daum generates vast amounts of unstructured data. The company has one
of the largest Apache Hadoop clusters in Korea, and analyzes its data to gain critical
competitive information in a number of areas, including user preferences and
behavior, search rankings, and advertisement targeting.
COMPLEX ENVIRONMENT IMPEDES DATA ANALYSIS
Facing intense domestic and global competition from a number of search engines that
are growing market share across desktop and mobile searches, Daum’s businesses
needed to make faster and better decisions to protect the company’s 20 percent share
of the Korean search market.
The company needed to analyze its vast data stores and make immediate decisions by
extracting knowledge from the data in real time. Daum was more interested in solving
analytic problems than in exploring the relationships between data that traditional
relational database systems make available. As a result, Daum was using Hadoop and the
Hadoop Distributed File System (HDFS) to store data, alongside non-relational systems
such as the Cassandra NoSQL database and the Storm stream-processing engine, to provide
greater speed in performing Big Data analytics on unstructured data. This solution
landscape presented the company with serious challenges.
“Performing ad-hoc and multidimensional queries and analysis through Hadoop on our
unstructured data proved difficult,” says Jun-Sik Eom, Team Manager, Data
Technology Department, Daum Communications. “We were restricted in the speed of
data analysis due to the batch processing of both unstructured and structured data,
which meant we relied heavily on the capability of our developers. Data analysis of
complex forms was also challenging in the NoSQL database.”
Because Daum’s data must be constantly reviewed, the company sought a solution
that would enable employees to perform high-speed queries on the data residing in
Hadoop. Additionally, Daum wanted to improve access through tools that were already
familiar to developers and database administrators.
CUSTOMER PROFILE
Benefits
• Increased data loading and processing speeds
• Improved accuracy in generating search results and predicting user behavior
• Increased efficiency by performing rapid queries on the data
• Reduced expenditures through improved scalability
PIVOTAL GREENPLUM DATABASE ENABLES HIGH-SPEED ANALYSIS OF UNSTRUCTURED DATA
Daum evaluated solutions that could address the limitations in the resource-intensive
analysis required by Hadoop and the NoSQL database management systems. To meet
the data analysis requirements for its search engine and Internet services businesses,
the company selected Pivotal Greenplum Database, which connects to Hadoop and
enables the co-processing of both structured and unstructured data within a single
solution.
“We were attracted to Pivotal Greenplum Database because of the advantage it had in
mixing the merits of database, data warehouse, and business intelligence,” says Eom.
“We can now use a single platform to run high-speed analytic queries on our most
appropriate data stores.”
DELIVERING NEW BUSINESS INSIGHTS FROM REALTIME ANALYSIS
To support its efforts to gain market share, Daum is using Pivotal Greenplum Database
to provide improved services and search accuracy to its users. Through realtime data
gathering and analysis of Internet searches and user behavior within its various online
services, the company can better predict future behavior and demand.
Daum can now run multiple queries, both in real time and over time as user patterns
and knowledge emerge, because of the database's massively parallel processing (MPP)
architecture, which enables fast data loading and high-speed queries on the data. In
addition to performing real-time weblog analysis, the company can re-analyze data that
has already been processed and draw meaningful results from the different
interpretations. Pivotal helped Daum achieve an increased depth of knowledge, which is
just as critical as breadth in terms of delivering services.
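
As a rough illustration of the MPP idea described above, the following Python sketch hash-distributes a handful of weblog rows across several segments, lets each segment compute a partial count of search terms, and then merges the partial results. The segment count, the sample rows, and every function name here are assumptions made for illustration only; the sketch does not use or reproduce Greenplum's implementation or API, and threads merely stand in for separate segment hosts.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SEGMENTS = 4  # hypothetical number of segments, chosen for illustration

# Hypothetical weblog rows: (user_id, search_term)
SAMPLE_ROWS = [
    ("u1", "weather"),
    ("u2", "news"),
    ("u3", "weather"),
    ("u1", "maps"),
    ("u4", "news"),
    ("u5", "weather"),
]


def distribute(rows, num_segments=NUM_SEGMENTS):
    """Assign each row to a segment by hashing its user_id."""
    segments = [[] for _ in range(num_segments)]
    for user_id, term in rows:
        index = crc32(user_id.encode("utf-8")) % num_segments
        segments[index].append((user_id, term))
    return segments


def segment_count(segment_rows):
    """Partial aggregate for one segment: occurrences of each search term."""
    return Counter(term for _, term in segment_rows)


def parallel_term_counts(rows, num_segments=NUM_SEGMENTS):
    """Run the partial aggregates concurrently, then merge them."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(segment_count, segments))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge step, done once all segments finish
    return total


if __name__ == "__main__":
    print(parallel_term_counts(SAMPLE_ROWS).most_common())
```

Because each segment only sees its own slice of the data, the per-segment work shrinks as segments are added, and only the small partial results need to be combined at the end.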
ELIMINATING ROADBLOCKS TO SPEEDY QUERYING
Running ad-hoc queries from Pivotal Greenplum Database on the data stored in NoSQL
databases means administrators can use familiar SQL commands to perform large-scale,
multidimensional analysis. This reduces the company’s reliance on specialist NoSQL and
Hadoop skill sets, and minimizes the workload for employees.
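
To make the idea of an ad-hoc, multidimensional SQL query concrete, here is a minimal sketch that groups search events by two dimensions at once. It uses Python's built-in sqlite3 module purely as a stand-in for a SQL database; the weblog table, its columns, and the sample rows are hypothetical and are not taken from Daum's environment or from Greenplum.

```python
import sqlite3

# Hypothetical weblog rows: (user_id, service, device, query)
SAMPLE_EVENTS = [
    ("u1", "search", "mobile", "weather"),
    ("u2", "search", "mobile", "news"),
    ("u1", "search", "desktop", "maps"),
    ("u3", "cafe", "mobile", "recipes"),
    ("u3", "cafe", "mobile", "travel"),
]


def build_sample_db(events=SAMPLE_EVENTS):
    """Create an in-memory database holding the illustrative weblog table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weblog (user_id TEXT, service TEXT, device TEXT, query TEXT)"
    )
    conn.executemany("INSERT INTO weblog VALUES (?, ?, ?, ?)", events)
    conn.commit()
    return conn


def searches_by_service_and_device(conn):
    """Ad-hoc query: event and distinct-user counts per (service, device)."""
    sql = """
        SELECT service,
               device,
               COUNT(*) AS events,
               COUNT(DISTINCT user_id) AS users
        FROM weblog
        GROUP BY service, device
        ORDER BY service, device
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    connection = build_sample_db()
    for row in searches_by_service_and_device(connection):
        print(row)
    connection.close()
```

The point of the sketch is that a new question, such as adding another dimension or filter, only requires changing the SQL text rather than writing a new batch job.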
“One of the most important elements in effectively using Big Data is securing the right
people,” says Eom. “We used to struggle with having the resources needed to perform
queries, which greatly reduced our processing efficiency. Today, instead of performing
queries on the NoSQL systems, we collect the data residing in Hadoop and NoSQL, and
then save it in Pivotal Greenplum Database to execute the analysis.”