Manufacturers have an abundance of data from connected sensors, plant systems, manufacturing systems, claims systems, and external industry and government sources. At the same time, they face mounting pressure to continually improve product quality, reduce warranty and recall costs, and run their supply chains more efficiently. Giving the manufacturer a complete view of the product and the customer, for example, means integrating manufacturing and plant-floor data, as-built product configurations, and sensor data from customer use, so that warranty claims can be analyzed efficiently to shorten detection-to-correction time, detect fraud, and even get ahead of emerging issues. That requires a capable enterprise data hub that integrates large volumes of both structured and unstructured information. Learn how an enterprise data hub built on Hadoop provides the tools to support analysis at every level of the manufacturing organization.
The manufacturing sector was an early and intensive user of data to drive quality and efficiency, adopting information technology and automation to design, build, and distribute products since the dawn of the computer era. In the 1990s, manufacturing companies racked up impressive annual productivity gains because of both operational improvements that increased the efficiency of their manufacturing processes and improvements in the quality of products they manufactured. For example, advanced manufactured products such as computers became much more powerful. Manufacturers also optimized their global footprints by placing sites in, or outsourcing production to, low-cost regions. But despite such advances, manufacturing, arguably more than most other sectors, faces the challenge of generating significant productivity improvement in industries that have already become relatively efficient. We believe that big data can underpin another substantial wave of gains.
These gains will come from improved efficiency in design and production, further improvements in product quality, and better meeting customer needs through more precisely targeted products and effective promotion and distribution. For example, big data can help manufacturers reduce product development time by 20 to 50 percent and eliminate defects prior to production through simulation and testing. Using real-time data, companies can also manage demand planning across extended enterprises and global supply chains, while reducing defects and rework within production plants. Overall, big data provides a means to achieve dramatic improvements in the management of the complex, global, extended value chains that are becoming prevalent in manufacturing and to meet customers’ needs in innovative and more precise ways, such as through collaborative product development based on customer data.
No individual record is particularly valuable, but having every record opens the door to extreme value.
This sector generates data from a multitude of sources, from instrumented production machinery (process control), to supply chain management systems, to systems that monitor the performance of products that have already been sold (e.g., during a single cross-country flight, a Boeing 737 generates 240 terabytes of data). And the amount of data generated will continue to grow exponentially. The number of RFID tags sold globally is projected to rise from 12 million in 2011 to 209 billion in 2021. IT systems installed along the value chain to monitor the extended enterprise are creating additional stores of increasingly complex data, which currently tends to reside only in the IT system where it is generated. Manufacturers will also begin to combine data from different systems including, for example, computer-aided design, computer-aided engineering, computer-aided manufacturing, collaborative product development management, and digital manufacturing, and across organizational boundaries in, for instance, end-to-end supply chain data.
Key takeaway: This is not just a BI or analytics challenge; it is a challenge of how data is managed.
Keeping in mind the three main high-level objectives of an architecture built for data discovery (accessing data, analyzing data, and experimenting and iterating fast), we can examine a traditional architecture and see where organizations run into issues.
Questions for customer: Does this look like your architecture? What limitations are you “living with” today?
Limited Data Access
- Data silos
- Archived or deleted data
- No unstructured data
- Only SQL
Long Time to Value
- Resource-intensive ad hoc ELT; data must be converted to tables (SQL)
- Inflexible
- Adding dimensions takes months
- Slow large-scale queries
Sub-Optimal Decisions
- Limits on data sets
- Guessing?
- Missing critical items
- Frustrated users!
Key takeaway: An EDH provides the foundation to change the way you collect and manage data in order to give your analysts what they need in less time. No filters, no missing data!
ETL on the fly: Talk to schema-on-write vs schema-on-read (http://www.slideshare.net/awadallah/schemaonread-vs-schemaonwrite).
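A minimal sketch of the contrast for the talk track, assuming some raw warranty-claim files have already landed in HDFS (the path, table, and column names below are hypothetical, not from the deck): with schema-on-read, Hive simply points at the raw files and applies a schema at query time, so there is no up-front ETL or conversion to relational tables.

-- Schema-on-read: define an external table over raw delimited files in HDFS;
-- nothing is converted or loaded ahead of time.
CREATE EXTERNAL TABLE raw_claims (
  claim_id     STRING,
  product_sn   STRING,
  claim_date   STRING,
  failure_code STRING,
  claim_text   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw/claims';

-- The schema is applied only as the data is read, when the query runs.
SELECT failure_code, COUNT(*) AS claim_count
FROM raw_claims
WHERE claim_date >= '2014-01-01'
GROUP BY failure_code;

In a schema-on-write system, the same data would have to be modeled, converted, and loaded into relational tables before anyone could ask the first question.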
1) Unlimited Data Access (Active archive, Scalable storage, Unstructured data)
2) Reduce Time to Value (ETL on the fly, parallel processing, complete data access, flexible: any schema, any file)
3) Best Decisions (Decisions on all the data)
Pulling from the “Insights Section”
Why Hadoop slide content:
Even with primarily relational systems, it involved hundreds of sources
Getting a BI tool to connect to so many sources is … not fun
More often than not, we needed to understand a subset or aggregate of this data, not all of the data!
Can use Pig to process, extract, and filter the data
Can use Hive (a SQL-like query language) to query the data (see the sketch after these notes)
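As a rough illustration of the Pig/Hive point (the table and column names are invented for this sketch, not from the slide): rather than wiring a BI tool to hundreds of sources, the raw data can land in Hadoop and a Hive query can return just the subset or aggregate that is actually needed; the same extract-and-filter step could equally be written in Pig.

-- Aggregate only the slice of machine/sensor data we care about,
-- instead of connecting a BI tool to every source system.
SELECT plant_id,
       machine_id,
       AVG(temperature) AS avg_temp,
       MAX(vibration)   AS peak_vibration
FROM sensor_readings          -- hypothetical table over raw sensor logs
WHERE reading_date BETWEEN '2014-06-01' AND '2014-06-30'
GROUP BY plant_id, machine_id;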
Link to account record in SFDC: https://na6.salesforce.com/0018000000y2EIt?srPos=0&srKp=001
Omneo, a Division of Camstar, drives $15 to $25 million in annual savings for electronics manufacturers based on its ability to address supply chain issues in near real time.
Background: Today’s consumers have high expectations for the products they use every day, particularly their devices. They want new products to come out faster, at lower prices, and with more capabilities than before, but they also demand increased reliability. Camstar, a 30-year veteran in the enterprise manufacturing and supply chain space, saw this trend and identified an opportunity.
Challenge: Electronic device manufacturers are responsible for delivering millions of products, each comprising hundreds of components sourced from all over the globe, assembled, and pushed through distribution channels to customers. That leaves a lot of room for error. Camstar addressed this by spinning off a division called Omneo, which set out to build a 360-degree view of supply chain and product quality.
Solution: After evaluating IBM Netezza, Infobright, Cassandra, MongoDB, and Hadoop, Omneo decided to try out Hadoop based on 3 main factors:
Scalability to grow with customers’ needs over time
Flexibility to meet the needs of diverse customers and data sets in a multi-tenant environment
Low TCO for an efficient big data solution
The team downloaded Cloudera Express because it was easy to get started with and no one had prior experience with the technology. After a few months of promising results, Omneo decided to perform a TCO analysis of Cloudera vs. IBM Netezza and its legacy (Oracle) data warehouse. Cloudera’s costs came in 75% lower per TB than IBM Netezza and 90% lower per TB than the incumbent. But before moving forward with a Cloudera Enterprise subscription, the team compared the different Hadoop vendors. They ultimately decided to move forward with Cloudera due to 4 main factors:
Long-term company strategy and viability
Ease of use and maturity of Cloudera Manager
Enterprise-grade support
Dedication to open source
Omneo has deployed a multi-tenant enterprise data hub from Cloudera as the platform behind its supply chain cloud solution, which ingests machine data and existing system data from throughout the manufacturing process, including clients’ factory data, supplier data, field service data, after-market repair data, and re-manufacturing data. The company uses MapReduce to transform and manipulate data into any structure needed; HBase to access specific records in real time; and Cloudera Search to rapidly index all raw data in a way that makes sense for customers.
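A hedged sketch of one way record-level HBase data can also be opened up to SQL-style analysis, using Hive's standard HBase storage handler; the table and column-family names are invented for illustration and this is not a description of Omneo's actual implementation.

-- Map an existing HBase table of per-unit quality events into Hive,
-- so the same records served in real time can also be queried in bulk.
CREATE EXTERNAL TABLE unit_quality_events (
  unit_serial STRING,
  supplier    STRING,
  defect_code STRING,
  event_ts    STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ':key,supply:supplier,quality:defect_code,quality:event_ts'
)
TBLPROPERTIES ('hbase.table.name' = 'unit_quality_events');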
Results: Omneo’s supply chain SaaS delivers a 360-degree view of the supply chain process in seconds, allowing manufacturers to access their data in different ways, on the fly. If something happens at a supplier that drives a sudden increase in quality issues, manufacturers can figure out where the issue stems from, and why, in minutes or hours. In traditional environments, these investigations would take weeks or months. Instead of spending time trying to pinpoint problems, manufacturers can spend their time resolving them. Omneo’s clients report total annual savings between $15 and $25 million each, conservatively.
AMD improves yield predictions with a Cloudera-powered engineering data warehouse.
Background: Advanced Micro Devices (AMD) is a multinational semiconductor manufacturer that designs and builds graphics cards and microprocessors powering millions of the world's personal computers, tablets, gaming consoles, embedded devices, and cloud servers. All of the world’s leading PC and major video game console manufacturers have AMD technology inside. AMD relies on manufacturing test data to ensure product quality and perform engineering analysis in order to improve upon its world-class product designs.
Challenge: The company wanted to empower its engineers by giving them access to larger data sets at faster speeds. But the incumbent environment stored less than 30% of available data elements, was built with several different integration tools, involved many integration steps, and relied on a large IT team to support and maintain it. In 2011, an environment outage took weeks to recover from, so AMD initiated an Engineering Data Warehouse (EngDW) project to find a more agile, cost-effective solution and a simpler, more robust way to store, process, and fetch larger amounts of data for AMD’s engineers.
Solution: The semiconductor manufacturer replaced its legacy engineering data warehouse with the Dell Cloudera Solution for Apache Hadoop. AMD runs a 34-node production cluster today, which collects data throughout the manufacturing process. Hundreds of millions of new digital and parametric test readings are loaded to the cluster every day. At the heart of the EngDW project are CDH and HBase. A custom query engine reads from HBase to put the test measurements in the hands of the company’s engineers.
Results: AMD's decision to move from an RDBMS to a Hadoop platform that uses Cloudera on Dell servers powered by AMD Opteron processors has resulted in order-of-magnitude performance improvements in both data loads and analytics.
Query times are up to 3x faster, running on larger data sets than before. 99% of all queries execute in 15 minutes or less, with a median execution time of just 23 seconds.
Queries on hundreds of thousands of units execute two orders of magnitude faster than before.
Data reloads at a rate of three months' worth per day, whereas it used to take a full day to reload just 1.5 days' worth of data: a 60x speedup.
Not only has AMD's EngDW project brought significant performance benefits, but it delivers greater functionality and value as well. Query results on EngDW now have no row limit, compared to the previous cap of just 100,000 rows (which had been set to ensure queries would return results in a given period of time). The EngDW project's Hadoop-based cluster allows AMD to store more than 90% of available data elements spanning more than 1.5 years of history, whereas the previous system stored less than 30% of available data for only three to four months of history. Now that AMD engineers can access greater amounts of test data in higher detail and at faster speeds, they can apply insights to debug and make continuous improvements to ensure their products meet customer needs.
AMD has also significantly reduced the TCO of its EngDW through lower vendor support costs for relational database management software, less vendor support for data integration tools and software, fewer steps and tools needed for data integration, less vendor support for high-end storage arrays (external SAN storage), and a smaller IT support staff needed for end-to-end management.
Today we're in the middle of a shift in how businesses use information. In the past, you'd define a set of business processes, build applications around each of them, and then go about gathering, conforming, and merging the necessary data sets to support those applications. From an infrastructure perspective, you'd be bringing the data over to the compute, often in relational databases. But you'd be leaving quite a lot on the table.
The modern realities of business demand a new approach. Today companies need, more than ever, to become information-driven, but given the amount and diversity of information available, and the rate of change in business, it's simply unsustainable to keep moving around and transforming huge volumes of data.
Pricing Data: Cloudera: HW + SW per-year list prices for Basic through EDH at various configurations
Old Way: Various sources. One of note:
- Cowen / Goldmacher coverage initiation of Teradata, June 17, 2013
- List price of high-end appliance (which he thinks is more comparable to our solution) is $57K/TB + maintenance for an annual cost of $39K/TB
- Prices have likely decreased, but we estimate they are still in excess of $30K/TB/year
- List price of their low-end appliance is $12K/TB plus maintenance, or $8K per year
Cloudera partners more broadly and deeply across the Hadoop ecosystem than any other vendor. With over 1200 partners and counting, our partnerships offer:
Compatibility with your existing tools and skills
160+ certified on Cloudera 5, including all 12 Gartner Business Intelligence Magic Quadrant leaders
Flexible deployment options
On-premises
Public, private, or hybrid cloud
Appliances and engineered systems
Partnerships you can trust
Deep engineering relationships
Comprehensive certification program