What is Big Data: Big Data is the information generated and requested as over 2.5B people, billions of devices, and billions more sensors and intelligent connected systems come onto the Internet. This data will keep growing rapidly as more devices and users join online services. Tools for near-real-time processing of this data are maturing alongside updated infrastructure such as the cloud, embedded IT infrastructure, and storage. The "Vs" (volume, velocity, and variety) classify the different actions and behavior on this information. Ultimately the goal is the 4th V, Value: the quickest and most confident time-to-value response for a given set of requests and conditions in the digital environment you are connected to.
If harnessed and managed correctly, the impact of Big Data insights is monumental. Imagine a world where person-to-person and machine-to-machine analysis and understanding is available at your fingertips for business, social, and ecological purposes. The efficiency gains for the cities we live in and love to visit would be tremendous.
Smart City Project: The Smart Cities project is creating an innovation network between cities and academics to develop and deliver better services to citizens and businesses.
Example: “In Rio de Janeiro, IBM has developed one of the most ambitious urban management command centers that have ever arisen. Rio’s operation center is a large situation room that offers real time data of the different systems that govern the city. This integrates, from 400 cameras on the street and a wide network of sensors, 32 municipal agencies to exchange information in order to solve urgent crises such as a power outage, an endless rain or a traffic jam, taking decisions instantly.” (source: http://www.paseoproject.eu/en/imaging-smart-city-minds/)
Local example as of 7/11: Taipei has become the first “smart city with free Internet access” among the world's international cities. All people in Taiwan, whether they are residents or visitors from foreign countries or mainland China, can take advantage of the system, Hau said. (source: www.chinapost.com.tw/taiwan/local/taipei/2011/07/02/308329/Taipei-flips.htm)
Taobao.com (Taobao: Cutting Costs and Saving Power)
• Established in 2003, Taobao.com was financed by Alibaba Group.
• Taobao.com is the largest online shopping platform, with a market segment share of over 80 percent of the e-business in China.
• Taobao.com has more than 800 million pieces of product information and over 370 million registered users. It is one of the top 20 websites by number of page views worldwide, with over 60 million visits every day.
Manufacturing and Retail, per the McKinsey Global Institute report “Big data: The next frontier for innovation, competition, and productivity” (source: https://www.box.com/s/b8ffbe1253be764ec33b)
Retail info: In the coming years, the continued adoption and development of big data levers have the potential to further increase sector-wide productivity by at least 0.5 percent a year through 2020. Among individual firms, these levers could increase operating margins by more than 60 percent for those pioneers that maximize their use of big data… US online and Web-influenced retail sales are forecast to become more than half of all sales by 2013… The volume of data is growing inexorably as retailers not only record every customer transaction and operation but also keep track of emerging data sources such as radio-frequency identification (RFID) chips that track products, and online customer behavior and sentiment.
A specific example is Amazon’s results from their “you might also want” prompts (note: I dropped the Amazon-specific attribution since we did not get it directly from them).
Manufacturing info: Manufacturing stores more data than any other sector: close to 2 exabytes of new data stored in 2010. This sector generates data from a multitude of sources, from instrumented production machinery (process control), to supply chain management systems, to systems that monitor the performance of products that have already been sold (e.g., during a single cross-country flight, a Boeing 737 generates 240 terabytes of data).
Example: Product lifecycle management (PLM) becomes a platform for “co-creation”: OEMs and part suppliers can collaborate on design online. Toyota, Fiat, and Nissan have all cut new-model development time by 30 to 50 percent; Toyota claims to have eliminated 80 percent of defects prior to building the first physical prototype.
Additional context: Big data is driving additional efficiency in the production process through the application of simulation techniques to the already large volume of data that production generates. The increasing deployment of the “Internet of Things” is also allowing manufacturers to use real-time data from sensors to track parts, monitor machinery, and guide actual operations.
When the Pentium Pro processor was introduced back in 1995, we shipped fewer than a million servers based on Intel processors, and less than 10% of the revenue spent on all server hardware went to Intel architecture; the other 90% went to proprietary architectures. What you’ve seen since is dramatic growth in the total volume of servers, with a significant portion of that driven by Intel-based processors. Today the industry ships, per IDC, a little over 8 million servers a year, with 8 out of 10 of those servers based on Intel. Little did we know back in 1995 that we were one of the key ingredients coming together to enable the transformation of the internet and the growth of the World Wide Web: a standard high-volume server that let internet users scale in a cost-effective manner, combined with the standards of the time (HTML, HTTP) and with software such as Apache web servers and Netscape browsers. All of these factors converged to create the internet phenomenon and drive that growth. Of course we didn’t forecast that with the Pentium Pro processor, but we’re very proud to have been a part of it.
Intel Xeon-based servers have long been adopted by IT shops as the leading compute building blocks. The newest and most popular Xeon-based servers today are based on the Xeon E5-2600 processor series, which offers advanced capabilities that simplify and save.
To meet the growing demands of IT, such as readiness for cloud computing, growth in users, and the ability to tackle the most complex technical problems, Intel has focused on increasing the capabilities of the processor that lies at the heart of a next generation data center. The Intel Xeon processor E5-2600 product family is the next generation Xeon processor, replacing platforms based on the Intel Xeon processor 5600 and 5500 series. These processors offer better-than-ever performance no matter what your constraint is (floor space, power, or budget) and on workloads that range from the most complicated scientific exploration to simple, yet crucial, web serving and infrastructure applications. Building on the success of its Xeon 5600 predecessor, the E5-2600 product family has increased processor core count and cache size, and supports more efficient instructions with Intel® Advanced Vector Extensions, to deliver up to an average of 80% more performance across a range of workloads. In addition to the raw performance gains, we’ve invested in improved I/O with Intel Integrated I/O, which reduces latency by ~30% while adding more lanes and higher bandwidth with support for PCIe 3.0. This helps reduce network and storage bottlenecks to unleash the performance capabilities of the latest Xeon processor. Deploying the E5-2600 can reduce your total cost of ownership by up to 66% via savings in utilities, software support, and maintenance. Check out the online TCO tool to estimate savings in your specific situation. Now let’s turn to storage…
I want to highlight four of our key enabling programs.
With 7,000 member companies, the Intel® Software Partner program helps support optimization efforts by concentrating resources, references, and tools for software optimization into key Technology Focus Areas.
Through the Intel® Software Network, developers can connect with a plethora of communities, tools, training, events, and more to do their jobs better and to deliver more efficient, higher-performing software to the market more quickly. These communities include mobility, open source, virtualization, visual computing, multi-threading, manageability, Intel® Atom™ processors, and more.
The Intel Academic Community provides online training and supplies higher education institutions with technical curricula and other resources.
The Intel AppUp(SM) development program, mentioned previously, provides the tools, resources, and support developers need to easily create, port, package, and sell apps for multiple device platforms worldwide through the Intel AppUp center and 20+ affiliate stores.
Other enabling efforts include a customer response team and high-touch enabling, which might involve on-site engineers, to speed development.
The more misspelled words you collect, the better your spellcheck application becomes.
Individual engineers are empowered to find answers in the vast data logs, driving innovation and value.
Key points:
• The Intel® Xeon® processor E5-2600 product family offers ~80% higher performance on key industry benchmarks
• On some synthetic benchmarks focused on technical computing we are seeing even higher results (over 2X), but typical user experience should see improvements closer to 80%
• Comparisons are top-bin 5600 to top-bin E5-2600 in 2S configuration
Story: Compared to the best Intel Xeon processor 5600 series part, the Intel Xeon processor E5-2600 product family offers significant performance improvements across a range of workloads. What you’ll see on this slide is a cross section of enterprise and technical computing workloads that gives a flavor of the scale of benefits a typical user would see. For example, integer throughput (SPECint_rate), a good proxy for a typical enterprise server workload, shows ~70% improvement, while technical computing workloads (think everything from supercomputers crunching advanced physics problems to workstations rendering media content) are seeing even stronger results. You may notice that I’ve said we’re seeing up to 80% performance improvement but have actually measured results over 2X. Those results come from specific, synthetic benchmarks that test individual elements of the processor, such as STREAM, which measures memory bandwidth, and LINPACK, which is used to rank supercomputers’ theoretical computational power. We’re seeing fantastic results there based on the latest microarchitecture, but when I talk to you about performance I want to set the right expectation about what you should see when you run your applications, which use every part of the server and not just narrow elements. So what enables us to deliver this kind of performance improvement vs. the prior generation, even though these parts are on the same manufacturing technology?
15h00 intel - intel big data for aws summits rev3
The Disruption of Big Data
Agenda
• Big Data: what is it?
• Hardware Economics & Big Data Implications
• Benefits of Intel® Inside
• Customer Case Studies
Data Growth Phenomenon
• 750M: Photos uploaded to Facebook over 2011 New Year’s weekend
• 966PB: Data stored in manufacturing as of 2009
• 200M: Tweets sent every day in August 2011
• 6.7PB: Video generated every day in a Smart City project in China
• $20B+: Spent on acquisition of data storage, management, and analysis companies in last 12 months
• $60B: Potential annual value from Big Data to US health care
• $100B+: Value for service providers from global personal location data
• 50%: Decrease in product development, assembly costs for manufacturing
Source: McKinsey Global Institute Analysis
What is Big Data?
• Volume. Traditional Data: Gigabytes to Terabytes. Big Data: Petabytes and beyond.
• Velocity. Traditional Data: Occasional Batch - Complex Event Processing. Big Data: Real-Time Data Analytics.
• Variety. Traditional Data: Centralized, Structured (i.e. Database). Big Data: Distributed, Unstructured, Multi-format.
Why is Big Data Important?
• Smart City Project: Improve Public Safety, Boost Economic Growth
• Up to 50% Decrease in Product Development and Assembly Costs [1]
• Online Retailer Generated 30% of Sales Due to Analytics Driven Recommendations [1]
• Generate Revenue from Data Analytics of B2B Sales
Data is the Raw Material of the Information Age
[1] McKinsey Global Institute Analysis
*Other brands and names are the property of their respective owners.
Changing Economics for Big Data Challenges
[Chart: Annual Server Unit Shipments, 1990 to 2000]
[Chart: Supercomputing Performance and $/GFLOP, 2000 to 2010]
• Supercomputing in 1997: ~1 TFLOP at ~$55K/GFLOP
• Supercomputing in 2010: >500 TFLOPS at <$100/GFLOP
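The size of the shift on this slide can be worked out directly from its two data points. The sketch below is only an illustration: the variable names are made up for this example, and because the slide gives approximate or bounded figures ("~", ">", "<"), the results should be read as rough lower bounds rather than measured values.

```python
# Figures read off the slide; they are approximations or bounds, not exact values.
COST_PER_GFLOP_1997_USD = 55_000.0  # "~$55K/GFLOP"
COST_PER_GFLOP_2010_USD = 100.0     # "<$100/GFLOP" (upper bound)
PERF_TFLOPS_1997 = 1.0              # "~1 TFLOP"
PERF_TFLOPS_2010 = 500.0            # ">500 TFLOPS" (lower bound)


def ratio(larger: float, smaller: float) -> float:
    """Return how many times `larger` exceeds `smaller`."""
    return larger / smaller


# Both ratios are lower bounds because the 2010 figures are bounds.
cost_drop = ratio(COST_PER_GFLOP_1997_USD, COST_PER_GFLOP_2010_USD)
perf_gain = ratio(PERF_TFLOPS_2010, PERF_TFLOPS_1997)

print(f"Cost per GFLOP fell by at least {cost_drop:.0f}x")
print(f"Top-end performance rose by at least {perf_gain:.0f}x")
```

Taken at face value, the slide's numbers imply that both the price per GFLOP and the available performance moved by a factor of several hundred between 1997 and 2010.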
The Heart of a Next Generation Cloud
Intel® Xeon® E5: The Cloud’s primary building block
• Up to 80% performance boost vs. prior gen [1] at consistent power level
• Dramatically reduce compute time with Intel® Advanced Vector Extensions
• Performance when you need it with Intel® Turbo Boost Tech 2.0
• Up to 66% reduction in total cost of ownership [1]
Delivers 100X Performance Boost since 2000
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
[1] Over previous generation Intel® processors. Intel internal estimate. For more legal information on performance forecasts go to http://www.intel.com/performance
[2] Intel measurements of average time for an I/O device read to local system memory under idle conditions. Improvement compares Xeon E5-2600 family vs. Xeon 5600 series
[3] Source: Intel internal analysis (backup); 2008 of 3 yr TCO. 4X power efficiency of 4 year old servers. See www.intelsalestraining.com/xeonestimator9
[4] Intel. As reported at Q1’12 Intel earnings call.
Intel® Xeon® Processors: Solve the Most Important Problems of Any Scale
373 of Top500* supercomputers are powered by Intel® Architecture
Enabling a Vibrant Ecosystem
Intel Software and Services
• Engaging 7,000 ISVs via Software Partner Program
• Intel Software Network: Providing resources to >8.3M developers
• Provisioning >2800 academic institutions with curricula, tools, training and research
• Support Open Source
Enabling AWS to run your choice of OS, Applications & Programming Languages
Amazon Web Services powered by Intel® Xeon® processor E5-2670
• Access to Supercomputing On-demand
• Latest Intel® Xeon® performance enhancements without disruption
• Business Agility to Efficiently Perform Data Intensive Tasks in Less Time
Intel® Powered Supercomputer at AWS
• AWS built the 42nd fastest supercomputer in the world
• 1,064 Amazon EC2 CC2 instances with 17,024 cores
• 240 teraflops cluster (240 trillion calculations per second)
• Less than $1,000 per hour
• Based on Intel® Xeon® processor E5-2670
Supercomputers by the Hour … for Everyone
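The cluster figures on this slide can be broken down into per-instance and per-core numbers with simple division. This is a back-of-the-envelope sketch, not an AWS pricing or benchmark calculation: the variable names are illustrative, and the cost figures are upper bounds because the slide only says "less than $1,000 per hour".

```python
# Cluster figures as stated on the slide.
INSTANCES = 1_064                # Amazon EC2 CC2 instances
CORES = 17_024                   # total cores in the cluster
CLUSTER_TFLOPS = 240.0           # "240 teraflops cluster"
MAX_COST_PER_HOUR_USD = 1_000.0  # "Less than $1,000 per hour" (upper bound)

# Per-unit breakdowns derived by division only.
cores_per_instance = CORES / INSTANCES
gflops_per_core = CLUSTER_TFLOPS * 1_000 / CORES
max_cost_per_instance_hour = MAX_COST_PER_HOUR_USD / INSTANCES
max_cost_per_tflop_hour = MAX_COST_PER_HOUR_USD / CLUSTER_TFLOPS

print(f"Cores per instance:          {cores_per_instance:.0f}")
print(f"GFLOPS per core:             {gflops_per_core:.1f}")
print(f"Cost per instance-hour:    < ${max_cost_per_instance_hour:.2f}")
print(f"Cost per teraflop-hour:    < ${max_cost_per_tflop_hour:.2f}")
```

On these numbers each instance contributes 16 cores, and the whole cluster works out to under a dollar per instance-hour.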
Intel® & AWS deliver scale that lowers your cost
AWS Scale & Innovation… …Drives Customers’ Costs Down
[Cycle diagram: Attract More Customers, Invest in Capital, Invest in Technology, Improve Efficiency, Reduce Prices]
19 AWS Price Reductions
Fueling Innovation in the Cloud
Business Agility
Experiment Often & Fail Quickly with AWS on Intel
• Cost of failure falls dramatically
• People are free to try out new ideas
• More risk taking, more innovation
Example use cases:
• Life Science
• Log analytics
• Social Networking
With Nimbus Discovery, looking at a cancer drug target:
• Completed 12.55 Compute Years of Work
• Analyzed ~21 Million Ligands
• In only 3 hours, at a cost of $4828.85 / hour
• Instead of $20+ Million in infrastructure
Intel & AWS make impossible Big Science, possible
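The scale of this run can be estimated from the slide's own numbers. The sketch below rests on two assumptions that the slide does not state: a "compute year" is taken to mean one processing unit running for a 365-day year, and the hourly rate is taken to apply to all 3 hours. The variable names are illustrative only.

```python
# Figures as stated on the slide.
COMPUTE_YEARS = 12.55
WALL_CLOCK_HOURS = 3.0
COST_PER_HOUR_USD = 4828.85

# Assumption: one compute year = one processing unit busy for 365 days.
HOURS_PER_YEAR = 365 * 24

compute_hours = COMPUTE_YEARS * HOURS_PER_YEAR
total_cost_usd = WALL_CLOCK_HOURS * COST_PER_HOUR_USD

# Average number of units that must run side by side to finish in 3 hours.
effective_parallelism = compute_hours / WALL_CLOCK_HOURS
cost_per_compute_hour = total_cost_usd / compute_hours

print(f"Total compute hours:      {compute_hours:,.0f}")
print(f"Total cost of the run:    ${total_cost_usd:,.2f}")
print(f"Effective parallelism:    ~{effective_parallelism:,.0f} units")
print(f"Cost per compute hour:    ${cost_per_compute_hour:.3f}")
```

Under those assumptions the run cost roughly $14,500 in total and needed tens of thousands of processing units working at once, which is the contrast the slide draws with buying $20+ million of infrastructure up front.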
Weblog Analysis Suggests What You Are Searching For
Better consumer experience through Big Data analysis
Power and Simplicity of AWS on Intel® Xeon® Processors: Speeds your Time to Market
Tick-Tock Development Model
Sustained Xeon® Microprocessor Leadership
Process generations (alternating Tick and Tock): 65nm, 45nm, 32nm, 22nm
Intel® Core™ Microarchitecture
• First high-volume server Quad-Core CPUs
• Dedicated high-speed bus per CPU
• HW-assisted virtualization (VT-x)
Nehalem Microarchitecture
• Up to 6 cores and 12MB Cache
• Integrated memory controller with DDR3 support
• Turbo Boost, Intel HT, AES-NI
• End-to-end HW-assisted virtualization (VT-x, -d, -c)
Sandy Bridge Microarchitecture
• Up to 8 cores and 20MB Cache
• Integrated PCI Express
• Turbo Boost 2.0
• Intel Advanced Vector Extensions (AVX)
Intel® Xeon® Processor E5-2600 Product Family
Historical 2S Integer Throughput Performance
[Chart: Integer Throughput Performance, baseline score by Xeon generation since 2000, from single core through dual core, quad core, six core, and eight core; higher is better]
Intel® Xeon® Delivers 100X Boost in 2S Integer Throughput Performance since 2000
Source: Intel Internal Assessment and Estimates. For more information go to http://www.intel.com/performance