Speaker: Product Manager, Data Value-Added Application Development Department, SYSTEX | 陶靖霖
Session abstract: Face reality! Big Data is a buzzword and a hot topic, but the core problems still revolve around data-processing workflows, architectures and technologies. What challenges will users run into when they step into the Big Data field? Splunk has been hailed as "the world's best Big Data company": what unique technical advantages does it bring to the data-processing pipeline that help users overcome these challenges, and which applications have successfully helped users extract value from their data? Join us to get to know Splunk and global Big Data success stories.
Numerous awards:
- Big Data Innovator
- 2013 SIEM Magic Quadrant: Leader
- 2012 Security Market Growth: #1 Worldwide
- 2012 IT Operations Market Growth: #3 Worldwide
- Best SIEM, North America
- Best Enterprise Security Solution, EMEA
- #1 Most Innovative
Search your IT infrastructure
[Slide: Splunk search UI showing example queries such as "J2EE exception" over the last 60 minutes and "fail* password sshd" over the last 30 minutes, with a time-range picker spanning "Last 30 minutes" through "All time"]
Quickly explore massive data with keyword search
Splunk is a data engine for your machine data. It gives you real-time visibility and intelligence into what’s happening across your IT infrastructure – whether it’s physical, virtual or in the cloud.
Everybody now recognizes the value of this data; the problem, until now, has been getting at it.
At Splunk we applied the search engine paradigm to rapidly harnessing any and all machine data, wherever it originates. The "no predefined schema" design means you can point Splunk at any of your data, regardless of format, source or location. There is no need to build custom parsers or connectors, there's no traditional RDBMS, and there's no need to filter and forward.
Here we see just a sample of the kinds of data Splunk can ‘eat’.
Reminder – what’s the ‘big deal’ about machine data? It holds a categorical record of the following:
User transactions
Customer behavior
Machine behavior
Security threats
Fraudulent activity
You can imagine that a single user transaction can span many systems and sources of this data, or a single service relies on many underlying systems. Splunk gives you one place to search, report on, analyze and visualize all this data.
Getting data into Splunk is designed to be as flexible and easy as possible. Because the indexing engine is so flexible and generally doesn't require configuration for most IT data, all that remains is deciding how to collect and ship the data to your Splunk instance. There are many options.
First, you can collect data over the network, without an agent. The most common network input is syslog; Splunk is a fully compliant, customizable syslog listener over both TCP and UDP. Further, because Splunk is just software, any remote file share you can mount or symlink to via the operating system is available for indexing as well. To facilitate remote Windows data collection, Splunk has its own WMI query tool that can remotely collect Windows Event logs and performance counters from your Windows systems. Finally, Splunk has an AD monitoring tool that can connect to AD and get your user metadata to enhance your searching context, and monitor AD for replication, policy or user security changes.
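As a concrete illustration of the agentless inputs described above, here is a minimal inputs.conf sketch that opens a UDP syslog listener and monitors a local log file. The port, paths and sourcetype values are assumptions for illustration; adapt them to your environment.

```ini
# inputs.conf (sketch) - hypothetical values, adjust to your deployment

# Listen for syslog over UDP on the standard port 514
[udp://514]
sourcetype = syslog
connection_host = ip

# Monitor a local (or mounted remote) log file
[monitor:///var/log/messages]
sourcetype = syslog
```

No custom parser is needed; Splunk timestamps and indexes the events as they arrive.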
When Splunk is running locally as an indexer or lightweight forwarder, you have additional options and greater control. Splunk can directly monitor hundreds or thousands of local files, index them and detect changes. Additionally, many customers use our out-of-the-box scripts and tools to generate data – common examples include performance polling scripts on *nix hosts, API calls to collect hypervisor statistics and for detailed monitoring of custom apps running in debug modes. Also, Splunk has Windows-specific collection tools, including native Event Log access, registry monitoring drivers, performance monitoring and AD monitoring that can run locally with a minimal footprint.
A single indexer can index 100-200 gigabytes per day, depending on the data sources and the load from searching. If you have terabytes a day, you can linearly scale a single, logical Splunk deployment by adding index servers, using Splunk's built-in forwarder load balancing to distribute the data and using distributed search to provide a single view across all of these servers. Unlike some log management products, you get fully consolidated reporting and alerting, not simply merged query results.
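The forwarder-side load balancing mentioned above is configured in outputs.conf. A minimal sketch, assuming two hypothetical indexer hostnames on the default receiving port:

```ini
# outputs.conf (sketch) - hostnames are placeholders

[tcpout]
defaultGroup = indexer_pool

# The forwarder automatically balances events across these indexers;
# distributed search then provides a single view over all of them.
[tcpout:indexer_pool]
server = idx1.example.com:9997, idx2.example.com:9997
```

Adding capacity is then a matter of standing up another commodity indexer and appending it to the server list.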
When in doubt, the first rule of scaling is "add another commodity indexer." Splunk indexers are designed to enable nearly limitless fan-out with linear scalability by leveraging techniques like MapReduce to fan out work in a highly efficient manner.
To address some of the challenges, we released Splunk Hadoop Connect in October last year. This enables bi-directional integration: users can browse data, move it into Splunk and act on it. Since launch we've seen nearly 1,000 downloads (as of June 2013).
Hunk offers Full-featured Analytics in an Integrated Platform
Explore, analyze and visualize data, create dashboards and share reports from one integrated platform.
Hunk enables everyone in your organization to unlock the business value of data locked in Hadoop
Hunk integrates the processes of data exploration, analysis and visualization into a single, fluid user experience designed to drive rapid insights from your big data in Hadoop. Enable powerful analytics for everyone with Splunk’s Data Models and the Pivot interface, first released in Splunk Enterprise 6.
And Hunk works with what you have today
Hunk works on Apache Hadoop and most major distributions, including those from Cloudera, Hortonworks, IBM, MapR and Pivotal, with support for both first-generation MapReduce and YARN (Yet Another Resource Negotiator, the technical name for second-generation MapReduce). Preview results and interactively search across one or more Hadoop clusters, including clusters from different distribution vendors. Use the ODBC driver for saved searches with report acceleration to feed data from Hunk to third-party data visualization tools or business intelligence software. Streaming Resource Libraries enable developers to stream data from NoSQL and other data stores, such as Apache Accumulo, Apache Cassandra, Couchbase, MongoDB and Neo4j, for exploration, analysis and visualization in Hunk.
Here are just some of the new Splunk Apps that have been delivered over the past year.
Their goal is to make it easier to use Splunk for specific technologies and use cases – prepackaging inputs, field extractions, searches and visualizations.
Highlight a few apps.
These apps, along with hundreds of others, have been developed not only by Splunk but by partners, customers and members of the Splunk community.
BUILD SPLUNK APPS
The Splunk Web Framework makes building a Splunk app look and feel like building any modern web application.
The Simple Dashboard Editor makes it easy to BUILD interactive dashboards and user workflows as well as add custom styling, behavior and visualizations. Simple XML is ideal for fast, lightweight app customization and building. Simple XML development requires minimal coding knowledge and is well-suited for Splunk power users in IT to get fast visualization and analytics from their machine data. Simple XML also lets the developer “escape” to HTML with one click to do more powerful customization and integration with JavaScript.
Developers looking for more advanced functionality and capabilities can build Splunk apps from the ground up using popular, standards-based web technologies: JavaScript and Django. The Splunk Web Framework lets developers quickly create Splunk apps by using prebuilt components, styles, templates, and reusable samples as well as supporting the development of custom logic, interactions, components, and UI. Developers can choose to program their Splunk app using Simple XML, JavaScript or Django (or any combination thereof).
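To make the Simple XML layer described above concrete, here is a minimal dashboard sketch: one panel charting error counts over the last 24 hours. The index name and search string are assumptions for illustration.

```xml
<dashboard>
  <label>Error Overview</label>
  <row>
    <panel>
      <chart>
        <search>
          <!-- hypothetical search: count errors over time in index "main" -->
          <query>index=main error | timechart count</query>
          <earliest>-24h</earliest>
          <latest>now</latest>
        </search>
      </chart>
    </panel>
  </row>
</dashboard>
```

From here, a power user can click through to HTML to layer in custom JavaScript behavior, as the text describes.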
EXTEND AND INTEGRATE SPLUNK
Splunk Enterprise is a robust, fully-integrated platform that enables developers to INTEGRATE data and functionality from Splunk software into applications across the organization using Software Development Kits (SDKs) for Java, JavaScript, C#, Python, PHP and Ruby. These SDKs make it easier to code to the open REST API that sits on top of the Splunk Engine. With almost 200 endpoints, the REST API lets developers do programmatically what any end user can do in the UI and more. The Splunk SDKs include documentation, code samples, resources and tools to make it faster and more efficient to program against the Splunk REST API using constructs and syntax familiar to developers experienced with Java, Python, JavaScript, PHP, Ruby and C#. Developers can easily manage HTTP access, authentication and namespaces in just a few lines of code.
Developers can use the Splunk SDKs to:
- Run real-time searches and retrieve Splunk data from line-of-business systems like Customer Service applications
- Integrate data and visualizations (charts, tables) from Splunk into BI tools and reporting dashboards
- Build mobile applications with real-time KPI dashboards and alerts powered by Splunk
- Log directly to Splunk from remote devices and applications via TCP, UDP and HTTP
- Build customer-facing dashboards in your applications powered by user-specific data in Splunk
- Manage a Splunk instance, including adding and removing users as well as creating data inputs from an application outside of Splunk
- Programmatically extract data from Splunk for long-term data warehousing
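The SDKs above all sit on the REST API, so it helps to see the shape of a raw request. The sketch below builds (but does not send) the URL and POST body for creating a search job via the `/services/search/jobs` endpoint; the hostname is a placeholder, and in practice you would use one of the SDKs rather than hand-rolling requests.

```python
from urllib.parse import urlencode

def build_search_job_request(base_url, spl):
    """Build the URL and form body for POSTing a new search job
    to Splunk's REST API (sketch; base_url is hypothetical)."""
    # The REST API expects the query to begin with the "search" command.
    if not spl.lstrip().startswith("search"):
        spl = "search " + spl
    url = base_url.rstrip("/") + "/services/search/jobs"
    body = urlencode({
        "search": spl,
        "earliest_time": "-24h",
        "latest_time": "now",
    })
    return url, body

url, body = build_search_job_request(
    "https://splunk.example.com:8089",
    "index=main sourcetype=access_combined | stats count",
)
print(url)   # .../services/search/jobs
```

An SDK wraps exactly this pattern, plus authentication and result retrieval, in a few lines of idiomatic code.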
Developers can EXTEND the power of Splunk software with programmatic control over search commands, data sources and data enrichment.
Splunk Enterprise offers search extensibility through:
- Custom Search Commands: developers can add a custom search script (in Python) to Splunk to create their own search commands. To build a search that runs recursively, developers need to make calls directly to the REST API.
- Scripted Lookups: developers can programmatically script lookups via Python.
- Scripted Alerts: alerts can trigger a shell script or batch file (we provide guidance for Python and Perl).
- Search Macros: make chunks of a search reusable in multiple places, including saved and ad hoc searches.
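As a sketch of the custom search command mechanism listed above: Splunk pipes the current search results to the script as CSV on stdin and reads the transformed results back from stdout (real commands typically use Splunk's Intersplunk helpers rather than raw CSV). The `status`/`severity` field names here are hypothetical.

```python
import csv
import sys

def enrich(events):
    """Add a hypothetical 'severity' field derived from an
    assumed HTTP 'status' field on each event."""
    for ev in events:
        code = int(ev.get("status", 0) or 0)
        ev["severity"] = "error" if code >= 500 else "ok"
    return events

def main():
    # Splunk sends results as CSV on stdin and expects CSV on stdout.
    reader = csv.DictReader(sys.stdin)
    events = enrich(list(reader))
    fields = list(events[0].keys()) if events else []
    writer = csv.DictWriter(sys.stdout, fieldnames=fields)
    writer.writeheader()
    writer.writerows(events)

if __name__ == "__main__":
    main()
```

Dropped into an app's `bin` directory and declared in commands.conf, a script like this becomes available as a pipeline command in any search.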
Splunk also provides developers with other mechanisms to extend the power of the platform.
- Data Models: allow developers to abstract away the search language syntax, making Splunk queries (and thus, functionality) more manageable and portable/shareable.
- Modular Inputs: allow developers to extend Splunk to programmatically manage custom data input functionality via REST.