Classifying Multi-Variate Time Series at Scale:
Characterizing and understanding the runtime behavior of large-scale Big Data production systems is extremely important. A typical system is a cluster of hundreds to thousands of machines with hundreds of terabytes of storage, costing millions of dollars and solving business-critical problems. By instrumenting each running process and recording its resource utilization (CPU, memory, I/O, network, etc.) as time series, it is possible to understand and characterize the workload on these massive clusters. Each time series consists of tens to tens of thousands of data points that must be ingested and then classified. At Pepperdata, our cluster instrumentation collects over three hundred metrics from each task every five seconds, resulting in millions of data points per hour. At this scale, the data are comparable in size to the largest IoT data sets in the world. Our objective is to classify the collection of time series into a set of classes that represent different workload types. Phrased differently, our problem is essentially that of classifying multivariate time series.
In this talk, we propose a unique, off-the-shelf approach to classifying time series that achieves near best-in-class accuracy for univariate series and generalizes to multivariate time series. Our technique maps each time series to a Gramian Angular Difference Field (GADF), interprets the GADF as an image, uses Google’s pre-trained Inception v3 CNN to map the GADF images into a 2048-dimensional vector space, and then uses a small MLP with two hidden layers of fifty nodes each and a softmax output to produce the final classification. Our work is not domain specific, as shown by accuracies competitive with published results on the univariate UCR data set as well as the multivariate UCI data set.
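The two ends of this pipeline can be sketched in a few lines of NumPy. What follows is a minimal illustration under stated assumptions, not the implementation used in the talk: the function names `gadf` and `mlp_forward` are hypothetical, the GADF follows the commonly used definition (rescale to [-1, 1], take the arccosine, then the sine of pairwise angle differences), the ReLU hidden activation is an assumption because the abstract does not name one, and the pre-trained Inception v3 step is omitted, so the 2048-dimensional features in the usage lines are random stand-ins.

```python
import numpy as np


def gadf(series):
    """Gramian Angular Difference Field of a 1-D series (common definition)."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant series carries no shape information; map it to zeros.
        scaled = np.zeros_like(x)
    else:
        # Rescale to [-1, 1] so that arccos is defined for every point.
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding error
    phi = np.arccos(scaled)              # polar-angle encoding
    # Entry (i, j) is sin(phi_i - phi_j); broadcasting builds the n x n field.
    return np.sin(phi[:, None] - phi[None, :])


def mlp_forward(features, weights, biases):
    """Forward pass of a small MLP with a softmax output.

    Assumption: ReLU on the hidden layers (not specified in the abstract).
    """
    h = features
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)
    logits = h @ weights[-1] + biases[-1]
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)


# Usage: one series becomes one square "image".
field = gadf(np.sin(np.linspace(0.0, 2.0 * np.pi, 64)))

# Layer sizes from the abstract: 2048 -> 50 -> 50 -> classes.
# The class count (4) and the random weights are placeholders.
rng = np.random.default_rng(0)
sizes = [2048, 50, 50, 4]
weights = [rng.normal(scale=0.01, size=(a, b)) for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]
probs = mlp_forward(rng.normal(size=(3, 2048)), weights, biases)
```

In this sketch `field` is a 64 x 64 antisymmetric matrix with a zero diagonal, and each row of `probs` is a probability distribution over the placeholder classes. In the approach described above, the features fed to the MLP would come from the pre-trained CNN applied to the GADF images rather than from a random generator.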
Bio: Before joining Pepperdata, Ash was executive chairman of Marianas Labs, a deep learning startup sold in December 2015. Prior to that he was CEO of Graphite Systems, a big data storage startup that was sold to EMC DSSD in August 2015. Munshi also served as CTO of Yahoo and as CEO of both public and private companies, and is on the board of several technology startups.