How the Journey to Modern Data Management is Paved with an Inclusive Edge-to-Cloud Data Fabric
Transcript of a discussion on the best ways widely inclusive data can be managed for today’s data-rich
but too often insights-poor organizations.
Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: Hewlett Packard Enterprise.
Dana Gardner: Hello, and welcome to the next BriefingsDirect Voice of Analytics Innovation
discussion. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host and
moderator for this ongoing discussion on the latest insights into end-to-end data management.
As businesses seek to gain insights for more elements of their physical edge -- from factory
sensors, myriad machinery, and across field operations -- data remains fragmented. But a Data
Fabric approach allows information and analytics to reside locally at the edge yet contribute to
the global improvement in optimizing large-scale operations.
Stay with us now as we explore how edge-to-core-to-cloud dispersed data can be harmonized
with a common fabric to make it accessible for use by more apps and across more analytics.
To learn more about the ways all data can be managed for
today’s data-rich but too often insights-poor organizations, we’re
joined by Chad Smykay, Field Chief Technology Officer for Data
Fabric at Hewlett Packard Enterprise (HPE). Welcome, Chad.
Chad Smykay: Thank you.
Gardner: Chad, why are companies still flooded with data? It
seems like they have the data, but they’re still thirsty for
actionable insights. If you have the data, why shouldn’t you also
have the insights readily available?
Smykay: There are a couple reasons for that. We still see today
challenges for our customers. One is just having a common data
governance methodology. That’s not just to govern the security and audits, and the techniques
around that -- but determining just what your data is.
I’ve gone into so many projects where they don’t even know where their data lives; just a simple
matrix of where the data is, where it lives, and how it’s important to the business. This is really
the first step that most companies just don’t do.
Gardner: What’s happening with managing data access when they do decide they want to find
it? What’s been happening with managing the explosive growth of unstructured data from all
corners of the enterprise?
Tame your data and get to know it
Smykay: Five years ago, it was still the Wild West of data access. But we’re finally seeing
some great standards being deployed and application programming interfaces (APIs) for that
data access. Companies are now realizing there’s power in having one API to rule them all. In
this case, we see mostly Amazon S3.
There are some other great APIs for data access out there, but just having more standardized
API access into multiple datatypes has been great for our customers. It allows for APIs to gain
access across many different use cases. For example, business intelligence (BI) tools can come
in via an API. Or an application developer can access the same API. So that approach really
cuts down on my access methodologies, my security domains, and just how I manage that data
for API access.
Gardner: And when we look to get buy-in from the very top levels of businesses, why are
leaders now rethinking data management and exploitation of analytics? What are the business
drivers that are helping technologists get the resources they need to improve data access and
Smykay: The business drivers gain when data access
methods are as reusable as possible across the different use
cases. It used to be that you’d have different point solutions,
or different open source tools, needed to solve a business
use-case. That was great for the short-term, maybe with
some quarterly project or something for the year you did it in.
But then, down the road, say three years out, they would say, “My gosh, we have 10 different
tools across the many different use cases we’re using.” It makes it really hard to standardize for
the next set of use cases.
So that’s been a big business driver, gaining a common, secure access layer that can access
different types of data. That’s been the biggest driver for our HPE Data Fabric. That and having
common API access definitely reduces the management layer cost, as well as the security cost.
Gardner: It seems to me that such data access commonality, when you attain it, becomes a gift
that keeps giving. The many different types of data often need to go from the edge to dispersed
data centers and sometimes dispersed in the cloud. Doesn’t data access commonality also help
solve issues about managing access across disparate architectures and deployment models?
Smykay: You just hit the nail on the head. Having commonality for that API layer really gives
you the ability to deploy anywhere. When I have the same API set, it makes it very easy to go
from one cloud provider, or one solution, to another. But that can also create issues in terms of
where my data lives. You still have data gravity issues, for example. And if you don’t have
portability of the APIs and the data, you start to see some lock-in with either the point
solution you went with or the cloud provider that’s providing that data access for you.
Gardner: Following through on the gift that keeps giving idea, what is it about the Data Fabric
approach that also makes analytics easier? Does it help attain a common method for applying
Data Fabric increases deployment options
Smykay: There are a couple of things there. One, it allows you to keep the data where it may
need to stay. That could be for regulatory reasons or just depend on where you build and deploy
the analytics models. A Data Fabric helps you to start separating out your computing and
storage capabilities, but also keeps them coupled for wherever the deployment location is.
For example, a lot of our customers today have
the flexibility to deploy IT resources out in the
edge. That could be a small cluster or system
that pre-processes data. They may typically
slowly trickle all the data back to one location, a
core data center or a cloud location. Having
these systems at the edge gives them the benefit of both pushing information out, as well as
continuing to process at the edge. They can choose to deploy as they want, and to make the
data analytics solutions deployed at the core even better for reporting or modeling.
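The edge-and-core pattern described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each edge site reduces its raw readings to a small summary, and the core merges those summaries into one global view. The names `EdgeSummary`, `summarize_at_edge`, and `merge_at_core`, along with the sample readings, are hypothetical and are not part of HPE Data Fabric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeSummary:
    """Compact summary an edge site sends to the core instead of raw data."""
    site: str
    count: int
    total: float
    maximum: float


def summarize_at_edge(site, readings):
    """Pre-process raw readings locally at the edge (hypothetical helper)."""
    if not readings:
        raise ValueError("no readings to summarize")
    return EdgeSummary(site, len(readings), sum(readings), max(readings))


def merge_at_core(summaries):
    """Combine per-site summaries into one global view at the core."""
    if not summaries:
        raise ValueError("no summaries to merge")
    count = sum(s.count for s in summaries)
    total = sum(s.total for s in summaries)
    return {
        "sites": len(summaries),
        "count": count,
        "mean": total / count,
        "maximum": max(s.maximum for s in summaries),
    }


# Made-up sensor readings for two edge sites.
plant_a = summarize_at_edge("plant-a", [70.0, 72.0, 74.0])
plant_b = summarize_at_edge("plant-b", [80.0, 90.0])
global_view = merge_at_core([plant_a, plant_b])
```

Only the summaries cross the network in this sketch, so each site keeps processing locally while the core still sees every location.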
Gardner: It gets to the idea of act locally and learn globally. How is that important, and why are
organizations interested in doing that?
Smykay: It’s just-in-time, right? We want everything to be faster, and that’s what this Data
Fabric approach gets for you.
In the past, we’ve seen edge solutions deployed, but you weren’t processing a whole lot at the
edge. You were pushing along all the data back to a central, core location -- and then doing
something with that data. But we don’t have the time to do that anymore.
Unless you can change the laws of physics -- last time I checked, they haven’t done that yet --
we’re bound by the speed of light for these networks. And so we need to keep as much data
and systems as we can out locally at the edge. Yet we need to still take some of that information
back to one central location so we can understand what’s happening across all the different
locations. We still want to make the rearview reporting better globally for our business, as well
as allow for more global model management.
Gardner: Let’s look at some of the hurdles organizations have to overcome to make use of
such a Data Fabric. What is it about the way that data and information exist today that makes it
hard to get the most out of it? Why is it hard to put advanced data access and management in
place quickly and easily?
Track the data journey, standardize data destinations
Smykay: It’s tough for most organizations because they can’t take the wings off the airplane
while flying. We get that. You have to begin by creating some new standards within your
organization, whether that’s standardizing on an API set for different datatypes, multiple
datatypes, a single datatype.
Then you need to standardize the deployment mechanisms within your organization for that
data. With the HPE Data Fabric, we give the ability to just say, “Hey, it doesn’t matter where you
deploy. We just need some x86 servers and we can help you standardize either on one API or
We now support more than 10 APIs, as well as the many different datatypes that these
organizations may have.
Typically, we see a lot of data silos still out there today with customers -- and they’re getting
worse. By worse, I mean they’re now all over the place between multiple cloud providers. I may
use some of these cloud storage bucket systems from cloud vendor A, but I may use somebody
else’s SQL databases from cloud vendor B, and those may end up having their own access
methodologies and their own software development kits (SDKs).
Next you have to consider all the networking
in the middle. And let’s not even bring up
security and authorization to all of them. So
we find that the silos still exist, but they’ve
just gotten worse and they’ve just sprawled
out more. I call it the silo sprawl.
Gardner: Wow. So, if we have that silo sprawl now, and that complexity is becoming a hurdle,
the estimates are that we’re going to just keep getting more and more data from more and more
devices. So, if you don’t get a handle on this now, you’re never going to be able to scale, right?
Smykay: Yes, absolutely. If you’re going to have diversity of your data, the right way to manage
it is to make it use-case-driven. Don’t boil the ocean. That’s where we’ve seen all of our
successes. Focus on a couple of different use cases to start, especially if you’re getting into
newer predictive model management and using machine learning (ML) techniques.
But, you also have to look a little further out to say, “Okay, what’s next?” Right? “What’s
coming?” When you go down that data engineering and data science journey, you must
understand that, “Oh, I’m going to complete use case A, that’s going to lead to use case B,
which means I’m going to have to go grab from other data sources to either enrich the model or
create a whole other project or application for the business.”
You should create a data journey and understand where you’re going so you don’t just end up
with silo sprawl.
Gardner: Another challenge for organizations is their legacy installations. When we talk about
zettabytes of data coming, what is it about the legacy solutions -- and even the cloud storage
legacy -- that organizations need to rethink to be able to scale?
Zettabytes of data coming, need to be corralled
Smykay: It’s a very important point. Can we just have a moment of silence? Because now
we’re talking about zettabytes of data. Okay, I’m in.
Some 20 years ago, we were talking about petabytes of data. We thought that was a lot of data,
but if you look out to the future, some studies show connected Internet of
Things (IoT) devices generating zettabytes of data.
If you don’t get a handle on where your
data points are going to be generated, how
they’re going to be stored, and how they’re
going to be accessed now, this problem is
just going to get worse and worse for organizations.
Look, Data Fabric is a great solution. We have it, and it can solve a ton of these problems. But
speaking as a consultant, if you don’t get ahead of these issues right now, you’re going to be under the
umbrella of probably 20 different cloud solutions for the next 10 years. So, really, we need to
look at the datatypes that you’re going to have to support, the access methodologies, and where
those need to be located and supported for your organization.
Gardner: Chad, it wasn’t that long ago that we were talking about how to manage big data, and
Hadoop was a big part of that. NoSQL and other open source databases in particular became
popular. What is it about the legacy of the big data approach that also needs to be rethought?
Smykay: One common issue we often see is the tendency to go either/or. By that I mean
saying, “Okay, we can do real-time analytics, but that’s a separate data deployment. Or we can
do batch, rearview reporting analytics, and that’s a separate data deployment.” But one thing
that our HPE Data Fabric has always been able to support is both -- at the same time -- and
that’s still true.
So if you’re going down a big data or data lake journey -- I think the term now is a data
lakehouse, that’s a new one. For these, basically I need to be able to do my real-time analytics,
as well as my traditional BI reporting or rearview mirror reporting -- and that’s what we’ve been
doing for over 10 years. That’s probably one of the biggest limitations we have seen.
But it’s a heavy lift to get that data from one location to another, just because of the metadata
layer of Hadoop. And then you had dependencies with some of these NoSQL databases out
there on Hadoop, and that caused some performance issues. You can only get so much performance
out of those databases, which is why we have NoSQL databases just out of the box of our Data
Fabric -- and we’ve never run into any of those issues.
Gardner: Of course, we can’t talk about end-to-end data without thinking about end-to-end
security. So, how do we think about the HPE Data Fabric approach helping when it comes to
security from the edge to the core?
Secure and manage data from edge to core
Smykay: This is near-and-dear to my heart because everyone always talks about these great
solutions out there to do edge computing. But I always ask, “Well, how do you secure it? How
do you authorize it? How does my application authorization happen all the way back from the
edge application to the data store in the core or in the cloud somewhere?”
That’s what I call off-sprawl, where those issues just add up. If we don’t have one way to secure
and manage all of our different datatypes, then what happens is, “Okay, well, I have this object-
based system out there, and it has its own authorization techniques.” It has its own
authentication techniques. By the way, it has its own way of enforcing security in terms of who
has access to what, unless … I haven’t talked about monitoring, right? How do we monitor this
So, now imagine doing that for each type of data that you have in your organization -- whether
it’s a SQL database, because that application is just a driving requirement for that, or a file-
based workload, or a block-based workload. You can see where this starts to steamroll and
build up to be a huge problem within an organization, and we see that all the time.
And, by the way, when it comes to your application developers, that becomes the biggest
annoyance for them. Why? Because when they want to go and create an application, they have
to go and say, “Okay, wait. How do I access this data? Oh, it’s different. Okay. I’ll use a different
key.” And then, “Oh, that’s a different authorization system. It’s a completely different way to
authenticate with my app.”
I honestly think that’s why we’re seeing a ton of issues
today in the security space. It’s why we’re seeing people
get hacked. It happens all the way down to the application
layer, as you often have this security sprawl that makes it
very hard to manage all of these different systems.
Gardner: We’ve come upon this word sprawl several times now. We’re sprawling with this,
we’re sprawling with that; there’s complexity and then there’s going to be even more scale
The bad news is there is quite a bit to consider when you want end-to-end data management
that takes the edge into consideration and has all these other anti-sprawl requirements. The
good news is a platform and standards approach with a Data Fabric forms the best, single way
to satisfy these many requirements.
So let’s talk about the solutions. How does HPE Ezmeral generally -- and the Ezmeral Data
Fabric specifically -- provide a common means to solve many of these thorny problems?
Smykay: We were just talking about security. We provide the same security domain across all
deployments. That means having one web-based user interface (UI), or one REST API call, to
manage all of those different datatypes.
We can be deployed across any x86 system. And having that multi-API access -- we have more
than 10 -- allows for multi-data access. It includes everything from storing data into files and
storing data in blocks. We’re soon going to be able to support blocks in our solution. And then
we’ll be storing data into bit streams such as Kafka, and then into a NoSQL database as well.
Gardner: It’s important for people to understand that HPE Ezmeral is a larger family and that
the Data Fabric is a subset. But the whole seems to be greater than the sum of the parts. Why
is that the case? How has what HPE is doing in architecting Ezmeral been a lot more than data
Smykay: Whenever you have this “whole is greater than the sum of the parts,” you start
reducing so many things across the chain. When we talk about deploying a solution, that
includes, “How do I manage it? How do I update it? How do I monitor it?” And then back to
Honestly, there is a great report from IDC that says it best. We show a 567-percent, five-year
return on investment (ROI). That’s not from us, that’s IDC talking to our customers. I don’t know
of a better business value from a solution than that. The report speaks for itself, but it comes
down to these paper cuts of managing a solution. When you start to have multiple paper cuts,
across multiple arms, it starts to add up in an organization.
Gardner: Chad, what is it about the HPE Ezmeral portfolio and the way the Data Fabric fits in
that provides a catalyst to more improvement?
All the data, analyzed and put to future use cases
Smykay: One, the HPE Data Fabric can be deployed anywhere. It can be deployed
independently. We have hundreds and hundreds of customers. We have to continue supporting
them on their journey of compute and storage, but today we are already shipping a solution
where we can containerize the Data Fabric as a part of our HPE Ezmeral Container Platform
and also provide persistent storage for your containers.
The HPE Ezmeral Container Platform comes with the Data Fabric; it’s a part of the persistent
storage. That gives you full end-to-end management of the containers, not only the application
APIs. That means the management and the data portability.
So, now imagine being able to ship the data by containers
from your location, as it makes sense for your use case.
That’s the powerful message. We have already been on
the compute and storage journey; been down that road.
That road is not going away. We have many customers for
that, and it makes sense for many use cases. We’ve
already been on the journey of separating out compute and storage. And we’re in general
availability today. There are some other solutions out there that are still on a road map as far as
we know, but at HPE we’re there today. Customers have this deployed. They’re going down
their compute and storage separation journey with us.
Gardner: One of the things that gets me excited about the potential for Ezmeral is when you do
this right, it puts you in a position to be able to do advanced analytics in ways that hadn’t been
done before. Where do you see the HPE Ezmeral Data Fabric helping when it comes to broader
use of analytics across global operations?
Smykay: One of our CMOs, Jack Morris, used to say it best: “If it’s going to
be about the data, it better be all about the data.”
When you automate data management across multiple deployments -- managing it,
monitoring it, keeping it secure -- you can then focus on those actual use cases. You can focus
on the data itself, right? That’s living in the HPE Data Fabric. That is the higher-level takeaway.
Our users are not spending all their time and money worrying about the data lifecycle. Instead,
they can now go use that data for their organizations and for future use cases.
HPE Ezmeral sets your organization up to use your data instead of worrying about your data.
We are set up to start using the Data Fabric for newer use cases and separating out compute
and storage, and having it run in containers. We’ve been doing that for years. The high-level
takeaway is you can go focus on using your data and not worrying about your data.
Gardner: How about some of the technical ways that you’re doing this? Things like global
namespaces, analytics-ready fabrics, and native multi-temperature management. Why are they
important specifically for getting to where we can capitalize on those new use cases?
Smykay: Global namespaces is probably the top feature we hear back from our customers on.
It allows them to gain one view of the data with the same common security model. Imagine
you’re a lawyer sitting at your computer and you double-click on a Data Fabric drive; you can
then literally see all of your deployments globally. That helps with discovery. That helps with
bringing onboard your data engineers and data scientists. Over the years that’s been one of the
biggest challenges: they spend a lot of time building up their data science and data engineering
groups and just discovering the data.
Global namespace means I’m reducing my discovery time to figure out where the data is. A lot
of this analytics-ready value we’ve been supporting in the open source community for more than
10 years. There’s a ton of open source projects out there, like Presto, Hive, and Drill. Of
course, we’re also Spark-ready, and we have been supporting Spark for many years. That’s
pretty much the de facto standard we’re seeing when it comes to doing any kind of real-time
processing or analytics on data.
As for multi-temperature, that feature allows you to decrease your cost of your deployment, but
still allows managing all your data in one location. There are a lot of different ways we do that.
We use erasure coding. We can tier off to Amazon S3-compliant devices to reduce the overall
cost of deployment.
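As a rough illustration of multi-temperature management, the Python sketch below assigns each object a tier from the time since it was last accessed. The thresholds, the tier names, and the idea that "warm" data is erasure-coded while "cold" data is offloaded to an S3-compatible store are assumptions made for this example; the discussion does not describe how HPE Data Fabric actually decides placement.

```python
from datetime import date

# Hypothetical thresholds, in days since last access.
WARM_AFTER_DAYS = 30
COLD_AFTER_DAYS = 180


def choose_tier(last_access, today):
    """Pick a storage tier from the age of the last access (illustrative only)."""
    age_days = (today - last_access).days
    if age_days >= COLD_AFTER_DAYS:
        return "cold"  # assumed: offloaded to an S3-compatible object store
    if age_days >= WARM_AFTER_DAYS:
        return "warm"  # assumed: erasure-coded to save space
    return "hot"       # assumed: kept on the fastest storage


# Made-up file names and access dates.
today = date(2020, 12, 1)
tiers = {
    "recent.csv": choose_tier(date(2020, 11, 25), today),
    "last-quarter.csv": choose_tier(date(2020, 9, 1), today),
    "archive.csv": choose_tier(date(2019, 12, 1), today),
}
```

All three files stay listed in the same `tiers` mapping whichever tier they land on, which mirrors the point that the data is still managed in one place.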
These features contribute to making it still easier. You
gain a common Data Fabric, common security layer,
and common API layer.
Gardner: Chad, we talked about much more data at the edge, how that’s created a number of
requirements, and the benefits of a comprehensive approach to data management. We talked
about the HPE Data Fabric solution, what it brings, and how it works. But we’ve been talking in
What about on the ground? Do you have any examples of organizations that have bitten off and
made Data Fabric core for them? As an adopter, what do they get? What are the business
Central view of data benefits customers, businesses
Smykay: We’ve been talking a lot about edge-to-core-to-cloud, and the one example that’s just
top-of-mind is a big, tier-1 telecoms provider. This provider makes the equipment for your
AT&Ts and your Vodafones. That equipment sits out on the cell towers. And they have many
Data Fabric use cases, more than 30 with us.
But the one I love most is real-time antenna tuning. They’re able to improve customer
satisfaction in real time and reduce the need to physically return to hotspots on an antenna.
They do it via real-time data collection on the antennas and then aggregating that across all of
the different layers that they have in their deployments.
They gain a central view of all of the data using a
modern API for the DevOps needs. They still
centrally process data, but they also process it at
the edge today. We replicate all of that data for
them. We manage that for them and take a lot of
the traditional data management tasks off the table
for them, so they can focus on the use case of the
best way to tune antennas.
Gardner: They have the local benefit of tuning the antenna. But what’s the global payback? Do
we have any quantitative or qualitative business returns for them in doing that?
Smykay: Yes, but they’re pretty secretive. We’ve heard that they’ve gotten a payback in the
millions of dollars, but an immediate, direct payback for them is in reducing the application
development spend everywhere across the layer. That reduction is because they can use the
same type of API to publish that data as a stream, and then use the same API semantics to
secure and manage it all. They can then take that same application, which is deployed in a
container today, and easily deploy it to any remote location around the world.
Gardner: There’s that key aspect of the application portability that we’ve danced around a bit.
Any other examples that demonstrate the adoption of the HPE Data Fabric and the business
Smykay: Another one off the top of my head is a midstream oil and gas customer in the
Houston area. This one’s not so much about edge-to-core-to-cloud. This is more about
consolidation of use cases.
We discussed earlier that we can support both rearview reporting analytics as well as real-time
reporting use cases. And in this case, they actually have multiple use cases, up to about five or
six right now. Among them, they are able to do predictive failure reports for heat exchangers.
These heat exchangers are deployed regionally and they are really temperamental. You have to
monitor them all the time.
But now they have a proactive model where they can do a predictive failure monitor on those
heat exchangers just by checking the temperatures on the floor cameras. They bring in all real-
time camera data and they can predict, “Oh, we think we’re having an issue with this heat
exchanger at this time on this day.” So that decreases management cost for them.
They also gain a dynamic parts management capability for all of their inventory in their
warehouses. They can not only deliver parts faster but also reduce their capital expenditure
(CapEx) costs, too. They have gained material measurement balances. When you push oil
across a pipeline, they can detect where that balance is off across the pipeline and detect where
they’re losing money, because if they are not pushing oil across the pipe at x amount of psi,
they’re losing money.
So they’re able to dynamically detect that and fix it along the pipe. They also have a pipeline
leak detection that they have been working on, which is modeled to detect corrosion and decay.
The point is there are multiple use cases. But because they’re able to start putting those data
types together and continue to build off of it, every use case gets stronger and stronger.
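The material balance check mentioned above can be pictured with a minimal Python sketch. It assumes each pipeline segment reports a metered inflow and outflow, and it flags any segment whose relative loss exceeds a tolerance. The function name `find_imbalanced_segments`, the 1 percent tolerance, and the sample volumes are all hypothetical and do not come from the customer's system.

```python
def find_imbalanced_segments(segments, tolerance=0.01):
    """Return names of segments losing more than `tolerance` of their inflow.

    `segments` is a list of (name, inflow, outflow) tuples in the same units.
    """
    flagged = []
    for name, inflow, outflow in segments:
        if inflow <= 0:
            continue  # nothing flowing in, so no loss can be computed
        loss_fraction = (inflow - outflow) / inflow
        if loss_fraction > tolerance:
            flagged.append(name)
    return flagged


# Made-up metered volumes for three consecutive pipeline segments.
segments = [
    ("A-B", 1000.0, 998.0),
    ("B-C", 998.0, 960.0),
    ("C-D", 960.0, 959.0),
]
flagged = find_imbalanced_segments(segments)
```

Here only the middle segment loses more than the assumed tolerance, so it is the one a crew would inspect first.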
Gardner: It becomes a virtuous adoption cycle; the more you can use the data generally, then
the more value, then the more you invest in getting a standard fabric approach, and then the
more use cases pop up. It can become very powerful.
This last example also shows the intersection of operational technology (OT) and IT. Together
they can start to discover high-level, end-to-end business operational efficiencies. Is that what
Data science, data engineering teams work together
Smykay: Yes, absolutely. A Data Fabric is kind of the Kumbaya set among these different
groups. If they’re able to standardize on the IT and developer side, it makes it easier for them to
talk the same language. I’ve seen this with the oil and gas customer. Now those data science
and data engineering teams work hand in hand, which is where you want to get in your
organization. You want those IT teams working with the teams managing your solutions today.
That’s what I’m seeing. As you get a better, more common data model or fabric, you get faster
and you get better management savings by having your people working better together.
Gardner: And, of course, when you’re able to do data-driven operations, procurement, logistics,
and transportation you get to what we’re referring to generally as digital business transformation.
Chad, how does a Data Fabric approach then contribute to the larger goal of business transformation?
Smykay: It allows organizations to work together
through a common data framework. That’s been one
of the biggest issues I’ve seen, when I come in and
say, “Okay, we’re going to start on this use case.
Where is the data?”
Depending on the size of the organization, you’re talking to three to five different groups, and
sometimes 10 different people, just to put a use case together. But as you create a common
data access method, you see an organization where it’s easier and easier for not only your use
cases, but your businesses to work together on the goal of whatever you’re trying to do and use
your data for.
Gardner: I’m afraid we’ll have to leave it there. We’ve been exploring how a Data Fabric
approach allows information and analytics to reside locally at the edge, yet contribute to a global
improvement in optimizing large-scale operations.
And we’ve learned how HPE Ezmeral Data Fabric makes modern data management more
attainable so businesses can dramatically improve their operational efficiency and innovate from
edge to core to clouds.
So please join me in thanking our guest, Chad Smykay, Field Chief Technology Officer for Data
Fabric at HPE. Thanks so much, Chad.
Smykay: Thank you, I appreciate it.
Gardner: And a big thank you as well to our audience for joining this sponsored BriefingsDirect
Voice of Analytics Innovation discussion. I’m Dana Gardner, Principal Analyst at Interarbor
Solutions, your host for this ongoing series of Hewlett Packard Enterprise-supported discussions.
Thanks again for listening. Please pass this along to your IT community, and do come back next time.
Copyright Interarbor Solutions, LLC, 2005-2020. All rights reserved.