Speakers:
Shawn Akberali, Senior Software Engineer, Lockheed Martin
Caroline Nelson, Data Analyst, Lockheed Martin
Robert Tung, Data Warehouse Analyst, Lockheed Martin
Abstract: Knowledge graphs have enormous potential for delivering superior customer experiences, advanced analytics and efficient data management.
Learn valuable tips from a leading practitioner on how to position, organize and implement your first enterprise graph project.
2. LOCKHEED MARTIN PUBLIC RELEASE 2
Business Structure
• Aeronautics
• Missiles and Fire Control
• Rotary and Mission Systems
• Space
Goal: Deliver F-35s on time, at cost, and keep them flying!
High Priority F-35 Program
• Largest defense program in history
• Customers include the US Air Force, Navy, and Marine Corps, as well as 10+ partner countries, and growing
• US to buy 2,500+ F-35s through 2037
• 420+ delivered to date
Why F-35 needs graph
From design to delivery, there are ample amounts of data that can be threaded together to
tell a story.
Business Value
• Visualize disparate data & areas of overlap
• Detect data anomalies
• Easy navigation of many-to-many relationships
• Iteratively solve problems
• Root cause analysis
• Emphasis on data relationships
GraphDB Platform: Ecosystem
• Data Sources
• Kettle
• Neo4j
• Linkurious
• Community
Self Service
“We Are All Developers Now.”
Tools Education Process
• Kettle
• Linkurious
• Neo4j
• Wiki
• Instructor Led Training
• Work Guides
• Online Tutorials
• IT/Business Partnership
• Data Access Requests
• Software Requests
• Development Lifecycle
• Naming Standards
Your Users Are Your Best Assets!
Key Takeaways
• Scaling
• Training
• Security
• Grooming Data
• Model design
Editor's Notes
Name, AA function, deliver products/capabilities to the business – partner with IT; focus on meeting customer expectations internally and externally
Leverage Neo4j to navigate difficult data problems
Today we’ll be walking through what we’ve done, the value we’ve seen, and how IT is empowering the business
Business Areas/products – deliver capabilities to customer military forces– focus on airplanes
At Aeronautics we are currently working to deliver on the largest defense program in history, the F-35. With that, we have many customers from many different countries outside the US, and that list is growing as we continue to meet customer expectations. The US alone will buy over 2,500 F-35s, and so far we have delivered over 420. Our main objective with this program is to deliver jets on time, at cost, and to sustain them with spare parts after they have been delivered.
So as you can tell already, there is a lot that goes into the successful delivery and sustainment of an F-35. There is the design phase, where each individual part is identified for each jet; the manufacturing phase, where any engineering improvements made to parts since design must be incorporated; and the delivery/sustainment phase, where parts may develop issues or need to be swapped for new and improved versions. The cost of the jet includes all labor and material costs associated with building it over roughly a two-year period. We have many different systems of data that house different functions’ needs, such as purchase orders, labor data, org structure data, etc. A lot of this data can be threaded together to tell the story of what went into producing and sustaining a specific jet.
Graph technology came into the picture to help us tell these stories. We realized the performance benefits it could offer with our massive amounts of data, too big for relational analyses. About two years ago, we started learning more and more until we were ready to adopt it ourselves, and Shawn will be discussing more of what we did and the value we’ve seen.
Working with business users, we have been able to identify six key areas where the business has found value in graph databases
Our business has found value in the ability to…
Visualize disparate data and areas of overlap
Multiple data elements from various systems
Internal Sources
External Sources
Helps identify where functional areas plug into one another
Ultimately helps in synergizing our functional areas and developing better models
Detect data anomalies
Identifying holes in our data and working to fill gaps
Could be due to not standardizing across the board or data exclusion by a functional area
Possible flaws in business logic being built in
Easily navigate many-to-many relationships
Although we are able to navigate relationships using relational databases, it would often get cumbersome
ERDs to translate data for sources to connect
GraphDB helps in making it easier to read and extract meaning from these relationships
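As a minimal sketch of what this looks like in practice (the labels, relationship types, and property names here are hypothetical, chosen only to illustrate the pattern), a traversal that would take several joins or bridge tables in SQL is a single Cypher pattern:

```cypher
// Which suppliers feed parts into a given assembly, directly or
// through up to three levels of sub-assembly? In a relational model,
// each hop here would typically be a join through a bridge table.
MATCH (s:Supplier)-[:PROVIDES]->(:Part)-[:COMPONENT_OF*1..3]->(a:Assembly {name: 'Wing'})
RETURN DISTINCT s.name;
```

The variable-length path (`*1..3`) is what makes deep many-to-many navigation readable: the query states the question rather than the join mechanics.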
Iteratively solve problems
As we saw data overlaps, we were able to modify our model quickly to make it more meaningful
This is helpful in answering new ad-hoc questions quicker
Emphasis on data relationships
Able to identify critical areas where data converges
Such as where engineering data and financial data both meet
Helps in answering business questions posed by respective functional areas
Perform root cause analysis
Users have been able to perform queries to answer questions quicker than before
Used to take hours to days to do manually
Can explore the graph to answer questions and perform analysis
Given a starting point, a user can explore until an answer is found
What does our graph ecosystem look like?
Goal is to develop an ecosystem that is focused on the community of users rather than being developer-centric.
Enabling the community to extract meaning from the data faster than traditional methods
Started our journey with just Neo4j
Creating indexes and constraints
Creating our metadata model
Exploring our models after loading
Create queries to perform analysis
Incorporated data via LOAD CSV; however, we quickly realized this was not ideal at scale
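A hedged sketch of the kind of Cypher this stage involved (label and property names are hypothetical, and the constraint syntax is the Neo4j 3.x form current at the time of this talk):

```cypher
// A uniqueness constraint also backs an index on the key property
CREATE CONSTRAINT ON (p:Part) ASSERT p.partNumber IS UNIQUE;

// Early loads used LOAD CSV -- workable for small files, but not
// ideal once volumes grew, which motivated the move to Kettle
LOAD CSV WITH HEADERS FROM 'file:///parts.csv' AS row
MERGE (p:Part {partNumber: row.part_number})
SET p.description = row.description;
```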
Realized we needed a tool for data integration into our graph database
Worked with Neo4j and were introduced to Kettle
Kettle is used mainly for ETL purposes. We use Kettle to:
Pull in data from various sources
Standardize our data
Apply business logic where applicable
Write to our graph
Revise our model as needed
Kettle is easy to use and our user community has had an overall good experience while using it
After working with some users, we realized Neo4j looked quite technical, resulting in resistance from our non-technical users
Worked with Neo4j and were recommended Linkurious
Linkurious mainly used for:
Intended for our wider technical and non-technical community
Technical users create and share meaningful queries with their team
Queries for common business questions
Queries used to better understand data at hand
Useful in performing ad-hoc analysis of business questions posed
Exploration from users often results in updates to existing models, new queries or new use cases all together
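A saved query of the kind technical users share through Linkurious might look like the following sketch (labels, properties, and the `$partNumber` parameter are hypothetical; parameterization is what lets non-technical users reuse the query by filling in a value):

```cypher
// Shared template: "which jets use this part, and when were they delivered?"
// A non-technical user supplies $partNumber without touching the Cypher.
MATCH (p:Part {partNumber: $partNumber})<-[:USES]-(j:Jet)
RETURN j.tailNumber, j.deliveryDate
ORDER BY j.deliveryDate;
```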
With this toolset we empowered our business users, allowing all of us to be developers.
I would like to welcome Robert Tung to speak about the incorporation of graph databases into our self service model
IT is building the framework and foundation upon which business users can operate with confidence to create solutions that add business value. IT should be viewed not as a barrier to entry but as an enabler for users to accomplish their goals.
One of our current missions within Lockheed Martin Aeronautics is to improve overall employee capability in the area of data analytics. In order to do so, we’re building these three pillars of self-service: Tools, Education, and Process.
Providing the TOOLS necessary to allow users to build their own solutions. We are making “developer” level applications more widely available to users. For example: instead of being the sole domain of IT developers, we are providing powerful ETL tools such as Kettle to users. We are creating pathways so that the business community is able to access the data contained in “enterprise” databases instead of canned reports. This will allow business users to escape the constraints of tools such as Excel, or MS Access and open up the flow of information to a larger community. Of course with great power, comes great responsibility. Just because you have the ability to run “DETACH DELETE ALL” does not necessarily mean you should. And that leads us to…
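To make the "great power" point concrete, here is the difference between the statement a new user should fear and a scoped alternative (the label and `loadTag` property are hypothetical illustrations of tagging what a given load created):

```cypher
// The hazard: this wipes the ENTIRE graph. Having the ability to run
// it does not mean you should.
// MATCH (n) DETACH DELETE n;

// A scoped alternative: delete only the nodes your own load created
MATCH (p:Part {loadTag: 'staging_2019_03'})
DETACH DELETE p;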
EDUCATION: Give the support and training needed for individuals to be successful in creating solutions. I.T. is partnering with business users to help accelerate their learning and solution building. We want to educate the initial group of users to be more than merely capable, this first wave of users should be the vanguard of a user community who are confident in their own capabilities and can act as local leaders to facilitate adoption of new technologies. We want these users to combine their business knowledge with I.T. know-how so they are able to accomplish their goals. In addition to intensive hands on guidance for this first wave, a training curriculum is being built to scale up our educational efforts. Our follow on waves of users will be a larger cohort as compared to the initial groups and our teaching methods will have to adapt to meet these needs.
PROCESS is almost a natural output of all these efforts. The process for acquiring the tools, the process of educating users, the process for getting access to data. As we are standing up these self service pillars, we are also incorporating our lessons learned to help guide future development. We are documenting these steps to promote growth by setting best practices and standards.
So what does this all mean in the context of graph? Graph, due to its more intuitive user interface, is a good candidate as an introduction to the self-service model. For users without a technical background, the transition from whiteboard to graph database is a blessing. The ability to visually traverse the data in a graph allows users to reinforce their understanding of the data model as implemented.
https://www.gartner.com/it-glossary/self-service-analytics
Self-Service Analytics is a form of business intelligence (BI) in which line-of-business professionals are enabled and encouraged to perform queries and generate reports on their own, with nominal IT support. Self-service analytics is often characterized by simple-to-use BI tools with basic analytic capabilities and an underlying data model that has been simplified or scaled down for ease of understanding and straightforward data access.
During our evaluation and graph deployment, we’ve observed the interactions between people, process and technology. How they interact within the confines of our organization is interesting and provides lessons for us to learn if we perform a careful, detailed examination.
Scalability is difficult. The problems that occur when you move from a single user to ten users are different from the problems you encounter when expanding the community to a hundred! It’s exponential (just think of the communication edges generated with each new person added to an organization graph). Not only do you have to ensure that your architecture can support the increased usage patterns, it becomes increasingly difficult to ensure that users follow best practices and coordinate so that their additions to the graph work well with contributions from other users.
Training becomes ever more important to ensure that the newer users can rapidly integrate. What we’ve seen is that for a significant portion of users, after an initial period of hesitation, their fear of Cypher as “a programming language” fades. Anecdotally, new users pick up Cypher more readily than SQL.
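The anecdote about Cypher being easier to pick up than SQL is easiest to see side by side. This is a hypothetical schema, not our actual model, but the same question in both languages:

```cypher
// SQL: three tables, two joins, and the relationship lives in a
// bridge table the reader has to know about:
//   SELECT j.tail_number
//   FROM jets j
//   JOIN jet_parts jp ON jp.jet_id = j.id
//   JOIN parts p ON p.id = jp.part_id
//   WHERE p.part_number = 'ABC-123';

// Cypher: the pattern reads like the question itself
MATCH (j:Jet)-[:USES]->(:Part {partNumber: 'ABC-123'})
RETURN j.tailNumber;
```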
One thing we’ve observed is that we have to train users out of defaulting to flat files as input/output. It’s something deeply ingrained in many users, and we’re slowly bringing them to the data sources (databases) that actually generate many of these flat files. We’re not just training users in graph; we’re also elevating their access and skills so they will be able to fend for themselves in seeking out data in the future.
Due to the heterogeneous nature of users, training and guidance have to be tailored to individual skill and experience level. An amusing tidbit, speaking from personal experience: graph requires a slightly different perspective, especially if you’re coming from a relational database background. Although some core concepts (such as normalization) will serve you well, the elevated importance of relationships demands that we think about what questions we want to ask of the database early on, instead of charging forward with normalization efforts and ERD diagrams as we would for a typical relational database.
We are an aeronautics/defense company, as such we have a more stringent focus on data security. This is problematic in terms of a graph database, especially since the power of a graph comes from the relationships between data nodes. Of course, each organization is different but based upon our experience we would recommend that security/data restrictions be considered earlier rather than later. Retroactively bolting on security is no fun for anybody.
Data cleansing is a big portion of the work in our field. Regardless of tool or platform, ensuring that the data is good and well suited to your purpose is where a lot of effort will be spent (unless you’re in a data utopia where you never see badly formatted data, the string “zero” instead of the number 0, or dates of birth in address fields). Null handling is important in relational databases, but in graph I would claim it is mandatory. Otherwise you end up with strange little paths where a bunch of nodes link to a single blank/null node, which of course makes for a funny little graph.
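A minimal sketch of guarding against the blank/null-node problem during a load (file name, label, and property are hypothetical):

```cypher
// Filter out null and blank keys BEFORE merging; otherwise every bad
// row collapses into one shared node that everything links to
LOAD CSV WITH HEADERS FROM 'file:///parts.csv' AS row
WITH row
WHERE row.part_number IS NOT NULL AND trim(row.part_number) <> ''
MERGE (p:Part {partNumber: row.part_number});
```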
Graph model design is a crucial step that some newer users overlook. They jump headfirst into building out their graph and generating the solutions they need, today. I will say that this is perfectly acceptable for a single-user graph; however, once you expand scope to include other users, the design step can’t be skipped (unless you’re a fan of endless rework).
One relatively frequent design issue we’ve seen is how to address instances. The interaction between two nodes in a graph is usually a relationship but sometimes an intersection is formed resulting in an instance (one of the most common ones is Events).
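The instance (reification) pattern can be sketched as follows, using a hypothetical inspection event as the intersection (all names here are illustrative, not from our model):

```cypher
// Naive: a direct relationship loses per-event detail
//   (jet)-[:INSPECTED_BY]->(inspector)

// Instance pattern: reify each inspection event as its own node,
// so date, result, and other attributes have a natural home
MATCH (j:Jet {tailNumber: 'AF-001'}), (i:Inspector {badge: '1234'})
CREATE (e:Inspection {date: '2019-03-01', result: 'pass'})
CREATE (j)-[:UNDERWENT]->(e)<-[:PERFORMED]-(i);
```

Each repeated interaction between the same two nodes then becomes a distinct `Inspection` node rather than a pile of indistinguishable relationships.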
Graph naming conventions fall into the user training bucket, but they are a best practice that will pay dividends as your graph expands.
Empowered business users will surprise you with their ability when provided sufficient support. They are out there solving problems and with a little (or a lot!) of IT help, they can accomplish great feats. As IT personnel, we should act as force multipliers and always be seeking to increase our users’ capabilities!
Thank you for listening to our Lockheed Martin Aeronautics presentation today at GraphTour Dallas 2019.