Ask any project manager and they will tell you the importance of reviewing lessons learned before starting a new project. Lessons learned databases are filled with nuggets of valuable information that help project teams increase the likelihood of success. Why, then, do most of these databases go unused by project teams? In my experience, they are difficult to search and require hours of time to review the result set.
Recently a project engineer asked me if we could search our lessons learned using a list of 22 key terms the team was interested in. Our current keyword search engine would require him to enter each term individually, select the link, and save the document for review. Also, there was no way to search only the lessons database; the query would search our entire corpus, close to 20 million URLs. This would not do. I asked our search team to run a special query against the lessons database only, using the terms provided. They returned a spreadsheet with a link to each document containing the terms. The engineer had his work cut out for him: over 1,100 documents were on the list.
I started thinking there had to be a better way. I had been experimenting with topic modeling, in particular to assist our users in connecting seemingly disparate documents through an easier visualization mechanism. Something better than a list of links on multiple pages. I gathered my toolbox: R/RStudio, for the topic modeling and exploring the data; Neo4j, for modeling and visualizing the topics; and Linkurious, a web front end for our users to search and visualize the graph database.
5. “The most important contribution management needs to
make in the 21st Century is to increase the productivity
of knowledge work and the knowledge worker.”
PETER F. DRUCKER, 1999
6. NASA Challenges
• Hundreds of millions of documents, reports, project data, lessons learned, scientific research, medical analyses, geospatial data, IT logs, etc., are stored nationwide
• The data is growing in terms of variety, velocity, volume, value and veracity
• Accessibility to Engineering data sources
• Visibility is limited
7. To convert data to knowledge, a convergence of Knowledge Management, Informatics and Data Science is necessary.
[Venn diagram: Knowledge Management, Informatics, Data Science]
8. Knowledge Architecture
• The people, processes, and technology of designing, implementing, and applying the
intellectual infrastructure of organizations.
• What is an intellectual infrastructure?
• The set of activities to create, capture, organize, analyze, visualize, present, and utilize the information part of the information age.
• Information + Contexts = Knowledge
• Knowledge Management + Informatics + Data Science = Knowledge Architecture
• KM without Informatics is empty (Strategy Only)
• Informatics without KM is blind (IT based KM)
• Data Science transforms your data to knowledge
9. “We have an opportunity for everyone in the world to have access to all
the world’s information. This has never before been possible. Why is
ubiquitous information so profound? It is a tremendous equalizer.
Information is power.”
ERIC SCHMIDT (FORMER CEO OF GOOGLE)
15. TOPIC MODELING
Topic models are based upon the idea that documents are mixtures of
topics, where a topic is a probability distribution over words.
LDA Model from Blei (2011)
David Blei homepage – http://www.cs.columbia.edu/~blei/topicmodeling.html
Blei, David M. 2011. “Introduction to Probabilistic Topic Models.” Communications of the ACM.
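The presentation's modeling was done in R (topicmodels); as a hedged illustration of the same idea, here is a minimal LDA fit in Python with scikit-learn. The toy corpus and topic count are made up for the example.

```python
# Minimal sketch of fitting an LDA topic model, where each document is a
# mixture of topics and each topic is a probability distribution over words.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for a lessons-learned corpus (illustrative only)
docs = [
    "valve leak caused propellant pressure drop during test",
    "software fault in telemetry parser delayed data review",
    "pressure sensor calibration drift found before launch",
    "telemetry software patch fixed parser memory fault",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)                 # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(dtm)                       # per-document topic mixtures

# Each topic is a probability distribution over the vocabulary
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
```

Each row of `theta` sums to 1 (a document's mixture of topics), and each row of `phi` sums to 1 (a topic's distribution over words), matching the definition above.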
16. CORRELATION BY CATEGORY
To find the per-document topic probabilities, we extract theta from the fitted model's topic posteriors.
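In R's topicmodels package theta comes from the fitted model's posterior; to illustrate the correlation-by-category idea, here is a hedged Python sketch that joins a (made-up) theta matrix to document metadata and averages topic weight per category. The theta values and category labels are illustrative assumptions, not the NASA data.

```python
# Averaging per-document topic probabilities (theta) by a metadata field,
# to see which categories lean toward which topics. All values invented.
import numpy as np
import pandas as pd

theta = np.array([            # rows = documents, cols = topic probabilities
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.1, 0.9],
])
meta = pd.DataFrame({"category": ["Hardware", "Software", "Hardware", "Software"]})

df = pd.DataFrame(theta, columns=["topic_1", "topic_2"]).join(meta)
by_category = df.groupby("category").mean()   # mean topic weight per category
```

A high mean weight for a topic within a category suggests that category's lessons concentrate on that topic.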
28. WHAT COULD YOU ACCOMPLISH IF YOU COULD:
• Empower faster and more informed decision-making
• Leverage lessons of the past to minimize waste, rework,
re-invention and redundancy
• Reduce the learning curve for new employees
• Enhance and extend existing content and document
management systems
29. Contact Information
David Meza – david.meza-1@nasa.gov
Twitter - @davidmeza1
Linkedin - https://www.linkedin.com/pub/david-meza/16/543/50b
Github – davidmeza1
Blog – davidmeza1.github.io
IMAGE: Expedition 46 Soyuz Approaches Space Station for Docking
Russian cosmonaut Yuri Malenchenko manually docked the Soyuz TMA-19M spacecraft on Dec. 15, 2015 to the International Space Station's Rassvet module after an initial automated attempt was aborted. Flight Engineer Tim Kopra of NASA and Flight Engineer Tim Peake of ESA flanked Malenchenko as he brought the Soyuz to the Rassvet port.
In your time in engineering, what's changed? Pace and complexity. Fundamentally, engineering is about problem solving; that hasn't changed in five or six decades. What's changed are the tools, but they don't meet the needs of engineers.
Peter Drucker made this statement in 1999; it is even more important today for organizations to increase knowledge workers' productivity. Why? Data creation has increased significantly since 1999. In 2012, Gartner projected data to increase by 800%, with 80% being unstructured. Combine the speed at which we can access information, the need to make quicker decisions to stay competitive in this marketplace, and the time it takes to find information in the enterprise, and we have an environment where knowledge workers spend more time sifting through information than extracting and using that knowledge.
46% – workers can't find the information they need almost half the time. (IDC)
30% of total R&D spend is wasted duplicating research and work previously done.
Source: National Board of Patents and Registration (PRH), WIPO, IFA
54% of decisions are made with incomplete, inconsistent and inadequate information (InfoCentric Research)
The role of the Chief Data Officer: the CIO is responsible for the pipes the data runs through; the CDO is responsible for the data running through the pipes.
Combination of Knowledge Management, Data Science and Information Architecture
Information Architecture – understanding your data, users and creation processes
Analytics – using data science techniques to extract knowledge from implicit data
Knowledge Management – utilizes the inputs from DM and Analytics to provide the tools and processes to display the end results in a manner conducive to the user
Knowledge Architecture is the application of informatics to knowledge management. That is, using the skills for defining and designing information spaces to establish an environment conducive to managing knowledge.
Informatics is the study and practice of creating, storing, finding, manipulating and sharing information. Informatics was designed to be conceptual and practical, academic and professional, and focused on the human and humanistic dimensions of the design and use of information systems.
Borrowing a metaphor from physics, you can think of the difference between information architecture and knowledge architecture in terms of energy. Information architecture tends to focus on designing spaces for existing or predefined information, what might be called kinetic information. For example, one branch of information architecture focuses on findability, with little or no concern about how the content itself comes into being.
Knowledge architecture, on the other hand, deals with potential information. So, rather than determining the best way to use existing content, the knowledge architect is designing "spaces" that encourage knowledge to be created, captured, and shared. In this respect, the actual content doesn't matter as much as the life cycle -- how and when it gets created and how best to get it to the right people quickly. For example, collaboration strategies may focus on the structure and setup of team spaces or discussion forums -- how they get created, how they operate, how people find them and vice versa. But the actual tasks and topics discussed in those spaces are up to the teams that use them and may not be determined until long after the strategy is completed and in place.
That is not to say that knowledge architects don't have plenty of traditional information architecture/content management responsibilities as well -- such as taxonomies, web site structures, search interfaces, etc. But what sets them apart from other information architects is their focus on the design of spaces and the processes that support knowledge being exchanged, rather than on the knowledge itself.
The database was not being used on a regular basis to extract lessons, and when it was, users complained about how difficult it was to find appropriate lessons. They had to read through too many lessons, wasting precious project time. Users could only filter by date or Center, even though the lessons contained other useful information, such as category, safety relevance, program or project phase, directorate, etc.
Fortunately, the data set contained substantial metadata, providing opportunities to enhance the exploration of the corpus through the generated topic models. Later we will demonstrate how to find documents quickly and see their association to the metadata.
Topic modeling provides an algorithmic solution to managing, organizing and annotating large archives of text. The annotations aid in tasks of information retrieval, classification and corpus exploration.
Topic models provide a simple way to analyze large volumes of unlabeled text. A "topic" consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings. For a general introduction to topic modeling, see for example Probabilistic Topic Models by Steyvers and Griffiths (2007).
In machine learning and natural language processing topic models are generative models which provide a probabilistic framework for the term frequency occurrences in documents in a given corpus.
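The generative framing above can be sketched concretely: to generate each word of a document, first draw a topic from the document's topic mixture, then draw a word from that topic's distribution over the vocabulary. All of the numbers below are invented for illustration.

```python
# Illustrative sketch of the generative story behind a topic model:
# per word slot, draw a topic z from the document's mixture (theta),
# then draw a word w from that topic's word distribution (phi[z]).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["valve", "leak", "software", "fault"]
theta_d = np.array([0.7, 0.3])           # this document's topic mixture
phi = np.array([                          # one row per topic, each sums to 1
    [0.50, 0.40, 0.05, 0.05],             # topic 0: hardware-leaning
    [0.05, 0.05, 0.50, 0.40],             # topic 1: software-leaning
])

words = []
for _ in range(10):
    z = rng.choice(2, p=theta_d)          # choose a topic for this word slot
    w = rng.choice(4, p=phi[z])           # choose a word from that topic
    words.append(vocab[w])
```

Fitting a topic model runs this story in reverse: given the observed words, it infers the theta and phi that most plausibly produced them.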
I drew a rough outline showing the various connections of the proposed nodes in the graph. It is fairly simple, however I found it helps keep me focused on the model.
http://davidmeza1.github.io/2015/07/16/Graphing-a-lesson-learned-database.html
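To make the outline concrete, here is a hedged sketch of generating Cypher to load lessons and topics into Neo4j. The node labels, properties, and relationship type (`Lesson`, `Topic`, `HAS_TOPIC`) are illustrative assumptions, not necessarily the exact schema from the blog post.

```python
# Build a Cypher MERGE statement linking a lesson to a topic, with the
# document's topic probability stored as a relationship weight.
# Schema names here are assumptions for illustration.
def lesson_to_cypher(lesson_id, title, topic_id, weight):
    """Return a Cypher statement that upserts a lesson, a topic, and the
    weighted relationship between them."""
    return (
        f"MERGE (l:Lesson {{id: {lesson_id}, title: '{title}'}}) "
        f"MERGE (t:Topic {{id: {topic_id}}}) "
        f"MERGE (l)-[:HAS_TOPIC {{weight: {weight}}}]->(t)"
    )

stmt = lesson_to_cypher(1101, "Valve leak during test", 7, 0.82)
```

In real use you would pass values as query parameters through a Neo4j driver rather than formatting them into the string, to avoid injection and quoting issues.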
Now I have all the data in my graph database, yet I still need to make it easier for my end users to search and connect lessons based on their criteria. In “Learning Neo4j”, several visualization options were mentioned. I evaluated several of the applications, and for this demonstration I settled on Linkurious, a web-based interface for users to search and visualize graph data. Linkurious was designed to connect to a Neo4j database and requires the database to be running before you start it. Upon opening the application you are presented with this dashboard.
I can begin exploring the topic and uncovering relationships. Since the topic is related to four other nodes, I am given an option to select the nodes I want to display on the screen. Selecting the Lesson node, I am able to display all of the lessons contained in this topic. You can adjust the visualization to add color and size. In this case, each node type is a different color and the node size is determined by the year the lesson was written: the newer the lesson, the larger the node. Properties for each node are displayed to the left. In the image below, information on the highlighted node can be seen, and if users want to visit the site where the lesson is stored, they can click on the URL in the properties.
Now I ask you: is information power, or is knowledge power?
NASA Astronaut Tim Kopra on Dec. 21 Spacewalk
Expedition 46 Flight Engineer Tim Kopra on a Dec. 21, 2015 spacewalk, in which Kopra and Expedition 46 Commander Scott Kelly successfully moved the International Space Station's mobile transporter rail car ahead of Wednesday's docking of a Russian cargo supply spacecraft.