AI for human communication is about recognizing, parsing, understanding, and generating natural language. The concept of natural language itself is evolving. A key focus is the analysis, interpretation, and generation of verbal and written language. Other language focus areas include haptic, sonic, and visual language, data, and interaction.
This research deck summarizes information from the Forrester Digital Transformation Conference in May 2017. It compiles selected copy and visuals from conference presentations and recent Forrester research reports. Contents are organized into the following sections:
▪ Digital transformation
Overview of AI for human communication
• Natural language processing (NLP) is the confluence of artificial intelligence (AI) and linguistics.
• A key focus is the analysis, interpretation, and generation of verbal and written language.
• Other language focus areas include audible and visual language, data, and interaction.
• Formal programming languages enable computers to process natural language and other types of data.
• Symbolic reasoning employs rules and logic to frame arguments, make inferences, and draw conclusions.
• Machine learning (ML) is an area of AI and NLP that solves problems using statistical techniques, large data sets, and probabilistic reasoning.
• Deep learning (DL) is a type of machine learning that uses layered artificial neural networks.
[Diagram: AI for human communication. Deep learning sits within machine learning, within artificial intelligence. Around human communication: natural language processing (NLP | NLU | NLG); interaction (dialog, gesture, emotion, haptic); audible language (speech, sound); visual language (2D/3D/4D); written language (verbal, text); formal language processing; symbolic reasoning; data.]
Text analytics
Text mining is the discovery by computer of new, previously
unknown information, by automatically extracting it from
different written resources. A key element is the linking
together of the extracted information to form new
facts or new hypotheses to be explored further by more
conventional means of experimentation.
Text analytics is the investigation of concepts, connections,
patterns, correlations, and trends discovered in written
sources. Text analytics examines linguistic structure and applies statistical, semantic, and machine-learning techniques to discern entities (names, dates, places, terms) and their attributes, as well as relationships, concepts, and even sentiments. It extracts these 'features' to databases or semantic stores for further analysis, automates classification and processing of source documents, and exploits visualization for exploratory analysis.
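As a concrete illustration, here is a minimal entity-extraction sketch in Python using the open-source spaCy library (spaCy and the en_core_web_sm model are not named in the deck; they stand in for any NER-capable toolkit):

```python
# Minimal sketch of entity extraction with spaCy (an assumption, not the
# deck's toolkit). Assumes the small English model has been installed via:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple opened a support center in Austin on March 3, 2017.")

# Discern entities (names, dates, places) and their types
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g., Apple ORG; Austin GPE; March 3, 2017 DATE
```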
IM messages, email, call center logs, customer service survey
results, claims forms, corporate documents, blogs, message
boards, and websites are providing companies with enormous
quantities of unstructured data — data that is information-rich
but typically difficult to get at in a usable way.
Text analytics goes beyond search to turn documents and
messages into data. It extends Business Intelligence (BI) and
data mining and brings analytical power to content
management. Together, these complementary technologies
have the potential to turn knowledge management into
knowledge analytics.
Toward understanding diagrams using recurrent networks and deep learning
Source: AI2
Diagrams are rich and diverse. The top row depicts inter-class variability of visual illustrations; the bottom row shows intra-class variation for the water cycle category.
[Diagram: candidate relationships are encoded as relationship feature vectors, passed through fully connected layers into a stacked LSTM, which emits add / no-change decisions that build up the diagram parse graph.]
Architecture for inferring DPGs from diagrams. The LSTM-based network exploits global constraints such as overlap, coverage, and layout to select a subset of relations among thousands of candidates to construct a DPG.
Sample question answering results (the left column is the diagram; the second column shows the answer chosen; the third shows the nodes and edges in the DPG that Dqa-Net decided to attend to, indicated by red highlights):
• The diagram depicts the life cycle of: a) frog 0.924, b) bird 0.02, c) insecticide 0.054, d) insect 0.002
• How many stages of growth does the diagram feature? a) 4 0.924, b) 2 0.02, c) 3 0.054, d) 1 0.002
• What comes before second feed? a) digestion 0.0, b) first feed 0.15, c) indigestion 0.0, d) oviposition 0.85
Diagrams represent complex concepts, relationships and events, often
when it would be difficult to portray the same information with natural
images. Diagram Parse Graphs (DPG) model the structure of diagrams.
RNN+LSTM-based syntactic parsing of diagrams learns to infer DPGs.
Adding a DPG-based attention model enables semantic interpretation and
reasoning for diagram question answering.
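A toy sketch of the idea in PyTorch (this is not AI2's code; dimensions, layer sizes, and names are invented): candidate-relationship feature vectors pass through fully connected layers into a stacked LSTM, which emits an add / no-change decision per candidate.

```python
# Toy sketch of a stacked-LSTM relation selector; all sizes are made up.
import torch
import torch.nn as nn

class RelationSelector(nn.Module):
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())  # FC layers
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)  # stacked LSTM
        self.decide = nn.Linear(hidden, 2)  # per-step logits: add vs. no change

    def forward(self, candidates):            # (batch, T, feat_dim)
        h, _ = self.lstm(self.encode(candidates))
        return self.decide(h)                  # (batch, T, 2)

scores = RelationSelector()(torch.randn(1, 10, 32))  # 10 candidate relations
print(scores.shape)  # torch.Size([1, 10, 2])
```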
Summarization
[Diagram: a taxonomy of summaries.
• Input document — source size (single-document, multi-document); specificity (domain-specific, general); genre; scale.
• Purpose — audience (generic, query-oriented); usage (indicative, informative); expansiveness (background, just-the-news).
• Output document — derivation (extract, abstract); partiality (neutral, evaluative); conventionality (fixed, floating); form.]
Summarization classification
Automatic summarization is the process of shortening a text
document with software, in order to create a summary with the
major points of the original document. Genres of summary
include:
• Single-document vs. multi-document source — based on one
text vs. fusing together many texts. E.g., for multi-document
summaries we may want one summary with common
information, or similarities and differences among documents,
or support and opposition to specific ideas and concepts.
• Generic vs. query-oriented — provides author’s view vs.
reflects user’s interest.
• Indicative vs. informative — what’s it about (quick
categorization) vs. substitute for reading it (content
processing).
• Background vs. just-the-news — assumes reader’s prior
knowledge is poor vs. up-to-date.
• Extract vs. abstract — lists fragments of text vs. re-phrases
content coherently.
[Diagram: extractive summarization pipeline. Input document(s) → pre-processing (normalizer, segmenter, stemmer, stop-word eliminator) yields a list of sentences and a list of pre-processed words for each sentence → processing (clustering, learning, scoring) yields a list of clusters and sentence scores, given a summary size → extraction takes the highest-scoring sentences and reorders them → summary.]
Extractive summarization process
• Preprocessing reads and cleans up data (including removal of stop words, numbers, punctuation, and short words; stemming; lemmatization), and builds the document-term matrix.
• Processing vectorizes and scores sentences, which may entail heuristic, statistical, linguistic, graph-based, and machine learning methods.
• Extraction selects, orders, and stitches together the highest-scoring sentences, and presents the summary (a toy end-to-end sketch follows).
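A toy end-to-end version of these three stages in plain Python (the stop list and the frequency-based scoring rule are deliberate simplifications of the methods listed above):

```python
# Toy extractive summarizer: preprocess, score, extract.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that"}

def summarize(text, k=2):
    # Preprocess: split into sentences, lowercase, drop stop words
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokenized = [[w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS]
                 for s in sentences]
    # Process: score each sentence by the corpus frequency of its words
    freq = Counter(w for words in tokenized for w in words)
    scores = [sum(freq[w] for w in words) / max(len(words), 1) for words in tokenized]
    # Extract: take the k highest-scoring sentences, reorder by original position
    top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:k])
    return " ".join(sentences[i] for i in top)
```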
[Diagram: truncated SVD of the term-document matrix: a 6×4 terms-by-documents matrix ≈ (6×4 terms-by-topics matrix) × (4×4 diagonal topic-importance matrix) × (4×4 topics-by-documents matrix).]
Latent semantic analysis
• LSA is a technique of
distributional semantics for
analyzing relationships
between a set of documents
and the terms they contain by
producing a set of concepts
related to the documents and
terms.
• LSA finds smaller (lower-rank)
matrices that closely
approximate the document-
term matrix by picking the
highest assignments for each
word to topic, and each topic
to document, and dropping
the ones not of interest.
• The contexts in which a certain word exists or does not exist determine the similarity of the documents. (A scikit-learn sketch follows.)
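A minimal LSA sketch with scikit-learn (the corpus is invented; TruncatedSVD performs the lower-rank approximation described above):

```python
# Minimal LSA: TF-IDF document-term matrix, then truncated SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["cats purr and meow", "dogs bark loudly",
        "kittens meow at cats", "puppies and dogs bark"]
dtm = TfidfVectorizer().fit_transform(docs)   # documents x terms
lsa = TruncatedSVD(n_components=2)            # keep 2 latent topics
doc_topics = lsa.fit_transform(dtm)           # documents x topics
print(doc_topics)  # similar docs (cat docs vs. dog docs) land near each other
```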
Source: Andrius Knispelis, ISSUU
[Diagram: LDA in plate notation. α is a parameter that sets the prior on the per-document topic distributions; β is a parameter that sets the prior on the per-topic word distributions. Θ is the topic distribution for document i; Z is the topic for the j'th word in document i; W is the observed word. The inner plate repeats over the N words of a document, the outer plate over the M documents.]
LATENT DIRICHLET ALLOCATION
A topic model developed by David Blei, Andrew Ng, and Michael Jordan in 2003. It tells us what topics are present in any given document by observing all the words in it and producing a topic distribution.
[Diagram: a words × documents document-term matrix (e.g., gensim's tfidf.mm and wordids.txt files) is factored into a words × topics topic model (model.lda).]
Source: Andrius Knispelis, ISSUU
Preprocess the data:
Text corpus depends on the
application domain.
It should be contextualised since the
window of context will determine
what words are considered to be
related.
The only observable features for the
model are words. Experiment with
various stoplists to make sure only
the right ones are getting in.
The training corpus can be different from the documents the model will be scored on. A good general-purpose corpus is Wikipedia.
Train the model:
The key parameter is the number of topics. Again, this depends on the domain.
Other parameters are alpha and beta.
You can leave them aside to begin
with and only tune later.
A good place to start is gensim, a free Python library.
Score it on new documents:
The goal of the model is not to label
documents, but rather to give them a
unique fingerprint so that they can be
compared to each other in a
humanlike fashion.
Evaluate the performance:
Evaluation depends on the
application.
Use Jensen-Shannon Distance as
similarity metric.
Evaluation should show whether the
model captures the right aspects
compared to a human.
Also it will show what distance
threshold is still being perceived as
similar enough.
Use perplexity to see if your model is
representative of the documents
you’re scoring it on.
LDA process
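A minimal sketch of this process with gensim, the library named above (the corpus, topic count, and parameters are toy values):

```python
# Minimal gensim LDA: preprocess, train, score a new document.
from gensim import corpora, models

texts = [["cat", "meow", "kitten"], ["dog", "bark", "puppy"],
         ["kitten", "cat", "purr"], ["puppy", "dog", "leash"]]  # tokenized, stoplisted
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      alpha="auto", passes=10)   # alpha can be tuned later

new_doc = dictionary.doc2bow(["cat", "purr"])
print(lda.get_document_topics(new_doc))          # the document's topic fingerprint
```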
LDA topic modeling process
[Diagram: preprocessing (tokenization, lemmatization, stop-word removal) → vector space model (dictionaries, bag-of-words) → LDA (tuning parameters) → topics and their words.]
Step 1: Select β
• The term distribution β is determined for each topic by β ∼ Dirichlet(δ).
Step 2: Select α
• The proportions θ of the topic distribution for the document w are determined by θ ∼ Dirichlet(α).
Step 3: Iterate
• For each of the N words w_i:
- (a) Choose a topic z_i ∼ Multinomial(θ).
- (b) Choose a word w_i from a multinomial probability distribution conditioned on the topic z_i: p(w_i | z_i, β).
* β is the term distribution of topics and contains the probability of a word occurring in a given topic.
* The process is purely based on frequency and co-occurrence of words.
• Clean documents of as much noise as possible, for example:
- Lowercase all the text
- Replace all special characters and do n-gram tokenizing
- Lemmatize: reduce words to their root form, e.g., "reviews" and "reviewing" to "review"
- Remove numbers (e.g., "2017") and remove HTML tags and symbols
• Create the document-term matrix, dictionaries, and corpus of bag-of-words
• Pass through the LDA algorithm and evaluate (a NumPy sketch of the generative steps above follows)
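A sketch of the three generative steps above in NumPy (vocabulary size, topic count, and priors are toy values):

```python
# Simulate LDA's generative process for one document.
import numpy as np

rng = np.random.default_rng(0)
V, K, N = 6, 2, 8            # vocabulary size, topics, words per document
delta, alpha = 0.5, 0.5

beta = rng.dirichlet([delta] * V, size=K)  # Step 1: per-topic word distributions
theta = rng.dirichlet([alpha] * K)         # Step 2: this document's topic proportions

doc = []
for _ in range(N):                         # Step 3: for each word w_i
    z = rng.choice(K, p=theta)             # (a) choose topic z_i ~ Multinomial(theta)
    doc.append(rng.choice(V, p=beta[z]))   # (b) choose word w_i ~ p(w_i | z_i, beta)
print(doc)                                 # word ids for one synthetic document
```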
• Correlated topic model — CTM allows topics to be correlated, leading to better
prediction and more robustness to overfitting.
• Dynamic topic model — DTM models how each individual topic changes over
time.
• Supervised LDA — sLDA associates an external variable with each document,
which defines a one-to-one correspondence between latent topics and user tags.
• Relational topic model — RTM predicts which documents a new document is
likely to be linked to. (E.g., tracking activities on Facebook in order to predict a
reaction to an advertisement.)
• Hierarchical topic model — HTM draws the relationship between one topic and
another (which LDA does not) and indicates the level of abstraction of a topic
(which CTM correlation does not).
• Structural topic model — STM provides fast, transparent, replicable analyses that
require few a priori assumptions about the texts under study. STM includes
covariates of interest. Unlike LDA, topics can be correlated and each document
has its own prior distribution over topics, defined by covariate X rather than
sharing a mean, allowing word use within a topic to vary by covariate U.
Advanced topic modeling techniques
Query-focused multi-document summarization
[Diagram: query-focused multi-document summarization pipeline. Input docs → sentence segmentation (all sentences from documents) → sentence simplification (all sentences plus simplified versions) → content selection via sentence extraction (LLR, MMR), guided by the query → extracted sentences → information ordering → sentence realization → summary.]
• Multi-document summarization aims to capture the important information of a set of documents related to the same topic and present it in a brief, representative, and pertinent summary.
• Query-driven summarization encodes criteria as search specs. The user needs only certain types of information (e.g., I know what I want! — don't confuse me with drivel!). The system processes specs top-down to filter or analyze text portions. Templates or frames order information and shape presentation of the summary. (A sketch of MMR selection follows.)
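A sketch of MMR (maximal marginal relevance) content selection in Python: greedily pick sentences that are relevant to the query but not redundant with the summary so far. The `similarity` argument is a placeholder for, e.g., cosine similarity over TF-IDF vectors; LLR scoring is not shown.

```python
# Greedy MMR selection over candidate sentences.
def mmr_select(query, sentences, similarity, k=3, lam=0.7):
    selected = []
    candidates = list(sentences)
    while candidates and len(selected) < k:
        def mmr_score(s):
            relevance = similarity(s, query)
            # Penalize similarity to anything already selected
            redundancy = max((similarity(s, t) for t in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```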
Source: A Beginner’s Guide to Recurrent Networks and LSTMs
Long short-term memory (LSTM)
• Long short-term memory (LSTM) empowers an RNN with longer-term recall. This allows the model to make more context-aware predictions.
• LSTM has gates that act as differentiable RAM memory. Access
to memory cells is guarded by “read”, “write” and “erase” gates.
• Starting from the bottom of the diagram, the triple arrows show
where information flows into the cell at multiple points. That
combination of present input and past cell state is fed into the
cell itself, and also to each of its three gates, which will decide
how the input will be handled.
• The black dots are the gates themselves, which determine
respectively whether to let new input in, erase the present cell
state, and/or let that state impact the network’s output at the
present time step. S_c is the current state of the memory cell,
and g_y_in is the current input to it. Remember that each gate
can be open or shut, and they will recombine their open and
shut states at each step. The cell can forget its state, or not; be
written to, or not; and be read from, or not, at each time step,
and those flows are represented here. (A NumPy sketch of one gated step follows.)
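One LSTM time step in NumPy, showing the gates described above (weights are random stand-ins, not a trained model):

```python
# One LSTM step: input ("write"), forget ("erase"), and output ("read") gates.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    # W, U, b stack the parameters for all three gates plus the candidate
    # cell update into 4*d rows.
    z = W @ x + U @ h + b
    d = c.shape[0]
    i = sigmoid(z[0:d])        # "write" gate: let new input in?
    f = sigmoid(z[d:2*d])      # "erase"/forget gate: keep the old state?
    o = sigmoid(z[2*d:3*d])    # "read" gate: expose the state as output?
    g = np.tanh(z[3*d:4*d])    # candidate update from present input + past state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

d, n = 4, 3                    # hidden size, input size
rng = np.random.default_rng(0)
h, c = lstm_step(rng.normal(size=n), np.zeros(d), np.zeros(d),
                 rng.normal(size=(4*d, n)), rng.normal(size=(4*d, d)),
                 np.zeros(4*d))
```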
Symbolic methods
• Declarative languages (logic)
• Imperative languages: C, C++, Java, etc.
• Hybrid languages (Prolog)
• Rules — theorem provers, expert systems
• Frames — case-based reasoning, model-based reasoning
• Semantic networks, ontologies
• Facts, propositions
Symbolic methods can find information by inference and can explain their answers.
Non-symbolic methods
• Neural networks — knowledge encoded in the weights of the neural network, for embeddings, thought vectors
• Genetic algorithms
• Graphical models — Bayesian reasoning
• Support vector machines
Neural knowledge representation is mainly about perception; the issue is its lack of common sense (there is a lot of inference involved in everyday human reasoning).
Knowledge representation and reasoning
Knowledge representation and reasoning asks:
• What does any agent—human, animal, electronic, mechanical—need to know to behave intelligently?
• What computational mechanisms allow this knowledge to be manipulated?
[Diagram: the core NLG engine is wrapped by successive layers of rules: core engine ruleset, vertical ruleset, client ruleset.]
Source: Arria
NLG rulesets
• Core ruleset — general purpose rules used in
almost every application of the NLG engine. These
capture knowledge about data processing and
linguistic communication in general, independent
of the particular domain of application.
• Vertical ruleset — rules encoding knowledge
about the specific industry vertical or domain in
which the NLG engine is being used. Industry
vertical rulesets are constantly being refined via
ongoing development, embodying knowledge
about data processing and linguistic
communication, which is common to different
clients in the same vertical.
• Client ruleset — rules that are specific to the client for whom the NLG engine is being configured. These rules embody the particular expertise in data processing and linguistic communication that is unique to a client application. (A toy sketch of this layering follows.)
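A toy illustration of the layering idea, not Arria's implementation (rule names and values are invented): lookups resolve client-first, then vertical, then core, so the most specific ruleset wins.

```python
# Toy layered-ruleset resolution: client overrides vertical overrides core.
core = {"round_numbers": True, "tone": "neutral"}
vertical = {"tone": "clinical", "units": "mmol/L"}  # e.g., a healthcare vertical
client = {"tone": "reassuring"}                      # one client's preference

def resolve(rule):
    for ruleset in (client, vertical, core):         # most specific first
        if rule in ruleset:
            return ruleset[rule]
    raise KeyError(rule)

print(resolve("tone"))   # "reassuring": the client rule wins
print(resolve("units"))  # "mmol/L": falls back to the vertical ruleset
```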
Summarization, and algorithms to make text quantifiable, allow us to
derive insights from large amounts of unstructured text data.
Unstructured text has been slower to yield to the kinds of analysis that
many businesses are starting to take for granted.
We are beginning to gain the ability to do remarkable things with
unstructured text data.
First, the use of neural networks and deep learning for text offers the ability
to build models that go beyond just counting words to actually representing
the concepts and meaning in text quantitatively.
These examples start simple and eventually demonstrate the breakthrough
capabilities realized by the application of sentence embedding and
recurrent neural networks to capturing the semantic meaning of text.
Machine Learning
Machine Learning is a type of Artificial Intelligence that provides
computers with the ability to learn without being explicitly
programmed.
[Diagram: labeled data → machine learning algorithm (training) → learned model; new data → learned model → prediction.]
Machine learning provides various techniques that can learn from and make predictions on data.
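The definition above in code: a minimal scikit-learn train-and-predict loop (the data set is invented):

```python
# Labeled data -> learning algorithm -> learned model -> prediction.
from sklearn.linear_model import LogisticRegression

X_train = [[1, 0], [0, 1], [1, 1], [0, 0]]  # labeled data (features)
y_train = [1, 0, 1, 0]                       # labels

model = LogisticRegression().fit(X_train, y_train)  # training yields a learned model
print(model.predict([[1, 0]]))                       # prediction on new data
```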
Source: Narrative Science
Machine learning
Source: Lukas Masuch
Deep Learning
Architecture
A deep neural network consists of a hierarchy of layers, whereby each layer
transforms the input data into more abstract representations (e.g. edge ->
nose -> face). The output layer combines those features to make predictions.
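A minimal deep network in PyTorch matching this description (layer sizes are arbitrary):

```python
# Each Linear+ReLU layer re-represents its input more abstractly;
# the output layer combines those features into predictions.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),  # raw pixels -> low-level features (edges)
    nn.Linear(128, 64), nn.ReLU(),   # low-level -> higher-level features
    nn.Linear(64, 10),               # output layer: class scores
)
print(net(torch.randn(1, 784)).shape)  # torch.Size([1, 10])
```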
Source: Narrative Science
Deep learning
Source: Lukas Masuch
Source: Narrative Science
Why deep learning
for NLP?
Deep Learning in NLP
Syntax Parsing
SyntaxNet (Parsey McParseface) tags each word with a part-of-speech tag and determines the syntactic relationships between words in the sentence, with 94% accuracy compared to human performance of 96%.
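SyntaxNet itself is not required to try this kind of parsing: spaCy, a different open-source library, exposes the same sort of output, a part-of-speech tag and a syntactic head and relation per word (the en_core_web_sm model is an assumption):

```python
# Part-of-speech tags and dependency relations with spaCy (not SyntaxNet).
import spacy

nlp = spacy.load("en_core_web_sm")
for tok in nlp("Deep learning parses natural language sentences."):
    print(tok.text, tok.pos_, tok.dep_, tok.head.text)  # word, POS, relation, head
```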
Source: Narrative Science
Deep learning can be
used to parse syntax
of natural language
sentences.
Source: Lukas Masuch
Deep Learning in NLP
Generating Text
To train the RNN, insert characters sequentially and
predict the probabilities of the next letter.
Backpropagate error and update RNN’s weights to
increase the confidence of the correct letter (green)
and decrease the confidence of all other letters (red).
Trained on structured Wikipedia markdown, the network learns to spell English words completely from scratch and to copy general syntactic structures.
Source: Narrative Science
Deep learning networks
can learn to spell
correctly and generate
texts with appropriate
syntactic structures.
Source: Lukas Masuch
Deep Learning in NLP
Generating Text
To generate text, we feed a character into the trained RNN and get a distribution
over what characters are likely to come next (red = likely). We sample from this
distribution, and feed it right back in to get the next letter.
This highlighted neuron gets very excited (green = excited, blue = not excited) when
the RNN is inside the [[ ]] markdown environment and turns off outside of it.
The RNN is likely using this neuron to remember if it is inside a URL or not.
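A sketch of this sampling loop in NumPy, with a random matrix standing in for the trained RNN (the vocabulary and weights are invented):

```python
# Sample-and-feed-back loop for character generation.
import numpy as np

rng = np.random.default_rng(0)
vocab = list("ab[] ")                           # toy character set
W = rng.normal(size=(len(vocab), len(vocab)))   # stand-in for trained weights

def step(char_idx):                  # stand-in for one trained-RNN step
    logits = W[char_idx]
    p = np.exp(logits - logits.max())
    return p / p.sum()               # distribution over next characters

idx, out = 0, []
for _ in range(20):
    idx = rng.choice(len(vocab), p=step(idx))  # sample the next character...
    out.append(vocab[idx])                      # ...and feed it right back in
print("".join(out))
```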
Source: Narrative Science
How deep learning
generates text from
character-level
embeddings.
Source: Lukas Masuch
Deep Learning in NLP
Neural Machine Translation
Google’s Neural Machine Translation system, based on a deep LSTM seq2seq architecture, reduces translation errors by 55% to 85% compared to traditional approaches.
Source: Narrative Science
Deep learning neural
machine translations are
increasingly accurate.
Source: Lukas Masuch
Deep Learning in NLP
Neural Conversational Model
Neural Chatbot: Predicts the next sentence given the previous
sentences in a conversation.
The LSTM remembers facts, understands contexts and performs
common sense reasoning in the trained domain.
Human: what is the purpose of life ?
Machine: to serve the greater good .
Human: what is the purpose of living ?
Machine: to live forever
Human: what is the purpose of existence?
Machine: to find out what happens when
we get to the planet earth
Human: where are you now ?
Machine: i ’m in the middle of nowhere .
Human: what is the purpose of dying ?
Machine: to have a life
Human: what is the purpose of being
intelligent ?
Machine: to find out what it is.
Human: what is the purpose of emotions ?
Machine: I don ’t know .
Philosophical Q&A – Network trained with OpenSubtitles (movie subtitles)
Source: Narrative Science
Deep learning LSTM
neural networks are
being used to generate
human-machine
conversations.
Source: Lukas Masuch
Source: Narrative Science
Toward multi-modal deep
learning and language
generation