This document describes research on using deep learning models to predict stock market movements based on news events. It presents a method to extract event representations from news articles, generalize the events, embed the events, and feed the embedded events into deep learning models. Experimental results show that using embedded events as inputs to convolutional neural networks achieved more accurate stock market predictions than baseline methods, and modeling long, mid, and short-term event effects further improved performance. The research demonstrates that deep learning can effectively capture hidden relationships between news events and stock prices.
2. My research areas
Machine Learning
Natural Language Processing
Applications
Text synthesis
Machine translation
Information extraction
Market prediction
Sentiment analysis
Syntactic analysis
5. Introduction
• Is it possible?
– Random walk theory
– Efficient market hypothesis
– Human/algorithm trading
• Examples
– Shares of Apple Inc. fell as trading began in New York on Tuesday morning, the day after former CEO Steve Jobs passed away
– Google’s stock falls after grim earnings come out early
6. Why events?
• Previous work
– Bag-of-words
– Named Entities
– Noun Phrases
• Examples
– Oracle Corp would sue Google Inc., claiming Google’s Android operating system…
– Microsoft agrees to buy Nokia’s mobile phone business for $7.2 billion.
9. Method
• Event Generalization
– First, we construct a morphological analysis tool based on the WordNet stemmer to extract lemma forms of inflected words
– Second, we generalize each verb to its class name in VerbNet
• For example
– Instant view: Private sector adds 114,000 jobs in July.
– (Private sector, adds, 114,000 jobs)
– (private sector, multiply_class, 114,000 job)
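The two generalization steps can be sketched in a few lines. This is a minimal illustration, not the tool described on the slide: the `LEMMAS` and `VERB_CLASSES` tables are hypothetical stand-ins for the WordNet-based lemmatizer and the VerbNet lookup, and they only cover the example above.

```python
# Toy stand-ins for the WordNet-based lemmatizer and the VerbNet class lookup.
# Both tables are hypothetical and cover only the slide's example.
LEMMAS = {"adds": "add", "jobs": "job"}
VERB_CLASSES = {"add": "multiply_class"}


def lemmatize(phrase):
    """Lowercase a phrase and map each token to its lemma when one is known."""
    return " ".join(LEMMAS.get(tok, tok) for tok in phrase.lower().split())


def generalize_event(event):
    """Turn an (O1, P, O2) tuple into its generalized form.

    The arguments are lemmatized; the predicate is lemmatized and then
    replaced by its verb class name when the lookup has one.
    """
    arg1, action, arg2 = event
    verb = lemmatize(action)
    return (lemmatize(arg1), VERB_CLASSES.get(verb, verb), lemmatize(arg2))


event = ("Private sector", "adds", "114,000 jobs")
generalized = generalize_event(event)
print(generalized)
```

A verb that is missing from the class table falls back to its lemma, so unseen predicates are still normalized even when they cannot be generalized.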
10. Method
• Model
– Input: events
– Output: two-way movement
• Training: historical data
• Testing: future data
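One way to picture the two-way output and the historical/future split is the sketch below. The labeling rule (next close higher than the current close gives +1, otherwise -1) and the helper names `movement_labels` and `chronological_split` are assumptions made for illustration; the slides do not spell out the exact rule.

```python
def movement_labels(closes):
    """Label each day +1 if the next close is higher, else -1.

    Treating an unchanged close as -1 is an assumption of this sketch.
    """
    return [1 if nxt > cur else -1 for cur, nxt in zip(closes, closes[1:])]


def chronological_split(samples, train_fraction):
    """Split time-ordered samples so that training data precedes test data."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


closes = [100.0, 101.5, 101.5, 99.0, 102.0]  # made-up prices
labels = movement_labels(closes)
train, test = chronological_split(labels, 0.75)
print(labels, train, test)
```

Splitting by time rather than at random keeps information from later days out of the training set.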
11. Method
• Prediction Model
– Linear model
  • Most previous work uses linear models to predict the stock market. To make direct comparisons, this paper constructs a linear prediction model using an SVM with a linear kernel
– Nonlinear model
  • Intuitively, the relationship between events and the stock market may be more complex than linear, due to hidden and indirect relationships. We exploit a deep neural network model, whose hidden layers are useful for learning such hidden relationships
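A toy case shows why a hidden layer can matter. The data below is an XOR-style pattern over two hypothetical binary event features (the market moves up when exactly one of the two events occurs); it is invented for illustration and is not taken from the experiments. The hidden-layer weights are set by hand rather than learned.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def sign(x):
    return 1 if x > 0 else -1


def linear_predict(features, weights, bias):
    """Linear decision rule: sign(w . x + b)."""
    return sign(sum(w * f for w, f in zip(weights, features)) + bias)


def hidden_layer_predict(features):
    """One hidden layer with two ReLU units and hand-set weights."""
    x1, x2 = features
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    return sign(h1 - 2.0 * h2 - 0.5)


# Invented data: label is +1 when exactly one of the two events occurs.
DATA = [((0, 0), -1), ((1, 0), 1), ((0, 1), 1), ((1, 1), -1)]


def accuracy(predict):
    return sum(predict(x) == y for x, y in DATA) / len(DATA)


def linear_accuracy(w1, w2, b):
    return accuracy(lambda x: linear_predict(x, (w1, w2), b))


# Search a small grid of linear models and keep the best accuracy found.
grid = [i / 2 for i in range(-4, 5)]
best_linear = max(linear_accuracy(w1, w2, b)
                  for w1 in grid for w2 in grid for b in grid)
network_accuracy = accuracy(hidden_layer_predict)
print(best_linear, network_accuracy)
```

No linear rule can classify all four points, because the pattern is not linearly separable, while the small network with a hidden layer fits it exactly.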
12. …
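[Figure: neural network prediction model. News documents are mapped to input features φ1, φ2, φ3, …, φM, which pass through the input layer and hidden layers to the output layer. Class +1: the polarity of the stock price movement is positive. Class -1: the polarity of the stock price movement is negative.]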
13. Method
• Feature Representation
– Bag-of-words
  • TF*IDF
– Events
  • O1, P, O2, O1 + P, P + O2, O1 + P + O2
• For Example
– (Microsoft, buy, Nokia's mobile phone business)
– (#arg1=Microsoft, #action=buy, #arg2=Nokia's mobile phone business, #arg1_action=Microsoft buy, #action_arg2=buy Nokia's mobile phone business, #arg1_action_arg2=Microsoft buy Nokia's mobile phone business)
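The expansion of one event tuple into its six features can be sketched as follows. The function name `event_features` and the dictionary layout are illustrative choices; the feature names follow the example above.

```python
def event_features(event):
    """Expand an (O1, P, O2) tuple into O1, P, O2, O1+P, P+O2, and O1+P+O2."""
    arg1, action, arg2 = event
    return {
        "#arg1": arg1,
        "#action": action,
        "#arg2": arg2,
        "#arg1_action": f"{arg1} {action}",
        "#action_arg2": f"{action} {arg2}",
        "#arg1_action_arg2": f"{arg1} {action} {arg2}",
    }


features = event_features(("Microsoft", "buy", "Nokia's mobile phone business"))
for name, value in features.items():
    print(f"{name}={value}")
```

The combined features let a model back off from the full event to its parts when the full tuple is too rare to be informative.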
14. Experiments
• Data Description
– We use publicly available financial news from Reuters and Bloomberg over the period from October 2006 to November 2013. This time span witnesses a severe economic downturn in 2007-2010, followed by a modest recovery in 2011-2013. There are 106,521 documents in total from Reuters News and 447,145 from Bloomberg News.
– We mainly focus on predicting the Standard & Poor's 500 (S&P 500) index, obtaining indices and stock prices from Yahoo Finance.
21. Conclusion
• Events are useful.
– Events are more useful representations than bags-of-words for the task of stock market prediction.
• Hidden relations are useful.
– A deep neural network model can be more accurate at predicting the stock market than the linear model.
• Robust results obtained.
– Our approach can achieve stable experimental results on S&P 500 index prediction and individual stock prediction over a large amount of data (eight years of stock prices and more than 550,000 pieces of news).
• Quality of information is more important than quantity.
– The most relevant information (e.g., news titles vs. news content, individual company news vs. all news) is better than more, but less relevant, information.
29. Training
• Minimize the margin loss
• 500 iterations
• Standard back-propagation
[Equation annotations: random replacement with an object; regularization weight, set to 0.0001; parameters]
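The slide's equation did not survive in the text, so the sketch below assumes a common form of margin loss: a hinge term that pushes the score of a real event above the score of a corrupted event by a margin of 1, plus an L2 penalty on the parameters with weight 0.0001. The margin value, the choice of which argument to corrupt, and the names `corrupt_event` and `margin_loss` are assumptions of this sketch.

```python
import random


def corrupt_event(event, object_pool, rng):
    """Build a corrupted event by swapping the first argument for a random object.

    Which argument gets replaced is an assumption of this sketch.
    """
    arg1, action, arg2 = event
    replacement = rng.choice([obj for obj in object_pool if obj != arg1])
    return (replacement, action, arg2)


def margin_loss(score_true, score_corrupted, params, reg_weight=0.0001):
    """Hinge loss with margin 1 plus an L2 penalty on the parameters."""
    hinge = max(0.0, 1.0 - score_true + score_corrupted)
    l2 = sum(p * p for p in params)
    return hinge + reg_weight * l2


rng = random.Random(0)
event = ("Microsoft", "buy", "Nokia's mobile phone business")
corrupted = corrupt_event(event, ["Microsoft", "Oracle", "Google"], rng)

# Scores are made-up numbers standing in for the model's output.
satisfied = margin_loss(2.0, 0.5, [1.0, -2.0])  # margin met: only the penalty remains
violated = margin_loss(0.2, 0.4, [])            # margin violated: hinge term is positive
print(corrupted, satisfied, violated)
```

When the real event already outscores the corrupted one by at least the margin, only the regularization term contributes to the loss.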
30. Deep Prediction Model
• We model long-, mid-, and short-term events
– Long-term events (last month)
– Mid-term events (last week)
– Short-term events (last day)
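Grouping events by horizon can be sketched as below. The window lengths (1, 7, and 30 days) follow the slide's "last day", "last week", and "last month"; letting the windows overlap, so that yesterday's event also counts as a mid-term and long-term event, is an assumption, as are the names `WINDOWS` and `split_by_horizon`.

```python
from datetime import date

# Window lengths in days; overlap between windows is an assumption of this sketch.
WINDOWS = {"short": 1, "mid": 7, "long": 30}


def split_by_horizon(events, prediction_day):
    """Group (day, event) pairs by how long before prediction_day they occurred."""
    buckets = {name: [] for name in WINDOWS}
    for day, event in events:
        age = (prediction_day - day).days
        for name, length in WINDOWS.items():
            if 1 <= age <= length:
                buckets[name].append(event)
    return buckets


# Made-up events for illustration.
events = [
    (date(2013, 10, 31), "e1"),  # 1 day before
    (date(2013, 10, 27), "e2"),  # 5 days before
    (date(2013, 10, 10), "e3"),  # 22 days before
    (date(2013, 9, 1), "e4"),    # 61 days before, outside every window
]
buckets = split_by_horizon(events, date(2013, 11, 1))
print(buckets)
```

Events from the prediction day itself are excluded, so only news available before that day is used.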
32. Deep Prediction Model
• Convolution and Max-pooling
– Convolution layer to obtain local features
– Max-pooling to determine the global representative feature
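The two operations can be sketched on a short sequence of event vectors. This is a minimal single-filter illustration in plain Python with made-up numbers; the filter width, the absence of padding, and the names `convolve` and `max_pool` are assumptions, and a real model would use many filters over learned embeddings.

```python
def convolve(sequence, kernel, bias=0.0):
    """Slide one filter over a sequence of vectors (no padding).

    sequence: list of d-dimensional vectors, one per event.
    kernel: list of `width` d-dimensional weight vectors.
    Returns one local feature per window of `width` consecutive events.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias
        for vec, weights in zip(window, kernel):
            total += sum(v * w for v, w in zip(vec, weights))
        outputs.append(total)
    return outputs


def max_pool(features):
    """Keep the strongest local feature as the global representative."""
    return max(features)


sequence = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 4.0]]  # four event vectors
kernel = [[1.0, 0.0], [0.0, 1.0]]                            # one filter of width 2
local_features = convolve(sequence, kernel)
global_feature = max_pool(local_features)
print(local_features, global_feature)
```

Max-pooling yields one value per filter regardless of how many events fall in the window, which keeps the representation fixed-size for the layers that follow.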
33. Experiments
• Baselines
Model                          Input               Method
Luss and d’Aspremont [2012]    Bag of words        NN
Ding et al. [2014] (E-NN)      Structured event    NN
WB-NN                          Word embedding      NN
WB-CNN                         Word embedding      CNN
E-CNN                          Structured event    CNN
EB-NN                          Event embedding     NN
EB-CNN                         Event embedding     CNN
34. Experiments
• Findings
– Events are better features than words
– Reducing sparsity is helpful in the task
– CNN-based models are more powerful
35. Experiments
• 15 companies from S&P 500
– Consists of high-, mid-, and low-ranking companies
– Evaluation metrics: accuracy and MCC (Matthews correlation coefficient)
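Both metrics can be computed from the confusion counts of the two classes, +1 and -1. The sketch below uses the standard definitions; returning 0 for MCC when its denominator is zero is a common convention adopted here as an assumption, and the labels are made up.

```python
import math


def confusion(y_true, y_pred):
    """Count true/false positives and negatives for labels in {+1, -1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == -1 and p == -1 for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == -1 for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


y_true = [1, 1, 1, -1, -1, -1, 1, -1]   # made-up movements
y_pred = [1, 1, -1, -1, -1, 1, 1, -1]   # made-up predictions
acc = accuracy(y_true, y_pred)
score = mcc(y_true, y_pred)
print(acc, score)
```

MCC complements accuracy because it stays near zero for a model that always predicts the majority class, even when up and down days are imbalanced.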
36. Conclusion
• Event embedding-based document representations are better than discrete event-based methods
• Deep CNN can help capture the longer-term influence of news events
37. Current
• More technical enhancements
• More markets
– China’s A-share market
– Chinese syntactic and semantic analysis
– Chinese Open IE