Mining dynamic social networks from public news articles for company value prediction.

- PRATIK, MICHEL, KAI & MINGHAO

Objectives and Key notes
What we discovered!
1. Study, analyze and understand the impactful relations that exist between companies.
2. Transform the discovered relations into inter-company networks that reveal features and metrics about each company.
3. Build models that integrate network-feature metrics with company financial valuations in order to predict a company's future value or profit over time.
Example metrics: the number of companies a company relates with (network feature metric) and the company's profit (financial metric).

Concepts and Techniques utilized.
Network Analysis
• Graph theory
• Ranking
Machine learning algorithms
• Regression ($y = a + bx$)
Statistical methods
• Correlation ($R^2$)
• Mean squared error
Algebraic equations
• e.g. the equation used for the relation score

Choice of research domain
Document-level and sentence-level co-occurrence
The more often companies co-appear or are described together in important news articles and/or sentences, the stronger their mutual relationship.
NB: The study does not extract specific relations separately; it generalizes all co-occurrences as impact relations, i.e. how much impact a company receives from others, taking into account positive and negative structural impacts from the networks.
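As a rough illustration of document-level and sentence-level co-occurrence, the sketch below counts both for one pair of companies. It is not the study's extraction pipeline: the function name `cooccurrence_counts`, the naive punctuation-based sentence splitting, the case-insensitive substring matching and the sample articles are all assumptions made for the example.

```python
import re

def cooccurrence_counts(articles, x, y):
    """Count documents and sentences that mention both company names.

    Assumption: a company is 'mentioned' when its name appears as a
    case-insensitive substring; sentences are split on '.', '!' and '?'.
    """
    x, y = x.lower(), y.lower()
    doc_count = 0
    sent_count = 0
    for text in articles:
        lowered = text.lower()
        # Document-level co-occurrence: both names anywhere in the article.
        if x in lowered and y in lowered:
            doc_count += 1
        # Sentence-level co-occurrence: both names in the same sentence.
        for sentence in re.split(r"[.!?]+", lowered):
            if x in sentence and y in sentence:
                sent_count += 1
    return doc_count, sent_count

# Hypothetical articles, invented for illustration only.
articles = [
    "IBM competes with Microsoft. IBM also sells services.",
    "Microsoft released a product. IBM responded.",
    "Intel reported earnings.",
]
docs, sents = cooccurrence_counts(articles, "IBM", "Microsoft")
```

Here the pair co-occurs in two documents but in only one sentence, which shows why the two levels are counted separately.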

Research Coverage
For a Target company
Generation of inter-company networks capturing local and global relations, historical relations and the change (delta) in the impact of relations over time.
Borrowing the idea behind the PageRank algorithm used in information retrieval systems, companies are ranked by each network feature and by company valuation (e.g. profit).
Machine learning algorithms such as linear regression and SVM regression combine the features of the longitudinal network with a company's financial information to predict the company's value.

Extracting Data
New York Times
Social Network Data
Taken from the large-scale public data about companies available in the news and electronically on the web (mainly news articles). The data cover 1981 to 2009, year by year.
e.g. IBM appeared in about 300 New York Times articles in 2009 (277 articles as IBM and 84 articles as International Business Machines).
Interviews, questionnaires and observations.
Financial Data
• Company valuations were obtained from the Fortune 500 list (1955 to 2009).

Pre-processing the data
For a Target company
For target company $x$, let $y$ be a candidate company (one that impacts $x$ during a time period $t$). The sets of documents $D$ and sentences $S$ in which they co-occurred during $t$ are collected.
Longitudinal networks (directed/undirected, valued/unvalued) are generated over a period of years for a set of companies $V$:
$G^T = \{G^{t_1}, G^{t_2}, G^{t_3}, \ldots\}$, where $t_1 < t_2 < t_3$
For each company $x \in V$, a structural feature vector $F_x^T$ is generated from $G^T$ ($F_x^T \subseteq G^T$); $F_x^T$ indicates the network effects for target company $x$.

Calculating Impact relation Strength
Algorithm
$Score_x(y) = a \sum_{i \in D^t_{x,y}} w_d(i) + b \sum_{j \in S^t_{x,y}} w_s(j)$
$w_d(i)$ and $w_s(j)$: weights computed over the documents $i$ and sentences $j$ in which target company $x$ and candidate company $y$ co-occur.
$w_d(i) = \log\left(1 + \frac{1}{Y'_i} + \frac{tf_x(i)}{\sum_{y \in \{x, Y\}} tf_y(i)}\right)$
$w_s(j) = \log\left(1 + \frac{1}{Y''_j}\right)$
e.g. IBM in 2009: Microsoft had the greatest impact on IBM in 2009. They co-occurred in 55 articles and were described together in 264 sentences. From these sentences, we can infer that they are direct competitors.
Sometimes the impact is not obvious. SPSS and IBM are not competitors and co-occurred in only 1 article and 3 sentences, but their relation is important because that article is a high-weight document (the entire article describes only the acquisition relation between SPSS and IBM).
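The relation score above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not define $Y'_i$ and $Y''_j$, so the code treats them as the number of companies mentioned in document i and sentence j; the function names, the default a = b = 1 and the sample numbers are hypothetical.

```python
import math

def doc_weight(n_companies, tf_x, tf_total):
    """w_d(i) = log(1 + 1/Y'_i + tf_x(i) / sum of tf_y(i)).

    Assumption: n_companies stands in for Y'_i (companies in document i),
    tf_x is the term frequency of x, tf_total the summed term frequencies.
    """
    return math.log(1 + 1 / n_companies + tf_x / tf_total)

def sent_weight(n_companies):
    """w_s(j) = log(1 + 1/Y''_j), with Y''_j assumed to be the number
    of companies mentioned in sentence j."""
    return math.log(1 + 1 / n_companies)

def impact_score(docs, sents, a=1.0, b=1.0):
    """Score_x(y) = a * sum of w_d(i) + b * sum of w_s(j).

    docs  : list of (n_companies, tf_x, tf_total), one per shared document
    sents : list of n_companies, one per shared sentence
    """
    doc_part = sum(doc_weight(*d) for d in docs)
    sent_part = sum(sent_weight(s) for s in sents)
    return a * doc_part + b * sent_part

# Hypothetical inputs: a document about only two companies versus a
# document that mentions ten companies.
focused = doc_weight(n_companies=2, tf_x=3, tf_total=6)
crowded = doc_weight(n_companies=10, tf_x=1, tf_total=20)
score = impact_score(docs=[(2, 3, 6)], sents=[2])
```

Under these assumptions the focused document gets the larger weight, which matches the SPSS and IBM example: a single article devoted to two companies can outweigh many incidental co-mentions.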

Mining Longitudinal Network
Network effects
Six types of network effects are considered.
1. The number of connections that target company x has.
2. The distance between x and its related nodes.
3. The number of connections that x's related nodes have.
4. The number of connections among x's related nodes.
5. The distance between x's related nodes.
6. The number of node pairs having x on their shortest path.
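A minimal sketch of how some of these effects could be computed follows. It assumes an undirected, unvalued network stored as an adjacency dictionary and interprets "related nodes" as direct neighbours; the function names and the toy network are hypothetical, and the distance-based effects (2 and 5) are left out for brevity.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def network_effects(adj, x):
    """Effects 1, 3, 4 and 6 for target x (neighbours = related nodes)."""
    neighbours = adj[x]
    dist = {node: bfs_distances(adj, node) for node in adj}
    others = sorted(node for node in adj if node != x)
    pairs_through_x = 0
    for i, j in combinations(others, 2):
        d_ij, d_ix, d_xj = dist[i].get(j), dist[i].get(x), dist[x].get(j)
        # x lies on a shortest i-j path when the detour through x is not longer.
        if None not in (d_ij, d_ix, d_xj) and d_ix + d_xj == d_ij:
            pairs_through_x += 1
    return {
        "degree": len(neighbours),                                     # effect 1
        "neighbour_degree_sum": sum(len(adj[n]) for n in neighbours),  # effect 3
        "links_among_neighbours": sum(                                 # effect 4
            1 for i, j in combinations(sorted(neighbours), 2) if j in adj[i]
        ),
        "pairs_through_x": pairs_through_x,                            # effect 6
    }

# Toy network: x is linked to A, B and C, and A is also linked to B.
adj = {"x": {"A", "B", "C"}, "A": {"x", "B"}, "B": {"x", "A"}, "C": {"x"}}
effects = network_effects(adj, "x")
```

In the toy network, the pairs (A, C) and (B, C) can only reach each other through x, while A and B are directly connected, so two node pairs have x on a shortest path.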

Mining Longitudinal Network
1. Network effects generation
A set of nodes $N_x$ that directly or indirectly impact focal company $x$ is generated.
Three types of node pairs are defined:
$(x, i)\ \forall\, i \in N_x$,
$(i, j)\ \forall\, i, j \in N_x,\ i \neq j$, and
$(i, k)\ \forall\, i \in N_x,\ k \in V$.
Measures of degree connectivity $\beta(i, j)$, eccentricity $\mu(i, j)$ and betweenness $\zeta_x(i, j)$ are computed and then standardized to the network size $|V|$.

Further analysis on the Networks
Traversing the valued directed network reveals further patterns that point to possible impact relations.
1. Two new sub-networks are incorporated.
Neighboring node sets $L_x$, which are considered to exert an impact on x through their direct connection to $N_x$.
• NB: the ratio $L_x : N_x$ shows the degree to which companies are related to x directly rather than indirectly.
2. Only arcs (directed edges) are retained, to reveal who is impacting whom.
3. Step 1 (network effects generation, previous slide) is repeated to obtain historical network effects.

Network Feature Selection
Filtering out companies with maximum Impact
Individual feature selection.
Companies are ranked by each network feature $f_i$ and by their valuations (profit).
$X_i$: rank vector of companies ranked by network feature $f_i$
$Y$: rank vector of companies ranked by valuation (e.g. profit)
Spearman's rank correlation is calculated between $X_i$ and $Y$.
The salient implication is that when the ratio of a company's number of connections to its neighbors' number of connections increases, its profit tends to increase.
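The ranking comparison can be illustrated with a small Spearman computation. This is a sketch, not the study's code: it uses the textbook formula $\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, which assumes no tied values, and the feature and profit numbers are invented.

```python
def ranks(values):
    """Rank positions (1 = largest value). Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    result = [0] * len(values)
    for position, k in enumerate(order, start=1):
        result[k] = position
    return result

def spearman(xs, ys):
    """Spearman's rank correlation between two equally long value lists."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical values for five companies: one network feature and profit.
feature = [1, 2, 3, 4, 5]
profit = [2, 1, 4, 3, 5]
rho = spearman(feature, profit)
```

A feature whose ranking agrees closely with the profit ranking (rho near 1) is a candidate for the prediction model; a value near 0 suggests the feature carries little ranking information about profit.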

Prediction Model
Network effects + Company valuations
Longitudinal network effects and the valuations of each target company x are integrated into:
Linear regression model (LRM): predicts a company's current or future financial value.
Support vector regression model (SVR): used to learn the parameters.
Experimental results
20 Fortune 500 companies are selected as a sample. Their valuation records (profits) are captured and their networks are generated.
First, the mean profit of the companies is calculated. The model is then trained on the records spanning each five-year window of networks and tested by predicting the profits of the following five years; the predictions are compared with the actual values.
The procedure is then repeated for a single company.
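The train-then-test procedure can be sketched with a single-feature least-squares fit of $y = a + bx$ scored by mean squared error. This is a deliberate simplification: the study combines several network and financial features and also uses SVR, and every number below is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(actual, predicted)) / len(actual)

# Hypothetical data: a network feature and profit for a five-year
# training window, followed by two later years held out for testing.
train_feature = [1.0, 2.0, 3.0, 4.0, 5.0]
train_profit = [3.0, 5.0, 7.0, 9.0, 11.0]
test_feature = [6.0, 7.0]
test_profit = [13.5, 14.5]

a, b = fit_line(train_feature, train_profit)
predicted = [a + b * x for x in test_feature]
error = mean_squared_error(test_profit, predicted)
```

Comparing `error` across models trained on network features only, financial features only, and both together is one way to reproduce the kind of comparison reported in the performance evaluation.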

Performance Evaluation
Prediction of the mean profits of 20 companies
Discovered
Network features do not seem to contribute to revenue prediction, but they do contribute to predicting companies' profit.
Company profit prediction by joint network and financial analysis outperforms network-only analysis by 150% and financial-only analysis by 34%.

Performance Evaluation
Prediction of the mean profits of IBM and INTEL

Aspects of Network science in paper.
• Graph theory: degree of connectivity, diameter and shortest paths are used to calculate network effects
• Developing models to understand the network
• Extracting data from the NYT (the problem-statement part of the paper)
Building models to anticipate the evolution of the networks
• Network effects, company valuations
Constructing models to optimise the outcomes of networks
Experimental results and improvements

What else can be done.
Improvements
1. A company's value (or performance) may encompass several factors, depending on the context in which it is defined, such as:
• Market performance, employee satisfaction and responsibility. Analysing these areas could improve the model's performance.
2. More social network data sources can be used, e.g.
• Social media, especially Twitter or Facebook analysis, to obtain longitudinal social network data.
3. Categorizing relations as negative or positive using sentiment analysis, then handling the positive-impact and negative-impact relation networks separately.
