A brief review of the paper:
Zeng, Z., Yin, Y., Song, Y., & Zhang, M. (2017). Socialized word embeddings. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) (pp. 3915–3921).
2. Overview
• Add the following two aspects to word embeddings:
‣ Personalisation (user information; not new)
‣ Socialisation (inter-user relationship; new)
• Three-fold evaluation:
‣ Perplexity comparison with word2vec
‣ Application to document-level sentiment classification
✦ As the features for SVM (inc. user segmentation)
✦ As the attention source for neural models
• Overview
• Proposed method
• Evaluation
• Comments
3. Proposed method:
Personalisation
• Starting from continuous bag-of-words model
(CBOW) of word2vec (Mikolov et al., 2013)
• Consider the context words for a word as user-
dependent
‣ Maximise:
‣ ‘for each user, s/he will think about a predicted
word given the global words meanings and
customize them to his/her own preference’
19
J_1 = \sum_{i}^{N} \sum_{w_j \in W_i} \log P(w_j \mid C(w_j), u_i)
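The objective J1 sums, over users and the words they wrote, the log-probability of each word given its context and the user. A toy sketch of one such term under a plain softmax (the paper uses hierarchical softmax instead; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
V, d = 5, 3                        # toy vocabulary size and dimensionality
W_in = rng.normal(size=(V, d))     # input (context) word vectors
W_out = rng.normal(size=(V, d))    # output word vectors
u = rng.normal(size=d)             # user vector u_i

def log_prob(target, context, user_vec):
    """log P(w_j | C(w_j), u_i): each context word is personalised by adding u_i."""
    h = np.mean([W_in[c] + user_vec for c in context], axis=0)  # CBOW hidden state
    scores = W_out @ h
    scores -= scores.max()                                      # numerical stability
    return scores[target] - np.log(np.exp(scores).sum())

lp = log_prob(target=2, context=[0, 1], user_vec=u)
```

Training maximises the sum of such terms over the corpus; each term is a log-probability and hence at most zero.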
4. Proposed method: Personalisation
• Word vectors and user vectors have the same dimensionality
• The user-dependent word vector is represented as the sum of the word vector and the user vector
w_j^{(i)} = w_j + u_i
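The composition on this slide is just vector addition; a minimal NumPy sketch (the dictionaries and names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4                                       # word and user vectors share one dimensionality
word_vecs = {"pizza": rng.normal(size=dim)}   # global word vector w_j
user_vecs = {"alice": rng.normal(size=dim)}   # user vector u_i

def user_dependent_vector(word, user):
    """w_j^(i) = w_j + u_i: the user's personalised view of a word."""
    return word_vecs[word] + user_vecs[user]

v = user_dependent_vector("pizza", "alice")
```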
5. Proposed method: Personalisation
• To handle learning over the large vocabulary, hierarchical softmax over a Huffman tree built from word frequencies is used
• The parameters of the optimisation function (J1), the word vectors, and the user vectors are updated by Stochastic Gradient Descent (SGD)
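A sketch of building the Huffman codes that hierarchical softmax descends, assuming only a word-frequency dictionary (the function name is illustrative):

```python
import heapq
import itertools

def huffman_codes(freqs):
    """Build binary Huffman codes from word frequencies (rarer words get longer codes)."""
    counter = itertools.count()                     # tie-breaker for equal frequencies
    heap = [(f, next(counter), w) for w, f in freqs.items()]
    heapq.heapify(heap)
    codes = {w: "" for w in freqs}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Once nodes merge, items become tuples of words; prefix each side's codes
        for w in (left if isinstance(left, tuple) else (left,)):
            codes[w] = "0" + codes[w]
        for w in (right if isinstance(right, tuple) else (right,)):
            codes[w] = "1" + codes[w]
        merged = (left if isinstance(left, tuple) else (left,)) + \
                 (right if isinstance(right, tuple) else (right,))
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return codes

codes = huffman_codes({"the": 100, "pizza": 5, "good": 20})
```

Frequent words get short codes, so the expected number of binary decisions per prediction stays small.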
6. Proposed method: Socialisation
• Homophily in social networks (Lazarsfeld and Merton, 1954; McPherson et al., 2001)
‣ Someone's friends tend to share similar opinions or topics
• User vectors of friends (neighbours) should therefore be similar
‣ Minimise:
‣ SGD is also applied here to update the user vectors
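The regulariser being minimised is not reproduced in this extract. A plausible form consistent with the description, where λ weighs the regulariser and N(u_i) denotes user i's neighbours (the exact notation is an assumption here, not taken from the slides):

```latex
J_2 = \frac{\lambda}{2} \sum_{i}^{N} \sum_{u_k \in \mathcal{N}(u_i)} \lVert u_i - u_k \rVert_2^2
```

Minimising this pulls each user vector toward those of the user's friends, matching the homophily assumption above.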
7. Proposed method: Socialisation
• Incorporating socialisation makes the user vectors get updated more frequently than the word vectors
• A constraint on the L2-norm of the user vectors is therefore introduced
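A common way to enforce such an L2-norm constraint is projection after each SGD update; a sketch, assuming the bound is the personalisation strength r that appears later in the evaluation (the helper name is illustrative):

```python
import numpy as np

def project_to_ball(u, r):
    """Rescale a user vector back into the L2 ball of radius r after an SGD update."""
    norm = np.linalg.norm(u)
    return u * (r / norm) if norm > r else u

u = project_to_ball(np.array([3.0, 4.0]), r=1.0)   # ||[3, 4]|| = 5, so rescale
```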
9. Perplexity
• 6-gram perplexity
• Varying the weight of the social regularisation (λ) and the strength of the personalisation (r)
‣ The shape of the curves suggests both can be tuned for a given dataset
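For reference, perplexity is the exponentiated average negative log-probability of the held-out tokens; a minimal sketch (the function name is illustrative):

```python
import math

def perplexity(log_probs):
    """Perplexity of a corpus given per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

pp = perplexity([math.log(0.25)] * 4)   # uniform 1/4 per token -> perplexity 4
```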
10. Sentiment Classification
• Apply the socialised word embeddings to a downstream task
‣ They chose document-level sentiment classification (predicting the ratings of Yelp reviews)
• Two aspects are checked:
‣ User segmentation (active users or not)
‣ Applicability as attention vectors in neural models
11. Sentiment Classification: SVM and user segmentation
• Split users by the number of published reviews
‣ The total number of reviews is the same in both segments
• Use the average of the word vectors in a document as the features for an SVM
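Averaging word vectors into one fixed-length feature vector per document can be sketched as follows (the toy embedding table is illustrative; in the paper the vectors would be the socialised, user-dependent embeddings, and the result would be fed to an SVM):

```python
import numpy as np

# Toy embedding table standing in for socialised word embeddings
emb = {"great": np.array([1.0, 0.0]),
       "food":  np.array([0.0, 1.0]),
       "bad":   np.array([-1.0, 0.0])}

def doc_features(tokens):
    """Average the word vectors of a document into one SVM feature vector."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0)

x = doc_features(["great", "food"])   # -> [0.5, 0.5]
```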
12. Sentiment Classification: NN models and user attention
• In the Yelp rating-prediction task, some papers have proposed neural models that apply an (extra) attention mechanism over users
(e.g.) Chen et al., 2016
13. Sentiment Classification: NN models and user attention
• How about using the socialised word embeddings as "fixed" attention vectors in those models?
• Better than no attention, but slightly worse than the original models (whose attention vectors are trained)
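What a "fixed" attention vector amounts to can be sketched like this, assuming token hidden states H and a frozen user vector u (shapes and names are illustrative, not the cited models' actual architecture):

```python
import numpy as np

def fixed_user_attention(H, u):
    """Attend over token states H (T x d) with a frozen user vector u (d,):
    weights = softmax(H @ u); u itself is never updated during training."""
    scores = H @ u
    weights = np.exp(scores - scores.max())   # stable softmax
    weights /= weights.sum()
    return weights @ H                        # user-weighted document vector (d,)

H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
u = np.array([0.5, 0.5])                      # socialised user vector, kept fixed
doc = fixed_user_attention(H, u)
```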
14. Comments
• Socialisation looks like a very interesting and promising idea
‣ The performance gains are not significant, though
‣ How sociality is regularised has room for improvement
✦ Assuming that neighbouring users should be similar seems too naive and too strong
15. Comments
• Their source code is available:
‣ https://github.com/HKUST-KnowComp/SocializedWordEmbeddings
• They have just published an improved version of socialised word embeddings this year:
‣ https://github.com/HKUST-KnowComp/SRBRW
@inproceedings{zeng2018biased,
  title={Biased Random Walk based Social Regularization for Word Embeddings},
  author={Zeng, Ziqian and Liu, Xin and Song, Yangqiu},
  booktitle={IJCAI},
  pages={XX-YY},
  year={2018},
}