13. Concept Search
● Simple Content Search
○ High recall, but low precision
○ Precision is important in Hatena Bookmark
● Concept Search
○ Query Expansion
■ Use search results retrieved by tag search
■ Expand queries using TF-IDF, IDF, and RIDF (residual IDF)
● Term Vector API
○ Retrieve using expanded queries
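■ e.g. "Kyoto" (京都) -> "Gion, temple, shrine, cherry blossom, Kyo, ..." (祇園、寺、神社、桜、京、...)
ref. Improving the precision of full-text search in Hatena Bookmark (はてなブックマークの全文検索の精度改善)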
https://speakerdeck.com/takuyaa/hatenabutukumakuquan-wen-jian-suo-falsejing-du-gai-shan
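The term-weighting step can be sketched roughly as below. This is a minimal sketch, not the method used in Hatena Bookmark: the slide names TF-IDF, IDF and RIDF but does not say how they are combined, so ranking by tf * RIDF is an assumption, and the function names and statistics are hypothetical. The inputs correspond to what the Term Vector API can return when `term_statistics` is enabled (`term_freq`, `doc_freq`, `ttf`).

```python
import math


def idf(df, n_docs):
    """Inverse document frequency: -log2(df / N)."""
    return -math.log2(df / n_docs)


def ridf(df, cf, n_docs):
    """Residual IDF: observed IDF minus the IDF a Poisson model predicts.

    df is the document frequency, cf the total (collection) frequency.
    Terms that cluster in few documents score high; terms spread evenly
    across the collection score near zero.
    """
    expected_idf = -math.log2(1.0 - math.exp(-cf / n_docs))
    return idf(df, n_docs) - expected_idf


def expand_query(term_stats, n_docs, top_k=3):
    """Return the top_k candidate terms ranked by tf * RIDF (an assumed weighting)."""
    scored = {
        term: s["tf"] * ridf(s["df"], s["cf"], n_docs)
        for term, s in term_stats.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


# Made-up statistics for terms found in documents retrieved by a tag search.
stats = {
    "gion": {"tf": 5, "df": 10, "cf": 100},
    "temple": {"tf": 3, "df": 50, "cf": 200},
    "the": {"tf": 20, "df": 900, "cf": 5000},
}
expanded = expand_query(stats, n_docs=1000, top_k=2)
```

With these invented numbers the frequent but evenly spread term is ranked last, which is the behaviour RIDF is meant to provide.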
25. Topic by Elasticsearch
● Acquire topic keywords
○ Two-layered Significant Terms Aggregation
● Acquire entries related to the topic
○ Function Score Query
○ Retrieve using topic keywords and their scores
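official residence (官邸), prime minister (首相), drone (ドローン), fall (落下), camera (カメラ)
● Drone falls on Prime Minister's Official Residence, no injuries: Nikkei (首相官邸にドローン落下 けが人はなし :日本経済新聞)
● Drone falls on roof of Prime Minister's Official Residence, trace radiation detected | Reuters (首相官邸の屋上にドローン落下、微量の放射線を検出| Reuters)
ref. How to build Hatena Bookmark's topic pages (はてなブックマークのトピックページの作り方)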
http://codezine.jp/article/detail/8767
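The two steps on this slide might look something like the following request bodies. This is only a sketch under assumptions: the field names `tag` and `title`, the aggregation names, the `score_mode` / `boost_mode` choices, and the keyword scores are all hypothetical and are not taken from the talk. The first helper nests one Significant Terms Aggregation inside another; the second feeds keywords and their scores into a Function Score Query as per-keyword weights.

```python
def topic_keywords_aggs(outer_field="tag", inner_field="title", size=5):
    """Two-layered significant_terms: significant terms per significant term."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "topics": {
                "significant_terms": {"field": outer_field, "size": size},
                "aggs": {
                    "keywords": {
                        "significant_terms": {"field": inner_field, "size": size}
                    }
                },
            }
        },
    }


def topic_entries_query(keyword_scores, field="title"):
    """function_score query that adds each matching keyword's score as a weight."""
    functions = [
        {"filter": {"term": {field: keyword}}, "weight": score}
        for keyword, score in keyword_scores.items()
    ]
    return {
        "query": {
            "function_score": {
                # Match entries containing at least one topic keyword.
                "query": {"terms": {field: list(keyword_scores)}},
                "functions": functions,
                "score_mode": "sum",      # sum the weights of matching keywords
                "boost_mode": "replace",  # ignore the base query's own score
            }
        }
    }


# Keywords from the slide; the scores are invented for illustration.
keywords = {"官邸": 3.0, "首相": 2.5, "ドローン": 2.0, "落下": 1.5, "カメラ": 1.0}
aggs_body = topic_keywords_aggs()
entries_body = topic_entries_query(keywords)
```

Entries that mention several high-scoring keywords end up ranked first, which is one plausible reading of "retrieve using topic keywords and their scores".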
26. Bookmark Counter
● Count the number of bookmarks on a website
○ Count by Sum Aggregation
○ eg. http://d.hatena.ne.jp/
{
  "query": {
    "prefix": { "url": "http://d.hatena.ne.jp/" }
  },
  "aggs": {
    "total_count": {
      "sum": { "field": "count" }
    }
  }
}
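As a rough illustration, the same request can be built for any URL prefix and the total read back from the response. The helper names are hypothetical, `"size": 0` is an addition that is not on the slide (it skips returning the hits themselves), and the sample response contains a made-up value. The response path `aggregations.total_count.value` follows the standard shape of a sum aggregation result.

```python
def bookmark_count_body(url_prefix):
    """Build a search body that sums the `count` field over a URL prefix."""
    return {
        "size": 0,  # assumption: only the aggregation is needed, not the hits
        "query": {"prefix": {"url": url_prefix}},
        "aggs": {"total_count": {"sum": {"field": "count"}}},
    }


def total_from_response(response):
    """Extract the summed bookmark count from a search response."""
    # Sum aggregations return a float, so convert it to an integer count.
    return int(response["aggregations"]["total_count"]["value"])


body = bookmark_count_body("http://d.hatena.ne.jp/")

# Hypothetical response fragment; the value is invented for illustration.
sample_response = {"aggregations": {"total_count": {"value": 12345.0}}}
total = total_from_response(sample_response)
```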
27. Conclusion
● Elasticsearch in Hatena Bookmark
● Features powered by Elasticsearch
○ Tag / Title / Content / URL Search
○ Related entry
○ Issue
○ Topic
○ Bookmark Counter