
Mining Social Media Issues

  1. 1. 1 Dissertation Report on COMMUNITY DETECTION IN SOCIAL MEDIA IN PARTIAL FULFILMENT OF MASTER OF COMPUTER APPLICATION SEMESTER-5 BY BHAGYASHRI MANANI (Enroll# 105090693093) TANVI SHARMA (Enroll# 105090693070) UNDER GUIDANCE OF Dr. SONAL JAIN SUBMITTED TO GLS INSTITUTE OF COMPUTER TECHNOLOGY, GUJARAT TECHNOLOGICAL UNIVERSITY
  2. 2. 2 Acknowledgment
     First, we would like to thank Dr. Sonal Jain, our internal guide. She introduced us to the field of community detection in social media and provided guidance on every query throughout the dissertation. We thank her for her valuable suggestions and for showing us how to do research. Without her guidance and support this research could not have been completed. We would also like to thank Dr. Harshal Arolkar, Dr. Devarshi Mehta and Dr. Jyotika Doshi for their valuable suggestions, reviews and help.
     Abstract
     Social media provide many features such as online chatting, online discussion, online communities, advertisement and marketing. They also raise issues such as community detection, influence maximization, message propagation and social media monitoring. In this research we focus on community detection, which is one of the major issues of social media. We study and discuss the existing algorithms used for community detection and carry out a comparative analysis of them. These algorithms use graph theory concepts to detect communities on the web. We also discuss the limitations of the existing algorithms and propose a solution to them.
  3. 3. 3 Contents
     1. Introduction ... 5
        1.1 What is Data Mining? ... 5
        1.2 Need for Data Mining in Social Media ... 7
        1.3 Research Goal ... 11
     2. Elements of Social Media ... 13
        2.1 Profile ... 13
        2.2 Member ... 13
        2.3 Group ... 14
        2.4 Discussion ... 15
        2.5 Blogs ... 15
        2.6 Widgets ... 18
     3. Issues of Mining Social Media ... 20
        3.1 Community Detection ... 20
        3.2 Influence Maximization ... 23
        3.3 Message Propagation ... 24
        3.4 Monitoring ... 25
        3.5 Social CRM (Customer Relationship Management) ... 27
     4. Algorithms for Community Detection ... 29
        4.1 Vertex Base Community Detection ... 30
            4.1.1 Bron–Kerbosch Algorithm ... 30
            4.1.2 Clique-Percolation Method ... 34
        4.2 Edge Base Community Detection ... 36
            4.2.1 Girvan–Newman Algorithm ... 36
     5. Conclusion and Proposed Solution ... 38
        5.1 Comparative Analysis of Algorithm ... 38
        5.2 Limitation of Existing Algorithm ... 39
        5.3 Proposed Solution ... 41
        5.4 Future Work ... 43
     References ... 43
  4. 4. 4 List of Tables:
     5.1 Comparative Analysis of Algorithm ... 38
     5.2 Edge Content Base Community Detection Example ... 41
     List of Figures:
     Figure 1.1: Social Media Websites ... 7
     Figure 2.1: Profile of Social Media Website ... 13
     Figure 2.2: Member of Social Media Website ... 14
     Figure 2.3: Group of Social Media Website ... 15
     Figure 2.4: Personal Blogs ... 16
     Figure 2.5: Media Blogs ... 17
     Figure 2.6: Widgets used in Social Media ... 18
     Figure 3.1: Example of Community on Website ... 20
     Figure 3.2: Twitter facilities for Monitoring ... 26
     Figure 3.3: Traditional v/s social CRM System ... 28
     Figure 4.1: Clique Graph ... 30
     Figure 4.2: Undirected Graph G ... 32
     Figure 4.3: CPM Graph (a) ... 34
     Figure 4.4: CPM Graph (b) ... 34
     Figure 5.1: Content of edge example ... 40
     Figure 5.2: Edge content Base Community Detection ... 42
  5. 5. 5 Chapter 1 Introduction
     As Internet access grows, more and more people depend on web services. They search for or publish information, download music and movies, play games, use social networking websites to interact with friends and family members, shop online, and even pay bills over the Internet. With the progress of World Wide Web technologies, more and more data are available online for web users. Web data covers a wide range of fields such as government, sports, entertainment, commerce, and health and lifestyle. The availability of a vast amount of web data does not mean that users can easily get whatever they want. As more data become available on the web, it takes more time and effort to find the desired information. It has been observed that 99% of the data accessible on the web is not useful for 99% of the users [1]. The massive amount of web data calls for techniques that find the useful knowledge hidden behind it.
     1.1 What is Data Mining?
     Data are facts about the world. Descriptions of a person (gender, height, weight, color, name, age, education etc.), an animal (category, size, name, weight, age etc.), a mobile phone (height, width, color, company, price) or a country (name, population, area, number of states) can be stored, and are known as data. For example, a student database contains the following data:
     • Name as “Janki”
     • Gender as “girl”
     • Result as 60%
     • Attendance as 90%
     • Year as “1st”
  6. 6. 6 Information is filtered, meaningful and relevant data. For example:
     • “The student named Janki got 60% in B.C.A.” is information.
     • “70% of the students of B.C.A. 3rd year got distinction” is information.
     • “Sachin got the highest percentage in 3rd year B.C.A.” is information.
     Knowledge is information processed in the mind of an individual. In other words, knowledge is the state or fact of knowing; it is understanding gained through experience or study. For example:
     • From the monthly attendance report and the exam result report, a teacher decides whether a student’s performance is average, good or excellent by applying their knowledge and experience.
     Data mining is commonly defined as the process of discovering useful patterns or knowledge from data sources such as databases, texts, images and the Web. Web data mining is the application of data mining techniques to web data. It bridges the gap between data and knowledge, and aims to extract useful, hidden knowledge from the massive amount of “garbage” data available on the web [2, 3]. Data mining has many applications in market basket analysis, fraud detection, profiling, risk management, e-commerce, web analysis and many other fields.
  7. 7. 7 1.2 Need for Data Mining in Social Media
     What is Social Media?
     Online social networking commonly takes place on websites known as social sites. A social media website is like an online community of internet users. Members of an online community can share common interests in hobbies, religion, politics, lifestyle etc. Using social media websites, people can share text, photos, audio, video and information. Platforms like Twitter, Facebook and LinkedIn have created online communities where people can share as much or as little personal information as they want with other members. Once you are granted access to a social networking website, you are a member of that site and can use it to interact with other members. A member can share views, thoughts, videos and images, update their status, communicate with other members, comment on others’ views and status, join groups, invite other members to events, and read the profile pages of other members.
     Figure 1.1: Social Media Websites
  8. 8. 8 Need for Data Mining in Social Media
     Social media can be used to learn about current trends, opinions and influencers. Information gathered from social networking sites can be used for the following purposes:
     • To improve content marketing by better understanding customers’ opinions
     • To learn what is most relevant regarding your products, brand or even entire business area
     • To know who the key influencers are
     • To know who the intended customers for your product are
     This allows you to identify people who are interested in your product or content, to find ways of reaching out to them, and to create content that attracts them and brings them back. For example, Facebook will be able to sell its data to companies wanting to understand market data. Facebook has the demographic and geographic data in place, and just needs to sell access to the data.
     Why use Social Media for Marketing?
     Social media users are increasing day by day and are becoming part of more and more social media communities. Social media come with many features and advantages, some of which are listed below.
  9. 9. 9 • Communicate with customers: Social media allows product sellers to reach prospective customers, and customers can reach particular advertisements, websites or business employees. Social media is a two-way process and allows a marketing or technical person to chat with customers or answer any questions customers might have. When it comes time to buy the product, customers can feel like they have a friend in the business.
     • Word of Mouth: Social media takes word-of-mouth marketing to a new level. When your fans follow or interact with your page, all of their friends see those interactions happen. With every interaction, comment and discussion you open up your brand to hundreds or thousands of prospective community members. Happy customers can also directly tell their friends on social media about their good experience with you. They can amplify positive chatter about your business and create a positive atmosphere for your brand.
     • Customer Loyalty: By engaging your customers through social media you have the opportunity to reward your loyal fans and generate repeat business. By building these relationships and maintaining them you can build customer loyalty and satisfaction that rewards your business further.
     • Feedback: The value of knowing where you are succeeding and failing can mean everything in business. Social media lets you directly estimate what works with your fans and what doesn’t, and allows you to address negative feedback quickly.
  10. 10. 10 Example of Social Media used for Marketing
      • Twitter: Twitter allows companies to promote products on an individual level. The use of a product can be explained in short messages that followers are more likely to read. These messages appear on followers’ home pages. Messages can link to the product’s website, Facebook profile, photos, videos, etc. This link gives followers the opportunity to spend more time interacting with the product online. This interaction can create a loyal connection between product and individual and can also lead to larger advertising opportunities. Twitter promotes a product in real time and brings customers in.
      • Facebook: Facebook profiles are more detailed than Twitter profiles. They allow a product to provide videos, photos and longer descriptions. Videos can show when a product can be used as well as how to use it. Profiles can also include testimonials, as followers can comment on the product pages for others to see. Facebook can link back to the product’s Twitter page as well as send out event reminders.
      • Blogs: Every day there are more reasons for companies to add blogging platforms to their social media repertoire. A platform like LinkedIn creates an environment for companies and clients to connect online. Companies that recognize the need for information, originality and accessibility employ blogs to make their products popular and unique, and ultimately reach out to consumers who are privy to social media. Blogs allow a product or company to provide longer descriptions of products or services. The longer description can include reasoning and uses. It can include testimonials and can link to and from Facebook, Twitter and many other social network and blog pages. Blogs can be updated frequently and are a promotional technique for keeping customers.
  11. 11. 11 Online communities benefit businesses because they enable them to reach the clients of other businesses using the platform. These online environments can be accessed by virtually anyone; therefore consumers are invited to be a part of the creative process.
      Issues in mining Social Media
      Mining the content of social media, or analyzing social networking data, is becoming a major part of online business. The issues of mining social media are:
      • Community detection, which deals with how to detect communities in a social network.
      • Influence maximization, the problem of finding the people who act as influencers in a large social network.
      • Message propagation, which is about analyzing the patterns or keywords of messages that are propagated in a very short time.
      • Social Customer Relationship Management, whose goal is to strengthen relationships with customers through more meaningful interactions.
      • Social media monitoring.
      1.3 Research Goal
      We have seen the need for mining social media. More and more businesses are run through websites: online selling of products, books, music CDs, movie tickets, railway or airline tickets, and hotel bookings. More and more people are now members of different social networking sites. By mining social media, a business can learn about current trends, customers’ interests, and customers’ opinions of products and services. This information can be used to reach interested customers more efficiently for advertisement and marketing purposes.
  12. 12. 12 We focus on community detection, which is one of the issues of mining social media. The goal of our research is to analyze existing algorithms for community detection. These algorithms use graph theory concepts to detect communities: the entire social network is represented as a graph whose nodes represent the actors or members of the community, while an edge between a pair of nodes represents a connection between those members. We try to find the limitations of the existing algorithms and propose a solution.
  13. 13. 13 Chapter 2 Elements of Social Media
      Social networking is based on a certain structure that allows people to communicate and share their information with each other. This structure includes profiles, friends, blog posts, widgets, and usually something unique to that particular social networking website, such as the ability to 'poke' people on Facebook or high-five someone on Hi5. In the following sections we discuss the elements of social media.
      2.1 Profile
      This is where you tell the world about yourself. Profiles contain basic information, such as where you live and how old you are, religious views, contact details, educational background, job or business details, relationship status and profile picture, as well as personality questions, such as who your favorite actor or politician is and what your favorite book is.
      Figure 2.1: Profile of Social Media Website
  14. 14. 14 2.2 Members
      Members are trusted people on the site who are allowed to view your profile content (images, video, status), post comments on it, or send you private messages. You can also see updates on how the members added to your account are using the social networking site, such as when they post a new picture or update their profile. Members are the heart and soul of social networking. Facebook calls them 'friends'; LinkedIn refers to them as 'connections'; and Twitter refers to them as 'followers', who can reply to your tweets. All social networks treat members as trusted people.
      Figure 2.2: Member of Social Media Website
  15. 15. 15 2.3 Groups
      Most social networks use groups to help you find people with similar interests. They are both a way to connect with like-minded people and a way to identify your interests. For example, students of the 2010 batch of HLICA College can create a group and discuss topics such as the exam syllabus, technical events and the exam schedule, as well as queries and their solutions.
      2.4 Discussions
      A primary focus of groups is to create interaction between users in the form of discussions. Most social networking websites support discussion boards for groups, and many also allow members of the group to post pictures, music, video clips and other tidbits related to the group.
      Figure 2.3: Group of Social Media Website
  16. 16. 16 2.5 Blogs
      Another feature of some social networks is the ability to create your own blog entries. A blog is a discussion published on the World Wide Web and consisting of entries ("posts") typically displayed in reverse chronological order (the most recent post appears first). Good quality blogs are interactive, allowing visitors to leave comments and even message each other via GUI widgets on the blogs, and this interactivity distinguishes them from static websites. In that sense, blogging can be seen as a form of social networking. A blog is like an article, news item or viewpoint on some topic, and other members can comment with their own views and opinions.
      • Personal Blogs: The personal blog, an ongoing diary or commentary by an individual, is the most common blog. Some sites, such as Twitter, allow bloggers to share thoughts and feelings instantaneously with friends and family, and are much faster than emailing or writing.
      Figure 2.4: Personal Blogs
  17. 17. 17 On Facebook this is known as a status update.
      • Corporate and Organizational Blogs: A blog can be private, as in most cases, or it can be for business purposes. Blogs used internally to enhance communication in a corporation, or externally for marketing, branding or public relations purposes, are called corporate blogs. Similar blogs for clubs and societies are called club blogs, group blogs, or similar names; their typical use is to inform members and other interested parties of club and member activities. For example, a member of a Facebook group can post views, news, updates and articles to the group, and other members can reply to the post.
      • Media Blogs: Blogs with shorter posts and mixed media types are called media blogs. For example, Yahoo publishes articles on the latest news about celebrities, lifestyle, business and technology, and people can comment on them.
      Figure 2.5: Media Blogs
  18. 18. 18 2.6 Widgets A popular way of letting your personality shine through is by gracing your social networking profile with web widgets. Many social networks allow a variety of widgets, and you can usually find interesting widgets located on widget galleries. Figure 2.6: Widgets used in Social Media
  19. 19. 19 Basic Widgets for Social Website or Blog
      • Photo Badge: This photo badge allows you to share your Facebook photos on websites and blogs. Choose a vertical, horizontal or two-column layout and the number of photos to be displayed.
      • Profile Badge: Create a Facebook, Twitter or LinkedIn profile badge to share selected profile information on your website. A profile badge allows your users to easily connect with you and add you as a friend.
      • Like Box: This allows your users to publish their content and activity.
      • Share Button: This powerful widget allows your visitors to share your content (image, video, article etc.).
      • Comments Box: This allows members to comment or post on website content.
  20. 20. 20 Chapter 3 Issues of Mining Social Media
      Social media provide very good services such as online chatting, sharing of videos and images, online games and online communities, and also serve as an effective tool for advertising and marketing. Despite these many features, mining social media is essential and is not easy work. It comes with issues such as community detection, influence maximization, message propagation, social media monitoring and mining customer relationships.
      3.1 Community Detection
      What is community?
      As we have seen, online social networks such as Twitter, Facebook and LinkedIn are rapidly gaining popularity. Social network analysis is therefore becoming a very important research field. One major topic in social network analysis is the study of communities in social networks, which advertisement and marketing use to identify target groups.
      Figure 3.1: Example of Community on Website
  21. 21. 21 A virtual community is a social network of individuals who communicate with each other through particular social media, crossing geographical and political boundaries in order to pursue mutual interests or goals. It is a large collection of individuals who interact unusually frequently with each other and who share interesting properties such as common hobbies or occupations. The word "community" appears on various social networking sites. A social network community provides answers to questions such as the following:
      • Who knows whom?
      • Who knows what?
      • Who can do what?
      • Who looks for what?
      • Who offers what?
      It provides a wealth of information to its members about other people and allows them to manage friends and business partners in an effective environment.
      What is Community Detection?
      Having social media accounts for your business and creating posts for them is not enough. You need to check whether your posts carry the right message and address your target audience. Effective advertising requires finding the right community. Community detection is a distinct field whose goal is to detect communities within networks. It tries to answer the question: when should people be considered close enough to be in the same community? In the problem of community detection, the goal is to detect communities in real-world graphs such as large social networks, web graphs and biological networks, that is, to partition the network into dense regions of the graph. Such dense
  22. 22. 22 regions typically correspond to entities which are closely related, and can hence be said to belong to a community [8, 9]. The determination of such communities is useful in a variety of applications in social network analysis, including customer segmentation, recommendations and influence analysis. As a result, a considerable amount of research has been devoted to algorithms for solving this problem.
      Community Detection for Advertisement
      Social media software enables anyone, without knowledge of coding, to post, comment on, share or mash up content, and to form communities around shared interests. Social media communities are growing at an exponential rate and represent a huge potential market for advertising and marketing. The most well-known social media communities are LinkedIn, Facebook, Twitter and YouTube, along with blog sites.
      Social Media Optimization
      This refers to the use of a number of social media outlets and communities to generate publicity and increase awareness of a product, brand or event. An important problem in the area of social networking is community detection, so that the addressed content or posts reach the right audience.
  23. 23. 23 3.2 Influence Maximization
      Influence maximization is the problem of finding the people who act as influencers [2]. For example, a small company develops a cool online application for an online social network and wants to market it through the same network. It has a limited budget, so it can only select a small number of initial users in the network to use it (by giving them gifts or payments). The company hopes that these initial users will love the application and start influencing their friends on the social network to use it, that their friends will influence their friends’ friends and so on, and that through this word-of-mouth effect a large population in the social network will adopt the application. The problem is whom to select as the initial users so that they eventually influence the largest number of people in the network. This problem, referred to as influence maximization, is of interest to many companies as well as individuals who want to promote their products, services and innovative ideas through the powerful word-of-mouth effect (also called viral marketing).
      As another example, Topsy analyzed the Twitter reaction to the bin Laden raid last year [7]. The analysis began with one person tweeting from Pakistan, and looked at the exposure he received over time. Within the first eight hours of the raid, the Pakistani Twitter user reached around 100,000 exposures. Then someone in U.S. media (the influencer in this case) found the initial tweets and retweeted them and, less than one day later, the Pakistani Twitter user had reached 90 million exposures. After the influencer retweeted the message, large numbers of followers also retweeted it, increasing the amplification of that particular tweet.
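      The seed-selection problem described above can be illustrated with a minimal Python sketch. It assumes a deliberately simplified, deterministic spread model in which every follower passes the message on; this is an illustration only, not the probabilistic models used in the influence maximization literature. The names `reach` and `greedy_seeds` and the follower data are hypothetical.

```python
def reach(followers, seeds):
    """Return the set of users reached if every follower passes the message on."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        user = stack.pop()
        for follower in followers.get(user, ()):
            if follower not in seen:
                seen.add(follower)
                stack.append(follower)
    return seen


def greedy_seeds(followers, k):
    """Pick up to k initial users, each time adding the one that enlarges the reach most."""
    users = set(followers) | {f for fs in followers.values() for f in fs}
    seeds = []
    for _ in range(min(k, len(users))):
        candidates = sorted(users - set(seeds))  # sorted for a deterministic tie-break
        best = max(candidates, key=lambda u: len(reach(followers, seeds + [u])))
        seeds.append(best)
    return seeds


# Hypothetical network: each key is a user, each value lists that user's followers.
followers = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["f"], "f": []}
seeds = greedy_seeds(followers, 2)
covered = reach(followers, seeds)
```

      With this toy data the first pick covers the larger follower chain and the second pick covers the users the first one cannot reach.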
  24. 24. 24 Influence maximization involves mining and technical issues such as the following:
      • Find out who served as an influencer and was able to amplify the message
      • How many followers do they have?
      • Do they get responses?
      • How many external links point to their blog?
      • How many comments do their blog posts attract?
      • See how the exposure increased with each amplification
      • Track how fast the message is trending
      • Learn the positive and negative sentiment
      Once you are able to use this analysis to uncover the influencers, you want to be able to reach out to those key experts, as well as to monitor them to find out what they are saying, including whether they are saying good or bad things about your brand. You even want to find out to whom they are talking.
      3.3 Message Propagation
      Social websites including Facebook, Twitter and LinkedIn allow users to construct a personal profile, share interesting information with other people, and build relationships within a community. The mode of interaction on social websites is affecting people’s social behavior and consumer habits. Although many marketing techniques may be used to spread information over a social network, the target consumers should be defined, and suitable messages should be broadcast to them within a certain time period. Consequently, enterprises need a tool to analyze message propagation behavior across different combinations of community and time dimensions. Message propagation is the problem of finding messages, identified by some pattern or keywords, that spread quickly.
  25. 25. 25 To make message propagation clearer, we give one example from research done by Shaozhi and Felix on twitter.com [3]. They collected and analyzed a large data set from the Twitter social network for the following event: in June 2009, the news of Michael Jackson's death spread all over the world. Many online social networks were flooded with messages related to this breaking event. They started collecting related messages from Twitter.com on June 27th, 2009, two days after the tragedy. From all the crawled messages, the tweets containing "Michael Jackson" or "MJ" were selected. After removing the noise, 549,667 MJ-related messages posted by 305,035 users were found; 548,102 of these messages were posted after June 25, 2009. The following fields need to be analyzed to know how a message propagates:
      • User id: Unique identifier of the user who posted this message.
      • Id: The message ID, which is unique among messages posted by the same user. Two messages posted by different users may share the same message ID.
      • Text: The content of the message.
      • Created at: The creation time of this message.
      • Source: The client software used to post the message (Twitter, Facebook, Yahoo blogs or any other).
      • In reply to status id: The ID of the message which this message replies to.
      • In reply to user id: The ID of the user which this message replies to.
      3.4 Social Media Monitoring
      Social media monitoring is about listening to the discussions that take place around your brand in order to find out the different views of people. It is a very important tool for both a social media crisis plan and a marketing plan.
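      The message fields listed above can be sketched as a small Python record together with a keyword filter. This is only an illustration: the class name `Message`, the matching rule in `MJ_PATTERN` and the sample messages are assumptions, not the procedure or data of the cited study.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Message:
    user_id: int                  # unique identifier of the posting user
    id: int                       # unique only among messages of the same user
    text: str
    created_at: datetime
    source: str                   # client software used to post the message
    in_reply_to_status_id: Optional[int] = None
    in_reply_to_user_id: Optional[int] = None


# Assumed matching rule: the full name, or "MJ" as a separate word.
MJ_PATTERN = re.compile(r"michael jackson|\bmj\b", re.IGNORECASE)


def select_related(messages, since):
    """Keep messages posted on or after `since` whose text matches the keywords."""
    return [m for m in messages
            if m.created_at >= since and MJ_PATTERN.search(m.text)]


# Hypothetical sample messages.
sample = [
    Message(1, 10, "RIP Michael Jackson", datetime(2009, 6, 26, 9, 0), "web"),
    Message(2, 10, "MJ will be missed", datetime(2009, 6, 27, 8, 30), "web",
            in_reply_to_status_id=10, in_reply_to_user_id=1),
    Message(3, 11, "Nice weather today", datetime(2009, 6, 27, 9, 0), "web"),
    Message(4, 12, "Listening to MJ", datetime(2009, 6, 20, 9, 0), "web"),
]
related = select_related(sample, since=datetime(2009, 6, 25))
```

      Because message IDs are unique only per user, a message is identified here by the pair (`user_id`, `id`), and the reply fields link a message back to the one it answers.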
  26. 26. 26 Here is a good example of social media monitoring: Last week I was watching television and saw an interesting advertisement for something called the Total Bib. It kind of made me chuckle, which caused me to tweet something like “Total Bib reminds me of something out of a Saturday Night Live sketch”. The tweet received a few laughs and comments from followers. A few hours later, I received a reply tweet from TotalBib thanking me for the mention. I was pretty amazed, since I was not following them previously; they simply took the time and made the effort to monitor the Twitter stream to identify an opportunity.
      Social media monitoring involves text mining for specific keywords on social networking websites, blogs, discussion forums and other social media. Essentially, monitoring software converts specific words or phrases in unstructured data into numerical values. The numerical values are linked to structured data in a database, allowing the data to be analyzed with traditional data mining techniques.
      Figure 3.2: Twitter facilities for Monitoring
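      The step of turning keywords in unstructured posts into numerical values can be sketched in Python as a simple keyword count. The function name `keyword_counts` and the second sample post are hypothetical; real monitoring software is likely to do considerably more than this.

```python
import re
from collections import Counter


def keyword_counts(posts, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword across posts."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            counts[keyword] += len(re.findall(pattern, text))
    return counts


posts = [
    "Total Bib reminds me of something out of a Saturday Night Live sketch",
    "Just saw the Total Bib ad, total bib everywhere",  # hypothetical post
]
counts = keyword_counts(posts, ["total bib", "sketch"])
```

      The resulting counts are plain numbers that can be stored next to structured data (date, author, source) and analyzed with ordinary data mining techniques.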
  27. 27. 27 What are the Needs for Social Media Monitoring?
      • To learn of negative criticism about your brand, which you can then respond to, turning that unhappy customer into a lifelong brand advocate.
      • To learn of the positive comments people are making about your brand, giving you the opportunity to connect further with those individuals.
      • To detect a social media crisis on the rise, before it builds up and begins to spiral out of control.
      3.5 Social CRM Systems
      Social CRM is a strategy based around customer engagement and interactions being a by-product. Social CRM is an extension of CRM. It means a back-end process and system for managing different things to different organizations. Social CRM is about trying to understand customers' problems with a product or service and then solving them. Traditional CRM was very much based around data and information that brands could collect on their customers, all of which would go into a CRM system that then allowed the company to better target various customers.
  28. 28. 28 In social CRM, customer is actually the focal point of how an organization operates. Instead of marketing or pushing messages to customers, brands now talk to and collaborate with customers to solve business problems, empower customers to shape their own experiences and build customer relationships, which will hopefully turn into customer advocates. PR now has a very active role in social CRM (in fact, PR typically owns budgetary control and authority of social initiatives ahead of every other department). In most organizations, PR departments manage the social presence of brands and handle the customer engagement. Figure 3.3 Traditional v/s social CRM System
  29. 29. 29 Chapter 4 Algorithms for Community Detection
      An important problem in the area of social networking is community detection. In the problem of community detection, the goal is to partition the network into dense regions of the graph. Such dense regions typically correspond to entities which are closely related, and can hence be said to belong to a community. The problem of community detection in social networking sites has been broadly studied because of its importance in social networking applications. Before discussing the algorithms in detail, we introduce some notation.
      G = social network graph, where G = (V, E)
      V = vertex set; each vertex in V corresponds to an actor in the network
      E = edge set; each edge corresponds to a relationship between a pair of actors
      We consider two kinds of methods for community detection:
      • Node (vertex) base community detection
        o Bron–Kerbosch algorithm
        o Clique Percolation Method (CPM) algorithm
      • Link (edge) base community detection
        o Girvan–Newman algorithm
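      As a minimal sketch of the notation above, a graph G = (V, E) can be held in Python as an adjacency map built from the edge set. The helper name `build_graph` and the actors used here are hypothetical and only illustrate the representation.

```python
from collections import defaultdict


def build_graph(edges):
    """Return {vertex: set of neighbours} for an undirected graph given its edge list."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Hypothetical actors and relationships, not taken from the dissertation.
E = [("asha", "bina"), ("bina", "chetan"), ("asha", "chetan"), ("chetan", "dev")]
G = build_graph(E)
V = set(G)  # the vertex set is the set of keys of the adjacency map
```

      With this representation, the neighbours of an actor are a single dictionary lookup, which is the operation the clique-based algorithms rely on most.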
  30. 30. 30 4.1 Vertex Based Community Detection In node based community detection, the nodes of a group that satisfy some common property form a community. In sociology, a clique describes a group of 2 to 12 (averaging 5 or 6) persons who interact with each other more regularly and intensely than with others in the same setting. In graph terms, a clique is a sub-graph in which all nodes are adjacent to each other, and a maximal clique is a clique that cannot be extended by adding another node. In Figure 4.1, nodes {5, 6, 7, 8} form a maximal clique. In an overlapping community structure, a node can be a member of more than one community. 4.1.1 Bron–Kerbosch Algorithm The BK algorithm is used for non-overlapping community structure on undirected graphs. It uses the recursive backtracking paradigm to enumerate all maximal cliques in the graph [6]. Figure 4.1: Clique Graph
  31. 31. 31 Algorithm:
• We can find maximal cliques using the Bron–Kerbosch algorithm.
• At any given point in time it maintains three sets, R, P and X.
• The set R contains the vertices of the clique built so far, which either is a maximal clique or can be extended to one.
• The set P contains vertices that are connected to all vertices in R and can be added to R to make a larger clique.
• The set X contains vertices that are connected to all vertices in R but are excluded from being added to R, because all cliques containing vertices in X have already been enumerated in a different recursion cycle.
• N(v) is the set of neighbours of vertex v.
Pseudo Code:
BronKerbosch(R, P, X):
    if P and X are both empty {
        report R as a maximal clique
    }
    choose a pivot vertex u in P ⋃ X
    for each vertex v in P \ N(u) {
        BronKerbosch(R ⋃ {v}, P ⋂ N(v), X ⋂ N(v))
        P := P \ {v}
        X := X ⋃ {v}
    }
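The recursion with pivoting can be sketched in Python as follows. This is a minimal illustration rather than the implementation used by any particular package: the recursive call narrows P and X to the neighbours of v (set intersection), and the loop runs over the vertices of P that are not neighbours of the pivot, as in the standard formulation of the algorithm. The function name, the adjacency-dictionary representation and the six-vertex edge list are assumptions made for this example; the edge list is chosen so that the cliques found match those reported in the worked example.

```python
def bron_kerbosch(R, P, X, adj, out):
    """Append to `out` every maximal clique that extends R with vertices of P."""
    if not P and not X:
        out.append(sorted(R))  # nothing left to add or exclude: R is maximal
        return
    # Pivot heuristic (an assumption): vertex of P | X with most neighbours in P.
    u = max(P | X, key=lambda w: len(adj[w] & P))
    for v in sorted(P - adj[u]):
        # Only neighbours of v can extend a clique that contains v.
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}  # v is fully processed ...
        X = X | {v}  # ... so exclude it from later cliques


# Hypothetical undirected graph on vertices 1..6 (assumed edge list).
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(sorted(cliques))
```

Keeping only the cliques with more than two vertices, for instance with `[c for c in cliques if len(c) > 2]`, leaves the single community {1, 2, 5}.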
  32. 32. 32 Figure 4.2: Undirected Graph G
Example:
• Initially the three sets are R = Ø, P = {1, 2, 3, 4, 5, 6} and X = Ø.
  o Select as pivot u a node of maximum degree: u = 2 (degree 3).
  o The neighbours of u are N(u) = {1, 3, 5}.
  o P \ N(u) = {2, 4, 6} (the vertices that are elements of P but not of N(u)); the outer loop runs over these vertices v.
• The iteration of the loop for v = 2 makes a recursive call to the algorithm with R = {2}, P = {1, 3, 5} and X = Ø. Within this recursive call v can be 1, 3 or 5:
  o If v = 1, then R = {1, 2} and P = {5}; then for v = 5, R = {1, 2, 5}, P = Ø, X = Ø.
  o If v = 5, then R = {2, 5} and P = {1}; then for v = 1, R = {1, 2, 5}, P = Ø, X = Ø.
  o If v = 3, then R = {2, 3}, P = Ø, X = Ø.
  The branches for v = 1 and v = 5 lead to the same clique {1, 2, 5}, which is reported only once.
• The iteration for v = 4 (degree 3) makes a recursive call to the algorithm with R = {4}, P = {3, 5, 6} and X = Ø (although vertex 2 belongs to the set X in the outer call to the algorithm, it is not a neighbour of vertex 4 and is excluded from the subset of X passed to the recursive call).
  o If v = 3, then R = {3, 4}, P = Ø, X = Ø.
  o If v = 5, then R = {4, 5}, P = Ø, X = Ø.
  o If v = 6, then R = {4, 6}, P = Ø, X = Ø.
• In the final iteration, for v = 6, the recursive call has R = {6}, P = Ø and X = {4}: vertex 4 is the only neighbour of 6 and has already been processed, so it is in X rather than in P, and no clique is reported.
  33. 33. 33
BronKerbosch(Ø, {1,2,3,4,5,6}, Ø)
    BronKerbosch({2}, {1,3,5}, Ø)
        BronKerbosch({2,3}, Ø, Ø): output {2,3}
        BronKerbosch({2,5}, {1}, Ø)
            BronKerbosch({1,2,5}, Ø, Ø): output {1,2,5}
    BronKerbosch({4}, {3,5,6}, Ø)
        BronKerbosch({3,4}, Ø, Ø): output {3,4}
        BronKerbosch({4,5}, Ø, Ø): output {4,5}
        BronKerbosch({4,6}, Ø, Ø): output {4,6}
    BronKerbosch({6}, Ø, {4}): no output
• The overlap of these cliques can be used to define communities in several ways. The simplest is to consider only maximal cliques larger than a minimum size (here, more than 2 nodes).
  o Community {1,2,5}
Drawbacks:
• The Bron–Kerbosch algorithm for finding cliques in a network is very costly on large-scale networks (networks with a large number of nodes): its worst-case running time is O(3^(n/3)) for n vertices.
• Overlapping community structure, in which a node is part of more than one community, is not supported.
Application:
• The union of these cliques then defines a sub-graph whose components (disconnected parts) then define communities. Such approaches are often implemented in social network analysis software. UCINET is a software
  34. 34. 34 Figure 4.3: CPM graph(a) Figure 4.4: CPM graph(b) package for community detection in social networks which uses this algorithm to detect communities. It was developed by Lin Freeman, Martin Everett and Steve Borgatti.
• URL of UCINET: https://sites.google.com/site/ucinetsoftware/home.
4.1.2 Clique Percolation Method Clique percolation is a community detection method developed by Gergely Palla and colleagues in 2005 [7]. The Clique Percolation Method is a popular approach for analyzing the overlapping community structure of networks.
Algorithm:
• Find all cliques of size k (here k = 3) in the given network.
• Construct a clique graph, in which each clique is a node.
• Two cliques are adjacent if they share k-1 (here k-1 = 2) nodes.
• Each connected component in the clique graph forms a community.
Example:
• Find the cliques of size 3. Here they are {1,2,3}, {1,3,4}, {4,5,6}, {5,6,7}, {5,7,8}, {5,6,8}, {6,7,8}.
• Construct a clique graph that links only those cliques which are adjacent, that is, which share k-1 = 2 nodes.
  35. 35. 35
• Each connected component in the clique graph forms a community.
• Communities detected:
  o {1,2,3,4}
  o {4,5,6,7,8}
Advantage:
• It is not too restrictive (unlike cliques, which require each node to be connected to all other nodes).
• It allows overlaps: (i) a node can be a member of several different communities at the same time, and (ii) communities can overlap with each other by sharing nodes.
Drawback:
• Not all the nodes of the graph can participate in a k-clique community. For example, a leaf node is always left out of every community.
• A suitable value of k has to be chosen before the cliques of size k can be found.
Applications:
• CFinder is free software for finding communities in networks, based on the Clique Percolation Method (CPM) developed by Palla.
• URL of CFINDER: http://www.cfinder.org
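The four CPM steps can be illustrated with a short brute-force Python sketch. It is meant only to make the steps concrete and is not how CFinder is implemented; the function name and the edge list are assumptions, the latter chosen so that the graph contains exactly the seven 3-cliques listed in the example. Enumerating all k-subsets is only feasible for small graphs.

```python
from itertools import combinations


def clique_percolation(adj, k):
    """Return the k-clique communities of an undirected graph (brute force)."""
    # Step 1: every k-subset whose vertices are pairwise adjacent is a k-clique.
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]
    # Steps 2-3: cliques sharing k-1 nodes are adjacent; track the connected
    # components of the clique graph with a small union-find structure.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Step 4: the union of the cliques in one component is one community.
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return sorted(sorted(g) for g in groups.values())


# Assumed edge list reproducing the 3-cliques of the example.
edges = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6), (5, 6),
         (5, 7), (6, 7), (5, 8), (6, 8), (7, 8)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

communities = clique_percolation(adj, 3)
print(communities)
```

Node 4 appears in both communities, which is the overlap that the method is designed to capture.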
  36. 36. 36 4.2 Link-Based (Edge) Community Detection Girvan–Newman Algorithm The Girvan–Newman algorithm (named after Michelle Girvan and Mark Newman) is one of the methods used to detect communities in complex systems. The algorithm is based on the betweenness of edges [5]. Betweenness is a centrality measure, defined first for a vertex within a graph and extended in the same way to an edge, where it serves as the edge weight. The communities are detected by progressively removing edges from the original graph, rather than by adding the strongest edges to an initially empty network.
Algorithm: The betweenness of a vertex v in a graph G = (V, E) is computed as follows:
1. For each pair of vertices (s, t), compute the shortest paths between them.
2. For each pair of vertices (s, t), determine the fraction of the shortest paths between s and t that pass through v.
3. Sum this fraction over all pairs of vertices (s, t).
$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$
where $\sigma_{st}$ is the total number of shortest paths from node s to node t, and $\sigma_{st}(v)$ is the number of those paths that pass through v.
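The three steps can be checked on a small graph with the following Python sketch. It assumes an unweighted, undirected graph stored as an adjacency dictionary and counts each unordered pair (s, t) once; the function names and the four-node cycle are illustrative assumptions. For every pair it adds to the score of v the fraction of shortest s-t paths that pass through v, using the fact that v lies on a shortest s-t path exactly when d(s, v) + d(v, t) = d(s, t), in which case the number of such paths is the product of the path counts s to v and v to t.

```python
from collections import deque


def bfs_counts(adj, s):
    """Return (dist, sigma): BFS distances from s and shortest-path counts."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma


def vertex_betweenness(adj):
    """Sum, over unordered pairs (s, t), the fraction of shortest paths via v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    nodes = sorted(adj)
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            if t not in dist_s:
                continue  # s and t are not connected
            for v in nodes:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical 4-cycle 1-2-3-4-1: opposite corners have two shortest paths.
adj = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
scores = vertex_betweenness(adj)
print(scores)
```

In the cycle, each of the two opposite pairs has two shortest paths, so every vertex carries half of one pair's paths and scores 0.5.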
  37. 37. 37 Procedure:
1. Calculate and assign betweenness:
   a. Calculate the betweenness (weight W) of every edge in graph G.
   b. Each edge (s1, e1), …, (sn, en) is assigned its associated weight W1, …, Wn.
2. The edge with the highest weight Wh is removed.
3. The betweenness of all edges affected by the removal is recalculated.
4. Steps 2 and 3 are repeated until no edges remain.
5. The order in which edges are removed is noted, and communities are then detected using hierarchical clustering, by reading the removed edges in reverse order.
Application:
• The community detection module of the SNAP software uses this algorithm; it is implemented in the cmty.h header file.
• URL of SNAP: http://snap.stanford.edu/snap/description.html
Advantage:
• This algorithm is quite sensitive and gives accurate results.
• This algorithm is one of the few able to detect community structure at all levels.
Drawback: Its major drawback is the computational cost.
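The procedure can be sketched in Python as below. This is an illustrative sketch and not SNAP's implementation: edge betweenness is computed with a breadth-first accumulation in the style of Brandes' algorithm, it is recomputed for all edges after every removal (simpler, but slower, than updating only the affected edges), and instead of building a full dendrogram the function records the partition each time a removal splits a component. The function names and the example graph (two triangles joined by one bridge) are assumptions.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (each pair counted twice)."""
    score = {}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # farthest vertices first
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = (min(v, w), max(v, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[v] += share
    return score


def components(adj):
    """Connected components as sorted lists of vertices."""
    seen, parts = set(), []
    for s in sorted(adj):
        if s in seen:
            continue
        part, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(sorted(part))
    return parts


def girvan_newman(adj):
    """Remove highest-betweenness edges; return the partition at each split."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    levels = [components(adj)]
    while any(adj.values()):  # step 4: repeat until no edges remain
        score = edge_betweenness(adj)  # steps 1 and 3
        a, b = max(sorted(score), key=score.get)  # step 2 (ties: smallest edge)
        adj[a].discard(b)
        adj[b].discard(a)
        parts = components(adj)
        if len(parts) > len(levels[-1]):
            levels.append(parts)
    return levels


# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by the bridge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

levels = girvan_newman(adj)
print(levels[1])
```

All shortest paths between the two triangles cross the bridge 3-4, so it has the highest betweenness and is removed first, which separates the graph into the two triangles.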
  38. 38. 38 Chapter 5 Conclusion and Proposed Solution 5.1 Comparative Analysis of Algorithm
Criterion | Bron–Kerbosch Algorithm | CPM Algorithm | Girvan–Newman Algorithm
Node overlapping | Does not allow | Allow | Allow
Computational time | O(3^(n/3)) (n = vertices) | High, as it tries to find all k-size cliques in the network | O(m^2 n) (m = edges, n = vertices)
Application (software) | UCINET | CFINDER | SNAP
Edge content and node content | Does not consider | Does not consider | Does not consider
Based on | Vertex structure | Vertex structure | Edge structure
Scale at which it works efficiently (number of nodes in graph) | Small | Large | Large
Table 5.1 Comparative Analysis of Algorithms
  39. 39. 39 The Bron–Kerbosch algorithm has the limitation that it does not support overlapping community structure. It is, however, simple, its computational time is less than that of the other two algorithms, and it works efficiently on small social networks. The CPM algorithm, developed by Palla, finds all k-size cliques in the network and then rolls each clique onto the adjacent cliques that share k-1 of its nodes. Though its computational time is high, it allows one to find communities in graphs with up to about 10^5 nodes [4]. The Girvan–Newman algorithm is the first modern algorithm based on edge structure. Links are iteratively removed based on the value of their betweenness, which expresses the number of shortest paths between pairs of nodes that pass through the link. Its computational time complexity is O(m^2 n) [4]. 5.2 Limitations of Existing Algorithm The existing algorithms for community detection use only information about the linkage (edge) structure and node structure. However, in many recent applications, edge or node content should also be considered in order to provide better supervision to the community detection process. While traditional community detection is designed around link and node structure only, the addition of edge content gives more accurate and relevant results, because it provides an understanding of how the cliques relate to the content on the edges. It is possible that vertices which are poorly linked sometimes belong to the same community because of a very high similarity between their content. Thus, in cases in which link connectivity and content-based similarity do not agree, it is important to set up criteria to decide whether a node is part of a community or not.
  40. 40. 40 For example,
• Two nodes might share audio, video, text, images etc. Edge content or vertex content can help to detect communities more effectively.
• In email networks, a communication between two participants can be considered as edge content. Clearly, participants with similar communication content are much more likely to belong to the same community than those without.
• In social media networks such as Facebook, users may tag an image with keywords. In such cases, it may be possible to construct a network of both people and images in which the edge content corresponds to the keywords used for tagging. Clearly such keywords provide important and useful knowledge about the nature of the underlying community.
Figure 5.1: Edge content Example
  41. 41. 41 5.3 Proposed Solution Community detection that uses edge content and vertex content gives more effective results. A vertex content algorithm works on the content of each individual node, while an edge content algorithm works on the pairwise content or communication between two nodes. The given example shows how communities can be detected using the edge content passing between two actors (nodes) in a social media graph. The graph forms two communities, named Fasttrack Watch and Jet Airways. From Figure 5.2 we have created the following table to detect the members of each community:
Fasttrack Watch community members: Student_ABC; Student_XYZ; Student_PQR
Jet Airways community members: Student_XYZ; Traveler_MNO; Traveler_RST
Name of Node (v) | Activity (Edge Content) | Keyword
Student_ABC | Shares the "Fasttrack" website link with Student_XYZ | Fasttrack
Student_XYZ | (1) Likes the "Fasttrack" watch link sent by Student_ABC (2) Comments on the status of Traveler_MNO about "Jet Airways" | Fasttrack, Jet Airways
Student_PQR | Tags Student_ABC in a "Fasttrack" watch image | Fasttrack
Traveler_MNO | Updates status with the latest news of the launch of "Jet Airways" flight J530 | Jet Airways
Traveler_RST | Likes the page of "Jet Airways" | Jet Airways
Table 5.2 Community Detection Example
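The grouping shown in Table 5.2 can be mimicked with a few lines of Python. This is only a sketch of the idea and not a worked-out edge content algorithm: it assumes that a keyword has already been extracted from every activity, records each activity as a hypothetical (actor, other actor, keyword) triple, and places every actor that takes part in an exchange about a keyword into that keyword's community.

```python
from collections import defaultdict

# (actor, other actor or None, keyword of the exchanged content), after
# Table 5.2. Status updates and page likes have no second actor.
activities = [
    ("Student_ABC", "Student_XYZ", "Fasttrack"),     # shares website link
    ("Student_XYZ", "Student_ABC", "Fasttrack"),     # likes the shared link
    ("Student_XYZ", "Traveler_MNO", "Jet Airways"),  # comments on status
    ("Student_PQR", "Student_ABC", "Fasttrack"),     # tags in an image
    ("Traveler_MNO", None, "Jet Airways"),           # updates own status
    ("Traveler_RST", None, "Jet Airways"),           # likes the brand page
]


def communities_by_keyword(records):
    """Group every actor involved in an exchange under the exchange keyword."""
    groups = defaultdict(set)
    for actor, other, keyword in records:
        groups[keyword].add(actor)
        if other is not None:
            groups[keyword].add(other)
    return {keyword: sorted(members) for keyword, members in groups.items()}


communities = communities_by_keyword(activities)
for keyword, members in communities.items():
    print(keyword, members)
```

Student_XYZ ends up in both communities because its activities carry both keywords, which is the kind of overlap that link structure alone would not reveal.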
  42. 42. 42 Figure 5.2: Edge content Community Detection
  43. 43. 43 Future Work: An edge content based algorithm can be developed using the concepts of matrices and graph theory; it would consider one additional field, the edge content passing from one node to another, to detect communities in a social media graph.
References:
[1] J. Han and M. Kamber: "Data Mining Concepts and Techniques", 2000.
[2] Wei Chen and Yajun Wang: "Efficient Influence Maximization in Social Networks."
[3] Shaozhi Ye and Felix Wu: "Measuring Message Propagation and Social Influence Maximization."
[4] Andrea Lancichinetti and Santo Fortunato: "Community Detection Algorithms Analysis", 2010.
[5] M. E. J. Newman: "Detecting Community Structure in Networks", 2003.
[6] C. Bron and J. Kerbosch: "Finding All Cliques of an Undirected Graph", 1973.
[7] G. Palla: "Clique Percolation Method", 2005.
