Using Digital Traces for User Profiling: the Uncertainty 
of Identity Toolset 
Muhammad Adnan1, Antonio Lima2, Luca Rossi2, Suresh Veluru3, Paul 
Longley1, Mirco Musolesi2, Muttukrishnan Rajarajan3 
1 Department of Geography, University College London 
2 School of Computer Science, University of Birmingham 
3 School of Engineering and Mathematical Sciences, City University London 
Web: www.uncertaintyofidentity.com
Introduction 
• Recent years have witnessed rapid growth in the use of 
online services 
• Online shopping, bank transactions, social networking services 
• Issues related to cybercrime, identity fraud, and hacking 
• This project aims to combine real and virtual world 
datasets to better understand the identity of individuals 
• Identities 
• Real world (Name: Forename & Surname) 
• Virtual world (Email addresses, Social media accounts etc)
Introduction 
• This paper presents a framework for the identification and 
profiling of individuals from their 
• Social media accounts 
• E-mail addresses 
• Twitter Geographic Profiler 
• Maps ethno-cultural communities of a person’s friends 
• E-mail Address Profiler 
• Uses a database of family names to extract probable identities from 
E-mail addresses 
• Could have potential applications in targeted marketing and 
online fraud detection
Outline 
• Onomap 
• A Name (Forename and Surname) classification system 
• Twitter Geographic Profiler 
• Extracting identities of Twitter users 
• Mapping them to probable ethnic origins 
• E-mail Address Profiler 
• Extracting identities from E-mail addresses 
• Geographic distribution
Onomap classification 
• A name reflects a person’s ethnic, linguistic, and cultural identity 
• A network of Forename-Surname pairs was created by using 
the data from 26 different countries 
• Example forenames: Pablo, Juan, Rosa, Marta, ... 
• Example surnames: Mateos, Garcia, Pérez, Sánchez, Rodríguez, ... 
• Example name: Pablo Mateos 
• www.onomap.org
Onomap Classification 
• ONOMAP (www.onomap.org) classifies forename-surname pairs 
– Kevin Hodge (English) 
– Pablo Mateos (Spanish) 
– …
Twitter Geographic Profiler 
• Given an individual’s Twitter Username or ID, the tool 
• Extracts information about the individual’s friends 
• Extracts the forename-surname pairs of the friends 
• Maps the forename-surname pairs to Onomap 
• Builds an ethno-cultural profile of the person’s friends 
• Maps their geographic distribution
Data available through the Twitter API 
• User ID 
• User Creation Date 
• Followers 
• Friends 
• Language 
• Location 
• Name 
• Screen Name or User Name 
• Time Zone 
• Geo Enabled 
• Latitude 
• Longitude 
• Tweet date and time 
• Tweet text
Twitter: getting the ids and usernames 
• Given a Twitter username of a person, we use the Twitter 
API to get the list of friends’ ids 
– A max of 15 requests every 15 minutes is allowed 
– Each query can get up to 5000 ids 
– Generally enough to download all the ids 
• Using the ids, we fetch the name associated with each id 
– Limited to 180 requests every 15 min 
– Returns a single string from which we need to extract the forename 
and surname tokens 
– Not necessarily a valid forename + surname! 
• E.g., “University of Birmingham”, “John1965”, “What is Love”, 
“Mystic_mind”
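The rate limits above determine how long it takes to profile one user. The following is a rough planning sketch, not the tool's own code. It uses only the figures quoted on the slide; the `names_per_request` parameter is an assumption, since the slide does not say how many names one lookup request returns.

```python
import math

# Limits quoted on the slide, per 15-minute window.
ID_REQUESTS_PER_WINDOW = 15
IDS_PER_REQUEST = 5000
NAME_REQUESTS_PER_WINDOW = 180


def windows_needed(n_friends: int, names_per_request: int = 1) -> dict:
    """Estimate the requests and 15-minute windows needed for one user.

    names_per_request is an assumption (default: one name per request).
    """
    id_requests = math.ceil(n_friends / IDS_PER_REQUEST)
    name_requests = math.ceil(n_friends / names_per_request)
    return {
        "id_requests": id_requests,
        "id_windows": math.ceil(id_requests / ID_REQUESTS_PER_WINDOW),
        "name_requests": name_requests,
        "name_windows": math.ceil(name_requests / NAME_REQUESTS_PER_WINDOW),
    }


if __name__ == "__main__":
    # A user with 400 friends: one id request, but 400 name requests
    # if each request returns a single name.
    print(windows_needed(400))
```

Under these assumptions the friend ids are rarely the bottleneck; fetching the names is what spans several windows for users with many friends.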
Twitter: getting forename-surname pairs 
• The Name field was split into tokens 
• Forenames and surnames were detected by matching the 
tokens against the database of forename-surname 
pairs from 26 countries 
• Users were discarded 
– where the tokens did not match a valid forename and 
surname
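The token matching step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two name sets are tiny hypothetical stand-ins for the 26-country database, and the rule of taking the first token as forename and the last as surname is assumed here as the pairing heuristic.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-ins for the forename and surname database.
KNOWN_FORENAMES = {"kevin", "pablo", "juan", "rosa", "marta"}
KNOWN_SURNAMES = {"hodge", "mateos", "garcia", "sánchez"}


def extract_pair(name_field: str) -> Optional[Tuple[str, str]]:
    """Return (forename, surname) in lower case, or None if the user is discarded."""
    # Keep runs of letters only, so digits and underscores act as separators.
    tokens = re.findall(r"[^\W\d_]+", name_field.lower())
    if len(tokens) < 2:
        return None  # a single token cannot hold forename + surname
    forename, surname = tokens[0], tokens[-1]
    if forename in KNOWN_FORENAMES and surname in KNOWN_SURNAMES:
        return forename, surname
    return None  # tokens did not match a valid forename and surname


if __name__ == "__main__":
    for raw in ["Pablo Mateos", "University of Birmingham", "John1965", "Mystic_mind"]:
        print(repr(raw), "->", extract_pair(raw))
```

With these sample sets, only "Pablo Mateos" yields a pair; the other three Name fields are discarded.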
Onomap: from names to ethnicity 
• ONOMAP (www.onomap.org) was applied to forename-surname 
pairs 
– Kevin Hodge (English) 
– Pablo Mateos (Spanish) 
– …
Friends’ Ethnicity Histogram 
Figure 1: Screenshot of the Twitter Geographic Profiler. The 
bottom part of the screen shows the histogram of the Twitter 
user's friends' ethno-cultural groups. 
Once the entire list of friends' forename + surname pairs has been parsed, we can 
easily estimate the distribution over the set of possible ethno-cultural groups of 
the Twitter user's friends. 
With the list of (surname, forename) pairs to hand, we query 
Onomap to get the ethno-cultural classification associated with 
each (surname, forename) pair, and the 
SearchSurnameTopCountries method to get the list of the 
countries where an instance of a given surname was observed.
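Estimating the distribution over ethno-cultural groups amounts to counting labels and normalising. The sketch below assumes the Onomap labels have already been obtained for each friend; the label list is made-up sample data, and no Onomap call is shown because its interface is not described here.

```python
from collections import Counter
from typing import Dict, Iterable


def ethnicity_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Turn one label per friend into relative frequencies that sum to 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no classified friends, so no distribution
    return {group: count / total for group, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical labels for four friends.
    sample = ["English", "English", "Spanish", "English"]
    print(ethnicity_distribution(sample))
```

The resulting dictionary is the data behind a histogram like the one in Figure 1.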
Friends’ Geographic Origins 
Figure 2: Map showing the geographical origin of the Twitter user's friends’ surnames as 
assigned by our tool. Below the map the user is shown a list of the top 10 
countries with the respective frequency. 
In this work we mark as invalid any string that is composed of a single token. If this is the case, 
we skip the profile of the corresponding friend. 
If the string contains two or more tokens, we take the first one to 
be the forename and the last one to be the surname.
Twitter Geographic Profiler 
• Potential applications include 
– Measuring the level of segregation/integration of a given individual 
(community) as the Shannon entropy of the (average) friends’ 
ethnicity histogram 
– Outlier detection: identifying uncommon behaviors, e.g., individuals 
that stand out in terms of the ethno-cultural groups they bond with 
• Limitations 
– Twitter data is very noisy: many Name fields do not hold a real forename + surname 
– A better heuristic is needed to extract forename + surname
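The entropy measure mentioned above can be illustrated with a short sketch. It assumes the histogram has already been normalised to relative frequencies, and it uses base-2 logarithms (bits); the choice of base is an assumption, as the slide does not specify one.

```python
import math
from typing import Dict


def shannon_entropy(distribution: Dict[str, float]) -> float:
    """Shannon entropy in bits of a distribution whose values sum to 1."""
    # Groups with zero probability contribute nothing and are skipped.
    return sum(p * math.log2(1.0 / p) for p in distribution.values() if p > 0)


if __name__ == "__main__":
    # Hypothetical friends' ethnicity histograms.
    single_group = {"English": 1.0}
    four_equal_groups = {"English": 0.25, "Spanish": 0.25, "Italian": 0.25, "Polish": 0.25}
    print(shannon_entropy(single_group))
    print(shannon_entropy(four_equal_groups))
```

Low entropy means the friends are concentrated in few groups; higher entropy means they are spread more evenly across groups.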
E-mail address profiler 
• In many instances, an e-mail address encapsulates some 
kind of identity information 
– Forename or surname 
• This tool 
– Extracts identities of individuals from their e-mail addresses 
– Maps the geographical distribution of a Surname in the UK 
• The tool identifies a surname or forename as a substring of an 
email address 
• The tool builds a suffix tree of the e-mail address and searches it 
for probable identities
An example suffix tree 
Suffix tree for the string aamalam$. The surname contained in this string is alam$, 
which is shown at a leaf node
Surname matching algorithm 
• The surname matching algorithm constructs a suffix tree for an 
email address 
• It uses a database of surnames and forenames and matches 
them 
– against each substring in the suffix tree 
• A probable identity is a substring that matches a surname or 
forename in the database 
• We use a database of the 10,000 most common surnames 
in the UK
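A simplified sketch of the matching idea follows. It is not the tool's implementation: for brevity it builds an uncompressed suffix trie (nested dictionaries) rather than a true suffix tree, it searches only the part of the address before the "@" (an assumption), and the surname set is a tiny hypothetical sample.

```python
from typing import Dict, Iterable, List


def build_suffix_trie(text: str) -> Dict:
    """Insert every suffix of text into a trie of nested dictionaries."""
    root: Dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie: Dict, pattern: str) -> bool:
    """A pattern is a substring of the text iff it labels a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def probable_identities(email: str, names: Iterable[str]) -> List[str]:
    """Return the known names found in the local part, longest first."""
    local_part = email.split("@", 1)[0].lower()
    trie = build_suffix_trie(local_part)
    matches = [name for name in names if contains(trie, name.lower())]
    return sorted(matches, key=lambda name: (-len(name), name))


if __name__ == "__main__":
    surnames = {"alam", "singleton", "keay"}  # hypothetical sample
    print(probable_identities("aamalam@example.com", surnames))
    print(probable_identities("a.singleton@ucl.ac.uk", surnames))
```

Very short names match many addresses by chance, so ranking longer matches first, as done here, is one possible way to order candidates.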
E-mail Address Profiler: geographic distribution 
• 2007 Electoral Register 
– Name and Address of every individual who is eligible to vote in 
the UK 
• Every postcode in the Electoral Register was converted 
to latitude/longitude values 
• The tool plots all the latitude/longitude points for a particular 
surname on a map 
• Onomap is used to identify the probable ethnic origin of 
a surname
E-mail Address Profiler 
Email: a.singleton@ucl.ac.uk
Geographic distribution 
Maps shown for two example surnames: Singleton and Keay
Conclusion 
• A toolkit for identity detection and profiling 
• Identification and profiling of ethno-cultural characteristics of 
individuals 
• From social media accounts and e-mail addresses 
• Future work will include 
• The extension of the Twitter Geographic Profiler to other social media 
services 
• The extension of the E-mail Address Profiler to process a large corpus of 
e-mail addresses 
• Study of privacy implications on social media services
Thanks for Listening 
Any Questions ?

Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset

  • 1. Using Digital Traces for User Profiling: the Uncertainty of Identity Toolset Muhammad Adnan1, Antonio Lima2, Luca Rossi2, Suresh Veluru3, Paul Longley1, Mirco Musolesi2, Muttukrishnan Rajarajan3 1 Department of Geography, University College London 2 School of Computer Science, University of Birmingham 3 School of Engineering and Mathematical Sciences, City University London Web: www.uncertaintyofidentity.com
  • 2. Introduction • Past years have witnessed a rapid growth of the use of online services • Online shopping, bank transactions, social networking services • Issues related to cyber-crimes, identity frauds, and hacking • This project aims to combining real and virtual world datasets to better understand the identity of individuals • Identities • Real world (Name: Forename & Surname) • Virtual world (Email addresses, Social media accounts etc)
  • 3. Introduction • This paper presents a framework for the identification and profiling of individuals from their • Social media accounts • E-mail addresses • Twitter Geographic Profiler • Maps ethno-cultural communities of a person’s friends • E-mail Address Profiler • Used a database of family names to extract probably identities from E-mail addresses • Could have potential applications in targeted marketing and online fraud detection
  • 4. Outline • Onomap • A Name (Forename and Surname) classification system • Twitter Geographic Profiler • Extracting identities of Twitter users • Mapping them to probable ethnic origins • E-mail Address Profiler • Extracting identities from E-mail addresses • Geographic distribution
  • 5. Onomap classification • A name is a person’s ethnic, linguistic, and cultural identity • A network of Forename-Surname pairs was created by using Pablo Forenames Surnames Mateos Garcia Pérez ... Juan Rosa Marta ... Sánchez Rodríguez the data from 26 different countries • www.onomap.org Name: Pablo Mateos
  • 7. Onomap Classification • ONOMAP (www.onomap.org) for forename – surname pairs Kevin Hodge (English) Pablo Mateos (Spanish) … … … …
  • 9. Twitter Geographic Profiler • Given an individual’s Twitter Username or ID • Extracts the information of individual’s friends • Extracts the forename-surname pairs of the friends • Maps forename-surname pairs to Onomap • Builds an ethno-cultural profile person’s friends • Maps the geographic distribution
  • 10. Data available through the Twitter API • User ID • User Creation Date • Followers • Friends • Language • Location • Name • Screen Name or User Name • Time Zone • Geo Enabled • Latitude • Longitude • Tweet date and time • Tweet text
  • 11. Twitter: getting the ids and usernames • Given a Twitter username of a person, we use the Twitter API to get the list of friends’ ids – A max of 15 requests every 15 minutes is allowed – Each query can get up to 5000 ids – Generally enough to download all the ids • Using the ids, we fetch the name associated to each id – Limited to 180 requests every 15 min – Returns a single string from which we need to extract the name and surname tokens – Not necessarily a valid forename + surname! • E.g., “University of Birmingham”, “John1965”, “ What is Love”, “Mystic_mind”
  • 12. Twitter: getting forename-surname pairs • Name field was divided into different tokens • Forenames and Surnames were detected by matching the string tokens against the database of forename surnames pairs of 26 countries • Users discarded – where tokens were not matched against valid forename and surname
  • 13. Onomap: from names to ethnicity • ONOMAP (www.onomap.org) was applied on forename – surname pairs Kevin Hodge (English) Pablo Mateos (Spanish) … … … …
  • 14. Friends’ Ethnicity Histogram GEOGRAPHIC PROFILER cultural communities of a determine the distribution groups of the friends of a integrate information from two Note, that the same ideas other Online Social Foursquare1. However, around different and Foursquare’s venues. In this because of the general not restricted to a specific Facebook, information is username of the person being surname, forename) pairs of of names to a list of classification of Onomap. probable countries of estimate respectively the set of possible ethno-cultural countries. In the following details of the tool and terms of users' privacy. Twitter is directed, in the necessarily reciprocated. associated with each user, following and one for the Figure 1: Screenshot of the Twitter Geographic Profiler. The bottom part of the screen shows the histogram of the Twitter user's friends ethno-cultural groups. Once the entire list of friends name + surname pairs has been parsed, we can easily estimate the distribution over the set of possible ethno-cultural groups of the Twitter user's friends her followers. In this representing the list of a user's actually follow a limited number of profiles, which are then accessible even with the rate limitation in place. With the list of (surname, forename) pairs to hand, we query Onomap to get the ethno-cultural classification associated with each (surname, forename) pair, and the SearchSurnameTopCountries method to get the list of the countries where an instance of a given surname was observed.
  • 15. Friends’ Geographic Origins • Figure 2: Map showing the geographical origin of the Twitter user's friends’ surnames as assigned by our tool. Below the map the user is shown a list of the top 10 countries with the respective frequency. • In this work we mark as invalid any string that is composed of a single token; if this is the case, we skip the profile of the corresponding friend. • If the string contains two or more tokens, we take the first one to be the forename and the last one to be the surname.
  • 16. Twitter Geographic Profiler • Potential applications include – Measuring the level of segregation/integration of a given individual (community) as the Shannon entropy of the (average) friends’ ethnicity histogram – Outlier detection: identifying uncommon behaviors, e.g., individuals who stand out in terms of the ethno-cultural groups they bond with • Limitations – Twitter data is very noisy – We need a better heuristic to extract forename + surname
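The entropy measure mentioned on this slide can be illustrated with a short sketch. It assumes each friend has already been assigned one ethno-cultural label; the function name `shannon_entropy`, the base-2 logarithm, and the example labels are illustrative choices, not taken from the talk.

```python
import math
from collections import Counter


def shannon_entropy(groups):
    """Shannon entropy (in bits) of a list of ethno-cultural group labels.

    0 bits means every friend falls in one group; higher values mean the
    friends are spread more evenly over more groups.
    """
    counts = Counter(groups)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical friend lists for two users.
homogeneous = ["English"] * 4
mixed = ["English", "Spanish", "Italian", "Greek"]
print(shannon_entropy(homogeneous), shannon_entropy(mixed))
```

Under this reading, a low entropy would suggest a more segregated friend network and a high entropy a more mixed one; comparing a user's value with the average for their community is one possible way to flag outliers.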
  • 18. E-mail address profiler • In many instances, an e-mail address encapsulates some kind of identity information – Forename or surname • This tool – Extracts identities of individuals from their e-mail addresses – Maps the geographical distribution of a surname in the UK • The tool identifies a surname or forename as a substring of an e-mail address • The tool builds a suffix tree of an e-mail address and searches it for probable identities
  • 19. An example suffix tree • Suffix tree for the name aamalam$. The surname in this name is alam$, which appears at a leaf node
  • 20. Surname matching algorithm • The surname matching algorithm constructs a suffix tree for an e-mail address • It uses a database of surnames and forenames and matches them against each substring represented in the suffix tree • A probable identity is a substring that matches a surname or forename • We use a database of the 10,000 most common surnames in the UK
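The matching idea can be sketched as below. This is an assumption-laden illustration rather than the tool's code: it builds an uncompressed suffix trie from nested dictionaries instead of a compressed suffix tree, it only inspects the local part of the address, and the function names and the three-entry name set are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text` into a nested-dict trie."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """True if `pattern` is a substring of the text the trie was built from."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def probable_identities(email, names):
    """Return the known names that occur in the local part, longest first."""
    local_part = email.split("@", 1)[0].lower()
    trie = build_suffix_trie(local_part)
    matches = [n for n in names if n and contains(trie, n)]
    return sorted(matches, key=lambda n: (-len(n), n))


# Toy name database (hypothetical); the talk uses 10,000 UK surnames.
NAMES = {"singleton", "single", "keay"}
print(probable_identities("a.singleton@ucl.ac.uk", NAMES))
```

Every substring of the address is a prefix of one of its suffixes, so walking a name from the trie root tests substring membership in time proportional to the name's length. Ranking longer matches first is one simple way to prefer "singleton" over "single"; the talk does not say how the tool ranks candidates.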
  • 21. E-mail Address Profiler: geographic distribution • 2007 Electoral Register – Name and address of every individual who is eligible to vote in the UK • Every postcode in the Electoral Register was converted to a latitude/longitude pair • The tool maps all the latitude/longitude pairs for a particular surname geographically • Onomap is used to identify the probable ethnic origin of a surname
  • 22. E-mail Address Profiler Email: a.singleton@ucl.ac.uk
  • 23. Geographic distribution Surname: Singleton Surname: Keay
  • 24. Conclusion • A toolkit for identity detection and profiling • Identification and profiling of ethno-cultural characteristics of individuals • From social media accounts and e-mail addresses • Future work will include • The extension of the Twitter Geographic Profiler to other social media services • The extension of the E-mail Address Profiler to process a large corpus of e-mail addresses • A study of privacy implications on social media services
  • 25. Thanks for Listening • Any Questions?