Curiosity Bits Tutorial: Mining Twitter User Profile on Python V2
1. Created by The Curiosity Bits Blog (curiositybits.com)
Download the Python code used in the tutorial
Codes provided by Dr. Gregory D. Saxton
2. Prerequisites
Setting up API keys: pg.4-6
Installing necessary Python libraries: pg.7-8
Creating a list of Twitter screen-names: pg.9
Setting up a SQLite database to store Twitter data: pg.10-14
But if you are a Python newbie, let’s start with the
very basics.
3. We assume you are a Python newbie, so let’s start with the
very basics.
• Choosing the right Python platform: Python is a programming
language, but you can use different software packages to write, edit,
and run Python code. We chose Anaconda, which is free to
download; the Python version used here is 2.7.
• Once you install Anaconda, you can experiment with Python code in
Spyder, the editor that comes with Anaconda.
4. Setting up API keys
• We need keys to get Twitter data through the Twitter API
(https://dev.twitter.com/). You need four: API key, API secret, access token,
and access token secret.
• First, go to https://dev.twitter.com/ and sign in with your Twitter account. Then go
to the My applications page to create an application.
5. Setting up API keys
Notes on the fields of the application form:
• Enter any name that makes sense to you.
• Enter any text that makes sense to you.
• You can enter any legitimate URL; here, I put in the URL of my institution.
• Same as above: you can enter any legitimate URL; here, I put in the URL of my institution.
6. Setting up API keys
• After creating the app, go to the API Keys page, scroll down to the
bottom, and click Create my access token. Wait a few minutes
and refresh the page; you will then see all your keys.
You need four values: API key, API secret, access token, and access token secret.
7. Installing necessary Python libraries
Think of Python libraries as the apps running on your operating
system. To use our code, you need the following libraries:
• simplejson (https://pypi.python.org/pypi/simplejson)
• sqlite3 (http://sqlite.org/), which already ships with Python, so it needs no separate installation
• SQLAlchemy (http://www.sqlalchemy.org/)
• Twython (https://twython.readthedocs.org/en/latest/index.html)
8. Installing necessary Python libraries
To install the libraries, open the Start menu, type CMD, and run the command prompt as
administrator. At the prompt, type pip install followed by the
name of the Python library. For example, to install Twython, type pip install
twython and press Enter. Use this procedure to install all the necessary libraries.
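To confirm that the installs worked, a quick check along these lines may help. This is only a sketch and not part of the tutorial's code; the helper name check_libraries is made up for illustration.

```python
import importlib

# Import names of the libraries listed in this tutorial.
REQUIRED = ["simplejson", "sqlite3", "sqlalchemy", "twython"]


def check_libraries(names):
    """Return a dict mapping each library name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_libraries(REQUIRED).items()):
        hint = "installed" if ok else "MISSING, try: pip install " + name
        print("%-12s %s" % (name, hint))
```

Any library reported as missing can be installed with pip as described above and the check run again.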
9. Creating a list of Twitter screen-names
• Our Python code gathers profile information for multiple
Twitter users, so first let’s create a list of users. The list should be in
.csv format and contain three columns (to match the
configuration in our Python code). Specifically, it looks like this:
• The first column lists sequential numbers.
• The second column lists the Twitter screen-names you are interested in.
• For the third column, I entered 1 throughout, but you can leave it blank.
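If you would rather build the list with Python than type it by hand, something like the sketch below could work. The function name and the screen-names in the usage note are placeholders, not real accounts or part of the original code, and we assume the file has no header row.

```python
import csv


def write_accounts_csv(path, screen_names, user_type="1"):
    """Write rows of (sequential number, screen-name, user type) to a .csv file."""
    with open(path, "w") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        # Column 1 counts up from 1, column 2 is the screen-name,
        # column 3 repeats the same user type on every row.
        for number, name in enumerate(screen_names, start=1):
            writer.writerow([number, name, user_type])
```

For example, write_accounts_csv("accounts.csv", ["example_user_a", "example_user_b"]) produces a two-row file with the three columns described above.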
10. Setting up a SQLite Database to store Twitter data
You need storage for the data coming in from the Twitter API. That
is what databases are for. We use SQLite, a lightweight relational
database management system (RDBMS) that keeps a whole database in a
single file and is queried with SQL. Python talks to SQLite through
the sqlite3 library mentioned in the previous steps. On top of that, you can
download a database browser to view and edit the database
much like an Excel file.
Go to http://sqlitebrowser.sourceforge.net/ and download
SQLite Database Browser. It allows you to view and edit
SQLite databases.
11. Setting up a SQLite Database to store Twitter data
Once you have the files downloaded, run the following file.
12. Setting up a SQLite Database to store Twitter data
Now we need to import the Twitter users list into a SQLite database. To do that,
create a new database. Remember the database file name, because we need to
write it into the Python code.
The file extension we use for SQLite databases is .sqlite. To prevent future complications,
add the extension .sqlite when you save a file in SQLite Database Browser.
13. Setting up a SQLite Database to store Twitter data
Stay on the database file you just created.
Go to File - Import - Table From CSV File and import the
.csv file you saved. Name the imported table
accounts. This table name corresponds to the
one we will use in the Python code. After you click
Create, the csv list will be loaded into the
database, and you can browse it in Browse
Data. Lastly, remember to save the database.
14. Setting up a SQLite Database to store Twitter data
Now we need to modify the imported table.
Go to Edit - Modify Tables, then use Edit field
to change the column names. To correspond to our
Python code, name the first column rowid
with field type Integer, the second column
screen_name with field type String, and the
third column user_type, also String. In the end, the
database table is defined as shown in the screenshot.
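For readers who prefer code to menus, the same accounts table can probably be set up with Python's built-in sqlite3 library. The sketch below assumes the first column is named rowid and uses SQLite's TEXT type where the browser says String; the function name is invented for illustration.

```python
import sqlite3


def create_accounts_table(conn, rows):
    """Create the accounts table and load (rowid, screen_name, user_type) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        "rowid INTEGER PRIMARY KEY, "  # sequential number from the first column
        "screen_name TEXT, "           # Twitter screen-name
        "user_type TEXT)"              # third column, may be left empty
    )
    conn.executemany(
        "INSERT INTO accounts (rowid, screen_name, user_type) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    # An in-memory database keeps the demo from touching any file on disk.
    demo = sqlite3.connect(":memory:")
    create_accounts_table(demo, [(1, "example_user_a", "1"), (2, "example_user_b", "1")])
    print(demo.execute("SELECT * FROM accounts").fetchall())
```

To write to a real file instead, pass sqlite3.connect your own database file name in place of ":memory:".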
15. Now, moving on to the actual Python code…
Download the Python code and open it in Anaconda.
16. There are only a few places you need to change, but let’s
walk through the code first…
The first block of code imports the necessary Python libraries.
Make sure you have installed all of these libraries.
17. The second block is where you enter the keys we obtained at the
beginning. Just copy and paste each key inside the quotation marks:
• API key
• API secret
• Access token
• Access token secret
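As a rough picture of what such a block can look like, here is a sketch using Twython. The variable names are our own guesses and may differ from the ones in the downloaded code, and the missing_keys helper is invented to catch a key that was left blank.

```python
# Placeholders only: paste your own four values between the quotation marks.
APP_KEY = ""
APP_SECRET = ""
OAUTH_TOKEN = ""
OAUTH_TOKEN_SECRET = ""


def missing_keys(keys):
    """Return the names of any credentials that are still empty."""
    return [name for name, value in sorted(keys.items()) if not value.strip()]


def make_client(keys):
    """Build a Twython client from the four keys."""
    # Imported here so the check above also runs before Twython is installed.
    from twython import Twython
    return Twython(keys["APP_KEY"], keys["APP_SECRET"],
                   keys["OAUTH_TOKEN"], keys["OAUTH_TOKEN_SECRET"])


if __name__ == "__main__":
    keys = {"APP_KEY": APP_KEY, "APP_SECRET": APP_SECRET,
            "OAUTH_TOKEN": OAUTH_TOKEN, "OAUTH_TOKEN_SECRET": OAUTH_TOKEN_SECRET}
    print("Still empty:", missing_keys(keys))
```

Once the list of empty keys comes back empty, make_client(keys) should return a client that is ready to send requests.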
18. The third block is where we define the columns of the SQLite database table. For now, we do not
need to edit anything here.
19. The fourth block is where we ask the Python code to get Twitter user profile
information based on the list of users already saved in the SQLite database. Here, you will
see that the table name and the column names correspond to the ones we previously
saved in SQLite.
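To see why the names have to match, here is a small sketch that reads the screen-names back out of the accounts table. It uses the built-in sqlite3 library as a stand-in, which may not be how the downloaded code does it, and the function name is made up.

```python
import sqlite3


def load_screen_names(conn):
    """Return the screen-names stored in the accounts table, in row order."""
    # The table name (accounts) and the column names (screen_name, rowid)
    # must match what was saved in the database, or the query fails.
    cursor = conn.execute("SELECT screen_name FROM accounts ORDER BY rowid")
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE accounts (rowid INTEGER PRIMARY KEY, "
                 "screen_name TEXT, user_type TEXT)")
    demo.execute("INSERT INTO accounts VALUES (1, 'example_user_a', '1')")
    print(load_screen_names(demo))
```

If the table or a column had been saved under a different name, the SELECT statement would raise an error instead of returning the list.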
20. The fifth block is where we make the specific request through the Twitter API to
get data:
Here, we ask Python to get one recent status from each listed user. This
request also returns the user’s profile information. We will
discuss what profile information is available later on.
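In Twython, a request like this can be made with get_user_timeline. The sketch below shows the idea under that assumption; fetch_profile and FakeClient are our own names, and FakeClient only imitates the API so the example runs without keys or a network connection.

```python
def fetch_profile(client, screen_name):
    """Request the user's most recent status and return the user object inside it.

    `client` is anything with a get_user_timeline method, such as a Twython
    instance. Returns None when the account has no tweets to return.
    """
    statuses = client.get_user_timeline(screen_name=screen_name, count=1)
    if not statuses:
        return None
    return statuses[0]["user"]


class FakeClient(object):
    """Stand-in for a real client; the values it returns are invented."""

    def get_user_timeline(self, screen_name, count):
        return [{"text": "hello", "user": {"screen_name": screen_name,
                                           "followers_count": 10}}]


if __name__ == "__main__":
    print(fetch_profile(FakeClient(), "example_user_a"))
```

With a real Twython client in place of FakeClient, the returned user object carries the profile fields discussed later in this tutorial.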
21. The raw output from the Twitter API is in JSON format. JSON is a standardized way of
storing information. Now we need to map the information in JSON format to the
columns of the database table. Notice that each column in the database represents a Twitter
output variable.
E.g., a Twitter user’s profile description is stored as description under user in
JSON. This line of code maps the profile description in JSON to the
database column named from_user_description.
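The mapping idea can be sketched in a few lines. This is an illustration rather than the tutorial's actual code: the helper name is made up, only a handful of columns are shown, and the sample status is invented.

```python
# Database column name -> key under "user" in the JSON returned by Twitter.
USER_FIELD_MAP = {
    "from_user_screen_name": "screen_name",
    "from_user_description": "description",
    "from_user_location": "location",
    "from_user_followers_count": "followers_count",
    "from_user_friends_count": "friends_count",
}


def flatten_user(status):
    """Return a flat dict of database columns built from one status in JSON form."""
    user = status.get("user", {})
    # A field missing from the JSON becomes None (NULL in the database).
    return {column: user.get(key) for column, key in USER_FIELD_MAP.items()}


if __name__ == "__main__":
    sample_status = {"text": "hello",
                     "user": {"screen_name": "example_user_a",
                              "description": "An invented bio"}}
    print(flatten_user(sample_status))
```

Each key in the returned dict lines up with one column of the database table, which is what makes the later insert straightforward.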
22. You need to change the file path and file name here
(RECOMMENDED).
If the Python file and your SQLite database are in the
same folder, just paste your database file name here.
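If you are unsure what to paste, a small helper like the one below can show the full path and whether the file is really there. The sqlite:/// form is the URL style SQLAlchemy uses for SQLite files; whether the downloaded code builds its path exactly this way is an assumption, and the function names are ours.

```python
import os


def sqlite_url(db_filename, folder=None):
    """Return an SQLAlchemy-style URL for a SQLite file in `folder`.

    When no folder is given, the current working directory is used.
    """
    folder = folder or os.getcwd()
    return "sqlite:///" + os.path.join(folder, db_filename)


def database_exists(db_filename, folder=None):
    """Return True if the database file can be found in `folder`."""
    folder = folder or os.getcwd()
    return os.path.isfile(os.path.join(folder, db_filename))
```

For example, database_exists("mydata.sqlite") returning False is a hint that the file name or folder in the code does not match where the database was saved (mydata.sqlite is a placeholder name).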
23. Now you are ready to run the code. Go to Run and choose Execute in a new dedicated
Python interpreter. The first option, Execute in current Python or IPython interpreter,
did not work on my end, but it may work on your computer.
24. Now, look at the right-side panel in Anaconda.
Oops, looks like I am getting error messages!
Don’t panic! It’s likely you will hit roadblocks
when you run Python code, so it is important
to learn to debug.
This error most likely occurred because I saved the
Python file in a folder that is not a default
Python folder.
But what is the default Python folder?
25. The simple way to find out your default
Python folder is:
• On a Windows machine, open the Start menu, right-click Computer,
and choose Properties.
27. In my case, C:\Anaconda\Lib\site-packages is my default Python folder. So I moved the
Python code there, edited the file path in the code, and ran it. There you go: the code is
running and getting what we want! If you check the database file, you will see that a
new table named typhoon has been created (you can change the table name in the Python
code). It includes the listed users’ recent tweets and profile information.
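Python can also report these folders itself, which may be quicker than hunting through system settings. The sketch below is one possible way to do it; the function name is made up, and we assume that "default Python folder" means the site-packages folder of the interpreter you are running.

```python
import os
import sysconfig


def python_folders():
    """Return the folder Python is running in and its site-packages folder."""
    return {
        # Relative file names in a script are looked up from here.
        "working_directory": os.getcwd(),
        # Where pip-installed libraries live for this interpreter.
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for label, folder in sorted(python_folders().items()):
        print("%-18s %s" % (label, folder))
```

Running it inside Spyder shows which folders your own installation uses, so you can compare them with the location of the Python file and the database.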
28. Oops! Error again!
The Twitter API has a rate limit.
With the version of the Twitter API used in our
Python code, you can get roughly 300 users per
15 minutes. Once you hit the limit, you
will see the error message shown in the
screenshot.
There are two ways to deal with the
restriction:
1. wait 15 minutes before the next run;
2. create multiple Twitter apps and get
multiple sets of keys. Once you use up the quota
in one run, paste in a new set of keys to start a
new run!
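The first option (waiting) can also be automated. The sketch below is not part of the downloaded code; call_with_wait is a made-up helper that retries a request after a pause whenever a rate-limit error is raised. With Twython, the error class to pass in would presumably be TwythonRateLimitError.

```python
import time

WINDOW_SECONDS = 15 * 60  # length of one rate-limit window


def call_with_wait(func, rate_limit_error, wait_seconds=WINDOW_SECONDS,
                   max_attempts=3, sleep=time.sleep):
    """Call func(); on a rate-limit error, wait and try again.

    `rate_limit_error` is the exception class that signals the limit.
    The last failed attempt re-raises the error instead of waiting again.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except rate_limit_error:
            if attempt == max_attempts - 1:
                raise
            sleep(wait_seconds)
```

The sleep argument exists so the pause can be swapped out when trying the helper; in normal use the default of time.sleep is what you want.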
29. If you put 0 here, the code starts with the user listed in the first row.
Because we will hit the rate limit, you will need to run the code multiple times
to finish crawling all users on the list. Make sure to change the starting
row number each time!
For example, if the first run gets user (0) through user (150) and then hits the rate
limit, put 151 in the second run to resume with the next user on the list.
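The arithmetic behind the starting row is easy to get wrong by one, so here is a tiny sketch of it. Both function names are invented, and rows are counted from 0 as in the example above.

```python
def users_from(screen_names, start_row=0):
    """Return (row_number, screen_name) pairs beginning at start_row."""
    return list(enumerate(screen_names))[start_row:]


def next_start_row(last_completed_row):
    """Return the row to start from after a run stopped at last_completed_row."""
    return last_completed_row + 1


if __name__ == "__main__":
    names = ["example_user_a", "example_user_b", "example_user_c"]
    # Suppose the first run finished row 0 and then hit the rate limit.
    print(users_from(names, start_row=next_start_row(0)))
```

So if the last user collected was on row 150, the next run starts at row 151.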
30. A list of Twitter output variables
Go to SQLite Database Browser and select the table typhoon (again, this is the name we
gave in the Python code). You will see the output variables across the columns.
31. A list of Twitter output variables
Some key variables related to the user profile:
• from_user_screen_name: the user’s Twitter screen-name
• from_user_followers_count: how many people are following the user
• from_user_friends_count: how many people this user is following
• from_user_listed_count: how many times the user is listed in other users’ public
lists
• from_user_favourites_count: how many tweets the user has favorited (liked)
• from_user_statuses_count: how many tweets the user has sent
• from_user_description: the user’s profile bio
• from_user_location: the location given in the user’s profile
• from_user_created_at: when the account was created
32. A list of Twitter output variables
Go to File - Export - Table as CSV to export the data in .csv format. Make sure to
add the .csv file extension to the file name.
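The export can also be done from Python, which is handy once you repeat it often. The following is a sketch written for Python 3 (the file handling differs slightly on 2.7); export_table is our own helper name, not something from the downloaded code.

```python
import csv
import sqlite3


def export_table(conn, table, csv_path):
    """Write every row of `table` to `csv_path`; return the number of columns."""
    # Table names cannot be passed as SQL parameters, so accept only
    # simple names made of letters, digits, and underscores.
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name: %r" % table)
    cursor = conn.execute('SELECT * FROM "%s"' % table)
    header = [column[0] for column in cursor.description]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)   # column names first
        writer.writerows(cursor)  # then one line per database row
    return len(header)
```

For example, after conn = sqlite3.connect("mydata.sqlite"), calling export_table(conn, "typhoon", "typhoon.csv") writes the table next to your script (mydata.sqlite is a placeholder for your own database file name).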
33. Please send your questions and comments to
weiaixu [at] buffalo dot edu