Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges
1. Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges
Image by campoalto

Axel Bruns / Jean Burgess
ARC Centre of Excellence for Creative Industries and Innovation, Brisbane
a.bruns@qut.edu.au, @snurb_dot_info
je.burgess@qut.edu.au, @jeanburgess
http://mappingonlinepublics.net, http://cci.edu.au/

Thomas Nicolai / Lars Kirchhoff
Sociomantic Labs, Berlin
thomas.nicolai@sociomantic.com / lars.kirchhoff@sociomantic.com
http://sociomantic.com/
19. Blog Network (between known blogs only)
(~8500 blogs / 17 July to 25 Aug. 2010 / All page links / Node size: Indegree)
Labels in the graph: parenting, politics, food, arts & crafts, design and style
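"Node size: Indegree" means each blog is drawn larger the more inbound links it receives from other blogs in the sample. The following is a minimal Python sketch of that count, not the tooling used for the slide. The blog names and links are made up, and two choices are assumptions: self-links are ignored, and every remaining link counts once.

```python
from collections import Counter


def indegree(links, known_blogs):
    """Count inbound links per blog, keeping only links between known blogs."""
    # Start every known blog at zero so unlinked blogs still appear.
    counts = Counter({blog: 0 for blog in known_blogs})
    for source, target in links:
        # Assumption: drop self-links and links that leave the known set.
        if source in known_blogs and target in known_blogs and source != target:
            counts[target] += 1
    return counts


# Hypothetical sample data, for illustration only.
known = {"a.example", "b.example", "c.example"}
links = [
    ("a.example", "b.example"),
    ("c.example", "b.example"),
    ("b.example", "a.example"),
    ("a.example", "unknown.example"),  # dropped: target is not a known blog
]
degrees = indegree(links, known)
```

The resulting counts could then be mapped to node sizes in a visualisation tool such as Gephi.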
23. Data Processing: Twitter

Tools:
  Gawk: scripting tool for CSV processing (open source)
  Excel: data aggregation, pivot tables and charts
  Leximancer / WordStat: keyword extraction, co-occurrence matrices
  Gephi: network analysis and visualisation (open source)

    # Extract @replies for network visualisation
    #
    # this script takes a CSV archive of tweets, and reworks it into network data for visualisation
    #
    # expected data format:
    # text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,
    # geo_coordinates_0,geo_coordinates_1,created_at,time
    #
    # output format:
    # from,to,tweet,time,timestamp
    #
    # the script extracts @replies from tweets, and creates duplicates where multiple @replies are
    # present in the same tweet - e.g. the tweet "@one @two hello" from user @user results in
    # @user,@one,"@one @two hello" and @user,@two,"@one @two hello"
    #
    # run with gawk -F , so that $1, $3, $12 and $13 refer to the CSV columns
    #
    # Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au

    BEGIN {
        print "from,to,tweet,time,timestamp"
    }

    /@[A-Za-z0-9_]+/ {
        rest = $1
        # gawk's three-argument match() stores the captured username in atArray[1]
        while (match(rest, /@([A-Za-z0-9_]+)/, atArray)) {
            print $3 "," atArray[1] "," $1 "," $12 "," $13
            # continue the search just past the current match
            rest = substr(rest, RSTART + RLENGTH)
        }
    }

    # filter.awk - Filter list of tweets
    #
    # this script takes a CSV or other list of tweets, and removes any lines that don't match the search criteria
    # the script preserves the first line, expecting that it contains header information
    #
    # script expects command-line argument search={searchcriteria} _before_ the input CSV filename
    # enclose the search term in quotation marks if it contains any special characters
    # each line is lowercased before matching, so write the search term in lower case
    #
    # e.g.: gawk -F , -f filter.awk search="(julia|gillard)" tweets.csv >filteredtweets.csv
    #
    # expected data format:
    # CSV or simple list of tweets, line-by-line
    #
    # output format:
    # same as above, listing only matching tweets
    #
    # Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au

    BEGIN {
        getline
        print $0
    }

    tolower($0) ~ search {
        print $0
    }
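The gawk scripts split each line on commas, so a tweet that itself contains a comma would shift the later fields. As a hedged alternative, the sketch below does the same @reply extraction in Python with the standard csv module, which handles quoted fields. This is a swapped-in technique, not the authors' method. It assumes the archive quotes fields that contain commas and that the first row is a header; the sample row is made up.

```python
import csv
import io
import re

# Same username pattern as the gawk script above.
MENTION = re.compile(r"@([A-Za-z0-9_]+)")


def extract_replies(csv_text):
    """Yield (from_user, to_user, tweet, created_at, time) per @mention."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # assumption: first row is the header
    for row in reader:
        if len(row) < 13:
            continue  # skip short or malformed rows
        # Columns 1, 3, 12 and 13 of the expected data format.
        text, from_user, created_at, timestamp = row[0], row[2], row[11], row[12]
        for name in MENTION.findall(text):
            yield (from_user, name, text, created_at, timestamp)


# Hypothetical sample archive with commas inside quoted fields.
sample = (
    "text,to_user_id,from_user,id,from_user_id,iso_language_code,source,"
    "profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,"
    "created_at,time\n"
    '"@one @two hello, world",,user,1,2,en,web,,,,,'
    '"Sat, 21 Aug 2010 10:00:00 +0000",1282384800\n'
)
rows = list(extract_replies(sample))
```

Each tuple in `rows` is one network edge; writing them out with `csv.writer` would keep the quoting intact for import into a tool such as Gephi.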
29. Challenges

Twapperkeeper relies on #hashtags
  - Problem if #hashtags are inconsistent/unclear
  - Follow-on @replies and retweets may not continue to use #hashtags
  - May miss early developments, e.g. #hashtag standardisation
Need to look at overall user activity / Twitter firehose for more comprehensive picture
Need to track baseline activity to understand how exceptional acute events are

Ethical considerations:
  - Using only publicly available data (no protected tweets, no firewalled blogs)
  - But technical publicness not enough: ‘publicly available’ ≠ ‘meant to be public’
  - No easy answers: #hashtags probably indicate intention to be public, but may not
  - Need to consider data storage and publication carefully, too

See more at mappingonlinepublics.net. Up next: time-based animations...
Or find us at @snurb_dot_info and @jeanburgess