SlideShare a Scribd company logo
1 of 98
Download to read offline
Austin Data 2014 
Grounding Text 
Jason Baldridge 
@jasonbaldridge 
Associate Professor Co-founder & Chief Scientist 
Friday, September 5, 14
What does “barbecue” mean? 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
2 
Friday, September 5, 14
What does “barbecue” mean? Barbecue’ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
2 
Friday, September 5, 14
What does “barbecue” mean? Barbecue’ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
2 
Friday, September 5, 14
What does “barbecue” mean? Barbecue’ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
2 
Friday, September 5, 14
What does “barbecue” mean? Barbecue’ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
2 
Friday, September 5, 14
What does “barbecue” mean? Barbecue’ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
2 
Friday, September 5, 14
What does “barbecue” mean? Barbecue’ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
2 
Friday, September 5, 14
What I thought semantics was before 2005 
From: John Enrico and Jason Baldridge. 2011. Possessor Raising, Demonstrative Raising, Quantifier 
Float and Number Float in Haida. International Journal of American Linguistics. 77(2):185-218 
© 2012 Jason M Baldridge Text Analytics Summit, June 2013 
3 
Friday, September 5, 14
Updated perspective a la Ray Mooney (UT Austin CS) 
http://www.cs.utexas.edu/users/ml/slides/chen-icml08.ppt 
© 2012 Jason M Baldridge Text Analytics Summit, June 2013 
4 
Friday, September 5, 14
http://www.lib.Travel at the Turn of the 20th Century utexas.edu/books/travel/index.html 
© 2012 Jason M Baldridge Text Analytics Summit, June 2013 
5 
Friday, September 5, 14
Motivation: Google Lit Trips [http://www.googlelittrips.com/] 
http://www.googlelittrips.com/GoogleLit/9-12/Entries/2006/11/1_The_Grapes_of_Wrath_by_John_Steinbeck.html 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
6 
Grapes of Wrath in Google Earth 
Text 
Friday, September 5, 14
Motivation: Google Lit Trips [http://www.googlelittrips.com/] 
http://www.googlelittrips.com/GoogleLit/9-12/Entries/2006/11/1_The_Grapes_of_Wrath_by_John_Steinbeck.html 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
6 
Grapes of Wrath in Google Earth 
Text 
Friday, September 5, 14
Crisis response: Haiti earthquake 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
7 
Friday, September 5, 14
Crisis response: Haiti earthquake 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
7 
Friday, September 5, 14
Look, Mom, no hands! (Err, um... no metadata.) 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
8 
Friday, September 5, 14
Look, Mom, no hands! (Err, um... no metadata.) 
Topics with a clear, circumscribed 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
8 
geographic focus emerge! 
Friday, September 5, 14
But, of course, metadata is now plentiful. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
9 
Friday, September 5, 14
Geotagged Wikipedia 
30° 17′ N 97° 44′ W 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
10 
Friday, September 5, 14
01:55:55 RT @USER_dc5e5498: Drop and give me 50.... 
05:09:29 I said u got a swisher from redmond!? He said nah kirkland! 
Lol..ooooooooOkay! 
05:57:35 Lmao!:) havin a good ol time after work! Unexpected! #goodtimes 
06:00:09 RT @USER_d5d93fec: #letsbereal .. No seriously, #letsbereal>>lol. 
Don't start. 
06:00:37 On my way to get @USER_60939380 yeee! She want some of this 
strawberry! Sexy! 
... 
47°31’41’’ N 122°11’52’’ W 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
11 
Geotagged Twitter 
Friday, September 5, 14
01:55:55 RT @USER_dc5e5498: Drop and give me 50.... 
05:09:29 I said u got a swisher from redmond!? He said nah kirkland! 
Lol..ooooooooOkay! 
05:57:35 Lmao!:) havin a good ol time after work! Unexpected! #goodtimes 
06:00:09 RT @USER_d5d93fec: #letsbereal .. No seriously, #letsbereal>>lol. 
Don't start. 
06:00:37 On my way to get @USER_60939380 yeee! She want some of this 
strawberry! Sexy! 
... 
47°31’41’’ N 122°11’52’’ W 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
11 
Geotagged Twitter 
Friday, September 5, 14
Document geolocation: where is this person? 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
12 
Friday, September 5, 14
Language modeling approach 
Amsterdam, Zaandam, Amstelveen, Diemen, Landsmeer ... 
Frankfurt, Frechen, Hürth, Brühl, Wesseling, ... 
Wing & Baldridge 2011: Simple supervised document geolocation with geodesic grids. 
© 2013 Jason M Baldridge Text 13 Analytics Summit, June 2013 
Friday, September 5, 14
Language modeling approach 
Amsterdam, Zaandam, Amstelveen, Diemen, Landsmeer ... 
Frankfurt, Frechen, Hürth, Brühl, Wesseling, ... 
Wing & Baldridge 2011: Simple supervised document geolocation with geodesic grids. 
© 2013 Jason M Baldridge Text 13 Analytics Summit, June 2013 
Friday, September 5, 14
Where’s a word on Earth? 
beach mountain 
wine barbecue 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
Friday, September 5, 14
Where’s a word on Earth? 
beach mountain 
wine barbecue 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
Friday, September 5, 14
Locations of Twitter users are not uniformly distributed! 
(Small) GeoUT (Twitter) plotted 
on Google Earth, one pin per user. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
15 
Density of (all) 
documents in GeoUT 
over the USA 
(390 million tweets) 
Friday, September 5, 14
k-d tree for geotagged Wikipedia, looking at N. America 
Roller, Speriosu, Rallapalli, Wing & Baldridge 2014: 
Supervised Text-based Geolocation Using Language Models on an Adaptive Grid. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
16 
Friday, September 5, 14
k-d tree for geotagged Wikipedia, looking at N. America 
Roller, Speriosu, Rallapalli, Wing & Baldridge 2014: 
Supervised Text-based Geolocation Using Language Models on an Adaptive Grid. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
16 
Friday, September 5, 14
Pre-grid clustering [Erik Skiles, MA thesis, UT Austin, Ling] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
17 
Friday, September 5, 14
Four clusters on GeoUT (390 million tweets) 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
18 
Friday, September 5, 14
Four clusters on GeoUT (390 million tweets) 
All tweets 
West coast East coast Midwest & South Spanish language 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
18 
Friday, September 5, 14
Automatic document geolocation 
[Serdyukov, Murdock, & van Zwol 2009; Cheng, Caverlee, & Lee 2010; Wing & Baldridge 2011] 
Friday, September 5, 14
Automatic document geolocation 
[Serdyukov, Murdock, & van Zwol 2009; Cheng, Caverlee, & Lee 2010; Wing & Baldridge 2011] 
Friday, September 5, 14
Image geo-location: http://graphics.cs.cmu.edu/projects/im2gps/ 
© 2010 Jason M Baldridge Text Analytics Summit, June 2013 
Friday, September 5, 14
Performance (kd-tree with clustering) 
Wikipedia (entire world) 
Half of documents geotagged within 12 km of truth 
Percent of documents within 166km (100 miles): 91% 
Twitter (USA) 
Half of users geotagged within 330 km of truth 
Percent of documents within 166km (100 miles): 40% 
For better or worse, it soon might not matter 
whether you have location turned on or not... what 
you say is where you are / are from. (Also, other 
factors, e.g. who you are linked to, of course.) 
© 2010 Jason M Baldridge Text Analytics Summit, June 2013 
21 
Friday, September 5, 14
Hierarchical geo-location with logistic regression 
Wing & Baldridge 2014: Hierarchical Discriminative Classification for Text-Based Geolocation. 
© 2010 Jason M Baldridge Text Analytics Summit, June 2013 
22 
Friday, September 5, 14
Performance (kd-tree with clustering) 
Twitter (USA) 
Half of users geotagged within 170 km of truth 
Percent of documents within 166km (100 miles): 49% 
Twitter (World) 
Half of users geotagged within 490 km of truth 
Percent of documents within 166km (100 miles): 31% 
Flickr (entire world) 
Half of documents geotagged within 18 km of truth 
Percent of documents within 166km (100 miles): 66% 
© 2010 Jason M Baldridge Text Analytics Summit, June 2013 
23 
Friday, September 5, 14
Hierarchical logistic regression beats flat naive Bayes 
Accuracy @ 161 km, kd-tree grid 
Naive Bayes Hierarchical LR 
© 2010 Jason M Baldridge Text Analytics Summit, June 2013 
24 
Twitter USA 
Twitter World 
Flickr 
English Wikipedia 
German Wikipedia 
Portuguese Wikipedia 
36.2 49.2 
28.7 31.3 
58.5 66.0 
84.5 88.9 
89.3 90.2 
77.1 89.5 
Friday, September 5, 14
Logistic regression weights good features heavily 
© 2010 Jason M Baldridge Text Analytics Summit, June 2013 
25 
Friday, September 5, 14
Toponym (place name) resolution 
They visit Portland every year. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
26 
Friday, September 5, 14
Toponym (place name) resolution 
They visit Portland every year. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
26 
? 
? 
? 
? 
? 
? 
? 
? 
? 
? 
? 
? 
? 
? 
? 
? 
? 
Which Portland? (Also: Canada, Australia, Ireland...) 
Friday, September 5, 14
Toponym resolution in context 
Although Elisha Newman made the first land entry in the township of Portland (June, 
1833), he did not become a settler until three years later, by which time a few settlers had 
located in the town. From Mr. Newman's story, it appears that early in 1833, he was 
visiting friends in Ann Arbor, and during an evening conversation discussed with others 
the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph 
Wood) remarked that he had been out with the party sent to survey Ionia and other 
counties, and that the surveyors were struck by the valuable water-power at the mouth of 
the Looking Glass River, saying there would surely be a village there some day. 
Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking 
Glass. Following up his impulse, he made ready to start at once, and, accompanied by 
James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. 
Being satisfied with the location, he returned Eastward with his companions, and at White 
Pigeon made his land entry. 
Newman did not return for a permanent settlement until the spring of 1836, and 
meanwhile, in November, 1833, Philo Bogue bought a piece of land on section 28, in the 
bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up 
a log cabin near where the Detroit, Lansing, and Northern depot was located, and when he 
brought the house into decent shape went over to Hunt's at Lyons for his family, whom he 
had left there against such time as he should have affairs prepared for their comfort. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
27 
Friday, September 5, 14
Spatial minimality 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
28 
Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by 
which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, 
and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph 
Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable 
water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. 
Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to 
start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with 
the location, he returned Eastward with his companions, and at White Pigeon made his land entry. 
Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of 
land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the 
Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, 
whom he had left there against such time as he should have affairs prepared for their comfort. 
Friday, September 5, 14
Spatial minimality 
GeoNames 
4048392 Portland Mills Portland Mills 39.7781 -87.00918 P PPL US IN 133 0 223 218 America/Indiana/Indianapolis 2010-02-15 
4084605 Portland Portland 32.15459 -87.1686 P PPL US AL 047 0 30 41 America/Chicago 2006-01-15 
4127143 Portland Portland Portlend,Портленд 33.2379 -91.51151 P PPL US AR 003 430 38 39 America/Chicago 2011-05-14 
4169227 Portland Portland 30.51242 -86.19578 P PPL US FL 131 0 8 14 America/Chicago 2006-01-15 
4217115 Portland Portland 34.05732 -85.03634 P PPL US GA 233 0 229 228 America/New_York 2010-09-05 
4277586 Portland Portland 37.0778 -97.31227 P PPL US KS 191 0 362 364 America/Chicago 2006-01-15 
4305000 Portland Portland 37.12062 -85.44608 P PPL US KY 001 0 220 223 America/Chicago 2006-01-15 
4305001 Portland Portland 38.26924 -85.8108 P PPL US KY 111 0 135 138 America/Kentucky/Louisville 2006-01-15 
4305002 Portland Portland 38.74812 -84.44772 P PPL US KY 191 0 265 266 America/New_York 2006-01-15 
404289 Portland Portland Portlend,Портленд 38.71088 -91.71767 P PPL US MO 027 0 170 172 America/Chicago 2010-01-29 
4521811 Portland Portland Portlend,Портленд 39.00341 -81.77124 P PPL US OH 105 0 187 188 America/New_York 2010-01-29 
4650946 Portland Portland Portlend,Портленд 36.58171 -86.51638 P PPL US TN 165 11480 244 245 America/Chicago 2011-05-14 
4720131 Portland Portland Portlend,Портленд 27.87725 -97.32388 P PPL US TX 409 15099 13 11 America/Chicago 2011-05-14 
4841001 Portland Portland Portlend,Портленд 41.57288 -72.64065 P PPL US CT 007 5862 24 27 America/New_York 2011-05-14 
4871855 Portland Portland 43.12858 -93.12354 P PPL US IA 033 35 327 330 America/Chicago 2011-05-14 
4906524 Portland Portland 41.66253 -89.98012 P PPL US IL 195 0 190 190 America/Chicago 2006-01-15 
5006314 Portland Portland Portlend,Портленд 42.8692 -84.90305 P PPL US MI 067 3883 221 223 America/Detroit 2011-05-14 
5746545 Portland Portland 45.52345 -122.67621 P PPLA2 US OR 051 583776 12 15 America/Los_Angeles 2011-05-14 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
28 
Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by 
which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, 
and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph 
Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable 
water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. 
Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to 
start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with 
the location, he returned Eastward with his companions, and at White Pigeon made his land entry. 
Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of 
land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the 
Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, 
whom he had left there against such time as he should have affairs prepared for their comfort. 
Friday, September 5, 14
Spatial minimality 
Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by 
which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, 
and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph 
Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable 
water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. 
Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to 
start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with 
the location, he returned Eastward with his companions, and at White Pigeon made his land entry. 
Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of 
land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the 
Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, 
whom he had left there against such time as he should have affairs prepared for their comfort. 
GeoNames 
4048392 Portland Mills Portland Mills 39.7781 -87.00918 P PPL US IN 133 0 223 218 America/Indiana/Indianapolis 2010-02-15 
4084605 Portland Portland 32.15459 -87.1686 P PPL US AL 047 0 30 41 America/Chicago 2006-01-15 
4127143 Portland Portland Portlend,Портленд 33.2379 -91.51151 P PPL US AR 003 430 38 39 America/Chicago 2011-05-14 
4169227 Portland Portland 30.51242 -86.19578 P PPL US FL 131 0 8 14 America/Chicago 2006-01-15 
4217115 Portland Portland 34.05732 -85.03634 P PPL US GA 233 0 229 228 America/New_York 2010-09-05 
4277586 Portland Portland 37.0778 -97.31227 P PPL US KS 191 0 362 364 America/Chicago 2006-01-15 
4305000 Portland Portland 37.12062 -85.44608 P PPL US KY 001 0 220 223 America/Chicago 2006-01-15 
4305001 Portland Portland 38.26924 -85.8108 P PPL US KY 111 0 135 138 America/Kentucky/Louisville 2006-01-15 
4305002 Portland Portland 38.74812 -84.44772 P PPL US KY 191 0 265 266 America/New_York 2006-01-15 
404289 Portland Portland Portlend,Портленд 38.71088 -91.71767 P PPL US MO 027 0 170 172 America/Chicago 2010-01-29 
4521811 Portland Portland Portlend,Портленд 39.00341 -81.77124 P PPL US OH 105 0 187 188 America/New_York 2010-01-29 
4650946 Portland Portland Portlend,Портленд 36.58171 -86.51638 P PPL US TN 165 11480 244 245 America/Chicago 2011-05-14 
4720131 Portland Portland Portlend,Портленд 27.87725 -97.32388 P PPL US TX 409 15099 13 11 America/Chicago 2011-05-14 
4841001 Portland Portland Portlend,Портленд 41.57288 -72.64065 P PPL US CT 007 5862 24 27 America/New_York 2011-05-14 
4871855 Portland Portland 43.12858 -93.12354 P PPL US IA 033 35 327 330 America/Chicago 2011-05-14 
4906524 Portland Portland 41.66253 -89.98012 P PPL US IL 195 0 190 190 America/Chicago 2006-01-15 
5006314 Portland Portland Portlend,Портленд 42.8692 -84.90305 P PPL US MI 067 3883 221 223 America/Detroit 2011-05-14 
5746545 Portland Portland 45.52345 -122.67621 P PPLA2 US OR 051 583776 12 15 America/Los_Angeles 2011-05-14 
Toponym # Locations 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
28 
Ann Arbor 
Detroit 
Ionia 
Lyons 
Portland 
White Pigeon 
1 
>7 
>4 
>15 
>17 
1 
Friday, September 5, 14
Spatial minimality 
Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by 
which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, 
and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph 
Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable 
water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. 
Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to 
start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with 
the location, he returned Eastward with his companions, and at White Pigeon made his land entry. 
Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of 
land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the 
Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, 
whom he had left there against such time as he should have affairs prepared for their comfort. 
GeoNames 
4048392 Portland Mills Portland Mills 39.7781 -87.00918 P PPL US IN 133 0 223 218 America/Indiana/Indianapolis 2010-02-15 
4084605 Portland Portland 32.15459 -87.1686 P PPL US AL 047 0 30 41 America/Chicago 2006-01-15 
4127143 Portland Portland Portlend,Портленд 33.2379 -91.51151 P PPL US AR 003 430 38 39 America/Chicago 2011-05-14 
4169227 Portland Portland 30.51242 -86.19578 P PPL US FL 131 0 8 14 America/Chicago 2006-01-15 
4217115 Portland Portland 34.05732 -85.03634 P PPL US GA 233 0 229 228 America/New_York 2010-09-05 
4277586 Portland Portland 37.0778 -97.31227 P PPL US KS 191 0 362 364 America/Chicago 2006-01-15 
4305000 Portland Portland 37.12062 -85.44608 P PPL US KY 001 0 220 223 America/Chicago 2006-01-15 
4305001 Portland Portland 38.26924 -85.8108 P PPL US KY 111 0 135 138 America/Kentucky/Louisville 2006-01-15 
4305002 Portland Portland 38.74812 -84.44772 P PPL US KY 191 0 265 266 America/New_York 2006-01-15 
404289 Portland Portland Portlend,Портленд 38.71088 -91.71767 P PPL US MO 027 0 170 172 America/Chicago 2010-01-29 
4521811 Portland Portland Portlend,Портленд 39.00341 -81.77124 P PPL US OH 105 0 187 188 America/New_York 2010-01-29 
4650946 Portland Portland Portlend,Портленд 36.58171 -86.51638 P PPL US TN 165 11480 244 245 America/Chicago 2011-05-14 
4720131 Portland Portland Portlend,Портленд 27.87725 -97.32388 P PPL US TX 409 15099 13 11 America/Chicago 2011-05-14 
4841001 Portland Portland Portlend,Портленд 41.57288 -72.64065 P PPL US CT 007 5862 24 27 America/New_York 2011-05-14 
4871855 Portland Portland 43.12858 -93.12354 P PPL US IA 033 35 327 330 America/Chicago 2011-05-14 
4906524 Portland Portland 41.66253 -89.98012 P PPL US IL 195 0 190 190 America/Chicago 2006-01-15 
5006314 Portland Portland Portlend,Портленд 42.8692 -84.90305 P PPL US MI 067 3883 221 223 America/Detroit 2011-05-14 
5746545 Portland Portland 45.52345 -122.67621 P PPLA2 US OR 051 583776 12 15 America/Los_Angeles 2011-05-14 
Ionia Lyons 
Toponym # Locations 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
28 
Portland 
White Pigeon 
Ann Arbor 
Detroit 
Ionia 
Lyons 
Portland 
White Pigeon 
1 
>7 
>4 
>15 
>17 
1 
Friday, September 5, 14
Spatial minimality often fails 
I moved from Encinitas, CA, a nice beach town in North San Diego County to Asheville, NC. 
By far, Ashville is more hip, especially West Asheville. Asheville has a lot in common with 
Portland. Austin, I've never been to so I cannot comment. But what makes a place cool and 
hip, in my opinion are that give a area "punch". There are a lot of ingredients. One is geography. 
Add a college or university (and all that they bring- and draw), good restaurants, a good music 
scene, a progressive attitude and tolerance. Hmmm. I'm sure there are many more to ponder. But 
that's my start. Oh, lots of bars! 
From: http://www.city-data.com/forum/austin/1694181-what-makes-city-like-austin-portland-3.html 
City-data.com incorrectly marks “West” and “Portland” 
as the cities in Texas -- presumably because of their 
textual and spatial proximity to “Austin”. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
29 
Friday, September 5, 14
Spatial minimality often fails 
I moved from Encinitas, CA, a nice beach town in North San Diego County to Asheville, NC. 
By far, Ashville is more hip, especially West Asheville. Asheville has a lot in common with 
Portland. Austin, I've never been to so I cannot comment. But what makes a place cool and 
hip, in my opinion are that give a area "punch". There are a lot of ingredients. One is geography. 
Add a college or university (and all that they bring- and draw), good restaurants, a good music 
scene, a progressive attitude and tolerance. Hmmm. I'm sure there are many more to ponder. But 
that's my start. Oh, lots of bars! 
From: http://www.city-data.com/forum/austin/1694181-what-makes-city-like-austin-portland-3.html 
City-data.com incorrectly marks “West” and “Portland” 
as the cities in Texas -- presumably because of their 
textual and spatial proximity to “Austin”. 
But: it is clear from the text that Portland, Oregon and 
Austin, Texas are the referents, though their states are 
never mentioned and are far from the other locations! 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
29 
Friday, September 5, 14
Toponym classifiers 
Strategy: build a textual classifier per toponym by 
obtaining indirectly labeled examples from Wikipedia. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
30 
Friday, September 5, 14
Toponym classifiers 
Strategy: build a textual classifier per toponym by 
obtaining indirectly labeled examples from Wikipedia. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
30 
Friday, September 5, 14
Toponym classifiers 
Strategy: build a textual classifier per toponym by 
obtaining indirectly labeled examples from Wikipedia. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
30 
Friday, September 5, 14
Toponym classifiers 
Strategy: build a textual classifier per toponym by 
obtaining indirectly labeled examples from Wikipedia. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
30 
Friday, September 5, 14
Toponym classifiers 
Strategy: build a textual classifier per toponym by 
obtaining indirectly labeled examples from Wikipedia. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
30 
Friday, September 5, 14
Toponym classifiers 
Strategy: build a textual classifier per toponym by 
obtaining indirectly labeled examples from Wikipedia. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
30 
Friday, September 5, 14
Toponym classifiers 
Strategy: build a textual classifier per toponym by 
obtaining indirectly labeled examples from Wikipedia. 
P(Portland-OR|music) > P(Portland-ME|music) 
P(Portland-OR|wharf ) < P(Portland-ME|wharf ) 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
30 
Friday, September 5, 14
Results: disambiguating toponyms 
TR-CoNLL 
Reuters News Texts 
August 1996 
Perseus Civil War Corpus 
Books 
Late 19th Century 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
31 
Average error 
distance 
Accuracy Average error 
distance 
Accuracy 
Population 
SPIDER 
(spatial minimality) 
WISTR 
(Wiki supervised) 
SPIDER 
+WISTR 
216 81.0 1749 59.7 
2180 30.9 266 57.5 
279 82.3 855 69.1 
430 81.8 201 85.9 
Take-home message: text classifiers are very effective & 
can be boosted by spatial minimality algorithms. 
Friday, September 5, 14
Identifying, disambiguating, and displaying toponyms 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
32 
Friday, September 5, 14
Back to grounding 
Grounding often involves connecting text to 
knowledge sources and other modalities (image, video) 
& bootstrapping. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
33 
Friday, September 5, 14
Back to grounding 
Grounding often involves connecting text to 
knowledge sources and other modalities (image, video) 
& bootstrapping. 
Also, they can help us create models for deeper 
aspects of language, such as syntactic structure and 
logical form. 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
33 
Friday, September 5, 14
Lexical brain decoding [Yarkoni, Poldrack, Nichols, Van Essen & Wager (2011)] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
34 
Friday, September 5, 14
Lexical brain decoding [Yarkoni, Poldrack, Nichols, Van Essen & Wager (2011)] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
34 
Friday, September 5, 14
He says, she says http://www.tweetolife.com/gender/ 
Friday, September 5, 14
Temporality of words, by hour http://www.tweetolife.com/hour/ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
36 
Friday, September 5, 14
Temporality of words, by hour http://www.tweetolife.com/hour/ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
36 
Friday, September 5, 14
Temporality of expressions, by day: http://www.google.com/trends 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
37 
Friday, September 5, 14
Temporality of expressions, by day: http://www.google.com/trends 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
37 
Friday, September 5, 14
Temporality of expressions, by year: http://ngrams.googlelabs.com/ 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
38 
slave 
trenches aircraft 
war 
Friday, September 5, 14
Temporal resolution [Kumar, Lease, and Baldridge 2011] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
39 
2000 BC 
0 AD 
2000 AD 
4000 BC 
Friday, September 5, 14
Temporal resolution [Kumar, Lease, and Baldridge 2011] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
39 
2000 BC 
0 AD 
2000 AD 
4000 BC 
Friday, September 5, 14
Temporal resolution [Kumar, Lease, and Baldridge 2011] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
39 
2000 BC 
0 AD 
2000 AD 
4000 BC 
Friday, September 5, 14
Temporal resolution [Kumar, Lease, and Baldridge 2011] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
39 
2000 BC 
0 AD 
2000 AD 
4000 BC 
Friday, September 5, 14
Temporal resolution [Kumar, Lease, and Baldridge 2011] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
39 
2000 BC 
0 AD 
2000 AD 
4000 BC 
Friday, September 5, 14
Temporal resolution [Kumar, Lease, and Baldridge 2011] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
39 
2000 BC 
0 AD 
2000 AD 
4000 BC 
Friday, September 5, 14
More modalities: videos [Motwani & Mooney, 2012] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
40 
Friday, September 5, 14
Beyond word co-occurences for vector-space models 
bear boat car cow hadoop snow water wrench 
3 234 42 4 1 2 325 0 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
41 
beach 
Friday, September 5, 14
Beyond word co-occurences for vector-space models 
bear boat car cow hadoop snow water wrench 
3 234 42 4 1 2 325 0 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
41 
beach 
Friday, September 5, 14
Beyond word co-occurences for vector-space models 
bear boat car cow hadoop snow water wrench 
3 234 42 4 1 2 325 0 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
41 
beach 
Friday, September 5, 14
Beyond word co-occurences for vector-space models 
bear boat car cow hadoop snow water wrench 
3 234 42 4 1 2 325 0 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
41 
beach 
Friday, September 5, 14
Beyond word co-occurences for vector-space models 
bear boat car cow hadoop snow water wrench 
3 234 42 4 1 2 325 0 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
41 
beach 
Friday, September 5, 14
Beyond word co-occurences for vector-space models 
bear boat car cow hadoop snow water wrench 
3 234 42 4 1 2 325 0 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
41 
beach 
Friday, September 5, 14
Beyond word co-occurences for vector-space models 
bear boat car cow hadoop snow water wrench 
3 234 42 4 1 2 325 0 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
41 
beach 
Friday, September 5, 14
Beyond word co-occurences for vector-space models 
bear boat car cow hadoop snow water wrench 
3 234 42 4 1 2 325 0 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
41 
beach 
Friday, September 5, 14
Beyond word co-occurences for vector-space models 
bear boat car cow hadoop snow water wrench 
3 234 42 4 1 2 325 0 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
41 
beach 
Friday, September 5, 14
Combining distributional models with logics 
Erk (2013): “Towards a semantics for distributional representations.” 
Garrette et al (2012): “A formal approach to linking logical form and vector-space lexical semantics” 
Beltagy et al (2013): “Montague Meets Markov: Deep Semantics with Probabilistic Logical Form” 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
42 
Friday, September 5, 14
Multi-component structured vector-space models 
the children visit the beach 
visit 
children beach 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
43 
Agent 
Patient 
Friday, September 5, 14
Language learning in context [Kim & Mooney, 2013] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
44 
Friday, September 5, 14
Language learning in context [Kim & Mooney, 2013] 
© 2013 Jason M Baldridge Text Analytics Summit, June 2013 
44 
Friday, September 5, 14
All your meaning are belong to us 
Friday, September 5, 14
All your meaning are belong to us 
Friday, September 5, 14
All your meaning are belong to us 
Friday, September 5, 14
Grounding matters 
http://davidrothman.net/2009/09/02/all-your-healthbase-are-belong-to-us-want-em-back/ 
Friday, September 5, 14
Open Source Software (Scala/Java) 
Junto - label propagation 
https://github.com/scalanlp/junto 
Textgrounder - document geolocation 
https://github.com/utcompling/textgrounder 
Fieldspring - toponym resolution 
https://github.com/utcompling/fieldspring 
Low-resource POS tagging 
https://github.com/dhgarrette/low-resource-pos- 
tagging-2013 
Updown - polarity classification 
https://github.com/scalanlp/junto 
OpenNLP - machine learning / NLP 
http://opennlp.apache.org/ 
Friday, September 5, 14
Open Source Software (Scala/Java) 
Junto - label propagation 
https://github.com/scalanlp/junto 
Textgrounder - document geolocation 
https://github.com/utcompling/textgrounder 
Fieldspring - toponym resolution 
https://github.com/utcompling/fieldspring 
Low-resource POS tagging 
https://github.com/dhgarrette/low-resource-pos- 
tagging-2013 
Updown - polarity classification 
https://github.com/scalanlp/junto 
OpenNLP - machine learning / NLP 
http://opennlp.apache.org/ 
Chalk - NLP 
https://github.com/scalanlp/chalk 
Nak - machine learning 
https://github.com/scalanlp/nak 
Breeze - linear algebra 
https://github.com/scalanlp/nak 
ScalaNLP 
Friday, September 5, 14
Open Source Software (Scala/Java) 
Junto - label propagation 
https://github.com/scalanlp/junto 
Textgrounder - document geolocation 
https://github.com/utcompling/textgrounder 
Fieldspring - toponym resolution 
https://github.com/utcompling/fieldspring 
Low-resource POS tagging 
https://github.com/dhgarrette/low-resource-pos- 
tagging-2013 
Updown - polarity classification 
https://github.com/scalanlp/junto 
OpenNLP - machine learning / NLP 
http://opennlp.apache.org/ 
Chalk - NLP 
https://github.com/scalanlp/chalk 
Nak - machine learning 
https://github.com/scalanlp/nak 
Breeze - linear algebra 
https://github.com/scalanlp/nak 
ScalaNLP 
Friday, September 5, 14
This research was sponsored by: 
Grant from the Grant: W911NF-10-1-0533 
Morris Memorial Trust Fund 
Final note: Whitman had it right many years ago! 
- Walt Whitman, A Song of the Rolling Earth (in Leaves of Grass) 
Friday, September 5, 14
Document geolocation 
Supervision 
- documents labeled with latitude & longitude 
Methods 
- Language Modeling for Information Retrieval 
Code 
- Textgrounder: https://github.com/utcompling/textgrounder 
Publications 
- Stephen Roller, Mike Speriosu, Sarat Rallapalli, Benjamin Wing and Jason Baldridge. 2012. Supervised Text-based 
Geolocation Using Language Models on an Adaptive Grid. EMNLP 2012. Jeju, Korea. 
- Benjamin Wing and Jason Baldridge. 2011. Simple supervised document geolocation with geodesic grids. In 
Proceedings of ACL HLT 2011. 
Friday, September 5, 14
Toponym resolution 
Supervision 
- indirectly acquired toponym annotations using a gazeteer 
and geo-annotated Wikipedia 
Methods 
- logistic regression 
- named entity recognition 
Code 
- Fieldspring: https://github.com/utcompling/fieldspring 
Publications 
- Mike Speriosu and Jason Baldridge. Text-Driven Toponym Resolution using Indirect Supervision. To appear in 
proceedings of ACL 2013. 
Friday, September 5, 14

More Related Content

Recently uploaded

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 

Recently uploaded (20)

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 

Featured

Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 

Featured (20)

Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 

Grounding Text

  • 1. Austin Data 2014 Grounding Text Jason Baldridge @jasonbaldridge Associate Professor Co-founder & Chief Scientist Friday, September 5, 14
  • 2. What does “barbecue” mean? © 2013 Jason M Baldridge Text Analytics Summit, June 2013 2 Friday, September 5, 14
  • 3. What does “barbecue” mean? Barbecue’ © 2013 Jason M Baldridge Text Analytics Summit, June 2013 2 Friday, September 5, 14
  • 4. What does “barbecue” mean? Barbecue’ © 2013 Jason M Baldridge Text Analytics Summit, June 2013 2 Friday, September 5, 14
  • 5. What does “barbecue” mean? Barbecue’ © 2013 Jason M Baldridge Text Analytics Summit, June 2013 2 Friday, September 5, 14
  • 6. What does “barbecue” mean? Barbecue’ © 2013 Jason M Baldridge Text Analytics Summit, June 2013 2 Friday, September 5, 14
  • 7. What does “barbecue” mean? Barbecue’ © 2013 Jason M Baldridge Text Analytics Summit, June 2013 2 Friday, September 5, 14
  • 8. What does “barbecue” mean? Barbecue’ © 2013 Jason M Baldridge Text Analytics Summit, June 2013 2 Friday, September 5, 14
  • 9. What I thought semantics was before 2005 From: John Enrico and Jason Baldridge. 2011. Possessor Raising, Demonstrative Raising, Quantifier Float and Number Float in Haida. International Journal of American Linguistics. 77(2):185-218 © 2012 Jason M Baldridge Text Analytics Summit, June 2013 3 Friday, September 5, 14
  • 10. Updated perspective a la Ray Mooney (UT Austin CS) http://www.cs.utexas.edu/users/ml/slides/chen-icml08.ppt © 2012 Jason M Baldridge Text Analytics Summit, June 2013 4 Friday, September 5, 14
  • 11. http://www.lib.Travel at the Turn of the 20th Century utexas.edu/books/travel/index.html © 2012 Jason M Baldridge Text Analytics Summit, June 2013 5 Friday, September 5, 14
  • 12. Motivation: Google Lit Trips [http://www.googlelittrips.com/] http://www.googlelittrips.com/GoogleLit/9-12/Entries/2006/11/1_The_Grapes_of_Wrath_by_John_Steinbeck.html © 2013 Jason M Baldridge Text Analytics Summit, June 2013 6 Grapes of Wrath in Google Earth Text Friday, September 5, 14
  • 13. Motivation: Google Lit Trips [http://www.googlelittrips.com/] http://www.googlelittrips.com/GoogleLit/9-12/Entries/2006/11/1_The_Grapes_of_Wrath_by_John_Steinbeck.html © 2013 Jason M Baldridge Text Analytics Summit, June 2013 6 Grapes of Wrath in Google Earth Text Friday, September 5, 14
  • 14. Crisis response: Haiti earthquake © 2013 Jason M Baldridge Text Analytics Summit, June 2013 7 Friday, September 5, 14
  • 15. Crisis response: Haiti earthquake © 2013 Jason M Baldridge Text Analytics Summit, June 2013 7 Friday, September 5, 14
  • 16. Look, Mom, no hands! (Err, um... no metadata.) © 2013 Jason M Baldridge Text Analytics Summit, June 2013 8 Friday, September 5, 14
  • 17. Look, Mom, no hands! (Err, um... no metadata.) Topics with a clear, circumscribed © 2013 Jason M Baldridge Text Analytics Summit, June 2013 8 geographic focus emerge! Friday, September 5, 14
  • 18. But, of course, metadata is now plentiful. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 9 Friday, September 5, 14
  • 19. Geotagged Wikipedia 30° 17′ N 97° 44′ W © 2013 Jason M Baldridge Text Analytics Summit, June 2013 10 Friday, September 5, 14
  • 20. 01:55:55 RT @USER_dc5e5498: Drop and give me 50.... 05:09:29 I said u got a swisher from redmond!? He said nah kirkland! Lol..ooooooooOkay! 05:57:35 Lmao!:) havin a good ol time after work! Unexpected! #goodtimes 06:00:09 RT @USER_d5d93fec: #letsbereal .. No seriously, #letsbereal>>lol. Don't start. 06:00:37 On my way to get @USER_60939380 yeee! She want some of this strawberry! Sexy! ... 47°31’41’’ N 122°11’52’’ W © 2013 Jason M Baldridge Text Analytics Summit, June 2013 11 Geotagged Twitter Friday, September 5, 14
  • 21. 01:55:55 RT @USER_dc5e5498: Drop and give me 50.... 05:09:29 I said u got a swisher from redmond!? He said nah kirkland! Lol..ooooooooOkay! 05:57:35 Lmao!:) havin a good ol time after work! Unexpected! #goodtimes 06:00:09 RT @USER_d5d93fec: #letsbereal .. No seriously, #letsbereal>>lol. Don't start. 06:00:37 On my way to get @USER_60939380 yeee! She want some of this strawberry! Sexy! ... 47°31’41’’ N 122°11’52’’ W © 2013 Jason M Baldridge Text Analytics Summit, June 2013 11 Geotagged Twitter Friday, September 5, 14
  • 22. Document geolocation: where is this person? © 2013 Jason M Baldridge Text Analytics Summit, June 2013 12 Friday, September 5, 14
  • 23. Language modeling approach Amsterdam, Zaandam, Amstelveen, Diemen, Landsmeer ... Frankfurt, Frechen, Hürth, Brühl, Wesseling, ... Wing & Baldridge 2011: Simple supervised document geolocation with geodesic grids. © 2013 Jason M Baldridge Text 13 Analytics Summit, June 2013 Friday, September 5, 14
  • 24. Language modeling approach Amsterdam, Zaandam, Amstelveen, Diemen, Landsmeer ... Frankfurt, Frechen, Hürth, Brühl, Wesseling, ... Wing & Baldridge 2011: Simple supervised document geolocation with geodesic grids. © 2013 Jason M Baldridge Text 13 Analytics Summit, June 2013 Friday, September 5, 14
  • 25. Where’s a word on Earth? beach mountain wine barbecue © 2013 Jason M Baldridge Text Analytics Summit, June 2013 Friday, September 5, 14
  • 26. Where’s a word on Earth? beach mountain wine barbecue © 2013 Jason M Baldridge Text Analytics Summit, June 2013 Friday, September 5, 14
  • 27. Locations of Twitter users are not uniformly distributed! (Small) GeoUT (Twitter) plotted on Google Earth, one pin per user. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 15 Density of (all) documents in GeoUT over the USA (390 million tweets) Friday, September 5, 14
  • 28. k-d tree for geotagged Wikipedia, looking at N. America Roller, Speriosu, Rallapalli, Wing & Baldridge 2014: Supervised Text-based Geolocation Using Language Models on an Adaptive Grid. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 16 Friday, September 5, 14
  • 29. k-d tree for geotagged Wikipedia, looking at N. America Roller, Speriosu, Rallapalli, Wing & Baldridge 2014: Supervised Text-based Geolocation Using Language Models on an Adaptive Grid. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 16 Friday, September 5, 14
  • 30. Pre-grid clustering [Erik Skiles, MA thesis, UT Austin, Ling] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 17 Friday, September 5, 14
  • 31. Four clusters on GeoUT (390 million tweets) © 2013 Jason M Baldridge Text Analytics Summit, June 2013 18 Friday, September 5, 14
  • 32. Four clusters on GeoUT (390 million tweets) All tweets West coast East coast Midwest & South Spanish language © 2013 Jason M Baldridge Text Analytics Summit, June 2013 18 Friday, September 5, 14
  • 33. Automatic document geolocation [Serdyukov, Murdock, & van Zwol 2009; Cheng, Caverlee, & Lee 2010; Wing & Baldridge 2011] Friday, September 5, 14
  • 34. Automatic document geolocation [Serdyukov, Murdock, & van Zwol 2009; Cheng, Caverlee, & Lee 2010; Wing & Baldridge 2011] Friday, September 5, 14
  • 35. Image geo-location: http://graphics.cs.cmu.edu/projects/im2gps/ © 2010 Jason M Baldridge Text Analytics Summit, June 2013 Friday, September 5, 14
  • 36. Performance (kd-tree with clustering) Wikipedia (entire world) Half of documents geotagged within 12 km of truth Percent of documents within 166km (100 miles): 91% Twitter (USA) Half of users geotagged within 330 km of truth Percent of documents within 166km (100 miles): 40% For better or worse, it soon might not matter whether you have location turned on or not... what you say is where you are / are from. (Also, other factors, e.g. who you are linked to, of course.) © 2010 Jason M Baldridge Text Analytics Summit, June 2013 21 Friday, September 5, 14
  • 37. Hierarchical geo-location with logistic regression Wing & Baldridge 2014: Hierarchical Discriminative Classification for Text-Based Geolocation. © 2010 Jason M Baldridge Text Analytics Summit, June 2013 22 Friday, September 5, 14
  • 38. Performance (kd-tree with clustering) Twitter (USA) Half of users geotagged within 170 km of truth Percent of documents within 166km (100 miles): 49% Twitter (World) Half of users geotagged within 490 km of truth Percent of documents within 166km (100 miles): 31% Flickr (entire world) Half of documents geotagged within 18 km of truth Percent of documents within 166km (100 miles): 66% © 2010 Jason M Baldridge Text Analytics Summit, June 2013 23 Friday, September 5, 14
  • 39. Hierarchical logistic regression beats flat naive Bayes Accuracy @ 161 km, kd-tree grid Naive Bayes Hierarchical LR © 2010 Jason M Baldridge Text Analytics Summit, June 2013 24 Twitter USA Twitter World Flickr English Wikipedia German Wikipedia Portuguese Wikipedia 36.2 49.2 28.7 31.3 58.5 66.0 84.5 88.9 89.3 90.2 77.1 89.5 Friday, September 5, 14
  • 40. Logistic regression weights good features heavily © 2010 Jason M Baldridge Text Analytics Summit, June 2013 25 Friday, September 5, 14
  • 41. Toponym (place name) resolution They visit Portland every year. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 26 Friday, September 5, 14
  • 42. Toponym (place name) resolution They visit Portland every year. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 26 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Which Portland? (Also: Canada, Australia, Ireland...) Friday, September 5, 14
  • 43. Toponym resolution in context Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with the location, he returned Eastward with his companions, and at White Pigeon made his land entry. Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, whom he had left there against such time as he should have affairs prepared for their comfort. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 27 Friday, September 5, 14
  • 44. Spatial minimality © 2013 Jason M Baldridge Text Analytics Summit, June 2013 28 Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with the location, he returned Eastward with his companions, and at White Pigeon made his land entry. Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, whom he had left there against such time as he should have affairs prepared for their comfort. Friday, September 5, 14
  • 45. Spatial minimality GeoNames 4048392 Portland Mills Portland Mills 39.7781 -87.00918 P PPL US IN 133 0 223 218 America/Indiana/Indianapolis 2010-02-15 4084605 Portland Portland 32.15459 -87.1686 P PPL US AL 047 0 30 41 America/Chicago 2006-01-15 4127143 Portland Portland Portlend,Портленд 33.2379 -91.51151 P PPL US AR 003 430 38 39 America/Chicago 2011-05-14 4169227 Portland Portland 30.51242 -86.19578 P PPL US FL 131 0 8 14 America/Chicago 2006-01-15 4217115 Portland Portland 34.05732 -85.03634 P PPL US GA 233 0 229 228 America/New_York 2010-09-05 4277586 Portland Portland 37.0778 -97.31227 P PPL US KS 191 0 362 364 America/Chicago 2006-01-15 4305000 Portland Portland 37.12062 -85.44608 P PPL US KY 001 0 220 223 America/Chicago 2006-01-15 4305001 Portland Portland 38.26924 -85.8108 P PPL US KY 111 0 135 138 America/Kentucky/Louisville 2006-01-15 4305002 Portland Portland 38.74812 -84.44772 P PPL US KY 191 0 265 266 America/New_York 2006-01-15 404289 Portland Portland Portlend,Портленд 38.71088 -91.71767 P PPL US MO 027 0 170 172 America/Chicago 2010-01-29 4521811 Portland Portland Portlend,Портленд 39.00341 -81.77124 P PPL US OH 105 0 187 188 America/New_York 2010-01-29 4650946 Portland Portland Portlend,Портленд 36.58171 -86.51638 P PPL US TN 165 11480 244 245 America/Chicago 2011-05-14 4720131 Portland Portland Portlend,Портленд 27.87725 -97.32388 P PPL US TX 409 15099 13 11 America/Chicago 2011-05-14 4841001 Portland Portland Portlend,Портленд 41.57288 -72.64065 P PPL US CT 007 5862 24 27 America/New_York 2011-05-14 4871855 Portland Portland 43.12858 -93.12354 P PPL US IA 033 35 327 330 America/Chicago 2011-05-14 4906524 Portland Portland 41.66253 -89.98012 P PPL US IL 195 0 190 190 America/Chicago 2006-01-15 5006314 Portland Portland Portlend,Портленд 42.8692 -84.90305 P PPL US MI 067 3883 221 223 America/Detroit 2011-05-14 5746545 Portland Portland 45.52345 -122.67621 P PPLA2 US OR 051 583776 12 15 America/Los_Angeles 2011-05-14 © 2013 Jason M Baldridge Text Analytics Summit, June 2013 28 Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with the location, he returned Eastward with his companions, and at White Pigeon made his land entry. Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, whom he had left there against such time as he should have affairs prepared for their comfort. Friday, September 5, 14
  • 46. Spatial minimality Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with the location, he returned Eastward with his companions, and at White Pigeon made his land entry. Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, whom he had left there against such time as he should have affairs prepared for their comfort. GeoNames 4048392 Portland Mills Portland Mills 39.7781 -87.00918 P PPL US IN 133 0 223 218 America/Indiana/Indianapolis 2010-02-15 4084605 Portland Portland 32.15459 -87.1686 P PPL US AL 047 0 30 41 America/Chicago 2006-01-15 4127143 Portland Portland Portlend,Портленд 33.2379 -91.51151 P PPL US AR 003 430 38 39 America/Chicago 2011-05-14 4169227 Portland Portland 30.51242 -86.19578 P PPL US FL 131 0 8 14 America/Chicago 2006-01-15 4217115 Portland Portland 34.05732 -85.03634 P PPL US GA 233 0 229 228 America/New_York 2010-09-05 4277586 Portland Portland 37.0778 -97.31227 P PPL US KS 191 0 362 364 America/Chicago 2006-01-15 4305000 Portland Portland 37.12062 -85.44608 P PPL US KY 001 0 220 223 America/Chicago 2006-01-15 4305001 Portland Portland 38.26924 -85.8108 P PPL US KY 111 0 135 138 America/Kentucky/Louisville 2006-01-15 4305002 Portland Portland 38.74812 -84.44772 P PPL US KY 191 0 265 266 America/New_York 2006-01-15 404289 Portland Portland Portlend,Портленд 38.71088 -91.71767 P PPL US MO 027 0 170 172 America/Chicago 2010-01-29 4521811 Portland Portland Portlend,Портленд 39.00341 -81.77124 P PPL US OH 105 0 187 188 America/New_York 2010-01-29 4650946 Portland Portland Portlend,Портленд 36.58171 -86.51638 P PPL US TN 165 11480 244 245 America/Chicago 2011-05-14 4720131 Portland Portland Portlend,Портленд 27.87725 -97.32388 P PPL US TX 409 15099 13 11 America/Chicago 2011-05-14 4841001 Portland Portland Portlend,Портленд 41.57288 -72.64065 P PPL US CT 007 5862 24 27 America/New_York 2011-05-14 4871855 Portland Portland 43.12858 -93.12354 P PPL US IA 033 35 327 330 America/Chicago 2011-05-14 4906524 Portland Portland 41.66253 -89.98012 P PPL US IL 195 0 190 190 America/Chicago 2006-01-15 5006314 Portland Portland Portlend,Портленд 42.8692 -84.90305 P PPL US MI 067 3883 221 223 America/Detroit 2011-05-14 5746545 Portland Portland 45.52345 -122.67621 P PPLA2 US OR 051 583776 12 15 America/Los_Angeles 2011-05-14 Toponym # Locations © 2013 Jason M Baldridge Text Analytics Summit, June 2013 28 Ann Arbor Detroit Ionia Lyons Portland White Pigeon 1 >7 >4 >15 >17 1 Friday, September 5, 14
  • 47. Spatial minimality Although Elisha Newman made the first land entry in the township of Portland (June, 1833), he did not become a settler until three years later, by which time a few settlers had located in the town. From Mr. Newman's story, it appears that early in 1833, he was visiting friends in Ann Arbor, and during an evening conversation discussed with others the subject of unlocated lands lying west of Ann Arbor. One of the company (Joseph Wood) remarked that he had been out with the party sent to survey Ionia and other counties, and that the surveyors were struck by the valuable water-power at the mouth of the Looking Glass River, saying there would surely be a village there some day. Mr. Newman was at once taken with the idea of locating lands at the mouth of the Looking Glass. Following up his impulse, he made ready to start at once, and, accompanied by James Newman and Joseph Wood, went out to the Looking Glass on a tour of inspection. Being satisfied with the location, he returned Eastward with his companions, and at White Pigeon made his land entry. Newman did not return for a permanent settlement until the spring of 1836, and meanwhile, in November, 1833, Philo Bogue bought a piece of land on section 28, in the bend of the Grand River, where he proposed to set up a trading post. Unaided he rolled up a log cabin near where the Detroit, Lansing, and Northern depot was located, and when he brought the house into decent shape went over to Hunt's at Lyons for his family, whom he had left there against such time as he should have affairs prepared for their comfort. GeoNames 4048392 Portland Mills Portland Mills 39.7781 -87.00918 P PPL US IN 133 0 223 218 America/Indiana/Indianapolis 2010-02-15 4084605 Portland Portland 32.15459 -87.1686 P PPL US AL 047 0 30 41 America/Chicago 2006-01-15 4127143 Portland Portland Portlend,Портленд 33.2379 -91.51151 P PPL US AR 003 430 38 39 America/Chicago 2011-05-14 4169227 Portland Portland 30.51242 -86.19578 P PPL US FL 131 0 8 14 America/Chicago 2006-01-15 4217115 Portland Portland 34.05732 -85.03634 P PPL US GA 233 0 229 228 America/New_York 2010-09-05 4277586 Portland Portland 37.0778 -97.31227 P PPL US KS 191 0 362 364 America/Chicago 2006-01-15 4305000 Portland Portland 37.12062 -85.44608 P PPL US KY 001 0 220 223 America/Chicago 2006-01-15 4305001 Portland Portland 38.26924 -85.8108 P PPL US KY 111 0 135 138 America/Kentucky/Louisville 2006-01-15 4305002 Portland Portland 38.74812 -84.44772 P PPL US KY 191 0 265 266 America/New_York 2006-01-15 404289 Portland Portland Portlend,Портленд 38.71088 -91.71767 P PPL US MO 027 0 170 172 America/Chicago 2010-01-29 4521811 Portland Portland Portlend,Портленд 39.00341 -81.77124 P PPL US OH 105 0 187 188 America/New_York 2010-01-29 4650946 Portland Portland Portlend,Портленд 36.58171 -86.51638 P PPL US TN 165 11480 244 245 America/Chicago 2011-05-14 4720131 Portland Portland Portlend,Портленд 27.87725 -97.32388 P PPL US TX 409 15099 13 11 America/Chicago 2011-05-14 4841001 Portland Portland Portlend,Портленд 41.57288 -72.64065 P PPL US CT 007 5862 24 27 America/New_York 2011-05-14 4871855 Portland Portland 43.12858 -93.12354 P PPL US IA 033 35 327 330 America/Chicago 2011-05-14 4906524 Portland Portland 41.66253 -89.98012 P PPL US IL 195 0 190 190 America/Chicago 2006-01-15 5006314 Portland Portland Portlend,Портленд 42.8692 -84.90305 P PPL US MI 067 3883 221 223 America/Detroit 2011-05-14 5746545 Portland Portland 45.52345 -122.67621 P PPLA2 US OR 051 583776 12 15 America/Los_Angeles 2011-05-14 Ionia Lyons Toponym # Locations © 2013 Jason M Baldridge Text Analytics Summit, June 2013 28 Portland White Pigeon Ann Arbor Detroit Ionia Lyons Portland White Pigeon 1 >7 >4 >15 >17 1 Friday, September 5, 14
  • 48. Spatial minimality often fails I moved from Encinitas, CA, a nice beach town in North San Diego County to Asheville, NC. By far, Ashville is more hip, especially West Asheville. Asheville has a lot in common with Portland. Austin, I've never been to so I cannot comment. But what makes a place cool and hip, in my opinion are that give a area "punch". There are a lot of ingredients. One is geography. Add a college or university (and all that they bring- and draw), good restaurants, a good music scene, a progressive attitude and tolerance. Hmmm. I'm sure there are many more to ponder. But that's my start. Oh, lots of bars! From: http://www.city-data.com/forum/austin/1694181-what-makes-city-like-austin-portland-3.html City-data.com incorrectly marks “West” and “Portland” as the cities in Texas -- presumably because of their textual and spatial proximity to “Austin”. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 29 Friday, September 5, 14
  • 49. Spatial minimality often fails I moved from Encinitas, CA, a nice beach town in North San Diego County to Asheville, NC. By far, Ashville is more hip, especially West Asheville. Asheville has a lot in common with Portland. Austin, I've never been to so I cannot comment. But what makes a place cool and hip, in my opinion are that give a area "punch". There are a lot of ingredients. One is geography. Add a college or university (and all that they bring- and draw), good restaurants, a good music scene, a progressive attitude and tolerance. Hmmm. I'm sure there are many more to ponder. But that's my start. Oh, lots of bars! From: http://www.city-data.com/forum/austin/1694181-what-makes-city-like-austin-portland-3.html City-data.com incorrectly marks “West” and “Portland” as the cities in Texas -- presumably because of their textual and spatial proximity to “Austin”. But: it is clear from the text that Portland, Oregon and Austin, Texas are the referents, though their states are never mentioned and are far from the other locations! © 2013 Jason M Baldridge Text Analytics Summit, June 2013 29 Friday, September 5, 14
  • 50. Toponym classifiers Strategy: build a textual classifier per toponym by obtaining indirectly labeled examples from Wikipedia. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 30 Friday, September 5, 14
  • 51. Toponym classifiers Strategy: build a textual classifier per toponym by obtaining indirectly labeled examples from Wikipedia. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 30 Friday, September 5, 14
  • 52. Toponym classifiers Strategy: build a textual classifier per toponym by obtaining indirectly labeled examples from Wikipedia. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 30 Friday, September 5, 14
  • 53. Toponym classifiers Strategy: build a textual classifier per toponym by obtaining indirectly labeled examples from Wikipedia. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 30 Friday, September 5, 14
  • 54. Toponym classifiers Strategy: build a textual classifier per toponym by obtaining indirectly labeled examples from Wikipedia. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 30 Friday, September 5, 14
  • 55. Toponym classifiers Strategy: build a textual classifier per toponym by obtaining indirectly labeled examples from Wikipedia. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 30 Friday, September 5, 14
  • 56. Toponym classifiers Strategy: build a textual classifier per toponym by obtaining indirectly labeled examples from Wikipedia. P(Portland-OR|music) > P(Portland-ME|music) P(Portland-OR|wharf ) < P(Portland-ME|wharf ) © 2013 Jason M Baldridge Text Analytics Summit, June 2013 30 Friday, September 5, 14
  • 57. Results: disambiguating toponyms TR-CoNLL Reuters News Texts August 1996 Perseus Civil War Corpus Books Late 19th Century © 2013 Jason M Baldridge Text Analytics Summit, June 2013 31 Average error distance Accuracy Average error distance Accuracy Population SPIDER (spatial minimality) WISTR (Wiki supervised) SPIDER +WISTR 216 81.0 1749 59.7 2180 30.9 266 57.5 279 82.3 855 69.1 430 81.8 201 85.9 Take-home message: text classifiers are very effective & can be boosted by spatial minimality algorithms. Friday, September 5, 14
  • 58. Identifying, disambiguating, and displaying toponyms © 2013 Jason M Baldridge Text Analytics Summit, June 2013 32 Friday, September 5, 14
  • 59. Back to grounding Grounding often involves connecting text to knowledge sources and other modalities (image, video) & bootstrapping. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 33 Friday, September 5, 14
  • 60. Back to grounding Grounding often involves connecting text to knowledge sources and other modalities (image, video) & bootstrapping. Also, they can help us create models for deeper aspects of language, such as syntactic structure and logical form. © 2013 Jason M Baldridge Text Analytics Summit, June 2013 33 Friday, September 5, 14
  • 61. Lexical brain decoding [Yarkoni, Poldrack, Nichols, Van Essen & Wager (2011)] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 34 Friday, September 5, 14
  • 62. Lexical brain decoding [Yarkoni, Poldrack, Nichols, Van Essen & Wager (2011)] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 34 Friday, September 5, 14
  • 63. He says, she says http://www.tweetolife.com/gender/ Friday, September 5, 14
  • 64. Temporality of words, by hour http://www.tweetolife.com/hour/ © 2013 Jason M Baldridge Text Analytics Summit, June 2013 36 Friday, September 5, 14
  • 65. Temporality of words, by hour http://www.tweetolife.com/hour/ © 2013 Jason M Baldridge Text Analytics Summit, June 2013 36 Friday, September 5, 14
  • 66. Temporality of expressions, by day: http://www.google.com/trends © 2013 Jason M Baldridge Text Analytics Summit, June 2013 37 Friday, September 5, 14
  • 67. Temporality of expressions, by day: http://www.google.com/trends © 2013 Jason M Baldridge Text Analytics Summit, June 2013 37 Friday, September 5, 14
  • 68. Temporality of expressions, by year: http://ngrams.googlelabs.com/ © 2013 Jason M Baldridge Text Analytics Summit, June 2013 38 slave trenches aircraft war Friday, September 5, 14
  • 69. Temporal resolution [Kumar, Lease, and Baldridge 2011] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 39 2000 BC 0 AD 2000 AD 4000 BC Friday, September 5, 14
  • 70. Temporal resolution [Kumar, Lease, and Baldridge 2011] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 39 2000 BC 0 AD 2000 AD 4000 BC Friday, September 5, 14
  • 71. Temporal resolution [Kumar, Lease, and Baldridge 2011] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 39 2000 BC 0 AD 2000 AD 4000 BC Friday, September 5, 14
  • 72. Temporal resolution [Kumar, Lease, and Baldridge 2011] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 39 2000 BC 0 AD 2000 AD 4000 BC Friday, September 5, 14
  • 73. Temporal resolution [Kumar, Lease, and Baldridge 2011] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 39 2000 BC 0 AD 2000 AD 4000 BC Friday, September 5, 14
  • 74. Temporal resolution [Kumar, Lease, and Baldridge 2011] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 39 2000 BC 0 AD 2000 AD 4000 BC Friday, September 5, 14
  • 75. More modalities: videos [Motwani & Mooney, 2012] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 40 Friday, September 5, 14
  • 76. Beyond word co-occurences for vector-space models bear boat car cow hadoop snow water wrench 3 234 42 4 1 2 325 0 © 2013 Jason M Baldridge Text Analytics Summit, June 2013 41 beach Friday, September 5, 14
  • 77. Beyond word co-occurences for vector-space models bear boat car cow hadoop snow water wrench 3 234 42 4 1 2 325 0 © 2013 Jason M Baldridge Text Analytics Summit, June 2013 41 beach Friday, September 5, 14
  • 78. Beyond word co-occurences for vector-space models bear boat car cow hadoop snow water wrench 3 234 42 4 1 2 325 0 © 2013 Jason M Baldridge Text Analytics Summit, June 2013 41 beach Friday, September 5, 14
  • 79. Beyond word co-occurences for vector-space models bear boat car cow hadoop snow water wrench 3 234 42 4 1 2 325 0 © 2013 Jason M Baldridge Text Analytics Summit, June 2013 41 beach Friday, September 5, 14
  • 80. Beyond word co-occurences for vector-space models bear boat car cow hadoop snow water wrench 3 234 42 4 1 2 325 0 © 2013 Jason M Baldridge Text Analytics Summit, June 2013 41 beach Friday, September 5, 14
  • 81. Beyond word co-occurences for vector-space models bear boat car cow hadoop snow water wrench 3 234 42 4 1 2 325 0 © 2013 Jason M Baldridge Text Analytics Summit, June 2013 41 beach Friday, September 5, 14
  • 82. Beyond word co-occurences for vector-space models bear boat car cow hadoop snow water wrench 3 234 42 4 1 2 325 0 © 2013 Jason M Baldridge Text Analytics Summit, June 2013 41 beach Friday, September 5, 14
  • 83. Beyond word co-occurences for vector-space models bear boat car cow hadoop snow water wrench 3 234 42 4 1 2 325 0 © 2013 Jason M Baldridge Text Analytics Summit, June 2013 41 beach Friday, September 5, 14
  • 84. Beyond word co-occurences for vector-space models bear boat car cow hadoop snow water wrench 3 234 42 4 1 2 325 0 © 2013 Jason M Baldridge Text Analytics Summit, June 2013 41 beach Friday, September 5, 14
  • 85. Combining distributional models with logics Erk (2013): “Towards a semantics for distributional representations.” Garrette et al (2012): “A formal approach to linking logical form and vector-space lexical semantics” Beltagy et al (2013): “Montague Meets Markov: Deep Semantics with Probabilistic Logical Form” © 2013 Jason M Baldridge Text Analytics Summit, June 2013 42 Friday, September 5, 14
  • 86. Multi-component structured vector-space models the children visit the beach visit children beach © 2013 Jason M Baldridge Text Analytics Summit, June 2013 43 Agent Patient Friday, September 5, 14
  • 87. Language learning in context [Kim & Mooney, 2013] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 44 Friday, September 5, 14
  • 88. Language learning in context [Kim & Mooney, 2013] © 2013 Jason M Baldridge Text Analytics Summit, June 2013 44 Friday, September 5, 14
  • 89. All your meaning are belong to us Friday, September 5, 14
  • 90. All your meaning are belong to us Friday, September 5, 14
  • 91. All your meaning are belong to us Friday, September 5, 14
  • 93. Open Source Software (Scala/Java) Junto - label propagation https://github.com/scalanlp/junto Textgrounder - document geolocation https://github.com/utcompling/textgrounder Fieldspring - toponym resolution https://github.com/utcompling/fieldspring Low-resource POS tagging https://github.com/dhgarrette/low-resource-pos- tagging-2013 Updown - polarity classification https://github.com/scalanlp/junto OpenNLP - machine learning / NLP http://opennlp.apache.org/ Friday, September 5, 14
  • 94. Open Source Software (Scala/Java) Junto - label propagation https://github.com/scalanlp/junto Textgrounder - document geolocation https://github.com/utcompling/textgrounder Fieldspring - toponym resolution https://github.com/utcompling/fieldspring Low-resource POS tagging https://github.com/dhgarrette/low-resource-pos- tagging-2013 Updown - polarity classification https://github.com/scalanlp/junto OpenNLP - machine learning / NLP http://opennlp.apache.org/ Chalk - NLP https://github.com/scalanlp/chalk Nak - machine learning https://github.com/scalanlp/nak Breeze - linear algebra https://github.com/scalanlp/nak ScalaNLP Friday, September 5, 14
  • 95. Open Source Software (Scala/Java) Junto - label propagation https://github.com/scalanlp/junto Textgrounder - document geolocation https://github.com/utcompling/textgrounder Fieldspring - toponym resolution https://github.com/utcompling/fieldspring Low-resource POS tagging https://github.com/dhgarrette/low-resource-pos- tagging-2013 Updown - polarity classification https://github.com/scalanlp/junto OpenNLP - machine learning / NLP http://opennlp.apache.org/ Chalk - NLP https://github.com/scalanlp/chalk Nak - machine learning https://github.com/scalanlp/nak Breeze - linear algebra https://github.com/scalanlp/nak ScalaNLP Friday, September 5, 14
  • 96. This research was sponsored by: Grant from the Grant: W911NF-10-1-0533 Morris Memorial Trust Fund Final note: Whitman had it right many years ago! - Walt Whitman, A Song of the Rolling Earth (in Leaves of Grass) Friday, September 5, 14
  • 97. Document geolocation Supervision - documents labeled with latitude & longitude Methods - Language Modeling for Information Retrieval Code - Textgrounder: https://github.com/utcompling/textgrounder Publications - Stephen Roller, Mike Speriosu, Sarat Rallapalli, Benjamin Wing and Jason Baldridge. 2012. Supervised Text-based Geolocation Using Language Models on an Adaptive Grid. EMNLP 2012. Jeju, Korea. - Benjamin Wing and Jason Baldridge. 2011. Simple supervised document geolocation with geodesic grids. In Proceedings of ACL HLT 2011. Friday, September 5, 14
  • 98. Toponym resolution Supervision - indirectly acquired toponym annotations using a gazeteer and geo-annotated Wikipedia Methods - logistic regression - named entity recognition Code - Fieldspring: https://github.com/utcompling/fieldspring Publications - Mike Speriosu and Jason Baldridge. Text-Driven Toponym Resolution using Indirect Supervision. To appear in proceedings of ACL 2013. Friday, September 5, 14