37.
 1  Alabama               Male                2009  Percent  0.51
 2  Alabama               Female              2009  Percent  0.49
 3  Alabama               Total less than 18  2009  Percent  1.00
 4  Alaska                Male                2009  Percent  0.52
 5  Alaska                Female              2009  Percent  0.48
 6  Alaska                Total less than 18  2009  Percent  1.00
 7  Arizona               Male                2009  Percent  0.51
 8  Arizona               Female              2009  Percent  0.49
 9  Arizona               Total less than 18  2009  Percent  1.00
10  Arkansas              Male                2009  Percent  0.51
11  Arkansas              Female              2009  Percent  0.49
12  Arkansas              Total less than 18  2009  Percent  1.00
13  California            Male                2009  Percent  0.51
14  California            Female              2009  Percent  0.49
15  California            Total less than 18  2009  Percent  1.00
16  Colorado              Male                2009  Percent  0.51
17  Colorado              Female              2009  Percent  0.49
18  Colorado              Total less than 18  2009  Percent  1.00
19  Connecticut           Male                2009  Percent  0.51
20  Connecticut           Female              2009  Percent  0.49
21  Connecticut           Total less than 18  2009  Percent  1.00
22  Delaware              Male                2009  Percent  0.51
23  Delaware              Female              2009  Percent  0.49
24  Delaware              Total less than 18  2009  Percent  1.00
25  District of Columbia  Male                2009  Percent  0.51
26  District of Columbia  Female              2009  Percent  0.49
38.
39. [Data table repeated from slide 37]
Editor's Notes
Big data for the greater good. To start, I’d like you all to help me with a little exercise - I want you to close your eyes and just picture “data”. That’s right, go ahead, let it wash over you - what do you picture? Well, if you’re like most people, you probably picture
boring, lifeless, spreadsheets
or, even worse, the anonymous tunnel of binary, as if we all read data like we’re in the Matrix or something. And this is sad. I hate to admit that when I hear “data” or “big data” that’s what I picture too, and it’s a shame, because data is so much more personal than that. To realize how much big data has touched our lives, you only have to think back to a time before big data, to the dark ages of humanity, back when renting a movie
looked like this. Oh god, it was horrible! You had no idea what movies were good or bad, you didn’t know what you’d like or not, and it was a grim, depressing time for all humanity.
But thankfully now we live in a world where, with the click of a button, I can see a whole array of movies tailored for me. I know what’s good and bad, and I can make better decisions about how to spend my Friday night. This is all because of the data that Netflix collects about what I like and what you like, and uses to make better decisions. And it’s not just movies; big data is driving
how we get from Point A to point B
how we decide what products to buy
The CDC is constantly on the lookout for the next killer flu that’s going to wipe humanity off the face of the earth. Not a good thing. To do this, they place thousands of people in hospitals around the world monitoring cases coming in and out to make sure they are ready if they see the first sign of an outbreak.
The trouble is, this is hugely expensive and requires a lot of people. So the CDC asked, “man, is there a better, easier way to get this real-time data about when people have the flu?”
Enter our friend Kim. Whatever you think of her talents, she is quite good at telling when she has the flu. And it turns out that other people on Twitter are too, which means that if you monitor the flood of tweets coming in every minute, all of the streams of people talking about everything from Breaking News to Breaking Bad...
you can make this very-boring-but-very-important graph, which shows that tweets about the flu run ahead of actual CDC-measured levels of flu outbreaks. That’s incredible.
And if you get fancy, you can even visualize this conversation to see where and when the flu is happening in real time. Yes, this is how novel sources of data, which are right out there for the picking, can be used to help tackle real-world problems. But the real heroes of that story to me aren’t the CDC or Kim, no,
Why, it’s the data scientists! The data engineers! Call them statisticians, analysts, whatever you call them, they’re the people who turn large streams of “big data” into decisions. And the coolest part, the truly coolest part to me, is that they don’t just work on data 9-5, they get together and do this in their free time at “hackathons” and coding competitions.
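To make the flu example above a little more concrete, here is a minimal sketch of how one might check whether one signal "leads" another: shift the two series against each other and see which shift gives the highest correlation. This is not the CDC's or anyone's actual method. The function names (`pearson`, `best_lead`) and both series are made up for illustration; the numbers are synthetic, not real tweet or CDC counts.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_lead(leading, lagging, max_lag):
    """Correlate `leading` against `lagging` shifted by 0..max_lag steps.

    Returns the shift with the highest correlation, plus all scores.
    """
    scores = {}
    for lag in range(max_lag + 1):
        a = leading[: len(leading) - lag]  # earlier values of the leading series
        b = lagging[lag:]                  # later values of the lagging series
        scores[lag] = pearson(a, b)
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic data (an assumption for illustration): the "cdc" series is
# simply the "tweets" series delayed by two time steps.
tweets = [1, 2, 4, 8, 5, 3, 2, 1, 1, 1]
cdc = [1, 1, 1, 2, 4, 8, 5, 3, 2, 1]

lag, scores = best_lead(tweets, cdc, max_lag=4)
print(f"tweets lead by {lag} time steps")
```

Real data would be much noisier, and a high lagged correlation alone would not prove that tweets predict outbreaks, but the basic idea of comparing shifted series is the same.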
I remember being at my first hackathon and thinking this is how we’re going to change the world. I’m sitting next to a guy with a Ph.D. in machine learning and another guy with incredible coding skills, and I thought, “this is how we’re going to change things - we don’t need our jobs to do this - we’re going to make things that are so important, so world-changing, they’re going to have so much impact!” And the things we made were so...
Unfulfilling! Here’s an app to park your car, another to find local deals. These apps are great, but they’re more of the same - apps that make very comfortable lives *ever so slightly* more comfortable. And one of the most exciting things to me about the “big” in big data is that it means “expansive” and that it’s touching everyone, including people like this
clean water NGO. These guys are trying to make the world better every day and, for the first time ever, they are awash in data. Data about surveys they do, about well locations, data about their finances - heck, even if they didn’t collect a single bit of data, groups like the World Bank and the White House are opening data that they could use. So they have this great opportunity to use this new resource of data, just like in the CDC example...
...but they have no one to help them do it. They can’t afford a data scientist, so all that potential gets lost.
So I founded a non-profit called DataKind that connects pro bono data scientists with social organizations. This gives data scientists a chance to have social impact and social organizations a chance to maximize their impact, and in the process we all get to live in a better world. Let me show you a few examples of what this looks like before I close:
First up is a group called DC Action for Children in Washington, DC. They are an amazing group that is tasked with assessing the wellbeing of children in the city. For the first time ever, the government has opened up access to huge amounts of data, data about education, data about social services, data about health. At first it sounds like they’re in a great situation - they’ve got a great question, they’ve got great data, until you realize the data looks like this:
And this is just one little snippet of tables and tables of information they have to make sense of. Yuk! How does a social organization without data skills turn this into something meaningful, something the world finds meaningful?
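As a rough illustration of the first step in making a table like the one on the slide usable, here is a minimal sketch that parses those flat rows and regroups them by state. The column names (`state`, `group`, `year`, `unit`, `value`) are my own guesses at what the columns mean, not labels from the original dataset, and this is not the code the volunteers actually wrote.

```python
import re
from collections import defaultdict

# Assumed row layout: row number, state, group, year, unit, value.
# The state name can be several words, so the group labels anchor the match.
ROW = re.compile(
    r"^(?P<row>\d+)\s+(?P<state>.+?)\s+"
    r"(?P<group>Male|Female|Total less than 18)\s+"
    r"(?P<year>\d{4})\s+(?P<unit>\w+)\s+(?P<value>\d+\.\d+)$"
)


def parse_rows(lines):
    """Turn raw text rows into dictionaries, skipping lines that don't match."""
    records = []
    for line in lines:
        match = ROW.match(line.strip())
        if match is None:
            continue
        records.append(
            {
                "state": match["state"],
                "group": match["group"],
                "year": int(match["year"]),
                "unit": match["unit"],
                "value": float(match["value"]),
            }
        )
    return records


def pivot_by_state(records):
    """Regroup long-format records into {state: {group: value}}."""
    table = defaultdict(dict)
    for record in records:
        table[record["state"]][record["group"]] = record["value"]
    return dict(table)


# A few rows copied from the slide.
SAMPLE = [
    "4 Alaska Male 2009 Percent 0.52",
    "5 Alaska Female 2009 Percent 0.48",
    "6 Alaska Total less than 18 2009 Percent 1.00",
    "25 District of Columbia Male 2009 Percent 0.51",
    "26 District of Columbia Female 2009 Percent 0.49",
]

by_state = pivot_by_state(parse_rows(SAMPLE))
for state, groups in by_state.items():
    print(state, groups)
```

Once the rows are grouped like this, each state (or, in DC Action's case, each neighborhood) becomes one record that can be joined to a map or a chart.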
Well, they teamed up with these guys - Sisi Wei and Emily Chow from the Washington Post, Max Richman, and a slew of other pro bono data scientists in DC. Together, they turned this
into this. This is a screenshot of an interactive app where you can roll over each neighborhood to see demographic info, educational information and a slew of other info that DC Action is using to get a better handle on neighborhood health. What’s immediately apparent is that you can *see* things - things you could never see in that opaque block of data. Moreover, this tool is so useful that the mayor of DC unveiled it to say “this is how we have to tackle child wellbeing - using big data.”
Now I know what some of you may be thinking: “pssht, you took some data and put it on a map? I could do that.” And you’re absolutely right, you could! What’s so simple for us to do can be transformative for these organizations.
Another example comes from the NYC Parks department. They have info about every single tree in the “urban forest” - such a beautiful name for the city’s trees - but they didn’t know the answers to even the most basic questions, namely, when we prune a tree to prevent future disasters, does it help? We suspect it does, but can we use our data to show it?
So they teamed up with this guy, Brian D’Alessandro, who spends all day at an advertising company figuring out, if he shows you an ad, do you do something different in the future? Well it turns out that that same model and algorithm can be used to answer “if I prune a tree, are there fewer disasters in the future?”
What you’re seeing here is a visualization that volunteers made of the density of trees in NY, but what’s more important than this is that Brian helped them calculate a number - 32%. If you prune trees in NY, there are 32% fewer emergencies on those blocks than on the blocks where you didn’t. That was the first time that number had been calculated, it’s helping the city become more effective, and now urban forestry programs from other cities are asking New York, “how can we do the same thing?” That’s fantastic.
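For a sense of what a figure like "32% fewer emergencies" means arithmetically, here is a minimal sketch that compares emergency rates on pruned and unpruned blocks. It is only a naive rate comparison, not Brian's actual model, which would also need to account for how pruned and unpruned blocks differ in other ways. The function name and the block and emergency counts are hypothetical, picked so the arithmetic is easy to follow.

```python
def relative_reduction(treated_events, treated_units, control_events, control_units):
    """Fractional drop in event rate for treated units versus control units."""
    treated_rate = treated_events / treated_units
    control_rate = control_events / control_units
    return 1.0 - treated_rate / control_rate


# Hypothetical counts, NOT real NYC Parks data:
# 68 emergencies across 1,000 pruned blocks,
# 100 emergencies across 1,000 unpruned blocks.
reduction = relative_reduction(
    treated_events=68, treated_units=1000,
    control_events=100, control_units=1000,
)
print(f"{reduction:.0%} fewer emergencies on pruned blocks")
```

The same comparison is what an advertising team makes when asking whether people who saw an ad behave differently from people who did not, which is why the approach carried over.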
And I want you to note one last important thing - I didn’t say the White House commissioned Google to work with UNICEF to do this. Nope. Volunteers, in their spare time on nights and weekends, worked with visionary non-profits and social organizations to do this.