This is my experience of going to my first data hackathon, GovHack 2015, and what it taught me.
A hackathon is an event where you gather a heap of resources and people, form small teams and try to deliver as fully realised a solution as possible to a set theme or problem in a short, intense amount of time.
Normally a hackathon is focused on delivering working software, but in a data hackathon you work from a heap of datasets and try to deliver something of value. That can be working software, but often it is something else. For this reason non-coders can easily participate in a data hack.
Another difference is that a hackathon normally revolves around creating and validating some sort of business idea (be that profit or non-profit).
Data hackathons are about understanding and realising value from data, and that value can often just be delivering better access to the information the data represents.
Going To A Data Hack - GovHack 2015
1. Going to a Data Hackathon
Or How I Learned to Stop Worrying About Perfect and Learned to Love Good Enough
Andrew Saul – Gaming Technology
2. What is a data hackathon?
Hackathons:
• Entrepreneurship
• Focus on working code & the commercial viability of product
• Need at least one coder in a team
• Great code gets you a long way
Data hackathon:
• Focus on the datasets & the value from them
• Value = Something people will use (not buy)
• Don’t need any coders
• Need at least one person who understands how to use datasets
• Great code won’t get you very far
3. What is GovHack?
• Lots of govt data freely available (open data)
• Lack of awareness and usage
• GovHack aims to:
• Identify issues with datasets & access
• Increase public awareness
• Inspire ventures using govt data (profit and non-profit)
• Run by data.gov.au and Open Knowledge Australia
4. GovHack datasets
• Datasets from federal, state and local govt
• Wide range of data. Some examples
• Traffic offences
• Drinking fountain locations
• Mining exploration expenditure
• ABC Broadcast Data Archive 1978-2011
• Census 2011
• National Drugs Strategy Household Survey
• Landsat 5 and 7 surface reflectance archive
• WWI diary and letter transcripts
Plus many, many more
5. GovHack prizes
• Lots of prizes from lots of different govt bodies
• Can win more than one prize
• A few general prizes
e.g. Best Digital Transformation Hack
• Most prizes relate to a specific dataset
e.g. The intellectual property data bounty: Must use IP Government Open Data
6. What do you make?
From GovHack website:
A “hack” can be anything that uses government data in a clever or creative way. It might be an application, an analysis, a data viz, a 3D printed or laser cut project, a digitisation project, artworks or anything else that fits the spirit of GovHack.
Examples of 2015 winners
• Story telling/art – Remembrance
• Augmented maps - AusTrails.org
• API driven apps - Health Buddy
• Games - Question Time: A game of policy
• Specialised search tools - Neuron
• Data visualisations, with code (Gender Equality) and existing tools (Synergising Synergies for Sitizens)
7. Timeline: Before the event
My team before the event:
• Looked at previous winners - Hindsight: too much
• Looked at datasets - Hindsight: not enough
• Posted ideas in a Trello board
8. Timeline: Friday night
1. Opening ceremony and announcement of prizes
2. Venue = Gorgeous, internet = atrocious
3. Listed project ideas we had previously and came up with new ones as a team.
4. Discussion & voting to cut list down. Our criteria: Working code
Hindsight: Too much traditional hackathon thinking
9. Timeline: Saturday
1. Lost best web dev on Friday to illness
2. Decided on estimation game web app and assigned team roles
3. Plan: Front end in D3.js, Backend (database) in node.js
4. Node.js backend crashed that night.
5. Late that night, backend saved via Google Fusion Tables; dancing ensued
10. Timeline: Sunday
1. Manually populated question dataset
2. Finalised value offering: data visualisation on the results screen & input method. Discarded a lot of game elements.
Hindsight: Should’ve had this focus all along
3. Couldn’t get player responses writing to the Fusion Table. Used fake user responses via a random number generator
Hindsight: Should’ve done this as our first iteration
4. Started video
Hindsight: Should’ve started planning this on Friday and used it as a key piece of our value statement
5. Submitted. Team was tired but really proud of what we’d made.
11. Consensus: What it does
• Estimation game based on idea of Wits & Wagers board game
• Players move a slider to guess a statistic and get points for how close they are to the correct answer
• Players see how their answer compared to other players
12. Consensus: How it’s made
• The frontend (user interface) is made with D3.js.
• Backend (database) is in a Google Drive app called Fusion Tables.
We couldn’t get the write back from the web app working in time, but Fusion Tables can read and write data much like a normal database. Super easy to use and set up.
13. Consensus: The value it gives
• Originally wanted to make a great game; dataset engagement was to be a result of playing the game.
• Ended up focusing on how to engage people with data and then fitted this into a game experience.
• Consensus value proposition:
• The player can see how well they know an area of society
• Players see how their knowledge compares to others
• Govt departments can infer public perceptions & awareness of areas of society
19. What we won
1st prize State & Local awards:
• Best Use of Science Data on the Queensland Open Data Portal
• BCC Mashup Prize
Highly Commended (4th place) National:
• Best Open Government Data Hack
20. What I learned
I’m better at defining the problem that we need to solve, rather than fitting a solution to a problem
I’m better at identifying and eliminating scope
Gained a new appreciation of how much you can learn just by “faking it”
21. What I’d do differently aka “How to win a data hack”
• Understand the datasets: they are your problem
22. What I’d do differently aka “How to win a data hack”
• Focus on a fully thought out solution NOT working code
• Don’t try to fit a solution to datasets
23. What I’d do differently aka “How to win a data hack”
• Decide on your tools in advance. Which tools you pick doesn’t matter nearly as much as making sure you can all contribute
• Use the video as your main communication tool: it takes the load off working code
24. More info (read: go to GovHack 2016)
GovHack 2016 on 29-31 of July. Registrations open soon.
Find out more at: www.govhack.org/
Editor's Notes
https://www.govhack.org/
If you can’t read the fine print then you’re missing out on a movie reference that will most likely be lost on those under a certain age.
A push is being made to make govt data open to the public, which has resulted in lots of datasets now being available.
But to date there has been low awareness of these datasets and little value generated from these resources.
GovHack was formed to:
Help the departments clean up their datasets and identify issues that were stopping people using them more frequently
Increase public awareness of these resources
Try to kick-start some for-profit and non-profit ventures around these resources so the govt could start to see some return from this initiative
https://www.data.gov.au/ is the home of the overall open data initiative and there are state based sites such as QLD’s https://data.qld.gov.au/
Open Knowledge Australia is the Australian arm of the global Open Knowledge network, a non-profit that supports the open data movement all over the world. They run GovHack. http://au.okfn.org/
Only a subset of the available datasets is selected for GovHack each year. That said, there were well over a hundred for GovHack 2015 (https://www.govhack.org/2015-data/).
There’s a huge range of data in both source and composition.
Data from all levels of govt bodies
Wide range of types of datasets. Some examples are: sensor readings, satellite imagery, questionnaires, letters, images, video, financial records, GPS co-ordinates, etc.
Data available many different ways: APIs, web portals, simple flat files, queries
Lots of prizes are on offer. Can win more than one prize for each entry.
Best Digital Transformation Hack: Open category for how government can be brought into the 21st Century via digital services.
The intellectual property data bounty: Develop an easy way for non-experts to access and use the IP Government Open Data on data.gov.au to find out where, who and what IP exists in Australia.
Find the winners from 2015 here: https://www.govhack.org/2015-winners/
Remembrance: A website that retells the ANZAC story, both at the front and at home. https://hackerspace.govhack.org/content/remembrance
AusTrails.org: Aggregator of trail data from community that’s run on OpenStreetMap. https://hackerspace.govhack.org/content/austrailsorg
Health Buddy: Helps you find health services matched to your needs and plan trips. Govt departments can use it to see usage stats. https://hackerspace.govhack.org/content/health-buddy
Question Time: A game of policy: 2-player game where you try to guess where an MP’s voting record lies in relation to a portfolio area. https://hackerspace.govhack.org/content/question-time-game-policy
Neuron: Mind-map of inter-connected IP data to aid discovery. Links to a purpose built IP social network. https://hackerspace.govhack.org/content/neuron-connecting-minds
Gender Equality: Data visualisation of gender equality data from the Workplace Gender Equality Act dataset using D3. https://hackerspace.govhack.org/node/542
Synergising Synergies for Sitizens: Tool in Tableau Public to work out which suburbs are best-placed for rooftop solar investment. https://hackerspace.govhack.org/content/synergising-synergies-sitizens
Do: Get familiar with the datasets and think about what tools you’ll use
Don’t: Start building something to take in with you. You are painting yourself into a corner to start with.
Before the event you should also make triple sure that whatever tech you are taking in is working well and has all the updates you need. A couple of our team spent far too long getting their laptops working for these very reasons.
Whilst you shouldn’t start building something before you go in, at least have a conversation about common tools. You should get set up with at least a common messaging service and project management tool. We used Skype and Trello, but again wasted some time getting these set up at the event.
Just because you might not have decided yet what you are going to build doesn’t mean you don’t have at least a shortlist of development tools you might use. It’s a good idea to install all of these and get them working before you head in. Much easier to just not use something than it is to install and configure something for the first time at the event.
The UQ engineering block is a cathedral of wood, steel and glass. The UQ wifi is by contrast a poorly built mud hut. We struggled a lot with it over the course of the event. I ended up only being able to get it working on my phone, which I then had to tether to my laptop.
Consider backup internet options you could use before you go in and try to limit how much you will rely upon the internet at the venue.
Our main criterion was getting working code, and this was a mistake. This is more traditional hackathon thinking and didn’t fit well with the idea behind GovHack. I’ll talk more about this later on in the presentation.
Our best web developer was struck down by sickness on Friday night. We had originally started selecting ideas based on her considerable skills. In hindsight we didn’t readjust enough to take into account losing her from the team.
We ended up with two ideas:
An estimation game (based on the board game Wits & Wagers). This is an idea we’d had before the event.
A map-based tool to help you assess the safety of a neighbourhood. This is an idea we came up with on Friday night.
To decide we quickly sketched out a dev plan for each to see which was most achievable. We looked at:
Time we had
Skill in the team
What the datasets we would use were like and what prizes we’d be eligible for
In the end the estimation game won out and we went to work.
The Node.js backend crashed and burned because we’d selected it based on the skills of the original team. The developer working on it was toast, so we sent him home. The rest of the team regrouped to think about what we could salvage from the wreckage. After some research we came across Google Fusion Tables. After an hour of hacking about, our remaining dev raised his hands and announced it was working. We danced a bit, then called it a day as it was really late and we were really tired.
One of the very best moments at GovHack was when our dejected dev, who’d been working on the Node.js backend, came through the door and we surprised him with a working website. Even though it wasn’t his fault, he had felt like he’d let the team down, so he was extra happy to see we were back in the game.
The whole team had been given a shot of energy from this brush with disaster and it carried us through the rest of the day.
We discarded many of the game elements we’d originally planned and focused instead on delivering value from the data. In hindsight this should’ve always been the focus as it was true to the data we had.
We couldn’t get the player responses to write back to our Fusion Table database, even though that functionality does exist, so we used fake response data. In hindsight it was way too optimistic to think we were going to have time to get hundreds of players through the game before submission, and not having response data earlier held up designing the results screen. Even though it was fake data, we still needed to tweak it to make it believable and have it produce interesting results. Even this took a little more time than we had anticipated.
We started the video late as we’d been told by the organisers not to focus on it. In reality you should start planning it on Friday. It’s one of your best tools to explain what you’ve built, and the better you can make it, the less you need working code.
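Here’s a minimal sketch of how fake player responses like ours could be generated. It isn’t the code we used: the normal distribution, the `spread` and `bias` parameters and the name `fake_responses` are all assumptions for illustration. The `bias` term is one way to make a fake crowd believably over- or under-estimate, which is what makes a results screen interesting to look at.

```python
import random


def fake_responses(true_value, n, spread, lo, hi, bias=0.0, seed=None):
    """Generate n hypothetical player estimates for one question.

    Assumption: estimates are normally distributed around the true answer,
    shifted by `bias` (a crowd that tends to over- or under-estimate), with
    standard deviation `spread`, then clamped to the slider range [lo, hi].
    """
    rng = random.Random(seed)  # seeded so the same fake crowd can be regenerated
    responses = []
    for _ in range(n):
        guess = rng.gauss(true_value + bias, spread)
        responses.append(min(hi, max(lo, guess)))  # keep every guess on the slider
    return responses


# e.g. 300 fake players who overestimate a 42% statistic by about 10 points
responses = fake_responses(42.0, 300, spread=15, lo=0, hi=100, bias=10, seed=1)
```

Seeding the generator matters more than it looks: every time you tweak the results screen you want to see the same crowd, so changes in the picture come from your design and not from new random numbers.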
How to play Wits & Wagers: https://www.youtube.com/watch?v=6FbGRahrAhI
Our entry
Hackerspace entry: http://2015.hackerspace.govhack.org/content/consensus
Website (which is now broken): http://www.consensusquiz.com/
Video: https://www.youtube.com/watch?v=Q994vWVLNRM
Google Fusion tables: https://sites.google.com/site/fusiontablestalks/stories
Data Driven Documents (D3.js): https://d3js.org/
D3.js is a really flexible and easy to use data visualisation tool for the web. It’s great at one thing: bespoke data vizs. It can’t be relied upon for dashboard like functionality though.
Google Fusion Tables are even easier to set up and use, and many people use them to host the data from their web vizs for this reason. Now that Google have expanded their data offerings you have a wide range of free, robust, and easy to use options:
Google big data tools: https://cloud.google.com/products/big-data/
Google cloud data tools: https://cloud.google.com/products/storage/
There’s also a heap of other tools that you can use to get a prototype working fast. A good place to start is the resources in the GovHack Developer Toolkit: http://govhack-toolkit.readthedocs.io/
We had originally wanted to make a great game that used datasets in an interesting way. The engagement with those datasets, we thought, would come as a result of people enjoying the game. We had this the wrong way around.
We got a much better result when we focused on how to engage people with the datasets in a way where they could understand them better, and we added game elements as a way to facilitate this process. We created another layer of data (in this case player estimations) that could help explain more fully an existing statistic.
Consensus helps players see how well their own perceptions of areas of society line up with the reality. They also get to see how their perceptions relate to other players.
For govt this is a tool they can use to see how the general public perceive areas of society.
Slider: We used a slider for these reasons:
You don’t have to come up with answers, so you can add questions much faster
You get a much more accurate measure of people’s understanding of a statistic. It’s not simply whether they get the right answer or not; you get to peek inside their perceptions of a statistic to see what they think about a subject
A slider feels much closer to the feeling of estimating than selecting a value or typing one in. We still have a number come up so it’s easy to see what you have selected.
We debated the slider’s start and end values and the starting position of the selector on the range. All these things can bias a player’s answer, especially if they are particularly unsure of it. In the end we had two types of ranges: whole numbers and 0-100%. The whole number range goes super high, and if we’d had time we wanted instead to hide everything but the selector and have the values come up at the bottom. This way we could have different ranges, but they would’ve been hidden from the player so as not to bias them.
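A rough sketch of that hidden-range idea, assuming the slider reports the selector position as a fraction between 0 and 1 and each question carries its own range behind the scenes. `SliderRange` and `slider_to_value` are hypothetical names, not code from our entry, and the linear mapping is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SliderRange:
    """Hypothetical per-question range, kept hidden from the player."""
    lo: float
    hi: float
    whole_numbers: bool  # True for count questions, False for 0-100% questions


def slider_to_value(position: float, slider: SliderRange) -> float:
    """Map a selector position in [0, 1] linearly onto the question's hidden range."""
    position = min(1.0, max(0.0, position))  # ignore positions off the end of the track
    value = slider.lo + position * (slider.hi - slider.lo)
    # Whole-number questions snap to integers; percentages keep one decimal place
    return float(round(value)) if slider.whole_numbers else round(value, 1)


percent = SliderRange(0, 100, whole_numbers=False)
population = SliderRange(0, 1_000_000, whole_numbers=True)
```

Because only the selector is drawn, a percentage question and a question in the millions look identical to the player, so the visible end values can’t anchor their guess. Only the value that comes up at the bottom changes.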
Your score: Originally this was to be the most important part of the game. In the revised version it plays a much smaller role. Your score relates to how many people you were closer to the answer than.
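The scoring rule can be sketched like this, assuming “closer” means absolute distance from the correct answer and that ties don’t earn a point. The function name and the tie rule are assumptions, not taken from our code.

```python
def score(player_answer, other_answers, correct):
    """Return how many other players this player was strictly closer to the answer than."""
    my_distance = abs(player_answer - correct)
    # A tie (same distance from the answer) doesn't earn a point under this assumed rule
    return sum(1 for a in other_answers if abs(a - correct) > my_distance)


# Correct answer 50, player guessed 48: beats 10, 45, 60 and 90, ties with 52
print(score(48, [10, 45, 60, 52, 90], 50))  # → 4
```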
“How close your answer was” bar: In the results screen we double up on the information about how close you were (the same info is in the histogram at the bottom), but we do so because how close you were is really important to visualise for the player so they start to get an idea of how their knowledge relates to the actual statistic.
It’s a really simple visualisation with a bar which is the same width as the histogram at the bottom, but it strips away all the other data and just gives the distance to the player so they can clearly see it. To reinforce this we also added a label to qualify how close they got to the answer.
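A sketch of the geometry behind that bar, assuming the histogram maps values to pixels linearly (the way a D3 linear scale does). The function name, measuring distance as a fraction of the whole range, and the label thresholds and wording are all made up for illustration.

```python
def closeness_bar(player_answer, correct, lo, hi, width_px):
    """Return (x_start, bar_width, label) for the "how close your answer was" bar.

    Values sit on the same linear scale as the histogram below it:
    lo maps to 0 px and hi maps to width_px.
    """
    span = hi - lo
    left = min(player_answer, correct)  # the bar starts at whichever value is smaller
    distance = abs(player_answer - correct)
    x_start = (left - lo) * width_px / span
    bar_width = distance * width_px / span
    # Hypothetical thresholds: distance as a fraction of the whole range
    fraction = distance / span
    if fraction <= 0.05:
        label = "Spot on!"
    elif fraction <= 0.15:
        label = "Close"
    elif fraction <= 0.35:
        label = "Not far off"
    else:
        label = "Way off"
    return x_start, bar_width, label


print(closeness_bar(40, 50, 0, 100, 600))  # → (240.0, 60.0, 'Close')
```

Using the same scale as the histogram is what lets the bar line up with it visually, so the player can read the same gap twice.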
Question scrolling text: As we were running out of screen real estate, and some of the questions were quite long, we put the question into a scrolling text box. It’s a hacky solution, but it gives a “breaking news” feel to the results screen.
The histogram: We have already shown the player how close they were to the correct answer and we show that again in the histogram but now we add in all the other player answers to the mix.
Here we want the player to focus on where their answer falls in relation to all the other answers. We help them by highlighting the range their answer falls within either side of the correct answer, and as a result we also highlight all those that fell outside that range.
We tell the player how many players they did better than. We added a random descriptive stat to give another insight into the histogram they are looking at.
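One way the histogram binning and highlighting could work, assuming equal-width bins over the slider range and treating a bin as highlighted when its midpoint falls within the correct answer plus or minus the player’s distance. The bin count, the midpoint rule and the names are assumptions, not our actual implementation. The “did better than” number is the same count as the score.

```python
def histogram_with_band(answers, correct, player_answer, lo, hi, n_bins):
    """Bin all answers and flag the bins inside correct ± the player's distance.

    Returns a list of (bin_start, bin_end, count, highlighted) tuples.
    """
    distance = abs(player_answer - correct)
    band_lo, band_hi = correct - distance, correct + distance
    bin_width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for a in answers:
        i = int((a - lo) / bin_width)
        # Clamp so an answer equal to hi lands in the last bin
        counts[min(max(i, 0), n_bins - 1)] += 1
    bins = []
    for i, count in enumerate(counts):
        start = lo + i * bin_width
        end = start + bin_width
        midpoint = (start + end) / 2
        bins.append((start, end, count, band_lo <= midpoint <= band_hi))
    return bins


# Correct answer 50, player guessed 40, so bins centred within 40-60 are highlighted
for row in histogram_with_band([5, 15, 25, 45, 55, 95], 50, 40, 0, 100, 10):
    print(row)
```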
We were impressed that many of the people we got to try the game after the event, even though we told them it was faked response data beforehand, ended up thinking it was real responses when they got to the results screen. That really validated for us the design decisions we had taken.
If we’d had more time we would’ve liked to give players the option to see how their answers fared across their region, state, and nationally. We’d also have liked to add separate categories for players to estimate questions in.
Here’s Dale and me receiving the local and state awards.
Here are all the winners: https://www.govhack.org/2015-winners/
I got a much better understanding of what defining a problem is and how to identify and eliminate scope.
I gained a much better appreciation of how far you can get “faking it” before you have to actually get something working.
How Paypal and Reddit faked their way to traction
https://medium.com/platform-thinking/how-paypal-and-reddit-faked-their-way-to-traction-9411fb583205#.y8yyoli0d
You’re there to realise value from the datasets the govt has. They are valuable data.
If you understand the value intrinsic to the datasets you then can much more easily match this to a data product which is valuable to someone else.
Also, what a dataset is like heavily influences what you need to know to be able to work with it. Ruling out datasets is more important than selecting ones you’d like to work with. There’s no chance you won’t have at least a couple of datasets you can work with.
Don’t be fooled into thinking you won’t be able to come up with anything. You’ll have more than enough achievable ideas. Better to spend the time finding the solution that delivers meaningful value, then working out a way to quickly prototype it, than searching through hundreds of datasets trying to find something that fits what you have already built or decided to build.
The real fun is finding a way to hack together a great solution and that’s where you use your creativity.
You’re not deploying production code, so just decide on tools everyone can contribute with and get set up before the event. Too much time is wasted trying to pick the “perfect” tool, time that could be spent coming up with a kick-ass solution.
If you don’t have to have it working perfectly that opens up heaps of options for you to hack together something to show the functionality via the video. The video helps you be more creative and innovative and attempt more ambitious solutions.
https://www.youtube.com/watch?v=Q994vWVLNRM