Making earth observation data available in the cloud is accelerating scientific discovery and enabling the creation of new products. Attend and learn how the cloud lets earth scientists, researchers, startups, and GIS professionals gather and analyze earth observation data without worrying about limitations of bandwidth, storage, memory, or processing power. Join us and learn how earth science data projects are becoming more scalable, agile, and efficient with AWS on-demand IT infrastructure.
2. Why does AWS care about open data?
Many of our commercial sector customers rely on quality open data as much as they
rely on our cloud infrastructure services.
Many of our public sector customers use AWS to make their data available to a global
community of researchers, entrepreneurs, students, and fellow government agencies.
Sharing data on AWS makes it accessible to a large and growing community
of researchers, entrepreneurs, and enterprises who use the AWS Cloud.
3. Traditional data acquisition
“…data must be organized, well-documented, consistently formatted, and error free.
Cleaning the data is often the most taxing part of data science, and is frequently 80% of
the work.”
— Data Driven by DJ Patil and Hilary Mason
[Diagram: Tape, Data center, Disk, Server, Client]
The big data challenge
Traditionally, it has been time-consuming and expensive to acquire, store, and analyze
large data sets.
4. Data acquisition in the cloud
Our solution – shared open data on AWS
When data is staged for analysis in the cloud, anyone can analyze it without needing to
download it or store it themselves.
“Ordinarily, hitting ‘copy’ on a 4 gigabyte file is an opportunity to stand up and get a fresh
cup of coffee, browse the sports section for a little while, but moving data between servers
in an Amazon data center barely affords time to touch your toes a couple times.”
— Paul Ramsey Source: http://s3.cleverelephant.ca.s3.amazonaws.com/2015-ccog.pdf
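One way to picture "analyze it without needing to download it" is an HTTP range request, which Amazon S3 supports for object reads: the client asks for only the bytes it needs instead of copying the whole file. The sketch below is illustrative only. The bucket and object URL are hypothetical, and the helper names (`range_header`, `build_partial_request`, `read_slice`) are made up for this example.

```python
from urllib.request import Request, urlopen


def range_header(offset: int, length: int) -> str:
    """Return an HTTP Range header value covering `length` bytes from `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={offset}-{offset + length - 1}"


def build_partial_request(url: str, offset: int, length: int) -> Request:
    """Build (but do not send) a GET request for one slice of a remote object."""
    return Request(url, headers={"Range": range_header(offset, length)})


def read_slice(url: str, offset: int, length: int) -> bytes:
    """Fetch only the requested bytes. Requires network access."""
    with urlopen(build_partial_request(url, offset, length)) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical object URL, for illustration only.
    url = "https://example-open-data.s3.amazonaws.com/scenes/example.tif"
    request = build_partial_request(url, 0, 1024)
    print(request.get_header("Range"))
```

Only `read_slice` touches the network; the other two helpers can be checked offline.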
5. Landsat on AWS: usage
In the first year:
Over 400,000 scenes available
Over 1 billion hits globally
Used for new product development by:
Colin Reilly
Senior Director GIS
NYC Department of IT & Telecom
12. Opening up AQ empowers the public
[Diagram labels:]
• Low-cost sensors/satellites
• Public health policy + research
• Apps + local activism
• Media
• Climate + AQ research
• Open, transparent data layer
• Real-time air quality data on public sites
13. Scaling up in the first year
• Data: PM10, PM2.5, BC, O3, CO, NO2, SO2
• ~13 million data points at 2000+ sites in 20 countries
Utilizing Amazon S3, Amazon EC2, Amazon ECS, Amazon ElastiCache, Amazon RDS, Amazon CloudFront, Amazon Route 53, AWS Lambda, and SSL via AGU and AWS
14. OpenAQ’s global grassroots community
• 10 core contributors on 4 continents
• ~500,000 API requests/month
• Platform accessed by ~100 research orgs
• Nov. 2015: Ulaanbaatar workshop
• Upcoming in Fall 2016: Jakarta workshop
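The API requests mentioned on this slide are ordinary HTTP queries. A minimal sketch of building one is below. The endpoint URL and the query parameter names (`country`, `parameter`, `limit`) are assumptions for illustration and should be checked against the current OpenAQ API documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the current OpenAQ API documentation.
BASE_URL = "https://api.openaq.org/v1/measurements"


def measurements_url(country: str, parameter: str, limit: int = 100) -> str:
    """Build a query URL for measurements of one pollutant in one country."""
    query = urlencode({"country": country, "parameter": parameter, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_measurements(country: str, parameter: str, limit: int = 100) -> dict:
    """Send the query and decode the JSON response. Requires network access."""
    with urlopen(measurements_url(country, parameter, limit)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Example: PM2.5 measurements for Mongolia (ISO country code "MN").
    print(measurements_url("MN", "pm25", limit=10))
```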
15. Thank you to our partners and sponsors:
• American Geophysical Union’s Thriving Earth Exchange, providing AWS credit awards
• Development Seed
• Echoing Green
• Open Science Prize: HHMI, NIH, and Wellcome Trust
• Internews + Earth Journalism Network
• Keen.io
• And the ENTIRE OpenAQ community!
17. Weather is both a feature we provide and an input to our agronomic models
http://www.climate.com/
The Climate Corporation (TCC) provides
decision-making tools for farmers
18. NOAA Big Data Project
Transform the Department of Commerce data capacity to
enhance the value, accessibility and usability of Commerce
data for government, business, and the public.
19. Large-scale analysis was prone to errors
[Diagram: Request & Wait & Download; TCC Amazon S3; Process; Research: Analyses & Evaluations]
20. Data and processing in AWS reduces errors
[Diagram: Process; Research: Analyses & Evaluations; AWS NEXRAD S3]
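With the NEXRAD archive already in Amazon S3, the "request, wait, download" step becomes a listing call against a public bucket. The sketch below builds such a call with the S3 REST listing parameters (`list-type=2`, `prefix`). The bucket name `noaa-nexrad-level2` and the `YYYY/MM/DD/STATION/` key layout are assumptions about how the public archive is organized, and the helper names are hypothetical.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public bucket name and key layout; verify before relying on them.
BUCKET = "noaa-nexrad-level2"


def nexrad_prefix(day: date, station: str) -> str:
    """Return the assumed key prefix for one radar station on one day."""
    return f"{day:%Y/%m/%d}/{station.upper()}/"


def list_url(day: date, station: str, bucket: str = BUCKET) -> str:
    """Build an S3 ListObjectsV2 URL for that station and day."""
    query = urlencode({"list-type": "2", "prefix": nexrad_prefix(day, station)})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def list_volumes_xml(day: date, station: str) -> bytes:
    """Fetch the raw XML listing. Requires network access."""
    with urlopen(list_url(day, station)) as response:
        return response.read()


if __name__ == "__main__":
    print(list_url(date(2016, 6, 1), "KTLX"))
```

Processing code running on EC2 in the same region can then read the listed objects directly, which is the arrangement slide 20 describes.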
21. Everyone wins
TCC projects are several weeks shorter.
TCC evaluations of new methods happen on larger datasets.
We don’t pay Amazon for the S3 bucket to store NEXRAD data.
Instead, we pay Amazon for the EC2 instances to process the
larger dataset.
NOAA data is used more widely, but without overwhelming NCEI.
TCC/AWS found a long-standing problem in the NOAA archive,
improving data quality.
* Not an issue for just one city or country; it is a global issue
* Be sure to point out each dot is a country…or possibly change to illustrate
* Outsized impact of a little open data
~5 million data points a day out there and lost…
* Amazing things that could be done with these data!!
To date!
Dark blue we have complete coverage.
Light blue only partial coverage.
Our community is awesome!
Level II data is in the process of being moved (+135 TB to date) from the National Centers for Environmental Information (NCEI) archive in Asheville, NC, for use in collaborators’ pilots.
Only AWS has an offering, because of the partnership with TCC.
Hierarchical Data Storage System (HDSS) Access System