I've (probably) been using Google App Engine for a week longer than you have
1. I’ve (probably) been using
Google App Engine
for a week longer than you have
Simon Willison - http://simonwillison.net/
BarCamp London 4
31st May 2008
2. Except you have to re-write
your whole application
If you totally rethink the
way you use a database
3. What it can do
• Serve static files
• Serve dynamic requests
• Store data
• Call web services (sort of)
• Authenticate against Google’s user database
• Send e-mail, process images, use memcache
4. The dev environment
is really, really nice
• Download the (open source) SDK
• a full simulation of the App Engine environment
• dev_appserver.py myapp for a local webserver
• appcfg.py update myapp to deploy to the cloud
5. Options
• You have to use Python
• You can choose how you use it:
• CGI-style scripts
• WSGI applications
• Google’s webapp framework
• Django (0.96 provided, or install your own)
7. With webapp and WSGI
import wsgiref.handlers
from google.appengine.ext import webapp

class MainPage(webapp.RequestHandler):
    # Handles GET requests for the URL mapped below
    def get(self):
        self.response.headers['Content-Type'] = 'text/html'
        self.response.out.write('Hello, webapp World!')

def main():
    # Map '/' to MainPage and run the WSGI app through the CGI adapter
    application = webapp.WSGIApplication(
        [('/', MainPage)], debug=True)
    wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
    main()
8. With Django
from django.conf.urls.defaults import *
from django.http import HttpResponse

def hello(request):
    return HttpResponse("Hello, World!")

urlpatterns = patterns('',
    ('^$', hello),
)
(And django_dispatch.py for boilerplate)
9. • Don't use CGI: it requires reloading for every hit
• Why use Django over webapp?
• Django has easy cookies and custom 500 errors
• Django is less verbose
• Django middleware is really handy
• You can use other WSGI frameworks if you like
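
For comparison with the webapp and Django versions above, here is a rough sketch of a bare WSGI application with no framework at all. It uses only the standard library and is written for a current Python 3, not the runtime App Engine shipped with in 2008. The names `hello_app` and `call_app` are made up for illustration, and wiring the app into App Engine is not shown.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A bare WSGI callable: no framework, just the protocol."""
    body = b"Hello, WSGI World!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)   # fills in a valid fake request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling `call_app(hello_app)` exercises the app without a server, which is roughly what any WSGI framework does on your behalf.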
12. “Bigtable is a distributed storage system for
managing structured data that is designed
to scale to a very large size: petabytes of
data across thousands of commodity
servers. Many projects at Google store data
in Bigtable, including web indexing, Google
Earth, and Google Finance.”
13. The App Engine datastore
• Apparently based on Bigtable
• Absolutely not a relational database
• No joins (they do have “reference fields”)
• No aggregate queries - not even count()!
• Hierarchy affects sharding and transactions
• All queries must run against an existing index
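
With no count(), one possible workaround is to keep a running total yourself and update it on every write. The sketch below shows the idea in plain Python, with a dictionary standing in for the datastore. `FakeStore` and its methods are hypothetical and are not part of the App Engine API.

```python
class FakeStore:
    """In-memory stand-in for a datastore (hypothetical, for illustration)."""

    def __init__(self):
        self.entities = {}   # (kind, key) -> dict of properties
        self.counters = {}   # kind -> running total

    def put(self, kind, key, properties):
        # Only bump the counter when the entity is new, not on overwrite.
        if (kind, key) not in self.entities:
            self.counters[kind] = self.counters.get(kind, 0) + 1
        self.entities[(kind, key)] = properties

    def delete(self, kind, key):
        # Deleting a missing entity leaves the counter alone.
        if (kind, key) in self.entities:
            del self.entities[(kind, key)]
            self.counters[kind] -= 1

    def count(self, kind):
        # Reads the stored total instead of scanning every entity.
        return self.counters.get(kind, 0)
```

The trade-off is that every write now touches the counter as well as the entity, so the total is cheap to read but costs a little on each update.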
14. Models and entities
• Data is stored as entities
• Entities have properties - key/value pairs
• An entity has a unique key
• Entities live in a hierarchy, and siblings exist in
the same entity group - these are actually really
important for transactions and performance
• A model is kind of like a class; it lets you define
a type of entity
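
To make the vocabulary concrete, here is a toy model of entities in plain Python. It does not use the App Engine SDK. The `Entity` class, the path format and the example names are all invented for illustration, and the rule that everything under one root shares an entity group is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A toy entity: a kind, a unique key path, and key/value properties."""
    kind: str
    path: tuple                      # e.g. ("Account:alice", "Post:1")
    properties: dict = field(default_factory=dict)

    @property
    def key(self):
        # The full path acts as the entity's unique key.
        return "/".join(self.path)

    @property
    def entity_group(self):
        # Assumption: everything under the same root shares a group.
        return self.path[0]


root = Entity("Account", ("Account:alice",), {"slug": "alice"})
post = Entity("Post", ("Account:alice", "Post:1"), {"title": "Hello"})
other = Entity("Account", ("Account:bob",), {"slug": "bob"})
```

Here `post` and `root` land in the same group, while `other` sits in a group of its own.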
15. App Engine models
from google.appengine.ext import db

class Account(db.Model):
    slug = db.StringProperty(required=True)
    owner = db.UserProperty()
    onlyme = db.BooleanProperty()
    referrers = db.StringListProperty()
(There is a ReferenceProperty, but I haven’t used it yet)
18. BUT...
• All queries must run against an existing index
• Filtering or sorting on a property requires that
the property exists
• Inequality filters are allowed on one property only
• Properties in inequality filters must be sorted
before other sort orders
• ... and various other rules
• Thankfully the dev server creates most indexes
for you automatically based on usage
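
Two of the rules above are mechanical enough to check in code. The following is a sketch of such a check in plain Python. `check_query` is a hypothetical helper, not something the SDK provides, and it only treats `<`, `<=`, `>` and `>=` as inequality operators.

```python
INEQUALITY_OPS = {"<", "<=", ">", ">="}


def check_query(filters, orders):
    """Return a list of rule violations for a query.

    filters: list of (property, operator, value) tuples
    orders:  list of property names, in sort order
    """
    problems = []
    inequality_props = []
    for prop, op, _value in filters:
        if op in INEQUALITY_OPS and prop not in inequality_props:
            inequality_props.append(prop)

    # Rule: inequality filters are allowed on one property only.
    if len(inequality_props) > 1:
        problems.append("inequality filters on more than one property")

    # Rule: the inequality property must come first in the sort orders.
    if len(inequality_props) == 1 and orders:
        if orders[0] != inequality_props[0]:
            problems.append("inequality property must be sorted first")

    return problems
```

For example, filtering on `age > 18` while sorting by `name` then `age` is flagged, but sorting by `age` then `name` passes.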
19. How indexes are used
1. The datastore identifies the index that
corresponds with the query’s kind, filter
properties, filter operators, and sort orders.
2. The datastore starts scanning the index at the
first entity that meets all of the filter conditions
using the query’s filter values.
3. The datastore continues to scan the index,
returning each entity, until it finds the next entity
that does not meet the filter conditions, or until
it reaches the end of the index.
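
Steps 2 and 3 can be imitated with a sorted list and a binary search. This is only a toy model in plain Python, assuming a single-property index and a range filter. The data and the `scan` function are invented for illustration, and step 1 (choosing the index) is skipped.

```python
import bisect

# A toy index on one property: (value, entity key) pairs kept in sorted order.
index = sorted([
    (12, "e1"), (18, "e2"), (18, "e3"), (25, "e4"), (31, "e5"), (40, "e6"),
])


def scan(index, low, high):
    """Return keys whose value satisfies low <= value < high."""
    # Step 2: jump straight to the first row that meets the filter.
    start = bisect.bisect_left(index, (low,))
    results = []
    # Step 3: read consecutive rows until one fails the filter
    # or the index runs out.
    for value, key in index[start:]:
        if value >= high:
            break
        results.append(key)
    return results
```

Because matching rows sit next to each other in the index, the scan never looks at rows before the first match and stops at the first miss. That is why every query needs an index laid out in exactly the right order.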
20. Further limitations
• If you create a new index and push it live,
you have to wait for it to be rebuilt
• This can take hours, and apparently can go
wrong
• You can’t safely grab more than about 500
records at once - App Engine times out
• You can’t delete in bulk
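
One way to live with the last two limits is to work in small batches. The sketch below is plain Python with the datastore replaced by callables you pass in. `in_batches`, `delete_all`, the `fetch(offset, limit)` signature and the batch size of 100 are all assumptions for illustration, not SDK calls.

```python
BATCH_SIZE = 100   # assumed value, comfortably under the ~500 mentioned above


def in_batches(fetch, batch_size=BATCH_SIZE):
    """Yield lists of records, calling fetch(offset, limit) until exhausted."""
    offset = 0
    while True:
        batch = fetch(offset, batch_size)
        if not batch:
            return
        yield batch
        offset += len(batch)


def delete_all(fetch, delete, batch_size=BATCH_SIZE):
    """Delete everything one batch at a time; returns the number deleted."""
    deleted = 0
    while True:
        # Always fetch from offset 0: earlier records are already gone.
        batch = fetch(0, batch_size)
        if not batch:
            return deleted
        for record in batch:
            delete(record)
        deleted += len(batch)
```

On App Engine itself each batch would probably need its own request to stay inside the time limit, so treat the loops here as the logic only.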
21. Other random notes
• You have to use the URL Fetch API to do
HTTP requests (e.g. for web services) - and it
times out aggressively at about 5 seconds
• The Google accounts Users API is ridiculously
easy to use, but...
• no permanent unique identifier; if the user
changes their e-mail address you’re screwed
• The new image and memcache APIs are neat
22. Final thoughts
• It’s really nice not to have to worry about hosting
• But... the lack of aggregate queries and ad-hoc
queries really hurts
• Perfect for small projects you don’t want to
worry about and big things which you’re sure will
have to scale
• Pricing is comparable to S3 - i.e. stupidly cheap
23. Pricing
• $0.10 - $0.12 per CPU core-hour
• $0.15 - $0.18 per GB-month of storage
• $0.11 - $0.13 per GB outgoing bandwidth
• $0.09 - $0.11 per GB incoming bandwidth
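
To get a feel for what these rates add up to, here is a small worked estimate in Python. The prices are copied from the slide above. The usage figures are a made-up month, and `estimate` is a hypothetical helper.

```python
from decimal import Decimal

# (low, high) prices in dollars, copied from the slide above.
PRICES = {
    "cpu_core_hours":    (Decimal("0.10"), Decimal("0.12")),
    "storage_gb_months": (Decimal("0.15"), Decimal("0.18")),
    "outgoing_gb":       (Decimal("0.11"), Decimal("0.13")),
    "incoming_gb":       (Decimal("0.09"), Decimal("0.11")),
}


def estimate(usage):
    """Return (low, high) monthly cost in dollars for a usage dict."""
    low = sum(PRICES[item][0] * amount for item, amount in usage.items())
    high = sum(PRICES[item][1] * amount for item, amount in usage.items())
    return low, high


# Hypothetical month: 50 core-hours, 2 GB stored, 10 GB out, 5 GB in.
usage = {"cpu_core_hours": 50, "storage_gb_months": 2,
         "outgoing_gb": 10, "incoming_gb": 5}
low, high = estimate(usage)
print(low, high)   # 6.85 8.21
```

So the made-up month above comes to somewhere between $6.85 and $8.21.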