Advanced Django ORM techniques
Daniel Roseman
http://blog.roseman.org.uk
About Me
• Python user for five years
• Discovered Django four years ago
• Worked full-time with Python/Django since
  2008.
• Top Django answerer on StackOverflow!
• Occasionally blog on Django, concentrating
  on efficient use of the ORM.
Contents

• Behind the scenes: models and fields
• How model relationships work
• More efficient relationships
• Other optimising techniques
Django ORM
efficiency: a story
414 queries!
How can you stop this
 happening to you?


             http://www.flickr.com/photos/m0n0/4479450696
Behind the scenes:
models and fields


               http://www.flickr.com/photos/spacesuitcatalyst/847530840
Defining a model

• Model structure initialised via metaclass
• Called when model is first defined
• Resulting model class stored in cache to
  use when instantiated
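A rough illustration of the metaclass idea in plain Python, not Django's actual implementation: the metaclass runs once, when the class statement executes, and stores the finished class in a registry. The names RegisteringMeta, MODEL_CACHE and Article are invented for this sketch.

```python
# Hypothetical stand-in for Django's model cache; not Django's real API.
MODEL_CACHE = {}


class RegisteringMeta(type):
    """Runs once per class definition, not once per instance."""

    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        # Treat every public string attribute as a declared "field".
        cls._field_names = sorted(
            key for key, value in attrs.items()
            if not key.startswith("_") and isinstance(value, str)
        )
        # Store the finished class so later code can look it up by name.
        MODEL_CACHE[name.lower()] = cls
        return cls


class Article(metaclass=RegisteringMeta):
    title = "char"
    body = "text"
```

After the class statement has run, MODEL_CACHE["article"] is the Article class itself, and no instance has been created yet.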
Fields

• Fields have contribute_to_class
• Adds methods, eg get_FOO_display()
• Enables use of descriptors for field access
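A minimal sketch of how a contribute_to_class hook can add a get_FOO_display() method to a class. ChoiceField and Ticket are toy classes made up for the example, and the hook is called by hand here, whereas in Django the model metaclass makes that call.

```python
class ChoiceField:
    """Toy field; only mimics the contribute_to_class hook."""

    def __init__(self, choices):
        self.choices = dict(choices)

    def contribute_to_class(self, cls, name):
        self.name = name
        field = self

        def get_display(instance):
            # Map the stored value to its human-readable label.
            value = getattr(instance, name)
            return field.choices.get(value, value)

        # Adds e.g. get_status_display() to the model class.
        setattr(cls, "get_%s_display" % name, get_display)


class Ticket:
    def __init__(self, status):
        self.status = status


# Called by hand for the sketch.
ChoiceField([("o", "Open"), ("c", "Closed")]).contribute_to_class(Ticket, "status")
```

Ticket("o").get_status_display() then returns the label "Open", while an unknown value falls back to the raw value.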
Model metadata

•   Model._meta

•   .fields

•   .get_field(fieldname)

•   .get_all_related_objects()
Model instantiation

• Instance is populated from the database once,
  when it is created
• Has no further connection to the db
  until save
• No identity map: two instances loaded from the
  same row are separate objects
Querysets
• Model manager returns a queryset:
  foos = Foo.objects.all()

• Queryset is an ordered list of instances
  of a single model
• No database access yet
• Slice: foos[0]
• Iterate: {% for foo in foos %}
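The laziness can be pictured with a toy class that fetches nothing until it is iterated or indexed and then keeps the result. LazyList and fake_query are invented for this sketch. Real querysets differ in detail: for example, indexing an unevaluated queryset issues its own limited query rather than filling the cache.

```python
class LazyList:
    """Toy stand-in for a queryset: nothing is fetched until needed."""

    def __init__(self, fetch):
        self._fetch = fetch          # callable that "hits the database"
        self._result_cache = None

    def _evaluate(self):
        # Run the fetch only on first use, then reuse the stored result.
        if self._result_cache is None:
            self._result_cache = list(self._fetch())
        return self._result_cache

    def __iter__(self):
        return iter(self._evaluate())

    def __getitem__(self, index):
        return self._evaluate()[index]


queries = []


def fake_query():
    queries.append("SELECT ...")
    return ["foo1", "foo2"]


foos = LazyList(fake_query)   # no query has run at this point
```

Iterating over foos twice runs fake_query only once, because the second pass reads the stored result.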
Where do all those
  queries come from?
• Repeated queries
• Lack of caching
• Relational lookup
• Templates as well as views
Repeated queries
    def get_absolute_url(self):
      return "%s/%s" % (
         self.category.slug,
         self.slug
      )


    Same category, but query is
    repeated for each article
Repeated queries
• Same link on every
  page

• Dynamic, so can't
  go in urlconf

• Could be cached
  or memoized
Relationships




        http://www.flickr.com/photos/katietegtmeyer/124315322
Relational lookups

• Forwards:
  foo.bar.field



• Backwards:
  bar.foo_set.all()
Example models
from django.db import models

class Foo(models.Model):
    name = models.CharField(max_length=10)


class Bar(models.Model):
    name = models.CharField(max_length=10)
    foo = models.ForeignKey(Foo)  # Django 2.0+ also requires an on_delete argument
Forwards relationship

>>> bar = Bar.objects.all()[0]
>>> bar.__dict__
{'id': 1, 'foo_id': 1, 'name': u'item1'}
Forwards relationship
>>> bar.foo.name
u'item1'
>>> bar.__dict__
{'_foo_cache': <Foo: Foo object>, 'id': 1,
'foo_id': 1, 'name': u'item1'}
Forwards relationships
• Relational access implemented via a
    descriptor:
    django.db.models.fields.related.
    ReverseSingleRelatedObjectDescriptor

•   __get__ tries to access _foo_cache

• If it doesn't exist, does the lookup and
    creates the cache
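The lookup-then-cache behaviour can be sketched with an ordinary Python descriptor. This is a toy version, not Django's code: ForwardDescriptor, fetch_foo and the FOOS dictionary are invented, and the dictionary stands in for the database table.

```python
class ForwardDescriptor:
    """Toy version of a forward-relation descriptor; not Django's code."""

    def __init__(self, name, lookup):
        self.id_attr = name + "_id"
        self.cache_attr = "_%s_cache" % name
        self.lookup = lookup         # callable standing in for a DB query

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        try:
            # Cache hit: no lookup needed.
            return instance.__dict__[self.cache_attr]
        except KeyError:
            # Cache miss: do the lookup once and remember the result.
            related = self.lookup(getattr(instance, self.id_attr))
            instance.__dict__[self.cache_attr] = related
            return related


FOOS = {1: "Foo #1"}     # pretend table
lookups = []


def fetch_foo(pk):
    lookups.append(pk)
    return FOOS[pk]


class Bar:
    foo = ForwardDescriptor("foo", fetch_foo)

    def __init__(self, foo_id):
        self.foo_id = foo_id


bar = Bar(foo_id=1)
first = bar.foo      # runs the lookup and fills bar._foo_cache
second = bar.foo     # served from the cache
```

Only one entry ends up in lookups, which mirrors the single query seen when bar.foo is accessed repeatedly on the same instance.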
select_related
• Automatically follows foreign keys in SQL
  query
• Prepopulates _foo_cache
• Doesn't follow null=True relationships by
  default
• Makes query more expensive, so be sure
  you need it
Backwards relationships
{% for foo in my_foos %}
 {% for bar in foo.bar_set.all %}
  {{ bar.name }}
 {% endfor %}
{% endfor %}
Backwards relationships
• One query per foo
• If you iterate over bar_set again, you
  generate a new set of db hits
• No _foo_cache
• select_related does not work here
Optimising backwards
    relationships

• Get all related objects at once
• Sort by ID of parent object
• Then cache in hidden attribute as with
  select_related
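# Query 1: fetch the parent objects and index them by primary key.
qs = Foo.objects.filter(criteria=whatever)
obj_dict = dict((obj.id, obj) for obj in qs)
# Query 2: fetch every related Bar in one go.
objects = Bar.objects.filter(foo__in=qs)
# Group the children by the ID of their parent.
relation_dict = {}
for obj in objects:
    relation_dict.setdefault(obj.foo_id, []).append(obj)
# Attach each group to its parent in a hidden attribute.
for foo_id, related in relation_dict.items():
    obj_dict[foo_id]._related = related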
Optimising backwards
[{'time': '0.000', 'sql': u'SELECT
"foobar_foo"."id", "foobar_foo"."name" FROM
"foobar_foo"'},
{'time': '0.000', 'sql': u'SELECT
"foobar_bar"."id", "foobar_bar"."name",
"foobar_bar"."foo_id" FROM "foobar_bar"
WHERE "foobar_bar"."foo_id" IN (SELECT
U0."id" FROM "foobar_foo" U0)'}]
Optimising backwards

• Still quite expensive, as can mean large
  dependent subquery – MySQL in particular
  very bad at these
• But now just two queries instead of n
• Not automatic – need to remember to use
  the _related attribute
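Because the pattern is easy to forget, it could be wrapped in a small helper. The sketch below uses plain objects in place of the two query results, so it runs without Django. attach_related is a made-up name, and unlike the slide code it also gives childless parents an empty list.

```python
from collections import defaultdict
from types import SimpleNamespace


def attach_related(parents, children, fk_attr, cache_attr="_related"):
    """Group children by parent ID and hang each group on its parent."""
    grouped = defaultdict(list)
    for child in children:
        grouped[getattr(child, fk_attr)].append(child)
    for parent in parents:
        # Parents without children get an empty list, not a missing attribute.
        setattr(parent, cache_attr, grouped.get(parent.id, []))
    return parents


# Plain objects stand in for the results of the two queries.
foos = [SimpleNamespace(id=1, name="a"), SimpleNamespace(id=2, name="b")]
bars = [
    SimpleNamespace(id=10, foo_id=1),
    SimpleNamespace(id=11, foo_id=1),
]
attach_related(foos, bars, "foo_id")
```

With real models the two lists would come from the Foo and Bar querysets, and templates would loop over foo._related instead of foo.bar_set.all.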
Generic relations
• Foreign key to ContentType, object_id
• Descriptor to enable direct access
• Iterating through creates n+m queries
  (n = number of source objects,
  m = number of different content types)
• ContentType objects automatically cached
• Forwards relationship creates _foo_cache
• but select_related doesn't work
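from django.contrib.contenttypes.models import ContentType

# Collect the object IDs needed for each content type.
generics = {}
for item in queryset:
    generics.setdefault(item.content_type_id, set()).add(item.object_id)

# One query for all the content types involved.
content_types = ContentType.objects.in_bulk(list(generics.keys()))

# One query per content type for the related objects.
relations = {}
for ct, fk_list in generics.items():
    ct_model = content_types[ct].model_class()
    relations[ct] = ct_model.objects.in_bulk(list(fk_list))

# Prepopulate the hidden cache attribute on each source object.
for item in queryset:
    setattr(item, '_content_object_cache',
            relations[item.content_type_id][item.object_id])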
Other optimising
  techniques
Memoizing
• Cache property on first access
• Can cache within instance, if multiple
  accesses within same request
def get_expensive_items(self):
    # Compute once per instance, then reuse the stored result.
    if not hasattr(self, '_cache'):
        self._cache = self.expensive_op()
    return self._cache
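On Python 3.8 and later, the same per-instance memoizing can be written with functools.cached_property. Django also ships django.utils.functional.cached_property. Report and its call counter are invented for this sketch.

```python
from functools import cached_property


class Report:
    calls = 0   # counts how often the expensive operation really runs

    def expensive_op(self):
        Report.calls += 1
        return [1, 2, 3]

    @cached_property
    def expensive_items(self):
        # Computed on first access, then stored in the instance's __dict__.
        return self.expensive_op()


report = Report()
report.expensive_items
report.expensive_items
```

The second access reads the stored value, so expensive_op runs only once for this instance. A new instance, such as one loaded in the next request, starts with an empty cache.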
DB Indexes

• Pay attention to slow query log and
  debug toolbar output
• Add extra indexes where necessary -
  especially for multiple-column lookup
• Use EXPLAIN
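A small demonstration of checking a multiple-column lookup with EXPLAIN. It uses SQLite only because it ships with Python. The article table and index name are hypothetical, and other databases word their EXPLAIN output differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, category_id INTEGER, "
    "slug TEXT, title TEXT)"
)

QUERY = "SELECT title FROM article WHERE category_id = ? AND slug = ?"


def plan(sql, params):
    """Return SQLite's query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


before = plan(QUERY, (1, "hello"))
# Composite index covering both columns used in the WHERE clause.
conn.execute("CREATE INDEX article_cat_slug ON article (category_id, slug)")
after = plan(QUERY, (1, "hello"))
print(before)
print(after)
```

Before the index exists the plan reports a scan of the whole table. Afterwards it reports a search using article_cat_slug.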
Outsourcing

• Does all the logic need to go in the web
  app?
• Services - via eg Piston
• Message queues
• Distributed tasks, eg Celery
Summary

• Understand where queries are coming
  from
• Optimise where necessary, within Django
  or in the database
• and...
PROFILE
Daniel Roseman

http://blog.roseman.org.uk

More Related Content

What's hot

A Basic Django Introduction
A Basic Django IntroductionA Basic Django Introduction
A Basic Django Introduction
Ganga Ram
 
Refactoring Functional Type Classes
Refactoring Functional Type ClassesRefactoring Functional Type Classes
Refactoring Functional Type Classes
John De Goes
 

What's hot (20)

SwiftUI and Combine All the Things
SwiftUI and Combine All the ThingsSwiftUI and Combine All the Things
SwiftUI and Combine All the Things
 
jQuery for beginners
jQuery for beginnersjQuery for beginners
jQuery for beginners
 
Introduction to angular with a simple but complete project
Introduction to angular with a simple but complete projectIntroduction to angular with a simple but complete project
Introduction to angular with a simple but complete project
 
Multiplatform architecture ribs in swift
Multiplatform architecture ribs in swiftMultiplatform architecture ribs in swift
Multiplatform architecture ribs in swift
 
laravel.pptx
laravel.pptxlaravel.pptx
laravel.pptx
 
A Basic Django Introduction
A Basic Django IntroductionA Basic Django Introduction
A Basic Django Introduction
 
Asp.net mvc basic introduction
Asp.net mvc basic introductionAsp.net mvc basic introduction
Asp.net mvc basic introduction
 
JavaScript Fetch API
JavaScript Fetch APIJavaScript Fetch API
JavaScript Fetch API
 
Angular 9
Angular 9 Angular 9
Angular 9
 
Web development
Web developmentWeb development
Web development
 
Python/Flask Presentation
Python/Flask PresentationPython/Flask Presentation
Python/Flask Presentation
 
Refactoring Functional Type Classes
Refactoring Functional Type ClassesRefactoring Functional Type Classes
Refactoring Functional Type Classes
 
Entity Framework Core
Entity Framework CoreEntity Framework Core
Entity Framework Core
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Advanced Javascript
Advanced JavascriptAdvanced Javascript
Advanced Javascript
 
Angular introduction students
Angular introduction studentsAngular introduction students
Angular introduction students
 
What’s New in Angular 14?
What’s New in Angular 14?What’s New in Angular 14?
What’s New in Angular 14?
 
Json
JsonJson
Json
 
Web application framework
Web application frameworkWeb application framework
Web application framework
 
Building blocks of Angular
Building blocks of AngularBuilding blocks of Angular
Building blocks of Angular
 

Viewers also liked

Basic Django ORM
Basic Django ORMBasic Django ORM
Basic Django ORM
Ayun Park
 
Tabela de números romanos
Tabela de números romanosTabela de números romanos
Tabela de números romanos
Dann Senda
 
Top 10 senior technical architect interview questions and answers
Top 10 senior technical architect interview questions and answersTop 10 senior technical architect interview questions and answers
Top 10 senior technical architect interview questions and answers
tonychoper5406
 

Viewers also liked (19)

Advanced Django
Advanced DjangoAdvanced Django
Advanced Django
 
What's new in Django 1.7
What's new in Django 1.7What's new in Django 1.7
What's new in Django 1.7
 
Django orm-tips
Django orm-tipsDjango orm-tips
Django orm-tips
 
Basic Django ORM
Basic Django ORMBasic Django ORM
Basic Django ORM
 
Django ORM
Django ORMDjango ORM
Django ORM
 
Introduction to Django REST Framework, an easy way to build REST framework in...
Introduction to Django REST Framework, an easy way to build REST framework in...Introduction to Django REST Framework, an easy way to build REST framework in...
Introduction to Django REST Framework, an easy way to build REST framework in...
 
Django: Advanced Models
Django: Advanced ModelsDjango: Advanced Models
Django: Advanced Models
 
Django In Depth
Django In DepthDjango In Depth
Django In Depth
 
Django REST Framework
Django REST FrameworkDjango REST Framework
Django REST Framework
 
REST Easy with Django-Rest-Framework
REST Easy with Django-Rest-FrameworkREST Easy with Django-Rest-Framework
REST Easy with Django-Rest-Framework
 
Advanced Django Forms Usage
Advanced Django Forms UsageAdvanced Django Forms Usage
Advanced Django Forms Usage
 
Full Stack & Full Circle: What the Heck Happens In an HTTP Request-Response C...
Full Stack & Full Circle: What the Heck Happens In an HTTP Request-Response C...Full Stack & Full Circle: What the Heck Happens In an HTTP Request-Response C...
Full Stack & Full Circle: What the Heck Happens In an HTTP Request-Response C...
 
12 tips on Django Best Practices
12 tips on Django Best Practices12 tips on Django Best Practices
12 tips on Django Best Practices
 
Django in the Real World
Django in the Real WorldDjango in the Real World
Django in the Real World
 
Tabela de números romanos
Tabela de números romanosTabela de números romanos
Tabela de números romanos
 
(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014
(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014
(PFC403) Maximizing Amazon S3 Performance | AWS re:Invent 2014
 
Top 10 senior technical architect interview questions and answers
Top 10 senior technical architect interview questions and answersTop 10 senior technical architect interview questions and answers
Top 10 senior technical architect interview questions and answers
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M users
 
Web Development with Python and Django
Web Development with Python and DjangoWeb Development with Python and Django
Web Development with Python and Django
 

Similar to Advanced Django ORM techniques

Backbone.js Simple Tutorial
Backbone.js Simple TutorialBackbone.js Simple Tutorial
Backbone.js Simple Tutorial
추근 문
 
Django class based views (Dutch Django meeting presentation)
Django class based views (Dutch Django meeting presentation)Django class based views (Dutch Django meeting presentation)
Django class based views (Dutch Django meeting presentation)
Reinout van Rees
 
A Related Matter: Optimizing your webapp by using django-debug-toolbar, selec...
A Related Matter: Optimizing your webapp by using django-debug-toolbar, selec...A Related Matter: Optimizing your webapp by using django-debug-toolbar, selec...
A Related Matter: Optimizing your webapp by using django-debug-toolbar, selec...
Christopher Adams
 
Django Class-based views (Slovenian)
Django Class-based views (Slovenian)Django Class-based views (Slovenian)
Django Class-based views (Slovenian)
Luka Zakrajšek
 
Mongo and Harmony
Mongo and HarmonyMongo and Harmony
Mongo and Harmony
Steve Smith
 

Similar to Advanced Django ORM techniques (20)

Hibernate Tutorial for beginners
Hibernate Tutorial for beginnersHibernate Tutorial for beginners
Hibernate Tutorial for beginners
 
Powerful Generic Patterns With Django
Powerful Generic Patterns With DjangoPowerful Generic Patterns With Django
Powerful Generic Patterns With Django
 
Django workshop : let's make a blog
Django workshop : let's make a blogDjango workshop : let's make a blog
Django workshop : let's make a blog
 
Django Search
Django SearchDjango Search
Django Search
 
The Django Book, Chapter 16: django.contrib
The Django Book, Chapter 16: django.contribThe Django Book, Chapter 16: django.contrib
The Django Book, Chapter 16: django.contrib
 
Backbone.js Simple Tutorial
Backbone.js Simple TutorialBackbone.js Simple Tutorial
Backbone.js Simple Tutorial
 
Django class based views (Dutch Django meeting presentation)
Django class based views (Dutch Django meeting presentation)Django class based views (Dutch Django meeting presentation)
Django class based views (Dutch Django meeting presentation)
 
Django design-patterns
Django design-patternsDjango design-patterns
Django design-patterns
 
Chap 3 Python Object Oriented Programming - Copy.ppt
Chap 3 Python Object Oriented Programming - Copy.pptChap 3 Python Object Oriented Programming - Copy.ppt
Chap 3 Python Object Oriented Programming - Copy.ppt
 
اسلاید جلسه ۹ کلاس پایتون برای هکر های قانونی
اسلاید جلسه ۹ کلاس پایتون برای هکر های قانونیاسلاید جلسه ۹ کلاس پایتون برای هکر های قانونی
اسلاید جلسه ۹ کلاس پایتون برای هکر های قانونی
 
Firebase for Apple Developers
Firebase for Apple DevelopersFirebase for Apple Developers
Firebase for Apple Developers
 
Hands On Spring Data
Hands On Spring DataHands On Spring Data
Hands On Spring Data
 
A Related Matter: Optimizing your webapp by using django-debug-toolbar, selec...
A Related Matter: Optimizing your webapp by using django-debug-toolbar, selec...A Related Matter: Optimizing your webapp by using django-debug-toolbar, selec...
A Related Matter: Optimizing your webapp by using django-debug-toolbar, selec...
 
Django Class-based views (Slovenian)
Django Class-based views (Slovenian)Django Class-based views (Slovenian)
Django Class-based views (Slovenian)
 
Django Forms: Best Practices, Tips, Tricks
Django Forms: Best Practices, Tips, TricksDjango Forms: Best Practices, Tips, Tricks
Django Forms: Best Practices, Tips, Tricks
 
Mongo and Harmony
Mongo and HarmonyMongo and Harmony
Mongo and Harmony
 
Declarative Data Modeling in Python
Declarative Data Modeling in PythonDeclarative Data Modeling in Python
Declarative Data Modeling in Python
 
Alfresco Content Modelling and Policy Behaviours
Alfresco Content Modelling and Policy BehavioursAlfresco Content Modelling and Policy Behaviours
Alfresco Content Modelling and Policy Behaviours
 
Django Heresies
Django HeresiesDjango Heresies
Django Heresies
 
Core data in Swfit
Core data in SwfitCore data in Swfit
Core data in Swfit
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Advanced Django ORM techniques

  • 1. Advanced Django ORM techniques Daniel Roseman http://blog.roseman.org.uk
  • 2. About Me • Python user for five years • Discovered Django four years ago • Worked full-time with Python/Django since 2008. • Top Django answerer on StackOverflow! • Occasionally blog on Django, concentrating on efficient use of the ORM.
  • 3. Contents • Behind the scenes: models and fields • How model relationships work • More efficient relationships • Other optimising techniques
  • 5.
  • 7. How can you stop this happening to you? http://www.flickr.com/photos/m0n0/4479450696
  • 8. Behind the scenes: models and fields http://www.flickr.com/photos/spacesuitcatalyst/847530840
  • 9. Defining a model • Model structure initialised via metaclass • Called when model is first defined • Resulting model class stored in cache to use when instantiated
  • 10. Fields • Fields have contribute_to_class • Adds methods, eg get_FOO_display() • Enables use of descriptors for field access
  • 11. Model metadata • Model._meta • .fields • .get_field(fieldname) • .get_all_related_objects()
  • 12. Model instantiation • Instance is populated from database initially • Has no subsequent relationship with db until save • No identity between models
  • 13. Querysets • Model=manager returns a queryset: foos Foo.objects.all() • Queryset is an ordered list of instances of a single model • No database access yet • Slice: foos[0] • Iterate: {% for foo in foos %}
  • 14. Where do all those queries come from? • Repeated queries • Lack of caching • Relational lookup • Templates as well as views
  • 15. Repeated queries def get_absolute_url(self): return "%s/%s" % ( self.category.slug, self.slug ) Same category, but query is repeated for each article
  • 16. Repeated queries • Same link on every page • Dynamic, so can't go in urlconf • Could be cached or memoized
  • 17. Relationships http://www.flickr.com/photos/katietegtmeyer/124315322
  • 18. Relational lookups • Forwards: foo.bar.field • Backwards: bar.foo_set.all()
  • 19. Example models class Foo(models.Model): name = models.CharField(max_length=10) class Bar(models.Model): name = models.CharField(max_length=10) foo = models.ForeignKey(Foo)
  • 20. Forwards relationship >>> bar = Bar.objects.all()[0] >>> bar.__dict__ {'id': 1, 'foo_id': 1, 'name': u'item1'}
  • 21. Forwards relationship >>> bar.foo.name u'item1' >>> bar.__dict__ {'_foo_cache': <Foo: Foo object>, 'id': 1, 'foo_id': 1, 'name': u'item1'}
  • 22. Fowards relationships • Relational access implemented via a descriptor: django.db.models.fields.related. SingleRelatedObjectDescriptor • __get__ tries to access _foo_cache • If doesn't exist, does lookup and creates cache
  • 23. select_related • Automatically follows foreign keys in SQL query • Prepopulates _foo_cache • Doesn't follow null=True relationships by default • Makes query more expensive, so be sure you need it
  • 24. Backwards relationships {% for foo in my_foos %} {% for bar in foo.bar_set.all %} {{ bar.name }} {% endfor %} {% endfor %}
  • 25. Backwards relationships • One query per foo • If you iterate over foo_set again, you generate a new set of db hits • No _foo_cache • select_related does not work here
  • 26. Optimising backwards relationships • Get all related objects at once • Sort by ID of parent object • Then cache in hidden attribute as with select_related
  • 27. qs = Foo.objects.filter(criteria=whatever) obj_dict = dict([(obj.id, obj) for obj in qs]) objects = Bar.objects.filter(foo__in=qs) relation_dict = {} for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj) for id, related in relation_dict.items(): obj_dict[id]._related = related
  • 28. qs = Foo.objects.filter(criteria=whatever) obj_dict = dict([(obj.id, obj) for obj in qs]) objects = Bar.objects.filter(foo__in=qs) relation_dict = {} for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj) for id, related in relation_dict.items(): obj_dict[id]._related = related
  • 29. qs = Foo.objects.filter(criteria=whatever) obj_dict = dict([(obj.id, obj) for obj in qs]) objects = Bar.objects.filter(foo__in=qs) relation_dict = {} for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj) for id, related in relation_dict.items(): obj_dict[id]._related = related
  • 30. qs = Foo.objects.filter(criteria=whatever) obj_dict = dict([(obj.id, obj) for obj in qs]) objects = Bar.objects.filter(foo__in=qs) relation_dict = {} for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj) for id, related in relation_dict.items(): obj_dict[id]._related = related
  • 31. qs = Foo.objects.filter(criteria=whatever) obj_dict = dict([(obj.id, obj) for obj in qs]) objects = Bar.objects.filter(foo__in=qs) relation_dict = {} for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj) for id, related in relation_dict.items(): obj_dict[id]._related = related
  • 32. qs = Foo.objects.filter(criteria=whatever) obj_dict = dict([(obj.id, obj) for obj in qs]) objects = Bar.objects.filter(foo__in=qs) relation_dict = {} for obj in objects: relation_dict.setdefault( obj.foo_id, []).append(obj) for id, related in relation_dict.items(): obj_dict[id]._related = related
  • 33. Optimising backwards [{'time': '0.000', 'sql': u'SELECT "foobar_foo"."id", "foobar_foo"."name" FROM "foobar_foo"'}, {'time': '0.000', 'sql': u'SELECT "foobar_bar"."id", "foobar_bar"."name", "foobar_bar"."foo_id" FROM "foobar_bar" WHERE "foobar_bar"."foo_id" IN (SELECT U0."id" FROM "foobar_foo" U0)'}]
  • 34. Optimising backwards • Still quite expensive, as can mean large dependent subquery – MySQL in particular very bad at these • But now just two queries instead of n • Not automatic – need to remember to use the _related attribute set in the code above
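The grouping step behind the two-query pattern can be tried without a database. The following is only a sketch: the `Foo` and `Bar` dataclasses are stand-ins for model instances, and `attach_related` is a hypothetical helper name, not part of Django.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Foo:
    """Stand-in for a parent model instance."""
    id: int
    name: str


@dataclass
class Bar:
    """Stand-in for a child row holding a foreign key to Foo."""
    id: int
    name: str
    foo_id: int


def attach_related(parents, children):
    """Group children by foo_id and cache each group on its parent."""
    grouped = defaultdict(list)
    for child in children:
        grouped[child.foo_id].append(child)
    for parent in parents:
        # Parents without children get an empty list, so callers can
        # always read _related instead of issuing another query.
        parent._related = grouped.get(parent.id, [])
    return parents


foos = [Foo(1, "a"), Foo(2, "b"), Foo(3, "c")]
bars = [Bar(10, "x", 1), Bar(11, "y", 1), Bar(12, "z", 3)]
attach_related(foos, bars)
```

Unlike the slide code, this sketch also sets `_related` on parents that have no children, which avoids an AttributeError when a template reads the attribute unconditionally.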
  • 35. Generic relations • Foreign key to ContentType, object_id • Descriptor to enable direct access • Iterating through creates n+m queries (n = number of source objects, m = number of different content types) • ContentType objects automatically cached • Forwards relationship creates _foo_cache • But select_related doesn't work
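  • 36.
        from django.contrib.contenttypes.models import ContentType
        # Collect the object ids needed for each content type.
        generics = {}
        for item in queryset:
            generics.setdefault(item.content_type_id, set()).add(item.object_id)
        # One query for all the ContentType rows involved.
        content_types = ContentType.objects.in_bulk(generics.keys())
        # One query per content type for the target objects.
        relations = {}
        for ct, fk_list in generics.items():
            ct_model = content_types[ct].model_class()
            relations[ct] = ct_model.objects.in_bulk(list(fk_list))
        # Fill the cache attribute so that access does not trigger a query.
        for item in queryset:
            setattr(item, '_content_object_cache',
                    relations[item.content_type_id][item.object_id])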
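The effect of the bulk lookup on query count can be illustrated with plain Python. This is a sketch under stated assumptions: `TABLES` is a hypothetical in-memory stand-in for the target tables, `in_bulk` here is a local function that only imitates the idea of one query per call, and `attach_targets` is an invented helper name. The extra ContentType lookup from the slide is left out.

```python
from collections import defaultdict
from types import SimpleNamespace

# Hypothetical "tables": content type id -> {primary key: object}.
TABLES = {
    1: {1: "Article 1", 2: "Article 2"},
    2: {7: "Photo 7"},
}
query_count = 0


def in_bulk(content_type_id, pks):
    """Imitates a bulk fetch: each call counts as one query."""
    global query_count
    query_count += 1
    table = TABLES[content_type_id]
    return {pk: table[pk] for pk in pks if pk in table}


def attach_targets(items):
    """Fetch targets once per content type and attach them to items."""
    wanted = defaultdict(set)
    for item in items:
        wanted[item.content_type_id].add(item.object_id)
    fetched = {ct: in_bulk(ct, pks) for ct, pks in wanted.items()}
    for item in items:
        item.target = fetched[item.content_type_id].get(item.object_id)
    return items


items = [
    SimpleNamespace(content_type_id=1, object_id=1),
    SimpleNamespace(content_type_id=2, object_id=7),
    SimpleNamespace(content_type_id=1, object_id=2),
    SimpleNamespace(content_type_id=1, object_id=1),
]
attach_targets(items)
targets = [item.target for item in items]
```

Four items spread over two content types cost two bulk fetches here, where a per-item lookup would cost four.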
  • 45. Other optimising techniques
  • 46. Memoizing • Cache property on first access • Can cache within instance, if multiple accesses within same request
        def get_expensive_items(self):
            if not hasattr(self, '_cache'):
                self._cache = self.expensive_op()
            return self._cache
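A runnable version of the memoizing pattern, with a call counter to show that the expensive operation runs once per instance. The `Report` class and its `calls` counter are hypothetical, added only for illustration; `functools.cached_property` (Python 3.8+) is shown as a standard-library alternative and is not mentioned in the slides.

```python
from functools import cached_property


class Report:
    def __init__(self):
        self.calls = 0  # how many times expensive_op actually ran

    def expensive_op(self):
        self.calls += 1
        return [1, 2, 3]

    def get_expensive_items(self):
        # The slide's pattern: compute on first access, then reuse.
        if not hasattr(self, "_cache"):
            self._cache = self.expensive_op()
        return self._cache

    @cached_property
    def expensive_items(self):
        # Same idea, with the caching handled by the standard library.
        return self.expensive_op()


report = Report()
report.get_expensive_items()
report.get_expensive_items()
calls_after_method = report.calls

report.expensive_items
report.expensive_items
calls_after_property = report.calls
```

In both cases the cache lives on the instance, so it disappears with the object at the end of the request rather than persisting across requests.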
  • 47. DB Indexes • Pay attention to slow query log and debug toolbar output • Add extra indexes where necessary - especially for multiple-column lookup • Use EXPLAIN
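The effect of a multiple-column index can be checked from Python with the standard-library sqlite3 module. This is a sketch that assumes SQLite rather than the MySQL setup discussed in the talk, so it uses SQLite's `EXPLAIN QUERY PLAN`; the `bar` table, the `idx_bar_foo_name` index and the `plan` helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT, foo_id INTEGER)"
)
conn.executemany(
    "INSERT INTO bar (name, foo_id) VALUES (?, ?)",
    [(f"bar{i}", i % 10) for i in range(100)],
)

QUERY = "SELECT id FROM bar WHERE foo_id = ? AND name = ?"


def plan(connection, sql, params):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


before = plan(conn, QUERY, (3, "bar3"))  # full table scan
conn.execute("CREATE INDEX idx_bar_foo_name ON bar (foo_id, name)")
after = plan(conn, QUERY, (3, "bar3"))   # lookup through the new index
```

Printing `before` and `after` shows the plan changing from a scan of the table to a search using the index. The exact wording varies between SQLite versions.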
  • 48. Outsourcing • Does all the logic need to go in the web app? • Services - via eg Piston • Message queues • Distributed tasks, eg Celery
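The message-queue idea can be pictured with the standard library alone: the request handler enqueues the work and returns at once, while a worker drains the queue. This is only a sketch; `handle_request` and `send_welcome_email` are hypothetical names, and a real deployment would use a broker or a tool such as Celery rather than an in-process thread.

```python
import queue
import threading

tasks = queue.Queue()
sent = []  # records what the worker processed


def send_welcome_email(user_id):
    """Hypothetical slow job; here it only records the call."""
    sent.append(user_id)


def worker():
    while True:
        user_id = tasks.get()
        try:
            if user_id is None:  # sentinel value: stop the worker
                return
            send_welcome_email(user_id)
        finally:
            tasks.task_done()


def handle_request(user_id):
    """Hypothetical view: enqueue the job and respond immediately."""
    tasks.put(user_id)
    return "accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
responses = [handle_request(uid) for uid in (1, 2, 3)]
tasks.put(None)
tasks.join()  # wait until the worker has drained the queue
```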
  • 49. Summary • Understand where queries are coming from • Optimise where necessary, within Django or in the database • and...

Editor's Notes

  1. (background: montage of Limmud, rosemanblog, Capital, Classic, Heart, GlassesDirect)
  2. Some of same ideas in Guido's Appstats talk this morning
  3. It's a model, in a field, geddit?
  4. For more, see Marty Alchin, Pro Django (Apress)
  5. descriptors used especially in related objects - see later
  6. Very useful for introspection and working out what's going on
  7. explain identity: multiple instances relating to same model row aren't the same object, changes made to one don't reflect the other; even saving one with new values won't be reflected in others.
  8. Update, Aggregates, Q, F
  9. Find repeated queries with my branch of the django-debug-toolbar, or SimonW's original query debug middleware
  10. Actually in 1.2 there's an extra _state object in __dict__, which is used for the multiple DB support (which I'm not covering here).
  11. Lack of model identity means that accessing the related item on one instance does not cause cache to be created on other instances that might reference the same db row
  12. Note: backwards cache does work on OneToOne as of 1.2
  13.
      +----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+
      | id | select_type        | table     | type            | possible_keys | key     | key_len | ref  | rows | Extra       |
      +----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+
      |  1 | PRIMARY            | sandy_bar | ALL             | NULL          | NULL    | NULL    | NULL |  100 | Using where |
      |  2 | DEPENDENT SUBQUERY | U0        | unique_subquery | PRIMARY       | PRIMARY | 4       | func |    1 | Using where |
      +----+--------------------+-----------+-----------------+---------------+---------+---------+------+------+-------------+