Brief introduction to message queues and how they are relevant in web applications
How to tell if your web application could benefit from a message queue
Common examples of tasks that could benefit from message queues
Choosing a broker/protocol
What broker/protocol PBS Education chose and why
Message queue solution architecture
Brief introduction on celery/carrot
Writing a message queue task using celery
How to invoke a message queue task
What happens when you invoke a task (walk through architecture)
How to write tasks efficiently
What are the things that are good to know when writing tasks (things we experienced at PBS Education)
Life in a Queue - Using Message Queue with django
1. Life in a Queue
Tareque Hossain
Education Technology
2. What is Message Queue?
• Message Queues are:
o Communication Buffers
o Between independent sender & receiver processes
o Asynchronous
• Time of sending not necessarily same as receiving
• In context of Web Applications:
o Sender: Web Application Servers
o Receiver: Background worker processes
o Queue items: Tasks that the web server doesn’t have
time/resources to do
3.
4. Inside a Message Queue
[Diagram: Web App Servers, an Enqueue Manager, a Message Queue Broker holding tasks T1 to T7 in queues Q1 and Q2, a Dequeue Manager, and Worker Servers]
5. How does it work?
• Say a web application server has a task it
doesn’t have time to do
• It puts the task in the message queue
• Other web servers can access the same queue(s)
and put tasks there
• Queues are FIFO (First In First Out)
• Workers are greedy and they all watch the
queues for tasks
• Workers asynchronously pick up the first
available task on the queue when they are ready
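The steps above can be sketched with nothing but the Python standard library. This is a toy model, not how a real broker works: an in-process `queue.Queue` stands in for the broker, threads stand in for worker servers, and the names `run_workers` and `worker` are made up for the illustration.

```python
import queue
import threading


def run_workers(tasks, num_workers=3):
    """Enqueue tasks in order and let worker threads drain the queue."""
    task_queue = queue.Queue()  # FIFO buffer standing in for the broker
    results = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                # Take the first available task, if any
                task_id, func, arg = task_queue.get_nowait()
            except queue.Empty:
                return  # queue is drained, worker stops
            value = func(arg)
            with lock:
                results[task_id] = (name, value)

    # The "web servers" put their tasks on the shared queue ...
    for task in tasks:
        task_queue.put(task)

    # ... and the greedy workers pick them up as soon as they are ready.
    threads = [
        threading.Thread(target=worker, args=("worker-%d" % i,))
        for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Seven hypothetical tasks (T1..T7), each squaring a number
tasks = [("T%d" % n, lambda x: x * x, n) for n in range(1, 8)]
results = run_workers(tasks)
for task_id in sorted(results):
    print(task_id, results[task_id])
```

Which worker ends up with which task varies from run to run; only the order in which tasks leave the queue is fixed.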
6. Do I need Message Queues?
• Message Queues are useful in certain
situations
• General guidelines:
o Does your web application take more than a
few seconds to generate a response?
o Are you using a lot of cron jobs to process data
in the background?
o Do you wish you could distribute the processing
of the data generated by your application among
many servers?
7. Wait I’ve heard Asynchronous before!
• Yes. AJAX is an asynchronous communication
method between client & server
• Some of the response time issues can be solved:
o With AJAX responses that continually enhance the
initial response
o Only if the AJAX responses also complete within a
reasonable amount of time
• You need Message Queues when:
o Long processing times can’t be avoided in generating
responses
o You want application data to be continuously processed
in the background and readily available when requested
8. MQ Tasks: Processing User Uploads
• Resize uploaded image to generate different
resolutions of images, avatars, gallery snapshots
• Reformat videos to match your player
requirements
• YouTube, Facebook, Slideshare are good examples
9. MQ Tasks: Generate Reports
• Generating reports from large amounts of data
o Reports that contain graphical charts
o Multiple reports that cross-reference each other
10. MQ Tasks: 3rd Party Integrations
• Bulk processing of 3rd party service requests
o Refund hundreds of transactions using PayPal
o Any kind of data synchronization
o Aggregation of RSS/other feeds
[Diagram: Social Network Feed Aggregator]
11. MQ Tasks: Cron Jobs
• Any cron job that is not time sensitive
o Asynchronous behavior of message queue doesn’t
guarantee execution of tasks on the dot
o Jobs in cron that should be done as soon as resources
become available are good candidates
14. OMG That’s too much!
• Yeah. I agree.
• Read the detailed research notes on the Second Life dev site
o http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes
• Let’s simplify. How do we choose?
o How is the exception handling and recovery?
o Is maintenance relatively low?
o How easy is deployment?
o Are the queues persistent?
o How is the community support?
o What language is it written in? How compatible is that
with our current systems?
o How detailed is the documentation?
15. Choice of PBS Education
• We chose AMQP & RabbitMQ
• Why?
o We don’t expect message volumes as high as 1M or
more at a time
o RabbitMQ is free to use
o The documentation is decent
o There is decent clustering support, even though we never
needed clustering
o We didn’t want to lose queues or messages upon broker
crash/ restart
o We develop applications using Python/django and
setting up an AMQP backend using celery/kombu was
easy
16. Message Queue Solution Stack
[Diagram: the Web Application Server and the Worker each sit on a Celery layer over PyAMQPlib/Kombu, with the RabbitMQ Queue between them]
17. Celery? Kombu? Yummy.
• django made web development using Python a
piece of cake
• Celery & Kombu make using message queue in
your django/Python applications a piece of cake
• Kombu
o AMQP based Messaging Framework for Python,
powered by PyAMQPlib
o Provides fundamentals for creating queues, configuring
the broker, sending and receiving messages
• Celery
o Distributed task queue management application
18. Celery Backends
• Celery is very, very powerful
• You can use celery to emulate a message queue
broker by using a DB backend as the broker
o Involves polling & less efficient than AMQP
o Use for local development
• Bundled broker backends
o amqplib, pika, redis, beanstalk, sqlalchemy, django,
mongodb, couchdb
• The broker backend is different from the task & task result
store backend
o The latter is used by celery to store the results of a task, or its errors if it failed
19. A Problem with a View
• What is wrong with this view?
    def create_report(request):
        # ... Code for extracting parameters from request ...
        # ... Code for generating report from lots of data ...
        return render_to_response('profiles/index.html', {
            'report': report,
        }, context_instance=RequestContext(request))
21. Let's Write a Celery Task
• Writing celery tasks was never any more difficult
than this:
    import celery

    @celery.task()
    def generate_report(*args, **kwargs):
        # ... Code for generating report ...
        report.save()
22. Let's Write a Celery Task II
• If you want to customize your tasks, inherit from
the base Task object
    from celery.task.base import Task

    class GenerateReport(Task):

        def __init__(self, *args, **kwargs):
            # ... Custom init code ...
            return super(GenerateReport, self).__init__(*args, **kwargs)

        def run(self, *args, **kwargs):
            # ... Code for generating report ...
            report.save()
23. Issuing a task
• After writing a task, we issue the task from within
a request in the following way:
    def create_report(request):
        # ... Code for extracting parameters from request ...
        generate_report.delay(**params)
        # or GenerateReport.delay(**params)
        messages.success(request, 'You will receive an email when report generation is complete.')
        return HttpResponseRedirect(reverse('reports_index'))
24. What happens when you issue tasks?
[Diagram: a Request Handler on the Application Server, Celery, the Broker Queue, and three Celery Workers]
25. Understanding Queue Routing
• Brokers contain multiple virtual hosts
• Each virtual host contains multiple exchanges
• Messages are sent to exchanges
o Exchanges are hubs that connect to a set of queues
• An exchange routes messages to one or more
queues
[Diagram: Queue, Exchange, VHost]
26. Understanding Queue Routing
• In Celery configurations:
o binding_key binds a task namespace to a queue
o exchange defines the name of an exchange
o routing_key defines which queue a message should be
directed to under a certain exchange
o exchange_type = ‘direct’ routes for exact routing keys
o exchange_type = ‘topic’ routes for namespaced &
wildcard routing keys
• * (matches a single word)
• # (matches zero or more words)
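To make the wildcard rules concrete, here is a small pure-Python sketch of topic-style matching. It only illustrates the rules listed above; it is not code from the broker or from celery, and the function name `topic_matches` is made up for the example.

```python
def topic_matches(binding_key, routing_key):
    """Return True if a topic-style binding key matches a routing key.

    Keys are dot-separated words. In the binding key, '*' stands for
    exactly one word and '#' stands for zero or more words.
    """
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p, w):
        if p == len(pattern):
            # Pattern used up: match only if every word was consumed too
            return w == len(words)
        if pattern[p] == "#":
            # '#' may swallow zero or more of the remaining words
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False  # pattern still expects a word, none left
        if pattern[p] == "*" or pattern[p] == words[w]:
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


print(topic_matches("video.*", "video.compress"))
print(topic_matches("video.*", "video.compress.hd"))
print(topic_matches("video.#", "video.compress.hd"))
```

A 'direct' exchange corresponds to the special case where the binding key contains no wildcards, so only an exact routing key matches.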
28. Quick Tips
    # Route a task
    mytask.apply_async(
        args=[filename],
        routing_key="video.compress"
    )
    # Or define task mapping in CELERY_ROUTES setting

    # Set expiration for a task (in seconds)
    mytask.apply_async(args=[10, 10], expires=60)

    # Revoke a task using the task instance
    result = mytask.apply_async(args=[2, 2], countdown=120)
    result.revoke()

    # Or save the task ID (result.task_id) somewhere
    from celery.task.control import revoke
    revoke(task_id)
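For the CELERY_ROUTES option mentioned above, a settings fragment might look roughly like the following. This is a sketch that assumes the old-style uppercase settings from the celery releases of this deck's era; the queue names, binding keys, and the task path `myapp.tasks.compress_video` are hypothetical, and newer celery versions name these settings differently, so check the documentation for your version.

```python
# Hypothetical settings.py fragment; all names below are illustrative.
CELERY_DEFAULT_QUEUE = "default"
CELERY_DEFAULT_EXCHANGE = "tasks"
CELERY_DEFAULT_EXCHANGE_TYPE = "topic"
CELERY_DEFAULT_ROUTING_KEY = "task.default"

# Each queue is bound to the exchange with a binding key
CELERY_QUEUES = {
    "default": {"binding_key": "task.#"},
    "video": {"binding_key": "video.#"},
}

# Map a task name to the queue and routing key it should use
CELERY_ROUTES = {
    "myapp.tasks.compress_video": {
        "queue": "video",
        "routing_key": "video.compress",
    },
}
```

With a mapping like this, the view code does not need to pass routing_key on every call.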
29. Quick Tips
• Execute task as a blocking call using:
    generate_report.apply(kwargs=params, **options)
• Avoid issuing tasks inside an asynchronous task
that waits on children data (blocking)
o Write re-usable pieces of code that can be called as
functions instead of called as tasks
o If necessary, use the callback + subtask feature of celery
• Ignore results if you don’t need them
o If your asynchronous task doesn’t return anything
    @celery.task(ignore_result=True)
30. Good to know
• Do check whether your task parameters are
serializable
o WSGI request objects are not serializable
o Don’t pass request as a parameter for your task
• Don’t pass unnecessary data in task
parameters
o They have to be stored until the task is complete
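One way to check the points above before issuing a task is to try serializing the parameters yourself. The sketch below assumes tasks are serialized with pickle; `is_serializable` and `FakeRequest` are hypothetical names, and `FakeRequest` is only a stand-in for a real request object that holds live resources.

```python
import pickle
import threading


def is_serializable(params):
    """Return True if the task parameters survive a pickle round trip."""
    try:
        return pickle.loads(pickle.dumps(params)) == params
    except Exception:
        # Any failure to pickle or unpickle means the params are unsafe
        return False


class FakeRequest(object):
    """Hypothetical stand-in for a request object holding live resources."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.lock = threading.Lock()  # locks, sockets, file handles cannot be pickled


request = FakeRequest(user_id=42)

# Passing the whole request object fails ...
bad_params = {"request": request}
# ... so pass only the small, plain values the task actually needs.
good_params = {"user_id": request.user_id, "report_type": "monthly"}

print(is_serializable(bad_params))
print(is_serializable(good_params))
```

Passing IDs instead of whole objects also keeps the stored message small, which is the second point on this slide.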
31. Good to know
• Avoid starvation of tasks using multiple
queues
o If really long video re-formatting tasks are processed
in the same queue as relatively quicker thumbnail
generation tasks, the latter may starve
o Only available when using AMQP broker backend
• Use celerybeat for time sensitive repeated
tasks
o Can replace time sensitive cron jobs related to your web
application
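The starvation point above can be illustrated with some simple arithmetic. This is a toy model under stated assumptions: each queue is drained by exactly one worker in FIFO order, and the task names and durations are made up. Note that the split setup also uses two workers instead of one.

```python
def wait_times(queue_tasks):
    """Return how long each task waits before it starts.

    queue_tasks is a list of (name, duration_in_seconds) in arrival
    order, drained by a single worker in FIFO order.
    """
    waits = {}
    clock = 0
    for name, duration in queue_tasks:
        waits[name] = clock  # task starts when everything ahead of it is done
        clock += duration
    return waits


# Hypothetical workload: two long video jobs arrive before two thumbnails
videos = [("video-1", 600), ("video-2", 900)]
thumbs = [("thumb-1", 2), ("thumb-2", 2)]

# One shared queue: thumbnails sit behind both videos
shared = wait_times(videos + thumbs)

# Separate queues, each with its own worker
separate = {}
separate.update(wait_times(videos))
separate.update(wait_times(thumbs))

print("shared queue:   ", shared["thumb-1"], "seconds before thumb-1 starts")
print("separate queues:", separate["thumb-1"], "seconds before thumb-1 starts")
```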
32. Q&A
• Slides available at:
o http://www.slideshare.net/tarequeh
• Extensive guides & documentation available at:
o http://ask.github.com/celery/