Advanced Task Management in Celery


           Mahendra M
           @mahendra
    https://github.com/mahendra
@mahendra
●   Python developer for 6 years
●   FOSS enthusiast/volunteer for 14 years
    ●   Bangalore LUG and Infosys LUG
    ●   FOSS.in and LinuxBangalore/200x
●   Celery user for 3 years
●   Contributions
    ●   patches, testing new releases
    ●   Zookeeper msg transport for kombu
    ●   Kafka support (in-progress)
Quick Intro to Celery
●   Asynchronous task/job queue
●   Uses distributed message passing
●   Tasks are run asynchronously on worker nodes
●   Results are passed back to the caller (if any)
Overview

                    +--> Worker 1
                    |
Sender --> Msg Q ---+--> Worker 2
                    |       .
                    |       .
                    +--> Worker N
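The sender, queue and worker flow in the overview can be modeled in plain Python. This is only a sketch using threads and an in-process queue, not how Celery is implemented; `worker_loop` and `send` are hypothetical names chosen for illustration.

```python
import queue
import threading
from concurrent.futures import Future


def worker_loop(msg_q):
    """Pull (func, args, future) messages until a None sentinel arrives."""
    while True:
        msg = msg_q.get()
        if msg is None:
            break
        func, args, future = msg
        try:
            future.set_result(func(*args))
        except Exception as exc:
            future.set_exception(exc)


def send(msg_q, func, *args):
    """Sender side: enqueue the call and return a handle for the result."""
    future = Future()
    msg_q.put((func, args, future))
    return future


def add(x, y):
    return x + y


msg_q = queue.Queue()
workers = [threading.Thread(target=worker_loop, args=(msg_q,)) for _ in range(3)]
for w in workers:
    w.start()

result = send(msg_q, add, 5, 6)      # returns immediately
value = result.result(timeout=5)     # blocks until a worker has run the task

for _ in workers:
    msg_q.put(None)                  # one shutdown sentinel per worker
for w in workers:
    w.join()
```

The sender never calls `add` itself; it only holds a handle, which is the same shape as `add.delay(...)` followed by `result.get()`.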
Sample Code
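from celery.task import task


@task
def add(x, y):
    return x + y


# Send the task to a worker; returns immediately with a result handle
result = add.delay(5, 6)
# Block until the worker has finished and fetch the return value
result.get()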
Uses of Celery
●   Asynchronous task processing
●   Handling long running / heavy jobs
    ●   Image resizing, video transcode, PDF generation
●   Offloading heavy web backend operations
●   Scheduling tasks to be run at a particular time
    ●   Cron for Python
Advanced Uses
●   Task Routing
●   Task retries, timeout and revoking
●   Task Canvas – combining tasks
    ●   Task co-ordination
    ●   Dependencies
    ●   Task trees or graphs
    ●   Batch tasks
    ●   Progress monitoring
●   Tricks
    ●   DB conflict management
Sending tasks to a particular worker

                 +-- windows --> Worker 1 (Windows)
                 |
Sender --> Msg Q +-- windows --> Worker 2 (Windows)
                 |                  .
                 |                  .
                 +-- linux ----> Worker N (Linux)
Routing tasks – Use cases
●   Priority execution
●   Based on hardware capabilities
    ●   Special cards available for video capture
    ●   Making use of GPUs (CUDA)
●   Based on OS (e.g. PlayReady encryption)
●   Based on location
    ●   Moving compute closer to data (Hadoop-ish)
    ●   Sending tasks to different data centers
●   Sequencing operations (CouchDB conflicts)
Sample Code
from celery.task import task


@task(queue = 'windows')
def drm_encrypt(audio_file, key_phrase):
   ...


# The queue can also be chosen (or overridden) per call
r = drm_encrypt.apply_async( args = [afile, key],
                               queue = 'windows' )


# Start a celery worker that consumes only the 'windows' queue
$ celery worker -Q windows
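The effect of queue-based routing can be sketched without a broker. The `Broker` class, the `drain` helper and the `resize_image` task name below are hypothetical, and the model ignores everything a real message queue does beyond keeping one FIFO per queue name.

```python
from collections import defaultdict, deque


class Broker:
    """Toy broker: one FIFO per named queue."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, queue_name, task_name):
        self.queues[queue_name].append(task_name)

    def consume(self, subscribed):
        """Return the next task from any subscribed queue, or None."""
        for name in subscribed:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


def drain(broker, subscribed):
    """Collect every task a worker with these subscriptions would receive."""
    tasks = []
    while True:
        task = broker.consume(subscribed)
        if task is None:
            return tasks
        tasks.append(task)


broker = Broker()
broker.publish("windows", "drm_encrypt")
broker.publish("linux", "resize_image")
broker.publish("windows", "drm_encrypt")

# A worker started with "-Q windows" only ever sees the windows queue.
windows_worker = drain(broker, ["windows"])
linux_worker = drain(broker, ["linux"])
```

Because a worker only reads the queues it subscribed to, putting a task on the `windows` queue is enough to guarantee it lands on a Windows machine.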
Retrying tasks
@task( default_retry_delay = 60,
      max_retries = 3 )
def drm_encrypt(audio_file, key_phrase):
   try:
       playready.encrypt(...)
   except Exception as exc:
       # countdown overrides default_retry_delay for this retry
       raise drm_encrypt.retry(exc=exc, countdown=5)
Retrying tasks
●   You can specify the number of times a task can
    be retried.
●   The cases for retrying a task must be handled
    within code. Celery will not do it automatically
●   The tasks should be designed to be idempotent
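The points above (a bounded number of retries, retry logic written in the task code, idempotent tasks) can be sketched in plain Python. `run_with_retries`, `RetryLimitExceeded` and `encrypt_once` are hypothetical names, the dictionary stands in for persistent storage, and retry delays are left out to keep the example short.

```python
class RetryLimitExceeded(Exception):
    """Raised when the task still fails after the allowed retries."""


def run_with_retries(func, max_retries=3):
    """Call func; on failure retry up to max_retries more times."""
    attempts = 0
    while True:
        try:
            return func()
        except Exception as exc:
            if attempts >= max_retries:
                raise RetryLimitExceeded(str(exc)) from exc
            attempts += 1


encrypted = {}            # stands in for persistent storage
calls = {"count": 0}


def encrypt_once(file_id="song-1"):
    """Idempotent: a repeated call for the same file does no extra work."""
    calls["count"] += 1
    if file_id in encrypted:
        return encrypted[file_id]
    if calls["count"] < 3:
        raise IOError("transient failure")   # first two attempts fail
    encrypted[file_id] = "enc(" + file_id + ")"
    return encrypted[file_id]


first = run_with_retries(encrypt_once, max_retries=3)
second = run_with_retries(encrypt_once, max_retries=3)   # duplicate delivery
```

The second call models the same task being delivered twice: because the task checks for existing output first, the duplicate returns the stored result instead of encrypting again.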
Handling worker failures
@task( acks_late = True )
def drm_encrypt(audio_file, key_phrase):
     try:
          playready.encrypt(...)
     except Exception as exc:
          raise drm_encrypt.retry(exc=exc, countdown=5)



●   Used where the task must be resent in case of
    worker or node failure
●   The ack message is sent to the message queue only after
    the task finishes executing
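The difference between acknowledging early and acknowledging late can be shown with a small model. `deliver` and `WorkerCrash` are hypothetical names; the function only tracks whether a message was acknowledged before the simulated worker died.

```python
class WorkerCrash(Exception):
    """Stands in for a worker or node dying mid-task."""


def deliver(message, handler, acks_late):
    """Return True if the broker must redeliver the message.

    Early ack: the message is acknowledged before the handler runs, so a
    crash mid-task loses it. Late ack: it is acknowledged only after the
    handler returns, so a crash leaves it unacknowledged.
    """
    acked = not acks_late            # early ack happens before execution
    try:
        handler(message)
        acked = True                 # late ack happens after execution
    except WorkerCrash:
        pass
    return not acked


def crashing_handler(message):
    raise WorkerCrash()


def ok_handler(message):
    return None


lost_with_early_ack = not deliver("job", crashing_handler, acks_late=False)
redelivered_with_late_ack = deliver("job", crashing_handler, acks_late=True)
redelivered_after_success = deliver("job", ok_handler, acks_late=True)
```

With late acknowledgement a task can run more than once, because a crash near the end of the task still triggers redelivery. That is why it pairs with idempotent tasks.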
Worker processes

                 +-- windows --> Worker 1 (Windows)
                 |
Sender --> Msg Q +-- windows --> Worker 2 (Windows)
                 |                  .
                 |                  .
                 +-- linux ----> Worker N (Linux)
                                    |-- Process 1
                                    |-- Process 2
                                    |      .
                                    +-- Process N
Worker process
●   In every worker node, celery starts a pool of
    worker processes
●   The number is determined by the concurrency
    setting (or autodetected – for full CPU usage)
●   Each process can be configured to restart
    after running x number of tasks
    ●   Disabled by default
●   Alternatively, eventlet can be used instead of
    processes (discussed later)
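The "restart after x tasks" behavior can be illustrated with a toy pool. `RecyclingPool` is a hypothetical class that hands out tasks round-robin and replaces a slot once it has run its quota; it uses no real processes and is not Celery's pool.

```python
import itertools


class RecyclingPool:
    """Toy pool: each slot is replaced after max_tasks_per_child tasks."""

    def __init__(self, concurrency, max_tasks_per_child=None):
        self._ids = itertools.count(1)
        self.max_tasks = max_tasks_per_child      # None disables recycling
        self.slots = [self._new_child() for _ in range(concurrency)]
        self._next = 0

    def _new_child(self):
        return {"id": next(self._ids), "done": 0}

    def run(self, func, *args):
        """Run func on the next slot; return (child id, result)."""
        i = self._next
        self._next = (self._next + 1) % len(self.slots)
        child = self.slots[i]
        result = func(*args)
        child["done"] += 1
        if self.max_tasks is not None and child["done"] >= self.max_tasks:
            self.slots[i] = self._new_child()     # fresh child, fresh memory
        return child["id"], result


pool = RecyclingPool(concurrency=2, max_tasks_per_child=2)
child_ids = [pool.run(abs, -1)[0] for _ in range(6)]

no_recycle = RecyclingPool(concurrency=2)         # the default: never restart
stable_ids = [no_recycle.run(abs, -1)[0] for _ in range(6)]
```

Recycling is mostly useful for tasks that leak memory or other resources, since each replacement starts from a clean state.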
Revoking tasks
celery.control.revoke( task_id,
                        terminate = False,
                        signal = 'SIGKILL' )
●   revoke() works by sending a broadcast
    message to all workers
●   If a task has not yet run, workers will keep this
    task_id in memory and ensure that it does not
    run
●   If a task is running, revoke() will not work
    unless terminate = True
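The behavior for tasks that have not yet run can be sketched as follows. The `Worker` class and `broadcast_revoke` function are hypothetical; the model covers only the in-memory set of revoked ids, not terminating a task that is already executing.

```python
class Worker:
    """Toy worker that remembers revoked task ids in memory."""

    def __init__(self):
        self.revoked = set()
        self.executed = []

    def on_revoke(self, task_id):
        self.revoked.add(task_id)

    def on_task(self, task_id, func):
        """Run the task unless its id was revoked; report whether it ran."""
        if task_id in self.revoked:
            return False             # discard without running
        func()
        self.executed.append(task_id)
        return True


def broadcast_revoke(workers, task_id):
    """Every worker receives the revoke, whichever one gets the task later."""
    for w in workers:
        w.on_revoke(task_id)


workers = [Worker(), Worker()]
broadcast_revoke(workers, "task-42")
ran_revoked = workers[1].on_task("task-42", lambda: None)
ran_other = workers[1].on_task("task-43", lambda: None)
```

The broadcast is needed because the sender cannot know in advance which worker will pick the task up.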
Task expiration
task.apply_async( expires = x )
        x can be
        * in seconds
        * a specific datetime()


●   Global time limits can be configured in settings
    ●   Soft time limit – the task receives an exception
        which can be used to clean up
    ●   Hard time limit – the worker process running the task is
        killed and replaced with a new one
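The two accepted forms of `expires` can be reduced to one absolute deadline. `resolve_expiry` and `should_run` are hypothetical helpers, and treating a task picked up exactly at the deadline as still valid is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone


def resolve_expiry(expires, now):
    """Turn `expires` (seconds or datetime) into an absolute datetime."""
    if isinstance(expires, datetime):
        return expires
    return now + timedelta(seconds=expires)


def should_run(expires_at, received_at):
    """A worker discards a message it receives after the expiry time."""
    return received_at <= expires_at


sent = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
by_seconds = resolve_expiry(30, sent)
by_datetime = resolve_expiry(
    datetime(2024, 1, 1, 12, 5, 0, tzinfo=timezone.utc), sent
)

picked_up = sent + timedelta(seconds=45)          # worker was busy for 45s
runs_with_30s_expiry = should_run(by_seconds, picked_up)
runs_with_datetime_expiry = should_run(by_datetime, picked_up)
```

Expiry covers time spent waiting in the queue, while the soft and hard time limits cover time spent executing.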
Handling soft time limit
from celery.exceptions import SoftTimeLimitExceeded

@task()
def drm_encrypt(audio_file, key_phrase):
    try:
        setup_tmp_files()
        playready.encrypt(...)
    except SoftTimeLimitExceeded:
        # Soft time limit reached: remove temporary files before exiting
        cleanup_tmp_files()
    except Exception as exc:
        raise drm_encrypt.retry(exc=exc, countdown=5)
Task Canvas
●   Chains – Linking one task to another
●   Groups – Execute several tasks in parallel
●   Chord – execute a task after a set of tasks has
    finished
●   Map and starmap – Similar to map() function
●   Chunks – divide an iterable of work into chunks
●   Chunks + Chord/chain can be used for map-
    reduce
                Best shown in a demo
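In place of a live demo, what each canvas primitive computes can be written as ordinary functions. These are not Celery's `chain`, `group`, `chord` or `chunks` objects: they are hypothetical same-named stand-ins that run sequentially in one process, to show the data flow only.

```python
from functools import reduce


def chain(*funcs):
    """Each function receives the previous function's result."""
    def run(value):
        return reduce(lambda acc, f: f(acc), funcs, value)
    return run


def group(calls):
    """Run independent calls; sequential here, parallel on real workers."""
    return [f(*args) for f, args in calls]


def chord(calls, callback):
    """Run the group, then pass the list of results to the callback."""
    return callback(group(calls))


def chunks(items, size):
    """Split a list of work into pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def add(x, y):
    return x + y


chained = chain(lambda v: add(v, 4), lambda v: v * 8)(2)     # (2 + 4) * 8
grouped = group([(add, (i, i)) for i in range(4)])

# Map-reduce: sum each chunk (map), then sum the partial sums (reduce).
pieces = chunks(list(range(10)), 3)
total = chord([(sum, (piece,)) for piece in pieces], sum)
```

The last three lines are the map-reduce pattern from the final bullet: chunks split the input, the group maps over the pieces, and the chord callback reduces the partial results.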
Task trees

[ task 1 ] --- spawns --- [ task 2 ] ---- spawns -->   [ task 2_1 ]
                  |                                    [ task 2_3 ]
                  |
                  +------ [ task 3 ] ---- spawns -->   [ task 3_1 ]
                  |                                    [ task 3_2 ]
                  |
                  +------ [ task 4 ] ---- links ---> [ task 5 ]
                                                         |(spawns)
                                                         |
                                                         |
                          [ task 8 ] <--- links <--- [ task 6 ]
                                                         |(spawns)
                                                     [ task 7 ]
Task Trees
●   Home grown solution (our current approach)
    ●   Use db models and keep track of trees
●   Better approach
    ●   Use celery-tasktree
    ●   http://pypi.python.org/pypi/celery-tasktree
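The home-grown approach of tracking trees amounts to recording parent and child links plus a status per task. The sketch below keeps that in memory with a hypothetical `TaskNode` class; a real version would store the same fields in db models, and this is not the celery-tasktree API.

```python
class TaskNode:
    """One task in the tree, with the tasks it spawned as children."""

    def __init__(self, name):
        self.name = name
        self.done = False
        self.children = []

    def spawn(self, name):
        child = TaskNode(name)
        self.children.append(child)
        return child

    def tree_done(self):
        """True once this task and every descendant has finished."""
        return self.done and all(c.tree_done() for c in self.children)

    def pending(self):
        """Names of unfinished tasks in this subtree, depth first."""
        names = [] if self.done else [self.name]
        for c in self.children:
            names.extend(c.pending())
        return names


root = TaskNode("task 1")
t2 = root.spawn("task 2")
t3 = root.spawn("task 3")
t2_1 = t2.spawn("task 2_1")

for node in (root, t2, t3):
    node.done = True
done_before = root.tree_done()       # task 2_1 is still running
still_pending = root.pending()

t2_1.done = True
done_after = root.tree_done()
```

Checking the whole subtree matters because a parent task usually finishes long before the tasks it spawned.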
Celery Batches
●   Collect jobs and execute them in a batch.
●   Can be used for stats collection
●   Batch execution is done once
    ●   a configured timeout is reached OR
    ●   a configured number of tasks have been received
●   Useful for reducing network and DB load
Celery Batches
from celery.task import task
from celery.contrib.batches import Batches


@task( base=Batches, flush_every=50, flush_interval=10 )
def collect_stats( requests ):
    items = {}
    for request in requests:
        item_id = request.kwargs['item_id']
        # Fetch each object once, even if it appears several times in the batch
        if item_id not in items:
            items[ item_id ] = get_obj( item_id )
        items[ item_id ].count += 1
    # Sync to db


collect_stats.delay( item_id = 45 )
collect_stats.delay( item_id = 57 )
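The flush rule (a count is reached or a timeout elapses) can be modeled on its own. `Batcher` is a hypothetical class, the clock is injected so the example is deterministic, and the timing details are an assumption rather than a description of how `Batches` schedules its flushes.

```python
class Batcher:
    """Buffer requests; flush on size or on elapsed time, whichever first."""

    def __init__(self, handler, flush_every, flush_interval, clock):
        self.handler = handler
        self.flush_every = flush_every
        self.flush_interval = flush_interval
        self.clock = clock
        self.buffer = []
        self.last_flush = clock()

    def add(self, request):
        self.buffer.append(request)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def tick(self):
        """Called periodically; flushes if the interval has elapsed."""
        if self.buffer and self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        batch, self.buffer = self.buffer, []
        self.last_flush = self.clock()
        self.handler(batch)


now = [0.0]                          # fake clock, advanced by hand
batches = []
b = Batcher(batches.append, flush_every=3, flush_interval=10,
            clock=lambda: now[0])

for item_id in (45, 57, 45):
    b.add(item_id)                   # the third add reaches flush_every
b.add(99)
now[0] = 5.0
b.tick()                             # only 5s since the last flush: no flush
now[0] = 12.0
b.tick()                             # 12s since the last flush: flush the rest
```

The handler is called twice for four requests, which is where the saving in network and database round trips comes from.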
Celery monitoring
●   Celery Flower
    https://github.com/mher/flower
●   Django admin monitor
●   Celery jobtastic
    http://pypi.python.org/pypi/jobtastic
Celery deployment
●   Cyme – celery instance manager
    https://github.com/celery/cyme
●   Celery autoscaling
●   Use celery eventlet where required

Advanced task management with Celery

  • 1. Advanced Task Management in Celery Mahendra M @mahendra https://github.com/mahendra
  • 2. @mahendra ● Python developer for 6 years ● FOSS enthusiast/volunteer for 14 years ● Bangalore LUG and Infosys LUG ● FOSS.in and LinuxBangalore/200x ● Celery user for 3 years ● Contributions ● patches, testing new releases ● Zookeeper msg transport for kombu ● Kafka support (in-progress)
  • 3. Quick Intro to Celery ● Asynchronous task/job queue ● Uses distributed message passing ● Tasks are run asynchronously on worker nodes ● Results are passed back to the caller (if any)
  • 4. Overview Worker 1 Worker 2 Sender Msg Q . . . Worker N
  • 5. Sample Code from celery.task import task @task def add(x, y): return x + y result = add.delay(5,6) result.get()
  • 6. Uses of Celery ● Asynchronous task processing ● Handling long running / heavy jobs ● Image resizing, video transcode, PDF generation ● Offloading heavy web backend operations ● Scheduling tasks to be run at a particular time ● Cron for python
  • 7. Advanced Uses ● Task Routing ● Task retries, timeout and revoking ● Task Canvas – combining tasks ● Task co-ordination ● Dependencies ● Task trees or graphs ● Batch tasks ● Progress monitoring ● Tricks ● DB conflict management
  • 8. Sending tasks to a particular worker Worker 1 (Windows) windows Worker 2 windows (Windows) Sender Msg Q . linux . . Worker N (Linux)
  • 9. Routing tasks – Use cases ● Priority execution ● Based on hardware capabilities ● Special cards available for video capture ● Making use of GPUs (CUDA) ● Based on OS (for eg. Playready encryption) ● Based on location ● Moving compute closer to data (Hadoop-ish) ● Sending tasks to different data centers ● Sequencing operations (CouchDB conflicts)
  • 10. Sample Code from celery.task import task @task(queue = 'windows') def drm_encrypt(audio_file, key_phrase): ... r = drm_encrypt.apply_async( args = [afile, key], queue = 'windows' ) #Start celery worker with queues options $ celery worker -Q windows
  • 11. Retrying tasks
        @task(default_retry_delay=60, max_retries=3)
        def drm_encrypt(audio_file, key_phrase):
            try:
                playready.encrypt(...)
            except Exception as exc:
                raise drm_encrypt.retry(exc=exc, countdown=5)
  • 12. Retrying tasks ● You can specify the number of times a task can be retried. ● The cases for retrying a task must be handled in the task's code. Celery will not retry automatically. ● The tasks should be designed to be idempotent.
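The two points above (retries are requested explicitly by the task, and the task should be idempotent) can be sketched in plain Python. This is not Celery's implementation: `Retry`, `run_with_retries`, `store` and `failures_left` are hypothetical names used only for illustration.

```python
class Retry(Exception):
    """Raised by a task to ask the runner for another attempt (hypothetical)."""


def run_with_retries(task, args, max_retries=3):
    """Call task(*args); re-run it each time it raises Retry, up to max_retries."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(*args), attempts
        except Retry as exc:
            if attempts > max_retries:
                # Retries exhausted: surface the original failure.
                raise exc.__cause__ or exc


store = {}           # stands in for persistent output storage
failures_left = [2]  # make the first two attempts fail


def encrypt(audio_file, key):
    # Idempotent write: re-running overwrites the same key with the same value.
    store[audio_file] = f"{audio_file}:{key}"
    if failures_left[0] > 0:
        failures_left[0] -= 1
        try:
            raise IOError("license server unavailable")
        except IOError as exc:
            # The task itself decides which errors are worth retrying.
            raise Retry() from exc
    return store[audio_file]


result, attempts = run_with_retries(encrypt, ("song.wma", "secret"))
```

Because the write is keyed by file name, the two failed attempts leave no duplicate or partial state behind when the third attempt succeeds.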
  • 13. Handling worker failures
        @task(acks_late=True)
        def drm_encrypt(audio_file, key_phrase):
            try:
                playready.encrypt(...)
            except Exception as exc:
                raise drm_encrypt.retry(exc=exc, countdown=5)
      ● This is used where the task must be resent in case of worker or node failure ● The ack message to the message queue is sent after the task finishes executing
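Why the timing of the ack matters can be shown with a toy broker. This is a sketch under simplifying assumptions, not how any real broker or Celery is implemented: `Broker`, `WorkerCrash` and `consume` are hypothetical, and a worker crash is modelled as the broker requeueing every unacknowledged message.

```python
from collections import deque


class Broker:
    """Toy broker: a delivered message stays 'unacked' until acknowledged."""

    def __init__(self):
        self.ready = deque()
        self.unacked = {}
        self._next_tag = 0

    def publish(self, body):
        self.ready.append(body)

    def deliver(self):
        body = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        del self.unacked[tag]

    def connection_lost(self):
        """Requeue everything the dead worker never acknowledged."""
        self.ready.extend(self.unacked.values())
        self.unacked.clear()


class WorkerCrash(Exception):
    """Simulates the worker or node dying in the middle of a task."""


def consume(broker, task, acks_late):
    tag, body = broker.deliver()
    if not acks_late:
        broker.ack(tag)  # early ack: the message is gone before the task runs
    try:
        task(body)
    except WorkerCrash:
        broker.connection_lost()
        return
    if acks_late:
        broker.ack(tag)  # late ack: only after the task has finished


def crashing_task(body):
    raise WorkerCrash()


early = Broker()
early.publish("song.wma")
consume(early, crashing_task, acks_late=False)

late = Broker()
late.publish("song.wma")
consume(late, crashing_task, acks_late=True)
```

With the early ack the message is lost when the worker dies; with the late ack it returns to the queue and can run again, which is why such tasks need to be idempotent.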
  • 14. Worker processes [Diagram: Sender -> Msg Q -> queues "windows", "windows", "linux" -> Worker 1 (Windows), Worker 2 (Windows), ..., Worker N (Linux); a worker node runs Process 1, Process 2, ..., Process N]
  • 15. Worker processes [Diagram: same as slide 14]
  • 16. Worker process ● In every worker node, celery starts a pool of worker processes ● The number is determined by the concurrency setting (or autodetected – for full CPU usage) ● Each process can be configured to restart after running x number of tasks ● Disabled by default ● Alternatively, eventlet can be used instead of processes (discussed later)
  • 17. Revoking tasks celery.control.revoke( task_id, terminate = False, signal = 'SIGKILL' ) ● revoke() works by sending a broadcast message to all workers ● If a task has not yet run, workers will keep this task_id in memory and ensure that it does not run ● If a task is running, revoke() will not work unless terminate = True
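The "keep the task_id in memory" behaviour for tasks that have not started yet can be sketched as follows. This is a plain-Python illustration with hypothetical names (`Worker`, `on_revoke`, `on_task`, `broadcast_revoke`), not Celery's worker code, and it does not model terminating a task that is already running.

```python
class Worker:
    """Toy worker that remembers revoked task ids in memory."""

    def __init__(self):
        self.revoked = set()
        self.executed = []

    def on_revoke(self, task_id):
        # Every worker records the id, since any of them may receive the task.
        self.revoked.add(task_id)

    def on_task(self, task_id, func):
        if task_id in self.revoked:
            return "REVOKED"  # discard instead of running
        self.executed.append(task_id)
        return func()


def broadcast_revoke(workers, task_id):
    """Send the revoke message to all workers."""
    for worker in workers:
        worker.on_revoke(task_id)


workers = [Worker(), Worker()]
broadcast_revoke(workers, "task-42")

skipped = workers[1].on_task("task-42", lambda: "done")
ran = workers[0].on_task("task-43", lambda: "done")
```

Because the set lives only in memory in this sketch, a worker restarted after the broadcast would no longer know about the revoked id.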
  • 18. Task expiration task.apply_async( expires = x ) x can be ● a number of seconds ● a specific datetime() ● Global time limits can be configured in settings ● Soft time limit – the task receives an exception which can be used to clean up ● Hard time limit – the worker running the task is killed and is replaced with another one.
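The two accepted forms of `expires` can be reduced to one absolute deadline. The sketch below is an assumption about how such a check could look, not Celery's code: `resolve_expiry` and `is_expired` are hypothetical helpers, and the timestamps are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def resolve_expiry(expires, sent_at):
    """Turn `expires` (seconds or a datetime) into an absolute deadline."""
    if isinstance(expires, (int, float)):
        return sent_at + timedelta(seconds=expires)
    return expires


def is_expired(expires, sent_at, now):
    """A worker would discard the task if it is picked up after the deadline."""
    return now > resolve_expiry(expires, sent_at)


sent = datetime(2012, 10, 1, 12, 0, 0, tzinfo=timezone.utc)

# Relative form: the task is only valid for 60 seconds after it was sent.
picked_up_in_time = is_expired(60, sent, sent + timedelta(seconds=30))
picked_up_too_late = is_expired(60, sent, sent + timedelta(seconds=61))

# Absolute form: the task is valid until a specific moment.
deadline = sent + timedelta(hours=1)
after_deadline = is_expired(deadline, sent, sent + timedelta(hours=2))
```

Expiration decides whether a queued task starts at all, whereas the soft and hard time limits apply to a task that is already running.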
  • 19. Handling soft time limit
        from celery.exceptions import SoftTimeLimitExceeded

        @task()
        def drm_encrypt(audio_file, key_phrase):
            try:
                setup_tmp_files()
                playready.encrypt(...)
            except SoftTimeLimitExceeded:
                cleanup_tmp_files()
            except Exception as exc:
                raise drm_encrypt.retry(exc=exc, countdown=5)
  • 20. Task Canvas ● Chains – Linking one task to another ● Groups – Execute several tasks in parallel ● Chord – execute a task after a set of tasks has finished ● Map and starmap – Similar to map() function ● Chunks – divide an iterable of work into chunks ● Chunks + Chord/chain can be used for map-reduce Best shown in a demo
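The "chunks + chord for map-reduce" idea can be pictured with the standard library alone. This is a sketch of the pattern, not Celery's canvas API: `chunks`, `chord` and `count_words` are hypothetical plain functions, and a thread pool stands in for the group of parallel tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Split a list of work items into lists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """The 'map' step: runs once per chunk."""
    return sum(len(line.split()) for line in lines)


def chord(header, parts, callback):
    """Run `header` on every part in parallel, then call `callback` once
    with the list of all partial results (the 'reduce' step)."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(header, parts))  # map() preserves input order
    return callback(partials)


lines = ["a b c", "d e", "f", "g h i j"]
total = chord(count_words, chunks(lines, 2), sum)
```

The callback runs only after every chunk has finished, which is the property that distinguishes a chord from a plain group.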
  • 21. Task trees [Diagram: a tree of tasks 1 to 8 joined by "spawns" and "links" edges. Task 1 spawns task 2, which spawns tasks 2_1 and 2_3; task 3 spawns tasks 3_1 and 3_2; task 4 links to task 5; task 6 links to task 8; further "spawns" edges connect tasks 5, 6, 7 and 8]
  • 22. Task Trees ● Home grown solution (our current approach) ● Use db models and keep track of trees ● Better approach ● Use celery-tasktree ● http://pypi.python.org/pypi/celery-tasktree
  • 23. Celery Batches ● Collect jobs and execute them in a batch. ● Can be used for stats collection ● Batch execution is done once ● a configured timeout is reached OR ● a configured number of tasks has been received ● Useful for reducing network and db loads
  • 24. Celery Batches
        from celery.task import task
        from celery.contrib.batches import Batches

        @task(base=Batches, flush_every=50, flush_interval=10)
        def collect_stats(requests):
            items = {}
            for request in requests:
                item_id = request.kwargs['item_id']
                # Fetch each object once so repeated ids accumulate
                if item_id not in items:
                    items[item_id] = get_obj(item_id)
                items[item_id].count += 1
            # Sync to db

        collect_stats.delay(item_id=45)
        collect_stats.delay(item_id=57)
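The flush rule from slide 23 (flush when a count is reached or when a timeout passes, whichever comes first) can be sketched independently of Celery. `BatchBuffer` and its `tick` method are hypothetical names, and the clock is injectable so the example does not have to sleep.

```python
import time


class BatchBuffer:
    """Collect requests and hand them to `flush` as one batch."""

    def __init__(self, flush, flush_every=50, flush_interval=10,
                 clock=time.monotonic):
        self.flush = flush
        self.flush_every = flush_every
        self.flush_interval = flush_interval
        self.clock = clock
        self.pending = []
        self.last_flush = clock()

    def add(self, request):
        self.pending.append(request)
        self._maybe_flush()

    def tick(self):
        """Called periodically so a quiet buffer still flushes on timeout."""
        self._maybe_flush()

    def _maybe_flush(self):
        full = len(self.pending) >= self.flush_every
        timed_out = self.clock() - self.last_flush >= self.flush_interval
        if self.pending and (full or timed_out):
            batch, self.pending = self.pending, []
            self.last_flush = self.clock()
            self.flush(batch)


now = [0.0]   # fake clock, advanced by hand
flushed = []  # each flushed batch is appended here
buf = BatchBuffer(flushed.append, flush_every=3, flush_interval=10,
                  clock=lambda: now[0])

for item_id in (45, 57, 45):
    buf.add(item_id)  # the third add reaches flush_every
buf.add(99)
now[0] = 10.0
buf.tick()            # the timeout flushes the leftover item
```

One database write per batch replaces one write per request, which is where the reduction in network and db load comes from.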
  • 25. Celery monitoring ● Celery Flower https://github.com/mher/flower ● Django admin monitor ● Celery jobtastic http://pypi.python.org/pypi/jobtastic
  • 26. Celery deployment ● Cyme – celery instance manager https://github.com/celery/cyme ● Celery autoscaling ● Use celery eventlet where required