Preface

Today, while building a backend with Django, I ran into the following problem: while handling a user request, Django needs to perform a long-running asynchronous crawl, yet I also want to return data to the user immediately. Spawning a regular daemon process didn't solve it, so after some determined Googling I found the following answer:

In fact Django has a synchronous model. If you want to do real async processing, you need a message queue. The one most used with Django is Celery; it may look a bit "overkill", but it's a good answer.

Why do we need this? Because in a WSGI app, Apache gives the request to the executable, and the executable returns text. It's only once the executable finishes its execution that Apache acknowledges the end of the request.

Installing Celery

Official site: http://www.celeryproject.org/. Celery is written in Python and implements a distributed task queue. Install it as follows:

sudo pip install celery

In addition, Celery needs a way to send and receive messages. This is usually handled by a separate service called a message broker; here we install the officially recommended RabbitMQ:

sudo apt-get install rabbitmq-server
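
The Ubuntu package starts the broker as a system service right away; if in doubt, you can check that it is running with a standard rabbitmqctl command:

sudo rabbitmqctl status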

Introducing Celery

First, we need to create a Celery instance. Put the following in a file called tasks.py:

from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y

The first argument, tasks, is the name of the current module; the second specifies the URL of the message broker to use. Here it is RabbitMQ's default URL.

Starting a Celery worker process

celery -A tasks worker --loglevel=info
# help
celery worker --help
celery help

Calling the task

>>> from tasks import add
>>> add.delay(4, 4)
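
delay() returns an AsyncResult immediately; the addition itself runs in the worker. To read the return value back, the app also needs a result backend (for example, passing backend='rpc://' to the Celery() constructor above; that is an assumption here, since the setup so far configures only a broker). With a backend configured:

>>> result = add.delay(4, 4)
>>> result.ready()          # True once the worker has finished
True
>>> result.get(timeout=10)  # block until the result arrives
8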

Using Celery with Django

The overall architecture looks like this:

      *--------*                     +----------+
      |        |                     |          |
      | Django >--- Enqueue tasks ---> RabbitMQ >-----.
      |        |                     | (Broker) |     |
      *---v----*                     +----------+     |
          |                                           |
          | Query                    *----------*     |
          |                          |  Celery  <-----+
        +-v----------+  .-- Events --<  Worker  |     |
        |            | /             *----------*     | Consume
        | PostgreSQL <=                               | & Run Tasks
        |            | \             *----------*     |
        +------------+  `-- Events --<  Celery  |     |
                                     |  Worker  <-----'
                                     *----------*
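
This is exactly what the problem in the preface needs: the view enqueues the slow crawl and responds to the user at once. A minimal sketch (the task name fetch_page and the view are hypothetical, and JsonResponse assumes Django 1.7 or later):

# views.py -- hypothetical example
from django.http import JsonResponse

from tasks import fetch_page  # a @app.task wrapping the slow crawl


def crawl(request):
    url = request.GET.get('url', '')
    # Enqueue the job on the broker and return at once; a Celery
    # worker picks it up and runs it outside the request cycle.
    result = fetch_page.delay(url)
    return JsonResponse({'task_id': result.id, 'status': 'queued'})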

The official tutorial is detailed enough, so I won't repeat it here ;-) See: http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html
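
In short, the tutorial adds one module next to the project's settings.py that creates the app and autodiscovers tasks from every installed Django app. A rough sketch of that file, assuming a project named proj (the name is a placeholder):

# proj/celery.py -- sketch of the layout the official tutorial describes
from __future__ import absolute_import

import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

from django.conf import settings

app = Celery('proj')
# Read CELERY_* options from Django's settings.py.
app.config_from_object('django.conf:settings')
# Look for a tasks.py module in each app in INSTALLED_APPS.
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)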

Appendix: an introduction to supervise

Right, a quick detour to introduce supervise ~ it will come in handy later for running the Celery worker in the background. My company uses supervise on a large scale to manage its services, so it's well worth getting to know properly ~

supervise is one of the tools in the open-source daemontools toolkit.

How it works

When supervise starts, it forks a child process; the child calls execvp to replace itself with the module to be run, so the module runs as a child process of supervise. supervise itself runs in an endless loop, using waitpid or wait3 in non-blocking mode to monitor how the child is doing.

At the same time it reads commands from the control FIFO (s/supervise/control) and acts on them. If the child exits for whatever reason, supervise learns of it through waitpid or wait3 and starts the module again; if the module is so broken that it cannot start at all, supervise stays in that loop forever, restarting it over and over.
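
For intuition, the core loop can be sketched in a few lines of Python. This is a toy illustration only, not the real C implementation: it uses a blocking waitpid instead of non-blocking polling, and omits the control FIFO, the down file, and the status file:

# toy_supervise.py -- illustration of the restart loop only
import os
import sys
import time

def supervise(argv):
    while True:                       # supervise itself never exits
        pid = os.fork()
        if pid == 0:
            os.execvp(argv[0], argv)  # child: replace itself with the module
        os.waitpid(pid, 0)            # parent: wait until the child dies
        time.sleep(1)                 # avoid spin-looping on a broken module

if __name__ == '__main__':
    supervise(sys.argv[1:])           # e.g.: python toy_supervise.py ./run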

Installation

sudo apt-get install daemontools

Usage

# supervise <service directory>
supervise s

supervise switches to the directory named s and starts ./run. It restarts ./run if ./run exits. It pauses for a second after starting ./run, so that it does not loop too quickly if ./run exits immediately.

If the file s/down exists, supervise does not start ./run immediately. You can use svc to start ./run and to give other commands to supervise.

supervise maintains status information in a binary format inside the directory s/supervise, which must be writable to supervise. The status information can be read by svstat.

supervise may exit immediately after startup if it cannot find the files it needs in s or if another copy of supervise is already running in s. Once supervise is successfully running, it will not exit unless it is killed or specifically asked to exit. You can use svok to check whether supervise is successfully running. You can use svscan to reliably start a collection of supervise processes.
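
To tie this back to Celery: running the worker under supervise only takes a service directory with an executable run script, along these lines (the paths are this post's own example, not from the daemontools docs):

#!/bin/sh
# s/run -- must be executable (chmod +x s/run)
# exec replaces the shell, so supervise controls the celery process directly
exec celery -A tasks worker --loglevel=info

After that, supervise s keeps the worker alive and restarts it whenever it exits.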

