job queues overview

Job Queues

Upload: joeyrobert

Post on 09-Aug-2015

Job Queues

● Allows for asynchronous computation of jobs (or tasks)
● Uses consumers (or workers) to complete the job in the background
● Results are available when the job is complete

Queue

● First In First Out data structure (FIFO)

Queue

● enqueue ➜ adds an item to the end of the queue
● dequeue ➜ pulls the oldest item off the queue
● isEmpty ➜ boolean
● length ➜ integer (number of items in queue)

Queue Operations

For an unbounded queue, we choose a singly linked list with head and tail pointers as the data structure.

Queue Data Structure

All O(1) operations!

● enqueue - sets current tail next pointer and tail pointer to new item

● dequeue - returns current head and sets head pointer to head next pointer

● isEmpty - head/tail is null
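The operations above can be sketched in Python as follows. This is a minimal illustration, not a production queue; the class and method names (`LinkedQueue`, `_Node`, `is_empty`) are chosen here for the example and do not come from the slides.

```python
class _Node:
    """One link in the list: a value plus a pointer to the next node."""
    __slots__ = ("value", "next")

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedQueue:
    """Unbounded FIFO queue backed by a singly linked list."""

    def __init__(self):
        self._head = None  # oldest item, next to be dequeued
        self._tail = None  # newest item, last to be dequeued
        self._length = 0

    def enqueue(self, value):
        node = _Node(value)
        if self._tail is None:
            # Empty queue: the new node is both head and tail.
            self._head = node
        else:
            # Link the current tail to the new node.
            self._tail.next = node
        self._tail = node
        self._length += 1

    def dequeue(self):
        if self._head is None:
            raise IndexError("dequeue from empty queue")
        node = self._head
        self._head = node.next
        if self._head is None:
            # The queue just became empty, so clear the tail as well.
            self._tail = None
        self._length -= 1
        return node.value

    def is_empty(self):
        return self._head is None

    def __len__(self):
        return self._length
```

Every method touches a fixed number of pointers, which is where the O(1) bound comes from. In everyday Python code, `collections.deque` offers the same operations.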

Producers push jobs onto the job queue

Examples:
● Web servers - A typical HTTP response must return within a short timeframe (200ms - 2000ms)
● Humans phoning into tech support

Producers

Consumers pop jobs off of the queue and complete them

Example use cases (any long running process):
● Map / reduce calls on large datasets
● Media conversion, manipulation and rendering
● Image resize
● Downloading remote resources
● CPU intensive tasks (calculations)

Consumers
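A rough in-process sketch of the producer / consumer split, using Python's standard `queue.Queue` and worker threads. The function name `run_jobs`, the `handler` callback and the `None` sentinel are assumptions made for this example; real job queue software typically runs consumers in separate processes or machines.

```python
import queue
import threading


def run_jobs(jobs, handler, num_workers=2):
    """Push every job onto a shared queue and let worker threads drain it."""
    job_queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: no more work for this worker
                job_queue.task_done()
                return
            job_id, payload = job
            try:
                outcome = handler(payload)
            except Exception as exc:
                # Record the failure instead of killing the worker thread.
                outcome = exc
            with lock:
                results[job_id] = outcome
            job_queue.task_done()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # Producer side: enqueue each job and move on without waiting for it.
    for job_id, payload in enumerate(jobs):
        job_queue.put((job_id, payload))
    for _ in workers:
        job_queue.put(None)  # one sentinel per worker

    job_queue.join()  # block until every queued item has been handled
    for w in workers:
        w.join()
    return results
```

For example, `run_jobs([1, 2, 3], lambda x: x * x)` returns a dict mapping each job's index to its squared value.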

Producers and Consumers can be part of the same process!

Example: a web crawler (breadth first search)
1. Push a base URL to the queue (e.g. http://yahoo.com/)
2. Pop a URL from the queue and parse it
3. For each link on the page, push it onto the queue
4. Goto 2
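The crawler loop can be sketched as below. To keep the example self-contained, fetching and parsing are replaced by a hypothetical `get_links(url)` callback supplied by the caller. The `seen` set and the `max_pages` limit are additions not mentioned on the slide; without them the loop would revisit pages indefinitely.

```python
from collections import deque


def crawl(base_url, get_links, max_pages=100):
    """Breadth-first crawl; get_links(url) returns the links found on a page."""
    frontier = deque([base_url])  # step 1: push the base URL
    seen = {base_url}             # URLs already queued, to avoid loops
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()  # step 2: pop a URL and parse it
        visited.append(url)
        for link in get_links(url):  # step 3: push each link on the page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited  # step 4 ("goto 2") is the while loop
```

Here the same process acts as producer (step 3) and consumer (step 2) of one queue.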

Producers and Consumers

Each job exists in one of the following states:
● Queued
● Processing (in progress)
● Completed
● Failed
Jobs may also output:
● Logs
● Progress (% complete)

Job States

Consumers are functional. The only input they receive comes from the job, which comes from the producer.

Job data should include:
● Type
● Any information needed to complete the job

Job Data
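One possible way to model a job with its type, data, state, logs and progress. The names `Job`, `JobState`, `process` and the `handlers` registry are illustrative assumptions, not taken from any particular job queue library.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Job:
    type: str   # tells the consumer which handler to run
    data: dict  # everything the handler needs to complete the job
    state: JobState = JobState.QUEUED
    progress: int = 0  # percent complete
    logs: list = field(default_factory=list)
    result: object = None


def process(job, handlers):
    """Run one job through the handler registered for its type."""
    job.state = JobState.PROCESSING
    try:
        # The handler sees only the job's data, never producer state.
        job.result = handlers[job.type](job.data)
    except Exception as exc:
        # An unknown type (KeyError) or a handler error both mark the job failed.
        job.logs.append(f"error: {exc!r}")
        job.state = JobState.FAILED
    else:
        job.progress = 100
        job.state = JobState.COMPLETED
    return job
```

Because the handler receives nothing but `job.data`, any consumer can pick up any job of a type it knows how to handle.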

...states that the speedup a concurrent algorithm can achieve is limited by the serial path.

Locks and serial parts limit the maximum performance of a concurrent system.

Amdahl’s law...
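In its usual form, Amdahl's law gives speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The function below is a small sketch of that formula; the name `amdahl_speedup` and its parameter names are chosen for this example.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

With 90% of the work parallelizable, 10 workers give a speedup of about 5.3, and no number of workers can push it past 10, because the serial 10% always remains.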

● Priority ordered Queue data structure
● Highest priority jobs are dequeued first
● On the same priority level, oldest jobs are dequeued first

Priority Queue

● enqueue ➜ adds a job to the end of the queue with a priority
● dequeue ➜ pulls the highest priority, oldest job off the queue
● isEmpty ➜ boolean
● length ➜ integer (number of items in queue)

Priority Queue Operations

● Data structure (max heap)
● Binary tree with the max heap property (each parent node is greater than or equal to its children)
● For a priority queue, each item in the tree would be a pointer to a regular queue for that priority

Priority Queue Data Structure

Enqueue and dequeue are O(log n) operations!
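A sketch of the structure described above: a heap of priority levels, each pointing to a regular FIFO queue. Python's `heapq` is a min-heap, so priorities are negated to get max-heap behaviour. The class name `PriorityJobQueue` is an assumption for this example, and priorities are assumed to be numbers.

```python
import heapq
from collections import deque


class PriorityJobQueue:
    """Highest priority first; oldest first within one priority level."""

    def __init__(self):
        self._heap = []    # negated priorities: min-heap used as a max-heap
        self._queues = {}  # priority -> deque of jobs (FIFO within a level)
        self._length = 0

    def enqueue(self, job, priority):
        if priority not in self._queues:
            # First job at this level: create its queue and register the level.
            self._queues[priority] = deque()
            heapq.heappush(self._heap, -priority)
        self._queues[priority].append(job)
        self._length += 1

    def dequeue(self):
        if not self._heap:
            raise IndexError("dequeue from empty queue")
        priority = -self._heap[0]  # highest priority currently in use
        level = self._queues[priority]
        job = level.popleft()      # oldest job at that level
        if not level:
            # Level drained: remove it from the heap and the lookup table.
            heapq.heappop(self._heap)
            del self._queues[priority]
        self._length -= 1
        return job

    def is_empty(self):
        return self._length == 0

    def __len__(self):
        return self._length
```

In this variant the heap only holds one entry per priority level in use, so the logarithmic cost depends on the number of distinct levels rather than the number of queued jobs.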

● Average wait time per job type
● Number of queued jobs
● Jobs processed / time
● Jobs pushed / time

Jobs processed / time ≥ Jobs pushed / time

Otherwise a backlog forms!

Priority Queue Metrics
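A back-of-the-envelope sketch of how a backlog grows when jobs are pushed faster than they are processed. It assumes constant rates, which real workloads rarely have; the function and parameter names are made up for this example.

```python
def backlog_after(seconds, pushed_per_second, processed_per_second, initial_backlog=0):
    """Estimate the number of queued jobs after `seconds` at constant rates."""
    growth = (pushed_per_second - processed_per_second) * seconds
    # A queue cannot hold fewer than zero jobs.
    return max(0, initial_backlog + growth)
```

Pushing 10 jobs per second while processing 8 per second leaves 120 jobs queued after one minute; with the rates reversed, the queue stays empty.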

In sophisticated job systems, a job scheduler exists to:
● Maximize use of computing power
● Minimize wait time
● Provide an interface to job tasks

They can use a combination of priority, estimated (historical) job time and available computing power to determine how jobs are run. Sophisticated job scheduling algorithms exist.

Job Scheduler

Case Study: Grocery Lines

4 consumers, 4 queues, 12 jobs of varying durations

Average wait time = (10 + 13 + 4 + 6 + 1 + 9 + 6 + 13) / 12 = 5.1666...

Case Study: Grocery Lines

4 consumers, 1 queue, 12 jobs of varying durations

Order (job number, with wait time in parentheses; the first four jobs start immediately): 6, 1, 4, 10, 7 (1), 8 (4), 2 (6), 3 (6), 11 (8), 5 (8), 12 (9), 9 (10)

Average wait time = (1 + 4 + 6 + 6 + 8 + 8 + 9 + 10) / 12 = 4.3333...

Case Study: Grocery Lines

4 consumers, 1 queue, 12 jobs of varying durations intelligently ordered to minimize wait time:

Order (job number, with wait time in parentheses; the first four jobs start immediately): 1, 2, 3, 4, 5 (1), 6 (2), 7 (3), 8 (4), 9 (5), 10 (6), 11 (8), 12 (9)

Average wait time = (1 + 2 + 3 + 4 + 5 + 6 + 8 + 9) / 12 = 3.1667...
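The single-queue cases can be simulated with a few lines of Python. The job durations from the original figure are not part of this transcript, so the durations used in the usage note below are hypothetical, picked so that shortest-first ordering reproduces the wait times listed above. The function name `average_wait` is likewise an assumption.

```python
import heapq


def average_wait(durations, consumers=4):
    """Average wait when jobs are taken in order from one shared queue."""
    free_at = [0] * consumers  # time at which each consumer becomes free
    heapq.heapify(free_at)
    total_wait = 0
    for duration in durations:
        start = heapq.heappop(free_at)  # the earliest-free consumer takes the job
        total_wait += start             # every job is queued at time 0
        heapq.heappush(free_at, start + duration)
    return total_wait / len(durations)
```

With the hypothetical durations `[1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 7, 8]` taken shortest first, the jobs wait 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 8 and 9 time units, an average of 38 / 12. Feeding the same durations in reverse order gives a longer average wait, which is the effect the intelligent ordering exploits.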

● Beanstalkd (C) http://kr.github.io/beanstalkd/
● Celery (Python + many backends) http://www.celeryproject.org/
● Delayed::Job (Ruby + DB) https://github.com/collectiveidea/delayed_job
● Gearman (C++) http://gearman.org/
● Kue (Node + Redis) https://github.com/learnboost/kue
● Resque (Ruby + Redis) http://resquework.org/
● RQ (Python + Redis) http://python-rq.org/
● Sidekiq (Ruby) http://sidekiq.org/
● SQS by Amazon (managed) http://aws.amazon.com/sqs/

More links and information at http://queues.io/

Job Queue Software