Rails 6 adds hook to Active Job for retry & discard

Vishal Telangre

Vishal Telangre

July 9, 2019

This blog is part of our  Rails 6 series.

Before Rails 6

Before Rails 6, we have to provide a custom block to perform custom logging and monitoring around retries and discards of the jobs defined using Active Job framework.

1class Container::DeleteJob < ActiveJob::Base
2retry_on Timeout::Error, wait: 2.seconds, attempts: 3 do |job, error|
3message = "Stopped retrying #{job.class} (JID #{job.job_id})
4with #{job.arguments.join(', ')} due to
5'#{error.class} - #{error.message}'.
6This job was retried for #{job.executions} times.".squish
7
8    BackgroundJob::ErrorLogger.log(message)
9
10end
11
12discard_on Container::NotFoundError do |job, error|
13message = "Discarded #{job.class} (JID #{job.job_id})
14with #{job.arguments.join(', ')} due to
15'#{error.class} - #{error.message}' error.".squish
16
17    BackgroundJob::ErrorLogger.log(message)
18
19end
20
21def perform(container_id)
22Container::DeleteService(container_id).process
23
24    # Will raise Container::NotFoundError
25    # if no container is found with 'container_id'.
26
27    # Might raise Timeout::Error when the remote API is not responding.
28
29end
30end

Notice the custom blocks provided to retry_on and discard_on methods to an individual job in the above example.

Extracting such custom logic to a base class or to a 3rd-party gem is possible but it will be non-standard and will be a bit difficult task.

An alternative approach is to subscribe to the hooks instrumented using Active Support Instrumentation API which is a standard and recommended way. Prior versions of Rails 6 already instruments some hooks such as enqueue_at.active_job, enqueue.active_job, perform_start.active_job, and perform.active_job. Unfortunately no hook is instrumented around retries and discards of an Active Job prior to Rails 6.

Rails 6

Rails 6 has introduced hooks to Active Job around retries and discards to which one can easily subscribe using Active Support Instrumentation API to perform custom logging and monitoring or to collect any custom information.

The newly introduced hooks are enqueue_retry.active_job, retry_stopped.active_job and discard.active_job.

Let's discuss each of these hooks in detail.

Note that whenever we say a job, it means a job of type ActiveJob.

enqueue_retry.active_job

The enqueue_retry.active_job hook is instrumented when a job is enqueued to retry again due to occurrence of an exception which is configured using the retry_on method in the job's definition. This hook is triggered only when above condition is satisfied and the number of executions of the job is less than the number of attempts defined using the retry_on method. The number of attempts is by default set to 5 if not defined explicitly.

This is how we would subscribe to this hook and perform custom logging in our Rails application.

1ActiveSupport::Notifications.subscribe "enqueue_retry.active_job" do |*args|
2event = ActiveSupport::Notifications::Event.new *args
3payload = event.payload
4job = payload[:job]
5error = payload[:error]
6message = "#{job.class} (JID #{job.job_id})
7with arguments #{job.arguments.join(', ')}
8will be retried again in #{payload[:wait]}s
9due to error '#{error.class} - #{error.message}'.
10It is executed #{job.executions} times so far.".squish
11
12BackgroundJob::Logger.log(message)
13end

Note that the BackgroundJob::Logger above is our custom logger. If we want, we can add any other logic instead.

We will change the definition of Container::DeleteJob job as below.

1class Container::DeleteJob < ActiveJob::Base
2retry_on Timeout::Error, wait: 2.seconds, attempts: 3
3
4def perform(container_id)
5Container::DeleteService(container_id).process
6
7    # Will raise Timeout::Error when the remote API is not responding.
8
9end
10end

Let's enqueue this job.

1Container::DeleteJob.perform_now("container-1234")

Assume that this job keeps throwing Timeout::Error exception due to a network issue. The job will be retried twice since it is configured to retry when a Timeout::Error exception occurs up to maximum 3 attempts. While retrying this job, Active Job will instrument enqueue_retry.active_job hook along with the necessary job payload.

Since we have already subscribed to this hook, our subscriber would log something like this with the help of BackgroundJob::Logger.log.

1Container::DeleteJob (JID 770f4f2a-daa7-4c1e-be51-24adc04b4cb2) with arguments container-1234 will be retried again in 2s due to error 'Timeout::Error - Couldn't establish connection to cluster within 10s'. It is executed 1 times so far.
2
3Container::DeleteJob (JID 770f4f2a-daa7-4c1e-be51-24adc04b4cb2) with arguments container-1234 will be retried again in 2s due to error 'Timeout::Error - Couldn't establish connection to cluster within 10s'. It is executed 2 times so far.
retry_stopped.active_job

The retry_stopped.active_job hook is triggered when a job is retried up to the available number of attempts.

Let's see how this hook is triggered.

Along with the subscription for the enqueue_retry.active_job hook, let's subscribe to the retry_stopped.active_job hook, too.

1ActiveSupport::Notifications.subscribe "retry_stopped.active_job" do |*args|
2event = ActiveSupport::Notifications::Event.new *args
3payload = event.payload
4job = payload[:job]
5error = payload[:error]
6message = "Stopped processing #{job.class} (JID #{job.job_id})
7further with arguments #{job.arguments.join(', ')}
8since it failed due to '#{error.class} - #{error.message}' error
9which reoccurred #{job.executions} times.".squish
10
11BackgroundJob::Logger.log(message)
12end

Let's keep the Container::DeleteJob job's definition unchanged and enqueue it again.

1Container::DeleteJob.perform_now("container-1234")

We will assume that the job will keep throwing Timeout::Error exception due to a network issue.

In the logs recorded using BackgroundJob::Logger.log, we should see something like this.

1Container::DeleteJob (JID 770f4f2a-daa7-4c1e-be51-24adc04b4cb2) with arguments container-1234 will be retried again in 2s due to error 'Timeout::Error - Couldn't establish connection to cluster within 10s'. It is executed 1 times so far.
2
3Container::DeleteJob (JID 770f4f2a-daa7-4c1e-be51-24adc04b4cb2) with arguments container-1234 will be retried again in 2s due to error 'Timeout::Error - Couldn't establish connection to cluster within 10s'. It is executed 2 times so far.
4
5Stopped processing Container::DeleteJob (JID 770f4f2a-daa7-4c1e-be51-24adc04b4cb2) further with arguments container-1234 since it failed due to 'Timeout::Error - Couldn't establish connection to cluster within 10s' error which reoccurred 3 times.

Notice the last entry in the logs above and its order.

discard.active_job

The discard.active_job hook is triggered when a job's further execution is discarded due to occurrence of an exception which is configured using discard_on method.

To see how this hook is triggered, we will subscribe to the discard.active_job hook.

1ActiveSupport::Notifications.subscribe "discard.active_job" do |*args|
2event = ActiveSupport::Notifications::Event.new *args
3payload = event.payload
4job = payload[:job]
5error = payload[:error]
6message = "Discarded #{job.class} (JID #{job.job_id})
7with arguments #{job.arguments.join(', ')}
8due to '#{error.class} - #{error.message}' error.".squish
9
10BackgroundJob::Logger.log(message)
11end

We will configure our Container::DeleteJob job to discard when Container::NotFoundError exception occurs while executing the job.

1class Container::DeleteJob < ActiveJob::Base
2discard_on Container::NotFoundError
3retry_on Timeout::Error, wait: 2.seconds, attempts: 3
4
5def perform(container_id)
6Container::DeleteService(container_id).process
7
8    # Will raise Container::NotFoundError
9    # if no container is found with 'container_id'.
10
11    # Will raise Timeout::Error when the remote API is not responding.
12
13end
14end

Let's enqueue this job and assume that it would throw Container::NotFoundError exception.

1Container::DeleteJob.perform_now("unknown-container-9876")

We should see following in the logs recorded by BackgroundJob::Logger.log.

1Discarded Container::DeleteJob (JID e9b1cb5c-6d2d-49ae-b1d7-fef44f09ab8d) with arguments unknown-container-9876 due to 'Container::NotFoundError - Container 'unknown-container-9876' was not found' error.

Notes

  1. These new hooks are also instrumented for the jobs which are enqueued using perform_later method since both perform_now and perform_later calls perform method under the hood.

  2. Active Job already subscribes to the these hooks and writes them using the Rails' default logger.

  3. If a block is provided to retry_on or discard_on methods then an applicable hook is instrumented first and then the given block is yielded.

If this blog was helpful, check out our full blog archive.

Stay up to date with our blogs.

Subscribe to receive email notifications for new blog posts.