July 23, 2014
Recently, in one of our projects, we ran into some strange errors from Delayed::Job.
Delayed::Job workers started successfully, but as soon as they began locking jobs they failed with `PG::Error: no connection to server` or `PG::Error: FATAL: invalid frontend message type 60` errors.
After some searching, we found that others had already run into similar issues.
We started isolating the problem by digging through the recent changes to the project. Since the last release, the only significant change had been to internationalization: we had started using I18n-active_record.
```ruby
# config/initializers/locale.rb
require 'i18n/backend/active_record'

Translation = I18n::Backend::ActiveRecord::Translation

if (ActiveRecord::Base.connected? && Translation.table_exists?) ||
   in_delayed_job_process?
  I18n.backend = I18n::Backend::ActiveRecord.new

  I18n::Backend::ActiveRecord.send(:include, I18n::Backend::Memoize)
  I18n::Backend::ActiveRecord.send(:include, I18n::Backend::Flatten)
  I18n::Backend::Simple.send(:include, I18n::Backend::Memoize)
  I18n::Backend::Simple.send(:include, I18n::Backend::Pluralization)

  I18n.backend = I18n::Backend::Chain.new(I18n::Backend::Simple.new, I18n.backend)
end
```
For Delayed Job workers, we had an extra check:
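```ruby
# True when the current process is a Delayed::Job worker, started either
# through the delayed_job binstub or through a `rake jobs:*` task.
def in_delayed_job_process?
  executable_name = File.basename($0)
  arguments = $*
  rake_args_regex = /\Ajobs:/
  (executable_name == 'delayed_job') ||
    (executable_name == 'rake' && arguments.find { |v| v =~ rake_args_regex })
end
```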
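Because the check reads the global `$0` and `$*`, it is awkward to test. A minimal sketch, assuming you are free to refactor it: the hypothetical `delayed_job_process?` helper below takes the program name and arguments as parameters (the `RAKE_JOBS_TASK` constant is also an illustrative name), and `in_delayed_job_process?` simply forwards the current process's values to it.

```ruby
# Hypothetical pattern for `rake jobs:work`, `rake jobs:workoff`, etc.
RAKE_JOBS_TASK = /\Ajobs:/

# Hypothetical, side-effect-free variant of the check: no globals are read here.
def delayed_job_process?(program_name, arguments)
  executable_name = File.basename(program_name)
  return true if executable_name == 'delayed_job'

  executable_name == 'rake' && arguments.any? { |arg| arg =~ RAKE_JOBS_TASK }
end

# The original method becomes a thin wrapper around the pure helper.
def in_delayed_job_process?
  delayed_job_process?($0, $*)
end

delayed_job_process?('bin/delayed_job', ['start'])   # => true
delayed_job_process?('/usr/bin/rake', ['jobs:work']) # => true
delayed_job_process?('bin/rails', ['server'])        # => false
```

Returning a real boolean from `any?` (instead of the matched string from `find`) does not change how the `||` in the initializer behaves, but it makes the helper's result easier to assert on.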
After digging through both the Delayed::Job source code and the way we were setting up its configuration, we started noticing some issues.
The first thing we found was that the problem did not occur when the workers were started with the `rake jobs:work` task.
Looking at DelayedJob's internals, we found that the main difference between the rake task and the binstub was the fork that the binstub performs.
The binstub daemonizes each worker with the Daemons gem (`Delayed::Command#run_process` calls `Daemons.run_proc`), so the worker goes through a slightly different execution lifecycle.
Let's take a look at DelayedJob's internals before proceeding. DelayedJob has a system of hooks that can be used both by plugin writers and within applications. All of this event functionality lives in the `Delayed::Lifecycle` class, and workers reach it through `Delayed::Worker.lifecycle`.
So, which events exactly do we have here?
Job-related events: `:enqueue`, `:perform`, `:error`, `:failure`, `:invoke_job`.
Worker-related events: `:execute`, `:loop`, `:perform`, `:error`, `:failure`.
You can set up callbacks to run before, after, or around any of these events with the `Delayed::Worker.lifecycle.before`, `Delayed::Worker.lifecycle.after`, and `Delayed::Worker.lifecycle.around` methods.
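To make the before/after/around semantics concrete, here is a toy, simplified re-implementation of the callback pattern. It is not Delayed::Job's code: `ToyLifecycle` and its internals are assumptions for illustration. Only the event names come from the lists above, and the `around` chaining mimics the idea that an around callback must call the block it is given for the wrapped work to run.

```ruby
# Toy model of a lifecycle with before/after/around callbacks (hypothetical class).
class ToyLifecycle
  EVENTS = %i[enqueue execute loop perform error failure invoke_job].freeze

  def initialize
    @before = Hash.new { |h, k| h[k] = [] }
    @after  = Hash.new { |h, k| h[k] = [] }
    # Each event starts with an identity "around" chain that just runs the work.
    @around = Hash.new { |h, k| h[k] = ->(*args, &blk) { blk.call(*args) } }
  end

  def before(event, &callback)
    check!(event)
    @before[event] << callback
  end

  def after(event, &callback)
    check!(event)
    @after[event] << callback
  end

  # Wrap the existing chain: the new callback runs inside the previous ones.
  def around(event, &callback)
    check!(event)
    chain = @around[event]
    @around[event] = ->(*args, &blk) { chain.call(*args) { |*a| callback.call(*a, &blk) } }
  end

  # Runs before callbacks, then the around chain wrapping the block, then after callbacks.
  def run_callbacks(event, *args, &block)
    check!(event)
    @before[event].each { |cb| cb.call(*args) }
    result = @around[event].call(*args, &block)
    @after[event].each { |cb| cb.call(*args) }
    result
  end

  private

  def check!(event)
    raise ArgumentError, "unknown event #{event.inspect}" unless EVENTS.include?(event)
  end
end

lifecycle = ToyLifecycle.new
trace = []
lifecycle.before(:execute) { |worker| trace << "before #{worker}" }
lifecycle.around(:execute) do |worker, &inner|
  trace << 'around: enter'
  inner.call(worker) # the wrapped work only runs if the around callback calls it
  trace << 'around: exit'
end
lifecycle.after(:execute) { |worker| trace << "after #{worker}" }

lifecycle.run_callbacks(:execute, 'worker-1') { |worker| trace << "work loop in #{worker}" }
p trace
# prints ["before worker-1", "around: enter", "work loop in worker-1", "around: exit", "after worker-1"]
```

A string stands in for the worker object here; in Delayed::Job, `:execute` callbacks receive the worker itself.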
Back to our problem. It turned out that the delayed_job_active_record gem closes all database connections in its `before_fork` hook and reestablishes them in its `after_fork` hook.
The I18n-active_record setup in our initializer evidently did not play well with this, which caused the errors we were seeing.
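Why close connections before forking at all? A forked child inherits its parent's open sockets, so a connection opened in the parent is not safe to use from the child. The sketch below models this with a hypothetical `FakePool` class that only records which process opened its "connection", and a hypothetical `child_connection_safe?` helper that forks with or without the close-before-fork / reopen-after-fork steps. No real database is involved; it only illustrates the pattern.

```ruby
# Hypothetical stand-in for a connection pool: remembers which process connected.
class FakePool
  def connect!
    @owner_pid = Process.pid
  end

  def disconnect!
    @owner_pid = nil
  end

  # A connection is only safe to use in the process that opened it.
  def safe_in_current_process?
    !@owner_pid.nil? && @owner_pid == Process.pid
  end
end

# Forks a child and reports (via a pipe) whether the child's connection is safe.
def child_connection_safe?(pool, use_hooks:)
  pool.connect!                  # parent touches the DB during boot
  pool.disconnect! if use_hooks  # "before_fork": drop the parent's connection
  reader, writer = IO.pipe
  $stdout.flush                  # avoid the child inheriting unflushed output
  pid = fork do
    reader.close
    pool.connect! if use_hooks   # "after_fork": open a fresh connection in the child
    writer.write(pool.safe_in_current_process? ? 'yes' : 'no')
    writer.close
    exit!(0)                     # exit immediately, without running at_exit handlers
  end
  writer.close
  answer = reader.read
  reader.close
  Process.wait(pid)
  answer == 'yes'
end

puts child_connection_safe?(FakePool.new, use_hooks: false) # prints false
puts child_connection_safe?(FakePool.new, use_hooks: true)  # prints true
```

This requires a platform with `fork` (for example Linux or macOS).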
Looking at the DelayedJob lifecycle, we chose the `before :execute` hook, which runs after the ActiveRecord backend has finished all of its connection handling.
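The ordering argument can be modeled in a few lines. This is a simplified, hypothetical model (`StartupModel` and its step names are illustrative, and nothing is actually forked). It shows that work done directly in an initializer happens before `before_fork`, while work registered for `before :execute` is deferred until after `after_fork`, right before the worker starts locking jobs.

```ruby
# Hypothetical model of the binstub start-up order; only the sequence matters.
class StartupModel
  attr_reader :log

  def initialize
    @log = []
    @before_execute = []
  end

  # Stand-in for Delayed::Worker.lifecycle.before(:execute) { ... }
  def before_execute(&block)
    @before_execute << block
  end

  # Initializers run while the app boots, in the parent process.
  def boot_rails(&initializer)
    @log << :rails_boot
    initializer.call(self)
  end

  def start_worker
    @log << :before_fork          # backend closes all DB connections
    @log << :fork                 # the worker is daemonized
    @log << :after_fork           # backend re-establishes DB connections
    @before_execute.each(&:call)  # before(:execute) callbacks run here...
    @log << :work_loop            # ...and only then does the worker lock jobs
  end
end

model = StartupModel.new
model.boot_rails do |m|
  m.log << :eager_i18n_setup                         # original initializer: DB access at boot
  m.before_execute { m.log << :deferred_i18n_setup } # fixed initializer: deferred to :execute
end
model.start_worker
p model.log
# prints [:rails_boot, :eager_i18n_setup, :before_fork, :fork, :after_fork, :deferred_i18n_setup, :work_loop]
```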
Finally, we changed the locales initializer for delayed_job workers as follows:
```ruby
require 'i18n/backend/active_record'

Translation = I18n::Backend::ActiveRecord::Translation

# Defer the I18n backend setup until the worker is about to execute,
# i.e. after the ActiveRecord backend has re-established its connections.
Delayed::Worker.lifecycle.before :execute do
  if (ActiveRecord::Base.connected? && Translation.table_exists?) || in_delayed_job_process?
    I18n.backend = I18n::Backend::ActiveRecord.new

    I18n::Backend::ActiveRecord.send(:include, I18n::Backend::Memoize)
    I18n::Backend::ActiveRecord.send(:include, I18n::Backend::Flatten)
    I18n::Backend::Simple.send(:include, I18n::Backend::Memoize)
    I18n::Backend::Simple.send(:include, I18n::Backend::Pluralization)

    I18n.backend = I18n::Backend::Chain.new(I18n::Backend::Simple.new, I18n.backend)
  end
end
```
This fixed the connection errors: connections stopped dying abruptly.