Class: Raif::ResumeStalledModelCompletionBatchPollsJob

Inherits: ApplicationJob

Object > ApplicationJob > ApplicationJob > Raif::ResumeStalledModelCompletionBatchPollsJob

Defined in: app/jobs/raif/resume_stalled_model_completion_batch_polls_job.rb

Overview

Recovery sweep for Raif::ModelCompletionBatch records whose self-rescheduling poll chain (Raif::PollModelCompletionBatchJob) was dropped. A chain can break when a scheduled job is evicted by a queue backend restart, when a poll exhausts its ActiveJob retry attempts, or when a deploy drains the queue before the next poll fires. Without recovery, such batches remain non-terminal until the hourly Raif::ExpireStuckModelCompletionBatchesJob force-fails them once they reach max_age, discarding any results the provider may have produced in the meantime.

For each non-terminal batch whose next_poll_at is at least POLL_GRACE in the past, this sweep enqueues a fresh Raif::PollModelCompletionBatchJob. That job is idempotent on entry (it checks terminal? and gates on whether the handler has already been dispatched), and batch.fetch_status! is a read-only request to the provider. If a normally scheduled poll fires concurrently with this sweep, the worst case is therefore one duplicate provider status request.
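The sweep-and-poll interaction described above can be modeled in plain Ruby. This is a minimal, framework-free sketch: Batch, FakeProvider, PollJob, resume_stalled, and the terminal status names are hypothetical stand-ins for Raif::ModelCompletionBatch, the provider client, Raif::PollModelCompletionBatchJob, and ActiveJob enqueuing, not Raif's actual API. It shows that the sweep enqueues only overdue, non-terminal batches, and that a concurrent normal poll costs at most one extra status read.

```ruby
# Hypothetical, framework-free model of the resume sweep. Batch, FakeProvider,
# PollJob, and the array "queue" stand in for Raif::ModelCompletionBatch,
# the provider client, Raif::PollModelCompletionBatchJob, and ActiveJob.
POLL_GRACE = 5 * 60 # seconds; mirrors POLL_GRACE = 5.minutes

Batch = Struct.new(:id, :status, :next_poll_at, :handler_dispatched, keyword_init: true) do
  # Hypothetical terminal statuses, chosen for illustration only.
  def terminal?
    %w[completed failed expired].include?(status)
  end
end

# Counts provider status reads so the cost of a duplicate poll is visible.
class FakeProvider
  attr_reader :status_requests

  def initialize
    @status_requests = 0
  end

  def fetch_status(_batch)
    @status_requests += 1
    "in_progress"
  end
end

class PollJob
  def initialize(provider)
    @provider = provider
  end

  # Idempotent entry: terminal or already-dispatched batches are skipped.
  def perform(batch)
    return :skipped if batch.terminal? || batch.handler_dispatched

    batch.status = @provider.fetch_status(batch) # read-only provider call
    :polled
  end
end

# The sweep: enqueue each non-terminal batch whose next_poll_at is at
# least POLL_GRACE in the past.
def resume_stalled(batches, queue, now:)
  cutoff = now - POLL_GRACE
  batches.each do |batch|
    next if batch.terminal?
    next if batch.next_poll_at.nil? || batch.next_poll_at > cutoff

    queue << batch
  end
  queue
end

# Usage: only batch 1 is overdue by more than the grace window.
now = Time.at(1_700_000_000)
batches = [
  Batch.new(id: 1, status: "in_progress", next_poll_at: now - 600),  # 10 min overdue
  Batch.new(id: 2, status: "in_progress", next_poll_at: now - 60),   # inside grace
  Batch.new(id: 3, status: "completed",   next_poll_at: now - 3600)  # terminal
]
queue = resume_stalled(batches, [], now: now)
stalled_ids = queue.map(&:id) # => [1]

provider = FakeProvider.new
job = PollJob.new(provider)
queue.each { |batch| job.perform(batch) } # poll enqueued by the sweep
job.perform(batches[0])                   # a concurrent, normally scheduled poll
requests = provider.status_requests       # => 2 (one duplicate read, no harm)
```

Because the poll only reads status, enqueuing a redundant poll is cheap; the sweep can afford to err on the side of enqueuing rather than coordinating with the normal chain.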

Pairs with Raif::ExpireStuckModelCompletionBatchesJob to form a recover-then-expire pattern: host apps should schedule this sweep frequently (every ~5 minutes) and the expire sweep hourly. Running the resume sweep more often gives it several chances to reclaim results before the expire sweep declares the batch lost.

Constant Summary

POLL_GRACE = 5.minutes

Skip batches whose next_poll_at falls within this window. A poll job that fires exactly at next_poll_at needs a moment to call reschedule!, so too small a grace period would race the normally firing chain. Five minutes is well beyond the tightest interval in the default poll schedule (60 seconds) without leaving stranded batches unattended for long.
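The grace-window check reduces to a single time comparison. The sketch below uses plain seconds instead of ActiveSupport's 5.minutes; stalled? and SWEEP_INTERVAL are hypothetical names for illustration, not part of Raif. It also works through the timing arithmetic: with the suggested ~5-minute sweep cadence, a batch whose chain breaks becomes eligible after POLL_GRACE and is picked up by the next sweep, so roughly ten minutes elapse in the worst case, ignoring queue latency.

```ruby
# Hypothetical illustration of the POLL_GRACE boundary; not Raif's code.
POLL_GRACE = 5 * 60     # seconds; stands in for 5.minutes
SWEEP_INTERVAL = 5 * 60 # hypothetical: the suggested ~5-minute sweep cadence

# A batch is eligible once next_poll_at is at least POLL_GRACE in the past.
def stalled?(next_poll_at, now)
  next_poll_at <= now - POLL_GRACE
end

now = Time.at(1_700_000_000)

# A normal poll fired 60s ago (the tightest default interval) and has not
# yet called reschedule!: still inside the grace window, so it is left alone.
just_fired    = stalled?(now - 60, now)    # => false
almost_stale  = stalled?(now - 299, now)   # => false
exactly_grace = stalled?(now - 300, now)   # => true
long_overdue  = stalled?(now - 3_600, now) # => true

# Worst case: eligibility arrives just after a sweep has run, so the next
# sweep picks the batch up one full interval later.
worst_case_pickup_seconds = POLL_GRACE + SWEEP_INTERVAL # => 600
```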

Instance Method Summary

#perform ⇒ Object

Instance Method Details

#perform ⇒ Object