Class: Raif::ExpireStuckModelCompletionBatchesJob
- Inherits:
-
ApplicationJob
- Object
- ApplicationJob
- ApplicationJob
- Raif::ExpireStuckModelCompletionBatchesJob
- Defined in:
- app/jobs/raif/expire_stuck_model_completion_batches_job.rb
Overview
Hourly safety sweep for Raif::ModelCompletionBatch records whose polling chain went stale -- e.g. a scheduled job got dropped during a queue backend restart, or a long-running batch outlived our tolerance.
For each non-terminal batch with submitted_at older than Raif.config.model_completion_batch_max_age, calls #expire! (which issues a best-effort provider-side cancel before marking the batch failed locally), then dispatches the configured completion handler so any waiting workflow step can advance.
Should be scheduled hourly by the host app via whichever cron / recurring job mechanism the host's queue adapter uses.
Instance Method Summary collapse
Instance Method Details
#perform ⇒ Object
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
# File 'app/jobs/raif/expire_stuck_model_completion_batches_job.rb', line 18 def perform max_age = Raif.config.model_completion_batch_max_age cutoff = max_age.ago batches_table = Raif::ModelCompletionBatch.arel_table Raif::ModelCompletionBatch .non_terminal .where(batches_table[:submitted_at].not_eq(nil)) .where(batches_table[:submitted_at].lt(cutoff)) .find_each do |batch| batch.expire!(reason: "Batch exceeded Raif.config.model_completion_batch_max_age (#{max_age})") batch.dispatch_completion_handler! rescue StandardError => e # Per-batch rescue so a single bad handler (or a transient DB hiccup # mid-force_fail!) doesn't block expiry of every later batch in the # sweep. The next hourly tick re-enters and retries any batch that's # still non-terminal; a batch that expired locally but whose handler # raised will be picked up by the polling job's handler-retry path # via handler_dispatched_at gating. Raif.logger.error( "Raif::ExpireStuckModelCompletionBatchesJob: failed to expire batch ##{batch.id}: #{e.class}: #{e.}" ) Raif.logger.error(e.backtrace.first(20).join("\n")) if e.backtrace.present? if defined?(Airbrake) notice = Airbrake.build_notice(e) notice[:context][:component] = "raif_expire_stuck_model_completion_batches_job" notice[:params] = { batch_id: batch.id } Airbrake.notify(notice) end end end |