Class: Raif::Llm

Inherits: Object

Includes: ActiveModel::Model, Concerns::Llms::MessageFormatting

Defined in: app/models/raif/llm.rb

Direct Known Subclasses

Raif::Llms::Anthropic, Raif::Llms::Bedrock, Raif::Llms::Google, Raif::Llms::OpenAiBase, Raif::Llms::OpenRouter, Raif::Llms::XAi

Constant Summary

VALID_RESPONSE_FORMATS = [:text, :json, :html].freeze

Instance Attribute Summary

#api_name ⇒ Object
Returns the value of attribute api_name.
#default_max_completion_tokens ⇒ Object
Returns the value of attribute default_max_completion_tokens.
#default_temperature ⇒ Object
Returns the value of attribute default_temperature.
#display_name ⇒ Object
Returns the value of attribute display_name.
#input_token_cost ⇒ Object
Returns the value of attribute input_token_cost.
#key ⇒ Object
Returns the value of attribute key.
#output_token_cost ⇒ Object
Returns the value of attribute output_token_cost.
#provider_settings ⇒ Object
Returns the value of attribute provider_settings.
#supported_provider_managed_tools ⇒ Object
Returns the value of attribute supported_provider_managed_tools.
#supports_native_tool_use ⇒ Object (also: #supports_native_tool_use?)
Returns the value of attribute supports_native_tool_use.

Class Method Summary

.batch_inference_cost_multiplier ⇒ Object
Multiplier applied to per-token costs when a model completion was resolved through this provider's Batch API.
.cache_creation_input_token_cost_multiplier ⇒ Object
Multiplier applied to the base input_token_cost to derive the per-token cost for cache creation writes.
.cache_read_input_token_cost_multiplier ⇒ Object
Multiplier applied to the base input_token_cost to derive the per-token cost for cache reads.
.prompt_tokens_include_cached_tokens? ⇒ Boolean
Override in subclasses to indicate whether prompt_tokens reported by the provider already include cached tokens as a subset (OpenAI, Google, OpenRouter) or whether cached tokens are reported separately and are additive to prompt_tokens (Anthropic, Bedrock).
.streaming_supported_for_key?(model_key) ⇒ Boolean
Whether streaming is supported for the given Raif model key.
.supports_batch_inference? ⇒ Boolean
Whether this provider supports submitting model completions via a Batch API.
.valid_response_formats ⇒ Object

Instance Method Summary

#build_forced_tool_choice(tool_name) ⇒ Hash
Build the tool_choice parameter to force a specific tool to be called.
#build_pending_model_completion(messages:, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil, temperature: nil, max_completion_tokens: nil, tool_choice: nil, stream_response: false, allow_parallel_tool_calls: false, anthropic_prompt_caching_enabled: false, bedrock_prompt_caching_enabled: false, raif_model_completion_batch: nil, batch_custom_id: nil) ⇒ Raif::ModelCompletion
Builds and persists a Raif::ModelCompletion without performing the request.
#build_required_tool_choice ⇒ Hash, String
Build the tool_choice parameter to require the model to call any tool (but not a specific one).
#chat(message: nil, messages: nil, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil, temperature: nil, max_completion_tokens: nil, tool_choice: nil, allow_parallel_tool_calls: false, anthropic_prompt_caching_enabled: false, bedrock_prompt_caching_enabled: false, &block) ⇒ Object
#initialize(key:, api_name:, display_name: nil, model_provider_settings: {}, supported_provider_managed_tools: [], supports_native_tool_use: true, temperature: nil, max_completion_tokens: nil, input_token_cost: nil, output_token_cost: nil) ⇒ Llm constructor
A new instance of Llm.
#name ⇒ Object
#perform_model_completion!(model_completion, &block) ⇒ Object
#streaming_supported? ⇒ Boolean
#supports_batch_inference? ⇒ Boolean
Instance-level shortcut for the class-level predicate so callers can use the idiomatic Raif.llm(:some_key).supports_batch_inference? form instead of reaching through to the class.
#supports_faithful_required_tool_choice?(available_model_tools) ⇒ Boolean
Whether the provider can faithfully enforce tool_choice: :required for the given tool set.
#supports_parallel_tool_calls? ⇒ Boolean
Whether this model can handle being asked to make multiple tool calls in a single response.
#supports_provider_managed_tool?(tool_klass) ⇒ Boolean
#validate_provider_managed_tool_support!(tool) ⇒ Object

Constructor Details

#initialize(key:, api_name:, display_name: nil, model_provider_settings: {}, supported_provider_managed_tools: [], supports_native_tool_use: true, temperature: nil, max_completion_tokens: nil, input_token_cost: nil, output_token_cost: nil) ⇒ `Llm`

Returns a new instance of Llm.

# File 'app/models/raif/llm.rb', line 26

def initialize(
  key:,
  api_name:,
  display_name: nil,
  model_provider_settings: {},
  supported_provider_managed_tools: [],
  supports_native_tool_use: true,
  temperature: nil,
  max_completion_tokens: nil,
  input_token_cost: nil,
  output_token_cost: nil
)
  @key = key
  @api_name = api_name
  @display_name = display_name
  @provider_settings = model_provider_settings
  @supports_native_tool_use = supports_native_tool_use
  @default_temperature = temperature || 0.7
  @default_max_completion_tokens = max_completion_tokens
  @input_token_cost = input_token_cost
  @output_token_cost = output_token_cost
  @supported_provider_managed_tools = supported_provider_managed_tools.map(&:to_s)
end

Instance Attribute Details

#api_name ⇒ `Object`

Returns the value of attribute api_name.



# File 'app/models/raif/llm.rb', line 8

def api_name
  @api_name
end

#default_max_completion_tokens ⇒ `Object`

Returns the value of attribute default_max_completion_tokens.



# File 'app/models/raif/llm.rb', line 8

def default_max_completion_tokens
  @default_max_completion_tokens
end

#default_temperature ⇒ `Object`

Returns the value of attribute default_temperature.



# File 'app/models/raif/llm.rb', line 8

def default_temperature
  @default_temperature
end

#display_name ⇒ `Object`

Returns the value of attribute display_name.



# File 'app/models/raif/llm.rb', line 8

def display_name
  @display_name
end

#input_token_cost ⇒ `Object`

Returns the value of attribute input_token_cost.



# File 'app/models/raif/llm.rb', line 8

def input_token_cost
  @input_token_cost
end

#key ⇒ `Object`

Returns the value of attribute key.



# File 'app/models/raif/llm.rb', line 8

def key
  @key
end

#output_token_cost ⇒ `Object`

Returns the value of attribute output_token_cost.



# File 'app/models/raif/llm.rb', line 8

def output_token_cost
  @output_token_cost
end

#provider_settings ⇒ `Object`

Returns the value of attribute provider_settings.



# File 'app/models/raif/llm.rb', line 8

def provider_settings
  @provider_settings
end

#supported_provider_managed_tools ⇒ `Object`

Returns the value of attribute supported_provider_managed_tools.



# File 'app/models/raif/llm.rb', line 8

def supported_provider_managed_tools
  @supported_provider_managed_tools
end

#supports_native_tool_use ⇒ `Object` Also known as: supports_native_tool_use?

Returns the value of attribute supports_native_tool_use.



# File 'app/models/raif/llm.rb', line 8

def supports_native_tool_use
  @supports_native_tool_use
end

Class Method Details

.batch_inference_cost_multiplier ⇒ `Object`

Multiplier applied to per-token costs when a model completion was resolved through this provider's Batch API. Defaults to 0.5 (50% discount), which is what both Anthropic and OpenAI charge for batch requests today.



# File 'app/models/raif/llm.rb', line 258

def self.batch_inference_cost_multiplier
  0.5
end

.cache_creation_input_token_cost_multiplier ⇒ `Object`

Multiplier applied to the base input_token_cost to derive the per-token cost for cache creation writes. Return nil when there is no write surcharge.



# File 'app/models/raif/llm.rb', line 237

def self.cache_creation_input_token_cost_multiplier
  nil
end

.cache_read_input_token_cost_multiplier ⇒ `Object`

Multiplier applied to the base input_token_cost to derive the per-token cost for cache reads. Return nil when the provider has no cache pricing.



# File 'app/models/raif/llm.rb', line 231

def self.cache_read_input_token_cost_multiplier
  nil
end

.prompt_tokens_include_cached_tokens? ⇒ `Boolean`

Override in subclasses to indicate whether prompt_tokens reported by the provider already include cached tokens as a subset (OpenAI, Google, OpenRouter) or whether cached tokens are reported separately and are additive to prompt_tokens (Anthropic, Bedrock).

Returns:

(Boolean)



# File 'app/models/raif/llm.rb', line 225

def self.prompt_tokens_include_cached_tokens?
  true
end
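
The interaction between this predicate and the cache multipliers above can be illustrated with a small piece of arithmetic. The following is only a sketch, not Raif's actual cost code: the method name `sketch_input_cost`, its parameters, and the token counts and prices are all assumptions made for illustration, as is the choice to bill cached tokens at the full rate when the multiplier is nil.

```ruby
# Illustrative cost arithmetic; NOT Raif's actual cost code.
# All names and numbers here are assumptions introduced for this sketch.
def sketch_input_cost(prompt_tokens:, cached_tokens:, input_token_cost:,
  cache_read_multiplier:, prompt_includes_cached:)
  # Inclusive providers report cached tokens inside prompt_tokens, so subtract
  # them to get the uncached portion. Additive providers report them separately.
  uncached_tokens = prompt_includes_cached ? prompt_tokens - cached_tokens : prompt_tokens

  # Assumption: a nil multiplier means cached tokens are billed at the full rate.
  cached_rate = cache_read_multiplier ? input_token_cost * cache_read_multiplier : input_token_cost

  (uncached_tokens * input_token_cost) + (cached_tokens * cached_rate)
end

# Same reported numbers, interpreted both ways.
inclusive = sketch_input_cost(prompt_tokens: 1000, cached_tokens: 400, input_token_cost: 2,
  cache_read_multiplier: 0.5, prompt_includes_cached: true)
additive = sketch_input_cost(prompt_tokens: 1000, cached_tokens: 400, input_token_cost: 2,
  cache_read_multiplier: 0.5, prompt_includes_cached: false)

puts inclusive # 1600.0
puts additive  # 2400.0
```

The two results differ because the inclusive reading treats the 400 cached tokens as part of the 1000 prompt tokens, while the additive reading counts them on top.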

.streaming_supported_for_key?(model_key) ⇒ `Boolean`

Whether streaming is supported for the given Raif model key. A model key is considered unsupported if it matches any entry in Raif.config.streaming_unsupported_model_keys (each entry may be a String, Symbol, or Regexp). Used by #chat to transparently fall back to the non-streaming path for models with known-broken streaming endpoints.

Returns:

(Boolean)

# File 'app/models/raif/llm.rb', line 59

def self.streaming_supported_for_key?(model_key)
  entries = Array(Raif.config.streaming_unsupported_model_keys)
  key_str = model_key.to_s
  entries.none? do |entry|
    case entry
    when Regexp then entry.match?(key_str)
    else entry.to_s == key_str
    end
  end
end
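
To make the matching rule concrete, here is a standalone restatement of the method that takes the deny-list as an argument instead of reading `Raif.config`. The extra parameter and the model keys below are hypothetical and exist only so the sketch runs without Raif loaded.

```ruby
# Standalone restatement of the matching rule. The deny-list is passed in
# explicitly (an assumption for this sketch) rather than read from Raif.config.
def streaming_supported_for_key?(model_key, unsupported_entries)
  key_str = model_key.to_s
  Array(unsupported_entries).none? do |entry|
    case entry
    when Regexp then entry.match?(key_str) # pattern match against the key
    else entry.to_s == key_str             # String and Symbol entries compare by name
    end
  end
end

# Hypothetical model keys, used only for illustration.
UNSUPPORTED = [:example_model_a, "example_model_b", /\Aexample_preview_/].freeze

puts streaming_supported_for_key?(:example_model_a, UNSUPPORTED)     # false (Symbol entry)
puts streaming_supported_for_key?(:example_model_b, UNSUPPORTED)     # false (String entry)
puts streaming_supported_for_key?("example_preview_1", UNSUPPORTED)  # false (Regexp entry)
puts streaming_supported_for_key?(:example_model_c, UNSUPPORTED)     # true
puts streaming_supported_for_key?(:example_model_a, nil)             # true (empty deny-list)
```

Because both sides are converted with `to_s`, a Symbol entry matches a String key and vice versa.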

.supports_batch_inference? ⇒ `Boolean`

Whether this provider supports submitting model completions via a Batch API. Override in subclasses by including Raif::Concerns::Llms::SupportsBatchInference, which sets this to true.

Returns:

(Boolean)



# File 'app/models/raif/llm.rb', line 244

def self.supports_batch_inference?
  false
end

.valid_response_formats ⇒ `Object`



# File 'app/models/raif/llm.rb', line 217

def self.valid_response_formats
  VALID_RESPONSE_FORMATS
end

Instance Method Details

#build_forced_tool_choice(tool_name) ⇒ `Hash`

Build the tool_choice parameter to force a specific tool to be called. Each provider implements this to return the correct format.

Parameters:

tool_name (String) —
The name of the tool to force

Returns:

(Hash) —
The tool_choice parameter for the provider's API

Raises:

(NotImplementedError)



# File 'app/models/raif/llm.rb', line 270

def build_forced_tool_choice(tool_name)
  raise NotImplementedError, "#{self.class.name} must implement #build_forced_tool_choice"
end
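
A provider subclass is expected to override this method. The sketch below shows the pattern with stand-in classes so it runs without Raif; `SketchBaseLlm`, `SketchProviderLlm`, and the returned hash shape are hypothetical and do not describe what any real Raif provider returns.

```ruby
# Minimal stand-in for the abstract base so the sketch runs without Raif.
class SketchBaseLlm
  def build_forced_tool_choice(tool_name)
    raise NotImplementedError, "#{self.class.name} must implement #build_forced_tool_choice"
  end
end

# Hypothetical provider. The hash shape is invented for illustration only.
class SketchProviderLlm < SketchBaseLlm
  def build_forced_tool_choice(tool_name)
    { "type" => "tool", "name" => tool_name }
  end
end

puts SketchProviderLlm.new.build_forced_tool_choice("search").inspect

begin
  SketchBaseLlm.new.build_forced_tool_choice("search")
rescue NotImplementedError => e
  # NotImplementedError descends from ScriptError, not StandardError,
  # so it has to be rescued by name; a bare `rescue` would not catch it.
  puts e.message # SketchBaseLlm must implement #build_forced_tool_choice
end
```

One consequence worth keeping in mind: since `NotImplementedError` is not a `StandardError`, a `rescue StandardError` clause will not intercept it.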

#build_pending_model_completion(messages:, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil, temperature: nil, max_completion_tokens: nil, tool_choice: nil, stream_response: false, allow_parallel_tool_calls: false, anthropic_prompt_caching_enabled: false, bedrock_prompt_caching_enabled: false, raif_model_completion_batch: nil, batch_custom_id: nil) ⇒ `Raif::ModelCompletion`

Builds and persists a Raif::ModelCompletion without performing the request. Used by #chat (which then calls perform_model_completion!) and by callers that want to defer execution -- e.g. submitting through a provider Batch API via Raif::Task.build_for_batch / Raif::Task#prepare_for_batch!.

Returns:

(Raif::ModelCompletion) —
persisted, with started_at: nil

# File 'app/models/raif/llm.rb', line 188

def build_pending_model_completion(messages:, response_format: :text, available_model_tools: [], source: nil,
  system_prompt: nil, temperature: nil, max_completion_tokens: nil, tool_choice: nil,
  stream_response: false, allow_parallel_tool_calls: false, anthropic_prompt_caching_enabled: false,
  bedrock_prompt_caching_enabled: false, raif_model_completion_batch: nil, batch_custom_id: nil)
  temperature ||= default_temperature
  max_completion_tokens ||= default_max_completion_tokens

  model_completion = Raif::ModelCompletion.create!(
    messages: format_messages(messages),
    system_prompt: system_prompt,
    response_format: response_format,
    source: source,
    llm_model_key: key.to_s,
    model_api_name: api_name,
    temperature: temperature,
    max_completion_tokens: max_completion_tokens,
    available_model_tools: available_model_tools,
    tool_choice: tool_choice&.to_s,
    stream_response: stream_response,
    raif_model_completion_batch: raif_model_completion_batch,
    batch_custom_id: batch_custom_id
  )

  model_completion.allow_parallel_tool_calls = allow_parallel_tool_calls
  model_completion.anthropic_prompt_caching_enabled = anthropic_prompt_caching_enabled
  model_completion.bedrock_prompt_caching_enabled = bedrock_prompt_caching_enabled
  model_completion
end

#build_required_tool_choice ⇒ `Hash`, `String`

Build the tool_choice parameter to require the model to call any tool (but not a specific one). Each provider implements this to return the correct format.

Returns:

(Hash, String) —
The tool_choice parameter for the provider's API

Raises:

(NotImplementedError)



# File 'app/models/raif/llm.rb', line 277

def build_required_tool_choice
  raise NotImplementedError, "#{self.class.name} must implement #build_required_tool_choice"
end

#chat(message: nil, messages: nil, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil, temperature: nil, max_completion_tokens: nil, tool_choice: nil, allow_parallel_tool_calls: false, anthropic_prompt_caching_enabled: false, bedrock_prompt_caching_enabled: false, &block) ⇒ `Object`

Validates the arguments, runs Raif.config.model_completion_authorizer if one is configured, builds a pending Raif::ModelCompletion, and performs the request with retries. Exactly one of message: or messages: must be given. Streaming is used only when a block is passed and #streaming_supported? is true. Returns the completed Raif::ModelCompletion, or nil when Raif.config.llm_api_requests_enabled is false. On failure, the error is recorded on the model completion and re-raised.

# File 'app/models/raif/llm.rb', line 74

def chat(message: nil, messages: nil, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil, temperature: nil,
  max_completion_tokens: nil, tool_choice: nil, allow_parallel_tool_calls: false, anthropic_prompt_caching_enabled: false,
  bedrock_prompt_caching_enabled: false, &block)
  unless response_format.is_a?(Symbol)
    raise ArgumentError,
      "Raif::Llm#chat - Invalid response format: #{response_format}. Must be a symbol (you passed #{response_format.class}) and be one of: #{VALID_RESPONSE_FORMATS.join(", ")}" # rubocop:disable Layout/LineLength
  end

  unless VALID_RESPONSE_FORMATS.include?(response_format)
    raise ArgumentError, "Raif::Llm#chat - Invalid response format: #{response_format}. Must be one of: #{VALID_RESPONSE_FORMATS.join(", ")}"
  end

  unless message.present? || messages.present?
    raise ArgumentError, "Raif::Llm#chat - You must provide either a message: or messages: argument"
  end

  if message.present? && messages.present?
    raise ArgumentError, "Raif::Llm#chat - You must provide either a message: or messages: argument, not both"
  end

  # Normalize :required / "required" to the symbol form for validation
  tool_choice = :required if tool_choice.to_s == "required"

  if tool_choice == :required
    if available_model_tools.blank?
      raise ArgumentError,
        "Raif::Llm#chat - tool_choice: :required requires at least one available model tool"
    end
  elsif tool_choice.present? && !available_model_tools.map(&:to_s).include?(tool_choice.to_s)
    raise ArgumentError,
      "Raif::Llm#chat - Invalid tool choice: #{tool_choice} is not included in the available model tools: #{available_model_tools.join(", ")}"
  end

  # Runs before the ModelCompletion is created or any provider call is made,
  # and before the llm_api_requests_enabled guard so authorization applies
  # even when API requests are disabled. Vetoes by raising. Any raised
  # exception is tagged with Raif::Errors::ModelCompletionAuthorizationError
  # so wrapped flows (Raif::Task.run, Raif::Conversation) re-raise it to the
  # caller instead of swallowing it as an ordinary model failure.
  if Raif.config.model_completion_authorizer
    begin
      Raif.config.model_completion_authorizer.call(llm: self, source: source)
    rescue StandardError => e
      e.extend(Raif::Errors::ModelCompletionAuthorizationError) unless e.is_a?(Raif::Errors::ModelCompletionAuthorizationError)
      raise
    end
  end

  unless Raif.config.llm_api_requests_enabled
    Raif.logger.warn("LLM API requests are disabled. Skipping request to #{api_name}.")
    return
  end

  messages = [{ "role" => "user", "content" => message }] if message.present?

  temperature ||= default_temperature
  max_completion_tokens ||= default_max_completion_tokens

  stream_response = block_given? && streaming_supported?
  if block_given? && !stream_response
    Raif.logger.info(
      "Raif::Llm#chat: streaming requested but disabled for model key #{key.inspect} " \
        "via Raif.config.streaming_unsupported_model_keys; falling back to non-streaming."
    )
  end

  model_completion = build_pending_model_completion(
    messages: messages,
    response_format: response_format,
    available_model_tools: available_model_tools,
    source: source,
    system_prompt: system_prompt,
    temperature: temperature,
    max_completion_tokens: max_completion_tokens,
    tool_choice: tool_choice,
    stream_response: stream_response,
    allow_parallel_tool_calls: allow_parallel_tool_calls,
    anthropic_prompt_caching_enabled: anthropic_prompt_caching_enabled,
    bedrock_prompt_caching_enabled: bedrock_prompt_caching_enabled
  )

  model_completion.started!

  retry_with_backoff(model_completion) do
    perform_model_completion!(model_completion, &block)
    ensure_model_completion_present!(model_completion)
  end

  model_completion.completed!
  model_completion
rescue Raif::Errors::StreamingError => e
  Rails.logger.error("Raif streaming error -- code: #{e.code} -- type: #{e.type} -- message: #{e.message} -- event: #{e.event}")
  model_completion&.record_failure!(e) unless model_completion&.failed?
  raise e
rescue Faraday::Error => e
  Raif.logger.error("LLM API request failed (status: #{e.response_status}): #{e.message}")
  Raif.logger.error(e.response_body)
  model_completion&.record_failure!(e) unless model_completion&.failed?
  raise e
rescue StandardError => e
  model_completion&.record_failure!(e) unless model_completion&.failed?
  raise e
end
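
The authorizer step in the source above tags whatever exception the authorizer raises by extending it with a marker module, so callers can rescue by that module while the original exception class is preserved. A minimal standalone sketch of that technique follows; `SketchAuthorizationError`, `AUTHORIZER`, `authorize!`, and `tagged_error_for` are hypothetical names that stand in for the Raif equivalents.

```ruby
# Marker module standing in for Raif::Errors::ModelCompletionAuthorizationError.
module SketchAuthorizationError; end

# Hypothetical authorizer: any callable that accepts llm: and source: keywords
# and vetoes by raising.
AUTHORIZER = lambda do |llm:, source:|
  raise ArgumentError, "source is required for #{llm}" if source.nil?
end

# Calls the authorizer and tags any StandardError with the marker module
# before re-raising the same exception object.
def authorize!(llm:, source:)
  AUTHORIZER.call(llm: llm, source: source)
rescue StandardError => e
  e.extend(SketchAuthorizationError) unless e.is_a?(SketchAuthorizationError)
  raise
end

# Returns the tagged error if authorization was vetoed, otherwise nil.
def tagged_error_for(source)
  authorize!(llm: :sketch_llm, source: source)
  nil
rescue SketchAuthorizationError => e
  e
end

error = tagged_error_for(nil)
puts error.class                           # ArgumentError
puts error.is_a?(SketchAuthorizationError) # true
puts tagged_error_for("some source").inspect # nil
```

Rescuing by module works because `rescue` matches with `===`, and an object extended with a module is a kind of that module.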

#name ⇒ `Object`



# File 'app/models/raif/llm.rb', line 50

def name
  I18n.t("raif.model_names.#{key}", default: display_name || key.to_s.humanize)
end

#perform_model_completion!(model_completion, &block) ⇒ `Object`

Raises:

(NotImplementedError)



# File 'app/models/raif/llm.rb', line 178

def perform_model_completion!(model_completion, &block)
  raise NotImplementedError, "#{self.class.name} must implement #perform_model_completion!"
end

#streaming_supported? ⇒ `Boolean`

Returns:

(Boolean)



# File 'app/models/raif/llm.rb', line 70

def streaming_supported?
  self.class.streaming_supported_for_key?(key)
end

#supports_batch_inference? ⇒ `Boolean`

Instance-level shortcut for the class-level predicate so callers can use the idiomatic Raif.llm(:some_key).supports_batch_inference? form instead of reaching through to the class.

Returns:

(Boolean)



# File 'app/models/raif/llm.rb', line 251

def supports_batch_inference?
  self.class.supports_batch_inference?
end

#supports_faithful_required_tool_choice?(available_model_tools) ⇒ `Boolean`

Whether the provider can faithfully enforce tool_choice: :required for the given tool set. Override in subclasses when a provider can only enforce required tool use for some tool types.

Returns:

(Boolean)



# File 'app/models/raif/llm.rb', line 284

def supports_faithful_required_tool_choice?(available_model_tools)
  available_model_tools.present?
end

#supports_parallel_tool_calls? ⇒ `Boolean`

Whether this model can handle being asked to make multiple tool calls in a single response. Override (per provider or per model key) to return false for models that reject the parallel-tool-call request parameter or that produce worse results when allowed to batch. Agents consult this before enabling parallel tool calls; when false they fall back to one tool call per step.

Returns:

(Boolean)



# File 'app/models/raif/llm.rb', line 293

def supports_parallel_tool_calls?
  true
end

#supports_provider_managed_tool?(tool_klass) ⇒ `Boolean`

Returns:

(Boolean)



# File 'app/models/raif/llm.rb', line 262

def supports_provider_managed_tool?(tool_klass)
  supported_provider_managed_tools&.include?(tool_klass.to_s)
end

#validate_provider_managed_tool_support!(tool) ⇒ `Object`

# File 'app/models/raif/llm.rb', line 297

def validate_provider_managed_tool_support!(tool)
  unless supports_provider_managed_tool?(tool)
    raise Raif::Errors::UnsupportedFeatureError,
      "Invalid provider-managed tool: #{tool.name} for #{key}"
  end
end
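
Both methods above rely on string comparison: the constructor stores `supported_provider_managed_tools.map(&:to_s)`, and the lookup calls `to_s` on its argument. The sketch below reproduces that normalization in isolation. The tool classes are hypothetical placeholders, not real Raif tools.

```ruby
# Hypothetical tool classes, defined only so the sketch runs without Raif.
class SketchWebSearchTool; end
class SketchCodeExecutionTool; end

# Mirrors the constructor: entries may be classes or strings and are stored as strings.
supported = [SketchWebSearchTool, "SketchImageTool"].map(&:to_s)

# Mirrors the lookup: the argument is stringified before the membership test.
supports = ->(tool_klass) { supported.include?(tool_klass.to_s) }

puts supports.call(SketchWebSearchTool)     # true  (class given, class configured)
puts supports.call("SketchWebSearchTool")   # true  (string given, class configured)
puts supports.call("SketchImageTool")       # true  (string given, string configured)
puts supports.call(SketchCodeExecutionTool) # false (not configured)
```

Under this scheme a tool can be configured by class or by class-name string and looked up with either form.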
