Class: Raif::Llm
- Inherits:
- Object (inheritance chain: Object > Raif::Llm)
- Includes:
- ActiveModel::Model, Concerns::Llms::MessageFormatting
- Defined in:
- app/models/raif/llm.rb
Direct Known Subclasses
Raif::Llms::Anthropic, Raif::Llms::Bedrock, Raif::Llms::Google, Raif::Llms::OpenAiBase, Raif::Llms::OpenRouter, Raif::Llms::XAi
Constant Summary
- VALID_RESPONSE_FORMATS = [:text, :json, :html].freeze
Instance Attribute Summary
-
#api_name ⇒ Object
Returns the value of attribute api_name.
-
#default_max_completion_tokens ⇒ Object
Returns the value of attribute default_max_completion_tokens.
-
#default_temperature ⇒ Object
Returns the value of attribute default_temperature.
-
#display_name ⇒ Object
Returns the value of attribute display_name.
-
#input_token_cost ⇒ Object
Returns the value of attribute input_token_cost.
-
#key ⇒ Object
Returns the value of attribute key.
-
#output_token_cost ⇒ Object
Returns the value of attribute output_token_cost.
-
#provider_settings ⇒ Object
Returns the value of attribute provider_settings.
-
#supported_provider_managed_tools ⇒ Object
Returns the value of attribute supported_provider_managed_tools.
-
#supports_native_tool_use ⇒ Object
(also: #supports_native_tool_use?)
Returns the value of attribute supports_native_tool_use.
Class Method Summary
-
.batch_inference_cost_multiplier ⇒ Object
Multiplier applied to per-token costs when a model completion was resolved through this provider's Batch API.
-
.cache_creation_input_token_cost_multiplier ⇒ Object
Multiplier applied to the base input_token_cost to derive the per-token cost for cache creation writes.
-
.cache_read_input_token_cost_multiplier ⇒ Object
Multiplier applied to the base input_token_cost to derive the per-token cost for cache reads.
-
.prompt_tokens_include_cached_tokens? ⇒ Boolean
Override in subclasses to indicate whether prompt_tokens reported by the provider already include cached tokens as a subset (OpenAI, Google, OpenRouter) or whether cached tokens are reported separately and are additive to prompt_tokens (Anthropic, Bedrock).
-
.streaming_supported_for_key?(model_key) ⇒ Boolean
Whether streaming is supported for the given Raif model key.
-
.supports_batch_inference? ⇒ Boolean
Whether this provider supports submitting model completions via a Batch API.
- .valid_response_formats ⇒ Object
Instance Method Summary
-
#build_forced_tool_choice(tool_name) ⇒ Hash
Build the tool_choice parameter to force a specific tool to be called.
-
#build_pending_model_completion(messages:, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil, temperature: nil, max_completion_tokens: nil, tool_choice: nil, stream_response: false, anthropic_prompt_caching_enabled: false, bedrock_prompt_caching_enabled: false, raif_model_completion_batch: nil, batch_custom_id: nil) ⇒ Raif::ModelCompletion
Builds and persists a Raif::ModelCompletion without performing the request.
-
#build_required_tool_choice ⇒ Hash, String
Build the tool_choice parameter to require the model to call any tool (but not a specific one).
- #chat(message: nil, messages: nil, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil, temperature: nil, max_completion_tokens: nil, tool_choice: nil, anthropic_prompt_caching_enabled: false, bedrock_prompt_caching_enabled: false, &block) ⇒ Object
-
#initialize(key:, api_name:, display_name: nil, model_provider_settings: {}, supported_provider_managed_tools: [], supports_native_tool_use: true, temperature: nil, max_completion_tokens: nil, input_token_cost: nil, output_token_cost: nil) ⇒ Llm
constructor
A new instance of Llm.
- #name ⇒ Object
- #perform_model_completion!(model_completion, &block) ⇒ Object
- #streaming_supported? ⇒ Boolean
-
#supports_batch_inference? ⇒ Boolean
Instance-level shortcut for the class-level predicate so callers can use the idiomatic Raif.llm(:some_key).supports_batch_inference? form instead of reaching through to the class.
-
#supports_faithful_required_tool_choice?(available_model_tools) ⇒ Boolean
Whether the provider can faithfully enforce tool_choice: :required for the given tool set.
- #supports_provider_managed_tool?(tool_klass) ⇒ Boolean
- #validate_provider_managed_tool_support!(tool) ⇒ Object
Constructor Details
#initialize(key:, api_name:, display_name: nil, model_provider_settings: {}, supported_provider_managed_tools: [], supports_native_tool_use: true, temperature: nil, max_completion_tokens: nil, input_token_cost: nil, output_token_cost: nil) ⇒ Llm
Returns a new instance of Llm.
# File 'app/models/raif/llm.rb', line 26

def initialize(
  key:,
  api_name:,
  display_name: nil,
  model_provider_settings: {},
  supported_provider_managed_tools: [],
  supports_native_tool_use: true,
  temperature: nil,
  max_completion_tokens: nil,
  input_token_cost: nil,
  output_token_cost: nil
)
  @key = key
  @api_name = api_name
  @display_name = display_name
  @provider_settings = model_provider_settings
  @supports_native_tool_use = supports_native_tool_use
  @default_temperature = temperature || 0.7
  @default_max_completion_tokens = max_completion_tokens
  @input_token_cost = input_token_cost
  @output_token_cost = output_token_cost
  @supported_provider_managed_tools = supported_provider_managed_tools.map(&:to_s)
end
Instance Attribute Details
#api_name ⇒ Object
Returns the value of attribute api_name.
# File 'app/models/raif/llm.rb', line 8

def api_name
  @api_name
end
#default_max_completion_tokens ⇒ Object
Returns the value of attribute default_max_completion_tokens.
# File 'app/models/raif/llm.rb', line 8

def default_max_completion_tokens
  @default_max_completion_tokens
end
#default_temperature ⇒ Object
Returns the value of attribute default_temperature.
# File 'app/models/raif/llm.rb', line 8

def default_temperature
  @default_temperature
end
#display_name ⇒ Object
Returns the value of attribute display_name.
# File 'app/models/raif/llm.rb', line 8

def display_name
  @display_name
end
#input_token_cost ⇒ Object
Returns the value of attribute input_token_cost.
# File 'app/models/raif/llm.rb', line 8

def input_token_cost
  @input_token_cost
end
#key ⇒ Object
Returns the value of attribute key.
# File 'app/models/raif/llm.rb', line 8

def key
  @key
end
#output_token_cost ⇒ Object
Returns the value of attribute output_token_cost.
# File 'app/models/raif/llm.rb', line 8

def output_token_cost
  @output_token_cost
end
#provider_settings ⇒ Object
Returns the value of attribute provider_settings.
# File 'app/models/raif/llm.rb', line 8

def provider_settings
  @provider_settings
end
#supported_provider_managed_tools ⇒ Object
Returns the value of attribute supported_provider_managed_tools.
# File 'app/models/raif/llm.rb', line 8

def supported_provider_managed_tools
  @supported_provider_managed_tools
end
#supports_native_tool_use ⇒ Object
Also known as: supports_native_tool_use?
Returns the value of attribute supports_native_tool_use.
# File 'app/models/raif/llm.rb', line 8

def supports_native_tool_use
  @supports_native_tool_use
end
Class Method Details
.batch_inference_cost_multiplier ⇒ Object
Multiplier applied to per-token costs when a model completion was resolved through this provider's Batch API. Defaults to 0.5 (50% discount), which is what both Anthropic and OpenAI charge for batch requests today.
# File 'app/models/raif/llm.rb', line 240

def self.batch_inference_cost_multiplier
  0.5
end
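As a rough illustration of how such a multiplier might be applied, the sketch below scales a precomputed cost only when a completion went through a batch API. ExampleProvider and batch_adjusted_cost are hypothetical names used for illustration; this page does not show where Raif actually applies the multiplier.

```ruby
# Stand-in for a provider class; only the class-level multiplier is modeled.
class ExampleProvider
  def self.batch_inference_cost_multiplier
    0.5
  end
end

# Hypothetical helper (not part of Raif): scale a precomputed cost when the
# completion was resolved through a batch API, and leave it unchanged otherwise.
def batch_adjusted_cost(base_cost, provider_class, batched:)
  batched ? base_cost * provider_class.batch_inference_cost_multiplier : base_cost
end

batch_adjusted_cost(10, ExampleProvider, batched: true)  # => 5.0
batch_adjusted_cost(10, ExampleProvider, batched: false) # => 10
```

A provider with different batch pricing would presumably override the class method in its subclass.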
.cache_creation_input_token_cost_multiplier ⇒ Object
Multiplier applied to the base input_token_cost to derive the per-token cost for cache creation writes. Return nil when there is no write surcharge.
# File 'app/models/raif/llm.rb', line 219

def self.cache_creation_input_token_cost_multiplier
  nil
end
.cache_read_input_token_cost_multiplier ⇒ Object
Multiplier applied to the base input_token_cost to derive the per-token cost for cache reads. Return nil when the provider has no cache pricing.
# File 'app/models/raif/llm.rb', line 213

def self.cache_read_input_token_cost_multiplier
  nil
end
.prompt_tokens_include_cached_tokens? ⇒ Boolean
Override in subclasses to indicate whether prompt_tokens reported by the provider already include cached tokens as a subset (OpenAI, Google, OpenRouter) or whether cached tokens are reported separately and are additive to prompt_tokens (Anthropic, Bedrock).
# File 'app/models/raif/llm.rb', line 207

def self.prompt_tokens_include_cached_tokens?
  true
end
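The difference between the "subset" and "additive" reporting styles is easiest to see with numbers. The following is a minimal sketch, not Raif's actual cost calculation: estimated_input_cost, its parameters, and the sample price and multiplier are all assumptions made for illustration, as is the choice to bill cached tokens at the base rate when no cache multiplier is defined.

```ruby
# Hypothetical helper (not part of Raif): price the input side of one completion.
# - includes_cached mirrors .prompt_tokens_include_cached_tokens?
# - cache_read_multiplier mirrors .cache_read_input_token_cost_multiplier
def estimated_input_cost(prompt_tokens:, cached_tokens:, input_token_cost:, includes_cached:, cache_read_multiplier: nil)
  # Subset style: cached tokens are already counted in prompt_tokens, so remove
  # them before pricing the uncached part. Additive style: prompt_tokens is
  # already the uncached count.
  uncached_tokens = includes_cached ? prompt_tokens - cached_tokens : prompt_tokens

  # Assumption: with no cache pricing (nil), cached tokens cost the base rate.
  cached_rate = input_token_cost * (cache_read_multiplier || 1)

  (uncached_tokens * input_token_cost) + (cached_tokens * cached_rate)
end

# Subset style: 1,000 prompt tokens, of which 400 were read from cache.
EXAMPLE_SUBSET_COST = estimated_input_cost(
  prompt_tokens: 1_000, cached_tokens: 400, input_token_cost: 2,
  includes_cached: true, cache_read_multiplier: 0.5
) # => 1600.0

# Additive style: 600 uncached prompt tokens plus 400 cached tokens reported separately.
EXAMPLE_ADDITIVE_COST = estimated_input_cost(
  prompt_tokens: 600, cached_tokens: 400, input_token_cost: 2,
  includes_cached: false, cache_read_multiplier: 0.5
) # => 1600.0
```

Both calls describe the same request, so they arrive at the same total. Treating an additive provider as a subset provider would subtract the cached tokens from a count that never included them.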
.streaming_supported_for_key?(model_key) ⇒ Boolean
Whether streaming is supported for the given Raif model key. A model key is considered unsupported if it matches any entry in Raif.config.streaming_unsupported_model_keys (each entry may be a String, Symbol, or Regexp). Used by #chat to transparently fall back to the non-streaming path for models with known-broken streaming endpoints.
# File 'app/models/raif/llm.rb', line 59

def self.streaming_supported_for_key?(model_key)
  entries = Array(Raif.config.streaming_unsupported_model_keys)
  key_str = model_key.to_s
  entries.none? do |entry|
    case entry
    when Regexp then entry.match?(key_str)
    else entry.to_s == key_str
    end
  end
end
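The matching rule can be tried in isolation. The sketch below is a stand-alone version that assumes the list of unsupported entries is passed in directly rather than read from Raif.config. The method name streaming_supported_for? and the sample model keys are hypothetical.

```ruby
# Stand-alone sketch of the key-matching rule. `unsupported_entries` stands in
# for Raif.config.streaming_unsupported_model_keys (passed in here by assumption).
def streaming_supported_for?(model_key, unsupported_entries)
  key_str = model_key.to_s

  Array(unsupported_entries).none? do |entry|
    case entry
    when Regexp then entry.match?(key_str) # Regexp entries are tested with match?
    else entry.to_s == key_str             # String and Symbol entries must match exactly
    end
  end
end

# Hypothetical model keys, for illustration only.
EXAMPLE_UNSUPPORTED_KEYS = [:example_model_a, "example_model_b", /\Abeta_/].freeze

streaming_supported_for?(:example_model_a, EXAMPLE_UNSUPPORTED_KEYS)  # => false (Symbol entry)
streaming_supported_for?("example_model_b", EXAMPLE_UNSUPPORTED_KEYS) # => false (String entry)
streaming_supported_for?(:beta_model, EXAMPLE_UNSUPPORTED_KEYS)       # => false (Regexp entry)
streaming_supported_for?(:example_model_c, EXAMPLE_UNSUPPORTED_KEYS)  # => true
```

Because both sides are compared as strings, a Symbol entry also blocks a String key with the same text, and the reverse holds too.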
.supports_batch_inference? ⇒ Boolean
Whether this provider supports submitting model completions via a Batch API. Override in subclasses by including Raif::Concerns::Llms::SupportsBatchInference, which sets this to true.
# File 'app/models/raif/llm.rb', line 226

def self.supports_batch_inference?
  false
end
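The override-by-inclusion pattern can be sketched in plain Ruby. The module below is only a stand-in for Raif::Concerns::Llms::SupportsBatchInference, whose internals are not shown on this page. All class and module names here are hypothetical, and the real concern may be built on ActiveSupport::Concern instead of a hand-written included hook.

```ruby
# Plain-Ruby stand-in for a concern that flips the class-level predicate.
module ExampleSupportsBatchInference
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def supports_batch_inference?
      true
    end
  end
end

# Stand-in base class: batch inference is off by default.
class ExampleBaseLlm
  def self.supports_batch_inference?
    false
  end

  # Instance-level shortcut that defers to the class-level predicate.
  def supports_batch_inference?
    self.class.supports_batch_inference?
  end
end

# A provider subclass opts in by including the module.
class ExampleBatchLlm < ExampleBaseLlm
  include ExampleSupportsBatchInference
end

ExampleBaseLlm.supports_batch_inference?      # => false
ExampleBatchLlm.supports_batch_inference?     # => true
ExampleBatchLlm.new.supports_batch_inference? # => true
```

The extended module sits ahead of the parent's singleton class in method lookup, which is why the subclass answers true without redefining the method itself.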
.valid_response_formats ⇒ Object
# File 'app/models/raif/llm.rb', line 199

def self.valid_response_formats
  VALID_RESPONSE_FORMATS
end
Instance Method Details
#build_forced_tool_choice(tool_name) ⇒ Hash
Build the tool_choice parameter to force a specific tool to be called. Each provider implements this to return the correct format.
# File 'app/models/raif/llm.rb', line 252

def build_forced_tool_choice(tool_name)
  raise NotImplementedError, "#{self.class.name} must implement #build_forced_tool_choice"
end
#build_pending_model_completion(messages:, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil, temperature: nil, max_completion_tokens: nil, tool_choice: nil, stream_response: false, anthropic_prompt_caching_enabled: false, bedrock_prompt_caching_enabled: false, raif_model_completion_batch: nil, batch_custom_id: nil) ⇒ Raif::ModelCompletion
Builds and persists a Raif::ModelCompletion without performing the request. Used by #chat (which then calls #perform_model_completion!) and by callers that want to defer execution, for example when submitting through a provider Batch API via Raif::Task.build_for_batch / Raif::Task#prepare_for_batch!.
# File 'app/models/raif/llm.rb', line 171

def build_pending_model_completion(messages:, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil,
  temperature: nil, max_completion_tokens: nil, tool_choice: nil, stream_response: false,
  anthropic_prompt_caching_enabled: false, bedrock_prompt_caching_enabled: false,
  raif_model_completion_batch: nil, batch_custom_id: nil)
  temperature ||= default_temperature
  max_completion_tokens ||= default_max_completion_tokens

  model_completion = Raif::ModelCompletion.create!(
    messages: format_messages(messages),
    system_prompt: system_prompt,
    response_format: response_format,
    source: source,
    llm_model_key: key.to_s,
    model_api_name: api_name,
    temperature: temperature,
    max_completion_tokens: max_completion_tokens,
    available_model_tools: available_model_tools,
    tool_choice: tool_choice&.to_s,
    stream_response: stream_response,
    raif_model_completion_batch: raif_model_completion_batch,
    batch_custom_id: batch_custom_id
  )

  model_completion.anthropic_prompt_caching_enabled = anthropic_prompt_caching_enabled
  model_completion.bedrock_prompt_caching_enabled = bedrock_prompt_caching_enabled
  model_completion
end
#build_required_tool_choice ⇒ Hash, String
Build the tool_choice parameter to require the model to call any tool (but not a specific one). Each provider implements this to return the correct format.
# File 'app/models/raif/llm.rb', line 259

def build_required_tool_choice
  raise NotImplementedError, "#{self.class.name} must implement #build_required_tool_choice"
end
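To show why the return type is Hash or String, here is a sketch of how two provider subclasses might implement both tool-choice builders. The class names are hypothetical stand-ins. The payload shapes follow the tool_choice parameter of Anthropic's Messages API and OpenAI's Chat Completions API, but it is an assumption that Raif's own subclasses return exactly these values.

```ruby
# Stand-in base class mirroring the two abstract hooks.
class ExampleToolChoiceLlm
  def build_forced_tool_choice(tool_name)
    raise NotImplementedError, "#{self.class.name} must implement #build_forced_tool_choice"
  end

  def build_required_tool_choice
    raise NotImplementedError, "#{self.class.name} must implement #build_required_tool_choice"
  end
end

# Shapes follow Anthropic's Messages API tool_choice parameter.
class ExampleAnthropicStyleLlm < ExampleToolChoiceLlm
  def build_forced_tool_choice(tool_name)
    { "type" => "tool", "name" => tool_name }
  end

  def build_required_tool_choice
    { "type" => "any" }
  end
end

# Shapes follow OpenAI's Chat Completions tool_choice parameter.
class ExampleOpenAiStyleLlm < ExampleToolChoiceLlm
  def build_forced_tool_choice(tool_name)
    { "type" => "function", "function" => { "name" => tool_name } }
  end

  # This API accepts a bare string here, hence the Hash-or-String return type.
  def build_required_tool_choice
    "required"
  end
end
```

Note that NotImplementedError descends from ScriptError in Ruby, so a bare rescue will not catch it. Callers probing for support need to rescue it by name.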
#chat(message: nil, messages: nil, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil, temperature: nil, max_completion_tokens: nil, tool_choice: nil, anthropic_prompt_caching_enabled: false, bedrock_prompt_caching_enabled: false, &block) ⇒ Object
# File 'app/models/raif/llm.rb', line 74

def chat(message: nil, messages: nil, response_format: :text, available_model_tools: [], source: nil, system_prompt: nil, temperature: nil,
  max_completion_tokens: nil, tool_choice: nil, anthropic_prompt_caching_enabled: false, bedrock_prompt_caching_enabled: false, &block)
  unless response_format.is_a?(Symbol)
    raise ArgumentError,
      "Raif::Llm#chat - Invalid response format: #{response_format}. Must be a symbol (you passed #{response_format.class}) and be one of: #{VALID_RESPONSE_FORMATS.join(", ")}" # rubocop:disable Layout/LineLength
  end

  unless VALID_RESPONSE_FORMATS.include?(response_format)
    raise ArgumentError, "Raif::Llm#chat - Invalid response format: #{response_format}. Must be one of: #{VALID_RESPONSE_FORMATS.join(", ")}"
  end

  unless message.present? || messages.present?
    raise ArgumentError, "Raif::Llm#chat - You must provide either a message: or messages: argument"
  end

  if message.present? && messages.present?
    raise ArgumentError, "Raif::Llm#chat - You must provide either a message: or messages: argument, not both"
  end

  # Normalize :required / "required" to the symbol form for validation
  tool_choice = :required if tool_choice.to_s == "required"

  if tool_choice == :required
    if available_model_tools.blank?
      raise ArgumentError, "Raif::Llm#chat - tool_choice: :required requires at least one available model tool"
    end
  elsif tool_choice.present? && !available_model_tools.map(&:to_s).include?(tool_choice.to_s)
    raise ArgumentError,
      "Raif::Llm#chat - Invalid tool choice: #{tool_choice} is not included in the available model tools: #{available_model_tools.join(", ")}"
  end

  unless Raif.config.llm_api_requests_enabled
    Raif.logger.warn("LLM API requests are disabled. Skipping request to #{api_name}.")
    return
  end

  messages = [{ "role" => "user", "content" => message }] if message.present?

  temperature ||= default_temperature
  max_completion_tokens ||= default_max_completion_tokens

  stream_response = block_given? && streaming_supported?
  if block_given? && !stream_response
    Raif.logger.info(
      "Raif::Llm#chat: streaming requested but disabled for model key #{key.inspect} " \
        "via Raif.config.streaming_unsupported_model_keys; falling back to non-streaming."
    )
  end

  model_completion = build_pending_model_completion(
    messages: messages,
    response_format: response_format,
    available_model_tools: available_model_tools,
    source: source,
    system_prompt: system_prompt,
    temperature: temperature,
    max_completion_tokens: max_completion_tokens,
    tool_choice: tool_choice,
    stream_response: stream_response,
    anthropic_prompt_caching_enabled: anthropic_prompt_caching_enabled,
    bedrock_prompt_caching_enabled: bedrock_prompt_caching_enabled
  )

  model_completion.started!

  retry_with_backoff(model_completion) do
    perform_model_completion!(model_completion, &block)
    ensure_model_completion_present!(model_completion)
  end

  model_completion.completed!
  model_completion
rescue Raif::Errors::StreamingError => e
  Rails.logger.error("Raif streaming error -- code: #{e.code} -- type: #{e.type} -- message: #{e.message} -- event: #{e.event}")
  model_completion&.record_failure!(e) unless model_completion&.failed?
  raise e
rescue Faraday::Error => e
  Raif.logger.error("LLM API request failed (status: #{e.response_status}): #{e.message}")
  Raif.logger.error(e.response_body)
  model_completion&.record_failure!(e) unless model_completion&.failed?
  raise e
rescue StandardError => e
  model_completion&.record_failure!(e) unless model_completion&.failed?
  raise e
end
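The tool_choice checks performed by #chat can be exercised without a Raif installation. The sketch below restates them as a stand-alone method. validate_tool_choice! is a hypothetical name, plain Ruby checks stand in for ActiveSupport's present? and blank?, and the tool names are made up.

```ruby
# Stand-alone sketch of the tool_choice checks. Returns the normalized
# tool_choice, or raises ArgumentError when the combination is invalid.
def validate_tool_choice!(tool_choice, available_model_tools)
  tools = Array(available_model_tools)

  # Normalize :required / "required" to the symbol form.
  tool_choice = :required if tool_choice.to_s == "required"

  if tool_choice == :required
    raise ArgumentError, "tool_choice: :required requires at least one available model tool" if tools.empty?
  elsif tool_choice && !tool_choice.to_s.empty? && !tools.map(&:to_s).include?(tool_choice.to_s)
    raise ArgumentError, "Invalid tool choice: #{tool_choice} is not included in the available model tools: #{tools.join(", ")}"
  end

  tool_choice
end

validate_tool_choice!("required", ["ExampleTool"])    # => :required
validate_tool_choice!("ExampleTool", ["ExampleTool"]) # => "ExampleTool"
validate_tool_choice!(nil, [])                        # => nil
```

Two cases raise ArgumentError: :required with an empty tool list, and a named tool that is not among the available tools. Tools are compared by their string form, so a class and its name are treated as the same choice.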
#name ⇒ Object
# File 'app/models/raif/llm.rb', line 50

def name
  I18n.t("raif.model_names.#{key}", default: display_name || key.to_s.humanize)
end
#perform_model_completion!(model_completion, &block) ⇒ Object
# File 'app/models/raif/llm.rb', line 161

def perform_model_completion!(model_completion, &block)
  raise NotImplementedError, "#{self.class.name} must implement #perform_model_completion!"
end
#streaming_supported? ⇒ Boolean
# File 'app/models/raif/llm.rb', line 70

def streaming_supported?
  self.class.streaming_supported_for_key?(key)
end
#supports_batch_inference? ⇒ Boolean
Instance-level shortcut for the class-level predicate so callers can use the idiomatic Raif.llm(:some_key).supports_batch_inference? form instead of reaching through to the class.
# File 'app/models/raif/llm.rb', line 233

def supports_batch_inference?
  self.class.supports_batch_inference?
end
#supports_faithful_required_tool_choice?(available_model_tools) ⇒ Boolean
Whether the provider can faithfully enforce tool_choice: :required for the given tool set. Override in subclasses when a provider can only enforce required tool use for some tool types.
# File 'app/models/raif/llm.rb', line 266

def supports_faithful_required_tool_choice?(available_model_tools)
  available_model_tools.present?
end
#supports_provider_managed_tool?(tool_klass) ⇒ Boolean
# File 'app/models/raif/llm.rb', line 244

def supports_provider_managed_tool?(tool_klass)
  supported_provider_managed_tools&.include?(tool_klass.to_s)
end
#validate_provider_managed_tool_support!(tool) ⇒ Object
# File 'app/models/raif/llm.rb', line 270

def validate_provider_managed_tool_support!(tool)
  unless supports_provider_managed_tool?(tool)
    raise Raif::Errors::UnsupportedFeatureError,
      "Invalid provider-managed tool: #{tool.name} for #{key}"
  end
end
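The two provider-managed tool methods work together with the constructor, which stores the supported tools as strings. The sketch below reproduces that interaction with stand-in classes. Every name beginning with Example is hypothetical, and the error class substitutes for Raif::Errors::UnsupportedFeatureError.

```ruby
# Stand-in for Raif::Errors::UnsupportedFeatureError.
class ExampleUnsupportedFeatureError < StandardError; end

# Hypothetical tool classes, for illustration only.
class ExampleWebSearchTool; end
class ExampleCodeExecutionTool; end

class ExampleToolAwareLlm
  attr_reader :key, :supported_provider_managed_tools

  def initialize(key:, supported_provider_managed_tools: [])
    @key = key
    # Entries are normalized to strings, so a class and its name are interchangeable.
    @supported_provider_managed_tools = supported_provider_managed_tools.map(&:to_s)
  end

  def supports_provider_managed_tool?(tool_klass)
    supported_provider_managed_tools.include?(tool_klass.to_s)
  end

  def validate_provider_managed_tool_support!(tool)
    return if supports_provider_managed_tool?(tool)

    raise ExampleUnsupportedFeatureError, "Invalid provider-managed tool: #{tool.name} for #{key}"
  end
end

EXAMPLE_TOOL_LLM = ExampleToolAwareLlm.new(
  key: :example_model,
  supported_provider_managed_tools: [ExampleWebSearchTool]
)

EXAMPLE_TOOL_LLM.supports_provider_managed_tool?(ExampleWebSearchTool)     # => true
EXAMPLE_TOOL_LLM.supports_provider_managed_tool?("ExampleWebSearchTool")   # => true
EXAMPLE_TOOL_LLM.supports_provider_managed_tool?(ExampleCodeExecutionTool) # => false
```

Calling validate_provider_managed_tool_support! with ExampleCodeExecutionTool raises the stand-in error. The message interpolates tool.name, so the validating method expects a class (or another object that responds to name) rather than a plain string.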