Class: Raif::Evals::LlmJudges::Summarization

Inherits:
Raif::Evals::LlmJudge
Defined in:
app/models/raif/evals/llm_judges/summarization.rb

Constant Summary

Constants included from Concerns::LlmResponseParsing

Concerns::LlmResponseParsing::ASCII_CONTROL_CHARS

Instance Attribute Summary

Attributes inherited from Task

#files, #images

Instance Method Summary

Methods inherited from Raif::Evals::LlmJudge

#default_llm_model_key, #judgment_confidence, #judgment_reasoning, #low_confidence?

Methods inherited from Task

json_response_schema, #json_response_schema, #messages, prompt, #prompt_studio_task_attributes, #re_run, run, #run, #status, system_prompt

Methods included from Concerns::JsonSchemaDefinition

#schema_for_instance

Methods included from Concerns::LlmResponseParsing

#parse_html_response, #parse_json_response, #parsed_response

Methods included from Concerns::HasRuntimeDuration

#runtime_duration, #runtime_duration_seconds, #runtime_ended_at

Methods included from Concerns::HasAvailableModelTools

#available_model_tools_map

Methods included from Concerns::HasRequestedLanguage

#requested_language_name, #system_prompt_language_preference

Methods included from Concerns::HasLlm

#default_llm_model_key, #llm

Methods inherited from ApplicationRecord

table_name_prefix, where_json_not_blank

Instance Method Details

#accuracy_justification ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 171

def accuracy_justification
  parsed_response["accuracy"]["justification"] if completed?
end

#accuracy_score ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 167

def accuracy_score
  parsed_response["accuracy"]["score"] if completed?
end

#build_prompt ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 109

def build_prompt
  <<~PROMPT.strip
    # Instructions
    Below is an original piece of content and its summary. Evaluate the summary against the original content based on our 4 criteria. For each, you should provide:
    - A brief justification (1-3 sentences) noting any relevant observations (e.g. what was missing, incorrect, unclear, or well-done).
    - A score from 1 to 5 (5 = excellent, 1 = very poor).

    Finally, provide an **overall evaluation** of the summary, consisting of a brief justification (1-3 sentences) and a score from 1 to 5 (5 = excellent, 1 = very poor).

    # Output Format
    Format your output as a JSON object with the following keys:
    {
      "coverage": {
        "justification": "...",
        "score": 1-5
      },
      "accuracy": {
        "justification": "...",
        "score": 1-5
      },
      "clarity": {
        "justification": "...",
        "score": 1-5
      },
      "conciseness": {
        "justification": "...",
        "score": 1-5
      },
      "overall": {
        "justification": "...",
        "score": 1-5
      }
    }
    #{additional_context_prompt}
    # Original Article/Document
    #{original_content}

    # Summary to Evaluate
    #{summary}
  PROMPT
end
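
The prompt above asks the model to reply with a JSON object keyed by criterion. As a minimal, self-contained sketch (plain Ruby with the stdlib `json` gem, not the Raif implementation), here is how a response in that format parses into the per-criterion lookups the accessor methods below rely on; the sample justifications and scores are invented for illustration:

```ruby
require "json"

# Example model response in the format requested by build_prompt.
raw_response = <<~JSON
  {
    "coverage":    { "justification": "All key points present.", "score": 5 },
    "accuracy":    { "justification": "One minor ambiguity.",    "score": 4 },
    "clarity":     { "justification": "Flows well.",             "score": 5 },
    "conciseness": { "justification": "Slight redundancy.",      "score": 4 },
    "overall":     { "justification": "Strong summary overall.", "score": 4 }
  }
JSON

parsed = JSON.parse(raw_response)
parsed["accuracy"]["score"]        # => 4
parsed["overall"]["justification"] # => "Strong summary overall."
```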

#build_system_prompt ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 75

def build_system_prompt
  <<~PROMPT.strip
    You are an impartial expert judge of summary quality. You'll be provided an original piece of content and its summary. Your job is to evaluate the summary against the original content based on the following criteria, and assign a score from 1 to 5 for each (5 = excellent, 1 = very poor):

    **Coverage (Relevance & Completeness):** Does the summary capture all the important points of the original content?
    - 5 = Excellent Coverage - Nearly all key points and essential details from the content are present in the summary, with no major omissions.
    - 4 = Good Coverage - Most important points are included, but a minor detail or two might be missing.
    - 3 = Fair Coverage - Some main points appear, but the summary misses or glosses over other important information.
    - 2 = Poor Coverage - Many critical points from the content are missing; the summary is incomplete.
    - 1 = Very Poor - The summary fails to include most of the content's main points (highly incomplete).

    **Accuracy (Faithfulness to the Source):** Is the summary factually correct and free of hallucinations or misrepresentations of the content?
    - 5 = Fully Accurate - All statements in the summary are correct and directly supported by the content. No errors or invented information.
    - 4 = Mostly Accurate - The summary is generally accurate with perhaps one minor error or slight ambiguity, but no significant falsehoods.
    - 3 = Some Inaccuracies - Contains a few errors or unsupported claims from the content, but overall captures the gist correctly.
    - 2 = Mostly Inaccurate - Multiple statements in the summary are incorrect or not supported by the content.
    - 1 = Completely Inaccurate - The summary seriously distorts or contradicts the content; many claims are false or not in the source.

    **Clarity and Coherence:** Is the summary well-written and easy to understand? (Consider organization, flow, and whether it would make sense to a reader.)
    - 5 = Very Clear & Coherent - The summary is logically organized, flows well, and would be easily understood by the target reader. No confusion or ambiguity.
    - 4 = Mostly Clear - Readable and mostly well-structured, though a sentence or transition could be smoother.
    - 3 = Somewhat Clear - The summary makes sense overall but might be disjointed or awkward in places, requiring effort to follow.
    - 2 = Generally Unclear - Lacks coherence or has poor phrasing that makes it hard to follow the ideas.
    - 1 = Very Poor Clarity - The summary is very confusing or poorly structured, making it hard to understand.

    **Conciseness:** Is the summary succinct while still informative? (It should omit unnecessary detail but not at the expense of coverage.)
    - 5 = Highly Concise - The summary is brief yet covers all important information (no fluff or redundancy).
    - 4 = Concise - Generally to-the-point, with only minor redundancy or superfluous content.
    - 3 = Moderately Concise - Some excess detail or repetition that could be trimmed, but not egregious.
    - 2 = Verbose - Contains a lot of unnecessary detail or repeats points, making it longer than needed.
    - 1 = Excessively Verbose - The summary is overly long or wordy, with much content that doesn't add value.
  PROMPT
end

#clarity_justification ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 179

def clarity_justification
  parsed_response["clarity"]["justification"] if completed?
end

#clarity_score ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 175

def clarity_score
  parsed_response["clarity"]["score"] if completed?
end

#conciseness_justification ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 187

def conciseness_justification
  parsed_response["conciseness"]["justification"] if completed?
end

#conciseness_score ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 183

def conciseness_score
  parsed_response["conciseness"]["score"] if completed?
end

#coverage_justification ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 163

def coverage_justification
  parsed_response["coverage"]["justification"] if completed?
end

#coverage_score ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 159

def coverage_score
  parsed_response["coverage"]["score"] if completed?
end

#overall_justification ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 155

def overall_justification
  parsed_response["overall"]["justification"] if completed?
end

#overall_score ⇒ Object

# File 'app/models/raif/evals/llm_judges/summarization.rb', line 151

def overall_score
  parsed_response["overall"]["score"] if completed?
end
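
All of the accessors above follow one pattern: guard on `completed?`, then dig into `parsed_response` by criterion and field. A hypothetical stand-in class (illustration only, not the Raif implementation) makes the pattern runnable in isolation:

```ruby
require "json"

# Stand-in for illustration only: mirrors the guard-then-dig accessor
# pattern used by the Summarization judge's score/justification methods.
class JudgeResultSketch
  def initialize(raw_json, completed:)
    @parsed_response = JSON.parse(raw_json)
    @completed = completed
  end

  def completed?
    @completed
  end

  # Returns nil until the run has completed, like the real accessors.
  def overall_score
    @parsed_response["overall"]["score"] if completed?
  end
end

done    = JudgeResultSketch.new('{"overall":{"score":4,"justification":"ok"}}', completed: true)
pending = JudgeResultSketch.new('{"overall":{"score":4,"justification":"ok"}}', completed: false)
done.overall_score    # => 4
pending.overall_score # => nil
```

Returning `nil` (rather than raising) for an incomplete run lets callers safely chain these accessors before checking status, at the cost of having to distinguish "not finished" from a genuinely missing key.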