Class: Raif::Evals::LlmJudges::Summarization

Inherits:
Raif::Evals::LlmJudge
Defined in:
lib/raif/evals/llm_judges/summarization.rb
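
Overview

Summarization is an LLM judge that scores a summary against its source content on four criteria (coverage, accuracy, clarity, and conciseness) plus an overall rating, each on a 1-5 scale. A minimal usage sketch follows; the keyword arguments mirror the values interpolated into #build_prompt (original_content, summary), but the exact Task.run signature, and the premise that run returns the populated task instance, are assumptions here:

judge = Raif::Evals::LlmJudges::Summarization.run(
  original_content: article_text,  # hypothetical local variable
  summary: candidate_summary       # hypothetical local variable
)

puts judge.overall_score if judge.completed?  # => e.g. 4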

Constant Summary

Constants included from Concerns::LlmResponseParsing

Concerns::LlmResponseParsing::ASCII_CONTROL_CHARS

Instance Attribute Summary

Attributes inherited from Task

#files, #images

Instance Method Summary

Methods inherited from Raif::Evals::LlmJudge

#default_llm_model_key, #judgment_confidence, #judgment_reasoning, #low_confidence?

Methods inherited from Task

json_response_schema, prompt, #re_run, run, #run, #status, system_prompt

Methods included from Concerns::LlmResponseParsing

#parse_html_response, #parse_json_response, #parsed_response

Methods included from Concerns::HasAvailableModelTools

#available_model_tools_map

Methods included from Concerns::HasRequestedLanguage

#requested_language_name, #system_prompt_language_preference

Methods included from Concerns::HasLlm

#default_llm_model_key, #llm

Methods inherited from ApplicationRecord

table_name_prefix

Instance Method Details

#accuracy_justification ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 133

def accuracy_justification
  parsed_response["accuracy"]["justification"] if completed?
end

#accuracy_score ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 129

def accuracy_score
  parsed_response["accuracy"]["score"] if completed?
end

#build_prompt ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 71

def build_prompt
  <<~PROMPT.strip
    # Instructions
    Below is an original piece of content and its summary. Evaluate the summary against the original content based on our 4 criteria. For each, you should provide:
    - A brief justification (1-3 sentences) noting any relevant observations (e.g. what was missing, incorrect, unclear, or well-done).
    - A score from 1 to 5 (5 = excellent, 1 = very poor).

    Finally, provide an **overall evaluation** of the summary, consisting of a brief justification (1-3 sentences) and a score from 1 to 5 (5 = excellent, 1 = very poor).

    # Output Format
    Format your output as a JSON object with the following keys:
    {
      "coverage": {
        "justification": "...",
        "score": 1-5
      },
      "accuracy": {
        "justification": "...",
        "score": 1-5
      },
      "clarity": {
        "justification": "...",
        "score": 1-5
      },
      "conciseness": {
        "justification": "...",
        "score": 1-5
      },
      "overall": {
        "justification": "...",
        "score": 1-5
      }
    }
    #{additional_context_prompt}
    # Original Article/Document
    #{original_content}

    # Summary to Evaluate
    #{summary}
  PROMPT
end
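
Because the prompt requests a JSON object keyed by criterion, #parsed_response (from Concerns::LlmResponseParsing) yields a nested hash once the task completes. An illustrative shape, with made-up justifications and scores:

judge.parsed_response
# => {
#      "coverage"    => { "justification" => "All key points are present.",       "score" => 5 },
#      "accuracy"    => { "justification" => "One claim is slightly overstated.", "score" => 4 },
#      "clarity"     => { "justification" => "Reads smoothly and logically.",     "score" => 5 },
#      "conciseness" => { "justification" => "Minor redundancy in the opening.",  "score" => 4 },
#      "overall"     => { "justification" => "Strong, faithful summary.",         "score" => 4 }
#    }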

#build_system_prompt ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 37

def build_system_prompt
  <<~PROMPT.strip
    You are an impartial expert judge of summary quality. You'll be provided an original piece of content and its summary. Your job is to evaluate the summary against the original content based on the following criteria, and assign a score from 1 to 5 for each (5 = excellent, 1 = very poor):

    **Coverage (Relevance & Completeness):** Does the summary capture all the important points of the original content?
    - 5 = Excellent Coverage - Nearly all key points and essential details from the content are present in the summary, with no major omissions.
    - 4 = Good Coverage - Most important points are included, but a minor detail or two might be missing.
    - 3 = Fair Coverage - Some main points appear, but the summary misses or glosses over other important information.
    - 2 = Poor Coverage - Many critical points from the content are missing; the summary is incomplete.
    - 1 = Very Poor - The summary fails to include most of the content's main points (highly incomplete).

    **Accuracy (Faithfulness to the Source):** Is the summary factually correct and free of hallucinations or misrepresentations of the content?
    - 5 = Fully Accurate - All statements in the summary are correct and directly supported by the content. No errors or invented information.
    - 4 = Mostly Accurate - The summary is generally accurate with perhaps one minor error or slight ambiguity, but no significant falsehoods.
    - 3 = Some Inaccuracies - Contains a few errors or unsupported claims from the content, but overall captures the gist correctly.
    - 2 = Mostly Inaccurate - Multiple statements in the summary are incorrect or not supported by the content.
    - 1 = Completely Inaccurate - The summary seriously distorts or contradicts the content; many claims are false or not in the source.

    **Clarity and Coherence:** Is the summary well-written and easy to understand? (Consider organization, flow, and whether it would make sense to a reader.)
    - 5 = Very Clear & Coherent - The summary is logically organized, flows well, and would be easily understood by the target reader. No confusion or ambiguity.
    - 4 = Mostly Clear - Readable and mostly well-structured, though a sentence or transition could be smoother.
    - 3 = Somewhat Clear - The summary makes sense overall but might be disjointed or awkward in places, requiring effort to follow.
    - 2 = Generally Unclear - Lacks coherence or has poor phrasing that makes it hard to follow the ideas.
    - 1 = Very Poor Clarity - The summary is very confusing or poorly structured, making it hard to understand.

    **Conciseness:** Is the summary succinct while still informative? (It should omit unnecessary detail but not at the expense of coverage.)
    - 5 = Highly Concise - The summary is brief yet covers all important information (no fluff or redundancy).
    - 4 = Concise - Generally to-the-point, with only minor redundancy or superfluous content.
    - 3 = Moderately Concise - Some excess detail or repetition that could be trimmed, but not egregious.
    - 2 = Verbose - Contains a lot of unnecessary detail or repeats points, making it longer than needed.
    - 1 = Excessively Verbose - The summary is overly long or wordy, with much content that doesn't add value.
  PROMPT
end

#clarity_justification ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 141

def clarity_justification
  parsed_response["clarity"]["justification"] if completed?
end

#clarity_score ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 137

def clarity_score
  parsed_response["clarity"]["score"] if completed?
end

#conciseness_justification ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 149

def conciseness_justification
  parsed_response["conciseness"]["justification"] if completed?
end

#conciseness_score ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 145

def conciseness_score
  parsed_response["conciseness"]["score"] if completed?
end

#coverage_justification ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 125

def coverage_justification
  parsed_response["coverage"]["justification"] if completed?
end

#coverage_score ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 121

def coverage_score
  parsed_response["coverage"]["score"] if completed?
end

#overall_justification ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 117

def overall_justification
  parsed_response["overall"]["justification"] if completed?
end

#overall_score ⇒ Object



# File 'lib/raif/evals/llm_judges/summarization.rb', line 113

def overall_score
  parsed_response["overall"]["score"] if completed?
end
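
Taken together, these accessors expose per-criterion results once the run has completed. A sketch that iterates over them and consults the confidence helpers inherited from Raif::Evals::LlmJudge (judge is assumed to be a finished instance):

if judge.completed?
  %w[coverage accuracy clarity conciseness overall].each do |criterion|
    score = judge.public_send("#{criterion}_score")
    why   = judge.public_send("#{criterion}_justification")
    puts "#{criterion}: #{score}/5 (#{why})"
  end

  # low_confidence? is inherited from Raif::Evals::LlmJudge
  warn "Judge reported low confidence in its verdict" if judge.low_confidence?
end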