Deterministic Metrics

Deterministic metrics are traditional, code-based evaluation functions. They execute locally, consume zero LLM tokens, and always produce the same score for the same inputs.

These metrics are ideal for initial structural checks, exact-matching scenarios, or bulk testing where LLM usage would be cost-prohibitive.

Available Deterministic Metrics

cosine-similarity

  • Description: Measures the semantic similarity between the model's output and your dataset's reference answer using sentence-transformer embeddings.
  • Score Range: 0.0 (Completely unrelated) to 1.0 (Identical meaning).
  • Inputs Required: output, reference
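The underlying computation is standard cosine similarity between two embedding vectors. The sketch below illustrates the math with a toy bag-of-words `embed()` stand-in; the actual metric embeds text with a sentence-transformer model, which this example does not reproduce.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def embed(text):
    # Toy bag-of-words "embedding" for illustration only; the real metric
    # uses dense sentence-transformer vectors, not word counts.
    return dict(Counter(text.lower().split()))


# Identical texts score ~1.0; texts sharing no words score 0.0.
same = cosine_similarity(embed("the cat sat"), embed("the cat sat"))
different = cosine_similarity(embed("quantum physics"), embed("banana bread"))
```

With real sentence embeddings, paraphrases (e.g., "the cat sat" vs. "a feline was seated") also score high, which word counts cannot capture.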

rouge-l

  • Description: Computes the Longest Common Subsequence (LCS) based F1 score. This measures lexical overlap (word-for-word matching) rather than semantic meaning.
  • Score Range: 0.0 (No overlap) to 1.0 (Identical tokens).
  • Inputs Required: output, reference
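The LCS-based F1 computation can be sketched directly. This is a minimal reference implementation of the standard ROUGE-L formula (precision and recall over the longest common subsequence of tokens); the production metric may tokenize or normalize differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(output, reference):
    """ROUGE-L F1: harmonic mean of LCS precision and recall over whitespace tokens."""
    out_tokens, ref_tokens = output.split(), reference.split()
    lcs = lcs_length(out_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(out_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Note how this differs from cosine similarity: "the cat sat" vs. "the cat sat on the mat" scores below 1.0 purely because the token sequences differ, regardless of meaning.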

json-validity

  • Description: Checks if the model's output is valid, parsable JSON. It does not compare the output against a reference.
  • Score Range: 1.0 (Valid JSON) or 0.0 (Invalid JSON).
  • Inputs Required: output
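Because no reference is involved, this check is just a parse attempt. A minimal sketch using Python's standard-library parser:

```python
import json


def json_validity(output):
    """Return 1.0 if the output parses as JSON, else 0.0. No reference needed."""
    try:
        json.loads(output)
        return 1.0
    except (json.JSONDecodeError, TypeError):
        return 0.0
```

Note that any valid JSON value passes, including bare scalars like `"42"`; if your use case requires a JSON object specifically, an extra `isinstance` check on the parsed value would be needed.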

invocation-success

  • Description: A built-in operational metric that tracks whether the HTTP call to your Target succeeded (e.g., didn't timeout or return a 500 status code).
  • Score Range: 1.0 (Success) or 0.0 (Failure).
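The scoring rule can be sketched as a mapping from the HTTP outcome to a binary score. The function below is illustrative, not the framework's internal implementation; it assumes success means a completed call with a 2xx status, per the description above.

```python
from typing import Optional


def invocation_success(status_code: Optional[int], timed_out: bool = False) -> float:
    """1.0 if the Target call completed with a 2xx status; 0.0 on timeout,
    connection failure (no status), or an error status such as 500."""
    if timed_out or status_code is None:
        return 0.0
    return 1.0 if 200 <= status_code < 300 else 0.0
```

Because this metric is operational rather than quality-based, a 1.0 here only means the Target responded; the other metrics above judge what it responded with.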