Deterministic Metrics
Deterministic metrics are traditional, code-based evaluation functions. They execute locally, consume zero LLM tokens, and will always produce the exact same score for the exact same inputs.
These metrics are ideal for initial structural checks, exact-matching scenarios, or bulk testing where LLM usage would be cost-prohibitive.
Available Deterministic Metrics
cosine-similarity
- Description: Measures the semantic similarity between the model's output and your dataset's reference answer using sentence-transformer embeddings.
- Score Range: 0.0 (completely unrelated) to 1.0 (identical meaning).
- Inputs Required: `output`, `reference`
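The production metric computes cosine similarity over sentence-transformer embeddings; the embedding model itself is out of scope here, but the similarity math reduces to a dot product over normalized vectors. A minimal, dependency-free sketch (function name and plain-list vectors are illustrative, not the library's actual API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors.

    Returns a value in [-1.0, 1.0]; for the non-negative embeddings
    typical of sentence transformers, this lands in [0.0, 1.0].
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector carries no direction; score it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)

# Identical vectors score ~1.0; orthogonal vectors score 0.0.
same = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the score is computed on embeddings rather than raw tokens, paraphrases of the reference answer can still score highly, which is the main reason to prefer this metric over lexical overlap.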
rouge-l
- Description: Computes the Longest Common Subsequence (LCS) based F1 score. This measures lexical overlap (word-for-word matching) rather than semantic meaning.
- Score Range: 0.0 (no overlap) to 1.0 (identical tokens).
- Inputs Required: `output`, `reference`
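ROUGE-L combines LCS-based precision (LCS length over output length) and recall (LCS length over reference length) into an F1 score. A self-contained sketch using whitespace tokenization and a standard dynamic-programming LCS (the real implementation's tokenizer may differ):

```python
def rouge_l_f1(output: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens via longest common subsequence."""
    out_toks, ref_toks = output.split(), reference.split()
    m, n = len(out_toks), len(ref_toks)
    if m == 0 or n == 0:
        return 0.0
    # dp[i][j] = LCS length of the first i output tokens and j reference tokens.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if out_toks[i - 1] == ref_toks[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / m, lcs / n
    return 2 * precision * recall / (precision + recall)
```

Note that because it rewards word-for-word overlap, a fluent paraphrase of the reference can score near 0.0 here while scoring near 1.0 on cosine similarity; the two metrics are complementary.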
json-validity
- Description: Checks if the model's output is valid, parsable JSON. It does not compare the output against a reference.
- Score Range: 1.0 (valid JSON) or 0.0 (invalid JSON).
- Inputs Required: `output` only.
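A validity check like this reduces to attempting a parse and mapping the result to a binary score. A minimal sketch with Python's standard `json` module (the function name is illustrative):

```python
import json

def json_validity(output: str) -> float:
    """Return 1.0 if output is parsable JSON, else 0.0.

    No reference answer is consulted; only the output itself is checked.
    """
    try:
        json.loads(output)
        return 1.0
    except (json.JSONDecodeError, TypeError):
        return 0.0

# Note: bare scalars like "42" or "true" are valid JSON documents,
# so they also score 1.0 under this check.
```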
invocation-success
- Description: A built-in operational metric that tracks whether the HTTP call to your Target succeeded (e.g., did not time out or return a 500 status code).
- Score Range: 1.0 (success) or 0.0 (failure).