Deterministic Metrics
Deterministic metrics are traditional, code-based evaluation functions. They execute locally, consume zero LLM tokens, and will always produce the exact same score for the exact same inputs.
These metrics are ideal for initial structural checks, exact-matching scenarios, or bulk testing where LLM usage would be cost-prohibitive.
Available Deterministic Metrics
cosine-similarity
- Description: Measures the semantic similarity between the model's output and your dataset's reference answer using sentence-transformer embeddings.
- Score Range: 0.0 (completely unrelated) to 1.0 (identical meaning).
- Inputs Required: `output`, `reference`
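The production metric computes cosine similarity over sentence-transformer embeddings; the embedding model itself is out of scope here, but the similarity math reduces to a dot product over normalized vectors. A minimal, dependency-free sketch (function name and plain-list vectors are illustrative, not the library's actual API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors.

    Returns a value in [-1.0, 1.0]; for the non-negative embeddings
    typical of sentence transformers, this lands in [0.0, 1.0].
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector carries no direction; score it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)

# Identical vectors score ~1.0; orthogonal vectors score 0.0.
same = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the score is computed on embeddings rather than raw tokens, paraphrases of the reference answer can still score highly, which is the main reason to prefer this metric over lexical overlap.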
rouge-l
- Description: Computes the Longest Common Subsequence (LCS) based F1 score. This measures lexical overlap (word-for-word matching) rather than semantic meaning.
- Score Range: 0.0 (no overlap) to 1.0 (identical tokens).
- Inputs Required: `output`, `reference`
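ROUGE-L combines LCS-based precision (LCS length over output length) and recall (LCS length over reference length) into an F1 score. A self-contained sketch using whitespace tokenization and a standard dynamic-programming LCS (the real implementation's tokenizer may differ):

```python
def rouge_l_f1(output: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens via longest common subsequence."""
    out_toks, ref_toks = output.split(), reference.split()
    m, n = len(out_toks), len(ref_toks)
    if m == 0 or n == 0:
        return 0.0
    # dp[i][j] = LCS length of the first i output tokens and j reference tokens.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if out_toks[i - 1] == ref_toks[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / m, lcs / n
    return 2 * precision * recall / (precision + recall)
```

Note that because it rewards word-for-word overlap, a fluent paraphrase of the reference can score near 0.0 here while scoring near 1.0 on cosine similarity; the two metrics are complementary.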
json-validity
- Description: Checks if the model's output is valid, parsable JSON. It does not compare the output against a reference.
- Score Range: 1.0 (valid JSON) or 0.0 (invalid JSON).
- Inputs Required: `output` only.
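A validity check like this reduces to attempting a parse and mapping the result to a binary score. A minimal sketch with Python's standard `json` module (the function name is illustrative):

```python
import json

def json_validity(output: str) -> float:
    """Return 1.0 if output is parsable JSON, else 0.0.

    No reference answer is consulted; only the output itself is checked.
    """
    try:
        json.loads(output)
        return 1.0
    except (json.JSONDecodeError, TypeError):
        return 0.0

# Note: bare scalars like "42" or "true" are valid JSON documents,
# so they also score 1.0 under this check.
```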
invocation-success
- Description: A built-in operational metric that tracks whether the HTTP call to your Target succeeded (e.g., did not time out or return a 500 status code).
- Score Range: 1.0 (success) or 0.0 (failure).