System Architecture

The Execution Engine

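The CertOps Engine runs a strict Attack-Check-Decide loop for every component in the system under test. After an initial ingestion step, the loop maps onto the Attack, Grading, and Decision phases described here.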

Phase 1: Ingestion

  • The Engine loads your certops.yaml manifest.
  • It resolves the specific Dataset for each component (Target) defined in the manifest.

Phase 2: Attack (Multi-Target Inference)

  • The Engine iterates through the list of Targets.
  • Parallel Execution: Independent components run concurrently.
  • Sequential Execution: Components with dependencies run in order (e.g., "Certify Embeddings Retriever" before "Certify Chat Generator").
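
The scheduling rule above (independent Targets in parallel, dependent Targets in order) can be sketched with the Python standard library. This is a minimal illustration, not the Engine's actual implementation: run_targets, the dependencies mapping, and the run_one callable are hypothetical names introduced here.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_targets(dependencies, run_one, max_workers=4):
    """Run every target once, only after all of its dependencies finished.

    dependencies: hypothetical mapping of target name -> set of target
                  names it depends on.
    run_one:      hypothetical callable that runs one target and returns
                  its result.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    results = {}
    in_flight = {}  # future -> target name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Everything returned here has no unfinished dependency,
            # so these targets can safely run concurrently.
            for name in sorter.get_ready():
                in_flight[pool.submit(run_one, name)] = name

            # Block until at least one running target completes.
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                name = in_flight.pop(future)
                results[name] = future.result()
                sorter.done(name)  # may unblock dependent targets
    return results
```

With a mapping such as {"chat_generator": {"embeddings_retriever"}, "embeddings_retriever": set()}, the retriever always completes before the generator starts, while unrelated Targets are free to overlap.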

Phase 3: Grading (Judgement)

For each Target, responses are evaluated against predefined metrics using the 3-Tier Quality Matrix:

  1. Deterministic Local Gates: "Did the response contain a 10-digit phone number?" "Are the JSON keys correct?"
  2. Pointwise Evaluation (LLM-as-Judge): "Did the agent answer politely and without hallucinating facts, judged against the ground-truth dataset?"
  3. Pairwise Regression (Drift): "Is this new, uncommitted answer better than, worse than, or equal to the Production baseline?"
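
Tier 1 checks are plain code with no model call, which is what makes them deterministic. As a hedged sketch of the two example questions above, the following functions are hypothetical helpers (not part of the CertOps API); they assume a phone number means exactly ten consecutive digits and that "correct keys" means an exact match on the top-level keys of a JSON object.

```python
import json
import re

# Exactly ten consecutive digits, not embedded in a longer digit run.
# Assumption: no separators such as dashes or spaces are allowed.
PHONE_RE = re.compile(r"(?<!\d)\d{10}(?!\d)")


def contains_phone_number(text):
    """Return True if the response contains a 10-digit phone number."""
    return PHONE_RE.search(text) is not None


def has_expected_json_keys(text, expected_keys):
    """Return True if the response is a JSON object with exactly these keys."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return False  # not valid JSON at all
    return isinstance(payload, dict) and set(payload) == set(expected_keys)
```

Because gates like these are cheap and repeatable, it is reasonable to run them before the LLM-as-Judge and pairwise tiers, though the ordering is a design choice rather than something the text above mandates.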

Phase 4: Decision (Verdict)

  • Component Verdict: Each individual component gets a Pass/Fail based on its configured Quality Gates.
  • Suite Verdict: The overall system is marked CERTIFIED only if every Component Verdict is PASSED. Any blocking failure marks the suite REJECTED.
  • Drift Check: If a baseline exists, regression is calculated per-target to detect drops in performance over time.
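
The decision rules above reduce to a small amount of logic. The sketch below is illustrative only: ComponentResult, its fields, and the single numeric score are assumptions made for this example, and treating an empty suite as REJECTED is likewise an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComponentResult:
    """Hypothetical per-target result; field names are illustrative."""
    name: str
    passed: bool                            # outcome of the Quality Gates
    score: float                            # current aggregate metric
    baseline_score: Optional[float] = None  # None when no baseline exists


def suite_verdict(results):
    """CERTIFIED only if every component passed, otherwise REJECTED.

    Assumption: a suite with no components is treated as REJECTED.
    """
    if results and all(r.passed for r in results):
        return "CERTIFIED"
    return "REJECTED"


def drift(result):
    """Per-target change versus the baseline; negative means a regression."""
    if result.baseline_score is None:
        return None  # no baseline, so drift cannot be computed
    return result.score - result.baseline_score
```

Computing drift per target, rather than once for the whole suite, keeps a regression in one component from being masked by an improvement in another.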

Data Flow Diagram

The following diagram illustrates how CertOps integrates into a typical pipeline and orchestrates tests.