System Architecture
The Execution Engine
The CertOps Engine follows a strict Attack-Check-Decide loop for every component in the system under test.
Phase 1: Ingestion
- The Engine loads your certops.yaml manifest.
- It resolves the specific Dataset for each component (Target) defined in the manifest.
Phase 2: Attack (Multi-Target Inference)
- The Engine iterates through the list of Targets.
- Parallel Execution: Independent components run concurrently.
- Sequential Execution: Components with dependencies run in order (e.g., "Certify Embeddings Retriever" before "Certify Chat Generator").
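The mix of parallel and sequential execution described above can be sketched with a dependency graph. This is only an illustration, not the Engine's actual scheduler: the target names, the `DEPENDENCIES` mapping, and the `run_target` placeholder are all hypothetical. It uses only the Python standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical targets, each mapped to the set of targets it depends on.
DEPENDENCIES = {
    "embeddings_retriever": set(),
    "intent_classifier": set(),
    "chat_generator": {"embeddings_retriever"},
}


def run_target(name):
    # Placeholder (assumption) for the real inference call against a Target.
    return f"{name}: done"


def run_all(dependencies, runner=run_target):
    """Run targets in dependency order, executing independent ones concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches, results = [], {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            # Every target whose dependencies have all completed.
            ready = sorted(sorter.get_ready())
            batches.append(ready)
            for name, result in zip(ready, pool.map(runner, ready)):
                results[name] = result
                sorter.done(name)  # unblocks targets that depend on this one
    return batches, results


batches, results = run_all(DEPENDENCIES)
```

In this sketch each batch finishes before the next one starts, which is simpler than releasing a dependent target the moment its own prerequisites complete. A real scheduler may well be more fine-grained.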
Phase 3: Grading (Judgement)
For each Target, responses are evaluated against predefined metrics using the 3-Tier Quality Matrix:
- Deterministic Local Gates: "Did the response contain a 10-digit phone number?" "Are the JSON keys correct?"
- Pointwise Evaluation (LLM-as-Judge): "Did the agent answer politely without hallucinating facts according to the ground truth dataset?"
- Pairwise Regression (Drift): "Is this new uncommitted answer better, worse, or equal to the Production baseline?"
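The first tier needs no model call, so it is the easiest to illustrate. Below is a minimal sketch of the two deterministic checks quoted above. The function names and the accepted phone-number formats are assumptions made for the example, not part of CertOps.

```python
import json
import re

# Assumption: a "10-digit phone number" may be written with or without
# '-', '.' or space separators, e.g. 5551234567 or 555-123-4567.
PHONE_PATTERN = re.compile(r"(?<!\d)\d{3}[-. ]?\d{3}[-. ]?\d{4}(?!\d)")


def has_ten_digit_phone(text):
    """Gate: the response contains a 10-digit phone number."""
    return PHONE_PATTERN.search(text) is not None


def has_expected_keys(raw, expected_keys):
    """Gate: the response is a JSON object with exactly the expected keys."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and set(payload) == set(expected_keys)
```

Checks like these are cheap and repeatable, which is presumably why they sit in the first tier, ahead of the LLM-based evaluations.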
Phase 4: Decision (Verdict)
- Component Verdict: Each individual component gets a Pass/Fail based on its configured Quality Gates.
- Suite Verdict: The overall System is marked CERTIFIED only if ALL Component Verdicts are strictly PASSED. Any blocking failure results in a REJECTED suite.
- Drift Check: If a baseline exists, regression is calculated per-target to detect drops in performance over time.
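The decision rules above reduce to a small aggregation step, sketched here under stated assumptions. The function names and the numeric per-target scores are hypothetical. Treating an empty suite as REJECTED is also an assumption, as is measuring drift as a plain score difference.

```python
def suite_verdict(component_verdicts):
    """Return CERTIFIED only if every component verdict is exactly PASSED."""
    # Assumption: a suite with no components is not certifiable.
    if not component_verdicts:
        return "REJECTED"
    all_passed = all(v == "PASSED" for v in component_verdicts.values())
    return "CERTIFIED" if all_passed else "REJECTED"


def drift(current_scores, baseline_scores):
    """Per-target score change against the baseline; negative means regression."""
    # Targets without a baseline entry are skipped: nothing to compare against.
    return {
        target: round(score - baseline_scores[target], 4)
        for target, score in current_scores.items()
        if target in baseline_scores
    }


verdict = suite_verdict({"retriever": "PASSED", "generator": "FAILED"})
changes = drift({"retriever": 0.75, "generator": 0.5}, {"retriever": 0.5})
```

Here `verdict` is REJECTED because one component failed, and `changes` covers only the retriever, since the generator has no baseline entry.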
Data Flow Diagram
The following diagram illustrates how CertOps integrates into a typical pipeline and orchestrates tests.