Rule Testing Framework

The rule testing framework validates that detection rules behave correctly against a set of known fixture files. It provides a manifest-driven approach to declaring expected outcomes and automatically checks for regressions when rules change.

Overview

The framework has three core components:

  1. Manifest (rule_tests.yml) -- a YAML file declaring test cases: which fixture to scan, what verdict to expect, and which rules should (or should not) fire.
  2. Runner (RuleTestRunner) -- scans each fixture using the rule_engine and threat_intel layers (no LLM, no URL crawling) and compares results to expectations.
  3. Report (RuleTestReport) -- aggregates results and exports them as plain text, JSON, or JUnit XML for CI integration.

Manifest Format

The manifest is a YAML file with the following structure:

version: "1"
description: "Rule testing manifest"
base_dir: "."   # Resolved relative to the YAML file's parent directory

test_cases:
  - fixture: tests/fixtures/skills/benign/hello_world.md
    expected_verdict: CLEAN
    description: "Should be clean"

  - fixture: tests/fixtures/skills/malicious/env_harvesting.md
    expected_verdict: MALICIOUS
    expected_rules:
      - MALWAR-ENV-001
    not_expected_rules:
      - MALWAR-PI-001
    description: "Should detect environment variable harvesting"

Fields

Field               Required  Description
fixture             Yes       Relative path to the fixture file (from base_dir)
expected_verdict    Yes       CLEAN, MALICIOUS, or SUSPICIOUS
expected_rules      No        List of rule IDs that must fire
not_expected_rules  No        List of rule IDs that must NOT fire
description         No        Human-readable description
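Putting the two relative paths together: base_dir is resolved against the manifest file's parent directory, and each fixture is then resolved against base_dir. A minimal sketch of that resolution (resolve_fixture is a hypothetical helper for illustration, not part of the framework):

from pathlib import Path

def resolve_fixture(manifest_path: str, base_dir: str, fixture: str) -> Path:
    # base_dir is taken relative to the manifest's parent directory;
    # the fixture path is then taken relative to base_dir.
    root = (Path(manifest_path).parent / base_dir).resolve()
    return root / fixture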

Verdict Matching

  • CLEAN -- the actual verdict must be exactly CLEAN.
  • MALICIOUS -- the actual verdict must be MALICIOUS or SUSPICIOUS.
  • SUSPICIOUS -- the actual verdict must be SUSPICIOUS, MALICIOUS, or CAUTION.
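The matching rules above are equivalent to the small predicate below -- a sketch for illustration (verdict_matches is a hypothetical name, not the framework's internal function):

def verdict_matches(expected: str, actual: str) -> bool:
    # Acceptable actual verdicts for each expected verdict,
    # mirroring the matching rules listed above.
    accepted = {
        "CLEAN": {"CLEAN"},
        "MALICIOUS": {"MALICIOUS", "SUSPICIOUS"},
        "SUSPICIOUS": {"SUSPICIOUS", "MALICIOUS", "CAUTION"},
    }
    return actual in accepted[expected]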

Pass/Fail Criteria

A test case passes when all of the following are true:

  1. The actual verdict matches the expected verdict (per the rules above).
  2. Every rule in expected_rules fired at least once.
  3. No rule in not_expected_rules fired.

A test case is flagged as a regression when the expected verdict is MALICIOUS or SUSPICIOUS but the actual verdict does not match -- indicating that a previously detected threat is no longer caught.
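Taken together, evaluating a single test case can be sketched as follows (assuming fired_rules is the set of rule IDs that fired for the fixture, and reusing the hypothetical verdict_matches from above; an illustration, not the framework's actual code):

def evaluate(case, actual_verdict: str, fired_rules: set[str]) -> tuple[bool, bool]:
    # Criterion 1: the verdict matches per "Verdict Matching" above.
    verdict_ok = verdict_matches(case.expected_verdict, actual_verdict)
    # Criterion 2: every rule in expected_rules fired at least once.
    expected_ok = set(case.expected_rules or []) <= fired_rules
    # Criterion 3: no rule in not_expected_rules fired.
    forbidden_ok = not (set(case.not_expected_rules or []) & fired_rules)
    passed = verdict_ok and expected_ok and forbidden_ok
    # Regression: a threat fixture whose verdict no longer matches.
    regression = (case.expected_verdict in {"MALICIOUS", "SUSPICIOUS"}
                  and not verdict_ok)
    return passed, regression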

CLI Usage

Run all tests (default manifest)

malwar test-rules

Custom manifest

malwar test-rules --manifest path/to/manifest.yml

Output formats

# Plain text (default)
malwar test-rules --format text

# JSON
malwar test-rules --format json

# JUnit XML (for CI)
malwar test-rules --format junit

Write results to a file

malwar test-rules --format junit --output results.xml

Exit codes

  • 0 -- all tests passed.
  • 1 -- one or more tests failed or errored.
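When scripting around the CLI outside a CI runner, the exit code can be checked directly. For example, from Python (using only the flags documented above):

import subprocess

# Exit code 0 means every test passed; 1 means at least one failure or error.
proc = subprocess.run(
    ["malwar", "test-rules", "--format", "junit", "--output", "results.xml"]
)
if proc.returncode != 0:
    raise SystemExit("rule tests failed; see results.xml")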

Python API

import asyncio
from malwar.testing import RuleTestManifest, RuleTestRunner

async def main():
    manifest = RuleTestManifest.from_yaml("tests/rule_tests.yml")
    report = await RuleTestRunner.run(manifest)

    print(report.to_text())
    print(f"All passed: {report.all_passed}")

asyncio.run(main())
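When embedding the runner in a larger harness, report.all_passed can drive the process exit status, mirroring the CLI's exit-code convention -- for example, at the end of main() above:

import sys

# Mirror the CLI: exit 0 when all tests passed, 1 otherwise.
sys.exit(0 if report.all_passed else 1)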

CI Integration

GitHub Actions

- name: Run rule tests
  run: |
    malwar test-rules --format junit --output rule-test-results.xml

- name: Publish test results
  uses: dorny/test-reporter@v1
  if: always()
  with:
    name: Rule Tests
    path: rule-test-results.xml
    reporter: java-junit

Adding New Test Cases

When adding a new detection rule or fixture:

  1. Create the fixture file under tests/fixtures/skills/benign/ or tests/fixtures/skills/malicious/.
  2. Add an entry to tests/rule_tests.yml specifying the expected verdict and rule IDs.
  3. Run malwar test-rules to verify.

Regression Detection

The framework automatically flags regressions: if a fixture that was previously detected as malicious is now classified as clean, it is marked as a regression in the report. This makes it easy to catch rule breakage during development.