Is 42Crunch good for Multi-turn manipulation audit?
What middleBrick covers
- Executes 18 LLM adversarial probes across three scan tiers
- Focuses on text-only POST for system prompt and jailbreak tests
- Maps LLM findings to OWASP API Top 10 (2023)
- Provides risk score with prioritized remediation guidance
- Supports CI/CD gating via GitHub Action integration
- Tracks changes over time with scheduled rescan diffs
Scope of multi-turn manipulation testing
Multi-turn manipulation refers to attacks where an adversary chains multiple prompts to change behavior, exfiltrate data, or bypass guardrails over the course of a conversation. The scanner evaluates this surface by executing 18 adversarial probes across three scan tiers. Quick runs a subset of prompt-injection and encoding bypass checks. Standard adds jailbreak patterns, roleplay attempts, and token-smuggling scenarios. Deep includes nested instruction injection, indirect prompt injection, and PII extraction attempts across multi-turn dialogs.
How middleBrick handles these probes
middleBrick executes LLM probes as text-only POST requests focused on system prompt extraction, instruction override, DAN and roleplay jailbreaks, data exfiltration, cost exploitation, and encoding bypasses such as base64 or ROT13. No code execution occurs on your API, and no agent or SDK is required. The scanner correlates findings with API definitions to highlight endpoints accepting user-influenced input that could be abused in multi-turn chains.
Mapping to compliance frameworks
Findings from the LLM probe categories map directly to OWASP API Top 10 (2023), which covers injection and manipulation risks relevant to multi-turn attacks. Results also support audit evidence for SOC 2 Type II and align with security controls described in PCI-DSS 4.0. The tool surfaces security issues that help you prepare for regulatory expectations around input validation and access control, while explicitly avoiding claims of certification or compliance.
Limitations and complementary practices
middleBrick does not detect business logic vulnerabilities that require domain understanding, nor does it perform active SQL injection or command injection testing. Blind SSRF and out-of-band exfiltration paths are out of scope. Because multi-turn manipulation depends on conversational state and model behavior, the scanner cannot replace a human pentester for high-stakes audits. Use it as a continuous indicator alongside adversarial testing and model-level red-teaming.
Practical usage and integration
Run scans via the CLI with middlebrick scan <url> to receive a letter-grade risk score and prioritized findings. In CI/CD, the GitHub Action can gate merges when the score drops below your threshold. For ongoing monitoring, the Pro tier provides scheduled rescans and diff detection to track new findings or score drift related to changes in API behavior or prompt surfaces.