Is 42Crunch good for Model information disclosure check?

What middleBrick covers

  • Run 18 LLM adversarial probes across Quick, Standard, and Deep tiers
  • Map findings to OWASP API Top 10 (2023) for model disclosure risks
  • Support authenticated scans with Bearer, API key, Basic, and Cookie
  • Deliver findings via Web Dashboard, CLI, and API client
  • Integrate with GitHub Actions for CI/CD gating on score thresholds
  • Provide remediation guidance for prompt handling and input validation

Scope of model disclosure testing

Model information disclosure checks focus on unintended exposure of system prompts, instructions, and internal reasoning patterns through API interactions. This scope aligns with the LLM Security category in the OWASP API Top 10, where the scanner runs 18 adversarial probes across three scan tiers.

Quick, Standard, and Deep scans probe for system prompt extraction, instruction override attempts, DAN and roleplay jailbreaks, data exfiltration techniques, cost exploitation, and encoding bypasses such as base64 and ROT12. The scanner also tests for translation-embedded injection, few-shot poisoning, markdown injection, multi-turn manipulation, indirect prompt injection, token smuggling, tool abuse, nested instruction injection, and PII extraction.

Because this testing is purely read-only, no destructive payloads are sent. The scanner analyzes how an API responds to crafted inputs that attempt to coax models into revealing behavior or data outside their intended constraints.

How the scanner tests for model disclosure

The scanner evaluates API endpoints that accept text input by applying adversarial probes designed to elicit over-disclosure. These probes include attempts to bypass guardrails, trigger roleplay scenarios, and expose system instructions through carefully constructed prompts.

Testing techniques include:

  • DAN jailbreak patterns that try to convince the model to ignore prior instructions.
  • Roleplay attempts that ask the model to assume a different persona and reveal constraints.
  • Data exfiltration probes that check for verbose error messages or context leakage.
  • Cost exploitation attempts designed to probe for token-wasting behaviors.
  • Encoding bypass checks using base64 and ROT13 to obscure malicious intent.

Each probe is classified across three tiers so you can balance depth against runtime. The scanner does not execute code or mutate state; it observes whether responses contain indications of instruction overrides or sensitive internal details.

Mapping to compliance and security frameworks

Findings from LLM disclosure checks map directly to OWASP API Top 10 (2023), which is one of the three frameworks explicitly referenced for alignment. This helps you understand where model-related behaviors may fall outside expected security controls.

For frameworks such as SOC 2 Type II, the scanner surfaces findings relevant to audit evidence around access controls and system monitoring. It does not certify compliance, but it supports audit evidence collection by documenting observable API behaviors.

When evaluating against PCI-DSS 4.0, the tool maps findings to relevant control areas such as secure authentication and encryption, while being clear that it does not guarantee compliance or validate specific regulatory adherence.

Limitations and complementary testing

The scanner does not detect business logic vulnerabilities, which require domain-specific understanding to evaluate model behavior in context. It also does not perform active SQL injection or command injection testing, as those are outside the scope of read-only, non-intrusive probing.

Blind SSRF and out-of-band infrastructure checks are not in scope, and the tool is not intended to replace a human pentester for high-stakes audits or complex threat models. If your primary concern is model disclosure and instruction integrity, combining automated scanning with targeted manual review is recommended.

Integration and remediation guidance

Scan results are delivered through the Web Dashboard, CLI, and API client, with options to track score trends and download branded compliance PDFs for documentation purposes. Each finding includes remediation guidance to help developers tighten prompt handling, input validation, and output filtering.

Authenticated scanning supports Bearer, API key, Basic auth, and cookies, with domain verification to ensure only the domain owner can submit credentials. Header allowlists restrict forwarded headers to Authorization, X-API-Key, Cookie, and X-Custom-* to reduce unintended data exposure during scans.

For CI/CD workflows, the GitHub Action can gate merges when score thresholds are not met, while the MCP Server enables integration with AI coding assistants to catch risky prompt handling early in development.

Frequently Asked Questions

Is 42Crunch a good fit for model information disclosure checks?
Yes, for read-only detection of prompt injection, jailbreak attempts, and instruction leakage via LLM probes. It is not a replacement for manual red-teaming or business logic review.
What types of LLM probes are included in the scan?
The scanner runs DAN jailbreak, roleplay attempts, data exfiltration probes, cost exploitation, encoding bypasses, translation-embedded injection, and several forms of indirect prompt injection across three scan tiers.
Does the scanner remediate model disclosure issues automatically?
No. The tool detects and reports findings with remediation guidance, but it does not patch, block, or fix issues automatically.
How does authenticated scanning work for LLM checks?
Authenticated scanning with Bearer, API key, or cookies requires domain verification and restricts forwarded headers to minimize exposure during probing.
Can this replace a human pentester for model security?
No. The tool is designed to complement manual testing and does not cover business logic vulnerabilities or blind infrastructure issues.