Tool	Accuracy of Findings	Detects Non-Pattern-Based Issues?	Coverage of SAST Findings	Speed of Scanning	Usability & Dev Experience
DryRun Security	Very high – caught multiple critical issues missed by others	Yes – context-based analysis, logic flaws & SSRF	Broad coverage of standard vulns, logic flaws, and extendable	Near real-time PR feedback	Clear PR comments, expandable policies with no scripting or coding (NLCP)
Snyk Code	High on well-known patterns (SQLi, XSS), but misses other categories	Limited – AI-based, focuses on recognized vulnerabilities	Good coverage of standard vulns; may miss SSRF or advanced auth logic issues	Fast, often near PR speed	Decent GitHub integration, but rules are a black box
GitHub Advanced Security (CodeQL)	Very high precision for known queries, low false positives	Partial – strong dataflow for known issues, needs custom queries	Good for SQLi and XSS but logic flaws require advanced CodeQL experience.	Moderate to slow (GitHub Action based)	Requires CodeQL expertise for custom logic
Semgrep	Medium, but there is a good community for adding rules	Primarily pattern-based with limited dataflow	Decent coverage with the right rules, can still miss advanced logic or SSRF	Fast scans	Has custom rules, but dev teams must maintain them
SonarQube	Low – misses serious issues in our testing	Limited – mostly pattern-based, code quality oriented	Basic coverage for standard vulns, many hotspots require manual review	Moderate, usually in CI	Dashboard-based approach, can pass “quality gate” despite real vulns

Tool	Accuracy of Findings	Detects Non-Pattern-Based Issues?	Coverage of C# Vulnerabilities	Scan Speed	Developer Experience
DryRun Security	Very high – caught all critical flaws missed by others	Yes – context-based analysis finds logic errors, auth flaws, etc.	Broad coverage of OWASP Top 10 vulns plus business logic issues	Near real-time (PR comment within seconds)	Clear single PR comment with detailed insights; no config or custom scripts needed
Snyk Code	High on known patterns (SQLi, XSS), but misses logic/flow bugs	Limited – focuses on recognizable vulnerability patterns	Good for standard vulns; may miss SSRF or auth logic issues	Fast (integrates into PR checks)	Decent GitHub integration, but rules are a black box (no easy customization)
GitHub Advanced Security (CodeQL)	Low – missed everything except SQL Injection	Mostly pattern-based	Low – only discovered SQL Injection	Slowest of all, but finished in 1 minute	Concise annotation with a suggested fix and optional auto-remediation
Semgrep	Medium – finds common issues with community rules, some misses	Primarily pattern-based, limited data flow analysis	Decent coverage with the right rules; misses advanced logic flaws	Very fast (runs as lightweight CI)	Custom rules possible, but require maintenance and security expertise
SonarQube	Low – missed serious issues in our testing	Mostly pattern-based (code quality focus)	Basic coverage for known vulns; many issues flagged as “hotspots” require manual review	Moderate (runs in CI/CD pipeline)	Results in dashboard; risk of false sense of security if quality gate passes despite vulnerabilities

Dimension	Why It Matters
Surface	Entry points & data sources highlight tainted flows early.
Language	Code idioms reveal hidden sinks and framework quirks.
Intent	What is the purpose of the code being changed/added?
Design	Robustness and resilience of changing code.
Environment	Libraries, build flags, infra metadata, and infrastructure as code (IaC) all give clues about the risks in the changing code.

KPI	Pattern-Based SAST	DryRun CSA
Mean Time to Regex	3–8 hrs per noisy finding set	Not required
Mean Time to Context	N/A	< 1 min
False-Positive Rate	50–85%	< 5%
Logic-Flaw Detection	< 5 %	90%+

AI in AppSec

June 5, 2026

Build vs. Buy for AI AppSec: Why AI Security Requires Independent Verification

Security teams have moved past the question that used to sound reckless: can AI review code and find real vulnerabilities?

The enterprise question now is sharper: should we build our own AI code scanner, or buy an AI-native SAST platform like DryRun Security?

On paper, building looks attractive. Your team already uses Claude Code, Mythos, and other frontier models and tools. You control your own stack. Why bring in another vendor when you can add "AI review" to CI with a few prompts and some glue code?

The real question is not whether AI can review code. It is whether security teams can trust AI-generated and human-written code without an independent verification layer. As AI-assisted development accelerates, organizations need a way to consistently validate code before it reaches production regardless of who, or what, wrote it.

Claude Code /security-review and DryRun PR scanning are both aimed at the same workflow moment: reviewing a pull request for security risk. A developer can run Claude Code against their PR and get useful feedback. That is valuable. The difference is operationalization.

Claude Code security review is developer-triggered unless your organization builds rules to force it into the pipeline. DryRun is designed to run across pull requests as an org-owned control from the start — it does not depend on every developer remembering to ask the right question at the right time.

And once the scan runs, the enterprise problem is not finished. You still need policies, triage, reporting, evaluation, auditability, and governance. That is where a developer tool and an AppSec platform diverge: AI-native AppSec is not a model integration problem. It is a systems, evaluation, and governance problem, and that is where internal builds usually fall short.

Where enterprises are on the AI AppSec maturity curve

Most enterprise security programs are moving through the same curve: from skepticism, to developer experimentation, to org-level control.

Phase 0: Skepticism

“LLMs are toys. They hallucinate. They will never be enterprise-ready.”

Phase 1: Developer assistance

Teams use Claude Code, chat-based review, or local agents to help individual developers check their own work. Useful, but voluntary, author-driven, and uneven across teams.

Phase 2: Tooling around the developer

Teams wrap prompts, skills, and local workflows around models. These tools improve developer productivity, but still do not create consistent AppSec coverage or program-level evidence.

Phase 3: Agents

Organizations experiment with systems that can plan, call tools, and reason over code. Without real controls, they still lack consistent enforcement, auditability, and predictability.

Phase 4: Independent verification

AI-generated and human-written code are continuously validated through organization-owned security controls, policies, and review workflows before deployment.

Enterprises need Phase 4. They need AI that behaves like a security control — not a loose collection of model calls. That means enforcement, auditability, reliability, and integration into existing SDLC and governance processes.

DryRun Security was built for this phase. It is a Code Security Intelligence platform that integrates into developer workflows, provides contextual analysis, and gives AppSec teams the control surface they care about: consistent review, policy enforcement, auditability, and low-noise findings.

Build vs. buy: what you are really deciding

A useful build-vs.-buy analysis has to go beyond "we already have an LLM contract."

Most organizations discover they are not building a security scanner. They are building a security product that requires ongoing evaluation, governance, policy management, reporting, and operational ownership.

The first version of a DIY approach often looks simple: take Claude Code /security-review, point it at the pull request, and make developers run it. But if you want that review to behave like a control, you have to answer harder questions. Does it run on every PR? Who owns the rule that forces it? What happens when developers bypass it? Where do findings go? How are results tracked? How do you tune false positives? How do you prove to leadership or auditors that the control is working?

When you choose to build, you are effectively choosing to build and maintain:

Pipeline enforcement that guarantees review runs on the right pull requests
An agentic review system that can reason over repositories, diffs, and architecture
A continuous evaluation pipeline that keeps the system accurate as models and codebases change
Triage and reporting workflows for findings across teams
An observability layer for context assembly, routing, retries, latency, and output quality
A governance layer that turns findings into controls your security program can trust

If you are not committing to those pieces, you are not really building a security control. You are running prompts and hoping the output remains accurate over time.
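To make the enforcement questions above concrete, here is a minimal sketch of a merge gate. It is not any vendor's API: `PullRequest`, `ScanRecord`, and `gate` are hypothetical names, and the rules (block when no review ran, when the review is stale, or when blocking findings remain, unless a recorded exception exists) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanRecord:
    """Result of one security review run (hypothetical shape)."""
    commit_sha: str
    blocking_findings: int


@dataclass(frozen=True)
class PullRequest:
    number: int
    head_sha: str
    scan: Optional[ScanRecord] = None          # None means no review ran
    approved_exception: Optional[str] = None   # ticket ID of a recorded waiver


def gate(pr: PullRequest) -> tuple[bool, str]:
    """Decide whether a PR may merge, and return the reason for the audit trail."""
    if pr.approved_exception:
        return True, f"allowed by exception {pr.approved_exception}"
    if pr.scan is None:
        return False, "no security review recorded"
    if pr.scan.commit_sha != pr.head_sha:
        return False, "review is stale: it ran on an older commit"
    if pr.scan.blocking_findings > 0:
        return False, f"{pr.scan.blocking_findings} blocking finding(s)"
    return True, "review passed on the current commit"
```

Even this toy version has to answer who may grant an exception and where the returned reason is stored, which is the ownership cost the list above describes.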

DryRun Security brings those components together as a platform, not a collection of scripts.

Scanning is the overlap, not the whole product

Claude Code /security-review and DryRun PR scanning do overlap: both can review a pull request and identify security concerns. But overlap at the scan layer does not mean equivalence at the platform layer.

A developer can manually run Claude Code against a PR. An organization can also wire that review into CI and enforce it through pipeline rules — but then the organization owns the enforcement logic, exception handling, tuning, reporting, and maintenance around that workflow.

DryRun starts from the opposite direction. PR scanning is not an optional local action; it is the default control surface. DryRun runs across pull requests, applies consistent contextual review and policy checks, and gives AppSec the platform layer around the scan.

The scan tells you what the model saw. The platform tells you whether the control is working.

Systems, not models: the multi-provider reality

Claude Code, Mythos, and other frontier tools are impressive. They are also only part of the system.

A production AppSec control cannot depend on one model, one tool path, or one prompt pattern. It needs:

Multiple model providers to hedge outages and performance regressions
Routing logic to pick the right model and mode for a given task
Fallback paths when a provider fails or degrades
Evaluations that measure behavior over time, not just during onboarding

DryRun Security is designed around multi-model operation, structured routing, fallback, and continuous evaluation. Doing this in-house means standing up infrastructure, experimentation pipelines, and ongoing evaluation just to maintain baseline behavior. That is why it matters how your vendor actually analyzes code — not just which models it claims to support.
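As a rough illustration of the fallback idea, the sketch below tries providers in priority order and records each failure. It is an assumption-laden toy, not DryRun's routing: `ProviderError`, `review_with_fallback`, and the two stand-in providers are hypothetical, and real providers would wrap vendor SDK calls.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider callable when it fails or degrades."""


def review_with_fallback(
    diff: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str, list[str]]:
    """Try providers in priority order; return (provider_name, output, failures)."""
    failures: list[str] = []
    for name, call in providers:
        try:
            return name, call(diff), failures
        except ProviderError as exc:
            # Keep the failure so degradation is visible, not silently absorbed.
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))


# Stand-in providers for illustration only.
def flaky_primary(diff: str) -> str:
    raise ProviderError("timeout")


def stable_secondary(diff: str) -> str:
    return f"reviewed {len(diff)} characters"
```

Returning the failure list alongside the result matters: a fallback that hides which provider answered makes later evaluation of output quality much harder.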

AI-generated code changes the economics of AppSec

AI-assisted development does not just change how code is written. It changes how much code security teams must review. As coding agents and AI-assisted workflows increase developer output, pull request volume grows while AppSec headcount remains relatively flat.

Organizations need security controls that can scale review coverage without relying on manual validation or developer self-review. Independent verification becomes increasingly important as AI systems generate more of the software entering production.

How DryRun analyzes code

DryRun does not treat "AI review" as a single model call. It runs code changes through a multi-stage analysis pipeline designed for real-time AppSec at pull request scale: Harness → Planner → Eval → Exploitable → Contextual Security Analysis (CSA).

Harness collects and normalizes context about the change and the repository.
Planner decides which checks and tools to run for the specific diff.
Eval runs those checks across multiple models and deterministic analyzers.
Exploitable focuses the system on issues that look reachable and meaningful in the real application, not just theoretical patterns.
CSA performs the final review pass that ties architecture, data flow, and business logic together into developer-ready findings.
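The shape of a staged pipeline like this can be sketched in a few lines. Only the stage names come from the text; everything each stage does here is an invented placeholder. In particular, the "Eval" step is a plain string match standing in for models and analyzers, and "Exploitable" crudely keeps only lines the change adds. This is not DryRun's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReviewState:
    diff: str
    files: list[str] = field(default_factory=list)
    added_lines: list[str] = field(default_factory=list)
    checks: list[str] = field(default_factory=list)
    candidates: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)


def harness(state: ReviewState) -> ReviewState:
    """Collect and normalize context: touched files and added lines."""
    for line in state.diff.splitlines():
        if line.startswith("+++ "):
            state.files.append(line[4:])
        elif line.startswith("+"):
            state.added_lines.append(line)
    return state


def planner(state: ReviewState) -> ReviewState:
    """Pick checks for this diff (toy rule: Python files get one check)."""
    if any(name.endswith(".py") for name in state.files):
        state.checks.append("unsafe-deserialization")
    return state


def eval_stage(state: ReviewState) -> ReviewState:
    """Run the planned checks (toy: a substring match over all diff lines)."""
    if "unsafe-deserialization" in state.checks:
        state.candidates = [
            line for line in state.diff.splitlines() if "pickle.loads(" in line
        ]
    return state


def exploitable(state: ReviewState) -> ReviewState:
    """Keep only candidates introduced by this change (toy reachability proxy)."""
    state.candidates = [c for c in state.candidates if c in state.added_lines]
    return state


def csa(state: ReviewState) -> ReviewState:
    """Turn surviving candidates into developer-facing findings."""
    state.findings = [
        f"pickle.loads() introduced: {c[1:].strip()}" for c in state.candidates
    ]
    return state


PIPELINE: list[Callable[[ReviewState], ReviewState]] = [
    harness, planner, eval_stage, exploitable, csa,
]


def run_pipeline(diff: str) -> ReviewState:
    state = ReviewState(diff)
    for stage in PIPELINE:
        state = stage(state)
    return state
```

The point of the structure is that each stage narrows or enriches shared state, so a removed `pickle.loads` call is detected by the check but dropped before it becomes a finding.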

Together, these stages combine contextual analysis, deterministic checks, policy enforcement, and AI reasoning into an independent verification system for AI-generated and human-written code. The goal is not to determine which model is best, but to give security teams a control they can trust across every pull request.

Developer tools are author-driven. AppSec platforms are org-driven.

Claude Code /security-review can review a pull request. The question is whether it runs because the developer chose to run it, or because the organization owns it as a control.

In the default workflow, Claude Code is author-driven. The developer chooses when to run it, what context to include, what prompt to use, and when the answer is good enough. You can move Claude Code closer to a control by wiring it into CI and enforcing pipeline rules — but then your team is building the surrounding platform: trigger logic, policy exceptions, result handling, reporting, evaluation, and governance.

DryRun is designed for that org-owned model from the beginning. It runs in the pull request path, applies consistent contextual review and Natural Language Code Policies, and gives AppSec teams a way to manage findings across repositories and teams.

Evaluation and model degradation: the part most teams skip

The bigger risk in internal builds is not initial inaccuracy. It is silent drift.

As models update, prompts evolve, and codebases change, you need disciplined evaluations to know whether the system is still safe to trust. Before a model is released into the DryRun Security production workflow, it is evaluated against existing models across four critical areas:

Output integrity

Hallucinations, prompt leakage, response completeness, and whether the model invents fixes or exposes sensitive details.

Security accuracy

Vulnerability identification, severity correctness, and root-cause quality.

Instruction & label compliance

Whether the model follows instructions and applies correct security classifications instead of drifting into generic advice.

Agentic system performance

Planning quality, delegation decisions, tool usage, and task adherence across multi-agent workflows.

DryRun's public coverage matrix and SAST Accuracy Report show how the Contextual Security Analysis engine is evaluated across vulnerability categories such as injection, broken auth, and logic flaws.

Replicating that evaluation discipline inside a single company is non-trivial. It is a product line, not a side project on top of "we wired a model into CI."
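A small sketch shows what even the simplest slice of that discipline involves. It covers only one of the four areas (security accuracy) and assumes a labeled benchmark already exists. `Case`, `precision_recall`, `safe_to_promote`, and the 0.02 tolerance are hypothetical choices, not DryRun's evaluation method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Case:
    case_id: str
    is_vulnerable: bool  # ground-truth label from a curated benchmark


def precision_recall(
    cases: Sequence[Case], predictions: dict[str, bool]
) -> tuple[float, float]:
    """Compute precision and recall of a model's flags against the labels."""
    tp = sum(1 for c in cases if c.is_vulnerable and predictions[c.case_id])
    fp = sum(1 for c in cases if not c.is_vulnerable and predictions[c.case_id])
    fn = sum(1 for c in cases if c.is_vulnerable and not predictions[c.case_id])
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def safe_to_promote(
    cases: Sequence[Case],
    baseline: dict[str, bool],
    candidate: dict[str, bool],
    tolerance: float = 0.02,
) -> bool:
    """Allow a candidate model only if it does not regress against the baseline."""
    base_p, base_r = precision_recall(cases, baseline)
    cand_p, cand_r = precision_recall(cases, candidate)
    return cand_p >= base_p - tolerance and cand_r >= base_r - tolerance
```

Running this gate on every model or prompt change is what catches silent drift; the harder work is curating the cases and extending the same idea to output integrity, label compliance, and agentic behavior.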

Observability for agentic systems, not just microservices

Even with strong prompts and a model integrated into CI, agentic systems fail in ways traditional services do not. Common failure modes include:

Wrong or missing context — incomplete history or the wrong files
Misfired tool calls or incorrect routing
Subtle provider behavior changes that turn previously good prompts into noisy or weak output
Retries and loops that increase latency and cost without improving quality

Without observability into tool invocation, context assembly, routing decisions, latency, retries, and output quality, you cannot safely debug or improve the system. DryRun Security includes telemetry that helps its team understand how agents reason about code changes, where they spend time, and how behavior shifts over time.
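One way to picture the minimum telemetry involved is a wrapper that bounds retries and records attempts, latency, and outcome for every agent step. This is a sketch under assumptions: `Trace`, `StepRecord`, and the default of three attempts are hypothetical, and a production system would export these records rather than keep them in memory.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class StepRecord:
    step: str
    attempts: int
    seconds: float
    ok: bool


@dataclass
class Trace:
    records: list[StepRecord] = field(default_factory=list)

    def run(self, step: str, fn: Callable[[], T], max_attempts: int = 3) -> T:
        """Run fn with bounded retries and record what happened either way."""
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        start = time.perf_counter()
        for attempt in range(1, max_attempts + 1):
            try:
                result = fn()
            except Exception:
                if attempt == max_attempts:
                    elapsed = time.perf_counter() - start
                    self.records.append(StepRecord(step, attempt, elapsed, False))
                    raise
            else:
                elapsed = time.perf_counter() - start
                self.records.append(StepRecord(step, attempt, elapsed, True))
                return result
        raise AssertionError("unreachable: loop always returns or raises")
```

The retry cap is the part that addresses the "retries and loops" failure mode: without it, a degraded tool call can consume latency and budget with nothing in the record to show why.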

Cost: pull request scale scanning is where budgets go to die

Scanning a single repository with a model is cheap. Scanning every pull request across dozens or hundreds of services all year is not. Internal builds often underestimate:

Weak caching and redundant calls
Oversized models used for simple assessments
Reprocessing the same context across multiple checks
Unbounded agent loops that quietly balloon usage
The hidden cost of building the control layer around the scan

DryRun Security is optimized for pull request scale scanning, combining contextual analysis with caching, smart scoping, and multi-model strategies to manage cost without sacrificing accuracy.
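The first two cost items (weak caching and reprocessing the same context) can be illustrated with a content-addressed cache: identical check and content pairs are reviewed once. This is a minimal in-memory sketch, not DryRun's caching layer. `ReviewCache` and the `review_fn` callable are hypothetical, and a real system would also need eviction and invalidation when models or policies change.

```python
import hashlib
from typing import Callable


class ReviewCache:
    """Cache review results keyed by a hash of (check name, content)."""

    def __init__(self, review_fn: Callable[[str, str], str]) -> None:
        self._review_fn = review_fn
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(check: str, content: str) -> str:
        digest = hashlib.sha256()
        digest.update(check.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
        digest.update(content.encode("utf-8"))
        return digest.hexdigest()

    def review(self, check: str, content: str) -> str:
        key = self._key(check, content)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._review_fn(check, content)  # the expensive model call
        self._store[key] = result
        return result
```

Tracking `hits` and `misses` gives a direct measure of how much spend the cache avoids, which helps when sizing the budget for pull request scale scanning.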

Governance and change management: findings as controls

To function as a real security control, your AI AppSec system must plug into governance. Findings must be:

Tracked and triaged over time
Tied into tickets and change management
Mapped to policies that auditors and leadership can understand
Measured for effectiveness, not just volume

DryRun Security treats security findings as part of an enterprise control fabric — not just bot comments. Its contextual analysis and Natural Language Code Policies help security teams enforce standards across services without writing custom rules for every framework.

Where building still makes sense

Enterprises should not abandon internal experimentation. Building makes sense when you:

Need to test frontier models or highly specific workflows
Want to integrate AI-driven tooling tightly into custom pipelines
Are exploring new types of security checks that are not yet standardized

The best pattern is not "never build." It is build for exploration and differentiation, buy for the core control.

Use internal builds to learn, prototype, and push the edge of what your team needs. Use a specialized platform like DryRun Security for the production control: accurate, contextual, low-noise SAST integrated into developer workflows.

DryRun Security helps AppSec teams independently verify every pull request, apply security policies consistently, and prove that security controls are operating as intended across the software development lifecycle.

As models and tools like Claude Code and Mythos advance, the competitive advantage shifts away from "which single model do you use?" and toward "what system have you built around them?" DryRun Security is that system for AI-native AppSec.

See how DryRun helps AppSec teams verify code before it reaches production.

Nic Lechner

Head of Customer Success

Vulnerability Class	Snyk (partial)	GitHub (CodeQL) (partial)	Semgrep	SonarQube	DryRun Security
SQL Injection			*
Cross-Site Scripting (XSS)
SSRF
Auth Flaw / IDOR
User Enumeration
Hardcoded Token

Vulnerability Class	Snyk Code	GitHub Advanced Security (CodeQL)	Semgrep	SonarQube	DryRun Security
SQL Injection (SQLi)
Cross-Site Scripting (XSS)
Server-Side Request Forgery (SSRF)
Auth Logic/IDOR
User Enumeration
Hardcoded Credentials

Vulnerability	DryRun Security	Semgrep	GitHub CodeQL	SonarQube	Snyk Code
1. Remote Code Execution via Unsafe Deserialization
2. Code Injection via eval() Usage
3. SQL Injection in a Raw Database Query
4. Weak Encryption (AES ECB Mode)
5. Broken Access Control / Logic Flaw in Authentication
Total Found	5/5	3/5	1/5	1/5	0/5

Vulnerability	DryRun Security	Snyk	CodeQL	SonarQube	Semgrep
Server-Side Request Forgery (SSRF)				(Hotspot)
Cross-Site Scripting (XSS)
SQL Injection (SQLi)
IDOR / Broken Access Control
Broken Authentication Logic
Invalid Token Validation Logic
Broken Email Verification Logic

	Severity
	Critical	High
Location	utils/authorization.py :L118	utils/authorization.py :L49 & L82 & L164
Issue	JWT Algorithm Confusion Attack: jwt.decode() selects the algorithm from unverified JWT headers.	Insecure OIDC Endpoint Communication: urllib.request.urlopen called without explicit TLS/CA handling.
Impact	Complete auth bypass (switch RS256→HS256, forge tokens with public key as HMAC secret).	Susceptible to MITM if default SSL behavior is weakened or cert store compromised.
Remediation	Replace the dynamic algorithm selection with a fixed, expected algorithm list. Change line 118 from algorithms=[unverified_header.get('alg', 'RS256')] to algorithms=['RS256'] to only accept RS256 tokens. Add algorithm validation before token verification to ensure the header algorithm matches expected values.	Create a secure SSL context using ssl.create_default_context() with proper certificate verification. Configure explicit timeout values for all HTTP requests to prevent hanging connections. Add explicit SSL/TLS configuration by creating an HTTPSHandler with the secure SSL context. Implement proper error handling specifically for SSL certificate validation failures.
Key Insight	This vulnerability arises from trusting an unverified portion of the JWT to determine the verification method itself	This vulnerability stems from a lack of explicit secure communication practices, leaving the application reliant on potentially weak default behaviors.
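
The two remediations in the table can be sketched with the Python standard library alone. This is an illustration under assumptions, not the fix applied to utils/authorization.py: the helper names (`header_alg`, `require_expected_alg`, `build_verified_opener`, `fetch_oidc_document`) and the 5-second timeout are hypothetical. The header check does not replace signature verification, which must still pin the algorithm list (with PyJWT, by passing `algorithms=["RS256"]` to `jwt.decode`).

```python
import base64
import json
import ssl
import urllib.error
import urllib.request

ALLOWED_ALGS = frozenset({"RS256"})  # fixed allowlist, never taken from the token


def header_alg(token: str) -> str:
    """Return the 'alg' claimed in a JWT header WITHOUT verifying the token."""
    header_b64 = token.split(".", 1)[0]
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    header = json.loads(base64.urlsafe_b64decode(padded))
    return str(header.get("alg", ""))


def require_expected_alg(token: str) -> None:
    """Reject tokens whose header names an algorithm outside the allowlist."""
    alg = header_alg(token)
    if alg not in ALLOWED_ALGS:
        raise ValueError(f"unexpected JWT algorithm: {alg!r}")


def build_verified_opener() -> tuple[urllib.request.OpenerDirector, ssl.SSLContext]:
    """Build an opener whose HTTPS handler uses an explicit verifying context."""
    context = ssl.create_default_context()  # verifies certificates and hostnames
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler), context


def fetch_oidc_document(url: str, timeout: float = 5.0) -> bytes:
    """Fetch a URL with an explicit timeout and a clear TLS failure message."""
    opener, _ = build_verified_opener()
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as exc:
        # urllib wraps certificate failures in URLError; surface them distinctly.
        if isinstance(exc.reason, ssl.SSLCertVerificationError):
            raise RuntimeError(f"TLS certificate validation failed for {url}") from exc
        raise
```

Calling `require_expected_alg(token)` before verification turns an HS256 token into an immediate rejection instead of a candidate for key confusion.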