| Tool | Accuracy of Findings | Detects Non-Pattern-Based Issues? | Coverage of SAST Findings | Speed of Scanning | Usability & Dev Experience |
|---|---|---|---|---|---|
| DryRun Security | Very high – caught multiple critical issues missed by others | Yes – context-based analysis, logic flaws & SSRF | Broad coverage of standard vulns, logic flaws, and extendable | Near real-time PR feedback | |
| Snyk Code | High on well-known patterns (SQLi, XSS), but misses other categories | Limited – AI-based, focuses on recognized vulnerabilities | Good coverage of standard vulns; may miss SSRF or advanced auth logic issues | Fast, often near PR speed | Decent GitHub integration, but rules are a black box |
| GitHub Advanced Security (CodeQL) | Very high precision for known queries, low false positives | Partial – strong dataflow for known issues, needs custom queries | Good for SQLi and XSS, but logic flaws require advanced CodeQL experience | Moderate to slow (GitHub Action based) | Requires CodeQL expertise for custom logic |
| Semgrep | Medium, but there is a good community for adding rules | Primarily pattern-based with limited dataflow | Decent coverage with the right rules; can still miss advanced logic or SSRF | Fast scans | Has custom rules, but dev teams must maintain them |
| SonarQube | Low – misses serious issues in our testing | Limited – mostly pattern-based, code quality oriented | Basic coverage for standard vulns; many hotspots require manual review | Moderate, usually in CI | Dashboard-based approach; can pass “quality gate” despite real vulns |
[Table: vulnerability-class results (SQL Injection, Cross-Site Scripting (XSS), SSRF, Auth Flaw / IDOR, User Enumeration, Hardcoded Token) across Snyk (partial), GitHub CodeQL (partial), Semgrep, SonarQube, and DryRun Security; the per-cell detection marks are not recoverable.]
| Tool | Accuracy of Findings | Detects Non-Pattern-Based Issues? | Coverage of C# Vulnerabilities | Scan Speed | Developer Experience |
|---|---|---|---|---|---|
| DryRun Security | Very high – caught all critical flaws missed by others | Yes – context-based analysis finds logic errors, auth flaws, etc. | Broad coverage of OWASP Top 10 vulns plus business logic issues | Near real-time (PR comment within seconds) | Clear single PR comment with detailed insights; no config or custom scripts needed |
| Snyk Code | High on known patterns (SQLi, XSS), but misses logic/flow bugs | Limited – focuses on recognizable vulnerability patterns | Good for standard vulns; may miss SSRF or auth logic issues | Fast (integrates into PR checks) | Decent GitHub integration, but rules are a black box (no easy customization) |
| GitHub Advanced Security (CodeQL) | Low – missed everything except SQL Injection | Mostly pattern-based | Low – only discovered SQL Injection | Slowest of all, but finished in 1 minute | Concise annotation with a suggested fix and optional auto-remediation |
| Semgrep | Medium – finds common issues with community rules, some misses | Primarily pattern-based, limited data flow analysis | Decent coverage with the right rules; misses advanced logic flaws | Very fast (runs as lightweight CI) | Custom rules possible, but require maintenance and security expertise |
| SonarQube | Low – missed serious issues in our testing | Mostly pattern-based (code quality focus) | Basic coverage for known vulns; many issues flagged as “hotspots” require manual review | Moderate (runs in CI/CD pipeline) | Results in dashboard; risk of false sense of security if quality gate passes despite vulnerabilities |
[Table: vulnerability-class results (SQL Injection (SQLi), Cross-Site Scripting (XSS), Server-Side Request Forgery (SSRF), Auth Logic/IDOR, User Enumeration, Hardcoded Credentials) across Snyk Code, GitHub Advanced Security (CodeQL), Semgrep, SonarQube, and DryRun Security; the per-cell detection marks are not recoverable.]
[Table: per-vulnerability results across DryRun Security, Semgrep, GitHub CodeQL, SonarQube, and Snyk Code for (1) Remote Code Execution via Unsafe Deserialization, (2) Code Injection via eval() Usage, (3) SQL Injection in a Raw Database Query, (4) Weak Encryption (AES ECB Mode), and (5) Broken Access Control / Logic Flaw in Authentication; the per-cell detection marks are not recoverable. Total found: DryRun Security 5/5, Semgrep 3/5, GitHub CodeQL 1/5, SonarQube 1/5, Snyk Code 0/5.]
[Table: per-vulnerability results across DryRun Security, Snyk, CodeQL, SonarQube, and Semgrep for Server-Side Request Forgery (SSRF; one surviving cell is marked “(Hotspot)”), Cross-Site Scripting (XSS), SQL Injection (SQLi), IDOR / Broken Access Control, Invalid Token Validation Logic, and Broken Email Verification Logic; the remaining per-cell detection marks are not recoverable.]
| Dimension | Why It Matters |
|---|---|
| Surface | Entry points & data sources highlight tainted flows early. |
| Language | Code idioms reveal hidden sinks and framework quirks. |
| Intent | What is the purpose of the code being changed or added? |
| Design | Robustness and resilience of the changing code. |
| Environment | Libraries, build flags, and infrastructure (IaC) metadata all give clues about the risks of changing code. |
| KPI | Pattern-Based SAST | DryRun CSA |
|---|---|---|
| Mean Time to Regex | 3–8 hrs per noisy finding set | Not required |
| Mean Time to Context | N/A | < 1 min |
| False-Positive Rate | 50–85% | < 5% |
| Logic-Flaw Detection | < 5% | 90%+ |
| | Finding 1 | Finding 2 |
|---|---|---|
| Severity | Critical | High |
| Location | utils/authorization.py:L118 | utils/authorization.py:L49, L82 & L164 |
| Issue | JWT Algorithm Confusion Attack: jwt.decode() selects the algorithm from unverified JWT headers. | Insecure OIDC Endpoint Communication: urllib.request.urlopen called without explicit TLS/CA handling. |
| Impact | Complete auth bypass (switch RS256→HS256, forge tokens with the public key as HMAC secret). | Susceptible to MITM if default SSL behavior is weakened or the cert store is compromised. |
| Remediation | Replace the dynamic algorithm selection with a fixed, expected algorithm list: change line 118 from algorithms=[unverified_header.get('alg', 'RS256')] to algorithms=['RS256'] so only RS256 tokens are accepted, and validate the header algorithm before token verification. | Create a secure SSL context using ssl.create_default_context() with proper certificate verification, add explicit SSL/TLS configuration by creating an HTTPSHandler with that context, configure explicit timeout values for all HTTP requests to prevent hanging connections, and implement proper error handling specifically for SSL certificate validation failures. |
| Key Insight | This vulnerability arises from trusting an unverified portion of the JWT to determine the verification method itself. | This vulnerability stems from a lack of explicit secure communication practices, leaving the application reliant on potentially weak default behaviors. |
Security
March 19, 2026

Steering Agentic Security Scanners with Git Behavioral Graphs

How we use git forensics and behavioral analysis to help agentic AI security scanners find vulnerabilities that static analysis tools miss in enterprise codebases.

The techniques in this post are grounded in Adam Tornhill's Your Code as a Crime Scene (2nd ed., Pragmatic Programmers, 2024). We took those forensic principles and engineered them into a pipeline that steers an AI agent with deterministic precision.

In modern application security, the traditional method of static analysis (SAST) using abstract syntax tree (AST) parsing and call graphs is a well-studied, well-traveled road. However, traditional scanners lack a fundamental dimension of context: the human element. Vulnerabilities are rarely just syntactic errors; they are the byproduct of diffuse ownership, shifting requirements, and knowledge decay.

When building our agentic AI security scanner, we faced a hard constraint: LLM context windows are finite, and agentic loops are computationally expensive. Put simply, we can’t just feed a giant monorepo into an LLM and ask it to "please find bugs. K, thx, bye." While we already utilize a hybrid LLM-driven context-gathering and SAST-esque multi-agent architecture to surface traditional vulnerabilities, hunting down non-traditional, structural flaws requires an additional layer. For these deeper, complex logic issues, the agent needs a deterministic, high-signal heuristic to prioritize its attention.

To solve this, we introduced a pre-processing pipeline that constructs a Git Behavioral Graph. Before our agent reads a single line of code, it analyzes the repository’s commit history across five distinct behavioral axes.

The Data Ingestion Engine

Parsing git history across a decade-old monorepo is notoriously memory-intensive. Loading the entire commit tree into memory at once guarantees out-of-memory (OOM) crashes.

To solve this, our harvester utilizes an asynchronous subprocess streaming architecture. We pipe git log --numstat with custom delimiters directly into a generator function. This allows us to yield, parse, and evaluate a single commit block at a time, resolving complex git brace rename notations on the fly while maintaining a strictly bounded memory footprint.
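A minimal sketch of such a streaming harvester in Python (the delimiter string, format fields, and helper names here are illustrative assumptions, not the production code):

```python
import subprocess
from typing import Iterator

# Assumed custom delimiter so commit boundaries survive multi-line output.
COMMIT_DELIM = "@@COMMIT@@"

def stream_commits(repo_path: str) -> Iterator[dict]:
    """Yield one parsed commit block at a time, keeping memory strictly bounded."""
    cmd = [
        "git", "-C", repo_path, "log", "--numstat",
        f"--format={COMMIT_DELIM}%H|%at|%an",  # hash | unix timestamp | author
    ]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    commit = None
    for raw in proc.stdout:
        line = raw.rstrip("\n")
        if line.startswith(COMMIT_DELIM):
            if commit:
                yield commit  # emit the previous block before starting a new one
            sha, ts, author = line[len(COMMIT_DELIM):].split("|", 2)
            commit = {"sha": sha, "ts": int(ts), "author": author, "files": []}
        elif commit and "\t" in line:
            added, deleted, path = line.split("\t", 2)
            commit["files"].append({
                "path": resolve_rename(path),
                # --numstat prints "-" for binary files
                "added": 0 if added == "-" else int(added),
                "deleted": 0 if deleted == "-" else int(deleted),
            })
    if commit:
        yield commit
    proc.wait()

def resolve_rename(path: str) -> str:
    """Collapse git brace rename notation: 'src/{old => new}/a.py' -> 'src/new/a.py'."""
    if "{" in path and " => " in path:
        prefix, rest = path.split("{", 1)
        inner, suffix = rest.split("}", 1)
        _, new = inner.split(" => ", 1)
        return (prefix + new + suffix).replace("//", "/")
    return path
```

Because `stream_commits` is a generator, the caller can fold each commit into running aggregates and discard it immediately, which is what keeps the footprint bounded.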

With this streaming foundation, we calculate five high-signal behavioral metrics.

1. Code Churn: Quantifying Diffuse Ownership

Code churn is mathematically one of the strongest historical predictors of vulnerability density. [Tornhill, Your Code as a Crime Scene, 2nd ed.: code churn as a leading predictor of defect density] Our pipeline extracts total revisions, distinct contributor counts, and cumulative line additions/deletions.

Rather than relying on overly complex models, we look for two highly accurate heuristic anomalies:

  • The "Many-Hands" Anti-Pattern: When the number of distinct authors scales linearly with the commit count, ownership becomes completely diffuse. No single engineer holds the comprehensive mental model of the module's security boundaries. 
  • Security Contract Renegotiation: When a file exhibits a high churn velocity where cumulative deleted lines approximate cumulative added lines, it indicates continuous, heavy rewriting. This is a classic signature of a module whose design is fundamentally unstable and lacks a clean architectural reset.
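Both heuristics reduce to simple predicates over per-file churn totals. A sketch with illustrative thresholds (the exact cutoffs we use are not published here):

```python
from dataclasses import dataclass

@dataclass
class FileChurn:
    revisions: int  # total commits touching the file
    authors: int    # distinct contributors
    added: int      # cumulative lines added
    deleted: int    # cumulative lines deleted

def many_hands(c: FileChurn, author_ratio: float = 0.8, min_revs: int = 10) -> bool:
    """'Many-Hands' anti-pattern: distinct authors scale ~linearly with commits."""
    return c.revisions >= min_revs and c.authors / c.revisions >= author_ratio

def contract_renegotiation(c: FileChurn, tolerance: float = 0.25,
                           min_lines: int = 500) -> bool:
    """Security-contract renegotiation: deletions approximate additions at volume."""
    if c.added + c.deleted < min_lines:
        return False  # too little churn to signal anything
    return abs(c.added - c.deleted) / max(c.added, c.deleted) <= tolerance

# A file touched by almost as many people as commits, churning in place:
hot = FileChurn(revisions=40, authors=35, added=2200, deleted=2000)
```

Both predicates fire for the `hot` profile above: ownership is diffuse and the file is being continuously rewritten rather than extended.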

AI coding agents (Codex, Claude, etc.) do not improve the situation or lower the risk here; arguably, the opposite is true. They are not humans with long-term knowledge of the codebase: they hold a point-in-time view of the changing code, follow discrete directions with narrow focus, and they widen the range of contributors as shipping velocity increases. Suddenly your support team can complete their own tickets, and marketing can write their own plugins.

2. Temporal Coupling: Mapping Invisible Dependencies

Auth, payment, and session files often change together in commit after commit with no formal import relationship between them; the coupling degree doesn't lie.

Attackers exploit the invisible gaps between microservices. To find these, we utilize temporal coupling—a forensic graph technique that identifies files that frequently change together in the same changeset, regardless of whether a formal AST import relationship exists. [Tornhill, Your Code as a Crime Scene, 2nd ed.: temporal coupling / change coupling as a method for discovering hidden architectural dependencies]

We model this as a bipartite graph of commits to files. To calculate the coupling degree, we take the ratio of shared revisions to the average individual revision count of the pair:

$\text{degree}(A, B) = \dfrac{\text{shared\_revs}(A, B)}{\left(\text{revs}(A) + \text{revs}(B)\right) / 2}$

Algorithmic Optimizations: Graph algorithms can easily explode in computational complexity. To keep our scanner highly performant, we apply strict pruning heuristics:

  1. Changeset Caps: We ignore massive, repo-wide refactors (e.g., license header sweeps) by filtering out any commit touching more than 30 files.
  2. Cold File Pruning: A file pair must have a baseline history of at least 5 individual revisions. This prevents a cold file from generating a spurious 100% coupling degree after a single co-commit.
  3. Aggressive Garbage Collection: Every 5,000 commits, our streaming engine pauses to prune pairs that only share a single co-commit. Because of our minimum revision threshold, a single co-commit can mathematically never reach our 50% target threshold, making it pure noise that is safe to garbage-collect.
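Putting the coupling metric and the pruning rules together, the pass might look like this (a sketch; the incremental garbage-collection step is omitted for brevity):

```python
from collections import defaultdict
from itertools import combinations

MAX_CHANGESET = 30  # changeset cap: skip repo-wide sweeps
MIN_REVS = 5        # cold-file pruning: baseline history per file
TARGET = 0.5        # coupling-degree reporting threshold

def coupling_degrees(changesets):
    """changesets: iterable of file-path lists, one list per commit.
    Returns {(file_a, file_b): degree} for pairs at or above TARGET."""
    revs = defaultdict(int)    # per-file revision counts
    shared = defaultdict(int)  # per-pair co-commit counts
    for files in changesets:
        files = sorted(set(files))
        if len(files) > MAX_CHANGESET:
            continue  # ignore massive refactors (license sweeps, etc.)
        for f in files:
            revs[f] += 1
        for pair in combinations(files, 2):
            shared[pair] += 1
    result = {}
    for (a, b), n in shared.items():
        if revs[a] < MIN_REVS or revs[b] < MIN_REVS:
            continue  # a cold file would yield a spurious 100% degree
        degree = n / ((revs[a] + revs[b]) / 2)  # shared / average revisions
        if degree >= TARGET:
            result[(a, b)] = degree
    return result
```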

3. Knowledge Decay: Age and Ownership Concentration

Age and ownership determine if anyone actively understands a critical module. We track this by combining two straightforward metrics:

  • Age in Months: Calculated by diffing the file's last modified timestamp against current UTC time. High-age files often rely on deprecated cryptographic wrappers or broken standards (MD5, HTTP Basic Auth) that were patched everywhere else but forgotten here.
  • Ownership Concentration: We calculate the fraction of a file's total surviving code attributed to its primary contributor. If a single developer owns over 90% of a heavily modified file, it acts as a knowledge silo—if they leave, the implicit security context evaporates.
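Both signals are cheap to compute once the harvester has run. An illustrative sketch, where `blame_authors` is assumed to be the per-line author list produced by `git blame`:

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Optional

def age_in_months(last_modified_ts: int, now: Optional[datetime] = None) -> float:
    """Months since the file's last commit (unix seconds), diffed against UTC now."""
    now = now or datetime.now(timezone.utc)
    last = datetime.fromtimestamp(last_modified_ts, tz=timezone.utc)
    return (now - last).days / 30.44  # average Gregorian month length

def ownership_concentration(blame_authors: list) -> tuple:
    """Fraction of surviving lines attributed to the file's primary contributor."""
    counts = Counter(blame_authors)
    author, lines = counts.most_common(1)[0]
    return author, lines / len(blame_authors)
```

A file with both high age and a concentration above 0.9 is exactly the knowledge-silo profile described above.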

4. Temporal Anomalies: Circular Statistics for Threat Hunting

To identify compromised CI pipelines or insider threats, we flag commits made at statistically unusual hours. However, standard linear math fails on clocks—the linear mean of 23:00 and 01:00 is incorrectly calculated as 12:00, destroying the signal.

Instead, we map commit timestamps onto a unit circle using circular statistics. We extract the decimal hour ($h = HH + \frac{MM}{60}$) and convert it to an angle in radians: $\theta = \frac{2\pi h}{24}$.

We compute the mean of the sine and cosine components across an author's history to find their true circular mean, and calculate the circular standard deviation based on the shortest angular distances.

If a commit's angular distance from the author's circular mean exceeds $3.0\sigma$, it triggers an anomaly. To aggressively filter noise, we mandate a baseline commit history per author and completely ignore any anomalies occurring within standard local business hours (9 AM–5 PM).
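The whole check fits in a few lines of Python. A sketch with illustrative baseline and business-hour constants matching the description above:

```python
import math

def circular_stats(hours):
    """Circular mean direction and circular std-dev of decimal commit hours."""
    angles = [2 * math.pi * h / 24 for h in hours]
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    mean = math.atan2(s, c) % (2 * math.pi)
    r = min(1.0, math.hypot(s, c))  # resultant length; 1.0 = fully concentrated
    std = math.sqrt(-2 * math.log(r)) if r > 0 else math.inf
    return mean, std

def is_anomalous(hour, history_hours, sigmas=3.0, min_history=20):
    """Flag a commit hour > `sigmas` circular std-devs from the author's mean."""
    if len(history_hours) < min_history:
        return False  # mandate a baseline commit history per author
    if 9 <= hour < 17:
        return False  # ignore standard local business hours (9 AM-5 PM)
    mean, std = circular_stats(history_hours)
    angle = 2 * math.pi * hour / 24
    # shortest angular distance, handling the midnight wraparound
    dist = abs((angle - mean + math.pi) % (2 * math.pi) - math.pi)
    return dist > sigmas * std
```

Note that the circular mean of 23:00 and 01:00 correctly lands at midnight rather than noon, which is exactly the failure mode of the linear mean.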

5. Intent Mining: The Developer Confessional

Developers narrate their architectural shortcuts in commit messages. Keywords like "bypass," "hardcode," or "remove before" function as explicit confessions of technical debt.

While searching commit messages is trivial with regex, making it actionable for an AI agent is a data projection problem. Historically interesting hacks on files that no longer exist waste the LLM's finite context window.

Therefore, our pipeline strictly intersects the regex matches with the output of git ls-files. If a commit confesses to a "quick fix" and the files modified in that specific commit still exist in the current working tree, the agent is directed to analyze those live paths for latent bypass surfaces. To prevent sensitive data leaks into the LLM context, all matched commit messages are routed through a dedicated secrets sanitizer before being passed to the planner.
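The projection step can be sketched as a pure function over harvested commits (the keyword list here is illustrative, and the secrets sanitizer is elided):

```python
import re

# Illustrative confession keywords; the production pattern is more extensive.
CONFESSION_RE = re.compile(
    r"\b(bypass|hardcoded?|remove before|quick fix|temporary hack|fixme)\b",
    re.IGNORECASE,
)

def live_confessions(commits, tracked_files):
    """Intersect confession-matching commit messages with files that still exist.

    commits: iterable of (message, [paths touched]) tuples from the harvester.
    tracked_files: set of current paths, i.e. the output of `git ls-files`.
    Returns {live_path: [messages]} for the agent's planner (pre-sanitization).
    """
    hits = {}
    for message, paths in commits:
        if not CONFESSION_RE.search(message):
            continue  # not a confession; skip
        for path in paths:
            if path in tracked_files:  # project onto the current working tree
                hits.setdefault(path, []).append(message)
    return hits
```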

How We Orchestrate the Agent: Using Deterministic Code Quality to Bound AI Stochasticity

In software engineering, classic code quality metrics like cyclomatic complexity, code churn, and temporal coupling are traditionally used to predict maintenance burden and technical debt. However, from an AppSec perspective, maintenance burden is computationally indistinguishable from security risk. Where code quality degrades, structural entropy rises, and vulnerabilities inevitably breed.

A seasoned human security auditor does not read a monorepo linearly from main.py. They operate on heuristics and intuition: they check git blame on the cryptographic wrappers, they look for the auth module everyone hot-patches but nobody actually owns, and they zero in on the files littered with "TODO: fix bypass later" comments. By building this Git Behavioral Graph, we are explicitly digitizing and scaling that human intuition into a deterministic data structure. [Tornhill, Your Code as a Crime Scene, 2nd ed.: the core thesis of applying forensic psychology to treat git history as behavioral evidence]

This is crucial because Large Language Models are inherently stochastic engines. If deployed naively against a massive codebase, they suffer from context fragmentation, attention decay, and hallucination. By pre-processing the repository into this behavioral artifact, we fundamentally alter the agent's execution model. We use deterministic, mathematically rigorous algorithms to tightly bound the search space. Only then do we unleash the LLM's stochastic reasoning to evaluate the semantics within that bounded space.

We no longer ask our agent to blindly "find vulnerabilities" in a vacuum. We provide it with a prioritized, JSON-structured map of human risk, pointing its finite context window precisely at the highly coupled, heavily churned, late-night architectural hacks. This synthesis of deterministic graph analysis and agentic semantic reasoning is what allows us to scale elite, human-level security auditing across enterprise codebases.

If you’re building AI-powered security tooling and want to talk through how behavioral analysis fits into your pipeline, we’d love to hear from you.

The behavioral analytics in this post are built on techniques pioneered by Adam Tornhill. For a deep dive into the theory:

  • Your Code as a Crime Scene, Second Edition – Adam Tornhill (Pragmatic Programmers, 2024) – pragprog.com/titles/atcrime2
  • Software Design X-Rays – Adam Tornhill (Pragmatic Programmers) – also worth reading for change coupling algorithms