Hacker News | jjjutla's comments

No, we didn't build one; we use the main foundation models. We have evals for each part of the workflow, and different models perform better on different tasks. Overall, the majority of the workflow uses Sonnet 4.


Yes, that's exactly what we do. Some examples: https://github.com/eosphoros-ai/DB-GPT/pull/2650, https://github.com/dagster-io/dagster/pull/30002

We just need to follow responsible disclosure first by notifying the maintainers, working with them on a fix, and making it public once it is resolved.


Thanks, we use a similar approach to GitHub's stack graphs (https://github.blog/open-source/introducing-stack-graphs/) to build a graph structure with definition/reference nodes. For dynamic typing, we use the language compiler as an intermediary to resolve dynamic types into static relationships, then encode those relationships into protobuf.
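Conceptually, the stored structure looks something like this (illustrative Python sketch, not the actual protobuf schema):

    from dataclasses import dataclass

    # Illustrative node shapes only; the real schema is a protobuf.
    @dataclass
    class Definition:
        symbol: str   # e.g. "billing.service.charge_card"
        file: str
        line: int

    @dataclass
    class Reference:
        symbol: str
        file: str
        line: int
        # Filled in by the compiler pass, even when the type is only
        # known dynamically at the call site.
        resolved_to: Definition | None = None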

Yes, we don't feed entire codebases to the LLM. The LLM queries our indexer for symbol names and code sections (exposed functions, data flow boundaries, sanitization functions) to build up the call chain and reason about the vulnerability.
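The retrieval loop is roughly like this (the method names are made up for illustration, not our real interface):

    def gather_context(indexer, entrypoint: str, sink: str) -> list[str]:
        # Instead of dumping whole files, collect only the code sections on
        # the path from an exposed endpoint symbol to a sensitive sink.
        # `indexer.snippet` and `indexer.callees` are hypothetical methods.
        sections, frontier, seen = [], [entrypoint], set()
        while frontier:
            symbol = frontier.pop()
            if symbol in seen:
                continue
            seen.add(symbol)
            sections.append(indexer.snippet(symbol))      # source for this symbol
            if symbol != sink:
                frontier.extend(indexer.callees(symbol))  # next hops in the chain
        return sections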


We’ve limited the free tier to one scan per user, so deleting a scan and starting a new one won’t work because of that restriction.

And yes, we don’t support C or C++ yet. Our focus is on detecting business logic vulnerabilities (auth bypasses, privilege escalations, IDORs) that traditional SAST tools often miss. The types of exploitable security issues typically found in C/C++ (mainly memory corruption) are better found through fuzzing and dynamic testing than through static analysis.


I understand it is not your focus, but fuzzing still falls short in places where AI can help a lot. For example, when there is a checksum, fuzzers typically can't make progress, and this is "solved" by disabling the checks when building for fuzzing. AI can just look at the source code that computes the checksum and write code to fill it in, or use its world knowledge to recognize that the function is named sha_256 and import Python hashlib, etc.
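E.g., for a toy format whose last 32 bytes are a SHA-256 of the body, the fix-up the model needs to write is just:

    import hashlib

    def fix_checksum(mutated_body: bytes) -> bytes:
        # Recompute the trailing SHA-256 so the mutated input still passes
        # the parser's integrity check instead of being rejected before the
        # interesting code is reached. (Toy example, not a real format.)
        return mutated_body + hashlib.sha256(mutated_body).digest()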

Hint: we are working on this, and it can easily expand coverage in OSS-Fuzz even for targets that have been fuzzed for a long time with an enormous amount of compute.


Although a lot of the popular attention is directed toward buffer overflows and use-after-free errors, that does not mean C programs are free of the same business logic vulnerabilities as programs written in other languages, or even that those errors are less frequent; just that buffer overflows are easier to detect.

The other language I would put next on the priority list is Java, which Gecko also doesn't seem to support. I guess Gecko is more web-oriented, which makes sense for a security tool.

Anyway, I wish you lots of success!


Thank you. SAST tools built on AST or call graph parsing will struggle to detect code logic vulnerabilities because their models are too simplistic. They lose the language-specific semantics in dynamically typed languages where objects change at runtime, or in microservices where calls span multiple services. So they are limited to simple pattern-based detections and miss vulnerabilities that depend on long cross-file call chains and reflected function calls. These are the types of paths that auth bypasses and privilege escalations occur in.
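A toy example of the reflected-call case (illustrative Python, not from a real codebase):

    # The callee is chosen by a runtime string, so a purely pattern/AST-based
    # tool never connects the route to the admin-only function, and a missing
    # auth check on that path goes unnoticed.

    def list_items(request): ...

    def admin_delete_user(request): ...   # should be admin-only

    HANDLERS = {f.__name__: f for f in (list_items, admin_delete_user)}

    def dispatch(request):
        action = request.params["action"]   # attacker-controlled
        return HANDLERS[action](request)    # reflected call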

AI code review tools aren’t designed for security analysis at all. They work using vector search or RAG to find relevant files, which is imprecise for retrieving these code paths in high token density projects. So any reasoning the LLM does is built on incomplete or incorrect context.

Our indexer uses LSIF for compiler-accurate symbol resolution, so we can reconstruct full call chains spanning files, modules, and services with the same accuracy as an IDE. This code reasoning, combined with the LLM's threat modelling and analysis, allows for higher-fidelity outputs.
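For anyone unfamiliar with LSIF: the dump is newline-delimited JSON of vertices and edges, and resolving a reference back to its definition is a graph walk. A simplified version (real LSIF has more vertex and edge kinds):

    import json
    from collections import defaultdict

    def definition_of(lsif_path: str, range_id: int) -> list[dict]:
        # Simplified walk: range --next--> resultSet
        #   --textDocument/definition--> definitionResult --item--> defining range(s)
        vertices, out_edges = {}, defaultdict(list)
        with open(lsif_path) as f:
            for line in f:
                if not line.strip():
                    continue
                obj = json.loads(line)
                if obj["type"] == "vertex":
                    vertices[obj["id"]] = obj
                else:
                    out_edges[obj["outV"]].append(obj)

        def follow(vertex_id, label):
            for edge in out_edges.get(vertex_id, []):
                if edge["label"] == label:
                    return edge.get("inV") or edge.get("inVs")
            return None

        result_set = follow(range_id, "next")
        def_result = follow(result_set, "textDocument/definition")
        return [vertices[r] for r in (follow(def_result, "item") or [])]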


All the vulns Gecko found were manually validated by humans and have a CVE assigned by a CNA. The issue curl had was that, because it runs a paid bug bounty program, it got an influx of AI-slop reports that looked like real issues but weren't exploitable.


The confidence score is calculated from two factors: whether the function call chain represents a valid code path (programmatic correctness) and how well it aligns with the defined threat model for what it believes is a security vulnerability. False positives usually stem from incorrect assumptions about context, for example flagging endpoints as missing authentication when that behaviour is actually intended.
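Roughly, you can think of it as (illustrative only, the real scoring is more involved):

    def confidence(path_validity: float, threat_alignment: float) -> float:
        # Both factors are in [0, 1]. A multiplicative combination (an
        # illustration, not the exact formula) means a finding needs both a
        # real, reachable call chain and a genuine threat-model violation
        # to score highly.
        return path_validity * threat_alignment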

Was this an incorrect code path or an incorrect understanding of a security issue?

This is why we focus heavily on threat modelling and on defining the security and business invariants that must hold. At the code level, the only context we can infer comes from developer intent and data flow analysis.

Something we are working on is custom rules, and allowing a user to add context when starting a scan to improve alignment and reduce false positives.
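As a purely illustrative shape for what that might look like (the format isn't final):

    # Hypothetical user-supplied rule / scan context, for illustration only.
    custom_rule = {
        "name": "org-scoped access",
        "invariant": "Handlers under /api/orgs/{org_id}/ must verify that the "
                     "authenticated user belongs to org_id.",
        "intentionally_unauthenticated": ["/health", "/api/webhooks/stripe"],
    }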


The security issue and PoCs provided were not real. They said there was a vuln, but I double-checked and it was not exploitable.


We've had a few requests for Elixir, and it's definitely something we will work on.


This is a bug; the email-address permission has been descoped to read-only. Profile settings are either read/write or none, hence the former. If you're concerned about privacy, sign up using email/password.

