Documentation
Find the bugs your team missed. Setup, pipeline, memory, commands — everything you need to get started.
Three minutes from zero to your first automated review.
One click at github.com/apps/argus-eye. Works with orgs and personal accounts. Your repos appear in the dashboard immediately.
Choose which repos Argus watches. Enable all or pick specific ones. You can change this any time.
Bring your own key — OpenAI, Anthropic, or any OpenRouter provider. Your key, your costs, your data stays yours.
Every PR triggers Argus automatically. Inline comments appear with one-click suggestion fixes you can commit straight from GitHub.
Choose a review persona, add custom rules, or let Argus learn your team's patterns over time. It gets sharper with every review.
Every PR triggers a nine-stage pipeline. Each stage runs a different model, configurable per-repo. The entire sequence completes in under 60 seconds.
Classifies every changed file as skip, skim, or deep review. Generated files, lockfiles, and vendored dependencies are discarded before a single token is spent.
Lead agent produces a brief for each file, identifying cross-cutting concerns, blast radius, and relevant context from memory before specialists begin.
4 specialists per file run in parallel — bug_hunter, security, architecture, regression — each reviewing with full codebase context and the briefing document.
Union-find deduplication merges overlapping findings across specialists. The same issue caught by multiple specialists collapses into a single finding.
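In sketch form, this step can be modeled with a small union-find: findings that flag the same file with overlapping line ranges are linked, and each linked set collapses to one representative. This is an illustrative reconstruction under assumed shapes (the `Finding` interface and the overlap rule are not the production implementation).

```typescript
// Minimal union-find over finding indices. Two findings are linked when
// they flag the same file with overlapping line ranges; each resulting
// set keeps a single representative finding.
interface Finding {
  file: string;
  startLine: number;
  endLine: number;
  specialist: string;
}

class UnionFind {
  private parent: number[];
  constructor(n: number) {
    this.parent = Array.from({ length: n }, (_, i) => i);
  }
  find(x: number): number {
    // Path compression keeps lookups near-constant.
    if (this.parent[x] !== x) this.parent[x] = this.find(this.parent[x]);
    return this.parent[x];
  }
  union(a: number, b: number): void {
    this.parent[this.find(a)] = this.find(b);
  }
}

function dedupe(findings: Finding[]): Finding[] {
  const uf = new UnionFind(findings.length);
  for (let i = 0; i < findings.length; i++) {
    for (let j = i + 1; j < findings.length; j++) {
      const a = findings[i], b = findings[j];
      const overlaps =
        a.file === b.file &&
        a.startLine <= b.endLine &&
        b.startLine <= a.endLine;
      if (overlaps) uf.union(i, j);
    }
  }
  // Keep the first finding seen in each set.
  const kept = new Map<number, Finding>();
  findings.forEach((f, i) => {
    const root = uf.find(i);
    if (!kept.has(root)) kept.set(root, f);
  });
  return [...kept.values()];
}
```

In practice a semantic-similarity check would likely replace the raw line-overlap test, but the collapse mechanics are the same.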
Validates all findings have correct line ranges, file paths, and comment structure. Malformed findings are repaired or dropped before scoring.
A separate model scores each finding 0–100 independently. Threshold at 65 drops noise. Only high-signal findings survive to the next stage.
Re-reviews hot files (3+ high-scoring comments) with the architecture specialist. Catches systemic issues that only emerge after seeing the full finding set.
Generates a structured verdict with critical issues, warnings, and a top findings summary. Not a paragraph — a scannable, actionable review.
Posts to GitHub as inline comments. Learns reusable patterns, extracts codebase conventions, enriches the PR description with missing context and mermaid diagrams, and builds the architecture graph.
Most review tools see the diff. Argus sees the system.
Before reviewing a single line of code, Argus builds a living model of your codebase that evolves with every review. This is what separates a linter from an engineer.
Argus traces callers, imports, tests, and shared types. When you change a function, Argus already knows who calls it — and what breaks if the contract shifts.
A persistent dependency graph maps every function and class. On each PR, Argus surfaces what downstream code is affected. No more "I didn't realize that module depended on this."
Past bugs, incidents, and edge cases are remembered across team turnover. "The last time this module changed, EU billing broke." Argus remembers so your team doesn't have to.
Every review, every developer reply, every fix builds a living knowledge graph. Patterns that were dismissed stop recurring. Patterns that were confirmed get reinforced.
Argus maintains a world model of your codebase. The more it reviews, the more it understands. Context is not a feature — it is the architecture.
Before you merge, Argus imagines what happens.
Given a PR and known scenarios from your codebase history, Argus simulates execution paths and reports what it finds. Confidence scores tell you how certain the system is.
Root cause: No idempotency key on the cancellation path. Two concurrent requests reach the payment provider — first succeeds, second throws. DB update runs for both.
Impact: Double refund issued. Revenue loss proportional to cancellation volume.
Fix: Add mutex or idempotency key. Wrap call + DB write in a transaction.
Root cause: Deleted user IDs are recycled. Infinite TTL cache serves stale data from the previous account holder.
Impact: Data leakage between accounts. Severity scales with user churn.
Result: Idempotency key already present on this path. Retry is safe. No state corruption detected.
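The fix from the first scenario can be sketched with an atomic claim step. This is an illustrative in-memory model (the `SubscriptionStore` class and `claim` method are hypothetical); in production the claim would be a conditional database update or a provider-side idempotency key.

```typescript
// Two concurrent cancellations race, but only the request that wins the
// atomic claim reaches the refund step, so at most one refund is issued.
type Status = "active" | "cancelling" | "cancelled";

class SubscriptionStore {
  private status: Status = "active";
  refunds = 0;

  // Atomic check-and-set: succeeds for exactly one caller.
  claim(): boolean {
    if (this.status !== "active") return false;
    this.status = "cancelling";
    return true;
  }

  async cancel(): Promise<void> {
    if (!this.claim()) return; // lost the race: no second refund
    await Promise.resolve();   // stand-in for the payment-provider round-trip
    this.refunds += 1;         // refund issued exactly once
    this.status = "cancelled";
  }
}

async function demo(): Promise<number> {
  const sub = new SubscriptionStore();
  // Both requests start before either finishes.
  await Promise.all([sub.cancel(), sub.cancel()]);
  return sub.refunds;
}
```

The naive version checks `status === "active"` and then acts without claiming, which is exactly the check-then-act window the simulation flags.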
Simulation is powered by scenario memory — the richer your review history, the more scenarios Argus can test against. Currently in experimental rollout.
Argus doesn't post a list of findings. It writes you a review the way a senior engineer would — conversational, opinionated, and to the point.
Every review has three layers: the summary, the inline comments, and the feedback loop.
Verdict: Adds 20 utility modules but has critical security and correctness issues that must be fixed before merging.
Critical issues:
src/lib/convert/units.ts:L15 — Hour multiplier is 360,000ms instead of 3,600,000ms
src/lib/filter/predicate.ts:L42 — User input passed directly to RegExp without escaping
Warnings:
src/lib/color/grade.ts:L10 — No NaN check before clamping
src/lib/counter/rolling.ts:L28 — Unbounded bucket array (+4 more)
Every inline comment follows a structured format: what the issue is, why it matters, and a one-click suggestion fix when applicable.
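For illustration, minimal versions of the two critical fixes might look like this. The flagged files' actual code is not shown here, so these are assumed shapes of the fix, not the repo's code.

```typescript
// units.ts fix: one hour is 60 * 60 * 1000 ms. Deriving the constant
// from its factors avoids a dropped zero like 360_000.
const MS_PER_HOUR = 60 * 60 * 1000; // 3_600_000

// predicate.ts fix: escape regex metacharacters before interpolating
// user input into a RegExp.
function escapeRegExp(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function matches(haystack: string, userPattern: string): boolean {
  return new RegExp(escapeRegExp(userPattern)).test(haystack);
}
```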
What: Two concurrent cancellation requests can both pass the status === "active" check. First succeeds at the payment provider, second throws — but the DB update runs for both.
Why: No lock or idempotency key on this path. The check-then-act window is ~200ms under load. This will cause double refunds in production.
Every Argus comment has approval reactions. Your feedback directly shapes future reviews.
Reinforces the pattern. Argus will catch similar issues with higher confidence in future reviews.
Suppresses the pattern. Argus stores a “dismissed” signal and avoids similar false positives going forward.
Watch reviews happen in real time.
When a review is in progress, the review detail page streams live activity via WebSocket. You see exactly what Argus is doing as it happens.
WebSocket-powered real-time updates. See which file is being reviewed, which specialist is assigned, and comments as they arrive.
Watch findings get scored in real time. Low-confidence findings drop out as scoring completes.
Live token usage and cost counter updates as each pipeline stage completes.
Running timer shows total review duration. Auto-scrolls when you're at the bottom, stops auto-scroll when you scroll up to read.
The timeline is collapsible for long reviews. All activity persists in the review detail page after completion.
Every finding is tagged with one of four severity levels. These drive the quality score and determine what gets posted.
Bugs, security vulnerabilities, data loss risks, or logic errors that will cause failures in production.
Performance issues, error handling gaps, race conditions, or code that works but is fragile.
Readability improvements, style consistency, better naming, or minor refactors.
Well-written code, good patterns, clever solutions, or thorough test coverage worth highlighting.
Every finding is also tagged with a category — the type of issue detected.
Injection vulnerabilities, leaked credentials, unsafe deserialization, SSRF, path traversal.
Off-by-one errors, nil dereferences, broken invariants, incorrect boolean logic, missing edge cases.
N+1 queries, unnecessary allocations, missing caching, O(n²) where O(n) is possible.
Swallowed errors, empty catch blocks, missing error propagation, silent fallbacks.
Unclear naming, complex nesting, missing comments on non-obvious logic, dead code.
Formatting inconsistencies, convention violations, import ordering, naming patterns.
Weak type invariants, stringly-typed APIs, missing generics, poor encapsulation.
Missing edge case tests, brittle assertions, untested error paths, test-only code in production.
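As a concrete example of the error handling category, here is the kind of contrast such a finding points at: a silent fallback versus explicit propagation with context. The function names are illustrative.

```typescript
// Flagged: the catch block hides the failure, so callers receive a
// silent empty object instead of an error.
async function loadConfigBad(read: () => Promise<string>) {
  try {
    return JSON.parse(await read());
  } catch {
    return {}; // silent fallback: masks both I/O and parse failures
  }
}

// Preferred: preserve the original error and add context.
async function loadConfig(read: () => Promise<string>) {
  let raw: string;
  try {
    raw = await read();
  } catch (err) {
    throw new Error(`config read failed: ${(err as Error).message}`);
  }
  return JSON.parse(raw); // parse errors propagate as-is
}
```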
Tell Argus what matters to your team. Rules are injected into every review, so every comment reflects your standards — not generic best practices.
Create rules in the dashboard under Rules. Each rule has a category, content, priority, and enabled flag. These apply to all repos in your org.
Add a .argus/rules.md file to your repo. Repo rules override org rules in the same category.
## security
- Always flag hardcoded API keys or secrets
- Check for SQL injection in raw query strings
## performance
- Flag N+1 queries in ORM code
- Warn about unbounded list fetches without pagination
## style
- Enforce camelCase for variables, PascalCase for types
- Require JSDoc on exported functions
All 4 pipeline stages are independently configurable per-repo from the Settings page. Default model depends on your OpenRouter key. Temperature and MaxTokens are adjustable per stage via sliders.
Supported providers: OpenRouter, OpenAI, Anthropic, Azure OpenAI, GCP Vertex AI, AWS Bedrock, and Zhipu AI. Custom model names are supported — enter any model identifier your provider accepts.
Your keys, your models, your bill. Argus never stores prompts or code on our servers — API calls go straight from our backend to your chosen provider. No hidden costs, no surprises.
Keys are masked as sk-...**** everywhere; the full key is never sent to the frontend. We never see your code. Without a key configured, Argus posts a friendly onboarding comment on your first PR linking to Settings.
Not every PR needs the same reviewer. Personas tune the tone, focus, and severity threshold — from a gentle mentor to a zero-mercy auditor. Set a default per-repo or override per-PR.
Balanced across all categories. The standard Argus experience most teams start with.
Treats every PR like a pen test. Injection risks, auth flaws, data exposure, SSRF.
Hunts N+1 queries, memory leaks, O(n²) loops, and missing cache invalidation.
Explains the why behind every comment. Suggests learning resources. Built for growing teams.
Thinks in boundaries. API contracts, separation of concerns, dependency direction.
No free passes. Comments on everything. Maximum coverage, minimum mercy.
Define your own persona with a freeform system prompt. Full control over tone, focus, and severity.
Override per-PR with @argus-eye review --persona strict
@argus-eye review --persona security_auditor
Talk to Argus directly from any PR. Mention @argus-eye followed by a command and it responds in seconds.
@argus-eye review
Trigger a full review. Add --force to re-review at the same SHA. Add --persona to switch style for this PR only.
@argus-eye review --force --persona mentor
@argus-eye remember <pattern>
Teach Argus something new. Saves a pattern to memory for future reviews. Add --org to apply across all repos.
@argus-eye remember --org always check for SQL injection in raw queries
@argus-eye resolve
Scans all unresolved review threads and resolves ones where the referenced file has been updated in the latest push.
@argus-eye resolve
@argus-eye fix
Applies every suggestion block from the review as a single atomic commit pushed straight to your PR branch.
@argus-eye fix
@argus-eye test
Generate a test plan from review findings. Covers unit, edge case, integration, and regression tests.
@argus-eye test
@argus-eye test --code
Draft executable test code for findings, matching your project's framework and conventions.
@argus-eye test --code
@argus-eye review --persona <name>
Review with a specific persona for this PR only. Overrides the repo default.
@argus-eye review --persona strict
@argus-eye help
Lists all available commands and their usage right in the PR.
@argus-eye help
Turn review findings into tests before you merge.
Argus analyzes its own findings and generates targeted test plans or executable test code. No more “I'll add a test later.”
@argus-eye test generates a structured test plan covering unit tests, edge cases, integration tests, and regression tests — all derived from the review findings on the current PR.
@argus-eye test --code drafts ready-to-run test code that matches your project's testing framework and conventions. Copy, paste, run.
Test generation uses the same review context and memory that powers the review pipeline. The richer the review, the better the tests.
Most tools forget between PRs. Argus remembers everything.
Every review, every developer reaction, every fix and dismissal feeds a growing knowledge base. The system doesn't just review code — it accumulates institutional memory that survives team turnover.
Code conventions auto-learned from your codebase. Error handling styles, naming patterns, architecture decisions — extracted from what your team actually writes, not what a style guide says.
Three sources: auto-extracted from reviews, auto-imported from GitHub Issues labeled argus or bug, and manual via bot command. Each scenario includes steps, initial state, and expected outcome. Scenarios are marked outdated when referenced files change. React 👎 to dismiss.
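For illustration only, a scenario record might look like the following. The field names are assumptions for this sketch, not Argus's actual internal schema.

```typescript
// Hypothetical scenario shape: steps, initial state, and expected
// outcome, as described above. All field names are illustrative.
interface Scenario {
  id: string;
  source: "review" | "github_issue" | "bot_command";
  initialState: string;     // e.g. "subscription active, one pending invoice"
  steps: string[];          // ordered actions to simulate
  expectedOutcome: string;  // what should happen if the code is correct
  outdated: boolean;        // flipped when referenced files change
  referencedFiles: string[];
}

const example: Scenario = {
  id: "scn_cancel_twice",
  source: "review",
  initialState: "subscription active, one pending invoice",
  steps: ["POST /cancel", "POST /cancel (concurrent)"],
  expectedOutcome: "exactly one refund is issued",
  outdated: false,
  referencedFiles: ["src/billing/cancel.ts"],
};
```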
Every review comment, every developer reply, every approval and dismissal. This is review history as institutional memory. Why was this pattern introduced? Who approved it? What broke last time?
The "event clock" of your codebase. A living record of why things are the way they are — connecting reviews, patterns, scenarios, and code changes into a navigable knowledge graph.
Every review makes the system smarter. Patterns that get approved are reinforced. Patterns that get dismissed are suppressed. Scenarios that match real bugs get higher confidence. Over time, Argus converges on your team's actual standards — not generic rules, but the hard-won knowledge that usually lives only in senior engineers' heads.
Your codebase has a health score now.
The Insights dashboard aggregates everything Argus learns into an operational view of your codebase. Not vanity metrics — actionable risk signals drawn from real review data.
Files most frequently flagged across reviews. These are the parts of your codebase that keep breaking — the modules that need a rewrite or better test coverage.
Per-file and per-module risk scores based on severity history, change frequency, and unresolved findings. Higher risk = higher attention from Argus.
A chronological view of every review, reaction, and pattern learned. See how your codebase quality trends over time — and which decisions shaped it.
Track quality scores across PRs, repos, and teams. Spot regressions before they compound. Know when a refactor is paying off.
Know exactly what every review costs.
Argus records per-stage token usage and cost for every review. Model and provider are tracked independently for each stage. Token data persists even on failed reviews.
Token usage tracked for: triage, review, scoring, synthesis, enrichment, conventions, patterns, file_synthesis, and graph. Each stage records input tokens, output tokens, model, and cost.
Hover any TokenPill in the review detail page to see the full cost breakdown per stage, including model name and provider.
Every advanced capability can be toggled independently per-repo. Start with the defaults and enable features as your team is ready.
Enables the 4-specialist parallel review (bug_hunter, security, architecture, regression) per file.
Enables dependency tracing and caller analysis across your codebase during review.
Maps downstream impact of every change using the persistent dependency graph.
Simulates execution paths against known scenarios. Reports confidence, root cause, and impact.
Auto-enriches PR descriptions with missing context and mermaid diagrams.
Learns reusable patterns from high-confidence findings across reviews.
Extracts codebase conventions from diffs — naming, error handling, architecture patterns.
Creates per-file institutional memory — summaries of what each file does and how it has changed.
Extracts dependency graph from code changes. Powers blast radius analysis and cross-file context.
All toggles are accessible from Settings in the dashboard. Changes take effect on the next review.