Documentation
Find the bugs your team missed. Setup, pipeline, memory, commands — everything you need to get started.
Three minutes from zero to your first automated review.
One click at github.com/apps/argus-eye. Works with orgs and personal accounts. Your repos appear in the dashboard immediately.
Choose which repos Argus watches. Enable all or pick specific ones. You can change this any time.
Bring your own key — OpenAI, Anthropic, or any OpenRouter provider. Your key, your costs, your data stays yours.
Every PR triggers Argus automatically. Inline comments appear with one-click suggestion fixes you can commit straight from GitHub.
Choose a review persona, add custom rules, or let Argus learn your team's patterns over time. It gets sharper with every review.
Every PR triggers a nine-stage pipeline. Each stage runs a different model, configurable per-repo. The entire sequence completes in under 60 seconds.
Classifies every changed file as skip, skim, or deep review. Generated files, lockfiles, and vendored dependencies are discarded before a single token is spent.
Lead agent produces a brief for each file, identifying cross-cutting concerns, blast radius, and relevant context from memory before specialists begin.
4 specialists per file run in parallel — bug_hunter, security, architecture, regression — each reviewing with full codebase context and the briefing document.
Union-find deduplication merges overlapping findings across specialists. The same issue caught by multiple specialists collapses into a single finding.
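In sketch form, this step can be modeled with a small union-find: findings that flag the same file with overlapping line ranges are linked, and each linked set collapses to one representative. This is an illustrative reconstruction under assumed shapes (the `Finding` interface and the overlap rule are not the production implementation).

```typescript
// Minimal union-find over finding indices. Two findings are linked when
// they flag the same file with overlapping line ranges; each resulting
// set keeps a single representative finding.
interface Finding {
  file: string;
  startLine: number;
  endLine: number;
  specialist: string;
}

class UnionFind {
  private parent: number[];
  constructor(n: number) {
    this.parent = Array.from({ length: n }, (_, i) => i);
  }
  find(x: number): number {
    // Path compression keeps lookups near-constant.
    if (this.parent[x] !== x) this.parent[x] = this.find(this.parent[x]);
    return this.parent[x];
  }
  union(a: number, b: number): void {
    this.parent[this.find(a)] = this.find(b);
  }
}

function dedupe(findings: Finding[]): Finding[] {
  const uf = new UnionFind(findings.length);
  for (let i = 0; i < findings.length; i++) {
    for (let j = i + 1; j < findings.length; j++) {
      const a = findings[i], b = findings[j];
      const overlaps =
        a.file === b.file &&
        a.startLine <= b.endLine &&
        b.startLine <= a.endLine;
      if (overlaps) uf.union(i, j);
    }
  }
  // Keep the first finding seen in each set.
  const kept = new Map<number, Finding>();
  findings.forEach((f, i) => {
    const root = uf.find(i);
    if (!kept.has(root)) kept.set(root, f);
  });
  return [...kept.values()];
}
```

In practice a semantic-similarity check would likely replace the raw line-overlap test, but the collapse mechanics are the same.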
Validates all findings have correct line ranges, file paths, and comment structure. Malformed findings are repaired or dropped before scoring.
A separate model scores each finding 0–100 independently. Threshold at 65 drops noise. Only high-signal findings survive to the next stage.
Re-reviews hot files (3+ high-scoring comments) with the architecture specialist. Catches systemic issues that only emerge after seeing the full finding set.
Generates a structured verdict with critical issues, warnings, and a top findings summary. Not a paragraph — a scannable, actionable review.
Posts to GitHub as inline comments. Learns reusable patterns, extracts codebase conventions, enriches the PR description with missing context and mermaid diagrams, and builds the architecture graph.
Most review tools see the diff. Argus sees the system.
Before reviewing a single line of code, Argus builds a living model of your codebase that evolves with every review. This is what separates a linter from an engineer.
Argus traces callers, imports, tests, and shared types. When you change a function, Argus already knows who calls it — and what breaks if the contract shifts.
A persistent dependency graph maps every function and class. On each PR, Argus surfaces what downstream code is affected. No more "I didn't realize that module depended on this."
Past bugs, incidents, and edge cases are remembered across team turnover. "The last time this module changed, EU billing broke." Argus remembers so your team doesn't have to.
Every review, every developer reply, every fix builds a living knowledge graph. Patterns that were dismissed stop recurring. Patterns that were confirmed get reinforced.
Argus maintains a world model of your codebase. The more it reviews, the more it understands. Context is not a feature — it is the architecture.
Before you merge, Argus imagines what happens.
Given a PR and known scenarios from your codebase history, Argus simulates execution paths and reports what it finds. Confidence scores tell you how certain the system is.
Root cause: No idempotency key on the cancellation path. Two concurrent requests reach the payment provider — first succeeds, second throws. DB update runs for both.
Impact: Double refund issued. Revenue loss proportional to cancellation volume.
Fix: Add mutex or idempotency key. Wrap call + DB write in a transaction.
Root cause: Deleted user IDs are recycled. Infinite TTL cache serves stale data from the previous account holder.
Impact: Data leakage between accounts. Severity scales with user churn.
Result: Idempotency key already present on this path. Retry is safe. No state corruption detected.
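The fix from the first scenario can be sketched with an atomic claim step. This is an illustrative in-memory model (the `SubscriptionStore` class and `claim` method are hypothetical); in production the claim would be a conditional database update or a provider-side idempotency key.

```typescript
// Two concurrent cancellations race, but only the request that wins the
// atomic claim reaches the refund step, so at most one refund is issued.
type Status = "active" | "cancelling" | "cancelled";

class SubscriptionStore {
  private status: Status = "active";
  refunds = 0;

  // Atomic check-and-set: succeeds for exactly one caller.
  claim(): boolean {
    if (this.status !== "active") return false;
    this.status = "cancelling";
    return true;
  }

  async cancel(): Promise<void> {
    if (!this.claim()) return; // lost the race: no second refund
    await Promise.resolve();   // stand-in for the payment-provider round-trip
    this.refunds += 1;         // refund issued exactly once
    this.status = "cancelled";
  }
}

async function demo(): Promise<number> {
  const sub = new SubscriptionStore();
  // Both requests start before either finishes.
  await Promise.all([sub.cancel(), sub.cancel()]);
  return sub.refunds;
}
```

The naive version checks `status === "active"` and then acts without claiming, which is exactly the check-then-act window the simulation flags.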
Simulation is powered by scenario memory — the richer your review history, the more scenarios Argus can test against. Currently in experimental rollout.
Argus doesn't post a list of findings. It writes you a review the way a senior engineer would — conversational, opinionated, and to the point.
Every review has three layers: the summary, the inline comments, and the feedback loop.
Verdict: Adds 20 utility modules but has critical security and correctness issues that must be fixed before merging.
Critical issues:
src/lib/convert/units.ts:L15 — Hour multiplier is 360,000ms instead of 3,600,000ms
src/lib/filter/predicate.ts:L42 — User input passed directly to RegExp without escaping
Warnings:
src/lib/color/grade.ts:L10 — No NaN check before clamping
src/lib/counter/rolling.ts:L28 — Unbounded bucket array (+4 more)
Every inline comment follows a structured format: what the issue is, why it matters, and a one-click suggestion fix when applicable.
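For illustration, minimal versions of the two critical fixes might look like this. The flagged files' actual code is not shown here, so these are assumed shapes of the fix, not the repo's code.

```typescript
// units.ts fix: one hour is 60 * 60 * 1000 ms. Deriving the constant
// from its factors avoids a dropped zero like 360_000.
const MS_PER_HOUR = 60 * 60 * 1000; // 3_600_000

// predicate.ts fix: escape regex metacharacters before interpolating
// user input into a RegExp.
function escapeRegExp(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function matches(haystack: string, userPattern: string): boolean {
  return new RegExp(escapeRegExp(userPattern)).test(haystack);
}
```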
What: Two concurrent cancellation requests can both pass the status === "active" check. First succeeds at the payment provider, second throws — but the DB update runs for both.
Why: No lock or idempotency key on this path. The check-then-act window is ~200ms under load. This will cause double refunds in production.
Every Argus comment has approval reactions. Your feedback directly shapes future reviews.
Reinforces the pattern. Argus will catch similar issues with higher confidence in future reviews.
Suppresses the pattern. Argus stores a “dismissed” signal and avoids similar false positives going forward.
Watch reviews happen in real time.
When a review is in progress, the review detail page streams live activity via WebSocket. You see exactly what Argus is doing as it happens.
WebSocket-powered real-time updates. See which file is being reviewed, which specialist is assigned, and comments as they arrive.
Watch findings get scored in real time. Low-confidence findings drop out as scoring completes.
Live token usage and cost counter updates as each pipeline stage completes.
Running timer shows total review duration. Auto-scrolls when you're at the bottom, stops auto-scroll when you scroll up to read.
The timeline is collapsible for long reviews. All activity persists in the review detail page after completion.
Every finding is tagged with one of four severity levels. These drive the quality score and determine what gets posted.
Bugs, security vulnerabilities, data loss risks, or logic errors that will cause failures in production.
Performance issues, error handling gaps, race conditions, or code that works but is fragile.
Readability improvements, style consistency, better naming, or minor refactors.
Well-written code, good patterns, clever solutions, or thorough test coverage worth highlighting.
Every finding is also tagged with a category — the type of issue detected.
Injection vulnerabilities, leaked credentials, unsafe deserialization, SSRF, path traversal.
Off-by-one errors, nil dereferences, broken invariants, incorrect boolean logic, missing edge cases.
N+1 queries, unnecessary allocations, missing caching, O(n²) where O(n) is possible.
Swallowed errors, empty catch blocks, missing error propagation, silent fallbacks.
Unclear naming, complex nesting, missing comments on non-obvious logic, dead code.
Formatting inconsistencies, convention violations, import ordering, naming patterns.
Weak type invariants, stringly-typed APIs, missing generics, poor encapsulation.
Missing edge case tests, brittle assertions, untested error paths, test-only code in production.
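As a concrete example of the error handling category, here is the kind of contrast such a finding points at: a silent fallback versus explicit propagation with context. The function names are illustrative.

```typescript
// Flagged: the catch block hides the failure, so callers receive a
// silent empty object instead of an error.
async function loadConfigBad(read: () => Promise<string>) {
  try {
    return JSON.parse(await read());
  } catch {
    return {}; // silent fallback: masks both I/O and parse failures
  }
}

// Preferred: preserve the original error and add context.
async function loadConfig(read: () => Promise<string>) {
  let raw: string;
  try {
    raw = await read();
  } catch (err) {
    throw new Error(`config read failed: ${(err as Error).message}`);
  }
  return JSON.parse(raw); // parse errors propagate as-is
}
```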
Tell Argus what matters to your team. Rules are injected into every review, so every comment reflects your standards — not generic best practices.
Create rules in the dashboard under Rules. Each rule has a category, content, priority, and enabled flag. These apply to all repos in your org.
Add a .argus/rules.md file to your repo. Repo rules override org rules in the same category.
## security
- Always flag hardcoded API keys or secrets
- Check for SQL injection in raw query strings
## performance
- Flag N+1 queries in ORM code
- Warn about unbounded list fetches without pagination
## style
- Enforce camelCase for variables, PascalCase for types
- Require JSDoc on exported functions
All 4 pipeline stages are independently configurable per-repo from the Settings page. Default model depends on your OpenRouter key. Temperature and MaxTokens are adjustable per stage via sliders.
Supported providers: OpenRouter, OpenAI, Anthropic, Azure OpenAI, GCP Vertex AI, AWS Bedrock, and Zhipu AI. Custom model names are supported — enter any model identifier your provider accepts.
Your keys, your models, your bill. Argus never stores prompts or code on our servers — API calls go straight from our backend to your chosen provider. No hidden costs, no surprises.
Keys are masked as sk-...**** everywhere; the full key is never sent to the frontend. We never see your code. Without a key configured, Argus posts a friendly onboarding comment on your first PR linking to Settings.
Not every PR needs the same reviewer. Personas tune the tone, focus, and severity threshold — from a gentle mentor to a zero-mercy auditor. Set a default per-repo or override per-PR.
Balanced across all categories. The standard Argus experience most teams start with.
Treats every PR like a pen test. Injection risks, auth flaws, data exposure, SSRF.
Hunts N+1 queries, memory leaks, O(n²) loops, and missing cache invalidation.
Explains the why behind every comment. Suggests learning resources. Built for growing teams.
Thinks in boundaries. API contracts, separation of concerns, dependency direction.
No free passes. Comments on everything. Maximum coverage, minimum mercy.
Define your own persona with a freeform system prompt. Full control over tone, focus, and severity.
Override per-PR with @argus-eye review --persona strict
@argus-eye review --persona security_auditor
Talk to Argus directly from any PR. Mention @argus-eye followed by a command and it responds in seconds.
@argus-eye review
Trigger a full review. Add --force to re-review at the same SHA. Add --persona to switch style for this PR only.
@argus-eye review --force --persona mentor
@argus-eye remember <pattern>
Teach Argus something new. Saves a pattern to memory for future reviews. Add --org to apply across all repos.
@argus-eye remember --org always check for SQL injection in raw queries
@argus-eye resolve
Scans all unresolved review threads and resolves ones where the referenced file has been updated in the latest push.
@argus-eye resolve
@argus-eye fix
Applies every suggestion block from the review as a single atomic commit pushed straight to your PR branch.
@argus-eye fix
@argus-eye test
Generate a test plan from review findings. Covers unit, edge case, integration, and regression tests.
@argus-eye test
@argus-eye test --code
Draft executable test code for findings, matching your project's framework and conventions.
@argus-eye test --code
@argus-eye review --persona <name>
Review with a specific persona for this PR only. Overrides the repo default.
@argus-eye review --persona strict
@argus-eye help
Lists all available commands and their usage right in the PR.
@argus-eye help
Turn review findings into tests before you merge.
Argus analyzes its own findings and generates targeted test plans or executable test code. No more “I'll add a test later.”
@argus-eye test generates a structured test plan covering unit tests, edge cases, integration tests, and regression tests — all derived from the review findings on the current PR.
@argus-eye test --code drafts ready-to-run test code that matches your project's testing framework and conventions. Copy, paste, run.
Test generation uses the same review context and memory that powers the review pipeline. The richer the review, the better the tests.
Most tools forget between PRs. Argus remembers everything.
Every review, every developer reaction, every fix and dismissal feeds a growing knowledge base. The system doesn't just review code — it accumulates institutional memory that survives team turnover.
Code conventions auto-learned from your codebase. Error handling styles, naming patterns, architecture decisions — extracted from what your team actually writes, not what a style guide says.
Three sources: auto-extracted from reviews, auto-imported from GitHub Issues labeled argus or bug, and manual via bot command. Each scenario includes steps, initial state, and expected outcome. Scenarios are marked outdated when referenced files change. React 👎 to dismiss.
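For illustration only, a scenario record might look like the following. The field names are assumptions for this sketch, not Argus's actual internal schema.

```typescript
// Hypothetical scenario shape: steps, initial state, and expected
// outcome, as described above. All field names are illustrative.
interface Scenario {
  id: string;
  source: "review" | "github_issue" | "bot_command";
  initialState: string;     // e.g. "subscription active, one pending invoice"
  steps: string[];          // ordered actions to simulate
  expectedOutcome: string;  // what should happen if the code is correct
  outdated: boolean;        // flipped when referenced files change
  referencedFiles: string[];
}

const example: Scenario = {
  id: "scn_cancel_twice",
  source: "review",
  initialState: "subscription active, one pending invoice",
  steps: ["POST /cancel", "POST /cancel (concurrent)"],
  expectedOutcome: "exactly one refund is issued",
  outdated: false,
  referencedFiles: ["src/billing/cancel.ts"],
};
```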
Every review comment, every developer reply, every approval and dismissal. This is review history as institutional memory. Why was this pattern introduced? Who approved it? What broke last time?
The "event clock" of your codebase. A living record of why things are the way they are — connecting reviews, patterns, scenarios, and code changes into a navigable knowledge graph.
Every review makes the system smarter. Patterns that get approved are reinforced. Patterns that get dismissed are suppressed. Scenarios that match real bugs get higher confidence. Over time, Argus converges on your team's actual standards — not generic rules, but the hard-won knowledge that usually lives only in senior engineers' heads.
Your codebase has a health score now.
The Insights dashboard aggregates everything Argus learns into an operational view of your codebase. Not vanity metrics — actionable risk signals drawn from real review data.
Files most frequently flagged across reviews. These are the parts of your codebase that keep breaking — the modules that need a rewrite or better test coverage.
Per-file and per-module risk scores based on severity history, change frequency, and unresolved findings. Higher risk = higher attention from Argus.
A chronological view of every review, reaction, and pattern learned. See how your codebase quality trends over time — and which decisions shaped it.
Track quality scores across PRs, repos, and teams. Spot regressions before they compound. Know when a refactor is paying off.
Know exactly what every review costs.
Argus records per-stage token usage and cost for every review. Model and provider are tracked independently for each stage. Token data persists even on failed reviews.
Token usage tracked for: triage, review, scoring, synthesis, enrichment, conventions, patterns, file_synthesis, and graph. Each stage records input tokens, output tokens, model, and cost.
Hover any TokenPill in the review detail page to see the full cost breakdown per stage, including model name and provider.
Every advanced capability can be toggled independently per-repo. Start with the defaults and enable features as your team is ready.
Enables the 4-specialist parallel review (bug_hunter, security, architecture, regression) per file.
Enables dependency tracing and caller analysis across your codebase during review.
Maps downstream impact of every change using the persistent dependency graph.
Simulates execution paths against known scenarios. Reports confidence, root cause, and impact.
Auto-enriches PR descriptions with missing context and mermaid diagrams.
Learns reusable patterns from high-confidence findings across reviews.
Extracts codebase conventions from diffs — naming, error handling, architecture patterns.
Creates per-file institutional memory — summaries of what each file does and how it has changed.
Extracts dependency graph from code changes. Powers blast radius analysis and cross-file context.
All toggles are accessible from Settings in the dashboard. Changes take effect on the next review.