# Pharaoh — Complete Documentation

> Pharaoh maps your codebase into a knowledge graph and gives AI coding tools full architectural awareness — every function relationship, dependency chain, module boundary, and entry point — before they write a single line.

## The Problem Pharaoh Solves

AI coding tools read your codebase one file at a time. On a 200-file repo, the AI sees fragments — never architecture. The result:

- It creates a retry utility without knowing one exists three modules away
- It refactors a function without knowing 14 downstream callers depend on the exact signature
- It writes new endpoints that are never wired to any entry point
- It produces PRDs that ignore existing module boundaries and duplicate existing functionality
- It burns 40K+ context tokens reading files to build understanding that a 2K-token graph query provides instantly

This isn't an AI limitation. It's a context limitation. The AI is smart enough — it just can't see the codebase.

## How Pharaoh Works

Pharaoh gives AI tools the complete structural picture:

1. **Install the GitHub App** (read-only access, 30 seconds)
2. **Automatic parsing** — tree-sitter extracts every function, import, export, call chain, endpoint, cron handler, and environment variable from TypeScript and Python codebases
3. **Knowledge graph** — all relationships stored in Neo4j with dual-layer per-tenant isolation
4. **MCP endpoint** — your AI tool queries the graph silently before every decision
5. **Always current** — webhooks re-map on every push to your default branch
6. **PR Guard** — automated structural checks on every pull request (reachability, regression risk, test coverage, breaking changes)

No config files. No per-repo setup. No maintenance. Source code is read during parsing and immediately discarded — only structural metadata is stored.
## Languages Supported {#languages}

- TypeScript (full support — imports, exports, call chains, decorators, barrel files)
- Python (full support — imports, exports, call chains, decorators)
- More languages planned (tree-sitter supports 100+ languages; Pharaoh's Cartographer adds semantic understanding layer by layer)

## Without Pharaoh vs With Pharaoh {#without-vs-with}

### Without Pharaoh

- **Planning a refactor**: AI reads files hoping to find all callers. Misses indirect callers 3 hops away. Breaks an endpoint it didn't know existed.
- **Creating a utility function**: AI writes a new helper. A nearly identical function already exists in another module. Now you have two.
- **Reviewing a PR**: Reviewer manually traces imports to check if new code is wired up. Misses that a new export is unreachable from any entry point.
- **Writing a PRD**: AI produces a spec that proposes building functionality that already exists, in a module it didn't explore.

### With Pharaoh

- **Planning a refactor**: AI queries blast radius — gets every downstream caller, affected endpoint, and impacted cron job. Knows the exact risk before touching anything.
- **Creating a utility function**: AI searches the graph first. Finds the existing function. Imports it instead of duplicating.
- **Reviewing a PR**: AI verifies all new exports are reachable from production entry points. Catches orphaned code before it merges.
- **Writing a PRD**: AI queries the codebase map and module context. The spec is grounded in what actually exists.

## Security Model {#security}

- **Hosted, not installed** — no packages on developer machines, no transitive dependencies to audit, no supply chain attack surface. If Pharaoh gets a vulnerability, we remediate server-side. You do nothing.
- **Read-only GitHub access** — Pharaoh cannot write to your repository, push commits, or modify code. The GitHub App requests repository contents (read) and metadata (read). No write access. Ever.
- **No source code stored** — the graph contains function names, file paths, dependency edges, complexity scores. Source code is read during parsing, used to generate the graph in memory, then discarded. The graph is a table of contents, not the book.
- **Per-tenant isolation** — dual-layer defense: every Neo4j query is repo-anchored AND application-level ownership checks run before every tool call. Cross-tenant access is possible only if both layers fail. CI enforces this — every new query is automatically tested for isolation violations.
- **Encrypted at rest** — GitHub tokens and sensitive graph properties (signatures, JSDoc, API routes) are encrypted with AES-256-GCM using HKDF per-tenant derived keys. Each token uses a unique random initialization vector. Compromising one tenant does not expose others.
- **GitHub-based access control** — remove someone from your GitHub org and their Pharaoh access is revoked within minutes. No API tokens to rotate, no credentials on developer machines. Org membership is re-verified on every token refresh.
- **Open source parser** — the Cartographer that reads your code is [fully auditable](https://github.com/Pharaoh-so/pharaoh-parser)
- **Account deletion** — uninstall the GitHub App or delete your account: the knowledge graph is deleted, encrypted tokens are destroyed, and audit logs are retained 90 days, then purged. No lock-in.

### Data Flow

```
GitHub API (read-only clone)
  → Parser (in-memory only)
  → Graph DB (metadata only)
  → MCP Tools (query interface)
  → Your AI (Claude, Cursor, etc.)
```

Source code never persists. It is read, parsed into structural metadata, and discarded.

### What gets stored vs. what doesn't

**Stored (plaintext):** Function names, file paths, module boundaries, dependency relationships, complexity scores, import/export edges, call chains, entry points.

**Stored (encrypted per-tenant):** Function signatures, documentation strings, API route patterns. These are opaque ciphertext in the database — readable only through the owning tenant's derived key.
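The encryption-at-rest scheme described above — AES-256-GCM, HKDF per-tenant key derivation, a unique random IV per value — can be sketched in Node.js. This is a simplified illustration, not Pharaoh's actual code: in production the master key would come from a KMS rather than `randomBytes`, and the salt/info inputs are assumptions.

```typescript
import { hkdfSync, randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

// Hypothetical master key — in a real deployment this lives in a KMS, not in memory like this.
const MASTER_KEY = randomBytes(32);

// Derive a per-tenant key with HKDF-SHA256; the tenant ID is the HKDF "info" input.
function tenantKey(tenantId: string): Buffer {
  return Buffer.from(hkdfSync("sha256", MASTER_KEY, Buffer.alloc(0), tenantId, 32));
}

// AES-256-GCM with a fresh random 12-byte IV per value; output layout: iv | authTag | ciphertext.
function encryptProperty(tenantId: string, plaintext: string): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", tenantKey(tenantId), iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ct]);
}

function decryptProperty(tenantId: string, blob: Buffer): string {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const ct = blob.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", tenantKey(tenantId), iv);
  decipher.setAuthTag(tag); // GCM authenticates: a wrong tenant's key fails to decrypt at all
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, decrypting with another tenant's derived key does not produce garbage — it throws, which is what makes "compromising one tenant does not expose others" hold at the property level.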
**Never stored:** Source code, file contents, variable values, string literals, implementation logic, comments (only signatures and doc strings are kept, and those are encrypted), secrets, environment variable values, credentials, git history, commit messages, pull request content, issue content.

### What happens in a breach

We would rather be honest about breach scenarios than pretend they cannot happen.

**Exposed in a database breach:** Function names, file paths, module boundaries, dependency relationships, complexity scores. This is structural metadata — it reveals how code is organized, not what it does.

**Protected even in a breach:** Source code (never stored). GitHub tokens (encrypted per-tenant — compromising one tenant does not expose others). Function signatures, documentation strings, and API routes (encrypted with per-tenant derived keys). Cross-tenant data (query-level isolation prevents lateral movement).

**Your action required:** None. Pharaoh is a hosted service. We remediate server-side. No packages to update, no credentials to rotate, no patches to apply.

### How Per-Tenant Isolation Works {#tenant-isolation}

All tenants share a single Neo4j database. Isolation is enforced entirely at the application layer through two independent mechanisms — both would have to fail before cross-tenant data access is possible:

**Layer 1: Cypher Query Repo-Anchoring**

Every query starts from a `Repo` node filtered by name.
There is no query path that returns data without first traversing through the tenant's repo:

```cypher
// Every query follows this pattern — repo is always the entry point
MATCH (r:Repo {name: $repo})-[:CONTAINS_MODULE]->(m:Module)-[:CONTAINS]->(f:Function)
WHERE f.name CONTAINS $query
RETURN f.name, f.filePath, f.complexity

// Blast radius: traces callers through the repo-anchored graph
MATCH (r:Repo {name: $repo})-[:CONTAINS_MODULE]->(m:Module)-[:CONTAINS]->(f:Function {name: $name})
MATCH (caller:Function)-[:CALLS]->(f)
MATCH (callerModule:Module)-[:CONTAINS]->(caller)
MATCH (r)-[:CONTAINS_MODULE]->(callerModule)
RETURN caller.name, caller.filePath, callerModule.name

// Cross-module dependency: both modules must belong to the same repo
MATCH (r:Repo {name: $repo})-[:CONTAINS_MODULE]->(from:Module {name: $from})
MATCH (r)-[:CONTAINS_MODULE]->(to:Module {name: $to})
MATCH path = (from)-[:DEPENDS_ON*1..5]->(to)
RETURN path
```

CI enforces this invariant: `cypher-anchoring.test.ts` statically analyzes every exported `*Query` function and fails if any `MATCH` clause reaches nodes without traversing through `Repo {name: $repo}`.

**Layer 2: Application-Level Ownership Check**

Before any tool handler runs, `validateRepoOwnership()` verifies that the requested repo belongs to the authenticated tenant by querying the Postgres `tenant_repos` table:

```
Request flow:
1. MCP tool call arrives with repo parameter
2. OAuth token → tenant ID (cached 5 min in session store)
3. SELECT FROM tenant_repos WHERE tenant_id = $tenantId AND repo_slug = $repo
4. If no row → 403 Forbidden (tool call rejected before Neo4j is touched)
5. If row exists → proceed to Neo4j query (which is also repo-anchored)
```

**Revocation timing:** Tenant suspension triggers immediate session eviction (~0s). Org member removal revokes within ~10 minutes (5-min session cache + 5-min idle timeout). Collaborator removal revokes within ~1 hour (access token TTL).
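The dual-layer pattern can be sketched in a few lines of TypeScript. This is an illustration of the design, not Pharaoh's implementation: `TenantRepos` and `handleToolCall` are hypothetical names, and the real ownership check hits Postgres while the real query runs in Neo4j.

```typescript
// Hypothetical in-memory stand-in for the Postgres tenant_repos table.
type TenantRepos = Map<string, Set<string>>; // tenantId -> owned repo slugs

// Layer 2: reject before any graph access if the tenant doesn't own the repo.
function validateRepoOwnership(db: TenantRepos, tenantId: string, repo: string): void {
  if (!db.get(tenantId)?.has(repo)) {
    throw Object.assign(new Error("Forbidden"), { status: 403 });
  }
}

// Layer 1: the Cypher itself is repo-anchored — no path to data except via the Repo node.
function searchFunctionsQuery(): string {
  return `MATCH (r:Repo {name: $repo})-[:CONTAINS_MODULE]->(:Module)-[:CONTAINS]->(f:Function)
WHERE f.name CONTAINS $query RETURN f.name, f.filePath`;
}

function handleToolCall(db: TenantRepos, tenantId: string, repo: string): string {
  validateRepoOwnership(db, tenantId, repo); // Layer 2 runs first, before Neo4j is touched
  return searchFunctionsQuery();             // Layer 1 constrains what the query can return
}
```

Both layers are independent: even if the ownership check were bypassed, the repo-anchored `MATCH` still cannot reach another tenant's nodes.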
### Security Questions

**Can Pharaoh write to my repository?**
No. The GitHub App has read-only permissions. Pharaoh cannot push commits, modify files, create branches, or open pull requests.

**Why not just run a local tool?**
Local tools install packages with transitive dependencies on every developer machine, require local databases, and store credentials in dotfiles. Each machine becomes an attack surface. A hosted service centralizes the attack surface, removes supply chain risk from developer machines, and patches vulnerabilities once instead of per-machine.

**What happens when someone leaves the org?**
Remove them from your GitHub org. Their Pharaoh access revokes automatically. Org membership is re-verified on every token refresh. No tokens to rotate, no accounts to deactivate.

**Who can see my architectural data?**
Only members of your GitHub organization who authenticate through Pharaoh's OAuth flow. Tenant isolation is enforced at the query level — there is no API call, admin interface, or backdoor that surfaces one tenant's data to another.

### For Security-Conscious Teams: Local Parsing {#local-parsing}

If your security policy prohibits granting read access to an external service, Pharaoh still works — source code never needs to leave your machines. The `request_upload` tool lets you:

1. **Parse locally** — run the [open-source parser](https://github.com/Pharaoh-so/pharaoh-parser) on your own machine. It extracts structural metadata (function names, imports, exports, complexity) and outputs JSON.
2. **Upload metadata only** — the JSON contains zero source code. Only structural facts: "function X in file Y calls function Z, has complexity 8, is exported." No implementation logic, no variable values, no string literals.
3. **Query the graph** — all 16 analysis tools work immediately on the uploaded data, identical to GitHub App-connected repos.

The parser is fully open source (MIT license) — audit exactly what gets extracted before uploading anything.
This gives air-gapped, classified, and compliance-restricted environments full access to Pharaoh's architectural intelligence without any code leaving the perimeter.

## Who Benefits Most {#who-benefits}

- **Solo devs building production apps** — your AI stops creating duplicate code and breaking things it can't see
- **Small teams (2-10) without dedicated DevOps** — architectural awareness without hiring a staff engineer to maintain it
- **Open source maintainers** — evaluate contributor PRs against the full dependency graph, not just the diff
- **Vibe coders** — you move fast and let AI handle the details. Pharaoh makes sure those details include architectural context.
- **CI/CD pipelines and autonomous agents** — device flow and token exchange auth let headless agents query the graph without human interaction
- **Any developer using Claude Code, Cursor, OpenClaw, or MCP-compatible tools** — if your AI tool supports MCP, Pharaoh makes it better

## Infrastructure {#infrastructure}

- **Parsing**: tree-sitter (deterministic, no LLM hallucination risk)
- **Graph storage**: Neo4j Aura Professional (shared database, application-level tenant isolation via Cypher repo-anchoring + ownership checks)
- **Protocol**: Model Context Protocol (MCP) via SSE transport (Claude Code, Cursor) and Streamable HTTP transport (Claude.ai web)
- **Hosting**: Render (web service)
- **GitHub integration**: GitHub App with read-only repository access + push webhook for auto-refresh
- **Encryption**: AES-256-GCM with HKDF per-tenant key derivation for tokens and sensitive graph properties

## Getting Started

### Step 1: Sign up

Visit https://pharaoh.so to get started.

### Step 2: Connect GitHub

Install the Pharaoh GitHub App and select which repositories to map. Pharaoh only requires read access to repository contents. No write access ever.
### Step 3: Add MCP endpoint

Add your unique Pharaoh endpoint to your AI tool's MCP configuration:

**Claude Desktop** (claude_desktop_config.json):

```json
{
  "mcpServers": {
    "pharaoh": {
      "url": "https://mcp.pharaoh.so/sse"
    }
  }
}
```

**Claude Code** (.claude/settings.json):

```json
{
  "mcpServers": {
    "pharaoh": {
      "url": "https://mcp.pharaoh.so/sse"
    }
  }
}
```

**Cursor**: Add via Settings > MCP Servers with the same URL.

### Authentication Options {#auth}

Pharaoh supports three authentication methods depending on your environment:

**1. OAuth (interactive — Claude.ai, Claude Desktop, Cursor)**

Standard MCP OAuth flow. Your AI tool handles the handshake automatically. You just approve GitHub access once.

**2. Device Flow (headless agents — RFC 8628)**

For environments without a browser (CI runners, remote servers, autonomous agents):

```
POST https://mcp.pharaoh.so/device
Content-Type: application/json

{"client_id": "YOUR_CLIENT_ID"}
```

Returns a verification URL and user code. Open the URL in any browser, enter the code, approve. The agent polls `POST /device/token` until approved.

**3. Token Exchange (CI/CD pipelines)**

Exchange a GitHub PAT for a Pharaoh bearer token — no browser, no polling:

```
POST https://mcp.pharaoh.so/token-exchange
Content-Type: application/json

{"github_token": "YOUR_GITHUB_PAT"}
```

Returns: `{ "access_token": "...", "sse_url": "https://mcp.pharaoh.so/sse" }`

The GitHub PAT is verified once and discarded — Pharaoh never stores it. The PAT needs `read:org` scope so Pharaoh can verify your org membership.

**Local repos without GitHub App:** Use the `request_upload` tool to map local or air-gapped repositories. Run the open-source parser locally, upload the parsed data, and all Pharaoh tools work immediately.

### Step 4: Start building

Your AI tool now silently queries Pharaoh when it needs structural context. Ask it to refactor a module, write a PRD, or find dead code — it will check the knowledge graph first.
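The headless device-flow polling loop (option 2 above) can be sketched as follows. This is a hedged illustration: the RFC 8628 `authorization_pending` error convention is standard, but the exact response field names from `POST /device/token` are assumptions, and `post` is injected so the loop is testable without a network.

```typescript
// Minimal response shape assumed for the token endpoint.
type TokenResponse = { access_token?: string; error?: string };
type Post = (url: string, body: object) => Promise<TokenResponse>;

// Poll the token endpoint until the user approves in a browser (RFC 8628 style).
async function pollDeviceToken(post: Post, deviceCode: string, intervalMs = 5000): Promise<string> {
  for (;;) {
    const res = await post("https://mcp.pharaoh.so/device/token", { device_code: deviceCode });
    if (res.access_token) return res.access_token;         // user approved
    if (res.error !== "authorization_pending") {
      throw new Error(`device flow failed: ${res.error}`); // denied, expired, etc.
    }
    await new Promise((r) => setTimeout(r, intervalMs));   // wait before polling again
  }
}
```

A CI agent would call this once after displaying the verification URL and user code, then use the returned bearer token on the SSE endpoint.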
---

## MCP Tool Reference

Pharaoh exposes 16 analysis tools and 7 operational tools via the [Model Context Protocol](https://modelcontextprotocol.io) (MCP).

Tools are invoked as MCP `tools/call` requests — **not REST endpoints**. The AI client (Claude, Cursor, etc.) sends JSON parameters via the MCP protocol; there are no HTTP methods (GET/POST) involved. The JSON parameter examples below show the exact format passed to each tool.

Each tool is designed for a specific moment in the AI coding workflow.

### pharaoh_recon {#recon}

**Get the full architectural picture in ONE call — combines codebase map, module deep-dives, function search, blast radius, and dependency queries.**

All sub-queries run in parallel server-side. Designed for plan review, PR review, and architecture assessment workflows where multiple individual tool calls would clutter the conversation.

Use when:

- Starting a new task and need to orient before making changes
- Running a plan review, PR review, or architecture assessment
- You know which modules, functions, and blast targets you need up front

Parameters:

- `repo` (required): Repository name
- `include_map` (optional): Include codebase map (default: true)
- `modules` (optional): Module names to get full context for (max 5)
- `functions` (optional): Function search queries (max 3)
- `blast_radius` (optional): Blast radius targets with entity and entity_type (max 3)
- `dependencies` (optional): Module dependency pairs to trace (max 3)

Why not just grep/read files: Calling get_codebase_map, get_module_context, search_functions, and get_blast_radius individually creates 4-6 round-trips. This batches them into one call with parallel server-side execution.

### get_codebase_map {#codebase-map}

**Orient yourself in an unfamiliar codebase.
Call this FIRST when starting work on a repo.**

Returns all modules with file counts and lines of code, the dependency graph with weights and bidirectional warnings, hot files (most changed in last 90 days), and all HTTP endpoints with their handler files.

Use when:

- Starting a new task and need to understand codebase structure
- Need to know which modules exist and how they relate
- Want to find the most actively changed files (likely where bugs live)
- Need to see all API endpoints at a glance

Parameters:

- `repo` (required): Repository name
- `include_metrics` (optional): Include LOC, complexity, change frequency

Why not just grep/read files: Manually reading directory trees gives file structure but not dependency relationships, change frequency, or endpoint mappings. This gives the full architectural picture in one call instead of 20+ file reads.

### get_module_context {#module-context}

**Get everything about a module BEFORE modifying it or writing a PRD.**

Returns the complete module profile in ~2K tokens: file count, LOC, all exported function signatures with complexity, dependency graph (imports from + imported by), DB table access, HTTP endpoints, cron jobs, env vars, vision spec alignment, and external callers from other modules.

Use when:

- About to change code in a module
- Writing a PRD or design doc and need ground-truth about what exists
- Need to know what depends on this module (who breaks if you change it)
- Want to see a module's DB tables, endpoints, cron jobs, or env vars at a glance

Parameters:

- `repo` (required): Repository name
- `module` (required): Module name (e.g., "crons", "slack", "db")

Why not just grep/read files: A module can span dozens of files. Manual exploration burns 10K-40K tokens and still misses cross-module callers, DB access patterns, and vision spec alignment.
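For example, a `get_module_context` invocation arrives as a standard MCP `tools/call` JSON-RPC request — the repo and module names below are illustrative:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_module_context",
    "arguments": {
      "repo": "acme/api",
      "module": "crons"
    }
  }
}
```

Every analysis tool in this reference follows the same shape — only `name` and `arguments` change.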
### search_functions {#function-search}

**Check if functionality already exists BEFORE writing any new function.**

Searches all functions across the entire codebase by name or partial match. Returns matching functions with file paths, line numbers, module, export status, async flag, complexity scores, and full signatures.

Use when:

- About to create a new function — search first to prevent duplicates
- Need to find where a concept is implemented (e.g., "notify", "validate", "parse")
- Looking for the right function to import instead of reimplementing
- A task says "add X functionality" — verify X doesn't already exist

Parameters:

- `repo` (required): Repository name
- `query` (required): Function name or partial match
- `module` (optional): Filter to a specific module
- `exported_only` (optional): Only show exported functions
- `limit` (optional): Max results (default: 20, max: 50)

Why not just grep/read files: Grep only finds exact string matches and misses re-exports, aliases, and barrel-file indirection. This searches the full resolved dependency graph.

### get_design_system {#design-system}

**Discover the design system BEFORE creating any UI component.**

Returns components with props and usage count, design tokens with values, and anti-patterns where raw HTML is used instead of existing components.

Use when:

- About to create a React/Vue/Svelte component — check what already exists
- Need to know the canonical component for a UI pattern (button, input, modal)
- Want to find design tokens (colors, spacing, typography) to use
- Suspect raw HTML elements are used instead of existing components

Parameters:

- `repo` (required): Repository name
- `category` (optional): Which aspect to return — components, tokens, anti_patterns, or all (default: all)
- `min_usage` (optional): Minimum usage count to include a component (default: 1)

Why not just grep/read files: Design systems span dozens of files across multiple directories.
This returns the complete inventory with usage frequency — the most-used components are the most canonical.

### get_blast_radius {#blast-radius}

**Check what breaks BEFORE refactoring, renaming, or deleting a function, file, or module.**

Returns risk assessment (LOW/MEDIUM/HIGH), all affected callers grouped by module with file paths, impacted HTTP endpoints, impacted cron jobs, and affected DB operations. Traces up to 5 hops deep through the call graph.

Use when:

- About to refactor or rename a function — see every caller that needs updating
- Want to know if a change is safe — check if anything depends on this code
- A PR modifies a shared utility — trace all downstream consumers
- Need to assess risk level before a change

Parameters:

- `repo` (required): Repository name
- `entity` (required): Function name, file path, or module name
- `entity_type` (required): "function", "file", or "module"
- `depth` (optional): How many hops to trace (default: 3, max: 5)

Why not just grep/read files: Grep finds direct callers but misses indirect callers 2-3 hops away. You won't see affected endpoints or cron jobs. This traces the full transitive dependency chain.

### query_dependencies {#dependencies}

**Trace how two modules are connected BEFORE splitting, merging, or decoupling them.**

Returns forward and reverse dependency paths between two modules, circular dependency detection with warnings, and all shared dependencies (modules both depend on).

Use when:

- Refactoring and need to know if two modules depend on each other
- Suspect a circular dependency and want to confirm it
- Planning to extract shared code and need to see what both modules use
- Need to understand why changing module A affects module B

Parameters:

- `repo` (required): Repository name
- `from` (required): Source module name
- `to` (required): Target module name

Why not just grep/read files: Import statements show direct dependencies but miss transitive paths (A→B→C→D).
This traces the full module graph and reveals indirect connections and circular dependencies invisible from file-level inspection.

### trace_flow {#trace-flow}

**Understand how a feature works by walking its call tree forward — what it calls, what those call, across modules.**

Returns a depth-limited tree from the starting function with signature, JSDoc, file path, module, complexity, and side effects (DB access, endpoints, cron jobs) at every node.

Use when:

- Someone asks "how does X work?" or "what happens when X?"
- You need to understand a feature's behavior before answering questions
- You want to see the full call chain from an entry point (endpoint, CLI command, webhook handler)
- You're debugging and need to trace which functions are involved in a code path

Parameters:

- `repo` (required): Repository name
- `function` (required): Function name (exact match, case-insensitive)
- `file` (optional): File path substring to disambiguate when multiple functions share a name
- `depth` (optional): How many hops to trace forward (default: 3, max: 5)

Why not just grep/read files: Reading files manually to trace a feature requires guessing which files to read and burns 10K-40K tokens. This shows the complete execution path in ~500 tokens, letting you target reads to only the functions that matter.

### check_reachability {#reachability}

**Verify functions are reachable from production entry points (API endpoints, CLI commands, cron jobs, event handlers, MCP tools).**

Returns whether each exported function is reachable from a production entry point, the path from entry point to function, and classification (entry_point / reachable / unreachable).

Use when:

- After implementing a feature — verify new code is wired into the app
- Reviewing a PR — are all new functions actually reachable?
- Cleaning up dead code — find functions only called by tests
- Before opening a PR — run as a pre-flight check

Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to a specific module
- `functions` (optional): Specific function names to check
- `include_paths` (optional): Include full reachability paths

### get_vision_docs {#vision}

**Get the documented intent — CLAUDE.md files, PRDs, roadmaps — to understand WHY code exists.**

Returns all vision documents grouped by type (claude_md, prd, skill, roadmap), with each spec's title, section ID, and implementation status showing which functions implement each spec.

Use when:

- Implementing a feature and need to check if a PRD or spec exists
- Want to understand the original design intent behind existing code
- Need to verify implementation matches documented requirements
- Reviewing code and want to check it against the documented vision

Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to specs related to this module
- `doc_type` (optional): Filter by type (claude_md, prd, skill, roadmap, all)

### get_vision_gaps {#vision-gaps}

**Find what's missing — specs without code AND complex code without specs.**

Returns two lists: (1) specified-but-not-built: PRD specs with no implementing functions, and (2) built-but-not-specified: complex functions above a threshold with no vision spec.

Use when:

- Planning work and need to find unimplemented features from PRDs
- Want to find complex undocumented functions that need specs or tests
- Need to audit spec-to-code alignment for a module
- Someone asks "what's left to build?" or "what's undocumented?"
Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to a specific module
- `complexity_threshold` (optional): Min complexity for undocumented function flag (default: 5)

### get_cross_repo_audit {#cross-repo}

**Compare two repositories for code duplication, structural overlap, and shared patterns.**

Returns three tiers of function matches: HIGH (exact duplicates), MEDIUM (diverged implementations), LOW (name-only matches). Also reports shared module structure and shared environment variables.

Use when:

- Need to find copy-pasted code across two repos
- Planning a shared package extraction
- Want to compare the structure of two codebases
- Auditing cross-repo duplication before a refactor

Parameters:

- `repo_a` (required): First repository name
- `repo_b` (required): Second repository name
- `exported_only` (optional): Only compare exported functions (default: true)
- `min_loc` (optional): Minimum function LOC (default: 3)

### get_consolidation_opportunities {#consolidation}

**Find code that does the same work in different places — parallel consumers, duplicated call chains, competing DB access, signature twins.**

Returns structural clusters grouped by type, each detecting a different kind of semantic similarity:

- **Signature twins**: Functions with matching parameter types, counts, and return types across different modules — even if names differ entirely. This is Pharaoh's primary mechanism for detecting semantically similar utility functions across directories.
- **Parallel consumers**: Multiple functions that call the same set of dependencies — they likely do similar work.
- **Fan-in duplication**: Functions that share the same callers — one may be redundant.
- **Competing DB access**: Multiple functions querying the same database tables — candidate for a shared data access layer.
- **Convergent imports**: Functions importing the same set of modules — structurally similar even if implementation differs.
Each cluster includes file paths, line numbers, confidence tier (HIGH/MEDIUM/LOW), and context for evaluating whether merging makes sense. HIGH-confidence matches (exact signature twins, parallel consumers with 3+ shared deps) are almost always real duplicates. MEDIUM requires human judgment. LOW is noise — use `include_low_confidence: false` (default) to suppress.

Use when:

- Looking for code to consolidate or deduplicate
- Before building something new — check if similar logic already exists
- During refactoring planning — find highest-impact merge opportunities
- Need to find semantically similar utility functions across different directories
- When the codebase feels bloated but you can't pinpoint where

Parameters:

- `repo` (required): Repository name
- `module` (optional): Focus on opportunities involving this module
- `min_shared` (optional): Minimum shared dependencies to flag (default: 3)
- `min_loc` (optional): Minimum function LOC to include (default: 5)
- `include_low_confidence` (optional): Include lower-confidence matches (default: false)
- `include_same_module` (optional): Include intra-module duplication

Why not just grep/read files: Grep finds exact text duplicates but misses semantic duplicates — two functions with different names and different implementations that accept the same inputs and produce the same outputs. Pharaoh compares structural metadata (signatures, call patterns, imports) to find these without reading source code.

### get_unused_code {#dead-code}

**Find dead code — functions not reachable from any production entry point.**

Uses graph reachability plus a text-reference backup layer for high-confidence dead code detection. Returns three tiers:

- **Dead**: Graph-unreachable AND no text references anywhere. Safe to delete.
- **Likely Dead**: Graph-unreachable BUT found as text in other files (may be string-dispatched or dynamically imported). Includes evidence file paths.
- **Alive**: Graph-reachable from entry points. Not reported.
Use when:

- Looking for code safe to delete
- Cleaning up after a refactor
- Reducing codebase surface area
- Auditing unused exports

Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to a specific module
- `reachability_analysis` (optional): Deep analysis against all entry points
- `include_exported` (optional): Include exported functions (default: true)

### get_test_coverage {#test-coverage}

**See which modules and files have test coverage and which don't.**

Returns a per-module test coverage summary — which files have corresponding test files, and which high-complexity functions lack tests.

Use when:

- Before writing tests — find what's already covered
- During code review — check if changed modules have tests
- Planning test strategy — identify untested high-complexity code

Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to a specific module

### get_regression_risk {#regression-risk}

**Score functions by regression risk — how likely a change breaks production.**

Returns functions ranked by regression risk score (0-1), with tier (critical/high/medium/low), complexity, entry-point exposure, file churn, and downstream caller count.

Use when:

- Before modifying a function — understand blast radius and risk
- During code review — prioritize review effort on highest-risk changes
- Planning refactors — identify the riskiest code to change carefully
- After a regression — find other high-risk functions that need attention

Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to a specific module

---

## Operational Tools

### request_upload {#upload}

Get a presigned URL to upload parsed codebase data. Use this when the user wants to map a local repo without installing the GitHub App.

Workflow: run pharaoh-parser's inspect.js locally, call this tool to get a one-time upload URL, then PUT the JSON. All Pharaoh tools work immediately on the uploaded repo.
### map_open_source_repo {#map-open-source}

Map any public GitHub repository into Pharaoh's shared open-source graph. Once mapped, the repo is queryable by all users with all analysis tools (codebase map, blast radius, module context, etc.). Mapping typically takes 1-3 minutes.

### setup_environment {#environment-setup}

Install Pharaoh's curated plugin bundle — LSP, security scanning, code review — tailored to your codebase languages.

### pharaoh_account {#account}

Manage subscription, toggle PR Guard, trigger graph refreshes for your repos.

### pharaoh_feedback {#feedback}

Report false positives in dead code detection or provide feedback on tool results. Directly improves result quality.

### pharaoh_admin {#admin}

Administrative operations for org management. Manage repos, view org status, and perform admin tasks.

### get_pharaoh_docs {#pharaoh-docs}

Query Pharaoh's own documentation. Returns relevant doc sections as markdown with links to pharaoh.so/docs. Powered by an in-memory keyword index of the docs/gitbook/ folder.
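Operational tools use the same MCP `tools/call` shape as the analysis tools. For instance, mapping a public repository might look like this (the repository name is illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "map_open_source_repo",
    "arguments": {
      "repo": "vercel/next.js"
    }
  }
}
```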
---

## Competitive Positioning

| Tool | What it does | Pharaoh's difference |
|------|--------------|----------------------|
| Sourcegraph | Code search — find code across repos | Pharaoh tells you what *breaks* if you change what you found |
| CodeScene | Code health — file-level quality scores | Pharaoh analyzes cross-module *architectural relationships* |
| SonarQube | Static analysis — line-level bugs and smells | Pharaoh provides *system-level structural intelligence* |
| Snyk | Security scanning — vulnerabilities and dependencies | Pharaoh maps your *own code's* internal structure, not supply chain |
| GitHub Copilot | Code completion — generates code | Pharaoh gives Copilot (and any AI tool) the *context to generate better code* |

**Unique to Pharaoh** (no other MCP server provides these):

- Graph-based blast radius analysis with transitive caller tracing
- Production reachability verification via entry-point tracing
- Dead code detection combining graph analysis with text-reference backup
- Cross-repo structural comparison
- Regression risk scoring combining complexity, exposure, churn, and caller count
- Vision-to-implementation alignment checking (PRDs/CLAUDE.md vs actual code)

---

## Why a Knowledge Graph (Architecture Rationale) {#architecture}

Pharaoh uses deterministic parsing (tree-sitter) into a Neo4j graph database. This is a deliberate architectural choice.

**Why deterministic parsing, not LLM-based analysis:**

- Zero hallucination risk — every node and edge in the graph corresponds to real code
- Reproducible results — the same codebase always produces the same graph
- Fast — a full parse of a 50K LOC codebase takes under 60 seconds
- No API costs per analysis — the graph is queried, not re-analyzed

**Why a graph database, not vector search or static analysis:**

- Transitive dependency tracing — "what breaks 5 hops away" is a native graph query
- Module boundary analysis — relationships between modules are first-class edges
- Entry point reachability — walk the graph from HTTP endpoints, CLI commands, and cron handlers backward to any function
- Cross-repo comparison — two graphs can be queried together for structural overlap

**What Pharaoh stores:** Functions (name, file path, line numbers, complexity, async, exported, parameters), files, modules, imports/exports, call edges, dependency edges, HTTP endpoints, cron handlers, CLI commands, MCP tool registrations, DB table access, environment variable usage, test file associations, and vision spec associations.

**What Pharaoh does NOT store:** Source code, comments, string literals, variable values, runtime behavior, git history, or any content that would create security exposure. The graph is structural metadata only.

---

## Workflow Patterns {#workflows}

These are the most common multi-tool sequences. Each represents a real development workflow where Pharaoh provides value.

### Pattern 1: "I'm starting work on an unfamiliar codebase"

1. get_codebase_map — see all modules, their sizes, dependencies, and hot files
2. get_module_context on the module you'll be working in — full profile
3. search_functions for the concept you're implementing — check if it exists

### Pattern 2: "I need to refactor a function"

1. get_blast_radius on the function — see every caller, transitive dependent, and affected endpoint
2. get_regression_risk on the module — understand which functions are riskiest to change
3. query_dependencies between the module and its dependents — understand the coupling
4. After refactoring: check_reachability — verify nothing got disconnected

### Pattern 3: "I'm writing a PRD or design doc"

1. get_codebase_map — understand current architecture boundaries
2. get_module_context on affected modules — know what exists before proposing changes
3. get_vision_docs — check existing specs for relevant context
4. get_vision_gaps — see what's specified but unbuilt, and what's built but unspecified
5. search_functions — verify assumptions about existing functionality

### Pattern 4: "I'm reviewing a PR"

1. get_blast_radius on changed functions — assess the risk of the change
2. get_regression_risk on changed modules — see if high-risk functions were modified
3. check_reachability — verify new exports are wired to production entry points
4. get_consolidation_opportunities — check if the PR introduces duplication

### Pattern 5: "I'm cleaning up the codebase"

1. get_unused_code — find dead functions safe to delete
2. get_test_coverage — find untested high-complexity code
3. get_consolidation_opportunities — find duplicate logic to merge
4. get_vision_gaps — find complex undocumented code that needs specs

### Pattern 6: "I'm comparing two repositories"

1. get_cross_repo_audit — find code duplication across repos
2. get_codebase_map on each repo — compare structural approaches
3. query_dependencies — understand internal coupling in each

### Pattern 7: "I need to enforce architectural constraints" {#architectural-constraints}

Pharaoh doesn't have a built-in constraint registry, but you can enforce architectural rules by combining tools with CI or AI-agent workflows.

**Example: Enforce a "no data-layer → presentation-layer" dependency rule**

1. query_dependencies with `from: "data"`, `to: "ui"` — check if a forbidden path exists
2. If the result shows a dependency path → the constraint is violated
3. query_dependencies with `from: "ui"`, `to: "data"` — verify the allowed direction works

**Implementation in CI or PR Guard:**

```yaml
# .pharaoh.yml — use module_boundaries to catch new violations
pr_guard:
  module_boundaries:
    warn_new_deps: true      # warn on any new cross-module dependency
    block_new_deps: true     # block the PR if new deps are introduced
    allowed_new_deps:        # allowlist for expected dependency directions
      - "ui -> data"         # UI depends on data (allowed)
      - "api -> data"        # API depends on data (allowed)
    # data -> ui is NOT listed → any new dep in this direction is blocked
```

**Programmatic enforcement with an AI agent:**

```
1. Call get_codebase_map → extract all module names
2. Define forbidden pairs: [["data", "ui"], ["data", "api"]]
3. For each pair: call query_dependencies(from=forbidden_source, to=forbidden_target)
4. If any returns a dependency path → flag violation with exact file paths
5. Run as a pre-commit hook or scheduled health check
```

The `module_boundaries` PR Guard check automates this for PRs — it detects new cross-module dependencies introduced in the diff and blocks them unless they appear in `allowed_new_deps`.

---

## Example Output {#examples}

> These examples use a fictional "myapp" codebase. Output format is representative of actual results.
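Conceptually, the transitive part of a blast-radius query is a reverse walk over call edges. A minimal sketch in plain Python follows, using an illustrative in-memory edge list rather than Pharaoh's client API (Pharaoh runs the equivalent as a native Neo4j traversal):

```python
from collections import deque


def blast_radius(callers: dict[str, list[str]], target: str) -> set[str]:
    """Return every function that directly or transitively calls `target`.

    `callers` maps a function name to the functions that call it
    (reverse call edges). Names here are illustrative only.
    """
    impacted: set[str] = set()
    queue = deque([target])
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, []):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted
```

Fed the fictional `formatMessage` call edges from the "myapp" examples, the walk would return both its direct callers and everything that reaches it through them.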
### Example: get_blast_radius

Input: `{ repo: "myapp", entity: "formatMessage", entity_type: "function" }`

```
Risk: HIGH
Direct callers: 4 (across 3 modules)
Transitive impact: 12 functions

Callers by module:
  slack (2 callers):
    -> sendSlackNotification [src/slack/notify.ts:45]
    -> formatThreadReply [src/slack/threads.ts:23]
  notifications (1 caller):
    -> dispatchNotification [src/notifications/dispatch.ts:67]
  crons (1 caller):
    -> buildDigestEmail [src/crons/daily-digest.ts:34]

Affected endpoints:
  POST /api/notifications/send
  POST /api/slack/webhook

Affected cron jobs:
  daily-digest (09:00 UTC)
```

### Example: get_codebase_map

Input: `{ repo: "myapp" }`

```
Modules (8):
  api (12 files, 2,340 LOC)
    -> depends on: db, auth, slack
  auth (4 files, 890 LOC)
    -> depends on: db, crypto
  crons (3 files, 560 LOC)
    -> depends on: db, slack, notifications
  db (6 files, 1,200 LOC)
    -> no dependencies
  notifications (5 files, 780 LOC)
    -> depends on: db, slack
  slack (4 files, 650 LOC)
    -> depends on: db
  utils (3 files, 340 LOC)
    -> no dependencies
  web (2 files, 450 LOC)
    -> depends on: api, auth

Key dependencies:
  api -> db: weight 8
  crons -> slack: weight 4
  auth <-> api: weight 3 (bidirectional)

Hot files (most changed, last 90 days):
  src/api/routes.ts (14 changes)
  src/db/queries.ts (11 changes)
```

### Example: search_functions

Input: `{ repo: "myapp", query: "retry" }`

```
Found 2 matches:

withRetry() [exported, async, complexity: 8]
  src/utils/resilience.ts:42-78
  Used by 6 callers across 3 modules

retryWithBackoff() [internal, async, complexity: 12]
  src/api/http-client.ts:156-203
  Used by 2 callers in api module
```

---

## Plans and Pricing {#pricing}

### Free Tier (always available, no credit card)

8 planning intelligence tools — everything you need to understand a codebase:

- get_codebase_map — full architectural overview
- get_module_context — deep module profiles
- search_functions — find existing code before duplicating
- get_design_system — discover UI components and tokens before creating new ones
- get_blast_radius — know what breaks before changing
- query_dependencies — trace module connections

1 repository included. PR Guard available (advisory mode).

### Personal — $27/mo

Everything in Free, plus 8 discovery intelligence tools:

- check_reachability — verify code reaches production entry points
- get_unused_code — find dead code safe to delete
- get_test_coverage — identify untested high-complexity code
- get_regression_risk — score functions by production risk
- get_vision_docs — retrieve and cross-reference specs
- get_vision_gaps — find spec-vs-code drift
- get_consolidation_opportunities — detect duplicate patterns
- get_cross_repo_audit — compare repos for overlap

Unlimited repositories. PR Guard with blocking protection. Priority graph refresh.

### Team — $99/mo

Everything in Personal, for organizations. All org members get Pro access to all mapped repos. Centralized billing, org-wide audit logs, and administrative controls via the dashboard.

### PR Guard Add-on — $5/repo/mo

Automated structural analysis on every pull request. Available on any paid plan. Checks reachability, regression risk, test coverage, breaking changes, duplication, and module boundaries. Advisory mode is free; blocking mode (required checks that prevent merging) is $5/repo/mo.

See https://pharaoh.so/#pricing for full details.

### How Free Tier Previews Work

When a free-tier user calls a Pro tool, Pharaoh returns a preview — enough real data to prove the tool has the answer, with an explanation of what the full result would reveal. The preview is real analysis, not a teaser. You see partial truth and can decide if the full picture is worth upgrading for.

---

## PR Guard — Automated PR Quality Checks {#pr-guard}

### The Problem PR Guard Solves

Code review catches logic bugs.
It doesn't catch structural bugs — orphaned exports that never get wired up, functions renamed without updating downstream callers three modules away, new code that duplicates an existing utility the author didn't know about. These aren't logic errors visible in a diff. They're architectural blind spots that only surface as regressions weeks later.

A single production regression from a missed downstream caller costs 4-8 hours of debugging, hotfixing, and deploying. PR Guard catches it in the diff, before merge, for $5/repo/mo.

### What PR Guard Checks

Pharaoh re-maps the codebase on the PR branch and performs structural analysis:

- **Reachability check**: Are all new exported functions wired to a production entry point? Catches orphaned code before it merges.
- **Blast radius assessment**: How many downstream callers are affected by changed functions? Annotates the PR with the exact impact.
- **Regression risk scoring**: Are high-risk functions being modified without adequate test coverage? Flags the riskiest changes for careful review.
- **Test coverage check**: Do changed modules have corresponding test files? Enforces that high-complexity changes ship with tests.
- **Duplication detection**: Does the PR introduce code that duplicates existing functions? Prevents the "we already have a retry utility" problem.
- **Module boundary check**: Does the PR introduce cross-module dependencies that violate architectural boundaries? Catches coupling creep.
- **Breaking export changes**: Were exported function signatures changed in ways that break callers? Detects API contract violations.

### How It Works

PR Guard runs as a GitHub Check Run — inline annotations on the diff showing exactly which lines triggered findings. No CI configuration needed. Pharaoh reads the knowledge graph, not your repo's CI pipeline.

**Advisory mode (free):** Reports findings as informational comments. Every PR gets structural analysis. Nothing blocks.
**Blocking mode ($5/repo/mo):** Critical findings (unreachable exports, breaking signature changes, high regression risk without tests) become required checks. The PR cannot merge until findings are addressed or explicitly overridden.

Enable via the pharaoh_account tool with the `enable_pr_guard` parameter, or toggle per repo from the dashboard.

### .pharaoh.yml Configuration {#config}

Optional per-repo configuration file. Drop `.pharaoh.yml` in your repo root to customize PR Guard behavior, thresholds, and entry point detection.

**Full configuration reference:**

```yaml
pr_guard:
  # ── Per-check enable/disable (all default to true) ──
  checks:
    reachability: true        # flag unreachable exports
    regression_risk: true     # score modified functions by risk
    test_coverage: true       # check test file coverage
    blast_radius: true        # count downstream callers
    complexity: true          # track complexity changes
    module_boundaries: true   # detect new cross-module deps
    duplication: true         # find duplicate logic
    vision: true              # check spec alignment
    breaking_changes: true    # detect signature changes

  # ── Breaking change detection ──
  # Pharaoh compares exported function signatures before and after the PR.
  # A "breaking change" is any modification to an exported function's
  # parameter list, return type, or generic constraints that would cause
  # callers to fail at compile time or runtime.
  breaking_changes:
    warn_with_callers: true   # warn when a changed signature has any callers
    block_with_callers: 5     # block PR if signature changed on function with 5+ callers

  # ── Regression risk thresholds ──
  regression_risk:
    block_on: "critical"      # "critical" | "high" | "medium" | "none"

  # ── Test coverage thresholds ──
  test_coverage:
    warn_untested: true       # warn when modified source has no test file
    warn_unit_only: false     # warn when only unit tests exist (no integration)
    block_untested: false     # block PR when modified source has no tests

  # ── Blast radius thresholds ──
  blast_radius:
    warn_above: 10            # warn if function has > 10 production callers
    block_above: 50           # block if function has > 50 production callers

  # ── Complexity budgets ──
  complexity:
    warn_function_delta: 5    # warn if function complexity increased by > 5
    warn_total_delta: 20      # warn if total PR complexity delta > 20
    block_function_above: 30  # block if any function exceeds complexity 30

  # ── Module boundary rules ──
  module_boundaries:
    warn_new_deps: true       # warn on new cross-module dependencies
    block_new_deps: false     # block on new cross-module dependencies
    allowed_new_deps:         # allowlist for expected new deps
      - "src/new -> src/utils"

  # ── Vision alignment ──
  vision:
    block_on_must: false      # block on MUST violations in specs
    warn_on_should: true      # warn on SHOULD violations

  # ── Reachability allowlists ──
  allowed_orphans:
    - "src/types:*"           # all exports in src/types files (type-only)
    - "src/utils/helpers.ts"  # all exports in this file
    - "src/**:testHelper"     # testHelper function in any file

  # ── Override test file detection ──
  not_test:
    - "src/__mocks__/**"      # treat these as production files, not tests

  # ── Custom entry point globs ──
  entry_points:
    - "src/lambdas/**/*.ts"   # Lambda handlers
    - "src/workers/**/*.ts"   # Worker entry points
```

**Pattern syntax for allowed_orphans:**

- `"path/glob:functionName"` — specific function in matching files
- `"path/glob:*"` — all functions in matching files
- `"path/glob"` — all functions in matching files (implicit `:*`)
- Path globs: `*` matches one segment, `**` matches any depth

**How breaking change detection works:**

1. Pharaoh maps the codebase on both the base branch and the PR branch
2. For each exported function modified in the PR, it compares the function signature (parameters, types, return type)
3. If the signature changed and the function has downstream callers (traced via the knowledge graph), it flags the change
4. The `block_with_callers` threshold controls when a warning becomes a blocking check — e.g., `block_with_callers: 5` blocks the PR if a function with 5 or more callers had its signature changed

**Defaults:** If no `.pharaoh.yml` is present, all checks are enabled in advisory mode with conservative thresholds. Pharaoh auto-detects entry points using framework conventions (Express, Next.js, SvelteKit, Remix, Hono, Fastify, Vercel, AWS Lambda) and package.json exports.

---

## Dashboard — Repo Management {#dashboard}

Pharaoh includes a web dashboard at https://pharaoh.so/dashboard for managing your repositories:

- **Repo listing** — see all mapped repos with status (active, pending, error, skipped)
- **Auto-map toggle** — enable or disable automatic re-mapping per repo
- **PR Guard toggle** — enable PR Guard and choose blocking vs. advisory mode per repo
- **Manual remap** — trigger a graph refresh for any repo on demand
- **Audit log** — complete event history: mapping events, session events, subscription changes, tool queries
- **Bulk operations** — remap all stuck repos in one click
- **Account deletion** — cryptographic key destruction with immediate session eviction

Collaborators see a read-only view. Org admins get full management controls.

---

## Guided Workflows (MCP Prompts) {#guided-workflows}

Pharaoh exposes 10 structured MCP prompts — multi-step playbooks that guide AI agents through common development workflows. Each prompt orchestrates multiple tools in a specific sequence with decision gates.
| Prompt | What it guides |
|--------|----------------|
| `plan-with-pharaoh` | 4-phase planning: reconnaissance → analysis → approach → planning |
| `review-with-pharaoh` | 4-phase PR review with auto-block triggers for unreachable exports, circular deps, and missing tests |
| `safe-refactor` | Extract → test → refactor → verify — structured refactoring with safety checks |
| `onboard-to-codebase` | Map → modules → entry points — guided codebase orientation |
| `investigate-change` | Blast radius → callers → test coverage — impact analysis before modifying code |
| `explore-module` | Context → dependencies → callers — deep module understanding |
| `health-check` | Sweep and grade (A–F) based on testing, complexity, coverage, dead code |
| `find-tech-debt` | Unused code → consolidation → complexity hotspots — prioritized debt inventory |
| `validate-wiring` | Reachability → entry points → dead code — pre-commit quality gate |
| `pre-pr-review` | Reachability → test coverage → dead code — pre-flight PR checklist |

These prompts are discoverable via the MCP protocol's prompt listing capability. Your AI tool can invoke them directly.

---

## How Pharaoh Fits Your Workflow {#integrations}

### With Claude Code

Add the MCP endpoint to `.claude/settings.json`. Claude Code will automatically query Pharaoh before modifying code, writing PRDs, or reviewing changes. The pre-commit hook pattern: Claude Code calls get_blast_radius before refactoring, search_functions before creating new utilities, and check_reachability before finishing a session.

### With Cursor

Add via Settings > MCP Servers. Cursor's agent mode will use Pharaoh tools when exploring unfamiliar codebases or assessing change impact.

### With CI/CD

PR Guard integrates as a GitHub Check Run. Every PR gets structural analysis against the knowledge graph. No configuration files to maintain — Pharaoh reads the graph, not your repo.
### With Planning

When writing design docs, architecture decisions, or sprint plans: query get_vision_gaps to see what's specified but unbuilt, get_codebase_map for the current state, and get_module_context for ground truth about the modules you're planning to change.

---

## Frequently Asked Questions {#faq}

**Does Pharaoh store my source code?**

No. Pharaoh stores structural metadata only — function signatures, file paths, import/export relationships, call edges, complexity scores. No source code, no comments, no string literals, no variable values. The parser is open source — you can audit exactly what gets extracted.

**How long does initial setup take?**

Under 5 minutes. Install the GitHub App, select repos, add the MCP endpoint to your AI tool. Pharaoh parses the codebase automatically. A 50K LOC TypeScript project maps in about 60 seconds.

**How does the graph stay current?**

A GitHub webhook triggers re-mapping on every push to your default branch. The graph is always within one commit of HEAD. You can also trigger a manual refresh from the dashboard.

**What languages does Pharaoh support?**

TypeScript and Python today, with full support for imports, exports, call chains, decorators, and barrel files. More languages are coming — the parser is built on tree-sitter, which supports 100+ grammars.

**How is Pharaoh different from Sourcegraph?**

Sourcegraph answers "where is this code?" Pharaoh answers "what breaks if I change it?" Sourcegraph is search and navigation. Pharaoh is blast radius, dependency chains, and reachability from production entry points. Different tools, different questions.

**How is Pharaoh different from CodeScene?**

CodeScene analyzes file-level health using git history — complexity trends, code age, developer coupling. Pharaoh analyzes cross-module architecture using a knowledge graph — how modules connect, what depends on what, which functions are reachable from production. CodeScene is behavioral. Pharaoh is structural.
**How is Pharaoh different from SonarQube?**

SonarQube does line-level static analysis — bugs, code smells, security issues. Pharaoh does system-level structural analysis — blast radius, dead code via entry-point tracing, module boundaries. SonarQube looks at lines. Pharaoh looks at architecture.

**Can Pharaoh work across multiple repositories?**

Yes. Map multiple repos and your AI cross-references them automatically — it finds duplicated code, catches shared interfaces drifting between services, and ensures changes in one repo align with implementations in others. No monorepo required. Each team member just connects to the same MCP endpoint.

**Can I use Pharaoh with private repositories?**

Yes. The GitHub App requests read-only access to repository contents. All graph data is tenant-isolated. Pharaoh never writes to your repository.

**What happens if I cancel?**

Your graph data is deleted. No lock-in — Pharaoh reads your code, it doesn't store it. Re-subscribing re-maps from scratch.

**Do I need Pharaoh if I already use Claude Code or Cursor?**

That's exactly who Pharaoh is for. Claude Code and Cursor are powerful, but they read files one at a time with no structural awareness. They don't know what depends on what, which functions are reachable from production, or whether the code they're about to write already exists somewhere else. Pharaoh gives them that context via MCP. Same tools, smarter decisions.

**Is Pharaoh worth the subscription?**

One prevented regression pays for months of Pharaoh. Without architectural context, AI tools introduce duplicate utilities, break downstream callers they can't see, and create exports that never get wired to entry points. Each of those costs hours of debugging. Pharaoh fixes the root cause — the AI's blind spot — for less than the cost of a single production incident.

**Who built Pharaoh? Can I trust a small team with my code?**

Trust shouldn't depend on team size — it should depend on architecture. Pharaoh never stores source code.
Only structural metadata like function names and dependency edges. GitHub access is read-only. Tokens are encrypted with per-tenant keys. The parser is open source for full auditability. And if you cancel, your data is deleted — there's nothing to leak because there's no code to leak.

---

## Documentation {#docs}

Full guides and reference documentation. Also available at https://pharaoh.so/docs

### Getting Started

#### Pharaoh

# Pharaoh

Your AI coding tool reads your codebase one file at a time. Pharaoh gives it the full picture - every function, dependency, module boundary, and entry point - in a single query. Two steps to set up.

**Step 1: Install the GitHub App**

Go to [github.com/apps/pharaoh-so/installations/new](https://github.com/apps/pharaoh-so/installations/new) and install on your org or personal account. Select the repos you want mapped.

Pharaoh requests **read-only** access. It parses your code with tree-sitter, builds a knowledge graph of your architecture, and discards the source code. Mapping starts automatically and takes 1-3 minutes.

**Step 2: Connect your AI tool**

Pharaoh uses [MCP](https://modelcontextprotocol.io/) to plug into your AI coding tool. Three ways to connect:

### Run a command (Claude Code / OpenClaw)

```bash
npx @pharaoh-so/mcp
```

A browser opens for GitHub OAuth. Authorize and you're connected.

### Add the MCP URL (Claude.ai, Cursor, Windsurf, OpenClaw)

Open your tool's MCP settings and add this server:

```
https://mcp.pharaoh.so/sse
```

Complete the GitHub OAuth flow when prompted.

| Tool | Where to find MCP settings |
|------|----------------------------|
| **Claude.ai** | Settings → MCP Servers → Add |
| **Cursor** | Settings → MCP → Add Server |
| **Windsurf** | Settings → MCP Servers → Add |
| **OpenClaw** | `/mcp add` or Settings → Skills → Add MCP |

### Device auth flow (headless, SSH, containers)

No browser on the machine? No problem:

```bash
npx @pharaoh-so/mcp
```

You get a URL and a code.
Open the URL on any device with a browser, enter the code, and authorize with GitHub. The MCP connection runs locally via stdio.

### Detailed setup + troubleshooting

Each tool has a dedicated page with full setup steps and debug flow:

- [Claude Code](setup/claude-code.md)
- [Claude.ai](setup/claude-ai.md)
- [Cursor](setup/cursor.md)
- [Windsurf](setup/windsurf.md)
- [OpenClaw](setup/openclaw.md)
- [Headless / SSH / CI](setup/headless.md)

Need to install the GitHub App first? [Full GitHub App guide](setup/github-app.md)

**Verify**

Ask your AI tool anything about your codebase:

```
What modules does this codebase have?
```

Your tool calls `get_codebase_map` automatically. If you see modules, dependencies, and endpoints, Pharaoh is working.

**What happens next**

Your AI tool now queries Pharaoh's knowledge graph automatically - before writing code, before refactoring, before reviewing PRs. You don't invoke tools manually. The agent decides when architectural context would help and queries the graph on its own.

The graph re-maps on every push to your default branch. No maintenance required.
**Guides**

Get the most out of Pharaoh:

- [Explore your codebase](guides/explore-your-codebase.md) - see the full architecture before changing anything
- [Refactor safely](guides/safe-refactoring.md) - trace every caller before touching shared code
- [Review pull requests](guides/review-pull-requests.md) - structural checks, not just diffs
- [Find dead code](guides/find-dead-code.md) - unreachable functions and duplicate logic
- [Check test coverage](guides/check-test-coverage.md) - find untested high-risk code
- [Map open-source repos](guides/map-open-source.md) - query any public GitHub repo's architecture
- [Set up PR Guard](guides/pr-guard.md) - automated structural checks on every PR

#### Table of contents

# Table of contents

* [Get Started](README.md)

**Setup**

* [GitHub App](setup/github-app.md)
* [Claude Code](setup/claude-code.md)
* [Claude.ai](setup/claude-ai.md)
* [Cursor](setup/cursor.md)
* [Windsurf](setup/windsurf.md)
* [OpenClaw](setup/openclaw.md)
* [Headless / SSH / CI](setup/headless.md)

**Guides**

* [Explore Your Codebase](guides/explore-your-codebase.md)
* [Refactor Safely](guides/safe-refactoring.md)
* [Review Pull Requests](guides/review-pull-requests.md)
* [Find Dead Code](guides/find-dead-code.md)
* [Check Test Coverage](guides/check-test-coverage.md)
* [Map Open-Source Repos](guides/map-open-source.md)
* [Set Up PR Guard](guides/pr-guard.md)
* [Multi-Agent Teams](guides/multi-agent-teams.md)

**Tools Reference**

* [Overview](tools/overview.md)
* [Orient Tools](tools/orient.md)
* [Investigate Tools](tools/investigate.md)
* [Audit Tools](tools/audit.md)
* [Manage Tools](tools/manage.md)

**Concepts**

* [How Pharaoh Works](concepts/how-it-works.md)
* [Security](concepts/security.md)
* [Pricing](concepts/pricing.md)

### Setup Guides

#### Claude.ai

# Claude.ai

Connect Pharaoh to Claude.ai so it sees your codebase architecture in every conversation.

**Prerequisite:** [Install the GitHub App](github-app.md) first.

**Setup**

1. Open the Claude desktop app
2. Go to **Settings > MCP Servers**
3. Click **Add**
4. Enter the URL: `https://mcp.pharaoh.so/sse`
5. Complete the GitHub OAuth flow in the browser window that opens
6. Authorize with the GitHub account that has the Pharaoh app installed

MCP servers require the Claude desktop app. The browser-only version of Claude.ai does not support MCP.

**Verify**

Start a new conversation and ask:

```
What modules does my codebase have? Use get_codebase_map.
```

Claude should return a module breakdown with file counts, function counts, and dependency relationships.

**What happens automatically**

- Claude calls Pharaoh tools before making architectural decisions - checking for duplicate functions, understanding callers, mapping module boundaries
- Every push to your default branch refreshes the graph within minutes
- No config files or per-repo setup needed after the initial install
- Pharaoh tools return structured data in minimal tokens so Claude has room to reason

**Troubleshooting**

**"MCP server fails to connect":** MCP requires the Claude desktop app. If you're using Claude in a browser tab, download the desktop app from [claude.ai/download](https://claude.ai/download).

**Tools appear but return no data:** The GitHub App must be installed and mapping must be complete. Check your Pharaoh dashboard, or wait 1-3 minutes after installing the app for initial mapping to finish.

**OAuth redirect loop or authorization error:** Clear your browser cookies for `mcp.pharaoh.so`, then try again. If that fails, try an incognito window to rule out extension interference.

**Wrong repos showing up:** Pharaoh maps the repos your GitHub App installation covers. To change which repos are included, go to **GitHub > Settings > Applications > Pharaoh > Configure**.

#### Claude Code

# Claude Code

Connect Pharaoh to Claude Code so it sees your codebase architecture in every conversation.

**Prerequisite:** [Install the GitHub App](github-app.md) first.
**Setup**

Run one command:

```bash
npx @pharaoh-so/mcp
```

This handles everything: it authenticates via GitHub, registers Pharaoh as a global MCP server, and installs all development skills. Safe to re-run — it resets and reinstalls fresh.

### Alternative: Direct SSE (no local process)

If you prefer a direct SSE connection instead of the stdio proxy:

```bash
claude mcp add --transport sse --scope user pharaoh https://mcp.pharaoh.so/sse
```

**Verify**

Start a new conversation and ask:

```
What modules does this codebase have?
```

Claude should call `get_codebase_map` and return a module breakdown with file counts, function counts, and dependency relationships.

**What happens automatically**

- Claude calls Pharaoh tools before making architectural decisions - checking for duplicate functions, understanding callers, mapping module boundaries
- Every push to your default branch refreshes the graph within minutes
- No config files or per-repo setup needed after the initial install
- Pharaoh tools return structured data in minimal tokens so Claude has room to reason

**Troubleshooting**

**"No repos found" or empty results:** The GitHub App must be installed on the correct org or account. Go to [github.com/apps/pharaoh-so/installations/new](https://github.com/apps/pharaoh-so/installations/new) and verify the installation covers the repos you expect.

**OAuth fails or redirects to an error:** Authorize with the same GitHub account that has the Pharaoh app installed. If you have multiple GitHub accounts, check which one is signed in.

**Connection drops or tools stop responding:** Re-run setup (it removes stale entries automatically):

```bash
npx @pharaoh-so/mcp
```

**Tools not appearing in conversation:** Start a new conversation after adding the MCP server. Existing conversations don't pick up new servers.

#### Cursor

# Cursor

Connect Pharaoh to Cursor so it sees your codebase architecture when you use AI features.
**Prerequisite:** [Install the GitHub App](github-app.md) first.

**Setup**

### Option A: Cursor settings UI

1. Open Cursor
2. Go to **Settings > MCP**
3. Click **Add Server**
4. Enter the URL: `https://mcp.pharaoh.so/sse`
5. Complete the GitHub OAuth flow in the browser window that opens

### Option B: Config file

Add to `.cursor/mcp.json` in your project root:

```json
{
  "mcpServers": {
    "pharaoh": {
      "url": "https://mcp.pharaoh.so/sse"
    }
  }
}
```

Restart Cursor after saving. The OAuth flow runs on first connection.

**Verify**

Open Cursor's AI chat and ask:

```
What modules does this codebase have?
```

Cursor should call `get_codebase_map` and return a module breakdown with file counts, function counts, and dependency relationships.

**What happens automatically**

- Cursor's AI calls Pharaoh tools before making architectural decisions - checking for duplicate functions, understanding callers, mapping module boundaries
- Every push to your default branch refreshes the graph within minutes
- No config files or per-repo setup needed after the initial install
- Pharaoh tools return structured data in minimal tokens so the AI has room to reason

**Troubleshooting**

**"MCP not available" or no MCP option in settings:** Update Cursor to the latest version. MCP support requires a recent release.

**Auth fails or OAuth window doesn't open:** Authorize with the same GitHub account that has the Pharaoh app installed. If you have multiple GitHub accounts, check which one is signed in to your browser.

**Tools not showing in AI chat:** Restart Cursor after adding the MCP server. If using the config file approach, verify the JSON is valid and the file is in `.cursor/mcp.json` at the project root.

**"No repos found" or empty results:** The GitHub App must be installed on the correct org or account. Go to [github.com/apps/pharaoh-so/installations/new](https://github.com/apps/pharaoh-so/installations/new) and verify the installation.

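
If you suspect a malformed config, a quick JSON parse catches syntax errors before you restart Cursor. A minimal sketch - the Option B config is inlined here for illustration; against your real file you would call `json.load(open(".cursor/mcp.json"))` instead:

```python
import json

# The Option B config, inlined for illustration. json.loads raises
# json.JSONDecodeError with a line and column on any syntax error,
# which is faster to act on than a silent failure inside Cursor.
config_text = '''
{
  "mcpServers": {
    "pharaoh": { "url": "https://mcp.pharaoh.so/sse" }
  }
}
'''

config = json.loads(config_text)
url = config["mcpServers"]["pharaoh"]["url"]
assert url.startswith("https://"), "MCP server URL should be HTTPS"
print("config OK:", url)  # prints: config OK: https://mcp.pharaoh.so/sse
```

A trailing comma or missing brace is the most common reason the server never appears in the MCP list.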
#### Install the GitHub App

Pharaoh needs read-only access to your repos to build the knowledge graph. This is the first step before connecting any AI tool.

**Install**

1. Go to [github.com/apps/pharaoh-so/installations/new](https://github.com/apps/pharaoh-so/installations/new)
2. Click **Install**
3. Choose your organization or personal account
4. Select **All repositories** or pick specific repos
5. Click **Install**

Pharaoh requests two permissions:

- **Repository contents** (read) - to parse code structure
- **Repository metadata** (read) - to detect pushes and branches

It cannot write to your repos, push commits, or modify code.

**What happens after install**

Pharaoh clones each selected repo using read-only installation tokens, parses the code with [tree-sitter](https://tree-sitter.github.io/), builds a Neo4j knowledge graph of the architecture, and discards the source code. This takes 1-3 minutes per repo depending on size.

The graph contains function names, file paths, dependencies, complexity scores, and entry points. No source code is stored.

The graph re-maps automatically on every push to your default branch via webhooks. No maintenance required.

**Managing repos**

To add or remove repos after install:

1. Go to **GitHub > Settings > Applications > Pharaoh**
2. Click **Configure**
3. Change repository access

Changes take effect within minutes. Added repos are mapped. Removed repos are purged from the graph.

**Next step**

Connect your AI tool: [Claude Code](claude-code.md) | [Claude.ai](claude-ai.md) | [Cursor](cursor.md) | [Windsurf](windsurf.md) | [OpenClaw](openclaw.md) | [Headless](headless.md)

#### Headless / SSH / CI

Connect Pharaoh in environments without a browser: VPS, containers, SSH sessions, CI pipelines.

**Prerequisite:** [Install the GitHub App](github-app.md) first.

**Setup**

Run the device flow CLI:

```bash
npx @pharaoh-so/mcp
```

1. The CLI prints a URL and a one-time code
2. Open the URL on any device with a browser
3. Enter the code and authorize with GitHub
4. The MCP connection starts via stdio

Requires Node.js 18+.

**Configure with your AI tool**

For any tool that supports stdio MCP transport, add this to its MCP config:

```json
{
  "mcpServers": {
    "pharaoh": {
      "command": "npx",
      "args": ["@pharaoh-so/mcp"]
    }
  }
}
```

For Claude Code specifically:

```bash
npx @pharaoh-so/mcp
```

The device flow runs on first connection. After authorization, the token is cached locally.

**Verify**

Ask your AI tool:

```
What modules does this codebase have?
```

It should call `get_codebase_map` and return a module breakdown with file counts, function counts, and dependency relationships.

**What happens automatically**

- Your AI tool calls Pharaoh tools before making architectural decisions - checking for duplicate functions, understanding callers, mapping module boundaries
- Every push to your default branch refreshes the graph within minutes
- No config files or per-repo setup needed after the initial install
- Pharaoh tools return structured data in minimal tokens so the AI has room to reason

**Troubleshooting**

**`npx` fails or command not found:** Ensure Node.js 18+ is installed. Run `node --version` to check. Install from [nodejs.org](https://nodejs.org/) or via your package manager.

**Device code expired:** Codes expire after 15 minutes. Run `npx @pharaoh-so/mcp` again to get a new code.

**Connection works once then fails on restart:** The cached token may have expired. Re-run the device flow to get a fresh token.

**"No repos found" after authorization:** The GitHub App must be installed on the correct org or account. Go to [github.com/apps/pharaoh-so/installations/new](https://github.com/apps/pharaoh-so/installations/new) and verify the installation.

**Firewall blocks outbound connections:** Pharaoh requires HTTPS access to `mcp.pharaoh.so` (port 443) and `github.com` (port 443) for the OAuth flow.

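
For the curious: the stdio transport above carries plain JSON-RPC 2.0 messages over stdin/stdout, which is worth knowing when debugging a headless setup. A minimal sketch of what a client writes, assuming the standard MCP message shape - the protocol version string and client name here are illustrative, not Pharaoh-specific:

```python
import json

def request(id_, method, params):
    # One JSON-RPC 2.0 request serialized as a single line - the framing
    # an MCP stdio client writes to the server process's stdin.
    return json.dumps({"jsonrpc": "2.0", "id": id_, "method": method, "params": params})

# First message after the process starts: the client introduces itself.
init = request(1, "initialize", {
    "protocolVersion": "2024-11-05",   # illustrative version string
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},
})

# A later message: invoke a Pharaoh tool by name.
call = request(2, "tools/call", {"name": "get_codebase_map", "arguments": {}})

print(init)
print(call)
```

Your AI tool builds and parses these messages itself; nothing here needs to be written by hand.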
#### OpenClaw

Connect Pharaoh to OpenClaw so your AI agent sees your codebase architecture. Works with desktop instances and headless bots on Signal, Telegram, Discord, and WhatsApp.

**Prerequisite:** [Install the GitHub App](github-app.md) first.

**Setup**

### Option A: Command

```
/mcp add pharaoh https://mcp.pharaoh.so/sse
```

Complete the GitHub OAuth flow when prompted.

### Option B: Config file (browser-based)

Add to your OpenClaw MCP configuration:

```json
{
  "mcpServers": {
    "pharaoh": {
      "url": "https://mcp.pharaoh.so/sse"
    }
  }
}
```

The OAuth flow runs on first connection.

### Option C: Headless bots (Signal, Telegram, Discord, WhatsApp)

Bots on messaging platforms have no browser. Use the stdio transport with device flow auth:

```json
{
  "mcpServers": {
    "pharaoh": {
      "command": "npx",
      "args": ["@pharaoh-so/mcp"]
    }
  }
}
```

On first run, the CLI prints a URL and a code. Open the URL on any device with a browser, enter the code, and authorize with GitHub. The bot stores the token and reconnects automatically after that.

**Verify**

Ask your OpenClaw agent:

```
What modules does my codebase have?
```

The agent should call `get_codebase_map` and return a module breakdown with file counts, function counts, and dependency relationships.

**What happens automatically**

- The agent calls Pharaoh tools before making architectural decisions - checking for duplicate functions, understanding callers, mapping module boundaries
- Every push to your default branch refreshes the graph within minutes
- No config files or per-repo setup needed after the initial install
- Pharaoh tools return structured data in minimal tokens so the agent has room to reason

**Troubleshooting**

**"Skill not loading" or MCP tools don't appear:** Check that your OpenClaw version supports MCP. Update to the latest release if tools aren't showing.

**Auth fails in headless mode:** Use the stdio transport (Option C) with device flow.
The SSE transport (Options A and B) requires a browser for OAuth.

**"No repos found" or empty results:** The GitHub App must be installed on the correct org or account. Go to [github.com/apps/pharaoh-so/installations/new](https://github.com/apps/pharaoh-so/installations/new) and verify the installation.

**Device code expired:** Device codes expire after 15 minutes. Run the bot again to get a new code.

**Bot authorized but returns no data:** Wait 1-3 minutes after installing the GitHub App for initial mapping to complete. Large repos take longer.

#### Windsurf

Connect Pharaoh to Windsurf so it sees your codebase architecture when you use AI features.

**Prerequisite:** [Install the GitHub App](github-app.md) first.

**Setup**

### Option A: Windsurf settings UI

1. Open Windsurf
2. Go to **Settings > MCP Servers**
3. Click **Add**
4. Enter the URL: `https://mcp.pharaoh.so/sse`
5. Complete the GitHub OAuth flow in the browser window that opens

### Option B: Config file

Add to your Windsurf MCP configuration file (`~/.codeium/windsurf/mcp_config.json`):

```json
{
  "mcpServers": {
    "pharaoh": {
      "url": "https://mcp.pharaoh.so/sse"
    }
  }
}
```

Restart Windsurf after saving. The OAuth flow runs on first connection.

**Verify**

Open Windsurf's AI chat (Cascade) and ask:

```
What modules does this codebase have?
```

Cascade should call `get_codebase_map` and return a module breakdown with file counts, function counts, and dependency relationships.

**What happens automatically**

- Cascade calls Pharaoh tools before making architectural decisions - checking for duplicate functions, understanding callers, mapping module boundaries
- Every push to your default branch refreshes the graph within minutes
- No config files or per-repo setup needed after the initial install
- Pharaoh tools return structured data in minimal tokens so the AI has room to reason

**Troubleshooting**

**MCP option not available in settings:** Update Windsurf to the latest version.
MCP support requires a recent release.

**Auth fails or OAuth window doesn't open:** Authorize with the same GitHub account that has the Pharaoh app installed. If you have multiple GitHub accounts, check which one is signed in to your browser.

**Tools not showing in Cascade:** Restart Windsurf after adding the MCP server. Check that the server shows a connected status in **Settings > MCP Servers**.

**"No repos found" or empty results:** The GitHub App must be installed on the correct org or account. Go to [github.com/apps/pharaoh-so/installations/new](https://github.com/apps/pharaoh-so/installations/new) and verify the installation.

**Config file location varies by OS:** On macOS and Linux: `~/.codeium/windsurf/mcp_config.json`. On Windows: `%USERPROFILE%\.codeium\windsurf\mcp_config.json`.

### How-To Guides

#### Guides

Step-by-step workflows for getting the most out of Pharaoh. Each guide shows you what to ask your AI tool, what results to expect, and how to act on them.

- [Explore Your Codebase](explore-your-codebase.md) - see the full architecture before changing anything
- [Refactor Safely](safe-refactoring.md) - trace every caller before touching shared code
- [Review Pull Requests](review-pull-requests.md) - structural checks, not just diffs
- [Find Dead Code](find-dead-code.md) - unreachable functions and duplicate logic
- [Check Test Coverage](check-test-coverage.md) - find untested high-risk code
- [Map Open-Source Repos](map-open-source.md) - query any public GitHub repo's architecture
- [Set Up PR Guard](pr-guard.md) - automated structural checks on every PR
- [Multi-Agent Teams](multi-agent-teams.md) - adversarial review and specialized agent playbooks

#### Check Test Coverage

Line coverage is a vanity metric. A function can be "covered" by a test that never checks its output. What matters: which high-complexity, high-exposure functions have no tests at all?
Pharaoh answers that question per module, ranked by risk.

**The workflow**

### 1. Find the coverage gaps

Ask your AI tool:

```
Which modules have the worst test coverage?
```

This triggers `get_test_coverage`. You get back per-module summaries:

- Files with associated test files
- Files without any test coverage
- High-complexity functions that lack tests - the most dangerous gaps

### 2. Rank gaps by risk

Ask your AI tool:

```
What's the regression risk in the payment module?
```

This triggers `get_regression_risk`. Each function gets scored by:

- **Complexity** - cyclomatic complexity (branches, loops, conditionals)
- **Caller count** - how many other functions depend on it
- **Churn** - how frequently it's been modified
- **Exposure** - whether it's connected to HTTP endpoints, cron jobs, or CLI commands

The intersection of "high regression risk" and "no tests" is where production bugs live.

### 3. Trace downstream impact

Ask your AI tool:

```
What breaks if the checkout flow fails?
```

This triggers `get_blast_radius` on the checkout module. Combine the result with coverage data: if a function has 12 downstream callers and no tests, that's your top priority.

**What to look for**

- **High complexity + no tests** - a function with cyclomatic complexity above 10 and zero test files is a ticking time bomb. Every branch in that function is an untested path to production.
- **High regression risk + no tests** - the intersection of "likely to break" and "no safety net." These functions should be at the top of every test-writing sprint.
- **Modules with 0% coverage** - often utility modules or internal services that were "too simple to test." They stay simple until they don't, and by then nobody remembers what they're supposed to do.
- **High-exposure functions** - functions connected to multiple HTTP endpoints or called by many modules. A bug here affects the most users.

**Tips**

- Prioritize test writing by regression risk score, not by module name alphabetically. A risk-ranked backlog ensures you write the most valuable tests first.
- High-complexity functions usually need integration tests, not unit tests. Mocking the 8 dependencies of a complex function hides the interaction bugs that matter most. Test the real code path.
- Check coverage after every PR. Catching test gaps at review time costs minutes. Catching them after a production incident costs hours or days.

#### Explore Your Codebase

Your AI reads files one at a time. Before it touches anything, give it the full architectural picture - every module, dependency, entry point, and hot file - in a single query.

**The workflow**

### 1. Get the full map

Ask your AI tool:

```
What does this codebase look like?
```

This triggers `get_codebase_map`. You get back:

- Every module with file count, LOC, and exported function count
- The dependency graph between modules (who imports whom)
- Entry points: HTTP endpoints, cron jobs, CLI commands
- Hot files: most modified in the last 90 days

This is the equivalent of looking at a city map before walking the streets. Start here.

### 2. Dive into a specific module

Ask your AI tool:

```
Show me the auth module before I change anything.
```

This triggers `get_module_context`. You get back the complete module profile:

- All files in the module
- Exported functions with their signatures
- Internal dependencies (what this module imports)
- External callers (what other modules import from this one)
- DB tables read or written
- HTTP endpoints served
- Environment variables used

### 3. Search before writing

Ask your AI tool:

```
Is there already a function that validates email addresses?
```

This triggers `search_functions`. It searches every function in the codebase by name, signature, and module - finding existing code before you duplicate it.

### 4. Check existing UI components

Ask your AI tool:

```
What design system components exist?
```

This triggers `get_design_system`.
Returns existing UI components, design tokens, and patterns. Prevents creating a second `