PM Agent: Rethinking Project Management from the Ground Up

The Problem

Traditional project management tooling is broken in a specific, predictable way. A senior stakeholder sets an artificial deadline, a PM builds a work breakdown structure, work gets assigned, and then the whole system devolves into a weekly ritual of the PM interrogating the team, manually dragging date sliders in a Gantt chart, and writing status reports from information that’s already stale.

The PM ends up playing two roles that shouldn’t be the same job: coordinator (unblocking dependencies, the genuinely valuable work) and nag (chasing updates, asking why things are late, compiling status manually). Most project management tooling optimizes for the nag role, which is exactly backwards.

This isn’t a software development problem. It’s a project management problem. It looks the same whether you’re building an enterprise application, constructing a commercial building, managing a product launch, or running a legal due diligence process. Any work that can be broken into tasks with owners, dependencies, and due dates has this problem. The tools and the dysfunction travel together.

The Two-System Problem

Underneath this dysfunction sits a structural problem that rarely gets named directly: most organizations are running two parallel systems that are never truly in sync.

The PM lives in a scheduling tool — Microsoft Project, Smartsheet, Wrike, or similar. They maintain a schedule there: task names, start dates, end dates, dependencies, resource assignments. Meanwhile, the people doing the work live somewhere else entirely — a ticketing system, a punch list, a field management app, a shared spreadsheet. Both systems are supposed to represent the same project. In practice they diverge almost immediately and stay diverged.

A worker creates a task with a slightly different name than the PM used in the schedule. The PM updates an end date in the Gantt but forgets to update the work tracking system. A task gets split into two subtasks but the schedule still shows one item. Someone renames something in the field system but the schedule still has the old name. Over time the two systems describe two different versions of the project, and the PM spends a significant portion of their time manually reconciling them — work that produces no value and exists entirely because of the architecture.

The deeper problem is that neither system is authoritative. When they conflict, there’s no rule for which one wins. The PM trusts the schedule. The workers trust their task list. Stakeholders get reports synthesized from the schedule. The team operates from the task list. Everyone is working from a different picture of the same project.

The Silent Slip Problem

A further structural failure: when projects slip — and they always slip — there’s no reliable mechanism to capture why. Dates get moved silently in both systems, with no required explanation and no audit trail. By the time a project is six months late, the reasons are lost, disputed, or reconstructed from memory. In contractor/client relationships this becomes a legal problem. In any organization it makes learning from failure nearly impossible. Death by a thousand cuts, with no record of the cuts.

The Core Idea: Flip the Model

Instead of a top-down schedule that the team updates to satisfy a PM, the proposal is bottom-up: the task tracking system is the source of truth, and project status is computed from it — not maintained manually. The separate scheduling tool goes away entirely. There is one system, and it’s the one the people doing the work already use.

The key inversion:

The PM stops being responsible for the accuracy of the plan
The people doing the work are responsible for keeping their tasks current
The system detects slip automatically and surfaces it to the PM as signal
The PM’s job becomes purely coordination and unblocking — the work that actually requires a human

This Applies Everywhere

The framing so far has leaned toward software development because that’s where ticketing systems are most mature. But the underlying problem — two systems, silent slip, no audit trail — exists in any field where projects have tasks, owners, dependencies, and deadlines.

Consider a residential or commercial construction project. The general contractor (GC) has a schedule in Microsoft Project. Subcontractors have their own systems: some use field management apps, some use spreadsheets, some use paper. The foundation crew can’t start until the architect’s plans are approved. The framing can’t start until the foundation is inspected. The electricians and plumbers have to rough in before drywall. The inspector has to sign off before closing.

Every one of those is a task with a dependency and a due date, and every one of them is subject to the same slip patterns: the permit took longer than expected, the ground was too frozen to dig, the lumber delivery was delayed, the inspector had a two-week backlog.

Under the current model, the GC is manually reconciling their schedule against reality and calling subcontractors to find out where things stand. Under this model, tasks are tracked in a single system, status is computed from the task state, and when the frost delay pushes the foundation pour by three weeks, a slip event is created, the reason is recorded, and the downstream impact on framing and mechanical rough-in cascades forward automatically.
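The cascade in the construction example can be sketched as a forward pass over the dependency graph. This is a minimal illustration under stated assumptions, not the prototype's code: the task names, the durations in days, and the schedule helper are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical durations (days) and dependencies for a small build sequence.
DURATIONS = {"foundation": 10, "framing": 15, "rough_in": 12, "drywall": 8}
DEPENDS_ON = {
    "framing": ["foundation"],
    "rough_in": ["framing"],
    "drywall": ["rough_in"],
}


def schedule(durations, depends_on, start_delays=None):
    """Return {task: (start_day, end_day)}, measured in days from kickoff.

    A task starts once its last predecessor has finished, plus any delay
    recorded against the task itself (for example, frozen ground).
    """
    start_delays = start_delays or {}
    graph = {task: depends_on.get(task, []) for task in durations}
    spans = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        ready = max((spans[p][1] for p in graph[task]), default=0)
        start = ready + start_delays.get(task, 0)
        spans[task] = (start, start + durations[task])
    return spans


baseline = schedule(DURATIONS, DEPENDS_ON)
current = schedule(DURATIONS, DEPENDS_ON, start_delays={"foundation": 21})

# Days each task's end date moved relative to the baseline.
cascade = {task: current[task][1] - baseline[task][1] for task in DURATIONS}
print(cascade)
```

In this strictly sequential chain the three-week frost delay reaches every downstream task in full. With parallel branches, a task only inherits the delay if the delayed predecessor is the last one to finish.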

The same logic applies to event production (venue booking, vendor contracts, rehearsals, logistics all have dependencies and hard deadlines), legal and financial transactions (due diligence, regulatory filings, closing conditions), product launches (design, manufacturing, marketing, distribution all running in parallel with dependencies), film and video production (pre-production, casting, locations, shoot schedule, post), and any other domain where work has structure.

The template library is where domain specificity lives. A construction project uses templates built around permit sequences, inspection gates, and trade sequencing. A software project uses templates built around development lifecycle phases. The engine underneath — single source of truth, computed snapshots, immutable baseline, slip event log — is identical. The vocabulary of the slip reason categories is the same. The report format is the same. Only the templates change.

Design Decisions

Snapshots Are Computed, Not Maintained

A project snapshot is generated fresh on demand by querying the ticketing system (Jira or equivalent), comparing against a frozen baseline, and assembling a structured JSON document. No one drags a slider. The snapshot reflects reality as tickets actually stand: if a ticket hasn’t been updated, the system shows the task as behind. Accountability shifts to where it belongs, with the people who own the tasks.
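A rough sketch of how a task's status might be derived rather than maintained. The field names (status, due), the state labels, and the rule that an open ticket past its due date counts as behind are assumptions for illustration. A real Jira pull would need its own field mapping.

```python
from datetime import date


def task_status(baseline_end, ticket, today):
    """Derive one task's snapshot entry from its ticket and baseline end date."""
    variance_days = (ticket["due"] - baseline_end).days
    if ticket["status"] == "Done":
        state = "done"
    elif today > ticket["due"]:
        state = "behind"      # still open past its own due date
    elif variance_days > 0:
        state = "slipped"     # due date was moved past the baseline
    else:
        state = "on_track"
    return {"state": state, "variance_days": variance_days}


# A ticket left untouched after its due date is reported as behind.
stale_ticket = {"status": "In Progress", "due": date(2025, 3, 14)}
entry = task_status(date(2025, 3, 14), stale_ticket, today=date(2025, 3, 20))
print(entry)
```

Nothing in this function is edited by hand. The only way to change the result is to change the ticket.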

The Baseline Is Immutable

At project kickoff, a baseline is written and frozen. It captures every task, its original start and end dates, its estimated duration, and who owns it. This document never changes. It’s the contract. All future snapshots are compared against it to compute variance. This is the document you produce in a dispute when someone asks why the project is late.
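One way to approximate "written and frozen" in Python is a frozen dataclass plus an exclusive-create file write. This is a sketch, and the BaselineTask fields and the write_baseline helper are hypothetical names. Opening with mode "x" only guards against accidental overwrites by this code. It does not stop someone from editing the file by other means.

```python
import json
import os
import tempfile
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class BaselineTask:
    key: str
    owner: str
    start: str          # ISO date, e.g. "2025-03-03"
    end: str
    estimate_days: int


def write_baseline(path, tasks):
    """Write the baseline once; mode "x" raises FileExistsError if the file exists."""
    with open(path, "x", encoding="utf-8") as fh:
        json.dump([asdict(task) for task in tasks], fh, indent=2)


task = BaselineTask("AUTH-1", "dana", "2025-03-03", "2025-03-14", 10)
path = os.path.join(tempfile.mkdtemp(), "baseline.json")
write_baseline(path, [task])

# A second write to the same path is refused.
try:
    write_baseline(path, [task])
    overwritten = True
except FileExistsError:
    overwritten = False

# In-memory records cannot be changed either.
try:
    task.end = "2025-04-01"
    mutated = True
except FrozenInstanceError:
    mutated = False

print(overwritten, mutated)
```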

Slip Events Are First-Class Objects

When a task’s due date moves past its baseline date, the system automatically creates a slip event: a structured record containing the task, the original date, the new date, and the number of days slipped. Slip events start in an unresolved state and cannot be silently closed. The task owner must acknowledge the event with a reason category and a narrative explanation before it resolves.

Reason categories (controlled vocabulary):

estimation_error — it took longer than expected, no external factor
external_dependency — waiting on another team or vendor
scope_change — the work turned out to be bigger than understood
resource_diversion — developer pulled onto something else
technical_blocker — discovered a problem requiring rework or research
environment_tooling — infrastructure not ready, CI broken, etc.
client_delay — client did not deliver something on time
requirements_change — requirements changed mid-task
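The slip event and its controlled vocabulary could be modeled as below. The category values come from the list above. The class names, field names, and the acknowledge method are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class SlipReason(Enum):
    ESTIMATION_ERROR = "estimation_error"
    EXTERNAL_DEPENDENCY = "external_dependency"
    SCOPE_CHANGE = "scope_change"
    RESOURCE_DIVERSION = "resource_diversion"
    TECHNICAL_BLOCKER = "technical_blocker"
    ENVIRONMENT_TOOLING = "environment_tooling"
    CLIENT_DELAY = "client_delay"
    REQUIREMENTS_CHANGE = "requirements_change"


@dataclass
class SlipEvent:
    task: str
    original_date: date
    new_date: date
    reason: Optional[SlipReason] = None   # None means unresolved
    narrative: str = ""

    @property
    def days_slipped(self):
        return (self.new_date - self.original_date).days

    @property
    def resolved(self):
        return self.reason is not None

    def acknowledge(self, reason, narrative):
        """Resolve the event; a valid category and an explanation are both required."""
        if not narrative.strip():
            raise ValueError("a narrative explanation is required")
        self.reason = SlipReason(reason)   # raises ValueError if not in the vocabulary
        self.narrative = narrative.strip()


def detect_slip(task, baseline_end, current_end):
    """Return a new unresolved SlipEvent if the due date moved past baseline."""
    if current_end > baseline_end:
        return SlipEvent(task, baseline_end, current_end)
    return None


event = detect_slip("AUTH-7", date(2025, 3, 14), date(2025, 3, 21))
event.acknowledge("external_dependency", "Waiting on vendor sandbox credentials.")
print(event.days_slipped, event.reason.value, event.resolved)
```

Because the only path to a resolved state runs through acknowledge, an event cannot be closed without a category from the vocabulary and a non-empty explanation.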

The controlled vocabulary is what makes end-of-project analysis possible. Instead of a he-said/she-said argument about why a project ran long, you have a ledger: of 90 days of total slip, 45 were external dependencies, 30 were scope changes, 15 were estimation errors. That’s a different conversation, and it’s automatically generated as a byproduct of normal project operation.
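The ledger arithmetic is a straightforward group-and-sum over resolved slip events. A minimal sketch follows, using made-up events chosen to add up to the 90-day example. The dictionary keys are assumptions.

```python
from collections import defaultdict


def slip_ledger(events):
    """Sum slipped days per reason category, largest first."""
    totals = defaultdict(int)
    for event in events:
        totals[event["reason"]] += event["days_slipped"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical resolved events that total 90 days of slip.
events = [
    {"task": "T1", "reason": "external_dependency", "days_slipped": 30},
    {"task": "T2", "reason": "scope_change", "days_slipped": 30},
    {"task": "T3", "reason": "external_dependency", "days_slipped": 15},
    {"task": "T4", "reason": "estimation_error", "days_slipped": 15},
]

ledger = slip_ledger(events)
for reason, days in ledger.items():
    print(f"{reason:<22}{days:>4} days")
```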

The Snapshot Is Denormalized for LLM Consumption

The snapshot JSON is the primary document the LLM reads to generate reports, developer digests, and status summaries. Because the LLM works best with everything in one place — minimizing tool calls and context reconstruction — the snapshot is intentionally denormalized. Each task in the snapshot contains its baseline dates, current state, and all slip events inline. The underlying source files (baseline, slip event log) remain normalized and authoritative. The snapshot is a derived read model optimized for the query that runs most often: tell me about this project.
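To make the denormalization concrete, here is a sketch of the join that produces the read model. The exact schema is not specified in this document, so every key below (baseline, current, slip_events, and so on) is an assumption.

```python
import json


def build_snapshot(project, baseline, tickets, slip_events):
    """Join baseline, ticket state, and slip events into one document."""
    events_by_task = {}
    for event in slip_events:
        events_by_task.setdefault(event["task"], []).append(event)

    tasks = []
    for base in baseline:
        ticket = tickets[base["key"]]
        tasks.append({
            "key": base["key"],
            "owner": base["owner"],
            "baseline": {"start": base["start"], "end": base["end"]},
            "current": {"status": ticket["status"], "due": ticket["due"]},
            # Slip events are copied inline so a reader needs no second lookup.
            "slip_events": events_by_task.get(base["key"], []),
        })
    return {"project": project, "tasks": tasks}


baseline = [{"key": "AUTH-1", "owner": "dana", "start": "2025-03-03", "end": "2025-03-14"}]
tickets = {"AUTH-1": {"status": "In Progress", "due": "2025-03-21"}}
slip_events = [{"task": "AUTH-1", "days_slipped": 7, "reason": None}]

snapshot = build_snapshot("AUTH", baseline, tickets, slip_events)
print(json.dumps(snapshot, indent=2))
```

The three inputs stay normalized and authoritative. The snapshot can be discarded and rebuilt at any time.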

The Report Has Two Distinct Components

A generated status report separates two cognitive purposes:

The dependency graph is a directed graph — not a Gantt chart. It shows what depends on what, which tasks are on the critical path, and current status via color coding. It fits on a page because it’s laid out by relationship, not by timeline. This is the picture that communicates the structural shape of the project.
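A dependency graph like this can be described in Graphviz's DOT language and rendered by the dot command-line tool. The sketch below only builds the DOT text. The status names, the color choices, and the to_dot helper are hypothetical, and critical-path membership is passed in rather than computed.

```python
STATUS_COLORS = {"done": "palegreen", "on_track": "lightblue", "behind": "tomato"}


def to_dot(tasks, depends_on, critical_path=()):
    """Render tasks as Graphviz DOT text, colored by status.

    Tasks on the critical path get a heavier outline.
    """
    critical = set(critical_path)
    lines = ["digraph project {", "  rankdir=LR;", "  node [shape=box, style=filled];"]
    for name, status in tasks.items():
        width = 3 if name in critical else 1
        color = STATUS_COLORS.get(status, "white")
        lines.append(f'  "{name}" [fillcolor="{color}", penwidth={width}];')
    for task, predecessors in depends_on.items():
        for pred in predecessors:
            lines.append(f'  "{pred}" -> "{task}";')   # edge points at the dependent
    lines.append("}")
    return "\n".join(lines)


tasks = {"foundation": "done", "framing": "behind", "rough_in": "on_track"}
depends_on = {"framing": ["foundation"], "rough_in": ["framing"]}
dot_text = to_dot(tasks, depends_on, critical_path=["foundation", "framing"])
print(dot_text)
```

Saving the output to a file such as project.dot and running dot -Tsvg project.dot -o project.svg produces the picture, assuming Graphviz is installed.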

The slip ledger table is the accountability document. It has three column groups: baseline (frozen), current state (from tickets), and variance (slip days + reason summaries). Each row is a task. Unresolved slip events are flagged explicitly. This table is the paper trail.
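A sketch of how ledger rows might be assembled before rendering to HTML or PDF. The input shape and the ledger_rows helper are assumptions. Variance is computed from the two dates rather than summed from events, so a task that slips twice is not double counted.

```python
from datetime import date


def ledger_rows(tasks):
    """Build one slip-ledger row per task: baseline, current state, variance."""
    rows = []
    for task in tasks:
        events = task["slip_events"]
        reasons = sorted({e["reason"] for e in events if e["reason"]})
        rows.append({
            "task": task["key"],
            "baseline_end": task["baseline_end"],
            "current_end": task["current_end"],
            "slip_days": (task["current_end"] - task["baseline_end"]).days,
            "reasons": ", ".join(reasons),
            "unresolved": sum(1 for e in events if e["reason"] is None),
        })
    return rows


tasks = [
    {"key": "AUTH-1", "baseline_end": date(2025, 3, 14), "current_end": date(2025, 3, 21),
     "slip_events": [{"reason": "scope_change"}, {"reason": None}]},
    {"key": "AUTH-2", "baseline_end": date(2025, 3, 28), "current_end": date(2025, 3, 28),
     "slip_events": []},
]

rows = ledger_rows(tasks)
for row in rows:
    flag = "  UNRESOLVED" if row["unresolved"] else ""
    print(f'{row["task"]:<8}{row["baseline_end"]}  {row["current_end"]}  '
          f'{row["slip_days"]:>3}d  {row["reasons"]}{flag}')
```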

Templates Drive Kickoff

Project kickoff uses a template library — structured definitions of task sequences, dependencies, and estimated durations for known project types. A software team builds templates for REST service integrations, database migrations, and feature releases. A construction firm builds templates for foundation sequences, framing packages, and mechanical rough-ins. A legal team builds templates for due diligence checklists and closing workflows.

A PM selects a template, adjusts for project-specific factors (team experience, resource availability, known constraints), and the system generates all tasks in the tracking system and writes the frozen baseline. Templates accumulate calibration history over time: as projects complete, actual durations are compared to template estimates and the template is updated. This is how organizational knowledge about how long things actually take gets captured and propagated forward — whether that’s “REST API integrations take us 30% longer than the template assumes” or “permit approval in this county averages 6 weeks, not 3.”
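The calibration step is not specified in detail here. One simple reading is to scale each template estimate by the average ratio of actual to estimated duration across completed projects. The calibrate helper and the sample numbers below are illustrative assumptions.

```python
from statistics import mean


def calibrate(template_days, actual_days):
    """Scale a template estimate by the average actual/estimate ratio.

    Returns (new_estimate_days, ratio). With no history the estimate is unchanged.
    """
    if not actual_days:
        return float(template_days), 1.0
    ratio = mean(actual / template_days for actual in actual_days)
    return round(template_days * ratio, 1), round(ratio, 2)


# Three completed projects took 12, 13, and 14 days against a 10-day template.
new_estimate, ratio = calibrate(10, [12, 13, 14])
print(new_estimate, ratio)
```

A ratio of 1.3 corresponds to the "30% longer than the template assumes" case. A production version would likely weight recent projects more heavily or require a minimum sample size before updating a template.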

Multi-Agent Architecture

The system is decomposed into specialized agents that communicate through the snapshot JSON schema. Because the schema is the contract, each agent can be developed and tested independently.

Jira Sync Agent — pulls current ticket state, produces a raw delta against the last snapshot. Deterministic, no LLM.
Snapshot Agent — assembles the full denormalized snapshot from baseline + Jira pull + slip events. Detects new slip events. Deterministic, no LLM.
Narrative Agent — takes a snapshot and produces human-readable output (status reports, executive emails, developer digests, postmortems). LLM-powered.
Report Agent — renders the dependency graph (graphviz) and slip ledger table (HTML/PDF). Deterministic.
WBS Generator — runs once at kickoff, applies template + kickoff parameters, creates tickets, writes baseline.

The LLM is used exclusively where judgment and language matter. Arithmetic, date calculations, and data assembly are handled by deterministic code.
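If the snapshot schema is the contract between agents, each agent can check its input before doing any work. The sketch below is a hand-rolled check against an assumed set of required fields. The real schema is not given in this document, and a schema-validation library could replace this.

```python
# Assumed required fields for each task entry in a snapshot.
REQUIRED_TASK_FIELDS = {"key": str, "baseline": dict, "current": dict, "slip_events": list}


def validate_snapshot(snapshot):
    """Return a list of contract violations; an empty list means the snapshot passes."""
    errors = []
    if not isinstance(snapshot.get("project"), str):
        errors.append("project: expected str")
    tasks = snapshot.get("tasks")
    if not isinstance(tasks, list):
        return errors + ["tasks: expected list"]
    for i, task in enumerate(tasks):
        if not isinstance(task, dict):
            errors.append(f"tasks[{i}]: expected dict")
            continue
        for field, expected in REQUIRED_TASK_FIELDS.items():
            if not isinstance(task.get(field), expected):
                errors.append(f"tasks[{i}].{field}: expected {expected.__name__}")
    return errors


good = {"project": "AUTH", "tasks": [
    {"key": "AUTH-1", "baseline": {}, "current": {}, "slip_events": []},
]}
bad = {"project": "AUTH", "tasks": [{"key": "AUTH-1", "baseline": {}}]}

print(validate_snapshot(good))
print(validate_snapshot(bad))
```

A downstream agent such as the Narrative Agent could refuse to run on a snapshot that fails this check. That keeps malformed data from reaching the LLM.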

Local-First, No Web Server

The system runs locally via Claude Desktop, with MCP (Model Context Protocol) servers handling integration with the filesystem and Jira. The user interface is the Claude Desktop chat window. Commands like /snapshot AUTH or “what’s the status of the AUTH project?” invoke the appropriate agent. No browser, no deployment, no web server.

What This Eliminates

The two-system problem — one scheduling tool for the PM, one task system for the team, forever out of sync
Duplicate data entry and the name-mismatch drift that comes with it
The PM manually updating a schedule
The PM chasing team members for status updates
Status reports written by hand from stale information
Slip reasons lost or disputed after the fact
End-of-project postmortems assembled from memory
He-said/she-said arguments about why a project ran late — in any industry

What This Preserves (and Amplifies)

The PM’s genuine value: unblocking dependencies, coordinating across teams and trades, managing the human and organizational issues that no system can see
Accountability for the people doing the work: they own their tasks, they own their slip explanations
Organizational learning: templates calibrate over time from real project data, in any domain
Audit trail: an immutable record of what was planned, what happened, and why — whether the project was a software release or a building permit sequence

Implementation

A working prototype was built in Python with the following structure:

pm-agent/
├── pm_engine.py              ← Core computation (slip detection, snapshot assembly)
├── pm_agent.py               ← CLI orchestrator
├── agents/
│   ├── jira_sync_agent.py
│   ├── snapshot_agent.py
│   ├── narrative_agent.py    ← LLM via Anthropic API
│   ├── report_agent.py       ← HTML/PDF generation
│   └── wbs_generator.py      ← Project kickoff
├── mcp_servers/
│   └── pm_command_server.py  ← Claude Desktop MCP integration
├── templates/
│   └── rest-integration.json
└── projects/
    └── {PROJECT}/
        ├── baseline.json     ← Immutable
        ├── slip_events.json  ← Append-only
        └── snapshots/        ← Computed on demand

The system runs against mock Jira data out of the box for prototyping, and connects to a live Jira instance via API token for production use.