What it does
An AI agent backed by a large language model connects to your GitHub repository and/or Jira instance and processes each new issue as soon as it is created. It extracts meaning from the unstructured text of the title and description, classifies the ticket according to the team's internal taxonomy, and performs standard triage actions without engineer involvement.
What automation does in practice:
- Picks up a new issue via webhook (GitHub Issues API or Jira Webhooks) immediately after creation.
- Extracts key entities from the description: type (bug/feature/question), affected component or module, mentioned versions, environment, stacktrace.
- Determines priority according to team rules: severity, affected users, business impact.
- Sets labels and components in accordance with the approved taxonomy.
- Searches for duplicates — semantic search across already open issues over the past 6-12 months.
- Assigns an owner: by component, by module ownership, by round-robin within the team.
- Leaves a comment with a concise structured summary for the engineer — what broke, where, how to reproduce, similar tickets.
- Escalates to the Slack team channel if the issue is marked as critical or resembles a production incident.
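Here is a minimal sketch of the first step, picking up the issue via webhook. It assumes GitHub is the source and a webhook secret is configured on the repository. GitHub signs each delivery with an HMAC-SHA256 of the raw body, sent in the X-Hub-Signature-256 header, and issue events carry an action field such as opened or edited. The function names and the selected fields are illustrative and not part of any library. Jira webhooks use a different payload and authentication scheme, so they would need their own parser.

```python
import hashlib
import hmac
import json
from typing import Optional

# Issue actions the triage service reacts to (the "action" field of GitHub's "issues" event).
HANDLED_ACTIONS = {"opened", "edited"}


def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 header ("sha256=<hex digest>") against the raw body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison so timing does not reveal how much of the digest matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def extract_issue_event(event_name: str, body: bytes) -> Optional[dict]:
    """Return the fields triage needs, or None when the event should be ignored."""
    if event_name != "issues":  # value of the X-GitHub-Event header
        return None
    payload = json.loads(body)
    if payload.get("action") not in HANDLED_ACTIONS:
        return None
    issue = payload["issue"]
    return {
        "number": issue["number"],
        "title": issue["title"],
        "body": issue.get("body") or "",  # GitHub sends null for an empty description
        "author": issue["user"]["login"],
        "labels": [label["name"] for label in issue.get("labels", [])],
    }
```

Deliveries that fail the signature check should be rejected before any parsing happens.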
Results from deployment practice: a senior engineer who had spent 3 hours a week on manual triage now spends 20 minutes a week on a quick review of edge cases. Time-to-label dropped from 18 hours to 2 hours. Duplicates are caught automatically; previously it could take 1-2 days before someone noticed a repeat.
What automation does NOT do
The honest boundaries of the agent's responsibility:
- Does not make engineering decisions. Triage sets the labels and assigns an owner, but does not decide whether to fix the issue now or in which sprint; that stays with the tech lead.
- Does not close issues. Even obvious duplicates are flagged and linked by the agent, but the final closure is done by a human — this is a safeguard against false matches.
- Does not understand your private architecture without context. The agent needs a populated label taxonomy, a module ownership map, and examples of correctly labeled tickets.
How it works
Architecturally, the automation is a thin service between the issue tracker and the LLM: a webhook catches the event, context is assembled (issue text, similar open tickets, module ownership, previously labeled examples), the LLM returns structured JSON with the classification, and the result is applied to the issue via the platform API. Everything is wrapped in retry logic, with human-in-the-loop handling for contested cases.
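The flow described above can be sketched as a small orchestration function. This is a hypothetical outline, not a reference implementation. build_context, call_llm and apply_actions are placeholders supplied by the caller, and schema validation and the confidence gate are left out. The retry policy (3 attempts, exponential backoff with jitter) is an assumption; the text does not prescribe one.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with exponential backoff plus random jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("attempts must be at least 1")


def triage(issue: dict,
           build_context: Callable[[dict], dict],
           call_llm: Callable[[dict], dict],
           apply_actions: Callable[[dict, dict], None],
           sleep: Callable[[float], None] = time.sleep) -> dict:
    """Webhook event -> context -> LLM classification -> actions on the issue."""
    context = build_context(issue)
    # The LLM call and the tracker API call are the network-bound steps, so they get retries.
    decision = with_retry(lambda: call_llm(context), sleep=sleep)
    with_retry(lambda: apply_actions(issue, decision), sleep=sleep)
    return decision
```

Retrying apply_actions assumes the tracker calls are safe to repeat. Posting a comment twice is not, so in practice that step may need its own deduplication.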
Technical sequence
- A webhook from GitHub or Jira arrives at the triage service (FastAPI or Node) on issues.opened and issues.edited events.
- The service assembles context: title, description, author, existing labels, and a list of potential duplicates (top-5 by embedding similarity from the vector index).
- The service builds a prompt for the model from the label taxonomy, the module ownership map, few-shot examples of correctly labeled past issues, and the text of the new ticket.
- Claude returns JSON: { labels, priority, component, owner, duplicate_candidates, summary, confidence }.
- The service validates the JSON against a schema. If confidence is below the threshold (e.g., 0.75), it tags the issue needs-human-triage and does not assign an owner.
- For confident cases, the service applies the labels and assignee via the GitHub or Jira API and leaves a comment with a summary and links to similar tickets.
- Critical issues are escalated to the team's Slack channel via an incoming webhook.
- Every action is written to an audit log — tracking what the agent changed and why.
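The validation and confidence-gating steps might look like the sketch below. It assumes the reply arrives as a JSON string with the keys listed above and uses the example threshold of 0.75. The hand-rolled type checks stand in for a schema library such as Pydantic or jsonschema, and the shape of the returned action dict is hypothetical.

```python
import json

# Expected keys of the model's reply and their allowed Python types after json.loads.
SCHEMA = {
    "labels": list,
    "priority": str,
    "component": str,
    "owner": (str, type(None)),
    "duplicate_candidates": list,
    "summary": str,
    "confidence": (int, float),
}
CONFIDENCE_THRESHOLD = 0.75
FALLBACK_LABEL = "needs-human-triage"


def parse_decision(raw: str, allowed_labels: set) -> dict:
    """Parse the reply; reject wrong types, off-taxonomy labels or out-of-range confidence."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for field, expected_type in SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for {field}")
    if not all(isinstance(label, str) for label in data["labels"]):
        raise ValueError("labels must be strings")
    unknown = set(data["labels"]) - allowed_labels
    if unknown:
        raise ValueError(f"labels outside taxonomy: {sorted(unknown)}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return data


def plan_actions(raw: str, allowed_labels: set) -> dict:
    """Map the model reply to tracker actions; anything doubtful goes to manual triage."""
    try:
        decision = parse_decision(raw, allowed_labels)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {"labels": [FALLBACK_LABEL], "assignee": None, "comment": None}
    if decision["confidence"] < CONFIDENCE_THRESHOLD:
        # Low confidence: no model labels, no owner, but keep the summary for the human.
        return {"labels": [FALLBACK_LABEL], "assignee": None, "comment": decision["summary"]}
    return {"labels": decision["labels"], "assignee": decision["owner"],
            "comment": decision["summary"]}
```

A reply that fails to parse is handled exactly like a low-confidence one. A malformed answer therefore never changes labels or owners.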
Solution components
| Component | Role |
|---|---|
Webhook receiver | Receiving events from GitHub Issues and Jira Webhooks |
Context builder | Collecting description, similar tickets, ownership map |
Vector index | Storing embeddings of open issues for duplicate search |
LLM client (AI model) | Classification and entity extraction |
Schema validator | Validating JSON response and confidence level |
Action executor | Writing labels, assignee, comment via API |
Slack notifier | Escalating critical tickets |
Audit log | Full history of agent decisions |
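In the vector index, duplicate lookup comes down to a nearest-neighbour search by cosine similarity. The in-memory version below is only a stand-in to show the logic. Embeddings are assumed to come from whatever embedding model you use, and the min_score cutoff is a hypothetical knob. In PostgreSQL with pgvector, the same top-5 query is typically written as ORDER BY embedding <=> query_vector LIMIT 5, where <=> is pgvector's cosine-distance operator.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_duplicates(query: list, index: dict, k: int = 5, min_score: float = 0.0) -> list:
    """Return up to k (issue_id, similarity) pairs, most similar first."""
    scored = [(issue_id, cosine(query, vec)) for issue_id, vec in index.items()]
    scored = [item for item in scored if item[1] >= min_score]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```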
Why custom-code, not ready-made no-code
For issue triage, three things matter that ready-made builders cover poorly: precise control over the prompt and few-shot examples (your taxonomy is specific), a vector index for duplicate search with persistently stored embeddings, and a transparent audit log. All of this fits in roughly 500-800 lines of Python or TypeScript with an LLM client, a vector DB (pgvector, Qdrant), and two API integrations. Typical stack: FastAPI + PostgreSQL with pgvector + the AI model via the Anthropic SDK + Octokit or the Jira REST API.
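To show what control over the prompt means in practice, here is a sketch of a prompt builder. It combines the taxonomy, the ownership map, few-shot examples, duplicate candidates and the new issue. The instruction wording and the input shapes are assumptions: examples carry text and answer keys, and duplicates carry number and title. The resulting string would be sent as the user message of a messages.create call in the Anthropic SDK.

```python
import json


def build_prompt(issue: dict, taxonomy: dict, ownership: dict,
                 examples: list, duplicates: list) -> str:
    """Assemble a classification prompt; every input is supplied by the caller."""
    parts = [
        "You triage software issues. Reply with a JSON object only, with the keys "
        "labels, priority, component, owner, duplicate_candidates, summary, confidence (0-1).",
        "Allowed values:\n" + json.dumps(taxonomy, indent=2),
        "Module ownership:\n" + json.dumps(ownership, indent=2),
    ]
    # Few-shot examples: past issues paired with the answer a human agreed with.
    for example in examples:
        parts.append("Example issue:\n" + example["text"]
                     + "\nExample answer:\n" + json.dumps(example["answer"]))
    dup_lines = "\n".join(f"#{d['number']}: {d['title']}" for d in duplicates) or "none"
    parts.append("Possible duplicates:\n" + dup_lines)
    parts.append(f"New issue:\nTitle: {issue['title']}\n\n{issue['body']}")
    return "\n\n".join(parts)
```

The taxonomy and examples live in data rather than in code. Updating the few-shot set after a weekly review therefore needs no redeploy.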
Human-in-the-loop as the default
The agent does not operate on full trust. Two mechanisms ensure safety:
- Confidence threshold: when confidence is below the threshold, the decision is marked needs-human-triage and labels are not applied.
- Weekly review: once a week, the tech lead checks 20 random agent decisions against reality and updates the few-shot examples so that quality does not degrade.
Result: the majority of issues are labeled without engineer involvement, contested cases go to manual triage — but already with an agent-prepared summary and duplicate candidates, which still speeds up the work.
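Two small helpers can support the weekly review. One draws 20 random decisions from the audit log. The other measures how often the reviewer kept the agent's labels and owner, and turns corrections into few-shot candidates. The record fields (text, agent_labels, human_labels, agent_owner, human_owner) are hypothetical; use whatever your audit log actually stores.

```python
import random
from typing import Optional


def sample_for_review(audit_log: list, n: int = 20, seed: Optional[int] = None) -> list:
    """Draw up to n distinct decisions at random for the weekly check."""
    rng = random.Random(seed)
    return rng.sample(audit_log, min(n, len(audit_log)))


def _agreed(record: dict) -> bool:
    return (set(record["agent_labels"]) == set(record["human_labels"])
            and record["agent_owner"] == record["human_owner"])


def agreement_rate(reviewed: list) -> float:
    """Share of reviewed decisions where the human kept both labels and owner."""
    if not reviewed:
        return 0.0
    return sum(_agreed(r) for r in reviewed) / len(reviewed)


def few_shot_candidates(reviewed: list) -> list:
    """Corrected decisions become new few-shot examples carrying the human's answer."""
    return [
        {"text": r["text"], "answer": {"labels": r["human_labels"], "owner": r["human_owner"]}}
        for r in reviewed if not _agreed(r)
    ]
```

Tracking agreement_rate week over week gives a simple signal of whether quality is holding.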
Prerequisites
To start triage, the team needs ready data, access credentials, and minimal organizational preparation.
Data and access:
- GitHub Personal Access Token (scope repo) or a Jira API token with write permissions for labels, assignee, and comments.
- Approved label taxonomy: a list of types (bug/feature/task), priorities, and components, usually 15-40 categories.
- Module ownership map: which person or team is responsible for which component or module.
- 100-200 correctly labeled issues from the past 6-12 months — for few-shot examples and building a vector index of duplicates.
- Anthropic API key for the AI model.
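The module ownership map can drive the owner assignment described earlier. Each component maps to either a person or a team, and team-owned components rotate round-robin among the team's members. This is a sketch with hypothetical names. The rotation state lives in memory here, so a real service would need to persist it (for example in PostgreSQL) to survive restarts.

```python
import itertools
from typing import Optional


class OwnerResolver:
    """Resolve an assignee from the ownership map, rotating through team members."""

    def __init__(self, module_owner: dict, teams: dict):
        self.module_owner = module_owner  # component -> person or team name
        self.teams = teams                # team name -> list of members
        self._cycles = {
            team: itertools.cycle(members) for team, members in teams.items() if members
        }

    def resolve(self, component: str) -> Optional[str]:
        owner = self.module_owner.get(component)
        if owner is None:
            return None  # component not in the map: leave it for manual triage
        if owner in self.teams:
            cycle = self._cycles.get(owner)
            return next(cycle) if cycle is not None else None  # empty team: nobody to assign
        return owner  # an individual owner is assigned directly
```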
Infrastructure:
- A server or managed platform for the triage service (VPS, Render, Railway, AWS Lambda — any option will work).
- PostgreSQL with pgvector or Qdrant/Weaviate for the vector index.
- A Slack workspace and incoming webhook if escalation of critical issues is needed.
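Escalating through a Slack incoming webhook takes a single JSON POST with a text field to the webhook URL. The sketch below builds that request with the standard library. The critical markers (P0, critical, incident) are assumptions; replace them with your team's rules. The request is only sent when you pass it to urllib.request.urlopen, for example with a 10-second timeout.

```python
import json
import urllib.request

# Hypothetical rules for "critical"; replace with your team's priority names and labels.
CRITICAL_PRIORITIES = {"P0", "critical"}
INCIDENT_LABEL = "incident"


def should_escalate(decision: dict) -> bool:
    """True if the decision looks critical or like a production incident."""
    return (decision.get("priority") in CRITICAL_PRIORITIES
            or INCIDENT_LABEL in decision.get("labels", []))


def build_slack_request(webhook_url: str, issue_url: str, decision: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to a Slack incoming webhook."""
    text = f"{decision['priority']} issue needs attention: {issue_url}\n{decision['summary']}"
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```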
Team readiness:
- A tech lead or senior engineer as the automation owner — aligns the taxonomy and reviews quality for the first 2-3 weeks.
- 30-40 minutes per week from the owner for a weekly review of the agent's decisions in the first month, then 15 minutes.
- The team accepts the rules: duplicates are linked automatically, but a person closes them.
Implementation timeline:
Complexity: «week». Expected timeline: 2-4 weeks. Week 1: taxonomy alignment and few-shot dataset preparation. Week 2: service implementation and webhook connection. Week 3: shadow mode (the agent writes decisions to the log but does not apply them) and confidence threshold calibration. Week 4: transition to production with human review.
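Shadow mode can be a single flag on the action executor. Every decision goes to the audit log, but tracker changes are applied only when the flag is off. A minimal sketch, assuming the audit log is a JSON Lines stream and that apply_fn wraps the real tracker API. Both the class and its parameters are hypothetical.

```python
import json
import time
from typing import Callable


class ActionExecutor:
    """Apply planned actions to the tracker, or only log them while in shadow mode."""

    def __init__(self, apply_fn: Callable[[int, dict], None],
                 write_audit_line: Callable[[str], None], shadow: bool = True):
        self.apply_fn = apply_fn                  # wraps the real tracker API calls
        self.write_audit_line = write_audit_line  # e.g. appends to a JSON Lines file
        self.shadow = shadow

    def execute(self, issue_number: int, actions: dict, reason: str) -> None:
        if not self.shadow:
            self.apply_fn(issue_number, actions)
        # Every decision is logged, applied or not, so shadow runs can be compared with humans.
        record = {
            "ts": time.time(),
            "issue": issue_number,
            "actions": actions,
            "reason": reason,
            "applied": not self.shadow,
        }
        self.write_audit_line(json.dumps(record))
```

Defaulting shadow to True means a misconfigured deployment logs decisions instead of silently applying them.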
Pain points
- Errors in manual operations
- Repetitive routine tasks
- Constant context switching
FAQ
How long does implementation take from start to production?
For a team with a prepared labels taxonomy and ready API access — 2-4 weeks. Week 1 goes to aligning categories and collecting a few-shot dataset, week 2 — to service implementation and webhooks, week 3 — shadow mode for calibration, week 4 — launch with human review. If no taxonomy exists yet — add 1-2 weeks for its formalization with the tech lead.
What should we do if we don't have a clear labels taxonomy?
This is a normal situation for teams that have grown organically. Before launching automation, a short labeling workshop is held — 2-3 one-hour sessions with the tech lead and senior engineers, where the list of types, priorities, and components is formalized. Coverage is verified against the last 200-300 issues. Without this step, the AI agent will not be able to deliver stable labeling results.
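Checking coverage against the last 200-300 issues can be partly automated. Count how many historical issues can be expressed entirely in the draft taxonomy, and list the labels that fall outside it. The sketch assumes the taxonomy is a dict of dimensions that includes a required type dimension, and that each issue carries its current labels. Both are assumptions. The unmapped labels become input for the next workshop session.

```python
from collections import Counter


def taxonomy_coverage(issues: list, taxonomy: dict) -> tuple:
    """Return (share of issues fully expressible in the taxonomy, Counter of unmapped labels)."""
    allowed = set().union(*taxonomy.values())
    unmapped = Counter()
    covered = 0
    for issue in issues:
        outside = [label for label in issue["labels"] if label not in allowed]
        unmapped.update(outside)
        # "Covered" here means: at least one type label and nothing outside the taxonomy.
        has_type = any(label in taxonomy["type"] for label in issue["labels"])
        if has_type and not outside:
            covered += 1
    share = covered / len(issues) if issues else 0.0
    return share, unmapped
```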
What are the risks and what breaks with incorrect configuration?
Three main risks: misclassification with a weak taxonomy (addressed by few-shot examples and the weekly review), false-positive duplicate matches (mitigated by the confidence threshold and by the fact that a human always does the closing), and Slack overload from overly aggressive escalation. All three are also mitigated by shadow mode in the first week of operation: the agent writes decisions to a log but does not apply them until the tech lead confirms quality.
Is automation suitable for non-SaaS teams — agencies, internal IT, product teams in corporations?
Yes. Triage works wherever there is an issue tracker with an active ticket flow — GitHub Issues, Jira, Linear, GitLab. SaaS and product teams get the most value due to incoming volume, but agencies with multiple client projects and internal IT departments adopt a similar approach — the taxonomy simply becomes two-level (client + category).
Can it be used with Linear or GitLab instead of GitHub/Jira?
Yes, the architecture is tracker-agnostic. Linear and GitLab provide comparable webhooks and REST/GraphQL APIs for writing labels, assignees, and comments. Only the webhook receiver and the action executor need to be adapted, which takes 1-2 days of additional work. The confidence threshold semantics, prompt template, and duplicate vector index are reused unchanged.
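Tracker-agnostic here means the pipeline talks to a narrow interface, and each tracker gets its own adapter. The sketch below shows that seam using typing.Protocol. The method names are hypothetical. The in-memory adapter only records calls; real adapters would wrap the GitHub, Jira, Linear or GitLab APIs. Issue IDs are strings so that both GitHub numbers and Jira keys such as PROJ-1 fit.

```python
from typing import List, Protocol, Tuple


class IssueTracker(Protocol):
    """The narrow surface the triage pipeline needs from any tracker."""

    def add_labels(self, issue_id: str, labels: List[str]) -> None: ...
    def assign(self, issue_id: str, assignee: str) -> None: ...
    def comment(self, issue_id: str, text: str) -> None: ...


class RecordingTracker:
    """Test double that records calls; a real adapter would wrap one tracker's API."""

    def __init__(self) -> None:
        self.calls: List[Tuple] = []

    def add_labels(self, issue_id: str, labels: List[str]) -> None:
        self.calls.append(("labels", issue_id, labels))

    def assign(self, issue_id: str, assignee: str) -> None:
        self.calls.append(("assign", issue_id, assignee))

    def comment(self, issue_id: str, text: str) -> None:
        self.calls.append(("comment", issue_id, text))


def apply_decision(tracker: IssueTracker, issue_id: str, actions: dict) -> None:
    """Tracker-independent executor: the same code runs against any adapter."""
    tracker.add_labels(issue_id, actions["labels"])
    if actions.get("assignee"):
        tracker.assign(issue_id, actions["assignee"])
    if actions.get("comment"):
        tracker.comment(issue_id, actions["comment"])
```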
What about private data in issues — can information leak externally?
Data is passed to the model via the Anthropic API under Anthropic's data usage policy for the commercial API. For strict compliance requirements, a redaction step is added: before anything is sent to the LLM, the service removes sensitive content (emails, tokens, stacktraces containing PII). The full audit log of the agent's actions is stored in your own infrastructure and is available for review.
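A redaction step can be a list of regex substitutions applied before the text leaves your infrastructure. The patterns below are illustrative assumptions, not an exhaustive PII filter. They cover emails, GitHub token prefixes, Bearer tokens and key=value secrets. Stacktraces containing PII would need patterns specific to your codebase.

```python
import re

# Hypothetical patterns; extend them to match the secrets your issues actually contain.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"), "[GITHUB_TOKEN]"),
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/-]+=*"), "Bearer [TOKEN]"),
    (re.compile(r"(?i)\b(api[_-]?key|secret|password|token)\s*[:=]\s*\S+"), r"\1=[REDACTED]"),
]


def redact(text: str) -> str:
    """Apply every substitution in order and return the cleaned text."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text
```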