Alert Triage Automation for a UK MSP — Under 60 Seconds from Alert to Analyst

A UK managed service provider was managing security across multiple client environments from a shared operations desk. Firewall IPS alerts — dozens per shift — were being triaged by gut feel, with no consistent enrichment process and no documented audit trail. When the same alert landed with two different analysts, two different things happened.

We designed and built an intelligence-driven alert triage pipeline that changes that. Every alert is now automatically enriched, scored against external threat intelligence, and — where the score crosses the threshold — promoted to a structured case with a standardised task list and a real-time Teams notification sent to the on-call analyst.

From firewall event to enriched case and analyst notification: under 60 seconds, every time.

The Challenge

At a multi-client MSP, the alert volume is constant. WatchGuard IPS events, blocked connections, anomalous traffic patterns — the question is never whether any of them matter, it's which ones, and how quickly you can determine that.

Manual triage was creating three compounding problems:

Inconsistency. Different analysts made different calls on identical alerts. One would look up a source IP in AbuseIPDB; another wouldn't bother. The result was cases with wildly varying amounts of information, and no reliable way to know whether an alert had been properly investigated or quickly dismissed.

Speed. Logging into multiple tools, running lookups manually, cross-referencing IP addresses: the overhead compounds across every alert in a shift. During high-volume periods, genuine threats were being skimmed. How thoroughly an alert was handled depended on which analyst happened to be on shift, not on the severity of the threat.

No audit trail. Informal triage produces no record. If a decision wasn't documented in a case management system, there was no way to reconstruct what had been seen, what had been checked, or why a particular call had been made. For an MSP whose clients are increasingly asking security questions, this is a liability.

Our Approach

The goal was not to replace the analyst. Decisions about whether to block an IP — particularly in a multi-tenant environment — should always involve a human. The goal was to ensure that every alert reaching a human was already enriched, scored, and sitting inside a properly documented case, regardless of who was on shift.

We designed the pipeline around four components the MSP already had or could readily deploy, connected by a lightweight Python automation layer:

WatchGuard T185 — the edge firewall. IPS events are forwarded via syslog (using SC4S) to Splunk, carrying source IP, destination IP, rule name, policy, and action for every event.

Splunk: the SIEM. A correlation search groups WatchGuard IPS events by source IP over a rolling time window. When the same source IP triggers multiple rules within that window, Splunk fires an alert and creates a corresponding record in TheHive via webhook. This filters out isolated single hits and focuses attention on persistent or repeated activity.
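The real correlation runs as a Splunk search, but the grouping logic can be sketched in plain Python for illustration. The event field names (time, src_ip, rule), the 300-second window and the two-rule minimum are assumptions made for this example; the case study does not state the production values.

```python
from collections import defaultdict


def correlate(events, window_seconds=300, min_rules=2):
    """Return source IPs that triggered at least `min_rules` distinct rules
    inside any span of `window_seconds`.

    `events` is an iterable of dicts with hypothetical keys:
    "time" (epoch seconds), "src_ip" and "rule".
    """
    hits_by_ip = defaultdict(list)
    for event in events:
        hits_by_ip[event["src_ip"]].append((event["time"], event["rule"]))

    flagged = set()
    for ip, hits in hits_by_ip.items():
        hits.sort()  # order by time so the window can slide forward
        start = 0
        for end in range(len(hits)):
            # Drop hits that have fallen out of the rolling window.
            while hits[end][0] - hits[start][0] > window_seconds:
                start += 1
            distinct_rules = {rule for _, rule in hits[start:end + 1]}
            if len(distinct_rules) >= min_rules:
                flagged.add(ip)
                break
    return flagged
```

An IP with a single isolated hit, or with hits spread further apart than the window, is never flagged, which is the noise-filtering behaviour described above.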

TheHive 5 — the case management platform, deployed as a Docker container alongside its Elasticsearch backend on an internal server. Every Splunk alert lands here as a New alert. Analysts can manage the full investigation lifecycle from a single interface, with a complete audit trail.

Cortex 4 — the enrichment engine, also running as a Docker container on the same host. Cortex provides access to external threat intelligence via analysers. We enabled AbuseIPDB (which returns an Abuse Confidence Score between 0 and 100) and several VirusTotal analysers for cross-referencing IPs, domains and file hashes.

What We Built

Tying the pipeline together is a Python 3 automation script — triage_enrich.py — running on a one-minute cron schedule on the internal server.

Each time it runs, the script queries TheHive's API for all alerts with status New. For each alert, it extracts the list of observables, filters for IP addresses, and submits an enrichment job to Cortex using the AbuseIPDB analyser. When Cortex returns the result, the script reads the Abuse Confidence Score from the taxonomy output. The decision logic from there is simple:

  • Score 50 or above: the alert is promoted to a case. The script attaches the Firewall IPS Block task list, sets the case severity, and posts a Teams Adaptive Card to the operations channel.
  • Score below 50, or no result returned: the alert status is updated to prevent reprocessing. No case is created.
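The decision rule above can be sketched as two small functions. This is an illustration only, not the contents of triage_enrich.py: the function names are hypothetical, and the taxonomy predicate label used to locate the score is an assumption that should be checked against the actual analyser output.

```python
PROMOTE_THRESHOLD = 50  # threshold stated in the text


def extract_score(taxonomies, predicate="Abuse Confidence Score"):
    """Pull the score out of a list of taxonomy entries.

    Each entry is assumed to be a dict with "predicate" and "value" keys.
    The default predicate label is an assumption, not a confirmed field name.
    Returns None when no usable score is present.
    """
    for entry in taxonomies:
        if entry.get("predicate") == predicate:
            try:
                return int(entry["value"])
            except (KeyError, TypeError, ValueError):
                return None
    return None


def decide(score, threshold=PROMOTE_THRESHOLD):
    """Return "promote" for scores at or above the threshold, else "dismiss".

    A missing result (None) is treated the same as a low score.
    """
    if score is not None and score >= threshold:
        return "promote"
    return "dismiss"
```

Treating a missing result as "dismiss" mirrors the second bullet: the alert is still marked as processed, so it is not picked up again on the next run.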

The Teams notification contains the case number, the source IP, the AbuseIPDB score, and a direct link to the TheHive case — everything the on-call analyst needs to make an initial decision without opening another tool.
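A notification with those four fields could be assembled along the following lines. This is a sketch: the function name, card wording and layout are assumptions, and the payload uses the generic Adaptive Card envelope accepted by Teams webhooks rather than the exact card from this deployment. Sending it would be a single HTTP POST of the JSON to the channel's webhook URL, which is omitted here.

```python
def build_teams_card(case_number, source_ip, abuse_score, case_url):
    """Build a Teams message payload wrapping one Adaptive Card."""
    card = {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.4",
        "body": [
            {
                "type": "TextBlock",
                "text": f"New case #{case_number}",
                "weight": "Bolder",
                "size": "Medium",
            },
            {
                "type": "FactSet",
                "facts": [
                    {"title": "Source IP", "value": source_ip},
                    {"title": "AbuseIPDB score", "value": str(abuse_score)},
                ],
            },
        ],
        "actions": [
            {
                # Direct link so the analyst can open the case in one click.
                "type": "Action.OpenUrl",
                "title": "Open case in TheHive",
                "url": case_url,
            }
        ],
    }
    return {
        "type": "message",
        "attachments": [
            {
                "contentType": "application/vnd.microsoft.card.adaptive",
                "content": card,
            }
        ],
    }
```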

Critically, the automation stops at the notification. It does not touch the firewall. A block requires the analyst to work through the case tasks, confirm the enrichment, and obtain approval from the information security manager before any deny rule is added to WatchGuard.

Case Structure

Every promoted case arrives in TheHive with the same ten-task Firewall IPS Block workflow, giving every investigation the same structure regardless of which analyst picks it up:

  1. Triage — acknowledge and assign the case
  2. Triage — verify AbuseIPDB score and reported abuse categories
  3. Intel — review VirusTotal enrichment attached to the case
  4. Intel — confirm whether activity is malicious based on all enrichment
  5. SIEM — search for all traffic involving this IP in the last 30 days
  6. SIEM — identify affected internal hosts and traffic direction
  7. SIEM — export and attach log evidence to the case
  8. Risk — classify as High / Medium / Low and document the rationale
  9. Response — escalate to ISM for approval before any blocking action
  10. Response — manual block (ISM approval required; add IP to firewall deny rule, close task)
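One way to keep this workflow identical across cases is to hold it as data and generate the task payloads from it. The sketch below assumes each task is represented by order, group and title fields; those key names are illustrative and should be mapped to whatever the case management API expects.

```python
# The ten tasks listed above, as (group, title) pairs in workflow order.
FIREWALL_IPS_BLOCK_TASKS = [
    ("Triage", "Acknowledge and assign the case"),
    ("Triage", "Verify AbuseIPDB score and reported abuse categories"),
    ("Intel", "Review VirusTotal enrichment attached to the case"),
    ("Intel", "Confirm whether activity is malicious based on all enrichment"),
    ("SIEM", "Search for all traffic involving this IP in the last 30 days"),
    ("SIEM", "Identify affected internal hosts and traffic direction"),
    ("SIEM", "Export and attach log evidence to the case"),
    ("Risk", "Classify as High / Medium / Low and document the rationale"),
    ("Response", "Escalate to ISM for approval before any blocking action"),
    ("Response", "Manual block (ISM approval required; add IP to firewall deny rule, close task)"),
]


def build_tasks(task_list=FIREWALL_IPS_BLOCK_TASKS):
    """Return one dict per task, numbered from 1 in workflow order."""
    return [
        {"order": position, "group": group, "title": title}
        for position, (group, title) in enumerate(task_list, start=1)
    ]
```

Keeping the list in one place means a change to the workflow is made once and applies to every case promoted afterwards.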

This structure means every analyst follows the same investigation path, every case accumulates the same categories of evidence, and every decision — including the decision not to block — gets documented.

Results

  • Alert-to-analyst notification time: under 60 seconds, end to end
  • Every qualifying alert enriched automatically — zero manual threat intel lookups required before first review
  • Consistent triage process regardless of which analyst is on shift or how busy the queue is
  • Every case arrives with a standardised 10-task workflow — no improvisation, no missed steps
  • Full audit trail from first firewall event to case closure, including all enrichment results and decision rationale
  • Noise reduction: single-hit IPS events are filtered at the Splunk correlation layer before they reach TheHive

In a demonstration scenario run during testing, a known-malicious IP generated repeated IPS hits against the firewall. Within a minute, the Splunk correlation search fired, a TheHive alert was created, and the automation picked it up, called Cortex, and received an AbuseIPDB score of 100. The script promoted the alert to a case, attached the task list, and posted a Teams card to the operations channel, all before an analyst had opened a single tool.

The automation doesn't make the decision. It makes sure every decision is made with the same quality of information — and that the whole process is on record.

What This Means for MSPs

The tooling involved — Splunk, TheHive, Cortex — is accessible. TheHive and Cortex are open source. AbuseIPDB provides a free tier covering hundreds of daily checks. The Python automation is straightforward to maintain. The principal infrastructure cost is an internal server and the initial integration time.

The return is a triage operation that is consistent across analysts and shifts, that enriches every alert automatically before a human sees it, and that produces a documented case for every incident meeting the severity threshold — including the incidents where the correct decision is to take no action.

For an MSP managing security across multiple clients, that consistency is significant. It reduces analyst cognitive load, reduces the risk of genuine threats being missed during busy periods, and produces the kind of audit trail that holds up when a client asks what was done about the suspicious traffic their firewall flagged last Tuesday.

If you're running MSP security operations and still triaging alerts manually, this architecture is a practical starting point. The components are mature, the integration points are well-documented, and the improvement in consistency is immediate.