Blog
/
No items found.
/
June 25, 2024

Let the Dominos Fall! SOC and IR Metrics for ROI

Default blog imageDefault blog imageDefault blog imageDefault blog imageDefault blog imageDefault blog image
25
Jun 2024
Vendors are scrambling to compare MTTD metrics laid out in the latest MITRE Engenuity ATT&CK® Evaluations. But this analysis is reductive, ignoring the fact that in cybersecurity, there are far more metrics that matter.

One of the most enjoyable discussions (and debates) I engage in is the topic of Security Operations Center (SOC) and Incident Response (IR) metrics to measure and validate an organization’s Return on Investment (ROI). The debate part comes in when I hear vendor experts talking about “the only” SOC metrics that matter, and only list the two most well-known, while completely ignoring metrics that have a direct causal relationship.

In this blog, I will discuss what I believe are the SOC/IR metrics that matter, how each one has a direct impact on the others, and why organizations should ensure they are working towards the goal of why these metrics are measured in the first place: Reduction of Risk and Costs.

Reduction of Risk and Costs

Every security solution and process an organization puts in place should reduce the organization’s risk of a breach, exposure by an insider threat, or loss of productivity. How an organization realizes net benefits can be in several ways:

  • Improved efficiencies can result in SOC/IR staff focusing on other areas such as advanced threat hunting rather than churning through alerts on their security consoles. It may also help organizations dealing with the lack of skilled security staff by using Artificial Intelligence (AI) and automated processes.
  • A well-oiled SOC/IR team that has greatly reduced or even eliminated mundane tasks attracts, motivates, and retains talent resulting in reduced hiring and training costs.
  • The direct impact of a breach such as a ransomware attack can be devastating. According to the 2024 Data Breach Investigations Report by Verizon, MGM Resorts International reported the ALPHV ransomware cost the company approximately $100 million[1].
  • Failure to take appropriate steps to protect the organization can result in regulatory fines; and if an organization has, or is considering, purchasing Cyber Insurance, can result in declined coverage or increased premiums.

How does an organization demonstrate they are taking proactive measures to prevent breaches? That is where it's important to understand the nine (yes, nine) key metrics, and how each one directly influences the others, play their roles.

Metrics in the Incident Response Timeline

Let’s start with a review of the key steps in the Incident Response Timeline:

Seven of the nine key metrics are in the IR timeline, while two of the metrics occur before you ever have an incident. They occur in the Pre-Detection Stage.

Pre-Detection stage metrics are:

  • Preventions Per Intrusion Attempt (PPIA)
  • False Positive Reduction Rate (FPRR)

Next is the Detect and Investigate stage, there are three metrics to consider:

  • Mean Time to Detection (MTTD)
  • Mean Time to Triage (MTTT)
  • Mean Time to Understanding (MTTU)

This is followed by the Remediation stage, there are two metrics here:

  • Mean Time to Containment (MTTC)
  • Mean Time to Remediation / Recovery (MTTR)

Finally, there is the Risk Reduction stage, there are two metrics:

  • Mean Time to Advice (MTTA)
  • Mean Time to Implementation (MTTI)

Pre-Detection Stage

Preventions Per Intrusion Attempt

PPIA is defined as stopping any intrusion attempt at the earliest possible stage. Your network Intrusion Prevention System (IPS) blocks vulnerability exploits, your e-mail security solution intercepts and removes messages with malicious attachments or links, your egress firewall blocks unauthorized login attempts, etc. The adversary doesn’t get beyond Step 1 in the attack life cycle.

This metric is the first domino. Every organization should strive to improve on this metric every day. Why? For every intrusion attempt you stop right out of the gate, you eliminate the actions for every other metric. There is no incident to detect, triage, investigate, remediate, or analyze post-incident for ways to improve your security posture.

When I think about PPIA, I always remember back to a discussion with a former mentor, Tim Crothers, who discussed the benefits of focusing on Prevention Failure Detection.

The concept is that as you layer your security defenses, your PPIA moves ever closer to 100% (no one has ever reached 100%). This narrows the field of fire for adversaries to breach into your organization. This is where novel, unknown, and permuted threats live and breathe. This is where solutions utilizing Unsupervised Machine Learning excel in raising anomalous alerts – indications of potential compromise involving one of these threats. Unsupervised ML also raises alerts on anomalous activity generated by known threats and can raise detections before many signature-based solutions. Most organizations struggle to find strong permutations of known threats, insider threats, supply chain attacks, attacks utilizing n-day and 0-day exploits. Moving PPIA ever closer to 100% also frees your team up for conducting threat hunting activities – utilizing components of your SOC that collect and store telemetry to query for potential compromises based on hypothesis the team raises. It also significantly reduces the alerts your team must triage and investigate – solving many of the issues outlined at the start of this paper.

False Positive Reduction Rate

Before we discuss FPRR, I should clarify how I define False Positives (FPs). Many define FPs as an alert that is in error (i.e.: your EDR alerts on malware that turns out to be AV signature files). While that is a FP, I extend the definition to include any alert that did not require triage / investigation and distracts the SOC/IR team (meaning they conducted some level of triage / investigation).

This metric is the second domino. Why is this metric important? Every alert your team exerts time and effort on that is a non-issue distracts them from alerts that matter. One of the major issues that has resonated in the security industry for decades is that SOCs are inundated with alerts and cannot clear the backlog. When it comes to PPIA + FPRR, I have seen analysts spend time investigating alerts that were blocked out of the gate while their screen continued to fill up with more. You must focus on Prevention Failure Detection to get ahead of the backlog.

Detect and Investigate Stages

Mean Time to Detection

MTTD, or “Dwell Time”, has decreased dramatically over the past 12 years. From well over a year to 16 days in 2023[2]. MTTD is measured from the earliest possible point you could detect the intrusion to the moment you actually detect it.

This third domino is important because the longer an adversary remains undetected, the more the odds increase they will complete their mission objective. It also makes the tasks of triage and investigation more difficult as analysts must piece together more activity and adversaries may be erasing evidence along the way – or your storage retention does not cover the breach timeline.

Many solutions focusing solely on MTTD can actually create the very problem SOCs are looking to solve.  That is, they generate so much alerting that they flood the console, email, or text messaging app causing an unmanageable queue of alerts (this is the problem XDR solutions were designed to resolve by focusing on incidents rather than alerts).

Mean Time to Triage

MTTT involves SOCs that utilize Level 1 (aka Triage) analysts to render an “escalate / do not escalate” alert verdict accurately. Accuracy is important because Triage Analysts typically are staff new to cyber security (recent grad / certification) and may over escalate (afraid to miss something important) or under escalate (not recognize signs of a successful breach). Because of this, a small MTTT does not always equate to successful handling of incidents.

This metric is important because keeping your senior staff focused on progressing incidents in a timely manner (and not expending time on false positives) should reduce stress and required headcount.

Mean Time to Understanding

MTTU deals with understanding the complete nature of the incident being investigated. This is different than MTTT which only deals with whether the issue merits escalation to senior analysts. It is then up to the senior analysts to determine the scope of the incident, and if you are a follower of my UPSET Investigation Framework, you know understanding the full scope involves:

U = All compromised accounts

P = Persistence Mechanisms used

S = All systems involved (organization, adversary, and intermediaries)

E = Endgame (or mission objective)

T = Techniques, Tactics, Procedures (TTPs) utilized by the adversary

MTTU is important because this information is critical before any containment or remediation actions are taken. Leave a stone unturned, and you alert the adversary that you are onto them and possibly fail to close an avenue of access.

Remediation Stages

Mean Time to Containment

MTTC deals with neutralizing the threat. You may not have kicked the adversary out, but you have halted their progress to their mission objective and ability to inflict further damage. This may be through use of isolation capabilities, termination of malicious processes, or firewall blocks.

MTTC is important, especially with ransomware attacks where every second counts. Faster containment responses can result in reduced / eliminated disruption to business operations or loss of data.

Mean Time to Remediation / Recovery

The full scope of the incident is understood, the adversary has been halted in their tracks, no malicious processes are running on any systems in your organization. Now is the time to put things back to right. MTTR deals with the time involved in restoring business operations to pre-incident stage. It means all remnants of changes made by the adversary (persistence, account alterations, programs installed, etc.) are removed; all disrupted systems are restored to operations (i.e.: ransomware encrypted systems are recovered from backups / snapshots), compromised user accounts are reset, etc.

MTTR is important because it informs senior management of how fast the organization can recover from an incident. Disaster Recovery and Business Continuity plans play a major role in improving this score.

Risk Reduction Stages

Mean Time to Advice

After the dust has settled from the incident, the job is not done. MTTA deals with identifying and assessing the specific areas (vulnerabilities, misconfigurations, lack of security controls) that permitted the adversary to advance to the point where detection occurred (and any actions beyond). The SOC and IR teams should then compile a list of recommendations to present to management to improve the security posture of the organization so the same attack path cannot be used.

Mean Time to Implement

Once recommendations are delivered to management, how long does it take to implement them? MTTI tracks this timeline because none of it matters if you don’t fix the holes that led to the breach.

Nine Dominos

There are the nine dominos of SOC / IR metrics I recommend helping organizations know if they are on the right track to reduce risk, costs and improve morale / retention of the security teams. You may not wish to track all nine, but understanding how each metric impacts the others can provide visibility into why you are not seeing expected improvements when you implement a new security solution or change processes.

Improving prevention and reducing false positives can make huge positive impacts on your incident response timeline. Utilizing solutions that get you to resolution quicker allows the team to focus on recommendations and risk reduction strategies.

Whichever metrics you choose to track, just be sure the dominos fall in your favor.

References

[1] 2024 Verizon Data Breach Investigations Report, p83

[2] Mandiant M-Trends 2023

Inside the SOC
Darktrace cyber analysts are world-class experts in threat intelligence, threat hunting and incident response, and provide 24/7 SOC support to thousands of Darktrace customers around the globe. Inside the SOC is exclusively authored by these experts, providing analysis of cyber incidents and threat trends, based on real-world experience in the field.
Author
John Bradshaw
Sr. Director, Technical Marketing

John Bradshaw is Sr. Director, Technical Marketing at Darktrace. He is a security practitioner at heart having built a Customer Security/SOC operations team for (then) the largest ISP on the planet. In his vendor roles he has worked with various security solutions utilized by SOC / IR teams and conducted advanced incident investigation workshops to help organizations understand the benefits and limitations of the solutions they are using. He holds a Bachelor of Business Administration from Averett University and a Master of Science in Network Security from Capitol College.

Book a 1-1 meeting with one of our experts
Share this article

More in this series

No items found.

Blog

/

January 29, 2025

/

Inside the SOC

Bytesize Security: Insider Threats in Google Workspace

Default blog imageDefault blog image

What is an insider threat?

An insider threat is a cyber risk originating from within an organization. These threats can involve actions such as an employee inadvertently clicking on a malicious link (e.g., a phishing email) or an employee with malicious intent conducting data exfiltration for corporate sabotage.

Insiders often exploit their knowledge and access to legitimate corporate tools, presenting a continuous risk to organizations. Defenders must protect their digital estate against threats from both within and outside the organization.

For example, in the summer of 2024, Darktrace / IDENTITY successfully detected a user in a customer environment attempting to steal sensitive data from a trusted Google Workspace service. Despite the use of a legitimate and compliant corporate tool, Darktrace identified anomalies in the user’s behavior that indicated malicious intent.

Attack overview: Insider threat

In June 2024, Darktrace detected unusual activity involving the Software-as-a-Service (SaaS) account of a former employee from a customer organization. This individual, who had recently left the company, was observed downloading a significant amount of data in the form of a “.INDD” file (an Adobe InDesign document typically used to create page layouts [1]) from Google Drive.

While the use of Google Drive and other Google Workspace platforms was not unexpected for this employee, Darktrace identified that the user had logged in from an unfamiliar and suspicious IPv6 address before initiating the download. This anomaly triggered a model alert in Darktrace / IDENTITY, flagging the activity as potentially malicious.

A Model Alert in Darktrace / IDENTITY showing the unusual “.INDD” file being downloaded from Google Workspace.
Figure 1: A Model Alert in Darktrace / IDENTITY showing the unusual “.INDD” file being downloaded from Google Workspace.

Following this detection, the customer reached out to Darktrace’s Security Operations Center (SOC) team via the Security Operations Support service for assistance in triaging and investigating the incident further. Darktrace’s SOC team conducted an in-depth investigation, enabling the customer to identify the exact moment of the file download, as well as the contents of the stolen documents. The customer later confirmed that the downloaded files contained sensitive corporate data, including customer details and payment information, likely intended for reuse or sharing with a new employer.

In this particular instance, Darktrace’s Autonomous Response capability was not active, allowing the malicious insider to successfully exfiltrate the files. If Autonomous Response had been enabled, Darktrace would have immediately acted upon detecting the login from an unusual (in this case 100% rare) location by logging out and disabling the SaaS user. This would have provided the customer with the necessary time to review the activity and verify whether the user was authorized to access their SaaS environments.

Conclusion

Insider threats pose a significant challenge for traditional security tools as they involve internal users who are expected to access SaaS platforms. These insiders have preexisting knowledge of the environment, sensitive data, and how to make their activities appear normal, as seen in this case with the use of Google Workspace. This familiarity allows them to avoid having to use more easily detectable intrusion methods like phishing campaigns.

Darktrace’s anomaly detection capabilities, which focus on identifying unusual activity rather than relying on specific rules and signatures, enable it to effectively detect deviations from a user’s expected behavior. For instance, an unusual login from a new location, as in this example, can be flagged even if the subsequent malicious activity appears innocuous due to the use of a trusted application like Google Drive.

Credit to Vivek Rajan (Cyber Analyst) and Ryan Traill (Analyst Content Lead)

Appendices

Darktrace Model Detections

SaaS / Resource::Unusual Download Of Externally Shared Google Workspace File

References

[1]https://www.adobe.com/creativecloud/file-types/image/vector/indd-file.html

MITRE ATT&CK Mapping

Technqiue – Tactic – ID

Data from Cloud Storage Object – COLLECTION -T1530

Continue reading
About the author
Vivek Rajan
Cyber Analyst

Blog

/

January 30, 2025

/
No items found.

Reimagining Your SOC: How to Achieve Proactive Network Security

Default blog imageDefault blog image

Introduction: Challenges and solutions to SOC efficiency

For Security Operation Centers (SOCs), reliance on signature or rule-based tools – solutions that are always chasing the latest update to prevent only what is already known – creates an excess of false positives. SOC analysts are therefore overwhelmed by a high volume of context-lacking alerts, with human analysts able to address only about 10% due to time and resource constraints. This forces many teams to accept the risks of addressing only a fraction of the alerts while novel threats go completely missed.

74% of practitioners are already grappling with the impact of an AI-powered threat landscape, which amplifies challenges like tool sprawl, alert fatigue, and burnout. Thus, achieving a resilient network, where SOC teams can spend most of their time getting proactive and stopping threats before they occur, feels like an unrealistic goal as attacks are growing more frequent.

Despite advancements in security technology (advanced detection systems with AI, XDR tools, SIEM aggregators, etc...), practitioners are still facing the same issues of inefficiency in their SOC, stopping them from becoming proactive. How can they select security solutions that help them achieve a proactive state without dedicating more human hours and resources to managing and triaging alerts, tuning rules, investigating false positives, and creating reports?

To overcome these obstacles, organizations must leverage security technology that is able to augment and support their teams. This can happen in the following ways:

  1. Full visibility across the modern network expanding into hybrid environments
  2. Have tools that identifies and stops novel threats autonomously, without causing downtime
  3. Apply AI-led analysis to reduce time spent on manual triage and investigation

Your current solutions might be holding you back

Traditional cybersecurity point solutions are reliant on using global threat intelligence to pattern match, determine signatures, and consequently are chasing the latest update to prevent only what is known. This means that unknown threats will evade detection until a patient zero is identified. This legacy approach to threat detection means that at least one organization needs to be ‘patient zero’, or the first victim of a novel attack before it is formally identified.

Even the point solutions that claim to use AI to enhance threat detection rely on a combination of supervised machine learning, deep learning, and transformers to

train and inform their systems. This entails shipping your company’s data out to a large data lake housed somewhere in the cloud where it gets blended with attack data from thousands of other organizations. The resulting homogenized dataset gets used to train AI systems — yours and everyone else’s — to recognize patterns of attack based on previously encountered threats.

While using AI in this way reduces the workload of security teams who would traditionally input this data by hand, it emanates the same risk – namely, that AI systems trained on known threats cannot deal with the threats of tomorrow. Ultimately, it is the unknown threats that bring down an organization.

The promise and pitfalls of XDR in today's threat landscape

Enter Extended Detection and Response (XDR): a platform approach aimed at unifying threat detection across the digital environment. XDR was developed to address the limitations of traditional, fragmented tools by stitching together data across domains, providing SOC teams with a more cohesive, enterprise-wide view of threats. This unified approach allows for improved detection of suspicious activities that might otherwise be missed in siloed systems.

However, XDR solutions still face key challenges: they often depend heavily on human validation, which can aggravate the already alarmingly high alert fatigue security analysts experience, and they remain largely reactive, focusing on detecting and responding to threats rather than helping prevent them. Additionally, XDR frequently lacks full domain coverage, relying on EDR as a foundation and are insufficient in providing native NDR capabilities and visibility, leaving critical gaps that attackers can exploit. This is reflected in the current security market, with 57% of organizations reporting that they plan to integrate network security products into their current XDR toolset[1].

Why settling is risky and how to unlock SOC efficiency

The result of these shortcomings within the security solutions market is an acceptance of inevitable risk. From false positives driving the barrage of alerts, to the siloed tooling that requires manual integration, and the lack of multi-domain visibility requiring human intervention for business context, security teams have accepted that not all alerts can be triaged or investigated.

While prioritization and processes have improved, the SOC is operating under a model that is overrun with alerts that lack context, meaning that not all of them can be investigated because there is simply too much for humans to parse through. Thus, teams accept the risk of leaving many alerts uninvestigated, rather than finding a solution to eliminate that risk altogether.

Darktrace / NETWORK is designed for your Security Operations Center to eliminate alert triage with AI-led investigations , and rapidly detect and respond to known and unknown threats. This includes the ability to scale into other environments in your infrastructure including cloud, OT, and more.

Beyond global threat intelligence: Self-Learning AI enables novel threat detection & response

Darktrace does not rely on known malware signatures, external threat intelligence, historical attack data, nor does it rely on threat trained machine learning to identify threats.

Darktrace’s unique Self-learning AI deeply understands your business environment by analyzing trillions of real-time events that understands your normal ‘pattern of life’, unique to your business. By connecting isolated incidents across your business, including third party alerts and telemetry, Darktrace / NETWORK uses anomaly chains to identify deviations from normal activity.

The benefit to this is that when we are not predefining what we are looking for, we can spot new threats, allowing end users to identify both known threats and subtle, never-before-seen indicators of malicious activity that traditional solutions may miss if they are only looking at historical attack data.

AI-led investigations empower your SOC to prioritize what matters

Anomaly detection is often criticized for yielding high false positives, as it flags deviations from expected patterns that may not necessarily indicate a real threat or issues. However, Darktrace applies an investigation engine to automate alert triage and address alert fatigue.

Darktrace’s Cyber AI Analyst revolutionizes security operations by conducting continuous, full investigations across Darktrace and third-party alerts, transforming the alert triage process. Instead of addressing only a fraction of the thousands of daily alerts, Cyber AI Analyst automatically investigates every relevant alert, freeing up your team to focus on high-priority incidents and close security gaps.

Powered by advanced machine-learning techniques, including unsupervised learning, models trained by expert analysts, and tailored security language models, Cyber AI Analyst emulates human investigation skills, testing hypotheses, analyzing data, and drawing conclusions. According to Darktrace Internal Research, Cyber AI Analyst typically provides a SOC with up to  50,000 additional hours of Level 2 analysis and written reporting annually, enriching security operations by producing high level incident alerts with full details so that human analysts can focus on Level 3 tasks.

Containing threats with Autonomous Response

Simply quarantining a device is rarely the best course of action - organizations need to be able to maintain normal operations in the face of threats and choose the right course of action. Different organizations also require tailored response functions because they have different standards and protocols across a variety of unique devices. Ultimately, a ‘one size fits all’ approach to automated response actions puts organizations at risk of disrupting business operations.

Darktrace’s Autonomous Response tailors its actions to contain abnormal behavior across users and digital assets by understanding what is normal and stopping only what is not. Unlike blanket quarantines, it delivers a bespoke approach, blocking malicious activities that deviate from regular patterns while ensuring legitimate business operations remain uninterrupted.

Darktrace offers fully customizable response actions, seamlessly integrating with your workflows through hundreds of native integrations and an open API. It eliminates the need for costly development, natively disarming threats in seconds while extending capabilities with third-party tools like firewalls, EDR, SOAR, and ITSM solutions.

Unlocking a proactive state of security

Securing the network isn’t just about responding to incidents — it’s about being proactive, adaptive, and prepared for the unexpected. The NIST Cybersecurity Framework (CSF 2.0) emphasizes this by highlighting the need for focused risk management, continuous incident response (IR) refinement, and seamless integration of these processes with your detection and response capabilities.

Despite advancements in security technology, achieving a proactive posture is still a challenge to overcome because SOC teams face inefficiencies from reliance on pattern-matching tools, which generate excessive false positives and leave many alerts unaddressed, while novel threats go undetected. If SOC teams are spending all their time investigating alerts then there is no time spent getting ahead of attacks.

Achieving proactive network resilience — a state where organizations can confidently address challenges at every stage of their security posture — requires strategically aligned solutions that work seamlessly together across the attack lifecycle.

References

1.       Market Guide for Extended Detection and Response, Gartner, 17thAugust 2023 - ID G00761828

Continue reading
About the author
Your data. Our AI.
Elevate your network security with Darktrace AI