Three Levels of SOC Maturity: Steps for Continual Service Improvement
Those who run security operations centres (SOCs) recognise that the more automation is built into the service, the more time analysts are likely to have to hunt for threats. Yet the path to SOC maturity is not one that most SOCs follow. This blog looks at three levels of maturity that a SOC can pass through before it is properly integrated with the rest of the business's service management processes.
SOC Maturity Level 1 – Build Your Correlation Rules
A correlation rule groups one or more events or conditions (triggers) into a single logical function. When all of the conditions are met, the SOC is notified and the match is treated as an incident. Correlation rules are the core processing units of security information and event management (SIEM) systems, and every SIEM gives analysts a way to construct these conditional expressions, typically through a vendor-specific rule or query language that may also support regular expressions (regex) for pattern matching. To derive the most useful correlation rules for the business they are protecting, analysts often think like an attacker, considering which events the business's systems would generate under particular attack conditions.
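To make the idea of grouped conditions concrete, here is a minimal Python sketch, assuming events arrive as simple dictionaries. The `CorrelationRule` class, the `evaluate` helper and the example rule are hypothetical illustrations, not any SIEM's rule syntax. Real SIEMs use their own rule languages and usually correlate across several events in a time window, whereas this sketch checks one event at a time.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

# Events are modelled as plain dictionaries (an assumption for illustration).
Event = Dict[str, Any]
Condition = Callable[[Event], bool]


@dataclass
class CorrelationRule:
    """Hypothetical rule: a name plus conditions that must all be true."""
    name: str
    conditions: List[Condition]

    def matches(self, event: Event) -> bool:
        # Logical AND: the rule fires only when every condition is met.
        return all(condition(event) for condition in self.conditions)


def evaluate(rules: Iterable[CorrelationRule], event: Event) -> List[str]:
    """Return the names of rules whose conditions are all met by the event."""
    return [rule.name for rule in rules if rule.matches(event)]


# Example: a failed admin login outside business hours (08:00 to 18:00).
after_hours_admin_failure = CorrelationRule(
    name="after-hours-admin-login-failure",
    conditions=[
        lambda e: e.get("type") == "login_failure",
        lambda e: e.get("user") in {"admin", "root"},
        lambda e: not 8 <= e.get("hour", 0) < 18,
    ],
)

print(evaluate([after_hours_admin_failure],
               {"type": "login_failure", "user": "admin", "hour": 23}))
# ['after-hours-admin-login-failure']
```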
This lets them build rules that trigger when specific attack patterns are used against the organisation, rather than relying on the generic signatures used by tools such as antivirus and intrusion detection systems. Here is an example correlation rule and how it works:
Correlation Rule – take Bob's login event, record the subnet he logged in from, then cross-check whether Bob has recently logged in from a different geolocation.
If Bob logged in from Sydney yesterday and from China today, his account may have been compromised and be under the control of an attacker operating from China. However, this is not guaranteed to be an attack; rather, it is an event that the SOC analyst needs to investigate.
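The Bob example can be sketched in Python as a rule that keeps a short per-user history of login locations. This is a minimal illustration under stated assumptions: the subnet-to-location table is hypothetical (its addresses come from ranges reserved for documentation), a real deployment would use a GeoIP database or the network inventory, and the two-day window is an arbitrary choice.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical subnet-to-location table using documentation-only address ranges.
# A real deployment would use a GeoIP database or the network inventory.
SUBNET_LOCATIONS = {
    ipaddress.ip_network("203.0.113.0/24"): "Sydney, AU",
    ipaddress.ip_network("198.51.100.0/24"): "Shanghai, CN",
}


@dataclass
class Login:
    user: str
    source_ip: str
    time: datetime


def locate(ip: str) -> str:
    """Map a source IP to a location via the subnet table."""
    addr = ipaddress.ip_address(ip)
    for network, location in SUBNET_LOCATIONS.items():
        if addr in network:
            return location
    return "unknown"


class GeoChangeRule:
    """Alert when a user logs in from a location that differs from their
    other logins within the time window."""

    def __init__(self, window: timedelta = timedelta(days=2)):
        self.window = window
        self.history = {}  # user -> list of (time, location)

    def check(self, login: Login) -> Optional[dict]:
        location = locate(login.source_ip)
        # Keep only this user's earlier logins that fall inside the window.
        recent = [
            (t, loc)
            for t, loc in self.history.get(login.user, [])
            if login.time - t <= self.window
        ]
        self.history[login.user] = recent + [(login.time, location)]
        previous = sorted({loc for _, loc in recent if loc != location})
        if previous:
            # Not proof of compromise: an event for an analyst to investigate.
            return {"user": login.user, "location": location, "previous": previous}
        return None


rule = GeoChangeRule()
day1 = datetime(2024, 5, 1, 9, 0)
rule.check(Login("bob", "203.0.113.10", day1))  # Sydney: no history, no alert
print(rule.check(Login("bob", "198.51.100.7", day1 + timedelta(days=1))))
# {'user': 'bob', 'location': 'Shanghai, CN', 'previous': ['Sydney, AU']}
```

The rule only raises an alert; deciding what to do with it is a separate step, which keeps detection logic simple to test on its own.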
SOC Maturity Level 2 – Automation of Responses
As we've seen, correlation rules are the bedrock of tuning the SIEM to detect contextual attacks. However, the activities that follow a trigger are labour-intensive and can involve phone calls, emails, and interrupted meetings or even holidays to find out whether Bob is, in fact, in China. Validating whether an incident is real takes the analyst a lot of time, and false positives often take up the majority of SOC analysts' time.
To make your SOC more efficient, it's time to introduce automation; however, talk of automation in a less mature SOC tends to make analysts and security managers nervous. Badly designed automated responses, such as closing a firewall port that a critical business application needs or quarantining the CEO's device while they are travelling, could harm the business and undermine the security team's reputation. Automations need to be carefully constructed and must draw on information that doesn't come from security technology alone.
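One way to build in that caution is to check every proposed automated action against business-supplied guardrails before it runs, escalating to an analyst when the target is protected. The sketch below assumes hypothetical action names and protected lists (such as executives' devices and ports used by critical applications); it is an illustration, not a feature of any particular product.

```python
from dataclasses import dataclass, field
from typing import Set, Union

# Hypothetical actions the SOC has agreed may ever run unattended.
KNOWN_ACTIONS = {"quarantine_device", "block_port", "lock_account"}


@dataclass
class Guardrails:
    """Hypothetical business-supplied lists of targets that must never be
    actioned automatically."""
    protected_devices: Set[str] = field(default_factory=set)
    protected_ports: Set[int] = field(default_factory=set)


def decide(action: str, target: Union[str, int], guardrails: Guardrails) -> str:
    """Return 'automate' if the action may run unattended, else 'escalate'."""
    if action not in KNOWN_ACTIONS:
        return "escalate"  # unknown actions default to a human decision
    if action == "quarantine_device" and target in guardrails.protected_devices:
        return "escalate"  # e.g. the CEO's laptop while they are travelling
    if action == "block_port" and target in guardrails.protected_ports:
        return "escalate"  # port needed by a critical business application
    return "automate"


rails = Guardrails(protected_devices={"ceo-laptop"}, protected_ports={443})
print(decide("quarantine_device", "ceo-laptop", rails))  # escalate
print(decide("quarantine_device", "bob-laptop", rails))  # automate
```

Defaulting to escalation for anything unrecognised means a new or misnamed action can never run unattended by accident.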
Taking our example to the next stage, if the company records staff holiday status in Active Directory and Bob can be shown to be on holiday, there is a higher degree of confidence that the correlation trigger is a false positive. Again, however, this is not guaranteed.
Extending the logic further to Bob's role, if he is a salesperson who often travels abroad, confidence may be higher again. You can then work with the business teams who plan staff travel to ensure that travel dates and destinations are always recorded in the company directory, so that the SOC can query this information when a correlation rule is triggered. If Bob logs in from China and the staff travel records show him to be on a sales trip to China, the login is likely to be legitimate and no action will be taken.
If Bob is travelling to the United States but the SOC sees a login from China, his account has likely been compromised, and the SOC can lock it with a high degree of confidence. If this happens automatically, without an analyst ringing Bob or his manager or waiting for Bob to reply to an email, an attacker using compromised credentials will have significantly less time to exploit Bob's access.
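The enrichment logic for Bob's alert can be expressed as a simple triage function. This is a sketch assuming hypothetical directory attributes (`on_holiday`, `role`, `travel_destination`) and outcome labels. Because holiday status and role only raise confidence without proving anything, those cases are routed to an analyst rather than closed automatically; only a recorded trip gives enough confidence to act without a human.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of roles whose holders routinely travel abroad.
FREQUENT_TRAVEL_ROLES = {"sales"}


@dataclass
class DirectoryRecord:
    """Hypothetical directory attributes for the user named in the alert."""
    on_holiday: bool = False
    role: str = ""
    travel_destination: Optional[str] = None  # country code of a recorded trip


def triage(login_country: str, record: DirectoryRecord) -> str:
    """Choose a response to a geolocation alert using directory context."""
    if record.travel_destination is not None:
        if record.travel_destination == login_country:
            return "no_action"  # login matches the recorded trip
        return "lock_account"  # on a trip, but the login comes from elsewhere
    if record.on_holiday or record.role in FREQUENT_TRAVEL_ROLES:
        # Raises confidence of a false positive, but does not prove it.
        return "investigate_low_priority"
    return "investigate_high_priority"


print(triage("CN", DirectoryRecord(role="sales", travel_destination="CN")))  # no_action
print(triage("CN", DirectoryRecord(role="sales", travel_destination="US")))  # lock_account
```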
When you start to look at which responses can be automated, try not to limit your thinking to the information you can access today. By speaking with your HR teams, managers and other technical teams, you'll soon see what is possible, and this is when Level 2 maturity is achieved.
SOC Maturity Level 3 – Service Management Integration
So, you've automated your responses and seen a significant step up in efficiency, but some aspects of integration with the rest of your service management processes can still be enhanced.
Most organisations use an enterprise service management tool to record user service requests, track system and application changes and record problems and knowledge. Service desk technicians are the enterprise’s first line of support, taking calls from users. Every change to the organisation’s ICT environment must be recorded as a service request, including changes made by the SOC, such as locking Bob’s account or quarantining his device.
At Level 1, the SOC analyst manually raises the service request in the service management tool, starts the investigation, and locks the account only at the end of it. This could take minutes or hours, depending on how much investigation is needed to validate the alert.
At Level 2, the alert is sent to the analyst and the account is locked automatically, but the analyst still has to raise the service request manually and ensure that the ticket records all the information the service desk needs. Even with automation, raising the ticket remains labour-intensive and leaves the SOC analyst performing mundane tasks.
Level 3 should be pursued as quickly as possible. Your organisation's service management architecture should integrate all operational security processes with change management, incident management, capacity management and so on, so that any security activity automatically raises a service request and populates it with the appropriate information.
This information might include Bob's user name, his device location and the category of the incident, along with anything else the service desk needs to answer Bob's call when he rings to ask why his account is locked. By the time the ticket is raised and escalated to the service desk, his device has already been quarantined and his account locked. When Bob calls the service desk to find out why he can't log in, they have all the information they need to explain what has happened, unlock his account and help him reset his password.
This is the best result for the business: the threat has been removed, and Bob gets the best possible support in having his account re-enabled because all the right people have the information they need. A workflow could also be added to the service management tool to inform Bob's manager whenever security events affect a member of their team, so that the manager can educate the team and reinforce security awareness.
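A minimal sketch of the Level 3 idea: when an automated response runs, a service request is built with the details the service desk needs, and a notification for the user's manager is drafted. The field names, the `build_service_request` and `manager_notification` helpers and the assignment group are hypothetical; submitting the payload would go through your service management tool's own API, which is deliberately not shown here.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class ServiceRequest:
    """Hypothetical ticket fields; map them onto your tool's own schema."""
    category: str
    user_name: str
    device_location: str
    actions_taken: List[str]
    summary: str
    assignment_group: str = "Service Desk"
    raised_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


def build_service_request(alert: Dict[str, str], actions_taken: List[str]) -> ServiceRequest:
    """Populate a ticket from the alert and the automated actions already taken."""
    return ServiceRequest(
        category=alert["category"],
        user_name=alert["user"],
        device_location=alert["location"],
        actions_taken=actions_taken,
        summary=(f"{alert['category']} for {alert['user']}; "
                 f"actions taken: {', '.join(actions_taken)}"),
    )


def manager_notification(alert: Dict[str, str], manager_email: str) -> Dict[str, str]:
    """Draft a message so the manager can reinforce security awareness."""
    return {
        "to": manager_email,
        "subject": f"Security event affecting {alert['user']}",
        "body": (f"A '{alert['category']}' event was detected for {alert['user']}. "
                 "The service desk has the details and can help restore access."),
    }


alert = {"user": "bob", "location": "Shanghai, CN",
         "category": "Suspected account compromise"}
ticket = build_service_request(alert, ["account locked", "device quarantined"])
print(json.dumps(asdict(ticket), indent=2))
print(manager_notification(alert, "manager@example.com")["subject"])
```

Building the ticket from the same alert data that drove the automated response means the service desk sees exactly what the SOC's automation saw and did.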
SOC Maturity: Continual Service Improvement in your SOC
A core concept within the IT Infrastructure Library (ITIL) is Continual Service Improvement (CSI), which applies as much to security operations as to every other ITIL process. CSI uses feedback to help service managers monitor the efficiency of the processes they run, allowing them to introduce automation and other improvements to the services they provide to customers.
The three levels of SOC maturity outlined in this post are essentially a route through CSI, introducing efficiencies and improvements that allow analysts to do higher-value work. By removing mundane tasks from analysts' daily workloads and limiting the false positives they must investigate manually, you give analysts the time they need to hunt for more complicated threats, misconfigurations and indicators of compromise.