Introduction to SIEM and SOAR

Jan 20, 2022

A SOC analyst might receive thousands of alerts every minute from many different networks and appliances, across many protocols and activity types. Handling all of this manually is impossible for any human, but a centralized repository can help manage it.

Security Information and Event Management (SIEM) is software that aggregates and analyzes information from several different sources across the entire infrastructure. So, instead of going through every security appliance's console, you get one centralized solution to find the alerts and react accordingly. This solution not only centralizes the information but also analyzes it in depth to find hidden patterns and determine whether your company is under attack.

Security Orchestration, Automation, and Response (SOAR) will help you with threat and vulnerability management, security incident response, and security operations automation by bringing order to the alerts your SIEM creates. It will give you more insight into what steps come next and which actions need immediate attention.

Alert and Monitor:

First, you need to be clear about which security appliances are implemented and monitored in your company/organization. Most of the time, SOCs monitor the following security appliances:

Firewall
Antivirus
Routers and Switches
Intrusion prevention systems
Databases
Servers
Workstations

These devices contain valuable information about alerts and possible indicators of compromise (IoCs). That’s why a SOC analyst needs to pay special attention to the alerts raised by these devices.

However, monitoring each appliance separately is not a good practice. Generally, you’ll want to collect the alerts from every device in a centralized location, and this is where you can use SIEM technology.

For example, a single activity on a firewall might not be suspicious or might not trigger an alert. However, when this activity is combined with other actions registered by the antivirus, the intrusion prevention system, the data loss prevention system, and other devices, it could show that malware has infected the system.
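The correlation idea can be sketched in a few lines of Python. This is only an illustration, not how any particular SIEM product implements its rules: the event format, the host names, the 10-minute window, and the three-source threshold are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical normalized events: (timestamp, source appliance, host, description).
EVENTS = [
    (datetime(2022, 1, 20, 9, 0), "firewall", "ws-17", "outbound connection to rare port"),
    (datetime(2022, 1, 20, 9, 2), "antivirus", "ws-17", "suspicious file quarantined"),
    (datetime(2022, 1, 20, 9, 4), "ips", "ws-17", "signature match on outbound traffic"),
    (datetime(2022, 1, 20, 9, 5), "firewall", "ws-42", "outbound connection to rare port"),
]


def correlate(events, window=timedelta(minutes=10), min_sources=3):
    """Return hosts reported by at least `min_sources` distinct appliances within `window`."""
    by_host = defaultdict(list)
    for ts, source, host, _description in events:
        by_host[host].append((ts, source))

    flagged = {}
    for host, items in by_host.items():
        items.sort()  # order each host's events by time
        for i, (start, _source) in enumerate(items):
            # Distinct appliances that reported within `window` of this event.
            sources = {src for ts, src in items[i:] if ts - start <= window}
            if len(sources) >= min_sources:
                flagged[host] = sorted(sources)
                break
    return flagged


if __name__ == "__main__":
    for host, sources in correlate(EVENTS).items():
        print(f"ALERT {host}: correlated activity from {', '.join(sources)}")
```

Here the lone firewall event on ws-42 stays below the threshold, while ws-17 is flagged because three different appliances reported activity for it within the same window.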

As you can imagine, a SIEM needs to receive the logs and alerts from these devices. It is only as good as the information it receives and cannot work by itself. Sometimes, users have the misperception that a SIEM alone is good enough. However, if you don’t feed the SIEM the appropriate logs, you’ll be blind to whatever those sources would have reported. This is known as a gray area.

As an example, consider SIEM software that does not receive logs from the antivirus or the IPS, and only receives logs from the firewall. If the antivirus solution detects activity, the SIEM will never see it and you’ll never receive the alert. This is why it’s very important to have a clear understanding of the company’s infrastructure and to know which appliances are actually reporting to the SIEM tool.
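One simple way to spot such gaps is to compare the appliances you expect to report against the sources that actually appear in the collected logs. The sketch below assumes a hypothetical inventory and a hypothetical log record shape (a dictionary with a `source` key); a real SIEM would expose this information through its own interface.

```python
# Hypothetical inventory of appliances that should report to the SIEM.
EXPECTED_SOURCES = {"firewall", "antivirus", "ips"}


def find_silent_sources(expected, received_logs):
    """Return the expected sources that produced no log entries at all."""
    seen = {entry["source"] for entry in received_logs}
    return sorted(expected - seen)


# Hypothetical logs: only the firewall is reporting.
received = [
    {"source": "firewall", "message": "connection allowed"},
    {"source": "firewall", "message": "connection denied"},
]

if __name__ == "__main__":
    for source in find_silent_sources(EXPECTED_SOURCES, received):
        print(f"WARNING: no logs received from {source}")
```

Running a check like this on a schedule turns a silent gap into a visible warning.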

Incident Response:

Let’s say a company was hit with a ransomware attack. The AV solution raised an alert but, unfortunately, the security personnel overlooked it. As a result, several people have found important business files encrypted. As time passes, more and more people encounter the ransomware and find their files locked.

Let’s examine how SIEM and SOAR could have helped in this scenario:

With a SIEM tool, you’d be able to get an alert just in time, so you can activate the Incident Response Team or notify level 2 support.
With a SOAR tool, you’d be able to automate a response, e.g., sending an email, creating a ticket, gathering information from different sources, etc.
With a playbook, you’d be able to see what steps you need to take to respond to that alert, e.g., gather specific information, fill out a form, notify a team, escalate to level 2 support, etc.

SIEM, SOAR, and a well-defined list of playbooks can really help with the incident response process. By using all of them together, you’ll be able to collect threat-related data from a range of sources (in this case, it could be only from the SIEM software) and automate responses to different threats and alerts.
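To make the relationship between an alert, a playbook, and automated actions more concrete, here is a minimal sketch. It does not use any real SOAR product's API: the `Alert` class, the rule names, and the step functions are all hypothetical, and each step only records what it would do instead of calling a ticketing or email system.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    rule: str       # name of the SIEM rule that fired (hypothetical)
    host: str
    severity: str
    actions_taken: list = field(default_factory=list)


# Each step records what it would do. In a real deployment these would
# call a ticketing system, a mail server, a log store, and so on.
def gather_context(alert):
    alert.actions_taken.append(f"collected recent logs for {alert.host}")


def create_ticket(alert):
    alert.actions_taken.append(f"ticket opened for {alert.host}")


def notify_level2(alert):
    alert.actions_taken.append("level 2 support notified")


# Hypothetical playbooks, keyed by the rule that fired.
PLAYBOOKS = {
    "ransomware_detected": [gather_context, create_ticket, notify_level2],
    "failed_login_burst": [gather_context, create_ticket],
}


def run_playbook(alert):
    """Run every step of the playbook matching the alert's rule, in order."""
    # Unknown rules fall back to simply opening a ticket.
    for step in PLAYBOOKS.get(alert.rule, [create_ticket]):
        step(alert)
    return alert.actions_taken


if __name__ == "__main__":
    alert = Alert(rule="ransomware_detected", host="ws-17", severity="high")
    for action in run_playbook(alert):
        print(action)
```

Keeping the playbook as an ordered list of small steps mirrors how a written playbook reads, and makes it easy to review or change the response without touching the steps themselves.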

For incident response activities, you also need to establish and follow a clear incident response process. This process might vary from organization to organization, but the steps a SOC will commonly use are preparation, identification, containment, eradication, recovery, and lessons learned (post-incident analysis).

Preparation

This is one of the most important steps in the process as it entails creating the strategy to tackle all the different incidents and attacks your organization may face. Some common activities performed at this stage are:

Creating incident response policies and procedures
Establishing communication paths
Creating a continuous improvement plan and testing it frequently

Identification

This is where SIEM, SOAR, and playbooks come in handy. In this step, you need to assess the alert and determine whether it’s a false positive or is, in fact, malicious activity.
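A first-pass triage rule might look like the sketch below. The alert fields, the list of known-benign processes, and the set of known-bad hashes are all made up for illustration; real triage relies on far more context and on an analyst's judgment.

```python
def triage(alert, known_benign_processes, known_bad_hashes):
    """Classify an alert as 'malicious', 'false positive', or 'needs review'."""
    # A file hash that matches threat intelligence is the strongest signal.
    if alert.get("file_hash") in known_bad_hashes:
        return "malicious"
    # Activity from a process already vetted as benign is likely noise.
    if alert.get("process") in known_benign_processes:
        return "false positive"
    # Everything else goes to a human analyst.
    return "needs review"


if __name__ == "__main__":
    benign = {"backup_agent.exe"}         # hypothetical allowlist
    bad_hashes = {"placeholder-hash-01"}  # hypothetical intel entry
    alert = {"host": "ws-17", "process": "backup_agent.exe"}
    print(triage(alert, benign, bad_hashes))
```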

Containment

At this stage, you try to minimize the threat and the impact. This can be done by containing the threat (e.g., disconnecting the affected machines from the network) or by applying specific countermeasures.

Eradication

This is where you try to eliminate the threat once and for all. For example, if you have an infected host, you may need to format it or restore it to a non-affected state (e.g., a previous snapshot of a virtual machine).

Recovery

This stage is critical for business continuity because it is the step where you bring everything to the original operational state. The longer it takes to fully recover from an incident, the more costly it can become.

Lessons Learned

This is where you document the incident and the whole recovery process from beginning to end. This gives the team the opportunity to evaluate processes, strategies, and other plans to deal with potential future events more effectively.

Cyber Investigations:

Now that you have all the information from the SIEM, SOAR, and other tools, it’s time to take a deep dive into the problem. This process is commonly known as root cause analysis (RCA). An RCA is the process that leads you to the first or main factor that caused a problem, which should then be permanently eliminated to avoid future occurrences. In other words, RCA helps determine the core issue that triggered the entire cause-and-effect chain that ultimately led to the problem(s) or incident(s).

Each problem might require different actions, like escalating it to a threat intelligence analyst, following a specific playbook, creating a report, etc. If you find a malicious or suspicious file during the investigation, you can use open-source intelligence (OSINT) tools, such as VirusTotal, which will provide valuable information about the suspicious file you detected.

SOC analysts are often asked to investigate incidents and provide a plan of action depending on the result of this investigation. Take the ransomware example from the previous step. The SOC may be asked to investigate this a bit further. During the investigation process, it turns out that the suspicious ransomware file has a hash value of db349b97c37d22f5ea1d1841e3c89eb4. With this information in hand, a SOC analyst can turn to their different sources of threat intelligence to identify the specific family of malware the ransomware belongs to. In the end, this will lead to a thorough plan for tackling the malware.
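The hash-based lookup can be sketched with Python's standard `hashlib` module. The hash in the example above is 32 hexadecimal characters long, which matches the length of an MD5 digest, although length alone does not prove which algorithm produced it. In this sketch the "threat intelligence" is just a local dictionary, the sample bytes stand in for a real file, and the family label is a placeholder; none of it reflects real intelligence about that hash.

```python
import hashlib


def file_hashes(data: bytes) -> dict:
    """Compute common digests for a file's contents."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def guess_hash_type(value: str) -> str:
    """Guess the digest algorithm from the length of a hex string."""
    lengths = {32: "md5", 40: "sha1", 64: "sha256"}
    is_hex = all(c in "0123456789abcdefABCDEF" for c in value)
    return lengths.get(len(value), "unknown") if is_hex else "unknown"


def lookup(hash_value: str, intel: dict) -> str:
    """Look a hash up in a local intel table (a stand-in for a real feed)."""
    return intel.get(hash_value.lower(), "no match")


# Stand-in bytes for a suspicious file; the intel entry is a placeholder.
sample = b"stand-in bytes for a suspicious file"
hashes = file_hashes(sample)
intel = {hashes["md5"]: "example-family (placeholder label)"}

if __name__ == "__main__":
    print(guess_hash_type("db349b97c37d22f5ea1d1841e3c89eb4"))
    print(lookup(hashes["md5"], intel))
```

In practice the lookup step would query one or more threat intelligence sources, such as VirusTotal, instead of a local dictionary.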

Recovery:

A disaster recovery plan (DRP) is a process that describes how an organization can quickly resume work after an incident. A DRP is an essential part of a business continuity plan (BCP). It is applied to the aspects of an organization that depend on a functioning IT infrastructure. A DRP aims to help an organization resolve data loss and recover system functionality so that it can perform in the aftermath of an incident, even if it operates at a minimal level.

While a DRP is focused on technology, the BCP is focused on the business. As an example, consider a company that has lost the ability to sell online because its web server is down. The BCP can dictate that the business continue selling over the phone, a messaging app, email, etc. The DRP, on the other hand, will aim to restore the web server’s availability so that online sales can resume. This is why having both is very important for incident response and recovery.

Part of the DRP involves making a decision on acceptable risk. Upper levels of management will usually decide which situations are deemed acceptable when it comes to a security risk. For example, applying a security patch to business-critical workstations may cause a disruption to the sales and support teams for a business. Patching them during business hours is not an acceptable risk if they need to be offline for a long period of time. Instead, a plan is designed that will ensure those workstations remain secure and the issue is addressed during non-business hours.
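The patching decision described above can be expressed as a small rule. The business hours (weekdays, 09:00 to 18:00) and the 15-minute downtime threshold below are assumptions chosen for the example; in practice, management would set these values as part of the acceptable-risk decision.

```python
from datetime import datetime, time

# Assumed business hours: Monday to Friday, 09:00 to 18:00.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(18, 0)


def in_business_hours(moment: datetime) -> bool:
    """Return True if `moment` falls on a weekday within business hours."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and BUSINESS_START <= moment.time() < BUSINESS_END


def can_patch_now(moment: datetime, expected_downtime_minutes: int,
                  max_acceptable_downtime: int = 15) -> bool:
    """Allow patching outside business hours, or when the downtime is short."""
    if not in_business_hours(moment):
        return True
    return expected_downtime_minutes <= max_acceptable_downtime


if __name__ == "__main__":
    thursday_morning = datetime(2022, 1, 20, 11, 0)
    thursday_night = datetime(2022, 1, 20, 23, 0)
    print(can_patch_now(thursday_morning, expected_downtime_minutes=120))
    print(can_patch_now(thursday_night, expected_downtime_minutes=120))
```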