Regex in the Trenches: A SOC Analyst’s Guide to Hunting IOCs (Part 2 — Practical Patterns for Analysts)

📌 Context

Regex only becomes useful once you can turn abstract syntax into patterns that hunt actual indicators of compromise. This is where most analysts trip: the regex they copy from a forum works fine in a demo, but fails on live logs — either flagging half the internet or missing the real IOC. This post drills into real-world patterns for IP addresses, domains, emails, file paths, hashes, and timestamps, with examples that actually work in SOC workflows.

🔬 Core Patterns

IPv4 Addresses

\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b

Matches four octets separated by dots. It’s not perfect validation (it will allow 999.999.999.999), but in log parsing you often want flexibility.

Splunk example: index=proxy | regex src_ip="\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b"
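
If the loose pattern pulls in too many impossible addresses, a second validation pass can drop them. Here is a minimal Python sketch of that idea, assuming you are post-processing exported log lines outside the SIEM; `extract_ipv4` and the sample line are made up for illustration.

```python
import ipaddress
import re

# Same loose pattern as above: four dot-separated groups of 1-3 digits.
IPV4_CANDIDATE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")


def extract_ipv4(line):
    """Return the candidates in `line` that parse as real IPv4 addresses."""
    valid = []
    for candidate in IPV4_CANDIDATE.findall(line):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # impossible octets, e.g. 999.999.999.999
        valid.append(candidate)
    return valid


# Hypothetical log line, not taken from a real device.
sample = "src=10.0.0.5 dst=999.999.999.999 nat=203.0.113.7"
print(extract_ipv4(sample))
```

The regex stays permissive for speed and the standard library does the strict check, so you keep the flexibility without carrying the junk forward.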

IPv6 Addresses

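(?:[A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}

This catches full, unabbreviated IPv6 addresses in upper or lower case (many devices log IPv6 in lowercase). Compressed forms (::) need a more complex pattern, and loosening the group count makes the pattern fire on clock times such as 12:34:56, so test against real logs before relying on it.

grep example: grep -E "([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}" firewall.log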
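
One way to cope with compressed forms without writing a monster regex is to grab loose candidates and let a parser decide. The sketch below assumes Python post-processing; `extract_ipv6` and the sample line are hypothetical, and the candidate pattern is deliberately crude.

```python
import ipaddress
import re

# Loose candidate pattern: runs of hex digits and colons. Validation happens below.
CANDIDATE = re.compile(r"[0-9A-Fa-f:]{2,}")


def extract_ipv6(line):
    """Return valid IPv6 addresses found in `line`, in compressed lowercase form."""
    found = []
    for token in CANDIDATE.findall(line):
        if token.count(":") < 2:
            continue  # too few colons to be an IPv6 address
        try:
            found.append(str(ipaddress.IPv6Address(token)))
        except ValueError:
            continue  # e.g. a clock time such as 12:34:56
    return found


# Hypothetical firewall line for illustration only.
sample = "12:34:56 fw drop src=fe80::1 dst=2001:0DB8:0000:0000:0000:0000:0000:0001"
print(extract_ipv6(sample))
```

Normalising to the compressed lowercase form also means the same address written two different ways collapses into one IOC when you deduplicate.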

Domains

([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}

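Matches domains like evil.com or sub.mail.attacker.org. It also matches dotted filenames such as report.pdf, so scope it to DNS or URL fields where you can. Adjust the TLD length if you want to filter oddball domains.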

YARA example:

rule Suspicious_Domain
{
    strings:
        $domain = /([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}/ nocase
    condition:
        $domain
}
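
Because the domain pattern also fires on filenames, a small post-filter helps when you run it over raw text. This is a rough Python sketch under that assumption; `FILE_EXTENSIONS` is a hypothetical deny-list you would tune yourself, and some extensions (.zip, .mov) are also real TLDs, so the list is a judgment call.

```python
import re

# The domain pattern from above, with word boundaries added for raw-text scanning.
DOMAIN = re.compile(r"\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b")

# Hypothetical deny-list of file extensions that look like TLDs to the regex.
FILE_EXTENSIONS = {"exe", "dll", "log", "txt", "pdf", "php"}


def extract_domains(text):
    """Return unique lowercase domains in order of first appearance."""
    seen = []
    for match in DOMAIN.findall(text):
        domain = match.lower()
        if domain.rsplit(".", 1)[1] in FILE_EXTENSIONS:
            continue  # probably a filename, not a domain
        if domain not in seen:
            seen.append(domain)
    return seen


# Made-up proxy line for illustration.
sample = "GET http://Sub.Mail.Attacker.org/update.exe ref=evil.com file=report.pdf"
print(extract_domains(sample))
```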

Email Addresses

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

Core pattern for phishing hunts. This will match john.doe@company.com and x_attacker99@mail.ru.

Sigma rule snippet:

title: Suspicious Email Address
logsource:
    category: email
detection:
    selection:
        from|re: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    condition: selection
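
For a phishing hunt you usually care less about individual addresses than about which sender domains keep showing up. A minimal Python sketch of that pivot, assuming you have exported header lines to text; `sender_domains` and the header samples are invented for illustration.

```python
import re
from collections import Counter

# The email pattern from above.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def sender_domains(lines):
    """Count how often each (lowercased) email domain appears in `lines`."""
    counts = Counter()
    for line in lines:
        for address in EMAIL.findall(line):
            counts[address.rsplit("@", 1)[1].lower()] += 1
    return counts


# Hypothetical header lines, not from a real mailbox.
headers = [
    "From: john.doe@company.com",
    "From: x_attacker99@mail.ru",
    "Reply-To: billing@MAIL.ru",
]
print(sender_domains(headers).most_common())
```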

File Paths

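Windows: [A-Z]:\\[^\s]+
Linux/Unix: \/[^\s]+

Both patterns stop at the first whitespace, so a path containing spaces (for example under C:\Program Files) is cut off at the space. The Windows pattern also expects an uppercase drive letter.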

Suricata example (pcre):

alert http any any -> any any (msg:"Windows EXE Download"; content:".exe"; http_uri; pcre:"/[A-Z]:\\\\[^\s]+/"; sid:1000001; rev:1;)
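
Backslash escaping is where path patterns usually break when they move between tools. As a point of comparison, here is how the two path patterns might look in Python, where a raw string keeps the regex identical to the bare form above; the sample command line is made up.

```python
import re

# Raw strings: the regex engine sees exactly [A-Z]:\\[^\s]+ and /[^\s]+
WINDOWS_PATH = re.compile(r"[A-Z]:\\[^\s]+")
UNIX_PATH = re.compile(r"/[^\s]+")

# Hypothetical process-creation log line.
sample = r"cmd=C:\Users\Public\svc.exe parent=/usr/bin/bash"

print(WINDOWS_PATH.findall(sample))
print(UNIX_PATH.findall(sample))
```

Note that the Unix pattern matches anything starting with a slash, including the tail of a URL, so it works best on fields you already know contain paths.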

Hashes

MD5: \b[a-fA-F0-9]{32}\b
SHA1: \b[a-fA-F0-9]{40}\b
SHA256: \b[a-fA-F0-9]{64}\b

grep example: grep -Eo "\b[a-fA-F0-9]{64}\b" malware_samples.txt
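
Since the three hash patterns differ only in length, one pass can find hex runs and label them by size. A short Python sketch of that approach; `classify_hashes` is a hypothetical helper and the sample values are generated with hashlib rather than taken from real malware.

```python
import hashlib
import re

# Any bounded run of hex digits; the length decides what kind of hash it is.
HEX_RUN = re.compile(r"\b[a-fA-F0-9]+\b")
HASH_TYPES = {32: "MD5", 40: "SHA1", 64: "SHA256"}


def classify_hashes(text):
    """Return (hash_type, lowercase_hash) pairs for hex runs of a known length."""
    found = []
    for token in HEX_RUN.findall(text):
        kind = HASH_TYPES.get(len(token))
        if kind:
            found.append((kind, token.lower()))
    return found


# Sample line built from hashes of the bytes b"test", for illustration only.
sample = "md5={} sha256={} id=deadbeef".format(
    hashlib.md5(b"test").hexdigest(),
    hashlib.sha256(b"test").hexdigest(),
)
for kind, value in classify_hashes(sample):
    print(kind, value)
```

Length alone cannot tell an MD5 from any other 32-character hex string (a GUID without dashes, for instance), so treat the label as a hint, not proof.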

Timestamps

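ISO 8601: \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z (UTC timestamps with a trailing Z only; fractional seconds and numeric offsets such as +02:00 will not match)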
Apache: \[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]

Splunk example: rex field=_raw "\[(?<timestamp>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"
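
Extracting the timestamp is only half the job; to correlate across sources you usually want a real datetime. Below is a small Python sketch that reuses the Apache pattern and parses the result, assuming an English locale for month names; `parse_apache_timestamp` and the access-log line are illustrative.

```python
import re
from datetime import datetime

# Same Apache pattern as above. Python spells named groups (?P<name>...).
APACHE_TS = re.compile(
    r"\[(?P<timestamp>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"
)


def parse_apache_timestamp(line):
    """Return a timezone-aware datetime from an Apache log line, or None."""
    match = APACHE_TS.search(line)
    if match is None:
        return None
    # %b depends on the locale; this assumes English month abbreviations.
    return datetime.strptime(match.group("timestamp"), "%d/%b/%Y:%H:%M:%S %z")


# Hypothetical access-log line.
line = '203.0.113.7 - - [08/Sep/2025:14:03:22 +0000] "GET / HTTP/1.1" 200 512'
print(parse_apache_timestamp(line).isoformat())
```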

📎 Attacker Behavior Snapshot

Attackers leave behind consistent artifacts: IPs in firewall logs, domains in DNS queries, hashes in endpoint alerts. Regex lets you surface these quickly, regardless of the log format. For example, a beacon to abc.attacker.com will hit DNS, proxy, and firewall logs differently — regex unifies the hunt.
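
To make the "one pattern, many log formats" idea concrete, here is a rough Python sketch that builds a pattern from a known domain IOC and runs it over differently shaped lines. The log lines are invented and `ioc_pattern` is a hypothetical helper; real DNS, proxy, and firewall formats will differ.

```python
import re


def ioc_pattern(domain):
    """Build a case-insensitive pattern for one domain IOC.

    re.escape keeps the dots literal. The lookarounds stop the IOC from
    matching inside a longer hostname such as abc.attacker.com.cdn.example.net.
    """
    return re.compile(
        r"(?<![A-Za-z0-9-])" + re.escape(domain) + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",
        re.IGNORECASE,
    )


# Made-up lines in three formats, plus two decoys that should not hit.
logs = [
    "dns query: abc.attacker.com. IN A",
    "proxy GET http://ABC.attacker.com/beacon 200",
    "fw dst_host=abc.attacker.com dpt=443",
    "dns query: abc.attacker.com.cdn.example.net IN A",
    "proxy GET http://abcxattacker.com/ 200",
]

pattern = ioc_pattern("abc.attacker.com")
hits = [line for line in logs if pattern.search(line)]
for line in hits:
    print(line)
```

Subdomains of the IOC still match because a leading dot is allowed before it, which is usually what you want when hunting a beacon domain.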

🛠️ SOC Detection Strategy

Tier 1: Quick grep/regex checks for IOC presence across logs.
Tier 2: Field extractions in Splunk/ELK to pivot faster.
Tier 3: Incorporate validated regex into Sigma/YARA rules to codify knowledge.

🔐 Hardening & Mitigation

Baseline common regex patterns in shared playbooks.
Test all patterns on safe sample data before deploying to production SIEM or IDS.
Anchor and scope regex to reduce false positives.
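
Testing on safe sample data can be as simple as a list of strings that must match and a list that must not. A minimal Python sketch of such a check, using the SHA256 pattern as the example; the `failures` helper and the sample strings are assumptions for illustration.

```python
import re

ANCHORED = re.compile(r"\b[a-fA-F0-9]{64}\b")
UNANCHORED = re.compile(r"[a-fA-F0-9]{64}")

# Synthetic samples: exactly 64 hex chars should hit, near-misses should not.
SHOULD_MATCH = ["a" * 64, "hash=" + "0F" * 32]
SHOULD_NOT_MATCH = ["a" * 63, "a" * 65, "g" * 64]


def failures(pattern, should_match, should_not_match):
    """Return a list of (problem, sample) pairs; empty means the pattern passed."""
    failed = []
    for text in should_match:
        if not pattern.search(text):
            failed.append(("missed", text))
    for text in should_not_match:
        if pattern.search(text):
            failed.append(("false positive", text))
    return failed


print("anchored:", failures(ANCHORED, SHOULD_MATCH, SHOULD_NOT_MATCH))
print("unanchored:", failures(UNANCHORED, SHOULD_MATCH, SHOULD_NOT_MATCH))
```

The unanchored version fails on the 65-character string because it happily matches the first 64 characters of a longer hex run, which is exactly the kind of false positive anchoring is meant to prevent.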

📋 Incident Response Snippets

grep for IOC hits: grep -E "\b[a-fA-F0-9]{64}\b" endpoint.log
Splunk hash hunt: index=edr | regex hash="\b[a-fA-F0-9]{64}\b"
YARA hash rule:

🧾 Final Thoughts

This is where regex starts proving its worth: pulling hard IOCs out of messy logs. IPv4, domains, hashes, and timestamps are the bread and butter of SOC regex work. But these patterns are only powerful if you know when — and where — to deploy them. Next up: Part 3 — Regex in Action, where we chain these patterns inside grep, Splunk, ELK, Suricata, and Sigma.

Published: September 8, 2025

Ramblings of a CyberSecurity Nerd