
📌 Context
Regex only becomes useful once you can turn abstract syntax into patterns that hunt actual indicators of compromise. This is where most analysts trip: the regex they copy from a forum works fine in a demo, but fails on live logs — either flagging half the internet or missing the real IOC. This post drills into real-world patterns for IP addresses, domains, emails, file paths, hashes, and timestamps, with examples that actually work in SOC workflows.
🔬 Core Patterns
IPv4 Addresses
\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b
Matches four octets separated by dots. It’s not perfect validation (it will allow 999.999.999.999), but in log parsing you often want flexibility.
Splunk example: index=proxy | regex src_ip="\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b"
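Because the IPv4 pattern deliberately accepts impossible octets, one option is to treat regex hits as candidates and confirm them with Python's standard `ipaddress` module. This is a minimal sketch; the function name `extract_ipv4` is hypothetical and the sample log line is made up.

```python
import ipaddress
import re

# Same pattern as above: four dot-separated runs of 1-3 digits.
IPV4_CANDIDATE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")


def extract_ipv4(line: str) -> list[str]:
    """Return regex hits that are also valid IPv4 addresses."""
    valid = []
    for candidate in IPV4_CANDIDATE.findall(line):
        try:
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # e.g. 999.999.999.999 matches the regex but is not an address
        valid.append(candidate)
    return valid


if __name__ == "__main__":
    sample = "deny tcp 10.0.0.5:443 -> 999.999.999.999:80 via 192.168.1.20"
    print(extract_ipv4(sample))  # ['10.0.0.5', '192.168.1.20']
```

The regex stays cheap and permissive for the scan, and the stricter check only runs on the handful of strings that already look like addresses.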
IPv6 Addresses
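(?:[A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}
This catches full, unabbreviated IPv6 addresses in either case; most logs print them in lowercase, so an uppercase-only class misses them. Compressed forms (::) need a more complex pattern, and testing against real logs is key.
grep example: grep -Ei "([a-f0-9]{1,4}:){7}[a-f0-9]{1,4}" firewall.log
grep -E uses POSIX ERE, which has no (?:...) non-capturing groups, hence the plain parentheses. Loosening the repeat to {2,7} picks up shorter fragments, but it also matches clock times such as 12:34:56.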
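One way to cover compressed forms like fe80::1 without hand-writing every variant is a deliberately loose candidate regex that allows empty groups, followed by validation with Python's `ipaddress.IPv6Address`. The candidate pattern below is an assumption of this sketch, not a standard one; the `ipaddress` call is what actually rejects clock times and MAC addresses.

```python
import ipaddress
import re

# Loose candidate (hypothetical): 2-7 colon-terminated groups of 0-4 hex digits,
# then a final group. Empty groups let it see compressed forms such as fe80::1.
# The lookarounds stop it from starting or ending inside a longer hex/colon run,
# and the "." in the lookahead skips embedded-IPv4 forms instead of mangling them.
IPV6_CANDIDATE = re.compile(
    r"(?<![0-9A-Fa-f:])(?:[0-9A-Fa-f]{0,4}:){2,7}[0-9A-Fa-f]{0,4}(?![0-9A-Fa-f:.])"
)


def extract_ipv6(line: str) -> list[str]:
    """Return valid IPv6 addresses found in the line, in compressed lowercase form."""
    found = []
    for candidate in IPV6_CANDIDATE.findall(line):
        try:
            found.append(str(ipaddress.IPv6Address(candidate)))
        except ipaddress.AddressValueError:
            continue  # clock times, MAC addresses and other colon-separated noise
    return found


if __name__ == "__main__":
    print(extract_ipv6("src=fe80::1 dst=2001:DB8::8a2e:370:7334 at 12:34:56"))
```

Validation only rejects malformed strings. A token like `d::` from `std::vector` is technically a valid address, so hits from code-like text still need a human glance.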
Domains
([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}
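Matches domains like evil.com or sub.mail.attacker.org. It also matches filenames such as invoice.pdf, because a file extension looks like a TLD. Adjust the TLD length ({2,}) or add surrounding context if you want to filter oddball hits.
YARA example (as written it fires on any file that contains a hostname, so treat it as a template and narrow the regex to the domains you are hunting):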
rule Suspicious_Domain
{
    strings:
        $domain = /([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}/ nocase
    condition:
        $domain
}
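The filename problem above is easy to see in practice. This sketch adds `\b` anchors to the domain pattern and drops hits whose last label is a common file extension; the `FILE_EXTENSIONS` set is a hypothetical, deliberately short denylist. Note that .zip is also a real TLD, so a filter like this trades some recall for precision.

```python
import re

# Domain pattern from above, with \b anchors added.
DOMAIN = re.compile(r"\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b")

# Hypothetical denylist; grow it from the false positives you actually see.
FILE_EXTENSIONS = {"exe", "dll", "pdf", "docx", "zip", "txt", "log"}


def extract_domains(line: str) -> list[str]:
    """Return domain-like tokens, skipping ones that end in a common file extension."""
    hits = []
    for match in DOMAIN.findall(line):
        last_label = match.rsplit(".", 1)[-1].lower()
        if last_label in FILE_EXTENSIONS:
            continue  # looks like a filename, not a hostname
        hits.append(match.lower())
    return hits


if __name__ == "__main__":
    print(extract_domains("GET http://sub.mail.attacker.org/invoice.pdf via proxy.corp.local"))
```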
Email Addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Core pattern for phishing hunts. This will match john.doe@company.com and x_attacker99@mail.ru.
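Sigma rule snippet (Sigma expresses regex with the |re modifier; as written the selection matches every sender, so narrow the pattern to the domains you are hunting):
title: Suspicious Email Address
logsource:
    category: email
detection:
    selection:
        from|re: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    condition: selection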
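In a phishing hunt the useful pivot is usually the domain part of the address. This sketch pulls addresses out of a header line with the pattern above and flags those whose domain is on a watchlist; `WATCHED_DOMAINS` and `flag_senders` are hypothetical names, and subdomains of a watched domain are flagged too.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Hypothetical watchlist of sender domains under investigation.
WATCHED_DOMAINS = {"mail.ru"}


def is_watched(domain: str) -> bool:
    """True if the domain, or any parent domain of it, is on the watchlist."""
    return any(domain == d or domain.endswith("." + d) for d in WATCHED_DOMAINS)


def flag_senders(header: str) -> list[str]:
    """Return addresses in the header whose domain is watched."""
    flagged = []
    for address in EMAIL.findall(header):
        domain = address.rsplit("@", 1)[1].lower()
        if is_watched(domain):
            flagged.append(address)
    return flagged


if __name__ == "__main__":
    print(flag_senders("From: x_attacker99@mail.ru, Cc: john.doe@company.com"))
```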
File Paths
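- Windows: [A-Z]:\\[^\s]+
- Linux/Unix: \/[^\s]+
[^\s]+ stops at the first space, so C:\Program Files\... is cut off at C:\Program. Drive letters can also appear in lowercase, and the Unix pattern matches URL paths and command-line switches such as /c as well as real paths.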
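Suricata example (pcre), with the content match and the pcre both inspecting the normalized URI (the /U flag) and \x5c standing in for the literal backslash:
alert http any any -> any any (msg:"Windows EXE path in HTTP URI"; content:".exe"; http_uri; nocase; pcre:"/[A-Z]:\x5c[^\s]+\.exe/Ui"; sid:1000001; rev:1;)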
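The space and switch caveats are easiest to judge on a real command line. This sketch runs the two path patterns (with a lowercase drive letter allowed) plus a hypothetical variant for double-quoted Windows paths, which may legitimately contain spaces. The sample command line is made up.

```python
import re

# Patterns from above, with a lowercase drive letter allowed for Windows.
WINDOWS_PATH = re.compile(r"[A-Za-z]:\\[^\s]+")
UNIX_PATH = re.compile(r"/[^\s]+")
# Hypothetical variant: a Windows path wrapped in double quotes, spaces allowed.
QUOTED_WINDOWS_PATH = re.compile(r'"([A-Za-z]:\\[^"]+)"')


def find_paths(line: str) -> dict[str, list[str]]:
    """Run each path pattern over the line and return its hits by name."""
    return {
        "windows": WINDOWS_PATH.findall(line),
        "windows_quoted": QUOTED_WINDOWS_PATH.findall(line),
        "unix": UNIX_PATH.findall(line),
    }


if __name__ == "__main__":
    cmd = r'cmd.exe /c "C:\Program Files\Tool\run.exe" C:\Users\Public\a.ps1'
    for name, hits in find_paths(cmd).items():
        print(name, hits)
```

On that line the plain Windows pattern truncates the quoted path at C:\Program, the quoted variant recovers it whole, and the Unix pattern reports the /c switch, which is exactly the noise to expect when these patterns meet Windows process logs.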
Hashes
- MD5: \b[a-fA-F0-9]{32}\b
- SHA1: \b[a-fA-F0-9]{40}\b
- SHA256: \b[a-fA-F0-9]{64}\b
grep example: grep -Eo "\b[a-fA-F0-9]{64}\b" malware_samples.txt
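Since the three hash types differ only in length, one pass can find every hex run of 32 to 64 characters and classify it by length. The `\b` anchors matter: without them a 64-character SHA256 would also yield a bogus 32-character "MD5". This is a sketch with hypothetical names; any 32-character hex token (an unhyphenated GUID, for instance) will still be labeled md5.

```python
import re

# Hex runs between 32 and 64 characters, bounded so we never cut into a longer run.
HEX_RUN = re.compile(r"\b[a-fA-F0-9]{32,64}\b")
HASH_TYPES = {32: "md5", 40: "sha1", 64: "sha256"}


def classify_hashes(text: str) -> list[tuple[str, str]]:
    """Return (hash_type, lowercase_hash) for each hex run of a known hash length."""
    results = []
    for token in HEX_RUN.findall(text):
        kind = HASH_TYPES.get(len(token))
        if kind is not None:  # lengths like 50 are hex noise, not a hash we know
            results.append((kind, token.lower()))
    return results
```

Lowercasing on the way out keeps later lookups against an IOC list case-insensitive without extra work.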
Timestamps
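- ISO 8601: \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z
- Apache: \[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]
The ISO pattern only covers UTC timestamps ending in a literal Z with no fractional seconds; extend it if your sources log offsets (+02:00) or milliseconds.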
Splunk example: rex field=_raw "\[(?<timestamp>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"
📎 Attacker Behavior Snapshot
Attackers leave behind consistent artifacts: IPs in firewall logs, domains in DNS queries, hashes in endpoint alerts. Regex surfaces these quickly because it matches the value in the raw text, whatever format surrounds it. For example, a beacon to abc.attacker.com shows up as a query name in DNS logs, as a Host header or URL in proxy logs, and usually only as the resolved IP in firewall logs; the domain and IPv4 patterns together cover all three.
🛠️ SOC Detection Strategy
- Tier 1: Quick grep/regex checks for IOC presence across logs.
- Tier 2: Field extractions in Splunk/ELK to pivot faster.
- Tier 3: Incorporate validated regex into Sigma/YARA rules to codify knowledge.
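A Tier 1 sweep can be a single pass that runs several of this post's patterns over every line and records which IOC types appear where. This sketch accepts any iterable of lines, so an open log file works as input; the `sweep` function and `IOC_PATTERNS` table are hypothetical names.

```python
import re
from collections import defaultdict

# A subset of the patterns from this post, compiled once up front.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b"),
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def sweep(lines) -> dict[str, list[tuple[int, str]]]:
    """Map each IOC type to a list of (line_number, match) hits."""
    hits = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        for ioc_type, pattern in IOC_PATTERNS.items():
            for match in pattern.findall(line):
                hits[ioc_type].append((lineno, match))
    return dict(hits)
```

Usage would look like `with open("firewall.log") as f: print(sweep(f))`, where the file name is just a placeholder. Keeping line numbers in the output makes it easy to jump back to the raw event before escalating.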
🔐 Hardening & Mitigation
- Baseline common regex patterns in shared playbooks.
- Test all patterns on safe sample data before deploying to production SIEM or IDS.
- Anchor and scope regex to reduce false positives.
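Testing patterns on safe sample data can be as lightweight as a table of strings each pattern must hit and must miss, checked before the pattern ships to a SIEM or IDS. This is a sketch; the `CASES` table and its samples are made up for illustration.

```python
import re

# Hypothetical regression table: name -> (pattern, must-hit samples, must-miss samples).
CASES = {
    "sha256": (
        r"\b[a-fA-F0-9]{64}\b",
        ["sha256=" + "ab" * 32],
        ["too short " + "ab" * 20, "too long " + "ab" * 40],
    ),
    "apache_ts": (
        r"\[\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]",
        ["[08/Sep/2025:14:03:22 +0200]"],
        ["2025-09-08T14:03:22Z"],
    ),
}


def failing_cases(cases) -> list[tuple[str, str, str]]:
    """Return (name, sample, expected_result) for every sample a pattern gets wrong."""
    failures = []
    for name, (pattern, must_hit, must_miss) in cases.items():
        regex = re.compile(pattern)
        for sample in must_hit:
            if not regex.search(sample):
                failures.append((name, sample, "match"))
        for sample in must_miss:
            if regex.search(sample):
                failures.append((name, sample, "no match"))
    return failures


if __name__ == "__main__":
    for name, sample, expected in failing_cases(CASES):
        print(f"{name}: expected {expected} on {sample!r}")
```

Every false positive found in production becomes one more must-miss sample, so the table doubles as a record of why each pattern looks the way it does.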
📋 Incident Response Snippets
- grep for IOC hits: grep -E "\b[a-fA-F0-9]{64}\b" endpoint.log
- Splunk hash hunt: index=edr | regex hash="\b[a-fA-F0-9]{64}\b"
- YARA hash rule:
🧾 Final Thoughts
This is where regex starts proving its worth: pulling hard IOCs out of messy logs. IPv4, domains, hashes, and timestamps are the bread and butter of SOC regex work. But these patterns are only powerful if you know when — and where — to deploy them. Next up: Part 3 — Regex in Action, where we chain these patterns inside grep, Splunk, ELK, Suricata, and Sigma.
Published: September 8, 2025