Creating a Network Traffic Baseline and Detecting Anomalies with Zeek: An Advanced SOC Analyst Guide

Introduction

Network traffic baselining is a cornerstone of proactive threat detection in Security Operations Centers (SOCs). Precise baselining enables detection of subtle deviations indicative of advanced persistent threats (APTs), lateral movement, and data exfiltration. Zeek (formerly Bro) is a high-fidelity network security monitoring framework capable of deep protocol parsing and extensive contextual logging—ideal for creating actionable baselines.

This guide details an advanced deployment of Zeek in a lab or production environment, ingestion and parsing of Zeek logs for baseline extraction, anomaly simulation, and crafting SIEM correlation rules optimized for SOC workflows.


Step 1: Zeek Deployment and Configuration

1.1 Environment Preparation

  • Use a dedicated Ubuntu 22.04 LTS server or VM with at least 4 vCPUs and 8 GB RAM for real-time processing.
  • The capture interface must receive a copy of the traffic from a mirror/SPAN port or network TAP and run in promiscuous mode, so that full packets not addressed to the sensor are captured.
  • Raise the kernel socket buffer limits (net.core.rmem_max and net.core.wmem_max) to support high-throughput capture:
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216

1.2 Installation

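Zeek publishes binary packages for Ubuntu through the openSUSE Build Service (security:zeek project). For Ubuntu 22.04:

sudo apt update
sudo apt install curl gnupg
echo 'deb http://download.opensuse.org/repositories/security:/zeek/xUbuntu_22.04/ /' | sudo tee /etc/apt/sources.list.d/security:zeek.list
curl -fsSL https://download.opensuse.org/repositories/security:zeek/xUbuntu_22.04/Release.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/security_zeek.gpg > /dev/null
sudo apt update
sudo apt install zeek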

Verify the installed version (package installs place the binaries in /opt/zeek/bin, which may need to be added to PATH):

zeek --version

1.3 Interface and Node Configuration

Edit node.cfg, located at /opt/zeek/etc/node.cfg for package installs or /usr/local/zeek/etc/node.cfg for source builds:

[zeek-1]
type=standalone
host=localhost
interface=eth0

For multi-node cluster setups, define manager and workers accordingly.

1.4 Custom Local Script Configuration

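Edit site/local.zeek (/opt/zeek/share/zeek/site/local.zeek for package installs, /usr/local/zeek/share/zeek/site/local.zeek for source builds) to enable or tune scripts. For example, to enable the bundled scan detection script (deprecated in recent Zeek releases in favor of external packages installed with zkg, so confirm that your version still ships it):

@load policy/misc/scan

# Number of distinct ports a source must probe on one host within
# Scan::port_scan_interval (default: 5 min) to raise Scan::Port_Scan.
# The default is 15.0; a higher value reduces noise on busy networks.
redef Scan::port_scan_threshold = 50.0;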

1.5 Launch Zeek

sudo zeekctl deploy

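Live logs are written to logs/current/ under the install prefix (for example /opt/zeek/logs/current/ or /usr/local/zeek/logs/current/). Rotated logs are compressed and moved into dated directories next to it.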


Step 2: Network Traffic Capture & Log Collection

Zeek automatically produces high-fidelity logs, including but not limited to:

  • conn.log: TCP/UDP/ICMP connection summaries
  • dns.log: DNS query and response metadata
  • http.log: HTTP requests and responses
  • files.log: File transfer metadata
  • ssl.log: TLS handshake details
  • notice.log: Alerts and warnings generated by Zeek scripts, including scan detections (Scan::Port_Scan, Scan::Address_Scan) when misc/scan is loaded. Zeek does not write a separate scan.log.

Collect baseline traffic continuously for 24–72 hours of normal network activity so that the baseline reflects diurnal usage patterns and protocol diversity.
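
To check that a capture window really covers the daily cycle, connection counts can be bucketed by hour of day. The following is a minimal Python sketch, assuming the ts values have already been extracted from conn.log (for example with zeek-cut ts). The function name and the sample timestamps are illustrative and not part of Zeek.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_profile(timestamps):
    """Return a 24-element list of connection counts per UTC hour of day."""
    counts = Counter()
    for ts in timestamps:
        # Zeek's ts field is epoch seconds with a fractional part.
        hour = datetime.fromtimestamp(float(ts), tz=timezone.utc).hour
        counts[hour] += 1
    return [counts.get(h, 0) for h in range(24)]


# Hypothetical ts values, as they would appear in conn.log.
sample_ts = ["1700000000.123456", "1700000100.000000", "1700003600.500000"]
profile = hourly_profile(sample_ts)
for hour, count in enumerate(profile):
    if count:
        print(f"{hour:02d}:00 UTC  {count} connections")
```

Hours with a zero count in a multi-day capture suggest either a genuinely quiet period or a gap in collection, which is worth confirming before thresholds are derived.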


Step 3: Baseline Extraction and Analysis

3.1 Raw Log Parsing Using CLI Tools

Zeek logs are tab-separated value (TSV) files with a header prefixed by #. Use zeek-cut (Zeek-specific column extraction tool) to parse and filter fields.

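Example: Extract and count top destination IPs in conn.log. Files under logs/current/ are uncompressed; for rotated archives, run zcat on the dated .log.gz files instead of cat.

cat logs/current/conn.log | zeek-cut id.resp_h | sort | uniq -c | sort -nr | head -20

Example: List the most frequently queried DNS names (use qtype_name instead of query to summarize query types):

cat logs/current/dns.log | zeek-cut query | sort | uniq -c | sort -nr | head -20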
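
When the analysis outgrows shell pipelines, the same TSV format can be read programmatically. The sketch below is a minimal Python parser that relies only on the #separator and #fields header lines. The function name parse_zeek_tsv and the sample records are illustrative, and set or vector fields are left as raw strings.

```python
from collections import Counter


def parse_zeek_tsv(lines):
    """Yield one dict per record from a Zeek TSV log, keyed by the #fields header."""
    separator = "\t"
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#separator"):
            # Zeek writes the separator escaped, e.g. "#separator \x09".
            raw = line.split(" ", 1)[1]
            separator = raw.encode("ascii").decode("unicode_escape")
        elif line.startswith("#fields"):
            fields = line.split(separator)[1:]
        elif line.startswith("#") or not line:
            continue  # other header/footer lines such as #types or #close
        else:
            yield dict(zip(fields, line.split(separator)))


# Hypothetical excerpt of a conn.log with only a few columns.
sample = [
    "#separator \\x09",
    "#fields\tts\tid.orig_h\tid.resp_h\tproto",
    "#types\ttime\taddr\taddr\tenum",
    "1700000000.0\t10.0.0.5\t93.184.216.34\ttcp",
    "1700000001.0\t10.0.0.6\t93.184.216.34\ttcp",
    "1700000002.0\t10.0.0.5\t10.0.0.1\tudp",
    "#close\t2023-11-14-23-00-00",
]
records = list(parse_zeek_tsv(sample))
top_dests = Counter(r["id.resp_h"] for r in records).most_common(2)
print(top_dests)
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.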

3.2 Importing Logs into SIEM/ELK for Visualization

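  • Use Filebeat or Logstash to ingest Zeek logs into Elasticsearch. Filebeat's Zeek module expects JSON output, which can be enabled with @load policy/tuning/json-logs in local.zeek.
  • Define index mappings that reflect Zeek's field names and types to allow fielded queries.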
  • Create Kibana dashboards to visualize:
    • Protocol distribution (conn.log’s proto field)
    • Top internal/external talkers (id.orig_h and id.resp_h)
    • DNS query patterns (query, rcode)
    • Scan activity (Scan::* notices aggregated from notice.log)

3.3 Statistical Baseline Metrics

Metric               | Field(s)                      | Typical Baseline Description
---------------------+-------------------------------+---------------------------------------------
Top Protocols        | conn.log: proto, service      | TCP predominant, with HTTP and TLS as main services
Common External IPs  | conn.log: id.resp_h           | Trusted external IP ranges
DNS Query Volume     | dns.log: query                | Frequent legitimate domains
Connection Duration  | conn.log: duration            | Mean duration per protocol
Port Scan Frequency  | notice.log: note (Scan::*)    | Near zero during normal operation
File Transfer Sizes  | files.log: fuid, total_bytes  | Typical small to medium file sizes

Use these metrics to build threshold values for anomaly detection.
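
One common way to turn a baseline metric into a threshold is the mean plus a multiple of the standard deviation. The sketch below applies that to connection duration per protocol. It assumes records are dicts keyed by Zeek field names, and the multiplier k=3 is an illustrative choice rather than a recommendation from Zeek. Heavy-tailed metrics may be better served by percentiles.

```python
from collections import defaultdict
from statistics import mean, pstdev


def duration_thresholds(records, k=3.0):
    """Return {proto: mean + k * stddev} computed over conn.log durations."""
    by_proto = defaultdict(list)
    for r in records:
        if r["duration"] != "-":  # Zeek writes "-" for unset fields
            by_proto[r["proto"]].append(float(r["duration"]))
    return {p: mean(v) + k * pstdev(v) for p, v in by_proto.items()}


# Hypothetical baseline records reduced to the two fields used here.
baseline = [
    {"proto": "tcp", "duration": "1.0"},
    {"proto": "tcp", "duration": "3.0"},
    {"proto": "udp", "duration": "0.5"},
    {"proto": "tcp", "duration": "-"},
]
thresholds = duration_thresholds(baseline)
print(thresholds)
```

A connection whose duration exceeds the threshold for its protocol is then a candidate anomaly, to be reviewed alongside byte counts and destination.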


Step 4: Anomaly Simulation and Detection

4.1 Generating Anomalous Traffic

  • Port Scanning: Use Nmap to scan all TCP ports across a lab subnet.
nmap -p- 192.168.1.0/24

  • DNS Anomalies: Generate random or high-volume DNS queries.
dig randomsubdomain$(date +%s).example.com @8.8.8.8

  • Data Exfiltration: Transfer large files over HTTP or FTP from internal to external IPs.

4.2 Zeek Detection Capabilities

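  • With misc/scan loaded, Zeek records Scan::Port_Scan and Scan::Address_Scan notices in notice.log, including a timestamp, the source address, and a message summarizing how many ports or hosts were probed.
  • An elevated rate of NXDOMAIN responses (rcode_name in dns.log) from a single host can indicate malware that uses a domain generation algorithm (DGA).
  • Unusually large byte counts (orig_bytes, resp_bytes) and long durations in conn.log, together with large transfers in files.log, are useful for exfiltration detection.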
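
The NXDOMAIN indicator can be made concrete by computing the share of failed lookups per client. This is a minimal Python sketch, assuming dns.log records are available as dicts with the id.orig_h and rcode_name fields. The min_queries cutoff and the sample data are illustrative assumptions.

```python
from collections import Counter


def nxdomain_ratios(dns_records, min_queries=3):
    """Return {client_ip: fraction of queries answered NXDOMAIN}."""
    total, failed = Counter(), Counter()
    for r in dns_records:
        src = r["id.orig_h"]
        total[src] += 1
        if r.get("rcode_name") == "NXDOMAIN":
            failed[src] += 1
    # Skip clients with too few queries for the ratio to be meaningful.
    return {h: failed[h] / n for h, n in total.items() if n >= min_queries}


# Hypothetical dns.log records reduced to the two fields used here.
dns_sample = (
    [{"id.orig_h": "10.0.0.9", "rcode_name": "NXDOMAIN"}] * 3
    + [{"id.orig_h": "10.0.0.9", "rcode_name": "NOERROR"}]
    + [{"id.orig_h": "10.0.0.5", "rcode_name": "NOERROR"}] * 3
    + [{"id.orig_h": "10.0.0.7", "rcode_name": "NXDOMAIN"}]
)
ratios = nxdomain_ratios(dns_sample)
print(ratios)
```

Comparing each ratio with the value observed for the same host during the baseline period gives a per-host signal that is less sensitive to overall query volume.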

Example command to list scan notices from notice.log:

cat logs/current/notice.log | zeek-cut -d ts note src msg | grep 'Scan::'
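
Scan-like behavior can also be approximated directly from conn.log by counting how many distinct host and port pairs each source contacts within a time window. The following Python sketch uses fixed windows and assumes records are dicts with the ts, id.orig_h, id.resp_h, and id.resp_p fields. The window length and threshold are illustrative values, and this is not a reimplementation of Zeek's own scan script.

```python
from collections import defaultdict


def port_scan_suspects(conn_records, window=300, threshold=15):
    """Return {(src_ip, window_start): distinct_targets} where the count >= threshold."""
    targets = defaultdict(set)
    for r in conn_records:
        # Align each connection to the start of its fixed-size time window.
        bucket = int(float(r["ts"]) // window) * window
        targets[(r["id.orig_h"], bucket)].add((r["id.resp_h"], r["id.resp_p"]))
    return {k: len(v) for k, v in targets.items() if len(v) >= threshold}


# Hypothetical traffic: one host probing 20 ports, one host behaving normally.
conns = [
    {"ts": str(1700000000 + i), "id.orig_h": "10.0.0.9",
     "id.resp_h": "10.0.0.1", "id.resp_p": str(i + 1)}
    for i in range(20)
] + [
    {"ts": "1700000005.0", "id.orig_h": "10.0.0.5",
     "id.resp_h": "93.184.216.34", "id.resp_p": "443"},
    {"ts": "1700000009.0", "id.orig_h": "10.0.0.5",
     "id.resp_h": "93.184.216.34", "id.resp_p": "80"},
]
suspects = port_scan_suspects(conns)
print(suspects)
```

Fixed windows can split a scan that straddles a boundary. A sliding window avoids that at the cost of more bookkeeping.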


Step 5: Correlation Rules & SOC Integration

5.1 Forward Zeek Logs to SIEM

  • Use syslog or Filebeat forwarding for real-time ingestion.
  • Normalize fields using SIEM parsing pipelines.
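
Field normalization usually amounts to renaming Zeek's columns into the SIEM's schema and dropping unset values. The sketch below illustrates the idea in Python. The FIELD_MAP names are loosely modeled on Elastic Common Schema conventions and are assumptions for illustration, so adapt them to whatever schema your SIEM actually uses.

```python
# Hypothetical mapping from Zeek field names to SIEM field names.
FIELD_MAP = {
    "id.orig_h": "source.ip",
    "id.orig_p": "source.port",
    "id.resp_h": "destination.ip",
    "id.resp_p": "destination.port",
    "proto": "network.transport",
}


def normalize(record, log_type):
    """Rename known fields, namespace the rest, and drop unset values."""
    out = {"event.dataset": f"zeek.{log_type}"}
    for key, value in record.items():
        if value in ("-", "(empty)"):  # Zeek's markers for unset/empty fields
            continue
        out[FIELD_MAP.get(key, f"zeek.{log_type}.{key}")] = value
    return out


rec = {
    "ts": "1700000000.0", "id.orig_h": "10.0.0.5", "id.orig_p": "51234",
    "id.resp_h": "93.184.216.34", "id.resp_p": "443",
    "proto": "tcp", "service": "-",
}
normalized = normalize(rec, "conn")
print(normalized)
```

Keeping unmapped fields under a zeek.<log> prefix preserves the original detail without colliding with the normalized names.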

5.2 Sample Detection Rules (Pseudocode)

rule: Port Scan Detected
when:
  log == "notice" and note in ("Scan::Port_Scan", "Scan::Address_Scan")
then:
  alert("Potential reconnaissance activity from " + src)

rule: Suspicious DNS NXDOMAIN Spike
when:
  dns.rcode_name == "NXDOMAIN" and query_rate > baseline_threshold
then:
  alert("Possible DGA activity from " + id.orig_h)

rule: Large Data Exfiltration
when:
  conn.orig_bytes > baseline_average * 10 and conn.duration > 300
then:
  alert("Potential data exfiltration from " + id.orig_h)
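
As a concrete illustration, the exfiltration rule can be expressed in Python over conn.log records. This is a sketch under stated assumptions: records are dicts keyed by Zeek field names, bytes sent by the originator are taken from orig_bytes, and baseline_avg_bytes is a value computed beforehand from the baseline period. The function name and sample data are hypothetical.

```python
def exfil_alerts(conn_records, baseline_avg_bytes, factor=10, min_duration=300.0):
    """Flag connections that send far more data than the baseline and run long."""
    alerts = []
    for r in conn_records:
        if r["orig_bytes"] == "-" or r["duration"] == "-":
            continue  # unset fields, e.g. connections with no payload seen
        sent = int(r["orig_bytes"])
        duration = float(r["duration"])
        if sent > baseline_avg_bytes * factor and duration > min_duration:
            alerts.append(f"Potential data exfiltration from {r['id.orig_h']}")
    return alerts


# Hypothetical conn.log records reduced to the fields used by the rule.
conns = [
    {"id.orig_h": "10.0.0.5", "orig_bytes": "5000", "duration": "2.0"},
    {"id.orig_h": "10.0.0.7", "orig_bytes": "900000000", "duration": "1200.0"},
    {"id.orig_h": "10.0.0.8", "orig_bytes": "900000000", "duration": "30.0"},
    {"id.orig_h": "10.0.0.9", "orig_bytes": "-", "duration": "-"},
]
alerts = exfil_alerts(conns, baseline_avg_bytes=20000)
print(alerts)
```

Requiring both conditions keeps short bulk transfers and long idle sessions from alerting on their own, which mirrors the intent of the pseudocode rule.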

5.3 Automation and Playbook Integration

  • Trigger SOAR workflows based on alerts for automated containment.
  • Enrich events with threat intelligence based on IP/domain reputation.
  • Use Zeek’s scripting to generate notice.log alerts, integrated into alert pipelines.

Conclusion

Zeek offers SOC analysts unparalleled visibility into network behavior. Mastering baseline creation and anomaly detection using Zeek’s rich log data, combined with SIEM correlation, enables early detection of stealthy adversaries and reduces alert fatigue. This advanced setup provides a scalable, extensible foundation for sophisticated SOC operations.
