Advanced · 6 topic areas · 45+ exercises

Security Data Engineer

Security Data Engineers build the data infrastructure that powers security operations: ingesting log events at volumes that can reach millions per second, normalizing them to a common schema, enriching them with threat intelligence, and building the detection pipelines that run in SIEM platforms and feed SOAR systems. In English, their work includes writing data pipeline architecture documents, presenting detection pipeline performance metrics, and writing technical runbooks for the SOC team. This path covers the vocabulary of security data engineering.

Topics covered

  • Log ingestion & normalization
  • SIEM data models
  • Threat detection pipelines
  • Enrichment & threat intel
  • SOAR integration
  • Schema-on-read patterns

Vocabulary spotlight

4 terms every Security Data Engineer should know in English:

log normalization n.

The process of transforming log events from diverse sources (firewalls, endpoints, cloud APIs) into a common schema, so that detection rules can be written once and applied across all log sources.

"Log normalization to the OCSF schema let us deploy the same lateral movement detection rule across Windows Event Logs, Okta logs, and AWS CloudTrail."
threat detection pipeline n.

A real-time data pipeline that ingests normalized events, applies detection rules or ML models, and produces alerts or findings; it is the technical heart of a SIEM or custom detection platform.

"Our threat detection pipeline processes 500,000 events per second with a p99 detection latency of 8 seconds."
enrichment n.

The process of adding context to a raw log event by appending the results of threat intelligence lookups (is this IP known to be malicious?), user identity data, geolocation, or asset information.

"Enrichment with our internal asset database allowed the detection rule to flag suspicious logins from non-corporate devices automatically."
OCSF n.

Open Cybersecurity Schema Framework: an open, vendor-agnostic schema standard for security event data, designed to simplify normalization and improve interoperability between security tools.

"Adopting OCSF as our canonical security event schema reduced the normalization effort for each new log source from 2 weeks to 2 days."

📚 Vocabulary Reference

Key terms organised by category for Security Data Engineers:

Ingestion & Normalization

log ingestion, log normalization, schema, OCSF, ECS, CEF, LEEF, log parser, field mapping, structured logging

Detection

threat detection pipeline, detection rule, Sigma rule, YARA rule, behavioral detection, anomaly detection, threshold alert, correlation rule, false positive, true positive

Enrichment & Intel

enrichment, threat intelligence, IOC, reputation feed, STIX, TAXII, geolocation enrichment, asset context, user identity enrichment, TI platform

Platform

SIEM, SOAR, data lake SIEM, streaming pipeline, Kafka for security, Elasticsearch SIEM, Splunk, Chronicle, Microsoft Sentinel, detection-as-code

Recommended exercises

Real-world scenarios you'll practise

  • Writing a log ingestion architecture document: explaining the normalization pipeline, schema choices, and enrichment strategy for a new cloud environment
  • Presenting detection pipeline performance to the CISO: explaining detection latency, false positive rate, and coverage metrics in business risk terms
  • Writing an OCSF normalization guide: documenting the mapping from a new log source to the OCSF schema for the security data team
  • Designing a threat detection rule: writing the specification for a new behavioral detection and explaining the expected true positive rate and required enrichment data
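Several of these scenarios turn on the same few numbers. The Python sketch below computes them from a hypothetical list of triaged alerts: the share of alerts confirmed as true positives (what many SOC teams report as the "true positive rate", although in statistics that quantity is called precision, and "true positive rate" means recall), its complement, the false positive share, and p99 detection latency using the nearest-rank method. The `TriagedAlert` record and the sample data are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TriagedAlert:
    latency_s: float      # seconds from event time to alert time
    true_positive: bool   # analyst verdict after triage


def alert_precision(alerts: Sequence[TriagedAlert]) -> float:
    """Share of alerts confirmed malicious (often reported as the TP rate in a SOC)."""
    if not alerts:
        raise ValueError("no alerts to score")
    return sum(a.true_positive for a in alerts) / len(alerts)


def p99_latency(alerts: Sequence[TriagedAlert]) -> float:
    """Nearest-rank 99th percentile of detection latency."""
    if not alerts:
        raise ValueError("no alerts to score")
    latencies = sorted(a.latency_s for a in alerts)
    rank = -(-99 * len(latencies) // 100)  # ceil(0.99 * n), in integer arithmetic
    return latencies[rank - 1]


# Hypothetical sample: 200 alerts with latencies of 1..200 s, every fourth one confirmed.
alerts = [TriagedAlert(float(i), i % 4 == 0) for i in range(1, 201)]
precision = alert_precision(alerts)
print(f"TP share {precision:.0%}, FP share {1 - precision:.0%}, "
      f"p99 latency {p99_latency(alerts):.0f} s")
```

Integer arithmetic for the rank avoids floating-point rounding pushing `0.99 * n` just past a whole number, which would otherwise select the wrong element.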
