Security Data Engineer
Security Data Engineers build the data infrastructure that powers security operations: ingesting millions of log events per second, normalizing them to a common schema, enriching them with threat intelligence, and building the detection pipelines behind SIEM and SOAR systems. Their English work includes writing data pipeline architecture documents, presenting detection pipeline performance metrics, and writing technical runbooks for the SOC team. This path covers the vocabulary of security data engineering.
Topics covered
- Log ingestion & normalization
- SIEM data models
- Threat detection pipelines
- Enrichment & threat intel
- SOAR integration
- Schema-on-read patterns
Vocabulary spotlight
4 terms every Security Data Engineer should know in English:
Log normalization
The process of transforming log events from diverse sources (firewalls, endpoints, cloud APIs) into a common schema, so that a detection rule can be written once and applied across all log sources
"Log normalization to the OCSF schema let us deploy the same lateral movement detection rule across Windows Event Logs, Okta logs, and AWS CloudTrail."
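A minimal sketch of the idea, assuming a deliberately small, hypothetical common schema (`AuthEvent`, not OCSF itself) and simplified raw layouts for Windows Security events (4624/4625) and Okta System Log sign-ins. A simple failed-login rule stands in for the lateral-movement rule in the example: it is written once against `AuthEvent` and works for both sources. The mapper and rule names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class AuthEvent:
    """Hypothetical minimal common schema for authentication events (not OCSF itself)."""
    time: datetime
    user: str
    src_ip: str
    success: bool
    source: str  # which log source the event came from


def parse_ts(ts: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def from_windows(raw: dict) -> AuthEvent:
    # Windows Security log: event ID 4624 = successful logon, 4625 = failed logon.
    # The flat raw layout here is a simplified assumption about the export format.
    return AuthEvent(parse_ts(raw["TimeCreated"]), raw["TargetUserName"].lower(),
                     raw["IpAddress"], raw["EventID"] == 4624, "windows")


def from_okta(raw: dict) -> AuthEvent:
    # Okta System Log: outcome.result is "SUCCESS" or "FAILURE" for sign-in events.
    return AuthEvent(parse_ts(raw["published"]), raw["actor"]["alternateId"].lower(),
                     raw["client"]["ipAddress"], raw["outcome"]["result"] == "SUCCESS", "okta")


def failed_login_burst(events: Iterable[AuthEvent], threshold: int = 3) -> List[str]:
    """Rule written once against AuthEvent: source IPs with >= threshold failed logins."""
    failures = Counter(e.src_ip for e in events if not e.success)
    return sorted(ip for ip, n in failures.items() if n >= threshold)


events = [
    from_windows({"EventID": 4625, "TimeCreated": "2024-05-01T10:00:00Z",
                  "TargetUserName": "Alice", "IpAddress": "203.0.113.7"}),
    from_windows({"EventID": 4625, "TimeCreated": "2024-05-01T10:00:05Z",
                  "TargetUserName": "Bob", "IpAddress": "203.0.113.7"}),
    from_okta({"published": "2024-05-01T10:00:09Z", "actor": {"alternateId": "carol@example.com"},
               "client": {"ipAddress": "203.0.113.7"}, "outcome": {"result": "FAILURE"}}),
    from_okta({"published": "2024-05-01T10:01:00Z", "actor": {"alternateId": "dave@example.com"},
               "client": {"ipAddress": "198.51.100.4"}, "outcome": {"result": "SUCCESS"}}),
]
print(failed_login_burst(events))  # ['203.0.113.7']
```

The rule only sees three failures from 203.0.113.7 because two Windows events and one Okta event were normalized into the same shape first; without normalization it would need one variant per source.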
Threat detection pipeline
A real-time data pipeline that ingests normalized events, applies detection rules or ML models, and produces alerts or findings; the technical heart of a SIEM or custom detection platform
"Our threat detection pipeline processes 500,000 events per second with a p99 detection latency of 8 seconds."
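To make "detection latency" and "p99" concrete, here is a toy, single-process sketch: rules are hypothetical (name, predicate) pairs applied to already-normalized events, latency is detection time minus event time, and p99 uses the nearest-rank method. A production pipeline would run on a stream-processing framework; this loop only shows the shape, and the injected clock is an assumption made to keep the example deterministic.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    event_time: float  # epoch seconds at which the event occurred at the source
    fields: dict       # already-normalized event attributes


@dataclass(frozen=True)
class Alert:
    rule: str
    event: Event
    latency_s: float   # detection time minus event time


Rule = Tuple[str, Callable[[dict], bool]]


def run_pipeline(events: Iterable[Event], rules: List[Rule],
                 clock: Callable[[], float]) -> List[Alert]:
    """Apply every rule to every event and emit an alert for each match."""
    alerts = []
    for event in events:
        for name, matches in rules:
            if matches(event.fields):
                alerts.append(Alert(name, event, clock() - event.event_time))
    return alerts


def p99(values: List[float]) -> float:
    """Nearest-rank 99th percentile: 99% of samples are at or below the result."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(len(ordered) * 99 / 100)
    return ordered[rank - 1]


# Hypothetical rule: PowerShell launched with an encoded command.
rules: List[Rule] = [
    ("encoded_powershell",
     lambda f: f.get("process") == "powershell.exe" and " -enc " in f.get("cmdline", "")),
]
events = [
    Event(992.0, {"process": "powershell.exe", "cmdline": "powershell.exe -enc SQBFAFgA"}),
    Event(995.0, {"process": "notepad.exe", "cmdline": "notepad.exe notes.txt"}),
    Event(997.5, {"process": "powershell.exe", "cmdline": "powershell.exe -enc ZQBjAGgAbwA="}),
]
# A fixed clock keeps the example deterministic; real code would read the wall clock.
alerts = run_pipeline(events, rules, clock=lambda: 1000.0)
print([a.latency_s for a in alerts])       # [8.0, 2.5]
print(p99([a.latency_s for a in alerts]))  # 8.0
```

Reporting p99 rather than the mean matters because a few slow events (late-arriving logs, backpressure) are exactly the ones an attacker benefits from.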
Enrichment
The process of adding context to a raw log event, such as threat intelligence lookups (is this IP known to be malicious?), user identity data, geolocation, or asset information
"Enrichment with our internal asset database allowed the detection rule to flag suspicious logins from non-corporate devices automatically."
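A minimal sketch of enrichment followed by a rule that depends on it, assuming an in-memory asset table and a set of known-bad IPs (both hypothetical; real sources would be an asset inventory and a threat-intelligence feed). The `enrich` and `suspicious_login` names and the field names they add are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical lookup tables; in practice these would be an asset inventory
# (e.g. a CMDB export) and a threat-intelligence feed, refreshed regularly.
ASSET_DB: Dict[str, dict] = {
    "LAPTOP-0142": {"owner": "alice", "corporate": True},
    "LAPTOP-0977": {"owner": "bob", "corporate": True},
}
KNOWN_BAD_IPS: Set[str] = {"203.0.113.66"}


def enrich(event: dict) -> dict:
    """Return a copy of a login event with asset and threat-intel context appended."""
    asset = ASSET_DB.get(event["device_id"])
    return {
        **event,
        "asset_owner": asset["owner"] if asset else None,
        "corporate_device": bool(asset and asset["corporate"]),
        "src_ip_known_bad": event["src_ip"] in KNOWN_BAD_IPS,
    }


def suspicious_login(enriched: dict) -> bool:
    # Only answerable because enrichment added corporate_device and src_ip_known_bad.
    return not enriched["corporate_device"] or enriched["src_ip_known_bad"]


logins: List[dict] = [
    {"user": "alice", "device_id": "LAPTOP-0142", "src_ip": "198.51.100.10"},
    {"user": "alice", "device_id": "HOME-PC", "src_ip": "198.51.100.11"},
    {"user": "bob", "device_id": "LAPTOP-0977", "src_ip": "203.0.113.66"},
]
flagged = [e["device_id"] for e in map(enrich, logins) if suspicious_login(e)]
print(flagged)  # ['HOME-PC', 'LAPTOP-0977']
```

Enriching before detection keeps the rule itself trivial; the cost is that stale lookup tables silently produce stale verdicts.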
OCSF
Open Cybersecurity Schema Framework: an open, vendor-agnostic schema standard for security event data, designed to simplify normalization and interoperability between security tools
"Adopting OCSF as our canonical security event schema reduced the normalization effort for each new log source from 2 weeks to 2 days."
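What "normalizing a new log source to OCSF" looks like in practice can be sketched as a mapper from one AWS CloudTrail `ConsoleLogin` record to a small subset of the OCSF Authentication class. The numeric values (class_uid 3002, category_uid 3, activity_id 1 = Logon, status_id 1/2 = Success/Failure, type_uid = class_uid * 100 + activity_id) reflect our reading of the OCSF schema; the schema version string and product naming are assumptions, so verify every attribute against the OCSF version you actually deploy.

```python
from datetime import datetime

# Values as we read them from the OCSF Authentication class (IAM category).
# OCSF_VERSION is an assumption: pin it to the schema version you deploy.
OCSF_VERSION = "1.1.0"
AUTHENTICATION_CLASS_UID = 3002
IAM_CATEGORY_UID = 3
ACTIVITY_LOGON = 1
STATUS_SUCCESS, STATUS_FAILURE = 1, 2


def cloudtrail_console_login_to_ocsf(raw: dict) -> dict:
    """Map one CloudTrail ConsoleLogin record to a subset of OCSF Authentication."""
    if raw.get("eventName") != "ConsoleLogin":
        raise ValueError("this mapper only handles ConsoleLogin events")
    # responseElements may be present but null in CloudTrail, hence the `or {}`.
    ok = (raw.get("responseElements") or {}).get("ConsoleLogin") == "Success"
    ts = datetime.fromisoformat(raw["eventTime"].replace("Z", "+00:00"))
    identity = raw["userIdentity"]
    return {
        "class_uid": AUTHENTICATION_CLASS_UID,
        "category_uid": IAM_CATEGORY_UID,
        "activity_id": ACTIVITY_LOGON,
        "type_uid": AUTHENTICATION_CLASS_UID * 100 + ACTIVITY_LOGON,
        "time": int(ts.timestamp() * 1000),  # OCSF timestamps are epoch milliseconds
        "status_id": STATUS_SUCCESS if ok else STATUS_FAILURE,
        "user": {"name": identity.get("userName"), "uid": identity.get("arn")},
        "src_endpoint": {"ip": raw["sourceIPAddress"]},
        "metadata": {
            "version": OCSF_VERSION,
            "product": {"name": "CloudTrail", "vendor_name": "AWS"},
        },
        # Source fields with no OCSF home are kept rather than dropped.
        "unmapped": {"eventSource": raw.get("eventSource")},
    }


raw = {
    "eventName": "ConsoleLogin",
    "eventSource": "signin.amazonaws.com",
    "eventTime": "2024-05-01T10:00:00Z",
    "sourceIPAddress": "198.51.100.10",
    "userIdentity": {"type": "IAMUser", "userName": "alice",
                     "arn": "arn:aws:iam::123456789012:user/alice"},
    "responseElements": {"ConsoleLogin": "Success"},
}
ocsf = cloudtrail_console_login_to_ocsf(raw)
print(ocsf["type_uid"], ocsf["status_id"])  # 300201 1
```

Each new log source then needs only its own mapper; detections written against the OCSF fields do not change.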
📚 Vocabulary Reference
Key terms organised by category for Security Data Engineers:
- Ingestion & Normalization
- Detection
- Enrichment & Intel
- Platform
Recommended exercises
Real-world scenarios you'll practise
- Writing a log ingestion architecture document: explaining the normalization pipeline, schema choices, and enrichment strategy for a new cloud environment
- Presenting detection pipeline performance to the CISO: explaining detection latency, false positive rate, and coverage metrics in business risk terms
- Writing an OCSF normalization guide: documenting the mapping from a new log source to the OCSF schema for the security data team
- Designing a threat detection rule: writing the specification for a new behavioral detection and explaining the expected true positive rate and required enrichment data
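The metrics in these scenarios are easy to state loosely, so a small calculation helps pin them down. This sketch assumes triaged alerts carry an analyst verdict and that coverage is measured against a hypothetical in-scope list of MITRE ATT&CK technique IDs; the function names are illustrative. Note that the SOC "false positive rate" (share of alerts closed as false positives) is not the statistical FPR, which would also need a count of true negatives.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class TriagedAlert:
    rule: str
    true_positive: bool  # analyst verdict after triage


def alert_false_positive_ratio(alerts: List[TriagedAlert]) -> float:
    """Share of triaged alerts that analysts closed as false positives.

    Often reported as the "false positive rate" in SOC metrics; it is not the
    statistical FPR, FP / (FP + TN), which requires counting true negatives.
    """
    if not alerts:
        raise ValueError("no triaged alerts")
    return sum(not a.true_positive for a in alerts) / len(alerts)


def technique_coverage(in_scope: Set[str], detected_by_rules: Iterable[Set[str]]) -> float:
    """Fraction of in-scope techniques covered by at least one detection rule."""
    if not in_scope:
        raise ValueError("empty scope")
    covered = set().union(*detected_by_rules) & in_scope
    return len(covered) / len(in_scope)


alerts = [
    TriagedAlert("encoded_powershell", True),
    TriagedAlert("encoded_powershell", False),
    TriagedAlert("impossible_travel", False),
    TriagedAlert("impossible_travel", False),
]
scope = {"T1059.001", "T1078", "T1021.002", "T1110"}  # hypothetical in-scope techniques
rule_techniques = [{"T1059.001"}, {"T1078", "T1110"}]  # techniques each rule targets
print(alert_false_positive_ratio(alerts))          # 0.75
print(technique_coverage(scope, rule_techniques))  # 0.75
```

For a CISO audience, the same two numbers translate into analyst hours spent on noise and the share of prioritized attack techniques that would go unnoticed.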