Security data lake: centralised storage of raw security telemetry for retrospective analysis and ML
1 / 5
The interviewer asks: "Which SIEM log sources would you prioritise for detecting initial access attacks, and why?" Which answer demonstrates the strongest threat-model thinking?
Option B is the strongest: it grounds the answer in the MITRE ATT&CK Initial Access tactic, provides five specific log sources with the exact attack technique each detects (not just vague categories), gives 2023–2024 threat data context (identity as primary vector), includes the coverage gap assessment methodology, and names specific log types within each source (DNS tunnelling, DGA domains, device registration anomalies). Option C efficiently covers the same five sources with two additional operational details: Passive DNS enrichment and domain age filtering to reduce alert fatigue — showing production tuning experience. Option D makes the important point that log source prioritisation should follow the organisation's threat model and adversary profile — this is mature security thinking that shows understanding of threat-led defence. Option A is too generic — firewalls and endpoints are correct but miss identity (the most common initial access vector) and the ATT&CK mapping. Senior SIEM answer: ATT&CK tactic mapping → five prioritised sources → specific techniques per source → threat data context → coverage gap assessment → signal-to-noise tuning.
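The coverage gap assessment mentioned above can be sketched in a few lines of Python. The technique IDs are real ATT&CK Initial Access techniques, but the mapping from technique to log source and the source names (`identity_provider`, `email_gateway`, and so on) are illustrative assumptions, not the five sources from the answer options.

```python
from collections import Counter

# Assumed, illustrative mapping: which log sources could give visibility
# into each ATT&CK Initial Access technique. Adjust to your own environment.
TECHNIQUE_SOURCES = {
    "T1566 Phishing": {"email_gateway", "identity_provider"},
    "T1078 Valid Accounts": {"identity_provider"},
    "T1190 Exploit Public-Facing Application": {"waf", "firewall"},
    "T1133 External Remote Services": {"vpn", "identity_provider"},
    "T1189 Drive-by Compromise": {"dns", "web_proxy", "edr"},
}


def coverage_gaps(onboarded):
    """Return techniques that no onboarded log source can observe."""
    return sorted(
        technique
        for technique, sources in TECHNIQUE_SOURCES.items()
        if not (sources & onboarded)
    )


def rank_sources(onboarded):
    """Rank missing sources by how many uncovered techniques each would close."""
    gain = Counter()
    for technique in coverage_gaps(onboarded):
        for source in TECHNIQUE_SOURCES[technique] - onboarded:
            gain[source] += 1
    # Highest gain first, ties broken alphabetically for stable output.
    return sorted(gain.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    current = {"firewall", "edr"}
    print(coverage_gaps(current))
    print(rank_sources(current))
```

Under this assumed mapping, an estate that only collects firewall and endpoint logs is left with three uncovered techniques, and identity logs close all three, which mirrors the criticism of Option A.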
2 / 5
The interviewer asks: "How would you write a detection rule for impossible travel?" Which answer is most technically precise?
Option B is the strongest: it frames the problem as stateful stream processing (the correct framing), references the Haversine formula for great-circle distance, specifies the 900 km/h threshold with justification, provides four specific false positive reduction techniques (VPN exclusion, ASN matching, geolocation confidence threshold, travel pattern exemption), adds two enrichment signals for severity scoring (sensitive resource access, new device registration), and names specific rule languages (Sigma, SPL, KQL, ES|QL) and streaming platforms (Flink, Spark). Option C adds an important operational detail: the alert content (both events, speed, geolocation confidence) that lets an analyst triage or dismiss the alert within 60 seconds, which shows alert design thinking. Option D adds the third tuning lever (geolocation accuracy granularity) and the high-confidence account takeover escalation pattern (impossible travel + password reset + MFA re-enrol within 2 hours), a real detection upgrade. Option A is technically correct but thin: no false positive reduction, no enrichment signals, no implementation detail. Senior detection answer: stateful stream processing framing → Haversine + 900 km/h → four FP reduction techniques → enrichment signals → Sigma/SPL/KQL implementation → escalation pattern.
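As a minimal sketch of the core logic (Haversine distance plus the 900 km/h threshold, with per-user state), the following Python may help. The `Login` record and `ImpossibleTravelDetector` class are hypothetical names for illustration; a production rule would run in a SIEM or stream processor and would add the false positive reductions listed above, which are omitted here.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
MAX_SPEED_KMH = 900.0     # threshold from the text, roughly airliner cruising speed


@dataclass
class Login:
    user: str
    ts: float   # epoch seconds
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev: Login, cur: Login) -> float:
    """Speed needed to travel between two logins in the elapsed time."""
    dist = haversine_km(prev.lat, prev.lon, cur.lat, cur.lon)
    hours = abs(cur.ts - prev.ts) / 3600.0
    if hours == 0:
        return math.inf if dist > 0 else 0.0
    return dist / hours


class ImpossibleTravelDetector:
    """Keeps the last login per user: the 'stateful' part of the rule."""

    def __init__(self) -> None:
        self._last: Dict[str, Login] = {}

    def observe(self, login: Login) -> Optional[float]:
        """Return the implied speed if it exceeds the threshold, else None."""
        prev = self._last.get(login.user)
        self._last[login.user] = login
        if prev is None:
            return None
        speed = implied_speed_kmh(prev, login)
        return speed if speed > MAX_SPEED_KMH else None


if __name__ == "__main__":
    det = ImpossibleTravelDetector()
    det.observe(Login("alice", 0, 51.5074, -0.1278))            # London
    print(det.observe(Login("alice", 3600, 40.7128, -74.0060)))  # New York, one hour later
```

Returning the implied speed rather than a bare boolean keeps the value available for the alert body, alongside both login events.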
3 / 5
The interviewer asks: "How would you build a threat intelligence enrichment pipeline?" Which answer is most architecturally complete?
Option B is the strongest: it defines six specific pipeline layers with named components in each, distinguishes feed types by reliability profile (OSINT vs. commercial vs. ISAC), explains STIX 2.1 as the normalisation standard with the relationship modelling benefit, gives the composite confidence scoring formula (source weight × age decay × cross-corroboration), names both TI platform options (MISP vs. OpenCTI) with their use cases, and defines three effectiveness metrics. Option C adds IOC expiry by type (90 days for IPs, 1 year for domains/hashes) — a real operational detail that prevents stale IOC matches — plus the per-feed hit rate as the primary effectiveness metric. Option D adds the feedback loop architecture (analyst verdict → per-feed precision calculation → confidence weight update) — turning the pipeline into a learning system — which is the difference between a static TI lookup and a mature TI programme. Option A describes the concept correctly but misses the scoring, expiry, effectiveness metrics, and feedback loop. Senior TI pipeline answer: six layers → STIX 2.1 with relationships → composite scoring formula → IOC expiry by type → effectiveness metrics → analyst feedback loop.
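The composite score (source weight × age decay × cross-corroboration) and the per-type expiry can be sketched as below. Only the three-factor product and the 90-day / 1-year expiry values come from the text. The source weights, the exponential decay with a 30-day half-life, and the corroboration curve are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Expiry by IOC type, as stated in the text.
IOC_TTL = {
    "ip": timedelta(days=90),
    "domain": timedelta(days=365),
    "hash": timedelta(days=365),
}

# Assumed weights per feed type; in practice these would be tuned from
# analyst verdicts (the feedback loop).
SOURCE_WEIGHT = {"osint": 0.4, "commercial": 0.8, "isac": 0.9}

HALF_LIFE_DAYS = 30.0  # assumed decay half-life


def is_expired(ioc_type: str, last_seen: datetime, now: datetime) -> bool:
    """True once the indicator is older than the TTL for its type."""
    return now - last_seen > IOC_TTL[ioc_type]


def confidence(source: str, last_seen: datetime, feeds_reporting: int, now: datetime) -> float:
    """Composite score = source weight x age decay x cross-corroboration."""
    age_days = max((now - last_seen).total_seconds() / 86400.0, 0.0)
    age_decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    # Assumed curve: a single feed scores 0.5, three or more feeds score 1.0.
    corroboration = min(1.0, 0.5 + 0.25 * (feeds_reporting - 1))
    return SOURCE_WEIGHT[source] * age_decay * corroboration


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    seen = now - timedelta(days=30)
    print(confidence("commercial", seen, 3, now))
    print(is_expired("ip", now - timedelta(days=91), now))
```

An expired indicator would typically be dropped from matching before any score is computed, so the two functions are kept separate.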
4 / 5
The interviewer asks: "How would you design a security data lake to complement your SIEM?" Which answer is most architecturally complete?
Option B is the strongest: it defines the complementary roles clearly (SIEM: real-time/30–90 days; lake: batch/1–7 years), covers all five design decisions (schema-on-read rationale, partitioning strategy with the small-file warning, three-tier storage with named AWS classes, row-level access control, and back-testing capability), and explains why schema-on-read is preferred for security logs specifically (frequent format changes). Option C focuses tightly on the detection rule back-testing use case with a concrete workflow (replay Sigma rule → calculate expected volume → tune threshold before SIEM deployment) — this is the most valuable lake capability and deserves detailed treatment. Option D introduces the "lake as source of truth with SIEM as curated subset" architecture — specifically the cost reduction from pre-processing before SIEM ingest and the forensic workflow (SIEM alert → lake raw stream query) — showing real production architecture thinking. Option A is accurate but thin — misses the design decisions that differentiate a well-designed lake from a data dump. Senior security data lake answer: complementary roles with retention comparison → five design decisions → schema-on-read rationale → tiered storage with named classes → back-testing use case → forensic workflow.
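The back-testing workflow (replay rule → calculate expected volume → tune threshold) can be sketched as follows. A plain Python threshold rule stands in for a Sigma rule here, and the rule itself (failed logins per user per hour), the event shape, and the function names are hypothetical. In practice the replay would run as a query over the lake's historical partitions.

```python
from collections import Counter


def backtest(events, threshold):
    """Replay a 'N failed logins per user per hour' rule over history.

    events: list of (epoch_seconds, user) failed-login records.
    Returns a Counter mapping day index -> number of alerts on that day.
    """
    # Count failures per (user, hour bucket).
    per_window = Counter((user, int(ts) // 3600) for ts, user in events)
    alerts_per_day = Counter()
    for (_user, hour), count in per_window.items():
        if count >= threshold:
            alerts_per_day[hour // 24] += 1
    return alerts_per_day


def pick_threshold(events, candidates, max_alerts_per_day):
    """Lowest candidate threshold whose worst day stays within the alert budget."""
    for threshold in sorted(candidates):
        daily = backtest(events, threshold)
        if max(daily.values(), default=0) <= max_alerts_per_day:
            return threshold
    return None  # no candidate fits the budget


if __name__ == "__main__":
    history = (
        [(i, "a") for i in range(10)]           # 10 failures in hour 0
        + [(100 + i, "b") for i in range(3)]    # 3 failures in hour 0
        + [(3600 + i, "c") for i in range(5)]   # 5 failures in hour 1
    )
    print(backtest(history, 3))
    print(pick_threshold(history, [3, 5, 10], max_alerts_per_day=2))
```

Picking the lowest threshold that fits the alert budget keeps sensitivity as high as the team's triage capacity allows. The budget value is an operational choice, not something the replay can decide.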
5 / 5
The interviewer asks: "How do you communicate a security analytics finding to non-security stakeholders?" Which answer is most effective for cross-functional communication?
Option B is the strongest: it provides a complete five-part communication framework (business impact lead → risk sentence structure → attacker timeline visualisation → findings/recommendations separation → severity-to-urgency translation), gives a concrete worked example of the risk sentence, explains why the Gantt-style timeline works for non-technical audiences (time sequences vs. log entries), and gives a specific example of severity translation (CVSS → exploit-in-wild + 72-hour urgency). Option C gives a complementary anti-pattern checklist (three named anti-patterns) and adds the "enable the 2-minute CEO brief" test for validating communication quality — a practical heuristic. Option D provides the one-page executive brief structure with four named sections — this is a concrete deliverable format that complements the communication framework in Option B. Option A captures the spirit correctly but lacks the worked example, the timeline visualisation idea, and the findings/recommendations separation. Senior security communication answer: five-part framework → worked risk sentence example → Gantt timeline rationale → findings/recommendations separation → severity-to-urgency translation → anti-patterns checklist → executive brief format.