A kernel engineer explains eBPF to a developer interested in observability: "eBPF — extended Berkeley Packet Filter — lets you run sandboxed programs in the Linux kernel without writing a kernel module. Before eBPF, instrumenting the kernel meant either using static kernel tracepoints (limited) or writing a kernel module (risky — one bug crashes the system). With eBPF, you load a small program written in restricted C, verified for safety by the kernel verifier, and it runs at hook points: every time a kernel function is called, a network packet arrives, a system call fires. Your program can record data into eBPF maps — shared memory between kernel and user space — which your tooling reads." What is the eBPF verifier and why is it essential for kernel safety?
eBPF verifier: runs inside the kernel's bpf() syscall when you load a program, and rejects the program before it can execute if any check fails.
Checks performed: Control-flow check: no unreachable instructions and no unbounded loops. Kernels before 5.3 required the control flow graph to be a DAG (directed acyclic graph); 5.3 and later also accept loops that the verifier can prove terminate. Bounds check: all memory accesses must be within known bounds, and pointer arithmetic is tracked. Type check: every register has a tracked type (pointer to map, pointer to context, scalar integer, etc.), and operations must be type-safe. Stack depth limit: max 512 bytes of stack. Instruction limit: kernels before 5.2 capped programs at 4096 instructions; 5.2 and later let privileged users load any program for which the verifier has to explore no more than 1M instructions. Helper function allowlist: only approved helper functions can be called (bpf_map_lookup_elem, bpf_ktime_get_ns, bpf_probe_read, etc.).
Result: a verified program should not be able to crash or hang the kernel, even if buggy; worst case it does nothing useful. The guarantee is only as strong as the verifier itself, which has had bugs of its own.
eBPF JIT compiler: after verification, the bytecode is JIT-compiled to native instructions (x86-64, arm64 and other architectures). Performance: JIT-compiled eBPF runs at near-native speed. The real overhead depends on the hook and the workload (the 10-30% figure sometimes quoted for networking is a rough estimate, not a guarantee), and it is far lower than older alternatives like ptrace, which switch to a tracer process on every event.
eBPF program types: XDP, TC, socket filter, kprobe, kretprobe, uprobe, tracepoint, perf_event, LSM, cgroup. Each type has a specific context (the data available to the program) and its own set of allowed helpers.
In conversation: 'The verifier is what makes eBPF safe enough to run in production kernels. Without it, eBPF programs would be as dangerous as kernel modules. With it, a buggy program gets rejected at load time instead of taking the machine down.'
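
The control-flow and stack checks described above can be sketched as a toy model. This is a minimal illustration in plain Python, not the kernel's algorithm: the `Insn` type, its opcode names and the `verify` function are all hypothetical, and the model rejects every back-edge the way pre-5.3 kernels did instead of proving loop bounds.

```python
from dataclasses import dataclass

STACK_LIMIT = 512  # bytes of stack an eBPF program may use

@dataclass
class Insn:
    op: str          # toy opcodes: "mov", "store_stack", "jmp", "jeq", "exit"
    target: int = 0  # absolute index of the jump target for "jmp" / "jeq"
    offset: int = 0  # negative offset from the frame pointer for "store_stack"
    size: int = 0    # bytes written by "store_stack"

def successors(prog, i):
    """Indices of the instructions that may execute right after prog[i]."""
    insn = prog[i]
    if insn.op == "exit":
        return []
    if insn.op == "jmp":
        return [insn.target]
    if insn.op == "jeq":                 # conditional: fall through or jump
        return [i + 1, insn.target]
    return [i + 1]

def verify(prog):
    """Return None if the toy program is accepted, else a rejection reason."""
    n = len(prog)
    if n == 0:
        return "empty program"
    # Every jump and fall-through must land inside the program.
    for i in range(n):
        for s in successors(prog, i):
            if not 0 <= s < n:
                return f"insn {i}: jump or fall-through out of range"
    # Depth-first search: an edge to a node still on the DFS path is a loop.
    WHITE, GREY, BLACK = 0, 1, 2
    color = [WHITE] * n
    color[0] = GREY
    path = [(0, iter(successors(prog, 0)))]
    while path:
        node, it = path[-1]
        for s in it:
            if color[s] == GREY:
                return f"insn {node}: back-edge to insn {s} (loop)"
            if color[s] == WHITE:
                color[s] = GREY
                path.append((s, iter(successors(prog, s))))
                break
        else:                            # all successors explored
            color[node] = BLACK
            path.pop()
    # Anything the search never reached is dead code.
    for i in range(n):
        if color[i] == WHITE:
            return f"insn {i}: unreachable"
    # Stack writes must stay inside [-STACK_LIMIT, 0) relative to the frame pointer.
    for i, insn in enumerate(prog):
        if insn.op == "store_stack":
            ok = insn.size > 0 and -STACK_LIMIT <= insn.offset and insn.offset + insn.size <= 0
            if not ok:
                return f"insn {i}: stack access outside the {STACK_LIMIT}-byte frame"
    return None
```

Calling `verify([Insn("mov"), Insn("jmp", target=0)])` returns a rejection string that names the back-edge, while a straight-line program ending in `exit` returns `None`. The real verifier goes much further: it simulates register state along every path, which is what makes the bounds and type checks possible.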
An SRE explains the difference between networking hook points while evaluating eBPF for packet filtering: "For our DDoS mitigation, we need to drop malicious packets as early as possible, before the kernel's TCP/IP stack even sees them. XDP (eXpress Data Path) is the earliest hook point, running in the NIC driver or even on the NIC itself. We can drop a packet in XDP in under 100 nanoseconds. The TC (Traffic Control) hook runs later, after the kernel has allocated an sk_buff for the packet, so we have more context but higher overhead. For filtering pure volume attacks, XDP is the right tool. For complex policy enforcement that needs connection tracking, we would use TC or nftables." What is XDP and what makes it suitable for high-performance packet processing?
XDP hook: runs when a packet arrives at the NIC driver, before the kernel allocates an sk_buff (socket buffer). For packets that will be dropped anyway, this skips the cost of sk_buff allocation and of any processing in the kernel TCP/IP stack.
XDP return codes: XDP_DROP: drop the packet immediately. XDP_PASS: pass to the normal kernel network stack. XDP_TX: transmit the (possibly modified) packet back out of the NIC it arrived on (reflect). XDP_REDIRECT: redirect to another interface or to a user-space socket (AF_XDP). XDP_ABORTED: drop the packet and raise the xdp_exception tracepoint; it signals a program error and is useful for debugging.
XDP modes: Native XDP: runs in the NIC driver and needs driver support (examples: mlx5, i40e, ixgbe, virtio_net). Generic XDP: runs in the kernel network stack after sk_buff allocation; slower, but works on all NICs. Offloaded XDP: runs on the SmartNIC hardware itself.
TC (Traffic Control) hook: runs after the sk_buff exists, on ingress or egress, so the program sees sk_buff metadata in addition to the packet bytes; it still parses the IP/TCP headers itself. Helpers give access to routing lookups, cgroup info and, on recent kernels, connection tracking state. More powerful for policy but slower than XDP for pure drops, because the sk_buff has already been allocated; the ~3-5x figure sometimes quoted depends on driver and workload.
eBPF networking vocabulary: AF_XDP: user-space socket type that receives packets redirected by XDP; enables user-space packet processing (like DPDK) with kernel integration. BPF map (BPF_MAP_TYPE_LRU_HASH): an LRU hash map shared between the XDP program and the user-space control plane, commonly used as a bounded connection tracker.
In conversation: 'For volumetric DDoS at 100Gbps, XDP is the only option. At that rate, even allocating an sk_buff per packet isn't fast enough.'
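
The drop-or-pass decision an XDP program makes can be sketched in user space. The following is a minimal Python model, not kernel code: `xdp_filter` and `make_frame` are hypothetical names, the blocklist is an ordinary set standing in for a BPF map, and passing truncated frames up the stack is just one possible policy. The action codes do match `enum xdp_action` in the kernel's `linux/bpf.h`.

```python
import socket
import struct

# Same numeric values as enum xdp_action in linux/bpf.h.
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = range(5)

ETH_HLEN = 14          # Ethernet header length in bytes
ETH_P_IP = 0x0800      # EtherType for IPv4
IPV4_MIN_HLEN = 20     # IPv4 header length without options

def xdp_filter(frame: bytes, blocklist: set) -> int:
    """Toy XDP program: drop IPv4 frames whose source address is blocklisted."""
    # Bounds check before touching the Ethernet header, as the verifier demands.
    if len(frame) < ETH_HLEN:
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS                      # not IPv4: let the stack handle it
    # Second bounds check before reading the IPv4 header.
    if len(frame) < ETH_HLEN + IPV4_MIN_HLEN:
        return XDP_PASS
    # The source address sits at byte offset 12 of the IPv4 header.
    src_ip = socket.inet_ntoa(frame[ETH_HLEN + 12:ETH_HLEN + 16])
    return XDP_DROP if src_ip in blocklist else XDP_PASS

def make_frame(src_ip: str, dst_ip: str) -> bytes:
    """Build a minimal Ethernet + IPv4 frame for testing (checksum left at zero)."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", ETH_P_IP)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, IPV4_MIN_HLEN, 0, 0, 64, 6, 0,
                     socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    return eth + ip
```

With `blocklist = {"203.0.113.7"}`, a frame built by `make_frame("203.0.113.7", "192.0.2.1")` yields `XDP_DROP` and any other source yields `XDP_PASS`. A real XDP program has the same shape: explicit length checks against the end of the packet, header parsing by offset, a map lookup, and an action code as the return value.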
A platform engineer explains eBPF maps to a developer building a custom tracing tool: "eBPF maps are the shared memory between your kernel-space eBPF program and user-space tooling. The kernel program writes observations — latency measurements, syscall counts, packet metadata — into maps. Your Go or Python user-space program reads them. Maps come in many types: hash maps for key-value lookups, per-CPU arrays for lock-free counters, ring buffers for streaming events, LRU hash maps for connection tracking. You can also pin maps to the BPF filesystem at /sys/fs/bpf/ so they persist after the program that created them exits." What is an eBPF map and why is the ring buffer type particularly useful for observability?
eBPF map types: BPF_MAP_TYPE_HASH: general key-value store. Lookup and insert are O(1) on average. Used for connection tracking, per-PID stats. BPF_MAP_TYPE_ARRAY: fixed-size array indexed by integer. Faster than a hash map when the key is a small integer index. BPF_MAP_TYPE_PERCPU_ARRAY: per-CPU version; each CPU has its own copy, eliminating contention between CPUs. Ideal for high-frequency counters. User space must sum across CPUs to get totals. BPF_MAP_TYPE_RINGBUF (Linux 5.8+): ring buffer for event streaming. Multiple producers (programs running on any CPU), single consumer (user space). Variable-size entries. Memory-mapped by user space, with epoll for event notification. Preferred over the older BPF_MAP_TYPE_PERF_EVENT_ARRAY for most tracing use cases. BPF_MAP_TYPE_LRU_HASH: automatically evicts least-recently-used entries. Used for connection tracking where you need a bounded table. BPF_MAP_TYPE_SOCKMAP / BPF_MAP_TYPE_SOCKHASH: store sockets, enabling redirection of traffic between sockets.
Why ring buffer wins for observability: one buffer is shared by all CPUs, so events reach user space in the order they were reserved and no per-CPU merge is required; entries are reserved and then committed (bpf_ringbuf_reserve followed by bpf_ringbuf_submit), which avoids an extra copy; consumer wakeups can be tuned with the BPF_RB_NO_WAKEUP and BPF_RB_FORCE_WAKEUP flags.
bpftrace vocabulary: bpftrace: high-level eBPF tracing tool, well suited to one-liners. Probe syntax: kprobe:do_sys_open { printf("%s\n", str(arg1)); } (prints the filename argument; which kernel functions can be probed varies by kernel version). Probe types: kprobe, kretprobe, tracepoint, uprobe, usdt, profile, interval.
In conversation: 'eBPF maps are the secret. Your eBPF program sees every packet or syscall at kernel speed, aggregates into a map, and user space reads the summary. You observe everything, pay only for what you aggregate.'
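
The reserve-then-commit behaviour of the ring buffer can be illustrated with a small model. This is a Python sketch under simplifying assumptions, not the kernel data structure: `ToyRingBuf` is a hypothetical class, it ignores the per-record header and alignment of the real buffer, and it counts a failed reservation as a dropped event. Its `reserve` and `submit` methods loosely mirror the `bpf_ringbuf_reserve` and `bpf_ringbuf_submit` helpers.

```python
from collections import deque

class ToyRingBuf:
    """User-space model of a multi-producer, single-consumer event ring buffer."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0            # bytes currently reserved or waiting to be consumed
        self.records = deque()   # records in reservation order
        self.dropped = 0         # reservations refused because the buffer was full

    def reserve(self, size: int):
        """Claim space for one record; returns None when the buffer is full."""
        if self.used + size > self.capacity:
            self.dropped += 1
            return None
        rec = {"data": None, "size": size, "committed": False}
        self.used += size
        self.records.append(rec)
        return rec

    def submit(self, rec, data: bytes):
        """Fill a reserved record in place and make it visible to the consumer."""
        assert len(data) <= rec["size"]
        rec["data"] = data
        rec["committed"] = True

    def consume(self):
        """Drain committed records in reservation order, freeing their space.

        Draining stops at the first record that is still uncommitted, so a
        slow producer delays the records reserved after it.
        """
        out = []
        while self.records and self.records[0]["committed"]:
            rec = self.records.popleft()
            self.used -= rec["size"]
            out.append(rec["data"])
        return out
```

A short session shows the ordering rule: reserve `a` then `b`, submit only `b`, and `consume()` returns nothing; once `a` is submitted, `consume()` returns both in reservation order. When `reserve` returns `None`, the producer has to drop the event, which is why a real tool sizes the buffer for its peak event rate.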
A Kubernetes platform engineer explains why they chose Cilium as their CNI: "Cilium uses eBPF for everything: networking, load balancing, network policy enforcement, and observability. Traditional Kubernetes network policies use iptables — a chain of rules evaluated linearly for every packet. With 10,000 pods, you can have 100,000 iptables rules. Performance degrades linearly with rule count. Cilium replaces iptables with eBPF maps: a hash map lookup is O(1) regardless of cluster size. We also get L7 policy enforcement — allow traffic from ServiceA to ServiceB only if the HTTP path is /api/v2. iptables works at L3/L4 only. And Hubble gives us per-flow visibility into the entire cluster without any application changes." What is Cilium and what advantage does eBPF give it over iptables-based Kubernetes networking?
Cilium: CNCF graduated project. Implements the Kubernetes CNI (Container Network Interface) using eBPF.
Features: kube-proxy replacement: eBPF hash map for service-to-endpoint mapping (O(1) lookup) vs. iptables DNAT rules evaluated sequentially (O(n)). Matters most at 10k+ services. NetworkPolicy enforcement: eBPF maps encode allow/deny rules for L3/L4 (IP, port); L7 rules (HTTP, gRPC, Kafka, DNS) are enforced by steering matching traffic through a proxy. Bandwidth management: eBPF TC hooks for rate limiting and QoS per pod. Multi-cluster (Cluster Mesh): extend policies and service discovery across clusters. Hubble: built on eBPF, captures per-flow network telemetry. UI: service map, flow history, dropped packets with reason. CLI: hubble observe. Cilium Service Mesh: replaces some Istio functionality with eBPF (transparent encryption via WireGuard or IPsec, L7 proxy via a per-node Envoy rather than a per-pod sidecar).
eBPF observability vocabulary: kprobe: hook on kernel function entry. kretprobe: hook on kernel function return (access the return value). uprobe: hook on user-space function (needs symbol information). USDT (User Statically Defined Tracepoints): static tracepoints in user-space applications (Node.js, Python, PostgreSQL). CO-RE (Compile Once Run Everywhere): libbpf feature using BTF (BPF Type Format); eBPF programs compiled once run on other kernel versions without recompilation, provided the target kernel exposes BTF type information.
Falco vocabulary: Falco: CNCF runtime security tool using eBPF (and a legacy kernel module). Detects: unexpected syscall sequences, privilege escalation, sensitive file access. Rules written in YAML.
In conversation: 'At 500 nodes, iptables becomes your performance bottleneck. Cilium's eBPF kube-proxy replacement shaved 30% off our east-west latency.'
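
The O(n) versus O(1) contrast can be made concrete by counting comparisons. This is a toy Python model with hypothetical names (`build_rules`, `chain_lookup`, `map_lookup`) and made-up addresses; it illustrates only the lookup cost, not how kube-proxy or Cilium actually lay out their rules and maps.

```python
def build_rules(n_services: int):
    """One (virtual IP, port) -> backend rule per service; addresses are invented."""
    return [((f"10.96.{i // 256}.{i % 256}", 80), f"10.0.{i // 256}.{i % 256}:8080")
            for i in range(n_services)]

def chain_lookup(rules, key):
    """iptables-style evaluation: walk the chain until a rule matches.

    Returns (backend, number of rules compared).
    """
    comparisons = 0
    for match, backend in rules:
        comparisons += 1
        if match == key:
            return backend, comparisons
    return None, comparisons

def map_lookup(table, key):
    """Hash-map-style evaluation: a single keyed lookup, independent of table size."""
    return table.get(key)

rules = build_rules(10_000)
table = dict(rules)            # stand-in for a BPF hash map keyed by (VIP, port)
last_key = rules[-1][0]

backend, comparisons = chain_lookup(rules, last_key)
print(comparisons)                               # one comparison per rule walked
print(map_lookup(table, last_key) == backend)    # same answer from one lookup
```

For the last service in a 10,000-rule chain the linear walk performs 10,000 comparisons, and a packet that matches nothing pays the same price. The dictionary lookup returns the same backend in a single hashed access, which is the property the eBPF map gives the datapath.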
A security engineer explains runtime security using eBPF during a compliance review: "We use Falco, which instruments the kernel with eBPF probes to detect security events in real time. A Falco rule: 'alert if any process other than our allowed list opens /etc/shadow'. The eBPF probe hooks syscall entry and exit, so it fires every time any process calls open or openat. It captures the filename, the process name and other context and streams the event to user space, where the Falco engine checks it against the rules. If an event matches a rule, Falco raises an alert. This gives us real-time intrusion detection without modifying any application or adding sidecars. The overhead has been sub-1% CPU for most of our workloads." How does eBPF enable runtime security monitoring without requiring application changes?
eBPF runtime security: because eBPF hooks into the kernel, it observes every process running on that kernel equally, whether in a container or directly on the host. (A VM guest runs its own kernel, so it needs probes inside the guest.) No agent injection, no library hooking, no LD_PRELOAD tricks.
Syscall visibility: nearly every security-relevant operation goes through a syscall: file open, network connect, process fork, privilege escalation. Hooking syscall entry/exit with tracepoints or kprobes captures these events.
Falco rule examples: Detect outbound connection from container not in allowlist. Detect container running with --privileged. Detect unexpected file write to /etc. Detect shell spawned in container.
Falco architecture: eBPF program (or kernel module) → ring buffer → Falco user-space engine → rule evaluation → alert (Slack, PagerDuty, Falcosidekick).
eBPF security tools: Tetragon (Cilium/Isovalent): eBPF security observability and enforcement. Can enforce policies in the kernel (for example, override a syscall's return value or kill the offending process). Tracee (Aqua Security): eBPF-based runtime security. Inspektor Gadget: collection of eBPF gadgets for Kubernetes debugging. kubectl-trace: run bpftrace programs in a Kubernetes cluster.
LSM hook (Linux Security Module): eBPF programs can attach to LSM hooks (Linux 5.7+), implementing custom security policies that operate at the security decision layer (before the kernel grants the operation). More powerful than syscall tracing: they can deny operations, not just observe.
In conversation: 'eBPF runtime security is the best argument against the "we have to change the application to instrument it" claim. We see everything from the kernel, with zero changes to applications.'
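
The user-space half of the pipeline above (events out of the ring buffer, then rule evaluation, then alerts) can be sketched as follows. This is a simplified Python model, not Falco's engine or rule syntax: `SyscallEvent`, `shadow_rule`, `evaluate` and the allowlist contents are all hypothetical, and real Falco rules are written in YAML.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SyscallEvent:
    """One event as a kernel probe might report it (fields chosen for illustration)."""
    syscall: str
    proc_name: str
    path: str
    container_id: str

# Hypothetical allowlist of programs that may read /etc/shadow.
ALLOWED_SHADOW_READERS = frozenset({"sshd", "login", "passwd"})

def shadow_rule(ev: SyscallEvent) -> bool:
    """Match when a process outside the allowlist opens /etc/shadow."""
    return (ev.syscall in ("open", "openat")
            and ev.path == "/etc/shadow"
            and ev.proc_name not in ALLOWED_SHADOW_READERS)

def evaluate(events, rules):
    """Run every rule against every event; return (rule name, event) per match."""
    alerts = []
    for ev in events:
        for name, predicate in rules:
            if predicate(ev):
                alerts.append((name, ev))
    return alerts

events = [
    SyscallEvent("openat", "sshd", "/etc/shadow", "host"),
    SyscallEvent("openat", "cat", "/etc/shadow", "c0ffee"),
    SyscallEvent("openat", "cat", "/etc/hostname", "c0ffee"),
]
for name, ev in evaluate(events, [("shadow_read_by_unexpected_process", shadow_rule)]):
    print(f"ALERT {name}: {ev.proc_name} opened {ev.path} in container {ev.container_id}")
```

Only the second event produces an alert: `sshd` is on the allowlist and the third event touches a different file. Keeping the kernel side limited to capturing events and doing the matching in user space is what lets rules change without reloading any kernel program.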