eBPF
Apr 2025 · 8 min read
Writing Your First XDP Program: Packet Drop at Line Rate
How I built an in-kernel packet filter using XDP that processes 14 Mpps on a single core, and what I learned about BPF maps, verifier errors, and the AF_XDP socket interface.
DPDK
Mar 2025 · 10 min read
DPDK PMD Internals: How Poll Mode Drivers Kill Interrupt Overhead
A deep dive into DPDK's poll mode driver model, huge page allocation, NIC RSS configuration, and why removing interrupts is the biggest win in high-throughput packet I/O.
RDMA
Feb 2025 · 12 min read
RDMA from Scratch: Queue Pairs, Completions, and Zero-Copy Transfers
Understanding the verbs API, how protection domains and memory regions work, and writing a simple RDMA Write benchmark that achieves sub-2μs latency between two hosts.
AI Infra
Jan 2025 · 7 min read
Why AI Clusters Need RoCE: NCCL, Congestion, and PFC Deadlocks
How GPU-to-GPU communication in distributed training works, why RoCE v2 is often preferred over InfiniBand in hyperscale deployments, and how PFC pause frames can deadlock your whole fabric.
Kernel
Dec 2024 · 9 min read
SR-IOV Deep Dive: Virtual Functions and the VFIO Framework
How SR-IOV splits a physical NIC into virtual functions, how the VFIO driver exposes them safely to VMs, and the performance implications vs. software switching with OVS.
Coming Soon
GPUDirect RDMA: Bypassing the CPU for DMA Transfers to GPU Memory
How GPUDirect allows NIC ↔ GPU direct DMA, the BAR mapping trick, and benchmarking MPI collectives with vs. without GPUDirect enabled.
Draft in progress…