3+ years shipping production systems: Kafka pipelines, multi-tenant SaaS, billing engines, and infrastructure that ages well. I design APIs you can extend and queues you can replay, and I write case studies with real metrics. Open to new roles (full-time or contract); remote-first with reliable overlap for EU, UK, and US teams.
Senior Software Engineer
Shaped by years in fintech, logistics, and B2B SaaS, often under NDA, always fixing what others called “unfixable.” My compass: observability, idempotency, and runbooks that make sense at 3 AM.
SLOs, DLQs, exponential backoff, replayable events. I want incidents to be boring because the system handles itself. I measure pages per month, not lines of code.
Kafka, SQS, idempotent consumers. I design dead-letter visibility and replay tooling — because failures are inevitable, but losing data is not.
Terraform, containerization, CI/CD that doesn't need a shaman. I build self-serve environments that devs actually love to work with.
Shipped surfaces — this site, GitHub, and six long-form case studies (billing → CDC & edge patterns).
Next.js, static pages, contact & reviews — one place to browse work and jump into case studies with clean URLs.
Experiments, backend utilities, and contributions — open in a new tab.
Six stories: billing, DLQs, compliance, queries, edge resilience, and CDC-driven caches — metrics-first.
Real systems, real tradeoffs, measurable results. No marketing fluff.
Stripe webhooks timing out, duplicate invoices, angry customers. Rebuilt with idempotent Kafka workers and a replay UI for the SQS DLQ. Billing tickets dropped 68%.
Invoices stuck in “processing” forever. Support couldn't see why. Built a DLQ explorer + automated retry policies that cut unknown-failure tickets by 80%.
Auditors needed a full immutable history; Zoho CRM sync was flaky. Event sourcing plus a conflict-resolution layer cut manual reconciliation by 90%.
Dashboard reports taking up to 8 seconds meant users gave up. Composite indexes, read replicas, and Redis caching brought p95 latency from 3.2s down to 0.8s.
When traffic spiked 22×, API Gateway, WAF, tenant quotas, and adaptive circuit breakers delivered graceful degradation instead of 503 storms.
Debezium binlog → Kafka → Redis invalidation — stale inventory tickets dropped 87%; median propagation under 2s.
Tell me about your stack, pain points, and timeline. I reply with actual words, not automated sequences. No ghosting — even if we're not a fit.