Site Reliability Engineer
Signal AI
Software Engineering
London, UK
GBP 70k-85k / year
Location
London Office
Employment Type
Full time
Location Type
Hybrid
Department
Technology, Engineering
Compensation
- £70K – £85K • Offers Bonus
We're on a mission to change the way businesses make decisions with our cutting-edge AI technology. To achieve that, we’re looking for passionate people to join our open and inclusive workplace. Our inclusive environment welcomes skills and experiences from diverse backgrounds, and defines who we are.
We're hiring an SRE to help us run and evolve the infrastructure behind Signal AI's decision intelligence platform.
You'd be joining a small, collaborative Infrastructure team at a moment when the work is genuinely changing shape. Over the last year we've hardened the platform, reduced cost, and built serious observability into our highest-volume systems. The next year is about scaling that work, absorbing infrastructure from a recent acquisition, and being thoughtful about how AI shows up in operational work: not as a gimmick, but as a tool we trust ourselves to use well.
We're looking for someone who wants to shape the direction of the team; someone who brings curiosity and care to the work, and who wants to leave things meaningfully better than they found them.
What we've shipped recently
Cut ~$50k/year off our Elasticsearch bill by migrating compute to more efficient chips. (Apr 2026)
Built the foundation for our MCP server platform: leveraging and contributing to open-source tooling to give the whole company extensible, production-grade AI integrations. (2025–2026)
Rebuilt production from scratch in a full DR gameday. End-to-end restore validated across our multi-account AWS setup. (Jan 2026)
What we're working on next
AI-augmented operations: Claude Enterprise is deployed across Signal. We want this team to help define what good looks like for SRE: incident triage, runbook generation, capacity planning, cost analysis. This is a strategic investment, not a side project, and we'd love someone genuinely curious about what these tools can and can't do.
Security in the age of AI: The threat landscape has shifted. Supply chains are under greater threat than ever, and powerful models are emerging that promise to change how the industry thinks about security. We're looking for someone interested in thinking seriously about what actually matters to protect now.
Acquisition integration: Bringing a recently acquired product's infrastructure under our reliability, security, and operational standards. A substantial, multi-quarter piece of work with real technical and organisational complexity, and plenty of room to make your mark.
Batch workload consolidation: Moving disparate batch jobs onto EKS for unified scheduling, cost visibility, and operational tooling.
Your first six months
We want to set you up to thrive. Here's what that looks like in practice:
Month 1: You're onboarded across our AWS estate, Terraform, and observability stack. You've completed your first on-call shift with support from the team, landed your first PR in the DevOps repo, and started working Claude Enterprise into your daily flow.
Month 3: You're owning a workstream end-to-end. You've led the SRE response to at least one production incident and hosted your first post-mortem. You've surfaced a real opportunity and pushed it to a measurable result.
Month 6: You're driving a multi-quarter workstream with clear direction, and you're contributing insights to our AI-in-operations playbook, including where Claude adds real leverage and where it doesn't.
What we’re looking for
You have solid AWS and Terraform experience, and you're comfortable writing Python or Go to solve operational problems. You think in distributed systems (failure modes, observability, blast radius), and you take problems end-to-end rather than stopping at the edges of your own work.
You're pragmatic about AI tooling. Not evangelical, not dismissive. You can tell us when you'd reach for an LLM and when you wouldn't, and you'd have a clear reason either way.
You communicate openly and you're comfortable pushing back when you think something could be better. We want to leverage your experience and perspective to grow our platform.
We know not every strong candidate will have every skill on this list. If you're excited about the work and you're close on the experience, we'd encourage you to apply.
Nice to haves
Networking depth. You're comfortable below the load balancer: TCP/IP fundamentals, DNS, VPC design, and what actually happens when a service can't reach another one.
Operational security instincts. You follow the threat landscape with genuine interest: not just CVEs, but shifts in how attacks happen and how the industry is responding. You have a point of view on what actually matters right now.
Linux internals comfort. When something behaves strangely under load, you know where to look.
Communication across technical levels. You can collaborate with your infrastructure teammates and explain the same concepts clearly to a product manager. You've worked alongside colleagues with a wide range of technical backgrounds and adapted naturally.
Not sure you meet every requirement? Studies show that women and other underrepresented groups often hesitate to apply unless they check every box. At Signal AI, diverse perspectives strengthen our teams, drive innovation, and lead to better performance. So even if your background doesn’t align perfectly with each qualification, we encourage you to apply if you’re passionate about this role.
We're dedicated to creating an inclusive environment where every Signaller feels welcomed, valued, and heard—a place where you can truly thrive as yourself.