Site Reliability Engineer (SRE)
Key points
- 23/10/25
- London, Harmondsworth
- £90k - £100k per year + 25% Bonus
- Permanent
- Cloud Infrastructure, SRE & DevOps
- Full time
Job role
Senior Site Reliability Engineer
Location: London (Hybrid)
Salary: £100,000 per annum + 25% Bonus + Excellent Benefits
We are hiring a Senior SRE to support a large-scale digital organisation undergoing a major commercial re-platforming across web and mobile channels.
This role sits much closer to the application layer than traditional infrastructure SRE positions. You will work directly with product and engineering teams across customer-facing platforms (web, mobile, payment journeys, APIs) to improve reliability, resilience, and service behaviour in production.
This is neither a ticket-driven operational role nor a pure platform engineering post. It is about embedding measurable reliability into distributed systems at the service level.
What You’ll Be Doing
- Embed SRE practices across API and microservices-based architectures
- Define and own meaningful SLIs/SLOs aligned to customer journeys and business-critical flows
- Improve service reliability through proactive observability, tracing, telemetry and alert tuning
- Partner closely with backend and platform engineers to reduce systemic failure modes
- Lead and contribute to incident response, post-incident reviews and resilience improvements
- Move the organisation from symptom-based alerting to customer-impact-driven diagnostics
- Contribute to release safety, progressive deployments and production guardrails
What We’re Looking For
- Proven experience operating as an SRE within digital product environments
- Strong understanding of API architectures, microservices and distributed systems behaviour
- Hands-on experience defining and implementing SLIs, SLOs and error budgets
- Deep observability exposure (e.g. Datadog, Splunk, Prometheus, tracing/APM platforms)
- Experience working closely with application engineering teams, not just infrastructure teams
- Background in high-availability, customer-facing systems where outages have commercial impact
- Cloud-native exposure (AWS preferred) with practical understanding of Kubernetes environments
Important
This role is best suited to engineers who care deeply about production behaviour, customer experience in failure scenarios, and reliability as a first-class product feature, rather than engineers focused purely on infrastructure provisioning or CI/CD enablement.
Please get in touch with Benjamin Applewhaite to discuss the role in confidence.