Site Reliability Engineer



Software Engineering
Belfast, UK · Antrim BT41, UK
Posted on Tuesday, May 14, 2024

Are you ready to ensure the highest level of system reliability and performance for a leading cloud-native service built on AWS? Do you have expertise in system administration, network operations, and a passion for solving complex technical issues with innovative solutions? If you're seeking a challenging and rewarding role that puts you at the forefront of modern technology operations, join Cloudsmith as our Site Reliability Engineer!

We help our customers manage and secure their software supply chain, and improve their developer experience. We are built by developers, for developers. Our culture thrives on openness and transparency, driven by our ambition to build a great company. We enjoy working with great people and customers. We learn, support one another, and we’re growing. Now is a great time to join us.

The Opportunity that lies ahead

As a Site Reliability Engineer at Cloudsmith, you will be instrumental in enhancing the scalability and reliability of our cloud-based platforms and services. This role is crucial to our business strategy as it ensures that our services are always available and performant, underpinning the seamless experience we strive to provide to our customers.

Our Team

We are four squads responsible for building application capabilities which deliver value to our customers, and the underlying platform upon which Cloudsmith is founded. We operate in a highly collaborative environment, where people with different skills come together to make things happen. We have a good approach to CI/CD, and we support a global set of customers who are developers like us.

Your Role:

  • System Reliability and Incident Response: Ensure that Cloudsmith's services are reliable, robust, and responsive. Participate in incident management, post-incident analysis, and preventive solution engineering.
  • Proactive Engineering and Automation: Develop tools and processes to automate deployment, monitoring, and operations. Work towards eliminating manual operations and repetitive tasks.
  • Performance Management: Monitor service and system performance metrics using Datadog, CloudWatch, and other tools. Tune systems for optimal operating efficiency.
  • Collaborate with Application and Platform Teams: Work closely with software engineers, data engineers, and business stakeholders to ensure that our infrastructure meets the growing needs of the business and aligns with other technological advancements.
  • Innovation and Continuous Improvement: Stay updated with the latest in systems and operations technology. Advocate for changes that improve reliability and velocity.
  • System Security and Compliance: Assist in establishing and enforcing robust security practices for infrastructure management, including compliance with data security and privacy standards.

Required Skills and Experience:

  • Proven Experience in System Reliability: Demonstrated history in site reliability, system administration, or network operations.
  • Expertise in Cloud Services: Proficiency with AWS services is essential, along with familiarity with cloud infrastructure management and automation tools.
  • Strong Programming Skills: Proficiency in scripting languages like Python or Ruby, and infrastructure as code tools such as Terraform.
  • Problem Solving and Critical Thinking: Ability to tackle complex system issues with innovative solutions.
  • Leadership and Communication: Excellent interpersonal and communication skills, capable of leading initiatives and collaborating effectively with diverse teams.
  • Project Management: Ability to manage multiple projects simultaneously and meet critical deadlines.

What benefits will you enjoy?

  • Competitive salary and equity package.
  • Comprehensive health, dental, and vision insurance.
  • Generous annual leave and flexible working policies.
  • Budget for professional development, conferences, and training.
  • Opportunity to work in a dynamic, innovative, and supportive environment.

Health and Wellness

Regardless of your location, we deeply care about the health and wellness of our staff and their families; a sustainable pace is important to us. In addition to generous annual leave (PTO), we offer parental leave and health benefits that can cover you and your dependents up to 100%. We also offer flexible family-friendly working policies.

Personal Growth

You will have enormous scope to learn new skills alongside your colleagues, and your continued professional development is essential to us because it's important to you. We will support you with budgets for equipment, training, books, conferences, travel, and certifications. The more powerful you become, the better for all of us.

Cloudsmith is based in Belfast, and we use our HQ regularly for activities like bi-weekly development review sessions and operations team in-office days. We also hold all-hands offsites in Belfast a few times per year, with guest speakers and team activities. On a day-to-day basis, most Cloudsmithers work remotely, so we really rely on our online collaboration tools! Note that this role is open only to candidates who have the right to work in the UK or Ireland without requiring sponsorship.

Join Us:

If you're excited about shaping the future of site reliability engineering at Cloudsmith, we would love to hear from you. Apply now and take the first step towards a rewarding and challenging career with a growing Irish business.

Cloudsmith is an equal opportunities employer committed to diversity and inclusion - we welcome applications from all suitably qualified candidates.

NB: Applicants will be screened and shortlisted via our trusted recruitment partner NineDots.