Sr. Site Reliability Engineer, Containerization & Kubernetes Infra
Toronto, Vancouver (Remote, ON or BC only)
Our Client is searching for an experienced Site Reliability Engineer (SRE) to join our
Containerization & Kube Infra team. As a member of this team, you will be focused on enabling
reliable and efficient service runtime across our Engineering organization. We partner closely
with contributors responsible for our Build & Delivery systems, our VMware-based
infrastructure, and our Observability systems.
Your Role
In this role, you will work to continuously improve engineers' ability to
develop, test, release, and maintain their production services. You will participate
in managing the systems and processes that ensure a flexible and reliable container ecosystem,
including Kubernetes cluster stability, deployment tooling, ecosystem security, and service
integration support. To be successful, you will need to work with teams across the Engineering
organization to understand their needs, and you will need to work closely with our internal
Platform and Infrastructure teams to build and maintain the services that meet those
needs.
Who You Are
You are an active participant in a culture of sharing and learning. You believe that we succeed
or fail as a team, and you confront problems (not people) when things are difficult. You are an
experienced technologist with a passion for DevOps, and you have spent a few years dealing
with complex automation problems in a Linux/Unix ecosystem. We expect experience with most
of the tools and concepts outlined in the Your Skills section (or comparable ones), but we know that
nobody knows everything, and you are a growth-oriented engineer, right?
Your Skills
● Extensive experience with Linux/Unix, particularly programming to automate tasks
● Moderate experience with distributed systems
● Some experience with the low-level Linux ecosystem (e.g. kernel, cgroups)
● Familiarity with self-managed container orchestration (e.g. Kubernetes)
● Familiarity with Release Orchestration (Ansible, Capistrano)
● Familiarity with Build Automation (Jenkins, GitHub Actions)
● Familiarity with Configuration Management (Puppet)
● Programming with at least one OOP language (Python preferred; you may also
encounter Ruby, etc.)
● Scripting (distinct from mid-sized software development: as an SRE, you aren’t going to
be able to avoid hacking on Bash)