Site Reliability Engineer
Toronto
Summary
Our engineering team is growing! We’re looking for a talented Site Reliability Engineer to join a small team and build a consumer-facing application that will transform how users create and discover inspirational content every day.
Responsibilities
- Build and maintain a highly fault-tolerant, elastic infrastructure in the cloud.
- Help build and deploy internet-scale services.
- Participate in a 7x12 on-call rotation.
- Drive the build-out of production monitoring and alerting to proactively catch production issues before they cause downtime.
- Establish and build monitoring for critical business metrics.
Qualifications
- Bachelor’s degree in computer programming, computer science, or a related field
- 5+ years of experience and a passion for the work; consumer products technology preferred
- 3+ years of experience with high-traffic monitoring systems (logging, metrics, tracing, APM)
- 2+ years of experience building production monitoring/alerting to proactively catch production issues before they cause downtime
- Fluent in Python and Bash; experience with Ansible, Terraform, and Docker is a bonus
- Experience maintaining production environments and Kubernetes clusters, and building out continuous integration and automated deployment pipelines
- Comfortable with change: demonstrated comfort with ambiguity, and the ability to adapt quickly and be effective in new situations in a highly dynamic setting
- Data-driven but also imaginative and intuitive in coming up with ideas and solutions
- Must possess a start-up mindset: a hunger to learn quickly and the ability to balance multiple priorities in a fast-paced team environment