Cloud SRE

Toronto

Our Client provides brokerage services to clients in more than 60 markets around the world. Through its advanced suite of electronic trading strategies, experienced high-touch trading group, top-ranked Commission Management services, award-winning desktop trading platform, and unparalleled access to insightful analytics, content, and unique liquidity, our Client helps institutions lower overall trading costs and ultimately improve investment performance.

Role responsibilities:

Responsibilities include, but are not limited to, the following:

The Cloud SRE will demonstrate a passion for technical excellence in building highly performant, resilient, and flexible system architectures. The successful candidate will display expert-level development skills as applied to systems engineering. In this role, you will be responsible for building leading AWS systems for an innovator and leader in electronic trading. You will leverage cloud design elements as part of an integrated Agile development team consisting of application developers, SREs, and data engineers. You are foremost a systems developer with a thirst for metrics-based decision making as the basis for continuous system improvement.

The ideal candidate understands that people have different ways of working and leverages that understanding to achieve common team goals, not least of which is stellar customer success. This is a unique role at the intersection of cloud architecture, application development, and infrastructure optimization.

Key objectives critical to success in this role:

  • Lead designs of major software components, systems, and features to improve the availability, scalability, latency, and efficiency of services.

  • Lead sustainable incident response, blameless postmortems, and production improvements.

  • Provide guidance to other team members on managing the end-to-end availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.

  • Mentor and train other team members on design techniques and coding standards, and cultivate innovation and collaboration across multiple teams.

  • Expert-level operational and architectural knowledge of AWS services, and the ability to map them to specific problem sets.

  • 5+ years of experience in the DevOps field, supporting scalable and reliable distributed applications

  • Configuration Management, application deployment, and intra-service orchestration experience using Ansible/Terraform/CloudFormation/Python

  • Understanding of infrastructure concepts like network topologies, security, routing, load balancing, firewalls, and enterprise patterns

  • Understanding of protocols like TCP/IP, SSH, RDP

  • In-depth knowledge of containerization technologies (Docker, Kubernetes, ECS, etc.)

  • Design and implement metrics in Prometheus and Grafana, with external integrations

  • Build logging infrastructure (e.g., ELK)

  • Backend storage management and scaling

  • Disaster Recovery and High Availability strategy

  • Experience in managing production systems on a very large scale

  • Experience working with relational, NoSQL, and Hadoop-based systems