BitMEX

Site Reliability Engineer: Monitoring & Observability

BitMEX · Vancouver, British Columbia, Canada · $105k - $120k

Actively hiring · Posted over 4 years ago

Company Overview


BitMEX explores, incubates, and pursues opportunities and investments as part of its mission to reshape the modern digital financial system into one that is inclusive and empowering. BitMEX is an industry pioneer whose trading platform handles tens of thousands of low-latency transactions per second, representing several billion dollars traded every day.


Job Purpose


The BitMEX infrastructure team is responsible for the reliability and scalability of all the services that power the BitMEX exchange, and for providing turnkey self-service platforms to developers. As a Site Reliability Engineer focused on monitoring and observability, you will be in charge of expanding our monitoring and observability capabilities. You will set and implement technical standards by providing cutting-edge self-service tooling for instrumentation, visualization, and alerting. The goals are to shed light on blind spots, improve our ability to address issues proactively, and accelerate incident response by empowering developers with overview and drill-down dashboards.


Responsibilities



  • Implementing scalable tooling and standards for consistent self-service monitoring, observability, and alerting; championing and maintaining those frameworks

  • Improving observability instrumentation of the applications along the critical trading flow and its periphery, enabling end-to-end request tracing and prediction of future issues

  • Collaborating with the Product Engineering, Trading Technology, and Application Support teams to develop dashboards and integrations (e.g. logs / time-series cross-referencing) that allow quick identification of reliability and performance problems and guided drill-down to accelerate incident management

  • Collaborating with the rest of the DevOps teams to include dashboards and alerts as a standard in our self-service applications (e.g. database-as-a-service)


Qualifications



  • 5 years of relevant experience, including at least 3 years supporting production-critical time-series databases (e.g. Influx, Prometheus, Graphite)

  • 2 years of cloud-native experience (e.g. Kubernetes)

  • Familiarity with or knowledge of Terraform (or similar product)

  • Familiarity with or knowledge of Lightstep or Envoy is a plus


  • Strong AWS, Linux/UNIX knowledge


  • Experience working with offshore support teams

  • Strong collaboration, analytical, verbal and written communication skills

  • Utilizes sound decision-making skills and communicates well with other team members and business users. Identifies problems and recommends solutions.

  • Works in a team environment, including cross-functional teams and teams with business users throughout the company. Interacts with all levels of management and staff across the organization

  • You are comfortable context-switching across a wide variety of platforms and technologies and are able to find ways to glue different technologies together

  • You are comfortable managing a complex, polyglot, global infrastructure as code, and you understand how to fully automate its management from a centralized git repository.
