Resiliency Architect
- Type: Full Time
- Location(s): Montréal, Quebec; Toronto, Ontario
- Date Posted:
- Job ID: R152089
Our employees are at the heart of everything we do. Together, we help people, businesses, and society prosper in good times and be resilient in bad times.
Our employee promise represents Intact’s commitment to you in exchange for living our Values, striving to do your best work, being open to change and investing in your career. In return, we promise to provide support, opportunities and performance-led financial rewards at a workplace where you can shape the future, win as a team and grow with us.
Pay at Intact is about much more than just salary.
Flexible work arrangements and a hybrid work model
Possibility to purchase up to 5 extra days off per year
Multiple benefits offered to support physical and mental wellbeing, including telemedicine, Wellness account and much more
Share plan & other savings: up to 12% of salary or even more (ask how you could earn guaranteed income for life)
Salary range (but not limited to): 149,600 - 182,800
Annual bonus target, based on the base salary, with a potential payout of up to double the target (subject to personal and company performance): 15%
As part of our commitment to Win As A Team, we share our success with employees through our annual bonus plan and Employee Share Purchase Plan (ESPP), with Intact matching 50% of your net shares.
Our pension offerings provide flexibility and long-term security for our employees beyond their careers. We are one of the few companies offering the opportunity to receive guaranteed income for life via our defined benefit pension plan.
Salary for the candidate will be determined taking into consideration a number of factors including: experience, skills, qualifications, anticipated contribution to role, internal equity, etc. The salary range presented above is based on a 35-hour workweek and would represent a majority of different candidate profiles. However, we encourage candidates who may fall outside of this range to apply as well.
About the role
We are seeking a Resiliency Architect to define and drive our end-to-end resiliency architecture and production reliability posture across Azure, AWS, Google Cloud, and on‑prem environments.
This person will be responsible for design standards, production readiness, and enforcement mechanisms at enterprise scale.
The ideal candidate combines deep SRE expertise with advanced systems architecture and a strong vision for explicit blue/green and chaos engineering practices, alongside AI/GenAI, to make systems reliable, leverage AI as a force multiplier for resiliency, transform team workflows, and deliver resilient, intelligent user solutions.
What you'll do here:
Core objectives:
Establish the enterprise resiliency architecture, patterns, and production guardrails for all critical platforms and services.
Govern design quality through rigorous architecture reviews and production readiness assessments.
Make blue/green deployments and chaos engineering first-class, codified practices across the estate: design, tooling, automation, and continuous validation.
Integrate AI/GenAI into reliability engineering: robust AI system architectures, AI-assisted observability, causal detection, and autonomous remediation.
Lead the evolution of disaster recovery, ransomware protection, and continuity strategies grounded in hard SLAs/SLOs and measurable business outcomes.
Key responsibilities:
Own the resiliency reference architecture for multi-cloud/hybrid (multi-region/zone, active-active/passive, blast-radius reduction) and define/enforce NFRs (availability, latency, durability, RTO/RPO).
Establish governance via design reviews, production gates, policy-as-code, scorecards, and automated controls integrated with CI/CD, IaC, and runtime platforms.
Standardize blue/green deployment architecture and engineer safe traffic shifting, health gates, progressive cutovers, rollback, and zero-downtime data migrations.
Lead an enterprise chaos engineering program (experiments, failure injection, game days) and feed outcomes back into architecture guardrails and SLO improvements.
Define production readiness standards (capacity/saturation, graceful degradation, retries/backoff, circuit breakers, rate limiting) and codify runbooks, dependency maps, and failover topologies validated via DR drills and rehearsals.
Drive observability and SRE practices: OpenTelemetry adoption, distributed tracing, SLIs/SLOs/SLAs, error budgets, and executive reliability dashboards.
Architect DR and cyber-resilience (immutable/air-gapped backups, PITR, ransomware-resistant segmentation, recovery validation) aligned to regulatory and audit needs.
Guide platform and data resiliency across Kubernetes/service mesh, replication/consensus, geo-distribution, and event streaming (DLQs, backpressure, reprocessing).
Enable reliable AI/GenAI systems and AI-driven operations (monitoring/guardrails, anomaly detection, predictive modeling, human-in-the-loop remediation, ops copilots).
Serve as principal resilience authority: mentor teams, lead councils/forums, and communicate tradeoffs clearly to executives and engineers.
What you bring to the table:
10+ years in SRE/Platform/Infrastructure/Systems Architecture with proven large-scale, production-critical experience across Azure, AWS, GCP, and on‑prem.
Multi‑region traffic management, global load balancing, DNS/BGP, TLS/mTLS, CDN/edge patterns.
Kubernetes ecosystems (AKS/EKS/GKE), service meshes (Istio/Linkerd), autoscaling strategies, readiness/liveness, topology constraints.
Observability stacks: OpenTelemetry, Prometheus/Grafana, Jaeger/Tempo, ELK/OpenSearch, commercial APM; correlation and topology modeling.
Data resilience: consensus/replication (Raft/Paxos), partitioning, PITR, snapshots, CDC; caches (Redis), databases (Aurora, Cosmos DB, Spanner).
IaC and automation: Terraform/Pulumi, GitOps (Argo CD/Flux), policy‑as‑code (OPA), CI/CD patterns (blue/green, canary, progressive delivery).
Chaos engineering, DR orchestration, and automated failover at enterprise scale.
For candidates located in Quebec, bilingualism is required given the need to interact regularly with English-speaking colleagues across the country.
No Canadian work experience is required; however, candidates must be eligible to work in Canada.
AI/GenAI competencies:
Architecting reliable AI systems: model serving (Ray/SageMaker/Vertex), vector stores (Pinecone/FAISS/pgvector), retrieval pipelines, guardrails and safety.
MLOps: model monitoring (drift, performance, hallucination detection), feature pipelines, lineage/observability, prompt/content governance.
Applying AI to operations: causal detection, predictive resiliency, autonomous remediation frameworks.
Strong software engineering skills (Go/Python/TypeScript) and systems thinking; excellent communication (written, visual, verbal) and executive presence.
This is a new role on our growing team.
We are an equal opportunity employer
At Intact, our Value of respect is founded on seeing diversity as a strength. We strive to create an accessible workplace where employees feel valued, included and encouraged to share their unique perspectives.
We encourage applications from individuals who are members of equity-deserving groups, including but not limited to women, Indigenous peoples, persons with disabilities, Black people, and members of the 2SLGBTQI+ community.
As part of Intact’s commitment to reconciliation, we acknowledge that we work, meet and travel across the land currently called Canada, originally inhabited by First Nations, Métis and Inuit people. This history extends through many centuries and continues to evolve today.
We have policies to ensure equal access and participation for people with disabilities, including providing workplace adjustments (accommodations). A copy of applicable policies is available on request.
If we can provide a specific adjustment to make the recruitment process more accessible for you, please let us know when we reach out about a job opportunity. We’ll work with you to meet your needs.
Learn more about our recruitment process and your candidate journey here.
Please note that Intact does not provide sponsorship or other support for immigration-related matters including but not limited to employer-specific closed work permits. Candidates must be eligible to work in Canada from the anticipated start date and throughout their employment and are solely responsible for maintaining their work eligibility.
If you are an employee of Intact or belairdirect, please apply for this role on the Internal Career Site.