Duncan Ward

Engineering Leader | Platform & Infrastructure | DevOps & SRE

Platform engineering leader with 20+ years building teams and systems that endure.

Currently directing SRE operations for a multi-tenant SaaS platform with 15,000+ production databases across multiple cloud providers and regions. Deep experience in test automation and QA reliability, including BDD frameworks that improved test pass rates from below 50% to over 99%. Known for establishing technical and organizational frameworks that continue delivering value long after implementation. Proven ability to earn trust within existing team structures and drive measurable progress through structured quarterly planning with weekly milestone tracking.

Experience

DevOps SRE Manager

ActionStep (Legal SaaS)

Feb 2024 – Present · 2 yrs

Toronto, Canada

Lead SRE team of 5-8 engineers for multi-tenant legal practice management platform. Joined an established team structure, earned trust through technical credibility and collaborative approach, and now report directly to CTO with full accountability for infrastructure operations and cloud strategy.

• Own 15,000+ production databases: 12,000 Aurora PostgreSQL databases across 5 AWS regions, plus 3,000 SQL Server databases across 45 Azure clusters currently migrating to AWS

• Leading Azure-to-AWS migration – broke project into quarterly objectives with 10 weekly milestones each for transparent progress tracking; on track for January 2026 completion

• Established incident response framework including on-call rotations, blameless post-mortems, and runbook documentation

• Drove SOC2 compliance initiatives, authored security documentation, and implemented controls for regulated legal industry requirements

Cloud Manager

Uberflip / PathFactory (Marketing Tech SaaS)

Jan 2020 – Feb 2024 · 4 yrs 1 mo

Toronto, Canada

Directed platform engineering team responsible for AWS infrastructure, developer experience, and operational excellence. Instituted quarterly goal-setting with granular weekly milestones, enabling predictable delivery and clear stakeholder communication.

• Achieved 99.999% platform uptime through monitoring architecture and automated remediation systems still in production post-departure

• Negotiated 30% reduction in AWS infrastructure costs; established cost review processes and automation practices adopted as ongoing operational standards

• Designed YAML-driven microservice configuration framework for managing deployments across EKS and AWS services via Terraform – standardized service provisioning across engineering teams

• Championed BDD testing framework (Cucumber/TypeScript) that improved QA-product alignment – methodology still in use across engineering organization

Key Skills:
Engineering Management · AWS · Amazon EKS · TypeScript · Next.js · Go · Terraform · Agile

Cloud Manager

Rubikloud Technologies (Acquired by Kinaxis)

Jun 2018 – Jan 2020 · 1 yr 7 mos

Toronto, Canada

Managed cloud infrastructure team for AI-powered retail analytics platform operating across AWS and GCP. Successfully integrated into existing team during rapid growth phase; established operational practices that supported successful acquisition by Kinaxis.

• Migrated Apache Spark workloads from EMR to Kubernetes, improving resource utilization and deployment flexibility for data engineering pipelines

• Built custom Kubernetes operator using CRDs to automate user account provisioning for long-duration SFTP uploads, including custom SSH cipher configuration to support legacy client systems

Key Skills:
Google Cloud Platform · GKE · Apache Spark · Jupyter · Data Engineering

Toolsmith / Platform Engineer

FreshBooks (Accounting SaaS)

Jun 2013 – Jun 2018 · 5 yrs

Toronto, Canada

• Transformed test automation reliability from <50% to >99%: built custom Capybara BDD test runner with intelligent retry logic; instituted 100-run validation requirement for new/modified test cases to ensure stability

• Led GCP migration proof-of-concept achieving 10x target performance using Kubernetes and Cloud SQL – architecture patterns informed company's long-term cloud strategy

• Overhauled CI/CD platform to enable pull request testing – infrastructure remained foundational to developer experience years after implementation

Key Skills:
Kubernetes · AWS · CI/CD · Python · Airflow · Jenkins

Earlier Experience

Cloud Manager

Bell Canada

Sep 2011 – Jun 2013

Integrated Zabbix monitoring for NOC visibility and Puppet automation frameworks for repeatable infrastructure

Team Leader

Symantec/MessageLabs

Jan 2003 – Sep 2011

Built monitoring and support systems for global 13-site network; identified $5M annual CAPEX savings; established ITIL-compliant service management framework