Russell Groves

Senior Cloud Engineer

Summary

Senior Cloud Engineer with 13+ years of AWS experience, specializing in secure, scalable cloud infrastructure design and Site Reliability Engineering (SRE). Proven expertise in Infrastructure as Code, DevOps automation, and cost optimization. Strong track record of leading cloud migrations, implementing security best practices aligned with GDPR and ISO 27001 requirements, and driving operational excellence through observability and incident management.

Experience

Lepaya

02/2024 - Present

Engineering Manager

Promoted to Engineering Manager within 6 months, leading cloud infrastructure and SRE initiatives for a fast-growing scale-up.

  • Achieved 30% cloud cost reduction through rightsizing, resource usage optimization, and automated resource lifecycle management
  • Established engineering best practices, including thorough code reviews and automated testing, increasing deployment frequency by 70% and reducing deployment times by 50%
  • Designed self-service developer platforms built on Terraform modules, eliminating friction points between teams
  • Orchestrated a company-wide incident management framework based on SRE principles, improving MTTR by 40%
  • Integrated security testing into automated CI/CD pipelines to catch misconfigurations and vulnerabilities early in the development lifecycle
  • Collaborated with security and compliance teams to align cloud architectures with GDPR and ISO 27001 requirements
  • Mentored team members through regular performance reviews and career development planning, achieving a 70% internal promotion rate
  • Led cross-functional initiatives that improved communication efficiency between development and operations teams
AWS
DevOps
CI/CD
Security
ISO 27001
SRE

Lepaya

06/2023 - 02/2024

Principal Engineer

Led cloud infrastructure architecture and SRE practices, driving scalability and reliability improvements for a rapidly growing platform.

  • Built comprehensive observability platform from scratch using Prometheus, Grafana, Loki, and OpenTelemetry, providing a single pane of glass across the entire software development lifecycle
  • Led Infrastructure as Code transformation with Terraform, creating reusable modules that enabled developer self-service
  • Designed microservices migration strategy, breaking down monolithic applications into domain-driven services
  • Implemented monitoring and alerting based on SRE best practices, reducing MTTR from hours to minutes
  • Established cloud data encryption standards, enforcing encryption at rest and in transit across multiple cloud service environments
  • Automated vulnerability scanning and patch management in CI/CD pipelines, accelerating remediation and shrinking exposure windows
  • Created golden paths for developer infrastructure workflows, providing clear guidance and enabling autonomous development teams
Terraform
Observability
Microservices
SRE
Cloud Security
Infrastructure as Code

OneSpark Insurance

05/2022 - 05/2023

Lead Cloud Engineer

Led cloud infrastructure for a global insurance platform, with a focus on security, scalability, and compliance.

  • Architected and deployed multi-region AWS infrastructure supporting a global insurance platform, with automated failover capabilities
  • Engineered scalable cloud infrastructure using Amazon ECS and EKS (Kubernetes), improving deployment reliability by 40%
  • Built custom CI/CD pipelines with GitHub Actions, enabling multiple zero-downtime production deployments per day
  • Implemented Infrastructure as Code with Terraform to manage AWS resources across development, staging, and production environments
  • Designed auto-scaling solutions using ECS Fargate and Application Load Balancers to handle traffic spikes during peak periods
  • Established security best practices, including WAF rules, VPC security groups, and encrypted data pipelines, meeting insurance industry compliance requirements
  • Optimized cloud costs through Reserved Instance and Spot Instance strategies, reducing monthly infrastructure spend by 40%
  • Mentored development teams on cloud best practices and DevOps methodologies, improving overall team velocity
AWS
Terraform
ECS Fargate
GitHub Actions
Cost Optimization

Elenjical Solutions

04/2019 - 05/2022

Lead Cloud Engineer & Murex Engineer

Led cloud transformation initiatives for a fintech consultancy, architecting AWS solutions for financial services clients, including major Murex trading platform migrations.

  • Led architecture design for migrating Momentum Metropolitan's entire Murex trading platform from on-premises to AWS, including all development, staging, and production environments
  • Executed a year-long Murex version upgrade from v3.1.28 to v3.1.41, moving all environments to AWS cloud infrastructure
  • Built automated S3 integrations that delivered production database backups to development environments and supported diagnostics
  • Migrated Murex scheduling from the legacy Tivoli scheduler to Apache Airflow for end-of-day runs and automated data handling tasks
  • Created AWS Lambda functions and Terraform configurations to automate development environment provisioning and housekeeping tasks
  • Maintained 99.99% uptime for Murex production environments through highly available, fault-tolerant architecture design
  • Implemented automated database migration scripts, reducing environment provisioning time from 2 days to 4 hours
AWS
Murex
Apache Airflow
Terraform
High Availability
Financial Services

Lumabyte (Pty) Ltd

01/2013 - 05/2022

Founder & CEO

Founded cloud technology company providing custom cloud solutions for local businesses, focused on operational efficiency and data protection.

  • Developed fully automated cloud-native software solutions that mitigated data loss incidents using current cloud technologies
  • Spearheaded technology adoption at client companies, driving the shift toward containerized workloads, automated testing, and just-in-time infrastructure
  • Applied DevOps best practices to automate repetitive, error-prone tasks, resulting in more reliable and resilient production environments
  • Used open-source technologies exclusively and actively contributed to their development, supporting a sustainable ecosystem
Cloud Architecture
DevOps
Containerization
Automation
Open Source

Cruze Control Technologies // Arago GmbH

11/2018 - 03/2019

Lead DevOps Engineer & AI IT Automation Expert

Led DevOps automation initiatives for an AI-powered enterprise automation platform, managing large-scale distributed systems and mentoring technical teams.

  • Managed HIRO AI automation clusters spanning 50+ nodes with Cassandra, Kafka, and ZooKeeper for major enterprise clients
  • Built automated cluster provisioning workflows using Ansible and Terraform, reducing deployment time from weeks to hours
  • Created monitoring solutions using Prometheus and Grafana for multi-datacenter deployments
  • Developed Python scripts that automated ticket-handling workflows, reducing manual ticket resolution time by 60%
  • Led AWS migration proofs of concept that demonstrated a 30% infrastructure cost reduction while improving system reliability
  • Configured the ELK stack for centralized logging, with Kibana dashboards enabling faster diagnostics across distributed systems
Kubernetes
Cassandra
Kafka
Ansible
Terraform
Prometheus
Grafana
Python
AWS

Argotek (Pty) Ltd

03/2016 - 11/2018

General Manager & Technology Lead

Drove technology innovation and process automation at a manufacturing startup, achieving sustained growth through custom software solutions.

  • Built a custom job-tracking database with Python and SQLite that automated invoicing and reduced administrative overhead by 50%
  • Implemented CAD/CAM automation workflows that reduced part programming time from 4 hours to 30 minutes through process optimization
  • Developed lean manufacturing processes that contributed to 50% year-over-year growth for 3 consecutive years
Python
SQLite
Automation
Process Optimization

IntelloHome

01/2015 - 03/2016

Automation Specialist

Led home automation team delivering end-to-end automation solutions and developing cloud-based control systems for residential clients.

  • Led end-to-end home automation projects from initial design consultation through implementation and ongoing support
  • Developed a cloud-based third-party control system enabling remote management of Control4 systems
  • Created custom Control4 drivers for automation systems integration, becoming the only South African developer with this specialized capability
Automation
Cloud Technologies
Linux
RF Technology

Basil Read Rally Team

02/2014 - 07/2014

Technical Consultant

Provided specialized technical consulting, designing custom solutions for performance optimization and security infrastructure.

  • Designed custom wireless security system addressing specialized requirements, including proximity to railway infrastructure
  • Conceptualized and designed a first-of-its-kind suspension testing machine that simulated track conditions in a controlled environment
Technical Consulting
Hardware Design
Performance Testing

Auckland Micro Limited

11/2011 - 12/2012

Second Line Support Technician

Managed IT operations for a large multinational corporation, implementing monitoring systems and driving process improvements through automation.

  • Implemented custom fault-tracking system that identified core infrastructure issues affecting 150 users across 4 international offices
  • Developed Python and Bash scripts that actively monitored network resources, from printers to hardware firewalls
  • Reduced average daily fault reports from 300+ to 10-20 by systematically identifying and resolving root causes
Python
Bash
Network Monitoring
Infrastructure Management

System Consultants

02/2011 - 11/2011

First Line Support Technician

Provided technical support and developed web hosting infrastructure for small and medium-sized businesses across multiple locations.

  • Designed and implemented the company's first web hosting offering on a CentOS LAMP stack, creating an entirely new revenue stream
  • Installed long-haul wireless network links with ranges of up to 20 km, maintaining network integrity for remote clients
CentOS
LAMP Stack
Web Hosting
Wireless Networks

Projects

AI-assisted resume creation platform that helps users tailor resumes to specific job applications based on their actual work experience. Built on a cloud-native architecture using Python, JavaScript, FastAPI, SvelteKit, Docker, and self-hosted infrastructure.

Python
JavaScript
FastAPI
Svelte
Docker
Cloud Architecture
  • Designed and implemented full-stack, cloud-native application architecture
  • Implemented containerized deployment using Docker for scalability
  • Built RESTful APIs using FastAPI for backend services

Serverless gaming infrastructure for on-demand game server provisioning. Built with Python, Terraform, and the Serverless Framework on AWS services including Lambda, ECS Fargate, and DynamoDB to provide a scalable gaming platform.

Python
Terraform
Serverless Framework
AWS
Lambda
ECS Fargate
DynamoDB
  • Architected serverless infrastructure using AWS Lambda and ECS Fargate
  • Implemented Infrastructure as Code using Terraform for reproducible deployments
  • Built event-driven architecture using EventBridge and API Gateway

Discord bot providing game server management and orchestration tools for a gaming community. Written in Python with the Discord.py library, offering server administration and automated game organization capabilities.

Python
Discord.py
Game Server Management
Automation
  • Built automated game server management and orchestration system
  • Implemented community management tools with Python automation
  • Designed event-driven architecture for game coordination

Gaming community platform for South African Team Fortress 2 (TF2) players, with event organization and server management capabilities. Led community coordination and developed supporting tooling.

Community Management
Server Administration
Event Planning
Automation
  • Managed community infrastructure and server administration
  • Built supporting automation and tooling for community operations