
September 2023 - January 2024

Cloud Recomposition

From fragile and costly to resilient and high-performing, cloud architecture built for rapid growth.

62%
cost savings on cloud operations
99.98%
System Uptime
3x scalability
with auto-scaling improvements

Overview

In September 2023, a sudden surge in user activity pushed our platform to its limits and exposed long-standing weaknesses in our cloud setup. Performance bottlenecks, unpredictable scaling, and soaring costs quickly became major challenges. Over time, the infrastructure had also grown bloated, driving expenses higher than necessary. This project set out to re-architect the system, making it leaner, more scalable, and resilient enough to power the next phase of growth.

The Challenge

The legacy cloud setup struggled on three fronts:

Cost Inefficiency

Overprovisioned resources and lack of optimization led to inflated compute and storage costs with poor utilization.

Scalability Limitations

The system faltered under sudden traffic spikes, causing huge disruptions during the surge and risking both user experience and revenue.

Reliability Gaps

Weak fault tolerance and limited observability left the platform vulnerable to cascading failures and downtime under load.

Technologies Used

ECS, EC2, MongoDB, CloudWatch, Datadog, S3, Docker, Lambda, WAF, Redis/ElastiCache

Approach

September 2023
Initial Assessment
Ran an emergency audit of the existing infrastructure to uncover the performance bottlenecks, cost inefficiencies, and scalability gaps exposed during the traffic surge.
October 2023
Initial Optimization
Optimized EC2 configurations and scaling policies, while collaborating with the backend team to fix slow MongoDB queries that had been dragging performance.
November 2023
Architecture Redesign & Scaling Adjustments
Retained ECS as the core but overhauled its auto-scaling strategy, ensuring resources scaled smoothly with demand instead of failing under spikes.
December 2023
Cost Optimization & OPEX Reduction
Conducted a service-by-service audit, reconfiguring overprovisioned resources, optimizing S3 and CloudFront usage, and rightsizing EC2 instances, with the goal of cutting costs without sacrificing performance.
January 2024
Monitoring & Observability with CloudWatch
Enhanced monitoring with CloudWatch dashboards, alerts, and metrics, giving the team real-time visibility into system health and enabling proactive incident response.

Key Implementations

ECS Optimization & Auto-Scaling Adjustments

Fine-tuned ECS auto-scaling to respond dynamically to real-time traffic. The system now scales up smoothly during surges and contracts during off-peak hours, improving performance while cutting unnecessary costs.
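The scale-up and scale-down behavior described above can be illustrated with a simplified target-tracking model. This is only a sketch: the function name, the CPU target of 60%, and the task bounds are assumptions for illustration, not the project's actual policy values, and AWS's real target-tracking logic also applies cooldowns and other safeguards that are omitted here.

```python
import math


def desired_task_count(current_tasks: int, metric_value: float, target_value: float,
                       min_tasks: int, max_tasks: int) -> int:
    """Simplified target-tracking model (hypothetical helper).

    Scales the task count in proportion to how far the observed metric
    is from its target, then clamps the result to the allowed range.
    """
    if current_tasks <= 0 or target_value <= 0:
        raise ValueError("current_tasks and target_value must be positive")
    # Proportional estimate: if CPU is 1.5x the target, run 1.5x the tasks.
    proportional = current_tasks * metric_value / target_value
    # Round up so the service is never left short of capacity.
    wanted = math.ceil(proportional)
    return max(min_tasks, min(max_tasks, wanted))


if __name__ == "__main__":
    # Surge: 4 tasks at 90% CPU against a 60% target -> scale out to 6.
    print(desired_task_count(4, 90.0, 60.0, min_tasks=2, max_tasks=12))
    # Off-peak: 10 tasks at 20% CPU against a 60% target -> scale in to 4.
    print(desired_task_count(10, 20.0, 60.0, min_tasks=2, max_tasks=12))
```

The upper bound keeps a runaway metric from scaling costs without limit, while the lower bound keeps a baseline of capacity during quiet hours.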

MongoDB Performance Tuning

Identified and flagged inefficient queries that were slowing down the platform. Partnered with the backend team to optimize them, reducing latency and boosting overall database efficiency.
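One way to flag inefficient queries is to scan entries from MongoDB's database profiler for slow operations, collection scans, and queries that examine far more documents than they return. The sketch below assumes that approach; the write-up does not say which method was used. The field names (`ns`, `millis`, `planSummary`, `docsExamined`, `nreturned`) follow the profiler's output format, while the thresholds and sample entries are hypothetical.

```python
def flag_slow_operations(profile_entries, slow_ms=100, max_scan_ratio=100.0):
    """Return (namespace, reasons) for profiler entries that look inefficient.

    Hypothetical helper: thresholds are illustrative, not tuned values.
    """
    flagged = []
    for entry in profile_entries:
        reasons = []
        # Operations that took at least slow_ms milliseconds.
        if entry.get("millis", 0) >= slow_ms:
            reasons.append("slow")
        # A COLLSCAN plan means no index was used for the query.
        if "COLLSCAN" in entry.get("planSummary", ""):
            reasons.append("collection scan")
        # Many documents examined per document returned suggests a poor index.
        examined = entry.get("docsExamined", 0)
        returned = max(entry.get("nreturned", 0), 1)
        if examined / returned > max_scan_ratio:
            reasons.append("high scan ratio")
        if reasons:
            flagged.append((entry.get("ns", "?"), reasons))
    return flagged


if __name__ == "__main__":
    # Made-up sample entries shaped like system.profile documents.
    sample = [
        {"ns": "app.orders", "millis": 840, "planSummary": "COLLSCAN",
         "docsExamined": 250000, "nreturned": 20},
        {"ns": "app.users", "millis": 3, "planSummary": "IXSCAN { email: 1 }",
         "docsExamined": 1, "nreturned": 1},
        {"ns": "app.sessions", "millis": 45, "planSummary": "IXSCAN { userId: 1 }",
         "docsExamined": 9000, "nreturned": 10},
    ]
    for namespace, reasons in flag_slow_operations(sample):
        print(namespace, reasons)
```

A flagged list like this gives the backend team a concrete starting point: collection scans usually call for a new index, while a high scan ratio on an indexed query often points to an index that does not match the filter well.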

Cost Optimization

Audited all active cloud services, eliminating overprovisioned resources and rightsizing EC2 instances. Optimized S3 storage and CloudFront delivery, achieving major cost savings without trade-offs in performance.
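A rightsizing pass of this kind can be approximated with a simple rule: if an instance's peak CPU stays well below a threshold, it is a candidate for the next size down. The sketch below is illustrative only. The instance names, utilization figures, and hourly costs are hypothetical, and it assumes that halving an instance's size roughly halves its cost, which holds approximately within a single EC2 instance family.

```python
from dataclasses import dataclass


@dataclass
class InstanceUsage:
    name: str
    vcpus: int
    peak_cpu_pct: float  # highest CPU utilization seen over the audit window
    hourly_cost: float   # hypothetical figure, not a real AWS price


def rightsizing_candidates(instances, peak_threshold_pct=40.0):
    """Return (name, suggested_vcpus, hourly_saving) for oversized instances."""
    suggestions = []
    for inst in instances:
        # Only downsize when there is a smaller size to move to and the
        # peak would stay under 2 * threshold after halving capacity.
        if inst.vcpus >= 2 and inst.peak_cpu_pct < peak_threshold_pct:
            suggested_vcpus = inst.vcpus // 2
            hourly_saving = inst.hourly_cost / 2  # assumes linear pricing by size
            suggestions.append((inst.name, suggested_vcpus, hourly_saving))
    return suggestions


if __name__ == "__main__":
    fleet = [
        InstanceUsage("api-1", vcpus=8, peak_cpu_pct=22.0, hourly_cost=0.50),
        InstanceUsage("worker-1", vcpus=4, peak_cpu_pct=78.0, hourly_cost=0.25),
    ]
    for name, vcpus, saving in rightsizing_candidates(fleet):
        print(f"{name}: move to {vcpus} vCPUs, save about {saving:.2f}/hour")
```

Using peak rather than average utilization is the conservative choice here: it avoids recommending a smaller instance that would be fine most of the day but saturate during a surge.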

[Image: cloud dashboard]

Results & Impact

62% Lower Cloud Costs

Rightsized EC2, optimized storage, and fine-tuned auto-scaling to eliminate waste and cut operational expenses by more than half.

99.98% Uptime

Strengthened fault tolerance and query performance, ensuring near-zero downtime and a smoother user experience.
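To put the uptime figure in concrete terms, an availability percentage can be converted into an allowed-downtime budget. The helper below is a small illustrative calculation; the measurement window is an assumption, since the write-up does not state the period over which uptime was measured.

```python
def downtime_budget_minutes(uptime_pct: float, period_days: float) -> float:
    """Minutes of downtime allowed in a period at a given uptime percentage."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_pct) / 100


if __name__ == "__main__":
    # 99.98% over a 30-day month: about 8.64 minutes of downtime.
    print(round(downtime_budget_minutes(99.98, 30), 2))
    # 99.98% over a 365-day year: about 105.12 minutes of downtime.
    print(round(downtime_budget_minutes(99.98, 365), 2))
```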

3x Scalability

Infrastructure now supports three times more concurrent users without disruption, unlocking room for rapid growth.

Operational Agility

With CloudWatch monitoring in place, the team gained real-time visibility and proactive alerts, reducing downtime and enabling faster incident response.
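Proactive alerting of this kind typically relies on alarms that fire only when a metric breaches its threshold in several recent datapoints, which filters out one-off spikes. The sketch below models that "M out of N" rule in plain Python. It is a simplified illustration rather than CloudWatch's exact evaluation logic (missing-data handling in particular is reduced to a single case), and the threshold and sample values are hypothetical.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Simplified 'M out of N' alarm evaluation (hypothetical helper).

    Looks at the most recent `evaluation_periods` datapoints and returns
    "ALARM" when at least `datapoints_to_alarm` of them exceed `threshold`.
    """
    if evaluation_periods < 1 or not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    recent = datapoints[-evaluation_periods:]
    # Not enough history yet to evaluate the full window.
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


if __name__ == "__main__":
    cpu = [50, 85, 90, 70, 88]  # made-up CPU utilization samples, in percent
    # Three of the last five samples exceed 80, so a 3-of-5 rule fires.
    print(alarm_state(cpu, threshold=80, evaluation_periods=5, datapoints_to_alarm=3))
```

Requiring several breaching datapoints trades a little detection speed for far fewer false alerts, which matters when alerts are meant to trigger an incident response.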

Conclusion

This cloud transformation project turned a fragile, costly setup into a lean, resilient, and growth-ready platform. By optimizing ECS, auto-scaling, and cost management, we built infrastructure that not only adapts to business needs but also drives them forward.

It’s proof that targeted technical improvements (smarter scaling, better monitoring, and sharper cost control) deliver outsized business impact: lower costs, higher reliability, and the ability to scale without friction.

Want to discuss this project?

I'm always happy to share more details about the cloud transformation or discuss how similar approaches could benefit your organization.
