Last week we thought our product was finally “taking off”. Traffic on our AWS Load Balancer was climbing every hour. We hadn't run any big campaign… but the numbers looked great. For a moment, we were happy.

Then we checked actual users. Nothing had changed. Something felt off, so we started digging:
1. First we checked the app logs → nothing useful
2. Then the ingress logs → a bit weird
3. Then the Load Balancer logs → yeah… something was wrong

It wasn't users. It was bots. A lot of them.
- Same endpoints getting hit again and again
- Random user agents
- No real behavior, just hammering requests

And the worst part? They were quietly increasing our AWS bill.
--
We fixed it step by step:
1/ Put AWS WAF in front and enabled basic bot protection + rate limiting
2/ Blocked a few obvious IP ranges
3/ Added rate limits on the Kubernetes ingress
4/ Added some basic checks in the backend (nothing crazy)

Within a few hours:
- Traffic dropped (the fake part)
- Costs dropped
- Servers felt lighter

Honestly, the scary part is this: if we hadn't dug deeper, we would've celebrated "growth" that wasn't even real.
Not all traffic is good traffic.
#aws #kubernetes #devops #buildinpublic
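A quick way to see what the Load Balancer logs are telling you is to tally requests per client IP, per path, and per user agent. Below is a minimal Python sketch, assuming the ALB access logs have already been downloaded from S3 and follow the documented space-separated ALB format (quoted "request" and "user_agent" fields). The function names are illustrative, not part of any AWS tool.

```python
import shlex
from collections import Counter
from urllib.parse import urlsplit

# Positions of the fields we need in an ALB access log entry.
CLIENT_FIELD = 3        # client:port
REQUEST_FIELD = 12      # "METHOD URL PROTOCOL"
USER_AGENT_FIELD = 13   # "user agent string"

def parse_entry(line):
    """Return (client_ip, path, user_agent) for one ALB access log line."""
    fields = shlex.split(line)  # honours the quoted request/user-agent fields
    client_ip = fields[CLIENT_FIELD].rsplit(":", 1)[0]  # drop the port
    _method, url, _protocol = fields[REQUEST_FIELD].split(" ", 2)
    return client_ip, urlsplit(url).path, fields[USER_AGENT_FIELD]

def summarize(lines, top_n=5):
    """Find the noisiest client IPs and paths, plus distinct user agents per IP."""
    by_ip, by_path = Counter(), Counter()
    agents_per_ip = {}
    for line in lines:
        try:
            ip, path, agent = parse_entry(line)
        except (ValueError, IndexError):
            continue  # skip malformed or truncated entries
        by_ip[ip] += 1
        by_path[path] += 1
        agents_per_ip.setdefault(ip, set()).add(agent)
    return {
        "top_ips": by_ip.most_common(top_n),
        "top_paths": by_path.most_common(top_n),
        # Many different user agents from one IP is a classic bot signal.
        "agents_per_ip": {ip: len(agents) for ip, agents in agents_per_ip.items()},
    }
```

The same counts are what you would turn into WAF rate-based rules or IP sets; the sketch only surfaces them.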
Bot Traffic on AWS Load Balancer: A Cautionary Tale
-
A client came to us with a video automation platform deployed on AWS that was crashing under load. Users upload a video, and it gets posted to Facebook, Instagram, LinkedIn and more. Simple idea. But the infrastructure behind it was a ticking time bomb.

Here is what was happening: every time someone uploaded a large video, the server would download it locally, run FFmpeg to optimize it, then re-upload it to S3. One user? Fine. Three users at the same time? The disk fills up, the server crashes, and the client gets angry emails.

The junior dev who built it did not think about concurrency. This is not a blame game; it is one of those things you only realize when real users show up.

So I rebuilt the processing pipeline. The fix was moving FFmpeg off the EC2 server entirely and onto AWS Lambda with an S3 event trigger. Now the flow looks like this:
1. The user uploads a video to S3 under a raw/ prefix.
2. S3 automatically triggers a Lambda function.
3. Lambda runs FFmpeg in its own isolated environment with 10GB of temporary storage.
4. The optimized file lands under the optimized/ prefix.
5. The raw file gets deleted.
EC2 never touches the video again.

The key insight is that every concurrent Lambda invocation runs in its own isolated execution environment. 10 users uploading at the same time means 10 separate Lambda containers running in parallel. No shared disk. No memory pressure. No crashes.

I also added a size threshold: videos under 300MB get copied directly without FFmpeg, since the optimization benefit does not justify the processing time. Only heavy files go through compression.

The result: a pipeline that scales horizontally without any changes to the EC2 server, at a cost of roughly $0.005 per video processed.

Sometimes the fix is not more RAM or a bigger server. It is moving the right work to the right place. If your backend is doing heavy processing synchronously and you are starting to feel the pain of scale, this pattern is worth looking at.
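Here is a minimal Python sketch of the decision step inside such a Lambda, assuming a standard S3 ObjectCreated event, "300MB" meaning 300 MiB, and illustrative FFmpeg settings (the post does not say which encoder options were used). It only plans the work; it is not the author's handler. Filtering on raw/ also keeps files written to optimized/ from retriggering the function.

```python
from urllib.parse import unquote_plus

RAW_PREFIX = "raw/"
OPTIMIZED_PREFIX = "optimized/"
SIZE_THRESHOLD_BYTES = 300 * 1024 * 1024  # assumed: 300MB read as 300 MiB

def plan_uploads(event, threshold=SIZE_THRESHOLD_BYTES):
    """Turn an S3 ObjectCreated event into a list of processing plans."""
    plans = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        if not key.startswith(RAW_PREFIX):
            continue  # ignore anything outside raw/, so outputs never loop back in
        plans.append({
            "bucket": bucket,
            "source_key": key,
            "target_key": OPTIMIZED_PREFIX + key[len(RAW_PREFIX):],
            # Small files are copied as-is; only large ones go through FFmpeg.
            "action": "transcode" if size >= threshold else "copy",
        })
    return plans

def build_ffmpeg_command(src_path, dst_path):
    """Assumed settings: H.264 (CRF 28) + AAC, moov atom moved up for fast start."""
    return [
        "ffmpeg", "-y", "-i", src_path,
        "-c:v", "libx264", "-preset", "veryfast", "-crf", "28",
        "-c:a", "aac", "-movflags", "+faststart",
        dst_path,
    ]
```

In a real handler, each plan would then be executed with boto3's S3 client: `copy_object` for small files, or `download_file` into /tmp, `subprocess.run(build_ffmpeg_command(...), check=True)` and `upload_file` for large ones, followed by `delete_object` on the raw key. How FFmpeg is packaged for Lambda (layer or container image) is not covered in the post.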
#AWS #NodeJS #CloudArchitecture #Serverless #WebDevelopment #Lambda #AWS_Lambda #Scaling #Cloud
-
🚨 Debugging a Mysterious Outage in a “Well-Architected” AWS Setup

Recently I ran into an interesting issue while working with a setup that looked solid on paper:
- Two EC2 instances in different Availability Zones
- Behind an AWS Load Balancer
- Fronted by Cloudflare CDN

Everything aligned with best practices… yet both EC2 instances were frequently going unhealthy, sometimes even going down at the same time. 🤯

🔍 The Investigation
Logs didn't reveal much. The health check configuration looked fine. The infrastructure looked correct.
But digging deeper into the Load Balancer configuration revealed something subtle:
👉 Sticky sessions (session affinity) were enabled

💡 The Root Cause
Sticky sessions were causing the Load Balancer to route repeated requests from the same users to the same EC2 instance. This led to:
- Uneven traffic distribution
- One instance getting overloaded
- Health checks failing under pressure
- Cascading failures affecting both instances

✅ The Fix
Disabled sticky sessions → traffic started distributing evenly → the system stabilized 🎯
---
📘 What Are Sticky Sessions?
Sticky sessions (or session affinity) ensure that a user's requests are consistently routed to the same backend server for the duration of a session. This is typically achieved with cookies (e.g., the AWSALB cookie generated by an AWS ALB).
---
👍 When Should You Use Sticky Sessions?
Sticky sessions make sense when your application:
- Stores session state locally on the server (not shared)
- Uses in-memory sessions (e.g., legacy apps)
- Requires user-specific context tied to a single instance
---
🚫 When Should You Avoid Sticky Sessions?
Avoid them in modern, scalable architectures where:
- You want true load balancing across instances
- Your app is designed to be stateless
- You use shared session stores (Redis, DynamoDB, etc.)
- High availability and auto-scaling are critical

Sticky sessions can silently:
- Skew traffic distribution
- Overload specific instances
- Reduce fault tolerance
- Mask scaling issues
---
🧠 Key Takeaway
Even small configuration choices can have a massive impact on system behavior.
👉 If you're building for scale, aim for stateless services and let your load balancer do its job properly.
---
#AWS #CloudComputing #DevOps #SystemDesign #Debugging #Cloudflare #LoadBalancing #EngineeringLessons
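To see why affinity can skew load, here is a toy Python model. It is a simplification, not how an ALB is implemented internally: one heavy client sends most of the requests, and we compare plain round-robin routing with pinning each client to one target via a hash, which stands in for the affinity cookie. The instance IDs and traffic numbers are made up.

```python
import hashlib
from collections import Counter

def sticky_target(user_id, targets):
    """Pin a user to one target, standing in for an affinity cookie."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return targets[int.from_bytes(digest[:4], "big") % len(targets)]

def distribute(requests, targets, sticky):
    """Count how many requests each target receives (each request = a user id)."""
    counts = Counter({t: 0 for t in targets})
    for i, user in enumerate(requests):
        target = sticky_target(user, targets) if sticky else targets[i % len(targets)]
        counts[target] += 1
    return counts

# One heavy client (80 requests) plus ten light clients (2 requests each).
requests = ["heavy"] * 80 + [f"user{i}" for i in range(10) for _ in range(2)]
targets = ["i-a", "i-b"]
print("round-robin:", dict(distribute(requests, targets, sticky=False)))
print("sticky:     ", dict(distribute(requests, targets, sticky=True)))
```

Round-robin splits the 100 requests evenly, while with affinity every one of the heavy client's 80 requests lands on the same instance, whichever it hashes to.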
-
I thought I understood AWS. Then I deployed everything… and nothing worked.
• EC2 running, but no internet
• Load balancer up, but traffic still failing
• Website updated, but changes not showing
• API live, but returning errors with no logs

That's when it clicked: I didn't understand AWS. I just knew how to follow steps.
So I changed how I approached it. Instead of thinking in services (EC2, S3, Lambda), I started thinking in systems:
• where requests actually go
• how failures show up to users
• what depends on what
• how security boundaries shape everything

Over the past few weeks I built:
• a custom VPC with a public/private subnet design and controlled access (including bastion host access to private instances)
• a multi-AZ load-balanced system with proper traffic isolation and resilience
• a CDN-backed static site with caching, HTTPS, security headers, and CI/CD (GitHub Actions → S3 + CloudFront invalidation)
• a serverless API with API Gateway, Lambda, DynamoDB, least-privilege IAM, observability (CloudWatch), and production-style controls (API keys, rate limiting, custom domain)

But the real learning wasn't building it. It was debugging it.
Full repo + breakdown: https://lnkd.in/efjUeB9C
#AWS #DevOps #CloudEngineering #PlatformEngineering #SRE
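A public/private subnet design starts with carving the VPC CIDR into blocks per Availability Zone. What makes a subnet "public" is its route table (a route to an internet gateway); private subnets reach the internet through a NAT gateway, and a missing route is one classic cause of "EC2 running, but no internet". The Python sketch below only does the address planning, using the standard `ipaddress` module; the CIDR, AZ names, and /24 size are assumptions for illustration.

```python
import ipaddress

def plan_subnets(vpc_cidr, azs, new_prefix=24):
    """Split a VPC CIDR into one public and one private subnet per AZ."""
    network = ipaddress.ip_network(vpc_cidr)
    needed = 2 * len(azs)
    available = 2 ** (new_prefix - network.prefixlen)
    if new_prefix < network.prefixlen or needed > available:
        raise ValueError(f"{vpc_cidr} cannot hold {needed} /{new_prefix} subnets")
    blocks = network.subnets(new_prefix=new_prefix)  # generator of equal-sized blocks
    plan = [{"az": az, "tier": "public", "cidr": str(next(blocks))} for az in azs]
    plan += [{"az": az, "tier": "private", "cidr": str(next(blocks))} for az in azs]
    return plan

for subnet in plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"]):
    print(subnet)
```

Keeping public and private ranges in contiguous groups makes security group and NACL rules easier to read later.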
-
Day 5 of the #30DayTerraformChallenge
Everything so far has been building up to this: variables, resources, auto scaling groups. Today I finally put it all together into a fully load-balanced AWS infrastructure.
And then broke it. And fixed it. Multiple times.

Here's what I built:
✅ An Application Load Balancer routing public traffic across multiple EC2 instances
✅ An Auto Scaling Group spanning multiple Availability Zones
✅ Security groups that only allow instances to receive traffic from the ALB, not directly from the internet
✅ Health checks that automatically pull unhealthy instances out of rotation

The moment it worked (hitting the ALB DNS name in my browser and seeing the hostname and instance ID come back) was genuinely satisfying. Refreshing the page and watching it cycle through different instances across different AZs? Even better.

But the real lesson today wasn't the infrastructure. It was Terraform state. I used to think the .tfstate file was just a log. It's not. It's the source of truth for everything Terraform manages: it maps each resource in your configuration to the real object in AWS, which is how Terraform knows what to create, change, or destroy.

I also hit some real errors today: a 502 Bad Gateway that took a while to diagnose, an instance type not supported in us-east-1e, and a DNS issue that had nothing to do with my code. Documenting and working through each one was honestly more valuable than when things just work.
#30DayTerraformChallenge #TerraformChallenge #Terraform #AWS #ELB #IaC #AWSUserGroupKenya #EveOps #CloudEngineering #DevOps
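To make "source of truth" concrete, here is a small Python sketch that reads a version-4 state file and lists each resource address next to the real-world ID Terraform recorded for it. It is for illustration only: in day-to-day work `terraform state list` and `terraform state show` are the supported ways to inspect state, and the state file should never be hand-edited. The sample resource names and IDs are made up.

```python
import json

def resource_addresses(state_json):
    """List (address, real-world id) pairs from a Terraform v4 state document."""
    state = json.loads(state_json)
    rows = []
    for res in state.get("resources", []):
        base = f'{res["type"]}.{res["name"]}'
        if res.get("mode") == "data":
            base = "data." + base           # data sources are addressed as data.TYPE.NAME
        if "module" in res:
            base = f'{res["module"]}.{base}'  # e.g. module.network.aws_subnet.public
        for inst in res.get("instances", []):
            key = inst.get("index_key")      # present for count / for_each instances
            if isinstance(key, int):
                addr = f"{base}[{key}]"
            elif isinstance(key, str):
                addr = f'{base}["{key}"]'
            else:
                addr = base
            rows.append((addr, inst.get("attributes", {}).get("id")))
    return rows
```

If a resource disappears from state while still existing in AWS, Terraform no longer knows about it and will try to create it again, which is why losing or corrupting state hurts so much.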
-
AWS Lambda cold starts can bring even robust serverless architectures to their knees 💻. A seemingly harmless function invocation can turn into a frustrating 10-second delay, causing cascading failures and disappointed users.
Our team learned this the hard way when our Lambda function, built with Node.js 14.x, started hitting frequent cold starts after a deployment 🚀. The issue was caused by a combination of factors: inadequate provisioned concurrency, insufficient memory allocation, and a recent increase in invocation frequency.
To mitigate it, we implemented a multi-faceted solution: configuring provisioned concurrency with the AWS CLI command `aws lambda put-provisioned-concurrency-config`, upgrading to Node.js 16.x, and optimizing our function's memory usage with AWS X-Ray and CloudWatch metrics 📊. By taking these steps, we significantly reduced the occurrence of cold starts and improved our application's overall performance.
What strategies have you used to minimize AWS Lambda cold starts in your own serverless applications?
#AWSLambda #Serverless #CloudComputing #PerformanceOptimization #Nodejs #AWSXray
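The same provisioned-concurrency request can be made from code. Below is a minimal Python (boto3) sketch of the API call behind `aws lambda put-provisioned-concurrency-config`; the call is the same whatever runtime the function itself uses. The function name `orders-api` and alias `live` are hypothetical, and the client is passed in so the logic can be exercised without AWS credentials.

```python
def enable_provisioned_concurrency(lambda_client, function_name, alias, executions):
    """Request pre-initialized execution environments for a published alias or version.

    `lambda_client` is expected to be boto3.client("lambda").
    """
    if alias == "$LATEST":
        # Provisioned concurrency needs a published version or an alias, not $LATEST.
        raise ValueError("use a published version or alias, not $LATEST")
    if executions < 1:
        raise ValueError("executions must be a positive integer")
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,
        ProvisionedConcurrentExecutions=executions,
    )
```

The environments take a while to warm up; `get_provisioned_concurrency_config` reports a `Status` that moves from `IN_PROGRESS` to `READY`. Traffic must actually be routed to that alias or version for the warm environments to help.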
-
I just paid ₹16,086 for a lesson I could have learned for free. Well, almost. 💸
I recently finished an end-to-end RAG application using AWS Bedrock, OpenSearch, S3, and EC2. The project was a success, but I made the rookie mistake of walking away without hitting "Delete."
Luckily, the cloud gods had mercy: AWS waived the bill as a one-time gesture after I explained it was a configuration oversight. Consider this my lucky break and your final warning!

💡 Pro tips for my fellow devs:
• Tag everything: use resource tags (e.g., Project: RAG-Demo) and activate them as cost allocation tags. It's the easiest way to see which specific service is bleeding your wallet in a complex stack.
• Billing alarms: set a CloudWatch billing alarm at $10. If you don't, you're essentially handing AWS a blank check.
• OpenSearch costs: provisioned domains are "always-on." Use OpenSearch Serverless for dev work or delete the domain the second you're done (Serverless collections still bill a minimum capacity while they exist, so clean those up too).
• The "clean up" ritual: S3 storage is pennies, but running EC2 and vector databases will eat your rent. Delete the stack once the demo is over.
#AWS #CloudComputing #RAG #GenerativeAI #DevOps #ExpensiveLessons #AWSBilling
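The $10 billing alarm can be scripted so it is never forgotten. Here is a minimal Python (boto3) sketch, assuming billing alerts are already enabled in the Billing console preferences and an SNS topic exists for notifications (the topic ARN and alarm name are hypothetical). Billing metrics are published only in us-east-1, so the CloudWatch client must be created in that region.

```python
def create_billing_alarm(cloudwatch, threshold_usd, sns_topic_arn, name="billing-over-threshold"):
    """Alarm when the account's estimated charges exceed `threshold_usd`.

    `cloudwatch` is expected to be boto3.client("cloudwatch", region_name="us-east-1").
    """
    if threshold_usd <= 0:
        raise ValueError("threshold must be positive")
    params = {
        "AlarmName": name,
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours; billing data is only updated a few times a day
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # e.g. an SNS topic with your email subscribed
    }
    cloudwatch.put_metric_alarm(**params)
    return params
```

`EstimatedCharges` is a month-to-date figure, so a $10 alarm fires once this month's running total crosses $10, not when a single day's spend does.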
-
AWS just announced general availability of Kubernetes Gateway API support in the AWS Load Balancer Controller, and it's a meaningful upgrade.
The old way: cramming load balancer configuration into annotation strings with no validation.
The new way: type-safe CRDs that are validated at apply time, clean RBAC separation between platform teams and app developers, and cross-namespace routing out of the box.
It covers both L4 (TCP/UDP/TLS via NLB) and L7 (HTTP/gRPC via ALB), completing AWS's Gateway API story alongside VPC Lattice for east-west traffic.
The best part? Gateway API isn't AWS-specific. 20+ conformant controllers exist across GKE, NGINX, Envoy, Istio, and more, so your core routing logic stays portable.
#kubernetes #aws #gatewayapi #eks #devops #cloudnative #platformengineering https://lnkd.in/dwWfQXjh
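The platform/app-team split looks roughly like this in standard Gateway API objects: the platform team owns a Gateway in its own namespace and decides which namespaces may attach routes, and each app team owns an HTTPRoute in its namespace. The Python sketch below builds the manifests as dicts and prints a v1 List that `kubectl apply -f -` accepts. The GatewayClass name `aws-alb`, the namespaces, the `gateway-access` label, and the service are hypothetical; the actual GatewayClass and any AWS-specific configuration resources come from the AWS Load Balancer Controller documentation.

```python
import json

GATEWAY_API = "gateway.networking.k8s.io/v1"

def shared_gateway(namespace="infra", gateway_class="aws-alb"):
    """Gateway owned by the platform team; only labelled namespaces may attach routes."""
    return {
        "apiVersion": GATEWAY_API,
        "kind": "Gateway",
        "metadata": {"name": "shared-gateway", "namespace": namespace},
        "spec": {
            "gatewayClassName": gateway_class,  # hypothetical class name
            "listeners": [{
                "name": "http",
                "protocol": "HTTP",
                "port": 80,
                "allowedRoutes": {"namespaces": {
                    "from": "Selector",
                    "selector": {"matchLabels": {"gateway-access": "shared"}},
                }},
            }],
        },
    }

def team_route(team_ns, path_prefix, service, port, gateway_ns="infra"):
    """HTTPRoute owned by an app team, attaching to the shared Gateway across namespaces."""
    return {
        "apiVersion": GATEWAY_API,
        "kind": "HTTPRoute",
        "metadata": {"name": f"{service}-route", "namespace": team_ns},
        "spec": {
            "parentRefs": [{"name": "shared-gateway", "namespace": gateway_ns}],
            "rules": [{
                "matches": [{"path": {"type": "PathPrefix", "value": path_prefix}}],
                "backendRefs": [{"name": service, "port": port}],  # Service in team_ns
            }],
        },
    }

if __name__ == "__main__":
    manifests = {"apiVersion": "v1", "kind": "List",
                 "items": [shared_gateway(), team_route("orders", "/orders", "orders-svc", 8080)]}
    print(json.dumps(manifests, indent=2))  # pipe into: kubectl apply -f -
```

RBAC then follows the objects: the platform team gets write access to Gateways in `infra`, app teams get write access to HTTPRoutes in their own namespaces, and the `orders` namespace needs the `gateway-access=shared` label for its route to be accepted.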
-
I deployed an app to AWS EC2 in 13 minutes without writing a single line of Dockerfile.
Here's how SmartDeploy does it: a LangGraph agent scans your GitHub repo, detects your stack, flags security risks, generates all your infra files, and deploys the app to AWS EC2.
The agent handles the heavy lifting. You still get full visibility into every generated file, so you can verify and tweak before deploying.
Built on top of SD-Artifacts, my open-source library for managing deployment outputs.
Swipe to see the full flow.
Built with Next.js, LangGraph, Bedrock, Docker, and AWS.
GitHub link in the first comment.
#buildinpublic #devops #aws #llm #langgraph #nextjs #softwaredevelopment
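For a sense of what "detects your stack" means at its simplest, here is a deterministic Python baseline that looks for well-known marker files and flags one common risk, a committed `.env` file. This is a hypothetical illustration, not SmartDeploy's logic (which the post says is an LLM agent); the marker table is deliberately tiny.

```python
from pathlib import Path

# Hypothetical rule set: marker file -> stack label.
MARKERS = [
    ("package.json", "node"),
    ("requirements.txt", "python"),
    ("pyproject.toml", "python"),
    ("go.mod", "go"),
    ("Cargo.toml", "rust"),
    ("pom.xml", "java"),
]

def detect_stacks(repo_dir):
    """Return stack labels whose marker files exist at the repo root, without duplicates."""
    root = Path(repo_dir)
    found = []
    for marker, stack in MARKERS:
        if (root / marker).is_file() and stack not in found:
            found.append(stack)
    return found

def has_committed_env_file(repo_dir):
    """Flag one common security risk: a .env file checked into the repo."""
    return (Path(repo_dir) / ".env").is_file()
```

A rule table like this is cheap and predictable; an agent earns its keep on the cases it cannot handle, such as monorepos, unusual build tools, or inferring ports and start commands.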
-
Is your "healthy" server actually broken?
There is nothing worse than a server that stays "green" in AWS while your users are seeing nothing but errors.
The Problem: By default, an Auto Scaling Group (ASG) only uses EC2 status checks, which verify the instance itself, not your app. If your app process crashes or hangs but the virtual machine stays up, the ASG stays "deaf." It won't replace the server because it thinks everything is fine.
The Fix: Stop relying on instance-level checks alone. Attach the ASG to the Load Balancer's (ALB) target group and use its health checks.
The Code: In Terraform, assuming the ASG is already registered with the target group (target_group_arns), it's just one line in your aws_autoscaling_group:
health_check_type = "ELB"
The Result: If the Load Balancer can't reach your app, the ASG finally "hears" the problem. It kills the zombie instance and spins up a fresh, working one automatically.
It's a simple change, but it's the difference between a self-healing system and a 3:00 AM emergency call.
#AWS #Terraform #DevOps
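How quickly the ASG "hears" the problem depends on the target group's health check thresholds. Below is a small Python model of that logic: a target flips to unhealthy after a number of consecutive failed checks and back to healthy after a number of consecutive passes. The defaults used here (5 healthy, 2 unhealthy) are the usual ALB target group defaults, but treat them as assumptions and check your own configuration. With a 30-second interval, two failures means roughly a minute before replacement starts.

```python
def simulate_target_health(results, healthy_threshold=5, unhealthy_threshold=2, start_healthy=True):
    """Replay health-check results (True = passed) and return the state after each check."""
    healthy = start_healthy
    streak = 0  # consecutive results that disagree with the current state
    states = []
    for passed in results:
        if passed == healthy:
            streak = 0  # result agrees with the current state: reset the counter
        else:
            streak += 1
            limit = unhealthy_threshold if healthy else healthy_threshold
            if streak >= limit:
                healthy = not healthy  # threshold reached: flip state
                streak = 0
        states.append("healthy" if healthy else "unhealthy")
    return states

# A single failed check is tolerated; two in a row mark the target unhealthy.
print(simulate_target_health([True, False, True, False, False]))
```

With `health_check_type = "ELB"`, the moment the target group reports the instance unhealthy is the moment the ASG treats it as unhealthy and replaces it.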
-
New video is live on AWS Secrets Manager.
In this video, I explain how AWS Secrets Manager helps you manage application secrets securely without relying on .env files in production environments. I walk through how to store, retrieve, and update secrets using the AWS CLI, and how to integrate them into real-world applications using shell scripts and Node.js.
I also demonstrate a complete hands-on setup using EC2 and IAM roles, showing how to avoid hardcoded credentials and follow a more secure, production-ready approach.
Watch the video here: https://lnkd.in/giUzKWmg
Blog: https://lnkd.in/gEpzWExU
Happy Learning!
Amitabh Soni
#aws #secretsmanager #cloudsecurity #devops #awscli #nodejs #iam #cloudcomputing
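For readers who want the shape of the retrieval step, here is a minimal Python (boto3) sketch. It assumes the code runs on an EC2 instance whose IAM role allows `secretsmanager:GetSecretValue` on the secret (plus `kms:Decrypt` if a customer-managed key is used); boto3 picks up the role's credentials automatically, so nothing is hardcoded. The secret name `prod/app/db` is hypothetical.

```python
import json

def load_secret(sm_client, secret_id):
    """Fetch a secret: JSON secrets come back as a dict, plain-text ones as a str.

    `sm_client` is expected to be boto3.client("secretsmanager").
    """
    response = sm_client.get_secret_value(SecretId=secret_id)
    if "SecretString" in response:
        raw = response["SecretString"]
        try:
            return json.loads(raw)  # key/value secrets are stored as JSON text
        except json.JSONDecodeError:
            return raw  # plain-text secret
    # Binary secrets are returned under SecretBinary as bytes.
    return response["SecretBinary"]

# Usage on the instance (hypothetical secret name):
#   import boto3
#   db = load_secret(boto3.client("secretsmanager"), "prod/app/db")
```

Fetching once at startup and caching the value keeps API calls (and cost) down; re-fetch when a rotation is expected.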