Photo by Annie Spratt on Unsplash
When Monday mornings go wrong: How I accidentally DDoS'd the startup I worked for
I have some stories to share from my time as a backend developer. This is the first of them, and one of my biggest fuckups, but looking back at what happened, I can tell it with a big smile. It is like they say: "if you are not failing, you are not trying hard enough."
It was a couple of years ago, and I had only been developing for about a year, learning on the job while studying for my bachelor's in computer science in the evenings. So I was still green as grass, and my team was no exception.
This happened at a startup, and the rest of the development team were also young, novice developers. Sure, we were proud and the platform was doing well, but we were still learning.
One day, I picked up a project to show all the emails that our application automatically sent to our users. We decided it would be best to show this in our admin panel, which was developed in-house. Back then we had a mindset of building everything in-house, something we have since moved away from as we gained experience.
The way I built this was pretty simple. The service we had integrated with to send emails was able to fire webhooks on certain events, such as sent, delivered, failed delivery, opened, and clicked. I remember feeling like a genius for finding this in their documentation, and I got to work. Little did I know, I was about to set off a chain of events that would result in one of the worst weeks for the company.
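To make the setup concrete, here is a minimal sketch of the kind of endpoint I built, assuming a small Flask backend. The route, the payload fields, and the save_event_to_database helper are hypothetical stand-ins rather than our actual code, and the email provider's real payloads looked different, but the shape of the idea is the same: receive an event over HTTP and write it to the database so the admin panel can show it.

```python
# Hedged sketch: route, payload shape, and helper names are assumptions.
from flask import Flask, request, jsonify

app = Flask(__name__)

# The events the email provider could notify us about.
HANDLED_EVENTS = {"sent", "delivered", "failed_delivery", "opened", "clicked"}


@app.route("/webhooks/email-events", methods=["POST"])
def email_event():
    payload = request.get_json(silent=True) or {}
    event = payload.get("event")

    if event not in HANDLED_EVENTS:
        # Anything we do not recognise is acknowledged and ignored.
        return jsonify({"status": "ignored"}), 200

    # The real implementation did this part synchronously for every single
    # webhook call: one HTTP request in, one database write, no queue.
    save_event_to_database(payload)
    return jsonify({"status": "ok"}), 200


def save_event_to_database(payload):
    # Hypothetical placeholder for the synchronous write to our database.
    pass
```

Every webhook was handled inline like this, which worked fine at a trickle of events.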
Of course, I sent a few emails during development to ensure that all the events worked as expected, and I was satisfied with the results. This was back in a time when we did not know about automated testing, did not properly review each other's code, and overall were pretty new to the game. So as far as we were concerned, the feature worked fine; it had been tested and reviewed by others.
All seemed well, and we deployed the update. Our admins were happy, as they now had the tools to properly see what happened. There were a few minor bugs here and there, but nothing we could not fix instantly. Famous last words.
That was until the first Monday after our update. It was 9 in the morning and suddenly the server crashed. What? We had not deployed anything that day or the Friday before. What could possibly be causing this?
Poor logging and a poor understanding of the system meant we could not find the issue, so we just restarted the server and thought that was that. Until 30 minutes later, when it happened again. And again, and again, and again.
You see, up until this point, we had never seen more than 100 webhooks fired in an hour. The company was small and we simply did not send many emails. Except on Mondays, when we sent our newsletter. For comparison, our subscriber list at the time was around 20k users, and we always sent that newsletter on Monday morning at 9.
Because I had implemented all the different event types, it did not stop at 20k events. It went way over 100k on that Monday alone. New events kept being pushed to our little server, which simply could not handle the load.
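For a rough sense of scale, here is the back-of-the-envelope arithmetic. The per-email event count below is an assumption; not every email fires all five events, while repeat opens, clicks, and retries push the real number even higher.

```python
# Rough fan-out estimate for that Monday morning (assumed averages).
subscribers = 20_000
event_types = ["sent", "delivered", "failed_delivery", "opened", "clicked"]

webhooks_fired = subscribers * len(event_types)
print(webhooks_fired)  # 100_000 webhook calls, before any retries
# ...versus the fewer than 100 per hour we were used to seeing.
```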
That resulted in the email service receiving error codes, flagging the events as undelivered, and putting them in a queue to try again. Primarily due to our lack of experience (and, in all honesty, the lack of any option on their end to disable the webhooks or empty the queue), we were forced to just watch the queue count slowly go down over the week, drained by the few events our server could process before it crashed again.
It was a painful week, but I learned a lot. For one, always test your code properly before deploying it. And two, never underestimate the power of a Monday morning newsletter.
I hope you found this story entertaining and perhaps even learned a thing or two. Just remember, we all make mistakes, and it is how we learn from them that counts.