Another outage?! Cloudflare glitch knocks hundreds of websites offline

What exactly happened?

Cloudflare, a well-known web infrastructure provider, helps online businesses around the globe reduce website downtime and deliver content to their visitors as seamlessly as possible.

Cloudflare achieves this with a wide network of edge servers, or ‘PoPs’ (Points of Presence), configured to deliver content from the server geographically closest to each user, which boosts transfer speed.
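To make the "closest server" idea concrete, here is a minimal sketch that picks the nearest PoP by great-circle distance. It is a simplification: real CDNs typically steer users with anycast or DNS-based routing rather than an explicit distance calculation, and the PoP names and coordinates below are hypothetical.

```python
import math

# Hypothetical PoPs: name -> (latitude, longitude) in degrees.
POPS = {
    "fra": (50.11, 8.68),    # Frankfurt
    "sin": (1.35, 103.82),   # Singapore
    "iad": (38.95, -77.45),  # Washington, D.C. area
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_pop(client, pops=POPS):
    """Return the name of the PoP closest to the client's (lat, lon)."""
    return min(pops, key=lambda name: haversine_km(client, pops[name]))
```

With these assumed coordinates, a client in Paris at (48.86, 2.35) would be served from "fra", and a client in Bangkok at (13.75, 100.50) from "sin".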

However, on Tuesday, June 21, 2022, Cloudflare reported an incident that took down the websites of companies relying solely on its services. Discord, Omegle, DoorDash and other online businesses experienced downtime, leaving thousands of netizens in the dark.

Cloudflare outage incident report
Cloudflare status report on the incident.

Are outages here to stay?

This is the third time that Cloudflare has reported an outage of this kind; the CDN giant experienced similar issues in July and August 2020.

In the summer of 2021, other well-known CDN providers such as Akamai and Fastly also had to deal with outages and service glitches, causing banks, airlines, stock exchanges and trading platforms to go dark and halt business operations for a time.

The truth is that website and web app outages happen occasionally and usually don’t last very long. Content delivery networks and other hosting services rely on a global network of backup servers designed to limit disruption when something fails. However, when things do go down, the consequences for a website’s brand and revenue streams can be devastating.

Recent outages have alerted experts to the risk of the internet’s reliance on a relatively small number of core infrastructure providers, in other words the ‘CDN single point of failure’.

As stated by Nick Merrill, research fellow at UC Berkeley’s Center for Long-Term Cybersecurity: “CDNs are the biggest centralized point on the internet, making them a potential target for cybercriminals or government actors. If one of them goes down huge swaths of the internet could go with it.” 

How can you mitigate CDN outage-related risks?

This incident, like last year’s, teaches us that CDN providers such as Cloudflare, Akamai and Fastly remain vulnerable to outages, and that a technical hiccup at one of them takes down every website that relies solely on its services.

Websites should be available to users without lag or downtime, so limiting or, even better, eliminating the risks associated with outages is extremely important. This is where Multi CDN solutions come into play.

A Multi CDN setup, as the name implies, uses multiple CDNs from different providers simultaneously to speed up content delivery and help avoid outages and latency issues.

“Website operators have to take some of the blame for outages. More sites should consider using a Multi CDN strategy to reduce risk.” Michael Dorosh, Senior Director, Gartner

How Mlytics helped clients keep their businesses afloat

Mlytics constantly collects vast amounts of RUM (real user monitoring) and synthetic monitoring data to analyze CDN performance, including latency and availability.

This data feeds the Mlytics decision engine, which helps users automatically identify and choose the best-performing CDN anywhere, anytime. Together, these features are what we call ‘Smart Load Balancing’.
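As a rough illustration of how monitoring samples could drive such a choice, the sketch below summarizes availability and median latency per CDN and picks a winner. It is not the actual Mlytics decision engine: the `Sample` type, the `pick_cdn` function and the 99% availability threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Sample:
    """One hypothetical RUM or synthetic measurement."""
    cdn: str
    ok: bool            # did the request succeed?
    latency_ms: float   # response time; ignored for failed requests


def summarize(samples):
    """Return {cdn: (availability, median latency of successful requests)}."""
    by_cdn = {}
    for s in samples:
        by_cdn.setdefault(s.cdn, []).append(s)
    stats = {}
    for cdn, group in by_cdn.items():
        ok_latencies = [s.latency_ms for s in group if s.ok]
        availability = len(ok_latencies) / len(group)
        latency = median(ok_latencies) if ok_latencies else float("inf")
        stats[cdn] = (availability, latency)
    return stats


def pick_cdn(samples, min_availability=0.99):
    """Pick the fastest CDN among those that meet the availability threshold.

    If no CDN meets the threshold, fall back to the most available one.
    """
    if not samples:
        raise ValueError("no samples to decide on")
    stats = summarize(samples)
    healthy = {c: v for c, v in stats.items() if v[0] >= min_availability}
    if healthy:
        return min(healthy, key=lambda c: healthy[c][1])
    return max(stats, key=lambda c: stats[c][0])
```

The design choice here is to treat availability as a gate and latency as the tiebreaker, so a fast but flaky CDN never beats a slightly slower one that is actually answering requests.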

Finally, all the data is combined and displayed on the ‘Pulse’ (performance analytics) chart, which gives a holistic overview of each CDN’s performance at a given time.

This time, even before Cloudflare went down and made its official announcement, Mlytics’ 24×7 Information Security Operation Center team had already seen in these monitoring tools that Cloudflare’s availability had dropped significantly, a sign of a possible outage.

As a result, Mlytics users’ traffic was routed at an early stage from Cloudflare edge servers to other CDNs through the Smart Load Balancing solution, minimizing the possible damage to customer websites and applications.
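The rerouting step can be pictured as shifting traffic weights away from any CDN whose measured availability falls below a threshold. The sketch below is an assumption-laden illustration, not Mlytics' implementation: the `reroute` function, the CDN names, the traffic shares and the 95% threshold are all hypothetical.

```python
def reroute(weights, availability, threshold=0.95):
    """Move traffic away from CDNs whose availability is below the threshold.

    weights:      {cdn: share of traffic}, e.g. {"cdn-a": 0.5, "cdn-b": 0.5}
    availability: {cdn: fraction of successful checks}; missing CDNs count as 0
    Returns a new weights dict that sums to 1 across the healthy CDNs.
    """
    healthy = {
        cdn: w for cdn, w in weights.items()
        if availability.get(cdn, 0.0) >= threshold
    }
    if not healthy:
        # Nothing better to switch to, so keep the current split.
        return dict(weights)
    total = sum(healthy.values())
    if total == 0:
        # Healthy CDNs carried no traffic before; split evenly among them.
        return {cdn: (1 / len(healthy) if cdn in healthy else 0.0) for cdn in weights}
    # Redistribute proportionally to each healthy CDN's previous share.
    return {cdn: (healthy[cdn] / total if cdn in healthy else 0.0) for cdn in weights}
```

For example, with shares of 0.5, 0.25 and 0.25 for "cdn-a", "cdn-b" and "cdn-c", a drop in "cdn-a" availability to 0.6 would leave it with no traffic and give the other two 0.5 each.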

The chart below shows the optimization decisions made by our Smart Load Balancer, that is, which CDNs were selected during a given time frame. As shown, CloudFront and Edgeextension had several query spikes due to the Cloudflare performance drop: the system automatically replaced Cloudflare with better-performing CDNs.

Mlytics Smart Load Balancer automatically switching to the best performing CDN

On the Pulse chart below, we clearly see a drop in availability for Cloudflare in the same time frame. Aligned with the chart above, it illustrates what happened.

Cloudflare outage and availability drop
Mlytics CDN performance monitoring tool shows an availability drop for Cloudflare at the time of the incident.

Takeaway: Redundancy is key

Every CDN provider aims to deliver the most seamless user experience possible, but as recent outages show, things can, and do, go haywire.

Therefore, it is business-critical to have a solid cloud redundancy and disaster recovery plan in place to prevent any event from causing your service to go down. As stated by Michael Dorosh, website operators have to take some of the blame for outages and more sites should consider using a Multi CDN strategy to reduce risk.

At Mlytics, we help our customers eliminate the risks associated with outages and latency and maximize uptime.