Categories
Security

Apps Down… Again!

Yesterday, Cloudflare experienced a widespread outage that disrupted access to numerous websites and online services across the globe. The incident, which lasted for approximately 45 minutes, affected a large portion of the internet due to Cloudflare’s central role in managing web traffic, security, and content delivery for millions of domains.

So, what happened, Bal?

The outage was traced to a configuration error during a routine network update. Cloudflare’s engineers identified that a change in the routing configuration caused a cascade of failures across several data centers. This led to a temporary loss of connectivity for many websites relying on Cloudflare’s infrastructure.

Cloudflare quickly rolled back the faulty update and restored normal operations. The company confirmed that the issue was not the result of a cyberattack or external interference but rather an internal technical misconfiguration. And that’s good – in a strange kind of way. Imagine if it was a cyberattack…!

Which websites and services were impacted?

Because Cloudflare acts as a content delivery network (CDN) and security layer for a vast number of websites, the outage had a ripple effect across the internet. Users attempting to access affected sites encountered “502 Bad Gateway” or “Connection Timed Out” errors.

Some of the major platforms and services impacted included:

  • Discord – Users reported connection issues and message delivery failures.
  • Shopify – Many online stores were temporarily inaccessible or unable to process transactions.
  • Medium – The blogging platform experienced downtime for both readers and writers.
  • Coinbase – Cryptocurrency traders faced difficulties accessing their accounts and viewing market data.
  • DoorDash – Food delivery orders were delayed or failed to process.
  • Feedly – The RSS reader service was unavailable for a short period.

Smaller websites and independent businesses using Cloudflare for DNS or DDoS protection were also affected, highlighting the company’s extensive reach across the web. And this is also why I couldn’t tweet much within that 45-minute period!

Cloudflare’s Response

Cloudflare’s engineering team provided real-time updates through its status page and social media channels. Within an hour, most services were restored, and the company issued a detailed post-incident report explaining the root cause and steps taken to prevent similar issues in the future. If you’re interested in that report – go ahead and google it.

The company emphasized improvements to its deployment process, including additional safeguards for configuration changes and enhanced monitoring to detect anomalies faster.
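For the technically curious, here is roughly what a "safeguard for configuration changes" can look like. This is only a minimal sketch of a staged rollout, not Cloudflare’s actual process: the function name `staged_rollout`, the site names and the health check are all made up for illustration. The idea is to apply a change to a small group first, check it, and undo everything if a check fails.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[Sequence[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Apply a change stage by stage; undo it everywhere on the first failed check."""
    changed: list[str] = []
    for stage in stages:
        for site in stage:
            apply_change(site)
            changed.append(site)
        # Check the whole stage before letting the change spread any further.
        if not all(is_healthy(site) for site in stage):
            for site in reversed(changed):
                rollback(site)
            return False
    return True


if __name__ == "__main__":
    live: set[str] = set()  # hypothetical record of sites running the new config
    ok = staged_rollout(
        stages=[["canary-1"], ["dc-a", "dc-b"], ["dc-c", "dc-d", "dc-e"]],
        apply_change=live.add,
        is_healthy=lambda site: site != "dc-b",  # pretend dc-b breaks
        rollback=live.discard,
    )
    print(ok, sorted(live))
```

In this pretend run the change never reaches the third group, because the second group fails its check and everything is rolled back.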

Lessons from the Outage

The incident underscored the internet’s reliance on a few key infrastructure providers. While Cloudflare’s services improve performance and security for millions of websites, the outage demonstrated how a single point of failure can have global consequences.


Global IT Outage

Friday was somewhat busy and crazy if you work in IT!

You must know by now… there was a global IT outage. Well, I say you must know – there was worldwide disruption to airlines…

Difficulty in paying for your shopping…

Basically, a faulty software update from a company called CrowdStrike caused the global IT outage, and that update likely skipped checks before being deployed.

An estimated 8.5 million Microsoft Windows devices were affected worldwide by the update from cybersecurity firm CrowdStrike, causing delays for airports, broadcasters, hospitals and businesses.

Problems came to light quickly after the latest version of CrowdStrike’s Falcon sensor software was rolled out on Friday.

The update was meant to make systems more secure against hacking, but instead caused devices to display a “blue screen of death” due to faulty code.

Shares in CrowdStrike fell considerably. And it wasn’t only them: airlines revealed a 46% fall in profit!

CrowdStrike told customers early Friday the outages were caused by “a defect found in a single content update of its software on Microsoft Windows operating systems,” according to a post on X from CEO George Kurtz. The issue was identified and isolated, and engineers deployed an update to fix the problem, Kurtz said.

I’ve been reading about it on various social media platforms. The one that really caught my eye was an engineer, up at 2am, who had a call come in about the Blue Screen Of Death… at the time, they thought it was an isolated incident; then, all of a sudden, more calls started coming in and it got serious. The good thing was, they’re kind of trained for situations like this – firstly, they focused on getting hospitals up…

But it really does make you think about how reliant we are on tech – and how one single content update caused worldwide disruption.

I’ve always said, and will always say – technology is wonderful. However, we should have backup systems or something in place if all goes wrong.
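If you like to see ideas in code, a backup system can be as simple as "try the main thing, and if it fails, try the next one". This is just a minimal sketch under my own assumptions: `first_working`, `primary` and `backup` are made-up names, and a real setup would be far more involved.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_working(providers: Sequence[Callable[[], T]]) -> T:
    """Try each provider in order and return the first result that succeeds."""
    errors: list[Exception] = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


def primary() -> str:
    # Hypothetical main system, simulated as being down.
    raise ConnectionError("primary system is down")


def backup() -> str:
    # Hypothetical fallback that still works.
    return "served from backup"


if __name__ == "__main__":
    print(first_working([primary, backup]))
```

The point is not the code itself but the habit: decide in advance what happens when the first choice is unavailable.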

Below is the latest from CrowdStrike…

The last bit I wanted to say was – if a company you’ve never heard of reaches out suggesting they can fix your computer system for you – don’t get scammed!

Efforts by CrowdStrike to make clients more secure against hacking attempts further backfired as malicious websites have begun to use the incident to publish “unofficial code” claiming to fix any ongoing issues, Australia’s cyber intelligence agency has warned.

On its website, the Australian Signals Directorate said its cybersecurity centre “strongly encourages all consumers to source their technical information and updates from official CrowdStrike sources only”.

I talk a fair amount about cyber security – but in all honesty, make sure you listen to your IT team, rely on official updates only and stay alert. These can be testing times as we’re so reliant on tech – well, not just us – but our customers, friends, etc… too.