Global Cyber Outage Disrupts Key Industries

A major cyber outage wreaked havoc across industries worldwide, causing significant disruptions in the aviation, banking, healthcare, and retail sectors. More than 1,000 flights were cancelled, leaving travelers stranded and airports in chaos. Banks, hospitals, and stores also faced severe operational challenges, leading to widespread inconvenience and concern.

The cause of the outage has been traced to a “defect” in a recent software update from the cybersecurity firm CrowdStrike. The defect specifically affected Windows operating systems, causing widespread system failures.

CrowdStrike quickly addressed the issue, releasing a statement to clarify the nature of the problem. “This is not a security incident or cyberattack. The issue has been identified, isolated, and a fix has been deployed,” the firm said. It added: “We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website.” The statement emphasized that the outage was due to a software defect and not a malicious attack.

Microsoft said in a post on X: “We’re aware of an issue with Windows 365 Cloud PCs caused by a recent update to CrowdStrike Falcon Sensor software.” The company advised affected users to restore their Windows 365 Cloud PC to a known good state prior to the release of the update (July 19, 2024), as documented for Enterprise at https://learn.microsoft.com/en-us/windows-365/enterprise/restore-overview

Despite the rapid response, the impact of the outage has been significant. Airlines have reported extensive delays and cancellations, with airports struggling to manage the fallout. Banking services were also hit hard, with customers unable to access online banking platforms, ATMs, and other services. Healthcare systems experienced disruptions in patient care and record-keeping, while retailers faced difficulties with payment processing and inventory management.

Authorities and industry leaders are now focusing on restoring normal operations and addressing the backlog of issues caused by the outage. The incident highlights the critical importance of robust IT infrastructure and the potential risks associated with software updates.

Referring to the incident, Alois Reitbauer, Chief AI Strategist at Dynatrace, said: “Given the increasing complexity of software, all software developers and organizations are susceptible to outages. When outages do occur, organizations need the capability to pinpoint root cause and remediate immediately. AI-driven approaches have become essential for complex IT operations to deploy as manual processes cannot keep up. A power of 3 approach to AI leveraging predictive, causal, and generative AI is increasingly critical to help organizations deliver the highest availability and performance of software as well as minimize disruption to end user experience.”

James Maude, Field CTO, BeyondTrust, said: “It appears an update from Crowdstrike causes the Windows OS to crash, creating global IT systems outages that have impacted almost every industry. Impacted systems present users with the dreaded ‘Blue Screen of Death’ (BSOD), and in the worst cases, users are stuck in a crash and reboot loop. The fix appears to require physical intervention to rename or remove the update file which is responsible, making the recovery process time-consuming and complicated for remote systems.

While any piece of software can be unstable or have bugs, it is particularly an issue for security vendors such as Crowdstrike, as they have a very deep integration into the operating system in order to monitor and protect the endpoint. This means that any bugs or instability can cause the entire operating system to crash which appears to be what we have unfortunately experienced in the past 24 hours.

There are a few strategies to mitigate the risks of unstable software updates, but ultimately it starts with the vendor conducting rigorous QA in test environments that are as representative of customer environments as possible. Then, having a phased deployment process, gradually rolling out the updates, in stages, to groups of real users, to ensure the software is stable in real world environments before deploying to all users. In this case, it appears that the vendor was confident in the update and had deployed it at scale. In the coming days we should see a root cause analysis conducted to understand how this was able to happen, and most importantly, ensure that it can’t happen again.

Microsoft have been investing heavily in their own native security tooling over the past few years, having had issues with anti-virus vendors patching areas of the operating system and causing instability issues in the past. In recent years this has resulted in increased stability in the operating system, however this incident goes to show that we can’t be complacent. Microsoft need to ensure the OS remains stable in the event 3rd party software crashes and will need to work with security vendors to ensure stability on both sides.”
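The phased deployment process Maude describes can be illustrated with a short sketch. This is a simplified model under stated assumptions, not CrowdStrike's or any vendor's actual release process: the function name `staged_rollout`, the stage fractions, the failure threshold, and the `apply_update` callback are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RolloutResult:
    completed: bool                  # True if every stage passed its health check
    hosts_updated: int               # hosts that were sent the update before stopping
    halted_at_stage: Optional[int]   # index of the failing stage, or None


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> RolloutResult:
    """Push an update to growing fractions of the fleet, halting on failures.

    stage_fractions are cumulative (e.g. 0.01, 0.1, 0.5, 1.0).
    apply_update is a hypothetical callback returning True if the host
    is healthy after the update and False otherwise.
    """
    updated = 0
    for stage_index, fraction in enumerate(stage_fractions):
        # Cumulative number of hosts that should be updated after this stage.
        target = min(len(hosts), round(len(hosts) * fraction))
        batch = hosts[updated:target]
        if not batch:
            continue
        failures = sum(1 for host in batch if not apply_update(host))
        updated = target
        # Stop before the next, larger stage if too many hosts failed.
        if failures / len(batch) > max_failure_rate:
            return RolloutResult(False, updated, stage_index)
    return RolloutResult(True, updated, None)
```

In this model, with 1,000 hosts and cumulative stages of 1%, 10%, 50%, and 100%, an update that fails on every host is halted after the first 10 machines rather than reaching the whole fleet.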

As industries recover, there will likely be calls for more stringent testing and validation processes for software updates to prevent such widespread disruptions in the future.

Meanwhile, in the UAE, authorities have issued an alert urging users of CrowdStrike software to exercise caution with any recent software updates.

“We inform you that there is a technical defect in the Crowdstrike software update that may affect the electronic systems of the institutions that use it,” the UAE’s Telecommunications and Digital Government Regulatory Authority said in a post.
