- Facebook and its other apps, including Instagram, WhatsApp and Oculus VR, suffered a global outage lasting about 6 hours due to network configuration issues at Facebook data centres.
- Billions of users worldwide were affected by the blackout, and even Facebook employees reported losing the communication services they rely on for work.
- A team of data engineers had to be dispatched to a data centre in California to fix the issues manually, after which services slowly resumed.
At around 11:40 AM Eastern Time (ET) on Monday, a massive outage hit the social networking platform Facebook, which in turn brought down all of its apps, including Instagram and WhatsApp. The outage, which lasted around 6 hours, is noted to be the second-longest on the social media platform, after a similar incident in 2019 that took out the servers for around 24 hours. The global blackout affected not only billions of users but also millions of advertisers and business owners who use the platform for commercial purposes.
The cause behind the global Facebook blackout
Experts believe that the outage stemmed from a problem with BGP (Border Gateway Protocol), a core internet routing technology. BGP is the protocol that networks use to tell one another which addresses they can reach, so that traffic can be directed across the internet. The internet is divided into many independently operated networks known as autonomous systems. These autonomous systems exchange routing information with each other so that users' data can find its way to its destination, and BGP is the protocol that makes this exchange possible.
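To make the idea concrete, here is a minimal sketch of how a route announcement might spread between autonomous systems. The network names, the topology and the address block are hypothetical, and the shortest-path rule is a simplification: real BGP routers choose routes using operator policies that this toy model leaves out.

```python
from collections import deque

# Hypothetical topology: each autonomous system (AS) and the neighbours
# it exchanges routing information with. All names are illustrative.
LINKS = {
    "AS_ORIGIN": ["AS_TRANSIT"],
    "AS_TRANSIT": ["AS_ORIGIN", "AS_ISP_A", "AS_ISP_B"],
    "AS_ISP_A": ["AS_TRANSIT"],
    "AS_ISP_B": ["AS_TRANSIT"],
}


def propagate(origin, links):
    """Return {AS: path of ASes leading to `origin`} after an announcement."""
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for neighbour in links[current]:
            # An AS keeps the first (shortest) path it hears and ignores
            # later ones, which also prevents routing loops in this model.
            if neighbour not in paths:
                paths[neighbour] = [neighbour] + paths[current]
                queue.append(neighbour)
    return paths


def build_tables(announcements, links):
    """Build each AS's routing table from (origin, prefix) announcements."""
    tables = {asn: {} for asn in links}
    for origin, prefix in announcements:
        for asn, path in propagate(origin, links).items():
            tables[asn][prefix] = path
    return tables


# 203.0.113.0/24 is an address block reserved for documentation.
announced = build_tables([("AS_ORIGIN", "203.0.113.0/24")], LINKS)
withdrawn = build_tables([], LINKS)  # the origin stops announcing its block

print(announced["AS_ISP_A"])
print(withdrawn["AS_ISP_A"])
```

While the announcement is in place, every network holds a path to the address block. Once it is withdrawn, the tables are empty and the other networks have no way of knowing where to send traffic for those addresses.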
Facebook, for its part, has not given a detailed explanation for the failure of its services last Monday. The company's Chief Technology Officer (CTO), Mike Schroepfer, appeared to blame it on “network configuration issues”. According to outside sources, Facebook's services were disrupted by a faulty change to the company's BGP configuration.
The company's backbone routers, which connect all of its data centres, were the first to be affected by the change, and coordination between its networks broke down. This in turn interrupted the flow of information at each of the data centres that keep Facebook's services running. The net result of this cascade was an outage that extended beyond Facebook to its other apps: the social media platform Instagram, the messaging service WhatsApp and Oculus VR, which produces virtual reality headsets.
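The cascade described above can be pictured as a chain of dependencies. The sketch below is only an illustration: the component names and the dependency map are assumptions made for the example, not Facebook's actual architecture, which the company has not detailed.

```python
# Hypothetical dependency map: each component lists what it relies on.
DEPENDS_ON = {
    "backbone": [],
    "data_centres": ["backbone"],
    "Facebook": ["data_centres"],
    "Instagram": ["data_centres"],
    "WhatsApp": ["data_centres"],
    "Oculus VR": ["data_centres"],
}


def affected(failed, depends_on):
    """Return every component that goes down when `failed` goes down."""
    down = {failed}
    changed = True
    while changed:
        changed = False
        for component, dependencies in depends_on.items():
            # A component fails as soon as anything it relies on has failed.
            if component not in down and any(d in down for d in dependencies):
                down.add(component)
                changed = True
    return down


print(sorted(affected("backbone", DEPENDS_ON)))
print(sorted(affected("Instagram", DEPENDS_ON)))
```

In this model a fault in a single app stays contained, while a fault in the backbone takes down everything built on top of it, which matches the pattern reported during the outage.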
Effects of the outage
More info on the outage today: https://t.co/D54G0PLaqk
— Mike Schroepfer (@schrep) October 5, 2021
Facebook CTO Mike Schroepfer took to Twitter to apologise as the entire suite of Facebook's applications failed to load globally on Monday afternoon. In his tweet, Schroepfer apologised to the families and individuals who use Facebook and its other apps to stay connected. He also apologised to the owners of small and large businesses who use the platform and its advertising service to direct customers to their store websites.
*Sincere* apologies to everyone impacted by outages of Facebook powered services right now. We are experiencing networking issues and teams are working as fast as possible to debug and restore as fast as possible
— Mike Schroepfer (@schrep) October 4, 2021
However, it wasn't only the general public that bore the brunt of the disruption. Facebook's own employees lost their communication tools as well, as the six-hour halt broke the company's internal communication system. Facebook workers use an internal version of the app to share information with one another. This version is not accessible to the general public, and it also went down during the blackout. Employees couldn't use their official work accounts either, and because those accounts were linked to other important services such as Google Docs and Zoom, these too were completely out of reach.
Because of the sensitive nature of its data, Facebook doesn't allow employees to use personal accounts or accounts not authenticated by Facebook at work, so workers were left in a communication black hole. They could only use their Microsoft Outlook accounts to email information when required, while some switched to applications such as Discord and FaceTime on their phones to communicate. To make matters worse, employees reported being unable to enter their own offices and conference rooms because the badge-identification system, which is linked to the network, had failed.
Restoration of services
At around 6:00 PM ET, after nearly six hours of outage, Facebook and its suite of apps started coming back online. Many users around the world tweeted that they could use most of the applications again. For Facebook's task force of data engineers, however, getting to that point had not been easy.
Because of the changes to the configuration settings at the data centre level, the firm's engineers were unable to fix the bug remotely. The lack of remote access, combined with a shortage of on-site personnel at the individual data centres due to pandemic restrictions, kept the services down for longer. Headquarters then dispatched a team of data engineers to a data centre in California to resolve the issue in person, after which services slowly started returning to normal.