Incidents are unforeseeable events that disrupt normal business operations, and they are inevitable in complex systems that must be up and running 24/7. This is why it’s important to prepare and train people to handle incidents in a timely and organized manner. At Airbnb, we use a service-oriented infrastructure made up of many interconnected services, each managed by a small team. Although each incident is unique, we follow the same procedure for detecting, escalating, managing, and resolving incidents. Quickly figuring out which service is in trouble and whom to page is paramount to timely incident resolution.

We found that our teams spent a lot of time switching between applications such as Slack, PagerDuty, and Jira to raise an incident, page responders, and provide context. To resolve incidents faster, we developed an incident management bot, a centralized automation tool for incident management. Our goal was to centralize incident management in Slack: everyone at Airbnb is familiar with Slack and has access to it, and it’s easy to bring people and resources together in an incident channel.