Both Sage and Xero suffered outages this week. Sage suffered a failure of its identity service and restored services after around two hours. Xero customers were less lucky. The outage commenced around 08.25 (BST) on July 27th and it wasn’t until 19.04 that Xero confirmed that the incident was over. Services had started to return from around 16.30.
Major Xero failure
The response to the incident indicates that it was similar in nature to the Sage failure. The Xero status page said: “We can confirm that the issue we have been experiencing has now been resolved and service has resumed. The issue was related to our login infrastructure, and was not a security incident.
“A tool used to store and trigger events in the login flow was not operating as expected and resulted in failures and errors. Our product teams worked to restore the flow of these events, and return service in a controlled way. Once again, we apologise for the inconvenience this has caused our customers.”
Xero will need to look at how it dealt with this incident and its response afterwards. In a blog published after the incident, it apologised for the outage and said it is still investigating the cause. It has not yet published an update.
Customers were not happy about the outage. Many will have wondered whether they will receive any compensation for it. Sorry to burst that bubble. The Xero Terms of Service clearly state:
“36. No compensation: Whatever the cause of any downtime, access issues or data loss, your only recourse is to discontinue using our services.”
Should Xero give users more surety? Other, larger financial management vendors do. For example, NetSuite offers a service credit if it fails to meet its 99.7% availability commitment. It should be noted that Sage appears to have a similar policy for Business Cloud Accounting, although there is no mention of compensation within its Terms and Conditions.
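To put a 99.7% commitment in context, the short calculation below compares the downtime such a target would permit with the length of the Xero outage reported above. It is only a rough sketch: the 30-day measurement period is an assumption, not a term taken from NetSuite or any other vendor, and the full 08.25 to 19.04 window is an upper bound, since services had started to return from around 16.30.

```python
from datetime import timedelta

# Outage window as reported in this article (BST, July 27th).
outage_start = timedelta(hours=8, minutes=25)
outage_end = timedelta(hours=19, minutes=4)
outage_minutes = (outage_end - outage_start).total_seconds() / 60

# Assumption: availability is measured over a 30-day period.
MINUTES_IN_PERIOD = 30 * 24 * 60
AVAILABILITY_TARGET = 0.997  # the 99.7% figure quoted above

# Downtime a 99.7% target would permit in that period.
allowed_downtime = MINUTES_IN_PERIOD * (1 - AVAILABILITY_TARGET)

# Availability for the period if this were the only outage.
achieved = 1 - outage_minutes / MINUTES_IN_PERIOD

print(f"Outage length: {outage_minutes:.0f} minutes")
print(f"Downtime allowed at 99.7%: {allowed_downtime:.1f} minutes")
print(f"Availability for the period: {achieved:.2%}")
```

Under these assumptions, the 639-minute outage is almost five times the roughly 130 minutes that a 99.7% commitment would allow in a 30-day period.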
This raises the question of what would happen if an outage lasted longer. What backups does the organisation have in place? There is currently no easy way to extract data from Xero. However, Xero does offer an article that indicates how one might accomplish this. Organisations might want to consider whether they should extract the data on a monthly or quarterly basis.
Many companies, including Enterprise Times’ parent company, use Xero and rely on it. As most companies are becoming increasingly electronic, with fewer paper records, what happens if things go wrong? It’s a worrying thought.
For users who do not engage online and may not have seen the blog or the status page, there is no obvious way for them to find out what happened. Perhaps a message on login or the dashboard might improve engagement with customers?
Incident response failure
While Xero did update its status page on at least an hourly basis, many users were likely unaware of it. The messages on the login page were at times confusing, with the system asking users to reset their passwords.
Xero did, after a while, update its status page with “Please note that passwords and logins are not incorrect. There is no need to attempt to reset your password”. For future incidents, it might have been wiser to change the login landing page to something that indicated more clearly what was happening.
Enterprise Times: What does this mean
Every SaaS company is likely to suffer an outage at some point. The cause may be within their remit or be part of a larger hosting company failure. While Xero fixed the problem, many customers will have lost almost a full working day of access.
Xero did recover service, but its response to failure was poor at best. Customers will wonder not only what caused the issue, but also whether it will happen again and what Xero will change. Hopefully, the now global company will make some changes.
It could improve communication during and after incidents. It could even offer an SLA. What it shouldn’t do is bury its head in the sand and assume that constant uptime over the next few months will mean people forget about the outage.
The internet is already recognised as a basic need for many people. SaaS solutions fall into a similar category. There will be some businesses that will have been severely affected by the incident. What will Xero do?
There are also lessons for Sage here. While it managed to restore its services relatively quickly, a plus point, it should also review how it dealt with the incident and what lessons it can learn from it. It would also be wise to keep an eye on the Xero response.
Customers should be grateful to the engineers working behind the scenes to restore service to them. Neither incident was a quick fix and they will have worked hard, possibly through the night in Xero’s case, to rectify the issue.