Call Issues

Incident Report for Byphone

Postmortem

On Sunday 10th July, at approximately 13:00, the Global Switch data centre in London suffered another power supply failure, which was compounded by a fault on their DRUPS (diesel rotary uninterruptible power supply). As a result, we lost power to our Power Distribution Units (PDUs), and they became faulty, probably because of a power spike. Fortunately, we had the infrastructure in place to move all customer services over to the newly built, Amazon Web Services (AWS)-based hosted version of byphone. We switched the primary routes for all calls over to AWS on Sunday afternoon.

Because this was not a planned migration, we spent the following week commissioning the new platform, and live customer device registrations were affected while we made configuration changes and optimised resource allocation. On the morning of Monday 11th July, as customer traffic built up, we had capacity problems on the new BT gateway router. We quickly addressed this by balancing traffic across a few more routes. Since Monday 11th July, calls have been routed over byphone successfully.

Since then, we have seen repeated issues on the registrars, which caused devices to drop their registration and then re-register a few minutes later. These issues were very difficult to debug without further disrupting customer call traffic, so it took us a few attempts over the following week to home in on and resolve them.

We are already seeing benefits in the AWS version of byphone, as it gives us greater control over, and insight into, resource allocation. We are continuing to improve byphone.

We also had engineers on site at Global Switch last week to bring the servers there back online, and we have maintenance planned to replace and upgrade our PDUs there. More information will follow once a maintenance window has been booked.

We believe the AWS-hosted version of byphone will deliver the reliability you would expect. Our backup servers are now located in the data centres that we will continue to manage in London and Dublin.

We are very sorry for the disruptions caused to your services, most of which were beyond our control. Please get in touch with your account manager if you have any further questions.

Posted Jul 21, 2022 - 09:06 BST

Resolved

Services have been confirmed operational and stable for the past few hours.
Some customers may still be experiencing problems caused by individual configuration issues. These are not caused by this incident and will be dealt with on a case-by-case basis.
Post-mortem to follow in due course. Thank you for your patience and understanding.

Posted Jul 11, 2022 - 17:49 BST

Monitoring

We have added resources to our gateways, which has improved the quality of inbound and outbound external calls. We are continuing to monitor.

Posted Jul 11, 2022 - 12:26 BST

Update

We have seen an improvement in the number of external calls able to connect. We are seeing quality issues due to high call volume and are continuing to work on resolving them.

Posted Jul 11, 2022 - 12:11 BST

Update

We are continuing to work on a fix for this issue. We will provide a further update as soon as possible.

Posted Jul 11, 2022 - 11:42 BST

Identified

The issue has been identified and a fix is being implemented.

Posted Jul 11, 2022 - 10:38 BST

Investigating

We are aware that some users are experiencing issues with inbound and outbound calls this morning and are currently investigating.

Posted Jul 11, 2022 - 09:14 BST

This incident affected: SIP Registrar and SIP Gateway.