Issue triggering Contacts via Journeys
Incident Report for Autopilot
Postmortem

Last week, several hardware failures at our cloud computing provider put extreme pressure on the servers that handle our task backlog.

While we restored the hardware quickly, the system struggled to work through the backlog, which led to several hours of journey delays, slow responses and other related issues.

To prevent this issue from happening again, we have taken the following actions:

  • Split the load of the queueing servers into three, which gives the system significantly more capacity to cope at times of peak demand.
  • Modified parts of our code to place less load on the system.
  • Begun testing a way to split the load even further, so that more important tasks such as journey and email tasks are given dedicated processing hardware.
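
As a rough illustration of the kind of load splitting described in the list above, the sketch below routes important task types to a dedicated queue and spreads everything else across three shared queues in round-robin order. It is not our production code: the `TaskRouter` class, the task type names and the in-memory queues are all hypothetical and chosen only to show the idea.

```python
from collections import deque

# Hypothetical task types that get dedicated processing (an assumption,
# modelled on the "journey and email tasks" mentioned above).
DEDICATED_TYPES = {"journey", "email"}


class TaskRouter:
    """Illustrative router: one dedicated queue plus N shared queues."""

    def __init__(self, shared_queue_count=3):
        # In-memory deques stand in for real queueing servers.
        self.dedicated = deque()
        self.shared = [deque() for _ in range(shared_queue_count)]
        self._next = 0  # index of the shared queue that gets the next task

    def submit(self, task_type, payload):
        """Place a task on a queue and return the name of that queue."""
        if task_type in DEDICATED_TYPES:
            self.dedicated.append((task_type, payload))
            return "dedicated"
        # Round-robin across the shared queues so no single one backs up.
        index = self._next
        self.shared[index].append((task_type, payload))
        self._next = (index + 1) % len(self.shared)
        return f"shared-{index}"
```

With this layout, a surge of lower-priority work such as feed items fills only the shared queues, so journey and email tasks are not stuck behind it.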

In the medium term, we have several plans close to completion that will permanently remove the chance of these issues occurring.

Posted Jun 15, 2020 - 00:53 UTC

Resolved
This incident has been resolved.
Posted Jun 14, 2020 - 20:24 UTC
Update
We are currently working through the rest of the processing backlog, including feed items. Our system is getting back to normal, and we are continuing to monitor the situation.
Posted Jun 12, 2020 - 14:06 UTC
Monitoring
Our system is getting back to normal, and we are continuing to monitor the situation.
Posted Jun 12, 2020 - 00:23 UTC
Investigating
We are currently investigating this issue.
Posted Jun 11, 2020 - 20:21 UTC
This incident affected: Application.