Sporadic platform latency
Incident Report for Flash
Postmortem

FLASH OS SYSTEM POST-MORTEM

The Flash On-Call and Site Reliability teams monitored through the evening, night, and early morning rush hour today without issue. All systems remain healthy.

A detailed incident summary from this weekend is available here.

Thank you.
Anthony Broad-Crawford, CTO

Posted Aug 14, 2023 - 08:32 CDT

Resolved
FLASH OS SYSTEM UPDATE

The teams have monitored through the evening, night, and early morning without issue. All systems are healthy.

We will publish a detailed incident response document shortly, targeting no later than Monday at 9 am CT. Please don't hesitate to contact us and let us know if you are experiencing any issues at your site.

Thank you.

Anthony Broad-Crawford, CTO
Posted Aug 13, 2023 - 07:50 CDT
Monitoring
FLASH OS SYSTEM UPDATE

I wanted to update you on our platform's performance and where we stand now.

· The system should now be returning fully to normal, with no further sporadic latency.

· Working with our partner, Microsoft, we applied several patches to our system to restore typical performance characteristics.

· The fixes focused primarily on third-party vendors.

The team will continue to monitor through the evening and into tomorrow morning.

We will provide more updates should we see a change in site performance or latency. However, at this time, that is not anticipated.

We will publish a detailed incident response document shortly, targeting no later than Monday at 9 am CT.

Please don't hesitate to contact us and let us know if you are experiencing any issues at your site.

Thank you, and our apologies.

Anthony Broad-Crawford, CTO
Posted Aug 12, 2023 - 17:22 CDT
Update
We are continuing to work with Microsoft to investigate the issue and reduce system latency.
Posted Aug 12, 2023 - 14:49 CDT
Investigating
FLASH OS SYSTEM UPDATE

I wanted to provide an update on our platform’s performance, what happened yesterday, and where we stand today:

· We discovered Microsoft installed a routine patch on their cloud servers yesterday.

· Unfortunately, their patch conflicts with another piece of their own technology, and it is impacting our site performance.

· Internally, Flash did not perform any site maintenance at or around this time, nor any maintenance last week.

· Believing the Microsoft patch was the root cause of the issues, we engaged with Microsoft and have since rolled back their patch to begin regaining platform stability.

· We monitored through the entire night and observed that the site is taking traffic as normal but experiencing occasional latency (taking seconds when it usually takes under a second).

· We remain actively engaged with Microsoft to resolve any remaining latency.

· Other than occasional slowness, the platform should be performing as expected.

We will continue to provide more updates as we gain additional insight. Please don’t hesitate to contact us and let us know if you are experiencing any issues at your site.

Thank you, and our apologies.

Anthony Broad-Crawford, CTO
Posted Aug 12, 2023 - 10:05 CDT
This incident affected: FlashValet and FlashPARCS.