Date of the incident: July 4th, 2022 14:00 CET
Duration: 25 calendar hours
Affected services: Job processing and imposition, both of which took longer than usual
Issue timeline:
- July 4th 14:00 CET - after a cloud certificate renewal is missed, the number of microservices stops scaling up, causing processing performance to decrease. Order backlog starts increasing.
- July 4th afternoon/night - several attempts to manually start new microservices fail. The imposition microservice becomes the main bottleneck.
- July 5th morning - DevOps begins troubleshooting and starts checking system certificates.
- July 5th midday - an expired certificate is found. It is updated on July 5th at 14:00 CET, which immediately allows microservices to scale up properly and causes the backlog to decrease rapidly.
- July 5th afternoon - several actions are taken to review the backlog, in particular jobs/orders that may need to be re-run or re-fetched.
Root cause:
Due to a problem with a cloud service certificate renewal, Site Flow services could not scale up as usual to manage the job processing workload. As a consequence, the system performance was degraded.
Path forward:
Necessary measures have been taken to prevent the certificate renewal from failing again, including improvements to the certificate auto-renewal process.
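This report does not describe the actual auto-renewal process. As an illustration only, one common safeguard is an early-warning check that flags a certificate well before it expires. The sketch below assumes a certificate reachable over TLS; the function names and the 30-day threshold are hypothetical and not taken from the Site Flow setup.

```python
import socket
import ssl
from datetime import datetime, timezone

# Hypothetical threshold: warn when fewer than this many days remain.
RENEWAL_WARNING_DAYS = 30


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the 'notAfter' field of the certificate served by host:port.

    If the certificate has already expired, the handshake itself raises
    ssl.SSLCertVerificationError; a caller should treat that as an alert too.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and the certificate expiry time."""
    expires_at = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires_at - now).total_seconds() / 86400


def needs_renewal(
    not_after: str, now: datetime, warning_days: int = RENEWAL_WARNING_DAYS
) -> bool:
    """True when the certificate expires within the warning window."""
    return days_until_expiry(not_after, now) <= warning_days
```

A scheduled job could call `needs_renewal(fetch_not_after(host), datetime.now(timezone.utc))` for each monitored endpoint and raise an alert when it returns true, so that a missed renewal is noticed before scaling is affected.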
Terms/Glossary
- “Maintenance Event” means maintenance of the Services that requires their interruption;
- “Scheduled Maintenance” means a Maintenance Event in respect of which HP has given the Customer at least twenty-four (24) hours prior written notice;
- “System degradation” means that the Customer is unable to use Site Flow as usual and their business is being impacted, but the situation is not yet an Outage;
- “Incident” means any set of circumstances resulting in an Outage;
- “Outage” means that the Customer is unable to access all parts of the Site Flow Subscription service via both API and web-browser log-in, AND all transmitted orders directed to the Customer’s Site Flow account are not being acknowledged (i.e. the entire Site Flow service is “down”);
- “Working Hour” means the hours between 09:00 and 17:00 local time, Monday through Friday, excluding national and HP-designated holidays;
- “Calendar hours” are regular hours counted around the clock (24x7). How they correspond to working hours depends on the Customer’s time zone.