Date of the incident: July 4th, 2022 14:00 CET 


Duration: 25 calendar hours


Affected services: Job processing and imposition, both of which took longer than usual


Issue timeline:

  • July 4th 14:00 CET - after a cloud certificate renewal is missed, microservices stop scaling up, causing processing performance to decrease. The order backlog starts increasing.
  • July 4th afternoon/night - several attempts to manually start new microservices fail. The imposition microservice becomes the main bottleneck.
  • July 5th morning - DevOps begins troubleshooting by checking system certificates.
  • July 5th midday - an expired certificate is found. It is updated on July 5th at 14:00 CET, which immediately allows microservices to scale up properly and the backlog to decrease rapidly.
  • July 5th afternoon - several actions are taken to review the backlog, in particular jobs/orders that may need to be re-run or re-fetched.

 

Root cause:

Due to a problem with a cloud service certificate renewal, Site Flow services could not scale up as usual to manage the job processing workload. As a consequence, the system performance was degraded.

Path forward:

Necessary measures have been taken to prevent the certificate renewal from failing again. These include improving the certificate auto-renewal process.
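
As an illustration only, one common safeguard alongside auto-renewal is an expiry check that raises a warning well before a certificate lapses. The sketch below is not the Site Flow implementation: the function name, the 30-day warning window, and the sample dates are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical warning window; the report does not specify one.
WARN_BEFORE = timedelta(days=30)


def certificate_status(not_after, now, warn_before=WARN_BEFORE):
    """Classify a certificate by its expiry time.

    Both datetimes must be timezone-aware so they can be compared safely.
    """
    if now >= not_after:
        return "expired"
    if not_after - now <= warn_before:
        return "expiring_soon"
    return "ok"


# Illustrative expiry date only, not taken from the actual certificate.
expiry = datetime(2022, 7, 4, 12, 0, tzinfo=timezone.utc)

for check_time in (
    datetime(2022, 6, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2022, 6, 10, 12, 0, tzinfo=timezone.utc),
    datetime(2022, 7, 5, 12, 0, tzinfo=timezone.utc),
):
    print(check_time.date(), certificate_status(expiry, check_time))
```

Run on a schedule, a check of this kind would flag the certificate as "expiring_soon" weeks ahead of time, leaving room for a manual renewal if the automatic one fails.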


Terms/Glossary:

  • “Maintenance Event” means maintenance of the Services that requires their interruption;
  • “Scheduled Maintenance” means a Maintenance Event in respect of which HP has given the Customer at least twenty-four (24) hours' prior written notice;
  • “System degradation” means that the Customer is unable to use Site Flow as usual, so its business is impacted, but the situation is not yet an Outage;
  • “Incident” means any set of circumstances resulting in an Outage;
  • “Outage” means that the Customer is unable to access all parts of the Site Flow Subscription service via both API and web-browser log-in, AND all transmitted orders directed to the Customer’s Site Flow account are not being acknowledged (i.e. the entire Site Flow service is “down”);
  • “Working Hour” means the hours between 09:00 and 17:00 local time, Monday through Friday, excluding national and HP-designated holidays;
  • “Calendar hours” are regular clock hours covering every hour of every day (24x7). How they correspond to working hours depends on the Customer's timezone.
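
To make the difference between calendar hours and working hours concrete, the sketch below counts both for the interval between the two 14:00 timestamps in the issue timeline. It is a simplified illustration: the function names are hypothetical, the timestamps are treated as the local time of a customer in the same timezone, and holidays are ignored.

```python
from datetime import datetime, timedelta

WORK_START, WORK_END = 9, 17  # 09:00-17:00, per the "Working Hour" definition


def calendar_hours(start, end):
    """Elapsed clock hours between two local timestamps."""
    return (end - start).total_seconds() / 3600


def working_hours(start, end):
    """Hours of [start, end) that fall on Monday-Friday, 09:00-17:00.

    Simplification: national and HP-designated holidays are not excluded.
    """
    total = timedelta()
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day < end:
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            opens = day + timedelta(hours=WORK_START)
            closes = day + timedelta(hours=WORK_END)
            overlap = min(end, closes) - max(start, opens)
            if overlap > timedelta():
                total += overlap
        day += timedelta(days=1)
    return total.total_seconds() / 3600


start = datetime(2022, 7, 4, 14, 0)  # Monday
end = datetime(2022, 7, 5, 14, 0)    # Tuesday
print(calendar_hours(start, end))  # 24.0
print(working_hours(start, end))   # 8.0
```

For a customer in a different timezone, the same interval would be converted to their local time first, which can change the working-hour count.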