In the midst of the late December holiday rush, employers were facing a thin talent market complicated by pandemic-driven uncertainty. Then, adding insult to injury, timekeeping and payroll went down for many.
On Saturday, Dec. 11, 2021, UKG, the parent company of workforce management platform Kronos, notified clients using its Kronos Private Cloud product of a "ransomware incident." As a result of the attack, employers across a swath of industries experienced a weekslong outage affecting both timekeeping and payroll.
For UMass Memorial Health, one of the largest health systems in Massachusetts, the outage had an immediate impact.
For more than a month, the organization relied on backup timekeeping methods. It was not until Jan. 27, 2022, that UMass resumed using Kronos as the timekeeping source for its payroll, and even then, the organization noted discrepancies. "The first what I would call 'clean' payroll would have been the Feb. 3 payroll," said Sergio Melgar, executive vice president and chief financial officer of the health system.
In an interview, Melgar gave HR Dive a detailed timeline of events, from the moment UMass recognized Kronos' services were down, to his communication with executives and Kronos representatives, to the eventual restoration of services. He also discussed UMass' plans for responding to similar incidents. And he discussed the lessons learned from what he said he had described to UMass executives as "the most serious problem we have ever faced."
'Hopefully it would be up in short order'
Melgar's team first became aware of the attack on Sunday, Dec. 12, the day after it occurred. He said he was part of a group that received an email indicating Kronos was down. Because Melgar oversees UMass' finance and IT departments, the outage directly affected areas of the organization under his leadership.
"The system can go down at other times for different reasons," he said. "It didn't necessarily mean anything that the system was down."
"Hopefully," they thought, "it would be up in short order."
But when another email on Sunday confirmed that things were still down, "that was not a good sign," Melgar said. After making some calls Sunday afternoon, he confirmed that Kronos was the source of the outage, not UMass. Kronos was on the phone with UMass' IT department that same day.
A timeline of the Kronos cybersecurity attack and UMass' response
Saturday, Dec. 11, 2021
UKG, the parent company of workforce management platform Kronos, notifies clients of a "ransomware incident."
Thursday, Dec. 16, 2021
UMass runs payroll for the pay period ending Dec. 11, using hours-worked data from a previous period.
Jan. 4, 2022
A labor union representing some UMass employees advises members to keep a record of hours worked.
Jan. 27, 2022
UMass resumes using Kronos as the timekeeping source for its payroll, but discrepancies persist.
Feb. 3, 2022
UMass runs its first "clean" payroll since the attack.
"I was hoping it would be an infrastructure problem [or] that they were having some certain hardware issues," Melgar said. "I understood that if it was not a hardware issue, that the alternative is a cyber software problem, in which case [it] may be the worst of all situations."
Over the course of the day, however, it became clearer what UMass was facing. Kronos informed UMass that it had shut down its system because it had noticed some irregularities, according to Melgar. "I would say I had pretty high confidence that it was a cyberattack by the end of Sunday," he said.
The backup plan
UMass' immediate attention turned to processing payroll for the pay period ending Dec. 11, the day UKG notified clients of the incident. UMass runs payroll weekly, Melgar explained, so it would need to pay employees the following Thursday, Dec. 16.
"Effectively, we were trying to understand, how quickly can you [get] me back up? That was the first thing," Melgar said of his initial outreach to Kronos. "And so I needed to know, are you going to have a system up? And they basically were telling us no, the system is not going to be up."
UMass had to improvise a way to run payroll for more than 16,000 employees without data on what hours they worked. Few options were available, Melgar said. The health system ultimately took the last finished payroll it had on record and duplicated it, with some adjustments for staff hires and departures.
Prior to the outage, UMass workers would clock in either manually or remotely, through an app. Kronos would gather that information, then transmit it back to UMass upon the completion of payroll so the employer could make adjustments. UMass would then transmit the information to its enterprise resource planning, or ERP, system, which runs payments.
Essentially, while UMass could still run payroll on its own, doing so would involve some degree of guesswork. "At that point, I knew we could pay people because we actually went ahead and did the effectively cloned payrolls on the 16th," Melgar said. "And we [knew] we could continue to do that. But to get an accurate payroll, I needed Kronos to be active."
UMass knew these manual procedures were designed as short-term fixes, not long-term solutions, Melgar said. That's because of the complexity of the typical healthcare payroll; it's "maybe the most complicated payroll that exists," he continued. "And it can be incredibly cumbersome, especially if you're doing it weekly."
Asked whether UMass employees were still clocking in using an app or writing down their clock-in and clock-out times manually, Melgar said the organization took an "all of the above" approach. Employees were asked to record those times as often as possible and write them down on paper so that officials had a source to reference when they went back to fix any issues.
All the while, Melgar was unaware of the outage's true extent in the broader business community: "The one thing I wish I knew a little bit better early on was the totality of the problem across the country and the world," he said. "It was a while before we found out that there were thousands of employers that were put in this situation."
That lack of awareness meant that Melgar and his team could not communicate to employees the magnitude of the problems they were experiencing. Employees, he said, began to think UMass had failed them.
In a Jan. 4 blog post, SHARE, a labor union representing some UMass employees, said staff had reported "over 11,000 paycheck errors." SHARE advised members to keep track of hours themselves in addition to documenting them for UMass.
"It's natural [that] people were looking inward and thought, 'Why aren't you doing something different?' — hoping that we would have the immediate solution," Melgar continued. "I think we were trying to do all of the right things in as quick a time frame as possible."
Fixing discrepancies: 'It can become quite a mess'
With Kronos functionality restored in late January, UMass set about fixing discrepancies in the restored data. To illustrate what his team found, Melgar explained the different categories into which the health system's employees fall. Roughly one-third of UMass workers are classified as exempt employees, he said. The other two-thirds are nonexempt: either hourly workers or hourly, variable-pay employees who work different shifts at different times.
"In a complex environment like ours, people could have shift differentials," Melgar said. "You have overtime that kicks in at different points in time. You could have a bonus for shifts. You could have all the different variables that affect the pay that somebody gets. And if you don't have the data, you cannot calculate it."
It was one thing to fix discrepancies for employees on variable schedules, but even calculations for exempt employees could be problematic, Melgar explained. Because the outage occurred during a holiday period, such employees were potentially using accrued paid time off or vacation time. If those hours were deducted from the wrong balance, workers' leave balances could end up incorrect.
Exempt employees also may have taken unpaid leave during that time. "I know this for a fact, so I'm not giving you a hypothetical," Melgar continued. "Even though they were exempt, [some] actually were paid short on their check because they happened to have had only a partial week the weeks that we ended up [cloning]."
Melgar cited the health system's complex payroll situation among the reasons he insisted that UMass be "at the front of the line" for restoration. He said he felt "pretty confident" UMass was in fact given that deference. "Let's say, if there were 2,000 clients, I'm pretty confident that we were within the first 10 that got their system back. Not fully, but at least in a usable format."
How 'joint leadership,' 'joint accountability' helped
Within the last five years, UMass fully implemented Epic, a clinical system used by healthcare providers. The process took some two to three years to complete, Melgar said, and it involved heavy collaboration among the organization's IT, HR and finance departments. Melgar said he believes this experience prepared UMass staff to coordinate around objectives like the response to the Kronos outage.
"In order for either the clinical or for the revenue side to have optimal performance, they have to have full integration and cooperation with the IT folks so that, effectively, everybody has a common, understood responsibility for the outcomes," he continued. "What we had basically was joint leadership that accepted joint accountability for the process."
After the outage, Melgar got together with UMass' CIO and senior vice president of finance for joint meetings, later adding other staff to their calls. They created a resource group around the incident that pulled from the IT, finance and HR departments. Members of the group worked side by side in call centers to solve the problem.
They worked thoughtfully and collaboratively, Melgar said. "We were making decisions that, in retrospect, I think would be considered the best option given the difficult situation we were in."
Planning for future incidents
Despite the ransomware attack, UMass is still a Kronos customer, Melgar said. "We have to be."
To replicate the system would take years, Melgar explained. "Because of the complexity of the payroll, you have to basically have another software implementation. It would literally take two years to do."
Because he understands that UMass' system was restored relatively quickly, Melgar said he believes Kronos provided its share of support. "Do I wish it was a week later or two weeks later as opposed to weeks later? Yeah, absolutely. But not knowing how bad the damage was specifically, because I'm not there, I don't know whether I can say if they did absolutely their best, or they didn't, without having that information."
Asked how UMass is planning to respond to similar events in the future, Melgar said the health system is working on an upgrade to its ERP system, which includes a timekeeping component that could serve as a backup. But it will take two years before that system is up and running.
And even then, it won't be perfect, Melgar said, again noting the complexity of UMass' payroll. But it's better than nothing: "If we have it as a backup at least, we might be able to get to it a little bit smoother and not necessarily clone a payroll, which is part of what creates the problems that we ended up having to clean up."
For employers that want to prepare for such exigencies, Melgar recommended a focus on joint leadership. Executives in HR, IT, finance or similar operational roles may want to bring different groups together and inform leaders about the enormity of such problems when they occur. He also said executives need to advocate for resolving problems and for supporting employees.
"At the end of the day, ultimately you need to be able to support the employee so that they feel confident that they're getting paid correctly," Melgar said.
Executives, he continued, need to know that employees may not understand the extent of incidents like the Kronos outage. Leaders may attempt to convey that message to employees, but this is not an easy task.
"There's some employees that still believe that there's a problem, or that we failed them," Melgar said. "You're not going to be able to convince everybody. That's just the nature of human beings."