Microsoft has patched a bug that caused Exchange Server 2016 and 2019 to queue all emails rather than deliver them. The coding error relates to a date check problem in how Exchange stores all dates after December 31, 2021. Complicating matters, the error message points the finger, wrongly, at the Microsoft Scan Engine.
The error is caused by the way the Scan engine stores numbers. It converts dates into the format YYMMDDHHMM and stores them as a signed 32-bit number. The problem is that the largest number it can store is 2147483647. That means it can only deal with dates up to and including December 31, 2021. When the clock ticks over to 2022, the first two digits are 22, so a value such as 2201010001 exceeds that limit and breaks the code.
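The overflow can be illustrated with a short Python sketch. It is not Microsoft's code: the helper names `encode_yymmddhhmm` and `fits_in_int32` are hypothetical, and it assumes the value is held in a signed 32-bit integer, as the limit quoted above suggests.

```python
from datetime import datetime

# Largest value a signed 32-bit integer can hold (assumed storage type).
INT32_MAX = 2**31 - 1  # 2147483647
INT32_MIN = -(2**31)


def encode_yymmddhhmm(moment: datetime) -> int:
    """Pack a timestamp into a YYMMDDHHMM integer (hypothetical helper)."""
    return int(moment.strftime("%y%m%d%H%M"))


def fits_in_int32(value: int) -> bool:
    """Return True if the value fits in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


# Last minute of 2021 versus the first minute past midnight in 2022.
last_2021 = encode_yymmddhhmm(datetime(2021, 12, 31, 23, 59))
first_2022 = encode_yymmddhhmm(datetime(2022, 1, 1, 0, 1))

for label, value in (("2021", last_2021), ("2022", first_2022)):
    print(label, value, "fits" if fits_in_int32(value) else "overflows")
```

Any timestamp whose two-digit year is 22 or higher produces a value above the 32-bit ceiling, which is why the failure appeared the moment the year changed.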
With the malware engine not loading, the Exchange Server fails to work. If it cannot scan emails for malware, it won’t deliver them. Cue lots of users sending test messages to see if their email is broken. This is likely to be followed by emails to the support teams who, due to the error, won’t get them. That is likely to lead to increased calls to IT support.
What has Microsoft provided to fix this?
Microsoft has provided a note on the Exchange Team Blog. It details what errors administrators can expect to see in the Application event log on the Exchange Server. They are events 5300 and 1106 (FIPFS).
There are two fixes – automated and manual.
- Automated: To use this solution, users need to download a script linked from the blog. They then have to set the execution policy for PowerShell scripts before running the script on every Exchange Server in their organisation.
- Manual: Customers who don’t want to download and run a script can apply the changes manually to each Exchange Server. The steps required to do this are also detailed in the blog. They involve a version check of the Exchange Server, deletion of several files, application of the update, and a restart.
Irrespective of which option customers choose, this is not a quick fix. The blog states: “Implementation of the solution requires customer actions, and it will take some time to make the necessary changes, download the updated files, and clear the transport queues.”
What will concern some customers is that the blog has changed several times since it was first issued a couple of days ago. Several of those changes relate to what is needed to apply the changes. It makes it seem that the blog was rushed out without enough checking.
Enterprise Times: What does this mean?
Coding errors happen. In this case, it was wholly avoidable. It’s hard to see why the developer who wrote this code would not have realised it would fail in 2022. Additionally, Microsoft’s failure to detect this in testing is a surprise. It seems that once code is written and passes a syntax test, it is ignored. Otherwise, this would have been updated by now.
In solving the problem, it is not clear if Microsoft has rewritten the code to store dates as 64-bit numbers or just fudged the way the date is stored.
Reading through the comments at the bottom of the blog shows that many people have struggled to apply the patches successfully. It will be interesting to see if those offering Managed Exchange services offer customers any rebate for failing their SLA or if they can get away with just blaming Microsoft. Given that the first version of the blog and patch was released on January 1, any customers still suffering are likely to invoke a breach of SLA.