The Wannacry and NotPetya bug - CVE-2017-0144 SMB Remote Code Execution
I've seen a lot of descriptions of the bug that Wannacry and now NotPetya leveraged to worm their way into the spotlight, but most of them are pretty vague. Descriptions range from "logic error" and "buffer overflow" to Trend Micro's post, which called out the actual bug: a casting error. But even the Trend Micro post, which went into pretty good detail, didn't show the code the error resides in, only a high-level disassembly.
There also seems to be some disagreement on where the bug resides in Windows 10. One of the first people to analyze the Wannacry bug and write it up posted his analysis to Reddit and was excoriated in the comments because of an error in where he placed the bug in Windows 10. Long story short, someone insisted very strongly that his analysis was wrong, and he eventually took the analysis down, changed its URL, and posted an erratum regarding the location of the vulnerable code path. The revised analysis can be found at https://zerosum0x0.blogspot.com/2017/06/eternalblue-exploit-analysis-and-port.html
But his analysis was mostly correct. He said there was an error in SrvOs2FeaListSizeToNt involving an attacker-controlled DWORD value that is re-calculated and then re-written as a WORD-sized value. This is 100% true, as I will show you. He also said that SrvOs2FeaListSizeToNt is inlined in the Windows 10 version of srv.sys. This is also 100% true, and I will show you that as well.
His only error was choosing a screenshot of the wrong code to show the inlining in Windows 10, and that was enough to make people distrust his analysis.
So let's look at the bug. Keep in mind I'm not that familiar with SMB and I'm only doing static analysis, no live debugging, so generally I don't know what these data structures or members are supposed to be. Things may be named weirdly, sorry!
First, let's look at the patch that fixed the bug:
The image with red highlighting is the pre-patch binary; the image with green highlighting is the post-patch binary. The change is that the subtraction now uses DWORD-sized registers instead of WORD-sized ones, and the target it writes to is widened from a WORD to a DWORD as well. Seeing just the patch by itself, with no other knowledge of this function, suggests an integer overflow that was fixed by using a DWORD-sized integer rather than a WORD-sized one. But that's not *quite* what's going on here.
If we take a step back and look at the beginning of the function, we can get an idea of what's going on. rcx (arg1) points to some structure. Its first field, *arg1, is loaded into esi as a DWORD-sized value (this is important), and the address arg1 + 4, just past that field, is put into rbx and r10. If arg1 + 4 >= arg1 + *arg1, that is, if the first entry already starts at or beyond the point that the first field marks, the function takes a separate path.
So arg1's structure has at least two fields so far: one that is clearly an offset to some point within the structure, and one that appears to be variable-sized data related to that offset. Since the function name is SrvOs2FeaListToNt, I'm assuming this structure is a FeaList and that it contains Fea structures.
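To make that first block concrete, here's a rough C sketch of the check, assuming the list is a little-endian byte buffer. The function name `fea_list_has_no_entries` and its `len` parameter are hypothetical, and I'm assuming the `>=` branch is the "nothing to walk" case. The sketch compares offsets rather than pointers, since both sides share the base arg1 and this avoids forming a wild pointer from a huge attacker-supplied value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian DWORD (x86 byte order) from a possibly unaligned buffer. */
static uint32_t read_dword_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical mirror of the function's first block:
 *   esi     = *arg1      (leading field, read as a full DWORD)
 *   rbx/r10 = arg1 + 4   (first Fea entry)
 *   branch if arg1 + 4 >= arg1 + *arg1
 * Returns 1 when the first entry starts at or past the marked end.
 */
static int fea_list_has_no_entries(const uint8_t *list, size_t len)
{
    assert(len >= 4);                       /* need the leading DWORD */
    uint32_t offset = read_dword_le(list);  /* esi: full 32-bit read */
    uint32_t first_entry = 4;               /* rbx / r10, as an offset */
    return first_entry >= offset;
}
```

Note the full-width read: a leading field of 0x00010000 is taken as 65536 here, which becomes important once the value is written back.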
struct FeaList {
    DWORD offset;     /* read as a full DWORD into esi */
    BYTE  entries[];  /* Fea structures, starting 4 bytes in */
};
After the first block, we see some looping logic that walks the structures inside arg1. Each entry seems to contain a BYTE-sized and a WORD-sized size value, followed by data matching those sizes.
struct Fea {
    BYTE unk;        /* 0x7F is an invalid value for this field, maybe flags? */
    BYTE blob1Size;  /* BYTE-sized length */
    WORD blob2Size;  /* WORD-sized length */
    BYTE blobs[];    /* blob data sized by the two fields above */
};
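Here's a hedged C sketch of decoding one of these entries using the field names guessed above. It assumes the two blobs immediately follow the 4-byte header with no terminator or padding between them; the real format may include extra bytes (for example a terminator after the first blob), so `FeaHeader` and `fea_entry_size` are illustrative names, not the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names follow the guesses above; they are not official names. */
typedef struct {
    uint8_t  unk;        /* 0x7F treated as invalid in the disassembly (flags?) */
    uint8_t  blob1Size;  /* BYTE-sized length */
    uint16_t blob2Size;  /* WORD-sized length */
} FeaHeader;

#define FEA_HEADER_SIZE 4u

/* Decode the 4-byte header at p (little-endian) into *out and return the
   total entry size, assuming the blobs are packed right after the header. */
static uint32_t fea_entry_size(const uint8_t *p, FeaHeader *out)
{
    assert(p != NULL && out != NULL);
    out->unk       = p[0];
    out->blob1Size = p[1];
    out->blob2Size = (uint16_t)(p[2] | (p[3] << 8));
    return FEA_HEADER_SIZE + out->blob1Size + out->blob2Size;
}
```

Because blob2Size is a WORD, a single entry can claim up to 4 + 255 + 65535 bytes, so a list big enough to need a DWORD-sized offset is easy to build.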
I know it's hard to see because the arrows got cut off, but if the walk stops early (when an entry would run past the end of the list), the address of that last structure is used to calculate a new offset value, which is stuffed back into arg1->offset. So now we know what the offset is: it marks the end of the list, and after the walk it points at the last entry reached. During this calculation, the total size needed to store the elements after they're translated is accumulated in r11, and that total is used to allocate memory a little later.
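A rough sketch of that walk, under the same layout guesses. `walk_fea_list`, `FeaWalkResult`, and especially `translated_size` are hypothetical: I don't know the real post-translation size formula, so rounding up to 4 bytes is just a placeholder. Unlike the real code, the sketch clamps the stated offset to the buffer length so it never reads out of bounds, and it returns the new offset instead of storing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t new_offset;   /* value that would be stored back into arg1->offset */
    uint32_t needed_size;  /* running total, like r11 */
    int      truncated;    /* 1 if the walk stopped on an entry that didn't fit */
} FeaWalkResult;

/* Placeholder for an entry's size after translation (hypothetical formula). */
static uint32_t translated_size(uint32_t entry_size)
{
    return (entry_size + 3u) & ~3u;   /* round up to a multiple of 4 */
}

static FeaWalkResult walk_fea_list(const uint8_t *buf, size_t len)
{
    assert(len >= 4);
    uint32_t stated = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                      ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    size_t end = stated < len ? stated : len;   /* sketch-only clamp */
    FeaWalkResult r = { stated, 0, 0 };
    size_t pos = 4;                             /* first entry */

    while (pos < end) {
        uint32_t size = 0;
        if (end - pos >= 4)                     /* does the header fit? */
            size = 4u + buf[pos + 1] +
                   (uint32_t)(buf[pos + 2] | (buf[pos + 3] << 8));
        if (size == 0 || size > end - pos) {
            /* Entry runs past the end: the new offset is where this entry
               starts, measured from the start of the list. */
            r.new_offset = (uint32_t)pos;
            r.truncated = 1;
            break;
        }
        r.needed_size += translated_size(size);
        pos += size;
    }
    return r;
}
```

For example, a 13-byte list whose stated offset is 13, holding one complete 6-byte entry followed by 3 stray bytes, stops at offset 10 and reports 10 as the new offset.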
And then we get to our bug. arg1->offset is written as a WORD, even though, if you remember from before, it was read as a DWORD. The issue isn't an integer overflow. Rather, the issue is that the stale high WORD of the offset is left in place if we started with a value greater than 0xFFFF (the maximum WORD) and ended with one smaller than it.
For instance, if our starting offset is 0x00010000 and our calculated offset is 0x0000FFFD, only the low WORD is overwritten, so the offset in arg1 becomes 0x0001FFFD. This is a problem! Not just because our offset is wrong, but because the "shrunk" offset is now larger than the original (0x1FFFD vs 0x10000), and that offset is used in a loop when copying those structures into a new structure.
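The difference between the pre-patch and post-patch stores can be modeled with plain integer masking. This is a hypothetical model of the two store widths, not decompiled code: `store_offset_word` keeps the old high WORD the way the WORD-sized write does, and `store_offset_dword` replaces the whole field as the patched code does.

```c
#include <assert.h>
#include <stdint.h>

/* Pre-patch (hypothetical model): the WORD-sized result only replaces the
   low 16 bits of the DWORD field; the old high WORD survives. */
static uint32_t store_offset_word(uint32_t field, uint32_t new_offset)
{
    uint16_t low = (uint16_t)new_offset;   /* WORD-sized subtraction result */
    return (field & 0xFFFF0000u) | low;
}

/* Post-patch (hypothetical model): the whole DWORD is replaced. */
static uint32_t store_offset_dword(uint32_t field, uint32_t new_offset)
{
    (void)field;                           /* old value is fully overwritten */
    return new_offset;
}
```

With the numbers above, `store_offset_word(0x00010000, 0xFFFD)` yields 0x0001FFFD, while `store_offset_dword(0x00010000, 0xFFFD)` yields 0x0000FFFD. The bug only bites when the starting value is above 0xFFFF; below that the high WORD is already zero and both stores agree.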
Reportedly, in other versions of Windows this code lives in the function SrvOs2FeaListSizeToNt, but in Windows 10 it has been inlined into this function, just like the guy on Reddit said.
Some other writeups have said the offset value is used in a memcpy/memmove in SrvOs2FeaToNt, but that's not *exactly* the case. A loop goes through what it expects to be the structures in the list passed in arg1, using the offset that was calculated. For each structure it calculates its size, translates it, and copies it into the new list we allocated with SrvAllocateNonPagedPool (which, by the way, is allocated with the correct size, not our offset value). So it copies one struct at a time and keeps copying after it walks off the end of the list of actual structures. This is how the overflow occurs. Since the overflow copies data from past the end of the source list, some heap grooming would have to occur beforehand in order to control the data being copied.
The actual copying occurs in SrvOs2FeaToNt, which is a pretty unremarkable function: it pulls data out of the structure passed into it and copies it into the other structure passed into it.
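To see why the bad offset overflows a correctly sized allocation, here's a simulation that walks the entries the way the copy loop is described and only counts the bytes it would write, without copying anything. `mem` stands in for the list plus whatever sits after it in the pool, and the entry-size and translated-size helpers reuse the same guessed layout and placeholder rounding; none of these names come from srv.sys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the entry at p under the guessed layout: 4-byte header with a
   BYTE length at p[1] and a little-endian WORD length at p[2..3], then the
   two blobs back to back. */
static uint32_t entry_size_at(const uint8_t *p)
{
    return 4u + p[1] + (uint32_t)(p[2] | (p[3] << 8));
}

/* Placeholder translated size (hypothetical): round up to 4 bytes. */
static uint32_t translated_size(uint32_t entry_size)
{
    return (entry_size + 3u) & ~3u;
}

/* Walk from offset 4 up to list_offset, as the copy loop is described,
   and return how many bytes it would write to the destination. Nothing
   is copied; mem_len only keeps the simulation itself in bounds. */
static uint32_t bytes_copy_loop_would_write(const uint8_t *mem, size_t mem_len,
                                            uint32_t list_offset)
{
    uint32_t written = 0;
    size_t pos = 4;
    while (pos < list_offset && pos + 4 <= mem_len) {
        uint32_t size = entry_size_at(mem + pos);
        written += translated_size(size);
        pos += size;
    }
    return written;
}
```

In a small example where the valid entries end at offset 10 but the offset says 22, an allocation sized from the valid entries holds 8 bytes while the loop would write 20, treating the bytes after the list as more entries. In the real bug the extra distance is 0x10000 bytes, which is why the memory past the list has to be groomed.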
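A hedged sketch of what a per-entry translation like SrvOs2FeaToNt might look like. The destination layout is my assumption, modeled on the documented FILE_FULL_EA_INFORMATION structure (NextEntryOffset, Flags, EaNameLength, EaValueLength, then a NUL-terminated name followed by the value). `translate_fea` and its `dst_cap` check are hypothetical; the check is only there to keep the sketch safe, since the overflow described above shows the real copy path has no equivalent guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical NT-side header: NextEntryOffset (DWORD), Flags (BYTE),
   EaNameLength (BYTE), EaValueLength (WORD), then name, NUL, value. */
#define NT_EA_HEADER_SIZE 8u

/* Translate one source entry at src (guessed layout: flags, BYTE name
   length, WORD value length, name, value) into dst; return bytes written. */
static size_t translate_fea(const uint8_t *src, uint8_t *dst, size_t dst_cap)
{
    uint8_t  flags    = src[0];
    uint8_t  name_len = src[1];
    uint16_t val_len  = (uint16_t)(src[2] | (src[3] << 8));
    size_t   total    = NT_EA_HEADER_SIZE + name_len + 1u + val_len;

    assert(total <= dst_cap);            /* the sketch refuses to overflow */
    memset(dst, 0, NT_EA_HEADER_SIZE);   /* NextEntryOffset left as 0 here */
    dst[4] = flags;
    dst[5] = name_len;
    dst[6] = (uint8_t)val_len;           /* little-endian WORD */
    dst[7] = (uint8_t)(val_len >> 8);
    memcpy(dst + NT_EA_HEADER_SIZE, src + 4, name_len);
    dst[NT_EA_HEADER_SIZE + name_len] = 0;               /* name terminator */
    memcpy(dst + NT_EA_HEADER_SIZE + name_len + 1u, src + 4 + name_len, val_len);
    return total;
}
```

Each call is bounded by the entry's own size fields; the damage comes entirely from the caller looping over more entries than the destination was sized for.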
So that's the Wannacry bug: the simple, subtle, and nuanced casting error that set the world on fire 20 years after it was made.