Anton Gostev wrote a wonderful article on the Spectre and Meltdown attack in his weekly forums digest.
I will repost it here for those who haven’t subscribed yet.
Veeam Community Forums Digest | January 1 – January 7, 2018 |
THE WORD FROM GOSTEV |
By now, most of you have probably already heard of the biggest disaster in the history of IT – Meltdown and Spectre security vulnerabilities which affect all modern CPUs, from those in desktops and servers, to ones found in smartphones. Unfortunately, there’s much confusion about the level of threat we’re dealing with here, because some of the impacted vendors need reasons to explain the still-missing security patches. But even those who did release a patch, avoid mentioning that it only partially addresses the threat. And, there’s no good explanation of these vulnerabilities on the right level (not for developers), something that just about anyone working in IT could understand to make their own conclusion. So, I decided to give it a shot and deliver just that.First, some essential background. Both vulnerabilities leverage the “speculative execution” feature, which is central to the modern CPU architecture. Without this, processors would idle most of the time, just waiting to receive I/O results from various peripheral devices, which are all at least 10x slower than processors. For example, RAM – kind of the fastest thing out there in our mind – runs at comparable frequencies with CPU, but all overclocking enthusiasts know that RAM I/O involves multiple stages, each taking multiple CPU cycles. And hard disks are at least a hundred times slower than RAM. So, instead of waiting for the real result of some IF clause to be calculated, the processor assumes the most probable result, and continues the execution according to the assumed result. Then, many cycles later, when the actual result of said IF is known, if it was “guessed” right – then we’re already way ahead in the program code execution path, and didn’t just waste all those cycles waiting for the I/O operation to complete. However, if it appears that the assumption was incorrect – then, the execution state of that “parallel universe” is simply discarded, and program execution is restarted back from said IF clause (as if speculative execution did not exist). But, since those prediction algorithms are pretty smart and polished, more often than not the guesses are right, which adds significant boost to execution performance for some software. Speculative execution is a feature that processors had for two decades now, which is also why any CPU that is still able to run these days is affected.Now, while the two vulnerabilities are distinctly different, they share one thing in common – and that is, they exploit the cornerstone of computer security, and specifically the process isolation. Basically, the security of all operating systems and software is completely dependent on the native ability of CPUs to ensure complete process isolation in terms of them being able to access each other’s memory. How exactly is such isolation achieved? Instead of having direct physical RAM access, all processes operate in virtual address spaces, which are mapped to physical RAM in the way that they do not overlap. These memory allocations are performed and controlled in hardware, in the so-called Memory Management Unit (MMU) of CPU.At this point, you already know enough to understand Meltdown. This vulnerability is basically a bug in MMU logic, and is caused by skipping address checks during the speculative execution (rumors are, there’s the source code comment saying this was done “not to break optimizations”). So, how can this vulnerability be exploited? Pretty easily, in fact. First, the malicious code should trick a processor into the speculative execution path, and from there, perform an unrestricted read of another process’ memory. Simple as that. Now, you may rightfully wonder, wouldn’t the results obtained from such a speculative execution be discarded completely, as soon as CPU finds out it “took a wrong turn”? You’re absolutely correct, they are in fact discarded… with one exception – they will remain in the CPU cache, which is a completely dumb thing that just caches everything CPU accesses. And, while no process can read the content of the CPU cache directly, there’s a technique of how you can “read” one implicitly by doing legitimate RAM reads within your process, and measuring the response times (anything stored in the CPU cache will obviously be served much faster). You may have already heard that browser vendors are currently busy releasing patches that makes JavaScript timers more “coarse” – now you know why (but more on this later).As far as the impact goes, Meltdown is limited to Intel and ARM processors only, with AMD CPUs unaffected. But for Intel, Meltdown is extremely nasty, because it is so easy to exploit – one of our enthusiasts compiled the exploit literally over a morning coffee, and confirmed it works on every single computer he had access to (in his case, most are Linux-based). And possibilities Meltdown opens are truly terrifying, for example how about obtaining admin password as it is being typed in another process running on the same OS? Or accessing your precious bitcoin wallet? Of course, you’ll say that the exploit must first be delivered to the attacked computer and executed there – which is fair, but here’s the catch: JavaScript from some web site running in your browser will do just fine too, so the delivery part is the easiest for now. By the way, keep in mind that those 3rd party ads displayed on legitimate web sites often include JavaScript too – so it’s really a good idea to install ad blocker now, if you haven’t already! And for those using Chrome, enabling Site Isolation feature is also a good idea.OK, so let’s switch to Spectre next. This vulnerability is known to affect all modern CPUs, albeit to a different extent. It is not based on a bug per say, but rather on a design peculiarity of the execution path prediction logic, which is implemented by so-called Branch Prediction Unit (BPU). Essentially, what BPU does is accumulating statistics to estimate the probability of IF clause results. For example, if certain IF clause that compares some variable to zero returned FALSE 100 times in a row, you can predict with high probability that the clause will return FALSE when called for the 101st time, and speculatively move along the corresponding code execution branch even without having to load the actual variable. Makes perfect sense, right? However, the problem here is that while collecting this statistics, BPU does NOT distinguish between different processes for added “learning” effectiveness – which makes sense too, because computer programs share much in common (common algorithms, constructs implementation best practices and so on). And this is exactly what the exploit is based on: this peculiarity allows the malicious code to basically “train” BPU by running a construct that is identical to one in the attacked process hundreds of times, effectively enabling it to control speculative execution of the attacked process once it hits its own respective construct, making one dump “good stuff” into the CPU cache. Pretty awesome find, right?But here comes the major difference between Meltdown and Spectre, which significantly complicates Spectre-based exploits implementation. While Meltdown can “scan” CPU cache directly (since the sought-after value was put there from within the scope of process running the Meltdown exploit), in case of Spectre it is the victim process itself that puts this value into the CPU cache. Thus, only the victim process itself is able to perform that timing-based CPU cache “scan”. Luckily for hackers, we live in the API-first world, where every decent app has API you can call to make it do the things you need, again measuring how long the execution of each API call took. Although getting the actual value requires deep analysis of the specific application, so this approach is only worth pursuing with the open-source apps. But the “beauty” of Spectre is that apparently, there are many ways to make the victim process leak its data to the CPU cache through speculative execution in the way that allows the attacking process to “pick it up”. Google engineers found and documented a few, but unfortunately many more are expected to exist. Who will find them first? Of course, all of Now, the most important part – what do we do about those vulnerabilities? Having said that, there are Last but not least, do know that while those patches fully address |