Meltdown: implement user/kernel page table separation.
On Intel CPUs which speculate past user/supervisor page permission checks,
use a separate page table for userspace with only the minimum of kernel code
and data required for the transitions to/from the kernel (still marked as
supervisor-only, of course):
- the IDT (RO)
- three pages of kernel text in the .kutext section for interrupt, trap,
and syscall trampoline code (RX)
- one page of kernel data in the .kudata section for TLB flush IPIs (RW)
- the lapic page (RW, uncachable)
- per CPU: one page for the TSS+GDT (RO) and one page for trampoline
stacks (RW)
When a syscall, trap, or interrupt takes a CPU from userspace to kernel the
trampoline code switches page tables, switches stacks to the thread's real
kernel stack, then copies over the necessary bits from the trampoline stack.
On return to userspace the opposite occurs: recreate the iretq frame on the
trampoline stack, switch stack, switch page tables, and return to userspace.
mlarkin@ implemented the pmap bits and did 90% of the debugging, diagnosing
issues on MP in particular, and drove the final push to completion.
Many rounds of testing by naddy@, sthen@, and others
Thanks to Alex Wilson from Joyent for early discussions about trampolines
and their data requirements.
Per-CPU page layout mostly inspired by DragonFlyBSD.
ok mlarkin@ deraadt@
22 files changed: