amd64: simplify TSC synchronization testing
Computing a per-CPU TSC skew value is error-prone, especially on
multisocket machines and VMs. My best guess is that larger latencies
appear to the current skew measurement test as TSC desync, and so the
TSC is demoted to a kernel timecounter on these machines or marked
non-monotonic.
This patch eliminates per-CPU TSC skew values. Instead of trying to
measure and correct for TSC desync we only try to detect desync, which
is less error-prone. This approach should allow a wider variety of
machines to use the TSC as a timecounter when running OpenBSD.
In the new sync test, both CPUs repeatedly try to detect whether their
TSC is trailing the other CPU's TSC. The upside to this approach is
that it yields no false positives. The downside to this approach is
that it takes more time than the current skew measurement test. Each
test round takes 1ms, and we run up to two rounds per CPU, so this
patch slows boot down by 2ms per AP.
If any CPU fails the sync test, the TSC is marked non-monotonic and a
different timecounter is activated. The TC_USER flag remains intact.
There is no middle ground where we fall back to only using the TSC in
the kernel.
Before running the test, we check for the IA32_TSC_ADJUST register and
reset it if necessary. This is a trivial way to work around firmware
bugs that desync the TSC before we reach the kernel. Unfortunately,
at the moment this register appears to only be available on Intel
processors. I cannot find an equivalent but differently-named MSR for
AMD processors.
Because there is no per-CPU skew value, there is also no concept of
TSC drift anymore.
Miscellaneous notes:
- This patch adds a new timecounter utility function, tc_reset_quality().
Used after sync test failure to mark the TSC non-monotonic.
- I have left TSC_DEBUG enabled for now. Unsure if we should leave it
enabled for release or not. If we disable it we no longer run the
sync test after failing it once. Running the test even after failure
provides information about the desync on every CPU.
- Taking 1ms per test round is fairly conservative. We can experiment
with and discuss shorter test rounds. My main goal with a relatively
long test round is ensuring VMs actually run the test. It would be
bad if a hypervisor interrupted the test for so long that it concealed
desync.
- The use of two test rounds is mostly a diagnostic tool: it would be
very strange if a CPU passed the first round but failed the second.
If we ever saw this in the wild it would indicate something odd.
- Most of the desync seen in test reports is on Ryzen CPUs. I
believe, but cannot prove, that this is due to a widespread
firmware bug on AMD motherboards. Hopefully AMD and/or the
downstream vendors fix it.
- Fixing TSC desync by writing the TSC directly with WRMSR is very
difficult. The TSC is a moving target incrementing very quickly and
compensating for WRMSR overhead is non-trivial. We can experiment
with this, but my confidence is low that we can make it work reliably.
Prompted by deraadt@ and kettenis@ in 2021. Shepherded along by
deraadt@ throughout. Reprompted by Yuichiro Naito several times.
With input from Yuichiro Naito, naddy@, sthen@, dv@, and deraadt@.
Tested by florian@, gnezdo@, sthen@, Josh Rickmar, dv@, Mohamed Aslan,
Hrvoje Popovski, Yuichiro Naito, semarie@, mlarkin@, asou@, jmatthew@,
Renato Aguiar, and Timo Myyra.
Patch v1: https://marc.info/?l=openbsd-tech&m=
164330092208035&w=2
Patch v2: https://marc.info/?l=openbsd-tech&m=
164558519712957&w=2
Patch v3: https://marc.info/?l=openbsd-tech&m=
165698681018991&w=2
Patch v4: https://marc.info/?l=openbsd-tech&m=
165835507113680&w=2
Patch v5: https://marc.info/?l=openbsd-tech&m=
165923705118770&w=2
"just commit it" deraadt@