w.f.j.mueller@gsi.de [hercules-390]
2018-01-14 15:19:19 UTC
The kernel page-table isolation (KPTI https://en.wikipedia.org/wiki/Kernel_page-table_isolation) patches recently introduced to mitigate the Meltdown https://en.wikipedia.org/wiki/Meltdown_(security_bug) security vulnerability increase the overhead of system calls and will thus impact system performance.
I wondered whether that can be seen with Hercules, and indeed there are cases where the instruction time increases by more than a factor of two!
I used the s370_perf https://github.com/wfjm/s370-perf/blob/master/README.md instruction time benchmark, now available as GitHub project wfjm/s370-perf https://github.com/wfjm/s370-perf.
I ran the benchmark under MVS 3.8J, with Hercules as included in tk4-, in a dual-CPU configuration (NUMCPU=2 MAXCPU=2), before and after the updates mitigating Spectre/Meltdown were installed. The CS, CDS and TS tests in the lock-missed configuration show a clear effect: times are up by more than a factor of two, while all other tests stay the same within measurement precision. See the test reports
https://github.com/wfjm/s370-perf/blob/master/data/2018-01-14_sys1-a.dat
https://github.com/wfjm/s370-perf/blob/master/data/2018-01-14_sys1-b.dat
and inspect tests T292, T297, and T621. In summary:
Tag   Comment            :  before   after
T292  LR;CS R,R,m (ne)   :  333.92  726.15
T297  LR;CDS R,R,m (ne)  :  334.79  742.46
T621  MVI;TS m (ones)    :  342.58  729.77
As said, all other instruction times are essentially unchanged.
What happened is easy to explain. The CS, CDS and TS emulation code contains
if (sysblk.cpus > 1) sched_yield();
to handle spin locks efficiently in the lock-missed case. That's why the lock-missed case shows a substantially slower instruction time than the lock-taken case (which takes only about 80-90 usec). So this test is essentially a system call benchmark, and thus very sensitive to the KPTI patch.
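To check the system call cost directly on the host, independent of Hercules, one can time sched_yield() in a tight loop. The following is only a minimal sketch, assuming a Linux/POSIX host with clock_gettime(); the function name ns_per_sched_yield is made up for this illustration and is not part of Hercules or s370_perf.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* request clock_gettime() and CLOCK_MONOTONIC */
#endif
#include <sched.h>
#include <time.h>

/* Hypothetical helper (not from Hercules or s370_perf):
 * returns the average wall-clock time of one sched_yield() call
 * in nanoseconds, or -1.0 if iterations is not positive or the
 * monotonic clock cannot be read. */
double ns_per_sched_yield(long iterations)
{
    struct timespec start, end;
    long i;

    if (iterations <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    /* Each call enters and leaves the kernel once; with no other
     * runnable thread on this CPU it returns right away, so the loop
     * mostly measures the system call entry/exit overhead. */
    for (i = 0; i < iterations; i++)
        sched_yield();

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    /* Elapsed time in nanoseconds, divided by the number of calls. */
    return ((double)(end.tv_sec - start.tv_sec) * 1e9
            + (double)(end.tv_nsec - start.tv_nsec)) / (double)iterations;
}
```

Calling ns_per_sched_yield(1000000) from a small main() and printing the result, once on a kernel without and once with the KPTI patches, should show whether the host-side system call cost moves in the same direction as the lock-missed timings above. The absolute numbers will differ from the s370_perf values, since those include the emulation of the instruction itself.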
Really nice to see this with such clarity.
The practical impact for normal code is likely negligible, though; that's why I resisted the temptation to title the thread 'Hercules a factor 2 slower' :).
Cheers, Walter