Hercules vs z/PDT - some observations
w.f.j.mueller@gsi.de [hercules-390]
2018-05-26 12:49:58 UTC
Hello,




Some time ago I received results from a handful of s370_perf (https://github.com/wfjm/s370-perf/blob/master/README.md) runs done on a z/PDT V1.7 system. It took some time to analyze; the data and the full analysis are now available at


https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md


Performance testing of z/PDT is explicitly disallowed by the z/PDT license conditions, and s370_perf would not be the proper tool anyway. The page cited above merely gives some observations on general features of z/PDT; the key ones are repeated here:
- z/PDT is based on on-the-fly binary translation, see section background https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-back.
- z/PDT does an optimizing compilation, see section code optimization https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-opt.
- the same code is sometimes compiled, sometimes not, see section to compile or not to compile https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-comp.
- if code is compiled, compilation can happen with substantial delay, see section compilation delay https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-compdel.
- performance in plain interpretive mode seems similar to Hercules, see section interpreter mode performance https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-inter.
- RR instructions are the easy part, see section RR instructions https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-rr.
- RX instructions are the hard part, see section RX instructions https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-rx.
- z/PDT vs Hercules comparisons are difficult to interpret, see section z/PDT vs Hercules https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-sum.
- little gain for floating point arithmetic, see section floating point performance https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-float.
- a bit faster for decimal packed arithmetic, see section decimal packed performance https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-dec.
- EX is apparently always interpreted, see section EX instruction https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-ex.
- z/PDT performs well for some instructions which are slow on Hercules, see sections CLCL https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-clcl, MVCIN https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-mvcin, and TRT https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-trt.
- overall JIT gain depends heavily on workload, see bottom line https://github.com/wfjm/s370-perf/blob/master/narr/2018-03-10_zpdt.md#user-content-obs-bline.

For any further background and detail follow the links.
Any remarks and comments are very welcome.


With best regards, Walter
'\'Fish\' (David B. Trout)' david.b.trout@gmail.com [hercules-390]
2018-05-27 06:54:19 UTC
Dr. Müller,

Thank you SO MUCH for the tremendous effort you have put into analyzing Hercules performance! I for one am VERY appreciative! Your analysis is quite illuminating to say the least, and reveals quite clearly that Hercules's performance needs to be improved in more than one area.

With that in mind I have created a new GitHub Issue to track this problem for SDL Hyperion 4.0:

"Poor performance of CLCL, MVCIN and TRT instructions" (#99)
https://github.com/Fish-Git/hyperion/issues/99


I *was* working on something else (CCKD64), but I am now going to switch to working on this new issue instead, as I firmly believe it is possible that significant improvements can be made to each of these instructions (CLCL, MVCIN and TRT) in, hopefully, short order.

I WOULD ALSO LIKE TO INVITE ALL CURRENT/FORMER HERCULES DEVELOPERS to try their hand at improving the performance of each of these instructions. We could even maybe have a contest to see who can come up with the best implementation for each.

In any case, I encourage feedback regarding your thoughts on what is the best approach to take in trying to improve the performance for each of these instructions.

Thanks!
--
"Fish" (David B. Trout)
Software Development Laboratories
http://www.softdevlabs.com
mail: ***@softdevlabs.com