*** Off List ***
Thank you for taking the time to begin the formalization of instruction timing measurements.
As a prior specialist in this area, as well as a hardware and firmware
architect, PLEASE update the test to use the MVS task CPU time, which removes
both OS and host OS times. Using the TOD clock via STCK yields the wall-clock
execution time, NOT the actual instruction time, and those times will vary
significantly when the tests are properly constructed and have a proper
minimum length. If no variance is seen when running under an OS, then there
is a test construction error, and/or the STPT emulation is not correct.
Even with these updates, the test sizes (the instruction counts between the
branches) need to be large enough to swamp the timing variance of the test
overhead as well as that of the system overhead. We used to use a 4K
instruction/data buffer for each test, with a pre-read of the buffer to
ensure that it was in cache. While not for quite the same reasons under
Hercules, it still makes a difference when you truly work at the hardware
level.
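As a hedged illustration of such a pre-read (the buffer name, the registers,
and the 64-byte line size are my assumptions, not taken from s370_perf), the
warm-up loop could look like:

         LA    2,BUF              point R2 at the 4K test buffer
         LA    3,64               4096/64 = 64 lines to touch
WARM     IC    0,0(,2)            touch one byte in each line
         LA    2,64(,2)           step to the next line
         BCT   3,WARM             loop until the buffer is warm
*        ...
BUF      DS    XL4096             4K instruction/data buffer

One pass like this before the timed loop ensures the first timed iteration
does not pay the cache-miss cost.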
When running standalone (possibly in a later version of your code), you
will want to use STPT for your clock source.
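For illustration, a minimal and untested sketch of bracketing a test body
with STPT when running standalone (STPT is privileged, hence supervisor
state); the labels T0, T1, and TDIFF are mine, not from s370_perf. The CPU
timer decrements while the CPU runs, so the elapsed CPU time is T0 minus T1:

         STPT  T0                 CPU timer before the test body
*        ...                      timed instruction sequence here
         STPT  T1                 CPU timer after the test body
         LM    0,1,T0             load T0 as a 64-bit value
         SL    1,T1+4             low-order words: T0 - T1
         BC    3,NOBORRW          carry set means no borrow
         BCTR  0,0                otherwise borrow from high word
NOBORRW  S     0,T1               high-order words: T0 - T1
         STM   0,1,TDIFF          elapsed CPU time, TOD format
*                                 (bit 51 = 1 microsecond)
T0       DS    D                  CPU timer at start
T1       DS    D                  CPU timer at end
TDIFF    DS    D                  elapsed = T0 - T1

The same subtraction works for STCK values, but only the CPU timer excludes
the time spent outside the code under test.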
And, yes, I also reviewed Liba Svobodova's paper back in 1974-1975, along
with my office partners of the day, as we were writing instruction timing
routines to show both where our machines operated faster than IBM's and
that no "critical" instruction ran slower. This was done because we had to
prove both that the underlying machine was faster and that individual and
combination loads ran faster. The proof was in the CPU time, NOT the wall
clock time. In addition, we had to show that the "third-party" memory we
were selling was not a source of slowdowns on the machines.
The instruction mix information used by Svobodova was generated by the
individual installations shown in Table B.4 on page 68; these were not
fully operational business mixes, and they differ significantly from the
true workload mixes of the 1980s and beyond.
I will be glad to answer any questions that you may have; please be
aware that there are still areas that I am not permitted to address. If
I appear to dodge, or intentionally not answer, a question, please take
that into consideration. Restating the question in a different manner may
permit me to answer it.
Mark L. Gaubatz
Post by ***@gsi.de [hercules-390]
The s370_perf instruction time benchmark is now feature complete and
available as GitHub project wfjm/s370-perf
<https://github.com/wfjm/s370-perf/> in version 0.80. Also lots of new
data has been added, which will allow much deeper analysis.
The data has been generated on a variety of systems, on real CPUs like
the P/390 and on Hercules emulators running on a wide range of host
systems, from a Raspberry Pi 2B to a Xeon workstation. More host CPUs are
likely to come, and maybe also more Hercules versions.
So it would be nice to condense a set of instruction timings (see for
example the P/390 listing) into a single figure of merit.
One classical way is to use instruction frequencies to generate a
weighted average, which could be converted into a 'MIPS' number. So I
started to look for such instruction frequencies, of course for S/370
workloads, and found Stanford Technical Report No. 66, written in 1974
by Liba Svobodova, which contains in Table B.3 on pages 63-64 a full
distribution. It seems that the workload was integer dominated; the
frequencies for floating point and decimal instructions are negligible.
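For illustration, with the relative frequency f(i) of instruction i taken
from such a table and its measured time t(i) in microseconds, the weighted
figure would simply be

    MIPS ~= 1 / SUM_i ( f(i) * t(i) )

i.e. the reciprocal of the frequency-weighted mean instruction time.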
There are other papers on the subject which also mention distributions
based on FORTRAN and COBOL workloads (thus with a significant floating
point and decimal arithmetic instruction fraction), but I haven't found
complete distributions so far.
Any help or hint on where to find such instruction frequency distribution
data is very much appreciated. Best would be data from the S/370 times,
because for me it's a retro computing project and s370_perf only tests
S/370 instructions.
Thanks in advance,
Walter