Post by ***@sympatico.ca [hercules-390]
D235D186 6696E000
D235D186 6696F000
D235D186 66970000
D235D186 66971000
necessarily the same", not that they will. Is it really an error then
that Hercules returns unique values?
While the following discusses both the clocks and the INSTRATE program,
please understand that neither the INSTRATE program nor instruction
timing as such is at issue. The key is that some of the fundamental
concepts used for timing must be changed to generate what can be
considered proper results -- regardless of the timing program in use.
Background: I have been working with the architecture and instruction
timing as a hardware vendor, software vendor, and ISV for over 45
years; I was also part of a performance analysis team as an SE/PSR for
four years (including stepping through customer issues with the
performance of specific instructions).
Now down to the points:
1) Hercules is not in error in providing timing services by the
methodology in use; it is within the guidelines of the Principles of
Operation. What can be observed is properly termed "model-dependent
behavior." If one knows what to look for, these variations can be seen
between each model of each vendor's machines -- including clock behavior.
2) On Hyperion, the clock values are more "refined." For STCKF, the
resolution is significantly better; only on very rare occasions have I
seen the "same" clock value (I know this primarily because I wrote and
debugged much of the core clock-update code on Hercules and Hyperion,
except for steering). All accuracy beyond the underlying OS clock calls
is calculated from engine timings and will always lag the true actual
time somewhat (but always within one clock unit of the underlying OS).
On my Linux machine (with 1ns timing and an OS clock only guaranteed to
produce valid millisecond values) the STCKF values for one back-to-back
sequence ran as follows (TOD value, then delta in hex and in ns):
D234F481 8C891899
D234F481 8C893049 17B0 1480ns
D234F481 8C8930D9 90 35ns
D234F481 8C893169 90 35ns
D234F481 8C8938A9 740 453ns
D234F481 8C893939 90 35ns
D234F481 8C894006 6CD 425ns
D234F481 8C894096 90 35ns
This shows that the "real" underlying OS clock updated three, four, or
five times during the sequence, and the Hercules CPU ran through the
sequence without interruption. The Hercules clock code during this
sequence was triggered not only by STCKF but also by various other
Hercules internal functions. It is less expensive in overhead to run a
single actual Hercules clock, with STCKF then *not* checking for
duplicate values.
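
For reference, the nanosecond figures above come straight from the TOD
format: bit 51 increments once per microsecond, so one unit of bit 63
is 1/4096 microsecond (about 244ps). A minimal C sketch of the
conversion (not Hercules source; the function name is mine, and the
values are the first two from the listing above):

#include <stdio.h>
#include <stdint.h>

/* Sketch only: convert the difference between two 64-bit TOD-format
   values to nanoseconds. Bit 51 of the TOD format increments every
   microsecond, so one unit of bit 63 is 1/4096 microsecond. */
static double tod_delta_ns(uint64_t earlier, uint64_t later)
{
    uint64_t units = later - earlier;        /* delta in bit-63 units */
    return (double)units * 1000.0 / 4096.0;  /* 1000ns per 4096 units */
}

int main(void)
{
    /* First two values from the STCKF sequence above */
    uint64_t a = 0xD234F4818C891899ULL;
    uint64_t b = 0xD234F4818C893049ULL;
    printf("%.0f ns\n", tod_delta_ns(a, b)); /* 0x17B0 units -> 1480ns */
    return 0;
}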
3) Using the STORE CLOCK facility is, and always has been, the wrong
approach to instruction timing, unless one is running on a non-shared
(truly dedicated) CPU *and* guaranteed NOT to take an interrupt. LPARs,
even with "dedicated" CPUs, are not necessarily fully dedicated to a
given LPAR -- cycles may be consumed by the LPAR
scheduler/dispatcher/etc. This has not changed in the 45+ years I have
worked with the architecture.
Sidebar: On the 360, one could only use the Interval Timer as a clock
source. As such, it was a common (though unethical) practice for
representatives of various companies to surreptitiously disable the
Interval Timer with the Disable Interval Timer toggle switch during
timing runs. As SEs and CEs, we were trained to spot such
activities...
4) Running on "REAL" hardware, one must use a dedicated CPU in an LPAR.
The CPU may not be defined as SHARED, and even then, the LPAR is not
guaranteed to get all available CPU cycles without interruption.
5) On real hardware, one must also take the caches, buffers, etc., into
account. On emulators, these and other considerations still apply,
including which hosting OS is in use.
6) Hercules -- like any other emulator running as either a user or
system application without a fully dedicated CPU and with maskable
interrupts enabled -- exacerbates the issue. If and when "consistency"
is seen, it is generally due to the absence of interrupts.
7) That said, there is a way to get the proper, or closest-to-proper,
results: use the CPU Timer, which measures time consumed rather than
time elapsed. This means that under a given OS where the timing job is
running, one must use other facilities to properly set and use the
information from the task control blocks; run in supervisor state with
interrupt facilities turned off; or run "standalone."
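
As a host-side illustration of the consumed-versus-elapsed distinction
(this is not mainframe code; on Linux the closest analogues are
CLOCK_THREAD_CPUTIME_ID and CLOCK_MONOTONIC):

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Sketch only: time the same work two ways. If the thread is
   preempted, the elapsed figure grows but the consumed one does not --
   the same property that makes the CPU Timer usable for timing. */
static double ns_between(const struct timespec *s, const struct timespec *e)
{
    return (e->tv_sec - s->tv_sec) * 1e9 + (e->tv_nsec - s->tv_nsec);
}

int main(void)
{
    struct timespec w0, w1, c0, c1;
    volatile unsigned long sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &w0);          /* elapsed (TOD-like)  */
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c0);  /* consumed (CPU Timer-like) */

    for (unsigned long i = 0; i < 100000000UL; i++)
        sink += i;                                /* work being timed */

    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c1);
    clock_gettime(CLOCK_MONOTONIC, &w1);

    printf("elapsed : %.0f ns\n", ns_between(&w0, &w1));
    printf("consumed: %.0f ns\n", ns_between(&c0, &c1));
    return 0;
}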
8) Should one run in a virtualized operating system under Hercules, even
the emulated CPU Timer behaves better than the TOD clock, yielding
results that are normally well within a percentage point of "actual" on
extended runs.
9) The CPU Timer resolution is that of the underlying OS. So on my
Linux system, that means sub-nanosecond resolution in TOD Clock format
(value, then delta in hex and in ns):
FFFFFFF1 8345FB00 500 312.5ns
FFFFFFF1 8345E600 600 375.0ns
FFFFFFF1 8345E000 500 312.5ns
FFFFFFF1 8345DB00 600 375.0ns
FFFFFFF1 8345D500 500 312.5ns
FFFFFFF1 8345D000 600 375.0ns
FFFFFFF1 8345CA00 500 312.5ns
FFFFFFF1 8345C500
As such, the CPU Timer properly reflects the ACTUAL CPU time used by the
CPU thread (or underlying processor), with a consistency not possible
with the TOD Clock, regardless of whatever machinations are done
algorithmically. It should also be noted that the clock logic used
for the CPU Timer cannot be used for the TOD Clock, just as on a
"real" mainframe.
10) The timing sequences within INSTRATE are too short to consistently
negate the branch-loop overhead; a sketch of the usual
overhead-subtraction approach follows.
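
A minimal sketch of that technique (this is not INSTRATE's code; the
names and iteration count are mine): time an empty loop, time the same
loop containing the operation under test, subtract, and divide by the
iteration count. The count must be large enough that the branch
overhead and timer granularity wash out.

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Sketch only: subtract empty-loop overhead from a loaded loop and
   report per-operation cost. Compile without optimization (-O0), or
   the empty loop is removed entirely. */
#define ITERS 100000000UL

static double cpu_ns(void)
{
    struct timespec t;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t); /* consumed, not elapsed */
    return t.tv_sec * 1e9 + t.tv_nsec;
}

int main(void)
{
    volatile unsigned long sink = 0;
    unsigned long i;
    double t0, empty, loaded;

    t0 = cpu_ns();
    for (i = 0; i < ITERS; i++)
        ;                                /* branch-loop overhead only */
    empty = cpu_ns() - t0;

    t0 = cpu_ns();
    for (i = 0; i < ITERS; i++)
        sink += i;                       /* loop + operation under test */
    loaded = cpu_ns() - t0;

    printf("per-operation: %.3f ns\n", (loaded - empty) / ITERS);
    return 0;
}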
Mark