[hercules-390] Hercules vs Sim390

Discussion:

[hercules-390] Hercules vs Sim390 - a comparison

w.f.j.mueller@gsi.de [hercules-390]

2018-12-02 13:34:45 UTC

Hi,

Michael Short was kind enough to run a version of s370_perf https://github.com/wfjm/s370-perf/blob/master/doc/s370_perf.md ported to MUSIC/SP https://en.wikipedia.org/wiki/MUSIC/SP on his Sim390 http://www.canpub.com/teammpg/de/sim390/ emulator-based system. The different OS should have no sizable impact on the measured instruction timings, since SVC and privileged instructions, which depend on system response times, aren't covered. A reference run with Hercules 4.0 on the same host CPU is available too. The data and full analysis are at

https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-herc40.md
https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-sim390.md

The key findings are

- Sim390 is, on identical host hardware, a factor of 6.5 slower than Hercules 4.0, based on the lmark https://github.com/wfjm/s370-perf/blob/master/narr/README_narr.md#user-content-lmark MIPS ratio of 6.39 to 41.54.
- simple instructions, like LR R,R, are about a factor of 9 slower, see section LR timing https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-sim390.md#user-content-find-lr.
- branch timing does not depend on same/different page, see section branch timing https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-sim390.md#user-content-find-btime.
- CLCL and TRT are much faster on Sim390, see section CLCL+TRT performance https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-sim390.md#user-content-find-clcl-trt.
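
As a quick check of the first figure, the factor follows directly from the two lmark ratings quoted in the list. A minimal C sketch; the function name `slowdown_factor` is made up for this illustration and is not part of s370_perf:

```c
/* How many times slower one system is, given two MIPS ratings.
 * slowdown_factor is a hypothetical helper used only for this check. */
double slowdown_factor(double fast_mips, double slow_mips)
{
    return fast_mips / slow_mips;
}
```

With the values from the list, `slowdown_factor(41.54, 6.39)` evaluates to roughly 6.5.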

The overall summary is that Hercules has a much more efficient handling of instruction fetch and decoding and of virtual to real address mapping.

Any remarks and comments are very welcome.

With best regards, Walter

Bernd Oppolzer berndoppolzer@yahoo.com [hercules-390]

2018-12-02 14:16:30 UTC

Hi,

I gained some experience with Sim390 when I retrieved the initial 1982
McGill version of my New Stanford Pascal compiler from MUSIC/SP in 2011
(then without the "New" attribute) and ported it to Hercules.
I got the impression then that Sim390 is a more or less dead product
and is very rarely used, in contrast to Hercules. That's why I did the
port in the first place, and because the compiler available on Hercules
(from 1979) was older than the McGill version.

So IMO there is no big win from improving Sim390; I don't know who is
managing it today. Anyway, if this should be done, I would start here:

"Hercules avoids to recalculate the virtual to real address mapping for
an instruction fetch whenever possible. This likely explains a good part
of the observed speed difference for simple instructions seen in section
LR timing
<https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-sim390.md#user-content-find-lr>."

This should be done for Sim390, too.
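
The idea in the quoted passage can be sketched as a one-entry cache in front of the address translation. This is only an illustration under assumptions: 4 KiB pages, 32-bit addresses, and made-up names (`ifetch_cache`, `ifetch_real_addr`, `ifetch_invalidate`, `demo_translate`). It is not taken from the Hercules or Sim390 sources.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                          /* assumption: 4 KiB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)

/* Hypothetical slow path: full virtual-to-real translation of a page. */
typedef uint32_t (*translate_fn)(uint32_t virt_page_addr);

/* One-entry cache for the page the last instruction was fetched from. */
struct ifetch_cache {
    bool          valid;
    uint32_t      virt_page;  /* virtual address, offset bits cleared */
    uint32_t      real_page;  /* real address, offset bits cleared    */
    unsigned long misses;     /* how often the slow path was taken    */
};

/* Return the real address for an instruction fetch at 'virt'.
 * The slow translation runs only when the fetch leaves the cached page. */
uint32_t ifetch_real_addr(struct ifetch_cache *c, uint32_t virt,
                          translate_fn slow)
{
    uint32_t vpage = virt & ~PAGE_MASK;
    if (!c->valid || c->virt_page != vpage) {
        c->real_page = slow(vpage) & ~PAGE_MASK;
        c->virt_page = vpage;
        c->valid = true;
        c->misses++;
    }
    return c->real_page | (virt & PAGE_MASK);
}

/* Must be called whenever the mapping may have changed. */
void ifetch_invalidate(struct ifetch_cache *c)
{
    c->valid = false;
}

/* Hypothetical translation for demonstration: fixed 1 MiB offset. */
uint32_t demo_translate(uint32_t virt_page_addr)
{
    return virt_page_addr + 0x00100000u;
}
```

Sequential fetches within one page then cost a compare and an OR instead of a full translation. The price is that the entry has to be invalidated whenever the mapping can change, for example when the translation tables are modified.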

But, OTOH, there are some areas where Sim390 seems to be better than
Hercules, so this could be a motivation for examining these areas in
Hercules and trying to make improvements there: decimal, TRT, STCK etc.
For details see the links below.

Thank you for doing these tests and comparisons.

Kind regards

Bernd

Post by ***@gsi.de [hercules-390]
Hi,
Michael Short was so kind to run a s370_perf
<https://github.com/wfjm/s370-perf/blob/master/doc/s370_perf.md>version
ported to MUSIC/SP <https://en.wikipedia.org/wiki/MUSIC/SP> on his
Sim390 <http://www.canpub.com/teammpg/de/sim390/> emulator based
system. The different OS should have no sizable impact on the measured
instruction timings since SVC and privileged instructions, which
depend on system response times, aren't covered. A reference run with
Hercules 4.0 on the same host CPU is available too. The data and full
analysis is under
https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-herc40.md
https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-sim390.md
The key findings are
- Sim390 is, on identical host hardware, a factor of 6.5
slower than Hercules 4.0, based on the lmark
<https://github.com/wfjm/s370-perf/blob/master/narr/README_narr.md#user-content-lmark>
MIPS ratio of 6.39 to 41.54.
- simple instructions, like LR R,R, are about a factor 9 slower, see
section LR timing
<https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-sim390.md#user-content-find-lr>.
- branch timing does not depend on same/different page, see section
branch timing
<https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-sim390.md#user-content-find-btime>.
- CLCL and TRT are much faster on Sim390, see section CLCL+TRT
performance
<https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-sim390.md#user-content-find-clcl-trt>.
The overall summary is that Hercules has a much more efficient
handling of instruction fetch and decoding and of virtual to real
address mapping.
Any remarks and comments are very welcome.
With best regards, Walter

Kevin Monceaux Kevin@RawFedDogs.net [hercules-390]

2018-12-02 17:42:57 UTC

I've got the impression then that Sim390 is a - sort of - dead product and
is very rarely used - in contrast to Hercules.
So IMO there is no big win from improving Sim390; I don't know who is
managing it today.

If Sim390 were ever ported to Linux, and if someone ever released the rest
of MUSIC/SP, I'd probably use it to run MUSIC to enable MUSIC's network
functionality.

--
Kevin
http://www.RawFedDogs.net
http://www.Lassie.xyz
http://www.WacoAgilityGroup.org
Bruceville, TX

What's the definition of a legacy system? One that works!
Errare humanum est, ignoscere caninum.

'Fish' (David B. Trout) david.b.trout@gmail.com [hercules-390]

2018-12-02 19:11:32 UTC

FYI:

https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-herc40.md

Hercules 4.0.0.8906-SDL-g1035eafe (2017-11-07)

Whereas:

https://github.com/SDL-Hercules-390/hyperion/releases/tag/Release_4.1

Hercules-4.1.0.9426-SDL-g42b533fa (2018-11-10)

Performance of TRT, CLC, CLCL and MVCIN instructions
vastly improved (Fish and Ivan Warren) (**)

(**) TRT performance was improved by a factor of 16, MVCIN by a factor of 20, and CLCL performance was improved by a factor of 150 (i.e. the CLCL instruction is now over 150 times faster than before).

Therefore:

- CLCL and TRT are much faster on Sim390, see section
CLCL+TRT performance <https://github.com/wfjm/s370-perf/blob/master/narr/2018-07-30_ms1-sim390.md#user-content-find-clcl-trt>

is very likely no longer true.

--
"Fish" (David B. Trout)
Software Development Laboratories
http://www.softdevlabs.com
mail: ***@softdevlabs.com

Ivan Warren ivan@vmfacility.fr [hercules-390]

2018-12-06 17:53:46 UTC

On 12/2/2018 at 8:11 PM, 'Fish' (David B. Trout) wrote:

Post by '\'Fish\' (David B. Trout)' ***@gmail.com [hercules-390]
Hercules-4.1.0.9426-SDL-g42b533fa (2018-11-10)
Performance of TRT, CLC, CLCL and MVCIN instructions
vastly improved (Fish and Ivan Warren) (**)

Hey guys,

Just to be clear, the performance increase techniques are all Fish's. I
just happened to address a couple of architectural issues occurring in
boundary situations (like those that happen once in a billion years)
after the changes were done.

--Ivan

kerravon86@yahoo.com.au [hercules-390]

2018-12-10 17:09:32 UTC

1. That quote doesn't say that AM31 is required.
It says that the addresses all need to be clean,
even if invoking in AM24.

2. That is IBM's EZASOKET, and is not related
to Jason Winter's EZASOKET for MVS 3.8j,
an AM24 system.
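
To illustrate point 1: in AM24 the hardware ignores the high-order byte of an address, so 24-bit programs sometimes keep flag bits there (for example the X'80' end-of-list marker in a parameter list). A "clean" address in the sense of the IBM text has that byte set to zero. A small C sketch with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

#define AM24_MASK 0x00FFFFFFu   /* low-order 24 bits of an address word */

/* True if the high-order byte of the address word is already zero. */
bool is_clean_am24(uint32_t addr_word)
{
    return (addr_word & ~AM24_MASK) == 0;
}

/* Clear the high-order byte so the word is also valid as a 31-bit address. */
uint32_t clean_am24(uint32_t addr_word)
{
    return addr_word & AM24_MASK;
}
```

For instance, `clean_am24(0x80012345u)` yields `0x00012345u`, which addresses the same storage in AM24 and is acceptable when interpreted in 31-bit mode.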

BFN. Paul.

---In hercules-***@yahoogroups.com, <***@...> wrote :

I don't see how you are going to do that...

EZASOKET requires AMODE31. How is that possible on MVS 3.8J?

https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.hala001/resreqb.htm

"The EZASOKET API can be invoked while the caller is in either 31-bit or 24-bit Amode. However, if the application is running in 24-bit addressability mode at the time of the call, all addresses of parameters passed by the application must be addressable in 31-bit Amode. This implies that even if the addresses being passed reside in storage below the 16 MB line (and therefore addressable by 24-bit Amode programs) the high-order byte of these addresses needs to be 0."

Joe

On Mon, Dec 10, 2018 at 12:34 PM 'Fish' (David B. Trout) ***@... [hercules-390] <hercules-***@yahoogroups.com> wrote:

Paul Edwards wrote:

[...]

As far as I am aware, the interface everyone is supposed
to be using is a call to EZASOKET. The EZASOKET function
should be able to be changed to invoke an SVC instead of
an x'75' instruction. Then a simple relink will allow
applications to use the new design.

Great! That's exactly what I was hoping to hear!

How many application programs are there that would need
to be relinked? Isn't there just ftp (under your control)
and the web server?

That's good news too.

The scope of the problem is not as large as I originally feared.

--
"Fish" (David B. Trout)
Software Development Laboratories
http://www.softdevlabs.com
mail: ***@...

Laddie Hanus laddiehanus@yahoo.com [hercules-390]

2018-12-10 19:12:32 UTC

Harold Grovesteen implemented Jason's x'76' instruction as a diagnose about 10 years ago. There is a framework for the 75 also, but it is not implemented. Last I checked it was still in Hyperion. It's called the host resource facility. I have his documentation on my home desktop.

Laddie

Sent from whatever device I am using.

Post by ***@yahoo.com.au [hercules-390]
[...]

Great! That's exactly what I was hoping to hear!

How many application programs are there that would need
to be relinked? Isn't there just ftp (under your control)
and the web server?

That's good news too.
The scope of the problem is not as large as I originally feared.
--
"Fish" (David B. Trout)
Software Development Laboratories
http://www.softdevlabs.com

Ivan Warren ivan@vmfacility.fr [hercules-390]

2018-12-10 19:05:49 UTC

Even if implemented as a PRIVILEGED instruction?

As a privileged instruction? It's VERY fine by me!

(It's just the equivalent of a TCP/IP offload using an instruction)

--Ivan

kerravon86@yahoo.com.au [hercules-390]

2018-12-10 20:42:39 UTC

So, if one is using the maximum number of 1023
currently allowed sockets, there will be almost
13,000 X'75' instructions executed per second.

I don't think you should be basing your
reasoning on people using 1023 sockets.

Regardless, I can do 77,000 ATL GETMAINs
per second on the cheapest laptop that
Dell Australia sells:

https://groups.yahoo.com/neo/groups/hercules-os380/conversations/messages/19245
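
A rough back-of-envelope reading of these two numbers can be sketched in C. It rests on a loud assumption that nothing in this thread measures: that one SVC wrapper call costs about as much as one of the GETMAINs quoted above, on the same host. The function name `svc_cpu_fraction` is made up for this illustration.

```c
/* Fraction of one CPU spent in the SVC wrapper, ASSUMING a wrapper call
 * costs about the same as one call of a service that can be issued
 * 'reference_calls_per_sec' times per second. Hypothetical helper. */
double svc_cpu_fraction(double wrapped_calls_per_sec,
                        double reference_calls_per_sec)
{
    return wrapped_calls_per_sec / reference_calls_per_sec;
}
```

Under that assumption, `svc_cpu_fraction(13000.0, 77000.0)` comes out at about 0.17, i.e. on the order of a sixth of one CPU in the 1023-socket worst case, and proportionally less with fewer sockets.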

Wrapping each and every of these instructions
with an SVC doing whatever security checking,
would be a significant overhead,

So don't do any security checking. Just
unconditionally call the x'75'. That will
still satisfy the technical requirements
of a privileged instruction, and if someone
wants to run a "better" SVC with "better"
security checking, they are free to do so,
especially if they use a more reasonable
number of sockets.

But how should I explain to users that they
now can no longer just code away as they
like, but instead have to modify SVC code,

They can continue to use the EZASOKET
interface unchanged.

BFN. Paul.

pricgren pricgren@yahoo.com [hercules-390]

2018-12-10 23:41:19 UTC

I can do 77,000 ATL GETMAINs per second

Well, presumably that is via SVC 120 which is a type-1 SVC so there is a
timing benchmark which includes MVS local lock management overhead.

I think that the SVC method is viable - a type 1 would involve getting
the local lock, while a type 3 would involve creating an SVRB - but they
come from a cellpool anyway (usually) so a bit of a trade-off to avoid
lock management.

So what's the SVC have to do in terms of extra checking?

The main stipulation I'd have is that the SVC issuer can access the
nominated I/O buffer(s) with their own storage key. The convention is
that callers in a system key can bypass this check. This bypass could
reasonably be extended to supervisor state and APF authorized callers.
I'd be happy with checking access to a byte every 4K (along a nominated
buffer) in an MVS environment. I think the extra overhead would be small.
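
The "one byte every 4K" check could look roughly like the following C sketch. It assumes 32-bit addresses and a caller-supplied probe callback that says whether the issuer's key may access a given byte; the names `buffer_accessible`, `probe_fn` and `demo_can_access` are hypothetical and do not correspond to any MVS service.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROBE_STEP 4096u   /* probe granularity taken from the text above */

/* Hypothetical callback: may the caller's key access the byte at addr? */
typedef bool (*probe_fn)(uint32_t addr);

/* Probe the first byte of the buffer and the first byte of every
 * following 4K block that the buffer touches. */
bool buffer_accessible(uint32_t start, uint32_t len, probe_fn can_access)
{
    if (len == 0)
        return true;                    /* nothing to access */
    uint32_t last = start + (len - 1u);
    if (last < start)
        return false;                   /* buffer wraps around */

    uint32_t addr = start;
    for (;;) {
        if (!can_access(addr))
            return false;
        /* First byte of the next 4K block after addr. */
        uint32_t next = (addr | (PROBE_STEP - 1u)) + 1u;
        if (next == 0 || next > last)
            break;                      /* no further block is touched */
        addr = next;
    }
    return true;
}

/* Demo policy for illustration: storage below X'3000' is accessible. */
bool demo_can_access(uint32_t addr)
{
    return addr < 0x00003000u;
}
```

A buffer of 32 bytes starting at X'1FF0' needs two probes and passes under the demo policy, while the same length starting at X'2FF0' fails on the second probe. The cost is one probe per 4K block touched, which is in line with the expectation that the overhead stays small.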

Dunno if there needs to be or should be some checking re "I am not
stealing another application's socket".

So a privileged instruction can be executed in supervisor state, right?

So a specialized address space like an FTP server could be made APF
authorized, which would allow it to (switch into supervisor state and)
issue the privileged instruction without issuing any SVC.

If a hobbyist wants to execute the instruction natively from their own
programs, then they could install and use a "magic" SVC on their system
if that's what they want. In practise, I'd only use such a bypass if
the application code was to run as a general subroutine or function call
for other applications. For specific stand alone programs including
specific TSO commands, I'd probably make do with having to APF authorize
those programs. But that's just me.

Any additional pain would be limited to op-code x'75' assembler coders
who should be able to cope, and using interfaces like EZASOKET and/or
EZASMI should avoid any application code changes.

I probably don't get a vote, but the privileged instruction path seems
the way to go for me. May be a bit of a pain for some plebby users,
but hey, who said mainframes are easy to use?
:)

Cheers,
Greg P.

Ivan Warren ivan@vmfacility.fr [hercules-390]

2018-12-10 17:38:43 UTC

Post by ***@yahoo.com.au [hercules-390]
I don't see how you are going to do that...
EZASOKET requires AMODE31. How is that possible on MVS 3.8J?

Did you read it?

24-bit mode is perfectly acceptable (unless you are running this MVS/380
(barf) thing).

--Ivan

Post by ***@yahoo.com.au [hercules-390]
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.3.0/com.ibm.zos.v2r3.hala001/resreqb.htm
"The EZASOKET API can be invoked while the caller is in either 31-bit
or 24-bit Amode. However, if the application is running in 24-bit
addressability mode at the time of the call, all addresses of
parameters passed by the application must be addressable in 31-bit
Amode. This implies that even if the addresses being passed reside in
storage below the 16 MB line (and therefore addressable by 24-bit
Amode programs) the high-order byte of these addresses needs to be 0."

'Fish' (David B. Trout) david.b.trout@gmail.com [hercules-390]

2018-12-10 18:56:13 UTC

This post might be inappropriate. Click to display it.

Harold Grovesteen h.grovsteen@tx.rr.com [hercules-390]

2018-12-10 21:27:44 UTC

Sounds good to me. I think we just need a
minimal SVC to start with that just gives
any user the ability to execute x'75'. I already
have code (from Gerhard) for an SVC 120
intercept and x'75' could be an additional
API call in that.

But if it's going to be a privileged instruction
behind an SVC, wouldn't it make more
sense for it to be a DIAG instruction?
BFN. Paul.

Which is exactly what I had been working on some time ago, a DIAGNOSE.
But that still requires mods to the OS, which seem to be the very
thing being avoided by X'75'.

The work Ivan is doing directly addresses this issue. Note though,
the existing user land applications using X'75', like FTP, would
require changes.

Harold Grovesteen

kerravon86@yahoo.com.au [hercules-390]

2018-12-10 21:34:21 UTC

Post by Harold Grovesteen ***@tx.rr.com [hercules-390]

But if it's going to be a privileged instruction
behind an SVC, wouldn't it make more
sense for it to be a DIAG instruction?

Which is exactly what I had been working on some time ago, a DIAGNOSE.
But that still requires mods to the OS, which seem to be the very
thing being avoided by X'75'.

The only mod to the OS is to create an
SVC that calls the DIAG, right? That
doesn't seem to be an onerous mod
to the OS to me.

BFN. Paul.

Harold Grovesteen h.grovsteen@tx.rr.com [hercules-390]

2018-12-10 21:42:10 UTC

Post by Ivan Warren ***@vmfacility.fr [hercules-390]
--Ivan
PS: I am adapting a TCP/IP stack for VM/370 which does not require
breaking any architecture constraint (based on LWIP and the Hercules
LCS device).

LWIP is written in C. Are you porting it to assembler or using a C
compiler, like Paul's port of gcc?

Harold

W Mainframe mainframew@yahoo.com [hercules-390]

2018-12-10 23:30:38 UTC

Guys,
Sorry, maybe a stupid question, but would it be possible to code instruction x'75' in VM370 and/or VMSP? I tried to port EZASOKET to VMSP running under Hercules/TK4- version, but the results were unexpected. Nothing like what I wanted... :(
Regards, Dan

Sent from Yahoo Mail for iPhone
