Stress Testing Hyperion under Windows Server, Updated
'Dan Skomsky' poodles511@sbcglobal.net [hercules-390]
2017-01-31 00:41:26 UTC
Permalink
After all the discussion of z/OS benchmarks, I decided to see how a simple
change to the .CNF file would affect my Hyperion stress test numbers. Per
the information previously pointed out by Fish, I increased my DEVTMAX value
from 8 to 16, giving me eight more device threads. As before, I ran my ten
(10) full NBENCH Assemble, Link, and Go JOBS. Absolutely nothing else has
changed, nothing whatsoever. The average wall clock time per JOB is now
6.93 minutes. That's a decrease of 1.40 minutes, or 16.80%. That's a
significant number. Here are my HERCGUI log numbers:



17:02:26.928 00000D8C HHC02272I From Sun Jan 29 17:02:26 2017 to Mon Jan 30
17:02:26 2017

17:02:26.928 00000D8C HHC02272I MIPS: 910.316468

17:02:26.928 00000D8C HHC02272I IO/s: 2900

17:02:26.928 00000D8C HHC02272I Current interval is 1440 minutes



The MIPS number went up by 98.168611, or 12.09%. Again, that's significant. But I/Os per second decreased by 336, or 10.38%.
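For reference, the deltas quoted above can be sanity-checked with a quick calculation; the inputs are simply the numbers from the September 2016 and January 2017 maxrates reports in this thread:

```python
# Sanity check of the MIPS and IO/s deltas quoted above, using the
# numbers from the two maxrates reports in this thread.
baseline_mips, new_mips = 812.147857, 910.316468
baseline_ios, new_ios = 3236, 2900

mips_gain = new_mips - baseline_mips          # 98.168611
mips_pct = 100 * mips_gain / baseline_mips    # ~12.09%
ios_drop = baseline_ios - new_ios             # 336
ios_pct = 100 * ios_drop / baseline_ios       # ~10.38%

print(f"MIPS: +{mips_gain:.6f} ({mips_pct:.2f}%)")
print(f"IO/s: -{ios_drop} ({ios_pct:.2f}%)")
```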



Again, absolutely nothing changed in my stress test except for doubling the
DEVTMAX value. Who would have thought such a small change would have
produced such a significant performance increase?
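For anyone wanting to try the same experiment, the setting lives in the Hercules configuration (.CNF) file; a minimal fragment might look like the following (the comment text is mine, and the exact semantics of special DEVTMAX values such as 0 and -1 are covered in the Hercules documentation):

```
# Hercules .cnf fragment -- only the DEVTMAX line is the change
# discussed above
DEVTMAX  16    # maximum number of device threads; was 8
```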



In the near future I will be updating the operating system to Windows Server
2012 R2. When everything is stable once more, I'll re-run my stress test.



From: hercules-***@yahoogroups.com [mailto:hercules-***@yahoogroups.com]
Sent: Sunday, September 04, 2016 12:55 PM
To: hercules-***@yahoogroups.com
Subject: RE: [hercules-390] Stress Testing Hyperion under Windows Server.

Well, I updated the BIOS and chipset microcode on my Dell box to the latest
and greatest levels. I then continued with my stress testing. This time I
submitted ten copies of NBENCH (with unique JOB names of course). Eight
immediately jumped into my free initiators and ran to completion. As the
first two NBENCH jobs completed, the remaining two pending jobs jumped into
the free initiators. All NBENCH jobs were full Assemble, link, and go
examples. For those who aren't familiar with full NBENCH, each JOB consists
of 35,000+ lines of Assembler source, executes 43 program steps, and
produces 72,000+ lines of SYSOUT. The average wall clock time per JOB was
8.33 minutes.

I decided this method of testing would maximize the mix of I/O and CPU usage
and give me a more realistic measurement of true machine performance. Below
are my HercGui numbers. The CPU number is slightly lower than before, but
now notice the I/O number.

12:06:19.242 00001344 HHC02272I From Sat Sep 03 12:06:19 2016 to Sun Sep 04
12:06:19 2016

12:06:19.242 00001344 HHC02272I MIPS: 812.147857

12:06:19.242 00001344 HHC02272I IO/s: 3236

12:06:19.242 00001344 HHC02272I Current interval is 1440 minutes

My gut feeling is that the high I/O capability can be attributed entirely to the SSDs. Per Windows Task Manager, my memory usage never went over 22% (of the 16 GB installed).

For the record, the processor is an Intel Xeon E3-1225 v3 quad-core @ 3.20 GHz, the RAM is Hyundai PC3-12800 (800 MHz), and the drives are LITEONIT LCT SSDs, SATA-III 6.0 Gb/s. All components were acquired from Dell.

From: hercules-***@yahoogroups.com [mailto:hercules-***@yahoogroups.com]
Sent: Friday, September 02, 2016 4:19 PM
To: hercules-***@yahoogroups.com
Subject: RE: [hercules-390] Stress Testing Hyperion under Windows Server.

It looks like my tests from yesterday produced nearly identical results
today. But this time, I skipped running the Assembly and link steps and
simply ran the NBENCH steps in parallel. As before, with eight concurrent
NBENCH steps executing, all HercGui and Task Manager indicators stayed maxed
out at 100%. As you can see, without running the Assembly and link steps my
I/O numbers dropped like a stone.

15:57:33.967 00000450 HHC02272I From Thu Sep 01 15:57:33 2016 to Fri Sep 02
15:57:33 2016
15:57:33.967 00000450 HHC02272I MIPS: 899.872161
15:57:33.967 00000450 HHC02272I IO/s: 527
15:57:33.967 00000450 HHC02272I Current interval is 1440 minutes

My Windows Server is at the current Microsoft update level. Over this
weekend I plan to apply some Dell BIOS and chipset updates to the T20 server
and bring it up to current level also. I will then rerun my tests. Who
knows? Maybe the numbers will change. I'll be pushing for that magic 900
MIPS level.

-----Original Message-----
From: hercules-***@yahoogroups.com [mailto:hercules-***@yahoogroups.com]
Sent: Friday, September 02, 2016 7:31 AM
To: hercules-***@yahoogroups.com
Subject: Re: [hercules-390] Stress Testing Hyperion under Windows Server.

Ivan,

Your TGV reservations with SNCF also wind up in TPF, aka Airline Control Program. TPF used to be maintained from VM, but now MVS is required, and those who relied on Pipelines under CMS had a problem.

When I did some performance measurements along the lines you suggest, in conjunction with what I thought should improve the code, my experience was that there was no correlation between what code I changed in Hercules and where performance changed.

My theory is that Hercules performance is governed by cache misses, because the footprint of running even the simplest instruction is enormous. So pushing a bit of code across a cache line might have a lot more effect than saving a single x86 instruction.
Paul, out in the real world there would at least be a database involved. If it is a transaction processor, you will at least have CICS or equivalent, if not (shudder) WebSphere.
There is also TPF (z/TPF). But I never ran it or saw it run (I think it is mainly used in the airline industry to provide centralized booking systems).
Now as far as "performance" is concerned, I once ran an (automated)
daily test on various instructions to see if any change might have had
significant impact on instruction execution times.
It was an IPLable test that ran various instructions in long unrolled loops (within a page) and punched the results. The punch cards were then incorporated into a DB and could then be processed by a PHP program and viewed in a web browser. It also relied on the timer facilities (TOD clock). I had this service up for about 3 years.
- There was obviously some sort of "heisencache" issue... I could never get very consistent results (from one day to the next I would sometimes see a 10% difference although no change had been made), possibly because of a subtle difference in host instruction execution sequence, or some background process running.
- I only ran tests on a limited set of instructions (mainly loads (L, LR, LA, LM, IC, ICM), stores (ST, STC, STCM, STM), branches, and moves (MVC)) and didn't test DAT (but in Hercules, DAT and Real are almost the same).
However, it allowed detecting whether some seemingly minor change had had a major impact through some weird side effect.
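A harness of that shape (time a long loop, then divide by the iteration count) can be sketched in a few lines; this is an illustrative reconstruction in Python, not the actual IPLable assembler test described above:

```python
import time

def ns_per_op(op, unroll=64, reps=1000):
    """Time `op` in a long loop and return nanoseconds per call,
    mimicking the TOD-clock-based unrolled-loop tests described above."""
    start = time.perf_counter_ns()
    for _ in range(reps):
        # stand-in for `unroll` copies of the instruction under test
        for _ in range(unroll):
            op()
    elapsed = time.perf_counter_ns() - start
    return elapsed / (unroll * reps)

# Example with a trivial stand-in operation; a real test would exercise
# emulated L, LR, LA, MVC, etc. and record the results for trending.
cost = ns_per_op(lambda: None)
print(f"{cost:.1f} ns/op")
```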
--Ivan
opplr@hotmail.com [hercules-390]
2017-01-31 14:09:22 UTC
Permalink
Dan wrote:

"As before, I ran my ten (10) full NBENCH Assemble, Link, and Go JOBS. Absolutely nothing else has changed, nothing whatsoever. The average wall clock time per JOB is now 6.93 minutes. That’s a decrease of 1.40 minutes, or 16.80%. That’s a significant number. Here are my HERCGUI log numbers:"

I believe someone recently posted that benchmarking is something of a 'black art'. It may have been an off-list email, anyway.

Be careful in stating 'nothing else has changed'.

NBENCH is self-adjusting: it can run more tests to get consistent results, and if the CPU is fast enough it performs more instances of each test.

In other words, on a 3 MIPS system it may run the series of tests by looping only 5 times on each, whereas on a 30 MIPS system it may loop 20 times on each. So by loading the system with multiple executions, you may have reduced the available CPU capacity so that NBENCH is now executing a different number of testing cycles.
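The self-adjustment Phil describes can be sketched as a calibration loop that keeps doubling the iteration count until a timed run is long enough; this is an illustrative sketch of the general technique, not NBENCH's actual algorithm:

```python
import time

def calibrate(test, min_seconds=0.05):
    """Keep doubling the iteration count until one timed run of `test`
    lasts at least `min_seconds`; a faster CPU therefore ends up
    running more iterations per measurement."""
    iterations = 1
    while True:
        start = time.perf_counter()
        for _ in range(iterations):
            test()
        if time.perf_counter() - start >= min_seconds:
            return iterations
        iterations *= 2

# A loaded (slower) system settles on a smaller count than an idle one.
n = calibrate(lambda: sum(range(500)))
print(n, "iterations per timed run")
```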

A fixed benchmark such as Dhrystone or Whetstone doesn't self-adjust like NBENCH.

Phil
williaj@sympatico.ca [hercules-390]
2017-01-31 16:49:24 UTC
Permalink
That's an amazing MIPS rate. I looked up the E3-1225 v3, and Passmark rates it at 7080 vs. the X5680, which is rated at 8767. Yet as an 8-way, the best I've seen is 350 MIPS and 2400 IO/s. I don't have SSDs, just iron drives, but they run as RAID with a 512 MB controller that delivers 280 MB/sec average at full blast.

I'm using 3.12, so if that's the sort of performance boost Hyperion gives, that's sure a good reason to upgrade.


'Dan Skomsky' poodles511@sbcglobal.net [hercules-390]
2017-01-31 18:00:47 UTC
Permalink
I’m not looking at the performance numbers generated by NBENCH itself, just the numbers from HERCGUI as to workload. I wanted to drive my machine to the max and see what HERCGUI says. As with all my ten-job stress tests, both the Windows Task Manager CPU monitor and the HERCGUI CPU monitor show 100% on all four cores. But Windows still reports that I never go over 25% usage of installed memory (16 GB). The performance increase using SSDs is simply amazing. The Dell box runs cool and quiet. The fans never turn on. Plus, you never defrag the partitions. How can you beat that?



I’m not trying to get actual benchmarks. Rather, I’m trying to stress everything to see what she’ll do flat out.



williaj@sympatico.ca [hercules-390]
2017-01-31 18:22:18 UTC
Permalink
The 350 MIPS I mentioned, that's from the maxrates command, not any benchmark. This seems very odd. You have a 4-core box running eight jobs, and I have a 12-core box running 8 jobs. So in theory, each of my emulated CPU engines should get its own core, while yours have to double up. Yet even so, you still get more than twice the speed, even though the 1225 is only about 40% faster on a per-core basis than the X5680.

Maybe I should have bought a Dell. I can't run my machine at more than around 60% for any length of time or else the fans switch to high and it's just like being on the raised floor, heh. On the rare occasions when I reboot, the bios powers them to full for five minutes, and even though the rack is in the far basement, someone invariably starts shouting down the stairs that they can hear it and it better stop soon, lol.

'Dan Skomsky' poodles511@sbcglobal.net [hercules-390]
2017-01-31 23:44:21 UTC
Permalink
Well, I ran the ten NBENCH stress JOBS again today. This time I changed all the ‘SYSPRINT DD DUMMY’ statements to ‘SYSPRINT DD SYSOUT=*’. The result is that each job now generates 108,000+ lines of SYSPRINT, up from 72,000+ lines. The average JOB wall clock time went to 6.73 minutes, down slightly from 6.93 minutes. Again, while eight NBENCH jobs were running, all my CPU cores were pegged at 100%, but my memory usage number went down to 19%. Here are my new HERCGUI numbers, which are impressive:



17:02:26.285 00000D8C HHC01603I maxrates

17:02:26.286 00000D8C HHC02272I Highest observed MIPS and IO/s rates:

17:02:26.286 00000D8C HHC02272I From Sun Jan 29 17:02:26 2017 to Mon Jan 30 17:02:26 2017

17:02:26.286 00000D8C HHC02272I MIPS: 910.316468

17:02:26.287 00000D8C HHC02272I IO/s: 2900

17:02:26.287 00000D8C HHC02272I From Mon Jan 30 17:02:26 2017 to Tue Jan 31 17:02:26 2017

17:02:26.287 00000D8C HHC02272I MIPS: 1026.198338

17:02:26.287 00000D8C HHC02272I IO/s: 1820

17:02:26.288 00000D8C HHC02272I Current interval is 1440 minutes



The drastic drop in the IO/s rate surprised me; I have no clue why. I guess SYSPRINT DD DUMMY somehow results in more I/Os than SYSOUT= files.
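For reference, the change amounts to swapping one DD statement per step; a representative fragment is below (the surrounding step JCL is omitted, and DD names other than SYSPRINT are untouched):

```
//* Before: the output is discarded by the system
//SYSPRINT DD  DUMMY
//* After: the output is routed to JES SYSOUT
//SYSPRINT DD  SYSOUT=*
```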





Mike Schwab Mike.A.Schwab@gmail.com [hercules-390]
2017-01-31 23:48:15 UTC
Permalink
Maybe because it had to do actual I/O, it waited until the I/O buffer filled up?
--
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?
'Dan Skomsky' poodles511@sbcglobal.net [hercules-390]
2017-02-01 23:32:51 UTC
Permalink
Thinking my HERCGUI numbers from yesterday were an anomaly, I re-ran my ten job NBENCH stress test again last evening. Here are the new numbers:



17:02:26.113 00000D8C HHC01603I maxrates

17:02:26.113 00000D8C HHC02272I Highest observed MIPS and IO/s rates:

17:02:26.114 00000D8C HHC02272I From Mon Jan 30 17:02:26 2017 to Tue Jan 31 17:02:26 2017

17:02:26.114 00000D8C HHC02272I MIPS: 1026.198338

17:02:26.114 00000D8C HHC02272I IO/s: 1820

17:02:26.114 00000D8C HHC02272I From Tue Jan 31 17:02:26 2017 to Wed Feb 01 17:02:26 2017

17:02:26.114 00000D8C HHC02272I MIPS: 1061.265864

17:02:26.114 00000D8C HHC02272I IO/s: 1612

17:02:26.114 00000D8C HHC02272I Current interval is 1440 minutes



I think I’ll bump the DEVTMAX value in the .CNF file from 16 to 32 and see what that does. Who knows, maybe there’s more that can be squeezed from my machine. It’s all virtual anyway.



williaj@sympatico.ca [hercules-390]
2017-02-01 23:43:35 UTC
Permalink
I downloaded NBENCH from the files area to try it, since there must be something about the different programs that affects the different MIPS rates. I have 9 NBENCH jobs running, and so far the maxrates command says MIPS = 959 and max IO/s is 3650. Task Manager says my 24 cores (HT is turned on) are at 34%. So that sounds better. Elapsed time comes in at 6.76 minutes for each job.

I did some testing and, much to my surprise, Hercules performs better with Hyper-Threading than without. I had thought that giving each thread its own core, isolated from the others, would yield better performance, since it would avoid the occasional stalls you get from the "other" HT sibling within the same core. That doesn't seem to be the case: with HT off, a single CPU gives 52 MIPS (with my homebrew test), and with HT on it goes up to 60 MIPS. I guess Windows must do a decent job of dispatching to try and avoid using both sides of a single core. Interestingly, during IPL, before the SIGP to start the other 7 CPUs, maxrates says the single engine has reached 207 MIPS. So going from a theoretical 1642 MIPS down to 959, that's fairly hefty overhead for running as an 8-way.
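Using the figures in this post, the scaling arithmetic works out as follows:

```python
# Numbers taken from the measurements quoted above.
theoretical_8way = 1642.0   # the post's theoretical 8-way figure
observed_8way = 959.0       # best observed with the NBENCH jobs running
ht_off, ht_on = 52.0, 60.0  # single-CPU MIPS from the homebrew test

efficiency = observed_8way / theoretical_8way
ht_gain = ht_on / ht_off - 1
print(f"8-way scaling efficiency: {efficiency:.1%}")  # ~58.4%
print(f"HT gain on one CPU: {ht_gain:.1%}")           # ~15.4%
```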

wsebo@yahoo.com [hercules-390]
2017-02-03 11:44:24 UTC
Permalink
Hey,
NBENCH is just JCL, right? Is this the MVS version runnable under z/OS? https://groups.yahoo.com/neo/groups/hercules-390/files/MIPs%20Testing/
All I wanted to know initially was how to benchmark my system :)
'Dan Skomsky' poodles511@sbcglobal.net [hercules-390]
2017-02-03 12:35:25 UTC
Permalink
The NBENCH JOB I’ve been running contains all the Assembler source and JCL to Assemble and Link Edit to build NBENCH. It looks like I downloaded it on 08/18/2016 as nbench-b1.zip.



williaj@sympatico.ca [hercules-390]
2017-02-04 04:52:26 UTC
Permalink
It looks like DEVTMAX does matter for NBENCH. I ran with it set to 18 and the results were not as good as with only eight. I just tried it with DEVTMAX set to 10 and have seen the best result so far: MIPS = 1053 and IO/s = 3789. Still only 207 MIPS for a single CPU during IPL, so I guess that's as good as it gets for this box.

'Dan Skomsky' poodles511@sbcglobal.net [hercules-390]
2017-02-04 11:33:41 UTC
Permalink
I bumped DEVTMAX to 32 and my MIPS number dropped to 879. I’m dropping DEVTMAX back down to 16 and leaving well enough alone.



'Dan Skomsky' poodles511@sbcglobal.net [hercules-390]
2017-02-05 21:30:42 UTC
Permalink
With DEVTMAX set back to 16, the MIPS number jumped to 1028. After more testing, I found that my maximum MIPS number came with DEVTMAX set to 14; at that setting my MIPS number reached 1179.
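For reference, the tuning discussed in this thread amounts to a one-line change in the Hercules configuration file. A minimal sketch of the relevant .cnf fragment (surrounding statements omitted; 14 was the value that worked best on this particular box, so treat the number as machine-specific, not a recommendation):

```
# Hercules configuration (.cnf) fragment -- sketch, other statements omitted
DEVTMAX  14    # maximum number of device (I/O) threads
```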



williaj@sympatico.ca [hercules-390]
2017-03-17 15:44:47 UTC
Permalink
I just bought an i7-7700K, which I think has just about the fastest single-core speed there is. With DEVTMAX defaulting to 8, maxrates reports 1222 MIPS and 6908 SIOs running with eight CPUs. A single CPU yields a maxrates peak of around 242 MIPS, although my more generic benchmark says 119 MIPS.

williaj@sympatico.ca [hercules-390]
2018-03-17 16:47:26 UTC
Permalink
I just built a new system for Hercules: an 8700K with 64GB of 2666MHz RAM and a 12TB RAID 5 array. Running eight copies of the NBENCH program with eight CPUs defined to Hercules 3.13, I get a maxrates of:

MIPS: 1464
SIOS: 3908

which is about 183 MIPS per engine. That looks to be about in the range of 2098-K series speed.

While a better RAID card would give better I/O performance, the 8700K is the current x86 processor with the fastest single-thread performance. So I guess it'll be a while before we see faster CPU rates.
poodles511@sbcglobal.net [hercules-390]
2018-03-17 17:01:37 UTC
Permalink
Impressive!
Giuseppe Vitillaro giuseppe@vitillaro.org [hercules-390]
2018-03-19 18:30:40 UTC
Permalink
Yep.

I wonder how the IBM z/PDT would perform on the same system.

Can anyone figure out a number?

Peppe.
williaj@sympatico.ca [hercules-390]
2018-03-21 01:18:31 UTC
Permalink
Can you even do that on z/PDT? I thought the maximum you could have was three engines.

My own little benchmark, an old instruction-rate calculator testing branches, RR instructions, etc. to give a general benchmark for one CPU, says the 8700K gives 119 MIPS, same as the 7700K. In comparison, on a real 2098-U05, which is 1561 MIPS total or 312 per engine, in an LPAR with 3 engines assigned (SHR, not DED) I got a result of 283 MIPS.

It's interesting that during the IPL process, while only one CPU is active, you can see much higher maxrates. On the 7700K I think I saw a 207 MIPS peak, and on the 8700K I've seen a 242 MIPS peak. Yet once all eight CPUs have been started, even while the system is just idling, no single processor can be driven hard enough to reach that high rate. Presumably Hercules "knows" that only one CPU is active and can bypass some locking that is required once the other CPUs become active.
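The speedup observed above is consistent with a single-CPU fast path. The following is a toy illustration of the idea only, not Hercules's actual code (the class and names are invented): when exactly one CPU is online, no other thread can race on the shared state, so the interlock can be skipped entirely; once more CPUs start, every access pays the lock cost.

```python
import threading

class ToyCpuPool:
    """Toy sketch: skip interlocking while only one CPU is online."""

    def __init__(self):
        self.online = 1               # number of started (emulated) CPUs
        self.lock = threading.Lock()  # stand-in for the emulator's main lock
        self.counter = 0              # shared state touched per "instruction"

    def step(self):
        if self.online == 1:
            # Fast path: a lone CPU cannot race with anyone,
            # so the acquire/release overhead is avoided entirely.
            self.counter += 1
        else:
            # Slow path: once other CPUs are active, take the lock.
            with self.lock:
                self.counter += 1

pool = ToyCpuPool()
for _ in range(1000):
    pool.step()          # uncontended fast path (pre-"CPU start")
pool.online = 8
for _ in range(1000):
    pool.step()          # locked path once more CPUs are active
print(pool.counter)      # 2000 either way; only the cost differs
```

The architectural result is identical on both paths; only the per-step synchronization cost changes, which would explain why single-CPU peak rates exceed per-engine rates once all CPUs are started.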
Giuseppe Vitillaro giuseppe@vitillaro.org [hercules-390]
2018-03-21 16:43:15 UTC
Permalink
Apologies; of course I do not own a copy of z/PDT, so I'm
almost unaware of the details of the IBM emulator. My guess
is that it's not a technical constraint, but rather a marketing "lock" ;-)

But I'm still interested in understanding the "ratio"
between the Hercules and z/PDT emulators, even
for a single core / single emulated CPU, on "high
end" Intel processors.

From what I've been reading, z/PDT outperforms
the Hercules emulator (by how much it actually does,
if it even does?), and I'm asking myself, and
this group, whether the performance gap may be due
to the fact that z/PDT is a "just in time" emulator
and not an "interpreter" like Hercules.

I hope this is a question which may legally be
asked on a public group, and that it is "on topic"
for the Hercules emulator group.

In the end this is more about understanding
the Hercules emulator than the IBM z/PDT,
which, probably, I'll never use in my life.

Peppe.
'Dave Wade' dave.g4ugm@gmail.com [hercules-390]
2018-03-21 17:19:35 UTC
Permalink
This paper, which is rather old (it appears to be dated 2011 despite the URL):

https://www.itconline.com/wp-content/uploads/2017/07/What-is-zPDT.pdf

claims z/PDT is 30% faster. I suspect that's based on Hercules V3, and I think V4 is
measurably slower...
You can legally ask. Some folks may not be able to comment due to NDA. The
NDA may prohibit them even mentioning they know anything.
Hercules was always designed to be as portable as possible. Let's face it, it
even runs on a Raspberry Pi, albeit slowly.
This leads to non-optimal code in many places.
V3 does virtually no emulation of the channel or control units; it never
returns channel or controller busy.
V4 has, I believe (I haven't looked at the code), a more complete emulation of
channels and controllers, which may account for the slowdown.
Dave
Giuseppe Vitillaro giuseppe@vitillaro.org [hercules-390]
2018-03-23 09:59:46 UTC
Permalink
Thanks for the reference, Dave, appreciated.
Of course, and judging by the number of answers I've received so
far, it looks like the NDA works quite well ;-)
Yep, I understand the rationale behind Hercules being a "pure interpreter":
written this way, the Hercules emulator can be made available everywhere there
is a reasonable C/POSIX environment.

But a JIT emulator for the Intel platform looks so appealing
to me ... ;-)

By the way. Is the qemu-s390x a JIT emulator on the Intel
platform?

Peppe.
'Dave Wade' dave.g4ugm@gmail.com [hercules-390]
2018-03-23 10:09:00 UTC
Permalink
Yes, but the only supported guest OS is Linux, so I would guess no CKD DASD
devices, no 3270...
Dave
Giuseppe Vitillaro giuseppe@vitillaro.org [hercules-390]
2018-03-23 10:11:41 UTC
Permalink
Don't guess, Dave. You may bet on this point.

Pure Linux distro environment.

Peppe.
Giuseppe Vitillaro giuseppe@vitillaro.org [hercules-390]
2018-03-23 10:24:24 UTC
Permalink
By the way: if we "cross" recent posts on the Hercules groups,
we may easily reach the conclusion that a QEMU/JIT emulator,
capable of IPLing an IBM proprietary OS, would perform
almost like a "real" mainframe.

Am I wrong, Dave?

Peppe.
'Dave Wade' dave.g4ugm@gmail.com [hercules-390]
2018-03-23 11:17:22 UTC
Permalink
Now you are guessing. A "real" mainframe consists of many interconnected
subsystems. So we know we can achieve the CPU speeds, but that's always been
the case.
However, for z/VM, how do you implement SIE, and how does that hold up under
double JIT emulation? What about CKD emulation?
Still, for many former mainframe environments in SME businesses,
even Hercules would provide totally adequate performance with suitable
pimping of the underlying hardware.
PCI Express is already inherently "multi-channel". Fibre-connected disks use
the same technology as the Storage Area Networks IBM uses for Z SANs.
https://www.lzlabs.com/ seem to feel that way as well.

The trouble is that, since I can't license zVM or zOS (or even zDOS), it is pointless
discussing it.
Dave
Ivan Warren ivan@vmfacility.fr [hercules-390]
2018-03-23 11:20:53 UTC
Permalink
The "tcg" accelerator (Tiny Code Generator) for qemu is a JIT compiler
that allows anything to run on anything (not just s390x on Intel).

Basically, you have front ends (the emulated architecture) and back ends
(the architecture you run on), and about every combination works (there
are a few restrictions).

So qemu allows you to emulate, say, an ARM processor on a
z/Arch machine, or a Power 9 processor on a MIPS system... or whatever you
choose!

If the host system supports kvm, then you can use it as well. I think
the only architectures supporting kvm are Intel, Power and s390x (the
latter requires a SIE-capable host, such as Hercules; qemu s390x tcg
itself does not provide one).

(PS: Yes, I tried it. kvm on a Linux guest running under Hercules works.)

Concerning the qemu s390x tcg implementation, it's only partial: there
are some features of z/Architecture which are not implemented at all.
It really only implements the features needed by Linux.

However, it is certainly faster than Hercules (about 6 times faster for
general-purpose compute-intensive code).

For example:
qemu-s390x:
$ openssl speed rsa1024
Doing 1024 bit private rsa's for 10s: 1420 1024 bit private RSA's in 9.81s
Doing 1024 bit public rsa's for 10s: 20085 1024 bit public RSA's in 9.80s
OpenSSL 1.1.0g  2 Nov 2017
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS
-DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM
-DAES_ASM -DAES_CTR_ASM -DAES_XTS_ASM -DGHASH_ASM -DPOLY1305_ASM
-DOPENSSLDIR="\"/usr/lib/ssl\""
-DENGINESDIR="\"/usr/lib/s390x-linux-gnu/engines-1.1\""
                  sign    verify    sign/s verify/s
rsa 1024 bits 0.006908s 0.000488s    144.8   2049.5

hercules:
~$ openssl speed rsa1024
Doing 1024 bit private rsa's for 10s: 199 1024 bit private RSA's in 9.78s
Doing 1024 bit public rsa's for 10s: 4230 1024 bit public RSA's in 9.74s
OpenSSL 1.1.0g  2 Nov 2017
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,char) des(int) aes(partial) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS
-DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT
-DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DRC4_ASM
-DAES_ASM -DAES_CTR_ASM -DAES_XTS_ASM -DGHASH_ASM -DPOLY1305_ASM
-DOPENSSLDIR="\"/usr/lib/ssl\""
-DENGINESDIR="\"/usr/lib/s390x-linux-gnu/engines-1.1\""
                  sign    verify    sign/s verify/s
rsa 1024 bits 0.049146s 0.002303s     20.3    434.3

On a side note: doing the same test with SHA256, for example, Hercules
is faster (because Hercules implements the Message Security Assist
instructions while qemu-s390x does not implement all of them).

--Ivan



Giuseppe Vitillaro giuseppe@vitillaro.org [hercules-390]
2018-03-23 12:19:42 UTC
Permalink
Thanks for the insights, Ivan, appreciated.

And, Dave, yep, I'm "wildly guessing" on top of what Ivan writes
(6x the Hercules emulator) and on top of the Hercules CPU-only
numbers (I understand the point, Dave) recently published in
a previous post.

I have no intention of thinking of an "emulated mainframe" as a "hardware mainframe";
besides being basically a UNIX person, I'm not so naive.

I have a picture of the differences, at the level of "physical systems",
collections of "particles", between a home server in the $1,000 range and
a mainframe in the $1,000,000 range, Dave.

I'm speaking only from the point of view of a hobbyist, perhaps
of people involved in education and research.

From this point of view an "emulated mainframe", running
on a $1,000 PC with the CPU-only performance of a real
mainframe CPU, is... well... appealing.

Appealing to a Linux user, besides a 370 hobbyist,
even if it doesn't IPL a proprietary IBM OS.

And in fact I have a Gentoo qemu-s390x up and running on my Intel desktop
(thanks for your advice, Ivan), as I already had under the Hercules
emulator.

Peppe.


w.f.j.mueller@gsi.de [hercules-390]
2018-04-04 18:12:19 UTC
Permalink
Hi,

Hercules is in fact a simple interpreter: it reads the S/390 instructions one by one and, for each S/390 instruction, executes a native handling routine that implements it. That's a simple linear process. z/PDT is a binary translator (https://en.wikipedia.org/wiki/Binary_translation): the S/390 code is first split into basic blocks, then translated on the fly (JIT) into native instructions, which are then executed. The translation process can do optimizations; an obvious one is to eliminate the calculation of condition-code updates when the condition code is overwritten before it is used within a basic block. The CPU time spent on translation is well invested when the basic block is executed often. This clearly works best for S/390 instructions that can be translated into very few native instructions. For complex instructions, like decimal arithmetic, the translator too can only emit calls to run-time library functions. The same is true for S/370-style hex floating-point arithmetic, which can't be translated into a few native instructions. In all those cases the difference between Hercules and z/PDT comes down to the efficiency of the routines handling the complex instructions. So it's reasonable to expect that a plain integer workload will run a lot faster on z/PDT, while FORTRAN code using hex floating point, or COBOL or PL/I code using decimal arithmetic heavily, will gain substantially less.

With best regards, Walter
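Walter's interpreter-vs-translator distinction can be sketched in miniature. This is a toy model only (not z/PDT's or Hercules's actual code; the two-instruction "block" and handlers are invented for illustration): the interpreter recomputes the condition code after every instruction, while the "translated" block, compiled once, keeps only the final, live condition-code update and elides the dead ones.

```python
def interpret(block, state):
    """Interpreter: dispatch one instruction at a time, updating cc every time."""
    for op, a, b, dst in block:
        if op == "ADD":
            state[dst] = state[a] + state[b]
        elif op == "SUB":
            state[dst] = state[a] - state[b]
        # cc is recomputed after *every* instruction, whether used or not
        state["cc"] = 0 if state[dst] == 0 else (1 if state[dst] > 0 else 2)
    return state

def translate(block):
    """'Compile' the block once; only the last instruction's cc is live,
    so earlier cc updates are omitted from the generated code."""
    def compiled(state):
        last = len(block) - 1
        for i, (op, a, b, dst) in enumerate(block):
            if op == "ADD":
                state[dst] = state[a] + state[b]
            elif op == "SUB":
                state[dst] = state[a] - state[b]
            if i == last:  # cc of earlier instructions is dead: elided
                state["cc"] = 0 if state[dst] == 0 else (1 if state[dst] > 0 else 2)
        return state
    return compiled

block = [("ADD", "r1", "r2", "r3"), ("SUB", "r3", "r1", "r4")]
s1 = interpret(block, {"r1": 5, "r2": 7, "cc": 0})
s2 = translate(block)({"r1": 5, "r2": 7, "cc": 0})
assert s1 == s2  # same architectural result, fewer cc computations
```

Both paths leave identical architectural state; the translated block simply does less work per execution, which is where the JIT's advantage on plain integer code comes from.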
Giuseppe Vitillaro giuseppe@vitillaro.org [hercules-390]
2018-04-05 08:32:13 UTC
Permalink
Thanks for the "formal" info, Walter.

I'd have bet z/PDT had a JIT translator ;-)

Like the QEMU s390x tcg emulator, if I've got the picture right.

Peppe.

Ivan Warren ivan@vmfacility.fr [hercules-390]
2018-03-19 18:32:41 UTC
Permalink
Hey !

What is the guest system, and where can NBENCH be found?

Thanks,

--Ivan
williaj@sympatico.ca [hercules-390]
2018-03-20 00:45:55 UTC
Permalink
NBENCH is in the files area, under the MIPs Testing folder.


williaj@sympatico.ca [hercules-390]
2018-03-29 02:22:28 UTC
Permalink
Did some more testing today on the 8700K

Six CPUs: 1402 MIPS / 5786 SIOs, single CPU = 233 MIPS
Five CPUs: 1205 MIPS, single = 241
Four CPUs: 969 MIPS, single = 242
Eight CPUs: 1413 MIPS, single = 176

It seems Hercules does best as a six-way processor. This is true even on my two dual-Xeon boxes, where there are way more cores than there are CPU threads. At six CPUs or fewer you seem to get the same performance per engine as what you see during IPL, when only one CPU is active. Go past six, and the individual performance suffers. For single-engine performance, four or fewer seems to yield the best result.
'Sternbach, William' william.sternbach@baml.com [hercules-390]
2018-03-29 11:16:17 UTC
Permalink
Is this with hyper-threading On or Off?

'\'Fish\' (David B. Trout)' david.b.trout@gmail.com [hercules-390]
2018-03-29 18:51:05 UTC
Permalink
Post by 'Sternbach, William' ***@baml.com [hercules-390]
Is this with hyper-threading On or Off?
Excellent question! I too am curious about that.

When I bought my first multi-core system, probably 10 years ago, my own testing revealed that both Hercules and Microsoft's Visual C++ compiler performed better when hyper-threading was disabled. (And before you ask: the version of Windows I was using at the time WAS hyper-thread aware!)

williaj? (tsagoth??) What do the numbers look like when hyper-threading is disabled/enabled?

Thanks!
--
"Fish" (David B. Trout)
Software Development Laboratories
http://www.softdevlabs.com
mail: ***@softdevlabs.com
williaj@sympatico.ca [hercules-390]
2018-03-29 23:04:17 UTC
Permalink
Ok, with HT off I get:

mips/sios
8-way: 1386/2809 = 173
6-way: 1430/1093 = 238
4-way: 983/n/a = 245

So, a modest improvement. I've been puzzling over why the SIO rate gets lower with each run. Could it be that an I/O satisfied by a cache hit isn't counted? I'm running flat CKD images, so there's no compression involved. W10 in the 8-way case reported the CPU running at 102% of capacity, heh.

The 8700K is a six-core CPU, so it's not too surprising that 8-way does so badly, although I thought it would have been a bit better with HT on than it actually is.

The big drop in SIOs I attribute to the lack of processor capacity to run the device threads.
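The per-engine figures after the equals signs in the table above are just total MIPS divided by the CPU count. A quick sketch in Python (the numbers are copied from the post; the dictionary layout and function name are mine):

```python
# MIPS per engine for williaj's HT-off runs (figures from the post above).
ht_off = {
    8: {"mips": 1386, "sios": 2809},
    6: {"mips": 1430, "sios": 1093},
    4: {"mips": 983,  "sios": None},  # SIO rate not recorded for 4-way
}

def mips_per_engine(n_cpus: int) -> int:
    """Total MIPS divided by the number of emulated CPUs (truncated)."""
    return ht_off[n_cpus]["mips"] // n_cpus

for n in sorted(ht_off, reverse=True):
    print(f"{n}-way: {mips_per_engine(n)} MIPS per engine")
```

This reproduces the 173/238/245 per-engine figures quoted above, which is why the 4-way run looks best on a per-engine basis even though its total MIPS is lowest.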
williaj@sympatico.ca [hercules-390]
2018-03-29 21:37:47 UTC
Permalink
This is with hyper-threading on. In my past testing on the dual Xeon machines, performance got worse with HT off. That surprised me: I had expected one thread per core to be faster, since it avoids the cases where the second thread in a core stalls because it needs a resource held by the other thread. I don't have nbench numbers from then (I wasn't running nbench at the time), but on my single-thread speed test the difference was 5-12 MIPS. Not a lot, but noticeable.

Fish, I'll turn off HT on the 8700K and run all the sequences again. FWIW, I ran my little single-thread test on a real zBC12. The spec says one engine in that box is 371 MIPS; my little program says 475 MIPS. That shows the benefit of real iron, with its cache and out-of-order execution, because my test usually comes out a little on the low side.
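For what it's worth, the gap between the rated and measured zBC12 figures works out to roughly a quarter more than spec. A trivial sketch (the 371 and 475 MIPS figures come from the paragraph above):

```python
# Rated vs. measured single-engine MIPS on the real zBC12
# (figures quoted in the post above).
rated_mips = 371     # published spec figure for one zBC12 engine
measured_mips = 475  # williaj's single-thread test result

overshoot_pct = (measured_mips - rated_mips) / rated_mips * 100
print(f"measured exceeds rated by {overshoot_pct:.0f}%")
```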

Minor bug in 3.13: if you vary a CP offline, the text line says OFFLINE fine, but if you hit ESC and toggle to the other display, the offline CP is no longer shown when you toggle back. If you vary it online, it doesn't reappear until you toggle the display again.

Also, what does it mean when the CPU %-busy line for a CP shows grey asterisks rather than white ones?


---In hercules-***@yahoogroups.com, <***@...> wrote:

Is this with hyper-threading On or Off?

From: hercules-***@yahoogroups.com [mailto:hercules-***@yahoogroups.com]
Sent: Wednesday, March 28, 2018 10:22 PM
To: hercules-***@yahoogroups.com
Subject: [hercules-390] Re: Hercules performance



Did some more testing today on the 8700K

Six CPUs:   1402 MIPS / 5786 SIOs, single CPU = 233 MIPS
Five CPUs:  1205 MIPS, single = 241
Four CPUs:   969 MIPS, single = 242
Eight CPUs: 1413 MIPS, single = 176

It seems Hercules does best as a six-way processor. That holds even on my two dual Xeon boxes, where there are far more cores than there are CPU threads. At six CPUs or fewer you seem to get the same performance per engine as you see during IPL, when only one CPU is active. Go past six, and per-engine performance suffers. For single-engine performance, four or fewer seems to yield the best result.
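One way to see the six-way knee in those numbers is to compare each run's per-engine rate against the 4-way run, which has the best per-engine figure. A minimal sketch (the MIPS totals come from the post above; choosing the 4-way run as the baseline is my own framing):

```python
# Total MIPS per HT-on configuration on the 8700K (from the post above).
runs = {4: 969, 5: 1205, 6: 1402, 8: 1413}

def per_engine(n_cpus: int) -> float:
    """Average MIPS per emulated CPU for the n-way run."""
    return runs[n_cpus] / n_cpus

baseline = per_engine(4)  # best per-engine rate observed

for n in sorted(runs):
    print(f"{n}-way: {per_engine(n):.0f} MIPS/engine "
          f"({per_engine(n) / baseline:.0%} of the 4-way rate)")
```

The 4-, 5-, and 6-way runs all land within a few percent of the baseline, while the 8-way run drops to roughly three-quarters of it, matching the "past six and performance suffers" observation.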

