IPL of CD .ins file yields immediate disabled wait state, SLES11
tuxjarboe
2009-04-10 14:04:24 UTC
I'm having trouble IPLing the starter SLES11 system using Hercules 3.06. The IPL method should be the equivalent of an LPAR load from CD-ROM. The deployment.pdf included in SuSE's docu directory didn't reveal any obvious differences that would explain why the procedure that worked for my SLES10 SP2 installation under Hercules no longer works with the SLES11 media.

ipl /path/to/suse.ins now results in:

HHCCP007I CPU0000 architecture mode set to ESA/390
HHCCP011I CPU0000: Disabled wait state
PSW=000A0000 00000000
Command ==>
CPU0000 PSW=000A000000000000 24M.W..... instcount=2
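
For readers unfamiliar with the PSW display above, here is a minimal Python sketch of how the reported value can be read. It is not part of Hercules, and the function name `decode_esa390_psw` is made up for illustration. It extracts a few ESA/390 PSW fields: the I/O and external interrupt masks (bits 6 and 7), the wait bit (bit 14) and the instruction address. The sketch treats "wait bit on, both masks off" as a disabled wait, which is a simplification of what the emulator actually checks.

```python
def decode_esa390_psw(psw_hex: str) -> dict:
    """Decode a few fields of a 64-bit ESA/390 PSW given as hex text."""
    psw = int(psw_hex.replace(" ", ""), 16)

    def bit(n: int) -> int:
        # PSW bits are numbered from 0 at the most significant end.
        return (psw >> (63 - n)) & 1

    fields = {
        "io_mask": bit(6),
        "external_mask": bit(7),
        "machine_check_mask": bit(13),
        "wait": bit(14),
        "problem_state": bit(15),
        "amode31": bit(32),
        "address": psw & 0x7FFFFFFF,
    }
    # Simplified rule for this sketch: waiting with I/O and external
    # interrupts masked off means nothing can wake the CPU up again.
    fields["disabled_wait"] = bool(
        fields["wait"] and not fields["io_mask"] and not fields["external_mask"]
    )
    return fields


stopped = decode_esa390_psw("000A0000 00000000")
print(stopped["disabled_wait"], hex(stopped["address"]))
```

The same function can be applied to the first eight bytes of storage displayed later in the thread (`00080000 80000298`), which decode to a non-wait PSW pointing at address 0x298.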


The new SLES11 suse.ins is:
* SuSE Linux for zSeries Installation/Rescue System
vmrdr.ikr 0x00000000
initrd.off 0x0001040c
initrd.siz 0x00010414
initrd 0x00800000
parmfile 0x00010480
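
As an illustration of how the listing above can be read, the following Python sketch parses it into (file name, load address) pairs. The format rules are only inferred from the listing itself (lines starting with `*` are comments, every other line is a file name followed by a hexadecimal address), and `parse_ins` is a hypothetical helper, not something Hercules or SuSE provides.

```python
SLES11_INS = """\
* SuSE Linux for zSeries Installation/Rescue System
vmrdr.ikr 0x00000000
initrd.off 0x0001040c
initrd.siz 0x00010414
initrd 0x00800000
parmfile 0x00010480
"""


def parse_ins(text: str) -> list[tuple[str, int]]:
    """Return (file name, load address) pairs from .ins file text."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '*' carry no data.
        if not line or line.startswith("*"):
            continue
        name, address = line.split()
        entries.append((name, int(address, 16)))
    return entries


for name, address in parse_ins(SLES11_INS):
    print(f"{name:<12} -> 0x{address:08X}")
```

Compared with the SLES10 SP2 file, the only structural difference this exposes is the extra `initrd.off` entry.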

All files are present. For reference, the SLES10SP2 suse.ins is fairly similar and works without problems for SLES10:
* SuSE Linux for zSeries Installation/Rescue System
vmrdr.ikr 0x00000000
initrd.siz 0x00010414
initrd 0x00800000
parmfile 0x00010480

Has anyone already been through a SLES11 LPAR IPL from CD, or can get me started in the right direction on this? I'm happy to provide any additional information that may be helpful (hercules.cnf, log, etc); I didn't want to go overboard in case this was something obvious for the hercules or mainframe regulars.

Thanks,
~ Daniel
Harold Grovesteen
2009-04-11 13:00:53 UTC
It looks like vmrdr.ikr is not the same between SLES10 (it worked) and
SLES11 (it doesn't work). It has a disabled wait PSW in the IPLPSW
assigned storage locations 0-7.

However, if you Google "vmrdr.ikr SLES11" you will not find any hits for
SLES11 but you will find a few for SLES10. The Novell link for SLES10
would suggest that it would not work for SLES10, either, not that I am
questioning what you said. The Novell link states that vmrdr.ikr is
intended for IPL from a VM reader device, a card reader device to
Hercules, and that once IPL'd the install would proceed from an FTP
server located elsewhere, which of course could be on the same system
running Hercules. So, I take your word without question when you say it
worked for SLES10, but how is beyond my knowledge. And, the SLES10 base
implementation might in fact differ from the SP2 implementation which
could account for the difference.

Because of the differences in how a CDROM IPL works and a device based
IPL works, it is easy to see why vmrdr.ikr would not function for CDROM
if it were intended for a card reader IPL.

This actually looks like an excellent question for the standard Linux
mainframe mailing list rather than this one. They discuss both z/VM and
LPAR installation issues on that list: LINUX-390-***@public.gmane.org

Harold Grovesteen
tuxjarboe
2009-04-13 11:43:41 UTC
Post by Harold Grovesteen
This actually looks like an excellent question for the standard Linux
mainframe mailing list rather than this one. They discuss both z/VM and
Harold Grovesteen
Thanks for the suggestion Harold; I'll ask on the Marist list.

~ Daniel
Harold Grovesteen
2009-04-13 18:41:49 UTC
I am going to make the following suggestion after seeing the responses
on the Marist list:

The assumption that has been made is that the files listed in your .ins
file were in fact loaded. That assumption needs to be validated.

Please display real storage for each of the starting addresses in your
.ins file. This should validate that what you think got loaded did get
loaded. The command format is:

r hexaddr.len

So:
r 00000000.1F will display the first 32 bytes of storage. If what is
displayed is other than 0x00 then something did in fact get loaded.
r 0001040C.4
etc.
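
To avoid typing each display command by hand, the list can be generated from the load addresses. The Python sketch below is only an illustration: `display_command` is a hypothetical helper, and it assumes, based on the `.1F` example above showing 32 bytes, that the value after the dot is the byte count minus one.

```python
# Load addresses copied from the SLES11 suse.ins quoted in this thread.
LOAD_ADDRESSES = {
    "vmrdr.ikr": 0x00000000,
    "initrd.off": 0x0001040C,
    "initrd.siz": 0x00010414,
    "initrd": 0x00800000,
    "parmfile": 0x00010480,
}


def display_command(address: int, nbytes: int) -> str:
    """Build an 'r addr.len' command that should show nbytes of storage.

    Assumption: the operand after the dot is (length - 1), as suggested
    by 'r 00000000.1F' displaying the first 32 bytes.
    """
    return f"r {address:08X}.{nbytes - 1:X}"


for name, address in LOAD_ADDRESSES.items():
    print(f"{display_command(address, 32)}    ({name})")
```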

If you do not see what you would expect to see then we have to figure
out why you are not.

Just a suggestion,
Harold Grovesteen
tuxjarboe
2009-04-14 13:29:53 UTC
Post by Harold Grovesteen
The assumption that has been made is that the files listed in your .ins
file were in fact loaded. That assumption needs to be validated.
Please display real storage for each of the starting addresses in your
.ins file. This should validate that what you think got loaded did get
r hexaddr.len
r 00000000.1F will display the first 32 bytes of storage. If what is
displayed is other than 0x00 then something did in fact get loaded.
r 0001040C.4
etc.
Thanks for thinking of that suggestion. Based on the response on the Linux-390 Marist list, I pulled SVN revision 5323 last night. Incidentally, the make blew up in po, so I had to run ./configure --disable-nls. Now, doing the displays you mentioned with that build...

# Start up Hercules (storage is empty):

HHCAO001I Hercules Automatic Operator thread started;
tid=76FA7B90, pri=0, pid=11594
r 0.1f
R:0000000000000000:K:00=00000000 00000000 00000000 00000000 ................
R:0000000000000010:K:00=00000000 00000000 00000000 00000000 ................

# IPL the SLES11 .ins and get the disabled wait state; storage is no longer empty:

ipl ./rdr/sles11/suse.ins
HHCCP007I CPU0000 architecture mode set to ESA/390
HHCCP011I CPU0000: Disabled wait state
PSW=000A0000 00000000
r 0.1f
R:00000000:K:06=00080000 80000298 02000018 60000050 .......q....-..&
R:00000010:K:06=02000068 60000050 40404040 40404040 ....-..&
r 1040c.f
R:0001040C:K:06=00800000 00000000 00C4CC37 00000000 .........D......
r 10414.f
R:00010414:K:06=00C4CC37 00000000 00000000 00000000 .D..............
r 800000.1f
R:00800000:K:06=1F8B0808 5C47B449 0203696E 69747264 ....*......>....
R:00800010:K:06=00AC590B 7C935596 BF5F9A36 01298452 ***@l.o.^....d.
r 10480.1f
R:00010480:K:06=72616D64 69736B5F 73697A65 3D363535 ./_...,^..:.....
R:00010490:K:06=33362072 6F6F743D 2F646576 2F72616D ....??......../_
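
The character column in the dump above looks like noise, presumably because the console translates storage as EBCDIC while the parmfile and initrd contents are ASCII. As a cross-check, this Python sketch decodes the hex words copied from the dump. The helper name `words_to_bytes` is made up for illustration, and the gzip field offsets follow the standard gzip header layout.

```python
def words_to_bytes(words: str) -> bytes:
    """Turn space-separated hex words from a storage display into bytes."""
    return bytes.fromhex(words.replace(" ", ""))


# Storage at 0x10480 (parmfile), both displayed lines.
parm = words_to_bytes(
    "72616D64 69736B5F 73697A65 3D363535 "
    "33362072 6F6F743D 2F646576 2F72616D"
)
print(parm.decode("ascii"))

# Storage at 0x800000 (initrd): gzip magic, then the stored file name.
initrd = words_to_bytes("1F8B0808 5C47B449 0203696E 69747264")
is_gzip = initrd[:2] == b"\x1f\x8b"
has_name = bool(initrd[3] & 0x08)        # FLG bit 3 = FNAME present
stored_name = initrd[10:16].decode("ascii")
print(is_gzip, has_name, stored_name)

# Storage at 0x1040C (initrd.off) and 0x10414 (initrd.siz), big-endian.
initrd_offset = int.from_bytes(words_to_bytes("00800000"), "big")
initrd_size = int.from_bytes(words_to_bytes("00C4CC37"), "big")
print(hex(initrd_offset), initrd_size)
```

Read this way, the parmfile holds a kernel command line, the initrd starts with a gzip header, and the offset word matches the initrd load address from suse.ins.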

# The files specified in the .ins file do appear to be loaded. For comparison's sake I loaded the SLES10SP2 vmrdr.ikr to see if it looked the same...

loadcore ./rdr/sles10SP2/vmrdr.ikr
HHCPN112I Loading ./rdr/sles10SP2/vmrdr.ikr to location 0
HHCPN113I 7161160 bytes read from ./rdr/sles10SP2/vmrdr.ikr
r 0.1f
R:00000000:K:06=00080000 80000298 02000018 60000050 .......q....-..&
R:00000010:K:06=02000068 60000050 40404040 40404040 ....-..&

# Both start off the same. I checked outside of Hercules and the first 671 bytes of the files are identical. For grins, I restarted with the SLES10SP2 vmrdr.ikr and the SLES11 initrd/parmfile/etc.

restart
HHCPN038I Restart key depressed
CPU0000: SIGP Set architecture mode (12) CPU0000, PARM 00000001: CC 0
HHCCP007I CPU0000 architecture mode set to z/Arch
HHCCP014I CPU0000: Addressing exception CODE=0005 ILC=6
PSW=00001001 80000000 0000000000011176 INST=E50150000000 TPROT 0(5),0(0) test_protection
R:0000000040000000: Translation exception 0005
R:0000000000000000:K:06=00080000 80000298 000A0000 00000000 .......q........
R0=0000000000728F04 R1=0000000000020000 R2=0000000000000000 R3=00000000006DC610
R4=0000000000000000 R5=0000000040000000 R6=0000000000000000 R7=0000000000000000
R8=0000000000000000 R9=0000000040000000 RA=0000000000000010 RB=0000000000000000
RC=0000000000010400 RD=0000000000011002 RE=0000000000011058 RF=000000000069FF60
HHCCP014I CPU0000: Special-operation exception CODE=0013 ILC=6
PSW=00001001 80000000 0000000000011290 INST=C80000000000 MVCOS 0(0),0(0),0 move_with_optional_specifications
R:0000000000000000:K:06=00080000 80000298 000A0000 00000000 .......q........
R:0000000000000000:K:06=00080000 80000298 000A0000 00000000 .......q........
R0=0000000000000000 R1=0000000000011290 R2=0000000000000000 R3=00000000006DC628
R4=0000000040020000 R5=0000000040020000 R6=0000000000000000 R7=0000000000000000
R8=0000000000000001 R9=0000000040000000 RA=000000000000000F RB=0000000000000000
RC=00000000006DC5F8 RD=0000000000011002 RE=0000000000011058 RF=000000000069FF60
HHCCP014I CPU0000: Operation exception CODE=0001 ILC=4
PSW=00001001 80000000 00000000000112AE INST=B9AF0011 PFMF 1,1 perform_frame_management_function
R:00000000FFFFFFFF: Translation exception 0005
R:00000000FFFFFFFF: Translation exception 0005
R0=0000000000000000 R1=00000000FFFFFFFF R2=0000000000000000 R3=00000000006DC628
R4=0000000040020000 R5=0000000040020000 R6=0000000000000000 R7=0000000000000000
R8=0000000000000001 R9=0000000040000000 RA=000000000000000F RB=0000000000000000
RC=00000000006DC5F8 RD=0000000000011002 RE=0000000000011058 RF=000000000069FF60
HHCCP014I CPU0000: Operation exception CODE=0001 ILC=4
PSW=00000001 80000000 00000000000112CC INST=B9AB0001 ????? , ?
R:0000000000000000:K:06=00080000 80000298 000A0000 00000000 .......q........
R:0000000000000000:K:06=00080000 80000298 000A0000 00000000 .......q........
R0=0000000000000000 R1=0000000000000000 R2=0000000000000000 R3=00000000006DC628
R4=0000000040020000 R5=0000000040020000 R6=0000000000000000 R7=0000000000000000
R8=0000000000000001 R9=0000000040000000 RA=000000000000000F RB=0000000000000000
RC=00000000006DC5F8 RD=0000000000011002 RE=0000000000011058 RF=000000000069FF60
HHCCP014I CPU0000: Operation exception CODE=0001 ILC=4
PSW=04002001 80000000 00000000006AFA50 INST=B2162000 ????? , ?
V:00000000006DC4A0:K:06=08000000 00000000 00000000 00000000 ................
R0=0000000000000040 R1=00000000FFFFFFDA R2=00000000006DC4A0 R3=0000000000000001
R4=00000000006A07C0 R5=0000000000000100 R6=00000000006A0290 R7=0000000000000000
R8=00000000006D4548 R9=000000000000C000 RA=0000000002244200 RB=0000000000000040
RC=000000000069FF98 RD=0000000000492DE0 RE=000000000069FEF8 RF=000000000069FE58
C0=0000000014354202 C1=00000000006D7007 C2=0000000000011370 C3=0000000000000000
C4=0000000000000000 C5=0000000000011370 C6=0000000000000000 C7=00000000006D7007
C8=0000000000000000 C9=0000000000000000 CA=0000000000000000 CB=0000000000000000
CC=0000000000000000 CD=00000000006D7007 CE=00000000C0000000 CF=0000000000000000
HHCCP041I SYSCONS interface active
Linux version 2.6.16.60-0.21-default (***@buildhost) (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP Tue May 6 12:41:02 UTC 2008
We are running native (64 bit mode)
Detected 1 CPU's
Boot cpu address 0
Built 1 zonelists
Kernel command line: root=/dev/ram0 ro
PID hash table entries: 4096 (order: 12, 131072 bytes)
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Memory: 1010176k/1048576k available (4671k kernel code, 0k reserved, 2089k data, 212k init)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 256
checking if image is initramfs...

etc

No matter which SLES11 .ikr file is specified in the .ins (cd.ikr, tapeipl.ikr, vmrdr.ikr), I go into an immediate disabled wait state. Yet if I specify ../sles10SP2/vmrdr.ikr in suse.ins, off it goes. I don't get it, but will post this to the LINUX-390 list too in case anyone has other ideas.

Thanks,
~ Daniel
tuxjarboe
2009-04-14 21:14:50 UTC
Turning on instruction stepping increased the displayed instcount from 2 to 11, and revealed that the SLES11 installer was checking CPU Model and not liking 9672 (rightfully so). Changing the model number to something more current made the SLES11 installer happy.

Thanks,
~ Daniel
Ivan Warren
2009-04-14 21:39:17 UTC
Post by tuxjarboe
Turning on instruction stepping increased the displayed instcount from 2 to 11, and revealed that the SLES11 installer was checking CPU Model and not liking 9672 (rightfully so). Changing the model number to something more current made the SLES11 installer happy.
Thanks,
~ Daniel
Arghh..

We already saw that one happening before !

Had completely forgotten about it...

Actually..

You can specify ANY CPU model - *except* those that designate a g3, g4,
g5, g6 (for s390x) or z800, z900, z890 or z990 if z9 or z10 is specified
as a prerequisite for the kernel !

That means, as strange as it seems, that specifying a z800 CPU model on
a z9/10 only linux kernel will break, but specifying you are running on
a 4341 will make linux happy ! Go figure ...

(cf : arch/s390/kernel/head.S in the kernel tree)..
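
The behaviour described above amounts to a deny-list rather than a "new enough" test. The Python sketch below models that idea for a kernel that requires a z9. It is not the kernel's code: the helper `passes_model_check` is hypothetical, and the machine-type numbers attached to each generation are filled in here as an assumption, not taken from the thread.

```python
# Machine types assumed for the generations named above; a kernel built
# for z9 is modelled as refusing exactly these and nothing else.
REJECTED_FOR_Z9_KERNEL = {
    "9672": "G3-G6",
    "2064": "z900",
    "2066": "z800",
    "2084": "z990",
    "2086": "z890",
}


def passes_model_check(cpu_model: str) -> bool:
    """Return True unless the model is on the deny-list."""
    return cpu_model.strip() not in REJECTED_FOR_Z9_KERNEL


for model in ("9672", "2066", "2094", "4341"):
    verdict = "boots" if passes_model_check(model) else "disabled wait"
    print(f"CPU model {model}: {verdict}")
```

Because the check only looks for known-old models, an unrelated value such as 4341 slips through, which is the oddity pointed out above.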

--Ivan


Harold Grovesteen
2009-04-15 10:15:02 UTC
Of course, being open source one is free to patch this check and have it
ignored. So why they bothered is beyond me.

Daniel, glad you got it working.

Harold Grovesteen
Ivan Warren
2009-04-15 16:34:55 UTC
Post by Harold Grovesteen
Of course, being open source one is free to patch this check and have it
ignored. So why they bothered is beyond me.
Actually (cf the linux-390 list), they are taking particular interest in
solving the issue - or at least - mitigating the effects !

The IBM folks in Germany responsible for maintaining the s390(x) linux
port are - for all I know - a bunch of highly motivated individuals..

The last interaction I had with them was because of the 2.6.26
introduced issue with the PRSSD I/O length (which was working on their
environment - but wasn't working on herc *and* was contrary to
documented behavior - which was the decisive point) - and they were
quick to admit they did make a mistake[1], corrected it straight away
(as soon as it was reported) - leading to a patch being sent *and*
accepted in a week or so.

So, let me not share your pessimism ;)

--Ivan

[1] Incidentally, the problem had been there for more than a year.. but
they started using some pre-coded structures only recently (pav/hyperpav
support I think) - and because it worked on their own rig - they didn't
look any further. This also means the newer storage boxes do not work as
documented ! scary !


Harold Grovesteen
2009-04-16 01:07:58 UTC
I was going to respond, but my message on this list was sent before I
received any of those others on the linux-390 list. I usually read my
home email during the week around 4:30-5:00 a.m. before I go to work.
It seems they are trying to intelligently catch, before they occur,
problems that would result from gcc-generated code that uses
instructions not supported by the processor. Good intentions there, but
we all know where that road leads. At least it explains why they
bothered (which turned out to have a technical foundation rather than
some marketing objective). And, it looks like they are trying to
provide a more intelligible response and detection.

A real future benefit may result from this small change. So, I am once
again amazed at the power of open source methodologies.

Harold
Roger Bowler
2009-04-17 12:27:25 UTC
Post by Ivan Warren
[1] Incidentally, the problem had been there for more than a year.. but
they started using some pre-coded structures only recently (pav/hyperpav
support I think) - and because it worked on their own rig - they didn't
look any further. This also means the newer storage boxes do not work as
documented ! scary !
AFAICS the newer storage boxes are not documented at all. The last
DASD documentation was for the ESS (SC26-7298-01 System/390 Command
Reference 2105) published in June 2000. There is no corresponding CCW
documentation for the DS6800 and DS8000 (device types 1750 and 2107).
Now that *is* scary!

--
Regards,
Roger Bowler
http://perso.wanadoo.fr/rbowler
Hercules "I can't believe it's not a mainframe!"

P.S. I shall be delighted if someone can prove me wrong and point me
to the documentation.

Greg Smith
2009-04-15 21:45:48 UTC
Post by tuxjarboe
Turning on instruction stepping increased the displayed instcount from 2 to 11, and revealed that the SLES11 installer was checking CPU Model and not liking 9672 (rightfully so). Changing the model number to something more current made the SLES11 installer happy.
One comment on the instruction count. The count is not an exact value
but an approximation. The loop in cpu.c looks like:

    regs.instcount++;
    EXECUTE_INSTRUCTION(ip, &regs);

    do {
        UNROLLED_EXECUTE(&regs);
        UNROLLED_EXECUTE(&regs);
        UNROLLED_EXECUTE(&regs);
        UNROLLED_EXECUTE(&regs);
        UNROLLED_EXECUTE(&regs);
        UNROLLED_EXECUTE(&regs);

        regs.instcount += 12;

        UNROLLED_EXECUTE(&regs);
        UNROLLED_EXECUTE(&regs);
        UNROLLED_EXECUTE(&regs);
        UNROLLED_EXECUTE(&regs);
        UNROLLED_EXECUTE(&regs);
        UNROLLED_EXECUTE(&regs);
    } while (!INTERRUPT_PENDING(&regs));

That is, in the while loop, we only update the instruction counter every
12 instructions. Note we execute 6 instructions, update the counter,
execute 6 more instructions, test for interrupt. Rinse lather & repeat.

Each UNROLLED_EXECUTE can exit the loop for a number of reasons, but it
basically boils down to:

1) the next instruction may cross a page boundary
2) the executed instruction did a longjmp() because an interrupt
condition may or does exist.

Testing shows that the instruction counter, for a large number of
instructions, is reasonably accurate. If we displayed the MIPS value as
a fraction with 6 decimal places, the lower digits would probably not be
accurate.
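
The size of the error can be seen with a toy model of the loop quoted above. This Python sketch is not Hercules code; `simulate` is a made-up function that assumes a slot leaves the loop before executing its instruction (the page-boundary case) and ignores the interrupt test at the bottom of the loop.

```python
def simulate(unrolled_before_exit: int) -> tuple[int, int]:
    """Toy model of one pass through the execution loop.

    unrolled_before_exit is how many UNROLLED_EXECUTE slots really run an
    instruction before one of them leaves the loop.
    Returns (instructions executed, amount added to instcount).
    """
    executed = 1   # EXECUTE_INSTRUCTION(ip, &regs)
    counted = 1    # regs.instcount++
    remaining = unrolled_before_exit
    while True:    # the INTERRUPT_PENDING test is ignored in this model
        for slot in range(12):
            if slot == 6:
                counted += 12          # bump between the two groups of six
            if remaining == 0:
                return executed, counted   # this slot exits without executing
            executed += 1
            remaining -= 1


for n in (0, 5, 6, 12, 1000):
    executed, counted = simulate(n)
    print(f"executed={executed:5d} counted={counted:5d} "
          f"error={counted - executed:+d}")
```

In this model the counter is never off by more than six per pass through the loop, which is negligible over millions of instructions but very visible when only a handful have run.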

Interestingly enough, it's kinda hard to define whether an instruction
executed or not. I believe Jan told me that in esame mode, a single
instruction can have a dozen or more interrupts before completing
successfully. So does that count as 12 instructions or 1? What about
CLCL/MVCL? The instruction counter is a reasonable value, but it's not
"dead on balls accurate".

In stepping/tracing mode none of the UNROLLED_EXECUTE statements execute
an instruction because hercules sets a value that makes UNROLLED_EXECUTE
think the next instruction is crossing a page boundary. Therefore, in
this mode, the interrupt pending is checked for each instruction, and an
interrupt is pending during stepping/tracing mode in order to do the
stepping/tracing.

We don't increment instcount for every UNROLLED_EXECUTE because that
causes a noticeable increase in overhead.

My apologies for waxing so long on such a simple subject.

Greg