Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MODE SELECT ignores mode pages in develop branch, v23.11.01 works #1427

Closed
uweseimet opened this issue Jan 31, 2024 · 17 comments
Closed

MODE SELECT ignores mode pages in develop branch, v23.11.01 works #1427

uweseimet opened this issue Jan 31, 2024 · 17 comments
Labels
bug Something isn't working

Comments

@uweseimet
Copy link
Contributor

This ticket refers to commit a23d642 (current develop branch).

When you send a sequence of mode pages with MODE SELECT(6), some pages are ignored. If one of the pages is a format device page for a sector size change (configured in the page data, not in the descriptor), and when this page is located somewhere at the end of the list, the change is ignored. The log entry about the sector size change does not appear. When you move this page up in the page list the log entry appears. Whether a page is evaluated or not seems to depend on its position in the list of mode pages.

With release v23.11.01 this is correct, i.e. for the same command parameters the sector size change request is logged (for both MODE SELECT(6) and MODE SELECT(10)), regardless of the size of the page list.

@uweseimet uweseimet added the bug Something isn't working label Jan 31, 2024
@rdmark
Copy link
Member

rdmark commented Feb 17, 2024

Thanks for reporting, Uwe.

@kkaempf At a glance, this seems to be a specific scenario that we didn't consider when reviewing your ModeSelect PR series.

#1405
#1406

@rdmark
Copy link
Member

rdmark commented Mar 20, 2024

@uweseimet Does the unit test added here adequately represent the test case that you describe in this issue? #1443

@uweseimet
Copy link
Contributor Author

@rdmark No, it doesn't.. The test needs to send multiple pages (a list of pages) with a single MODE SELECT. Then, depending on where in the list the sector size is changed, the change does not have any effect. You can see this in the logfile: piscsi in the develop branch does not log that the sector size has been changed, v23.11.01 does. This shows that the code parsing the page list passed with MODE SELECT is broken in the develop branch.

Please note that I do not intend to spend more time on this and the other MODE SELECT bug. I thought I'd report what I found, so that it does not end up in a release, but I am not interested in defending why I think that a program that crashes has a bug.

@rdmark
Copy link
Member

rdmark commented Mar 21, 2024

@uweseimet Can you please share a MODE SELECT(6) payload that you used to demonstrate the bug? This would help avoid making incorrect assumptions or mistakes based of a misunderstanding of SCSI behavior.

@uweseimet
Copy link
Contributor Author

The payload (list of pages) was similar to the one below. It may even have been the same, but I don't know anymore.

02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
02:01:00
03:16:00:08:00:00:00:00:00:00:00:19:08:00:00:01:00:0b:00:14:00:00:00:00

@uweseimet
Copy link
Contributor Author

uweseimet commented Mar 24, 2024

@rdmark Out of curiosity I diffed scsi_command_util.cpp against the last release and found this:

-               length -= size;
+               length -= size + 1;
                offset += size;

This looks quite wrong to me. Why is the length now decremented faster than the offset is incremented? This means offset and length are out of sync, aren't they? IMHO this results in the loop being terminated too early, before all data have been parsed.

@rdmark
Copy link
Member

rdmark commented Mar 26, 2024

@uweseimet if you look at the git blame for that change you find this commit: ad5eae9

The commit message lays out the justification for the logic change in detail. This is the relevant passage. Do you have a different interpretation of this data?

* Fix page length computation in ModeSelect

tl;dr

The 'skip to next ModeSelect page' computation was off-by-one, either not
taking the page code itself into account or missing the fact that the
page length is given as `n - 1`.

Fix:

Add 1 to the computed length.

Details:

OpenVMS Alpha sends a ModeSelect6 as follows
~~~
command:

ModeSelect6, CDB $151000001900

payload:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
00 00 00 08 00 00 00 00 00 00 02 00 01 0a 24 00 00 00 00 00 00 00 00 00 00
~~~

This translates to (accoring to [1], Section 8.3.3)

~~~
Mode Data Length 0
Medium Type      0
Device-specific  0
Block desc len   8
~~~

with the following offset / length computation _before_ the `while` loop
~~~
offset = 12
length = 13
~~~

The first payload section is
~~~
 4  5  6  7  8  9 10 11
00 00 00 00 00 00 02 00
~~~

translating to

~~~
Density Code     0
Number of blks   0
Reserved         0
Block length   0x200 512
~~~

Then follows a pagecode 1 as
~~~
12 13 14 15 16 17 18 19 20 21 22 23 24
01 0a 24 00 00 00 00 00 00 00 00 00 00
~~~

translating to
~~~~
Page code        1
Page length -1  10
Mode parameters 24 00 00 00 00 00 00 00 00 00 00
~~~

computing (inside the `while` loop, as `// Advance to the next page`)

~~~
size =  10 + 2 = 12
~~~

followed by new `offset` and `length` values

~~~
offset = 25
length = 1
~~~

So it stays in the `while` loop (and has a larger-than-buffer `offset`
value)

@uweseimet
Copy link
Contributor Author

uweseimet commented Mar 27, 2024

@rdmark IMO the current code is wrong, independent of the data. If you iterate over an array and remember the array index and the number of remaining bytes, is it correct to add/subtract different values? Wouldn't the sum of the index and the remaining byte count always have to be the same? If the number of remaining bytes is decremented faster than the index is incremented you are out of sync and terminate the loop too early. This would perfectly explain the bug I have observed. Please correct me if I am missing something.

https://github.com/uweseimet/scsi2pi/blob/main/cpp/test/scsi_hd_test.cpp has some unit tests with mode page lists, e.g. ModeSelect6_Multiple. (The page list size is not sufficient to completely cover this ticket, though.) In https://github.com/uweseimet/scsi2pi/blob/main/cpp/devices/disk.cpp in the ModeSelect() method both offset and remaining byte count are adjusted by the same value. I recommend to add similar unit tests to PiSCSI. Payload examples for such tests I have already provided, and https://github.com/uweseimet/scsi2pi/blob/main/cpp/test/scsi_hd_test.cpp provides more examples.
Of course, if it turns out that something is wrong with my unit tests I would appreciate feedback.

Maybe I should give an abstract example on what I think is wrong. Assume a page list of 10 pages with a length of 2 bytes each. At the beginning offset is 0, length is 20. After parsing a page the offset is incremented by 2 (size), the length decremented by 3 (size + 1). You get this sequence:

offset = 0, length = 20
offset = 2, length = 17
offset = 4 length = 14
offset = 6, length = 11
offset = 8, length = 8
offset = 10, length = 5
offset = 12, length = 2
offset = 14, length = -1

Parsing stops here because length is < 0. The pages at offsets 16 and 18 have not yet been parsed, i.e. they are ignored.

@kkaempf
Copy link
Contributor

kkaempf commented Mar 29, 2024

The payload (list of pages) was similar to the one below. It may even have been the same, but I don't know anymore.

02:01:00
...
02:01:00
03:16:00:08:00:00:00:00:00:00:00:19:08:00:00:01:00:0b:00:14:00:00:00:00

Where is this data coming from ?

I'd assume this would be "mode pages". Looking at the definition (Table 94 in Small Computer System Interface - 2 rev 10L.pdf - page 148 in document, 184 in pdf) it has

  • page code (1 byte)
  • page length n - 1 (!) (1 byte)
  • page data (n bytes)

So the above should rather be

02:00:00
...
02:00:00
03:15:00:08:00:00:00:00:00:00:00:19:08:00:00:01:00:0b:00:14:00:00:00:00

Commit ad5eae9 was based on actual captured SCSI data with 2 pages.

I'm going to extend the testcase now to cover > 2 pages. It might well be that the offset / length computation still is wrong for > 2 pages.

@uweseimet
Copy link
Contributor Author

uweseimet commented Mar 29, 2024

The specification says: "The page length field specifies the length in bytes of the mode parameters that follow. If the initiator does not set this value to the value that is returned for the page by the MODE SENSE command, the target shall terminate the command with CHECK CONDITION status."

The length field for MODE SELECT is the same as the length field reported by MODE SENSE for the same page.

When the page data are 02:01:00 there is 1 byte following the page length field, i.e. the page length field must be 1. Please look at the mode page examples in the specification, e.g. table 96. In table 96 there are 6 bytes following the length field and the length field is 6. This is consistent with the mode page data I used in my tests. Correct me if I am wrong, but if you were right the page length in table 96 would have to be 5, which is not the case.

But this is not related to the potential offset/length issue I mentioned, because the affected length is not an individual page length but the total length of the payload data.

@kkaempf
Copy link
Contributor

kkaempf commented Mar 29, 2024

Please look at the mode page examples in the specification, e.g. table 96. In table 96 there are 6 bytes following the length field and the length field is 6. This is consistent with the mode page data I used in my tests. Correct me if I am wrong, but if you were right the page length in table 96 would have to be 5, which is not the case.

Looks like we have a conflict between an example in the spec (length byte == number of byte following) and the actual data captured from the DEC BIOS (length byte == number of bytes following minus 1) 😉

Do we need a DEC BIOS flag ? 🤔

Anyone with more captured data to look at ?

@uweseimet
Copy link
Contributor Author

@kkaempf Before you try to add any flag I recommend to verify that the DEC BIOS really violates the specification. I doubt it, because in this case it would not be compatible with any SCSI drive that correctly implements the specification. I bet the vast majority of drives do it right, because otherwise they would not work with most of the available drivers/tools. It is more likely that you have misinterpreted something.

Do you have any real SCSI drive that is compatible with your DEC BIOS? In this case I recommend that you verify the behavior of this drive regarding MODE SELECT. If you are right, in order to be compatible with the DEC BIOS the drive would have to violate the specification. You can use your PISCSI board and the s2pexec tool of https://www.scsi2pi.net/, for instance, to run any SCSI command against this drive and to evaluate the results.

If you are convinced that your BIOS is a special case I would be interested if https://www.scsi2pi.net/, which has a more elaborate MODE page handling than PiSCSI, works with your board or not. Testing this does not take much time, you can simply install an SCSI2Pi binary package.

In any case, what the specification says should be the default implementation.

@rdmark All in all, it's up to you, of course, but as you can see from my comments I doubt once again that any kind of "quirks" mode is needed. Real SCSI drives also do not have this kind of mode and work with all kinds of platforms/drivers. Usually in such a case there is just a misunderstanding of the specification or what the drivers do.

@uweseimet
Copy link
Contributor Author

uweseimet commented Mar 29, 2024

@kkaempf I thought I'd run a test with your payload and the SCSI2Pi in_process_test (see https://github.com/uweseimet/scsi2pi/wiki/Advanced-Testing) myself, as a proof of concept. This is the result:

>./bin/in_process_test -s "-i 0 zip.hds"
s2pexec>-i 0 -c 151000001900 -d 000000080000000000000200010a2400000000000000000000
[17:52:57.144] [warning] Device reported CHECK CONDITION (status code $02)
Error: ILLEGAL REQUEST (Sense Key $05), INVALID FIELD IN PARAMETER LIST (ASC $26), ASCQ $00
s2pexec>

The error is expected because your payload appears to contain a vendor-specific mode page 0, which is not supported by the default configuration. As soon as I add a user-defined mode page 0 (see https://www.scsi2pi.net/en/properties.html) your payload is accepted and the error is gone:

>./bin/in_process_test -s "-i 0 zip.hds"
s2pexec>-i 0 -c 151000001900 -d 000000080000000000000200010a2400000000000000000000
s2pexec>

This confirms that nothing is wrong with how PiSCSI v23.11.01 parses the mode pages (offset/length). But PiSCSI does not support mode page 0, and this appears to cause the problem with your payload.

Note that for a valid solution PiSCSI has to support mode page 0 for both MODE SENSE and MODE SELECT.
User-configurable (on device level) mode page data are IMO the only valid solution, because each device that supports mode page 0 can have different mode page data. If you just implement support for mode page 0 as it is needed by your setup, as soon as the there is another user with a setup that needs a different mode page 0, you have a conflict: Without a user-definable mode page, why should PiSCSI support mode page 0 just like you need it, and not just like the other user(s) need it?
There is also a conflict if two drivers for two different devices each expect vendor-specific data. If there is just a single (default) implementation for mode page 0, one of the drivers will fail. This can even happen with the other mode pages, because they use hard-coded data.

kkaempf added a commit to kkaempf/piscsi that referenced this issue Apr 1, 2024
Follow-up on PiSCSI#1402, PiSCSI#1405, PiSCSI#1427

This is the first (yet incomplete) test to cover this behavior.

Signed-off-by: Klaus Kämpf <[email protected]>
kkaempf added a commit to kkaempf/piscsi that referenced this issue Apr 1, 2024
OpenVMS Alpha sends a strange ModeSelect payload, apparently one byte
too large. This was 'fixed' by a (wrong) length calculation in PiSCSI#1405, breaking PiSCSI#1427.

This PR
- fixes the wrong length calculation
- improves the loop test in scsi_command_util::ModeSelect to prevent a
  buffer overflow. (Remaining length was checked for > 0, but buffer
  access is at offset and offset + 1, effectively requiring 2 bytes.)
- the loop test fix makes PiSCSI#1402 pass
- adds a testcase for PiSCSI#1402
- adds a testcase for PiSCSI#1427

Fixes issue PiSCSI#1427

Signed-off-by: Klaus Kämpf <[email protected]>
@kkaempf
Copy link
Contributor

kkaempf commented Apr 1, 2024

@uweseimet, I stand corrected on the offset/length calculation in scsi_command_util.cpp. See #1447 for a fix.

However, I don't get your "page 0" arguments about the Alpha VMS payload.
000000080000000000000200010a2400000000000000000000 splits up into
00000008 0000000000000200 010a2400000000000000000000 as explained in #1427 (comment)

Looking at the 010a2400... - it's page 0x01 request with 0x0a bytes following. However, 0x0b bytes actually follow.
The "trailing" byte 0 could be interpreted as a "page 0" but it's missing the length byte. 🤷🏻‍♂️

@uweseimet
Copy link
Contributor Author

@kkaempf Page 0 does not have/require a length byte. Please check the SCSI specification. This page does not have any well-defined format. This is why it must be the last page of a page list, because otherwise you would not be able to determine the start of the next page.

@rdmark
Copy link
Member

rdmark commented Apr 4, 2024

To recap, the reason we wanted to implement the DEC vendor specific ModeSelect page 0 was because the DEC machine cannot boot from a piscsi volume without it:

#1402

The question is if there’s another way to handle the vendor page, I e ignoring or retuning some kind of stub value that makes the DEC boot without introducing SCSI spec breaking behavior.

@rdmark
Copy link
Member

rdmark commented Apr 13, 2024

Whole changeset reverted with #1451

@rdmark rdmark closed this as not planned Won't fix, can't repro, duplicate, stale Apr 13, 2024
kkaempf added a commit to kkaempf/piscsi that referenced this issue Aug 9, 2024
* Allow 'empty' ModeSelect6

tl;dr

Treat a computed length of 0 as `has_valid_page_code`.

Details:

The SRM console (aka 'BIOS') of DEC Alpha sends an empty
ModeSelect6 with the following data:
~~~
ModeSelect6, CDB $151000000c00
~~~

That makes 12 byte(s) as follows
~~~
  0  1  2  3   4   5  6  7   8   9 10 11
 00 00 00 08  00  00 00 00  00  00 02 00
~~~

decoding it (accoring to [1], Section 8.3.3, Table 94) gives us

Mode Data Length 0
Medium Type      0
Device-specific  0
Block desc len   8

Density Code     0
Number of blks   0
Reserved         0
Block length     512

`scsi_command_util::ModeSelect` computes
~~~
offset = 4 + buf[3];
~~~

giving 12 and

~~~
length -= offset;
~~~

giving 0.

Thus it never enters the `while` loop and `has_valid_page_code` stays
`false`, raising an error.

[1] [Small Computer System Interface - 2 rev 10L.pdf](https://dn790004.ca.archive.org/0/items/SCSISpecificationDocumentsSCSIDocuments/Small%20Computer%20System%20Interface%20-%202%20rev%2010L.pdf)

Signed-off-by: Klaus Kämpf <[email protected]>

* Allow ModeSelect with page code 1

OpenVMS Alpha (the operating system, not the SRM BIOS) uses
ModeSelect6 with a page code of 1.

The semantics are unknown, just accepting it works for me.

Signed-off-by: Klaus Kämpf <[email protected]>

* Fix page length computation in ModeSelect

tl;dr

The 'skip to next ModeSelect page' computation was off-by-one, either not
taking the page code itself into account or missing the fact that the
page length is given as `n - 1`.

Fix:

Add 1 to the computed length.

Details:

OpenVMS Alpha sends a ModeSelect6 as follows
~~~
command:

ModeSelect6, CDB $151000001900

payload:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
00 00 00 08 00 00 00 00 00 00 02 00 01 0a 24 00 00 00 00 00 00 00 00 00 00
~~~

This translates to (accoring to [1], Section 8.3.3)

~~~
Mode Data Length 0
Medium Type      0
Device-specific  0
Block desc len   8
~~~

with the following offset / length computation _before_ the `while` loop
~~~
offset = 12
length = 13
~~~

The first payload section is
~~~
 4  5  6  7  8  9 10 11
00 00 00 00 00 00 02 00
~~~

translating to

~~~
Density Code     0
Number of blks   0
Reserved         0
Block length   0x200 512
~~~

Then follows a pagecode 1 as
~~~
12 13 14 15 16 17 18 19 20 21 22 23 24
01 0a 24 00 00 00 00 00 00 00 00 00 00
~~~

translating to
~~~~
Page code        1
Page length -1  10
Mode parameters 24 00 00 00 00 00 00 00 00 00 00
~~~

computing (inside the `while` loop, as `// Advance to the next page`)

~~~
size =  10 + 2 = 12
~~~

followed by new `offset` and `length` values

~~~
offset = 25
length = 1
~~~

So it stays in the `while` loop (and has a larger-than-buffer `offset`
value)

Signed-off-by: Klaus Kämpf <[email protected]>

Fix length calculation in scsi_command_util::ModeSelect

OpenVMS Alpha sends a strange ModeSelect payload, apparently one byte
too large. This was 'fixed' by a (wrong) length calculation in PiSCSI#1405, breaking PiSCSI#1427.

This PR
- fixes the wrong length calculation
- improves the loop test in scsi_command_util::ModeSelect to prevent a
  buffer overflow. (Remaining length was checked for > 0, but buffer
  access is at offset and offset + 1, effectively requiring 2 bytes.)
- the loop test fix makes PiSCSI#1402 pass
- adds a testcase for PiSCSI#1402
- adds a testcase for PiSCSI#1427

Fixes issue PiSCSI#1427

Signed-off-by: Klaus Kämpf <[email protected]>

Improve buffer overflow checking in scsi_command_util::ModeSelect

Signed-off-by: Klaus Kämpf <[email protected]>
kkaempf added a commit to kkaempf/piscsi that referenced this issue Aug 9, 2024
OpenVMS Alpha sends a strange ModeSelect payload, apparently one byte
too large. This was 'fixed' by a (wrong) length calculation in PiSCSI#1405, breaking PiSCSI#1427.

This PR
- fixes the wrong length calculation
- improves the loop test in scsi_command_util::ModeSelect to prevent a
  buffer overflow. (Remaining length was checked for > 0, but buffer
  access is at offset and offset + 1, effectively requiring 2 bytes.)
- the loop test fix makes PiSCSI#1402 pass
- adds a testcase for PiSCSI#1402
- adds a testcase for PiSCSI#1427

Fixes issue PiSCSI#1427

Signed-off-by: Klaus Kämpf <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants