Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test writing data to a tape with a VAXstation 3100 running OpenVMS 7.2 #112

Open
uweseimet opened this issue Dec 7, 2024 · 47 comments
Open
Assignees

Comments

@uweseimet
Copy link
Owner

This is a follow-up ticket to #111, dealing with writing data.

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 7, 2024

@Pacjunk There is a new branch issue_112, which is based on the issue_111 branch. Any changes dealing with writing data (your copy script, for instance) will be committed to this new branch.
I think wrt logs and scripts currently we do not need the mount log, but only the copy log/script. I assume that the label and mount operations are always the same.

@Pacjunk
Copy link

Pacjunk commented Dec 7, 2024

I included the mount script for your information, as the mount is done in a different way to our previous tests.

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 7, 2024

Your copy script uses spacing and reverse-spacing, most likely in order to find the correct position for writing. I will review the respective code and will also try to run the script against my real tape drive. I will get back to you when I have committed something to the issue_112 branch.
For subsequent tests, please remove the "device.3:0.name" property from /etc/s2p conf.

OK; I will also have a look at the mount operation.

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 7, 2024

Based on your log and with the help of my streamer and the specification I fixed a bug in the object count the SPACE command returned. Please test again with the latest issue_112 branch.

I have a general question: Is the copy operation the counterpart of the list operation, ie. does list read the same data copy is writing? If this is the case as soon as copy is working it may provide insight into the issue we have with list.
What is the correct behavior of the list operation when you run it on a tape that has been labeled but does not contain any other data than those created by the label operation? This is essentially an empty tape. What does list display on the screen when your drive is connected and you have mounted such a tape? I haven't asked this before, have I?

@Pacjunk
Copy link

Pacjunk commented Dec 8, 2024

Build 2101769

I have tried to copy again. This time the OS says the file was copied, but could not be closed. Looking at the .tap file, the text in the file that I copied is present, but it appears to have corrupted the file.
copy.txt
copy script.txt

As to your question, no this copy is different to what we were doing before. When the tape is mounted "normally", you can copy files to/from the tape, do directory commands etc just like the tape was a file system (except no directories!).

Mounting the tape "foreign" is used by the backup utility which creates save-sets (tar in unix speak). To list the contents of this file, you do a backup/list command which is what is asking for the 90 byte read.

These tape operations should not do the 90 byte read as I am not using the backup command. Copying files from the emulated tape works fine. We are now trying to get it to write to the tape without corrupting it.

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 8, 2024

The corrupt data are the only thing that is wrong, correct? Your OS thinks everything is fine? Can you please try the same with -b 8192? I guess that there is a similar issue as there was with reading before.

@Pacjunk
Copy link

Pacjunk commented Dec 8, 2024

The structure of the file seems incorrect. If you re-mount it in the OS, it appears blank, but you get different errors when trying to copy to it. -b 8192 makes no difference. There doesn't appear to be any end of file markers.

@uweseimet
Copy link
Owner Author

Thank you. How big is the file? Maybe you can attach it, or can provide about 20 lines of s2psimh output for this file? Just the structure, no data, i.e. not with the -d option.

@Pacjunk
Copy link

Pacjunk commented Dec 8, 2024

The filename seems to be in the 4th block. Other files downloaded from bitsavers, the first filename is in the second block.

File attached:
1.tap.dmp

The EOF markers are missing. It seems it started writing the file to the tape (probably in the wrong spot), then did not complete.

@uweseimet
Copy link
Owner Author

Can you please also add the s2psimh output for this file?

On first sight the data as such seem to be fine. I agree that the write position appears to be wrong. Your OS is using SPACE (for positioning) a lot.

This is a file after labeling:

Offset 0 ($0): Class 0, good data record, record length 80 ($50)
Offset 88 ($58): Class 0, good data record, record length 80 ($50)
Offset 176 ($b0): Class 0, good data record, record length 80 ($50)
Offset 264 ($108): Class 0, tape mark
Offset 268 ($10c): Class 0, tape mark
Offset 272 ($110): Class 0, good data record, record length 80 ($50)
Offset 360 ($168): Class 0, good data record, record length 80 ($50)
Offset 448 ($1c0): Class 0, tape mark
Offset 452 ($1c4): Class 0, tape mark
Offset 456 ($1c8): Class 7, private marker (SCSI2Pi end-of-data object)

This is the result of your test:

Offset 0 ($0): Class 0, good data record, record length 80 ($50)
Offset 88 ($58): Class 0, good data record, record length 80 ($50)
Offset 176 ($b0): Class 0, good data record, record length 80 ($50)
Offset 264 ($108): Class 0, tape mark
Offset 268 ($10c): Class 0, good data record, record length 80 ($50)
Offset 356 ($164): Class 0, good data record, record length 80 ($50)
Offset 444 ($1bc): Class 0, good data record, record length 80 ($50)
Offset 532 ($214): Class 0, good data record, record length 80 ($50)
Offset 620 ($26c): Class 0, tape mark
Offset 624 ($270): Class 0, good data record, record length 2048 ($800)
Offset 2680 ($a78): Class 7, private marker (SCSI2Pi end-of-data object)

The end-of-data marker is present. Correct me if I am wrong: The bitsavers files save the filenames at offset 88? This is the second block.

@Pacjunk
Copy link

Pacjunk commented Dec 8, 2024

Offset 0 ($0): Class 0, good data record, record length 80 ($50)
Offset 88 ($58): Class 0, good data record, record length 80 ($50)
Offset 176 ($b0): Class 0, good data record, record length 80 ($50)
Offset 264 ($108): Class 0, tape mark
Offset 268 ($10c): Class 0, good data record, record length 80 ($50)
Offset 356 ($164): Class 0, good data record, record length 80 ($50)
Offset 444 ($1bc): Class 0, good data record, record length 80 ($50)
Offset 532 ($214): Class 0, good data record, record length 80 ($50)
Offset 620 ($26c): Class 0, tape mark
Offset 624 ($270): Class 0, good data record, record length 2048 ($800)
Offset 2680 ($a78): Class 7, private marker (SCSI2Pi end-of-data object)

The filename entry seems to have been written to the first block after the tape mark, not the second record. Do the first 3 records not have tape marks between them?

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 8, 2024

OK, so you mean the filename should be at offset 356? Let's use offsets as a notation in this case, not block or tape mark counts. This will avoid misunderstandings.

@Pacjunk
Copy link

Pacjunk commented Dec 8, 2024

No the filename should be at 88. The HDR1 block.

The filename has been written incorrectly to 268 - leaving the HDR1 block at 88 still blank which is why it appears there are no files on the tape.

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 8, 2024

OK, thank you. I will investigate this.

I think that without all these tools (s2pexec, s2ptool, s2psimh) it would be impossible to implement everything correctly. In theory you read the specification, implement based on it, and everything works. But in practice this is different. I will never understand why the PiSCSI project was not interested in such tools. s2pexec had already been ready for a PR, and the lack of interest was a reason why I started SCSI2Pi.

When you created the PiSCSI tape ticket I labeled it with "major effort". I have not promised too much at that time, have I? ;-)

@Pacjunk
Copy link

Pacjunk commented Dec 8, 2024

Often don't know until you start on something! I think a better scsimon would help too! I had a look and can't make much of the output. I think its more about checking signal timings than tracing commands.

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 8, 2024

Yes, scsimon is more about bus signals. But this is much too low level for what we are dealing with, because we are working on the SCSI command level. The logs for all signals would be really huge, and it would be impossible to correlate them to SCSI commands and their parameters.
And as already mentioned scsimon cannot capture all signals. In order to do that the Pi would have to be fast enough to not only capture all signals, but also to save this information to the filesystem. But the Pi is not that fast, making a tool like scsimon useless, because it records imcomplete data.
If you are interested in bus signals you need an oscilloscope.

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 8, 2024

Can it be that the last script/log also contained the mount operation? This is sometimes hard to tell, because the OS appears to use a set of building blocks for its command sequences. If the mount operation was included, please only add the copy operation next time. Re-starting s2p for that should be safe.

For reference, if needed later, this is what your OS does according to your log. I verified all steps with my streamer and with s2p.

  1. It reads blocks until it hits the first filemark at offset 264.
  2. It spaces forwards until it is behind the last filemark, at offset 456.
  3. At this point READ POSITION says that the current object location is 9. (Objects are either blocks or filemarks.)
  4. It spaces backwards, searching for filemarks, and hits the filemark at offset 452.
  5. It spaces backwards, searching for filemarks, and hits the filemark at offset 448, object location is 7.
  6. It spaces backwards, searching for filemarks, and hits the filemark at offset 268, object location is 4.
  7. It spaces backwards, searching for filemarks, and hits the filemark at offset 264, object location is 3.
  8. It spaces backwards, searching for filemarks, and hits beginning-of-medium, object location is 0.

Here the sense data differ between my drive and s2p. The streamer reports EOM (end of medium) and that -3 objects (the 3 data blocks) were skipped, s2p does not say -3 but 3.
Furthermore, when running READ POSITION (which your OS does not do) the streamer sets both the BOP (beginning of partition) and EOP (end of partition) bit, s2p only sets BOP. When reading the specification I think this is correct, because s2p does not support the early warning feature. Otherwise both bits would have to be set.

I have fixed the wrong object count (3 instead of -3), please re-test with issue_112. This is the script I used to reproduce the issue:

-i 3 -L trace
-c 01:00:00:00:00:00 -o 90
-c 08:00:00:00:50:00 -o 90
-c 08:00:00:00:50:00 -o 90
-c 08:00:00:00:50:00 -o 90
-n -c 08:00:00:00:50:00 -o 90
-c 03:00:00:00:ff:00
-c 11:00:00:7f:ff:00
-c 11:00:00:7f:ff:00
-c 11:00:00:7f:ff:00
-c 11:00:ff:ff:ff:00
-c 11:00:ff:80:00:00
-c 11:00:ff:80:00:00
-c 11:00:ff:80:00:00
-n -c 11:00:ff:80:00:00
-c 03:00:00:00:ff:00
-c 34:00:00:00:00:00:00:00:00:00

@Pacjunk
Copy link

Pacjunk commented Dec 9, 2024

Build b1150f1.

Well that didn't write anything to the tape! Before it wrote to the tape but in the wrong place. Now it doesn't write anything.

copy.txt
copy script.txt

@uweseimet
Copy link
Owner Author

It is possible that the updated positioning code has changed the command sequence sent by your OS quite a lot. I will see soon.

@uweseimet
Copy link
Owner Author

I compared the two copy scripts (from today and yesterday), and there is a difference that is not related to my changes.

-i 3:0 -c 00:00:00:00:00:00
-i 3:0 -c 12:00:00:00:24:00
-i 3:0 -c 1a:00:00:00:ff:00
-i 3:0 -c 1a:00:00:00:ff:00
-i 3:0 -c 15:00:00:00:0c:00 -d 00:00:10:08:00:00:00:00:00:00:00:00
-i 3:0 -c 1a:00:00:00:ff:00
-i 3:0 -c 34:00:00:00:00:00:00:00:00:00

In the script from yesterday your OS did not run READ POSITION (last line). It is running READ POSITION several times now, yesterday it did not do that. IMO this means that something has changed in the test scenario.
I mentioned that you could safely restart s2p before running the copy operation, because I assumed that yesterday you did not restart, but ran all operations in sequence. Can you confirm that it was different this time?

No need to re-run the test, at least not yet. First I need to know whether the scenario has changed.

@Pacjunk
Copy link

Pacjunk commented Dec 9, 2024

I can't remember, but I don't think so. The errors are different, so maybe it is doing different commands trying to find the place to write.

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 9, 2024

It can't be just like that, because it uses READ POSITION (34:00:...) right at the beginning before any command that can change the tape position. At that time nothing has really happened. This means that the reason why READ POSITION is used is not related to the commands in the copy script.

Can you please run a test that also includes mounting the tape, so that we have a log and a script that covers mounting and copying?

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 9, 2024

What your OS was doing today is not that much different from yesterday, except for the final outcome. It's just the READ POSITION commands I stumble upon, because they were not there yesterday.
What effectively happens is that, like yesterday, your OS eventually reads the existing blocks and repositions. Then there is a difference, and this difference is where I changed something. This means, all in all, except for the additional READ POSITION commands, there is nothing unexpected. It still does not work, but the behavior has changed just at the place where yesterday something was wrong. This time end-of-data is hit, but this is because there is still a bug.

@Pacjunk
Copy link

Pacjunk commented Dec 9, 2024

Mount & copy combined...

mount&copy.txt
mount&copy script.txt

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 9, 2024

Thank you. In the meantime I have found something wrong, and I am working on it. Updating the unit tests after a code change has become more time consuming than updating the actual implementation.

The mount&copy script also has READ POSITION in it. I have no idea why this was different yesterday. But in practice for the copy operation this is not relevant. It even helps, because the logs contain more information, e.g. at the very end of the copy log:

00000000  00:00:00:00:00:00:00:08:00:00:00:08:00:00:00:00  '................'
00000010  00:00:00:00                                      '....'

This means that the tape is at logical object position 8. A logical object is a block or a filemark. The first logical object has position 0, i.e. 8 is line 9 (last tape mark at offset 452) in the SIMH file structure.

The next steps should be:

  1. Re-test the copy script with the latest issue_112 branch. I have fixed issues with reporting the position, i.e. the responses to some of the READ POSITION commands are different now. I assume that this will change the sequence of commands your OS uses. If the OS just stops again, please run the script from this test against your drive, on trace level. I would also need to see what your drive reports.
  2. Please remember to run the test with the switched drive and the UNIT ATTENTION change in Test tape device with a VAXstation 3100 running OpenVMS 7.2 #111.

Note that I have changed the default error handling of s2pexec. Running REQUEST SENSE automatically when a device reports CHECK CONDITION is not the default anymore. This is convenient when you manually run one or two commands. But the OS also runs REQUEST SENSE regularly. This results in duplicate executions and affects the logfile contents (but not what the device does), potentially resulting in a wrong analysis. The "-n" option is obsolete and does not exist anymore. Instead "-R" now explicitly forces a REQUEST SENSE.

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

In case you find this interesting, I did the copy test under simh.

This is an initialised empty tape:
1.tap.dmp

This is the tape after copying one file to it: (Note it goes at the HDR1 block, and also has an EOF1 block)
2.tap.dmp

@uweseimet
Copy link
Owner Author

These files are useful for testing locally, without your OS. Thank you.

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

Bingo, It works! Copied a file to the tape and read it back successfully. Same label and filename/contents etc as the simh image I supplied earlier.

copy.txt
copy script.txt
1.tap.dmp

I tried copying another small file and this also worked (two files on the tape). I then tried copying a larger file and this failed. The .tap file was still 64KB, so I assume the automatic appending to the file is not implemented yet!

Good to have a win!

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 10, 2024

Great news! I am sure we will also solve the remaining issues. This is just a matter of time.

Appending has already been implemented, but it is not the default. See http://www.scsi2pi/en/properties.html for the respective "append" property.
The value for the property is the maximum file size. I have to fix the website, the text is wrong. (Just fixed this.)

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

I just copied about 1MB of files to a larger .tap file. Read back OK. I will look into the properties.

@uweseimet
Copy link
Owner Author

This would have to be something like "device.3.append=MAXIMUM_FILE_SIZE".

@uweseimet
Copy link
Owner Author

By the way, the READ POSITION commands are gone ...

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

Yes, append works. I copied larger files and it extended.

As a bigger test, I mounted one of the simh tapes (the one I use for the backup/list test which contains about 50MB of data) on ID4 and copied this to a blank tape on ID3. Compared the two and they are identical.

Looking good.

@uweseimet
Copy link
Owner Author

If you have other test scenarios (different than #111), please go ahead. If there is something else that does not work it may require a new ticket.
If there is nothing wrong with writing, we may close #112.

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

Both read and write drives were under s2p, if that was not clear.

Hard to tell at this stage, but writing looks good. I did get an error at one stage:

[error] (ID:LUN 3:0) - Trailing record length 8192 at position 528 does not match leading length 8192

but I had restarted s2p between copy commands, so it may have got confused. Didn't see that message again with later testing.

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 10, 2024

Please add the commit ID you are currently using to this ticket.

A restart should not matter. Can you verify the error message with s2psimh? If there is something wrong with a file, s2psimh should also report that. The error refers to something almost at the beginning of a file.

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

I have overwritten that file many times by now. It did not occur again. Build number is ab7b07b.

I'm trying to build a boot tape at the moment. I want to see if I can boot the machine off the tape. This will be a good test as it uses different drivers (ROM based instead of OS based).

@uweseimet
Copy link
Owner Author

Sounds good!

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

Built the tape fine, but crashes when I try to boot off it. I will get you a log...

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

Short and sweet...

boot.txt
boot script.txt

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 10, 2024

The story continues ... ;-)

Please look at the log, this is not a SCSI command problem, but there is a low-level signal issue, like wrong termination.

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

Most of those messages came up after the system crashed...

BTW it boots off a hard drive image fine.

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

Weird - got the same crash message with s2p stopped. I think it must have left some line activated when it stopped as things only started working again when I turned the pi off. I will build a physical boot tape for comparison.

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 10, 2024

In such a case, what can help is launching s2pexec and then running the "-r" command, which resets the bus.

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

It is so slow writing a physical tape!! s2p is much faster!

@uweseimet
Copy link
Owner Author

uweseimet commented Dec 10, 2024

You are using a Pi 3 now, don't you? I Pi 4 would probably even be faster. But a Pi 5 might actually be slower. The GPIO bus works differently with the Pi 5. As far as I can tell, with the Pi 5 you cannot transfer all 8 SCSI data bits in a single operation, because you need more than one GPIO register for that. If this is really the case, it would have a considerable negative impact.

@Pacjunk
Copy link

Pacjunk commented Dec 10, 2024

Physical drive doing it too - I will have to investigate the hardware...

Off for the night.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In progress
Development

No branches or pull requests

2 participants