LX-1550 create migration image #154
Another note: have you talked with @jgallag88 about how this would interact with his work on platform-dependent images? Also, would we want a mechanism (e.g. an environment variable, or just checking whether DLPX_KEY_URL is empty) to skip generating the migration images?
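For illustration only, the check suggested here could be as small as the sketch below. It assumes the hook is a shell script; `should_build_migration_image` is a hypothetical name and is not part of this PR.

```shell
#!/bin/sh

# Hypothetical guard (not from this PR): succeed only when DLPX_KEY_URL
# is set to a non-empty value.
should_build_migration_image() {
	[ -n "${DLPX_KEY_URL:-}" ]
}

# A hook could consult the guard before doing any expensive work.
if should_build_migration_image; then
	echo "DLPX_KEY_URL is set; building the migration image"
else
	echo "DLPX_KEY_URL is empty; skipping the migration image"
fi
```

A dedicated opt-out variable would work the same way; reusing DLPX_KEY_URL just avoids adding a new knob.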
IMO this is a good idea, but I'd rather we create a way to do this for any or all of the artifacts than add something specific to the migration image. We have a similar need to selectively skip any or all of the generated artifacts, so I think it'd be better to solve that problem more generally in a later change than to have a one-off solution only for migration images.
LGTM
bors delegate+
✌️ sdimitro can now approve this pull request
LGTM besides the 2 changes to find.
This change creates a new hook for live-build that creates a tarball for illumos to Linux migrations. The tarball contains our normal Linux root filesystem, packaged in a way that can be understood by our current upgrade logic (dx_unpack specifically).

The signed hashes file is not generated by default. If you need that file, please specify the following environment variables:

- DLPX_SIGNING_LOGIN
- DLPX_KEY_URL

In addition, I've added the pigz package to the environment of the docker container to speed up the compression of the tarball. Using the single-threaded compression built into tar, it takes ~12m 40s to archive and compress 21GB of data. Pigz in the container with 2 CPUs brings that down to ~9m 41s.

Finally, I've created a migration-scripts directory similar to the upgrade one. It may seem useless in this PR as I only have dx_prepare, which is basically a no-op, but I do have some WIP scripts there in my local repo for the next steps of the migration.

After running appliance build with the two new environment variables set, I scp'd the compressed migration tar artifact of the internal-qa variant to an illumos 5.3 VM and called dx_unpack:

```
$ sudo /opt/delphix/server/bin/upgrade/dx_unpack internal-qa.migration.tar.gz
Progress increment: 18:55:58:872276013+0000, 10, Extracting upgrade image
18:55:58:875080352:+0000: Unpacking internal-qa.migration.tar.gz ...
19:05:24:436752844:+0000: done.
Progress increment: 19:05:24:440796731+0000, 40, verifying format
Progress increment: 19:05:24:507477818+0000, 50, verifying signature
19:05:24:510590314:+0000: Verify signature ...
Verified OK
Progress increment: 19:05:25:072324099+0000, 70, Verifying integrity of upgrade image
19:05:25:075460462:+0000: Verifying contents of dx_apply ...
dx_apply: OK
19:05:26:518439350:+0000: Verifying contents of dx_execute ...
dx_execute: OK
19:05:26:534407155:+0000: Verifying contents of dx_prepare ...
dx_prepare: OK
19:05:26:547907130:+0000: Verifying contents of dx_verify ...
dx_verify: OK
19:05:26:558378011:+0000: Verifying contents of os-root.cpio ...
os-root.cpio: OK
19:07:03:668049935:+0000: Verifying contents of os-root.hashes ...
os-root.hashes: OK
19:07:03:795593227:+0000: Verifying contents of version.info ...
version.info: OK
Progress increment: 19:07:03:809621540+0000, 80, Verifying integrity of upgrade image done
Progress increment: 19:07:03:936732121+0000, 90, preparing upgrade image
Progress increment: 19:07:03:944176874+0000, 100, unpacking successful
```

The reason I tried the internal-qa variant and not internal-dev is that the qa variant does not include the source code of the app-gate and masking, and thus takes less space. (/var/dlpx-update, which is where images are unpacked, has a reservation of 12G, so internal-dev doesn't fit.)

Unfortunately we can't really boot into Linux from 5.3, as the rpool has features like the Log Spacemap that are enabled and don't exist in ZoL. For that reason I ensured that I could boot into the Linux image by untarring the internal-qa variant in a 5.2 VM and doing the following things (some of them will end up in the migration version of dx_apply, which is currently a WIP):

```
TMPDIR=$(mktemp -d -p "/tmp" -t delphix.XXXXXXX)
FSNAME=$(basename "$TMPDIR")
zfs create -o canmount=off -o mountpoint=none "$RPOOL/ROOT"
zfs create -o canmount=noauto -o mountpoint=/ "$RPOOL/ROOT/$FSNAME"
TMP_ROOT="$TMPDIR/root"
zfs set mountpoint="$TMP_ROOT" "$RPOOL/ROOT/$FSNAME"
zfs mount "$RPOOL/ROOT/$FSNAME"
(cd $TMP_ROOT; cpio -idmu) < $ARCHIVE_DIR/os-root.cpio
cp $(ls $TMP_ROOT/boot/initrd.img-* | sort -r | head -n 1) /boot
cp $(ls $TMP_ROOT/boot/vmlinuz-* | sort -r | head -n 1) /boot
zfs umount "$RPOOL/ROOT/$FSNAME"
zfs set mountpoint=/ "$RPOOL/ROOT/$FSNAME"
```

Then I rebooted the machine and, at the bootloader's prompt, ran:

```
> set console=ttya
> load /boot/vmlinuz-4.15.0-38-generic root=ZFS=rpool/ROOT/delphix.HIFKlSm console=tty0 console=ttyS0,115200n8 zfsforce=1
> load -t rootfs /boot/initrd.img-4.15.0-38-generic
> boot
```

Then I waited for the new system to boot up and logged in (the interesting bits from the console are included below):

```
...
Begin: Sleeping for ... done.
[ 6.247809] spl: loading out-of-tree module taints kernel.
[ 6.250949] spl: module verification failed: signature and/or required key missing - tainting kernel
[ 6.257025] znvpair: module license 'CDDL' taints kernel.
[ 6.259017] Disabling lock debugging due to kernel taint
[ 7.939793] ZFS: Loaded module v1.0.0-20181101T233420-aa2d882, ZFS pool version 5000, ZFS filesystem version 5
Begin: Sleeping for ... done.
Begin: Importing ZFS root pool 'rpool' ... Begin: Importing pool 'rpool' using defaults ... [ 8.136087] print_req_error: I/O error, dev fd0, sector 0
[ 8.138347] floppy: error 10 while reading block 0
[ 8.249208] random: crng init done
[ 8.250434] random: 7 urandom warning(s) missed due to ratelimiting
done.
done.
Begin: Mounting 'rpool/ROOT/delphix.HIFKlSm' on '/root//' ... done.
Begin: Running /scripts/local-bottom ... done.
...
[ OK ] Mounted Kernel Debug File System.
...
[ OK ] Started udev Wait for Complete Device Initialization.
[ OK ] Reached target ZFS pool import target.
       Starting Mount ZFS filesystems...
[ OK ] Started Mount ZFS filesystems.
...
[ OK ] Started Initial cloud-init job (metadata service crawler).
[ OK ] Reached target Cloud-config availability.
...
[ OK ] Started ZFS Event Daemon (zed).
[ OK ] Started D-Bus System Message Bus.
[ OK ] Started Daily Cleanup of Temporary Directories.
       Starting ZFS file system shares...
...
[ OK ] Started Delphix Perfomance Statistics.
[ OK ] Started Delphix default PostgreSQL database server.
[ OK ] Started Delphix Appliance Platform Service
...
localhost login: delphix
password:
```

And then within the VM:

```
delphix@localhost:~$ uname -a
Linux localhost 4.15.0-38-generic #42-Ubuntu SMP Tue Oct 23 15:48:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```

Other notes:

Postgres is up, but the app-stack fails with errors like the following in the error log:

```
Context initialization failed
java.lang.NullPointerException
...
    at java.lang.NullPointerException.<init>(NullPointerException.java:60)
    at com.delphix.appliance.server.upg.impl.UpgradeManagerImpl.setVersionStatus(UpgradeManagerImpl.java:791)
    at com.delphix.appliance.server.upg.impl.UpgradeManagerImpl.versionFromDirectoryLinux(UpgradeManagerImpl.java:850)
    at com.delphix.appliance.server.upg.impl.UpgradeManagerImpl.addNewVersions(UpgradeManagerImpl.java:1022)
    at com.delphix.appliance.server.upg.impl.UpgradeManagerImpl.update(UpgradeManagerImpl.java:907)
    at com.delphix.appliance.server.upg.impl.UpgradeManagerImpl.start(UpgradeManagerImpl.java:463)
```

I will look into it, but it can be part of a separate issue.
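For readers unfamiliar with the pigz change described above: it amounts to piping tar's output through a parallel compressor instead of tar's built-in gzip. The following is a minimal sketch of that idea, not the code in the actual hook. The function names `pick_compressor` and `make_tarball` are hypothetical, and the fallback to gzip is an assumption added so the sketch also runs where pigz is not installed.

```shell
#!/bin/sh

# Print the name of the compressor to use: pigz when it is installed,
# otherwise plain gzip. Both write gzip-compatible output.
pick_compressor() {
	if command -v pigz >/dev/null 2>&1; then
		echo "pigz"
	else
		echo "gzip"
	fi
}

# Archive the contents of a directory and compress the stream.
# Arguments: <source-dir> <output.tar.gz>
make_tarball() {
	src_dir="$1"
	out_file="$2"
	# tar writes an uncompressed archive to stdout; the compressor reads
	# it from stdin and writes the compressed result to the output file.
	tar -C "$src_dir" -cf - . | "$(pick_compressor)" >"$out_file"
}

# Example (paths are placeholders):
#   make_tarball /path/to/artifact-dir internal-qa.migration.tar.gz
```

Because pigz emits standard gzip data, the resulting file can still be unpacked by anything that understands `.tar.gz`, which is why the consumer side needs no change.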
bors r+
154: LX-1550 create migration image r=sdimitro a=sdimitro

Co-authored-by: Serapheim Dimitropoulos <[email protected]>
Build succeeded