Skip to content

USHIFT-1312 Test: Rebooting healthy system should result in backing up the data#1866

Merged
openshift-merge-robot merged 2 commits intoopenshift:mainfrom
pmtk:test-backup-on-boot
Jun 19, 2023
Merged

USHIFT-1312 Test: Rebooting healthy system should result in backing up the data#1866
openshift-merge-robot merged 2 commits intoopenshift:mainfrom
pmtk:test-backup-on-boot

Conversation

@pmtk
Copy link
Member

@pmtk pmtk commented Jun 1, 2023

Tests functionality of #1874

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 1, 2023
@openshift-ci openshift-ci bot requested review from jerpeter1 and pacevedom June 1, 2023 06:51
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2023
@pmtk
Copy link
Member Author

pmtk commented Jun 1, 2023

This test will probably require setting up current microshift-e2e to ignore ostree tags or provide mechanism to not run tests on non-ostree-based systems

@pmtk
Copy link
Member Author

pmtk commented Jun 1, 2023

/cc @dhellmann @jogeo

@openshift-ci openshift-ci bot requested review from dhellmann and jogeo June 1, 2023 06:53
@pmtk pmtk changed the title WIP USHIFT-1083 Test backup after reboot from healthy system USHIFT-1083 Test backup after reboot from healthy system Jun 2, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2023
@pmtk pmtk changed the title USHIFT-1083 Test backup after reboot from healthy system WIP USHIFT-1083 Test backup after reboot from healthy system Jun 2, 2023
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2023
@pmtk pmtk changed the title WIP USHIFT-1083 Test backup after reboot from healthy system USHIFT-1083 Test backup after reboot from healthy system Jun 2, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2023
@pmtk pmtk changed the title USHIFT-1083 Test backup after reboot from healthy system WIP USHIFT-1083 Test backup after reboot from healthy system Jun 2, 2023
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 2, 2023
@pmtk pmtk force-pushed the test-backup-on-boot branch from 9e7aaac to d91703e Compare June 5, 2023 11:59
@pmtk pmtk changed the title WIP USHIFT-1083 Test backup after reboot from healthy system USHIFT-1083 Test backup after reboot from healthy system Jun 5, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 5, 2023
@pmtk
Copy link
Member Author

pmtk commented Jun 5, 2023

/cc @dhellmann @pacevedom @jogeo @dhensel-rh

This PR needs some sort mechanism to not run tests intended for R4E/ostree-based systems on RPM.
Approaches that come to mind:

  • doing a runtime check and deciding if test should run or be skipped
  • using tags to filter out irrelevant test suites
    any other ideas?

@openshift-ci openshift-ci bot requested a review from dhensel-rh June 5, 2023 14:43
Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks OK, but I see the e2e test is failing with an error in this new test suite

Keyword 'Greenboot Health Check Exited' failed after retrying for 5 minutes. The last error was: dead != exited

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be more robust if we looked for a different boot id after the reboot, to avoid any race condition with issuing the reboot command and sshd stopping. It doesn't seem like the most likely race, so let's make that change in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know the exact path, right? Can we just remove it, instead of using find?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With current flow (make new backup, delete old backup) there's always a chance that more than one backup will be present, this just makes sure that we're starting clean.
And we don't necessarily know the path, with randomized order of tests, in future, we could have a backup from two boots away, or more.
Also, health.json might already change (post-healthchecks) and point to current boot instead of previous

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we probably want to do something to limit the depth of the find call, then. Is it enough to say rm -rf ${deployID}* instead of using find at all?

And we probably want to check for deployID being empty, to ensure we don't delete everything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find was only looking into /var/lib/microshift-backups/ but I changed it to rm -rf because it's easier to read

@pmtk pmtk changed the title USHIFT-1083 Test backup after reboot from healthy system USHIFT-1312 Test backup after reboot from healthy system Jun 6, 2023
@pmtk pmtk changed the title USHIFT-1312 Test backup after reboot from healthy system USHIFT-1312 Test: Rebooting healthy system should result in backing up the data Jun 6, 2023
@pmtk pmtk force-pushed the test-backup-on-boot branch from 2f11501 to b5622e1 Compare June 6, 2023 15:54
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is in separate dir so e2e CI job doesn't try to run it against non-ostree system.
When we have e2e for ostree systems we can think if we want to move it back to suites/ and use tags (or other method to filter tests)

@pmtk
Copy link
Member Author

pmtk commented Jun 6, 2023

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 6, 2023
@pmtk
Copy link
Member Author

pmtk commented Jun 7, 2023

/remove-label tide/merge-method-squash

Remembered that tide's squash results in ugly commit messages. I'll squash the commits myself

@openshift-ci openshift-ci bot removed the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 7, 2023
@pmtk pmtk force-pushed the test-backup-on-boot branch from b5622e1 to 89044be Compare June 7, 2023 06:58
Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, but this generally looks good.

@pmtk pmtk force-pushed the test-backup-on-boot branch from 89044be to b82e880 Compare June 7, 2023 13:56
Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'll leave it for @jogeo to approve since he wanted a chance to review, too.

Copy link
Contributor

@jogeo jogeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this test, it looks good over all. I've left a few questions and comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect there to always be only one deployment, so the index of 0 will always be the deployment we should get the "id" from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses kw Get Booted Deployment Status which runs rpm-ostree status --booted --json
Thanks to --booted it will only output one deployment, but it's still using the same data structure, i.e. deployments in an array. With --booted it will only have 1 element.

Without that switch, 0 could be either booted or staged deployment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'Evaluate json.loads' was used above in keyword "Get Booted Deployment Status". Should we be consistent in the pattern we use to ingest and parse json?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, 'jq' is used by scripts such as microshift_set_healthy.sh, so it's not a bad thing that we are also using it in our test scripts. My initial comment was just a reaction to seeing json parsed in the same PR using two different approaches.

Copy link
Member Author

@pmtk pmtk Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. jq here felt simpler, otherwise I'd have either cat and use json.loads or... I don't know what, I don't think SSHLibrary.Get File supports privilege escalation and health.json is only readable by root.
Also Get File scp's file, I'd rather use some kind of slurp which I haven't seen in that library.

Maybe we could create some helper keywords to slurping and modifying files as root and have flow consistent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of keywords for grabbing files only readable by root. Save Default MicroShift Config uses cat to grab the contents of the config file, and it could use the same reusable keyword for that step. Then something like Upload String To File could be modified to take arguments for the owner and mode of the output file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we test path here with 'Should Not Be Empty'? I see that 'Get Booted Deployment Backup Prefix Path' has a check for empty values. However, if the code is changed to use a different key word at some point to get path we would not want to accidentally call 'rm -rf *'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For YAML, I found it handy to parse the text and then return a RF DottedDict type to make accessing the data from RF keywords easier. See the YAML library in the resources directory.

@pmtk pmtk force-pushed the test-backup-on-boot branch from b82e880 to 825995f Compare June 12, 2023 11:06
@pmtk pmtk force-pushed the test-backup-on-boot branch from 825995f to a7d4fdf Compare June 12, 2023 13:50
@ggiguash
Copy link
Contributor

Let's merge this in the current form. The tests are not going to run because they are in a separate folder.
We now have a metal environment to run this tests, so let's try those out.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 19, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 19, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash, pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pmtk
Copy link
Member Author

pmtk commented Jun 19, 2023

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 19, 2023

@pmtk: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants