Skip to content

USHIFT-1083 microshift admin data backup command#1819

Merged
openshift-merge-robot merged 27 commits intoopenshift:mainfrom
pmtk:upgrade-ostree-backup-new
Jun 1, 2023
Merged

USHIFT-1083 microshift admin data backup command#1819
openshift-merge-robot merged 27 commits intoopenshift:mainfrom
pmtk:upgrade-ostree-backup-new

Conversation

@pmtk
Copy link
Member

@pmtk pmtk commented May 18, 2023

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2023
@openshift-ci openshift-ci bot requested review from dhellmann and pacevedom May 18, 2023 18:16
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 18, 2023
Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good start.

How can we make sure MicroShift is not running before taking the backup? Is that actually a requirement or does cp work ok in that case?

@pmtk
Copy link
Member Author

pmtk commented May 18, 2023

How can we make sure MicroShift is not running before taking the backup? Is that actually a requirement or does cp work ok in that case?

That would be the thing with ps aux I had in previous PR, thanks for the reminder

@pmtk pmtk changed the title WIP Command to back up the data USHIFT-1083 microshift admin backup command May 20, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 20, 2023
@pmtk
Copy link
Member Author

pmtk commented May 21, 2023

/retest

@pmtk
Copy link
Member Author

pmtk commented May 22, 2023

/assign @dhellmann @pacevedom

@pmtk
Copy link
Member Author

pmtk commented May 23, 2023

/retest

@pmtk pmtk force-pushed the upgrade-ostree-backup-new branch from 4af7f85 to 8548161 Compare May 25, 2023 06:29
@pmtk
Copy link
Member Author

pmtk commented May 25, 2023

/test e2e-openshift-conformance-reduced-arm

Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall logic looks good. I have a couple of specific points inline, and I think it would be worth reviewing the logging advice at https://docs.openstack.org/oslo.log/latest/user/guidelines.html#overall-logging-principles and thinking about how to apply that here, both for the use of different log levels and for the "unit of work" approach described for info log messages.

Restore(BackupConfig) error
}

func NewManager() *manager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't the config be passed in when creating the manager?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to use it like this (src):

a.dataManager.Backup(data.BackupConfig{
			Storage: config.BackupsDir,
			Name:    prevBootHealth.DeploymentID,
		})

When I'll need to restore backup for another deployment, I don't want to create another manager.
Should manager take backupsDir and Backup/Restore functions take name of the backup?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that the backup directory could be held in the manager, and then the functions could take arguments that are more dynamic, like the name to restore from. A method like List() would also use the backup directory, but not the name value in BackupConfig.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it so the manager has "storage" as field and Backup/Restore only take BackupName

Restore(BackupConfig) error
}

func NewManager() *manager {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that the backup directory could be held in the manager, and then the functions could take arguments that are more dynamic, like the name to restore from. A method like List() would also use the backup directory, but not the name value in BackupConfig.

pkg/cmd/admin.go Outdated
},
}
v := version.Get()
data.PersistentFlags().String(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case for making the storage location configurable from the command line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say nice to have for RPM systems where backups are manual. But i don't have concrete use case other than NFS (or different partition) mount for backups, but then CoW goes out of window

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that makes sense. In that case I think we're going to not want to overwrite an existing backup, so we should think about how to handle that. Maybe just a check in the CLI code, before we get into the low-level code, where we do want old backups overwritten.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added BackupExists() and --force

}

if err := renamePath(dest_tmp, dest); err != nil {
klog.Errorf("Renaming path failed - renaming %s back and deleting %s: %v", dest_old, dest_tmp, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dest_old will only exist if the original location existed when this function started, so if the rename on line 60 fails for some other reason then the logic in this section is going to produce more errors.

What are the situations that could result in the location already existing? We can definitely write code to produce a unique name, so I think we only have to worry about the collision if the name comes from the user via the command line? What if we just refuse to take the backup in that case? Or don't let the user specify the full backup location (give us a directory and we pick a name in that directory)?

Copy link
Member Author

@pmtk pmtk May 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same code will be used in automated ostree backing up so 3rd boot into the same healthy deployment will overwrite backup taken during 2nd boot (and since backup name is 'deploymentID' the dir will already exist)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to move some of that overwrite logic "up" the call stack to a place where the code knows whether it is being invoked by the user or by automation. That will make the edge cases easier to reason about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added function to interface BackupExists(BackupName) and it's called in the CLI. It will refuse to overwrite a backup unless --force is provided

pmtk added 2 commits May 29, 2023 12:38
klog includes filename in the logs, but if we add more manager.go
then it'll be harder to know which one logged the message
@pmtk pmtk force-pushed the upgrade-ostree-backup-new branch from 1d04379 to 4bb597f Compare May 29, 2023 10:40
@pmtk pmtk changed the title USHIFT-1083 microshift admin backup command USHIFT-1083 microshift admin data backup command May 29, 2023
Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

/hold for QE verification

}

if err := renamePath(tmp, dest); err != nil {
klog.Errorf("Renaming path failed - renaming %s back and deleting %s: %v", old, tmp, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to be interesting to set up a test for this error case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is so unlikely, do you think that "recovery actions" (renamePath and removePath) are unnecessary? I would suspect that if one rename fails, then other might fail as well but since go might return error I thought this is the way to handle it (try to revert)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked on slack about a way to generate unique names to avoid having to rename anything. Let's land this version and make that change in another PR.

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels May 30, 2023
@pmtk
Copy link
Member Author

pmtk commented May 31, 2023

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label May 31, 2023
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label May 31, 2023
Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/hold

@jogeo, please remove the hold when you've had a chance to verify the change.

}

if err := renamePath(tmp, dest); err != nil {
klog.Errorf("Renaming path failed - renaming %s back and deleting %s: %v", old, tmp, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked on slack about a way to generate unique names to avoid having to rename anything. Let's land this version and make that change in another PR.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 31, 2023
}
klog.InfoS("Backup storage directory created", "path", dm.storage)
} else {
klog.InfoS("Backup storage directory already existed", "path", dm.storage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indicates an attempt to write over an existing backup, right? I would expect that to be an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, that's the "parent directory" where backups are stored. Maybe you have how can we name it better ("backup storage") or maybe just simply delete this log (and keep only one about creating /var/lib/microshift-backups/)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what's going on now.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 31, 2023

@pmtk: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-openshift-conformance-reduced-arm a6e672c link false /test e2e-openshift-conformance-reduced-arm

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Contributor

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/hold

Holding for @jogeo to verify again

}
klog.InfoS("Backup storage directory created", "path", dm.storage)
} else {
klog.InfoS("Backup storage directory already existed", "path", dm.storage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what's going on now.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 31, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhellmann, pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jogeo
Copy link
Contributor

jogeo commented Jun 1, 2023

I've preformed some early testing on this PR and it behaves as expected.
/unhold

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants