Skip to content

Conversation

@davidjumani
Copy link
Contributor

@davidjumani davidjumani commented Oct 5, 2020

Description

Fixes #4371

In the code, all servers, and their count belonging to the same zone are returned. Then they are filtered for suitability.
This is fine except for the point where VM volumes existing on managed storage pools or hosts on which those storage pools are not suitable. In this case it removes the hosts rather than marking them as not suitable. This messes up the count. Subtracting the non suitable hosts count does not fix this since we're paginating the response so unsuitable hosts and thier count from the current page will be removed but will not fix the total count since hosts on other pages can not be filtered.
Another place is that the source host is removed from the list but the count is not adjusted

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

How Has This Been Tested?

Before :

{
  "count": 4,
  "host": [
    {},
    {},
    {}
  ]
}

After :

{
  "count": 3,
  "host": [
    {},
    {},
    {}
  ]
}

@davidjumani davidjumani force-pushed the fix-findhostsformigration branch 3 times, most recently from a32685e to 182af0d Compare October 5, 2020 16:19
@davidjumani davidjumani added this to the 4.15.0.0 milestone Oct 5, 2020
@davidjumani
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@davidjumani a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2119

@davidjumani
Copy link
Contributor Author

@blueorangutan test

@blueorangutan
Copy link

@davidjumani a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@davidjumani davidjumani force-pushed the fix-findhostsformigration branch 3 times, most recently from fd4cc97 to 9668193 Compare October 5, 2020 19:36
@blueorangutan
Copy link

Trillian test result (tid-2894)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 37351 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4375-t2894-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 83 look OK, 2 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Failure 301.45 test_vpc_redundant.py
test_hostha_kvm_host_fencing Error 168.51 test_hostha_kvm.py

@davidjumani
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@davidjumani a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2120

@davidjumani
Copy link
Contributor Author

@blueorangutan test

@davidjumani davidjumani marked this pull request as ready for review October 6, 2020 04:37
@sureshanaparti
Copy link
Contributor

@davidjumani can you please add the root cause for the count mismatch and the fix to the PR description (and also commit msg).

// If the source and target storage pool is the same and it is managed, then we still have to perform a storage migration
// because we need to create a new target volume and copy the contents of the source volume into it before deleting the
// source volume.
iterator.remove();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidjumani any reason, why unsuitable hosts are not removed from the hosts list ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that the total count and hosts returned remain the same. Rather than removing them, they're marked as not suitable
https://github.com/apache/cloudstack/blob/master/api/src/main/java/org/apache/cloudstack/api/command/admin/host/FindHostsForMigrationCmd.java#L94

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in actual production, there can be lot of unsuitable hosts. say 60 out of 100 are unsuitable, in this case, list all these and marking unsuitable doesn't sound good. thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidjumani better to update the count result from searchForServers(), when unsuitable hosts are removed, instead changing the existing code for suitable hosts, can cause regression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but since we're paginating the result, we can not remove the hosts in page 2 when we've only fetched the ones in page 1. So marking it as not compatible so that the total count is not broken

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pagination is the UI thing, may be some other alternative that can be worked upon. If someone uses the API, doesn't look good to return all hosts, marking suitable & unsuitable.

Copy link
Contributor Author

@davidjumani davidjumani Oct 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok @davidjumani if possible, display the suitable hosts first in the UI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohityadavcloud
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2160

@DaanHoogland
Copy link
Contributor

@blueorangutan test centos7 kvm-centos7 keepEnv

@blueorangutan
Copy link

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@DaanHoogland
Copy link
Contributor

@blueorangutan test centos7 kvm-centos7

@apache apache deleted a comment from blueorangutan Oct 15, 2020
@DaanHoogland
Copy link
Contributor

@blueorangutan test centos7 kvm-centos7 keepEnv

@blueorangutan
Copy link

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-2964)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 37917 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4375-t2964-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 83 look OK, 2 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Failure 461.19 test_vpc_redundant.py
test_hostha_kvm_host_fencing Error 172.18 test_hostha_kvm.py

Copy link
Contributor

@sureshanaparti sureshanaparti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, not tested though

@rohityadavcloud
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2200

@apache apache deleted a comment from blueorangutan Oct 20, 2020
@apache apache deleted a comment from blueorangutan Oct 20, 2020
@apache apache deleted a comment from rohityadavcloud Oct 20, 2020
@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-3025)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 36556 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4375-t3025-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermittent failure detected: /marvin/tests/smoke/test_hostha_kvm.py
Smoke tests completed. 83 look OK, 2 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Failure 310.41 test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Failure 295.68 test_vpc_redundant.py
test_hostha_kvm_host_fencing Error 169.68 test_hostha_kvm.py

@DaanHoogland
Copy link
Contributor

@davidjumani can you have a quick look at the errors above?

@davidjumani
Copy link
Contributor Author

@DaanHoogland The host fencing errors are due to com.google.gson.stream.MalformedJsonException: Unterminated object near assword":"p*****},"resourcestate":"Enabl'}
Think that's beeen fixed in #4387
Need to look into the vpc issues

@davidjumani
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@davidjumani a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2268

@davidjumani
Copy link
Contributor Author

@blueorangutan test

@blueorangutan
Copy link

@davidjumani a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-3062)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41373 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4375-t3062-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 84 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Failure 321.16 test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Failure 296.90 test_vpc_redundant.py

@DaanHoogland DaanHoogland merged commit 81c524e into apache:master Oct 24, 2020
@DaanHoogland DaanHoogland deleted the fix-findhostsformigration branch October 24, 2020 11:08
@davidjumani davidjumani mentioned this pull request Nov 24, 2020
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect count returned by findHostsForMigration

5 participants