Skip to content

Conversation

@harikrishna-patnala
Copy link
Contributor

@harikrishna-patnala harikrishna-patnala commented Sep 9, 2024

Description

CloudStack has a global setting "wait" which controls the wait time by the management server for all the commands which it sends. There are also some command specific timeout settings which controls specific commands like "ready.command.wait", "kvm.storage.online.migration.wait", etc.

This PR simplifies these configurations by introducing a new parent global setting which serves as a single point to configure any command timeout. The new global setting is "commands. timeout", this timeout overrides the wait global config. This holds a comma separated key value pairs containing timeout for specific commands. For example: DhcpEntryCommand=600, SavePasswordCommand=300, VmDataCommand=300.

There are also some hardcoded values defined in the code base for few commands. so the order of precedence of all these timeouts will be

"commands.timeout" > Existing global settings > Hardcoded value for the command > global "wait"

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale

  • Major
  • Minor

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov
Copy link

codecov bot commented Sep 9, 2024

Codecov Report

Attention: Patch coverage is 46.39175% with 52 lines in your changes missing coverage. Please review.

Project coverage is 16.05%. Comparing base (a4224e5) to head (124da2b).
Report is 1 commits behind head on 4.20.

Files with missing lines Patch % Lines
...java/com/cloud/agent/manager/AgentManagerImpl.java 24.07% 40 Missing and 1 partial ⚠️
.../cloud/configuration/ConfigurationManagerImpl.java 73.80% 7 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               4.20    #9659   +/-   ##
=========================================
  Coverage     16.04%   16.05%           
- Complexity    12848    12858   +10     
=========================================
  Files          5639     5639           
  Lines        493945   494040   +95     
  Branches      59901    59917   +16     
=========================================
+ Hits          79272    79316   +44     
- Misses       405854   405899   +45     
- Partials       8819     8825    +6     
Flag Coverage Δ
uitests 4.02% <ø> (ø)
unittests 16.89% <46.39%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@blueorangutan
Copy link

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 11058

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11069

Copy link
Contributor

@sureshanaparti sureshanaparti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm

@sureshanaparti
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-11439)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 52865 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9659-t11439-kvm-ol8.zip
Smoke tests completed. 139 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_UpdateConfigParamWithScope Error 0.08 test_global_settings.py
test_01_snapshot_usage Error 4.24 test_usage.py

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11092

@sureshanaparti
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@yadvr yadvr requested a review from DaanHoogland September 12, 2024 06:38
@blueorangutan
Copy link

[SF] Trillian test result (tid-11457)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 57345 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9659-t11457-kvm-ol8.zip
Smoke tests completed. 138 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_UpdateConfigParamWithScope Error 0.08 test_global_settings.py
test_01_snapshot_usage Error 6.26 test_usage.py
test_02_unsecure_vm_migration Error 497.36 test_vm_life_cycle.py

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 11107

@GutoVeronezi
Copy link
Contributor

GutoVeronezi commented Dec 3, 2024

@DaanHoogland, before merging this one, please check #8605.

This PR solves the same thing that #8605, but apparently that solution was ignored.

@DaanHoogland
Copy link
Contributor

@DaanHoogland, before merging this one, please check #8605.

This PR solves the same thing that #8605, but apparently that solution was ignored.

ok @GutoVeronezi (and @harikrishna-patnala) , please discuss whether to merge or choose one of the two?

As far as I can see the main difference is whether the configuration is done in a Configitem or in a separate table, is that right?

@blueorangutan
Copy link

[SF] Trillian test result (tid-11833)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 52018 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9659-t11833-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_secure_vm_migration Error 134.12 test_vm_life_cycle.py
test_01_secure_vm_migration Error 134.12 test_vm_life_cycle.py

@harikrishna-patnala
Copy link
Contributor Author

@DaanHoogland, before merging this one, please check #8605.
This PR solves the same thing that #8605, but apparently that solution was ignored.

ok @GutoVeronezi (and @harikrishna-patnala) , please discuss whether to merge or choose one of the two?

As far as I can see the main difference is whether the configuration is done in a Configitem or in a separate table, is that right?

Thanks @GutoVeronezi for bringing this up. I wasn't aware of your earlier PR, so I independently worked on a solution to address this issue, which is now available here.
Yes @DaanHoogland both the solutions aim to resolve the same problem, though with slightly different approaches one with global setting and one with database table. I feel the current PR offers a more user-friendly experience and is already accessible through the UI. Additionally, it has gone through the development cycle, including reviews, testing, and validation, and is now ready to merge.
That said, @GutoVeronezi I want to ensure that any valuable aspects of your earlier work are not overlooked. If there are specific elements from your solution that you feel should be incorporated here, I'd be happy to discuss.

@harikrishna-patnala
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12000

Copy link
Contributor

@shwstppr shwstppr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM

@harikrishna-patnala harikrishna-patnala merged commit 9bc283e into apache:4.20 Jan 7, 2025
26 checks passed
@harikrishna-patnala harikrishna-patnala deleted the granularCommandTimeout branch January 7, 2025 11:36
DaanHoogland added a commit that referenced this pull request Jan 8, 2025
* 4.20:
  merge errors fixed
  Restrict the migration of volumes attached to VMs in Starting state (#9725)
  server, plugin: enhance storage stats for IOPS (#10034)
  Introducing granular command timeouts global setting (#9659)
  Improve logging to include more identifiable information (#9873)
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jan 10, 2025
* Introducing granular command timeouts global setting

* fix marvin tests

* Fixed log messages

* some more log message fix

* Fix empty value setting

* Converted the global setting to non-dynamic

* set wait on command only when granular wait is defined. This is to keep the backward compatibility

* Improve error logging
@Pearl1594 Pearl1594 moved this to Done in ACS 4.20.1 Mar 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

8 participants