-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Fix: catch CloudRuntimeException in LibvirtGetVolumeStatsCommandWrapper.java #3949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@slavkap can you rebase your commit on latest head of the 4.13 branch, please? |
When you migrate volume between data stores CS keeps the original UUID and changes the path of the volume. When volume is not found by the given path the agent throws CloudRuntimeException but it's not catched in LibvirtGetVolumeStatsCommandWrapper.java
f72ac63 to
da33574
Compare
DaanHoogland
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code lgtm
|
@blueorangutan package |
|
@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✖centos6 ✔centos7 ✔debian. JID-1035 |
|
@blueorangutan test |
|
@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-1223)
|
|
For testing, I changed the global setting volume.stats.interval to 10000 to make stats run every 10 seconds. I did not get the exception in the description after migrating storage. A different null pointer was thrown every time stats ran, but as soon as the migration was finished the null pointer was no longer thrown. To reproduce the runtime exception, I just removed the file from where it was in primary storage. This PR would hide the runtime exception which I feel is not the desired behavior here if you would agree @slavkap ? |
|
@Spaceman1984 thanks that you spend time to test it! My bad that I didn't pull the latest changes and the fix is from #3884 where mgmt sends the path rather than the UUID of the volume. Unfortunately I couldn't reproduce the NPE that you hit. |
|
This is the null pointer on the agent while migrating a volume, happens when stats are run. I have also configured nfs in /etc/exports to be 2 separate entries: I will test #3884 merged with this PR for the error on the management server, I haven't seen it yet. |
lgtm was purely code related, ignorant of a functional problem being there
|
I have tested #3884 with the code from this PR, I still get the null pointer while migrating a volume and stats run. I wasn't able to reproduce the error in the description after migrating storage. These are the steps I followed:
Results: The next step I followed was:
Result:
Seems like this PR improves logging of the error when a volume is missing from the storage pool, not when a volume was migrated. |
|
I will change the description, because when I have created this PR I wasn't updated my local repo with the latest changes with #3884. The CloudRuntimeException could be thrown in a lot of cases when volume stats command is executed, but I have tested it only if the volume is missing(migrated between datastores before the change from 3884). You can get it also, if for some reason can't get storage pool. About the NPE if I find time I could check it, but this change doesn't causes it. |
|
@slavkap, I agree the NPE is not caused by this change. |
|
LGTM |
|
@andrijapanicsb pure bug fix, do we include it given blockers are still out? |
Description
When volume or storage pool are not found when LibvirtGetVolumeStatsCommandWrapper.java, CloudRuntimeException is thrown. All methods invoked by this command are throwing in most of the cases CloudRuntimeException and it's not handled in LibvirtGetVolumeStatsCommandWrapper.
For example if volume is not found, CloudStack agent throws this exception and it fills up the agent.log
DB record:
'29', '2', '1', '1', '2', NULL, NULL, 'nfsDisk', 'd644f5c6-bf70-4826-8d2b-d4bc373a9198', '5368709120', '/export/primary', 'efba7e17-dddd-4bbb-8233-347c9a5faaff', '1', '1', NULL, NULL, 'DATADISK', NULL, '3', NULL, NULL, '0', '2020-01-28 08:56:00', NULL, '2020-02-19 12:21:53', NULL, 'Ready', NULL, '21', NULL, NULL, NULL, '1', 'QCOW2', NULL, NULL, NULL, 'thin'
2020-03-06 17:38:06,775 DEBUG [kvm.storage.LibvirtStoragePool] (agentRequest-Handler-1:null) (logid:4858b218) volume: d644f5c6-bf70-4826-8d2b-d4bc373a9198 not exist on storage pool
2020-03-06 17:38:06,776 WARN [cloud.agent.Agent] (agentRequest-Handler-1:null) (logid:4858b218) Caught:
com.cloud.utils.exception.CloudRuntimeException: Can't find volume:d644f5c6-bf70-4826-8d2b-d4bc373a9198
at com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.getPhysicalDisk(LibvirtStoragePool.java:149)
at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtGetVolumeStatsCommandWrapper.getVolumeStat(LibvirtGetVolumeStatsCommandWrapper.java:63)
at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtGetVolumeStatsCommandWrapper.execute(LibvirtGetVolumeStatsCommandWrapper.java:52)
at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtGetVolumeStatsCommandWrapper.execute(LibvirtGetVolumeStatsCommandWrapper.java:40)
at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1483)
at com.cloud.agent.Agent.processRequest(Agent.java:640)
at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1053)
at com.cloud.utils.nio.Task.call(Task.java:83)
at com.cloud.utils.nio.Task.call(Task.java:29)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
If we handle the exception we will receive in management logs information like this for example:
DEBUG [c.c.a.m.AgentManagerImpl] (StatsCollector-2:ctx-bc39a30f) (logid:cd207a3e) Details from executing class com.cloud.agent.api.GetVolumeStatsCommand: Can't get vm disk stats: Unable to attach volume a43deb8f-b8b5-4e73-9043-76298cacd27b. Error: Error: volume 'a43deb8f-b8b5-4e73-9043-76298cacd27b' does not exist.
Types of changes
How Has This Been Tested?
tested with volumes on NFS and StorPool
on CloudStack version 4.11.2.0, 4.13.0.0 and master