Bug/disk order bug during ingest #4548

DK101010 · 2020-12-16T11:14:58Z

Description

This PR...

Types of changes

Breaking change (fix or feature that would cause existing functionality to change)
New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Enhancement (improves an existing feature and functionality)
Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

Major
Minor

Bug Severity

Screenshots (if appropriate):

How Has This Been Tested?

Adapt mockito test to check the new behavior

shwstppr · 2020-12-16T11:30:35Z

server/src/main/java/org/apache/cloudstack/vm/UnmanagedVMsManagerImpl.java

            throw new ServerApiException(ApiErrorCode.INTERNAL_ERROR, String.format("No attached disks found for the unmanaged VM: %s", instanceName));
        }
-        final UnmanagedInstanceTO.Disk rootDisk = unmanagedInstance.getDisks().get(0);
+        UnmanagedInstanceTO.Disk rootDisk = unmanagedInstance.getDisks().get(0);


@DK101010 Since disks list for UnmanagedInstance is created by CloudStack only, will it be better to always prepare that list in a way that first disk is always ROOT disk.
Relevant code can be found in VmwareResource class.

@shwstppr ahh ok, yes of course this is a better solution, i will check this.

GabrielBrascher

@DK101010 thanks for investing some time in contributing to the project. I see that it is your first Pull Request, so ... be welcome :-)

I see that the code still will be updated to address @shwstppr; therefore, my review here may not fit after these changes.

I would like to bring some points that can be improved, with these points I would also raise some attention to the lack of Pull Request description. A PR description is a key piece of each contribution, being used as future reference if/when necessary. Can you please enhance the PR description?

server/src/main/java/org/apache/cloudstack/vm/UnmanagedVMsManagerImpl.java

server/src/test/java/org/apache/cloudstack/vm/UnmanagedVMsManagerImplTest.java

DK101010 · 2020-12-16T15:16:10Z

I would like to bring some points that can be improved, with these points I would also raise some attention to the lack of Pull Request description. A PR description is a key piece of each contribution, being used as future reference if/when necessary. Can you please enhance the PR description?

@GabrielBrascher thanks for your welcome :).
Sure, my infos are small, I've added a corresponding issue #4547. Issue Information and PR information have intersections, so I thought it's enough to descripted the problem in the issue. When you say it's a key piece, is it necessary to create issues?

rohityadavcloud · 2020-12-18T05:28:22Z

@blueorangutan package

blueorangutan · 2020-12-18T05:29:00Z

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan · 2020-12-18T06:42:49Z

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2498

rohityadavcloud · 2020-12-18T06:44:13Z

@blueorangutan test

blueorangutan · 2020-12-18T06:45:06Z

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

shwstppr · 2020-12-18T06:48:45Z

...ns/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java

-                    String num = disk.getLabel().replaceAll("\\D", "");
-                    // return 0 if no digits found
-                    return num.isEmpty() ? 0 : Integer.parseInt(num);
+                    return disk.getPosition() + (disk.getControllerUnit() * 100);


@DK101010 In the code above this section we set ControllerUnit only for VirtualIDEController and VirtualSCSIController device type otherwise it remains null. Should we add a null check here?

@shwstppr Good point, But how realistic is it to say that we have a volume with a position and without a controller id.? I talked with my administration and network colleagues and other controller types like NVME and SATA have also a controller id.
Therefore I think about it to add an exception in the else condition? What do you think?

@DK101010 yes, exception seems appropriate

@shwstppr I will add an Exception.

@shwstppr @DK101010 will throwing an exception cause VM to be not ingested?

@rhtyd damn, your are right, that was to quick from my side, the catch block catch all exceptions :(.

Hmm ... is it ok to remove the catch block ?

In my opinion the ingest process should be canceled if not all volumes ingestible. What do you think @rhtyd @shwstppr ?

Yes, as long as that's acceptable

I think IDE and generic SCSI (VirtualSCSIController) will cover all disk controllers. If we are getting any other type of VirtualController as a disk, an exception seems correct to me. However, the error message needs to be a bit more generic there as the same method is used for listing unmanaged isntances

@shwstppr Sure, we use also only IDE, SCSI controller but for me it sounds critical in case of other/new controller to ingest vm's without volumes . I think from user perspektive it is inconsistent to see that the vm is ingested but is unusable.

blueorangutan · 2020-12-18T16:08:10Z

Trillian test result (tid-3354)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 32060 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4548-t3354-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Smoke tests completed. 85 look OK, 1 have error(s)
Only failed tests results shown below:

Test	Result	Time (s)	Test File
test_03_deploy_and_upgrade_kubernetes_cluster	`Failure`	241.39	test_kubernetes_clusters.py

shwstppr

LGTM on changes. If we can add an exception as discussed here, https://github.com/apache/cloudstack/pull/4548/files#r545656572, that will be great

rohityadavcloud · 2020-12-21T11:13:11Z

@blueorangutan package

blueorangutan · 2020-12-21T11:14:20Z

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan · 2020-12-21T11:53:55Z

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2504

rohityadavcloud · 2020-12-21T12:07:33Z

@blueorangutan test

blueorangutan · 2020-12-21T12:08:14Z

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

shwstppr

Looks good to me except the error message which can be more generic

...ns/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java

shwstppr · 2020-12-22T06:33:41Z

...ns/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java

-                    String num = disk.getLabel().replaceAll("\\D", "");
-                    // return 0 if no digits found
-                    return num.isEmpty() ? 0 : Integer.parseInt(num);
+                    return disk.getPosition() + (disk.getControllerUnit() * 100);


I think IDE and generic SCSI (VirtualSCSIController) will cover all disk controllers. If we are getting any other type of VirtualController as a disk, an exception seems correct to me. However, the error message needs to be a bit more generic there as the same method is used for listing unmanaged isntances

…vmware/resource/VmwareResource.java Co-authored-by: Abhishek Kumar <[email protected]>

rohityadavcloud · 2020-12-22T10:48:02Z

...ns/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java

-                                    instanceDisk.setController(DiskControllerType.none.toString());
+                                    String msg = String.format("Controller of disk %s is unsupported", instanceDisk.getLabel());
+                                    s_logger.error(msg);
+                                    throw new Exception(msg);


@DK101010 I think on failure you can decide to fail the entire VM ingestion by (a) returning an empty list and let the user of the getUnmanageInstanceDisks method handle the case (when disk list is empty throw an exception), or (b) in the generic exception handler, instead of Exception throw an UnsupportedOperationException add another handler for the specific exception. cc @shwstppr

@rhtyd @DK101010 same method is used when listUnmanagedINstances API is called so I'm not sure if we would want the complete API to raise an exception when disks for one particular VM is not found. In my opinion, we can log the error here and return the list of disks for the VM as empty. With an empty list of disks VM import with automatically fail with existing check
https://github.com/apache/cloudstack/blob/master/server/src/main/java/org/apache/cloudstack/vm/UnmanagedVMsManagerImpl.java#L942-L944

If we want to go with the above approach of returning empty disks list, we can clear the instanceDisks list variable in the catch block

@shwstppr @rhtyd ok, then I will modify the exception msg and clear the list in the catch block. Thanks you two for help.

@shwstppr ahh sorry, I had not checked that you have already changed the msg :D.

shwstppr · 2020-12-23T12:01:08Z

@blueorangutan package

blueorangutan · 2020-12-23T12:01:39Z

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan · 2020-12-23T13:14:17Z

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2512

GabrielBrascher

Code LGTM. I did not test it, though.

blueorangutan · 2021-01-14T07:50:38Z

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2541

shwstppr

Needs changes, testing. Order of ROOT disk is coming incorrect

shwstppr · 2021-01-14T08:44:08Z

...ns/hypervisors/vmware/src/main/java/com/cloud/hypervisor/vmware/resource/VmwareResource.java

-                    String num = disk.getLabel().replaceAll("\\D", "");
-                    // return 0 if no digits found
-                    return num.isEmpty() ? 0 : Integer.parseInt(num);
+                    return disk.getPosition() + (disk.getControllerUnit() * 100);


@DK101010 I'm not sure if this formula is giving correct position/order for the disk.
I tried listUnamanagedInstance API for a VM with 2 data disks and results were incorrect with this change

With changes - Root disk is showing up at second position in order

> list unmanagedinstances clusterid=2b35fefc-892a-4393-9ccf-30229c1a999b name=importtest { "count": 1, "unmanagedinstance": [ { "clusterid": "2b35fefc-892a-4393-9ccf-30229c1a999b", "cpucorepersocket": 1, "cpunumber": 1, "cpuspeed": 0, "disk": [ { "capacity": 21474836480, "controller": "lsilogic", "controllerunit": 0, "datastorehost": "10.10.0.16", "datastorename": "4ea48dd0701d3385a5cbb8921ee1d1d4", "datastorepath": "/acs/primary/pr4385-t3314-vmware-65u2/pr4385-t3314-vmware-65u2-esxi-pri2", "datastoretype": "NFS", "id": "35-2000", "imagepath": "[4ea48dd0701d3385a5cbb8921ee1d1d4] importtest/importtest_61.vmdk", "label": "Hard disk 2", "position": 0 }, { "capacity": 2147483648, "controller": "ide", "controllerunit": 0, "datastorehost": "10.10.0.16", "datastorename": "4ea48dd0701d3385a5cbb8921ee1d1d4", "datastorepath": "/acs/primary/pr4385-t3314-vmware-65u2/pr4385-t3314-vmware-65u2-esxi-pri2", "datastoretype": "NFS", "id": "35-3001", "imagepath": "[4ea48dd0701d3385a5cbb8921ee1d1d4] importtest/importtest_62.vmdk", "label": "Hard disk 1", "position": 1 }, { "capacity": 5368709120, "controller": "lsilogic", "controllerunit": 0, "datastorehost": "10.10.0.16", "datastorename": "4ea48dd0701d3385a5cbb8921ee1d1d4", "datastorepath": "/acs/primary/pr4385-t3314-vmware-65u2/pr4385-t3314-vmware-65u2-esxi-pri2", "datastoretype": "NFS", "id": "35-2001", "imagepath": "[4ea48dd0701d3385a5cbb8921ee1d1d4] importtest/importtest_63.vmdk", "label": "Hard disk 3", "position": 1 } ], "hostid": "fd9a2641-ce6e-456e-be78-4b1b63c2b2ac", "memory": 512, "name": "importtest", "nic": [ { "adaptertype": "E1000", "id": "Network adapter 1", "macaddress": "02:00:44:4f:00:12", "networkname": "cloud.guest.2183.200.1-vSwitch1", "vlanid": 2183 } ], "osdisplayname": "CentOS 4/5 or later (64-bit)", "osid": "centos64Guest", "powerstate": "PowerOn" } ] }

Without change - Root disk at first place

> list unmanagedinstances clusterid=2b35fefc-892a-4393-9ccf-30229c1a999b name=importtest { "count": 1, "unmanagedinstance": [ { "clusterid": "2b35fefc-892a-4393-9ccf-30229c1a999b", "cpucorepersocket": 1, "cpunumber": 1, "cpuspeed": 0, "disk": [ { "capacity": 2147483648, "controller": "ide", "controllerunit": 0, "datastorehost": "10.10.0.16", "datastorename": "4ea48dd0701d3385a5cbb8921ee1d1d4", "datastorepath": "/acs/primary/pr4385-t3314-vmware-65u2/pr4385-t3314-vmware-65u2-esxi-pri2", "datastoretype": "NFS", "id": "35-3001", "imagepath": "[4ea48dd0701d3385a5cbb8921ee1d1d4] importtest/importtest_62.vmdk", "label": "Hard disk 1", "position": 1 }, { "capacity": 21474836480, "controller": "lsilogic", "controllerunit": 0, "datastorehost": "10.10.0.16", "datastorename": "4ea48dd0701d3385a5cbb8921ee1d1d4", "datastorepath": "/acs/primary/pr4385-t3314-vmware-65u2/pr4385-t3314-vmware-65u2-esxi-pri2", "datastoretype": "NFS", "id": "35-2000", "imagepath": "[4ea48dd0701d3385a5cbb8921ee1d1d4] importtest/importtest_61.vmdk", "label": "Hard disk 2", "position": 0 }, { "capacity": 5368709120, "controller": "lsilogic", "controllerunit": 0, "datastorehost": "10.10.0.16", "datastorename": "4ea48dd0701d3385a5cbb8921ee1d1d4", "datastorepath": "/acs/primary/pr4385-t3314-vmware-65u2/pr4385-t3314-vmware-65u2-esxi-pri2", "datastoretype": "NFS", "id": "35-2001", "imagepath": "[4ea48dd0701d3385a5cbb8921ee1d1d4] importtest/importtest_63.vmdk", "label": "Hard disk 3", "position": 1 } ], "hostid": "fd9a2641-ce6e-456e-be78-4b1b63c2b2ac", "memory": 512, "name": "importtest", "nic": [ { "adaptertype": "E1000", "id": "Network adapter 1", "macaddress": "02:00:44:4f:00:12", "networkname": "cloud.guest.2183.200.1-vSwitch1", "vlanid": 2183 } ], "osdisplayname": "CentOS 4/5 or later (64-bit)", "osid": "centos64Guest", "powerstate": "PowerOn" } ] }

Hi @shwstppr, the list is correct sorted by controller id and position. Works like I want. Your root disk is on controller Id 0 and position 1 and you have two disk on the same position. I'm a little bit confused. In my mind vmware puts the root disk in every case on controller id 0 and position 0.

Hi @DK101010, I'm not sure why two disks have same position. To get this unmanaged VM. I just created a VM in cloudstack, added two data disks to it and then cloned the vm in vcenter.

@shwstppr ahh ok you have two controller(ide, lsilogic), But in my opinion it makes no sense that you have one disk on one controller on position one. Also should increment the controller id, when you use more the one controller. And for this reason works this feature not correct. I think you found a other bug.

Hi @shwstppr, I talked with my team about this behavior. First, your controller ids are correct. Increments are per controller type. For this reason is my implementation not better as the old one. Perhaps we could check in the sort algorithm if device type == scsi controller, but then it isn't no longer possible to boot from other controller types. It exist no safety way to find out which disk is bootable. The only way to get sure that CS use the right volume is that the user indicates during the ingest process via rest call which volume is the right. Therefore I think we should close this pull request. What do you think?

@DK101010 then is it better to add a new param in importUnamangedInstance API, something like rootdiskid? cc @rhtyd @PaulAngus

@shwstppr yes, for example, but in my mind it should be optional for the user and when not set then works like now via sorting of disk names.

shwstppr · 2021-01-25T09:45:11Z

@DK101010 closed in favour of #4591. Please re-open if needed

DK101010 mentioned this pull request Dec 16, 2020

wrong root disk during the ingest bug #4547

Closed

shwstppr reviewed Dec 16, 2020

View reviewed changes

GabrielBrascher reviewed Dec 16, 2020

View reviewed changes

server/src/main/java/org/apache/cloudstack/vm/UnmanagedVMsManagerImpl.java Outdated Show resolved Hide resolved

server/src/test/java/org/apache/cloudstack/vm/UnmanagedVMsManagerImplTest.java Outdated Show resolved Hide resolved

DK101010 mentioned this pull request Dec 16, 2020

Update Fork NDBSGMS/cloudstack#1

Merged

4 tasks

[apache#4547] adapt collection sort to get the right root disk

e3c8143

DK101010 force-pushed the bug/disk_order_bug_during_ingest branch from 1cf953f to e3c8143 Compare December 17, 2020 12:42

GabrielBrascher linked an issue Dec 17, 2020 that may be closed by this pull request

wrong root disk during the ingest bug #4547

Closed

GabrielBrascher assigned DK101010 Dec 17, 2020

GabrielBrascher added component:vmware type:bug labels Dec 17, 2020

shwstppr reviewed Dec 18, 2020

View reviewed changes

rohityadavcloud requested review from GabrielBrascher and shwstppr December 21, 2020 07:07

shwstppr approved these changes Dec 21, 2020

View reviewed changes

add exception for unknown controller type

44d1462

rohityadavcloud approved these changes Dec 21, 2020

View reviewed changes

shwstppr approved these changes Dec 22, 2020

View reviewed changes

Update plugins/hypervisors/vmware/src/main/java/com/cloud/hypervisor/…

894758f

…vmware/resource/VmwareResource.java Co-authored-by: Abhishek Kumar <[email protected]>

rohityadavcloud reviewed Dec 22, 2020

View reviewed changes

rohityadavcloud requested review from rohityadavcloud and shwstppr December 22, 2020 10:58

clear disk list in case of exception to prevent ingest

8e51f1e

GabrielBrascher approved these changes Dec 31, 2020

View reviewed changes

shwstppr requested changes Jan 14, 2021

View reviewed changes

shwstppr closed this Jan 25, 2021

Bug/disk order bug during ingest #4548

Bug/disk order bug during ingest #4548

Uh oh!

Conversation

DK101010 commented Dec 16, 2020

Description

Types of changes

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

Bug Severity

Screenshots (if appropriate):

How Has This Been Tested?

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

GabrielBrascher left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

DK101010 commented Dec 16, 2020

Uh oh!

rohityadavcloud commented Dec 18, 2020

Uh oh!

blueorangutan commented Dec 18, 2020

Uh oh!

blueorangutan commented Dec 18, 2020

Uh oh!

rohityadavcloud commented Dec 18, 2020

Uh oh!

blueorangutan commented Dec 18, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

blueorangutan commented Dec 18, 2020

Uh oh!

shwstppr left a comment

Choose a reason for hiding this comment

Uh oh!

rohityadavcloud commented Dec 21, 2020

Uh oh!

blueorangutan commented Dec 21, 2020

Uh oh!

blueorangutan commented Dec 21, 2020

Uh oh!

rohityadavcloud commented Dec 21, 2020

Uh oh!

blueorangutan commented Dec 21, 2020

Uh oh!

shwstppr left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shwstppr Dec 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

shwstppr Dec 22, 2020 •

edited

Loading