Skip to content

Conversation

@Pearl1594
Copy link
Contributor

Description

This PR fixes #5934

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

(1) create isolated network
(2) add new public IP range
(3) acquire public IP in new IP range
(4) create firewall rule. a new NIC is added to the network VR
(5) release the public IP. - if it is the only IP on the NIC, the nic will be removed

Test 1:
Initial output of ip a with default NIC:
image

Acquired an IP in another Public IP range and added a firewall rule: New nic was added
image

On deletion of the firewall rule on the NIC, the nic is removed:
image

Test2:

Add another Public IP range in the same VLAN as the one before
Acquire & add firewall rule for 2 IPs belonging to 2 different Public IP ranges but the same VLAN, such that there are 2 IPs on a NIC (check eth3)
image

Remove the rule from one of the IPs, and verify that the IP is disassociated from the VRs NIC, but the NIC isn't removed:
image

@Pearl1594 Pearl1594 added this to the 4.17.0.0 milestone Feb 11, 2022
@Pearl1594
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2578

@Pearl1594
Copy link
Contributor Author

@blueorangutan test centos7 vmware-67u3

@blueorangutan
Copy link

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

// the check and will try to find it within cluster
if (vmMo == null) {
if (hyperHost instanceof HostMO) {
ClusterMO clusterMo = new ClusterMO(hyperHost.getContext(), ((HostMO) hyperHost).getParentMor());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about datacenter instead of cluster search?

throw new Exception(msg);
}

for (IpAddressTO ip : ips) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this for loop is not needed since we ensure the logic is executed only when it has one item as is not add. Could be refactored to pre-checks and logic to reduce indentation

@weizhouapache
Copy link
Member

weizhouapache commented Feb 11, 2022

code works well. @Pearl1594
could you please retarget to 4.16 ?

@blueorangutan
Copy link

Trillian test result (tid-3298)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 36156 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5985-t3298-vmware-67u3.zip
Smoke tests completed. 90 look OK, 2 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_offline_migrate_VM_and_root_volume Error 101.04 test_vm_life_cycle.py
test_03_live_migrate_VM_with_two_data_disks Error 61.35 test_vm_life_cycle.py
test_06_disk_offering_strictness_false Error 142.03 test_service_offerings.py

@Pearl1594
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@Pearl1594 Pearl1594 force-pushed the fix-vmware-cleanup-nic branch from e4b9313 to a03e88e Compare February 14, 2022 06:41
@Pearl1594 Pearl1594 changed the base branch from main to 4.16 February 14, 2022 06:41
@Pearl1594
Copy link
Contributor Author

@nvazquez I've addressed the comments. @weizhouapache I've rebased the branch off 4.16.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2597

@Pearl1594
Copy link
Contributor Author

@blueorangutan test centos7 vmware-67u3

@Pearl1594 Pearl1594 modified the milestones: 4.17.0.0, 4.16.1.0 Feb 14, 2022
@blueorangutan
Copy link

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@weizhouapache
Copy link
Member

@Pearl1594 @nvazquez @sureshanaparti
this PR needs to be tested/approved.
However 4.16.1.0 is close to be released, let's test/approve/merge this after 4.16.1.0 release, is that ok to you ?

@DaanHoogland
Copy link
Contributor

agree @weizhouapache
in addition I wonder why this apparently generic feature requires such a vmware specific implementation. Isn't a more hypevisor independent implementation required?
I understand that it already works for Xen and KVM, so is there a way to fix it so it works for all, instead of adding ahypervisor specific code?

@Pearl1594
Copy link
Contributor Author

Sure @weizhouapache I am fine with moving to 4.17, I rebased it to 4.16, per a request made earlier : )
@DaanHoogland I am not sure if there is a hypervisor agnostic way to solve this, as each hypervisor has it's own way of handling hot unplugging of NICs.

@Pearl1594 Pearl1594 force-pushed the fix-vmware-cleanup-nic branch from a03e88e to 5447b05 Compare February 14, 2022 10:01
@Pearl1594 Pearl1594 changed the base branch from 4.16 to main February 14, 2022 10:01
@Pearl1594 Pearl1594 force-pushed the fix-vmware-cleanup-nic branch from 5447b05 to 1f27c6a Compare February 14, 2022 10:03
@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2601

@blueorangutan
Copy link

Trillian test result (tid-3322)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 36003 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5985-t3322-vmware-67u3.zip
Smoke tests completed. 90 look OK, 2 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_offline_migrate_VM_and_root_volume Error 94.64 test_vm_life_cycle.py
test_03_live_migrate_VM_with_two_data_disks Error 58.53 test_vm_life_cycle.py
test_06_disk_offering_strictness_false Error 143.41 test_service_offerings.py

@Pearl1594 Pearl1594 marked this pull request as ready for review February 15, 2022 04:26
@weizhouapache
Copy link
Member

weizhouapache commented Feb 15, 2022

@Pearl1594
in my testing, if public ips are used for port forwarding or load balancer, everything seems ok.
If public ips are used for static nat, the interface will be removed even if there is still public ip in use (for example, pf or lb or nat). This issue happens with eth2 as well which is the interface for source nat.

steps to reproduce the issue
(1) acquire public IP in source NAT ip range or additional IP range
(2) enable static nat
(3) disbale static nat

expected result: interface is plugged when enable static nat if needed. interface is unplugged if IP is the last attached IP on the interface, when disable static nat.
actual result: interface is unplugged every time when disable static nat, no matter if it is the last IP.

I think we should use similar code like

            final String lastIp = cmd.getAccessDetail(NetworkElementCommand.NETWORK_PUB_LAST_IP);
            for (final IpAddressTO ip : ips) {
                if (!StringUtils.equalsIgnoreCase(lastIp, "true") || ip.isAdd()) {
                    continue;
                }

and

        if (publicNicInfo.first() == 2) {
            s_logger.debug("Do not remove eth2 in network VR because it is the public NIC of source NAT.");
            continue;
        }

@Pearl1594
Copy link
Contributor Author

Thanks for the review/testing @weizhouapache . I'll fix the issue.

@Pearl1594 Pearl1594 marked this pull request as draft February 15, 2022 09:10
@Pearl1594 Pearl1594 force-pushed the fix-vmware-cleanup-nic branch from b20a106 to 593f1b6 Compare February 16, 2022 06:42
@Pearl1594
Copy link
Contributor Author

@weizhouapache Could you please help verify if the fix addresses your comment. Thanks.

@weizhouapache
Copy link
Member

@Pearl1594
the following code works in my tesing

            final String lastIp = cmd.getAccessDetail(NetworkElementCommand.NETWORK_PUB_LAST_IP);
            if (ips.length == 1 && !ips[0].isAdd()) {
                Pair<VirtualDevice, Integer> nicInfo = getVirtualDevice(vmMo, ips[0]);

                if (nicInfo.first() != null && nicInfo.second() != 2 && lastIp.equalsIgnoreCase("true")) {
                    configureNicDevice(vmMo, nicInfo.first(), VirtualDeviceConfigSpecOperation.REMOVE, "unplugNicCommand");
                }
            }

@Pearl1594 Pearl1594 marked this pull request as ready for review February 17, 2022 11:24
Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did lots of testing, all seem fine.

@weizhouapache
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@weizhouapache a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2642

@nvazquez
Copy link
Contributor

@blueorangutan test centos7 vmware-67u3

@blueorangutan
Copy link

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@weizhouapache
Copy link
Member

@blueorangutan test centos7 vmware-70u3

@blueorangutan
Copy link

@weizhouapache a Trillian-Jenkins test job (centos7 mgmt + vmware-70u3) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-3378)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 37145 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5985-t3378-vmware-67u3.zip
Smoke tests completed. 92 look OK, 0 have errors
Only failed tests results shown below:

Test Result Time (s) Test File

@blueorangutan
Copy link

Trillian test result (tid-3382)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 38052 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5985-t3382-vmware-70u3.zip
Smoke tests completed. 92 look OK, 0 have errors
Only failed tests results shown below:

Test Result Time (s) Test File

Copy link
Contributor

@nvazquez nvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nvazquez nvazquez merged commit 550c810 into apache:main Feb 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

vmware: interface for additional public IP in VR is not unplugged when public IP is released

5 participants