[VMware] Nics for routers or guest networks lose their IP on VM reboot #10640

@alexandru-bagu

Description

problem

I noticed this issue a while ago, and it has been present for at least 2 years, since we have had CloudStack in production with VMware as the hypervisor backend. Based on the logic I see here, this issue should be present with other hypervisors as well, but I have not seen it with XCP-ng. I may not have tested XCP-ng well enough, though.

Based on the current logic for external DHCP, CloudStack tries to fill in the gaps using information from the hypervisor's VM agent, so that it has as much information as possible about the NICs it manages. However, this does not work well in practice, especially when there is no external DHCP and IPs are only assigned manually.

The issue has two sides to it:

  1. VMs that have a NIC in a shared guest network will "randomly" lose that NIC's IP some time after a reboot. This happens because CloudStack, as mentioned above, fetches the wrong IP from the VM agent, which can have one of two side effects. As an example, say we have a VM with one NIC in a CloudStack-managed VPC network with DHCP, which gives the VM the IP 172.16.0.1, and one NIC in a guest network defined in CloudStack. As an admin, I can configure an IP in CloudStack for the 2nd NIC; however, on start/reboot, the external DHCP IP fetcher will attempt to get the IP from the VM guest tools. For VMware, the code gets the VM's default IP, which most of the time will be 172.16.0.1, so we end up in CloudStack with two NICs having the same IP. The 2nd NIC will most likely have either "NULL" or 172.16.0.1 with the wrong network subnet.
  2. At the same time, I noticed that VPC routers are sometimes affected in the same way: the NIC for the control plane, which is managed by ControlNetworkGuru, will sometimes lose its IP. I am assuming this is the same issue, but I am not certain, as I don't know how to reproduce this case.
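
To make side 1 concrete, here is a minimal Python sketch of the behaviour as I understand it. This is not CloudStack's code. The names `Nic`, `naive_fetch` and `guarded_fetch`, the MAC addresses, the /24 prefix lengths and the 10.10.0.0/24 shared subnet are all made up for illustration; only 172.16.0.1 comes from the example above. The "guarded" variant is just one possible direction for discussion, not a proposed patch.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Dict, List, Optional


@dataclass
class Nic:
    mac: str
    cidr: str                 # subnet of the network this NIC belongs to (assumed known)
    ip: Optional[str] = None  # IP currently recorded on the management side


def naive_fetch(nic: Nic, vm_default_ip: str) -> Optional[str]:
    """Failure mode: the VM-wide default IP is returned for whichever NIC is asked about."""
    return vm_default_ip


def guarded_fetch(nic: Nic, ips_by_mac: Dict[str, List[str]]) -> Optional[str]:
    """Hypothetical alternative: keep a manually assigned IP, and otherwise only
    accept an IP reported for this NIC's MAC that lies inside the NIC's subnet."""
    if nic.ip is not None:
        return nic.ip
    for candidate in ips_by_mac.get(nic.mac.lower(), []):
        if ip_address(candidate) in ip_network(nic.cidr):
            return candidate
    return None


# Hypothetical VM: NIC 1 in the VPC tier (DHCP), NIC 2 in the shared network (manual IP).
vpc_nic = Nic(mac="02:00:00:00:00:01", cidr="172.16.0.0/24", ip="172.16.0.1")
shared_nic = Nic(mac="02:00:00:00:00:02", cidr="10.10.0.0/24", ip="10.10.0.50")

# The naive lookup hands the VPC address to the shared-network NIC.
overwritten = naive_fetch(shared_nic, vm_default_ip="172.16.0.1")

# The guarded lookup leaves the manually assigned address alone.
kept = guarded_fetch(shared_nic, {"02:00:00:00:00:02": ["10.10.0.50"]})
```

With the naive lookup, both NICs end up recorded with 172.16.0.1; with the guarded one, the manually assigned address is kept, and a NIC with no recorded IP only receives an address that was reported for its own MAC and fits its subnet.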

I will follow up with a PR, but I believe some discussion is needed around the external DHCP IP fetcher as a whole, because it appears to conflict with manual IP assignment.

versions

ACS: 4.19.1.0
VMware vSphere 7.0.3

The steps to reproduce the bug

For the regular VM bug:

Create a VPC and a network with DHCP.
Create a shared guest network with no DHCP.

Create a VM and allocate one NIC in the VPC network and one NIC in the shared network.
Restart the VM and wait 5 minutes (or whatever interval is configured for the external DHCP IP fetcher). You will see that the NIC in the shared guest network now has the same IP as the DHCP NIC.
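
After the wait, the symptom can be checked by comparing each NIC's recorded IP against the CIDR of its network. Below is a small Python sketch of that check. The function name `find_suspect_nics`, the tuple layout, the NIC labels and the CIDRs are hypothetical, and the NIC data is typed in by hand here rather than read from CloudStack.

```python
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple

NicRow = Tuple[str, Optional[str], str]  # (label, recorded IP or None, network CIDR)


def find_suspect_nics(nics: List[NicRow]) -> List[Tuple[str, str]]:
    """Return (label, reason) for NICs whose recorded IP is missing
    or does not belong to the subnet of the NIC's network."""
    suspects = []
    for label, ip, cidr in nics:
        if ip is None:
            suspects.append((label, "no IP recorded"))
        elif ip_address(ip) not in ip_network(cidr):
            suspects.append((label, "IP outside network CIDR"))
    return suspects


# State after the reboot in the scenario above (CIDRs are assumed values).
after_reboot = [
    ("vpc-nic", "172.16.0.1", "172.16.0.0/24"),
    ("shared-nic", "172.16.0.1", "10.10.0.0/24"),
]
suspects = find_suspect_nics(after_reboot)
```

In this hand-written example only the shared-network NIC is flagged, because 172.16.0.1 is not inside its (assumed) 10.10.0.0/24 subnet. A NIC whose IP was reset to "NULL" would be flagged as having no IP recorded.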

Example of result:

Image

What to do about it?

No response
