OCPBUGS-8277: Use a different internal IP for apiserver connectivity #1478
openshift-merge-robot merged 2 commits into openshift:main
|
@pacevedom: This pull request references Jira Issue OCPBUGS-8277, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/jira refresh |
|
@pacevedom: An error was encountered updating to the POST state for bug OCPBUGS-8277 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details. Full error message.
Error marking step #27447694 finished: root cause: Tried to update an entity that does not exist.: request failed. Please analyze the request body for more details. Status code: 400:
Please contact an administrator to resolve this issue, then request a bug refresh.
|
@pacevedom: This pull request references Jira Issue OCPBUGS-8277, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact:
pkg/config/config.go
Outdated
I'm afraid I don't understand what this section is doing now. Can you give an example, maybe using the default service network settings?
It takes the subnet immediately following the service CIDR and uses an IP from it.
Since we need an IP outside the service CIDR to set up as the apiserver endpoint, this is the simplest approach I could think of.
This is also why both parameters (the actual address and the lo interface getting it) are now configurable, since this is an additional IP and there might be cases where we need a different one because of collisions.
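To make the "next subnet" idea concrete, here is a minimal Go sketch. It is not the code from this PR: `nextSubnetFirstIP` is a hypothetical helper, it handles IPv4 only, and `10.43.0.0/16` is assumed to be the default service network. Whether the PR picks the network address or the first address after it is not shown in this thread, so the sketch returns both the subnet and its first address.

```go
package main

import (
	"fmt"
	"net/netip"
)

// nextSubnetFirstIP is a hypothetical helper (not the PR's implementation).
// It returns the subnet of the same size that immediately follows
// serviceCIDR, and the first address after that subnet's network address.
// IPv4 only, to keep the sketch short.
func nextSubnetFirstIP(serviceCIDR string) (subnet, firstIP string, err error) {
	p, err := netip.ParsePrefix(serviceCIDR)
	if err != nil {
		return "", "", err
	}
	p = p.Masked()
	if !p.Addr().Is4() {
		return "", "", fmt.Errorf("only IPv4 is handled in this sketch: %s", serviceCIDR)
	}

	// Convert the network address to an integer and add the subnet size.
	b := p.Addr().As4()
	base := uint64(b[0])<<24 | uint64(b[1])<<16 | uint64(b[2])<<8 | uint64(b[3])
	next := base + uint64(1)<<(32-p.Bits())
	if next > 0xFFFFFFFF {
		return "", "", fmt.Errorf("no subnet follows %s", p)
	}

	addr := netip.AddrFrom4([4]byte{byte(next >> 24), byte(next >> 16), byte(next >> 8), byte(next)})
	nextNet := netip.PrefixFrom(addr, p.Bits())

	// For very small subnets (for example /32) there is no "next" address
	// inside the subnet, so fall back to the network address itself.
	first := addr.Next()
	if !first.IsValid() || !nextNet.Contains(first) {
		first = addr
	}
	return nextNet.String(), first.String(), nil
}

func main() {
	subnet, ip, err := nextSubnetFirstIP("10.43.0.0/16")
	if err != nil {
		panic(err)
	}
	fmt.Println(subnet, ip) // prints "10.44.0.0/16 10.44.0.1"
}
```

With the assumed default service network, the candidate address lands in `10.44.0.0/16`, which is why an admin may need to override it if that range is already in use.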
Could you expand the comment in the code to explain all of that? The text that is there now might have been clear to someone who understood how we were already choosing an IP, but was not enough for me to understand what was going on.
Ideally, we don't want to assign a k8s service IP address to any host interface, because that could cause unexpected problems such as the issue described here: #1478 (comment)
However, it seems that we have to in order to fix the certificate issue: https://issues.redhat.com/browse/OCPBUGS-7442
Given the above, this PR uses an IP from the next subnet after the service CIDR and assigns it to the lo device.
|
/cc @zshi-redhat |
pkg/config/config.go
Outdated
Why do we want to make the user make this choice? Is this something we can figure out on our own?
Not really, this is a bit complex. You have all these possibilities:
- Custom AdvertiseAddress + SkipInterfaceConfiguration=true: The apiserver is either on a different node or is using an already configured interface.
- Custom AdvertiseAddress + SkipInterfaceConfiguration=false: This deliberately ignores the default next_subnet_after_service_cidr IP, because there might be collisions with that range or because a different subnet is simply required. It does configure the lo interface with the IP.
- No AdvertiseAddress + SkipInterfaceConfiguration=true: Same as the custom advertise address case; there might be an interface already configured with the first IP from next_subnet_after_service_cidr.
- No AdvertiseAddress + SkipInterfaceConfiguration=false: Default everything. It configures the first valid IP from next_subnet_after_service_cidr on the lo interface. This will be the common case.
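The four combinations above reduce to two independent decisions, which the following Go sketch spells out. The option names `AdvertiseAddress` and `SkipInterfaceConfiguration` come from this thread; the struct, the `resolve` function, and the sample addresses are assumptions made for illustration and are not taken from the PR.

```go
package main

import "fmt"

// apiServerOptions mirrors the two settings discussed above. The struct
// layout is an assumption for illustration only.
type apiServerOptions struct {
	AdvertiseAddress           string // empty means "not set by the user"
	SkipInterfaceConfiguration bool
}

// resolve returns the address to advertise and whether the lo interface
// should be configured with it. defaultAddr stands in for the address
// derived from next_subnet_after_service_cidr.
func resolve(o apiServerOptions, defaultAddr string) (addr string, configureLo bool) {
	addr = o.AdvertiseAddress
	if addr == "" {
		addr = defaultAddr
	}
	return addr, !o.SkipInterfaceConfiguration
}

func main() {
	const defaultAddr = "10.44.0.1" // placeholder value, assumed
	cases := []apiServerOptions{
		{AdvertiseAddress: "192.0.2.10", SkipInterfaceConfiguration: true},
		{AdvertiseAddress: "192.0.2.10", SkipInterfaceConfiguration: false},
		{AdvertiseAddress: "", SkipInterfaceConfiguration: true},
		{AdvertiseAddress: "", SkipInterfaceConfiguration: false},
	}
	for _, c := range cases {
		addr, lo := resolve(c, defaultAddr)
		fmt.Printf("custom=%t skip=%t -> advertise %s, configure lo: %t\n",
			c.AdvertiseAddress != "", c.SkipInterfaceConfiguration, addr, lo)
	}
}
```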
Should we make SkipInterfaceConfiguration an implicit (devel-only) option and assume it is false for single-node deployments? It seems SkipInterfaceConfiguration=true is only a multi-node consideration.
Multi-node does require this option. However, you could also have an IP range that is already usable on the node, in which case there is no need to configure it on the lo interface. I wonder if this is something that could happen.
Also, it defaults to false: https://github.com/openshift/microshift/pull/1478/files#diff-a3d824da3c42420cd5cbb0a4a2c0e7b5bfddd819652788a0596d195dc6e31fa5R251
I'm still unclear on why we need the boolean at all. If the user provides a custom address, we can say that they must pre-configure the interface to use with that address, even if they just use the loopback interface. If they do not provide an address, then we will configure the loopback interface with an address we choose.
So are we ok to say that if not using default then the configuration side of the interface lies on the admin? Having in mind that custom here means "anything different than service-CIDR-next-subnet". If that is ok then I am totally fine to remove the exposed option!
I removed the option as it's simpler this way. Thanks!
> So are we ok to say that if not using default then the configuration side of the interface lies on the admin? Having in mind that custom here means "anything different than service-CIDR-next-subnet". If that is ok then I am totally fine to remove the exposed option!
Yes, I think that's consistent with how we separate responsibilities for the OS setup in other cases.
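The rule agreed on in this thread (a user-provided address means the admin pre-configures the interface; no address means the loopback interface is configured with a chosen address) can be sketched in a few lines of Go. The function name and the sample addresses are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// advertisePlan is a hypothetical helper. userAddr is the address from the
// user's configuration (empty when unset); defaultAddr is the address
// chosen automatically from the subnet after the service CIDR.
// It returns the address to advertise and whether lo must be configured.
func advertisePlan(userAddr, defaultAddr string) (addr string, configureLo bool) {
	if userAddr != "" {
		// Custom address: interface setup is the admin's responsibility.
		return userAddr, false
	}
	// No address provided: use the default and configure lo with it.
	return defaultAddr, true
}

func main() {
	addr, lo := advertisePlan("", "10.44.0.1") // placeholder default, assumed
	fmt.Println(addr, lo)
	addr, lo = advertisePlan("192.0.2.10", "10.44.0.1")
	fmt.Println(addr, lo)
}
```

Deriving the interface decision from whether an address was provided is what lets the separate boolean option go away.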
Due to the support for IPs in certificates introduced in openshift#1298, the apiserver IP is configured as a secondary address on the lo interface. ovnk configures a VIP that redirects 10.43.0.1:443 to 10.43.0.1:6443, where 6443 is the port the apiserver listens on in the host. 10.43.0.1:443 is used by all pods using client-go, as it is computed from the env vars available in any pod. If a host network pod or any other tool in the host tries to reach the apiserver at 10.43.0.1:443, the address is not translated to the endpoint; the connection goes to 10.43.0.1:443 itself, which is not the apiserver but the router. This change computes a new IP endpoint in the next available /32 subnet from the service IP to ensure ovnk does not interfere.
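The property this change relies on is that the new endpoint IP lies outside the service network, so the service VIP handling does not apply to it. A minimal Go check of that property might look like the sketch below; `outsideServiceCIDR` is a hypothetical helper, and `10.43.0.0/16` is assumed to be the service network that contains 10.43.0.1.

```go
package main

import (
	"fmt"
	"net/netip"
)

// outsideServiceCIDR reports whether addr lies outside serviceCIDR.
// Hypothetical helper for illustration; not part of the PR.
func outsideServiceCIDR(serviceCIDR, addr string) (bool, error) {
	p, err := netip.ParsePrefix(serviceCIDR)
	if err != nil {
		return false, err
	}
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}
	return !p.Masked().Contains(a), nil
}

func main() {
	for _, ip := range []string{"10.43.0.1", "10.44.0.0"} {
		ok, err := outsideServiceCIDR("10.43.0.0/16", ip)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s outside service CIDR: %t\n", ip, ok)
	}
}
```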
|
/cc @pliurh PTAL. Why does this matter to ovnk or k8s networking?
|
|
@pacevedom maybe we could take #1442 before this one? It should be easy to add a method to report when the user has provided a value for the IP and update the interface management logic based on that. |
|
@pacevedom: This pull request references Jira Issue OCPBUGS-8277, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact:
pkg/config/config.go
Outdated
This implies that one extra IP address needs to be reserved for microshift. Shall we document it?
Yes, that should be documented.
Added docs commit.
|
/retest-required |
|
/lgtm I will rebase the config refactoring on top of this. |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dhellmann, pacevedom |
|
@pacevedom: all tests passed! |
|
@pacevedom: Jira Issue OCPBUGS-8277: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-8277 has been moved to the MODIFIED state. |
|
/cherry-pick release-4.13 |
|
@pacevedom: new pull request created: #1501 |
Which issue(s) this PR addresses:
Closes #1460