rfc 3.0 architecture documentation #543
mcastelino merged 1 commit into clearcontainers:master from egernst:3.0-architecture
docs/architecture/architecture.md
| time initializing devices of no use for containers.
| - Skipping the guest BIOS/firmware and jumping straight to the Clear Containers kernel.
|
| #### Agent
Popular Images qa-passed 👍
docs/architecture/architecture.md
| multiplexes and demultiplexes those commands and streams for all container virtual machines.
| There is only one `cc-proxy` instance running per Clear Containers host.
|
| On the host, each container process is reaped by a Docker specific (`containerd-shim`) monitoring
This is pretty Docker-specific. We may want to make this more generic so it can cover non-Docker setups as well.
Yes, this needs to be made more generic. At the moment, we always have a reaper between us and the higher layers of the stack. It's either containerd-shim (Docker) or conmon (CRI-O). So we could mention that and basically replace containerd-shim with "container process reaper".
docs/architecture/architecture.md
| Although Clear Containers can run with any recent QEMU release, container boot time and memory
| footprint are significantly optimized by using a specific QEMU version called [`qemu-lite`](https://github.com/clearcontainers/qemu/tree/qemu-lite-v2.9.0).
|
| `qemu-lite` improvements come through a new `pc-lite` machine type, mostly by:
We are using the pc machine type for 3.0. For the release, perhaps it makes sense to remove all of the permutations and just describe what we actually have?
Yes. Describe what we use by default, and what we support (q35, pc-lite). We should also explain why we go with pc by default.
Took a pass at this @ https://github.com/clearcontainers/runtime/wiki/Clear-Containers-Architecture#hypervisor - not sure how much detail we should go into on features we aren't actually using, though.
docs/architecture/architecture.md
| `qemu-lite` improvements come through a new `pc-lite` machine type, mostly by:
| - Removing support for many legacy hardware devices so that the guest kernel does not waste
| time initializing devices of no use for containers.
| - Skipping the guest BIOS/firmware and jumping straight to the Clear Containers kernel.
Not really the case for the 3.0 release, right?
With 2.7, we're not skipping firmware with the pc machine type, while we are with pc-lite. Right now our packages install 2.7, and since we switched to pc, we're indeed no longer skipping the firmware.
With 2.9 we could skip the firmware with the pc machine type; we're tracking why we haven't switched to 2.9 yet: clearcontainers/packaging#28
docs/architecture/architecture.md
| virtio I/O serial one).
| 3. Run all the [OCI hooks](https://github.com/opencontainers/runtime-spec/blob/master/config.md#hooks) in the container namespaces,
| as described by the OCI container configuration file.
| 4. **fixme** [Set up the container networking](https://github.com/clearcontainers/runtime/blob/master/documentation/architecture.md#networking).
fixme is just for the link location, right @grahamwhaley ?
Looks like it. I believe we can change it to a 'local reference', so just #networking should resolve to the section in this document.
@sameo @jodh-intel @mcastelino - if you have a chance, please start taking a look. @jcvenegas, perhaps we can reference the proxy protocol in the proxy section? @iphutch -- early heads-up and FYI that this is (at an early stage) in the pipeline.
docs/architecture/architecture.md
| This is an architectural overview of Clear Containers, based on the 3.0 release.
|
| The [Clear Containers runtime (cc-runtime)](https://github.com/clearcontainers/runtime)
| complies with the [OCI](https://github.com/opencontainers) specifications and thus
I'd say "compatible" rather than "complies with".
docs/architecture/architecture.md
| The container process is then spawned by an [agent](https://github.com/clearcontainers/agent),
| running as a daemon on the guest operating system.
| Hyperstart opens 2 virtio serial interfaces (Control and I/O) on the guest, and QEMU exposes them
- This should now say "The agent", not Hyperstart.
- Small numbers should be spelt out in full, so can you change this to "two"?
- I'd say "in the guest" rather than "on the guest" to reinforce that we are talking about what is happening inside the VM.
docs/architecture/architecture.md
| `cc-runtime` creates a QEMU/KVM virtual machine for each container the Docker engine creates.
|
| The container process is then spawned by an [agent](https://github.com/clearcontainers/agent),
| running as a daemon on the guest operating system.
I'd add "inside the virtual machine" at the end of this sentence to be clearer here.
docs/architecture/architecture.md
| `stderr`, `stdin`) between the guest and the Docker Engine.
|
| For any given container, both the init process and all potentially executed commands within that
| container, together with their related I/O streams, need to go through 2 virtio serial interfaces
docs/architecture/architecture.md
| #### Hypervisor
|
| Clear Containers use [KVM](http://www.linux-kvm.org/page/Main_Page)/[QEMU](http://www.qemu-project.org/) to
Above, it was QEMU/KVM, so I'd stick with that ordering for consistency.
docs/architecture/architecture.md
| For more details about `cc-proxy`'s protocol, theory of operations or debugging tips, please read
| [`cc-proxy` README](https://github.com/clearcontainers/proxy).
|
| #### Shim
@amshinde - can you review this section please?
docs/architecture/architecture.md
| Clear Containers utilises the Linux kernel DAX (Direct Access filesystem)
| feature to efficiently map some host side files into the guest VM space.
| In particular, Clear Containers uses the `QEMU` nvdimm feature to provide a
| memory mapped virtual device that can be used to DAX map the mini-OS root
This seems to be the first mention of the mini-OS. I think it might be helpful to introduce it when we first start talking about the VM, pointing out that there is a mini-OS and the 9p-mapped Docker image (ubuntu, busybox, etc.).
docs/architecture/architecture.md
| file and device mapping mechanisms:
|
| - Mapping as a direct access device allows the guest to directly access
| the memory pages (such as via eXecute In Place (XIP)), bypassing the guest
s/the memory pages/the host memory pages/ ?
docs/architecture/architecture.md
| host to be demand loaded using page faults, rather than having to make requests
| via a virtualised device (causing expensive VM exits/hypercalls), thus providing
| a speed optimisation.
| - Utilising shmem MAP_SHARED on the host allows the host to efficiently
s/shmem MAP_SHARED/MAP_SHARED shared memory/ ?
docs/architecture/architecture.md
| Information on the use of nvdimm via QEMU is available in the QEMU source code
| [here](http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/nvdimm.txt;hb=HEAD)
|
| ### Previous releases
Maybe this should be called "Architectural changes by release" or something? We can then document here what changed between 2.1 and 3.0.
Good idea @jodh-intel. I made this update on the wiki: https://github.com/clearcontainers/runtime/wiki/Clear-Containers-Architecture#architectural-changes-by-release
docs/architecture/architecture.md
| [image]
|
| ** fixme - discuss different QEMUs - lite, q35, pc etc. **
I think for this release we can drop this 'fixme' and just mention qemu-lite.
We can also mention that we support several QEMU machine types.
docs/architecture/architecture.md
| wants to run within an already running container (`docker exec`).
|
| The container workload, i.e. the actual OCI bundle rootfs, is exported from the host to
| the virtual machine via a 9pfs virtio mount point. Hyperstart uses this mount point as the root
Here we should mention that we go for virtio-blk if we find a block-based graph driver.
docs/architecture/architecture.md
| by a set of namespaces (UTS, PID, mount and IPC). Although a pod can hold several containers,
| `cc-runtime` always runs a single container per pod. **fixme** incorrect<--
|
| **todo** add details on the agent protocol
Also, explain that the agent is based on libcontainer, the runc library, and reference it.
docs/architecture/architecture.md
| 1. `cc-runtime` connects to `cc-proxy` and sends it the `attach` command to let it know which pod
| we want to use to run the `exec` command.
| 2. `cc-runtime` sends the allocateIO command to the proxy, for getting the `agent` I/O sequence
| numbers for the `exec` command I/O streams.
This has also changed with 3.0. We connect to the proxy and get a token back. We then create a shim with the given token and the shim connects to the proxy.
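The 3.0 handshake described in this comment (the runtime gets a token from the proxy, hands it to the shim it spawns, and the shim presents it when connecting) can be modelled in memory as below. `tokenRegistry`, `Issue` and `Redeem` are hypothetical names chosen for this sketch; the real `cc-proxy` API, its sockets and its wire protocol are not shown.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// tokenRegistry models, in memory, the proxy-side bookkeeping that maps an
// opaque token to the container process a shim will represent.
// Hypothetical type for illustration; not the cc-proxy implementation.
type tokenRegistry struct {
	mu     sync.Mutex
	tokens map[string]string // token -> container ID
}

func newTokenRegistry() *tokenRegistry {
	return &tokenRegistry{tokens: make(map[string]string)}
}

// Issue is the runtime side: it returns a fresh random token bound to
// containerID. The runtime then passes the token to the shim it spawns.
func (r *tokenRegistry) Issue(containerID string) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	tok := hex.EncodeToString(buf)
	r.mu.Lock()
	defer r.mu.Unlock()
	r.tokens[tok] = containerID
	return tok, nil
}

// Redeem is the shim side: when connecting, it presents the token and
// learns which container process it is shadowing.
func (r *tokenRegistry) Redeem(tok string) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	id, ok := r.tokens[tok]
	if !ok {
		return "", errors.New("unknown token")
	}
	return id, nil
}

func main() {
	reg := newTokenRegistry()
	tok, err := reg.Issue("container-1") // runtime: request a token
	if err != nil {
		panic(err)
	}
	id, err := reg.Redeem(tok) // shim: connect with the token
	fmt.Println(id, err)
}
```

The design point the sketch tries to capture is that the shim never needs to know pod or sequence-number details up front; the opaque token is enough for the proxy to route its streams.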
docs/architecture/architecture.md
| agent instance running in the appropriate guest.
| 3. After deleting the last running pod, the `agent` will gracefully shut the virtual machine
| down.
| 4. `cc-runtime` sends the `BYE` command to `cc-proxy`, to let it know that a given virtual
docs/architecture/architecture.md
| - A UNIX, named socket for all `cc-runtime` instances on the host to send commands to `cc-proxy`.
| - One socket pair per `cc-shim` instance, to send stdin and receive stdout and stderr I/O streams. See the
| [cc-shim section](#shim)
That has changed as well and is now simplified. There is one single socket (UNIX or TCP) for all runtimes and all shims.
docs/architecture/architecture.md
| the `AllocateIO` command to `cc-proxy` to have it request the `agent` to allocate those sequence numbers.
| They will be passed as command line arguments to `cc-shim`, who will then use them to e.g. prepend its stdin
| stream packets with the right sequence number.
| - `Hyper`: This command is used by both `cc-runtime` and `cc-shim` to forward `agent` specific
This is well out of date as well for 3.0.
docs/architecture/architecture.md
| - `Hyper`: This command is used by both `cc-runtime` and `cc-shim` to forward `agent` specific
| commands.
|
| For more details about `cc-proxy`'s protocol, theory of operations or debugging tips, please read
docs/architecture/architecture.md
| The [Clear Containers runtime (cc-runtime)](https://github.com/clearcontainers/runtime)
| complies with the [OCI](https://github.com/opencontainers) specifications and thus
| works seamlessly with the [Docker Engine](https://www.docker.com/products/docker-engine)
| pluggable runtime architecture. In other words, one can transparently replace the
I would not say "replace". We can add another runtime. We cannot satisfy all possible types of containers, like privileged ones.
docs/architecture/architecture.md
| running as a daemon on the guest operating system.
| Hyperstart opens 2 virtio serial interfaces (Control and I/O) on the guest, and QEMU exposes them
| as serial devices on the host. `cc-runtime` uses the control device for sending container
| management commands to the agent while the I/O serial device is used to pass I/O streams (`stdout`,
It would be good to point to a section that contains the container management commands, or maybe to the source file where the command protocol is defined.
Added a URL to the godoc for the proxy API protocol, and added a link to the UML sequence diagram.
docs/architecture/architecture.md
| management commands to the agent while the I/O serial device is used to pass I/O streams (`stdout`,
| `stderr`, `stdin`) between the guest and the Docker Engine.
|
| For any given container, both the init process and all potentially executed commands within that
It would be good to document how exec is handled too.
docs/architecture/architecture.md
| The `agent` execution unit is the pod. An `agent` pod is a container sandbox defined
| by a set of namespaces (UTS, PID, mount and IPC). Although a pod can hold several containers,
| `cc-runtime` always runs a single container per pod. **fixme** incorrect<--
This is not accurate: "`cc-runtime` always runs a single container per pod."
docs/architecture/architecture.md
| Here we will describe how `cc-runtime` handles the most important OCI commands.
|
| ##### `create`
It would be nice to link to the source code file that implements create.
Moved this document to https://github.com/clearcontainers/runtime/wiki/Clear-Containers-Architecture to facilitate better collaboration at this point in the review process.
@iphutch - Can you start taking a look at this PR?
iphutch left a comment
Line length and indentation are the biggies here. It's a good-looking doc. See suggestions for other changes too.
docs/architecture/architecture.md
| is compatible with the [OCI](https://github.com/opencontainers) [runtime specification](https://github.com/opencontainers/runtime-spec)
| and thus works seamlessly with the
| [Docker Engine](https://www.docker.com/products/docker-engine) pluggable runtime
| architecture. It also supports the [Kubernetes Container Runtime Interface (CRI)](https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/apis/cri/v1alpha1/runtime) through the [CRI-O](https://github.com/kubernetes-incubator/cri-o) implementation. In other words, one can transparently select between the
- Line length
- Docker, Kubernetes, and CRI-O need to be followed by an * in their first instance in the doc.
- Let's avoid "one". Saying "you" is acceptable when referring to the reader/user: "In other words, you can transparently..."
docs/architecture/architecture.md
| [image]
| [image]
|
| `cc-runtime` creates a QEMU/KVM virtual machine for each container the Docker
docs/architecture/architecture.md
| The [Clear Containers runtime (cc-runtime)](https://github.com/clearcontainers/runtime)
| is compatible with the [OCI](https://github.com/opencontainers) [runtime specification](https://github.com/opencontainers/runtime-spec)
| and thus works seamlessly with the
docs/architecture/architecture.md
| to the container process on the guest and pass the container `stdout` and `stderr`
| streams back up the stack to CRI-O or Docker via the container process reaper.
| `cc-runtime` creates a `cc-shim` daemon for each container and for each OCI command
| received to run within an already running container (i.e. `docker exec`).
docs/architecture/architecture.md
| `cc-runtime` creates a `cc-shim` daemon for each container and for each OCI command
| received to run within an already running container (i.e. `docker exec`).
|
| The container workload, i.e. the actual OCI bundle rootfs, is exported from the
Avoid Latin abbreviations: "The container workload, that is, the actual OCI bundle rootfs, is..."
docs/architecture/architecture.md
| 2. Get CNI plugin information
|
| 3. Start the plugin (providing previously created netns) to add a network
| described into /etc/cni/net.d/ directory. At that time, the CNI plugin will
Use literal text for the directory name.
docs/architecture/architecture.md
| 5. Start VM inside the netns and start the container
|
| ## Storage
| Container workloads are shared with the virtualized environment through 9pfs.
docs/architecture/architecture.md
| ## DAX
|
| Clear Containers utilises the Linux kernel DAX (Direct Access filesystem)
Ditto, can we link to DAX info here? "Direct Access filesystem"
docs/architecture/architecture.md
| share pages.
|
| Clear Containers uses the following steps to set up the DAX mappings:
| - QEMU is configured with an nvdimm memory device, with a memory file
Use numbers instead of "-" here. Also, indentation.
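As an illustration of that first step, QEMU's `docs/nvdimm.txt` enables nvdimm on the machine type, reserves memory hotplug slots, and backs the device with a `memory-backend-file` object. The Go sketch below only assembles those arguments. The image path, sizes, object IDs and the `nvdimmArgs` helper are hypothetical, and the command line `cc-runtime` actually generates may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// nvdimmArgs builds QEMU arguments that expose imagePath to the guest as an
// nvdimm device, following the option layout in QEMU's docs/nvdimm.txt.
// Hypothetical helper; sizes use QEMU's suffix notation (e.g. "2G").
func nvdimmArgs(imagePath, imageSize, ramSize, maxMem string) []string {
	return []string{
		// nvdimm support has to be switched on for the machine type.
		"-machine", "pc,nvdimm",
		// nvdimm devices occupy memory hotplug slots, so slots and maxmem are needed.
		"-m", fmt.Sprintf("%s,slots=2,maxmem=%s", ramSize, maxMem),
		// share=on makes QEMU map the backing file MAP_SHARED on the host.
		"-object", fmt.Sprintf("memory-backend-file,id=mem0,share=on,mem-path=%s,size=%s", imagePath, imageSize),
		"-device", "nvdimm,id=nv0,memdev=mem0",
	}
}

func main() {
	args := nvdimmArgs("/path/to/rootfs.img", "256M", "2G", "3G")
	fmt.Println("qemu-system-x86_64 " + strings.Join(args, " "))
}
```

Note that `maxmem` must be at least the boot RAM plus the nvdimm size, which is why the example uses 3G for 2G of RAM and a 256M image.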
docs/architecture/architecture.md
| More information about DAX can be found in the Linux Kernel
| [documentation](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.txt)
|
| Information on the use of nvdimm via QEMU is available in the QEMU source code
Make "QEMU source code" the link and remove "here".
@iphutch -- Thanks! Will start to address these now...
docs/architecture/architecture.md
| - Connects to `cc-proxy` using a token obtained by calling the `cc-proxy` `ConnectShim` command. The token is passed from `cc-runtime` to `cc-shim` when the former spawns the latter and is used to identify the true container process that the shim process will be shadowing or representing.
| - Fragments and encapsulates the standard input stream from the container process reaper into `cc-proxy` stream frames:
| ```
| 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
@amshinde -- what's the intention for the above line?
Good question, I had asked Damien the same thing, with no answer on that.
This is a direct copy of the frame format from his documented proxy protocol :)
| - Moved from hyperstart to `cc-agent` as an agent inside the VM.
| - Moved from `qemu-lite` to `pc` QEMU machine type.
| - Rewrite of runtime in Go, leveraging virtcontainers.
I'd add a few items here:
- virtio-blk for block-based graph drivers
- New simplified protocol between the shim, proxy and runtime
- KSM throttling
Agreed, though if we bring up KSM throttling, we should probably describe its setup in the proxy section, right?
iphutch left a comment
You're a champ for getting the bulk of these indentation issues, a few more to go!
docs/architecture/architecture.md
| 3. Call the prestart hook (from inside the netns)
|
| 4. Scan network interfaces inside netns and get the name of the interface
| created by prestart hook
Indentation :( From here down: lines 525, 538, 543, 592, 595, 599, 651, and finally (fittingly) 666.
iphutch left a comment
Three things to fix and I'm happy to approve :)
docs/architecture/architecture.md
| [default Docker and CRI-O runtime (runc)](https://github.com/opencontainers/runc)
| and `cc-runtime`.
|
| [image]
This .png isn't included in the PR or has a different name. It will result in a 404 error.
Ouch - git blame says that was from me, but I don't recall adding this, nor can I find this image anywhere... removing now...
docs/architecture/architecture.md
| the [Clear Containers QEMU repo](https://github.com/clearcontainers/qemu/tree/qemu-lite-v2.9.0).
| This transition has been delayed until after the release of Clear Containers 3.0
| due to regressions, as described in [runtime issue 407]
| (https://github.com/clearcontainers/runtime/issues/407). Once support for
Breaking up the runtime issue 407 link breaks its markup. Don't worry about going over 78 chars when it's code or markup forcing you over:
due to regressions, as described in [runtime issue 407](https://github.com/clearcontainers/runtime/issues/407).
Once support for
I meant to fix all of those -- thanks! Done.
docs/architecture/architecture.md
| ## DAX
|
| Clear Containers utilises the Linux kernel DAX [(Direct Access filesystem)]
| (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.txt)
Build error. Ignore line length rules for markup OR bring the whole link to 575.
iphutch left a comment
Thanks for all the changes! Thumbs up from me.
docs/architecture/architecture.md
| ## Agent
|
| [`cc-agent`](https://github.com/clearcontainers/agent) is a daemon running in the
| guest as a supervisor for managing containers and processes potentially running
The word "potentially" is redundant, so I'd drop it.
Agreed - was just about to say the same thing.
You might like to add in some
grahamwhaley left a comment
Nothing serious, mostly stylistic etc.
docs/architecture/architecture.md
| [image]
|
| `cc-runtime` creates a QEMU\*/KVM virtual machine for each container the Docker
Would it be more accurate to say 'for each container or pod', rather than just 'container', since for k8s I believe we have one VM per pod?
docs/architecture/architecture.md
| - A control serial channel over which the `cc-agent` sends and receives specific
| commands for controlling and managing pods and containers. Detailed information
| about the commands can be found at [`cc-agent` API](https://github.com/clearcontainers/agent/tree/master/api).
| - An I/O serial channel for passing the container processes output streams (stdout,
Earlier in the doc we backtick std[in|out|err] (as in `stdin`); we should consider doing that consistently throughout the document.
docs/architecture/architecture.md
| 1. Create the networking container namespace on the host, according to the container
| OCI configuration file. We only support networking namespaces for now, but
| will support more of them later.
s/of them/namespaces/
or even s/more of them/other namespaces/
docs/architecture/architecture.md
| 2. Run all the [prestart OCI hooks](https://github.com/opencontainers/runtime-spec/blob/master/config.md#hooks)
| in the host namespaces created in step 1, as described by the OCI container
| configuration file.
| 3. [Set up the container networking namespace up](#networking). This is when
'Set up the container networking namespace up': Is that 'up' at the end a typo/redundant?
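For reference, the OCI runtime specification requires hooks to be called in the listed order and the container state to be passed to each hook on stdin. A minimal Go sketch of running prestart hooks that way follows. The `hook` and `state` types are local stand-ins that mirror a subset of the spec's fields (the official Go types live in the runtime-spec `specs-go` package), and this is not the `cc-runtime` or virtcontainers implementation.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// hook mirrors the fields of an OCI runtime-spec hook entry
// (path, args, env, timeout in seconds). Local stand-in type.
type hook struct {
	Path    string
	Args    []string // Args[0] is argv[0], as with execv
	Env     []string // nil inherits the caller's environment (os/exec behaviour)
	Timeout int      // seconds; 0 means no timeout
}

// state is a trimmed-down OCI container state, passed to hooks on stdin.
type state struct {
	OCIVersion string `json:"ociVersion"`
	ID         string `json:"id"`
	Status     string `json:"status"`
	Pid        int    `json:"pid,omitempty"`
	Bundle     string `json:"bundle"`
}

// runHooks runs each hook in order, writing the container state as JSON to
// its stdin, and stops at the first failure.
func runHooks(hooks []hook, st state) error {
	stateJSON, err := json.Marshal(st)
	if err != nil {
		return err
	}
	for _, h := range hooks {
		ctx := context.Background()
		var cancel context.CancelFunc = func() {}
		if h.Timeout > 0 {
			ctx, cancel = context.WithTimeout(ctx, time.Duration(h.Timeout)*time.Second)
		}
		cmd := exec.CommandContext(ctx, h.Path)
		if len(h.Args) > 0 {
			cmd.Args = h.Args
		}
		cmd.Env = h.Env
		cmd.Stdin = bytes.NewReader(stateJSON)
		out, err := cmd.CombinedOutput()
		cancel()
		if err != nil {
			return fmt.Errorf("hook %s failed: %v: %s", h.Path, err, out)
		}
	}
	return nil
}

func main() {
	hooks := []hook{{Path: "/bin/cat", Args: []string{"cat"}}}
	err := runHooks(hooks, state{OCIVersion: "1.0.0", ID: "c1", Status: "created", Bundle: "/tmp/bundle"})
	fmt.Println(err)
}
```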
docs/architecture/architecture.md
| `cc-proxy` and let it know that they stop monitoring their container process.
|
| For more details about `cc-proxy`'s protocol, theory of operations or debugging
| tips, please read
docs/architecture/architecture.md
| ## Shim
|
| A container process reaper, such as Docker's `containerd-shim` or crio's `conmon`,
|
| `cc-shim` has an implicit knowledge about which VM agent will handle those streams
| and signals and thus acts as an encapsulation layer between the container process
| reaper and the `cc-agent`. `cc-shim`:
Maybe put "cc-shim:" in a new paragraph, and expand it a little to something like "cc-shim performs the following steps:" or similar.
| - Fragments and encapsulates the standard input stream from the container process
| reaper into `cc-proxy` stream frames:
| ```
| 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
I should note - if anything, I think it would have been denoting byte lanes - but it is failing at that on many fronts right now. It might be nice to show it as byte lanes though (1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3).
Now that it is adjusted, it is showing the bit number; the 1's aren't redundant, they are the decimal MSB for each bit. Turn your head clockwise when you look at it :)
| Users can check to see if the container uses devicemapper block device as its
| rootfs by calling `mount(8)` within the container. If devicemapper block device
| is used, '/' will be mounted on `/dev/vda`.
We could note that devicemapper block device mode can be disabled in the runtime config file if necessary.
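The `mount(8)` check described above can also be done programmatically by reading `/proc/mounts` inside the container. The Go sketch below is a hypothetical helper, not part of `cc-runtime`. It assumes that when several entries exist for `/`, the last one is the effective root mount, since later mounts stack on top of earlier ones.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// rootDevice returns the device (first field) of the last mounts entry
// whose mount point (second field) is "/". Returns "" if there is none.
func rootDevice(mounts string) string {
	dev := ""
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[1] == "/" {
			dev = fields[0]
		}
	}
	return dev
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		panic(err)
	}
	dev := rootDevice(string(data))
	fmt.Println("rootfs device:", dev)
	if dev == "/dev/vda" {
		fmt.Println("rootfs is the devicemapper block device passed to the VM")
	}
}
```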
This is looking really good - a visible improvement on the already rather good CC2.x document - so, kudos to everybody involved in the update.
| This is an architectural overview of Clear Containers, based on the 3.0 release.
|
| The [Clear Containers runtime (cc-runtime)](https://github.com/clearcontainers/runtime)
| is compatible with the [OCI](https://github.com/opencontainers) [runtime specification](https://github.com/opencontainers/runtime-spec)
Hi @iphutch - does OCI need an asterisk here?
Yes, we will need an asterisk here. Good catch.
docs/architecture/architecture.md
| In the future, Clear Containers plan to move to a 2.9 based version of QEMU,
| available at
| the [Clear Containers QEMU repo](https://github.com/clearcontainers/qemu/tree/qemu-lite-v2.9.0).
| within those containers.
|
| The `cc-agent` execution unit is the pod. A `cc-agent` pod is a container sandbox
| defined by a set of namespaces (NS, UTS, IPC and PID). `cc-runtime` can run several
s/NS/mount, net, cgroup/ I think, @sameo?
docs/architecture/architecture.md
| the [Clear Containers QEMU repo](https://github.com/clearcontainers/qemu/tree/qemu-lite-v2.9.0).
| This transition has been delayed until after the release of Clear Containers 3.0
| due to regressions, as described in [runtime issue 407](https://github.com/clearcontainers/runtime/issues/407).
| Once support for features like hotplug is available in `Q35`, the project will
s/Q35/q35/ for consistency with all other mentions of machine types.
docs/architecture/architecture.md
| Most users will not need to modify the configuration file.
|
| The file is well commented and provides a few "knobs" that can modify the
| behaviour of the runtime.
I wrote this but, on re-reading, I think it would be better to say "that can be used to modify the behaviour of the runtime."
| 5. Start VM inside the netns and start the container
|
| ## Storage
| Container workloads are shared with the virtualized environment through [9pfs](https://www.kernel.org/doc/Documentation/filesystems/9p.txt).
s/through 9pfs/using the 9pfs filesystem/
docs/architecture/architecture.md
| Clear Containers utilises the Linux kernel DAX [(Direct Access filesystem)](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.txt)
| feature to efficiently map some host side files into the guest VM space.
| In particular, Clear Containers uses the `QEMU` nvdimm feature to provide a
| memory mapped virtual device that can be used to DAX map the virtual machine's
I generally try to avoid hyphenation but I'm on board with memory-mapped and host-side.
docs/architecture/architecture.md
| feature to efficiently map some host side files into the guest VM space.
| In particular, Clear Containers uses the `QEMU` nvdimm feature to provide a
| memory mapped virtual device that can be used to DAX map the virtual machine's
| root filesystem into the guest space.
s/guest space/guest memory address space/
docs/architecture/architecture.md
| host to be demand loaded using page faults, rather than having to make requests
| via a virtualised device (causing expensive VM exits/hypercalls), thus providing
| a speed optimisation.
| - Utilising MAP_SHARED shared memory on the host allows the host to efficiently
I think we (@iphutch) may have mentioned the high-level-overview.png is not referenced anywhere. To note, it was referenced from the proxy README.md in the cc2.x repo, and thus I believe has been unnecessarily carried over (and there is a copy over in the clearcontainers/proxy/docs dir, I think). Yes, it can be dropped from here.
docs/architecture/architecture.md
| The `delete` code path differs significantly between having to delete one container
| inside a pod (as is typical in Docker) and having to delete an entire pod (which
| is unique to Kubernetes). In the former case, `cc-runtime` will only send a
Rather than 'is unique to', I might say 'such as from'.
docs/architecture/architecture.md
| The `delete` code path differs significantly between having to delete one container
| inside a pod (as is typical in Docker) and having to delete an entire pod (which
| is unique to Kubernetes). In the former case, `cc-runtime` will only send a
| `SIGKILL` signal to the container process. In the latter case, the whole thing
s/whole thing/whole pod/ or something similar - 'all components' or something.
Further work to do:
- upload text used to create UMLs
- add UML for other key OCI commands
- add crio/conmon/k8s diagram for parity w/ docker
- describe KSM throttling feature in the proxy or its own section once the feature lands.

Contributions-by: Graham Whaley <graham.whaley@intel.com>
Contributions-by: James O. D. Hunt <james.o.hunt@intel.com>
Contributions-by: Samuel Ortiz <sameo@linux.intel.com>
Contributions-by: Archana Shinde <archana.m.shinde@intel.com>
Contributions-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
I recommend adding a Kubernetes overall architecture guide, including the shim/proxy/agent and how it works with crio.
Use filepath.Join() instead of simple string concatenation to form the file path; otherwise the "/" between the two parts is lost. Fixes clearcontainers#543. Signed-off-by: Fupan Li <lifupan@gmail.com>
This is very much work in progress, and shouldn't be merged in this state. I'm pushing this as a PR so folks can easily get their eyes on the doc/pictures and start to suggest edits. This is based on Graham's initial branch with some quick cleanup on my end. Note, the PNGs are exports from our (sorry, internal at the moment) google doc. Once we have settled on images we can export the odp and put it in the repo as well...
Of note, I'd like to see someone write the section on the agent so it is of equivalent or more detail compared to what we had for hyperstart in 2.1. Not sure, but perhaps @amshinde would be well suited for this? Thoughts?
There are a few fixmes as well, which I figure will be easier to discuss during this RFC PR's review process.
Once we have input from folks, I'd recommend doing a giant squash and adding sign-offs from the contributors.
Fixes #392