The ICN BP family intends to address deployment of workloads in a large number of edge locations and also in public clouds, using K8S as the resource orchestrator in each site and ONAP-K8S as the service-level orchestrator (across sites). ICN also intends to integrate infrastructure orchestration, which is needed to bring up a site using bare-metal servers. Infrastructure orchestration, which is the focus of this page, needs to ensure that the infrastructure software required on edge servers is installed on a per-site basis, but controlled from a central dashboard. Infrastructure orchestration is expected to do the following:
...
This document breaks down the hardware requirements, software ingredients, and testing and benchmarking for the R2 and R3 releases, and provides an overall picture of the blueprint's effect on edge use cases.
Goals
- Generic: Infrastructure orchestration shall be as generic as possible. Even though this work is being done on behalf of one BP (MICN), infrastructure orchestration shall be common across all BPs in the ICN family. It shall also be possible to use this component in other BPs outside of the ICN family.
- Leverage open source projects:
- Leverage Cluster-API for the infra-global-controller. Identify gaps, provide fixes, and also provide a UI/CLI for a good user experience.
- Leverage Ironic and metal3 for infra-local-controller to do bare-metal provisioning. Identify any gaps to make it work with Cluster-API.
- Leverage KuD in infra-local-controller to do Kubernetes installation. Identify any gaps and fix them.
- Figure out ways to use the bootstrap machine also as a workload machine (not in scope for Akraino-R2)
- Flexible and Extensible:
- Adding any new package in future shall be a simple addition.
- Interaction with workload orchestrator shall not be limited to K8S. Shall be able to talk to any workload orchestrator.
- Data Model driven:
- Follow Custom Resource Definition(CRD) models as much as possible.
- Security:
- The infra-global and infra-local controllers may have privileged access to secrets, keys, etc. These shall be protected by storing them in a hardware root of trust (HW RoT), or at least by ensuring that they are not visible in the clear on HDDs/SSDs.
- Redundancy: The infra-global-controller shall be redundant, especially if it is used to manage multiple sites.
- Performance:
- Shall be able to complete a first-time installation or patching across multiple servers in a site in under 10 minutes for a 10-server site. (This may require jobs to run in parallel, i.e. multi-threading in the infra-local-controller; see the sketch after this list.)
- Shall be able to complete patching across sites in under 10 minutes for 100 sites.
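The parallelism called out above can be illustrated with a small sketch: provisioning jobs fan out per server so a 10-server site completes within the target window. This is only an illustration of the multi-threading idea; `provision_server` and the host names are hypothetical placeholders, not infra-local-controller code.

```python
# Illustrative sketch: run per-server provisioning jobs in parallel so a
# 10-server site stays within the 10-minute target. provision_server() is
# a hypothetical placeholder for the real install/patch logic.
from concurrent.futures import ThreadPoolExecutor, as_completed

def provision_server(host: str) -> str:
    # Placeholder: push OS image, install binary packages, verify health.
    return f"{host}: provisioned"

servers = [f"compute-{i:02d}.site1.example" for i in range(1, 10)]  # 9 compute nodes

with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    futures = {pool.submit(provision_server, host): host for host in servers}
    for future in as_completed(futures):
        print(future.result())
```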
Architecture:
Blocks and Modules
All the green items are existing open source projects. If they require any enhancements, those are best done in the upstream community.
...
- infra-global-controller-K8S : This is the K8S cluster where infra-global-controller related containers are run.
- infra-local-controller-K8S: This is the K8S cluster where the infra-local-controller related containers are run.
- application-K8S : These are K8S clusters where application workloads are run.
Infra-local-controller:
"infra-local-controller" is expected to run in bootstrap machine of each location. Bootstrap is the one which installs the required software in compute nodes used for future workloads. Just an example, say a location has 10 servers. 1 server can be used as bootstrap machine and all other 9 servers can be used compute nodes for running workloads. Bootstrap machine is not only installs all required software in the compute nodes, but also is expected to patch and update compute nodes with newer patched versions of the software.
...
- Select a machine in the location for bootstrapping.
- Install Linux OS
- Install Kubernetes on this machine using kubeadm or any other tool of your choice
- Upload all binary packages and Linux OS images to be installed on the compute nodes used for applications
- Upload site-specific information: compute nodes, their roles, etc.
- Once Linux is installed, use kubectl towards the BPA (via a CR) to make the BPA install the binary packages (such as kubelet, docker, kubectl, and the Kubernetes API server for the application-K8S); a hypothetical sketch of this interaction follows this list
- Via kubectl to the BPA, get hold of the kubeconfig of the application-K8S
- Using this kubeconfig, via kubectl to the application-K8S, install the packages that can be installed via kubectl (such as Multus, OVN controllers, Virtlet, etc.)
- Make a USB bootable disk for administrators to use in real deployments.
- Make a VM image for administrators to use in real deployments.
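The kubectl interactions with the BPA described above can also be driven programmatically. The sketch below is a minimal illustration only: the CRD group, version, kind, and the kubeconfig Secret name are hypothetical placeholders, since the actual BPA CRs are defined later in this document.

```python
# Sketch: create a BPA CR on the infra-local-controller-K8S cluster and then
# read back the application-K8S kubeconfig. The group/version/kind and the
# Secret name below are hypothetical, not the real BPA API.
from kubernetes import client, config

config.load_kube_config()  # kubeconfig of the bootstrap (infra-local-controller) cluster
custom = client.CustomObjectsApi()

provisioning_request = {
    "apiVersion": "bpa.akraino.org/v1alpha1",   # hypothetical group/version
    "kind": "ProvisioningRequest",              # hypothetical kind
    "metadata": {"name": "site1-compute-nodes"},
    "spec": {
        "packages": ["kubelet", "kubectl", "docker", "kube-apiserver"],
        "hosts": ["compute-01", "compute-02"],
    },
}

custom.create_namespaced_custom_object(
    group="bpa.akraino.org", version="v1alpha1",
    namespace="default", plural="provisioningrequests",
    body=provisioning_request,
)

# Once the BPA reports completion, the application-K8S kubeconfig might be
# exposed as a Secret (an assumption, not part of the BPA specification).
secret = client.CoreV1Api().read_namespaced_secret("site1-app-kubeconfig", "default")
```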
Binary Provisioning Agent (BPA)
The BPA's job is to install all packages that cannot be installed via kubectl to the application-K8S. Hence, the BPA is normally used right after the compute nodes are installed with the Linux operating system and before installing the Kubernetes-based packages. The BPA is also implemented as a CRD controller of the infra-local-controller-K8S. We expect to have the following CRs:
...
- KuD installs basic packages via Kubespray, as well as packages that are not containerized; the BPA can inherit this code.
- KuD acts as a private Docker hub repository; the BPA can inherit this code.
- KuD builds the packages from source code; this needs to be done outside of the BPA, and the binary and container packages that result are expected to be part of the USB bootable disk.
- KuD brings up containerized packages; this needs to be handled as a script on top of the infra-local-controller.
- CSM (Certificate and Secret Management) can be used as-is. Integration with CSM can be targeted for Akraino-R2 rather than the interim release.
Infra-global-controller:
There could be multiple edges that need to be brought up. Having an administrator go to each location and use the infra-local-controller to bring up the application-K8S cluster on that location's compute nodes is not scalable. The "infra-global-controller" is expected to provide a centralized software provisioning and configuration system. It provides a single pane of glass for administering the edge locations with respect to infrastructure. Administration involves
...
- ISTIO and Envoy (for internal communication as well as for external communication)
- Store Citadel private keys using CSM.
- Store secrets using SMS of CSM.
Admin user experience:
Assuming that the infra-global-controller is brought up with all its microservices, the following steps are expected to be taken to provision sites/edges.
...
The following sections describe the components of the infra-global-controller.
Provisioning Controller:
It has the following functions:
...
- Site registration code can be borrowed from the ONAP K8S plugin service.
- A new CRD controller is expected to be created with the following CRs:
- Site registration related CRs.
- Compute inventory related CRs.
- Site install trigger related CRs.
- Expected to provide APIs
- For uploading binary packages
- For uploading containerized packages
- For uploading OS images
- Each binary package, OS image, or containerized package is supposed to have the right metadata for identification at a later time (a hypothetical client sketch follows this list).
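How a client might exercise such upload APIs is sketched below. The endpoint path, port, and metadata fields are assumptions made for illustration; the Provisioning Controller API is not yet defined.

```python
# Sketch: upload a binary package together with identifying metadata to the
# Provisioning Controller. URL, port, field names and file name are hypothetical.
import requests

PC_BASE_URL = "http://infra-global-controller.example:9015"  # hypothetical endpoint

metadata = {
    "name": "kubelet",
    "version": "1.15.3",        # illustrative version string
    "type": "binary",           # binary | container | os-image
    "os": "ubuntu-18.04",
}

with open("kubelet_1.15.3.deb", "rb") as package:
    response = requests.post(
        f"{PC_BASE_URL}/v1/packages",
        data=metadata,
        files={"package": package},
    )
response.raise_for_status()
```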
Binary Provisioning Manager (BPM)
It has the following functions:
...
Collection of the kubeconfig of the application-K8S: this functionality gets the kubeconfig of the application-K8S from the BPA. It is stored in a database table that is specific to the site (a minimal sketch of this flow follows).
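A minimal sketch of this flow is shown below. It assumes the BPA hands over the kubeconfig as a Kubernetes Secret and that the site database is a simple SQL table; both are assumptions for illustration, not decided design details.

```python
# Sketch: fetch the application-K8S kubeconfig from a site's BPA and store it
# in a per-site database table. Secret name, namespace, kubeconfig path and
# the table schema are hypothetical placeholders.
import base64
import sqlite3
from kubernetes import client, config

config.load_kube_config(config_file="/etc/icn/site1-bootstrap.kubeconfig")  # hypothetical path
secret = client.CoreV1Api().read_namespaced_secret("app-k8s-kubeconfig", "default")
kubeconfig = base64.b64decode(secret.data["kubeconfig"]).decode()

db = sqlite3.connect("sites.db")
db.execute("CREATE TABLE IF NOT EXISTS site_kubeconfig (site TEXT PRIMARY KEY, kubeconfig TEXT)")
db.execute("INSERT OR REPLACE INTO site_kubeconfig VALUES (?, ?)", ("site1", kubeconfig))
db.commit()
```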
K8S Provisioning Manager (KPM)
KPM is used to install containerized packages on the application-K8S. KPM looks at all the relevant Helm charts and instantiates them by talking to the application-K8S (see the sketch below).
...
Code can be borrowed from the ONAP Multi-Cloud K8S plugin service, which provides similar functionality.
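A minimal sketch of how KPM could instantiate charts against the application-K8S cluster is shown below, assuming the kubeconfig collected by BPM is available on disk. The chart paths and kubeconfig location are illustrative only; the Helm 2 style invocation (auto-generated release names) matches the Helm version listed in the software components table.

```python
# Sketch: point helm at the application-K8S kubeconfig and install a set of
# charts. Chart paths and the kubeconfig path are hypothetical.
import os
import subprocess

env = dict(os.environ, KUBECONFIG="/etc/icn/site1-app.kubeconfig")

charts = ["./charts/multus", "./charts/ovn", "./charts/virtlet"]
for chart in charts:
    # Helm 2 syntax: the release name is auto-generated when --name is omitted.
    subprocess.run(["helm", "install", chart], env=env, check=True)
```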
Design Details (WIP)
Note: the term ZTP (Zero Touch Provisioning) is used in the BP presentation. It represents both the infra-local-controller and the infra-global-controller.
infra-local-controller
As shown in the figure above, the infra-local-controller is itself a bootstrap K8s cluster that brings up the compute K8s cluster in the edge location. The infra-local-controller contains the BPA, Metal3, and the Bare Metal Operator (Ironic). This section explains them in detail.
Metal3 & Ironic:
This subsection is based on https://github.com/metal3-io/metal3-docs/blob/master/design/nodes-machines-and-hosts.md
The Bare Metal Operator provides hardware provisioning of compute nodes through the Kubernetes API. It defines a BareMetalHost CRD; each BareMetalHost object represents a physical server and carries its hardware inventory. Ironic is responsible for provisioning the physical servers, and the Bare Metal Operator is responsible for wrapping Ironic and representing the servers as CRD objects (a small sketch of querying these objects follows).
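Because every physical server surfaces as a BareMetalHost object, its provisioning state can be inspected through the ordinary Kubernetes API. A minimal sketch using the metal3.io/v1alpha1 API group is shown below; the namespace used is an assumption.

```python
# Sketch: list BareMetalHost CRs managed by the Bare Metal Operator and print
# their provisioning state. Assumes the metal3.io/v1alpha1 API group and that
# hosts live in a "metal3" namespace (the namespace is an assumption).
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

hosts = custom.list_namespaced_custom_object(
    group="metal3.io", version="v1alpha1",
    namespace="metal3", plural="baremetalhosts",
)

for host in hosts.get("items", []):
    name = host["metadata"]["name"]
    state = host.get("status", {}).get("provisioning", {}).get("state", "unknown")
    print(f"{name}: {state}")
```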
BPA (Define CRD, example CRs, RESTful API)
KuD Changes (Describe how KuD works today and what specific changes would be required)
Metal3 & Ironic
Sequence Diagrams involving all of above + CSM + Logging + Monitoring stuff
Infra-global-controller
PC (Define CRD, Restful API and the example CRs and example API requests)
BPM
KPM
Cluster-API
Global ZTP:
The Global ZTP system is used for infrastructure provisioning and configuration in the ICN family. It is subdivided into three deployments: Cluster-API, KuD, and ONAP on K8s.
Cluster-API & Baremetal Operator
One of the major challenges for a cloud admin managing multiple clusters in different edge locations is coordinating the control-plane configuration of each cluster remotely and managing patches and updates/upgrades across multiple machines. Cluster-API provides declarative APIs to represent clusters and the machines inside a cluster. It abstracts the common logic found across cluster providers such as GKE, AWS, and vSphere, and consolidates those functions, such as grouping machines for upgrades and autoscaling mechanisms.
...
The Cluster-API provider, together with the Bare Metal Operator, is used to provision the physical servers and initiate the Kubernetes cluster with the user configuration (a small sketch of the declarative flow follows).
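A minimal sketch of this declarative flow is shown below. It assumes the v1alpha1 Cluster-API group (cluster.k8s.io) that cluster-api-provider-baremetal targeted at the time; the machine name and Kubernetes versions are illustrative, and the provider-specific spec is omitted.

```python
# Sketch: declare a Machine object for the baremetal machine controller to
# reconcile onto a BareMetalHost. Group/version follow Cluster-API v1alpha1;
# names and version numbers are illustrative only.
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

machine = {
    "apiVersion": "cluster.k8s.io/v1alpha1",
    "kind": "Machine",
    "metadata": {"name": "edge-site1-controlplane-0", "labels": {"set": "controlplane"}},
    "spec": {
        "versions": {"kubelet": "1.15.3", "controlPlane": "1.15.3"},
        # providerSpec for cluster-api-provider-baremetal is omitted; it carries
        # the OS image and host-selector details and is provider-specific.
    },
}

custom.create_namespaced_custom_object(
    group="cluster.k8s.io", version="v1alpha1",
    namespace="default", plural="machines",
    body=machine,
)
```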
KuD
Kubernetes Deployer (KuD) in ONAP can be reused to deploy the K8s app components (as shown in Fig. II), the NFV-specific components, and the NFVi SDN controller in the edge cluster. In the R2 release, KuD will be used to deploy K8s addons such as Prometheus, Rook, Virtlet, OVN, NFD, and the Intel device plugins in the edge location (as shown in Figure I). In the R3 release, KuD will evolve into an "ICN Operator" that installs all K8s addons.
ONAP on K8s
One of the highly available Kubernetes clusters provisioned and configured by Cluster-API will be used to deploy ONAP on K8s. The ICN family uses the ONAP Operations Manager (OOM) to deploy the ONAP installation. OOM provides a set of Helm charts used to install ONAP on a K8s cluster. The ICN family will create the OOM installation and automate the ONAP installation once a Kubernetes cluster is configured by Cluster-API.
ONAP Block and Modules:
ONAP will be the Service Orchestration Engine in the ICN family and is responsible for VNF lifecycle management, tenant management, tenant resource quota allocation, and managing the Resource Orchestration Engine (ROE) to schedule VNF workloads with multi-site scheduler awareness and Hardware Platform Abstraction (HPA). An Akraino dashboard that sits on top of ONAP is required to deploy the VNFs.
Kubernetes Block and Modules:
Kubernetes will be the Resource Orchestration Engine in the ICN family, managing network, storage, and compute resources for the VNF applications. The ICN family will use multiple container runtimes such as Virtlet, Kata Containers, KubeVirt, and gVisor. Each release supports different container runtimes, depending on the use cases in focus.
...
SDN controller components: this block is responsible for managing the SDN controller and for providing additional features such as Service Function Chaining (SFC) and a network route manager.
Apps/ Use cases:
- SDWAN usecase
- Distributed Analytics as a Service
- EdgeXFoundry use case
- VR 360 streaming
ICN Infrastructure layout
Flows & Sequence Diagrams
- Use the clusterctl command to create the cluster for the cluster-api-provider-baremetal provider. For this step, KuD is required to provide a cluster and run the machine controller and cluster controller.
- The user's Machine CRs and Cluster CRs are configured to instantiate 4 clusters as #0, #1, #2, #3 (a rough sketch of this step and the two that follow appears after this list).
- The automation script for OOM deployment is triggered to deploy ONAP on cluster #0.
- The KuD addons script is triggered in all edge locations to deploy the K8s app components, NFV-specific components, and the NFVi SDN controller.
- The subscriber or operator requests deployment of a VNF workload, such as SDWAN, via service orchestration.
- ONAP places the workload in the edge location based on multi-site scheduling and K8s HPA.
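A rough sketch of the middle of this flow is shown below: it checks that the four Cluster CRs exist and then triggers the OOM and KuD addon automation. The cluster names, script paths, and the cluster.k8s.io/v1alpha1 group are assumptions for illustration; the clusterctl step is left out because its flags differ between Cluster-API releases.

```python
# Sketch: after clusterctl has created the clusters, verify the four Cluster
# CRs (#0-#3) and then trigger the OOM and KuD addon scripts. Cluster names
# and script paths are hypothetical.
import subprocess
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

clusters = custom.list_namespaced_custom_object(
    group="cluster.k8s.io", version="v1alpha1",   # assumed Cluster-API v1alpha1 group
    namespace="default", plural="clusters",
)
names = {c["metadata"]["name"] for c in clusters.get("items", [])}
assert {"cluster-0", "cluster-1", "cluster-2", "cluster-3"} <= names

# Cluster #0 hosts ONAP (deployed via the OOM automation script); clusters
# #1-#3 are edge clusters that receive the KuD addons.
subprocess.run(["./deploy_oom.sh", "cluster-0"], check=True)
for edge in ("cluster-1", "cluster-2", "cluster-3"):
    subprocess.run(["./kud_addons.sh", edge], check=True)
```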
Installation demonstration
Video: ICN EMCO vFW Demo.webm
Software components
Components | Link | Akraino Release target |
---|---|---|
Cluster-API | | R2 |
Cluster-API-Provider-bare metal | | R2 |
Provision stack - Metal3 | | R2 |
Host Operating system | Ubuntu 18.04 | R2 |
QuickAssist Technology (QAT) drivers | Intel® C627 Chipset - https://ark.intel.com/content/www/us/en/ark/products/97343/intel-c627-chipset.html | R2 |
NIC drivers | | R2 |
ONAP | Latest release 3.0.1-ONAP - https://github.com/onap/integration/ | R2 |
Workloads | | R3 |
KUD | | R2 |
Kubespray | | R2 |
K8s | | R2 |
Docker | https://github.com/docker - 18.09 | R2 |
Virtlet | | R2 |
SDN - OVN | | R2 |
OpenvSwitch | https://github.com/openvswitch/ovs - 2.10.1 | R2 |
Ansible | https://github.com/ansible/ansible - 2.7.10 | R2 |
Helm | https://github.com/helm/helm - 2.9.1 | R2 |
Istio | https://github.com/istio/istio - 1.0.3 | R2 |
Kata container | | R3 |
Kubevirt | https://github.com/kubevirt/kubevirt/ - v0.18.0 | R3 |
Collectd | | R2 |
Rook/Ceph | | R2 |
MetalLB | | R3 |
Kube - Prometheus | | R2 |
OpenNESS | Will be updated soon | R3 |
Multi-tenancy | | R2 |
Knative | | R3 |
Device Plugins | https://github.com/intel/intel-device-plugins-for-kubernetes - QAT, SRIOV | R2 |
Device Plugins | https://github.com/intel/intel-device-plugins-for-kubernetes - FPGA, GPU | R3 |
Node Feature Discovery | | R2 |
CNI | https://github.com/coreos/flannel/ - release tag v0.11.0; https://github.com/containernetworking/cni - release tag v0.7.0; https://github.com/containernetworking/plugins - release tag v0.8.1; https://github.com/containernetworking/cni#3rd-party-plugins - Multus v3.3tp, SRIOV CNI v2.0 (with SRIOV Network Device plugin) | R2 |
Conformance Test for K8s | | R2 |
Gaps(WIP)
Release | Block | Components | Identified Gaps | Initial thought |
---|---|---|---|---|
R2 | ZTP | Cluster-API | Cluster upgrade is yet to be supported | The definition of "cluster upgrade" and the expected behaviour should be documented here. For example, a cluster upgrade could be a kubelet version upgrade. |
| | | No node repair mechanism | Node logs such as kubelet logs should be enabled in the automation script |
| | | No multi-master support | Required to confirm with engineers |
| KuD | Virtlet, Multus, NFD & Istio | Installation scripts are in Ansible and static; they are required to be in a DaemonSet | |
| | Virtlet & Intel Device Plugin | Have to check Virtlet support with the device plugin framework | |
| ONAP | OOM automation | Portal chart is deployed with a load balancer with a floating IP address | |
| Dashboard | | Monitoring tool to check deployments across the multiple sites and show the metrics/statistics details to the operator | |
R3 | APP use cases | SDWAN | OpenWRT is a potential candidate to configure the SDWAN use case; more information on it is required | |
Roadmap
August Intermediate release
Timeline | Release | required state of implementation | Expected Result |
---|---|---|---|
Aug 2nd | ICN-v0.1.0 | | |
Aug 9th | ICN-v0.1.1 | | |
Aug 16th | ICN-v0.2.0 | | |
...
Components | required state of implementation | Expected Result |
---|---|---|
ZTP | | All-in-one ZTP script with Cluster-API and Bare Metal Operator |
ONAP | | Should be integrated with the above script |
KuD addons | | DaemonSet YAML should be integrated with the above script |
Tenant Manager | | Should be deployed as part of KuD addons |
Dashboard | | Dashboard runs as a deployment in the ONAP cluster |
App | | Instantiate 3 workloads from ONAP to show the SFC functionality in the Dashboard |
CI | | End-to-End testing script |
Akraino R3 release
Components | required state of implementation | Expected Result |
---|---|---|
ZTP | | All-in-one ZTP script with Cluster-API and Bare Metal Operator |
ONAP | | Should be integrated with the above script |
KuD addons | | DaemonSet YAML should be integrated with the above script |
Dashboard | | Dashboard runs as a deployment in the ONAP cluster |
App | | Instantiate 3 workloads from ONAP to show the SFC functionality in the Dashboard |
CI | | End-to-End testing script |
Future releases
Yet to be discussed.