Goal

The Sdewan config agent is the controller for the Sdewan CRDs. With the config agent, we are able to deploy CNFs. This page uses the following terms, defined here.

...

To deploy a CNF, the user creates one CNF deployment and some Sdewan rule CRs. A Kubernetes namespace can hold more than one CNF deployment and many Sdewan rule CRs. We use labels to correlate one CNF with its Sdewan rule CRs. The Sdewan controller watches Sdewan rule CRs and applies them to the correlated CNF by calling the CNF REST API.
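
The label correlation can be pictured with a small sketch. It uses plain dictionaries in place of real Kubernetes objects. The sdewanPurpose label key is taken from the CNF deployment sample on this page; the function name, the sample data and the choice to match only within one namespace are assumptions made for illustration.

```python
from collections import defaultdict

PURPOSE_LABEL = "sdewanPurpose"  # label key used by the CNF deployment sample on this page


def correlate(deployments, rule_crs):
    """Map each CNF deployment name to the names of the rule CRs sharing its label value.

    Objects are plain dicts shaped like Kubernetes metadata. Matching is done
    per namespace (an assumption), since one namespace may hold several CNFs.
    """
    rules_by_key = defaultdict(list)
    for cr in rule_crs:
        meta = cr["metadata"]
        purpose = meta.get("labels", {}).get(PURPOSE_LABEL)
        if purpose is not None:
            rules_by_key[(meta["namespace"], purpose)].append(meta["name"])

    result = {}
    for dep in deployments:
        meta = dep["metadata"]
        purpose = meta.get("labels", {}).get(PURPOSE_LABEL)
        result[meta["name"]] = sorted(rules_by_key.get((meta["namespace"], purpose), []))
    return result


deployments = [
    {"metadata": {"name": "cnf-1", "namespace": "default", "labels": {PURPOSE_LABEL: "cnf-1"}}},
    {"metadata": {"name": "cnf-2", "namespace": "default", "labels": {PURPOSE_LABEL: "cnf-2"}}},
]
rule_crs = [
    {"metadata": {"name": "rule-a", "namespace": "default", "labels": {PURPOSE_LABEL: "cnf-1"}}},
    {"metadata": {"name": "rule-b", "namespace": "default", "labels": {PURPOSE_LABEL: "cnf-1"}}},
    {"metadata": {"name": "rule-c", "namespace": "default", "labels": {PURPOSE_LABEL: "cnf-2"}}},
]
print(correlate(deployments, rule_crs))
```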

Sdewan Design Principle

There can be multiple tenants/namespaces in a Kubernetes cluster. A user may deploy multiple CNFs in one or more tenants.
A CNF deployment can have more than one replica for active/backup purposes. We should apply rules to all the pods under the CNF deployment. (This release doesn't implement VRRP between pods.)
The CNF deployment and the Sdewan rule CRs can be created/updated/deleted in any order.
The Sdewan controller and the CNF process could crash or restart at any time. We need to handle these scenarios.
Each Sdewan rule CR has labels to identify the type it belongs to. 3 types are available at this time: basic, app-intent and k8s-service. We extend k8s user role permissions so that we can set user permissions at the type level of Sdewan rule CRs.
Sdewan rule CR dependencies are checked on create/update/delete. For example, if we create a mwan3_rule CR that uses policy policy-x but no mwan3_policy CR named policy-x exists, we block the request.
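
The dependency check in the last principle could look roughly like the sketch below. It is only an illustration: the dictionary fields (kind, name, policy), the operation strings and the function name are hypothetical, and blocking the deletion of a policy that is still in use is an assumed reading of "checked on delete".

```python
def check_mwan3_dependency(operation, cr, existing_policies, existing_rules):
    """Return (allowed, reason) for a create/update/delete request.

    existing_policies: set of mwan3_policy CR names in the namespace
    existing_rules: dict mapping mwan3_rule CR name -> name of the policy it uses
    """
    kind = cr["kind"]
    name = cr["name"]
    if kind == "Mwan3Rule" and operation in ("CREATE", "UPDATE"):
        policy = cr["policy"]
        if policy not in existing_policies:
            return False, f"mwan3_policy {policy!r} does not exist"
    if kind == "Mwan3Policy" and operation == "DELETE":
        # Assumption: a policy that rules still refer to must not be removed.
        users = sorted(r for r, p in existing_rules.items() if p == name)
        if users:
            return False, f"mwan3_policy {name!r} is still used by {users}"
    return True, ""


rule = {"kind": "Mwan3Rule", "name": "rule-1", "policy": "policy-x"}
print(check_mwan3_dependency("CREATE", rule, existing_policies=set(), existing_rules={}))
print(check_mwan3_dependency("CREATE", rule, existing_policies={"policy-x"}, existing_rules={}))
```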

CNF Deployment

In this section we describe what the CNF deployment should look like, as well as the pods under the deployment.

...

Code Block

language	yml
title	CNF pod

apiVersion: extensions/v1beta1
kind: Deployment
metadata: 
  name: cnf-1
  namespace: default
  labels:
    sdewanPurpose: cnf-1
spec:
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        k8s.plugin.opnfv.org/nfn-network: |-
          { "type": "ovn4nfv", "interface": [
            {
              "defaultGateway": "false",
              "interface": "net0",
              "name": "ovn-priv-net"
            },
            {
              "defaultGateway": "false",
              "interface": "net1",
              "name": "ovn-provider-net1"
            },
            {
              "defaultGateway": "false",
              "interface": "net2",
              "name": "ovn-provider-net2"
            }
          ]}
        k8s.v1.cni.cncf.io/networks: '[{ "name": "ovn-networkobj"}]'
    spec:
      containers:
      - command:
        - /bin/sh
        - /tmp/sdewan/entrypoint.sh
        image: integratedcloudnative/openwrt:dev
        name: sdewan
        readinessProbe:
          failureThreshold: 5
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          privileged: true
          procMount: Default
        volumeMounts:
        - mountPath: /tmp/sdewan
          name: example-sdewan
          readOnly: true
      nodeSelector:
        kubernetes.io/hostname: ubuntu18

Sdewan rule CRs

A CRD defines all the properties of a resource, but it is not human friendly. So we paste Sdewan rule CR samples here instead of the CRDs.

...

CR samples of IPSec type(ruoyu):

Sdewan rule CRD Reconcile Logic

We have many kinds of CRDs, and they share almost the same reconcile logic, so we only describe the Mwan3Rule logic.

...

Code Block

language	py

# Pseudocode for Mwan3RuleReconciler.Reconcile(req ctrl.Request), written in Python syntax.
# k8sClient and openwrt_client stand for the Kubernetes API client and the CNF REST client.
def reconcile(req):
  rule_cr = k8sClient.get(req.NamespacedName)
  cnf_deployment = k8sClient.get_deployment_with_label(rule_cr.labels["sdewanPurpose"])
  if rule_cr.DeletionTimestamp is not None:
    # The CR is being deleted; its finalizer keeps it in etcd until the rule is removed from the CNF
    if cnf_deployment is not None:
      if cnf_deployment.is_ready():
        for cnf_pod in cnf_deployment.pods:
          err = openwrt_client.delete_rule(cnf_pod.ip, rule_cr)
          if err:
            return "re-queue req"
        rule_cr.finalizer = None
        return "ok"
      else:
        return "re-queue req"
    else:
      # Just remove the finalizer, because no CNF pod exists
      rule_cr.finalizer = None
      return "ok"
  else:
    # The CR is not being deleted
    if cnf_deployment is None:
      return "ok"
    else:
      if not cnf_deployment.is_ready():
        # Set appliedVersion = nil when cnf_deployment gets into not_ready status
        rule_cr.status.appliedVersion = None
        return "re-queue req"
      else:
        for cnf_pod in cnf_deployment.pods:
          # Look up the rule currently held by this pod (by rule name)
          runtime_cr = openwrt_client.get_rule(cnf_pod.ip, rule_cr.name)
          if runtime_cr != rule_cr:
            err = openwrt_client.add_or_update_rule(cnf_pod.ip, rule_cr)
            if err:
              # err could be caused by dependencies not being applied yet, or by another reason
              return "re-queue req"
        # Set appliedVersion only when the rule is applied to all the CNF pods
        rule_cr.finalizer = cnf_finalizer
        rule_cr.status.appliedVersion = rule_cr.resourceVersion
        rule_cr.status.inSync = True
        return "ok"
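
To try the branches of this logic without a cluster, the following runnable sketch replaces the Kubernetes and CNF clients with in-memory objects. The class names (FakeCnfPod, RuleCR), the attribute names and the "requeue" return value are hypothetical; only the branch structure follows the reconcile logic described here.

```python
class FakeCnfPod:
    """Stands in for one CNF pod and the rules its REST API would hold."""
    def __init__(self):
        self.rules = {}  # rule name -> rule spec


class RuleCR:
    """Minimal stand-in for a Sdewan rule CR."""
    def __init__(self, name, spec, resource_version):
        self.name = name
        self.spec = spec
        self.resource_version = resource_version
        self.deleting = False        # True once a deletion timestamp is set
        self.finalizer = None
        self.applied_version = None
        self.in_sync = False


def reconcile(cr, pods, ready):
    """Return "ok" or "requeue"; pods is None when no CNF deployment exists."""
    if cr.deleting:
        if pods is None:
            cr.finalizer = None      # no CNF pod exists, nothing to clean up
            return "ok"
        if not ready:
            return "requeue"
        for pod in pods:
            pod.rules.pop(cr.name, None)
        cr.finalizer = None          # lets the API server drop the CR
        return "ok"
    if pods is None:
        return "ok"
    if not ready:
        cr.applied_version = None
        return "requeue"
    for pod in pods:
        if pod.rules.get(cr.name) != cr.spec:
            pod.rules[cr.name] = dict(cr.spec)
    # Mark as applied only after every pod holds the rule.
    cr.finalizer = "cnf_finalizer"
    cr.applied_version = cr.resource_version
    cr.in_sync = True
    return "ok"


pods = [FakeCnfPod(), FakeCnfPod()]
rule = RuleCR("rule1", {"policy": "policy1"}, resource_version="7")
print(reconcile(rule, pods, ready=False))   # CNF not ready yet
print(reconcile(rule, pods, ready=True))    # applied to both pods
rule.deleting = True
print(reconcile(rule, pods, ready=True))    # removed from both pods
```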

Unusual Cases

In the following cases, when we say "call CNF api to create/update/delete rule", it means the logic below:

Code Block

language	py

def create_or_update_rule(rule):
  # Write to the CNF only when the rule is missing or differs from the CR
  runtime_rule = openwrt_client.get_rule(rule.name)
  if runtime_rule is not None:
    if runtime_rule == rule:
      return
    else:
      openwrt_client.update_rule(rule)
  else:
    openwrt_client.add_rule(rule)

def delete_rule(rule):
  # Delete only when the rule actually exists on the CNF
  runtime_rule = openwrt_client.get_rule(rule.name)
  if runtime_rule is not None:
    openwrt_client.del_rule(rule)
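
The point of this logic is that repeating a call is harmless, which matters because the controller may reconcile the same CR many times. The sketch below illustrates this with an in-memory stand-in for the CNF REST client that records every write. FakeOpenwrtClient and the (name, spec) signatures are hypothetical and are not the real client API.

```python
class FakeOpenwrtClient:
    """In-memory stand-in for the CNF REST client; records every write call."""
    def __init__(self):
        self.rules = {}
        self.calls = []

    def get_rule(self, name):
        return self.rules.get(name)

    def add_rule(self, name, spec):
        self.calls.append(("add", name))
        self.rules[name] = dict(spec)

    def update_rule(self, name, spec):
        self.calls.append(("update", name))
        self.rules[name] = dict(spec)

    def del_rule(self, name):
        self.calls.append(("del", name))
        del self.rules[name]


def create_or_update_rule(client, name, spec):
    runtime = client.get_rule(name)
    if runtime is None:
        client.add_rule(name, spec)
    elif runtime != spec:
        client.update_rule(name, spec)
    # An identical rule is already present: nothing to do.


def delete_rule(client, name):
    if client.get_rule(name) is not None:
        client.del_rule(name)


client = FakeOpenwrtClient()
create_or_update_rule(client, "rule1", {"policy": "policy1"})
create_or_update_rule(client, "rule1", {"policy": "policy1"})  # repeated call, no write
create_or_update_rule(client, "rule1", {"policy": "policy2"})
delete_rule(client, "rule1")
delete_rule(client, "rule1")                                   # repeated call, no write
print(client.calls)
```

Five calls result in only three writes to the CNF: one add, one update and one delete.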

Case 1:

A deployment (CNF) for a given purpose has two pod replicas (CNF-pod-1 and CNF-pod-2).
The controller is also brought up.
CNF-pod-1 and CNF-pod-2 are both running with no/default configuration.
MWAN3 policy 1 is added.
MWAN3 rule 1 and rule 2 are added to use MWAN3 policy 1.
Since the controller, CNF-pod-1 and CNF-pod-2 are all running, CNF-pod-1 and CNF-pod-2 both have the configuration for MWAN3 policy 1, rule 1 and rule 2.

Now CNF-pod-1 is stopped.

Info

The Mwan3Policy controller and the Mwan3Rule controller each receive a CNF event. The Mwan3Policy controller adds all the related Mwan3Policy CRs to its reconcile queue, and the Mwan3Rule controller adds all the related Mwan3Rule CRs to its reconcile queue. During reconcile, each controller finds that the CNF is not ready, so it sets the CR's status.appliedVersion to nil and re-queues the CR with a time delay.

MWAN3 rule 1 is deleted.

...

Info

Because every CR has a finalizer, the rule 1 CR is not deleted from etcd directly. Instead, the deletionTimestamp field is set on the rule 1 CR. The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.

MWAN3 rule 3 is added.

Info

The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.

MWAN3 rule 2 is updated.

Info

The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.

CNF-pod-1 is brought back up after 10 minutes (more than 5 minutes)

Info

Because the pod restarted, CNF-pod-1 is running with no/default configuration. The Mwan3Rule reconcile queue holds 3 CRs: rule1, rule2 and rule3. The controller reconciles each of them:

For rule1, the controller calls the CNF API to delete rule1 from both CNF-pod-1 and CNF-pod-2. It then removes the finalizer from the rule1 CR, and k8s deletes the rule1 CR from etcd.
For rule2, the controller calls the CNF API to update rule2 on both CNF-pod-1 and CNF-pod-2. It then sets rule2 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.
For rule3, the controller calls the CNF API to add rule3 to both CNF-pod-1 and CNF-pod-2. It then sets the rule3 finalizer, and also sets rule3 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.

Ensure that both CNF-pod-1 and CNF-pod-2 have the latest configuration.

Info

Once the reconcile finishes, both CNF-pod-1 and CNF-pod-2 have the latest configuration.
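
The "re-queue with delay" behaviour that Case 1 relies on can be sketched as a small delayed work queue. This is a simplified model, not the controller's actual queue: DelayQueue, run, the integer clock and the delay of 5 ticks are all assumptions chosen for illustration.

```python
import heapq


class DelayQueue:
    """Minimal delayed work queue driven by a caller-supplied clock value."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal due times pop in insertion order

    def add_after(self, item, now, delay):
        heapq.heappush(self._heap, (now + delay, self._seq, item))
        self._seq += 1

    def pop_due(self, now):
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


def run(queue, cnf_ready_at, end, delay=5):
    """Process CR names tick by tick, re-queuing each one while the CNF is not ready."""
    applied = []
    for now in range(end + 1):
        for cr in queue.pop_due(now):
            if now < cnf_ready_at:
                queue.add_after(cr, now, delay)  # CNF not ready: try again later
            else:
                applied.append((now, cr))        # CNF ready: the rule is reconciled
    return applied


queue = DelayQueue()
for name in ("rule1", "rule2", "rule3"):
    queue.add_after(name, now=0, delay=0)
print(run(queue, cnf_ready_at=12, end=20))
```

With these numbers the three CRs are retried at ticks 5 and 10 while the CNF is not ready, and all three are reconciled at tick 15, the first retry after the CNF becomes ready.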

Case 2:

A deployment (CNF) for a given purpose has two pod replicas (CNF-pod-1 and CNF-pod-2).
The controller is also brought up.
CNF-pod-1 and CNF-pod-2 are both running with no/default configuration.
MWAN3 policy 1 is added.
MWAN3 rule 1 and rule 2 are added to use MWAN3 policy 1.
Since the controller, CNF-pod-1 and CNF-pod-2 are all running, CNF-pod-1 and CNF-pod-2 both have the configuration for MWAN3 policy 1, rule 1 and rule 2.

Now CNF-pod-1 is disconnected, but still running.

Info

We have an API readiness check for the CNF pod, so when CNF-pod-1 is disconnected it becomes not-ready. The Mwan3Policy controller and the Mwan3Rule controller each receive a CNF event. The Mwan3Policy controller adds all the related Mwan3Policy CRs to its reconcile queue, and the Mwan3Rule controller adds all the related Mwan3Rule CRs to its reconcile queue. During reconcile, each controller finds that the CNF is not ready, so it sets the CR's status.appliedVersion to nil and re-queues the CR with a time delay.

MWAN3 rule 1 is deleted.

Info

Because every CR has a finalizer, the rule 1 CR is not deleted from etcd directly. Instead, the deletionTimestamp field is set on the rule 1 CR. The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.

MWAN3 rule 3 is added.

Info

The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.

MWAN3 rule 2 is updated.

Info

The Mwan3Rule controller receives an event. During reconcile, the controller detects that the CNF is not ready, so it re-queues the CR with a delay.

CNF-pod-1 is reconnected after 10 minutes (more than 5 minutes)

Info

Once CNF-pod-1 is ready again, the Mwan3Rule reconcile queue holds 3 CRs: rule1, rule2 and rule3. The controller reconciles each of them:

For rule1, the controller calls the CNF API to delete rule1 from both CNF-pod-1 and CNF-pod-2. It then removes the finalizer from the rule1 CR, and k8s deletes the rule1 CR from etcd.
For rule2, the controller calls the CNF API to update rule2 on both CNF-pod-1 and CNF-pod-2. It then sets rule2 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.
For rule3, the controller calls the CNF API to add rule3 to both CNF-pod-1 and CNF-pod-2. It then sets the rule3 finalizer, and also sets rule3 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.

Ensure that both CNF-pod-1 and CNF-pod-2 have the latest configuration.

Info

Once the reconcile finishes, both CNF-pod-1 and CNF-pod-2 have the latest configuration.

Case 3:

A deployment (CNF) for a given purpose has two pod replicas (CNF-pod-1 and CNF-pod-2).
The controller is also brought up.
CNF-pod-1 and CNF-pod-2 are both running with no/default configuration.
MWAN3 policy 1 is added.
MWAN3 rule 1 and rule 2 are added to use MWAN3 policy 1.
Since the controller, CNF-pod-1 and CNF-pod-2 are all running, CNF-pod-1 and CNF-pod-2 both have the configuration for MWAN3 policy 1, rule 1 and rule 2.
Controller is down for 10 minutes.
MWAN3 rule 1 is deleted.
Info

Because the controller is down, there is no event and no reconcile. The rule1 CR is not deleted from etcd because of its finalizer. Instead, k8s sets deletionTimestamp on the rule1 CR.

MWAN3 rule 3 is added.

Info

Because the controller is down, there is no event and no reconcile. The rule3 CR is added to etcd but not applied to the CNF. rule3 status.appliedVersion, status.appliedTime and status.inSync hold nil/default values.

MWAN3 rule 2 is updated.

Info

Because the controller is down, there is no event and no reconcile. The rule2 CR is updated in etcd but not applied to the CNF. rule2 status.appliedVersion, status.appliedTime and status.inSync keep the values they had before the controller went down.

Controller is up.

Info

The controller reconciles all CRs:

For rule1, the controller calls the CNF API to delete rule1 from both CNF-pod-1 and CNF-pod-2. It then removes the finalizer from the rule1 CR, and k8s deletes the rule1 CR from etcd.
For rule2, the controller calls the CNF API to update rule2 on both CNF-pod-1 and CNF-pod-2. It then sets rule2 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.
For rule3, the controller calls the CNF API to add rule3 to both CNF-pod-1 and CNF-pod-2. It then sets the rule3 finalizer, and also sets rule3 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.

Ensure that CNF-pod-1 and CNF-pod-2 have the latest configuration and that there is no duplicate information.

Info

Once the reconcile finishes, both CNF-pod-1 and CNF-pod-2 have the latest configuration.

Case 4:

A deployment (CNF) for a given purpose has two pod replicas (CNF-pod-1 and CNF-pod-2).
The controller is also brought up.
CNF-pod-1 and CNF-pod-2 are both running with no/default configuration.
MWAN3 policy 1 is added.
MWAN3 rule 1 and rule 2 are added to use MWAN3 policy 1.
Since the controller, CNF-pod-1 and CNF-pod-2 are all running, CNF-pod-1 and CNF-pod-2 both have the configuration for MWAN3 policy 1, rule 1 and rule 2.
Controller is down for 10 minutes.
After the controller goes down, CNF-pod-1 goes down.

Info

Because the controller is down, there is no event and no reconcile.

MWAN3 rule 1 is deleted.

Info

Because the controller is down, there is no event and no reconcile. The rule1 CR is not deleted from etcd because of its finalizer. Instead, k8s sets deletionTimestamp on the rule1 CR.

MWAN3 rule 3 is added.

Info

Because the controller is down, there is no event and no reconcile. The rule3 CR is added to etcd but not applied to the CNF. rule3 status.appliedVersion, status.appliedTime and status.inSync hold nil/default values.

...

MWAN3 rule 2 is not changed.

CNF-pod-1 is up.

Info

Because the controller is down, there is no event and no reconcile. Because the pod restarted, CNF-pod-1 is running with no/default configuration.

Controller is up.

Info

The controller reconciles all CRs:

For rule1, the controller calls the CNF API to delete rule1 from both CNF-pod-1 and CNF-pod-2. It then removes the finalizer from the rule1 CR, and k8s deletes the rule1 CR from etcd.
For rule2, the controller calls the CNF API to update rule2 on both CNF-pod-1 and CNF-pod-2. It then sets rule2 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.
For rule3, the controller calls the CNF API to add rule3 to both CNF-pod-1 and CNF-pod-2. It then sets the rule3 finalizer, and also sets rule3 status.appliedVersion=<current-version>, status.appliedTime=<now-time> and status.inSync=true.

Ensure that CNF-pod-1 and CNF-pod-2 have the latest configuration and that there is no duplicate information.

Info

Once the reconcile finishes, both CNF-pod-1 and CNF-pod-2 have the latest configuration.

Admission Webhook Usage

We use admission webhooks to implement several features.

Prevent creating more than one CNF with the same label in the same namespace.
Validate CR dependencies. For example, a mwan3 rule depends on a mwan3 policy.
Extend user permissions to control the operations on rule CRs. For example, we can ensure that ONAP can't update/delete rule CRs created by the platform.
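
As an illustration of the first feature, the sketch below rejects a CNF deployment whose label value is already used by another deployment in the same namespace. The function name is hypothetical, deployments are plain dictionaries, and the use of sdewanPurpose as the label key is an assumption based on the CNF deployment sample; the response is shaped loosely like a Kubernetes admission response (allowed plus status.message).

```python
def validate_cnf_deployment(new_dep, existing_deps, label_key="sdewanPurpose"):
    """Return an admission-style response for creating or updating a CNF deployment."""
    new_meta = new_dep["metadata"]
    purpose = new_meta.get("labels", {}).get(label_key)
    if purpose is None:
        return {"allowed": True}  # not labelled as a CNF, nothing to check
    for dep in existing_deps:
        meta = dep["metadata"]
        if meta["namespace"] != new_meta["namespace"]:
            continue              # other namespaces may reuse the label value
        if meta["name"] == new_meta["name"]:
            continue              # the same object, e.g. on update
        if meta.get("labels", {}).get(label_key) == purpose:
            message = f"{label_key}={purpose} is already used by deployment {meta['name']}"
            return {"allowed": False, "status": {"message": message}}
    return {"allowed": True}


existing = [
    {"metadata": {"name": "cnf-1", "namespace": "default", "labels": {"sdewanPurpose": "edge"}}},
]
duplicate = {"metadata": {"name": "cnf-2", "namespace": "default", "labels": {"sdewanPurpose": "edge"}}}
print(validate_cnf_deployment(duplicate, existing))
```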

Sdewan rule CR type level Permission Implementation

K8s supports permission control at the namespace level. For example, user1 may be able to create/update/delete one kind of resource (e.g. pod) in namespace ns1, but not in namespace ns2. For Sdewan, this doesn't meet our requirement. We want label-level control of Sdewan rule CRs. For example, user_onap can create/update/delete Mwan3Rule CRs with label sdewan-bucket-type=app-intent, but not with label sdewan-bucket-type=basic.

...

Code Block

language	py

def mwan3rule_webhook_handle_permission(req):
  # req is the admission.Request received by the webhook
  userinfo = req["userInfo"]
  mwan3rule_cr = decode(req)
  roles = k8s_client.get_role_from_user(userinfo)
  bucket_type = mwan3rule_cr.labels["sdewan-bucket-type"]
  for role in roles:
    # Each role lists, per CR kind, the bucket types it may operate on
    permission = role.annotation["sdewan-bucket-type-permission"]
    if bucket_type in permission["mwan3rules"]:
      return {"allowed": True}
  return {"allowed": False}
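
A runnable variant of the same check is sketched below, using the user_onap example from this section. The label and annotation keys come from the pseudocode; storing the annotation value as a JSON string, the role dictionaries and the function name are assumptions made for illustration.

```python
import json

LABEL = "sdewan-bucket-type"
ANNOTATION = "sdewan-bucket-type-permission"


def is_allowed(cr_labels, roles, resource="mwan3rules"):
    """Allow the request if any of the user's roles permits the CR's bucket type."""
    bucket_type = cr_labels.get(LABEL)
    for role in roles:
        raw = role.get("annotations", {}).get(ANNOTATION)
        if raw is None:
            continue  # this role grants no bucket-type permission
        permission = json.loads(raw)  # assumed format: {"<resource>": ["<type>", ...]}
        if bucket_type in permission.get(resource, []):
            return {"allowed": True}
    return {"allowed": False}


# Hypothetical role bound to user_onap: only app-intent rules may be changed.
onap_roles = [
    {"name": "onap-role", "annotations": {ANNOTATION: json.dumps({"mwan3rules": ["app-intent"]})}},
]
print(is_allowed({LABEL: "app-intent"}, onap_roles))
print(is_allowed({LABEL: "basic"}, onap_roles))
```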

ServiceRule controller (For next release)

We create a controller that watches the services created in the cluster. For each service, it creates a FirewallDNAT CR. On startup, the controller runs a sync-up to remove unused CRs.
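
Since this controller is planned for a later release, the following is only a sketch of how the startup sync-up could be computed: compare the CRs that should exist (one per service) with the CRs that do exist. The naming scheme in cr_name_for and the dictionary shape of a service are hypothetical.

```python
def cr_name_for(service):
    # Hypothetical naming scheme: one FirewallDNAT CR per service.
    return f"svc-{service['namespace']}-{service['name']}"


def sync_up(services, existing_cr_names):
    """Return (to_create, to_delete) as sorted lists of FirewallDNAT CR names."""
    desired = {cr_name_for(s) for s in services}
    existing = set(existing_cr_names)
    return sorted(desired - existing), sorted(existing - desired)


services = [
    {"namespace": "default", "name": "web"},
    {"namespace": "default", "name": "db"},
]
existing_crs = ["svc-default-web", "svc-default-old"]
to_create, to_delete = sync_up(services, existing_crs)
print("create:", to_create)
print("delete:", to_delete)
```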

References

...
