...
Just create an instance of `InferenceModel` with `DeployToLayer == edge`.
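For illustration, such a resource might look like the manifest below. This is a minimal sketch: the API group, and fields such as `deployToLayer` and `modelUrl`, are assumptions based on the description above, not a confirmed schema.

```yaml
# Hypothetical InferenceModel manifest for edge-only serving.
apiVersion: example.io/v1alpha1   # assumed API group
kind: InferenceModel
metadata:
  name: edge-only-model
spec:
  deployToLayer: edge             # corresponds to "DeployToLayer == edge"
  modelUrl: "<model-location>"    # placeholder for the trained model artifact
```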
Joint inference by edge and cloud
Create three resources:
- An instance of InferenceModel deployed to the cloud
- An instance of InferenceModel deployed to the edge
- A pod running on the edge that serves customer traffic. It contains the logic for deciding whether to call the cloud model-serving API.
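The decision logic inside the edge serving pod can be sketched as follows. This is a minimal illustration assuming confidence-threshold routing; the `edge_infer` and `cloud_infer` stubs and the threshold value are hypothetical stand-ins for calls to the two deployed model-serving APIs.

```python
CONFIDENCE_THRESHOLD = 0.85  # assumed tunable threshold

def edge_infer(sample):
    """Stub for the local edge model; returns (label, confidence)."""
    # A real pod would invoke the edge InferenceModel's serving API here.
    return "cat", 0.6

def cloud_infer(sample):
    """Stub for the cloud model-serving API."""
    return "cat", 0.95

def serve(sample):
    """Answer from the edge when confident; otherwise escalate to the cloud."""
    label, confidence = edge_infer(sample)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label  # edge result is good enough; skip the cloud round trip
    # Hard example: fall back to the larger, more accurate cloud model.
    label, _ = cloud_infer(sample)
    return label
```

The threshold trades serving cost against accuracy: a higher value sends more traffic to the cloud.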
Joint inference by device, edge and cloud
We can run three models of different sizes and accuracies on the device, the edge, and the cloud, respectively.
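The three-tier case generalizes the edge/cloud logic into a cascade: try the smallest model first and escalate while confidence stays low. The sketch below is illustrative; the stub models, function names, and threshold are assumptions, not part of any defined API.

```python
def cascade_infer(sample, models, threshold=0.9):
    """Try models from smallest (device) to largest (cloud).

    `models` is an ordered list of callables returning (label, confidence).
    Escalate to the next tier whenever confidence is below `threshold`;
    the final (cloud) model's answer is always accepted.
    """
    for model in models[:-1]:
        label, confidence = model(sample)
        if confidence >= threshold:
            return label
    label, _ = models[-1](sample)
    return label

# Stub models standing in for the device, edge, and cloud tiers.
device_model = lambda s: ("dog", 0.5)   # small on-device model, low confidence
edge_model   = lambda s: ("cat", 0.7)   # mid-size edge model
cloud_model  = lambda s: ("cat", 0.99)  # large cloud model
```

With these stubs, a query escalates through all three tiers and returns the cloud model's answer.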