This example shows how to deploy a scikit-learn classifier trained on the famous iris data set.
1. Create a Python file `trainer.py`.
2. Use scikit-learn's `LogisticRegression` to train your model.
3. Add code to pickle your model (you can use other serialization libraries such as joblib; see the sketch after the script below).
4. Upload it to S3 (boto3 will need access to valid AWS credentials).
```python
# trainer.py

import boto3
import pickle

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Train the model

iris = load_iris()
data, labels = iris.data, iris.target
training_data, test_data, training_labels, test_labels = train_test_split(data, labels)

model = LogisticRegression(solver="lbfgs", multi_class="multinomial")
model.fit(training_data, training_labels)
accuracy = model.score(test_data, test_labels)
print("accuracy: {:.2f}".format(accuracy))

# Upload the model

pickle.dump(model, open("model.pkl", "wb"))
s3 = boto3.client("s3")
s3.upload_file("model.pkl", "my-bucket", "sklearn/iris-classifier/model.pkl")
```
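If you prefer joblib for serialization (often faster for models containing large NumPy arrays), the upload step might look like the sketch below. This is a hypothetical variant, not part of the example; you would also need to load the model with `joblib.load` in `predictor.py`:

```python
# Hypothetical alternative: serialize with joblib instead of pickle
from joblib import dump

dump(model, "model.pkl")
s3 = boto3.client("s3")
s3.upload_file("model.pkl", "my-bucket", "sklearn/iris-classifier/model.pkl")
```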
Run the script locally:
```bash
# Install scikit-learn and boto3
$ pip3 install scikit-learn boto3

# Run the script
$ python3 trainer.py
```
1. Create another Python file `predictor.py`.
2. Add code to load and initialize your pickled model.
3. Add a prediction function that will accept a sample and return a prediction from your model.
```python
# predictor.py

import pickle
import numpy as np


model = None
labels = ["setosa", "versicolor", "virginica"]


def init(model_path, metadata):
    global model
    model = pickle.load(open(model_path, "rb"))


def predict(sample, metadata):
    measurements = [
        sample["sepal_length"],
        sample["sepal_width"],
        sample["petal_length"],
        sample["petal_width"],
    ]

    label_id = model.predict(np.array([measurements]))[0]
    return labels[label_id]
```
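Before deploying, you can sanity-check `predictor.py` locally by calling its functions directly. A minimal sketch, assuming the `model.pkl` produced by `trainer.py` is in the working directory:

```python
# Quick local test (run from the directory containing model.pkl and predictor.py)
from predictor import init, predict

init("model.pkl", None)  # metadata is unused by this predictor, so None is fine
sample = {"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}
print(predict(sample, None))  # expected: "setosa"
```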
Create a `requirements.txt` file to specify the dependencies needed by `predictor.py`. Cortex will automatically install them into your runtime once you deploy:
```text
# requirements.txt

numpy
```
You can skip dependencies that are pre-installed to speed up the deployment process. Note that `pickle` is part of the Python standard library, so it doesn't need to be included.
Create a `cortex.yaml` file and add the configuration below. A `deployment` specifies a set of resources that are deployed together. An `api` provides a runtime for inference and makes our `predictor.py` implementation available as a web service that can serve real-time predictions:
```yaml
# cortex.yaml

- kind: deployment
  name: iris

- kind: api
  name: classifier
  predictor:
    path: predictor.py
    model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
```
`cortex deploy` takes the declarative configuration from `cortex.yaml` and creates it on your Cortex cluster:
```bash
$ cortex deploy

creating classifier api
```
Track the status of your deployment using `cortex get`:
```bash
$ cortex get classifier --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           8s            -

endpoint: http://***.amazonaws.com/iris/classifier
```
The output above indicates that one replica of the API was requested and is available to serve predictions. Cortex will automatically launch more replicas if the load increases and spin down replicas if there is unused capacity.
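You can also bound this autoscaling behavior. A hypothetical sketch, assuming your Cortex version supports `min_replicas` and `max_replicas` under `compute` (key names vary across versions, so check the documentation for yours):

```yaml
# Hypothetical: bound autoscaling for the classifier api
- kind: api
  name: classifier
  predictor:
    path: predictor.py
    model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
  compute:
    min_replicas: 1
    max_replicas: 5
```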
We can use `curl` to test our prediction service:
```bash
$ curl http://***.amazonaws.com/iris/classifier \
    -X POST -H "Content-Type: application/json" \
    -d '{"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}'

"setosa"
```
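The same request can be made from Python, for example with the `requests` library (substitute the endpoint URL printed by `cortex get` for the placeholder):

```python
import requests

sample = {"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}

# Replace *** with your cluster's endpoint host from `cortex get`
response = requests.post("http://***.amazonaws.com/iris/classifier", json=sample)
print(response.json())  # expected: "setosa"
```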
Add a `tracker` to your `cortex.yaml` and specify that this is a classification model:
```yaml
# cortex.yaml

- kind: deployment
  name: iris

- kind: api
  name: classifier
  predictor:
    path: predictor.py
    model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
  tracker:
    model_type: classification
```
Run `cortex deploy` again to perform a rolling update to your API with the new configuration:
```bash
$ cortex deploy

updating classifier api
```
After making more predictions, your `cortex get` command will show information about your API's past predictions:
```bash
$ cortex get classifier --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           16s           28ms

class        count
setosa       8
versicolor   2
virginica    4
```
This model is fairly small, but larger models may require more compute resources. You can configure this in your `cortex.yaml`:
```yaml
- kind: deployment
  name: iris

- kind: api
  name: classifier
  predictor:
    path: predictor.py
    model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
  tracker:
    model_type: classification
  compute:
    cpu: 0.5
    mem: 1G
```
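If your cluster has GPU instances, you could request a GPU here as well. A minimal sketch, assuming your Cortex version supports a `gpu` key under `compute`:

```yaml
# Hypothetical: request a GPU in addition to CPU and memory
compute:
  cpu: 0.5
  mem: 1G
  gpu: 1
```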
Adding compute resources may help reduce your inference latency. Run `cortex deploy` again to update your API with this configuration:
```bash
$ cortex deploy

updating classifier api
```
Run `cortex get` again:
```bash
$ cortex get classifier --watch

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           16s           24ms

class        count
setosa       8
versicolor   2
virginica    4
```
If you trained another model and want to A/B test it against your previous model, add another `api` to your configuration and specify the new model:
```yaml
- kind: deployment
  name: iris

- kind: api
  name: classifier
  predictor:
    path: predictor.py
    model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
  tracker:
    model_type: classification
  compute:
    cpu: 0.5
    mem: 1G

- kind: api
  name: another-classifier
  predictor:
    path: predictor.py
    model: s3://cortex-examples/sklearn/iris-classifier/another-model.pkl
  tracker:
    model_type: classification
  compute:
    cpu: 0.5
    mem: 1G
```
Run `cortex deploy` to create the new API:
```bash
$ cortex deploy

creating another-classifier api
```
`cortex deploy` is declarative, so the `classifier` API is unchanged while `another-classifier` is created:
```bash
$ cortex get --watch

api                  status   up-to-date   available   requested   last update
classifier           live     1            1           1           5m
another-classifier   live     1            1           1           8s
```
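With both APIs live, you can compare their predictions from a client. A minimal sketch using `requests` (the endpoint host is assumed from `cortex get`):

```python
import requests

sample = {"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}

# Replace *** with your cluster's endpoint host from `cortex get`
for api_name in ["classifier", "another-classifier"]:
    url = "http://***.amazonaws.com/iris/{}".format(api_name)
    prediction = requests.post(url, json=sample).json()
    print("{}: {}".format(api_name, prediction))
```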
To serve batch predictions, first implement `batch-predictor.py` with a `predict` function that can process an array of samples:
```python
# batch-predictor.py

import pickle
import numpy as np


model = None
labels = ["setosa", "versicolor", "virginica"]


def init(model_path, metadata):
    global model
    model = pickle.load(open(model_path, "rb"))


def predict(sample, metadata):
    measurements = [
        [s["sepal_length"], s["sepal_width"], s["petal_length"], s["petal_width"]]
        for s in sample
    ]

    label_ids = model.predict(np.array(measurements))
    return [labels[label_id] for label_id in label_ids]
```
Next, add the `api` to `cortex.yaml`:
```yaml
- kind: deployment
  name: iris

- kind: api
  name: classifier
  predictor:
    path: predictor.py
    model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
  tracker:
    model_type: classification
  compute:
    cpu: 0.5
    mem: 1G

- kind: api
  name: another-classifier
  predictor:
    path: predictor.py
    model: s3://cortex-examples/sklearn/iris-classifier/another-model.pkl
  tracker:
    model_type: classification
  compute:
    cpu: 0.5
    mem: 1G

- kind: api
  name: batch-classifier
  predictor:
    path: batch-predictor.py
    model: s3://cortex-examples/sklearn/iris-classifier/model.pkl
  compute:
    cpu: 0.5
    mem: 1G
```
Run `cortex deploy` to create the batch API:
```bash
$ cortex deploy

creating batch-classifier api
```
`cortex get` should show all three APIs now:
```bash
$ cortex get --watch

api                  status   up-to-date   available   requested   last update
classifier           live     1            1           1           10m
another-classifier   live     1            1           1           5m
batch-classifier     live     1            1           1           8s
```
Make a batch prediction by sending an array of samples to the `batch-classifier` endpoint:

```bash
$ curl http://***.amazonaws.com/iris/batch-classifier \
    -X POST -H "Content-Type: application/json" \
    -d '[
        {"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.5, "petal_width": 0.3},
        {"sepal_length": 7.1, "sepal_width": 3.3, "petal_length": 4.8, "petal_width": 1.5},
        {"sepal_length": 6.4, "sepal_width": 3.4, "petal_length": 6.1, "petal_width": 2.6}
    ]'

["setosa","versicolor","virginica"]
```
Run `cortex delete` to spin down your APIs:
```bash
$ cortex delete

deleting classifier api
deleting another-classifier api
deleting batch-classifier api
```
Running `cortex delete` will free up cluster resources and allow Cortex to scale down to the minimum number of instances you specified during cluster installation. It will not spin down your cluster.
Any questions? Chat with us.