Paperspace Deployments is a containers-as-a-service offering that lets you run container images and serve machine learning models through a high-performance, low-latency service with a RESTful API.
A deployment configuration, or spec, represents the desired state of your deployment. You can view and update the spec through the web console or the Gradient CLI. The following example spec shows the available fields:
apiVersion: v1 # required, defaults to v1
name: my cool deployment # the name of your deployment. This must be unique within your project.
image: paperspace/app-fixture:0.1.3
enabled: true # toggle to enable or disable the app
containerRegistry: my-registry # (optional) name of the container registry to use for the app
command: # command to run on startup
  - /bin/sh
  - '-c'
  - |
    while true
    do
      sleep .01
    done
resources:
  ports:
    - 8000
  replicas: 2 # number of static replicas for your app. We recommend 2 to ensure high availability.
  machineType: A100-80G
  autoscaling:
    enabled: true # toggle for enabling/disabling autoscaling
    maxReplicas: 5 # max replicas for autoscaling
    metrics:
      - metric: cpu
        summary: average
        value: 50 # 50% cpu utilization across all replicas
      - metric: memory
        summary: average
        value: 22 # 22% memory utilization across all replicas
      - metric: requestDuration
        summary: average
        value: 2 # 2 second request duration for the endpoint
integrations: # list of integrations. Max is 5.
  - type: git-lfs # git-lfs integration type
    name: falcon # unique name of the integration
    path: /models/ # a unique path on the filesystem to mount the integration. In this spec, model files will be located at /models/falcon
    url: https://huggingface.co/tiiuae/falcon-7b # hugging face model url for cloning
  - type: s3 # s3 integration type
    name: my-s3-integration
    path: /some/s3/mount/path
    url: s3://my-integration-bucket/
    region: us-east-1
    accessKeyId: AKIAVWO7J5OJSCWRJ3HJ
    secretAccessKey: secret:secretAccessKey # stored as a project or team secret. Naming is arbitrary.
healthChecks: # health checks allow you to define a set of probes to check the health of your app
  readiness:
    path: /
    port: 8000 # health check port
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 5
    headers: # (optional) list of headers to pass to the readiness probe
      - name: Authorization
        value: some-token
  liveness:
    ...
  startup:
    ...
basicAuthKey: secret:my_paperspace_secret # sets up a protected endpoint to restrict access for unauthorized users
env: # container environment variables
  - name: some-env
    value: some-value
  - name: secret-env
    value: secret:mySecretEnv # stored as a project or team secret. Naming is arbitrary.
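The comments in the spec state a few constraints: at most 5 integrations, a unique name and path per integration, and secrets referenced with a `secret:` prefix. The following is a minimal sketch of how those could be checked before submitting, assuming the spec has already been parsed into a Python dict (for example with a YAML library). `validate_spec` and `secret_references` are hypothetical helper names, not part of the Gradient CLI or SDK, and the platform may enforce rules that are not covered here.

```python
from typing import Any

MAX_INTEGRATIONS = 5       # "Max is 5" per the spec comments
SECRET_PREFIX = "secret:"  # prefix used for secret references in the example spec


def validate_spec(spec: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: list[str] = []
    integrations = spec.get("integrations") or []

    if len(integrations) > MAX_INTEGRATIONS:
        problems.append(
            f"too many integrations: {len(integrations)} (max {MAX_INTEGRATIONS})"
        )

    # Integration names and mount paths are both described as unique.
    for field in ("name", "path"):
        seen: set[str] = set()
        for item in integrations:
            value = item.get(field)
            if value is None:
                continue
            if value in seen:
                problems.append(f"duplicate integration {field}: {value}")
            seen.add(value)

    return problems


def secret_references(node: Any) -> set[str]:
    """Collect the names of all secrets referenced anywhere in the spec."""
    if isinstance(node, str):
        if node.startswith(SECRET_PREFIX):
            return {node[len(SECRET_PREFIX):]}
        return set()
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return set()
    found: set[str] = set()
    for child in children:
        found |= secret_references(child)
    return found
```

Running `secret_references` on a parsed spec gives the set of project or team secrets that must exist before the deployment can resolve them, which is a convenient pre-flight check.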
These are actions you can take by updating the deployment spec, either through the console or the CLI/SDK:
- Set enabled to true (on) or false (off) and resubmit the spec.
- Set replicas to the desired number of replicas and resubmit the spec.
- Update machineType and/or image respectively and resubmit the spec.
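Each of these actions is a small edit to the spec followed by a resubmit. The sketch below illustrates the edit step only, assuming the spec is held as a Python dict and that replicas and machineType sit under resources, as the field order of the example spec suggests. `apply_actions` is a hypothetical helper, not a Gradient SDK function; resubmitting the result is still done through the console or the CLI/SDK.

```python
import copy
from typing import Any, Optional


def apply_actions(
    spec: dict[str, Any],
    *,
    enabled: Optional[bool] = None,
    replicas: Optional[int] = None,
    machine_type: Optional[str] = None,
    image: Optional[str] = None,
) -> dict[str, Any]:
    """Return a modified copy of the spec; the original is left untouched."""
    updated = copy.deepcopy(spec)

    if enabled is not None:
        updated["enabled"] = enabled  # true (on) or false (off)
    if image is not None:
        updated["image"] = image

    # Assumption: replicas and machineType are nested under "resources".
    if replicas is not None or machine_type is not None:
        resources = updated.setdefault("resources", {})
        if replicas is not None:
            resources["replicas"] = replicas
        if machine_type is not None:
            resources["machineType"] = machine_type

    return updated


# Usage: turn the deployment off without touching anything else.
current = {
    "enabled": True,
    "image": "paperspace/app-fixture:0.1.3",
    "resources": {"replicas": 2, "machineType": "A100-80G"},
}
disabled = apply_actions(current, enabled=False)
```

Returning a copy rather than editing in place keeps the last submitted spec available for comparison or rollback.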