Limits
During the open beta, the following limits are in place:
Inference requests per minute (per model)
- @cf/meta/llama-2-7b-chat-int8 - 50 reqs/min
- @cf/openai/whisper - 4000 reqs/min
- @cf/meta/m2m100-1.2b - 4000 reqs/min
- @cf/huggingface/distilbert-sst-2-int8 - 6000 reqs/min
- @cf/microsoft/resnet-50 - 6000 reqs/min
- @cf/baai/bge-base-en-v1.5 - 6000 reqs/min
Note that these limits are estimates, are subject to change, and may vary by location during the open beta.
Model inference requests made in local mode using Wrangler also count toward these limits.
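Because these per-model limits are enforced per minute, client code should be prepared for rate-limited responses. The sketch below shows one common approach, retrying with exponential backoff; it is an illustrative assumption, not an official API: the `callModel` callback and the use of HTTP 429 as the rate-limit signal are hypothetical placeholders for however your Worker invokes a model.

```typescript
// Delay schedule for exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
function backoffDelays(maxRetries: number, baseMs = 1000): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

// Hypothetical wrapper: retry a model call while it returns HTTP 429
// (rate limited). `callModel` stands in for whatever function performs
// the inference request in your Worker.
async function runWithRetry(
  callModel: () => Promise<Response>,
  maxRetries = 3,
): Promise<Response> {
  let response = await callModel();
  for (const delayMs of backoffDelays(maxRetries)) {
    if (response.status !== 429) break; // not rate limited; stop retrying
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    response = await callModel();
  }
  return response;
}
```

With the defaults above, a request that keeps hitting the limit is retried after 1 s, 2 s, and 4 s before the last response is returned to the caller.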