Business model: Pay per input/output tokens
- Dedicated API: Yes (OpenAI-compatible; see the client sketch after this list)
- Custom model uploads: Yes (limited to their supported base models)
- Rate limit: Tiered, starts at 60 RPM on the free plan
- Best for: Plug-and-play access to optimized open models for MVPs and quick experiments
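Because the dedicated API is OpenAI-compatible, the standard OpenAI SDK can usually be pointed at it by overriding the base URL. Below is a minimal sketch assuming the official `openai` Python client; the base URL, environment variable, and model id are placeholders, not this provider's actual values.

```python
import os

from openai import OpenAI

# Placeholder endpoint and model id -- substitute the provider's documented values.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical base URL
    api_key=os.environ["PROVIDER_API_KEY"],          # hypothetical env var
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Explain OpenAI-compatible APIs in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```

The same snippet works against any OpenAI-compatible provider in this comparison by swapping the base URL and model id.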
Business model: Pay per input/output tokens
- Dedicated API: Yes (OpenAI-compatible)
- Custom model uploads: Yes, supports up to 8 GPUs on demand
- Rate limit: 600 RPM standard; higher limits on Business plans (see the retry sketch after this list)
- Best for: High-performance inference, low-latency production workloads
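A 600 RPM ceiling is generous, but parallel production traffic can still trip it, so clients typically retry 429 responses with exponential backoff. A rough sketch, again assuming the `openai` Python client and placeholder endpoint and model values:

```python
import os
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical base URL
    api_key=os.environ["PROVIDER_API_KEY"],          # hypothetical env var
)

def chat_with_backoff(messages, retries=5):
    """Retry rate-limited requests with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="placeholder-model-id",  # substitute a real model id
                messages=messages,
            )
        except RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate limited after retries")

reply = chat_with_backoff([{"role": "user", "content": "Hello"}])
print(reply.choices[0].message.content)
```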
Replicate
Business model: Pay for model execution time, billed per second
- Dedicated API: Yes, supports async predictions (see the polling sketch after this list)
- Custom model uploads: Yes, via Cog
- Rate limit: 600 RPM for predictions; 3,000 RPM for other endpoints
- Best for: Prototyping public-facing models and exploring community-hosted models
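Per-second billing plus async predictions fit a create-then-poll pattern: start a prediction, do other work, and check back for the output. A minimal sketch assuming the official `replicate` Python client with `REPLICATE_API_TOKEN` set in the environment; the version hash and input are placeholders:

```python
import time

import replicate

# Kick off an async prediction; the version id is a placeholder.
prediction = replicate.predictions.create(
    version="0123456789abcdef",  # placeholder model version hash
    input={"prompt": "an astronaut riding a horse"},
)

# Poll until the run finishes; you are billed for the seconds it executes.
while prediction.status not in ("succeeded", "failed", "canceled"):
    time.sleep(2)
    prediction = replicate.predictions.get(prediction.id)

print(prediction.status, prediction.output)
```

For custom models, the same prediction flow applies once the model has been packaged with Cog and pushed to Replicate.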
Business model: Currently free while in beta, with access available through OpenRouter