Business model: Pay per input/output tokens
- Dedicated API: Yes (OpenAI-compatible; see the client sketch after this list)
- Custom model uploads: Yes (limited to their supported base models)
- Rate limit: Tiered, starts at 60 RPM on the free plan
- Best for: Plug-and-play access to optimized open models for MVPs and quick experiments
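Because the dedicated API is OpenAI-compatible, the standard OpenAI SDK can usually be pointed at it by overriding the base URL. Below is a minimal sketch assuming the official `openai` Python client; the base URL, environment variable, and model id are placeholders, not this provider's actual values.

```python
import os

from openai import OpenAI

# Placeholder endpoint and model id -- substitute the provider's documented values.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical base URL
    api_key=os.environ["PROVIDER_API_KEY"],          # hypothetical env var
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Explain OpenAI-compatible APIs in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```

The same snippet works against any OpenAI-compatible provider in this comparison by swapping the base URL and model id.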
Business model: Pay per input/output tokens
- Dedicated API: Yes (OpenAI-compatible)
- Custom model uploads: Yes, supports up to 8 GPUs on demand
- Rate limit: 600 RPM standard; higher limits on Business plans (see the retry sketch after this list)
- Best for: High-performance inference, low-latency production workloads
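A 600 RPM ceiling is generous, but parallel production traffic can still trip it, so clients typically retry 429 responses with exponential backoff. A rough sketch, again assuming the `openai` Python client and placeholder endpoint and model values:

```python
import os
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical base URL
    api_key=os.environ["PROVIDER_API_KEY"],          # hypothetical env var
)

def chat_with_backoff(messages, retries=5):
    """Retry rate-limited requests with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="placeholder-model-id",  # substitute a real model id
                messages=messages,
            )
        except RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate limited after retries")

reply = chat_with_backoff([{"role": "user", "content": "Hello"}])
print(reply.choices[0].message.content)
```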
Replicate
Business model: Pay for model execution time, billed per second
- Dedicated API: Yes, supports async predictions (see the polling sketch after this list)
- Custom model uploads: Yes, via Cog
- Rate limit: 600 RPM for predictions; 3,000 RPM for other endpoints
- Best for: Prototyping public-facing models and exploring community-hosted models
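Per-second billing plus async predictions fit a create-then-poll pattern: start a prediction, do other work, and check back for the output. A minimal sketch assuming the official `replicate` Python client with `REPLICATE_API_TOKEN` set in the environment; the version hash and input are placeholders:

```python
import time

import replicate

# Kick off an async prediction; the version id is a placeholder.
prediction = replicate.predictions.create(
    version="0123456789abcdef",  # placeholder model version hash
    input={"prompt": "an astronaut riding a horse"},
)

# Poll until the run finishes; you are billed for the seconds it executes.
while prediction.status not in ("succeeded", "failed", "canceled"):
    time.sleep(2)
    prediction = replicate.predictions.get(prediction.id)

print(prediction.status, prediction.output)
```

For custom models, the same prediction flow applies once the model has been packaged with Cog and pushed to Replicate.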
Business model: Currently free while in beta, with access available through OpenRouter