Hosted open-weight models via REST and official Python / TypeScript SDKs

Together AI operates a developer platform for running prominent open-source and vendor-weight models from Together-hosted GPUs. Documentation centers on issuing API keys, installing the Together Python (`together`) or npm (`together-ai`) SDKs, or calling HTTPS endpoints such as `https://api.together.ai/v1/chat/completions` with Bearer authentication. Guides cover streaming chat completions, function calling, structured outputs, model catalog browsing, GPU reservations for steady traffic, and fine-tuning or dedicated cluster offerings published in the broader docs hierarchy.

Category Developer Tools

Pricing Usage-based inference + optional dedicated endpoints / fine-tuning (see Together pricing docs)

Platforms Web / API / Python / Node.js

inferenceapiopen-source-models

Use cases

Shipping chat or agent backends on hosted open-weight checkpoints without owning GPU fleets
Prototyping tool-calling pipelines using Together-supported models listed in docs
Moving from curl-only proofs to typed SDK integrations for retries and telemetry
Exploring Together fine-tuning or dedicated endpoint SKUs once baseline usage patterns are validated
Teaching teams an OpenAI-shaped HTTP interface while swapping model IDs to Together catalog entries

Key features

Official SDK quickstart flows for Python and TypeScript with environment-based API keys (`TOGETHER_API_KEY`)
REST chat-completions endpoints compatible with common OpenAI-style JSON payloads illustrated in Together quickstart docs
Streaming completions demonstrated with `stream=True` in Python and Async iterators in TypeScript samples
Product surface area spanning model catalogs, GPU clusters, LoRA/full fine-tuning, and reservations per Together documentation index
Documented linkage between keys, billing projects (`api.together.ai` console), and per-model selection

Who Is It For?

Backend engineers prototyping LLM-heavy services
ML engineers evaluating hosted inference for open-weight checkpoints
Developer advocates standardizing onboarding material with official SDK snippets

Frequently Asked Questions

Do I need the SDK or can I use curl?: Together documents both: quickstart installs `together` or `together-ai`, but curl examples POST directly to `/v1/chat/completions` with your API key header.
Where are API keys created?: Docs direct users to the Together console API keys workflow under their active project (`api.together.ai/settings/projects/~current/api-keys` per quickstart).
Which model does the introductory sample call?: The quickstart streams `openai/gpt-oss-20b` as of the Together quickstart reproduction in this corpus.

3 Indexed items

Replicate

Developer ToolsPay-per-prediction billing + prepaid credits (see Replicate billing docs)

Replicate is a hosted platform for executing third-party and custom machine-learning models over HTTP without provisioning GPUs yourself. Official documentation explains how to authenticate with API tokens, create asynchronous predictions, stream outputs, retrieve model metadata, wire webhooks for completion events, and optionally deploy or fine-tune checkpoints (for example FLUX image workflows) published to the Replicate catalog.

Groq Cloud API

Developer ToolsFree tier + Pay-as-you-go (published USD rates)

GroqCloud exposes hosted language, speech, and compound workloads through Groq’s HTTP APIs. Documentation highlights compatibility with OpenAI client libraries when you point `base_url` at Groq’s OpenAI-compatible endpoint and supply a Groq API key, alongside first-party Groq SDKs for Python and JavaScript. Pricing pages publish per-model token rates (USD) for on-demand inference.

Hugging Face Hub

Developer ToolsFree community tier + hosted inference billed per vendor/model (see Inference Providers pricing docs)

Hugging Face operates the Hugging Face Hub—a central place to browse and host machine-learning artifacts—alongside Spaces for demo apps and documentation for calling models through HTTP APIs using Hugging Face access tokens. Official docs outline creating accounts and tokens (`Settings → Access Tokens`), downloading files with Git LFS-compatible clients, versioning repositories, and invoking models through Inference Providers / serverless patterns published in huggingface.co documentation rather than stitching together bespoke hosting.

Together AI