Fast inference API with OpenAI-compatible clients for Groq-hosted models
GroqCloud exposes hosted language, speech, and compound workloads through Groq’s HTTP APIs. Documentation highlights compatibility with OpenAI client libraries when you point `base_url` at Groq’s OpenAI-compatible endpoint and supply a Groq API key, alongside first-party Groq SDKs for Python and JavaScript. Pricing pages publish per-model token rates (USD) for on-demand inference.
Use cases
- Migrating existing OpenAI-shaped clients to Groq-hosted models with minimal code changes
- Prototype latency-sensitive assistants where docs emphasize fast inference as a design goal
- Running batch workloads via documented batch APIs where discounted throughput is advertised
- Pairing Groq-hosted speech models with LLM backends for voice-to-text pipelines
- Teaching teams OpenAI-compatible integration patterns without committing to a single model vendor
Key features
- OpenAI-compatible `base_url` at https://api.groq.com/openai/v1 for chat-style calls
- First-party Groq Python and JavaScript libraries documented alongside OpenAI SDK migration paths
- Model catalog spanning multiple open-weight and vendor-hosted options with Playground access
- Published per-million-token pricing tables for large language models on groq.com/pricing
- Responses API documented as an alternative interface with multimodal inputs where supported
Who Is It For?
- Backend engineers integrating hosted inference behind products
- ML platform teams evaluating alternate inference endpoints
- Developers prototyping agents that rely on OpenAI-compatible SDK ergonomics
Frequently Asked Questions
- Can I reuse OpenAI Python or JS clients against Groq?
- Groq documents configuring OpenAI-compatible libraries with its API key and base URL https://api.groq.com/openai/v1; check OpenAI Compatibility notes for unsupported fields.
- Where do I create API keys?
- Groq’s quickstart directs developers to console.groq.com/keys.
- Are all OpenAI parameters supported?
- Groq publishes a list of unsupported OpenAI fields (for example certain completion options); unsupported parameters may return HTTP 400 responses.
Related
Related
3 Indexed items
Together AI
Together AI operates a developer platform for running prominent open-source and vendor-weight models from Together-hosted GPUs. Documentation centers on issuing API keys, installing the Together Python (`together`) or npm (`together-ai`) SDKs, or calling HTTPS endpoints such as `https://api.together.ai/v1/chat/completions` with Bearer authentication. Guides cover streaming chat completions, function calling, structured outputs, model catalog browsing, GPU reservations for steady traffic, and fine-tuning or dedicated cluster offerings published in the broader docs hierarchy.
Replicate
Replicate is a hosted platform for executing third-party and custom machine-learning models over HTTP without provisioning GPUs yourself. Official documentation explains how to authenticate with API tokens, create asynchronous predictions, stream outputs, retrieve model metadata, wire webhooks for completion events, and optionally deploy or fine-tune checkpoints (for example FLUX image workflows) published to the Replicate catalog.
OpenRouter
OpenRouter is a model gateway that exposes many third-party AI models through one OpenAI-compatible API. Teams can compare providers, set routing preferences, and switch models without rewriting core client logic for each vendor SDK. The service publishes per-model pricing and supports pay-as-you-go usage.