Cohere provider summary
Bifrost routes Cohere models for chat, responses, embeddings, and model listing through OpenAI-compatible schemas, converting request parameters where the OpenAI and Cohere formats differ structurally.
Common Cohere model IDs used in Bifrost routes:
- command-r-plus-04-2024 (Latest)
- command-r-2024-10-29 (Current)
- embed-english-v3.0 (Embeddings)
| Property | Details |
|---|---|
| Description | Cohere Command models for chat, embeddings, and model listing via Bifrost. |
| Provider route on Bifrost | cohere/<model> |
| Provider doc | Cohere |
| API endpoint for provider | https://api.cohere.ai |
| Supported endpoints | /v1/chat/completions, /v1/responses, /v1/embeddings, /v1/models |
Supported operations
Through Bifrost, the Cohere provider supports Chat Completions, the Responses API, Embeddings, and List Models. Text Completions, Image Generation, Speech, Transcriptions, Files, and Batch are not supported by the upstream Cohere API; requests for them return an UnsupportedOperationError.
| Operation | Non-streaming | Streaming | Upstream endpoint |
|---|---|---|---|
| Chat Completions | Yes | Yes | /v2/chat |
| Responses API | Yes | Yes | /v2/chat |
| Embeddings | Yes | No | /v2/embed |
| List Models | Yes | No | /v1/models |
| Text Completions | No | No | - |
| Image Generation | No | No | - |
| Speech (TTS) | No | No | - |
| Transcriptions (STT) | No | No | - |
| Files | No | No | - |
| Batch | No | No | - |
Supported OpenAI parameters
Quick reference of OpenAI parameters accepted when routing through Cohere via Bifrost.
[ "stream", "temperature", "top_p", "max_tokens", "max_completion_tokens", "stop", "response_format", "tools", "tool_choice", "user", "reasoning" ]
Supported Cohere models
Use the provider prefix cohere/ in Bifrost model routes for deterministic provider targeting.
| Family | Model ID | Bifrost route | Typical usage |
|---|---|---|---|
| Command R+ | command-r-plus-04-2024 | cohere/command-r-plus-04-2024 | Latest flagship model |
| Command R | command-r-2024-10-29 | cohere/command-r-2024-10-29 | Balanced performance |
| Embed English v3 | embed-english-v3.0 | cohere/embed-english-v3.0 | English embeddings |
| Embed English Light v3 | embed-english-light-v3.0 | cohere/embed-english-light-v3.0 | Lightweight embeddings |
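A route such as cohere/embed-english-v3.0 carries the provider prefix before the first slash. The sketch below splits a route into provider and model ID with the standard library; splitRoute is a hypothetical helper, and Bifrost's own route parsing may handle edge cases differently.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRoute separates a Bifrost model route such as
// "cohere/command-r-plus-04-2024" into provider and model ID.
// It reports false when the route has no usable provider prefix.
func splitRoute(route string) (provider, model string, ok bool) {
	provider, model, ok = strings.Cut(route, "/")
	if !ok || provider == "" || model == "" {
		return "", "", false
	}
	return provider, model, true
}

func main() {
	p, m, ok := splitRoute("cohere/embed-english-v3.0")
	fmt.Println(p, m, ok) // cohere embed-english-v3.0 true
}
```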
Core request mapping
Bifrost normalizes OpenAI-format input to Cohere-format fields.
| OpenAI-style param | Cohere conversion | Notes |
|---|---|---|
| max_completion_tokens | max_tokens | Token limit mapping |
| top_p | p | Nucleus sampling |
| stop | stop_sequences | Stop sequence array |
| temperature | Direct pass-through | |
| reasoning.effort | thinking.type | Thinking budget conversion |
| tools | Restructured format | Function definitions remapped |
| tool_choice | Type mapped | "auto"/"none" preserved, specific tool simplified to "any" |
| response_format | Special conversion | Structured output via tool injection |
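The renames in this table can be sketched as a single conversion function. ChatParams and toCohere are hypothetical names, and two rules are assumptions rather than confirmed Bifrost behavior: reasoning effort "none" maps to thinking.type "disabled" while any other effort maps to "enabled". tools and response_format are omitted because their restructuring is not detailed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChatParams is a hypothetical subset of OpenAI-style chat parameters;
// pointer fields distinguish "unset" from zero values.
type ChatParams struct {
	MaxCompletionTokens *int
	TopP                *float64
	Temperature         *float64
	Stop                []string
	ReasoningEffort     string // "" means no reasoning block was sent
	ToolChoice          string // "auto", "none", or a specific function name
}

// toCohere applies the renames from the mapping table. The effort ->
// thinking.type rule is an assumption of this sketch.
func toCohere(p ChatParams) map[string]any {
	out := map[string]any{}
	if p.MaxCompletionTokens != nil {
		out["max_tokens"] = *p.MaxCompletionTokens // max_completion_tokens -> max_tokens
	}
	if p.TopP != nil {
		out["p"] = *p.TopP // top_p -> p
	}
	if p.Temperature != nil {
		out["temperature"] = *p.Temperature // passed through unchanged
	}
	if len(p.Stop) > 0 {
		out["stop_sequences"] = p.Stop // stop -> stop_sequences
	}
	switch p.ReasoningEffort {
	case "":
		// no reasoning requested: omit thinking entirely
	case "none":
		out["thinking"] = map[string]any{"type": "disabled"}
	default:
		out["thinking"] = map[string]any{"type": "enabled"}
	}
	switch p.ToolChoice {
	case "":
		// no tool_choice sent
	case "auto", "none":
		out["tool_choice"] = p.ToolChoice // preserved
	default:
		out["tool_choice"] = "any" // a specific tool cannot be forced
	}
	return out
}

func main() {
	maxTokens, topP := 256, 0.9
	body := toCohere(ChatParams{
		MaxCompletionTokens: &maxTokens,
		TopP:                &topP,
		Stop:                []string{"END"},
		ReasoningEffort:     "high",
		ToolChoice:          "get_weather",
	})
	b, _ := json.Marshal(body)
	fmt.Println(string(b))
}
```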
API reference by operation
Gateway paths and upstream Cohere endpoints.
1) Chat Completions
Primary request path, mapped to the upstream Cohere Chat API (/v2/chat). Cohere-specific fields such as top_k and safety_mode can be sent directly in the gateway request body, as below, or via extra_params in the Go SDK.
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "cohere/command-r-plus",
"messages": [{"role": "user", "content": "Hello"}],
"top_k": 40,
"safety_mode": "STRICT",
"log_probs": true,
"strict_tool_choice": false
}'
2) Responses API
The Responses API (gateway: POST /v1/responses) uses the same upstream Cohere /v2/chat endpoint as Chat Completions; Bifrost converts between the OpenAI Responses format and Cohere's chat format. See Responses API in the Bifrost docs.
| Parameter | Transformation | Notes |
|---|---|---|
| max_output_tokens | Renamed to max_tokens | |
| temperature | Direct pass-through | |
| top_p | Renamed to p | Cohere nucleus sampling field |
| instructions | Becomes system message | Prepended to messages |
| text.format | Converted to response_format | |
| tools / tool_choice | Same as Chat Completions | Function tools supported |
| reasoning | Mapped to thinking | effort → type; max_tokens → token_budget |
| stop | Via extra_params | Renamed to stop_sequences |
| top_k | Via extra_params | Cohere-specific |
| frequency_penalty, presence_penalty | Via extra_params | |
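Two rows in this table go beyond simple renames: reasoning splits into thinking.type and thinking.token_budget, and extra_params keys are merged into the body with stop renamed. The sketch below shows both; cohereThinking and mergeExtraParams are hypothetical helpers, treating effort "none" as "disabled" is an assumption, and copying other extra_params keys unchanged is a simplification (Bifrost may rename some, such as top_k).

```go
package main

import "fmt"

// cohereThinking maps Responses reasoning settings onto a Cohere-style
// thinking object: effort -> type, max_tokens -> token_budget.
// Treating effort "none" as "disabled" is an assumption of this sketch.
func cohereThinking(effort string, maxTokens *int) map[string]any {
	if effort == "none" {
		return map[string]any{"type": "disabled"}
	}
	thinking := map[string]any{"type": "enabled"}
	if maxTokens != nil {
		thinking["token_budget"] = *maxTokens
	}
	return thinking
}

// mergeExtraParams copies Responses extra_params into a Cohere request
// body. stop is renamed to stop_sequences as in the table; every other
// key is copied unchanged here, though Bifrost may rename some of them.
func mergeExtraParams(body, extra map[string]any) {
	for k, v := range extra {
		if k == "stop" {
			body["stop_sequences"] = v
			continue
		}
		body[k] = v
	}
}

func main() {
	budget := 2048
	body := map[string]any{
		"max_tokens": 512, // from max_output_tokens
		"thinking":   cohereThinking("medium", &budget),
	}
	mergeExtraParams(body, map[string]any{"top_k": 40, "stop": []string{".", "!"}})
	fmt.Println(body)
}
```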
Input & instructions
- A string input converts to a single user message; an array input converts to messages
- instructions becomes a system message (prepended to the message list)
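The input rules can be sketched as one function that builds the message list. Message and buildMessages are hypothetical names for illustration, and accepting only string or []Message input is a simplification of the real Responses input union.

```go
package main

import "fmt"

// Message is a hypothetical chat message shape used only for this sketch.
type Message struct {
	Role    string
	Content string
}

// buildMessages converts Responses-style input into a message list:
// a string becomes one user message, a []Message is used as-is, and a
// non-empty instructions string is prepended as a system message.
func buildMessages(instructions string, input any) ([]Message, error) {
	var msgs []Message
	if instructions != "" {
		msgs = append(msgs, Message{Role: "system", Content: instructions})
	}
	switch v := input.(type) {
	case string:
		msgs = append(msgs, Message{Role: "user", Content: v})
	case []Message:
		msgs = append(msgs, v...)
	default:
		return nil, fmt.Errorf("unsupported input type %T", input)
	}
	return msgs, nil
}

func main() {
	msgs, err := buildMessages("Answer briefly.", "Hello, how are you?")
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```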
Tool support
Supported tool type: function. Tool definitions and tool choice use the same conversions as Chat Completions.
Response conversion
- text → message
- tool_use → function_call
- input_tokens / output_tokens preserved; cached tokens surfaced in token details when present
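The item mapping can be sketched as a switch over item kinds. ContentItem, OutputItem, and Usage are simplified, hypothetical shapes rather than the real Cohere or Bifrost types, and skipping unknown kinds is an assumption of this sketch.

```go
package main

import "fmt"

// ContentItem is a hypothetical, simplified view of one item in a Cohere
// response, using the item kinds named in the conversion list.
type ContentItem struct {
	Kind      string // "text" or "tool_use"
	Text      string
	CallID    string
	Name      string
	Arguments string // JSON-encoded tool arguments
}

// OutputItem is a simplified Responses API output item.
type OutputItem struct {
	Type      string // "message" or "function_call"
	Text      string
	CallID    string
	Name      string
	Arguments string
}

// Usage keeps input/output counts as-is; CachedTokens stays nil when
// the upstream response reports no cached tokens.
type Usage struct {
	InputTokens  int
	OutputTokens int
	CachedTokens *int
}

// convertOutput maps text -> message and tool_use -> function_call.
// Unknown kinds are skipped in this sketch.
func convertOutput(items []ContentItem) []OutputItem {
	out := make([]OutputItem, 0, len(items))
	for _, it := range items {
		switch it.Kind {
		case "text":
			out = append(out, OutputItem{Type: "message", Text: it.Text})
		case "tool_use":
			out = append(out, OutputItem{Type: "function_call", CallID: it.CallID, Name: it.Name, Arguments: it.Arguments})
		}
	}
	return out
}

func main() {
	items := []ContentItem{
		{Kind: "text", Text: "Checking the weather."},
		{Kind: "tool_use", CallID: "call_1", Name: "get_weather", Arguments: `{"city":"Paris"}`},
	}
	for _, o := range convertOutput(items) {
		fmt.Printf("%+v\n", o)
	}
	fmt.Printf("%+v\n", Usage{InputTokens: 12, OutputTokens: 30})
}
```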
Streaming
- Event sequence: message-start → content-start → content-delta → content-end → message-end
- Tool call arguments are accumulated across chunks; synthetic output_item.added events are emitted for text and reasoning
- Stable item IDs: msg_{messageID}_item_{outputIndex}
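Two of the streaming behaviors, stable item IDs and argument accumulation, can be sketched with the standard library. itemID and toolArgsAccumulator are hypothetical names; keying fragments by output index is an assumption about how a converter would group them.

```go
package main

import (
	"fmt"
	"strings"
)

// itemID builds the stable item ID format msg_{messageID}_item_{outputIndex}.
func itemID(messageID string, outputIndex int) string {
	return fmt.Sprintf("msg_%s_item_%d", messageID, outputIndex)
}

// toolArgsAccumulator concatenates streamed tool-call argument fragments,
// keyed by output index, until the tool call is complete.
type toolArgsAccumulator struct {
	parts map[int]*strings.Builder
}

func newToolArgsAccumulator() *toolArgsAccumulator {
	return &toolArgsAccumulator{parts: map[int]*strings.Builder{}}
}

// Add appends one argument fragment for the tool call at outputIndex.
func (a *toolArgsAccumulator) Add(outputIndex int, fragment string) {
	b, ok := a.parts[outputIndex]
	if !ok {
		b = &strings.Builder{}
		a.parts[outputIndex] = b
	}
	b.WriteString(fragment)
}

// Arguments returns the accumulated arguments ("" if none were seen).
func (a *toolArgsAccumulator) Arguments(outputIndex int) string {
	if b, ok := a.parts[outputIndex]; ok {
		return b.String()
	}
	return ""
}

func main() {
	acc := newToolArgsAccumulator()
	for _, frag := range []string{`{"city":`, `"Paris"}`} {
		acc.Add(1, frag) // fragments as they might arrive in successive delta events
	}
	fmt.Println(itemID("abc123", 1), acc.Arguments(1)) // msg_abc123_item_1 {"city":"Paris"}
}
```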
curl -X POST http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "cohere/command-r-plus",
"input": "Hello, how are you?",
"top_k": 40,
"stop": [".", "!"]
}'
3) Embeddings
Bifrost serves embeddings at POST /v1/embeddings and routes them to the upstream Cohere /v2/embed endpoint. Streaming is not supported. See Embeddings in the Bifrost docs.
| Parameter | Transformation | Notes |
|---|---|---|
| input (text or array) | Converted to texts array | Single string or list of strings |
| dimensions | Renamed to output_dimension | Output vector size |
| input_type | Via extra_params | Required for v3+; defaults to search_document |
| embedding_types | Via extra_params | e.g. float, int8 |
| truncate | Via extra_params | How to handle long inputs (e.g. START) |
| max_tokens | Via extra_params | Max tokens to embed per input |
Critical notes
- input_type is required for Cohere v3+ embedding models (Bifrost defaults to search_document when omitted)
- Use embedding_types to choose return formats (e.g. float, int8)
- Cohere-specific options (truncate, max_tokens) can be passed in the gateway request body or via extra_params in the Go SDK
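The request-side rules (input to texts, dimensions to output_dimension, input_type defaulting to search_document) can be sketched as follows. EmbeddingRequest and toCohereEmbed are hypothetical names, and passing the model ID with the cohere/ prefix already stripped is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EmbeddingRequest is a hypothetical OpenAI-style embeddings request.
type EmbeddingRequest struct {
	Input          any // string or []string
	Dimensions     *int
	InputType      string   // Cohere-specific; optional here
	EmbeddingTypes []string // Cohere-specific, e.g. "float", "int8"
}

// toCohereEmbed builds a Cohere /v2/embed-style body: input -> texts,
// dimensions -> output_dimension, and input_type defaulting to
// search_document when omitted.
func toCohereEmbed(model string, r EmbeddingRequest) (map[string]any, error) {
	var texts []string
	switch v := r.Input.(type) {
	case string:
		texts = []string{v}
	case []string:
		texts = v
	default:
		return nil, fmt.Errorf("unsupported input type %T", r.Input)
	}
	inputType := r.InputType
	if inputType == "" {
		inputType = "search_document"
	}
	body := map[string]any{"model": model, "texts": texts, "input_type": inputType}
	if r.Dimensions != nil {
		body["output_dimension"] = *r.Dimensions
	}
	if len(r.EmbeddingTypes) > 0 {
		body["embedding_types"] = r.EmbeddingTypes
	}
	return body, nil
}

func main() {
	body, err := toCohereEmbed("embed-english-v3.0", EmbeddingRequest{Input: "text to embed"})
	if err != nil {
		panic(err)
	}
	b, _ := json.Marshal(body)
	fmt.Println(string(b)) // {"input_type":"search_document","model":"embed-english-v3.0","texts":["text to embed"]}
}
```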
Response conversion
- embeddings.float → data[].embedding
- meta.tokens → usage information
- Multiple embedding types are supported when requested
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "cohere/embed-english-v3.0",
"input": ["text to embed"],
"dimensions": 1024,
"input_type": "search_query",
"embedding_types": ["float"],
"truncate": "START"
}'
4) List Models
Lists available Cohere models through the Bifrost gateway. Upstream: GET /v1/models?page_size={defaultPageSize}. No request body required. See List Models in Bifrost docs.
- Model data is converted to the standard OpenAI-style models list format
- Pagination is cursor-based via next_page_token
- Optional filters endpoint and default_only are available via extra_params (Go SDK) or the request body (gateway)
curl http://localhost:8080/v1/models
Dropped parameters
The following OpenAI-style parameters are silently ignored when targeting Cohere through Bifrost:
[ "logit_bias", "logprobs", "top_logprobs", "seed", "parallel_tool_calls", "service_tier" ]
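Dropping these parameters amounts to deleting a fixed set of keys before the request is converted. The sketch below does that on a copy of the body; dropUnsupported is a hypothetical helper, and returning the removed names (Bifrost drops them silently) is only for illustration, for example for logging.

```go
package main

import "fmt"

// droppedParams lists the OpenAI-style parameters that are ignored for Cohere.
var droppedParams = []string{"logit_bias", "logprobs", "top_logprobs", "seed", "parallel_tool_calls", "service_tier"}

// dropUnsupported returns a copy of body without the dropped parameters,
// plus the names that were removed (in droppedParams order).
func dropUnsupported(body map[string]any) (map[string]any, []string) {
	kept := make(map[string]any, len(body))
	for k, v := range body {
		kept[k] = v
	}
	var dropped []string
	for _, name := range droppedParams {
		if _, ok := kept[name]; ok {
			delete(kept, name)
			dropped = append(dropped, name)
		}
	}
	return kept, dropped
}

func main() {
	kept, dropped := dropUnsupported(map[string]any{
		"model":    "cohere/command-r-plus",
		"seed":     7,
		"logprobs": true,
	})
	fmt.Println(kept, dropped) // map[model:cohere/command-r-plus] [logprobs seed]
}
```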
Implementation caveats
| Caveat | Impact | Severity |
|---|---|---|
| Tool choice limitations | Cannot force specific tool; simplified to "any" mode | Medium |
| Strict tool mode dropped | strict: true in tool definitions silently dropped | Low |
| Input type requirement | input_type required for embeddings v3+ models | Medium |
| Cache control stripped | Cache control directives removed during conversion | Low |
Authoritative references
- Bifrost Cohere provider reference: docs.getbifrost.ai/providers/supported-providers/cohere
- Cohere API documentation: docs.cohere.com
- Bifrost provider support overview: docs.getbifrost.ai/providers/supported-providers/overview