Cohere on Bifrost: Models, Endpoints, Setup, and Mappings

Cohere provider summary

Bifrost exposes Cohere models for chat, responses, embeddings, and model listing through OpenAI-compatible schemas. Parameter conversion handles the structural differences between the OpenAI and Cohere request formats.

Common Cohere model IDs used in Bifrost routes:

command-r-plus-04-2024 (Latest)
command-r-2024-10-29 (Current)
embed-english-v3.0 (Embeddings)

Property	Details
Description	Cohere Command models for chat, embeddings, and model listing via Bifrost.
Provider route on Bifrost	cohere/<model>
Provider doc	Cohere
API endpoint for provider	https://api.cohere.ai
Supported endpoints	/v1/chat/completions, /v1/responses, /v1/embeddings, /v1/models

Supported operations

Bifrost supports chat, responses, embeddings, and model listing for Cohere. Text Completions, Image Generation, Speech, Transcriptions, Files, and Batch are not supported by the upstream Cohere API; requests for them return UnsupportedOperationError.

Operation	Non-streaming	Streaming	Upstream endpoint
Chat Completions	Yes	Yes	/v2/chat
Responses API	Yes	Yes	/v2/chat
Embeddings	Yes	No	/v2/embed
List Models	Yes	No	/v1/models
Text Completions	No	No	-
Image Generation	No	No	-
Speech (TTS)	No	No	-
Transcriptions (STT)	No	No	-
Files	No	No	-
Batch	No	No	-

Supported OpenAI parameters

Quick reference of OpenAI parameters accepted when routing through Cohere via Bifrost.

[
  "stream",
  "temperature",
  "top_p",
  "max_tokens",
  "max_completion_tokens",
  "stop",
  "response_format",
  "tools",
  "tool_choice",
  "user",
  "reasoning"
]

Supported Cohere models

Use the provider prefix cohere/ in Bifrost model routes for deterministic provider targeting.

Family	Model ID	Bifrost route	Typical usage
Command R+	command-r-plus-04-2024	cohere/command-r-plus-04-2024	Latest flagship model
Command R	command-r-2024-10-29	cohere/command-r-2024-10-29	Balanced performance
Embed English v3	embed-english-v3.0	cohere/embed-english-v3.0	English embeddings
Embed English Light v3	embed-english-light-v3.0	cohere/embed-english-light-v3.0	Lightweight embeddings
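
A route of the form cohere/<model> can be split on the first slash to recover the provider and the model ID. The following is a minimal sketch; the helper name parse_route is hypothetical and is not part of Bifrost.

```python
def parse_route(route: str):
    """Split a route such as 'cohere/embed-english-v3.0' into (provider, model)."""
    # partition() splits on the first "/" only, so model IDs keep any later slashes.
    provider, sep, model = route.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {route!r}")
    return provider, model
```

For example, parse_route("cohere/command-r-plus-04-2024") returns the pair ("cohere", "command-r-plus-04-2024"), while a bare model ID without a prefix raises ValueError.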

Core request mapping

Bifrost normalizes OpenAI-format input to Cohere-format fields.

OpenAI-style param	Cohere conversion	Notes
max_completion_tokens	max_tokens	Token limit mapping
top_p	p	Nucleus sampling
stop	stop_sequences	Stop sequence array
temperature	Direct pass-through
reasoning.effort	thinking.type	Thinking budget conversion
tools	Restructured format	Function definitions remapped
tool_choice	Type mapped	"auto"/"none" preserved, specific tool simplified to "any"
response_format	Special conversion	Structured output via tool injection
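
The simple renames in the table can be illustrated with a small Python sketch. It is not Bifrost's implementation: the function name is hypothetical, wrapping a single stop string into a list is an assumption, and the tools, response_format, and reasoning conversions are left out because the table does not spell out their target structure.

```python
# Renames taken from the mapping table above.
RENAMES = {
    "max_completion_tokens": "max_tokens",
    "top_p": "p",
    "stop": "stop_sequences",
}


def to_cohere_chat_params(params: dict) -> dict:
    """Convert a subset of OpenAI-style chat parameters to Cohere-style names."""
    out = {}
    for key, value in params.items():
        if key in RENAMES:
            out[RENAMES[key]] = value
        elif key == "temperature":
            out[key] = value  # direct pass-through
        elif key == "tool_choice":
            # "auto" and "none" are preserved; a specific tool collapses to "any".
            out[key] = value if value in ("auto", "none") else "any"
    # Assumption: a single stop string is wrapped so the result is always a list.
    if isinstance(out.get("stop_sequences"), str):
        out["stop_sequences"] = [out["stop_sequences"]]
    return out
```

Parameters that the sketch does not recognize are omitted from the result rather than forwarded.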

API reference by operation

Gateway paths and upstream Cohere endpoints.

1) Chat Completions

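Primary request path. Maps to the upstream Cohere Chat API. Cohere-specific fields such as top_k, safety_mode, log_probs, and strict_tool_choice go in the request body on the gateway, or in extra_params in the Go SDK.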

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "safety_mode": "STRICT",
    "log_probs": true,
    "strict_tool_choice": false
  }'
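
The same request can be assembled from Python with only the standard library. This is a sketch under the assumption that the gateway listens on http://localhost:8080 as in the curl example; build_chat_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages, **cohere_fields):
    """Build a POST request for the gateway's chat completions path."""
    # Cohere-specific fields are merged into the top level of the body,
    # mirroring the curl example.
    body = {"model": model, "messages": messages, **cohere_fields}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8080",
    "cohere/command-r-plus",
    [{"role": "user", "content": "Hello"}],
    top_k=40,
    safety_mode="STRICT",
)
# With a gateway running, urllib.request.urlopen(req) would send it.
```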

2) Responses API

The Responses API uses the same upstream Cohere /v2/chat endpoint with conversion between OpenAI Responses format and Cohere's format. Gateway: POST /v1/responses. See Responses API in Bifrost docs.

Parameter	Transformation	Notes
max_output_tokens	Renamed to max_tokens
temperature	Direct pass-through
top_p	Renamed to p	Cohere nucleus sampling field
instructions	Becomes system message	Prepended to messages
text.format	Converted to response_format
tools / tool_choice	Same as Chat Completions	Function tools supported
reasoning	Mapped to thinking	effort → type; max_tokens → token_budget
stop	Via extra_params	Renamed to stop_sequences
top_k	Via extra_params	Cohere-specific
frequency_penalty, presence_penalty	Via extra_params

Input & instructions

String input converts to a user message; arrays convert to messages
instructions becomes a system message (prepended to the message list)
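
The two rules above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes array items are already role/content dictionaries; the real conversion may handle richer item types.

```python
def responses_input_to_messages(input_value, instructions=None):
    """Normalize Responses-style input into a chat message list."""
    if isinstance(input_value, str):
        # A plain string becomes a single user message.
        messages = [{"role": "user", "content": input_value}]
    else:
        # Assumption: array items are already {"role": ..., "content": ...} dicts.
        messages = [dict(item) for item in input_value]
    if instructions:
        # instructions is prepended as a system message.
        messages.insert(0, {"role": "system", "content": instructions})
    return messages
```

Calling responses_input_to_messages("Hello", "Be brief") yields a system message followed by the user message.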

Tool support

Supported tool type: function. Tool definitions and tool choice use the same conversions as Chat Completions.

Response conversion

text → message | tool_use → function_call
input_tokens / output_tokens preserved; cached tokens surfaced in token details when present

Streaming

Event sequence: message-start → content-start → content-delta → content-end → message-end
Tool call arguments accumulated across chunks; synthetic output_item.added events for text/reasoning
Stable item IDs: msg_{messageID}_item_{outputIndex}
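
A rough Python sketch of the item ID format and of accumulating tool call arguments across chunks follows. Both helpers are hypothetical, and representing the stream as (output_index, text) pairs is an assumption made for illustration; the actual Cohere event payloads carry more structure.

```python
def item_id(message_id: str, output_index: int) -> str:
    """Format a stable item ID as msg_{messageID}_item_{outputIndex}."""
    return f"msg_{message_id}_item_{output_index}"


def accumulate_tool_arguments(fragments):
    """Join streamed argument fragments per output index.

    fragments is an iterable of (output_index, text) pairs (assumed shape).
    """
    buffers = {}
    for index, text in fragments:
        buffers.setdefault(index, []).append(text)
    return {index: "".join(parts) for index, parts in buffers.items()}
```

Because the ID depends only on the message ID and the output index, every chunk belonging to the same output item resolves to the same identifier.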

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "input": "Hello, how are you?",
    "top_k": 40,
    "stop": [".", "!"]
  }'

3) Embeddings

Bifrost serves embeddings at POST /v1/embeddings and routes them to the upstream Cohere /v2/embed endpoint. Streaming is not supported. See Embeddings in Bifrost docs.

Parameter	Transformation	Notes
input (text or array)	Converted to texts array	Single string or list of strings
dimensions	Renamed to output_dimension	Output vector size
input_type	Via extra_params	Required for v3+; defaults to search_document
embedding_types	Via extra_params	e.g. float, int8
truncate	Via extra_params	How to handle long inputs (e.g. START)
max_tokens	Via extra_params	Max tokens to embed per input

Critical notes

input_type is required for Cohere v3+ embedding models (Bifrost defaults to search_document when omitted)
Use embedding_types to choose return formats (e.g. float, int8)
Cohere-specific options (truncate, max_tokens) pass via the request body on the gateway, or via extra_params in the Go SDK
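
The request-side conversion described above might look like the following Python sketch. The function name is hypothetical, and stripping the cohere/ prefix from the model before the upstream call is an assumption; the remaining behavior follows the parameter table.

```python
def to_cohere_embed_request(request: dict) -> dict:
    """Convert an OpenAI-style embeddings request into a Cohere-style body."""
    raw_input = request["input"]
    body = {
        # Assumption: the provider prefix is removed for the upstream call.
        "model": request["model"].split("/", 1)[-1],
        # A single string or a list of strings both end up as a texts array.
        "texts": [raw_input] if isinstance(raw_input, str) else list(raw_input),
        # input_type defaults to search_document when omitted.
        "input_type": request.get("input_type", "search_document"),
    }
    if "dimensions" in request:
        body["output_dimension"] = request["dimensions"]
    # Cohere-specific options are forwarded unchanged.
    for key in ("embedding_types", "truncate", "max_tokens"):
        if key in request:
            body[key] = request[key]
    return body
```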

Response conversion

embeddings.float → data[].embedding
meta.tokens → usage information
Multiple embedding types are supported when requested
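
The response-side mapping can be sketched in Python as well. This is an illustration rather than Bifrost's code: the function name is hypothetical, the input_tokens field under meta.tokens is an assumed field name, and the usage layout follows the usual OpenAI embeddings response shape.

```python
def to_openai_embeddings(cohere_response: dict, model: str) -> dict:
    """Convert a Cohere-style embed response into an OpenAI-style list."""
    vectors = cohere_response["embeddings"]["float"]
    # Assumption: token counts live under meta.tokens.input_tokens.
    tokens = cohere_response.get("meta", {}).get("tokens", {})
    input_tokens = tokens.get("input_tokens", 0)
    return {
        "object": "list",
        "model": model,
        "data": [
            {"object": "embedding", "index": i, "embedding": vector}
            for i, vector in enumerate(vectors)
        ],
        "usage": {"prompt_tokens": input_tokens, "total_tokens": input_tokens},
    }
```

Only the float embeddings are handled here; other requested embedding types would need their own handling.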

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/embed-english-v3.0",
    "input": ["text to embed"],
    "dimensions": 1024,
    "input_type": "search_query",
    "embedding_types": ["float"],
    "truncate": "START"
  }'

4) List Models

Lists available Cohere models through the Bifrost gateway. Upstream: GET /v1/models?page_size={defaultPageSize}. No request body required. See List Models in Bifrost docs.

Model data is converted to the standard OpenAI-style models list format
Pagination is cursor-based via next_page_token
Optional filters endpoint and default_only are available via extra_params (Go SDK) or the request body (gateway)

curl http://localhost:8080/v1/models
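
Cursor-based pagination with next_page_token amounts to a loop that keeps requesting pages until no token comes back. The Python sketch below takes the page fetcher as a parameter so it stays independent of any HTTP client; fetch_page, its page_token argument, and the models key in each page are assumptions made for illustration.

```python
def list_all_models(fetch_page, page_size=100):
    """Collect models across pages by following next_page_token.

    fetch_page(page_size=..., page_token=...) is a caller-supplied callable
    (hypothetical) that returns one page as a dict.
    """
    models = []
    token = None
    while True:
        page = fetch_page(page_size=page_size, page_token=token)
        models.extend(page.get("models", []))
        token = page.get("next_page_token")
        if not token:
            # No cursor in the response means this was the last page.
            return models
```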

Dropped parameters

The following OpenAI-style parameters are silently ignored when targeting Cohere through Bifrost:

[
  "logit_bias",
  "logprobs",
  "top_logprobs",
  "seed",
  "parallel_tool_calls",
  "service_tier"
]
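
Because these parameters are dropped without an error, a client may want to detect them before sending a request. A minimal Python sketch of such a check follows; split_dropped is a hypothetical client-side helper, not a Bifrost feature.

```python
# Parameters listed above as silently ignored for Cohere.
DROPPED_PARAMS = frozenset({
    "logit_bias",
    "logprobs",
    "top_logprobs",
    "seed",
    "parallel_tool_calls",
    "service_tier",
})


def split_dropped(params: dict):
    """Return (kept_params, sorted_dropped_names) for a request body."""
    kept = {k: v for k, v in params.items() if k not in DROPPED_PARAMS}
    dropped = sorted(k for k in params if k in DROPPED_PARAMS)
    return kept, dropped
```

A caller could log the second element of the result to make the otherwise silent behavior visible.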

Implementation caveats

Caveat	Impact	Severity
Tool choice limitations	Cannot force specific tool; simplified to "any" mode	Medium
Strict tool mode dropped	strict: true in tool definitions silently dropped	Low
Input type requirement	input_type required for embeddings v3+ models	Medium
Cache control stripped	Cache control directives removed during conversion	Low