[ Provider Guide ]

Cohere Provider on Bifrost

Bifrost supports Cohere's Command chat models, embeddings, and model listing. The integration converts between OpenAI and Cohere request and response formats, so OpenAI-style clients can reach Cohere through the same gateway endpoints they use for other providers.

Cohere provider summary

Bifrost routes Cohere models for chat, embeddings, and listing through OpenAI-compatible schemas. Parameter conversion handles structural differences between OpenAI and Cohere formats.

Common Cohere model IDs used in Bifrost routes:

  • command-r-plus-04-2024 (Latest)
  • command-r-2024-10-29 (Current)
  • embed-english-v3.0 (Embeddings)

| Property | Details |
| --- | --- |
| Description | Cohere Command models for chat, embeddings, and model listing via Bifrost. |
| Provider route on Bifrost | cohere/<model> |
| Provider doc | Cohere |
| API endpoint for provider | https://api.cohere.ai |
| Supported endpoints | /v1/chat/completions, /v1/responses, /v1/embeddings, /v1/models |

Supported operations

Cohere provides chat, responses, embeddings, and model listing. Text Completions, Image Generation, Speech, Transcriptions, Files, and Batch are not supported by the upstream Cohere API and return UnsupportedOperationError.

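| Operation | Non-streaming | Streaming | Gateway endpoint | Upstream endpoint |
| --- | --- | --- | --- | --- |
| Chat Completions | Yes | Yes | /v1/chat/completions | /v2/chat |
| Responses API | Yes | Yes | /v1/responses | /v2/chat |
| Embeddings | Yes | No | /v1/embeddings | /v2/embed |
| List Models | Yes | No | /v1/models | /v1/models |
| Text Completions | No | No | - | - |
| Image Generation | No | No | - | - |
| Speech (TTS) | No | No | - | - |
| Transcriptions (STT) | No | No | - | - |
| Files | No | No | - | - |
| Batch | No | No | - | - |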

Supported OpenAI parameters

Quick reference of OpenAI parameters accepted when routing through Cohere via Bifrost.

[
  "stream",
  "temperature",
  "top_p",
  "max_tokens",
  "max_completion_tokens",
  "stop",
  "response_format",
  "tools",
  "tool_choice",
  "user",
  "reasoning"
]

Supported Cohere models

Use the provider prefix cohere/ in Bifrost model routes for deterministic provider targeting.

| Family | Model ID | Bifrost route | Typical usage |
| --- | --- | --- | --- |
| Command R+ | command-r-plus-04-2024 | cohere/command-r-plus-04-2024 | Latest flagship model |
| Command R | command-r-2024-10-29 | cohere/command-r-2024-10-29 | Balanced performance |
| Embed English v3 | embed-english-v3.0 | cohere/embed-english-v3.0 | English embeddings |
| Embed English Light v3 | embed-english-light-v3.0 | cohere/embed-english-light-v3.0 | Lightweight embeddings |

Core request mapping

Bifrost normalizes OpenAI-format input to Cohere-format fields.

| OpenAI-style param | Cohere conversion | Notes |
| --- | --- | --- |
| max_completion_tokens | max_tokens | Token limit mapping |
| top_p | p | Nucleus sampling |
| stop | stop_sequences | Stop sequence array |
| temperature | Direct pass-through | |
| reasoning.effort | thinking.type | Thinking budget conversion |
| tools | Restructured format | Function definitions remapped |
| tool_choice | Type mapped | "auto"/"none" preserved, specific tool simplified to "any" |
| response_format | Special conversion | Structured output via tool injection |
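
The simple renames in the table above can be sketched as a small lookup. This is an illustrative sketch only, not Bifrost's actual implementation: the function name `to_cohere_params` and the choice to skip unmapped keys are assumptions, and the structural conversions (tools, tool_choice, response_format, reasoning) are deliberately left out.

```python
from typing import Any

# Renames taken from the mapping table; everything else here is illustrative.
RENAMES = {
    "max_completion_tokens": "max_tokens",
    "top_p": "p",
    "stop": "stop_sequences",
}
PASS_THROUGH = {"temperature"}


def to_cohere_params(openai_params: dict[str, Any]) -> dict[str, Any]:
    """Rename or copy the simple scalar parameters; skip everything else."""
    cohere: dict[str, Any] = {}
    for key, value in openai_params.items():
        if key in RENAMES:
            cohere[RENAMES[key]] = value
        elif key in PASS_THROUGH:
            cohere[key] = value
    # OpenAI accepts a single stop string; the table lists an array on the Cohere side.
    if isinstance(cohere.get("stop_sequences"), str):
        cohere["stop_sequences"] = [cohere["stop_sequences"]]
    return cohere


converted = to_cohere_params(
    {"max_completion_tokens": 256, "top_p": 0.9, "stop": "END", "temperature": 0.3}
)
print(converted)
```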

API reference by operation

Gateway paths and upstream Cohere endpoints.

1) Chat Completions

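Primary request path. Maps to the upstream Cohere Chat API (/v2/chat). Cohere-specific fields such as top_k and safety_mode go directly in the request body at the gateway; in the Go SDK, pass them via extra_params.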

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 40,
    "safety_mode": "STRICT",
    "log_probs": true,
    "strict_tool_choice": false
  }'
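
For callers who prefer a script over curl, the same request can be assembled with the Python standard library. This is a minimal sketch that assumes the gateway runs at the local address used in the curl example; `build_chat_request` is a hypothetical helper name, and the request is built but not sent so the snippet runs without a gateway.

```python
import json
import urllib.request

# Assumed local gateway address, matching the curl example.
BIFROST_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt: str, **cohere_fields) -> urllib.request.Request:
    """Build (but do not send) a chat request with Cohere-specific fields in the body."""
    payload = {
        "model": "cohere/command-r-plus",
        "messages": [{"role": "user", "content": prompt}],
        **cohere_fields,
    }
    return urllib.request.Request(
        BIFROST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Hello", top_k=40, safety_mode="STRICT")
# To send it against a running gateway:
#   with urllib.request.urlopen(request) as resp:
#       print(json.load(resp))
```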

2) Responses API

The Responses API uses the same upstream Cohere /v2/chat endpoint with conversion between OpenAI Responses format and Cohere's format. Gateway: POST /v1/responses. See Responses API in Bifrost docs.

| Parameter | Transformation | Notes |
| --- | --- | --- |
| max_output_tokens | Renamed to max_tokens | |
| temperature | Direct pass-through | |
| top_p | Renamed to p | Cohere nucleus sampling field |
| instructions | Becomes system message | Prepended to messages |
| text.format | Converted to response_format | |
| tools / tool_choice | Same as Chat Completions | Function tools supported |
| reasoning | Mapped to thinking | effort → type; max_tokens → token_budget |
| stop | Via extra_params | Renamed to stop_sequences |
| top_k | Via extra_params | Cohere-specific |
| frequency_penalty, presence_penalty | Via extra_params | |

Input & instructions

  • String input converts to a user message; arrays convert to messages
  • instructions becomes a system message (prepended to the message list)
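
The two rules above can be sketched as one small function. This is a hedged illustration, not the gateway's code: `build_messages` is a hypothetical name, and array items are assumed to already carry `role` and `content` keys.

```python
from typing import Any, Optional, Union

Message = dict[str, Any]


def build_messages(
    input_value: Union[str, list[Message]],
    instructions: Optional[str] = None,
) -> list[Message]:
    """Turn Responses-style input and instructions into a chat message list."""
    if isinstance(input_value, str):
        # A bare string becomes a single user message.
        messages = [{"role": "user", "content": input_value}]
    else:
        # Assumption: array items already look like {"role": ..., "content": ...}.
        messages = [dict(item) for item in input_value]
    if instructions:
        # Instructions are prepended as a system message.
        messages.insert(0, {"role": "system", "content": instructions})
    return messages


print(build_messages("Hello, how are you?", instructions="Answer briefly."))
```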

Tool support

Supported tool type: function. Tool definitions and tool choice use the same conversions as Chat Completions.

Response conversion

  • text → message | tool_use → function_call
  • input_tokens / output_tokens preserved; cached tokens surfaced in token details when present

Streaming

  • Event sequence: message-start → content-start → content-delta → content-end → message-end
  • Tool call arguments accumulated across chunks; synthetic output_item.added events for text/reasoning
  • Stable item IDs: msg_{messageID}_item_{outputIndex}
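
The ID pattern and the argument accumulation described above can be illustrated as follows. The chunk shape (`index`, `arguments_delta`) and both function names are hypothetical and chosen only for this sketch; the real stream events carry more fields.

```python
import json
from collections import defaultdict


def item_id(message_id: str, output_index: int) -> str:
    """Format the stable item ID pattern msg_{messageID}_item_{outputIndex}."""
    return f"msg_{message_id}_item_{output_index}"


def accumulate_tool_arguments(chunks: list[dict]) -> dict[int, dict]:
    """Join argument fragments per output index, then parse each joined string as JSON."""
    buffers: dict[int, list[str]] = defaultdict(list)
    for chunk in chunks:
        buffers[chunk["index"]].append(chunk["arguments_delta"])
    return {index: json.loads("".join(parts)) for index, parts in buffers.items()}


# Hypothetical chunks: one tool call whose JSON arguments arrive in two pieces.
chunks = [
    {"index": 1, "arguments_delta": '{"city": "Par'},
    {"index": 1, "arguments_delta": 'is"}'},
]
print(item_id("abc123", 1))
print(accumulate_tool_arguments(chunks))
```

Parsing only after the fragments are joined matters because an individual fragment is usually not valid JSON on its own.
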

curl -X POST http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/command-r-plus",
    "input": "Hello, how are you?",
    "top_k": 40,
    "stop": [".", "!"]
  }'

3) Embeddings

Embeddings are served at POST /v1/embeddings and routed to upstream Cohere /v2/embed. The operation is non-streaming. See Embeddings in Bifrost docs.

| Parameter | Transformation | Notes |
| --- | --- | --- |
| input (text or array) | Converted to texts array | Single string or list of strings |
| dimensions | Renamed to output_dimension | Output vector size |
| input_type | Via extra_params | Required for v3+; defaults to search_document |
| embedding_types | Via extra_params | e.g. float, int8 |
| truncate | Via extra_params | How to handle long inputs (e.g. START) |
| max_tokens | Via extra_params | Max tokens to embed per input |

Critical notes

  • input_type is required for Cohere v3+ embedding models (Bifrost defaults to search_document when omitted)
  • Use embedding_types to choose return formats (e.g. float, int8)
  • Cohere-specific options (truncate, max_tokens) go in the request body at the gateway, or in extra_params when using the Go SDK
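
Taken together, the request-side rules for embeddings might look like the sketch below. It is an assumption-laden illustration rather than Bifrost's code: `to_cohere_embed_request` is a hypothetical name, and stripping the `cohere/` prefix before the upstream call is assumed, not stated in this guide.

```python
from typing import Any, Optional, Union


def to_cohere_embed_request(
    model: str,
    input_value: Union[str, list[str]],
    dimensions: Optional[int] = None,
    input_type: Optional[str] = None,
    **extra: Any,
) -> dict[str, Any]:
    """Build a Cohere-style embed body from OpenAI-style embedding arguments."""
    # A single string becomes a one-element texts array.
    texts = [input_value] if isinstance(input_value, str) else list(input_value)
    body: dict[str, Any] = {
        # Assumption: the provider prefix is removed before calling upstream.
        "model": model.split("/", 1)[-1],
        "texts": texts,
        # Default documented above for v3+ models when input_type is omitted.
        "input_type": input_type or "search_document",
    }
    if dimensions is not None:
        body["output_dimension"] = dimensions
    body.update(extra)  # e.g. embedding_types, truncate, max_tokens
    return body


body = to_cohere_embed_request(
    "cohere/embed-english-v3.0", "text to embed", dimensions=1024, truncate="START"
)
print(body)
```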

Response conversion

  • embeddings.float → data[].embedding
  • meta.tokens → usage information
  • Multiple embedding types are supported when requested

curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/embed-english-v3.0",
    "input": ["text to embed"],
    "dimensions": 1024,
    "input_type": "search_query",
    "embedding_types": ["float"],
    "truncate": "START"
  }'
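
The response conversion listed earlier in this section (embeddings.float to data[].embedding, meta.tokens to usage) can be sketched like this. The function name is hypothetical, and the `input_tokens` key inside meta.tokens is an assumption about the upstream payload; only the float embedding type is handled.

```python
from typing import Any


def to_openai_embeddings(cohere_response: dict[str, Any], model: str) -> dict[str, Any]:
    """Reshape a Cohere-style embed response into an OpenAI-style embeddings list."""
    vectors = cohere_response["embeddings"]["float"]
    # Assumed field name: meta.tokens.input_tokens; falls back to 0 when absent.
    tokens = cohere_response.get("meta", {}).get("tokens", {})
    input_tokens = tokens.get("input_tokens", 0)
    return {
        "object": "list",
        "model": model,
        "data": [
            {"object": "embedding", "index": i, "embedding": vector}
            for i, vector in enumerate(vectors)
        ],
        "usage": {"prompt_tokens": input_tokens, "total_tokens": input_tokens},
    }


sample = {
    "embeddings": {"float": [[0.1, 0.2], [0.3, 0.4]]},
    "meta": {"tokens": {"input_tokens": 6}},
}
result = to_openai_embeddings(sample, "cohere/embed-english-v3.0")
print(result["data"][1]["embedding"], result["usage"])
```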

4) List Models

Lists available Cohere models through the Bifrost gateway. Upstream: GET /v1/models?page_size={defaultPageSize}. No request body required. See List Models in Bifrost docs.

  • Model data is converted to the standard OpenAI-style models list format
  • Pagination is cursor-based via next_page_token
  • Optional filters endpoint and default_only are available via extra_params (Go SDK) or the request body (gateway)

curl http://localhost:8080/v1/models
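
Cursor-based pagination with next_page_token generally follows the loop below. This is a generic sketch, not Bifrost's implementation: `iter_models` and `fetch_page` are hypothetical names, the `models` response key is an assumption, and a dictionary of fake pages stands in for the upstream call so the loop runs without network access.

```python
from typing import Callable, Iterator, Optional


def iter_models(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield models from every page, following next_page_token until it is absent."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("models", [])
        token = page.get("next_page_token")
        if not token:
            break


# Stand-in for the upstream call: two pages, the second without a token.
FAKE_PAGES = {
    None: {"models": [{"name": "command-r-plus-04-2024"}], "next_page_token": "p2"},
    "p2": {"models": [{"name": "embed-english-v3.0"}]},
}

names = [model["name"] for model in iter_models(lambda token: FAKE_PAGES[token])]
print(names)
```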

Dropped parameters

The following OpenAI-style parameters are silently ignored when targeting Cohere through Bifrost:

[
  "logit_bias",
  "logprobs",
  "top_logprobs",
  "seed",
  "parallel_tool_calls",
  "service_tier"
]
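
Because these parameters are ignored without an error, a client-side check can make the loss visible before a request is sent. The helper below is a hypothetical sketch that only reuses the parameter names from the list above.

```python
# Parameter names copied from the dropped-parameter list above.
DROPPED_FOR_COHERE = {
    "logit_bias",
    "logprobs",
    "top_logprobs",
    "seed",
    "parallel_tool_calls",
    "service_tier",
}


def find_dropped_params(payload: dict) -> list[str]:
    """Return the payload keys that would be silently ignored, in sorted order."""
    return sorted(DROPPED_FOR_COHERE.intersection(payload))


ignored = find_dropped_params({"seed": 1, "temperature": 0.2, "logprobs": True})
print(ignored)
```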

Implementation caveats

| Caveat | Impact | Severity |
| --- | --- | --- |
| Tool choice limitations | Cannot force specific tool; simplified to "any" mode | Medium |
| Strict tool mode dropped | strict: true in tool definitions silently dropped | Low |
| Input type requirement | input_type required for embeddings v3+ models | Medium |
| Cache control stripped | Cache control directives removed during conversion | Low |
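
The first two caveats can be made concrete with a short sketch. Both function names are hypothetical, and mapping any value other than "auto" or "none" (including OpenAI's "required") to "any" is an assumption that extends the rule stated above for specific tools.

```python
from typing import Any, Union


def simplify_tool_choice(tool_choice: Union[str, dict[str, Any]]) -> str:
    """Keep "auto" and "none"; collapse a specific-tool choice to "any"."""
    if isinstance(tool_choice, str) and tool_choice in ("auto", "none"):
        return tool_choice
    # Assumption: anything else, including a named function, becomes "any".
    return "any"


def strip_strict(tool: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an OpenAI-style tool definition without function.strict."""
    function = {k: v for k, v in tool.get("function", {}).items() if k != "strict"}
    return {**tool, "function": function}


forced = {"type": "function", "function": {"name": "get_weather"}}
tool = {"type": "function", "function": {"name": "get_weather", "strict": True}}
print(simplify_tool_choice(forced))
print(strip_strict(tool))
```

A caller that depends on one specific tool being invoked should validate the returned tool name itself, since the forced choice does not survive conversion.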
