SGLang provider summary
SGLang serves models with an OpenAI-compatible API. Bifrost routes requests through the OpenAI provider layer with streaming (SSE), tool calling, embeddings, and filtered parameters for SGL compatibility.
Key features:
- OpenAI API compatibility — identical request/response format
- Full streaming support with usage tracking
- Tool calling — function definitions and execution
- Text embeddings for vector generation
- Parameter filtering — unsupported OpenAI fields removed automatically
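Streamed responses arrive as server-sent events. The following is a minimal client-side sketch of how the streamed text and the usage block might be collected. It assumes the OpenAI chunk format: each event is a `data: <json>` line, the stream ends with `data: [DONE]`, and usage, when the server reports it, appears as a `usage` object on a chunk. The name `collect_stream` is hypothetical and not part of Bifrost or SGLang.

```python
import json


def collect_stream(lines):
    """Accumulate text and usage from OpenAI-style SSE lines (illustrative only)."""
    text_parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
        if chunk.get("usage"):
            usage = chunk["usage"]  # assumed to ride on a late chunk
    return "".join(text_parts), usage
```

The function takes any iterable of decoded lines, so it can be fed from an HTTP client's line iterator or from a list in a test.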
| Property | Details |
|---|---|
| Description | OpenAI-compatible local/remote inference engine. |
| Provider route on Bifrost | sgl/<model> |
| Typical endpoint | http://localhost:8000 |
| Supported endpoints | /v1/chat/completions, /v1/responses, /v1/completions, /v1/embeddings, /v1/models |
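The `sgl/<model>` route combines a provider prefix with a model name that may itself contain slashes, as in `sgl/meta-llama/Llama-3.1-8B-Instruct`. A minimal sketch of how such a string could be split, assuming the provider is everything before the first slash (the helper `split_route` is hypothetical and does not come from Bifrost):

```python
def split_route(model: str) -> tuple[str, str]:
    """Split '<provider>/<model>' on the first slash only (hypothetical helper)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected '<provider>/<model>', got {model!r}")
    return provider, name
```

Splitting on the first slash only keeps organization-qualified model names such as `meta-llama/Llama-3.1-8B-Instruct` intact.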
Supported operations
Bifrost delegates SGLang requests to the OpenAI provider implementation. Chat Completions, the Responses API, and Text Completions support streaming; Embeddings and List Models do not. Image Generation, Speech, Transcriptions, Files, and Batch return UnsupportedOperationError. SGL is typically self-hosted, so set BaseURL to your instance (e.g. http://localhost:8000). See Supported operations in Bifrost docs.
| Operation | Non-streaming | Streaming | Upstream endpoint |
|---|---|---|---|
| Chat Completions | Yes | Yes | /v1/chat/completions |
| Responses API | Yes | Yes | /v1/chat/completions |
| Text Completions | Yes | Yes | /v1/completions |
| Embeddings | Yes | — | /v1/embeddings |
| List Models | Yes | — | /v1/models |
| Image Generation | No | No | - |
| Speech (TTS) | No | No | - |
| Transcriptions (STT) | No | No | - |
| Files | No | No | - |
| Batch | No | No | - |
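The table above can be expressed as a lookup that a client checks before sending a request. This is only a sketch: the operation keys, the `check_supported` helper, and the local `UnsupportedOperationError` class are illustrative stand-ins, not Bifrost's own types. It also treats a streaming request for a non-streaming operation as unsupported, which is an assumption.

```python
# Mirrors the support table: operation -> (non_streaming, streaming)
SUPPORT = {
    "chat_completions": (True, True),
    "responses": (True, True),
    "text_completions": (True, True),
    "embeddings": (True, False),
    "list_models": (True, False),
    "image_generation": (False, False),
    "speech": (False, False),
    "transcriptions": (False, False),
    "files": (False, False),
    "batch": (False, False),
}


class UnsupportedOperationError(Exception):
    """Local stand-in for the error Bifrost reports (illustrative only)."""


def check_supported(operation: str, stream: bool = False) -> None:
    """Raise if the SGL provider does not offer the operation in this mode."""
    non_streaming, streaming = SUPPORT[operation]
    allowed = streaming if stream else non_streaming
    if not allowed:
        mode = "streaming" if stream else "non-streaming"
        raise UnsupportedOperationError(f"{operation} ({mode}) is not supported by sgl")
```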
BaseURL configuration
SGL requires BaseURL pointing at your SGLang server. Requests fail without it (validated in NewSGLProvider). Use http://localhost:8000 for local deployments or https://sgl.example.com for remote instances.
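A client-side check can catch a missing BaseURL before any request is sent. The sketch below is an assumption about what a reasonable check looks like; the source only states that NewSGLProvider rejects a missing BaseURL, so the scheme check and trailing-slash trimming here are extras, and `validate_base_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def validate_base_url(base_url: str) -> str:
    """Return a normalized BaseURL or raise if it is missing or malformed."""
    parsed = urlparse(base_url or "")
    # Stricter than "non-empty": also require an http(s) scheme and a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("SGL requires a BaseURL such as http://localhost:8000")
    return base_url.rstrip("/")
```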
# Example: route chat through Bifrost to a local SGL server
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "sgl/meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Hello"}]
}'
API reference
OpenAI-compatible endpoints routed to your SGL instance via Bifrost.
1) Chat Completions
Primary request path. Maps to upstream /v1/chat/completions. SGL accepts the standard OpenAI chat completion parameters, except the ones listed under Filtered parameters. For full parameter reference, see OpenAI Chat Completions and SGL Chat Completions in Bifrost docs.
Filtered parameters
Removed for SGL compatibility:
| Parameter | Reason | Notes |
|---|---|---|
| prompt_cache_key | Not supported | Removed for SGL compatibility |
| verbosity | Anthropic-specific | Removed for SGL compatibility |
| store | Not supported | Removed for SGL compatibility |
| service_tier | OpenAI-specific | Removed for SGL compatibility |
SGL supports standard OpenAI message types, tools, responses, and streaming formats. Cache control directives are stripped from messages during JSON marshaling.
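The effect of the filtering and cache-control stripping can be illustrated on a plain request dictionary. This is a sketch of the behavior described above, not Bifrost's implementation: the helper name `prepare_for_sgl` is hypothetical, and it assumes `cache_control` can appear either on a message or on an individual content part.

```python
import copy

# The four parameters listed in the Filtered parameters table.
FILTERED = ("prompt_cache_key", "verbosity", "store", "service_tier")


def prepare_for_sgl(request: dict) -> dict:
    """Return a copy of the request without filtered fields or cache_control."""
    cleaned = {k: copy.deepcopy(v) for k, v in request.items() if k not in FILTERED}
    for message in cleaned.get("messages", []):
        message.pop("cache_control", None)  # assumed message-level placement
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict):
                    part.pop("cache_control", None)  # assumed part-level placement
    return cleaned
```

The input is copied before modification, so the caller's original request is left unchanged.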
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "sgl/meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Hello"}],
"stream": true
}'
2) Responses API
Bifrost serves the Responses API by falling back to Chat Completions with format conversion: requests are sent upstream to /v1/chat/completions on your SGL instance. Parameter support is the same as for Chat Completions. See Responses API in Bifrost docs.
ResponsesRequest → ChatRequest → Response conversion
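The request half of that conversion can be sketched as follows. The field mapping is an assumption made for illustration (`input` becomes a user message, `instructions` becomes a system message, `max_output_tokens` becomes `max_tokens`); Bifrost's actual conversion may differ, and `responses_to_chat` is a hypothetical name.

```python
def responses_to_chat(req: dict) -> dict:
    """Convert a Responses-style request into a Chat Completions-style one."""
    raw = req.get("input", "")
    if isinstance(raw, str):
        messages = [{"role": "user", "content": raw}]
    else:
        # Assumed shape: a list of {"role": ..., "content": ...} items.
        messages = [{"role": item["role"], "content": item["content"]} for item in raw]
    if req.get("instructions"):
        messages.insert(0, {"role": "system", "content": req["instructions"]})

    chat = {"model": req["model"], "messages": messages}
    if "max_output_tokens" in req:
        chat["max_tokens"] = req["max_output_tokens"]  # assumed mapping
    if "stream" in req:
        chat["stream"] = req["stream"]
    return chat
```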
curl -X POST http://localhost:8080/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "sgl/meta-llama/Llama-3.1-8B-Instruct",
"input": "Hello",
"max_output_tokens": 1024
}'
3) Text Completions
Legacy text completion format at /v1/completions. Supports streaming. See Text Completions in Bifrost docs.
| Parameter | Mapping | Notes |
|---|---|---|
| prompt | Direct pass-through | |
| max_tokens | max_tokens | |
| temperature | Direct pass-through | |
| top_p | Direct pass-through | |
| frequency_penalty | Supported | |
| presence_penalty | Supported | |
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "sgl/meta-llama/Llama-3.1-8B-Instruct",
"prompt": "Hello, my name is",
"max_tokens": 50
}'
4) Embeddings
Text embeddings at /v1/embeddings — no streaming. Response returns embedding vectors with usage information. See Embeddings in Bifrost docs.
| Parameter | Notes |
|---|---|
| input | Text or array of texts |
| model | Embedding model name |
| encoding_format | "float" or "base64" |
| dimensions | Model-specific dimension count |
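When `encoding_format` is "base64", the client has to decode the vector itself. The sketch below assumes the OpenAI convention of a base64 string wrapping little-endian 32-bit floats; whether a given SGL deployment follows that convention should be verified. `decode_embedding` is a hypothetical helper.

```python
import base64
import struct


def decode_embedding(value):
    """Return a list of floats from either a float list or a base64 string."""
    if isinstance(value, list):
        return [float(x) for x in value]  # encoding_format="float"
    raw = base64.b64decode(value)
    count = len(raw) // 4  # 4 bytes per float32 (assumed layout)
    return list(struct.unpack(f"<{count}f", raw))
```

If the payload length is not a multiple of four bytes, `struct.unpack` raises `struct.error`, which signals that the layout assumption does not hold.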
curl -X POST http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "sgl/your-embedding-model",
"input": "Hello world"
}'
5) List Models
GET /v1/models — lists available models from your SGL server with capabilities. No request parameters required. See List Models in Bifrost docs.
curl http://localhost:8080/v1/models
Unsupported features
These operations are not offered by the upstream SGL API. Bifrost returns UnsupportedOperationError. See Unsupported features in Bifrost docs.
| Feature | Reason |
|---|---|
| Speech/TTS | Not offered by SGL API |
| Transcription/STT | Not offered by SGL API |
| Batch operations | Not offered by SGL API |
| File management | Not offered by SGL API |
| Image generation | Not offered by SGL API |
Implementation caveats
| Caveat | Impact | Severity |
|---|---|---|
| BaseURL configuration required | Requests fail without explicit BaseURL (validated in NewSGLProvider) | High |
| Cache control stripped | Cache control directives removed from messages; prompt caching does not work | Medium |
| Parameter filtering | prompt_cache_key, verbosity, store, service_tier removed via filterOpenAISpecificParameters | Low |
| User field size limit | User identifiers longer than 64 characters are silently dropped (SanitizeUserField) | Low |
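Because an over-long user identifier is dropped silently, a client may want to detect the case before sending. A minimal sketch of the described behavior, assuming the limit is counted in characters (the real SanitizeUserField may count bytes) and using a hypothetical helper name:

```python
MAX_USER_LENGTH = 64  # limit stated in the caveats table


def sanitize_user_field(request: dict) -> dict:
    """Return a copy of the request, omitting 'user' if it exceeds the limit."""
    user = request.get("user")
    if isinstance(user, str) and len(user) > MAX_USER_LENGTH:
        return {k: v for k, v in request.items() if k != "user"}
    return dict(request)
```

Comparing the result with the original request shows whether the identifier survived, which helps when per-user attribution matters downstream.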
Authoritative references
- Bifrost SGLang provider reference: docs.getbifrost.ai/providers/supported-providers/sgl
- SGLang project: github.com/sgl-project/sglang
- Bifrost provider support overview: docs.getbifrost.ai/providers/supported-providers/overview