Text Generation
General Chat API
POST
Documentation Index
Fetch the complete documentation index at: https://docs.qingbo.dev/llms.txt
Use this file to discover all available pages before exploring further.
- Unified chat API supporting all major text generation models
- Fully compatible with the OpenAI Chat Completions API format
- Switch between AI models seamlessly via the
modelparameter - Supports streaming, function calling, and other advanced features
Authorizations
All endpoints require Bearer Token authentication.Get your API Key:Visit the API Key management page to obtain your API Key.Add it to the request header:
Body
Model name.Supported models include:
- OpenAI:
gpt-5,gpt-5-mini,gpt-5-nano,gpt-5.1,gpt-5.2,gpt-5.4,gpt-4.1,gpt-4.1-mini,gpt-4.1-nano,gpt-4o,gpt-4o-mini,o1,o3-mini,o4-mini - Anthropic:
claude-opus-4.6,claude-sonnet-4.6,claude-opus-4.5,claude-sonnet-4.5,claude-haiku-4.5 - Google:
gemini-2.5-pro,gemini-2.5-flash,gemini-2.5-flash-lite,gemini-3-flash-preview,gemini-3.1-pro-preview - DeepSeek:
deepseek-r1-0528,deepseek-v3.2,deepseek-v3.2-exp - MiniMax:
minimax-m2.1,minimax-m2.5 - MoonshotAI:
kimi-k2.5,kimi-k2-thinking - Z.ai:
glm-4.6,glm-4.7,glm-5,glm-5.1 - More models added regularly…
List of conversation messages.
Controls output randomness, range 0–2.
- Lower values (e.g., 0.2) make output more deterministic.
- Higher values (e.g., 1.8) make output more random.
Maximum number of tokens to generate.The maximum allowed value varies by model — refer to the specific model documentation.
Whether to use streaming output.
true: Stream the response as Server-Sent Events (SSE).false: Return the full response in one go.
Nucleus sampling parameter, range 0–1.Controls diversity. We recommend using either
temperature or top_p, not both.Default: 1.0Frequency penalty, range -2.0 to 2.0.Positive values reduce the likelihood of repeating the same words.Default: 0
Presence penalty, range -2.0 to 2.0.Positive values increase the likelihood of introducing new topics.Default: 0
Stop sequences.Up to 4 sequences. Generation stops when any of them is encountered.
Number of completions to generate.Default: 1
Response
Unique identifier of the response.
Object type, always
chat.completion.Creation timestamp.
Name of the model that actually served the request.
List of generated completions.
Token usage statistics.
System fingerprint (used to track backend configuration).
Supported Models
OpenAI series
gpt-5— GPT-5 base modelgpt-5-mini— GPT-5 lightweightgpt-5-nano— GPT-5 ultra-lightweightgpt-5.1— GPT-5.1gpt-5.2— GPT-5.2gpt-5.4— GPT-5.4gpt-4.1— GPT-4.1gpt-4.1-mini— GPT-4.1 lightweightgpt-4.1-nano— GPT-4.1 ultra-lightweightgpt-4o— GPT-4o multimodal modelgpt-4o-mini— GPT-4o lightweighto1— OpenAI o1 reasoning modelo3-mini— OpenAI o3 Minio4-mini— OpenAI o4 Mini
Anthropic series
claude-opus-4.6— Claude 4.6 Opus, latest flagshipclaude-sonnet-4.6— Claude 4.6 Sonnet, latest versionclaude-opus-4.5— Claude 4.5 Opus flagshipclaude-sonnet-4.5— Claude 4.5 Sonnet, balancedclaude-haiku-4.5— Claude 4.5 Haiku, fast response
Google series
gemini-3.1-pro-preview— Gemini 3.1 Pro previewgemini-3-flash-preview— Gemini 3 Flash previewgemini-2.5-pro— Gemini 2.5 Progemini-2.5-flash— Gemini 2.5 Flashgemini-2.5-flash-lite— Gemini 2.5 Flash Lite
DeepSeek series
deepseek-r1-0528— DeepSeek R1 reasoning modeldeepseek-v3.2— DeepSeek V3.2deepseek-v3.2-exp— DeepSeek V3.2 experimental
MiniMax series
minimax-m2.1— MiniMax M2.1minimax-m2.5— MiniMax M2.5
MoonshotAI series
kimi-k2.5— Kimi K2.5kimi-k2-thinking— Kimi K2 Thinking
Z.ai series
glm-4.6— GLM 4.6glm-4.7— GLM 4.7glm-5— GLM 5glm-5.1— GLM 5.1
Examples
Basic chat
System prompt
Multi-turn conversation
Streaming output
Advanced features
This endpoint covers basic text chat. For advanced features, see:
- Image / video analysis — Multimodal Responses API
- Function calling — Multimodal Responses API
- Web search, file search — Multimodal Responses API
- Streaming output — Streaming guide