Calling DeepSeek through OpenAI Chat Completions

This document describes how to call Tencent TokenHub DeepSeek models through the OpenAI Chat Completions interface in AI Gateway. Tencent Cloud TokenHub currently supports only DeepSeek models, so every example here uses `deepseek-v4-flash` or `deepseek-v4-pro`.

OpenAI Chat Completions is the recommended default way to integrate DeepSeek. It suits ordinary chat, complex reasoning, streaming output, JSON output, and tool calling.

1. Request URL

POST https://cn-shanghai-alicloud-aimesh.api.clickzetta.com/gateway/v1/chat/completions

Request headers:

    Authorization: Bearer <API_KEY>
    Content-Type: application/json
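As a minimal sketch of how the URL and headers above fit together, the following Python snippet assembles the request with the standard library only. The names `GATEWAY_BASE_URL` and `build_chat_request` are hypothetical helpers introduced for illustration; the snippet builds the request object without sending it.

```python
import json
import urllib.request

# Base URL taken from the request URL above (assumed to be the part before /chat/completions).
GATEWAY_BASE_URL = "https://cn-shanghai-alicloud-aimesh.api.clickzetta.com/gateway/v1"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{GATEWAY_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; keeping construction separate makes the headers and body easy to inspect first.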

2. Basic chat

    curl -X POST "https://cn-shanghai-alicloud-aimesh.api.clickzetta.com/gateway/v1/chat/completions" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "deepseek-v4-flash",
        "messages": [
          { "role": "user", "content": "Please introduce large language models." }
        ],
        "max_tokens": 1024,
        "thinking": { "type": "disabled" }
      }'

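3. System Prompt

Use the `system` role to set the model's behavior, answer boundaries, and output format. The example below assumes `AI_GATEWAY_BASE_URL` is set to the gateway base URL, `https://cn-shanghai-alicloud-aimesh.api.clickzetta.com/gateway/v1`.

    curl -X POST "$AI_GATEWAY_BASE_URL/chat/completions" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "deepseek-v4-pro",
        "messages": [
          { "role": "system", "content": "You are a rigorous technical analysis assistant. State the conclusion first, then the evidence." },
          { "role": "user", "content": "Please analyze how AI Gateway routing strategies affect stability." }
        ],
        "temperature": 0.3,
        "max_tokens": 2048,
        "thinking": { "type": "enabled" }
      }'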

4. Multi-turn conversation

A multi-turn conversation must pass the message history along in `messages`.

    {
      "model": "deepseek-v4-flash",
      "messages": [
        { "role": "system", "content": "You are a friendly AI assistant." },
        { "role": "user", "content": "My name is Xiao Ming and I like playing basketball." },
        { "role": "assistant", "content": "Hello, Xiao Ming! Basketball is a great sport." },
        { "role": "user", "content": "Do you still remember my name and hobby?" }
      ],
      "max_tokens": 1024,
      "thinking": { "type": "disabled" }
    }

In an ordinary multi-turn conversation, writing the previous `assistant` message back into the history usually requires only `content`; `reasoning_content` does not need to be written back.
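The write-back rule above can be sketched as a small helper. `append_assistant_turn` is a hypothetical name, and the sketch assumes the assistant message is a plain dict shaped like `choices[0].message`; it covers ordinary chat turns only, not tool-calling turns.

```python
from typing import Any


def append_assistant_turn(
    history: list[dict[str, Any]], message: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return a new history with the assistant reply appended.

    Only `role` and `content` are kept; `reasoning_content` is dropped.
    """
    turn = {"role": "assistant", "content": message.get("content") or ""}
    return [*history, turn]


history = [{"role": "user", "content": "My name is Xiao Ming."}]
reply = {
    "role": "assistant",
    "content": "Hello, Xiao Ming!",
    "reasoning_content": "The user introduced themselves.",
}
history = append_assistant_turn(history, reply)
history.append({"role": "user", "content": "Do you remember my name?"})
```

The resulting `history` can be sent as `messages` in the next request.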

5. Request fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | string | Yes | Model name, for example `deepseek-v4-flash` or `deepseek-v4-pro`. |
| `messages` | array | Yes | Conversation messages, in chronological order. |
| `stream` | boolean | No | Whether to enable SSE streaming output. Defaults to `false`. |
| `stream_options` | object | No | Streaming options. Commonly `{"include_usage": true}`; takes effect only when `stream=true`. |
| `temperature` | number | No | Sampling temperature. The usual TokenHub range is `0.0` to `2.0`; higher values make output more random. |
| `top_p` | number | No | Nucleus sampling parameter. The usual TokenHub range is `0.0` to `1.0`. Adjust either this or `temperature`, not both. |
| `max_tokens` | integer | No | Maximum number of output tokens. For thinking models, reasoning tokens and answer tokens share this budget, so raise it when thinking is enabled. |
| `n` | integer | No | Number of candidate replies. Production environments usually keep the default `1`; `n > 1` is billed on the total token count. |
| `stop` | string / array | No | Stop sequences. |
| `presence_penalty` | number | No | Deprecated for DeepSeek calls; passing it usually has no effect. |
| `frequency_penalty` | number | No | Deprecated for DeepSeek calls; passing it usually has no effect. |
| `response_format` | object | No | Structured output configuration, for example `{"type": "json_object"}`. |
| `tools` | array | No | List of tool definitions. |
| `tool_choice` | string / object | No | Tool selection strategy, for example `auto`, `none`, `required`, or a specific tool. |
| `parallel_tool_calls` | boolean | No | Whether parallel tool calls are allowed. The default is usually `true`. |
| `thinking` | object | No | DeepSeek thinking mode control, commonly `{"type": "enabled"}` or `{"type": "disabled"}`. |
| `thinking.reasoning_effort` | string | No | Reasoning depth. Complex tasks can use `high`; check the model detail page for other supported values. |

DeepSeek parameter limits:

- Enabling `thinking.type=enabled` together with `response_format.type=json_object` is not recommended.
- With thinking mode enabled, responses may take longer, so pairing it with `stream=true` is recommended.
- `max_tokens` caps both the thinking process and the final answer, so raise it when thinking mode is enabled.
- `frequency_penalty` and `presence_penalty` are deprecated for DeepSeek; passing them usually has no effect.
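These recommendations can be checked on the client before a request is sent. The sketch below is one possible approach, not part of the gateway: `check_deepseek_params` is a hypothetical helper that only returns warning strings and never modifies the payload. It skips the `max_tokens` recommendation because no concrete threshold is documented.

```python
def check_deepseek_params(payload: dict) -> list[str]:
    """Return human-readable warnings for the parameter recommendations listed above."""
    warnings: list[str] = []
    thinking_on = (payload.get("thinking") or {}).get("type") == "enabled"
    json_mode = (payload.get("response_format") or {}).get("type") == "json_object"

    if thinking_on and json_mode:
        warnings.append("thinking.type=enabled with response_format.type=json_object is not recommended")
    if thinking_on and not payload.get("stream", False):
        warnings.append("thinking mode may respond slowly; consider stream=true")
    for name in ("frequency_penalty", "presence_penalty"):
        if name in payload:
            warnings.append(f"{name} is deprecated for DeepSeek and usually has no effect")
    return warnings


payload = {
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Return a JSON summary."}],
    "thinking": {"type": "enabled"},
    "response_format": {"type": "json_object"},
}
for warning in check_deepseek_params(payload):
    print("warning:", warning)
```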

6. Streaming output

When a reasoning model produces long output, enabling streaming is recommended.

    curl -N -X POST "$AI_GATEWAY_BASE_URL/chat/completions" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "deepseek-v4-pro",
        "messages": [
          { "role": "user", "content": "Please analyze the retrieval pipeline of a RAG system step by step." }
        ],
        "max_tokens": 2048,
        "stream": true,
        "stream_options": { "include_usage": true },
        "thinking": { "type": "enabled", "reasoning_effort": "high" }
      }'

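Notes:

- Streaming responses are returned as SSE, so the client must read them chunk by chunk.
- With thinking mode enabled, streaming deltas may contain both `reasoning_content` and `content`.
- If the application does not display the thinking process, render only the final answer content.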
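To illustrate the notes above, here is a minimal sketch that separates the thinking process from the final answer while reading SSE lines. It assumes the gateway follows the usual OpenAI streaming shape (`data: {json}` lines, a `choices[].delta` object, and a closing `data: [DONE]` line); `split_stream` is a hypothetical helper, and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Optional


def split_stream(lines: Iterable[str]) -> tuple[str, str, Optional[dict]]:
    """Collect reasoning text, answer text, and usage from SSE data lines."""
    reasoning: list[str] = []
    answer: list[str] = []
    usage: Optional[dict] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(reasoning), "".join(answer), usage


sample = [
    'data: {"choices":[{"index":0,"delta":{"reasoning_content":"Plan the answer. "}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Recall comes first."}}]}',
    "data: [DONE]",
]
thinking_text, answer_text, _ = split_stream(sample)
print(answer_text)  # render only the final answer
```

An application that hides the thinking process would simply discard the first return value.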

7. Python SDK

    from openai import OpenAI

    client = OpenAI(
        api_key="<your-api-key>",
        base_url="https://cn-shanghai-alicloud-aimesh.api.clickzetta.com/gateway/v1",
    )

    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "user", "content": "Please explain the difference between gateway retries and application retries."}
        ],
        max_tokens=1024,
        temperature=0.3,
        # `thinking` is not a standard SDK argument, so it is passed through extra_body.
        extra_body={"thinking": {"type": "disabled"}},
    )

    print(response.choices[0].message.content)

Streaming example:

    from openai import OpenAI

    client = OpenAI(
        api_key="<your-api-key>",
        base_url="https://cn-shanghai-alicloud-aimesh.api.clickzetta.com/gateway/v1",
    )

    stream = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": "Please analyze the value of AI Gateway."}],
        max_tokens=2048,
        stream=True,
        extra_body={"thinking": {"type": "enabled"}},
    )

    for chunk in stream:
        # Some chunks carry no choices; skip them.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # Print only the final answer text, not the thinking process.
        content = getattr(delta, "content", None)
        if content:
            print(content, end="", flush=True)

8. Node.js SDK

    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.API_KEY,
      baseURL: "https://cn-shanghai-alicloud-aimesh.api.clickzetta.com/gateway/v1",
    });

    const completion = await client.chat.completions.create({
      model: "deepseek-v4-flash",
      messages: [
        {
          role: "user",
          content: "Please explain the difference between an API Gateway and an AI Gateway in three sentences.",
        },
      ],
      max_tokens: 1024,
      temperature: 0.3,
      // `thinking` is a DeepSeek-specific field outside the standard OpenAI types;
      // TypeScript projects may need a type cast here.
      thinking: { type: "disabled" },
    });

    console.log(completion.choices[0].message.content);

9. Response fields

A typical non-streaming response:

    {
      "id": "chatcmpl-xxx",
      "object": "chat.completion",
      "created": 1710000000,
      "model": "deepseek-v4-flash",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "AI Gateway centrally manages model calls, authentication, routing, and usage statistics."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 32,
        "completion_tokens": 24,
        "total_tokens": 56
      }
    }

| Field | Description |
| --- | --- |
| `choices[0].message.content` | The final answer content. |
| `choices[0].message.reasoning_content` | The thinking process, which may be returned when thinking mode is enabled. |
| `finish_reason` | Why generation ended, for example `stop`, `length`, or `tool_calls`. |
| `usage.prompt_tokens` | Number of input tokens. |
| `usage.completion_tokens` | Number of output tokens; may include thinking tokens. |
| `usage.total_tokens` | Total number of tokens. |
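As a rough sketch of how these fields might be consumed, the helper below pulls the answer, the optional thinking process, and token usage out of a response dict shaped like the example above. `summarize_response` is a hypothetical name; treating `finish_reason == "length"` as a truncated answer follows the usual OpenAI convention and is an assumption here.

```python
def summarize_response(resp: dict) -> dict:
    """Extract the commonly used fields from a non-streaming chat completion response."""
    choice = resp["choices"][0]
    message = choice.get("message") or {}
    usage = resp.get("usage") or {}
    finish_reason = choice.get("finish_reason")
    return {
        "content": message.get("content"),
        # Present only when thinking mode returned a thinking process.
        "reasoning": message.get("reasoning_content"),
        "finish_reason": finish_reason,
        # Assumed meaning: output stopped because the max_tokens budget ran out.
        "truncated": finish_reason == "length",
        "total_tokens": usage.get("total_tokens"),
    }


example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "AI Gateway centrally manages model calls."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 32, "completion_tokens": 24, "total_tokens": 56},
}
summary = summarize_response(example)
```

When `truncated` is true with thinking mode enabled, raising `max_tokens` is the first thing to try, since reasoning and answer tokens share that budget.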
