DeepSeek

deepseek/deepseek-v4-flash

1M context · $0.1120 / M input tokens · $0.2240 / M output tokens

DeepSeek V4 Flash is a DeepSeek model route on OurToken for developers who need a cost-efficient option for chat, coding, summarization, long-context prompts, and high-volume assistant workloads.

Pricing

Pay-per-use

No upfront costs, pay only for what you use

80% of official price
        Official price     OurToken price
Input   $0.14 / M tokens   $0.1120 / M tokens
Output  $0.28 / M tokens   $0.2240 / M tokens

API Usage

API Access Guide

Base URL: https://api.ourtoken.ai/v1
API Endpoint: chat/completions
Full URL: https://api.ourtoken.ai/v1/chat/completions
Model ID: deepseek-v4-flash

Code examples

Use the OurToken API endpoint for this model. The examples below use direct HTTP requests and the recommended endpoint for the model family.

curl https://api.ourtoken.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "max_tokens": 256
  }'
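
The same request can be sketched in Python with only the standard library. This is a minimal illustration rather than an official client: the helper name build_chat_request is hypothetical, and the URL, headers, and body simply mirror the curl example above. The function builds the request without sending it, so it can be inspected without an API key.

```python
import json
import urllib.request

API_URL = "https://api.ourtoken.ai/v1/chat/completions"  # Full URL from the access guide
MODEL_ID = "deepseek-v4-flash"


def build_chat_request(api_key: str, prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request("YOUR_API_KEY", "Hello!")
    # Sending requires a real key and network access:
    # with urllib.request.urlopen(request, timeout=30) as response:
    #     print(json.load(response)["choices"][0]["message"]["content"])
    print(request.get_method(), request.full_url)
```

Keeping construction separate from sending makes it easy to log or unit-test the exact body before any tokens are billed.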

Chat Completions API Reference

Create a chat response with the OpenAI Chat Completions-compatible endpoint. Set https://api.ourtoken.ai/v1 as the SDK base URL and send requests as POST /chat/completions.

Authorization

Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

Request Body

Field            Type             Required  Description
model            string           Required  Model ID to call.
messages         array<object>    Required  Conversation messages sent to the model.
max_tokens       integer          Optional  Maximum number of output tokens.
temperature      number           Optional  Sampling temperature.
top_p            number           Optional  Nucleus sampling parameter.
stream           boolean          Optional  Whether to return a streaming response.
stream_options   object           Optional  Additional options for streaming responses.
tools            array<object>    Optional  Tools available to the model.
tool_choice      string | object  Optional  Controls how the model selects tools.
response_format  object           Optional  Controls structured output, such as JSON object responses.
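
One way to assemble a request body from these fields is sketched below. The helper build_payload and its json_mode flag are hypothetical names, and the {"type": "json_object"} value is an assumption borrowed from the OpenAI convention; confirm the accepted shape for this route before relying on it.

```python
import json
from typing import Any, Optional


def build_payload(
    messages: list[dict[str, Any]],
    model: str = "deepseek-v4-flash",
    *,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    stream: Optional[bool] = None,
    json_mode: bool = False,
) -> dict[str, Any]:
    """Return a request body with the required fields plus any options that were set."""
    if not messages:
        # Local guard only: the field list marks messages as required.
        raise ValueError("messages is required and must not be empty")
    payload: dict[str, Any] = {"model": model, "messages": messages}
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    # Leave unset options out of the body so the server applies its own defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    if json_mode:
        # Assumption: OpenAI-style value for JSON object responses.
        payload["response_format"] = {"type": "json_object"}
    return payload


body = build_payload(
    [{"role": "user", "content": "Reply with a JSON object containing a greeting."}],
    max_tokens=256,
    json_mode=True,
)
print(json.dumps(body, indent=2))
```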

Response Body

Field                                      Type               Required  Description
id                                         string             Required  Unique chat completion identifier.
object                                     "chat.completion"  Required  Object type returned by the Chat Completions API.
created                                    integer            Required  Unix timestamp when the response was created.
model                                      string             Required  Model that produced the response.
choices                                    array<object>      Required  Candidate responses returned by the model.
choices[].message.role                     string             Required  Role of the returned chat message.
choices[].message.content                  string             Optional  Text content in the returned chat message.
choices[].finish_reason                    string             Optional  Reason generation stopped.
usage                                      object             Optional  Token usage information for the chat completion.
usage.prompt_tokens                        integer            Optional  Input token count.
usage.completion_tokens                    integer            Optional  Output token count.
usage.total_tokens                         integer            Optional  Total token count.
usage.prompt_tokens_details                object             Optional  Breakdown of input token usage.
usage.prompt_tokens_details.cached_tokens  integer            Optional  Tokens served from cache.
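
Because several response fields are optional, client code should read them defensively. The sketch below parses a response into a small record; the ChatResult type and parse_chat_response helper are hypothetical, and the sample JSON is made up to match the field list above rather than captured from a real call.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatResult:
    content: Optional[str]
    finish_reason: Optional[str]
    prompt_tokens: int
    completion_tokens: int
    cached_tokens: int


def parse_chat_response(raw: str) -> ChatResult:
    """Extract the first choice and token usage, tolerating missing optional fields."""
    data = json.loads(raw)
    choice = data["choices"][0]  # choices is listed as required
    usage = data.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return ChatResult(
        content=choice["message"].get("content"),
        finish_reason=choice.get("finish_reason"),
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
        cached_tokens=details.get("cached_tokens", 0),
    )


# Made-up sample shaped like the field list; not output from a real request.
SAMPLE = """{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "deepseek-v4-flash",
  "choices": [{"message": {"role": "assistant", "content": "Hi there."}, "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 4, "total_tokens": 16,
            "prompt_tokens_details": {"cached_tokens": 0}}
}"""

result = parse_chat_response(SAMPLE)
print(result.content, result.finish_reason, result.prompt_tokens)
```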

Model Introduction

DeepSeek deepseek-v4-flash

DeepSeek V4 Flash gives teams a lower-cost DeepSeek V4 route for application work where responsiveness, predictable pricing, and simple API integration matter. Use the DeepSeek V4 Flash API when you want to test DeepSeek workflows through the OurToken unified API while keeping model IDs, usage logs, cache costs, and price review in one dashboard.

Why It Looks Great

  • 80% of the official DeepSeek V4 Flash reference price for input and output tokens.
  • OpenAI-compatible API setup through the same OurToken endpoint used by other supported models.
  • Clear cache read and cache write pricing for repeated-context prompts and long conversation workloads.
  • Useful for evaluating cost-sensitive chat, coding, summarization, and assistant workflows without separate provider-specific integration.
  • Dashboard logs and usage visibility help teams review request cost after launch.

Key Features

  • Model ID: deepseek-v4-flash
  • Input price: $0.1120 per 1M tokens on OurToken
  • Output price: $0.2240 per 1M tokens on OurToken
  • Cache read price: $0.0020 per 1M tokens on OurToken
  • Cache write price: $0 per 1M tokens on OurToken
  • Provider: DeepSeek

Specifications

Provider: DeepSeek
Model Type: Large Language Model (LLM)
Model ID: deepseek-v4-flash
Context Length: 1M tokens
Max Output: 384K tokens
OurToken Input Price: $0.1120 / 1M tokens
OurToken Output Price: $0.2240 / 1M tokens
OurToken Cache Read Price: $0.0020 / 1M tokens
OurToken Cache Write Price: $0 / 1M tokens
Official Input Reference: $0.14 / 1M tokens
Official Output Reference: $0.28 / 1M tokens
Official Cache Read Reference: $0.0028 / 1M tokens
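
The listed prices imply different discounts per token category. The short calculation below works that out from the figures in the specifications; the PRICES table and price_ratio helper are illustrative names, and Decimal is used so the percentages are not affected by floating-point rounding.

```python
from decimal import Decimal

# (OurToken price, official reference) in USD per 1M tokens, from the specifications.
PRICES = {
    "input": (Decimal("0.1120"), Decimal("0.14")),
    "output": (Decimal("0.2240"), Decimal("0.28")),
    "cache_read": (Decimal("0.0020"), Decimal("0.0028")),
}


def price_ratio(category: str) -> Decimal:
    """OurToken price as a percentage of the official reference, to one decimal place."""
    ours, official = PRICES[category]
    return (ours / official * 100).quantize(Decimal("0.1"))


for name in PRICES:
    print(f"{name}: {price_ratio(name)}%")
# input: 80.0%
# output: 80.0%
# cache_read: 71.4%
```

Input and output sit at 80% of the reference, while cache read works out to roughly 71.4%, so a single blanket discount should not be applied to every category.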

DeepSeek V4 Flash API Features

Use the DeepSeek V4 Flash API for unified DeepSeek V4 access, transparent pricing, cache visibility, and production evaluation.

Unified Access

Call the DeepSeek V4 Flash API through OurToken's unified endpoint while keeping model access, API key management, and usage history in one place. Use deepseek-v4-flash as the model ID and reuse OpenAI-compatible request patterns for chat, coding, and agent workflows.

Pricing Clarity

Review DeepSeek V4 Flash pricing before rollout. OurToken lists $0.1120 input and $0.2240 output per 1M tokens, so teams can estimate the DeepSeek V4 Flash price for chat, coding, and high-volume assistant traffic before scaling production usage.
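
A rough projection from the listed input and output rates can be sketched as follows. The workload figures are hypothetical placeholders, the helper names are illustrative, and cache pricing is deliberately ignored here to keep the estimate simple.

```python
INPUT_PRICE_PER_M = 0.1120   # USD per 1M input tokens (OurToken listing)
OUTPUT_PRICE_PER_M = 0.2240  # USD per 1M output tokens (OurToken listing)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed rates, ignoring cache pricing."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Project spend for a steady workload with a fixed average request size."""
    return requests_per_day * days * request_cost(input_tokens, output_tokens)


# Hypothetical workload: 50,000 requests/day, 2,000 input and 500 output tokens each.
print(f"${monthly_cost(50_000, 2_000, 500):,.2f} per 30 days")  # $504.00 per 30 days
```

Replace the placeholder volumes with averages from your own traffic; real requests vary in size, so treat the result as an order-of-magnitude check.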

Cache Costs

Separate cache behavior from normal prompt spend with explicit cache pricing. The DeepSeek V4 Flash API cache read price is listed at $0.0020 per 1M tokens on OurToken and cache write at $0, which matters most for repeated-context workloads and long prompt reuse.
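
To see how much cache hits could change a request's cost, the sketch below splits prompt tokens into cached and uncached parts. It rests on an assumption the listing does not spell out: that cached tokens are counted inside prompt_tokens and billed at the cache read rate instead of the input rate. The function name and token counts are hypothetical.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
INPUT = Decimal("0.1120")       # USD per 1M uncached input tokens
OUTPUT = Decimal("0.2240")      # USD per 1M output tokens
CACHE_READ = Decimal("0.0020")  # USD per 1M cache read tokens
# Cache write is listed at $0, so it adds nothing to the total.


def cost_with_cache(prompt_tokens: int, cached_tokens: int, completion_tokens: int) -> Decimal:
    """Cost of one request, ASSUMING cached_tokens is a subset of prompt_tokens
    billed at the cache read rate while the remainder is billed as normal input."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    return (uncached * INPUT + cached_tokens * CACHE_READ + completion_tokens * OUTPUT) / MILLION


# Hypothetical request: 100,000 prompt tokens and 1,000 output tokens.
cold = cost_with_cache(100_000, 0, 1_000)        # nothing served from cache
warm = cost_with_cache(100_000, 90_000, 1_000)   # 90,000 tokens served from cache
print(f"cold: ${cold:.6f}  warm: ${warm:.6f}")   # cold: $0.011424  warm: $0.001524
```

Check a real response's usage.prompt_tokens_details.cached_tokens and the dashboard charge for the same request to confirm the billing assumption before building forecasts on it.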

Flash Workloads

Use the Flash route when responsiveness and cost control matter for production chat, summarization, coding notes, and lightweight agent tasks. Competitor material positions the model for fast inference and high-throughput workloads; teams should validate those claims with their own prompts.

Long Context

Evaluate DeepSeek V4 API workloads that need long context, such as document review, repository notes, support logs, and multi-turn conversations. Test latency, output quality, and cache behavior before making Flash the default route for large prompts.
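
Before sending a large prompt, it helps to check how much room is left for output. The sketch below assumes that "1M" and "384K" mean 1,000,000 and 384,000 tokens and that prompt and output share one context window; neither detail is stated on this page, and output_budget is a hypothetical helper.

```python
CONTEXT_LENGTH = 1_000_000  # "1M tokens" in the specifications; exact value assumed
MAX_OUTPUT = 384_000        # "384K tokens" in the specifications; exact value assumed


def output_budget(prompt_tokens: int, requested_max_tokens: int) -> int:
    """Return a max_tokens value that fits, ASSUMING prompt and output share the context window."""
    if prompt_tokens >= CONTEXT_LENGTH:
        raise ValueError(f"prompt of {prompt_tokens} tokens leaves no room in the context window")
    remaining = CONTEXT_LENGTH - prompt_tokens
    return min(requested_max_tokens, MAX_OUTPUT, remaining)


print(output_budget(200_000, 8_000))    # 8000: within both limits
print(output_budget(900_000, 384_000))  # 100000: capped by the remaining context
```

The prompt token count itself should come from a previous response's usage.prompt_tokens or a tokenizer you trust, since character-based guesses can be far off for code and non-English text.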

Benchmark Review

Use DeepSeek V4 Flash benchmark claims as a starting point, not a production guarantee. Compare coding, reasoning, latency, tool use, and token consumption against your own acceptance criteria before scaling traffic to customer-facing workflows.
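
Acceptance criteria are easier to enforce when they are written down as thresholds. The sketch below compares a batch of measured runs against limits you choose; RunResult, meets_criteria, and every number in the example are placeholders for illustration, not measurements of any model.

```python
import statistics
from dataclasses import dataclass


@dataclass
class RunResult:
    passed: bool       # did the output meet your task-specific check?
    latency_s: float   # wall-clock seconds for the request
    total_tokens: int  # usage.total_tokens from the response


def meets_criteria(runs: list[RunResult], min_pass_rate: float,
                   max_median_latency_s: float, max_avg_tokens: float) -> dict[str, bool]:
    """Compare measured results against thresholds that you define."""
    pass_rate = sum(r.passed for r in runs) / len(runs)
    median_latency = statistics.median(r.latency_s for r in runs)
    avg_tokens = statistics.fmean(r.total_tokens for r in runs)
    return {
        "pass_rate": pass_rate >= min_pass_rate,
        "latency": median_latency <= max_median_latency_s,
        "tokens": avg_tokens <= max_avg_tokens,
    }


# Placeholder numbers for illustration only.
runs = [RunResult(True, 1.2, 900), RunResult(True, 0.8, 700), RunResult(False, 2.5, 1500)]
print(meets_criteria(runs, min_pass_rate=0.9, max_median_latency_s=2.0, max_avg_tokens=1200))
# {'pass_rate': False, 'latency': True, 'tokens': True}
```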

How to Use DeepSeek V4 Flash API on OurToken

Create an API key, copy deepseek-v4-flash, call the unified endpoint, compare DeepSeek V4 pricing, test benchmark claims, and monitor real usage.

01. Create API Key

Create an OurToken API key from the dashboard and store it in a secure server-side environment variable. This gives your backend access to DeepSeek V4 Flash API while keeping credentials out of client code and public repositories.
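
Reading the key from the environment at startup might look like the sketch below. The variable name OURTOKEN_API_KEY is a hypothetical choice, not a name the platform requires, and both helpers are illustrative.

```python
import os


def load_api_key(env_var: str = "OURTOKEN_API_KEY") -> str:
    """Read the key from the environment; the variable name is a hypothetical choice."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} in the server environment before starting the app")
    return key


def auth_headers(api_key: str) -> dict[str, str]:
    """Headers expected by the endpoint, built at request time rather than hard-coded."""
    return {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
```

Failing fast when the variable is missing avoids sending unauthenticated requests and keeps the key itself out of source control.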

02. Copy Model ID

Use deepseek-v4-flash as the model value in your request body. Keeping the exact model ID in configuration helps developers avoid naming mistakes when comparing DeepSeek V4 API routes across local tests, staging traffic, and production deployments.
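
A small startup check can catch a mistyped model ID before any request is sent. This is only a sketch: check_model_id is a hypothetical helper, and the hints cover the common slips of stray whitespace, a provider prefix, or uppercase characters.

```python
EXPECTED_MODEL_ID = "deepseek-v4-flash"


def check_model_id(configured: str) -> str:
    """Return the configured ID if it matches exactly, otherwise explain the likely mistake."""
    if configured == EXPECTED_MODEL_ID:
        return configured
    hints = []
    if configured != configured.strip():
        hints.append("surrounding whitespace")
    if "/" in configured:
        hints.append("provider prefix")
    if configured != configured.lower():
        hints.append("uppercase characters")
    detail = ", ".join(hints) or "unknown value"
    raise ValueError(f"model ID {configured!r} is not {EXPECTED_MODEL_ID!r} ({detail})")


print(check_model_id("deepseek-v4-flash"))  # deepseek-v4-flash
```

The check rejects near misses instead of silently correcting them, so the configuration file stays the single source of truth.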

03. Call Endpoint

Send requests to the OurToken unified API endpoint with your API key, model ID, and prompt payload. Existing OpenAI-compatible chat request patterns can usually be reused after changing the base URL, credential, and model value.
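
The three values that change can be isolated in one configuration object, as in the sketch below. ChatConfig is a hypothetical structure, and the "existing" setup uses placeholder values for some other OpenAI-compatible service; only the OurToken base URL and model ID come from this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ChatConfig:
    base_url: str
    api_key: str
    model: str
    max_tokens: int = 256

    @property
    def endpoint(self) -> str:
        return self.base_url.rstrip("/") + "/chat/completions"

    def body(self, prompt: str) -> dict:
        """Request body; its shape does not change when the route changes."""
        return {
            "model": self.model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": self.max_tokens,
        }


# Hypothetical existing setup pointing at some other OpenAI-compatible service.
existing = ChatConfig("https://api.example.com/v1", "OLD_KEY", "some-other-model")

# Only the base URL, credential, and model value change; everything else is reused.
ourtoken = replace(
    existing,
    base_url="https://api.ourtoken.ai/v1",
    api_key="YOUR_API_KEY",
    model="deepseek-v4-flash",
)
print(ourtoken.endpoint)  # https://api.ourtoken.ai/v1/chat/completions
```

Parameters that differ between providers, such as supported tools or response formats, still need to be verified per route even when the request shape is shared.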

04. Compare Pricing

Compare DeepSeek V4 pricing before rollout: OurToken lists $0.1120 input, $0.2240 output, and $0.0020 cache read per 1M tokens. Use those values to estimate DeepSeek V4 Flash price for expected prompt, output, and cache volumes.

05. Test Benchmarks

Treat every DeepSeek V4 Flash benchmark claim as a prompt for your own evaluation. Run representative coding, reasoning, summarization, and agent tasks, then compare response quality, latency, tool behavior, token usage, and error handling.

06. Monitor Cost

After launch, review history logs for request count, input tokens, output tokens, cache read tokens, and spend. Real usage data helps teams compare DeepSeek V4 Flash pricing against actual traffic instead of relying only on provider listing assumptions.
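
If you also keep the usage object from each response, the same totals can be cross-checked locally. The sketch below aggregates a list of usage records; the summarize helper and the sample records are hypothetical, and the spend figure assumes cached tokens are included in prompt_tokens and billed at the cache read rate.

```python
from collections import Counter
from decimal import Decimal

RATES = {  # USD per 1M tokens, from the OurToken listing
    "input": Decimal("0.1120"),
    "output": Decimal("0.2240"),
    "cache_read": Decimal("0.0020"),
}


def summarize(usages: list[dict]) -> dict:
    """Total tokens and estimated spend from a list of response `usage` objects.

    ASSUMES cached tokens are part of prompt_tokens and billed at the cache read rate.
    """
    totals = Counter()
    for usage in usages:
        cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
        totals["requests"] += 1
        totals["input"] += usage.get("prompt_tokens", 0) - cached
        totals["cache_read"] += cached
        totals["output"] += usage.get("completion_tokens", 0)
    spend = sum(totals[name] * rate for name, rate in RATES.items()) / Decimal(1_000_000)
    return {**totals, "estimated_usd": spend}


# Made-up usage records for illustration.
logged = [
    {"prompt_tokens": 1000, "completion_tokens": 200, "prompt_tokens_details": {"cached_tokens": 400}},
    {"prompt_tokens": 500, "completion_tokens": 100},
]
print(summarize(logged))
```

A persistent gap between this estimate and the dashboard spend is a signal to revisit the billing assumption or look for requests that were never logged.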

DeepSeek V4 Flash API FAQ

Answers about DeepSeek V4 Flash API pricing, DeepSeek V4 API access, cache costs, model ID setup, benchmarks, and Flash versus Pro evaluation.

01. What is DeepSeek V4 Flash API?

DeepSeek V4 Flash API is the Flash DeepSeek V4 model route available through OurToken for teams that want a lower-cost option for chat, coding notes, summarization, and assistant workflows. Developers can use the deepseek-v4-flash model ID, create an OurToken API key, and call it through the same unified API flow used by other supported models.

02. What is DeepSeek V4 Flash API pricing on OurToken?

DeepSeek V4 Flash API pricing on OurToken is $0.1120 per 1M input tokens and $0.2240 per 1M output tokens. The official references provided for DeepSeek V4 Flash are $0.14 input and $0.28 output per 1M tokens, so input and output pricing are both 80% of the official price.

03. What is the DeepSeek V4 Flash price for cache read and cache write?

The DeepSeek V4 Flash price for cache read is $0.0020 per 1M cache read tokens on OurToken, compared with the official $0.0028 reference. Cache write is listed as $0 per 1M tokens. Because cache read has its own ratio, do not assume every token category uses the same discount as input and output.

04. How does DeepSeek V4 pricing compare between Flash and Pro?

DeepSeek V4 pricing is lower on the Flash route in the current OurToken catalog: Flash lists $0.1120 input and $0.2240 output per 1M tokens, while Pro lists $0.3480 input and $0.6960 output. Choose Flash for cost-sensitive or high-volume workloads, then test Pro when quality requirements justify a stronger route.

05. Which model ID should I use for DeepSeek V4 API access?

Use deepseek-v4-flash as the model ID for this DeepSeek V4 API route on OurToken. The API Keys page and model gallery should show the callable model value, so developers can copy the exact ID and avoid mistakes caused by display names, provider prefixes, or casing differences.

06. How should I evaluate DeepSeek V4 Flash benchmark and capability claims?

Treat every DeepSeek V4 Flash benchmark claim as a starting point for testing rather than a production guarantee. Competitor material mentions JSON output, tool calls, coding, reasoning, and long-context tasks, but teams should verify response quality, latency, cache behavior, and total token cost against their own requirements.