deepseek/deepseek-v4-flash

$0.1120 / M input tokens · $0.2240 / M output tokens

DeepSeek V4 Flash is a DeepSeek model route on OurToken for developers who need a cost-efficient option for chat, coding, summarization, long-context prompts, and high-volume assistant workloads.

24H Status Monitor

100% uptime

Available

2026-07-23 15:28:32 UTC

Pricing

Pay-per-use

No upfront costs, pay only for what you use

80% of official price

Token type	Official price	OurToken price
Input	$0.14 / M tokens	$0.1120 / M tokens
Output	$0.28 / M tokens	$0.2240 / M tokens
Cached input	$0.0028 / M tokens	$0.0020 / M tokens
Cache writes	$0 / M tokens	$0 / M tokens

API Usage

API Access Guide

Base URL	https://api.ourtoken.ai/v1
API Endpoint	chat/completions
Full URL	https://api.ourtoken.ai/v1/chat/completions
Model ID	deepseek-v4-flash

Code examples

Use the OurToken API endpoint for this model. The examples below use direct HTTP requests and the recommended endpoint for the model family.

curl https://api.ourtoken.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "max_tokens": 256
  }'
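The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names `build_chat_request` and `send` are hypothetical, while the URL, headers, model ID, and body fields are taken from the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://api.ourtoken.ai/v1"
MODEL_ID = "deepseek-v4-flash"


def build_chat_request(api_key: str, prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect or log before any tokens are billed. Call `send(build_chat_request(api_key, "Hello!"))` once a real key is available.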

Chat Completions API Reference

Create a chat response with the OpenAI Chat Completions-compatible endpoint. Use https://api.ourtoken.ai/v1 as the SDK Base URL and POST /chat/completions as the endpoint.

Authorization

Header	Value
Content-Type	application/json
Authorization	Bearer YOUR_API_KEY

Request Body

Field	Type	Required	Description
model	string	Required	Model ID to call.
messages	array<object>	Required	Conversation messages sent to the model.
max_tokens	integer	Optional	Maximum number of output tokens.
temperature	number	Optional	Sampling temperature.
top_p	number	Optional	Nucleus sampling parameter.
stream	boolean	Optional	Whether to return a streaming response.
stream_options	object	Optional	Additional options for streaming responses.
tools	array<object>	Optional	Tools available to the model.
tool_choice	string | object	Optional	Controls how the model selects tools.
response_format	object	Optional	Controls structured output, such as JSON object responses.
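One way to keep request bodies aligned with the table above is a small builder that accepts only the listed fields. The sketch below is an assumption about how a team might structure this; `build_payload` and `ALLOWED_OPTIONAL_FIELDS` are hypothetical names, and the check is purely client-side.

```python
from typing import Any

# Optional fields copied from the Request Body table.
ALLOWED_OPTIONAL_FIELDS = {
    "max_tokens", "temperature", "top_p", "stream",
    "stream_options", "tools", "tool_choice", "response_format",
}


def build_payload(messages: list[dict[str, Any]],
                  model: str = "deepseek-v4-flash",
                  **options: Any) -> dict[str, Any]:
    """Return a request body containing only fields listed in the table."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    unknown = set(options) - ALLOWED_OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"fields not listed in the request table: {sorted(unknown)}")
    payload: dict[str, Any] = {"model": model, "messages": list(messages)}
    # Skip options explicitly set to None so server-side defaults apply.
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload
```

For example, `build_payload([{"role": "user", "content": "Hi"}], temperature=0.2, max_tokens=128)` yields a body with four keys. The exact shape of `response_format`, `tools`, and `stream_options` objects is not specified on this page, so confirm it against the endpoint before relying on it.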

Response Body

Field	Type	Required	Description
id	string	Required	Unique chat completion identifier.
object	"chat.completion"	Required	Object type returned by the Chat Completions API.
created	integer	Required	Unix timestamp when the response was created.
model	string	Required	Model that produced the response.
choices	array<object>	Required	Candidate responses returned by the model.
choices[].message.role	string	Required	Role of the returned chat message.
choices[].message.content	string	Optional	Text content in the returned chat message.
choices[].finish_reason	string	Optional	Reason generation stopped.
usage	object	Optional	Token usage information for the chat completion.
usage.prompt_tokens	integer	Optional	Input token count.
usage.completion_tokens	integer	Optional	Output token count.
usage.total_tokens	integer	Optional	Total token count.
usage.prompt_tokens_details	object	Optional	Breakdown of input token usage.
usage.prompt_tokens_details.cached_tokens	integer	Optional	Tokens served from cache.
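Because several response fields are marked Optional, client code should not assume they are present. The following sketch reads the fields from the table defensively; `ChatResult` and `parse_chat_response` are hypothetical names, and missing token counts default to zero as an assumption of this example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChatResult:
    content: Optional[str]
    finish_reason: Optional[str]
    prompt_tokens: int
    completion_tokens: int
    cached_tokens: int


def parse_chat_response(body: dict[str, Any]) -> ChatResult:
    """Extract the first choice and usage counts, tolerating missing optional fields."""
    choice = body["choices"][0]
    usage = body.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return ChatResult(
        content=choice["message"].get("content"),
        finish_reason=choice.get("finish_reason"),
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
        cached_tokens=details.get("cached_tokens", 0),
    )
```

Only `choices` and `choices[].message` are read without a fallback, since the table marks `choices` and the message role as Required.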

Model Introduction

DeepSeek deepseek-v4-flash

DeepSeek V4 Flash gives teams a lower-cost DeepSeek V4 route for application work where responsiveness, predictable pricing, and simple API integration matter. Use the DeepSeek V4 Flash API when you want to test DeepSeek workflows through the OurToken unified API while keeping model IDs, usage logs, cache costs, and price review in one dashboard.

Why It Looks Great

80% of the official DeepSeek V4 Flash reference price for input and output tokens.
OpenAI-compatible API setup through the same OurToken endpoint used by other supported models.
Clear cache read and cache write pricing for repeated-context prompts and long conversation workloads.
Useful for evaluating cost-sensitive chat, coding, summarization, and assistant workflows without separate provider-specific integration.
Dashboard logs and usage visibility help teams review request cost after launch.

Key Features

Model ID: deepseek-v4-flash
Input price: $0.1120 per 1M tokens on OurToken
Output price: $0.2240 per 1M tokens on OurToken
Cache read price: $0.0020 per 1M tokens on OurToken
Cache write price: $0 per 1M tokens on OurToken
Provider: DeepSeek

Specifications

Provider	DeepSeek
Model Type	Large Language Model (LLM)
Model ID	deepseek-v4-flash
Context Length	1M tokens
Max Output	384K tokens
OurToken Input Price	$0.1120 / 1M tokens
OurToken Output Price	$0.2240 / 1M tokens
OurToken Cache Read Price	$0.0020 / 1M tokens
OurToken Cache Write Price	$0 / 1M tokens
Official Input Reference	$0.14 / 1M tokens
Official Output Reference	$0.28 / 1M tokens
Official Cache Read Reference	$0.0028 / 1M tokens

DeepSeek V4 Flash API Features

Use the DeepSeek V4 Flash API for unified DeepSeek V4 access, transparent pricing, cache visibility, and production evaluation.

Unified Access

Call DeepSeek V4 Flash API through OurToken's unified endpoint while keeping model access, API key management, and usage history in one place. Use deepseek-v4-flash as the model ID and reuse OpenAI-compatible request patterns for chat, coding, and agent workflows.

Pricing Clarity

Review DeepSeek V4 Flash pricing before rollout. OurToken lists $0.1120 input and $0.2240 output per 1M tokens, so teams can estimate the DeepSeek V4 Flash price for chat, coding, and high-volume assistant traffic before scaling production usage.

Cache Costs

Separate cache behavior from normal prompt spend with explicit cache pricing. On OurToken, DeepSeek V4 Flash cache reads are listed at $0.0020 per 1M tokens and cache writes at $0, which matters most for repeated-context workloads and long prompt reuse.

Flash Workloads

Use the Flash route when responsiveness and cost control matter for production chat, summarization, coding notes, and lightweight agent tasks. Competitor material positions the model for fast inference and high-throughput workloads, which teams should validate with their own prompts.

Long Context

Evaluate DeepSeek V4 API workloads that need long context, such as document review, repository notes, support logs, and multi-turn conversations. Test latency, output quality, and cache behavior before making Flash the default route for large prompts.

Benchmark Review

Use DeepSeek V4 Flash benchmark claims as a starting point, not a production guarantee. Compare coding, reasoning, latency, tool use, and token consumption against your own acceptance criteria before scaling traffic to customer-facing workflows.

How to Use DeepSeek V4 Flash API on OurToken

Create an API key, copy deepseek-v4-flash, compare DeepSeek V4 pricing, call the unified endpoint, and monitor real usage.

Create API Key

Create an OurToken API key from the dashboard and store it in a secure server-side environment variable. This gives your backend access to DeepSeek V4 Flash API while keeping credentials out of client code and public repositories.
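A minimal sketch of reading the key on the server side is shown below. The variable name `OURTOKEN_API_KEY` and the helper `load_api_key` are assumptions made for this example; use whatever name your deployment tooling prefers.

```python
import os


def load_api_key(env_var: str = "OURTOKEN_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} in the server environment before calling the API")
    return key
```

Failing at startup rather than on the first request makes a missing credential visible during deployment instead of in production traffic.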

Copy Model ID

Use deepseek-v4-flash as the model value in your request body. Keeping the exact model ID in configuration helps developers avoid naming mistakes when comparing DeepSeek V4 API routes across local tests, staging traffic, and production deployments.

Call Endpoint

Send requests to the OurToken unified API endpoint with your API key, model ID, and prompt payload. Existing OpenAI-compatible chat request patterns can usually be reused after changing the base URL, credential, and model value.

Compare Pricing

Compare DeepSeek V4 pricing before rollout: OurToken lists $0.1120 input, $0.2240 output, and $0.0020 cache read per 1M tokens. Use those values to estimate DeepSeek V4 Flash price for expected prompt, output, and cache volumes.
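The estimate can be worked through with a short calculation. This sketch uses the listed OurToken prices and assumes that cached tokens are a subset of input tokens billed at the cache read rate instead of the input rate; confirm that billing rule against your dashboard logs. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# OurToken list prices in USD per 1M tokens, as quoted on this page.
PRICES_PER_MILLION = {
    "input": Decimal("0.1120"),
    "output": Decimal("0.2240"),
    "cache_read": Decimal("0.0020"),
}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate USD cost, assuming cached tokens replace input-rate billing."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICES_PER_MILLION["input"]
        + cached_tokens * PRICES_PER_MILLION["cache_read"]
        + output_tokens * PRICES_PER_MILLION["output"]
    )
    return total / Decimal(1_000_000)
```

For 10M input tokens of which 4M are cache reads, plus 2M output tokens, the estimate is 6 × $0.1120 + 4 × $0.0020 + 2 × $0.2240 = $1.128. `Decimal` is used so that small per-token prices do not accumulate floating-point error.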

Test Benchmarks

Treat every DeepSeek V4 Flash benchmark claim as a prompt for your own evaluation. Run representative coding, reasoning, summarization, and agent tasks, then compare response quality, latency, tool behavior, token usage, and error handling.
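One possible shape for such an evaluation is a small harness that records pass rate, latency, and token usage for a set of representative prompts. Everything here is a sketch: `evaluate`, `call_model`, and `accept` are hypothetical names, and `call_model` is assumed to return a decoded Chat Completions response body.

```python
import statistics
import time
from typing import Any, Callable


def evaluate(call_model: Callable[[str], dict[str, Any]],
             prompts: list[str],
             accept: Callable[[str, str], bool]) -> dict[str, float]:
    """Run each prompt once and report pass rate, median latency, and tokens used."""
    if not prompts:
        raise ValueError("prompts must not be empty")
    latencies: list[float] = []
    passed = 0
    total_tokens = 0
    for prompt in prompts:
        started = time.perf_counter()
        body = call_model(prompt)
        latencies.append(time.perf_counter() - started)
        content = body["choices"][0]["message"].get("content") or ""
        passed += 1 if accept(prompt, content) else 0
        total_tokens += (body.get("usage") or {}).get("total_tokens", 0)
    return {
        "pass_rate": passed / len(prompts),
        "median_latency_s": statistics.median(latencies),
        "total_tokens": total_tokens,
    }
```

Passing the model call in as a function keeps the harness independent of any particular HTTP client, so the same acceptance checks can be run against different routes.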

Monitor Cost

After launch, review history logs for request count, input tokens, output tokens, cache read tokens, and spend. Real usage data helps teams compare DeepSeek V4 Flash pricing against actual traffic instead of relying only on provider listing assumptions.
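If usage records are exported or collected from API responses, the review can be sketched as a simple aggregation. The record shape below mirrors the `usage` object from the response table, which is an assumption about what your logs contain; `summarize_usage` is a hypothetical helper.

```python
from typing import Any, Iterable


def summarize_usage(records: Iterable[dict[str, Any]]) -> dict[str, float]:
    """Total request and token counts, plus the share of input tokens read from cache."""
    summary = {"requests": 0, "prompt_tokens": 0, "completion_tokens": 0, "cached_tokens": 0}
    for usage in records:
        details = usage.get("prompt_tokens_details") or {}
        summary["requests"] += 1
        summary["prompt_tokens"] += usage.get("prompt_tokens", 0)
        summary["completion_tokens"] += usage.get("completion_tokens", 0)
        summary["cached_tokens"] += details.get("cached_tokens", 0)
    prompt_total = summary["prompt_tokens"]
    # Report 0.0 rather than dividing by zero when no input tokens were logged.
    hit_rate = summary["cached_tokens"] / prompt_total if prompt_total else 0.0
    return {**summary, "cache_hit_rate": hit_rate}
```

A low cache hit rate on a workload with repeated context suggests the prompts are not being reused in a cache-friendly way, which is worth checking before comparing spend against the listed prices.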

DeepSeek V4 Flash API FAQ

Answers about DeepSeek V4 Flash API pricing, DeepSeek V4 API access, cache costs, model ID setup, benchmarks, and Flash versus Pro evaluation.

What is DeepSeek V4 Flash API?

DeepSeek V4 Flash API is the Flash DeepSeek V4 model route available through OurToken for teams that want a lower-cost option for chat, coding notes, summarization, and assistant workflows. Developers can use the deepseek-v4-flash model ID, create an OurToken API key, and call it through the same unified API flow used by other supported models.

What is DeepSeek V4 Flash API pricing on OurToken?

DeepSeek V4 Flash API pricing on OurToken is $0.1120 per 1M input tokens and $0.2240 per 1M output tokens. The official references provided for DeepSeek V4 Flash are $0.14 input and $0.28 output per 1M tokens, so input and output pricing are each 80% of the official price.

What is the DeepSeek V4 Flash price for cache read and cache write?

The DeepSeek V4 Flash price for cache read is $0.0020 per 1M cache read tokens on OurToken, compared with the official $0.0028 reference. Cache write is listed as $0 per 1M tokens. The cache read price works out to about 71% of the official reference rather than 80%, so do not assume every token category uses the same discount as input and output.
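The per-category ratios can be checked directly from the listed prices. In this sketch, `LISTED` and `price_ratio` are hypothetical names; the numbers are the OurToken and official reference prices quoted on this page.

```python
from decimal import Decimal

# (OurToken price, official reference price) in USD per 1M tokens.
LISTED = {
    "input": (Decimal("0.1120"), Decimal("0.14")),
    "output": (Decimal("0.2240"), Decimal("0.28")),
    "cache_read": (Decimal("0.0020"), Decimal("0.0028")),
}


def price_ratio(category: str) -> Decimal:
    """Return OurToken price divided by official price, rounded to three decimals."""
    ourtoken, official = LISTED[category]
    return (ourtoken / official).quantize(Decimal("0.001"))
```

Input and output both come out at 0.800, while cache read comes out at 0.714, which is why each token category should be priced separately in an estimate.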

How does DeepSeek V4 pricing compare between Flash and Pro?

DeepSeek V4 pricing is lower on the Flash route in the current OurToken catalog: Flash lists $0.1120 input and $0.2240 output per 1M tokens, while Pro lists $0.3480 input and $0.6960 output. Choose Flash for cost-sensitive or high-volume workloads, then test Pro when quality requirements justify a stronger route.
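To see what that difference means for a given workload, the two routes can be compared on input and output cost alone. The sketch below uses the prices quoted above; the keys `"flash"` and `"pro"` are labels for this example, not model IDs, and cache pricing is left out because no Pro cache price is given here.

```python
from decimal import Decimal

# (input price, output price) in USD per 1M tokens from the OurToken catalog.
ROUTES = {
    "flash": (Decimal("0.1120"), Decimal("0.2240")),
    "pro": (Decimal("0.3480"), Decimal("0.6960")),
}


def route_cost(route: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate USD cost for a route from input and output tokens only."""
    input_price, output_price = ROUTES[route]
    total = input_tokens * input_price + output_tokens * output_price
    return total / Decimal(1_000_000)
```

For 50M input tokens and 10M output tokens, Flash comes to $7.84 and Pro to $24.36, so Pro costs roughly 3.1 times as much for the same traffic at these list prices.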

Which model ID should I use for DeepSeek V4 API access?

Use deepseek-v4-flash as the model ID for this DeepSeek V4 API route on OurToken. The API Keys page and model gallery should show the callable model value, so developers can copy the exact ID and avoid mistakes caused by display names, provider prefixes, or casing differences.

How should I evaluate DeepSeek V4 Flash benchmark and capability claims?

Treat every DeepSeek V4 Flash benchmark claim as a starting point for testing rather than a production guarantee. Competitor material mentions JSON output, tool calls, coding, reasoning, and long-context tasks, but teams should verify response quality, latency, cache behavior, and total token cost against their own requirements.