DeepSeek launches V4 models with 9.5 times lower memory requirements and compatibility with Huawei Ascend


DeepSeek has introduced two new open-weight models in preview: DeepSeek V4, a mixture-of-experts model with 284 billion total parameters and 13 billion active parameters, and DeepSeek V4-Pro, a 1.6 trillion parameter model with 49 billion active parameters. Both are available for download on Hugging Face, as well as through the DeepSeek API and web service.
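In a mixture-of-experts model, only a small fraction of the parameters is active for any given token, which is what makes these large totals tractable at inference time. A quick calculation with the figures above shows the active fraction (assuming V4-Pro's total is 1.6 trillion, consistent with its 49 billion active parameters):

```python
# Per-token compute in a mixture-of-experts model scales with the *active*
# parameters, not the total. Figures taken from the article.

models = {
    "DeepSeek V4":     {"total": 284e9,  "active": 13e9},
    "DeepSeek V4-Pro": {"total": 1.6e12, "active": 49e9},
}

for name, p in models.items():
    frac = p["active"] / p["total"]
    print(f"{name}: {frac:.1%} of parameters active per token")
```

Both models route each token through only about 3-5% of their weights, so per-token compute is closer to that of a mid-sized dense model.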

V4-Pro was trained on 33 trillion tokens. The company claims it outperforms all open-weight large language models and rivals leading Western proprietary models on its benchmark suite. However, since these results are self-reported, they should be treated with caution and weighed against independent evaluations.

Architectural Changes Behind DeepSeek V4 Efficiency Gains

The most notable technical update in V4 is a hybrid attention mechanism that combines compressed sparse attention with dense compressed attention. This combination reduces the computation required during inference and shrinks the key-value caches used to track model state. DeepSeek reports a context window of one million tokens, with memory requirements reduced by a factor of 9.5 to 13.7 compared to DeepSeek V3.2.
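To see why KV-cache compression matters at a one-million-token context, consider a rough estimate of cache size. The model dimensions below are made up for illustration (DeepSeek has not published V4's layer or head counts), and the layout is the generic keys-plus-values formula, not DeepSeek's actual cache scheme:

```python
# Rough KV-cache memory estimate for a long context, before and after a
# compression factor at the lower bound of the reported 9.5-13.7x range.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=1):
    # 2x for keys and values; bytes_per_value=1 assumes FP8 storage.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value

# Illustrative (assumed) dimensions, not published V4 figures.
baseline = kv_cache_bytes(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128)
compressed = baseline / 9.5

print(f"baseline KV cache : {baseline / 2**30:.1f} GiB")
print(f"compressed (9.5x) : {compressed / 2**30:.1f} GiB")
```

Even under these conservative assumptions, an uncompressed million-token cache would run to over a hundred gigabytes per sequence, which is why a 9.5-13.7x reduction is the headline efficiency claim.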

Both V4 models use a combination of FP8 and FP4 precision, with quantization-aware training applied to the mixture-of-experts weights. Using FP4 approximately halves the memory required to store model weights compared to FP8. Additionally, DeepSeek V4 adopts the Muon optimizer, which aims to accelerate convergence and improve training stability.

DeepSeek V4 Hardware Support and API Pricing vs. GPT-5.5

DeepSeek V4 has been confirmed to work on both Nvidia GPUs and Huawei Ascend NPU platforms, and the company says the model's expert-parallel scheme has been validated on both types of hardware. It remains unclear whether Huawei's accelerators were used during training or only for inference.

DeepSeek V4 costs $0.14 per million input tokens and $0.28 per million output tokens for uncached requests. The V4-Pro version is priced at $1.74 per million input tokens and $3.48 per million output tokens. In comparison, OpenAI's GPT-5.5 is priced at $5 per million input tokens and $30 per million output tokens. Both models, in base and instruction-tuned versions, are now available in preview through the DeepSeek API and Hugging Face.
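Putting the quoted prices side by side for a concrete workload makes the gap easier to read. The example request size below (100K input tokens, 10K output tokens) is an arbitrary illustration:

```python
# Cost of one hypothetical request under the per-million-token prices
# quoted in the article (uncached input, output; USD).

PRICES = {
    "DeepSeek V4":     (0.14, 0.28),
    "DeepSeek V4-Pro": (1.74, 3.48),
    "GPT-5.5":         (5.00, 30.00),
}

def request_cost(model, input_tokens, output_tokens):
    input_price, output_price = PRICES[model]
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price

# Example: 100K input tokens, 10K output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 100_000, 10_000):.4f}")
```

At these list prices the same request costs roughly 48 times more on GPT-5.5 than on DeepSeek V4, and about 4 times more than on V4-Pro.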


