AI Trivia

Daily Jin AI Trivia: What you need to know about DeepSeek V3.1

Two days ago, DeepSeek quietly dropped their new model weights on Hugging Face — just a small note on their WeChat channel, no big announcement.

So what’s new?

  • One Model to Rule Them All: DeepSeek V3 and R1 are now merged into a single big model — DeepSeek V3.1 — featuring a hybrid thinking mode switch.

  • Faster & Stronger Agents: Runs faster, with upgraded tool-calling skills.

  • Model Specs: 671B total parameters, 37B activated, 128K token context window (same as before). And there's only one pricing tier now.

  • Format Upgrade: V3.1 was trained with the UE8M0 FP8 scale format (aimed at China's domestic NPUs) rather than Nvidia's E4M3/E5M2 FP8. In short: UE8M0 is all exponent bits and no mantissa, prioritizing dynamic range over precision; it spans the BF16 exponent range but can only represent powers of two, so it sacrifices some accuracy.
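To make the range-vs-precision trade concrete, here's a toy sketch of an E8M0-style scale encoder (modeled on the OCP Microscaling spec's scale format; this is illustrative, not DeepSeek's actual kernel code):

```python
import math

def encode_ue8m0(scale: float) -> int:
    """Encode a positive scale as UE8M0: 8 exponent bits, no mantissa.
    Per the OCP MX spec, the stored byte x represents 2**(x - 127)."""
    assert scale > 0
    e = round(math.log2(scale))
    e = max(-127, min(127, e))  # clamp to the representable exponent range
    return e + 127

def decode_ue8m0(x: int) -> float:
    return 2.0 ** (x - 127)

# Any power of two round-trips exactly...
assert decode_ue8m0(encode_ue8m0(0.25)) == 0.25
# ...but anything else snaps to the nearest power of two: no mantissa.
assert decode_ue8m0(encode_ue8m0(3.0)) == 4.0
```

Huge dynamic range, zero fractional precision: that's the whole trade in two asserts.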

Jin’s Verdict:

  • Much faster, uses fewer tokens per task.
  • Smarter in coding, tool use, and math.
  • Slightly dumber in general knowledge.

A solid upgrade worthy of the “3.1” — Unlike GPT-5 ( >_>)

Feels like a foundation model optimized for China’s own AI chips rather than Nvidia GPUs.

Quick, Not-So-Professional Review:

Outstanding in coding — currently #1 on the Aider Polyglot benchmark for open-source models. Honestly, I’d take it over Claude Sonnet 4.

DeepSeek’s API now supports both the OpenAI and Anthropic request formats, meaning you can plug it straight into Claude Code: https://api-docs.deepseek.com/guides/anthropic_api
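Here's roughly what the two request shapes look like side by side (a stdlib-only sketch; the model names and endpoint paths are my reading of the linked guide, so verify them there before relying on this):

```python
import os

# Placeholder key for illustration; set DEEPSEEK_API_KEY for real use.
API_KEY = os.environ.get("DEEPSEEK_API_KEY", "sk-...")

# OpenAI-compatible chat completion request
openai_req = {
    "url": "https://api.deepseek.com/chat/completions",
    "headers": {"Authorization": f"Bearer {API_KEY}",
                "Content-Type": "application/json"},
    "body": {"model": "deepseek-chat",
             "messages": [{"role": "user", "content": "Hello"}]},
}

# Anthropic-compatible messages request (same backend, different shape)
anthropic_req = {
    "url": "https://api.deepseek.com/anthropic/v1/messages",
    "headers": {"x-api-key": API_KEY,
                "anthropic-version": "2023-06-01",
                "Content-Type": "application/json"},
    "body": {"model": "deepseek-chat", "max_tokens": 256,
             "messages": [{"role": "user", "content": "Hello"}]},
}
```

Same model behind both doors, so Claude Code just needs its base URL pointed at the Anthropic-style endpoint.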

The cheap off-peak “midnight eco” rate is gone, but since V3.1 outputs fewer tokens per task, you should still see lower costs compared to R1.
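Back-of-the-envelope math on why fewer output tokens can beat a discount (the prices and token counts below are placeholders for illustration, not DeepSeek's published rates):

```python
def task_cost(output_tokens: int, price_per_million: float) -> float:
    """Cost of one task given output tokens and a $/1M-token price."""
    return output_tokens / 1_000_000 * price_per_million

# Hypothetical numbers: R1 rambles for 12K tokens, V3.1 finishes in 6K.
r1_cost  = task_cost(output_tokens=12_000, price_per_million=2.19)
v31_cost = task_cost(output_tokens=6_000,  price_per_million=2.19)

# Even at the same per-token price, halving the tokens halves the bill,
# which is worth more than a partial off-peak discount would have been.
assert v31_cost < r1_cost
```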

That’s it — hope you learned something new today! See ya!!
