Wei Wang is a Taiwanese computer scientist who previously spent five years studying and working in Beijing. You might have read his essay on Black Myth: Wukong, which ChinaTalk published last September. Today, he’s here to analyze the twin DeepSeek shockwaves from a technical perspective, as well as provide an inside look at the energy now rippling through China’s AI ecosystem. We hope you enjoy it.
At the stroke of midnight Beijing time on January 28, 2025, Chinese AI company DeepSeek followed up its R1 launch with the surprise release of Janus-Pro-7B, a multimodal framework that can both analyze images and generate images from text. This technological blitzkrieg triggered seismic shocks in global capital markets: after the R1 release, Nvidia's stock price plummeted 17% in a single day, the Nasdaq fell 3%, and shares of Chinese AI chipmaker Cambricon dropped nearly 10%. Meanwhile, DeepSeek's app topped Apple's U.S. free-download chart as its open-source strategy began rewriting the industry's ecosystem rules. Janus-Pro's midnight release (during the Lunar New Year holiday, no less) coincided with midday trading in the U.S. Considering that DeepSeek's parent company is a quantitative hedge fund, some speculated that it may have profited through short-selling strategies.
Astonishingly, Janus-Pro — reportedly trained for under 100,000 yuan (~US$14,000) — outperformed OpenAI's DALL-E 3 and Stable Diffusion in DPG-Bench image generation tests.
Breakthrough Architecture Analysis
The DeepSeek R1 series achieves its structured-reasoning breakthrough through pure reinforcement learning via Group Relative Policy Optimization (GRPO), skipping the traditional supervised fine-tuning (SFT) stage. Its core innovation forces the model to output a complete reasoning chain inside <think> tags, with the final answer enclosed in <answer> tags. This design is paired with precise, rule-based reward mechanisms: mathematical answers must be written in a boxed format that can be checked symbolically (e.g., with SymPy), coding tasks directly invoke compilers to run test cases, and a “language consistency reward” discourages mixed Chinese-English outputs. The technical report shows that R1 achieved 42% higher accuracy on LeetCode problems than its predecessor while consuming only one-fifth of the energy. DeepSeek's developers also observed emergent cognitive behavior during training: when an initial problem-solving attempt failed, the model autonomously generated multiple alternative solutions and reevaluated them.
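To make that reward design concrete, here is a minimal sketch of what such rule-based rewards could look like in Python. The function names, regular expressions, and weights below are illustrative assumptions of mine; DeepSeek has not published its actual reward code.

```python
import re
import sympy


def format_reward(completion: str) -> float:
    """Reward 1.0 only if the output wraps its reasoning in <think> tags
    and its final answer in <answer> tags, as R1's template requires."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0


def math_accuracy_reward(completion: str, reference: str) -> float:
    """Reward 1.0 if the boxed expression in the completion is symbolically
    equal to the reference answer, checked with SymPy."""
    match = re.search(r"\\boxed\{(.+?)\}", completion)
    if match is None:
        return 0.0
    try:
        diff = sympy.simplify(sympy.sympify(match.group(1)) - sympy.sympify(reference))
        return 1.0 if diff == 0 else 0.0
    except Exception:  # unparseable model output earns no reward
        return 0.0


def total_reward(completion: str, reference: str) -> float:
    # The weighting between format and accuracy terms is a placeholder.
    return 0.5 * format_reward(completion) + 1.0 * math_accuracy_reward(completion, reference)
```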
R1 also makes its reasoning chain visible by design, rather than producing the black-box outputs of traditional models, while consuming 76% less energy to train than GPT-4.
Unusually, DeepSeek also disclosed details of its failed experiments, admitting that attempts to use Process Reward Models (PRM) and Monte Carlo Tree Search (MCTS) were abandoned because of reward hacking and the complexity of the token search space, and that the team ultimately opted for a simpler technical path (see the R1 technical report, pp. 15-16).
The lightweight Janus-Pro-7B framework can be deployed on a single GPU thanks to its decoupled visual encoding, which separates the pathway used for image understanding from the one used for image generation; despite its size, it outperforms larger competitors on the GenEval benchmark.
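As an illustration of what "decoupled visual encoding" means in practice, the toy PyTorch module below routes image understanding through a continuous vision encoder and image generation through a separate discrete-token embedding, with both feeding one shared autoregressive backbone. Every module name, dimension, and layer choice here is invented for the sketch and does not reflect DeepSeek's implementation.

```python
import torch
import torch.nn as nn


class DecoupledVisualLM(nn.Module):
    """Toy illustration of decoupled visual encoding: one continuous encoder
    for image understanding, a separate discrete-token path for image
    generation, both feeding a single shared autoregressive backbone."""

    def __init__(self, d_model: int = 1024, text_vocab: int = 32_000,
                 image_vocab: int = 16_384):
        super().__init__()
        # Understanding path: patchify pixels into continuous features.
        self.understand_encoder = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=16, stride=16),
            nn.Flatten(2),  # -> [batch, d_model, num_patches]
        )
        # Generation path: discrete image tokens with their own embedding.
        self.image_token_embed = nn.Embedding(image_vocab, d_model)
        self.text_embed = nn.Embedding(text_vocab, d_model)
        # One shared backbone serves both understanding and generation.
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.text_head = nn.Linear(d_model, text_vocab)
        self.image_head = nn.Linear(d_model, image_vocab)

    def forward(self, text_ids, image=None, image_token_ids=None):
        parts = [self.text_embed(text_ids)]
        if image is not None:            # understanding: prepend visual features
            parts.insert(0, self.understand_encoder(image).transpose(1, 2))
        if image_token_ids is not None:  # generation: append image tokens so far
            parts.append(self.image_token_embed(image_token_ids))
        seq = torch.cat(parts, dim=1)
        # Causal mask keeps the shared backbone autoregressive.
        causal = torch.triu(
            torch.full((seq.size(1), seq.size(1)), float("-inf"), device=seq.device),
            diagonal=1,
        )
        hidden = self.backbone(seq, mask=causal)
        return self.text_head(hidden), self.image_head(hidden)
```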
Chain Reactions in Open-Source Ecosystems
DeepSeek's open-source strategy has electrified the tech community, both in China and overseas. The R1 series fully discloses its development path, from pure reinforcement learning (R1-Zero) to multi-stage hybrid training (R1). This transparency sparked rapid open-source collaboration, with more than three thousand derivative projects visible on GitHub at the time of writing. These include a Qwen-14B model distilled from R1 that surpasses the Qwen team's own, much larger Qwen-32B; a simplified sketch of that distillation recipe follows below.
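For readers curious what this kind of distillation looks like, the sketch below shows the basic recipe: sample reasoning traces from the large teacher model, then run ordinary supervised fine-tuning on a smaller student. The checkpoint name, example data, and hyperparameters are placeholders, not DeepSeek's actual configuration.

```python
# Toy distillation-by-SFT: fine-tune a small "student" model on reasoning
# traces sampled from a large "teacher."
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen2.5-14B"  # assumed student checkpoint
tokenizer = AutoTokenizer.from_pretrained(student_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
student = AutoModelForCausalLM.from_pretrained(student_name, torch_dtype=torch.bfloat16)

# Each record pairs a prompt with a teacher-generated <think>/<answer> trace.
traces = [
    {"prompt": "Solve: 2x + 3 = 11",
     "trace": "<think>2x = 8, so x = 4</think><answer>4</answer>"},
    # ...hundreds of thousands of teacher samples in practice
]

def collate(batch):
    texts = [r["prompt"] + "\n" + r["trace"] for r in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(traces, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

student.train()
for batch in loader:
    loss = student(**batch).loss  # standard causal-LM cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```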
Janus-Pro's open-source impact runs even deeper. The fact that this 7B-parameter model outperforms competitors at the 175B scale may permanently alter the “bigger-is-better” paradigm. Its architecture, which fuses rectified-flow image generation with an autoregressive language model, shattered previous limits on image quality for small models. Janus-Pro's GitHub page now hosts lively debate over many details of the model, and these disputes are fueling new technical experiments that will push innovation further forward.
The Unfinished Technological Odyssey
DeepSeek's technical report candidly acknowledges current limitations. For example, nobody has yet figured out how to build reliable reward functions for domains like physics or chemistry, where answers cannot be verified automatically the way they can in coding and math, and that blocks the kind of breakthroughs we've seen in those fields. It also remains an open question whether forcing models to reason in natural language is the optimal path forward, and open-source communities have already begun exploring alternatives to chain-of-thought (CoT) reasoning. Kimi's research team is experimenting with compressing long reasoning chains into shorter ones for practical deployment, while Meta's researchers are exploring latent-space implicit reasoning, in which the model's internal thinking process is not visible in its output.
On the market front, Nvidia attempted to soothe investor concerns about declining compute demand by pointing out that, “Inference requires significant numbers of Nvidia GPUs and high-performance networking.” Recent ChinaTalk guest Jimmy Goodrich added that Nvidia’s export control-compliant H20 processor is “probably the best chip in the world for inference.”
Yet industry consensus holds that Janus-Pro-7B's architectural innovations have shaken the traditional assumption that cutting-edge performance requires massive hardware.
Peering Inside the Industry
The following insights come from machine learning engineers employed by one of China’s AI tigers:
DeepSeek's open-source models have already been validated on Huawei's Ascend accelerators with minimal performance loss, indicating low barriers to replication.
Chinese AI engineers are already retraining models using DeepSeek's open-source frameworks — this Lunar New Year, only DeepSeek engineers will get holidays.
Industry rumors suggest DeepSeek R1 and Janus-Pro are merely appetizers, with more explosive releases forthcoming.
To close, I'll leave you with my personal perspective: High-Flyer, DeepSeek's parent hedge fund, probably profited from shorting Nvidia this time. DeepSeek is threatening to burst the AI hardware bubble while simultaneously outcompeting subscription-based AI firms like OpenAI. Future investors would be wise to guard against such risks.
Given how little compute this approach requires, leading AI companies will likely pour further research into DeepSeek's training framework. The low resource threshold will also let numerous small and medium-sized enterprises enter the field, spreading LLM technology into more domains and catalyzing broader development across the entire industry. Readers should understand one thing: this is just the beginning. AI innovation is still far from its peak.