The Hangzhou-based company said in a WeChat post on Thursday that its namesake large language model (LLM), DeepSeek V3, comes with 671 billion parameters and was trained in around two months at a cost of US$5.58 million, using significantly fewer computing resources than models developed by bigger tech firms.
LLM refers to the technology underpinning generative AI services such as ChatGPT. In AI, a high parameter count is pivotal in enabling an LLM to adapt to more complex data patterns and make more precise predictions.
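As a rough illustration of what a parameter count measures, the figure is simply the total number of trainable weights in a network. A minimal sketch in PyTorch, with arbitrary toy layer sizes chosen only to show the counting:

```python
# Toy illustration of a "parameter count": the total number of trainable
# weights in a network. The layer sizes here are arbitrary examples.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 4096),  # weights: 1024 * 4096, biases: 4096
    nn.ReLU(),
    nn.Linear(4096, 1024),  # weights: 4096 * 1024, biases: 1024
)

total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # 8,393,728 -- DeepSeek V3 has 671 billion
```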
Reacting
to the Chinese start-up’s technical report on its new AI model,
computer scientist Andrej Karpathy – a founding team member at OpenAI –
said in a post on
social-media platform
X: “DeepSeek making it look easy … with an open weights release of a frontier-grade LLM trained on a joke of a budget.”
Open
weights refers to releasing only the pretrained parameters, or weights,
of an AI model, which allows a third party to use the model for
inference and fine-tuning only. The model’s training code, original data
set, architecture details and training methodology are not provided.
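In practice, an open-weights release means a third party can download the published checkpoints and run the model locally. A minimal, hypothetical sketch of such inference, assuming the Hugging Face transformers library and assuming deepseek-ai/DeepSeek-V3 as the repository name (neither detail comes from DeepSeek’s report):

```python
# Hypothetical sketch of third-party inference on open weights using the
# Hugging Face transformers library. The model ID is an assumption made
# for illustration, not a detail from DeepSeek's report.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"  # assumed repository name

# Only the pretrained weights are needed for inference; the training code,
# original data set and training methodology remain unpublished.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("What does an open-weights release include?",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```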
The chatbot icons of DeepSeek and OpenAI’s ChatGPT are displayed on a smartphone screen. Photo: Shutterstock
DeepSeek’s development of a powerful LLM – at a fraction of the capital outlay of bigger companies such as Meta and OpenAI – shows how far Chinese AI firms have progressed, despite US sanctions that have blocked their access to the advanced semiconductors used for training models.
Leveraging a new architecture designed for cost-effective training, DeepSeek required just 2.78 million GPU hours – the cumulative time that graphics processing units are used to train an LLM – for its V3 model. The start-up’s training process ran on Nvidia’s China-tailored H800 GPUs.
That is substantially less than the 30.8 million GPU hours that Facebook parent Meta needed to train its Llama 3.1 model on Nvidia’s more advanced H100 chips, which cannot be exported to China.
“DeepSeek V3 looks to be a stronger model at only 2.8 million GPU hours,” Karpathy wrote in his X post.
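The article’s own figures are mutually consistent: dividing the reported US$5.58 million budget by the reported compute implies a rate of roughly US$2 per GPU hour. A quick back-of-the-envelope check (the per-hour rate is inferred here, not quoted by DeepSeek):

```python
# Sanity check on the figures reported above. The per-GPU-hour rate is
# derived from the article's numbers, not quoted by DeepSeek.
cost_usd = 5.58e6         # reported V3 training cost, US$5.58 million
gpu_hours_v3 = 2.78e6     # reported V3 training compute, H800 GPU hours
gpu_hours_llama = 30.8e6  # reported Llama 3.1 compute, H100 GPU hours

print(f"Implied rate: ${cost_usd / gpu_hours_v3:.2f} per GPU hour")  # ~$2.01
print(f"Llama 3.1 used {gpu_hours_llama / gpu_hours_v3:.1f}x "
      f"the GPU hours of V3")  # ~11.1x
```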
The technical report on V3 posted by DeepSeek showed that its LLM outperformed Meta’s Llama 3.1 and
Alibaba Group Holding’s
Qwen 2.5 in a series of benchmark tests evaluating an AI system’s capabilities, ranging from text understanding and generation to domain expert knowledge, coding and maths problem-solving. Alibaba owns the South China Morning Post.
The same benchmark tests showed that V3’s results matched those of OpenAI’s GPT-4o and Claude 3.5 Sonnet from Amazon.com-backed Anthropic.
DeepSeek was spun off in July last year by
High-Flyer Quant, which uses AI to operate one of the largest quantitative
hedge funds in mainland China.
High-Flyer spent 200 million yuan (US$27.4 million) to develop the AI cluster Fire-Flyer I between 2019 and 2020, and then spent 1 billion yuan more to build Fire-Flyer II, according to the Hangzhou-based company’s website.
In
an announcement last April, High-Flyer said DeepSeek’s development goal
is to create “AI that will benefit all of humanity”. DeepSeek had
earlier launched a series of AI models, which are used by developers to
build third-party applications, as well as its own chatbot.