Chinese start-up DeepSeek launches AI model that outperforms Meta, OpenAI products




DeepSeek’s V3 model was trained for two months at a cost of US$5.58 million, using significantly fewer computing resources than its rivals

DeepSeek has developed an artificial intelligence model at a fraction of the capital outlay that bigger companies like Meta Platforms and OpenAI typically invest. Photo: Shutterstock
Ben Jiang in Beijing
27 Dec 2024 | South China Morning Post
Chinese start-up DeepSeek’s release of a new large language model (LLM) has made waves in the global artificial intelligence (AI) industry, as benchmark tests showed that it outperformed rival models from the likes of Meta Platforms and ChatGPT creator OpenAI.
The Hangzhou-based company said in a WeChat post on Thursday that its namesake LLM, DeepSeek V3, has 671 billion parameters and was trained in around two months at a cost of US$5.58 million, using significantly fewer computing resources than models developed by bigger tech firms.
LLM refers to the technology underpinning generative AI services such as ChatGPT. In AI, a high number of parameters is pivotal in enabling an LLM to adapt to more complex data patterns and make precise predictions.
Reacting to the Chinese start-up’s technical report on its new AI model, computer scientist Andrej Karpathy – a founding team member at OpenAI – said in a post on social-media platform X: “DeepSeek making it look easy … with an open weights release of a frontier-grade LLM trained on a joke of a budget.”

Open weights refers to releasing only the pretrained parameters, or weights, of an AI model, which allows a third party to use the model for inference and fine-tuning only. The model’s training code, original data set, architecture details and training methodology are not provided.

The chatbot icons of DeepSeek and OpenAI’s ChatGPT are displayed on a smartphone screen. Photo: Shutterstock
DeepSeek’s development of a powerful LLM – at a fraction of the capital outlay that bigger companies like Meta and OpenAI typically invest – shows how far Chinese AI firms have progressed, despite US sanctions that have blocked their access to advanced semiconductors used for training models.
Leveraging a new architecture designed for cost-effective training, DeepSeek required just 2.78 million GPU hours – the total amount of time that graphics processing units are used to train an LLM – for its V3 model. The start-up’s training process ran on Nvidia’s China-tailored H800 GPUs.
That is substantially less than the 30.8 million GPU hours Facebook parent Meta needed to train its Llama 3.1 model on Nvidia’s more advanced H100 chips, which are not allowed to be exported to China.
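The scale of that gap can be checked with simple back-of-the-envelope arithmetic, using only the GPU-hour figures reported above (this says nothing about per-hour chip cost or chip performance, which differ between the H800 and H100):

```python
# Reported training budgets in GPU hours, as cited in the article
deepseek_v3_hours = 2.78e6   # DeepSeek V3 on Nvidia H800 GPUs
llama_31_hours = 30.8e6      # Meta Llama 3.1 on Nvidia H100 GPUs

ratio = llama_31_hours / deepseek_v3_hours
print(f"Llama 3.1 used roughly {ratio:.0f}x the GPU hours of DeepSeek V3")
```

By this measure, Meta's training run consumed roughly eleven times as many GPU hours, which is the contrast Karpathy's comment highlights.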

“DeepSeek V3 looks to be a stronger model at only 2.8 million GPU hours,” Karpathy wrote in his X post.

The technical report on V3 posted by DeepSeek showed that its LLM outperformed Meta’s Llama 3.1 and Alibaba Group Holding’s Qwen 2.5 in a series of benchmark tests evaluating an AI system’s capabilities in text understanding and generation, domain expertise, coding and maths problem-solving. Alibaba owns the South China Morning Post.
The same benchmark tests showed that V3’s results were on par with those of OpenAI’s GPT-4o and Claude 3.5 Sonnet from Amazon.com-backed Anthropic.
DeepSeek was spun off in July last year by High-Flyer Quant, which uses AI to operate one of the largest quantitative hedge funds in mainland China.

High-Flyer spent 200 million yuan (US$27.4 million) to build its Fire-Flyer I AI computing cluster between 2019 and 2020, and then spent 1 billion yuan more on Fire-Flyer II, according to the Hangzhou-based company’s website.

In an announcement last April, High-Flyer said DeepSeek’s development goal is to create “AI that will benefit all of humanity”. DeepSeek had earlier launched a series of AI models, which are used by developers to build third-party applications, as well as its own chatbot.

Ben is a Beijing-based technology reporter for the Post focusing on emerging start-ups. He has previously covered Chinese tech for publications including KrAsia and TechNode.

