The release of a new AI chatbot by a small Chinese company has rattled stock markets and prompted bold claims, raising the question of what actually sets it apart.
The excitement centers on the "large language model" (LLM) powering the chatbot, which reportedly matches the reasoning abilities of prominent US models such as OpenAI's o1 while being significantly cheaper to train and run.
Whether DeepSeek's models deliver real energy savings in practice remains uncertain. Nor is it clear whether cheaper, more efficient AI might simply drive broader adoption, with total energy consumption rising as a result (a rebound effect).
DeepSeek's cost-saving methods are not entirely novel; some are shared with other LLMs. For example, in 2023, Mistral AI released the Mixtral 8x7B model, on par with cutting-edge models of the time, which also used a "mixture of experts" approach: the model is built from several smaller expert networks, each specializing in distinct domains, and only a few of them are activated for any given input. Because most parameters sit idle on each forward pass, the compute per token is a fraction of what the total parameter count suggests, as the sketch below illustrates.
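To make this concrete, here is a minimal mixture-of-experts layer sketched in PyTorch. The `MoELayer` class, the layer sizes, and the top-2 routing are illustrative assumptions for this example, not the actual Mixtral or DeepSeek architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a learned router sends each input
    to its top-k experts, so only a fraction of the total parameters
    does any work on a given forward pass."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_experts)]
        )
        self.router = nn.Linear(dim, num_experts)  # scores each expert per input
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x)                         # (batch, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)            # renormalize their weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # inputs routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out

layer = MoELayer(dim=16)                # 8 experts, but each input uses only 2
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```

Mixtral 8x7B itself routes each token to two of its eight experts, which is why it can match much larger dense models at a lower inference cost.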
Furthermore, DeepSeek has disclosed its unsuccessful attempts to enhance LLM reasoning through alternative technical strategies, such as Monte Carlo Tree Search. Publishing what did not work is valuable in itself: these insights will likely fuel research that further advances the model's already impressive problem-solving capabilities, potentially shaping the next wave of AI systems.
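For context, Monte Carlo Tree Search builds a search tree by repeatedly selecting promising branches, expanding a new node, simulating a random playout, and propagating the result back up. The sketch below runs the generic algorithm on a deliberately trivial toy problem (build a 5-character binary string that maximizes the number of 1s); every name here is an illustrative assumption, since DeepSeek has not published code for its MCTS experiments:

```python
import math
import random

DEPTH = 5  # toy problem: build a 5-character binary string; reward = count of 1s

class Node:
    def __init__(self, state, parent=None):
        self.state = state      # partial string of '0'/'1' choices so far
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.value = 0.0        # cumulative reward seen through this node

def ucb(child, parent_visits, c=1.4):
    # Upper Confidence Bound: trade off exploiting good branches
    # against exploring rarely visited ones.
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent_visits) / child.visits
    )

def rollout(state):
    # Simulation: finish the string at random and score it.
    while len(state) < DEPTH:
        state += random.choice("01")
    return state.count("1")

def mcts(iterations=500):
    root = Node("")
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB down through fully expanded nodes.
        while len(node.state) < DEPTH and len(node.children) == 2:
            parent_visits = node.visits
            node = max(node.children.values(), key=lambda c: ucb(c, parent_visits))
        # 2. Expansion: try one untried action, if the node is non-terminal.
        if len(node.state) < DEPTH:
            action = random.choice([a for a in "01" if a not in node.children])
            node.children[action] = Node(node.state + action, parent=node)
            node = node.children[action]
        # 3. Simulation: estimate this node's value with a random playout.
        reward = rollout(node.state)
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # The most-visited child of the root is the best first move found.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts())  # almost always prints '1', the optimal first choice
```

In LLM reasoning, the "actions" would be candidate reasoning steps and the reward a judgment of the final answer, which is far harder to define than in this toy; that difficulty is one plausible reason such approaches can underperform for open-ended text generation.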
In essence, DeepSeek has shown that sophisticated AI models can be developed without massive resources. As companies find further ways to streamline model training and deployment, the trend may be toward highly capable models built with ever fewer resources.
Small companies like DeepSeek are poised to play an increasingly significant role in building AI that simplifies daily life. Underestimating their potential impact would be an oversight.