AI-powered chatbot ChatGPT has upped its game in the months since it launched. As the runaway success develops, three recent announcements indicate that rapid commercialization of the technology is likely to follow. On March 14, OpenAI launched GPT-4, a model that accepts multimodal input and surpasses GPT-3.5, the model behind ChatGPT, in complex reasoning and overall performance. GPT-4 attracted widespread attention upon its release. Then, on March 16, Baidu released ERNIE Bot, its chatbot rival to ChatGPT. Prior to this, on March 1, OpenAI announced the opening of ChatGPT’s API (Application Programming Interface) and reduced usage costs by 90%.

As AI technology develops, large-scale AI models such as GPT are seeing falling costs. So why are AI models becoming more affordable?

John Zhang, founder of StarBitech, discussed this issue with TechNode in a Q&A format. StarBitech is a digital content asset technology company founded in 2015, jointly invested in by the Shanghai Tree-Graph Blockchain Research Institute and digital display company Fengyuzhu. The company recently received support from Microsoft and OpenAI and will leverage its strengths in Chinese natural language processing and local compliance to develop AIGC (AI-generated content) services in visual content creation and marketing content creation. These services will be supported by GPT, DALL-E, and reinforcement learning, providing AI capabilities geared towards marketing, gaming, animation, culture and tourism, and government.

Why are large AI models like GPT becoming increasingly affordable, and will other mainstream models follow the trend?

The decreasing cost of large AI models is mainly due to continuous technological advancement and intensifying competition. According to OpenAI, the GPT-3.5-turbo model that now powers ChatGPT costs only $0.002 per 1,000 tokens (approximately 750 words), a 90% reduction compared with the previous cost of using GPT-3.5. The “turbo” in the model name refers to an optimized version of GPT-3.5 with faster response times.
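For a rough sense of what that pricing means in practice, the back-of-the-envelope sketch below applies the published $0.002 per 1,000 tokens rate; the $0.02 figure for the earlier GPT-3.5 price is an assumption inferred from the stated 90% cut, not taken from a separate price list.

```python
# Back-of-the-envelope cost comparison for the figures quoted above.
# Assumption: the pre-turbo GPT-3.5 price of $0.02/1K tokens is inferred
# from the stated 90% reduction.

TURBO_PRICE_PER_1K = 0.002   # USD per 1,000 tokens, gpt-3.5-turbo
LEGACY_PRICE_PER_1K = 0.02   # USD per 1,000 tokens, assumed earlier GPT-3.5 pricing

def cost_usd(tokens: int, price_per_1k: float) -> float:
    """Cost of processing `tokens` tokens at a given per-1,000-token price."""
    return tokens / 1000 * price_per_1k

# Roughly 750 English words per 1,000 tokens, so a 75,000-word job is about 100,000 tokens.
tokens = 100_000
print(f"gpt-3.5-turbo:   ${cost_usd(tokens, TURBO_PRICE_PER_1K):.2f}")   # $0.20
print(f"earlier GPT-3.5: ${cost_usd(tokens, LEGACY_PRICE_PER_1K):.2f}")  # $2.00
```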

The significant reduction in OpenAI’s costs likely comes from a combination of optimizations: adjustments to the model architecture, improvements in algorithm efficiency and GPU utilization, and optimizations at the business, model, quantization, kernel, and compiler levels.

Adjustments to the model architecture mainly refer to techniques such as pruning, quantization, and fine-tuning that reduce the size of the model. These measures help maintain performance and accuracy while cutting computational and parameter costs, and lowering inference time and expense.
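As a minimal illustration of what pruning and quantization look like in practice, the sketch below applies PyTorch’s built-in utilities to a toy stand-in model; it is not OpenAI’s actual pipeline, and the layer sizes and pruning ratio are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a much larger network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Pruning: zero out the 30% of weights with the smallest magnitude in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# Dynamic quantization: store Linear weights as int8 and dequantize on the fly,
# shrinking memory use and typically speeding up CPU inference.
quantized_model = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(quantized_model)
```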

Algorithm and GPU optimization comes from using efficient algorithms and GPU parallel computing to speed up calculations and improve computing efficiency. Business-level optimization means tuning the performance and efficiency of the entire system, for example by using caching and prediction techniques to reduce latency and repeated calls. Model-level optimization can be achieved by streamlining the network structure, quantization optimization by using low-precision calculations to cut computational and parameter costs, and compiler-level optimization by using efficient compilers to improve code execution and computing efficiency.
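The caching idea behind business-level optimization can be sketched in a few lines; `expensive_model_call` below is a hypothetical stand-in for a billed, high-latency request to a hosted model, not a real API.

```python
import time
from functools import lru_cache

def expensive_model_call(prompt: str) -> str:
    """Hypothetical stand-in for a billed, high-latency call to a hosted language model."""
    time.sleep(1.0)  # simulate network and inference latency
    return f"answer to: {prompt}"

@lru_cache(maxsize=10_000)
def cached_model_call(prompt: str) -> str:
    """Identical prompts are served from memory instead of triggering a repeated, billed call."""
    return expensive_model_call(prompt)

cached_model_call("What is GPT-4?")  # slow: actually hits the 'model'
cached_model_call("What is GPT-4?")  # fast: answered from the local cache
```

In a production system the cache would typically live in a shared store and expire over time, but the cost-saving principle of avoiding repeated calls is the same.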

In addition, more and more companies and research institutions have entered the field of large AI models, including Google’s LaMDA (137B) and PaLM (540B), DeepMind’s Gopher (280B), BigScience’s BLOOM (175B), Meta’s OPT (175B), NVIDIA’s TNLG v2 (530B), and Tsinghua University’s GLM-130B (130B), where the numbers in parentheses are each model’s parameter count. The resulting market and price competition has pushed the prices of AI models steadily downward.

Whether other mainstream models follow this downward price trend depends on their scale and performance, as well as the level of demand for them. If they are comparable in scale and performance to GPT-3 and market demand is strong, they may also see price reductions. However, if they are smaller in scale, lower in performance, or demand weakens, prices may not drop significantly.

In the long run, as technology and the underlying software and hardware continue to improve, the cost of processing large amounts of data and training models will gradually fall, and the prices of large language models will follow. As more companies and organizations turn to large language models, market competition will also push prices down. The specific extent and timing of such reductions cannot be predetermined, however, because they depend on supply and demand and on the quality of the models on the market. For some high-end models, prices may remain buoyant, since high-quality, high-performance, high-value-added models may require more computing resources and specialist knowledge.

Have these large AI models become more powerful and intelligent as they have become more affordable? Do you agree with OpenAI CEO Sam Altman’s statement about a new AI Moore’s Law, which holds that the total amount of AI intelligence doubles every 18 months?

I agree with the new AI Moore’s Law: falling costs and growing applications will also increase the amount of language data and corpus material available for AI to learn from, thereby enhancing its capabilities. Since 2022, the global internet environment has entered a new era of large-scale AI intelligence, one of constant “Turing testing”. Unlike the image-based AI of recent years, language-based AI works more like the human brain, with a broader and deeper range of influence. However, the current level of AI capability still depends heavily on hardware, especially the performance and supply of high-end GPUs, so AI’s development remains strongly and positively correlated with Moore’s law for chips.
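Taken literally, an 18-month doubling period implies the compounding sketched below; this is purely an arithmetic illustration of the claim, not a measurement of AI capability.

```python
# Growth factor implied by "doubling every 18 months" over a horizon given in months.
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    return 2 ** (months / doubling_period_months)

print(growth_factor(36))   # 3 years  -> 4.0x
print(growth_factor(60))   # 5 years  -> ~10.1x
print(growth_factor(120))  # 10 years -> ~101.6x
```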

What are some key factors driving cost reductions in large AI models?

1. Algorithmic improvements: New algorithms and techniques are constantly being iterated and developed. These use computational resources and data more efficiently, which reduces the costs of training and inference.

2. Hardware improvements: With advancements in hardware technology, such as the emergence of specialized chips like GPUs and TPUs, more efficient computing power is available to accelerate training and inference processes, thus lowering costs.

3. Dataset size: This is critical to AI training. Larger and higher quality datasets provide more information, leading to improved accuracy and generalization of models. Additionally, more efficient data processing and storage techniques can help reduce data costs.

4. Reusable pre-trained models: Pre-trained models have become an important way to build large models. Models such as BERT and GPT have already demonstrated their capabilities, and they can serve as base models that are adapted to other tasks, reducing training time and costs (see the sketch after this list).

5. Distributed computing: Breaking down the training process into multiple tasks and running them on multiple computers can greatly shorten training time and costs.
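On point 4, the sketch below shows the reuse idea with the Hugging Face transformers library: a published BERT checkpoint is loaded and only a small classification head is trained, so the expensive pre-training is paid for once, upstream. This is a generic illustration, not the setup used by OpenAI or StarBitech.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Reuse a published pre-trained checkpoint instead of training a language model from scratch.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Freeze the pre-trained encoder and train only the new classification head,
# which further shrinks the compute bill for the downstream task.
for param in model.bert.parameters():
    param.requires_grad = False

inputs = tokenizer("Large AI models are getting cheaper.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2])
```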

John Zhang is the founder and CEO of StarBitech, a member of the Information Technology Innovation Committee of the China Communications Industry Association, and an expert consultant for the Shanghai Technology Exchange. He...