John Zhang, Author at TechNode
https://technode.com
Latest news and trends about tech in China

Why are AI models getting cheaper as they improve?
https://technode.com/2023/03/27/why-are-ai-models-getting-cheaper-as-they-improve/
Mon, 27 Mar 2023 08:00:00 +0000

AI-powered chatbot ChatGPT has upped its game in the months since it was launched. As its runaway success continues, three recent announcements indicate that rapid commercialization of the technology is likely to follow. On March 14, OpenAI launched GPT-4, a model that supports multimodal input and surpasses GPT-3.5, the model behind ChatGPT, in complex reasoning and performance. GPT-4 attracted widespread attention upon its release. Then, on March 16, Baidu released ERNIE Bot, a chatbot rival to ChatGPT. Prior to both, on March 1, OpenAI opened up ChatGPT's API (application programming interface) and reduced usage costs by 90%.

As AI technology develops, large-scale AI models such as GPT are seeing falling costs. So why are AI models becoming more affordable?

John Zhang, founder of StarBitech, discussed this issue with TechNode in a Q&A format. StarBitech is a digital content asset technology company founded in 2015, jointly invested in by the Shanghai Tree-Graph Blockchain Research Institute and digital display company Fengyuzhu. The company recently received support from Microsoft and OpenAI and will leverage its strengths in Chinese natural language processing and local compliance to develop AIGC (AI-generated content) services in visual content creation and marketing content creation. These services will be supported by GPT, DALL-E, and reinforcement learning, providing AI capabilities geared towards marketing, gaming, animation, culture and tourism, and government.

Why are large AI models like GPT becoming increasingly affordable, and will other mainstream models follow the trend?

The decreasing cost of large AI models is mainly due to continuous technological advancement and intensifying competition. According to OpenAI, the GPT-3.5-turbo model used by ChatGPT costs only $0.002 per 1,000 tokens (approximately 750 words), a 90% reduction from the previous cost of using GPT-3.5. The "turbo" refers to an optimized version of GPT-3.5 with faster response times.
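
As a back-of-the-envelope illustration of that pricing, the cost of a job can be estimated from its word count. A minimal sketch, assuming the rough 750-words-per-1,000-tokens ratio quoted above (a real tokenizer will vary):

```python
# Estimated API cost at the March 2023 gpt-3.5-turbo price of
# $0.002 per 1,000 tokens (~750 English words).
PRICE_PER_1K_TOKENS = 0.002
WORDS_PER_1K_TOKENS = 750  # rough rule of thumb, not an exact tokenizer

def estimated_cost(word_count: int) -> float:
    """Approximate cost in USD for processing `word_count` words."""
    tokens = word_count * 1000 / WORDS_PER_1K_TOKENS
    return tokens / 1000 * PRICE_PER_1K_TOKENS

# A 75,000-word novel comes to roughly 100,000 tokens, or about $0.20.
print(round(estimated_cost(75_000), 2))
```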

The significant reduction in OpenAI's costs likely comes from optimizations at multiple levels, including adjustments to the model architecture, algorithm and GPU efficiency, and business-level, model-level, quantization, kernel-level, and compiler-level optimization.

Adjustments to the model architecture mainly refer to techniques such as pruning, quantization, and fine-tuning that reduce the size of the model. These measures improve performance and accuracy while cutting computational and parameter costs, lowering inference time and cost.
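
Of those techniques, quantization is the easiest to show in miniature: float weights are mapped to 8-bit integers sharing one scale factor, shrinking storage roughly 4x versus 32-bit floats. This is a toy sketch for illustration only; production frameworks use per-channel scales and calibration data:

```python
# Toy post-training quantization: float weights -> int8 values plus a scale.

def quantize(weights):
    """Map floats to integers in [-127, 127] with a shared scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]

weights = [0.81, -0.52, 0.07, -1.27]
q, scale = quantize(weights)
restored = dequantize(q, scale)
# Every restored weight lies within half a quantization step of the original.
```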

Algorithm and GPU optimization means using efficient algorithms and GPU parallel computing to speed up calculations and improve computing efficiency. Business-level optimization improves the performance and efficiency of the whole system, for example by using caching and prediction techniques to reduce latency and repeated calls. Model-level optimization streamlines the network structure. Quantization optimization reduces computational and parameter costs by using low-precision calculations. Kernel-level optimization tunes the low-level compute kernels, and compiler-level optimization uses efficient compilers to improve code execution and computing efficiency.
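
The caching side of business-level optimization is easy to sketch with Python's functools.lru_cache: identical requests are answered from memory instead of triggering another expensive inference call. `run_model` below is a hypothetical stand-in for a real backend:

```python
from functools import lru_cache

calls = 0  # counts how often the expensive path actually runs

@lru_cache(maxsize=1024)
def run_model(prompt: str) -> str:
    """Hypothetical stand-in for an expensive model inference call."""
    global calls
    calls += 1
    return f"answer to: {prompt}"

run_model("What is a token?")
run_model("What is a token?")  # identical prompt: served from the cache
print(calls)  # prints 1
```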

In addition, as more companies and research institutions enter the field of large AI models, such as Google's LaMDA (137B) and PaLM (540B), DeepMind's Gopher (280B), BigScience's BLOOM (176B), Meta's OPT (175B), Nvidia and Microsoft's Megatron-Turing NLG (530B), and Tsinghua University's GLM-130B (130B), market competition has intensified and price competition has begun, driving a continuous decrease in the prices of AI models. (The numbers in parentheses are the models' parameter counts.)

Whether other mainstream models will follow this trend of decreasing prices or not depends on their scale and performance, as well as their level of demand. If these models are comparable in scale and performance to the GPT-3 model and there is strong market demand, they may also see price reductions. However, if these models are smaller in scale, lower in performance, or demand weakens, prices may not drop significantly. 

In the long run, with the continuous development of technology and the progress of software and hardware, the cost of processing large amounts of data and training models will gradually decrease, and the prices of large language models will follow. In addition, as more companies and organizations turn to large language models, market competition will push prices down. The specific extent and timing of such price reductions cannot be predetermined, because they depend on supply, demand, and the quality of models on the market. That said, for some high-end models, prices may remain buoyant, as high-quality, high-performance, high-value-added models require more computing resources and professional knowledge.

Have these large AI models become more powerful and intelligent as they have become more affordable? Do you agree with OpenAI CEO Sam Altman's proposed new AI Moore's Law, which states that the total amount of AI intelligence doubles every 18 months?

I agree with the new AI Moore's Law. The decrease in costs and increase in applications will also increase the amount of language data and corpus available for AI to learn from, thereby enhancing its capabilities. Starting in 2022, the global internet entered a new era of large-scale AI intelligence, in which AI is constantly being put to the Turing test. Unlike the image-based AI of recent years, language-based AI is more like the human brain, with a broader and deeper range of influence. However, the current level of AI capability still depends heavily on hardware, especially the performance and supply of high-end GPUs. AI's development is therefore strongly and positively correlated with Moore's Law for chips.

What are some key factors driving cost reductions in large AI models?

1. Algorithmic improvements: New technologies are constantly being iterated and developed. These are more efficient at using computational resources and data, which reduces the costs of training and inference.

2. Hardware improvements: With advancements in hardware technology, such as the emergence of specialized chips like GPUs and TPUs, more efficient computing power is available to accelerate training and inference processes, thus lowering costs.

3. Dataset size: This is critical to AI training. Larger and higher quality datasets provide more information, leading to improved accuracy and generalization of models. Additionally, more efficient data processing and storage techniques can help reduce data costs.

4. Reusable pre-trained models: Pre-trained models have become an important way to train large models. Models such as BERT and GPT have already demonstrated their capabilities. These models can serve as base models to train other models, reducing training time and costs.

5. Distributed computing: Breaking down the training process into multiple tasks and running them on multiple computers can greatly shorten training time and costs.
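
The distributed-computing idea in point 5 can be sketched as data parallelism: split the batch across workers, let each compute a gradient on its shard, then average the gradients. Below is a single-process toy simulation on a one-parameter least-squares model; real systems run the shards concurrently across machines:

```python
def gradient(w, shard):
    """Mean-squared-error gradient for the model y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def parallel_step(w, data, n_workers, lr=0.01):
    """One data-parallel update: shard the batch, compute gradients, average."""
    size = len(data) // n_workers
    shards = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    grads = [gradient(w, s) for s in shards]  # would run concurrently
    return w - lr * sum(grads) / len(grads)

data = [(x, 3 * x) for x in range(1, 9)]  # points on the line y = 3x
w = 0.0
for _ in range(200):
    w = parallel_step(w, data, n_workers=4)
# w converges to the true slope of 3
```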

Why does China need its own version of ChatGPT?
https://technode.com/2023/02/15/why-does-china-need-its-own-version-of-chatgpt/
Wed, 15 Feb 2023 07:38:30 +0000
John Zhang, CEO of StarBitech, an AI startup supported by Microsoft, explained why Chinese firms are racing to develop their own chatbot tech.

ChatGPT has become the talk of China's tech and business communities, with major Chinese tech companies racing to prove they have similar capabilities or are developing similar services. TechNode talked to John Zhang, CEO of StarBitech, a digital asset startup based in Shanghai and supported by Microsoft for Startups, about why Chinese tech majors are rushing to push out their own versions of ChatGPT. Below is an edited version of the conversation.

1. Why are Chinese tech companies developing their own AI chatbots like ChatGPT? For example, Baidu announced last week that its lookalike product, ERNIE Bot, or Wenxin Yiyan in Chinese, will be launched in March. 

There are three reasons for this. First, from a market perspective, ChatGPT is currently not available to Chinese users. They can’t use it as easily as overseas users. So it’s inevitable that there will be a local ChatGPT-like service to satisfy demand. 

Second, from a technological perspective, most large language models (LLMs) currently available on the market, like ChatGPT, are trained on English as the primary language. Their natural language processing (NLP) performance in Chinese is still inferior to that of English. So a model trained with Chinese as the primary language will further improve user effectiveness.

The third reason is data security. AI generates content after training on large amounts of data, and OpenAI seems to be gradually shifting from a non-profit project to a market-oriented one, so there could be uncertainty in the future. Additionally, mainland China requires all data to be stored locally, but OpenAI does not have a team in the country, making it difficult to meet regulatory requirements for local data storage and maintenance.

2. Can China’s AI chatbot compete with ChatGPT and its peers? 

In the short term, it's still difficult for Chinese AI chatbots to compete. OpenAI entered the stage of large-scale GPU cluster training after receiving investment from Microsoft. OpenAI is said to own thousands of Nvidia A100 chips, and Microsoft's billion-dollar investment was mostly delivered in Azure cloud resources. Microsoft and OpenAI have just begun the next round of financing and collaboration, and over the past three years they have burned billions of dollars in cloud resources on training. Such large-scale investment is very rare in China's internet circle, especially in underlying infrastructure technology. Most big investments in China focus on the application side. 

But in the long run, China's AI chatbots will become more powerful. The country has superior algorithm engineers, a unified large market, abundant application scenarios and data sources, and a cost advantage, as Alibaba Cloud and Tencent Cloud are cheaper than Microsoft Azure. 

3. Do you think China is ready in terms of big data and language models?

In terms of big data, China is ahead of the game. It's highly digitized, so it has access to abundant data and a complete industrial chain. However, when it comes to language models, there's still room for improvement. Models like the GPT-3.5 used in ChatGPT are large models that require significant investment and are slow to show returns, which isn't an attractive option for many Chinese investors. As a result, only a few major internet companies have participated, with limited investment, slowing China's progress in language models. But the popularity of ChatGPT is a wake-up call for both Chinese investors and internet companies. I expect to see larger investments in the future.

4. How would Chinese AI chatbots differ from others, regarding application and regulations? 

Currently, in China, large-scale chatbots are applied in NLP tasks such as machine translation, intelligent customer service, and Q&A platforms. As the development of LLM progresses, China will also popularize AI chatbots based on LLM. 

AI chatbots developed in China should, first of all, be eloquent in Chinese: they need to be able to understand commands in Chinese. In addition, for a better communication experience, a chatbot must have knowledge of Chinese culture and history, and communicate in a way that fits the Chinese language style and expression; the same word may carry different meanings and emotions in different contexts. Furthermore, chatbots should provide more personalized services based on Chinese users' habits and needs, such as payment methods or ethnic customs unique to China.

Chinese-developed chatbots also need to comply with Chinese laws and regulations, including its Data Security Law, Cybersecurity Law, Personal Information Protection Law, and Administrative Measures for Internet Information Services. These laws aim to protect personal information (prevent its illegal acquisition, use, and dissemination), prevent information leaks and misuse, safeguard network security, prevent network attacks and fraudulent activities, and regulate internet information services. With the increasing popularity of chatbots and the continuous improvement of Chinese laws and policies, it is expected that more comprehensive and targeted regulations will be developed in the future to regulate chatbots.

5. Has your team used GPT (Generative Pretrained Transformer, OpenAI’s language model upon which ChatGPT is developed)? What challenges and limitations do you see with this tool?

  • Biases. The model is trained on a large amount of text data. If the training data contains biases, the model will exhibit them too. For example, a lack of Chinese-language data, particularly on Chinese history, culture, and society, may lead the model to output biased information.
  • The model lacks a broad, bird's-eye perspective. Although GPT can maintain coherence in context, it lacks the ability to think more broadly. 
  • Lack of language diversity. GPT is trained mostly based on English material, limiting its compatibility and understanding of other languages.
  • High computation cost. GPT is a very large neural network model, with parameter counts ranging from hundreds of millions in early versions to hundreds of billions in GPT-3. Model files range from hundreds of megabytes to hundreds of gigabytes. Training such a model consumes a significant amount of computing resources and time.
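
The size figures above follow directly from parameter count times bytes per parameter. A small helper makes the arithmetic explicit; GPT-3's reported 175 billion parameters at 4-byte precision work out to roughly 652 GB of raw weights:

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def model_size_gb(n_params: int, dtype: str = "fp32") -> float:
    """Raw weight storage in gibibytes for a given parameter count."""
    return n_params * BYTES_PER_PARAM[dtype] / 1024**3

print(round(model_size_gb(175_000_000_000, "fp32")))  # prints 652
print(round(model_size_gb(175_000_000_000, "int8")))  # prints 163
```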

6. Has your team used any China-developed AI language models? How do they compare to GPT?

Comparing the self-developed Chinese AI language models currently available with GPT: 

  • Some can support different voice responses, which are not currently supported by GPT.
  • Regarding language support, there is a greater focus on Chinese-language communication, while GPT has a deeper understanding of English.
  • In the application field, Chinese models are more narrowly focused on dialogue generation. By comparison, GPT is a general language generation model that can be used for text generation, code writing, and more.
  • In terms of communication, Chinese models tend to deliver short-sentence communication, while GPT has a strong understanding of long sentences.

7. What are some features or functions that your team would like to achieve using AI language models, but have yet to do?

Current AI-powered chatbots may have achieved impressive results, but there is still room for improvement. One area is the understanding of context and emotions. Chatbots have a limited understanding of things such as one word having different meanings based on the context. 

Another issue is that chatbots can lack coherence in continuous communication on the same topic. Moreover, they lack creativity, as they primarily integrate and sort existing knowledge. This means they do not meet the requirement for independent thinking and creating new ideas.

8. Could you give us an introduction to your company?

StarBitech is a digital content asset technology company founded in 2015. It is jointly invested in by the Shanghai Tree-Graph Blockchain Research Institute and Fengyuzhu and is located at the Microsoft Accelerator in the Caohejing Development Zone in Shanghai. The company focuses on providing individuals and businesses with algorithm-driven digital asset creation and publishing services. StarBitech has worked with companies such as China Merchants Bank, Huawei, LVMH, Shanghai Public Security Jing’an Branch, and the Shanghai Technology Exchange.

The company has recently received support from Microsoft and OpenAI and will leverage its strengths in Chinese natural language processing and local compliance to develop AIGC (AI-generated content) services in fields such as chatbots, visual content creation, and marketing content creation. These services will be supported by GPT, DALL-E, and reinforcement learning, providing AI capabilities for industries such as marketing, gaming, animation, culture and tourism, and government.
