Large AI models keep improving, with applications based on them growing like wildfires (and I use that simile deliberately). Here's a bunch of questions I was asked recently by journalists in this area, with my replies. An NBD article that quotes some of this was published today.
1. What’s your take on the growing popularity of ChatGPT? Why ChatGPT? Why in 2022?
ChatGPT is one of the use cases of OpenAI’s GPT 3.5 language models that came out in early 2022. ChatGPT was released at the end of 2022 and immediately caused a stir because it is very good at chatting compared to most other NLP systems to date.
2. What are the upsides of ChatGPT compared to other content creation tools?
It can automatically create content from a simple request. For example, it can explain something complex in simple terms, often writing quite well. This is automatic content creation, compared to current tools that might help at most with spelling or grammar.
3. A report claims that the unicorn in Texas Jasper was winning the AI race, but ChatGPT blew up the whole game. Could you please share with us the impact of ChatGPT on products of the same kind? Will it shrink the development space or lead to M&A or closure of some startups in the AIGC industry?
Jasper is another application that uses the same underlying technology. Jasper is focussed specifically on content creation with lots of templates, and can generate nice text for websites, blogs and many other kinds of media. ChatGPT and Jasper are two of many such applications - there are more and more companies focussing on the use of these pre-trained large language models, so there will be an increasing number of applications in the coming months.
4. Meta's chief AI scientist says ChatGPT is “not particularly innovative”, and “nothing revolutionary”, but it seems everyone believes it is the next big thing of the AI industry. In your opinion, will it shake the existing landscape of the AI industry and are Google and Meta lagging far behind in this field?
It is the latest large pretrained model to be released, and it causes a stir because its abilities are impressive. As with most of these large AI models, millions of dollars were spent developing and training it, so when something so expensive and impressive gets freely released for use, people get excited. Google and Meta have big enough budgets to catch up, and they are working on their own much bigger models, so it will not be long before they have even more impressive results to show.
5. Since works created by ChatGPT are based on a large amount of data, are there any legal risks, for example, copyright infringement or legal disputes due to errors made, harmful instructions or biased content? And are the works themselves protected by copyright laws? Could you elaborate on that?
Yes, there are many problems around training of the models. Often huge amounts of data scraped from the Internet are used as training data. It means that when ChatGPT suggests new computer code for you, it may be duplicating the code that someone else wrote - and that code may be protected by copyright. The same could apply to any content produced by these systems. At present there are few countries that permit computer-generated content to be copyrighted itself.
6. What are the downsides or limitations of ChatGPT? The chatbox is said to have limited knowledge of world and events after 2021, will this hinder its further development or commercial use?
ChatGPT is based on a model completed in early 2022, so it cannot be aware of anything after that. New versions are being produced now, which will know more. However, every time one of these massive models is trained it requires huge computational resources and huge costs - this is not really very sustainable in terms of cost or the environment, so it would be better in the future to have more effective, smaller models, each perhaps focussed on a specific kind of data, that can be trained faster. At present the different companies are in an arms race towards bigger and bigger models so this is not moving in a sensible direction.
7. In your opinion, in which area can ChatGPT be maximized and fully leveraged since it can do a lot of things, such as writing papers and creating music?
It’s hard to predict which areas will benefit most as this technology is still very new. However, there are some concerns that these models can “hallucinate” results - they provide content that looks correct but is entirely fake. This is extremely dangerous in science and education where we need to ensure the accuracy of everything, so considerable work is still needed to prevent this effect for such applications. Inaccurate results are not useful for search engines or journalism either! Creating strange or unusual results is not a problem in the arts, so for fiction, poetry, music or art there are many immediate applications. Sadly, the easiest use-case is for online content such as advertising or so-called “fake news” or “clickbait” where accuracy is irrelevant. If the technology is misused in this way, then we have the potential to fill the Internet with computer-generated junk - something that ChatGPT could do at unbelievable speeds (and it’s already very bad with human-generated junk).
8. If we look at a bigger picture, what are the trends of the AIGC industry and which niche areas will be highly favored by venture capitals?
The venture capitalists will favour those applications that look like they will make the most money fastest. These applications may not always be the ones best for our societies. My personal preference would be applications that carefully shape the AIGC industry. We do not need more computer-generated online content - we have enough content just generated by people. In my view, applications that focus on this area are short-sighted and harmful. We do need better ways to educate people and to let people discover useful new findings hidden within large amounts of data. Most importantly, today we need better methods to verify the accuracy of content. We need to be able to trace exactly where every claim, image, or line of code originated, so that we can verify if it is true and real. If we cannot do this, the Internet may become nothing more than an ocean of random fictions. That’s a big problem if future AIs are trained using it!
9. Does Google's specific business overlap with ChatGPT? What are the advantages of both of them?
ChatGPT has the potential to change how we search. Instead of us trying many different search terms in the search engine and looking through the list of results (and adverts), a large language model can do things differently. We can ask the model a question in normal sentences and it can summarise the topic in a nicely written piece of text, providing links to the source webpages that it used. The advantage of this approach is that we do not have to read through long lists of results any more. The disadvantage is that ChatGPT and systems like it may not always return correct results. Sometimes they even “hallucinate” results - they make up entirely fake statements that sound correct. So a good search engine using ChatGPT technology would need to be very carefully created so that it only produces results that are entirely correct and accurately summarise real content on websites.
10. Do you think ChatGPT will pose a substantial threat to Google as a whole? Why?
Because search might change in this way, there would no longer be the same opportunity for the search engine to return adverts. The entire business model of Google is based on adverts - if you lose this opportunity, then your business is in trouble. I suspect Google will integrate these technologies into search quite quickly. Somehow they will have to find a way to keep providing adverts at the same time, while also making it clear to the user which results are linked to normal websites and which are from sponsored content. I’m sure they will design a method to do this quickly!