AI "hallucination"
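The official term in the field of AI is "hallucination." This refers to the fact that an AI system sometimes "makes stuff up." This happens because these systems are probabilistic, not deterministic: they generate text by choosing likely next words rather than by looking up verified facts.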
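To make "probabilistic, not deterministic" concrete, here is a minimal toy sketch in Python. It is not how any real model is implemented; the word list, the probabilities, and the function names are all invented for illustration. It only shows the general idea that sampling from a probability distribution can give different, plausible-looking answers on different runs, while a deterministic rule always gives the same one.

```python
import random

# Hypothetical next-word probabilities for a prompt such as
# "The paper was written by ...". These words and numbers are
# invented for illustration and do not come from a real model.
NEXT_WORD_PROBS = {
    "Smith": 0.5,
    "Jones": 0.3,
    "Garcia": 0.2,
}


def sample_next_word(probs, rng):
    """Probabilistic choice: pick one word at random, weighted by probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def most_likely_word(probs):
    """Deterministic choice: always return the highest-probability word."""
    return max(probs, key=probs.get)


rng = random.Random(0)  # fixed seed so the demonstration is repeatable
samples = [sample_next_word(NEXT_WORD_PROBS, rng) for _ in range(10)]

print("Sampled answers:", samples)
print("Deterministic answer:", most_likely_word(NEXT_WORD_PROBS))
```

Every sampled answer sounds equally confident, whether or not it happens to be true, which is one reason the output always needs checking.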
Which models are less prone to this?
GPT-4 (the more capable model behind ChatGPT Plus and Bing Chat) is less prone to hallucination than earlier models. According to OpenAI, it's "40% more likely to produce factual responses than GPT-3.5 on our internal evaluations." You still need to verify its output.
ChatGPT makes up fictional sources
One area where ChatGPT usually gives fictional answers is when it is asked to create a list of sources. See the Twitter thread, "Why does chatGPT make up fake academic papers?" for a useful explanation of why this happens.
Scholarly sources as grounding
There are also systems that combine language models with scholarly sources. For example:
- Elicit
A research assistant using language models like GPT-3 to automate parts of researchers’ workflows. Currently, the main workflow in Elicit is Literature Review. If you ask a question, Elicit will show relevant papers and summaries of key information about those papers in an easy-to-use table. Learn more in Elicit's FAQ.
- Consensus
A search engine that uses AI to search for and surface claims made in peer-reviewed research papers. Ask a plain English research question, and get word-for-word quotes from research papers related to your question. The source material used in Consensus comes from the Semantic Scholar database, which includes over 200M papers across all domains of science.