Researchers in Berlin are studying how reliably ChatGPT provides scientifically grounded information on climate change. They found that the AI usually provides correct answers, but that it should never be trusted blindly: checking sources is more important than ever, yet far from easy.
ChatGPT and other large language models, which are based on machine learning and large datasets, are penetrating almost every sector of society. Companies and researchers who do not enlist their help are increasingly seen as anachronistic. But is the information coming from artificial intelligence reliable enough? Scientists at the Technical University of Berlin tested this using the topic of climate change: they asked ChatGPT questions about it and examined the answers for accuracy, relevance, and possible errors and contradictions.
Its impressive capabilities have made ChatGPT a potential source of information on many different topics, the Berlin team writes in its paper published in "Ecological Economics". However, not even the developers themselves can fully explain how a given response comes about. That may be acceptable for creative tasks such as writing a poem, but it becomes a problem for topics such as the consequences of climate change, where accurate, fact-based information is essential.
According to the researchers, it is therefore important to examine the quality of the answers ChatGPT provides in such subject areas, among other things in order to separate the misinformation circulating in public debate and the media from scientifically grounded findings.
Hallucinations and useless hypotheses
That is not easy. To make matters worse, the AI can "hallucinate": ChatGPT then makes factual claims that cannot be backed by any source. Furthermore, according to the TU team, the language model tends to "make meaningless assumptions instead of declining to answer unanswerable questions."
The big danger is that ChatGPT users take incorrect answers at face value because they are worded plausibly and semantically correctly. Previous research has shown that people give more weight to the AI's recommendations when they are unfamiliar with the topic under discussion, have used ChatGPT before, and have received accurate recommendations from the model, the researchers write.
The Berlin team is particularly interested in the topic because, as part of the research project Green Consumption Assistant, it is developing an AI-supported assistant that helps consumers make more sustainable purchasing decisions online. Previous research has only shed light on ChatGPT's general capabilities but does not capture its ability to answer questions about climate change, the researchers write.
To clarify this, they asked ChatGPT a total of 95 questions and evaluated the answers for accuracy, relevance, and consistency. The team checked the quality of the responses against public and reliable sources of information on climate change, such as the current report of the Intergovernmental Panel on Climate Change (IPCC).
Mostly high-quality answers
The researchers took into account that the language model is constantly evolving. Among other things, they tested whether the same input (prompt) produced different results at different times. The first round of questions was conducted last February using ChatGPT-3.5, the second in mid-May of this year using the newer version of the model. Its knowledge base was recently updated and now extends to April 2023; previously, the model only contained information up to September 2021.
The results might therefore differ today. For follow-up studies, the researchers suggest multiple rounds of questions at shorter intervals. They see a further limitation of their work in the possibly too small number of experts who evaluated the responses. Moreover, the questions and their wording were not based on actual user data: people today might ask ChatGPT different questions, worded differently, which would produce different results.
The now-published study showed that the quality of the model's responses is generally high: on average, they were rated 8.25 out of 10 points. "We have observed that ChatGPT provides balanced and nuanced arguments and concludes many answers with a comment that encourages critical review to avoid biased answers," says Maike Gossen of TU Berlin. For example, in answering the question "How is marine life affected by climate change and how can negative impacts be reduced?", ChatGPT mentioned not only reducing greenhouse gas emissions but also reducing the non-climate impacts of human activities, such as overfishing and pollution.
A non-negligible error rate
More than half of the answers even received the top accuracy rating of 10 out of 10. But one should not rely on the results always being this good: in 6.25 percent of the answers, accuracy scored no more than 3 points, and in 10 percent, relevance scored no higher than 3.
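The reported figures are simple tallies over expert ratings on a 10-point scale. As a minimal sketch of that arithmetic, the snippet below computes the same kind of summary statistics; the scores here are invented for illustration, not the study's actual data:

```python
# Hypothetical sketch: tallying expert ratings the way the study reports them.
# The four score records below are made up; the paper's raw data is not shown here.
ratings = [
    {"accuracy": 9, "relevance": 10},
    {"accuracy": 2, "relevance": 8},
    {"accuracy": 10, "relevance": 2},
    {"accuracy": 8, "relevance": 9},
]

n = len(ratings)
# Share of answers whose accuracy scored 3 points or fewer (cf. 6.25% in the study)
low_accuracy = sum(r["accuracy"] <= 3 for r in ratings) / n * 100
# Share of answers whose relevance scored 3 points or fewer (cf. 10% in the study)
low_relevance = sum(r["relevance"] <= 3 for r in ratings) / n * 100
# Mean accuracy score (cf. the 8.25/10 average reported for the real data)
mean_accuracy = sum(r["accuracy"] for r in ratings) / n

print(f"low accuracy:  {low_accuracy:.2f}%")   # 25.00% for this toy sample
print(f"low relevance: {low_relevance:.2f}%")  # 25.00% for this toy sample
print(f"mean accuracy: {mean_accuracy:.2f}")   # 7.25 for this toy sample
```

With 95 questions, as in the study, a single answer corresponds to roughly 1.05 percentage points, which is why thresholds like 6.25 percent describe only a handful of answers.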
Among the incorrectly answered questions, the most common error stemmed from hallucinated facts. For example, ChatGPT's answer to the question "What percentage of recyclable waste is actually recycled in Germany?" was correct in broad terms but not in detail: according to the Federal Environment Agency, the figure in 2020 was 67.4 percent, while ChatGPT reported 63 percent.
ChatGPT fabricates, but sounds believable
In some cases, ChatGPT generated false or fabricated information, such as invented references or fake links, including to articles and purported contributions in scientific journals. Further errors occurred in cases where ChatGPT cited specific, correct scientific or literary sources but drew incorrect conclusions from them.
The researchers also observed that ChatGPT's inaccurate responses were so plausibly worded that they were wrongly taken to be correct. "Because text generators like ChatGPT are trained to give answers that sound right to people, the confident answer style can trick people into believing the answer is correct," says Maike Gossen.
The team also encountered misinformation from public discourse, as well as biases. For example, some of ChatGPT's incorrect responses reflected misunderstandings about effective climate action. These included overestimating individual behavioral changes, as well as individual measures with little impact that can slow down structural and collective changes with far greater effect. At times, the responses also seemed overly optimistic about technological solutions as a key tool for mitigating climate change.
Valuable but fallible source
Large language models like ChatGPT could be a valuable source of information on climate change, the scientists conclude. However, there is a risk that they spread and amplify false information about climate change, because they reproduce outdated facts and misunderstandings that are already in circulation.
Their brief study shows that checking sources of environmental and climate information is more important than ever. Yet recognizing incorrect answers often requires in-depth expertise in the relevant field, precisely because they seem plausible at first glance.