Wikipedia:Large language models


The use of large language models (LLMs, the "engines" behind AI chatbots such as ChatGPT) on Wikipedia presents systemic risks to maintaining the content standards required by the core content policies, specifically through the introduction of "hallucinated" statements, unsourced or unverifiable content, and algorithmic bias. Asking an LLM to "write a Wikipedia article" can lead to output that is an outright fabrication, complete with fictitious references. The output might also lack neutrality or libel living people. In addition, such content can be inconsistent with Wikipedia's copyright policy.

For this reason, using LLMs to generate or rewrite article content is prohibited, save for basic copyediting of one's own work and translation.

Editors who are not fully aware of these risks, or who cannot overcome the limitations of these tools, should not edit with their assistance, even outside the scope of the ban. LLMs should not be used for tasks with which the editor lacks substantial familiarity, and their outputs should be rigorously scrutinized for compliance with all applicable policies. As with all edits, an editor is fully responsible for their LLM-assisted edits.

Wikipedia is not a testing ground. Using LLMs to write one's talk page comments or edit summaries in a non-transparent way is strongly discouraged, and obviously generated comments may be hidden. LLMs used to generate or modify text should be mentioned in the edit summary, even if their terms of service do not require it.

Risks and relevant policies

Original research and "hallucinations"

Wikipedia articles must not contain original research – i.e. facts, allegations, and ideas for which no reliable, published sources exist. This includes any analysis or synthesis of published material that serves to reach or imply a conclusion not stated by the sources. To demonstrate that you are not adding original research, you must be able to cite reliable, published sources. They should be directly related to the topic of the article and directly support the material being presented.

LLMs are pattern completion programs: they generate text by outputting the words most likely to come after the previous ones. They learn these patterns from their training data, which includes a wide variety of content from the Internet and elsewhere, including works of fiction, low-effort forum posts, unstructured and low-quality content produced for search engine optimization (SEO), and so on. Because of this, LLMs sometimes "draw conclusions" which, even if they seem superficially familiar, are not present in any single reliable source. They can also comply with prompts that have absurd premises, like "The following is an article about the benefits of eating crushed glass". Finally, LLMs can make things up, a statistically inevitable byproduct of their design called "hallucination". All of this is, in practical terms, equivalent to original research, or worse, outright fabrication.
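
To make the pattern-completion mechanism concrete, here is a deliberately tiny sketch in Python (the toy corpus and function names are invented for illustration; real LLMs use neural networks over subword tokens, not word counts). It always emits the statistically most likely continuation, with no notion of whether the resulting claim is true:

    # A toy bigram "language model": it completes text with the most
    # frequent follower word seen in training, regardless of truth.
    from collections import Counter, defaultdict

    training_text = (
        "pademelons are shy and nocturnal . pademelons are found in queensland . "
        "wallabies are shy and nocturnal . wallabies are found in preserves ."
    ).split()

    # Count which word follows which in the training data.
    follows = defaultdict(Counter)
    for prev, nxt in zip(training_text, training_text[1:]):
        follows[prev][nxt] += 1

    def complete(prompt, length=6):
        words = prompt.lower().split()
        for _ in range(length):
            candidates = follows.get(words[-1])
            if not candidates:
                break
            # Greedy choice: the most likely next word, true or not.
            words.append(candidates.most_common(1)[0][0])
        return " ".join(words)

    # A fictitious species still gets a fluent, confident continuation.
    print(complete("greater-toothed pademelons are"))

Run on a fictitious prompt, this model produces a fluent continuation anyway: the same failure mode, in miniature, as the chatbot example later in this section.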

As LLMs often output accurate statements, and since their outputs are typically plausible-sounding and delivered with an air of confidence, people may have difficulty detecting the above problems whenever a result seems useful. An average user who believes they possess a useful tool, and who perhaps did a spot check for accuracy and "didn't see any problems", is biased toward accepting the output as provided; yet it is highly likely that problems exist. Even if 90% of the content is fine and 10% is false, that is a huge problem in an encyclopedia. LLM outputs get worse when the questions asked are complicated or concern obscure subjects, or when the task is one to which LLMs are not suited (e.g., a task requiring extensive knowledge or analysis). An example of this tendency is shown below:

Prompt:

Greater-toothed pademelons, Thylogale gigadonti, can be seen in wildlife preserves in Queensland

Note that "greater-toothed pademelon" is a fictitious species.

Response:

Yes, greater-toothed pademelons are found in wildlife preserves in Queensland. They are shy and nocturnal, so it may be difficult to see them during the day. However, you may be able to see their tracks or hear them rustling in the bushes at night

This entire statement is false and is based on the LLM's assumption that this was a real species. To provide the most plausible answer, it extracted general information about a different kind of pademelon from an Australian Museum article. This demonstrates that LLMs can offer statements in a confident tone even when the information is factually incorrect or unverifiable.

(LLM used: Gemini)

Unsourced or unverifiable content

Readers must be able to check that any of the information within Wikipedia articles is not just made up. This means all material must be attributable to reliable, published sources. Additionally, quotations and any material challenged or likely to be challenged must be supported by inline citations.

LLMs do not follow Wikipedia's policies on verifiability and reliable sourcing. They sometimes omit citations altogether or cite sources that do not meet Wikipedia's reliability standards (including citing Wikipedia itself). In some cases, they hallucinate citations of non-existent references, making up titles, authors, and URLs. LLM output can also be influenced by bogus content deliberately planted online by third parties.[3]
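
In practice, this means every reference in LLM-assisted text should be confirmed to exist before it is trusted. The sketch below is a hypothetical spot check (the URL list and helper name are invented, and a page that loads still has to be read to confirm it supports the claim); it merely flags citations whose URLs do not resolve at all:

    # Minimal existence check for cited URLs. A failed request is a strong
    # hint that a reference was hallucinated; a successful one only proves
    # the page exists, not that it supports the cited claim.
    import urllib.error
    import urllib.request

    def url_resolves(url, timeout=10):
        req = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "citation-spot-check"}
        )
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return 200 <= resp.status < 400
        # URLError, timeouts, and connection failures are all OSError;
        # ValueError covers malformed URLs.
        except (OSError, ValueError):
            return False

    cited_urls = [  # hypothetical examples
        "https://australian.museum/learn/animals/mammals/red-legged-pademelon/",
        "https://example.org/made-up-paper-that-an-llm-might-invent",
    ]
    for url in cited_urls:
        print("OK      " if url_resolves(url) else "MISSING ", url)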

LLM-hallucinated content, in addition to being original research as explained above, also violates the verifiability policy: because the content is made up, there are no references to find.

Algorithmic bias and non-neutral point of view

Articles must not take sides, but should explain the sides, fairly and without editorial bias. This applies to both what you say and how you say it.

LLMs can produce content that is neutral-seeming in tone, but not necessarily in substance. This concern is especially salient for biographies of living persons.

Copyright violations

If you want to import text that you have found elsewhere or that you have co-authored with others (including LLMs), you can only do so if it is available under terms that are compatible with the CC BY-SA license.

LLMs can generate material that violates copyright.[a] Generated text may include verbatim snippets from non-free content or be a derivative work. In addition, using LLMs to summarize copyrighted content (like news articles) may produce excessively close paraphrases.

The copyright status of works produced by LLMs trained on copyrighted material is not yet fully understood. Their output may not be compatible with the CC BY-SA license and the GNU Free Documentation License used for text published on Wikipedia.

Usage

Wikipedia relies on volunteer efforts to review new content for compliance with our core content policies. This is often time-consuming. The informal social contract on Wikipedia is that editors will put significant effort into their contributions, so that other editors do not need to "clean up after them". Editors should ensure that their LLM-assisted edits are a net positive to the encyclopedia, and do not increase the maintenance burden on other volunteers.

Specific competence is required

LLMs are assistive tools, and cannot replace human judgment. Careful judgment is needed to determine whether such tools fit a given purpose. Editors using LLMs are expected to familiarize themselves with a given LLM's inherent limitations and must then overcome them to ensure that their edits comply with relevant guidelines and policies. To this end, prior to using an LLM, editors should have gained substantial experience doing the same or a more advanced task without LLM assistance.

There is a community-wide consensus that LLMs are generally unfit for generating or rewriting article content; in this area, their limitations are believed to be impossible to adequately overcome. Even so, there are many other types of edits one can make, even within articles, such as anti-vandalism work. Several editors have organized at Wikipedia:WikiProject AI Tools to explore constructive uses of LLMs and other AI technologies as editing tools. They aim to integrate AI into Wikipedia's workflow through scripts and other tools, with the end goal of having AI better support human editors in repetitive jobs where current tools (some of which already rely on forms of AI other than LLMs) might not be up to the task.

Some editors are competent at making unassisted edits but repeatedly make inappropriate LLM-assisted edits despite a sincere effort to contribute. Such editors are presumed to lack competence in this specific sense. They may be unaware of the risks and inherent limitations, or be aware of them but unable to overcome them to ensure policy compliance. In such a case, an editor may be banned from aiding themselves with such tools (i.e., restricted to making only unassisted edits). This is a specific type of limited ban. Alternatively, or in addition, they may be partially blocked from a certain namespace or namespaces.

Writing articles

Pasting raw large language model output directly into the editing window to create a new article, or to add substantial new prose to an existing article, generally leads to poor results. Consequently, the guideline on writing articles using LLMs establishes a near-blanket ban, stating that the use of LLMs to generate or rewrite article content is prohibited. While the guideline permits responsible LLM use for basic copyediting and translation (within the constraints of the separate guideline on LLM-assisted translation), these narrowly defined applications hardly count as exceptions to the ban.

Specifically, basic copyediting, as described in the corresponding how-to guide, should be seen as inherently distinct from "rewriting": it is restricted to correcting typography, spelling, punctuation, capitalization, and contractions, along with straightforward formatting fixes. It also encompasses the most minimal and uncontentious stylistic corrections, such as removing redundant language or splitting overly long sentences, on a localized, incidental basis. Any intervention requiring fundamental rephrasing exceeds the scope of "basic" copyediting and amounts to "intermediate" or "advanced" copyediting; that falls under "rewriting", where LLM use remains entirely prohibited.

Even basic copyedits need human review. There might be a very good reason why a sentence is longer than usual, why capitalization is the way it is, or why the article names something in a particular way (that the LLM doesn't like), and the LLM might not know those reasons. Irresponsible LLM use can still do a lot of damage, even if the user believes they are doing "basic copyediting". Every change to an article must comply with all applicable policies and guidelines.

The ban does not cover indirect uses of LLMs, meaning they can be used adjacent to editing. For example, they can help editors spot structural problems in long articles (such as the same information being repeated in different sections, or the infobox not agreeing with the prose) and generate ideas for new or existing articles. If using an LLM as a writing advisor, i.e. asking it for an outline, for ways to improve a paragraph, or for a critique of existing content, editors should remain aware that the information it gives is unreliable. Due diligence and common sense are required when choosing whether to incorporate any suggestions. The editor should become familiar with the sourcing landscape for the topic in question and then carefully evaluate the suggestions for neutrality and verifiability.

LLM outputs should not be added directly into drafts either. Drafts are works in progress, and their initial versions often fall short of the standard required for articles, but enabling editors to develop article content starting from an unaltered LLM-generated initial version is not one of the purposes of draft space or user space.

Communicating

Editors should not use LLMs to write comments generatively. Communication is at the root of Wikipedia's decision-making process, and it is presumed that editors contributing to the English-language Wikipedia can come up with their own ideas. Comments that do not represent an actual person's thoughts are not useful in discussions, and comments obviously generated by an LLM or similar AI technology may be struck or collapsed. Repeated misuse of this kind forms a pattern of disruptive editing and may lead to a block or ban.

This does not apply to using LLMs to refine the expression of one's authentic ideas: for instance, a non-native English speaker might permissibly use an LLM to check their grammar or to translate words they are unfamiliar with. Even in this case, be aware that LLMs may make mistakes or change the intended meaning of the comment. For proofreading, it is recommended to use a word processor (see comparison) or a dedicated grammar checker (see category) instead of an AI chatbot. Editors with limited English proficiency are advised to use a machine translation tool (see comparison), rather than an AI chatbot, when they need to translate their comments into English. They should be aware, however, that machine translation tools like DeepL and Google Translate are also liable to make errors, sometimes serious ones, especially in low-resource languages.[4]

Other policy considerations

LLMs should not be used for unapproved bot-like editing or anything approaching bot-like editing. Using LLMs to assist high-speed editing in article space has a high chance of failing the standards of responsible use due to the difficulty in rigorously scrutinizing content for compliance with all applicable policies.

Wikipedia is not a testing ground for LLM development, for example, by running experiments or trials on Wikipedia for this sole purpose. Edits to Wikipedia are made to advance the encyclopedia, not a technology. This is not meant to prohibit editors from responsibly experimenting with LLMs in their userspace for the purposes of improving Wikipedia.

Sources with LLM-generated text

LLM-created works are not reliable sources. Unless their outputs were published by reliable outlets with rigorous oversight, and unless it can be verified that the content was evaluated for accuracy by the publisher, they should not be cited. Since the AI boom, misuse of LLMs has negatively affected journalism, causing the quality of some media sources to decline.


Notes

  1. ^ This can occur, although with very low probability, even when the AI model is in a jurisdiction where works generated solely by AI are not copyrightable.

References

  1. ^ Smith, Adam (25 January 2023). "What Is ChatGPT? And Will It Steal Our Jobs?". Context. Thomson Reuters Foundation. Retrieved 27 January 2023.
  2. ^ "When AI Gets It Wrong: Addressing AI Hallucinations and Bias". MIT Sloan Teaching & Learning Technologies. Retrieved 25 May 2025.
  3. ^ Duris, Daniel. "Year 2026: The Year of LLM Bombing". Basta digital blog. Retrieved 18 January 2026.
  4. ^ Naveen, Palanichamy; Trojovský, Pavel (2024). "Overview and challenges of machine translation for contextually appropriate translations". iScience. Retrieved 11 December 2025.