Wikipedia:Request a query

From Wikipedia, the free encyclopedia

This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of pages that meet certain criteria, consider using PetScan (user manual) or the default search. PetScan can generate lists of articles in subcategories, articles which transclude some template, etc.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

The database replicas do not have access to page content, so requests which require checking wikitext cannot be answered with database queries. In particular, there is no way to query for template parameters or anything related to references. However, someone may be able to assist by querying in another way (for example, checking for external links rather than references) or suggest an alternative tool.

Articles created by new users

Hi, would it be possible to obtain over a sample period of 1 Jan 2026 to 28 Feb 2026:

  • The number of articles created in mainspace (excluding redirects)
  • The % of articles created in mainspace (excluding redirects) by users who had fewer than 100 edits
  • The % of articles created in mainspace (excluding redirects) during the sample period by editors with fewer than 100 edits whose first revision is tagged as a mobile edit

Thanks. Kudpung กุดผึ้ง (talk) 14:54, 9 March 2026 (UTC)

Finding out how many edits a user had at some time in the past is hard. In bulk, it's effectively impossible. I can look for new articles created by users who now have under 100 edits, but that's a very different set even over a two-month period. And I suspect this is going to be dwarfed by the number of new pages created in draft anyway. —Cryptic 00:17, 11 March 2026 (UTC)
Would it help to reduce the sample period to 30 days, for example the last 30 days? For this exercise I'm not interested in drafts, only new pages that are created directly in mainspace or moved there from a user draft or sandbox. Kudpung กุดผึ้ง (talk) 11:12, 11 March 2026 (UTC)
Still isn't going to be what you asked for, but will be a bit closer, at least.
You're asking for pages currently in mainspace, created in the last 30 days, now? That's different, and somewhat harder, than created directly in mainspace (and maybe deleted or somewhere else now). Original request - or at least how I read it - was basically searching Special:Log/create; this is the equivalent of looking at each recently-edited mainspace page's history, so wouldn't be able to find anything that wouldn't. —Cryptic 19:54, 11 March 2026 (UTC)
I didn't think it would make it harder, quite the contrary in fact, but I do not understand the technology involved in such datamining. To reinforce an upcoming presentation I still need some basic stats. How about simply: the % of articles created in mainspace (excluding redirects) in the last 30 days by editors who still have fewer than 100 edits? Kudpung กุดผึ้ง (talk) 00:42, 12 March 2026 (UTC)
116381 creations; of those, 40379 nonredirects; of those, 3066 (7.59%) nonredirects by users with fewer than 100 edits and 505 (1.25%) nonredirects by mobile edits by users with fewer than 100 edits. The query includes some other combinations of those criteria. —Cryptic 03:59, 12 March 2026 (UTC)
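For anyone reusing these figures, the percentages in the reply above can be re-derived from the raw counts. This is only an arithmetic check; the variable and function names are illustrative and do not come from the query itself.

```python
# Counts quoted in the reply above (last 30 days, as of 12 March 2026).
nonredirects = 40379          # non-redirect creations
by_new_users = 3066           # of those, by users with fewer than 100 edits
by_new_users_mobile = 505     # of those, also tagged as mobile edits


def pct(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(pct(by_new_users, nonredirects))         # 7.59
print(pct(by_new_users_mobile, nonredirects))  # 1.25
```

Both percentages use the 40379 non-redirect creations as the denominator, not the 116381 total creations.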

Requesting a list of all titles in articlespace that are the same except for capitalization (with no redirects)

This query would return something like

@Naraht Do you happen to know of an example of such a page? Polygnotus (talk) 21:35, 22 March 2026 (UTC)
There's some examples at WP:DIFFCAPS. —Cryptic 23:12, 22 March 2026 (UTC)
@Sohom Datta Do you happen to know if this is even possible? It's utf8mb3_general_ci (ci meaning case insensitive), right? Polygnotus (talk) 21:40, 22 March 2026 (UTC)
Polygnotus: No, I don't. I thought about it in terms of whether a redirect by Alphabetization would *always* be allowed. I'm pretty sure it is possible for user names, since that is something that isn't listed as a listing of usernames.
Sohom Datta: As far as I've seen, the only limitation is that the first letter needs to be capitalized. Certainly allowed in subpages. I can create User:Naraht/AAA and User:Naraht/AaA separately. Naraht (talk) 21:57, 22 March 2026 (UTC)
No, page titles are stored as uncollated binary data. (And TIL that "utf8" on MariaDB means utf8mb3 by default, not -mb4, which breaks for titles like 𝔹.) —Cryptic 07:57, 23 March 2026 (UTC)
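The utf8mb3 point can be illustrated outside the database: utf8mb3 stores at most three bytes per character, and 𝔹 (U+1D539) needs four bytes in UTF-8. The sketch below only measures encoded lengths in Python and does not talk to MariaDB; the helper names are hypothetical.

```python
def utf8_len(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text):
    """True if every character fits in three UTF-8 bytes (the utf8mb3 limit)."""
    return all(utf8_len(ch) <= 3 for ch in text)


double_struck_b = "\U0001D539"  # the title character mentioned above

print(utf8_len("K"))                  # 1
print(utf8_len(double_struck_b))      # 4
print(fits_utf8mb3("Kink"))           # True
print(fits_utf8mb3(double_struck_b))  # False
```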
This is wholly impractical by query. It's almost doable by downloading the list of mainspace titles from the monthly database dump and doing a case-insensitive sort on it, but excluding redirects from that is harder. I'll see if I can put something together. It won't be soon. —Cryptic 22:43, 22 March 2026 (UTC)
quarry:query/103518. Data as of about 13:57 5 March 2026. There were 6421 pairs, 43 sets of three, and just one with four (Kink/KinK/KiNK/KINK). Sure hope you had a better reason to ask than just curiosity. —Cryptic 07:40, 23 March 2026 (UTC)
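The dump-based approach described above (take the mainspace titles, drop redirects, bring titles that differ only in case together) can be sketched roughly as follows. This is not the method behind the linked Quarry query; it assumes the titles and redirect flags have already been extracted into (title, is_redirect) pairs, and it uses str.casefold as the case-insensitive key, which also merges a few non-case differences such as "ß" and "ss". The Apple/APPLE entries are made-up sample data.

```python
from collections import defaultdict


def case_variant_groups(pages):
    """Group non-redirect titles that are identical except for capitalization.

    pages: iterable of (title, is_redirect) pairs (hypothetical input shape).
    Returns a list of sorted title lists, one per group with two or more members.
    """
    groups = defaultdict(list)
    for title, is_redirect in pages:
        if is_redirect:
            continue  # the request excludes redirects
        groups[title.casefold()].append(title)
    return [sorted(titles) for titles in groups.values() if len(titles) > 1]


sample = [
    ("Kink", False), ("KinK", False), ("KiNK", False), ("KINK", False),
    ("Apple", False), ("APPLE", True),  # redirect, so Apple has no partner
]

print(case_variant_groups(sample))
# [['KINK', 'KiNK', 'KinK', 'Kink']]
```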

Prevalence of infoboxes in articles

Goal: I want to add a factual statement to MOS:INFOBOXUSE that says something like "As of 2026, n% of non-redirect, non-disambiguation, non-list articles contain an infobox."

This requires knowing:

  • the number of pages in the mainspace that aren't redirects (easy), dab pages (easy enough), or lists (difficult?)
  • the number of articles that have an infobox.

Module:Infobox is transcluded into nearly 4.4 million mainspace pages. Module:Autotaxobox is transcluded into 0.6 million. I don't know how many other independent/base templates exist.

Anything within about 1% or so is good enough. How much of this do you think you could answer? WhatamIdoing (talk) 17:31, 24 March 2026 (UTC)

The hard (and non-technical) part is finding all/most of the other independent/base templates. Given a list, I can certainly give you a total excluding redirects, disambigs, and "List[s] of..."-titled pages, and not double-counting pages using more than one infobox type. —Cryptic 17:45, 24 March 2026 (UTC)
I've asked at Wikipedia:Village pump (technical)#Finding all the infoboxes. Hopefully, someone will have tried to find them all before, and can easily share the information. Alternatively, they may be able to reassure me that these are the only two that are significant. WhatamIdoing (talk) 17:49, 24 March 2026 (UTC)
Is MediaWiki talk:Common.css/to do#Infobox what you need, or do you think it would be better to work from these search results? WhatamIdoing (talk) 22:54, 24 March 2026 (UTC)
Neither. The search has both false positives and negatives and isn't easily transcribable; Izno's notes to himself aren't legible (or, since it includes search results, transcribable; and maybe not comprehensive either); and my own scheme of walking the Category:Infobox templates tree won't work because, ha ha ha, people categorizing things have no discipline at all. As usual. It's overwhelmed with things like Infobox templates > Infobox templates by country > Place infobox templates by country > South Korea place infobox templates > South Korea subdivision templates > South Korea city templates > Seoul templates > Category:Seoul Metropolitan Subway templates > Template:Incheon Subway and Infobox templates > Science and nature infobox templates > Chemistry infobox templates > Chembox templates > Chembox documentation > Wikipedia chemical data validation > Chembox and Drugbox articles with a broken CheMoBot template > Template:Stdinchicite. Not to mention some 53000 articles. Sorting any of these methods out would take many hours of work that I just don't have. —Cryptic 13:32, 1 April 2026 (UTC)
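Once a list of base infobox templates exists, the count described earlier in this thread (excluding redirects, disambiguation pages, and "List[s] of..." titles, and counting each page once however many infobox types it uses) comes down to a set union. The following is a toy sketch of that logic only; the input shapes, function names, and sample pages are hypothetical, and a real answer would come from a query against the replicas.

```python
def is_counted(title, is_redirect, is_disambig):
    """Apply the exclusions from the request: redirects, dab pages, list titles."""
    return (not is_redirect
            and not is_disambig
            and not title.startswith(("List of ", "Lists of ")))


def infobox_share(pages, transclusions):
    """Return (articles with an infobox, eligible articles).

    pages: iterable of (title, is_redirect, is_disambig) tuples.
    transclusions: dict mapping a base template/module name to the set of
    titles that transclude it.
    """
    eligible = {t for t, r, d in pages if is_counted(t, r, d)}
    # Union first, so a page using two infobox types is counted once.
    with_infobox = set().union(*transclusions.values()) & eligible
    return len(with_infobox), len(eligible)


pages = [
    ("A", False, False), ("B", False, False), ("C", False, False),
    ("List of X", False, False), ("D", True, False), ("E", False, True),
]
transclusions = {
    "Module:Infobox": {"A", "B", "List of X"},
    "Module:Autotaxobox": {"B", "D"},
}

print(infobox_share(pages, transclusions))  # (2, 3)
```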

RFC dataset

The bot that manages RFC templates now has a public database, s51043__rfcbot_p. This has enabled queries such as quarry:query/100675. (People interpreting the results should remember that not all entries represent a separate RFC.) WhatamIdoing (talk) 00:31, 30 March 2026 (UTC)