
Wikisource:Scan Lab

Scan Lab

Shortcut:
WS:LAB

A central resource for assistance with creation, downloading, uploading, processing and other operations on scans of texts.

Times have changed, but it still can be hard to put 600 pages in the right order!
Instructions

If you need help with a scan, add your request in the relevant section below as a new sub-section. If you can, include all the details someone will need to work on the request without further questioning. You can use {{ping project|Scan Lab}} to send an immediate notification to all subscribed Scan Lab members. Once you have been answered, ping only that user when you reply with {{re|Their username}} (do not ping the whole project on every comment).

If your request has been completed, you should acknowledge that your issue is resolved and close the section with {{section resolved|1=~~~~}}.

Participants


Add your name to Module:Mass notification/groups/Scan Lab to be notified via {{ping project|Scan Lab}}. Also add your name below with details of any particular tasks you can help with.

Scan Lab volunteers
Participant Can help with Instructions
Xover
  • General scan tasks: scraping/download, scan repair, manipulating DjVu files (but not PDF)
Alien333
  • General scan tasks: scraping/download (including from Hathi), scan repair, manipulating DJVU & PDF
Koavf
  • General scan tasks: scraping/download (including from Hathi), scan repair, manipulating PDF, cropping images, converting images to transparent backgrounds, uploads of large files to c:

Requests for downloading scans

Instructions

If you would like scans that already exist online to be transferred to Wikisource, leave a message here. This includes batch transfers from the Internet or Hathi Trust for multi-volume works. Please include the necessary bibliographic information (author, country, and date of first publication) so that scans can be uploaded to Commons with proper information and license templates. A suggested file name on Commons can also be helpful.

Jane Austen Juvenilia Volume 2 and 3


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) The scans of the manuscripts of Austen's Juvenilia are available here and here. They're both in the PD, but I have absolutely no clue as to how to download them. The images are higher resolution than the ones on the BL website, but they're in the Zoomify Flash format. Languageseeker (talk) 02:58, 2 February 2022 (UTC)

Penny Cyclopedia volumes 1 to 27


The IA scans currently linked on the page are unusable (blank pages where there should be content), so I checked HathiTrust (here's the search I used). There are four complete sets of scans attached to this record (ignoring the supplements for now), but I'm not sure at the moment which ones would be the best to import. Arcorann (talk) 02:14, 24 December 2023 (UTC)

I've found pretty good scans of volumes 4 and 24 which are already on Commons, and I've added the links to the Penny Cyclopedia page. I don't have a Hathi Trust account, so I can't help you there. Ciridae (talk) 05:21, 27 December 2023 (UTC)

Journal of the Optical Society of America


Volumes 1-40 of this fairly esteemed journal are out of copyright. Vol. 30, issue 12 and Vol. 33, issue 7 are here already, but there are *a lot* that are not here: https://archive.org/details/pub_optical-society-of-america-journal If you upload them, I can tidy the pile up at Commons and get them ready to go here. For copyright concerns: https://onlinebooks.library.upenn.edu/webbin/cinfo/jopticalsocamerica --RaboKarbakian (talk) 20:43, 8 February 2024 (UTC)

No longer available for post-1929 issues. ShakespeareFan00 (talk) 06:50, 28 May 2025 (UTC)
It's very frustrating, as Hathi is aware of the post-1929 volumes but, in line with its policy, apparently hasn't done a review to confirm their non-renewal, and Google Books claimed it couldn't open access to the post-1929 volumes. These were previously available at IA but are currently restricted, suggesting faulty metadata, and the response I got back when I queried why the non-renewed volumes were restricted was not promising. Someone will have to find the non-renewed issues and volumes and upload them directly to Wikimedia Commons or English Wikisource. It's disappointing that none of the major scan archive sites seem to have the time to pursue actual curation practices (:rant:) ShakespeareFan00 (talk) 07:17, 2 June 2025 (UTC)

The non-renewed issues are available again, and I have a request for a specific issue: https://archive.org/details/sim_optical-society-of-america-journal_1944-07_34_7, which covers the Ostwald Color Systems complete with XYZ data :). ShakespeareFan00 (talk) 21:54, 7 October 2025 (UTC)

File:Journal of the Optical Society of America, Volume 34, Issue 7, 1944-07.djvu CalendulaAsteraceae (talk · contribs) 22:11, 7 October 2025 (UTC)

Varney the Vampire


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) The (copyrighted) 1970 edition of this influential work has been scanned here; could someone please download the main part of the book, without the new introduction &c.? TE(æ)A,ea. (talk) 04:41, 24 March 2025 (UTC)

To make sure I got it: it would be page 1, and pages 54-336. Is that right? — Alien333 07:33, 24 March 2025 (UTC)
  • Alien: Well, 14 rather than 1, but yes. I would appreciate it if you clipped the reprint publisher’s label from /14. In addition, the full work (originally one volume) was published as three volumes in this edition, so a full volume would need to take /15–/314 from the second volume and /15–/303 from the third volume. TE(æ)A,ea. (talk) 15:15, 24 March 2025 (UTC)
    Got it (didn't check that the page it loaded on was the first one). Regarding the publisher's label, will do. — Alien333 15:48, 24 March 2025 (UTC)
    Done at c:File:Varney the Vampire.pdf. Again, I haven't done DjVu conversion or OCR; feel free to ask.
    Aaaand it of course fell prey to the PDF-specific 0x0 bug. If you prefer to stick to a PDF, it'll be a few days' wait.
    Off-topic comment: Extracting the pages from a 1970 reprint sounds like a complicated way of getting an 1847 text. Makes one wonder how publishers can get their hands on a scan or an original when there isn't one on the whole internet. Alien333 16:53, 24 March 2025 (UTC)
    • Alien: So long as the scan works (eventually) and the OCR looks fine, I don’t particularly care. The publishers either bought a copy at auction (for a one-off) or got in contact with a library (like The Orphan of the Rhine below, which is part of a ~50-reel series of microfilm). Unfortunately, schmuck-who-wants-to-make-a-digital-copy isn’t a good enough reason to get most of these sent out of Special Collections, although you might be able to get something if you go to one of these libraries in person. It would be nice to work on digitizing reels of microfilm; I’ve seen a number of valuable series of microfilm reels, which all have public-domain contents but are not digitized. TE(æ)A,ea. (talk) 17:18, 24 March 2025 (UTC)

Student Life, vol. 17


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) I am requesting a download of volume 17 of the periodical Student Life from https://collections.carli.illinois.edu/digital/collection/ben_listy/id/5974/rec/10 . The problem is that the individual pages are available only as .jpg files, so it is quite challenging to make a single .pdf file from them. An OCR layer is not needed.

It would be great to also have the following volumes (renamed Czecho-Slovak Student Life), i.e. vol. 19 (https://collections.carli.illinois.edu/digital/collection/ben_listy/id/3191/rec/3 ), vol. 20 (https://collections.carli.illinois.edu/digital/collection/ben_listy/id/751/rec/4 ) and vol. 21 (https://collections.carli.illinois.edu/digital/collection/ben_listy/id/5188/rec/7 ), but these can wait. Vol. 18 has already been uploaded by Alien333.

Thanks! --Jan Kameníček (talk) 14:57, 17 October 2025 (UTC)[reply]

Downloading and preparing this now (as a djvu). MarkLSteadman (talk) 22:12, 17 October 2025 (UTC)[reply]
@Jan.Kamenicek here you go, I will have the others ready in due time. MarkLSteadman (talk) 23:14, 17 October 2025 (UTC)[reply]
Great, thanks! --Jan Kameníček (talk) 10:04, 18 October 2025 (UTC)[reply]
@Jan.Kamenicek and here is vol. 19. For vols. 20 and 21, since they are after 1930, what copyright license should I use? MarkLSteadman (talk) 14:53, 19 October 2025 (UTC)
Thanks very much! The other volumes should be available under PD-US-no notice (apologies, I did not think to mention it before). The fact that they are PD in the US is also confirmed on the Carli Digital Collections pages, see e.g. here, section Object Description, line Rights. --Jan Kameníček (talk) 15:01, 19 October 2025 (UTC)
FYI, I see copyright notices, e.g. "Janosik, a drama in five acts by Sovoboda-Goldman, translated by George Gallik, Copyright 1930 by Student Life", Vol. 20, May 1930, p. 29. MarkLSteadman (talk) 21:53, 19 October 2025 (UTC)
Good point, thanks for the heads-up. I have now gone through both volumes page by page looking for such notices and found them only on the individual parts of the above-mentioned serialized drama Janosik in volume 20; I did not find any in volume 21. Thus I think that volume 21 can be uploaded with the PD-US-no notice tag. At the same time, the copyright of the drama does not seem to have been renewed, so volume 20 could probably be uploaded with the PD-US-no renewal tag (or alternatively, we can wait until January, when it would expire anyway).
However, I have noticed one more thing: for some reason, the front covers of all the issues of volume 20 are at the very end of the published volume; would it be possible to reorder them? For example, this front cover of the November issue, and probably also this advertisement page (from the back of the front cover?), should come before the title page of the November issue, etc. --Jan Kameníček (talk) 19:47, 20 October 2025 (UTC)

The desolate star, and other poems (1929)


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) This work was published in NZ in 1929 and is in both the US and NZ public domains (Author:Robin Hyde died in 1939, so her works became public domain in Aotearoa in 1990). Would someone be able to extract the page scans from the NZ National Library (https://paperspast.natlib.govt.nz/books/ALMA1929-9917503813502836-The-desolate-star--and-other-poe) and put them in a PDF or DJVU on commons for processing here?--IdiotSavant (talk) 08:51, 23 October 2025 (UTC)[reply]

The NZ National Library provides a PDF (https://paperspast.natlib.govt.nz/imageserver/books/P29pZD1BTE1BMTkyOS05OTE3NTAzODEzNTAyODM2JmdldHBkZj10cnVl), accessed via the "Save a copy" link. Is there an issue with downloading and uploading that? If so, I might look at this after finishing the Student Life request above, manually downloading the images and converting them. MarkLSteadman (talk) 22:49, 11 November 2025 (UTC)
That PDF is very pixelated. I've been trying to fetch the raw high-quality images, but they have some sort of blocking in place that prevents scraping :(. — Alien333 08:55, 13 November 2025 (UTC)

Stories by English Authors: Germany, Etc.


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) Stories by English Authors: Germany, Etc. was published by Charles Scribner's Sons (New York) in 1896; page images are available at Internet Archive identifier: storiesbyenglish00newyrich. Authors: Beatrice Harraden, Henrietta Eliza Vaughan Stannard, Marie Louise de la Ramée, Robert Louis Stevenson, William Black (1841-1898). This is clearly OK for Commons since the longest-living known author died in 1936. Much of this content is entirely new on Wikisource; w:A Dog of Flanders by "Ouida"/de la Ramée is especially notable as a popular children's novelette/short story and the source for several films and Japanese anime. --~2025-27371-40 (talk) 13:40, 15 November 2025 (UTC)[reply]

There you go: Index:Stories by English Authors, Germany, Etc.djvu Alien333 18:37, 15 November 2025 (UTC)
Thanks, but this file seems to have no OCR layer at all. Can this be addressed easily enough or is it an issue with the original IA text? --~2025-27371-40 (talk) 20:42, 15 November 2025 (UTC)[reply]
Gah, picked the wrong file when reconverting for higher quality. Be right back. — Alien333 21:55, 15 November 2025 (UTC)
@~2025-27371-40: Text layer added. (Do pings to temporary accounts work?) — Alien333 22:20, 15 November 2025 (UTC)
Thanks for that work! Now this is a bit of a long shot, but I've found that a different scan of this work, while quite substandard with regard to proofreading (at least in my opinion), has preserved (IA link) the original photogravure of Beatrice Harraden that's missing from this copy of the work. Can this be spliced into the existing DjVu file (with white balance adjusted appropriately) to replace the current blank page 2 (6th page in the file), thus keeping the raw page numbers for the file entirely unchanged? Or should we just upload the image to Commons and perhaps deal with the divergence from the existing scan in a more ad hoc way? (Also, for some reason, the caption for the photogravure is on a separate leaf that appears almost translucent on the scan. That should probably also be fixed so that the photogravure and caption are together on one page.) The scan is also on Commons as commons:File:Stories by English authors (IA storiesbyenglish04newyiala).pdf, but I'm not sure how its quality compares with what's on IA. --~2025-27371-40 (talk) 23:06, 15 November 2025 (UTC)

William Cowper Brann, Complete Works in 12 volumes (1919)


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) On IA (should be in order by volume number): (external scans (multiple parts): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) --~2025-27371-40 (talk) 21:11, 1 January 2026 (UTC)[reply]

Done/doing now (transcription volumes: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) —Justin (koavf)TCM 05:59, 25 March 2026 (UTC)

Arthur Brisbane, Editorials from the Hearst newspapers


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) On IA: (external scan) (1906) --~2025-27371-40 (talk) 21:11, 1 January 2026 (UTC)[reply]

Done This was already uploaded years ago: File:Editorials from the Hearst newspapers (IA editorialshearst00brisrich).pdf. —Justin (koavf)TCM 06:16, 25 March 2026 (UTC)[reply]

Harold J. Laski (d. 1950), The Socialist Tradition in the French Revolution


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) On HathiTrust (external scan) (1930). Requested for Monthly Challenge. MarkLSteadman (talk) 21:10, 4 January 2026 (UTC)[reply]

Done File:Harold J. Laski - The Socialist Tradition in the French Revolution.pdf. —Justin (koavf)TCM 04:49, 25 March 2026 (UTC)[reply]

Finding scans

Instructions

Requests for locating scans for existing works at Wikisource, or works you wish to add yourself but cannot find scans for. For general text requests, see Wikisource:Requested texts.

The Criterion Volume 2 and 3


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) Would it be possible to locate Volumes 2 and 3 of The Criterion? I'm especially trying to complete The Woman Who Rode Away that began in Volume 3. Languageseeker (talk) 18:36, 23 December 2022 (UTC)[reply]

Scan repair

Instructions

Request repair work on existing scans here.

When requesting page insertion, rearrangement or deletion, always include the page numbers (as marked on the pages) as well as the position of the page within the scan file. This makes it much easier for the repairing user to locate the defect in the file and fix it, as well as allowing a double-check against mistakes.

Please do not use this page to request repairs on works that you don’t really care about: Category:Index - File to fix is a known backlog. If you want to help with those, you can add {{missing pages}} to those indexes if they do not already have it, along with details of the missing pages.

Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) Pages 482 and 483 of this volume were missing in the original scan; placeholders have been introduced, so all that is necessary is the replacement. That replacement can come from Index:Alumnioxonienses02univ.pdf, which exists solely for the purpose of supplying that gap. So, the missing pages from the PDF should be added in over the placeholders in the DjVu, the transclusion fixed, and the PDF deleted. TE(æ)A,ea. (talk) 23:46, 3 December 2023 (UTC)

Not sure I follow, pages 482 and 483 (djvu/99 and djvu/100) seem to be legit images and the 2 missing pages should be inserted between djvu/100 and djvu/101. Or ...? Mpaa (talk) 18:09, 4 December 2023 (UTC)[reply]

This file claims to be Volume 135 and is residing in the list of volumes as Volume 135 but it is actually Volume 136, probably (but not verified) a duplicate of Index:The Atlantic Monthly Volume 135.djvu. Can the file be replaced with https://babel.hathitrust.org/cgi/pt?id=uc1.32106019602660 ?--RaboKarbakian (talk) 15:47, 29 March 2024 (UTC)

Also, while you are at it:
  1. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030146099 Vol. 139
  2. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030146081 Vol. 140
  3. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030145968 Vol. 141
  4. https://babel.hathitrust.org/cgi/pt?id=mdp.39015030145745 Vol. 142
--RaboKarbakian (talk)


File was renamed at Commons, and needs re-aligning.

https://en.wikisource.org/w/index.php?search=intitle%3A%2FA+dictionary+of+the+language+of+Mota.djvu%2F&title=Special:Search&profile=advanced&fulltext=1&ns0=1&ns100=1&ns102=1&ns104=1&ns106=1&ns114=1 ShakespeareFan00 (talk) 17:40, 1 May 2024 (UTC)


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) The current scans for The Story of My Experiments with Truth/Volume 1 is missing pages (without placeholders) and duplicates others. I've uploaded a corrected file here: File:Gandhi, 1927, The Story of My Experiments With Truth, Vol 1.pdf. I need assistance in moving the current project over to the new scans while keeping the already proofread pages. Thanks! — Qx3Jw (talk) 14:40, 30 October 2024 (UTC)

Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) This book is missing the following pages: Part I: 1, 2, 37, 38, 43, 44, 105, 106, 191, 192, 258, 259, 343, 344. Part II: 67, 68, 74, 75, 77, 78, 105, 106, 115, 116, 133, 134, 135, 136, 193, 194, 195, 196, 213, 214, 237, 238, 257, 258, 265, 266. As well as 4 unnumbered pages immediately preceding page 1 of Part I. A different scan of this edition seems complete so you can get replacements here: https://archive.org/details/bim_eighteenth-century_the-compleat-geographer_1723. Just adding placeholders would also be helpful so proofreading can be started. Treebitt (talk) 08:49, 28 February 2025 (UTC)

The Orphan of the Rhine


Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) I have just recently obtained the second volume, and had already obtained the other three volumes, of this Gothic novel (the last of the “horrids”). However, three of the volumes are scanned two original pages to the PDF page; would anyone be interested in dealing with it? I can upload the volumes if that is the case. TE(æ)A,ea. (talk) 20:20, 23 March 2025 (UTC)

I could, assuming that the widths of the halves are fixed (meaning that the first X columns of pixels are always the first page, and the Y others are always the second page). — Alien333 21:03, 23 March 2025 (UTC)

Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) the above PDF is missing page 50 (in between 62 and 63 on the scan). Is it possible to insert a placeholder page for it? I haven't been able to find a correct scan. Thanks, Cremastra (talk) 16:37, 24 August 2025 (UTC)[reply]

Done. Chrisguise (talk) 11:40, 26 August 2025 (UTC)[reply]
Thank you! Cremastra (talk) 14:46, 26 August 2025 (UTC)[reply]

Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) Nominal pages 98-99 are missing and have been replaced with nominal pages 100-101 in the source file (flip back and forth and you'll see). There's a duplicate from Google Books which has the relevant pages. Can they be swapped in?--IdiotSavant (talk) 23:21, 26 October 2025 (UTC)

Also, nominal pages 116-117 are missing and have been replaced with duplicates of 118 and 119. Again, it's an error in the original scan, but the Google Books version has them.--IdiotSavant (talk) 00:56, 27 October 2025 (UTC)

Page iv (/11) is blank in the scan, but should have text; page 222 (/267) should have a table. Both pages need to be obtained from another scan. Arcorann (talk) 06:10, 5 November 2025 (UTC)[reply]

@Arcorann: — Fixed. — Hrishikes (talk) 04:21, 31 January 2026 (UTC)[reply]

1769 King James Version Bible


The 1769 KJV Bible is split into two volumes, Volume I (Index) and Volume II (Index). Per this discussion post, Volume II uses a 1772 edition of the book for scan-backing, instead of 1769. Volume I uses a 1769 scan, but has some issues with missing text, and doesn't seem to include the 15 books listed after Jeremiah, which are Lamentations, Ezekiel, Daniel, Hosea, Joel, Amos, Obadiah, Jonah, Micah, Nahum, Habakkuk, Zephaniah, Haggai, Zechariah, and Malachi.

A scan of the full 1769 edition can be found on Internet Archive, which does include the missing books. If possible, I'd like this scan to replace both scans currently used for the work. I'll fix any discrepancies between the text and the scan in my next proofreading pass.

SpikeShroom (talk · contribs) 00:01, 6 November 2025 (UTC)

I've been asked to clarify this request a bit more: I'd like the scan of the full 1769 edition of the KJV (linked above) to be made into its own Index, and then the entire transcription of the Volume I Index (about 725 pages) to be moved to match the first part of the new Index. I can handle changing the transclusion links and other minor discrepancies once this is done.
SpikeShroom (talk · contribs) 21:02, 7 December 2025 (UTC)

Index created at Index:KJV 1769 Oxford Edition.djvu, but it's printed smaller than the DjVus we had, so I think the text matching will probably have to be done by hand. — Alien333 09:41, 20 December 2025 (UTC)

The color charts can be reconstructed from [1], and we already have the 1943 Munsell renotation data to convert/interpolate to XYZ for use in CSS :) (assuming someone wants to do the Lua coding to handle the conversions). ShakespeareFan00 (talk) 00:20, 3 January 2026 (UTC)

This scan from IA is missing pages 25-28 (and currently has only two slots for those four pages), so two pages will need to be inserted and the text layer shifted from that point forward.

The scan may be repaired using pages from this copy at Hathi, which is also available at Google Books. The plan is to include the book as part of the February MC. --EncycloPetey (talk) 02:00, 17 January 2026 (UTC)

Pardon me for being stupid, but I think the solution here is to just overwrite the existing upload with the scan from Hathi Trust, correct? That fixes everything without the more complex kludge of inserting pages into an existing PDF. —Justin (koavf)TCM 17:18, 17 January 2026 (UTC)[reply]
No, because (1) the scan quality is lower, and (2) overwriting one Commons upload with an upload from a different source would be inappropriate. In a pinch we could upload the Hathi file under a different name and use it, but the Hathi file is actually a low-quality Google Books scan. The IA scan is of significantly higher quality. The underlying problem is that the book scanned for IA was missing these pages through damage to the physical book. --EncycloPetey (talk) 17:43, 17 January 2026 (UTC)
Sources at Commons are overwritten all the time. What policy is there at Commons about not overwriting a file from another source?
So you're proposing that the solution is to insert pages from the lower-quality scan into the higher-quality one? Also, to my eye, the Hathi Trust scan is substantially higher-quality: look at the photo of Truth in the front matter. A higher file size is not identical to higher quality. —Justin (koavf)TCM 21:37, 17 January 2026 (UTC)[reply]
The relevant Commons policy is Commons:Overwriting existing files.
I said nothing about file size, and have not looked at file size. I'm not sure why you bring that up. File size does not always correlate with readability; compression artefacts and scanning process are far more likely to affect legibility of the scan. Have you looked at all pages on the Google scan? The Google scan was made in such a way that the text on the reverse of each page bleeds through interfering with text legibility. It is also one of the Michigan University scans where the scanner's thumb was in view on every page, and so the thumb was digitally removed after the fact, and this digital doctoring leaves legibility problems as well. --EncycloPetey (talk) 02:59, 18 January 2026 (UTC)[reply]
That page doesn't say that a file can't be overwritten from a different source.
No, I did not check all the pages.
"So you're proposing that the solution is to insert pages from the lower-quality scan into the higher-quality one?" —Justin (koavf)TCM 09:45, 18 January 2026 (UTC)[reply]
It does say that, yes. If you're unsure why a file of a scan of a book from the Boston Public Library stored at IA with item number "narrativeofsojou1850gilb" should not be overwritten by a scan of a book in the University of Michigan library at Google Books with a different item number, instead of uploading that scan to a suitable filename for the different scan, I'm sure someone at Commons can explain it to you better than I can.
Because those pages are missing from the high quality scan, yes. We repair scans all the time here that way.
It sounds as though you have questions, and do not intend to provide any help repairing the scan. So I suggest that if you still have questions about the ethics of overwriting Commons files, that conversation should continue in a suitable forum at Commons rather than in the Wikisource Scan Lab. --EncycloPetey (talk) 11:42, 18 January 2026 (UTC)
It seems like you are referencing c:Commons:Overwriting_existing_files#Exceptions which says to not do what you are asking me to do. It is the only time that page mentions the word "source". It seems like it is better to upload the corrected version as a new file, which I can do and you can use that if you'd like. —Justin (koavf)TCM 11:57, 18 January 2026 (UTC)[reply]
Scans are not "historical documents"; repairing scans in this way happens all the time. You can see several examples listed above on this page. We make the repairs and then document the repair that was made. I do not understand why you would be willing to overwrite a Commons file with a completely different scan of a different object, yet unwilling to make the sort of repairs frequently requested and made here.
But you are not required to act on any request here. Your participation is voluntary, and if you are unwilling to make the repairs, you can certainly allow someone else to do so. If you believe the other Scan Lab members have acted inappropriately by making such scan repairs, then you should raise that issue with them. --EncycloPetey (talk) 12:41, 18 January 2026 (UTC)
I don't recall refusing anything nor do I recall writing that anyone had done something improper. You have a nice day now. —Justin (koavf)TCM 02:11, 19 January 2026 (UTC)[reply]

@Koavf: These kinds of repairs (that is, replacing some pages in a scan with pages from another scan of the same printed edition) are standard practice. We do not have much use for keeping the original file around when a) we don't use it and b) it's on IA. (Also, to both of you, maybe tone it down? Getting offended doesn't help with anything.) Alien333 13:52, 18 January 2026 (UTC)

Notifying all members of Scan Lab (more info · opt out): (User:Alien333, User:Inductiveload, User:Koavf, User:Mpaa, User:Xover) Also pinging @廣九直通車: who might want to know about this. This version of the file appears to be missing the pages labelled 2 and 3. That scan was downloaded from https://www.legislation.gov.uk/ukpga/1979/2/pdfs/ukpga_19790002_en.pdf which now links to a different scan of the same act containing pages 2 and 3 (labeled 6 and 7). Could the pages labelled 6 and 7 be inserted into our file in place of the pages labelled 2 and 3? ToxicPea (talk) 13:36, 23 February 2026 (UTC)[reply]

@ToxicPea: Too bad I didn't notice the problem (and too bad that the file has been broken since 2010). We are lucky that the Statutes at Large (the one on legislation.gov.uk) covers the two pages and exactly corresponds to the two missing pages (i.e. we don't even need to care about formatting). I think it's totally fine to insert the two pages into the index.廣九直通車 (talk) 13:53, 23 February 2026 (UTC)

Put this in the monthly challenge, and then discovered problems: pages 172 and 173 are missing. I have not yet found an easy source, but may be able to get someone to look at the physical copy in Te Puna Mātauranga o Aotearoa to at least get a photo (would that work?) Pages 212 and 213 (PDF pages 222 and 223; marked as problematic) are duplicated. Can they be removed?--IdiotSavant (talk) 20:33, 1 March 2026 (UTC)[reply]

User:DrThneed has had the missing pages scanned. They are from a different edition from the same publisher (the 1929 edition), but it has the same pagination as the 1930 "cheaper edition". Can we use them, and if so, where is the best place to upload them so someone can splice them in? IdiotSavant (talk) 02:20, 7 March 2026 (UTC)[reply]
The missing pages have now been uploaded as File:Braisher Dear Aquaintance pp169-174.pdf. This is scanned from the 1929 edition, but it has the same pagination as the 1930 cheaper edition. Is someone able to splice them in, and remove the duplicate 212/213? IdiotSavant (talk) 01:30, 21 March 2026 (UTC)[reply]
Done Pages inserted at c:File:Dearacquantance0000rose.pdf, so the local File:Braisher Dear Aquaintance pp169-174.pdf can be deleted and the local Index:Dearacquantance0000rose.pdf needs to be fixed. —Justin (koavf)TCM 01:46, 21 March 2026 (UTC)[reply]
Thanks. I've fixed the index. Now I guess I need to finish the proofing. IdiotSavant (talk) 05:24, 21 March 2026 (UTC)[reply]

Pages 299 and 300 are missing from the original scan. Both Hathi Trust and Google Books appear to have full versions, but they appear to be copyright restricted (despite being PD-US and PD-UK, the country of publication).--IdiotSavant (talk) 20:54, 1 March 2026 (UTC)

I'm not seeing any copyright restrictions on my end. ToxicPea (talk) 21:43, 1 March 2026 (UTC)
It depends on location; typically, material is hidden outside the US, which can be circumvented with a VPN (this looks like one such case: from France it's hidden, and with a VPN set to the US it isn't). — Alien333 21:46, 1 March 2026 (UTC)
User:DrThneed has scanned the missing pages; what's the best place to upload them? IdiotSavant (talk) 22:00, 1 March 2026 (UTC)[reply]
The missing pages have now been uploaded as File:PassionatePuritan pages299-300.pdf. Can someone rotate and split them and splice them in please? IdiotSavant (talk) 05:29, 27 March 2026 (UTC)[reply]

The OCR is off by one page. Can the file be rebuilt? ShakespeareFan00 (talk) 20:29, 6 March 2026 (UTC)

There's a tool that can do this (see the link to Toolforge on User:Alien333/realignocr). I've only used it once so far, but it did the job. (unsigned comment by Chrisguise (talk))
That tool crashed with an Internal Server Error :(. It needs someone else to do the realignment. ShakespeareFan00 (talk) 09:07, 7 March 2026 (UTC)
It is pretty fragile. I should get around to upgrading it, but oh so many things to do. — Alien333 11:41, 7 March 2026 (UTC)