User talk:GreenC bot/Archive 6
You can stop the bot by pushing the stop button. The bot sees this and immediately stops running. Unless it is an emergency, please consider reporting problems to my talk page first.
whitehouse.gov
Hi there! Edits such as this one, this one, and this one are adding |work=whitehouse.gov to references that already have a |work= parameter (or one of its aliases), which adds the article to the maintenance Category:CS1 errors: redundant parameter. Could you please fix the references your bot broke and tweak your bot so it doesn't make similar edits in the future? Thanks! GoingBatty (talk) 01:18, 7 March 2021 (UTC)
- The bot job is done, so you don't need to hit the stop button. I was not aware that newspaper and work are aliases. I'll check into fixing them. -- 02:07, 7 March 2021 (UTC)
- The fix is done for 30 cites in 26 articles. -- GreenC 22:20, 7 March 2021 (UTC)
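The check GoingBatty asks for could look roughly like the sketch below. It is not the bot's actual code: the parser is deliberately naive (it does not handle nested templates or wikilinks), the function names are hypothetical, and the alias set is an assumption, since the thread itself only names work and newspaper.

```python
# Assumption: the periodical aliases of |work=; the thread only names
# "work" and "newspaper".
WORK_ALIASES = {"work", "newspaper", "journal", "magazine", "periodical", "website"}


def param_names(template: str) -> set:
    """Return the named parameters of a flat {{cite ...}} template.

    Naive sketch: splits on '|' and ignores nested templates and links.
    """
    inner = template.strip()[2:-2]
    names = set()
    for part in inner.split("|")[1:]:
        if "=" in part:
            names.add(part.split("=", 1)[0].strip().lower())
    return names


def add_work(template: str, value: str) -> str:
    """Append |work=value only when no |work= alias is already present."""
    if param_names(template) & WORK_ALIASES:
        return template  # a periodical parameter exists; leave it untouched
    return template.strip()[:-2].rstrip() + f" |work={value}}}}}"
```

With this guard, a citation that already carries |newspaper= is returned unchanged instead of gaining a redundant |work=.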
Archived url removed
Hi! With this edit your bot removed a working archived url. Is that a one-off glitch, or is there a mistake in your programming? Can we be sure that isn't going to happen again? Justlettersandnumbers (talk) 21:16, 13 March 2021 (UTC)
- The source link was changed and made live again, so the archive URL is no longer needed. -- GreenC 21:47, 13 March 2021 (UTC)
Nobots issue
Is it possible to make Shadowbot respect these kinds of nobots notices? Unless Spinningspark is OK with removing the local files there is little reason to constantly flag them as shadowed, especially since the enwiki file and Commons file only differ in the version. Jo-Jo Eumerus (talk) 16:07, 29 March 2021 (UTC)
- Hi Jo-Jo Eumerus. It is programmed to look for |deny=shadows, since GreenC bot has so many different tasks that each has its own name. I just added "GreenC bot" in addition, since that is also a valid name to block, covering all tasks by the bot. If you still see it editing, let me know, thanks. -- GreenC
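A minimal sketch of the kind of exclusion check described above, assuming the page carries a {{nobots}} or {{bots|deny=...}} notice. The function name and regular expressions are illustrative only, and treating "all" as a match is an assumption about the convention, not something stated in this thread.

```python
import re


def is_denied(wikitext: str, names: set) -> bool:
    """True when a {{nobots}} or {{bots|deny=...}} notice covers any of `names`."""
    if re.search(r"\{\{\s*nobots\s*\}\}", wikitext, re.IGNORECASE):
        return True
    # Assumption: deny=all blocks every bot.
    wanted = {n.lower() for n in names} | {"all"}
    pattern = r"\{\{\s*bots\s*\|\s*deny\s*=([^}|]*)"
    for match in re.finditer(pattern, wikitext, re.IGNORECASE):
        denied = {d.strip().lower() for d in match.group(1).split(",")}
        if denied & wanted:
            return True
    return False
```

Passing both the task name and the account name, for example `is_denied(text, {"shadows", "GreenC bot"})`, mirrors the change GreenC describes: either name in the deny list stops the task.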
Did GreenC bot join the oversight committee?
There sure are a lot of anonymous users on the Wikipedia:List of Wikipedians by article count/1–1000 now! No rush, but I figured I'd point it out. Thanks and take care. –Novem Linguae (talk) 12:16, 16 April 2021 (UTC)
Task 9
You can probably stop task 9 now. No-one is likely to use the old magic word after all this time. MichaelMaggs (talk) 11:14, 29 April 2021 (UTC)
- @MichaelMaggs: Done. It's still on Toolforge if it could be adapted to a similar purpose in the future. -- GreenC 16:33, 29 April 2021 (UTC)
A kitten for you!
Thanks for making all these archive links for us <3
~the.one.and.the.only~ (talk) 02:41, 8 May 2021 (UTC)
Obviously wrong archive link addition
Hi, with this edit on it.Wikipedia, your bot added a link to a book on archive.org which was completely unrelated to the book the source was citing. Please, could you explain which criterion you use to decide whether a link should be added or not? Could you also help me and the it.Wikipedia community find and correct other false additions like that one?--Ferdi2005 (talk) 14:30, 21 May 2021 (UTC)
Strangely formatted date by bot
@GreenC: In the change at https://en.wikipedia.org/w/index.php?title=My_Little_Pony:_The_Movie_(2017_film)&curid=51068079&diff=1028920093&oldid=1026037502 , the bot used "date=1.506825145863e+15 October 1, 2017" as the date of a link that was changed to the internet wayback machine. This number seems to be the number of microseconds since January 1, 1970 (the Unix epoch), which is likely used by the bot for internal calculations, but it should not appear in the final edit. You might want to check whether this happened on more occasions and whether it can be fixed. Gial Ackbar (talk) 20:25, 16 June 2021 (UTC)
- OH, big trouble. It happened in 655 citations in 118 articles. Debugging code wasn't removed during production run. Well, now to fix them. Thank you for the report. -- GreenC 20:57, 16 June 2021 (UTC)
- All fixed. -- GreenC 23:54, 16 June 2021 (UTC)
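Gial Ackbar's reading of the stray number can be checked with a short sketch. It assumes, as the report does, that the value is microseconds since the Unix epoch; the function name is hypothetical and nothing here is taken from the bot's own code.

```python
from datetime import datetime, timezone


def from_epoch_microseconds(value: float) -> datetime:
    """Interpret `value` as microseconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(value / 1_000_000, tz=timezone.utc)


stamp = from_epoch_microseconds(1.506825145863e+15)
# Prints the calendar date in the same style as the citation.
print(f"{stamp:%B} {stamp.day}, {stamp.year}")
```

The result falls on October 1, 2017 (UTC), matching the human-readable date that followed the number in the edit.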
bot forgot to add |archive-date=
In this edit, bot forgot to add |archive-date=. Really?
—Trappist the monk (talk) 12:05, 26 June 2021 (UTC)
- The archive.today API returned unexpected results the bot was not prepared for. Fixed. -- GreenC 00:48, 27 June 2021 (UTC)
- Apparently still broken; see this edit
- —Trappist the monk (talk) 13:11, 30 June 2021 (UTC)
- @Trappist the monk: Same issue with the fragment not returned by the API confusing the bot, which has been fixed. But unable to duplicate. Suspect the article was processed before the fix. There can be days between processing and upload (on edit conflict it reprocesses in a new batch). The NASA batch took a long time. WaybackMedic is ironically programmed to add missing |archive-date= in case you think it makes sense to run it on a tracking category backlog. -- GreenC 15:56, 30 June 2021 (UTC)
This edit is also missing |archive-date= but it is not archive.today and has no fragment.
—Trappist the monk (talk) 17:35, 5 July 2021 (UTC)
- Two bugs fixed that were previously hidden by error correction but surfaced here due to combined cite complications (and your report!). Thanks. -- GreenC 19:06, 5 July 2021 (UTC)
Hello GreenC bot, doesn't seem to be rescued looking at the result. Can you fix it? Thank you for your time. Lotje (talk) 14:01, 27 June 2021 (UTC)
- Fixed in article and code, thank you for the report. -- GreenC 18:04, 27 June 2021 (UTC)
bot used shortened archive url when creating |url=
See this edit where the bot created: | url = http://zfA/image | archive-url = https://archive.today/4szfA/image and omitted |archive-date=.
It's good that the bot is removing unnecessary |archive-date= parameters.
—Trappist the monk (talk) 14:30, 28 June 2021 (UTC)
- Fixed. Four problems: url has a /image - url should be archive.today - url should be in the archive-url field - url should be long format .. the /image tripped it up when combo with the others. A lot of unnecessary |archive-date= parameters, some predating the existence of the web. -- GreenC 20:44, 28 June 2021 (UTC)
I don't understand what the Bot is trying to accomplish here
I don't understand what the Bot is trying to accomplish with this edit. The link is not dead. Hawkeye7 (discuss) 00:34, 30 June 2021 (UTC)
And I don't understand why the washingtonpost.com is consistently timing out with a header check, and only for this URL:
Starting headers (1) for https://www.washingtonpost.com/archive/politics/1979/09/04/nasa-weighs-deferring-1982-mission-to-jupiter/bfe8bb4a-20fe-41f5-af14-d6b1c003b470/ Ending headers (0 / -1) Headers time out
It appears to be an issue with the agent string (!). This works:
wget -SO- -q --retry-connrefused --waitretry=5 --read-timeout=2 --timeout=5 --tries=2 --no-dns-cache --no-check-certificate --user-agent="WaybackMedic" 'https://www.washingtonpost.com/archive/politics/1979/09/04/nasa%2Dweighs%2Ddeferring%2D1982%2Dmission%2Dto%2Djupiter/bfe8bb4a%2D20fe%2D41f5%2Daf14%2Dd6b1c003b470/' 2>&1 >/dev/null
This does not:
wget -SO- -q --retry-connrefused --waitretry=5 --read-timeout=2 --timeout=5 --tries=2 --no-dns-cache --no-check-certificate --user-agent="WaybackMedic bot" 'https://www.washingtonpost.com/archive/politics/1979/09/04/nasa%2Dweighs%2Ddeferring%2D1982%2Dmission%2Dto%2Djupiter/bfe8bb4a%2D20fe%2D41f5%2Daf14%2Dd6b1c003b470/' 2>&1 >/dev/null
The difference being the word "bot". But it's not only "bot", other words can cause it to fail. Normally, an agent string exists for human consumption and would not interfere with anything. At a loss. Anyway, this is not happening to all washingtonpost.com URLs, only this one that I know of. Look under the surface, the Internet is pretty weird. I'm going to change the agent string to something less likely to trigger aggressive word filters. Zulu might work: --user-agent="In Zulu: Irobhothi leWayback Medic ngumsebenzisi weGreenC ngesiNgisi iWikipedia", conveying enough contact information. -- GreenC 03:33, 30 June 2021 (UTC)
probably GIGO but ...
This edit, bot removed |archive-date= and |url-status= from this:
{{cite news|title=Czy ateiści są dyskryminowani? |url=http://www.przeglad-tygodnik.pl/index.php?site=artykul&id=12668 |author=Radosław Tyrała |archive-date=1 January 1970 |archive-url=https://www.webcitation.org/3IK?url=http://www.pewinternet.org/css/layoutstyles.css |access-date=2007-07-04 |language=pl |url-status=dead }}
to make this:
{{cite news|title=Czy ateiści są dyskryminowani? |url=http://www.przeglad-tygodnik.pl/index.php?site=artykul&id=12668 |author=Radosław Tyrała |archive-url=https://www.webcitation.org/3IK?url=http://www.pewinternet.org/css/layoutstyles.css |access-date=2007-07-04 |language=pl }}
The value in |archive-url= is completely bogus, but even so, should the bot have removed |archive-date=? Generally ok to remove |url-status=dead as redundant.
History:
- Editor Dwanyewest added the original |archiwum=http://www.webcitation.org/3IK with this edit – presumably a copy/paste from pl:Ateizm w Polsce where that parameter was added by an ip editor with this edit.
- IAbot changed it to |archive-url=http://www.webcitation.org/3IK?url=http://www.pewinternet.org/css/layoutstyles.css with this edit – of course in the best of all possible worlds, that edit would have been inspected by an interested editor and the bogus archive url replaced with something meaningful ...
—Trappist the monk (talk) 16:54, 5 July 2021 (UTC)
- Added a check for 3IK as the WebCite ID (Unix time 0 in base62) and will delete the three fields when found. It could try to fix it with some difficulty, but it is such a rare one-off bug, found nowhere else that I can see, that this should be enough for now. -- GreenC 20:07, 5 July 2021 (UTC)
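The remark about 3IK can be illustrated with a small decoder. This is only a sketch under two assumptions that the thread does not spell out: that WebCite IDs use the digit order 0-9, A-Z, a-z, and that the decoded number counts microseconds since the Unix epoch.

```python
from datetime import datetime, timezone

# Assumption: base62 digit order is 0-9, then A-Z, then a-z.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_decode(text: str) -> int:
    """Decode a base62 string into an integer."""
    value = 0
    for ch in text:
        value = value * 62 + ALPHABET.index(ch)
    return value


micros = base62_decode("3IK")
# Assumption: the integer is microseconds since the epoch.
when = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
print(micros, when.date())
```

Under those assumptions the ID decodes to 12668, a few milliseconds after the epoch, which would explain the |archive-date=1 January 1970 in the citation. The same number appears as id=12668 in the cited |url=, which suggests the ID was never a real WebCite snapshot.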
highbeam
At this edit bot added |archive-date= and |url-status=dead (not really necessary because cs1|2 presumes dead when |archive-url= has a value) for one Highbeam citation yet did not do the same for the other.
But, because Highbeam is dead long since, and because archives of Highbeam pages show only the first paragraph or so and then prompt the reader to subscribe or login (to no benefit to the reader), perhaps the |archive-url=, |archive-date=, and |url-status= parameters should be removed and {{dead link|date=...|bot=medic}}{{cbignore|bot=medic}} added.
—Trappist the monk (talk) 13:44, 6 July 2021 (UTC)
- Looks like it might be the same bug above with combined cites; it was processed before the fix was made. For HighBeam, we have so many that I can't delete them without consensus. There is an argument they have some value, short of replacing with something better. Sometimes the cited fact is in the snippet, which can't be known without a manual check; it might have useful metadata; it helps verify the source exist(ed). -- GreenC 15:15, 6 July 2021 (UTC)
Minor vandalism by GreenC bot
In this edit, GreenC bot did some useful expansions of archive URLs (I checked a few, which seem OK), but also removed perfectly valid content from example dummy citations safely enclosed in {{void | ...}}. This fix by me shows my revert of the content that was, it seems to me, invalidly removed. GreenC bot seems to have done this on other articles too. Boud (talk) 00:01, 14 July 2021 (UTC)
- I'd never heard of {{void}} before and in 6+ years of work with over a million edits no one has ever brought it to my attention until now. It will need to a-void the void. BTW accusing someone of vandalism is not cool, WP:VANDALISM has a specific meaning, the bot does not do vandalism, which is intentionally causing damage, vandalism is not why I am here. See WP:AGF before assuming otherwise unless there is reason not to AGF. -- GreenC 01:03, 14 July 2021 (UTC)
Weight? (Circumcision controversies)
The organization you removed is the only one of its kind in Germany. intaktiv e.V. regularly organizes protests, gives interviews on the topic [1] and some public figures are advocates for this organization [2]. I would say this organization has the same weight as the rest of those anti-circumcision organizations. And some diversity would do the table good. So far there are only organizations from English-speaking countries. So it would be nice if the entry would be restored. I can see that it's a bot. And still it has removed this organisation on the ground of "Weight". Which is strange: if you look at the other groups in there, they are all small and hardly known outside the circles of people who care about that topic.
Like I said before, you got the wrong user. Look at the history tab. HERE. Notice who reverted you: User:Alexbrn. They included my name in the edit summary to mean "reverting to the last version by GreenC bot" but neglected to say those words precisely causing some confusion. You can see the transaction of the edits, GreenC bot has nothing to do with it. -- GreenC 00:40, 19 July 2021 (UTC)
Oh, sorry, you're right.
- no problem easy to get confused by Wikipedia at first, good luck. -- GreenC 21:34, 21 July 2021 (UTC)
archive.is > archive.today ...???
Hello:
In a 2021-07-21T01:37:18 GreenC bot edit to the Wikipedia article on Daniel Ellsberg, "archive.is" was replaced by "archive.today". I checked and found the following:
- The "archive.is" link still seemed to give valid content.
- "archive.today" was auto-forwarded to "archive.ph" with content that looked to be equivalent to "archive.is".
- Neither the archive.is nor archive.today (or archive.ph) link supported the claims in the Ellsberg article. I found the original link in archive.org with content that seemed to support the claims in the article.
I do not know what if anything you think it might be appropriate to do about this. Thanks for your support of Wikipedia. DavidMCEddy (talk) 06:23, 21 July 2021 (UTC)
- Archive.today is a front-end re-router to one of the other domains currently active. It's the front door to access one of the 7 other domains. It will work using .is or .ph but maybe not in the future. We had a problem in 2019 where it stopped working for one of the 7 domains (for about a month). The owner of archive.today requested we use it so they are flexible on domain availability. The content is the same regardless of which domain. -- GreenC 13:51, 21 July 2021 (UTC)
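The normalization GreenC describes could be sketched as follows. This is illustrative only: the mirror set lists just the two domains named in this thread (the other domains are not enumerated here, so the set is knowingly incomplete), and the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only the mirrors named in the thread; the full list is longer.
MIRRORS = {"archive.is", "archive.ph"}


def to_archive_today(url: str) -> str:
    """Rewrite a known mirror domain to the archive.today front door."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in MIRRORS:
        return url  # not a known mirror; return the input untouched
    return urlunsplit(parts._replace(netloc="archive.today"))
```

Only the host changes; the path that identifies the snapshot is kept, which is consistent with the statement that the content is the same regardless of domain.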
July 2021
Hello, I'm Picard's Facepalm. Your recent edit(s) to the page Miami Vice appear to have added incorrect information, so they have been reverted for now. If you believe the information was correct, please cite a reliable source or discuss your change on the article's talk page. If you would like to experiment, please use your sandbox. If you think I made a mistake, or if you have any questions, you can leave me a message on my talk page. Thank you. Picard's Facepalm (talk) 16:42, 21 July 2021 (UTC)
- You are incorrect. The URL is dead. -- GreenC 19:35, 21 July 2021 (UTC)
- Then why am I looking at the web page right now, again, and from a totally different computer? I can send you a screenshot if you like. Perhaps the site is blocking your IP because your bot keeps banging against it? Picard's Facepalm (talk) 00:17, 22 July 2021 (UTC)
- The bot is correct, the link is dead and time.com is returning 404 for this url (look at the html source where time.com uses 404 in several places):
- The archive snapshot url isn't a whole lot better:
- The archive snapshot url was added to the article with [1] using an automated process. A better archive snapshot is:
- —Trappist the monk (talk) 00:31, 22 July 2021 (UTC)
- https://web.archive.org/web/20130822235037/http://www.time.com/time/magazine/article/0,9171,959822,00.html is working just fine for me. Not sure what to tell you guys. Picard's Facepalm (talk) 00:39, 22 July 2021 (UTC)
- |url= is tied to |url-status= .. you're looking at |archive-url= which is always live, that's why we have them. -- GreenC 01:09, 22 July 2021 (UTC)
- (edit conflict)
- Then you don't understand how |url-status=live works (see template documentation). Compare these:
- |url-status=live:
- Zoglin, Richard (1985-09-16). "Cool Cops, Hot Show". Time Magazine. Time Inc. Archived from the original on August 22, 2013. Retrieved 2007-11-02.
- |url-status=dead:
- Zoglin, Richard (1985-09-16). "Cool Cops, Hot Show". Time Magazine. Time Inc. Archived from the original on August 22, 2013. Retrieved 2007-11-02.
- And, that archive snapshot that you say is working just fine, while it does 'work', doesn't work well because it is just a teaser requiring login to read the rest of that article. That is why I suggested the better archive snapshot url ... Also, these templates should be rewritten as {{cite magazine}} because Time is not a scholarly or academic journal.
- —Trappist the monk (talk) 01:17, 22 July 2021 (UTC)
was this a good fix?
Was this a good fix? cs1|2 expects 14-digit timestamps so converting this:
{{cite web |url=https://web.archive.org/web/20170614/http://www.intellivisionlives.com/bluesky/people/askhal/askhal.html |title=Ask Hal: Frequently Asked Questions to the Blue Sky Rangers |publisher=Intellivision Productions |access-date=November 3, 2008}}
- "Ask Hal: Frequently Asked Questions to the Blue Sky Rangers". Intellivision Productions. Retrieved November 3, 2008.
to this:
{{cite web |url=http://www.intellivisionlives.com/bluesky/people/askhal/askhal.html |archive-url=https://web.archive.org/web/20170614/http://www.intellivisionlives.com/bluesky/people/askhal/askhal.html |url-status=dead |archive-date=2017-06-14 |title=Ask Hal: Frequently Asked Questions to the Blue Sky Rangers |publisher=Intellivision Productions |access-date=November 3, 2008}}
- "Ask Hal: Frequently Asked Questions to the Blue Sky Rangers". Intellivision Productions. Retrieved November 3, 2008. {{cite web}}: |archive-url= is malformed: timestamp (help) CS1 maint: url-status (link)
creates broken cs1|2 templates. Because the archive url does not have a 14-digit timestamp, cs1|2 suppresses the |archive-url= link so that |title= is linked with |url=, the presumably dead url. In preview, cs1|2 creates a 201706* timestamp so that archive.org will show the calendar display for that year. That no longer works and archive.org just returns a "We're sorry — something's gone wrong" message. Apparently, archive.org no longer recognizes the wildcard character unless the timestamp is zero-filled to 14 digits. I'll fix that in the cs1|2 module.
I suppose, to answer my own question, that was as good a fix as should be expected (no need to include |url-status=dead because that is the default when |archive-url= has a value). I don't think that automated tools should be choosing which of (possibly) many archive snapshots to use in |archive-url= so showing the error messages may attract interested editors to make the necessary repairs... or not.
—Trappist the monk (talk) 14:07, 5 August 2021 (UTC)
- In this particular case, normally Medic would have filled in the snapshot date when going to web/20170614/ which is 20170728023443. With the 8-digit snapshot it is a working URL that redirects to 20170728023443 and the bot confirms that by filling it in, it's not deciding anything. However, it didn't work this time and I know why - sort of intentional but also unintentional.
- Regarding choosing snapshots, unfortunately it doesn't work to rely solely on the community. For example at dewiki they rejected use of IABot for years, and now people are upset because of the number of dead unmaintained links. On enwiki, in 2015 before IABot existed, there were around 4 million archive links added in the entire 15 year history of Wikipedia (much of that by older bots). Within two years IABot had added over 8 million more. True it would be good if people did it, the best solution, but people don't, evidently, at the scale required. It's hard, repetitive, boring, etc.. and endless, thousands of URLs are dying every day and new ones being added. There is strong community demand for a solution that is not 100% manual.
- I would love to hear if you have any big picture ideas, you have good experience, bots adding archive URLs doesn't need to be the (only) solution, it certainly has problems and I am keeping a list of them. Relying entirely on manual doesn't work well and the community wants something more. What other solutions might there be? Serious question, there must be other ways that are practical to implement (not require major changes to Mediawiki). I have some ideas, and I'm sure others do as well. -- GreenC 15:29, 5 August 2021 (UTC)
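The 14-digit expectation discussed in this thread can be checked with a short sketch. It assumes a Wayback URL of the form https://web.archive.org/web/<digits>/<original-url>; the function names are hypothetical and this is not the cs1|2 module's or the bot's actual logic.

```python
import re
from typing import Optional

# Assumption: the timestamp is the run of digits right after /web/.
WAYBACK_RE = re.compile(r"https?://web\.archive\.org/web/(\d+)/")


def wayback_timestamp(url: str) -> Optional[str]:
    """Return the timestamp portion of a Wayback URL, or None."""
    match = WAYBACK_RE.match(url)
    return match.group(1) if match else None


def is_full_timestamp(url: str) -> bool:
    """True only when the timestamp has all 14 digits (YYYYMMDDhhmmss)."""
    stamp = wayback_timestamp(url)
    return stamp is not None and len(stamp) == 14


short = ("https://web.archive.org/web/20170614/"
         "http://www.intellivisionlives.com/bluesky/people/askhal/askhal.html")
full = short.replace("/20170614/", "/20170728023443/")
print(is_full_timestamp(short), is_full_timestamp(full))
```

The 8-digit form from the citation fails the check, while the 20170728023443 snapshot GreenC mentions passes it.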
Gambot tasks
Recent changes to some of the Good Article list pages have broken one of the bot tasks. A bit more information at Wikipedia talk:Good articles#Removal of NavFrame. I am also unclear why it added Arizona State Route 88. CMD (talk) 09:10, 8 August 2021 (UTC)
- I actually found a bug in the code, the same variable name used twice for different purposes, why the program ever worked I don't understand. Something probably changed on the incoming data that exposed the bug. It might explain the Arizona 88 also. -- GreenC 14:25, 8 August 2021 (UTC)
- The magic of code. The Arizona 88 edit also suggested to me the code still produces the desired results in the bugged section, but just reads all the headers as well. Nonetheless, if you have the time to see if the code could be fixed that would be appreciated. CMD (talk) 15:32, 8 August 2021 (UTC)
- It was a simple fix once identified. If you see any more problems let me know. -- GreenC 15:47, 8 August 2021 (UTC)
Bot is labeling dead links that are NOT dead
Your bot just labelled two links "dead" that are both still good links. I just clicked on both of them. The article is Space Pioneer, and this is the diff. Cheers. N2e (talk) 19:35, 9 August 2021 (UTC)
- @N2e: thanks for the report. I think the trouble has been identified and fixed. It's more complex than not detecting the URL status correctly, having to do with soft404s and the way this particular run was configured. There are some others in spacenews.com possibly as many as 300 with false dead links. I'll post here when fixed. -- GreenC 01:01, 10 August 2021 (UTC)
- Super. So am I hearing you that you will send the bot back to clean it up? Or should I revert on that article? Sounds like maybe you are "on it" and will end up fixing those two and many more from SpaceNews (actually, a source I regularly use). Cheers. N2e (talk) 01:28, 10 August 2021 (UTC)
- It was actually only 17 links, I miscalculated with 300. The dead link tags are now removed for the 17. If you see any more problems let me know. -- GreenC 01:43, 10 August 2021 (UTC)
- Cool. I'm here to say I wandered back to that article today and found GreenC_bot has nicely dropped by and fixed the problem it had created. diff Thanks! N2e (talk) 11:48, 17 August 2021 (UTC)
This still seems to be happening – [2] and [3]. SpinningSpark 16:05, 1 June 2022 (UTC)
- I'm working on the dtic.mil domain and it's complicated. Looks like out of about 5,000 links, 375 are not dead (containing /citations/ in the URL). The bot assumed they were all dead. -- GreenC 16:57, 1 June 2022 (UTC)
Invisible character error
This edit seems to introduce an invisible character error for {{cite news}}: "replacement character in |url= at position 246 (help); replacement character in |archive-url= at position 288 (help)". I have reverted it. Regards.—Bagumba (talk) 07:07, 11 August 2021 (UTC)
- Thanks for the error report. Another edge case bug fixed in my urldecode function. -- GreenC 15:28, 11 August 2021 (UTC)
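GreenC's urldecode function is not shown in this thread, so the following is only a generic illustration of how a replacement character can slip into a URL, using Python's standard library. Decoding percent-escapes that are not valid UTF-8 with the default error handling yields U+FFFD; one defensive option, assumed here, is to decode strictly and keep the original text on failure. The function name is hypothetical.

```python
from urllib.parse import unquote


def safe_unquote(url: str) -> str:
    """Percent-decode `url`, but leave it untouched if the bytes are not UTF-8."""
    try:
        return unquote(url, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return url  # keep the escapes rather than emit U+FFFD


print(safe_unquote("caf%C3%A9"))        # valid UTF-8 escape sequence
print(safe_unquote("caf%E9"))           # Latin-1 byte, not valid UTF-8
print("\ufffd" in unquote("caf%E9"))    # default handling inserts U+FFFD
```

Keeping the undecoded escapes preserves a working link, whereas a replacement character in |url= is what triggered the cs1|2 error reported above.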
Bot removing Wayback "id_" identity flag
Hi. The bot appears to be removing the Wayback "id_" identity flag, which is intended to "perform no alterations of the original resource, return it as it was archived." Removing this is harmless in many cases, but in many others it is not. I have purposely linked to these id_ archive links in the past because of text becoming unreadable with the normal archive link. Here are two examples where the cited text otherwise malfunctions for the reader unless "id_" is used: this vs this, and this vs this. In the latter, the text becomes otherwise truncated. Thank you. Οἶδα (talk) 10:42, 19 August 2021 (UTC)
- Thank you for bringing this up with the examples. I have not wanted to do this because removing the nav box sort of locks in the archive page so users can't easily navigate around in case the page doesn't verify or changes (WaybackMachine is not static, snapshots change and move). Still, you have shown there are sometimes good reasons for the flags. The bot should yield right of way to whatever flags users want. The code to preserve the flags is done (more complicated than it might seem) but has not been tested at scale yet; the next batch job will tell. BTW recent docs suggest if_ instead of id_ -- GreenC 04:19, 20 August 2021 (UTC)
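Preserving a flag while changing the snapshot timestamp might be sketched like this. It assumes a flag is two lowercase letters plus an underscore directly after the timestamp (as in id_ and if_); the pattern and function name are illustrative, not the bot's code.

```python
import re

# Assumption: an optional flag such as id_ or if_ follows the digits.
FLAGGED_RE = re.compile(
    r"(https?://web\.archive\.org/web/)(\d{1,14})([a-z]{2}_)?(/.*)", re.DOTALL
)


def replace_timestamp(url: str, new_stamp: str) -> str:
    """Swap the snapshot timestamp while keeping any flag the editor chose."""
    match = FLAGGED_RE.fullmatch(url)
    if match is None:
        return url  # not a recognizable Wayback URL; leave untouched
    prefix, _old_stamp, flag, rest = match.groups()
    return f"{prefix}{new_stamp}{flag or ''}{rest}"
```

Capturing the flag as its own group means a timestamp rewrite cannot silently drop it, which is the behavior Οἶδα asked for.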
Reformatting PDFs cited to archive.org
I undid an edit of yours that reformatted a PDF link on archive.org. I don't think it was useful; it seems pointless. I've seen this on a number of articles. What it does is require the reader to click two times to open the PDF. Please do stop doing this.--Dr Silverstein (talk) 03:11, 10 October 2021 (UTC)
- like this. Those are machine-specific URLs that will break in the future (when the machine moves or is taken offline); they are ephemeral and should not be linked on Wikipedia. There might be a way to make a permanent link to a PDF but I'm not sure how, and I'm not sure it's a good idea as the book reader is better for the general reader. The PDF link is still there if they want it. I know you personally prefer PDF but think of other people around the world on dialup modems, slow connections, costly cellular, etc. From the main page readers can individually choose what format they want, assuming they even want to download it at all. The main page also has metadata that is not visible when going direct to PDF. -- GreenC 03:33, 10 October 2021 (UTC)
- This is not about my personal preference. PDF is much more clear and enlarged and does not require any enlarging, page turning or any such thing. Fortunately most people do not use dial-up modems, slow connection, costly cellular and what not. The majority of the world uses Wi-Fi and lives in the year 2021. By removing PDFs to accommodate users of outdated devices, you are disrupting the ability of the vast majority of readers to read clearly. The purpose of these PDFs is not to provide a search option on WayBack but to provide the actual reading content. What do you do to PDFs that are on other websites? I hope you get my concern. As you probably know, I am not a very active user, but as a reader, I cannot access the content of any PDF link to WayBack Machine and finding the PDF option is not as easy as you think. If you look on the right side options, it's unknown to the average reader that the PDF format is available. I see this reformatting as a problem on a number of articles. The book reader is too small for the average reader. It requires a lot of clicking, while the PDF automatically fits the screen size of the reader's device. This PDF link is permanent as it's not hosted on sites like Research Gate. I suggest you leave it as it is. If it was meant to be uploaded as another format, it would have been done so. The book format is outdated and requires multiple navigation as opposed to the direct PDF. Please do not reformat it anymore. If you are unsure, then leave it until you are certain it will stay permanent, which it will because it's not a Research Gate link.--Dr Silverstein (talk) 08:27, 10 October 2021 (UTC)
- FYI I have reformatted another PDF back to its proper format. Please do not reformat them. The link to WayBack is not temporary as it's not a Research Gate paper. There are countless PDFs linked on Wikipedia and they are viewed as PDFs, not complicated menus. Please do not make any more of such changes in the future. It is not helpful at all, unfortunately.--Dr Silverstein (talk) 01:09, 13 October 2021 (UTC)
Adding url-status=dead to citations with |archive-url=... present
Hello. I was wondering why the bot's going round adding |url-status=dead to citations with |archive-url=value present, as its only edit to a page. E.g. here or here. I thought |url-status=dead was the default in that case? The edit summary "Move 2 urls. Wayback Medic 2.5" doesn't explain. cheers, Struway2 (talk) 08:48, 20 October 2021 (UTC)
- Fixed going forward. A few more came through today in the queue. -- GreenC 17:14, 20 October 2021 (UTC)
Hi! In this edit, the bot added {{Usurped}} around links that have a pipe character in the link text. This breaks the template syntax. Could the bot replace the pipe character with the {{!}} magic word? --rchard2scout (talk) 15:04, 8 November 2021 (UTC)
- @Rchard2scout: Done. Do you know if there are any other characters that require escaping in a square-link title? -- GreenC 15:28, 8 November 2021 (UTC)
- looks like 1,2,7 (pipe, equals, right-square-bracket). I guess equal would be {{=}} - right-bracket would never appear since that would be impossible for the bot to parse. -- GreenC 20:28, 8 November 2021 (UTC)
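The escaping discussed here can be sketched in a few lines. The function name is hypothetical and only the two characters named in the thread (pipe and equals) are handled.

```python
def escape_link_title(title: str) -> str:
    """Escape characters that would break a link title inside a template.

    Order does not matter: neither replacement introduces the other character.
    """
    return title.replace("|", "{{!}}").replace("=", "{{=}}")


print(escape_link_title("Top Stories | News = Home"))
```

A right square bracket is deliberately not handled, in line with GreenC's point that such a title could not have been parsed as a link in the first place.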
Altering intro text
Hi GreenC! I tried altering the intro text at Wikipedia:List of Wikipedians by article count/1–1000, but it appears it's baked into the bot so it reverted me on the next update. Could you make that text configurable? {{u|Sdkb}} talk 18:19, 15 November 2021 (UTC)
- Courtesy pinging GreenC. {{u|Sdkb}} talk 23:30, 30 November 2021 (UTC)
- No because this tool works across multiple language sites and it's not so easy as making a template due to grammar and numerical formatting issues. And also, I like the footnote, sorry you do not, it's harmless anyway. This is a third party tool, it's not an official tool or page, anyone can make their own tool or list. -- GreenC 06:10, 1 December 2021 (UTC)
bot is breaking citations
See this edit. Bot added |archive-date=2007-06-15 (with the hyphen) when the citation already has |archivedate=1 February 2017 (without the hyphen).
—Trappist the monk (talk) 01:09, 21 December 2021 (UTC)
- Also added two {{cbignore}} templates. One is sufficient, right?
- —Trappist the monk (talk) 01:11, 21 December 2021 (UTC)
- And what is the real archive date for that citation anyway? The url seems to suggest 2017-01-31 which is different from the dates in the |archive-date= and |archivedate= parameters. Only one can be correct, so which one is it?
- —Trappist the monk (talk) 01:16, 21 December 2021 (UTC)
- Ah yeah there is a Pandora link in the |page= field, saw it during testing thought it was fixed guess not, fubar. The archive URL in this case is the https://webarchive.nla.gov.au no idea how it came up with 2007-06-15. -- GreenC 01:33, 21 December 2021 (UTC)
- All pages fixed. The solution is to remove webarchive (or pandora) .nla.gov.au URLs in a |page= field when it is the same as the URL in |archive-url=. Because web archives do not open a PDF to a page number, they drop the fragment. It's redundant. -- GreenC 03:27, 21 December 2021 (UTC)
- All pages fixed. The solution is remove webarchive (or pandora) .nla.gov.au URLs in a
- Ah yeah there is a Pandora link in the
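As an illustration of the first problem reported in this thread (both |archive-date= and |archivedate= present in one citation), here is a minimal Python sketch of a check a tool could run before adding a parameter. It is not the bot's actual code: the `ALIAS_GROUPS` table and the `redundant_params` helper are hypothetical names, the alias list is deliberately tiny, and the naive split on "|" would not cope with nested templates or piped wikilinks.

```python
# Illustrative alias groups only; the real citation module defines many more.
ALIAS_GROUPS = [
    {"archive-date", "archivedate"},
    {"archive-url", "archiveurl"},
]


def redundant_params(template_text):
    """Return the alias groups that have more than one member set in the template."""
    inner = template_text.strip()
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    names = set()
    # Skip the first chunk (the template name); naive split, see caveats above.
    for part in inner.split("|")[1:]:
        name, sep, _value = part.partition("=")
        if sep:
            names.add(name.strip().lower())
    return [sorted(group & names) for group in ALIAS_GROUPS if len(group & names) > 1]


cite = ("{{cite web |url=http://example.com/a.pdf "
        "|archivedate=1 February 2017 |archive-date=2007-06-15}}")
print(redundant_params(cite))
```

A tool that gets a non-empty result could update the existing parameter instead of adding a second one.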
Sentongo haruna , list of ugandan by net worth
Please can you help me? My article was blocked and deleted. I need someone to help me. 41.210.145.202 (talk) 14:22, 20 January 2022 (UTC)
Date problem
Hello, there is a problem with the |archive-date=
field in this edit. Keith D (talk) 21:31, 20 January 2022 (UTC)
- Fixed, logs show it was singular. Can't say what caused it yet. -- GreenC 22:26, 20 January 2022 (UTC)
- Thanks for the fix. Keith D (talk) 22:44, 20 January 2022 (UTC)
archive.org
This just in from the Channel 37 newsroom: You are taking valid links to archive.org and disabling them by misusing {{usurped}}. Case in point and again here. There may be others. There used to be a Clarke Ingram site on uhftelevision.com / dumonthistory.com with a fair amount of good information on a long list of individual stations (and one entire network) which failed in the early days of television (1950's and early 1960's) because TV manufacturers weren't required to include UHF tuners in new TV sets until 1964, leaving only room for two or three main networks over-the-air. Many of the individual station history articles here rely on sources like this as there's relatively little online from that distant era. Sadly, the Ingram domains were allowed to expire and at least one has been cybersquatted with hardcore pornography, leaving the archive.org versions the only readily-available copy of the material. And no, the usual strategy of linking to both the original URL and the archive link present makes no sense (and is actively harmful) because it's linking to a domain which is not under control of the original site and that domain is being abused. We should never be linking to cybersquatted or expired domains, as it only encourages abusive registrations intended to take traffic meant for the cited site and redirect it elsewhere - ads, spam, porn, the occasional attempt to "ticket-scalp" domain registrations by speculatively tying up hundreds or thousands of domains, putting each up for sale for four-figures or worse. This garbage is the scourge of the Internet; let a domain expire and it's gone not in sixty seconds but sixty milliseconds from when it becomes available to new registrants.
It would be better if you leave links to legitimately-archived content alone and remove links to cybersquatted or hijacked domains, instead of the inverse. This sort of edit is not helping the project. Link rot is a problem and archive.org a useful tool in damage control. 66.102.87.40 (talk) 18:26, 7 February 2022 (UTC)
{{usurped}}
does not disable links. I think you misunderstand what this template is for and how it works. -- GreenC 18:39, 7 February 2022 (UTC)
For example. Given this citation:
<ref>[https://web.archive.org/web/20180228080250/http://www.uhftelevision.com/articles/channel37.html History of UHF television: Why Is There No Channel 37?]</ref>
If seen by Citation bot or reFill or other tools, they will automatically convert to:
<ref>{{cite web |url=http://www.uhftelevision.com/articles/channel37.html |title=History of UHF television: Why Is There No Channel 37? |work=uhftelevision.com |archive-url=https://web.archive.org/web/20180228080250/http://www.uhftelevision.com/articles/channel37.html |archive-date=2018-02-28 |url-status=dead}}</ref>
This is a problem since the domain is usurped. So we need a way to communicate when a bare or square archived URL is usurped. Thus {{usurped}}
is to flag other bots and tools (and people) that the underlying source URL is usurped. -- GreenC 18:54, 7 February 2022 (UTC)
- The edits appear with summaries like (Remove 3 citations per WP:USURPSOURCE. Wayback Medic 2.5); look up WP:USURPSOURCE and that page isn't about link rot. It's about scraper sites, which steal content from other websites, mangle it to slip past the search engine duplicate content penalties and then repost it without attribution. Entirely different animal. If we're dealing with a scraper site vs. a live originating site, we want the original. If we're dealing with archive.org vs. a cybersquatted domain we don't want a clickable link to the cybersquatters. Maybe {{usurped}} is legit for dealing with cybersquatting, but WP:USURPSOURCE is the wrong documentation as it applies to a different issue. 66.102.87.40 (talk) 18:55, 7 February 2022 (UTC)
- Ah sorry that edit summary is totally wrong. I made a mistake and didn't see it until it was almost done (only 34 edits). These edits have nothing to do with WP:USURPSOURCE. I apologize for the confusion caused by that. -- GreenC 18:58, 7 February 2022 (UTC)
- I think there are four different WP:USURP policies, guidelines or procedures. The one you're looking for seems to be Wikipedia:Link rot/Usurpations aka WP:USURPURL. Labelling archive.org as a scraper site isn't what you want. 66.102.87.40 (talk) 19:00, 7 February 2022 (UTC)
- I am the author of WP:USURPURL and that is exactly the procedure that was followed (except the erroneous edit summary). -- GreenC 19:33, 7 February 2022 (UTC)
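To illustrate the conversion GreenC describes earlier in this thread, here is a minimal Python sketch of how a tool can recover the underlying source URL and archive date from a bare Wayback Machine link. That recovery is why a usurped source needs an explicit flag. The `split_wayback` helper and its regular expression are hypothetical and only handle plain 14-digit snapshot URLs; this is not the code of Citation bot, reFill or GreenC bot.

```python
import re

# Matches a Wayback snapshot URL with a full 14-digit timestamp.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_wayback(url):
    """Return cite-web style fields for a Wayback URL, or None if it is not one."""
    m = WAYBACK_RE.match(url)
    if m is None:
        return None
    stamp, source = m.groups()
    return {
        "url": source,
        "archive-url": url,
        "archive-date": f"{stamp[:4]}-{stamp[4:6]}-{stamp[6:8]}",
    }


fields = split_wayback(
    "https://web.archive.org/web/20180228080250/"
    "http://www.uhftelevision.com/articles/channel37.html"
)
print(fields["url"], fields["archive-date"])
```

The recovered `url` value is exactly the link that should not become clickable when the domain is usurped.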
Edits showing template code
For example in Special:Diff/1067557821, where the bot tries to wrap a URL containing an equals sign in {{usurped}}. The equals sign gets interpreted as a template parameter, resulting in the literal text {{{1}}}[usurped!] showing up. Consider using {{usurped|1=...}}
instead. * Pppery * it has begun... 22:23, 10 February 2022 (UTC)
- Oh yeah not good. The problem exists on 426 pages, you are the first to notice and report it over many months. -- GreenC 22:43, 10 February 2022 (UTC)
- Script running now, example. Bot code updated. Template docs updated. -- GreenC 23:02, 10 February 2022 (UTC)
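The failure mode in this thread can be sketched in a few lines of Python. The `template_args` function below is a rough, hypothetical imitation of how template arguments are split into named and positional ones; it is not MediaWiki's parser and not the bot's code. It only shows why an unnamed argument containing "=" never arrives as parameter 1, and why the explicit `1=` form avoids that.

```python
def template_args(wikitext):
    """Split a simple {{name|...}} call into a dict of arguments (rough imitation)."""
    inner = wikitext[2:-2]
    named, positional = {}, []
    for part in inner.split("|")[1:]:
        name, sep, value = part.partition("=")
        if sep:
            # Anything before the first "=" is taken as a parameter name.
            named[name.strip()] = value
        else:
            positional.append(part)
    # Unnamed arguments are addressable as "1", "2", ...
    for index, value in enumerate(positional, start=1):
        named.setdefault(str(index), value)
    return named


link = "[http://example.com/page?id=5 Title]"
broken = template_args("{{usurped|" + link + "}}")
fixed = template_args("{{usurped|1=" + link + "}}")
print("1" in broken, fixed["1"])
```

In the broken call the text before the first "=" becomes an unused parameter name, so the template has nothing to show for {{{1}}}.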
Bot added 2 reftalk templates
Hey GreenC! I noticed that on Talk:Rocket League, your bot added two {{reftalk}} templates. I'm guessing it did this because it saw there were 2 sections (1 was a sub-section) and refs and so it assumed the refs were in 2 different sections. Maybe it should look to see if the refs are part of the same section (or a subsection) and only add 1 reftalk template if the refs are all in the same section? ― Blaze WolfTalkBlaze Wolf#6545 01:29, 11 March 2022 (UTC)
- Thanks for the fix and notification. Based on the edit summary, the bot thought one was for the 2-level comment one for the 3rd-level comment. Given the location of the refs, free-floating outside any text block, either could be right - there is no indication which section the refs belong to. This is a GIGO situation. I don't know how to fix it, but, it does appear to be pretty rare as I have never seen it before. -- GreenC 03:37, 11 March 2022 (UTC)
- Alright sounds good. I've fixed your fix since the section is meant to be part of the edit request. ― Blaze WolfTalkBlaze Wolf#6545 05:30, 11 March 2022 (UTC)
Bot adding non-working archive link with incorrect archive date
... at Julie Higgins. The link it added goes to a 404 page (presumably a soft 404) from 2014. Graham87 07:03, 8 May 2022 (UTC)
- Thanks. NLA has a new URL scheme it should have done this not sure why it didn't will investigate. -- GreenC 10:14, 8 May 2022 (UTC)
Bot claiming links are dead when they are not
This morning my watchlist was flooded with GreenC bot edits like this claiming that a link was dead. Except that it isn't. Something has gone wrong. Hawkeye7 (discuss) 19:53, 1 June 2022 (UTC)
- Yeah I'm working on dtic.mil and the site fooled me into thinking entire subdomains are dead but it's actually only some links. Will go back over them and return what is working to live. -- GreenC 21:00, 1 June 2022 (UTC)
https://apps.dtic.mil/sti/pdfs/ADA546200.pdf is another example of this problem. --Ancheta Wis (talk | contribs) 10:43, 2 June 2022 (UTC)
- It will be posted soon. From the bot log last night:
- syslog:United States Army Futures Command----https://apps.dtic.mil/docs/citations/ADA546200 ---- MAKELIVE ---- remove [dead link ] from squarelink
- -- GreenC 14:14, 2 June 2022 (UTC)
Thank you
Thank you for the good work this bot is doing! Combating link rot is key to the long-term viability of Wikipedia and the preservation of knowledge; I thank you for your efforts in this direction. Al83tito (talk) 16:22, 15 June 2022 (UTC)
- Thanks you are welcome. Agree keeping up with archiving is vital for Wikipedia to work. -- GreenC 17:16, 15 June 2022 (UTC)
Great work on archive links
Hi there, thanks for the great work the BOT (and you!) are doing on fixing up many archive links in articles I have done a lot of work on. I see that sometimes you use 'download' when it is png or jpeg and 'detail' if is a pdf. Now, I can also see what I was doing to cause issues. If I uploaded a pdf to internet archive and my page arrived, I was downloading the PDF and using that as the link - when it should have been the window with 'detail'. I see you have made these changes and while they are not easy to navigate on a small device, I have my head around the window now. All good. So, in future I always load the 'detail' link right? What about Jpeg or png uploads? Can I download them and use the link? I am sure you have many articles on the go at the moment, so this one you did is a good example of what I am talking about: Parrs Park. You seem to move from 'detail' to 'download'? Thanks again for getting me on track with this process and I like not having so many pdfs floating around in my references! I hope this makes sense? Realitylink (talk) 21:23, 21 June 2022 (UTC)
- Thanks! Archive.org has some complications, but is also a powerful system that is open. Generally it's designed so the /details/ is the default landing page where users can then branch out to other options: accessing individual files like the .pdf, reading via the in-page flipbook reader, access metadata. For media files like jpg or mp3, it's often better to use /download/ because that is a direct link to the file it may be part of a larger package, it works better usually to open those directly as they are not textual in nature. Unless the /details/ page is the same content in which case it might be better to use /details/. The other thing is not to use machine-specific links like
https://ia802503.us.archive.org
.. these are temporary addresses that will change they have a limited lifespan - typically a few years. They are not designed as permanent links. Finally the|url=
field should not contain a web archive link eg. archive.today or web.archive.org .. only in the|archive-url=
field. Hope that helps. -- GreenC 04:27, 22 June 2022 (UTC)
- Yes that is a terrific help. Your Bot is doing a lot of work fixing up my links...and not using the machine-specific link (something I have done a lot previously - but never again promise!!) is a useful piece of information. I might get back to you if further questions arise. Appreciate your response. Realitylink (talk) 05:30, 22 June 2022 (UTC)
- Thanks again. A couple more questions...you say it is better to use the download for media files, but that still relates back to the machine-specific link...is that a problem? Will the link still have a limited life? And...the url= field not containing a web archive link...what if a downloaded url is the only link there is? Can we use it as the url? We can't use it as an archive link, because I am sure that needs url to be archived against....or is it ok to use links from Internet Archive as urls? I see the bot has done that a few times... for example in the page for Alex Hassilev. It looks good and is accessible...but is still an archived url being used as a url. Or am I overthinking this?? cheers Greg Realitylink (talk) 00:18, 23 June 2022 (UTC)
- archive.org has a lot of services. One is web archiving ie. Wayback Machine at http://web.archive.org Archive.org also has a service where it scans books and media, exactly like Google Books. It's at https://archive.org/details .. One is sort of like a library of digital holdings. The other is a collection of website scrapes. So the digital book scans are primary urls and reside in the
|url=
field since that is the source URL, it's not archiving a different website somewhere, it's the original destination link. -- GreenC 00:53, 23 June 2022 (UTC)
- That's most interesting and makes a lot of sense. So I should continue displaying snapshots of news clippings etc as a url? And judging from an earlier statement you made, 'detail' is probably the best way to go? Are there any advantages in using the 'download' function? Does either compromise the long term safety of the link, or once it is Wayback, is it secure? I am thinking of going back over the Hassilev article (and some of the others the bot has fixed) and converting all of those 'download' links to 'detail'. That won't pose any problems I am thinking? Realitylink (talk) 02:27, 23 June 2022 (UTC)
- Yes newsclips in
|url=
. They are not Wayback Machine links just normal book links (or "text collection" is the precise name). Anything on archive.org is secure except when it gets taken down :) Like due to a copyright holder request. A good way to deal with that is use archive.today to save the details or download link, then place the archive.today link in the|archive-url=
field. You could also use ghostarchive.org for same purpose as secondary archive. That would be three places which is pretty secure. Could also save a copy on your local disk. -- GreenC 03:00, 23 June 2022 (UTC)
- Ah, ok so you can have the 'details' saved as a URL and an archive today link as an archive-url? I did do that for some, but an editor changed it...so for those news clippings if I use the 'detail link' and then an archive today link won't that look like its been archived twice? Would that matter? Hey, appreciate your patience and knowledge. Realitylink (talk) 03:12, 23 June 2022 (UTC)
- No because the details link is not a web archive link. It's not the Wayback Machine, it's a different service. -- GreenC 03:25, 23 June 2022 (UTC)
- I can see why myself and others get confused...but hey...all good. So they look the same because archive.today just captures the image, and hopefully will retain the link.Realitylink (talk) 03:36, 23 June 2022 (UTC)
Sorry, just one more point to clarify: So I upload a pdf of a scientific paper to the internet archive, and when it comes through, I use the 'detail' link, either as a url or an archive url. Now, is the safety of this detail link related to the machine-specific links like https://ia802503.us.archive.org
which if I have got it right, will just give somebody a chance to download and read the paper, but it won't be permanent. Correct? I am thinking that each person who downloads it from the detail page would get a different machine-specific link? And I keep using the 'detail' link. I appreciate your patience and will leave you in peace - hopefully not pieces! - after this. Kind regardsRealitylink (talk) 03:21, 25 June 2022 (UTC)
- If it does not contain "web.archive.org" it's not an archive URL, and should not go in the |archive-url= field. Only web archive URLs go in |archive-url=, and only if they contain web.archive.org are they considered web archive URLs. To illustrate:
- https://web.archive.org = web archive URL ---> |archive-url=
- https://archive.org = non-web archive URL ---> |url=
- Quiz: given the URL https://archive.org/details/wonderfulwizardo00baumiala .. would it go in the |url= or |archive-url=? Or given the URL https://web.archive.org/web/20220101000646/http://example.com/ would it go in the |url= or |archive-url=?
- Not sure I understand the question about machine-specific links but the rule there is not to use them as they expire. Everyone has a chance to download and read the paper from the details link. On the right side of the page it says "Download Options" with links to PDF, Epub, etc.. or they can read via the flip book. -- GreenC 05:54, 25 June 2022 (UTC)
- So links from archive.today, one I just used looked like this: https://archive.ph/XUik2 can't be used as an archive link? I have been using them as archive links...but they need to go now? It gets more confusing indeed...hang on, I just looked about and here is something you said: "A good way to deal with that is use archive.today to save the details or download link, then place the archive.today link in the |archive-url= field." It seems I can use archive.today links as archive-links! I hope so, 'cos I have hundreds of them in place...and I just checked Ghostarchive doesn't have web.archive.org thing either....so can it be used as an archive-link?Realitylink (talk) 07:45, 25 June 2022 (UTC)
- Yes archive.today, ghost and other web archive's are also archive links.. I was just focusing on the archive.org domain since that is what we were discussing which is a source of confusion. -- GreenC 15:28, 25 June 2022 (UTC)
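The rule worked out in the exchange above (web archive hosts go in |archive-url=, everything else including archive.org/details goes in |url=) can be sketched as a small Python function. The host list and the `citation_field` name are assumptions made for illustration; a real tool would need a much longer list of archive providers.

```python
from urllib.parse import urlparse

# Hosts treated as web archives in this sketch; illustrative, not complete.
WEB_ARCHIVE_HOSTS = {"web.archive.org", "archive.today", "archive.ph", "ghostarchive.org"}


def citation_field(url):
    """Return which citation parameter the URL belongs in: 'archive-url' or 'url'."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return "archive-url" if host in WEB_ARCHIVE_HOSTS else "url"


print(citation_field("https://archive.org/details/wonderfulwizardo00baumiala"))
print(citation_field("https://web.archive.org/web/20220101000646/http://example.com/"))
print(citation_field("https://archive.ph/XUik2"))
```

Matching on the exact hostname rather than on a substring keeps archive.org (the book and media library) separate from web.archive.org (the Wayback Machine).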
Thanks, I think we are getting there. I used the Ghost one yesterday and it is pretty quick. So I noticed that an editor changed some links for archive today by removing the 'ph/wip' and adding 'today', the message was: (clean up archive.today work-in-progress links, replaced: https://archive.ph/wip → https://archive.today) It didn't change how it looked, but does having 'today' there mean the link is more archive-url friendly? And is there a way to default to it in archive today ?
Another interesting thing this editor did was (replaced 12 archive.today URL(s) with more transparent URL from <link rel="bookmark") and they changed this: [archive-url=https://archive.ph/mkqjZ to this: archive-url=http://archive.today/20220508021803/https://ia902502.us.archive.org/23/items/tallahassee-democrat/Tallahassee%20Democrat.jpg]. So is this the long format, the one that seems to be recommended. Help talk:Using archive.today#RfC: (See the Long format link issue section) Should we use short or long format URLs? If so, I am not sure how to activate it, short of doing the full manual thing and even that is unclear. I can carry on manually changing the ph/wip to today...but would be interested in what has been done with this long version. Realitylink (talk) 22:02, 25 June 2022 (UTC)
- That was BrownHairedGirl. Yes please use archive.today as this is a special gateway server they want us to use on Wikipedia - it redirects to one of the servers where the content is hosted (such as .ph) this way if one of the content servers goes offline he can redirect to a different one quickly by making a change in the archive.today server. The /wip/ is not a correct URL it's a temporary until the page is fully saved. You have to wait until the page is saved and it will give the right URL. There is no easy way to get the long form. What you have to do: right-click on the page, select "View source", cntrl-f and search on "long". Then hit the right arrow a few times till you see the long form URL. Copy and paste. It will have a date like 2002-09-01.01.01 you can remove all the "-" and "." so it's just 14 digits long.
- I see you are still using machine-specific URLs https://ia902502.us.archive.org/23/items/tallahassee-democrat/Tallahassee%20Democrat.jpg - this is not recommended. Why not use https://archive.org/details/tallahassee-democrat/Tallahassee%20Democrat.jpg ? By "machine specific" the name of the machine is ia902502 which is a temporary location. No reason to do so when a permanent long term link is available. -- GreenC 03:05, 26 June 2022 (UTC)
- Thanks again, you have totally cleared that up for me. No, I am not using machine specific urls, that was from an earlier edit window. I am using 'detail' for them all now. It's easy to change all my 'ph's' to 'todays'...I will play around with looking for the long form, and try a few more on Ghost...I really appreciate your generosity in sharing this expertise. Kind regards. Realitylink (talk) 03:25, 26 June 2022 (UTC)
All good now with the long form. One thing I was wondering...is it necessary to change the http to https in this long form before pasting? Or does it just change into this over time? Realitylink (talk) 20:36, 28 June 2022 (UTC)
- Yes! https is best.. if you forget a bot will eventually fix it. -- GreenC 22:25, 30 June 2022 (UTC)
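The manual cleanup described in this thread (strip the "-" and "." from the long-form date, use the archive.today gateway host, prefer https) could be sketched in Python as below. The `normalize_archive_today` name is hypothetical, and the dotted timestamp layout in the sample input (YYYY.MM.DD-HHMMSS) is an assumption about what the page source shows, so treat this as a sketch rather than a tested tool.

```python
import re

# Long-form URL: any archive.* mirror host, a timestamp segment, then the saved URL.
LONG_FORM = re.compile(r"^https?://archive\.[a-z]+/([0-9.\-]+)/(.+)$")


def normalize_archive_today(url):
    """Rewrite a long-form archive.today URL to https, gateway host, 14-digit stamp."""
    m = LONG_FORM.match(url)
    if m is None:
        raise ValueError("not a long-form archive.today URL")
    # Remove the "-" and "." separators so only digits remain.
    stamp = re.sub(r"[.\-]", "", m.group(1))
    if len(stamp) != 14:
        raise ValueError("timestamp is not 14 digits after cleanup")
    return f"https://archive.today/{stamp}/{m.group(2)}"


print(normalize_archive_today("http://archive.ph/2022.05.08-021803/https://example.com/page"))
```

Short-form links such as https://archive.ph/mkqjZ carry no timestamp, so the function rejects them instead of guessing.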
Removing dead link tags
It seems that the bot is removing {{Dead link}} tags from refs to webcitation.org
See e.g. this edit[4] to 2008 North Indian Ocean cyclone season, which removed the dead link on the bare URL ref to https://www.webcitation.org/5bgwta1al?url=http://www.imd.gov.in/section/nhac/dynamic/endseasonreport.pdf ... which I had added in this edit[5] three weeks previously?
Please can the bot stop doing this? Removing the {{Dead link}} tag means that the ref is an untagged bare URL, which ends up in my cleanup lists that get fed to citation bot, or in this case tagged[6] as {{Bare URL PDF}} ... which is also unhelpful, 'cos it invites editors to fill a ref which is known to be dead. BrownHairedGirl (talk) • (contribs) 01:29, 24 June 2022 (UTC)
- [dead link ] was not meant to be used to tag bare archive URLs. You're kind of hacking the system a bit to trigger CitationBot to process the cite, a custom process flow. Normally once a citation has an archive URL, it's no longer tagged with a
{{dead link}}
- it has been "Saved". The dead link documentation says "Before considering whether to use the [dead link ] template it is often useful to make a search for an archive copy of the dead link and thereby avoid using the tag altogether." We've always removed these tags once an archive URL is added. I understand this messes with your process flow since you do things in stages often with a long period between steps. Would suggest finding another way such as running citation bot sooner than later on these cases, to avoid overlap with other maintenance bots. Another option is when adding the{{dead link}}
it would look like:{{dead link|date=May 2022|bot=BrownHairedGirl}}
.. I can program my bot to avoid removing those dead link tags with that|bot=
and for whatever dates you want. The issue is that there are other cases where these tags need to be removed, that were added in error by users who don't realize that an archive URL + dead link tag is redundant/non-standard. -- GreenC 02:09, 24 June 2022 (UTC)
- @GreenC, you are missing the point: webcitation.org is dead.
- There is no system hack, and the whole point of my comment is that I do not want CB to process these dead links to webcitation.org, and having these dead links correctly tagged as dead takes them out of that workflow.
- I don't usually tag archive links as dead, because they are live links to an archive site. It would be silly to tag as dead a link for example to https://web.archive.org/web/20120726135924/http://stonewall.org.uk/documents/stonewall_mp_voting_records_2010_1.pdf ... because that is a live link.
- But webcitation.org is dead, so it is completely appropriate to tag those links as dead ... and no custom variant of the tag is needed. Dead is dead.
- So I have just re-tagged as {{Dead link}} all the bare URL refs to webcitation.org. Please do not remove those tags.
- It's great to remove deadlink tags from bare URL links to live archives, such as archive.today and archive.org. That is very helpful.
- But webcitation.org is a dead site, so please stop removing the {{Dead link}} tags from links to webcitation.org. You need to amend your code to make it stop treating the dead site webcitation.org in the same way as it treats live archiving sites.
- Also, if you reply, please ping me. It's a pain to have to check for a reply, when a simple mechanism exists to notify. BrownHairedGirl (talk) • (contribs) 04:03, 24 June 2022 (UTC)
Help if you can
Hi, can you please help me on the wikipedia page of the Beta Israel community? The manager of the website is making up whatever he wants and he deleting sources and he lock pages. 2A02:6680:1106:1FCE:250E:E104:93A1:5652 (talk) 22:14, 30 June 2022 (UTC)
- You were blocked so can't help. I'm just a bot. -- GreenC 22:24, 30 June 2022 (UTC)
Why can't you help? I was Blocked for no reason 2A02:6680:1106:1FCE:250E:E104:93A1:5652 (talk) 22:53, 30 June 2022 (UTC)
Bot added erroneous archive URL
I would like to report that the bot added the following erroneous archive URL for a journal citation in this edit:
|archive-url=https://web.archive.org/web/2021*/https://www.iseas.edu.sg/wp-content/uploads/2021/10/ISEAS_Perspective_2021_145.pdf |url-status=dead |archive-date=2007-06-15
I already manually fixed the archive URL but just wanted to report the error. Regards. Sanglahi86 (talk) 21:04, 14 July 2022 (UTC)
- Sanglahi86: The problem is the |url=https://web.archive.org/web/2021*/https://www.iseas.edu.sg/wp-content/uploads/2021/10/ISEAS_Perspective_2021_145.pdf which confused the bot. It should be |url=https://www.iseas.edu.sg/wp-content/uploads/2021/10/ISEAS_Perspective_2021_145.pdf. The bot was trying to make that switch but the "2021*" stumped it. There should not be an archive URL in the |url= field. -- GreenC 21:11, 14 July 2022 (UTC)
- Sorry, had not seen that citation's |url= detail. Thank you for clarifying. Regards. –Sanglahi86 (talk) 21:15, 14 July 2022 (UTC)
- @Sanglahi86: No problem thanks for noticing and reporting and fixing! - GreenC 21:27, 14 July 2022 (UTC)
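A minimal Python sketch of the switch described in this thread: pull the original URL out of a Wayback link found in |url=, while tolerating a partial or wildcard timestamp such as "2021*". The `unwrap_url_field` helper and its regular expression are hypothetical and are not the bot's code. A wildcard link points at a calendar view rather than a snapshot, so the sketch returns no archive URL in that case instead of inventing one.

```python
import re

# Timestamp may be 1 to 14 digits, optionally followed by "*", or a bare "*".
WAYBACK_ANY = re.compile(r"^https?://web\.archive\.org/web/(\d{1,14}\*?|\*)/(https?://.+)$")


def unwrap_url_field(url):
    """Return (source_url, archive_url_or_None) for a value found in |url=."""
    m = WAYBACK_ANY.match(url)
    if m is None:
        return url, None  # already a plain source URL
    stamp, source = m.groups()
    # Only a full 14-digit timestamp identifies one specific snapshot.
    is_snapshot = len(stamp) == 14 and stamp.isdigit()
    return source, (url if is_snapshot else None)


wild = ("https://web.archive.org/web/2021*/"
        "https://www.iseas.edu.sg/wp-content/uploads/2021/10/ISEAS_Perspective_2021_145.pdf")
print(unwrap_url_field(wild))
```

Timestamps with modifier suffixes (for example "id_") are not handled here and would fall through as plain URLs.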
Flagging non-dead link as dead
This edit flagged this URL as dead even though it isn't. Jo-Jo Eumerus (talk) 11:17, 18 July 2022 (UTC)
- Same with these edits:
- I appreciate it probably has to do with some kind of automatic PDF link serving in Javascript that Academia.edu uses wouldn't be readily captured with a bot; I don't know how fixable it is, but the links noted are not dead at all; I reverted both edits that the bot flagged. Ifly6 (talk) 14:35, 18 July 2022 (UTC)
- The url that Editor Jo-Jo Eumerus linked:
- Both of the urls that Editor Ifly6 links:
- There was some discussion about these kinds of academia links at Wikipedia:Link rot/URL change requests § www.academia.edu/download/
- —Trappist the monk (talk) 14:43, 18 July 2022 (UTC) 14:46, 18 July 2022 (UTC)
- Jo-Jo Eumerus & User:Ifly6 they are dead for me (USA). Example. Are you getting a redirect to a cloudfront URL? Wondering if there is some kind of location-aware policy that determines when to serve the cloudfront URL vs a 404. If the cloudfront URL was known, it would be possible to save it at the Wayback Machine, then use the Cloudfront-Wayback URL on Wikipedia treated as a dead link (due to its &Expires self-destruct mechanism see WP:AWSURL). However, I wonder about copyright if academia.edu is making them unavailable in the US and possibly elsewhere, question why have that policy if not a rights issue. -- GreenC 15:04, 18 July 2022 (UTC)
- I'm in the US and am getting the links promptly. The links I am getting are Cloudfront ones with an expiry; I used the Academia.edu links to avoid the known expiry. Ifly6 (talk) 15:41, 18 July 2022 (UTC)
- Ah I see you use British English so I assumed you are not US. What browser do you use? Do you have any plugins that might affect javascript? This is impacting archive providers as well, such as Wayback Machine and Ghostarchive (US-based), they also get 404. Archive.today it "works" (global IP pool) but they are unable to correctly save the PDF. -- GreenC 16:00, 18 July 2022 (UTC)
- I do get a "d1wqtxts1xzle7.cloudfront.net" sort of thing. Jo-Jo Eumerus (talk) 17:33, 18 July 2022 (UTC)
- Language heuristics are always right 99pc of the time haha. I've confirmed on Edge (Windows 10) and Safari (macOS) that the Academia.edu links work. I don't have any plugins installed other than ad blockers that would affect something like this. The specific link that got generated for me with Rafferty was https://d1wqtxts1xzle7.cloudfront.net/51344857/Iris-_Fall_of_the_Roman_Republic-with-cover-page-v2.pdf. There were then a pile of GET parameters that I've excerpted – they change every time anyway – but are necessary to get the file served properly. Ifly6 (talk) 19:24, 18 July 2022 (UTC)
- Jo-Jo Eumerus do you use Edge or Safari? -- GreenC 19:38, 18 July 2022 (UTC)
- Wikipedia:Village_pump_(technical)#academia.edu/download .. seeing if anything comes up here. -- GreenC 19:52, 18 July 2022 (UTC)
- Ifly6 in the above thread someone suggested perhaps you had signed up for account on academia.edu at some point? Or some old cookies that are giving permission. One way to test is try to access from a private window. -- GreenC 20:46, 18 July 2022 (UTC)
- Yea, that's probably it. I opened it in a private window and got the 404. Ifly6 (talk) 20:57, 18 July 2022 (UTC)
- Same for me (Firefox) Jo-Jo Eumerus (talk) 21:12, 18 July 2022 (UTC)
- Cool, glad it is figured out what is causing it. My thinking is to replace the academia.edu links with a Wayback version of the cloudfront URL so it's accessible for everyone. Or second option is to use |url-access=registration but that 404 page is confusing and will result in bots marking it dead. -- GreenC 21:30, 18 July 2022 (UTC)
Jo-Jo Eumerus, Ifly6, Biogeographist: Would like to propose this solution: Special:Diff/1098978075/1099315632. It's only for academia.edu/download links, which are about 1,000 on enwiki.
- academia.edu returns a 404 when a user is not registered and logged in, which is most users. It does not say "log in to access paper", rather a misleading 404 dead link page. This causes problems:
- Archive bots will determine the links are dead (404) and mark with a {{dead link}}.
- Users will be confused thinking the link is dead and not behind a registration wall.
- Should the link ever actually die for real, there would be no archive available since the Wayback Machine sees only a dead 404 page - the Wayback machine is not an academia.edu registered user.
- While possible to use |url-access=registration this does not solve the misleading 404 problems.
- The cloudfront link is an AWS container with an &Expires self-destruct mechanism. It's where the paper is actually located (not on academia.edu which redirects to cloudfront).
- The proposal is to determine the active cloudfront link via bot magic, immediately create a Wayback Machine save of the cloudfront URL, and change the citation to the Wayback-cloudfront link. eg. Special:Diff/1098978075/1099315632
This is what I can do somewhat easily right away. Bot design and coding effort limit what else can be done. -- GreenC 04:15, 20 July 2022 (UTC)
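The proposal above can be sketched roughly as follows. This is a minimal illustration only, not the bot's actual code: the helper names (`expiry_of`, `wayback_url`, `rewrite_citation`) are hypothetical, the example query-string values are made up, and it assumes the `Expires` parameter holds a Unix timestamp and that the citation's `|url=` value is simply swapped for the Wayback copy. Finding the live cloudfront link and saving it to the Wayback Machine are network steps and are left out.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

WAYBACK_PREFIX = "https://web.archive.org/web"


def expiry_of(signed_url):
    """Return the UTC time in the Expires query parameter, or None if absent.

    Assumption: Expires is a Unix timestamp in seconds.
    """
    params = parse_qs(urlsplit(signed_url).query)
    values = params.get("Expires")
    if not values:
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def wayback_url(signed_url, captured_at):
    """Build a Wayback Machine URL for a capture taken at captured_at (UTC)."""
    stamp = captured_at.strftime("%Y%m%d%H%M%S")  # 14-digit capture timestamp
    return f"{WAYBACK_PREFIX}/{stamp}/{signed_url}"


def rewrite_citation(cite, old_url, new_url):
    """Swap the |url= value in a citation template (hypothetical, plain text)."""
    return cite.replace(f"|url={old_url}", f"|url={new_url}")


# Made-up example values; only the hostname comes from the thread above.
signed = ("https://d1wqtxts1xzle7.cloudfront.net/0000/example.pdf"
          "?Expires=1658300000&Signature=abc&Key-Pair-Id=XYZ")
capture_time = datetime(2022, 7, 20, 4, 15, 0, tzinfo=timezone.utc)
archived = wayback_url(signed, capture_time)

cite = "{{cite journal |title=Example |url=https://www.academia.edu/download/0000/example.pdf}}"
new_cite = rewrite_citation(
    cite, "https://www.academia.edu/download/0000/example.pdf", archived
)
```

The full signed URL, query string included, is kept inside the Wayback URL because the capture was made for that exact address; the Wayback copy stays readable after the original `Expires` time has passed.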
- Hmm. It seems a bit complex and I wonder if people will be deleting the "expires" part of the link. Jo-Jo Eumerus (talk) 10:22, 20 July 2022 (UTC)
- It's a complex situation. If they delete the &Expires the URL will break (404). It will break anyway, due to the Expires, that is why the archive URL version is made the primary. The archive URL is accessible to everyone - academia.edu account not required. -- GreenC 15:30, 20 July 2022 (UTC)
- @GreenC: I think it's a problem that the url parameter points to CloudFront instead of Academia.edu; it would offer more transparency for the reader if the domain was academia.edu. Is it possible to retain the clean academia.edu URL (without the expires part) in the url parameter and use the long CloudFront URL in the archive-url parameter? Or to use a separate {{Webarchive}} template for the long CloudFront URL? Biogeographist (talk) 16:45, 23 July 2022 (UTC)
- @Biogeographist: hi sorry I should have deleted this entire thread, because it was moved to the main talk page (User_talk:GreenC_bot) from this archive page (User_talk:GreenC_bot/Archive 6). I posted an updated situation. Can you follow up there? -- GreenC 17:14, 23 July 2022 (UTC)