User talk:Grammarbot
Please add new discussion to 'Run on " ," #1' or 'Random stuff'
Prerelease
Discussion from Wikipedia talk:Bots
I've made a list of articles which have spaces before commas (6518 articles). I think maybe a bot could do this:
- Every 10 seconds (or whatever the load limit is meant to be for bots) correct one article, move article from "todo" to "tocheck"
- 5/10/20/60 (?) minutes later, check back at the correction
- If it was reverted, take a note of it and move the article from its "tocheck" to "tosee"
- Bot author looks at "tosee" and attempts to code in any special cases
- If it wasn't reverted, delete the article from the "tocheck" list (or move article to "done" list)
- Continue.
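A minimal sketch of the loop described above, assuming the wiki is stood in for by a plain dictionary of page texts. The names `Queues`, `correct_next` and `check_back` are hypothetical and are not taken from the bot's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Queues:
    """The four lists from the proposal (in-memory stand-ins for database tables)."""
    todo: list = field(default_factory=list)
    tocheck: list = field(default_factory=list)
    tosee: list = field(default_factory=list)
    done: list = field(default_factory=list)


def correct_next(q, fix, pages):
    """Correct the next article in 'todo' and move it to 'tocheck'."""
    if not q.todo:
        return None
    title = q.todo.pop(0)
    pages[title] = fix(pages[title])
    q.tocheck.append(title)
    return title


def check_back(q, title, pages, is_broken):
    """Some minutes later: an error that is back means the edit was reverted."""
    q.tocheck.remove(title)
    if is_broken(pages[title]):
        q.tosee.append(title)   # for the bot author to inspect by hand
    else:
        q.done.append(title)
```

The rate limit (one call to `correct_next` every 10 seconds, or whatever is allowed) and the delay before `check_back` would be handled by whatever schedules these calls.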
I've also made other lists:
- "Space before colon" (7740 articles, main namespace only)
- "Space before exclamark" (1586) (many false positives involving table syntax)
- "Space before fullstop" (11489) (many false positives involving ellipsis and TLDs)
- "Space before fullstop which is before space" (2988) (fewer false positives)
- "Space before qmark" (4523)
- "Space before semicolon" (7740) (many false positives involving assembly language code and definition lists)
- "&#8211; instead of &ndash;" (1094)
- "&#8212; instead of &mdash;" (1496)
What do you think? Alternatively, I could post some reports (about 45 pages with 100 articles per page which provide context around the error, similar to the repeated words reports I made [1]) r3m0t 11:26, Feb 21, 2005 (UTC)
- What about articles which have commas in the title? What if the title was intentionally that way? -- AllyUnion (talk) 03:56, 24 Feb 2005 (UTC)
- Probably it would ignore the title. r3m0t 07:16, Feb 24, 2005 (UTC)
- As I said, it would also check back and if the change was reverted (i.e. a "space comma" sequence is back in the article again) it would put that (article and context) on a list for the bot author to look at. If the space comma was some sort of vandalism revert, the bot author would revert back to the valid version manually. Otherwise, the bot author would investigate the special case so that this false positive doesn't come up again. Also, they might want to put the article on a permanent "exclude" list or, in extreme cases, stop the bot. r3m0t 10:15, Feb 24, 2005 (UTC)
- Stop the bot: That is if there are any weird uses of the comma I don't know about which are a major thing and can't be worked around. Of course, if coding the false positive will take long, the bot will be stopped until the coding is done. r3m0t 17:29, Feb 24, 2005 (UTC)
- There is still an argument over the whole ndash and mdash issue. I would avoid that. Also, where is this list going to be? -- AllyUnion (talk) 10:42, 24 Feb 2005 (UTC)
- Are you sure the argument is about what entity to use and not where to use ndash and where to use mdash? See Wikipedia:Manual of Style (dashes). It doesn't expressly say anything about whether to use the numerical entities or the named ones, but the named ones are "obviously" simpler (from the source anybody can tell it's some sort of dash, although 'n' and 'm' mean little). The list would be of course on the computer running the bot, but if you want it would also be on the wiki, whether updated whenever a change occurs or (eg) every few hours. IMO It would be a waste of Wikipedia's space. May I begin to develop and run the bot under User:R3b0t from Sunday? Would you be willing to give it the bot flag? Is it fine to run it (slowly) without a bot flag? Is a vote about allowing the bot necessary? r3m0t 17:11, Feb 24, 2005 (UTC)
- Yes. The list will be in a MySQL database, so I can write some scripts to show the list. r3m0t 16:44, Feb 25, 2005 (UTC)
- Am I getting permission to run this? I will try to run it to do one change a minute tomorrow. I will provide a facility to stop the bot (off-site) and will provide a link from here. Of course, you can also ban User:R3b0t. Is that name too similar to mine? r3m0t 00:15, Feb 27, 2005 (UTC)
- *twirls* What do you think? Grammarbot 10:42, 27 Feb 2005 (UTC) (Yes this is mine) r3m0t 12:57, Feb 27, 2005 (UTC)
- That's coming from User:AllyUnion who used to have a bot at User:Allyunion! Only joking. r3m0t 15:29, Feb 27, 2005 (UTC)
- Well, User:Allyunion is blocked now. I suggest you make your edits, and produce some kind of log every day. -- AllyUnion (talk) 16:01, 27 Feb 2005 (UTC)
Thank you. I'm programming it. r3m0t 17:16, Feb 27, 2005 (UTC)
The bot is now running making one change a minute without the bot flag. This is the log for today and tomorrow (with some earlier entries removed as I have since changed the log format) and various things are at stuff.php, including a list of upcoming articles, stuff that were fixed and stuff which it thinks were already fixed. There is also a counter showing the time until the next run. Note that currently it does not go back to check its edits. However, the backlog of changes it has made remains (in the database). If it were to check 2 a minute, it would catch up in - ooh, a day? Of course, I might implement that feature sooner, in which case, all the better. r3m0t 22:11, Feb 27, 2005 (UTC)
- Um, do you think you could at least add your bot to the list of bots running without a flag? I only spotted it because I was scanning RC. --Tony Sidaway|Talk 22:36, 27 Feb 2005 (UTC)
There were problems with some characters which screwed up many tags and tables. Unfortunately, grammarbot did about 100 edits before I noticed and stopped the bot. Not all of these edits were problematic, but for simplicity all were reverted in an hour or so. I'm waiting to receive (conditional?) permission to run it again, after (some of?) the things detailed on User talk:Grammarbot have been done. r3m0t 01:16, Feb 28, 2005 (UTC)
It's up again without any problems (not even at Anglesey ;)) and I hope to get the bot flag in a few hours or so. I'll apply. I'll also list it on Wikipedia:Bots. r3m0t 22:35, Mar 3, 2005 (UTC)
First run
The panic
I'm assuming that this is not a deliberate vandalbot, but in its effects at Adrian Nastase, it might as well have been. Among other things, it is systematically screwing up HTML entities. -- Jmabel | Talk 22:36, Feb 27, 2005 (UTC)
It also screwed up the formatting on Anglesey requiring a revert.
Velela 23:04, 27 Feb 2005 (UTC)
- My sincerest apologies. The bot is now stopped. r3m0t 23:13, Feb 27, 2005 (UTC)
Err... incredible. Is there any way to revert all these automatically? Is this the death knell of my bot? r3m0t 23:14, Feb 27, 2005 (UTC)
Fuck. Fuck. Fuck. Fuck. Didn't Angela have a bot against this? (Yes, I really did stop the bot by now.) r3m0t 23:17, Feb 27, 2005 (UTC)
- No, she didn't. r3m0t 23:22, Feb 27, 2005 (UTC)
It's only about a hundred pages. I'm pretty sure they're being manually reverted as we speak (I did one. :-) And after all, some edits were probably completely uncontroversial. Hey, worse things happen. It's not the Willy on Wheels. :-) 82.92.119.11 23:23, 27 Feb 2005 (UTC)
I've done (from the most recent) up to American Association for the Advancement of Science. Phew. r3m0t 23:32, Feb 27, 2005 (UTC)
I wish I could help out, but I desperately need to catch some Z's. Try soliciting some brute force on the IRC channels. An admin (there are always some on the channels) may even have a bot handy for such things. 82.92.119.11 23:40, 27 Feb 2005 (UTC)
Problems
Problems, then:
- It removes the ampersand in entities such as &amp;amp; or &amp;sup2;
- It removed double quotes and maybe 'single' quotes too
- Err... in only some cases?
- It removes < and > and possibly other things which are escaped in the edit box such as " &
Sorry again. I can't imagine why this happened. Testpage at User:R3m0t/Sandbox and will be tested properly before re-enabling. r3m0t 23:13, Feb 27, 2005 (UTC)
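One possible explanation, offered only as a guess: text scraped from the edit form is HTML-escaped, and it has to be decoded exactly once before it is edited and resubmitted. The sketch below uses Python's standard `html` module to show the round trip. The helper name `text_from_edit_box` and the sample text are made up for illustration and say nothing about how the bot is actually written.

```python
import html


def text_from_edit_box(escaped: str) -> str:
    """Decode the HTML escaping applied to wikitext inside the edit form."""
    return html.unescape(escaped)


# Hypothetical wikitext containing an entity, quotes and angle brackets.
wikitext = 'x&sup2; &amp; "quoted" <math>a < b</math>'

# What the same text looks like once it is embedded in the HTML of the edit page.
in_form = html.escape(wikitext)

# Decoding once gives back the original wikitext, entities intact.
recovered = text_from_edit_box(in_form)

# Decoding text that was never escaped is lossy: &sup2; becomes a literal character.
over_decoded = html.unescape(wikitext)
```

Fetching the raw text through Special:Export instead of scraping the edit form avoids this layer of escaping, at the cost of an extra page load.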
If this ever runs again, please consider having it ignore everything between <math> tags, since that is often formatted for ease of reading while editing. Ben Cairns 00:05, 28 Feb 2005 (UTC).
- Don't worry, I'm not dead yet. I'll try to get that in. Grammarbot 00:07, Feb 28, 2005 (UTC)
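A rough sketch of one way to leave tagged regions alone: split the text around the protected tags and apply the fix only to the pieces in between. The function name `fix_outside_tags` and its parameters are hypothetical, and the bot may do this quite differently.

```python
import re


def fix_outside_tags(text, fix, tags=("math",)):
    """Apply fix() to everything except the contents of the given tags."""
    names = "|".join(re.escape(t) for t in tags)
    # Matches an opening tag, its contents and the matching closing tag.
    protected = re.compile(
        rf"<({names})\b[^>]*>.*?</\1\s*>", re.DOTALL | re.IGNORECASE
    )
    out, pos = [], 0
    for m in protected.finditer(text):
        out.append(fix(text[pos:m.start()]))  # ordinary text: fix it
        out.append(m.group(0))                # protected region: copy verbatim
        pos = m.end()
    out.append(fix(text[pos:]))
    return "".join(out)
```

Passing a longer tuple of tag names, such as `("math", "pre", "code")`, protects those regions in the same way. An opening tag with no matching close is not protected by this sketch.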
For release #1
This appears all fixed. HOWEVER:
- There is now an extra pageload, bumping the number of pageloads up from 2 (edit and submit) to 3 (get text with Special:Export, edit and submit), and I really want to move this back down to 2 (the minimum).
- If I can't, I'll at least move the number of pageloads for something which is already fixed back down to 1. That's easy.
- I want to ignore things in math tags. I will use what I already have and possibly load more pages than I need to.
- I want to use a setting to enable/disable the UTF8 conversion which I now use, instead of hardcoding it in. (Possibility of outreach to other wikipedias)
- no longer needed
- I would like to test it on everything which it had already done and check the diffs "by human", to make sure everything is fine.
- Nah.
- I need to re-receive permission.
- Nah.
- I need to make publicly available something to close the bot down.
- I need to move this down to every 5 minutes instead of every minute.
- Nah.
- Perhaps it will be able to run on other reports making changes such as ' . ' -> '. ' and '.A' -> '. A' (for A-Z and only uppercase to avoid TLD problems)
Yours, r3m0t 01:09, Feb 28, 2005 (UTC)
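The two extra rules floated in the last item (' . ' -> '. ' and '.A' -> '. A') could look roughly like this as regular expressions. This is only an illustration with a made-up function name, not the bot's code.

```python
import re


def fix_fullstop_spacing(text: str) -> str:
    """Apply the two proposed full-stop rules."""
    # ' . ' -> '. ' : drop the space before a full stop that is followed by a space.
    text = re.sub(r" \. ", ". ", text)
    # '.A' -> '. A' : uppercase A-Z only, so lowercase TLDs like example.com are untouched.
    text = re.sub(r"\.([A-Z])", r". \1", text)
    return text
```

The uppercase-only rule does spare domain names, but it would still split abbreviations such as "U.S.", so a report of candidate articles would probably need checking by hand first.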
Thanks Grammarbot this time it seems fine and Anglesey has survived the experience. Velela 22:26, 3 Mar 2005 (UTC)
- *smiles* My pleasure. Grammarbot 22:31, Mar 3, 2005 (UTC)
Run on " ," #1
Full list of reversions
- ASCII art - duh (preformatted text)
- Ahia Njoku - my own test to see if reverts were detected
- ArabTeX - brought preformatted text out of line
- Arnold Schwarzenegger - no idea, should have fixed it first time around
- Black Path Game - brought preformatted text out of line
- Brainfuck - messed up Brainfuck code, also preformatted
- Childeric I - didn't fix " ," correctly - changed it to " ,"
- Chronic fatigue syndrome - didn't fix " ," correctly - changed it to " ,"
- Chung Ling High School - didn't fix " , ," correctly - changed it to ", ,"
- Comment (as in computer code) - was in code tags and was a list of characters separated by spaces
- Common Lisp - brought preformatted text out of line
- Dwight Schultz - didn't fix " ," correctly - changed it to " ,"
- Dynamic Time Warping - brought preformatted text out of line
- Ealing Studios - didn't fix " ," correctly - changed it to " ,"
- False programming language - messed up preformatted text
- Family tree of the Greek gods - brought preformatted text out of line
- Fermi-Dirac statistics - didn't fix " ," correctly - changed it to " ,"
- Foro de São Paulo - didn't fix " ," correctly - changed it to " ,"
- Fortran - brought preformatted text out of line (was in "pre" tags, not with a space at start)
- Francisco Javier Rodríguez - didn't fix " ," correctly - changed it to " ,"
- Frank Lloyd Wright - didn't fix " ," correctly - changed it to one less space (many times)
- Gnutella - didn't fix ", ," correctly - changed it to ", ,"
- Goodness and value theory - didn't fix " ," correctly - changed it to " ,"
- HP BASIC for OpenVMS - brought preformatted text out of line
- Hello world program in esoteric languages - messed up preformatted text
- History of Hamilton, Ontario - didn't fix " ," correctly - changed it to " ,"
- History of Hungary - didn't fix " ," correctly - changed it to " ,"
- ISU notables - didn't fix " ," correctly - changed it to " ,"
- Jacques Halévy - didn't fix " ," correctly - changed it to " ,"
- Jeffrey Zeldman - didn't fix " ," correctly - changed it to one less space
- Jury instructions - didn't fix " ," correctly - changed it to " ,"
- Kathleen Ollerenshaw - didn't fix " ," correctly - changed it to " ,"
- Lansquenet
- Levenshtein distance
- Lie derivative
- Linton, Cambridgeshire - bot never edited :O
- List of Australian highways - didn't fix " ," correctly - changed it to " ,"
- List of English words of Arabic origin - didn't fix " ," correctly - changed it to " ,"
- List of European cities with alternative names - didn't fix " ," correctly - changed it to " ,"
- List of Iranian Research Centers - didn't fix " ," correctly - changed it to " ,"
- List of historians - didn't fix " ," correctly - changed it to " ,"
- List of words of disputed pronunciation - didn't fix " ," correctly - changed it to " ,"
- Lugoj - didn't fix " ," correctly - changed it to " ,"
- Martyr
- Mount Ephraim
- My Life With the Thrill Kill Kult
- Nerd Boy
- Nivkh
- Obfuscated code
- Ombudsman
- Orthogonal polynomials
I will check these one by one and add explanations. From this we shall see what exceptions may need to be coded in. Feel free to update this list by pasting in new items as they show up here, but please leave the analysis to me. r3m0t 00:09, Mar 6, 2005 (UTC)
Now it should remove about three spaces before a comma. I just call the same fixing function three times. r3m0t 13:31, Mar 6, 2005 (UTC)
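As an alternative to calling the fixing function three times, a single regular expression can remove a run of spaces of any length. The sketch below contrasts the two approaches. The function names are invented for the example, and the strings are built with `" " * n` so the number of spaces is unambiguous.

```python
import re


def strip_spaces_before_commas(text: str) -> str:
    """Remove every run of one or more spaces that directly precedes a comma."""
    return re.sub(r" +,", ",", text)


def strip_one_space(text: str) -> str:
    """Single-pass fix for comparison: removes only one space before each comma."""
    return text.replace(" ,", ",")


three_spaces = "a" + " " * 3 + ", b"
once = strip_one_space(three_spaces)                    # still two spaces left
thrice = strip_one_space(strip_one_space(once))         # three passes in total
in_one_go = strip_spaces_before_commas(three_spaces)    # any number of spaces
```

The single pass leaving "one less space" matches the behaviour noted for several articles in the list above.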
Wikilinks
What's the policy on wikilinks? Communes of the Nièvre département removed a space from Asnois ([[Asnois , Nièvre|Asnois]] --> [[Asnois, Nièvre|Asnois]]), which is ok (good, even) as it was a red link; but if it hadn't been…yikes! Joestynes 06:11, 4 Mar 2005 (UTC)
- I can't imagine why there would be a space before a comma in a link (or, indeed, almost anywhere). Anyway, I guess it would have made the change, been reverted, and I would go to check it. r3m0t 07:24, Mar 4, 2005 (UTC)
Ellipsis
What's proper in a finite list after an ellipsis: x1, x2, x3, ... , xN or, after Grammarbot, without a space before the comma x1, x2, x3 ..., xN? Not sure there's any difference displayed after cdot in math markup. Sorry if my ignorance wastes any time. --Eddie | Talk 13:53, 4 Mar 2005 (UTC)
- How am I meant to know? I think that the second looks better. Anyway, if you put it in math tags Grammarbot won't correct it. r3m0t 14:34, Mar 4, 2005 (UTC)
- I think with space is proper (for values of proper involving TeX anyway). Gruepig
Need to fix something
I noticed in exponentiation the bot changed something of the format "a ,b" to the format "a,b" when it should have changed it to "a, b". Cheers. CryptoDerk 14:42, Mar 4, 2005 (UTC)
- Hmm... not so sure about that. What about numbers? r3m0t 15:05, Mar 4, 2005 (UTC)
- Yeah, I don't think it's necessary for the bot to add spaces where it thinks there should be. The exponentiation notation "a,b" is not necessarily wrong, anyway. --DropDeadGorgias (talk) 20:05, Mar 4, 2005 (UTC)
Yea, Grammarbot!
I figure you'd get a lot of dings about what didn't work. I thought I'd add at least one "attabot" for the many more that worked fine. I noticed about a dozen. Thanks. --A D Monroe III 21:45, 4 Mar 2005 (UTC)
space before commas
FYI: Grammarbot found and fixed the space before the comma in Japanese New Year, but only found, and did not fix, the space before the comma in Japanese poetry. BlankVerse ∅ 08:42, 5 Mar 2005 (UTC)
- That's... odd. I wonder why. Anyway, if in an hour it still hadn't been fixed, the article would go on the -2 list here and I would have looked at it. r3m0t 10:26, Mar 5, 2005 (UTC)
Also: I am wondering if it might be worth creating a page listing all the articles where you've found problems that needed correcting. The reason I am suggesting that is that I've noticed that when someone has gone through specific common errors on an article page that is in my watchlist, that is a good indication that there are probably other errors on that page, and a quick spell-check (I use the SpellBound extension in the Firefox browser) usually finds 3-5 more spelling errors on those pages. On the other hand, someone who was interested in following in the wake of the grammarbot looking for spelling errors could also just use the "User contributions" link. BlankVerse ∅ 08:42, 5 Mar 2005 (UTC)
- Well, I have a database of all these mistakes, so I can run a spellchecker on those articles if I like. Unfortunately, there are difficulties in running a spellchecker on Wikipedia text, including acceptance of regional spelling variations, masses of technical and foreign terms, latin phrases etc, and the inclusion of many rare proper nouns of names and places (Weebl and Bob anyone?) I could make it dump all the article texts to files, but it would take a long time. (One article per minute, and I would need to catch up on the backlog of about 1400 articles and growing) r3m0t 10:26, Mar 5, 2005 (UTC)
Coughing on one article
Medical_analysis_of_circumcision has been failing for ages (see end of today's log). I ought to make it give up eventually and go to the next article, I suppose. I'll investigate. r3m0t 22:42, Mar 5, 2005 (UTC)
- Maybe I'm banned! Grammarbot 22:46, 5 Mar 2005 (UTC)
- Well, the page is protected. No wonder. I'll try to be a bit clever about that. Grammarbot 22:49, 5 Mar 2005 (UTC)
- Thank you. (: Protected pages are now status -4 in the database. r3m0t 23:19, Mar 5, 2005 (UTC)
- Pages which were actually not possible to receive were marked as protected! No worries, it's fixed now. I think. r3m0t 11:52, Mar 6, 2005 (UTC)
ASCII art
Careful with ASCII art there! Today Nerd Boy article got vandalized by this bot. There should be a check probably whether a line has a leading space to prevent further incidents like that. Grue 07:02, 6 Mar 2005 (UTC)
- Looking at the way my bot is programmed at the moment, that's somewhat difficult. Also, if you look at the list above, there are plenty of instances where this bot messed up preformatted text. I'll try to make it able to pass over not just math tags but also pre tags and pre lines. Note that there may be many instances in which there was just one space before the comma and the bot was not reverted and it therefore doesn't come up in the list. I'll leave it to your discretion whether to turn the bot off or not. (Try dividing the number of articles above in which it messed up preformatted articles by the -3 count at the bottom of this page, and multiplying that by the NULL count on the same page to get an idea of how many more articles this problem will affect.) r3m0t 11:30, Mar 6, 2005 (UTC)
- Don't worry, it's definitely fixed now. There is still a small backlog of articles it has fixed incorrectly. r3m0t 13:06, Mar 6, 2005 (UTC)
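The leading-space check suggested at the top of this thread can be sketched as a line-by-line filter: in wiki markup a line that starts with a space is rendered as preformatted text, so such lines are passed through untouched. The name `fix_prose_lines` is hypothetical, and this is not necessarily how the fix was actually made.

```python
def fix_prose_lines(text, fix):
    """Apply fix() only to lines that do not start with a space."""
    fixed = []
    for line in text.split("\n"):
        if line.startswith(" "):
            fixed.append(line)       # preformatted line: spacing is significant
        else:
            fixed.append(fix(line))  # ordinary prose line
    return "\n".join(fixed)
```

This only covers lines marked by a leading space. Text inside pre tags needs a separate check.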
Random stuff
I
"I will notice" this thing is not a person, lose the I and stop anthropmophizing(sp?) about your software.--Jirate 16:00, 2005 Mar 5 (UTC)
- I'm not the first to do this; see User:AngBot. There is also a page somewhere about never having personal attacks against Angela which I will find another time. r3m0t 16:49, Mar 5, 2005 (UTC)
- It's a slippery slope.--Jirate 23:31, 2005 Mar 5 (UTC)
- What comes next, then? r3m0t 23:51, Mar 5, 2005 (UTC)
- You stop giving the machine instructions, and move on to vague hints, soon you're a web designer.--Jirate 23:58, 2005 Mar 5 (UTC)
- Web designers give precise instructions; it's just that some browsers don't follow them. r3m0t 00:03, Mar 6, 2005 (UTC)
The name
Why is it called Grammarbot when it checks punctuation, not grammar? --Angr 22:28, 5 Mar 2005 (UTC)
- Punctuation is grammar. On the other hand, this will be fixing HTML entities next... r3m0t 22:29, Mar 5, 2005 (UTC)
Neither punctuation nor HTML entities have anything to do with grammar. Now if you had made a bot that could fix dangling participles, sentence fragments, or subjacency violations (things like That's the man who I don't know whether took Martha to the dance last month), that would be a Grammarbot! --Angr 14:59, 6 Mar 2005 (UTC)
summary of change
Can grammarbot say what it is changing in the edit summary?
- "[[insect]]s , including" --> "[[insect]]s, including". Removed space before comma. I am a bot. Please revert my change if it was incorrect. I will notice automatically.
You can't even see what it has changed in the diff without careful scrutiny, since there is nothing to turn red. This will prevent us from having to view the diff at all... - Omegatron 16:09, Mar 6, 2005 (UTC)
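A summary in the format quoted above could be generated along these lines. This is a minimal sketch: the function name `edit_summary` and the `context` parameter are assumptions for illustration, and it only describes the first match in the text.

```python
import re

NOTICE = ("Removed space before comma. I am a bot. Please revert my change "
          "if it was incorrect. I will notice automatically.")


def edit_summary(text, context=12):
    """Describe the first space-before-comma fix, with a little surrounding text."""
    m = re.search(r" +,", text)
    if m is None:
        return None  # nothing to fix, so nothing to summarise
    start = max(0, m.start() - context)
    end = min(len(text), m.end() + context)
    before = text[start:end]
    after = text[start:m.start()] + "," + text[m.end():end]
    return f'"{before}" --> "{after}". {NOTICE}'
```

With a snippet this short, anyone watching recent changes can judge the edit without opening the diff.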
should avoid pre and code
The grammarbot was blocked earlier today by User:CSTAR, presumably because of Poincaré-Birkhoff-Witt theorem. However, this was not an error: the bot changed "x ,x" to "x,x", which is not worse than the original, although the full manual fix would be "x, x".
I have unblocked it. However, the grammarbot should avoid anything within <pre> ... </pre> and <code> ... </code>, because within these the spacing is significant (ASCII art etc).
-- Curps 19:26, 6 Mar 2005 (UTC)
- It does. The part in Poincaré-Birkhoff-Witt theorem was not in any such tags. On one false positive (which wasn't even false) CSTAR decided to block it. That's damn annoying. Apparently my bot went berserk.
- Of course, if there really are problems, CSTAR can please use the page I provided to turn it off, which prevents it from going screwy. Please provide examples.
- I'm sorry for the bile but I was hoping to finish this run a little earlier and have a wikiparty (is that a new word?). r3m0t 20:43, Mar 6, 2005 (UTC)
Well, it was just that I noticed some of the earlier edits at ASCII art did edit within pre and code, so I mentioned that, but presumably you fixed it along the way, after the early runs. Yes, the part in the PBW theorem page was not in any such section (sorry if my phrasing was not clear) and Grammarbot's edit to it was not an error.
Anyways, I did unblock it.
-- Curps 20:58, 6 Mar 2005 (UTC)
- Yes, thanks. Incidentally, next time remember to check for IP blocks. No worries, Raul654 was on IRC so he was able to help. :) I feel a bit stupid that I hadn't thought of such exceptions when I conceived (as in idea, not baby) the bot, and again when somebody requested it to exclude math tags. Actually, I think by the time I'd fixed that it was about half-way through. r3m0t 21:07, Mar 6, 2005 (UTC)
One thing grammarbot gets wrong
In mathematics articles, if one writes about an inner product < , >, obviously it would be wrong to change it to <, >. Michael Hardy 00:49, 7 Mar 2005 (UTC)