
Wikipedia talk:Bot policy

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by The Cunctator (talk | contribs) at 05:33, 30 October 2002. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

One presumes that the problem which the following proposals address is Ram-Man's automatic geography-article generator.

Benefits bots can offer

  1. Provides a good template of pre-formatted data for contributors (see how the Newton, Massachusetts entry has been expanded; imagine if the Periodic Table were used to start the 100+ articles for the elements)
  2. Potentially provides a unique resource not directly available elsewhere on the web (the small-town bot is a good example of a well-designed bot--see Ram-Man's description of the data acquisition process - uck!)
  3. Provides full coverage in cases where an a priori undeterminable subset of the data has a high likelihood of being (or becoming) interesting even though a randomly chosen entry has a low probability of being interesting / useful.

Inherent drawbacks of using bots in current system

  1. Adds tens of thousands of entries to Wikipedia that are unlikely to see a human edit any time soon (in fact, we could probably extrapolate the nearly exact rate at which they will get edited by seeing how many have been edited so far)
  2. Artificially inflates the perceived activity of Wikipedia
  3. Can be perceived as tilting (and possibly could tilt) the purpose of Wikipedia away from being an encyclopedia and towards being a gazetteer / Sports Trivia Reference / etc. This is also a problem with hand-generated imports from other resources.
  4. Danger of abuse by "vandal-bots", or just "clueless-bots". A bot running out of control could potentially cause heavy server load or even a denial of service attack.
  5. General complaints about interference with normal contributor operations, esp. Special:RecentChanges.

All of the following proposals should allow all of the above benefits and neutralize/eliminate all of the drawbacks. The goal is to end up with one proposal.

Proposal #1

Any graceful solution would provide the automatic functionality of the pros without the negative consequences of the cons. Bots would continue to be expected to meet the current criteria of usefulness and harmlessness--a good solution should reduce the potential for harm.

The general rule would continue to be "Avoid using Bots" unless it is the only practical option.

One example of how the small-town data would optimally work from the user's point of view would be to have a reasonably limited number of pages listing all the possible towns (perhaps by state), with links. If someone clicks on one of those links, they have the option of importing the small-town-bot entry.

One implementation, which others have mentioned, is to tag the entries as "imported entries". Bots that add entries without tagging them as such would be banned if there is no functionality for an arbitrary Wikipedian to mark the bot as an "importer". (Such functionality lends itself to abuse, however: people could mark contributors they don't like as importers, so it's probably not a good idea unless that abuse is expressly forbidden and harshly dealt with.)

It may be necessary to require/request that bots be registered beforehand, so that bots that run amok can be blocked. At a minimum there should be the encouragement to warn people about importation projects. Registration allows for quick reference by everyone, accountability for large edits, and the ability to block bad things quickly.

Imported entries would

  • be marked as ? pages (or at a minimum, ! pages)
  • not be listed on default RecentChanges
  • be listed on RecentImports (or some such) and/or BotsInProgress (or Currently Running Bots)
  • not appear on default RandomPage
  • show up under searches-by-name normally
  • appear if someone clicks on a link to the entry.
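The visibility rules for imported entries listed above can be sketched as a few filters over a page list. This is a minimal illustration in Python, assuming a hypothetical `imported` flag on each page; none of these names correspond to actual wiki software.

```python
import random
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    imported: bool  # hypothetical "imported entry" tag from Proposal #1


def recent_changes(pages, show_imports=False):
    """Default RecentChanges: hide imported entries unless the reader opts in."""
    return [p.title for p in pages if show_imports or not p.imported]


def recent_imports(pages):
    """The proposed RecentImports listing: imported entries only."""
    return [p.title for p in pages if p.imported]


def random_page(pages, rng=random):
    """Default RandomPage: draw only from entries that are not imported."""
    candidates = [p for p in pages if not p.imported]
    return rng.choice(candidates).title if candidates else None


def search_by_name(pages, query):
    """Searches by name treat imported entries like any other page."""
    q = query.lower()
    return [p.title for p in pages if q in p.title.lower()]
```

Direct links are untouched by these filters: the tag only changes which listings a page shows up in, so a link to an imported entry still reaches it.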

When someone clicks on that link, they would get an entry

  • clearly marked as an automated addition
  • with the choice to untag the entry on the edit page

People who want to hand-import entries from public/GFDL sources would use the same tag (thus "imported entries" rather than "bot entries"). Are there any problems with killing two birds with one stone?

A benefit of bot/import-registration is that users could change their preferences to make the bot's additions show up as normal links, or show/hide from Recent Changes.

It might be necessary to develop a "revert-bot"/"revert-import" functionality.

Proposal #2

Similar to the proposal above, but without the "imported entry" tagging. This is a fundamental difference. There should *not* be a special feature for a user to optionally "import" data into an article. Data from bots or humans should be judged on its own merits, not on who put it there.

Proposal #1 assumes that bot entries are a subset of imported entries.

Proposal #3

No special marking of bot articles, but all bots should "register" their plans in some place. Possibly, this could be technically enforced by having some kind of code which indicates an allowed bot (I'm not sure exactly how this should work).

Needs fleshing out: this proposal does not seem to answer drawbacks 1, 2, 3, and 5.


"No bots" supporters: The Cunctator

"Keep bots" supporters: The Cunctator

"Avoid bots" supporters: The Cunctator Ram-Man (Only if it is better done manually. Humans are better at such things)

"Bots with restrictions" supporters: The Cunctator Ortolan88, Ram-Man (proposal #2), Clutch, user:sjc, --KQ Chas_zzz_brown, fonzy (with restrictions as in Proposal #1), Kpjas, The Anome

"Uploadable bot-scripts" supporters: (explained far below as talk), Rlee0001


What is the Wikipedia policy on automated page creation? I notice that Ram-Man is currently entering statistics for every town in the US using some sort of script (unless he's a very fast typist). I'm not sure that this is a particularly great idea. More generally, I think that script generation could get us into a lot of trouble (how do you revert vandalism when it's spread across five thousand pages?). Is there a page that would be more suitable for this discussion? Dachshund

I think it is fine. There have been a couple of bumps in the road, but his bot's entries now appear to be correctly named, wikified, and NPOVed, and they have good, factual information. The only real policy on automatic page creation is that you need to be very careful when you are doing it. For example, somebody started importing hundred-year-old Easton's Bible Dictionary entries via bot, and that caused an uproar: the entries were highly POV, incorrectly named, written in pedantic Victorian prose, and incorrectly wikified (self-links, multiple links, incorrectly named edit links...). The bot's IP was temporarily blocked and we worked everything out with the bot's creator on the Wikipedia mailing list. The city entries don't have these problems, and they have the bare essentials needed for any city article: population and geography. On top of this there is also demographic information. When complete, this will be a unique resource on the net. Better still, whenever somebody in the US looks up their town they will find an entry in Wikipedia (and hopefully they will add some historical info to the article after finding it). If an actual vandal uses a bot, then we will block that bot's IP. --mav

Three thoughts on batch page creation:
1) Special:Recentchanges is presently useless due to the town & county bot. This is the source of the irritation which led me to notice that:
2) The main page's count of Wikipedia articles is increasingly inflated -- we've gone from 60k to 70k awfully quickly, but:
3) These thousands of town and county pages are not encyclopedia articles, nor are the bulk of them ever likely to become same. They are atlas or gazetteer entries that have been converted to useless paragraphs rather than useful tables. The data are potentially valuable as such -- perhaps there should be a WikiAtlas? -- but they are no more encyclopedic than would be batch-added dictionary entries. --FOo


They are a bit telephone-directory-ish. I hope in future people will add colour and detail to them. It would be good, though, if bots like this went a little slower; that was discussed before with the Easton's bot: only 100 every hour or less, please! Otherwise, as said above, RC is unusable, even with the number of edits set to 1,000. We may have added 10k articles, but we haven't really added any value. Hundreds of core topics are still uncovered or amateurishly written, and here we have a page for every one-horse town across the US. It won't project a terribly good image of Wikipedia; that concerns me. -- Tarquin 20:20 Oct 21, 2002 (UTC)
I disagree that these entries are harmful. I just came across Auburn, California, which is a small city near where I live. I've been meaning to write an article about Auburn ever since I started on the project in January, but never did so because digging up boring yet vital up-to-date population and geographic information isn't fun at all; this is a perfect thing for a bot to do. Since all the boring-to-find info was already there, I simply added a few external links, a history section, and a short line in the intro on why this city is interesting. Granted, many small towns won't ever be updated with more than what is there now, but most towns don't have much historical significance outside their own counties. So what if they exist in our database? They have correct info and are correctly wikified and named. Having every town, city, and village in our database ensures that anybody in the US who looks up information on their hometown via an external search engine will find that info here, which makes these entries an important reader and contributor recruitment tool. Many of those people will then update the articles with historical and other information. Yes, the US Census has this info, but it isn't very readable or accessible, and it can't be added to or have its presentation improved. Can you think of another resource like this on the net (with 2000 data)? That said, I also agree that Recent Changes is useless while Ram-Man's bot is at work. I wish there were a back-end way to import the 20,000 remaining cities/towns/villages/places. --mav

They are not bad pages -- but it's the Mithril argument again: newcomers clicking Random Page and finding pages and pages of Middle-earth may think "encyclopedia! tolkienopedia, more like!"; finding hundreds of jargon file pages, they may think it's just a ton of hacker slang; finding these thousands of pages, they may think it's largely an encyclopedia of US towns. I am probably overreacting a bit, but we seem to be leaning every which way but toward serious core encyclopedic subjects: arts, literature, science. There are plenty of minor novelists of past centuries we don't say anything about who are more important than these towns. I'm not against these town pages, but we must balance them! -- Tarquin
Obviously I agree with having the articles since I am the one making them. One thing I could do would be to make all the changes minor and then those changes could be filtered out by those who set up the option in their preferences. They are not minor, but maybe no one cares.
Since starting to add the information I have gotten comments from a number of people. One common idea is that without the articles in some form, people don't bother to add one-line descriptions about a town because they want to avoid stub articles. I have had a number of people say that now they can add some information because the articles exist. In fact, Recent Changes shows that people have been modifying their own town articles and adding some misc information. Unlike Maverick, I think that with an influx of users, if many of them update their own home cities, then we can add quite a bit of new information. Also there is the possibility of adding other information automatically, such as latitude and longitude, county seat information, etc.
I would vote to modify the "random" option to give city, state articles a lower priority. -- Ram-Man
You mean "like maveric" right? I was arguing for keeping the entries and allowing you to finish. --mav
Well, you think that most of these entries will never be filled out with data. You once thought that these entries would never even be created; I think I did this just because you said it couldn't be done. So while I agree with you on everything else, I do believe that Wikipedia can grow until those entries become much more complete. -- Ram-Man
There's nothing precisely wrong with the articles. In the future, perhaps automated pages could be saved on some other website as a static page, and only a link added from the Wiki page? Dachshund
I certainly don't think we should wipe them. -- Tarquin

Although I have been (and continue to be) a vocal opponent of automatic content creation and editing processes on Wikipedia, I think creating these articles is on balance a good thing. As Ram-Man suggests, they make good "seed" articles for people to add a sentence or two about their own town, and as long as they don't interfere with existing articles, that's good. However, I'd like to see the bot slowed down for two reasons: one, there is a strong presumption against bots here in general; the burden of proof is on the bot-maker to demonstrate that the bot is (1) useful, (2) harmless, and (3) not a server hog. If there's any doubt about any of these, the bot should be slow enough that humans have time to find problems, report them, and get them fixed. Secondly, the "Recent Changes" page is an important part of the Wikipedia user experience, and the fact that it is essentially useless while the bot is running is very annoying. Slowing down to, say, a page a minute would greatly improve the usability of the system.

At any rate, I think it meets the "useful" test, and as far as I can tell from server logs, the bot isn't a major factor in server load, so that's good, but I think "harmless" should include not hogging the recent changes list, so let's keep it running, but at a leisurely pace. --LDC
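LDC's suggested pace (a page a minute) and Tarquin's earlier limit (no more than 100 pages an hour, i.e. at least 36 seconds apart) both amount to a minimum interval between saves. A minimal client-side throttle in Python, assuming a hypothetical bot loop that calls `wait()` before each save (the class name is illustrative, not part of any existing bot):

```python
import time


class EditThrottle:
    """Enforce a minimum interval between consecutive bot edits."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self.clock = clock          # injectable so pacing can be tested
        self.sleep = sleep
        self._last = None           # time of the previous edit, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last edit."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

A bot running at a page a minute would build `EditThrottle(60)` and call `throttle.wait()` before each save. Enforcing the limit in the bot keeps Recent Changes readable without any server-side changes, though it relies on the operator's good faith.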

It should be noted that I made a mistake that put invalid data in some 2,000 articles. The bot repaired all of these. That is to say, if I make a stupid mistake, I will do my best to fix it. However, going slowly has an important disadvantage, as pointed out by Maverick. The orphans page, which a lot of people apparently use, is full of cities and townships. To fix these, I have to use the bot to update all the entries. At 1 modification per minute, this means the orphans page could be unusable for weeks or months. As I have suggested, I can mark my changes as minor so people can filter some of them out (a partial solution). Going slowly severely limits how quickly I can fix the various quirks introduced in all these entries, because I simply have to wait for each pass to finish. This should be noted!

Let's say for instance that I use one modification every 30 seconds. That would be about 3,000 modifications per day. Essentially it would take me about 2 weeks for any change I decide to make to all the entries, such as adding latitude and longitude or fixing mistakes. -- Ram-Man
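Ram-Man's figures can be checked directly. At one edit every 30 seconds a day holds 86,400 / 30 = 2,880 edits (his "about 3,000"), and a full pass over the roughly 35,000 city entries he mentions elsewhere on this page takes about 12 days. A short Python sketch of the arithmetic, treating 35,000 as an approximate page count:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def edits_per_day(interval_s: int) -> int:
    """Number of edits a bot completes in one day at a fixed interval."""
    return SECONDS_PER_DAY // interval_s


def days_per_pass(n_pages: int, interval_s: float) -> float:
    """Days needed to touch every page once at a fixed interval."""
    return n_pages * interval_s / SECONDS_PER_DAY


for interval in (30, 36, 60):  # Ram-Man's pace, 100/hour, a page a minute
    print(f"{interval:>2} s/edit: {edits_per_day(interval):,} edits/day, "
          f"{days_per_pass(35_000, interval):.1f} days per 35,000-page pass")
```

This prints 2,880, 2,400, and 1,440 edits per day, and about 12.2, 14.6, and 24.3 days per pass, respectively, which is the trade-off between pace and turnaround time discussed above.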

Theoretically speaking, we could set something up where the bot's modification times are fudged back a bit, so they wouldn't cover up the actual most recent changes. I don't know if that's a good idea, it's just a thought. Bring it up on the mailing list. --Brion
Hm. Perhaps there should be another option in our prefs where we can turn off anything submitted by a registered bot? Just give the bot's IP to the developers, and then perhaps they could have each entry you submit marked with a B for bot. Displaying bot edits would be turned off by default in user preferences. But it is important that bots get registered somehow before this is allowed. --mav
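mav's suggestion can be sketched as a lookup of each edit's source IP against a list of registered bot IPs, with a "B" marker and a preference that hides bot edits by default. This Python sketch assumes IP-based registration as described above; all names are hypothetical, and the IP shown is a reserved documentation address, not a real bot.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    title: str
    author_ip: str


@dataclass
class Prefs:
    show_bot_edits: bool = False  # off by default, per the suggestion above


# Hypothetical registry: IPs that bot operators have given to the developers.
REGISTERED_BOT_IPS = {"192.0.2.10"}


def is_bot(edit: Edit, registry: set) -> bool:
    return edit.author_ip in registry


def recent_changes(edits, prefs: Prefs, registry=REGISTERED_BOT_IPS):
    """Render Recent Changes lines, marking bot edits with "B" and
    hiding them unless the reader has opted in."""
    lines = []
    for e in edits:
        bot = is_bot(e, registry)
        if bot and not prefs.show_bot_edits:
            continue
        lines.append(("B " if bot else "") + e.title)
    return lines
```

The same `is_bot` test, inverted, would produce a separate "Recent Changes by bots" listing instead of a preference.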


I like the idea of registering bots, however, when could such a feature be implemented? When might *any* solution be accomplished? -- Ram-Man
Please keep tables as tables, instead of converting them to prose.

Anyway, it would seem that Wikipedia was never designed to handle bots.

And while you're at it, why limit it to the USA? Why not do England, Canada, Australia... why limit it to English-speaking countries? Why not do the whole world?? Clearly there is something absurd about this!

Besides, if you want to know about a town, do you really want just a bunch of numbers? Or do you want to know what is actually IN the town, such as malls, arcades, parks, etc.? Juuitchan

I would assume that Ram-Man is doing the US because that's what he's got census data for and that's what he's interested in. As far as additional data, yes, we want all that. But we can't have everything at once, now can we? --Brion 12:03 Oct 22, 2002 (UTC)
If I were to post every baseball score of every Major League baseball game ever played, with all the statistics and all that, it would be roughly analogous to what Ram-Man is doing. --Juuitchan
Not at all. Any encyclopedia would have the census information about the population, etc. The "problem" with the rambot is that it doesn't distinguish between the major leagues and the low minor leagues (and it doesn't fill in county seats), but just as I have beefed up his form letter for Newton, Massachusetts, where I live, and Valdosta, Georgia, where I was born, you can do the same for wherever you live and eventually we'll have them all, and, if you don't, we'll still have the basic information about your town. Ortolan88
County seats are on the agenda, along with latitude and longitude. And I agree, if I were the only one to ever work on these articles, *maybe* it would be a terrible idea. But I am naive and hope that other people beef up articles! -- Ram-Man
Speaking of baseball statistics, what is so terrible about adding them? I can understand if the only thing you added was the so-called unimportant ones, but this is supposed to be an all-encompassing (read: never-ending process) encyclopedia. -- Ram-Man
What I would really like is a complete set of football (soccer to you Americans) statistics. I will see about knocking up a bot to do this stuff. user:sjc
See above cited Valdosta, Georgia for some truly amazing football (football to us Americans) statistics. Ortolan88
I suspect that bots will do a lot of work on this encyclopedia as it grows into a bigger thing. I don't think this is all bad, and it is to be expected. We have articles on places like Y, Alaska, which has about 1,000 people in it. Other encyclopedias would consider such a place worthless. But it has a cool name and 1,000 people care about it! This encyclopedia can grow huge, but without people fleshing out articles, it will never be good enough. That's why I don't *just* do geographic topics. -- Ram-Man
I for one am ecstatic to have these articles here. Even if they're light on local color, they're something, and the stats do give some general gist of the locality. I recall looking up my home town in Encarta lo some years past and being thrilled at the one or two meagre sentences I got along with a woefully out-of-date population figure. I'm sure I'm not the only one who looks up local stuff when discovering a new encyclopedia, and having a tantalizing beginning is both heartening (Wikipedia cares enough about my hometown to put in stats!) and encourages direct action (and I can add more info!). Basic info on other subjects can, I'm sure, be similarly useful. --Brion 11:06 Oct 26, 2002 (UTC)
I just added to the Erick, Oklahoma (pop. 1023) entry that two wacked-out country-music stars came from there. Ortolan88

We need a policy on bots. It was grand and fun running amuck with my bot, but it did inconvenience some people. I'd suggest that if some bots are allowed (like mine for the geographic articles), they be more controlled: for instance, have a section on the subject page which lists the currently running bots, their IP addresses (so blocks can easily be made), and what each bot is doing (an explanation). -- Ram-Man


Ram-Man, I have little interest in your articles, but I wouldn't call them inappropriate. I actually find your effort comforting and reassuring. My only complaint is that the Recent Changes page is pretty useless. Maybe we could petition the developers for a feature where we can filter out the entries by certain users? That is, if I don't want to see entries by Ram-Man, I could enable that in my Preferences.

But I also like the idea of having "bot" accounts identified; then we could have a Recent Changes page, and a Recent Changes by Bots page. That would clear up all my beefs; I don't really feel comfortable with adding a feature for users to filter out other users in the list of changes that they see. --Clutch 03:14 Oct 26, 2002 (UTC)

I think we have to wait and hope one of the developers finds time to do such a thing (It has been suggested above). -- Ram-Man

To address the issue of "random" pages and page count, for statistically scraped pages such as baseball games or towns: Could we set a flag indicating that (1) the page was generated by a bot; (2) the page has never been edited by a non-bot (henceforth called "somebody" ;) ).

Then, allow random pages and a page count only for pages that are not so flagged.

This would still allow people to look up and edit their hometowns, in order to add the information which makes it truly an article, and not "just" a row in database (albeit, a very nicely formatted database display).

It also allows us to consider a set of bot-generated pages as a single article; essentially equivalent to an article consisting of a (huge) table of data entries. We have the convenience of viewing each row in the table in a nicely formatted fashion.

Once a page is edited, we have distinct information supplied; as well as a smidgin of evidence that at least somebody gives a hoot about Ice Worms, Alaska (or game 3 of the 1907 hockey playoffs). Chas zzz brown 03:21 Oct 26, 2002 (UTC)
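Chas zzz brown's two flags can be sketched as one derived property: a page is left out of the article count and out of Random Page while it was created by a bot and has not yet been edited by "somebody". A minimal Python sketch; the class, field, and function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    created_by_bot: bool               # flag (1)
    edited_by_somebody: bool = False   # inverse of flag (2): set on any non-bot edit

    @property
    def bot_only(self) -> bool:
        return self.created_by_bot and not self.edited_by_somebody


def record_edit(page: Page, by_bot: bool) -> None:
    """A single non-bot edit permanently promotes the page to a counted article."""
    if not by_bot:
        page.edited_by_somebody = True


def article_count(pages) -> int:
    return sum(1 for p in pages if not p.bot_only)


def random_article(pages, rng=random):
    eligible = [p for p in pages if not p.bot_only]
    return rng.choice(eligible).title if eligible else None
```

Bot-only pages stay fully reachable by search and links; only the count and the random draw ignore them, which matches the "one big table" view described above.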


I changed the voting categories above. "Supporters of 'avoid bots'" and "Opponents" weren't going to be clear categories. I also added a third vote, "Support with restrictions", by which I mean things like, require labelling of bot-produced articles, hold bot-produced articles back until requested by reader, as suggested passim, such as just above here. Ortolan88

Maybe this whole naming thing is going to be messy anyway! The categories are not necessarily mutually exclusive. I just copied the format from other policy pages! -- Ram-Man

I meant that "I support avoiding" and "I oppose avoiding" read like conceptual "double negatives" and so I restated positively to make them clearer. The residual category I added so there would be something I could vote for. Ortolan88


"Keep bots with restrictions" is obviously a difficult category. Some bots -- the "automatic spell checker" -- I would oppose entirely. Others -- "the ancient Bible dictionary" -- I would make into some kind of request-filling engine, if someone wanted an article on Hagar or Haman she could request it without our accepting the entire musty content of that old dictionary. On the towns and counties, all I would expect would be a note "This entry is derived from census data." with a suggestion that users are invited to extend it. In other words, the restrictions would be imposed on a bot-by-bot basis. Ortolan88

When I was running the bot I posted my IP address on my talk page in case anyone needed to quickly block the bot. I don't know if anyone saw it, but someone in the discussion mentioned registering a bot before using it. Now whether that means programming a special feature or merely good-faith posting of the IP address on the Bot page is not for me to decide, but it was one thing that was requested in case things went wrong. -- Ram-Man
That seems fair to me (simply posting the IP in a good-faith effort). Other bots will be apparent from Recent Changes, unless it's randomized somehow. Anyway, if one is quick enough to be noticeable, it's quick enough to see and block, and if it's not, then it's slow enough that we can revert. --KQ

I should probably note my methodology: I downloaded massive amounts of information from the United States Census Bureau. It was spread across multiple files and was a mess. I had to combine it all, clean it up, etc. I did all this in a spreadsheet. I also created a number of new categories, like % water, which I could easily calculate from the data. In doing so I ran into a number of naming problems. I still have not worked all those problems out, but I was *aware* of them. Then I exported the data and moved it into a MySQL database. From there I created all the information for U.S. counties. I did all of them by hand (some 3,000) and it took a long time. I decided to write a bot to do what I would do anyway, but on a larger scale. Nevertheless, I added the cities for Alaska and Alabama by hand, to make sure the bot wasn't doing stupid things before I ran it. The rest of the errors were caught by others watching (or not yet caught by anyone?). I like this place too much not to be careful, but I probably could have been even more careful! Someone did notice that around 2,000 articles were bad. *ouch*. It was actually easily corrected, but it did raise a red flag! -- Ram-Man
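The last step Ram-Man describes, turning cleaned census rows into article text with a derived % water column, can be sketched as a template function. This is not the actual bot: the field names, units, and wording below are assumptions for illustration, and the MySQL step is replaced by an in-memory row.

```python
from dataclasses import dataclass


@dataclass
class PlaceRow:
    # Hypothetical columns from a cleaned census spreadsheet
    name: str
    state: str
    population: int
    land_km2: float
    water_km2: float


def percent_water(row: PlaceRow) -> float:
    """Derived column: water area as a percentage of total area."""
    total = row.land_km2 + row.water_km2
    return 100.0 * row.water_km2 / total if total else 0.0


def render_stub(row: PlaceRow) -> str:
    """Fill a fixed wiki-text template from one row (illustrative wording)."""
    total = row.land_km2 + row.water_km2
    return (
        f"'''{row.name}''' is a place in [[{row.state}]]. "
        f"As of the 2000 census, the population was {row.population:,}.\n\n"
        f"It has a total area of {total:.1f} km², "
        f"of which {percent_water(row):.2f}% is water."
    )
```

Because every page comes from the same template, one template bug is repeated thousands of times; checking the output for a small subset first, as was done with Alabama and Alaska, catches that before the full run.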


Haven't read all of the above (yet), but some remarks:

  • If we run bots without a user name, every admin can block them whenever necessary. This is (right now) not possible for usernames.
  • There might be a note on this page that a certain bot is currently active. This way we can still see who is operating the bot
  • Let all edits by a bot be done using "minor edit" on - that way we can at least read RC, by filtering out the minor edits (use your user preferences for that).

Jeronimo


I think there's more to be won by regulating what kind of bot entries are allowed than by handling them specially. I think the "ShitHole, SomeState" articles outrank many other articles in quality, and I see no reason to mark them as different. It would be completely different if somebody were to add articles with "ShitHole is a truckstop in SomeState, USA." as the contents.
So, I'd say, don't do anything with them at all, as long as their content is reasonable and encyclopedic. I don't even care about the random page feature. The only thing I use it for is finding stubs, and these articles are also stubs. Granted, I might not be able to tell anything about most of them, but that's not a problem for me. So if the Random function were to change, it should be tied to whether articles are stubs, not to whether they were uploaded by a bot. Jeronimo

There definitely should be pressure for imported entries to be of high quality. Bot-registration would allow individual users to pick and choose which entries they consider ready for prime time.

Note that the proposal would have the imported entry show up on a search, so the content wouldn't be hidden from someone just looking for specific information on that subject. --The Cunctator

My point is: why separate perfectly good stub articles from other articles, when there is loads of other crap on Wikipedia added by normal people? We might just as well mark articles that haven't been viewed or edited by a second person. Jeronimo
The short answer is that bots don't sleep. A major part of the reason that Wikipedia's quality doesn't devolve is that all parties have essentially equivalent resources--even the most prolific individual can't do more work than, say, 10 people. But bots are a completely different story.
Note that the proposal doesn't call for a primary distinction of bot/non-bot--it calls for a distinction between imported/non-imported. --The Cunctator
Once again, the fact that bots add articles faster than non-bots has no implications for the contents of the articles. We should be just as vigilant about normal people adding crappy, bad, non-NPOV, copyrighted, etc. content as about bots, importers, or whatever. Therefore, there's no reason to treat such articles differently. Jeronimo
I don't know that I agree that there are no implications for the contents of the articles. I think that the small-town bot, and bots in general, are most useful (and most likely to be implemented) when essentially importing/reformatting tabular data (such as the census data entered by Ram-Man).
If a bot were able to actually write articles even as bad as the worst stub, that would be an achievement, since stubs rarely have a common theme or format (apart from being short).
In a "conceptual space", stubs are generally far apart in terms of the actual content they cover; whereas bots by their nature are going to focus the added content in one area. To my mind, bot generated articles are just tabular data, reformatted in a pretty way; but they might as well be a single, extremely long article which presents a table. That's not a general characterization of stubs - 100 stubs couldn't easily be combined into a single article (let alone 20,000). Chas zzz brown 02:07 Oct 28, 2002 (UTC)
First of all, stubs combined into single articles are not useful. Someone looking for an article on one of the small towns would not find an article to satisfy themselves. It would also be a mess to create full articles. People wouldn't even bother trying to find the stub.
Secondly, the tabular data *is* interesting and useful in various cases (like non-tabular data!) and as such they should be formulated in a way that is approachable. When I first started adding statistical data (to U.S. states) I had Zoe tell me that to her the numbers are not really meaningful because she is not particularly number oriented. For people like this, tabular data is nearly useless, so putting it in an accessible format is important. Very few people would look at tables of data (Trust me, 35,000 city entries make for very large tables!)
Thirdly, and I should have stated this earlier if it was not clear, all the entries created would eventually have been done by hand had I not done it with a bot. I just employed my programming skills to save me a *lot* of time. Don't believe me? I did 3,000 counties and all the cities in Alabama and Alaska by hand. I'm just obsessive that way; that, and I did it just because I could.
Fourthly, a good point was raised. These "stubs" are larger than the current median and average article size. Even though they are stubs, they are more useful than a lot of other articles that are created by hand. The big issue is that people are biased against bots. No one has complained one bit about the county articles, but I hear a lot of complaints about the bot-added cities. I bet no one even knew that the Alabama and Alaska entries were entered by hand! The articles are almost equivalent, but people don't like one set because a bot did it.
Fifthly and lastly, a bot creating *bad* articles can be far worse than any vandal destroying pages. -- Ram-Man

I agree that the problem of bots is not necessarily that they are mechanical. I see the core issue as that of imported material. (There is the secondary but real concern of bots that run amok--but that's not an issue with the entries themselves.)

It seems to me that the proposed policy above does a reasonable job of answering (nearly) all concerns without placing a burden on people who want to add these kinds of entries. --The Cunctator

Thoughts on the proposal: I don't like the idea of flagging entries as bot entries or import entries. Data is data and it should be treated as such as I mentioned above. There should be no choice to import or not. Pages need to be static and able to be easily changed at will. On the other hand, having some way of marking entries would be useful so that large edits can be reverted in the event of a large-scale mistake because they are already pre-filtered. One thing I do think is mandatory is some way to clean up the Recent Changes. This means having some way to register a bot (just a list maybe?) and some way to either filter out the bot entries or have a separate recent changes for bots. Let's also remember the spirit of Wikipedia. We are not trying to make hard and fast rules. Easy guidelines should be good. That is why I am in favor of a system that is based on good faith. I will post a more complete proposal that I would approve of soon. -- Ram-Man

1. I think there may be some confusion of meaning here--I too think that all entries should stand on their own merits. Proposal #1 wouldn't distinguish between bot entries and import entries.
It assumes that bot entries are a subset of imported-data entries.
Do you believe this is an invalid assumption?
2. I moved the proposals off the main page because there's a fundamental point of contention. I also tried to merge your points with the first proposal respectfully. There only seemed to be the one major point of contention. Of course, you should certainly revert the merging if you believe it was done improperly. --The Cunctator

Uploadable Bot-Scripts

Process of creating script:

  1. Go to Special:Bot. Click "Create New Bot".
  2. Name the Bot and add usage description. Write or paste the script. Click "Add Bot".
  3. (Bot is added. User is sent to its page.)
  4. Click "Run Bot".
  5. (Bot's DATA section is interpreted)
  6. Fill out prompts. Upload any required databases. Click "Continue".
  7. (Bot's CODE loop is executed)
  8. Review Change-log, click "Save Changes" or "Cancel".
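Step 8 implies that a run's edits can be reviewed before they take effect. A minimal sketch of that staging idea, assuming a hypothetical in-memory `wiki` dict (title to text) and hypothetical `BotRun`, `stage`, `save_changes` and `cancel` names; none of this is existing wiki software:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BotRun:
    """Hypothetical staged run: edits go into a change-log and reach the
    wiki only when the user clicks "Save Changes"."""
    wiki: dict[str, str]
    # Each entry: (title, old text or None for a new page, new text).
    change_log: list[tuple[str, Optional[str], str]] = field(default_factory=list)

    def stage(self, title: str, new_text: str) -> None:
        self.change_log.append((title, self.wiki.get(title), new_text))

    def save_changes(self) -> int:
        """Step 8, "Save Changes": apply every staged edit."""
        for title, _old, new_text in self.change_log:
            self.wiki[title] = new_text
        applied = len(self.change_log)
        self.change_log.clear()
        return applied

    def cancel(self) -> None:
        """Step 8, "Cancel": discard staged edits; the wiki is untouched."""
        self.change_log.clear()
```

Keeping the old text in each change-log entry is what would let the same log drive a later undo.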

The script should run asynchronously, so the user can view the bot's progress by returning to the bot's page. The bot's progress should appear immediately in articles as the bot processes its script. The execution should be both cancelable and undoable from the bot's page. A change-log should appear on the bot's page but not in the list of recently updated pages (unless the viewer of recent updates elects to view bot activity). The list of recently updated pages should contain one entry per bot per hour (linking to the bot's change-log). Use of external bots should be forbidden. Rlee0001 09:47 Oct 28, 2002 (UTC)
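The "one entry per bot per hour" rule for recently updated pages could be sketched roughly as follows, assuming a simple registered-bot list (the `bots` set) and hypothetical `Edit` and `recent_changes` names; this is an illustration, not how Special:RecentChanges actually works:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Edit:
    title: str
    user: str
    time: datetime


def recent_changes(edits: list[Edit], bots: set[str],
                   show_bot_edits: bool = False) -> list[str]:
    """Hypothetical Recent Changes view, newest first. Human edits are listed
    individually; each registered bot's edits collapse into one line per clock
    hour pointing to its change-log, unless the viewer opts in to bot activity."""
    entries: list[tuple[datetime, str]] = []
    bot_hours: dict[tuple[str, datetime], int] = {}
    for edit in edits:
        if edit.user in bots and not show_bot_edits:
            hour = edit.time.replace(minute=0, second=0, microsecond=0)
            bot_hours[(edit.user, hour)] = bot_hours.get((edit.user, hour), 0) + 1
        else:
            entries.append((edit.time, f"{edit.title} ({edit.user})"))
    for (bot, hour), count in bot_hours.items():
        entries.append((hour, f"{bot}: {count} edit(s) this hour (see change-log)"))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [text for _time, text in entries]
```

A bot making thousands of edits in an hour would then cost human readers of the list a single line.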

Idea is nice, but very dangerous. Uploading scripts... Jeronimo
I'm thinking of a proprietary language, with commands like IfExists, CreateArticle, AppendArticle, ReplaceArticle, MoveArticle, InsertIntoOrderedList, InsertIntoUnorderedList, InsertSection, UpdateProperty, OpenDatabase, CloseDatabase, MoveTo, RecordCount, MoveNext, MoveFirst, MoveLast, MovePrevious, SaveSetting, LoadSetting, OpenArticle, CloseArticle, RemoveArticle, ClearArticle, RedirectArticle, CreateRegion, MoveRegion, PositionRegion, BoldRegion, ItalicRegion, IndentRegion, InsertOrderedList, InsertUnorderedList, AddProperty, InsertCode, CodeRegion, InsertTable, InsertImage, InsertLink, UpdateLink, GetFirstLink, GetNextLink, GetFirstSection, GetNextSection, GetSectionTitle, GetLinkTarget, GetLinkText, SetLinkTarget, SetLinkText, SetSectionTitle, SetSectionText, RemoveLink, RemoveSection, RemoveTable, RemoveCode, RemoveUnorderedList, RemoveOrderedList, and so on and so on... Perhaps:
OnError {
  EmitLog ("Unable to Initialize, Quitting.");
  Return (1);
}
RecordSet CanadianCities;
Article ThisCity;
CanadianCities = OpenDatabase("CanCits.mdb", "CityTable");

ForEach (CanadianCity, CanadianCities) {
  OnError {
    EmitLog ("Skipping article: [[" & CanadianCity.CityName & ", " & _
      CanadianCity.Province & "]]. Unknown error working with article.");
    MoveNext
  }
  IfArticleExists (CanadianCity.CityName & ", " & CanadianCity.Province) {
    EmitLog ("Skipping article: [[" & CanadianCity.CityName & ", " & _
      CanadianCity.Province & "]]. Article already exists.");
    MoveNext;
  }
  CreateArticle (CanadianCity.CityName & ", " & CanadianCity.Province);
  ThisCity = OpenArticle (CanadianCity.CityName & ", " & _
    CanadianCity.Province);
  ThisCity.AppendWikiText ("" & CanadianCity.CityName & ", " & _
    CanadianCity.Province & " has a population of " & _
    CanadianCity.Population & " and a total land area of " & _
    CanadianCity.LandArea & ".");
  ThisCity.Save ("Creating stub, added population and land area.");
  ThisCity.Close;
  MoveNext;
}
Return (0);
Just an idea. Rlee0001 21:40 Oct 28, 2002 (UTC)
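For comparison, the same loop in an ordinary language might look roughly like this. It is only a sketch: `records` is a hypothetical stand-in for the CityTable rows (dicts with CityName, Province, Population and LandArea keys) and `wiki` a hypothetical title-to-text store; no real database or wiki API is used.

```python
from __future__ import annotations

SUMMARY = "Creating stub, added population and land area."


def create_city_stubs(records: list[dict], wiki: dict[str, str],
                      log: list[str]) -> int:
    """Hypothetical Python version of the script above. `records` stands in for
    the CityTable rows and `wiki` for the article store (title -> wikitext).
    Returns the number of stubs created."""
    created = 0
    for city in records:
        try:
            title = f"{city['CityName']}, {city['Province']}"
            text = (f"{title} has a population of {city['Population']}"
                    f" and a total land area of {city['LandArea']}.")
        except KeyError as missing:
            # A malformed record is logged and skipped; the run carries on.
            log.append(f"Skipping record: missing field {missing}.")
            continue
        if title in wiki:
            # Never overwrite an existing article.
            log.append(f"Skipping article: [[{title}]]. Article already exists.")
            continue
        wiki[title] = text
        log.append(f"Created [[{title}]]: {SUMMARY}")
        created += 1
    return created
```

In a for-each loop the iterator advances by itself, so `continue` takes the place of the script's `MoveNext`; taken literally, calling `MoveNext` inside `ForEach` could skip records.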
This makes it a little safer, but not much; I can still upload a database with all article names and overwrite everything with "Wikipedia!" or so. Even if you let the user fill in only the last part of your code (where a new article is created), it's still kinda dangerous. In fact, I'm afraid having this possibility will attract more "vandals" than allowing only self-built bots would (because that means somebody will actually have to do some work). Jeronimo
Well, I was thinking that the execution of the bot would be "undoable" as one single edit. Obviously each individual article can also be reverted or edited, but if a bot made a mistake, as long as the bot's log were in a server-readable format, any user could revert the bot's entire run by simply opening the log for the run and clicking "Revert All" (or whatever). The only exception: if an article were modified after the bot modified it, the undo option would have no effect on that particular article because of possible edit conflicts. In other words, each "bot" should have a "History" page which lists all the runs that the bot has made. Possibly like this:
Opening one of the logs above would give you a "Modification List" exactly like the Recently Updated Pages page, so that people can see which articles were affected, see the diff between article versions, and so on. And of course, Revert All becomes possible. Furthermore, because the script is server-side, the administration (or whoever) can grant or deny any user access to the bot feature on a per-run basis, a per-bot basis, or indefinitely. Further, the server can enforce a limit on how many edits any given run can make. This forces users to break runs down into sections so that if a catastrophic error were found, it would only affect a limited number of articles (say, 1000). Between runs, a bot can save and restore settings (like the last record number used). Rlee0001 09:23 Oct 29, 2002 (UTC)
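The two server-side safeguards described here, a "Revert All" that leaves alone any page edited after the bot, and a per-run edit cap with a saved resume point, might be sketched like this. All names (`Revision`, `revert_run`, `run_batch`, the `last_record` setting) are hypothetical, the cap of 1000 is just the figure suggested above, and the sketch assumes each run touches a given page at most once:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

MAX_EDITS_PER_RUN = 1000  # "say, 1000"; would be a server-side setting


@dataclass
class Revision:
    user: str
    text: str


def revert_run(history: dict[str, list[Revision]],
               run_log: list[tuple[str, int]], bot: str) -> list[str]:
    """Hypothetical "Revert All" for one run. `history` maps title -> revisions
    (oldest first); `run_log` holds (title, index of the revision the bot made).
    Returns the titles left alone because someone edited them afterwards."""
    skipped: list[str] = []
    for title, index in run_log:
        revisions = history.get(title, [])
        if not revisions or index != len(revisions) - 1 or revisions[index].user != bot:
            # A later edit exists (or the page is gone): don't clobber it.
            skipped.append(title)
        elif index == 0:
            # The bot created this page, so undoing the run removes it.
            del history[title]
        else:
            # Restore the pre-bot text as a new revision; history is kept.
            revisions.append(Revision("Revert All", revisions[index - 1].text))
    return skipped


def run_batch(records: list[dict], settings: dict[str, int],
              edit: Callable[[dict], None],
              limit: int = MAX_EDITS_PER_RUN) -> int:
    """Hypothetical capped run: resume from the record saved by the previous
    run, make at most `limit` edits (one per record), and save the resume point."""
    start = settings.get("last_record", 0)  # like LoadSetting
    end = min(start + limit, len(records))
    for record in records[start:end]:
        edit(record)
    settings["last_record"] = end  # like SaveSetting
    return end - start
```

Reverting by appending a new revision, rather than deleting the bot's one, keeps the bot's mistake visible in each page's history.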
Sounds like a lot of work to implement, but a good idea! This way, we tackle the problem at the root, and there should be no need for "tagging" or anything. Jeronimo