Wikipedia talk:Plagiarism

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Carcharoth (talk | contribs) at 23:08, 22 June 2008 (The need for specific attribution in relation to plagiarism: clarification). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

User:Andries/Wikipedia:plagiarism and when you copy something do not forget to cite me :) Andries 20:26, 12 November 2006 (UTC)[reply]

Proposal to separate this from a redirect to Wikipedia:Copyright

Plagiarism is a broader topic than copyright infringement. Suggest we change Wikipedia:Plagiarism to a soft redirect with a proposal template and work out a functional definition of what plagiarism is and how to avoid it. DurovaCharge! 22:00, 19 June 2008 (UTC)[reply]

    • I don't know what a "soft redirect" is. Let's just make this not a redirect, and work through policy. Here are some issues I raised with Carol Spears before:

Unique descriptions and phrases copied exactly from books must be put in quotation marks as I did with "in the rock crevices and water-receiving depressions". It is not enough to correctly attribute the source; if the same exact phrase is used it must be in quotation marks. --Blechnic (talk) 00:22, 4 May 2008 (UTC)

In this case I would add, in addition to unique descriptions and phrases, entire sentences or longer portions of text. It's a simple guideline. --Blechnic (talk) 15:59, 19 June 2008 (UTC)[reply]

Al, plagiarism is plagiarism, but putting stuff in quotation marks does not give us the right to use as much as we want, and this runs into copyright violations, where all or most or large, inappropriate, non fair-use portions of a text are incorporated whole into articles on Wikipedia, for example, this well cited article:

--Blechnic (talk) 22:53, 19 June 2008 (UTC)[reply]

I think we should do one or the other: ignore the copyright issue (it exists, but at a certain level it is wholly separate from the issue of copy-pasting anything, whether public domain, free license, or works-for-hire, which is not a legal topic but rather an ethical topic); or, let's pretend that plagiarism of a PD work is exactly equivalent to a copyvio. We should pick one or the other approach, else this discussion will always be disrupted by the interested editor interjecting "that's a copyvio!" - but that's not what we're about here, right? Franamax (talk) 01:47, 20 June 2008 (UTC)
I support this suggestion, just making a policy on plagiarism, after all, that's what we came for. --Blechnic (talk) 02:31, 20 June 2008 (UTC)[reply]

Resources

Wikipedia articles

Internal pages

Page | Key section | Type | Highlight | Note
Wikipedia:Copyright | | policy | | copyright and plagiarism are distinct concepts
Wikipedia:Copyright problems | Plagiarism that does not infringe copyright | process | “If an editor has copied text or figures into Wikipedia without proper attribution, politely refer him to Wikipedia:Verifiability, Wikipedia:Citing sources, and/or Help:Citations quick reference. Editors who have difficulties or questions about this guidance can be referred to the Help Desk. Editors engaged in ongoing plagiarism who do not respond to polite requests may be blocked from editing.” | copyright and plagiarism are distinct concepts
Wikipedia:Verifiability | | policy | “All quotations and any material challenged or likely to be challenged should be attributed to a reliable, published source using an inline citation.” |
Wikipedia:Manual of Style | | guideline | “The author of a quote of a full sentence or more should be named; this is done in the main text and not in a footnote....” |
Wikipedia:Quotations | | essay | |
Wikipedia:Citing sources | | guideline | “Wikipedia:Verifiability, which is policy, says that attribution is required for direct quotes ...” |
Wikipedia:Attribution | | 2007 proposed policy | | attempt to merge Verifiability with WP:No original research; did not achieve consensus
{{citequote}} | | article template | "Use this tag, {{Citequote}}, for quotations that are used without a citation." | Category:Articles with unsourced quotes

Public domain materials and templates

Previous discussions

Please add more links as needed. Carcharoth (talk) 23:16, 19 June 2008 (UTC)[reply]

Starting some text

I like Andries plagia-def, but I'm not satisfied with it. The "dishonestly" bit seems to me to exclude good-faith copying, which was evident in the recent incident which brought us here. Here we need the WP definition of what is and is not plagiarism. I'm also not comfortable with the ORI definition, since we are all about totally incorporating other people's ideas, any ideas other than our own in fact. Thus, I'm copying over my proposed initial text, to be dismembered at will:

  • "Plagiarism is the copying of material produced by others, either verbatim or with only minimal changes, without attributing that material to the original author. Material can be plagiarized from books and other printed media, websites, and GFDL-licensed works, such as the work of other Wikipedia editors. The copyright status of the work is irrelevant, directly copying a public-domain work is still plagiarism unless the original work is noted. Material in infoboxes (corporate data, species taxonomy, etc.) is not considered as plagiarized." Franamax (talk) 03:17, 20 June 2008 (UTC)[reply]
I'm not sure how the "other Wikipedia editors" bit works. Some editors release their contributions into the public domain (but there is no obvious indication of that), but more relevantly, copying of other editors' work takes place all the time. Attribution is sometimes only through the page history or an edit summary, rather than in the text. When you rewrite any article, you are adding yourself to a general list of authors, and kind of taking 'general credit' for the resulting article, so talking about plagiarising doesn't make sense here. What is plagiarising is if you use a Wikipedia article (or part of it) outside Wikipedia without crediting Wikipedia (ideally you provide a link so people can look up the authors, but at minimum people need to make clear "this is from Wikipedia - I did not write this"). The infoboxes thing needs tweaking as well - most data (but not all), but not "distinctive phrases of text". If you say in the infobox that someone is "best known for being 'The Man of Steel'", you should quote it and make clear it is a promotional, probably trademarked, tagline, not a description that you wrote (eg. "best known for climbing Everest"). Not the best example, I know, and the distinctiveness of the phrase means most people would realise what you meant, but still. One more point: "directly copying a public-domain work is still plagiarism unless the original work is noted" - people will think that it is enough to put a reference tag and give the source. That is not sufficient. You need to make clear by the layout of the text that the wording is not ours. In other words, the quoted text needs to be offset from the surrounding text, or the right template put at the bottom to indicate that the article is substantially (or wholly) from this PD source. Quite when the line is reached when incremental rewrites mean no "distinctive trace" of the original is left, I don't know. 
I don't think anyone is going to go back through all the 1911 stuff and check for that. Carcharoth (talk) 06:48, 20 June 2008 (UTC)[reply]
Plagiarising other wiki-editors can happen when you lift text from one place and drop it in another without giving attribution. Here's an example where another editor liked my work well enough to copy it elsewhere, with attribution. Another example would be copying text from a deleted article; Mr. wikibiz got all tied up in a knot about that a little while ago. Moving text within an article is fine of course, since it's tracked by the history.
The wording on infoboxes does need clarification, I was thinking of factual items that are impossible to restate.
And yes, this guideline needs to lay out exactly when and how you indicate that you've directly copied a source, preferably with some examples. Franamax (talk) 17:36, 20 June 2008 (UTC)[reply]

Copying from elsewhere on Wikipedia

(new sub thread)

As per Carcharoth, I normally only attribute Wikipedia copies with an edit summary. With that summary, it is similar in principle to cutting a paragraph from one section of an article and moving it to another. This is also normal practice for people who translate articles among the different language Wikipedias. Are you suggesting that this is somehow problematic, though we say on every edit page "If you don't want your writing to be edited mercilessly or redistributed for profit by others, do not submit it."?

--Hroðulf (or Hrothulf) (Talk) 19:49, 20 June 2008 (UTC)[reply]

Is there something unclear about "without attributing that material to the original author"? If you attribute in the edit summary, as in the example I gave above, you're doing it right. If you copy someone else's work and pretend it's your own original material, it's wrong. Translations are just the same, if you say "translated from de:wiki", you're giving the traceback to the original authors. If you translate it then say "I wrote this article", you're doing things the wrong way. This is why we need a guideline, to lay out what's acceptable and what's not. Franamax (talk) 20:00, 20 June 2008 (UTC)[reply]
I understand your drift now. Let's add a sentence to one of the existing guidelines or new editor guides to say that. --Hroðulf (or Hrothulf) (Talk) 20:26, 20 June 2008 (UTC)

Why?

Why do we need a policy for plagiarism of public domain works? It is not illegal and lack of citation is already covered by established policy. Lack of a policy does not indicate need of a policy. Jeepday (talk) 13:28, 20 June 2008 (UTC)

The difference is between: (a) copying text and sticking a reference tag on something; and (b) explicitly saying (with quote marks or naming the source in the text) that you are using someone else's words to express their idea or concept. Compare wording like:
    The event was described as "horrific" (Baker, 2001) and "horrendous" (Smith, 2007)
as opposed to:
    This was an horrific event <ref>Baker (2001)</ref>.
Other forms (both acceptable) are:
    This was an "horrific" event <ref>Baker (2001)</ref>
    Baker, in his book published in 2001, described the event as "horrific".
Again, not a great example, but the exact approach to take always depends on the exact context. As regards public domain material, Durova gave the example of Felbrigge Psalter. Look at the blockquoted section from Davenport (1903). If you fail to use blockquotes (or something similar), or fail to say "Davenport describes the back cover in the following manner", then you risk misleading the reader as to who is saying what. Consider it as being the difference between the editorial/authorial voice of the multitude of Wikipedia editors, and the voice of the sources. You can rewrite the latter to get to the former, but insufficient rewriting is plagiarism, and quoting without attribution is also plagiarism. Is that clearer? Have I got any of that wrong? Carcharoth (talk) 13:47, 20 June 2008 (UTC)
Sure it is wrong, but we are not going to block anyone for copying a public domain work. It happened en masse about 2002 when articles were imported from EB 1911 (a lot of those editors are probably highly respected admins now). Another editor will come along and add the quote marks. That is the wiki way. Why do we need a policy for what editors do anyway? --Hroðulf (or Hrothulf) (Talk) 19:39, 20 June 2008 (UTC)
I'm a bit twitchy about the citation on the 1911 EB import, but at least the effort was made to credit the source—as far as I know, every one of those articles was started with a Template:1911 attached. As well, much of the 1911 material is dated in style or content, and I expect that it will tend to gradually erode out of Wikipedia as we get around to updating those articles.
I absolutely think we should block anyone who – after a warning – insists upon copying public domain material without attribution. Indeed, I will block any such individual who comes to my attention. (I would give credit to anyone who makes a good-faith attempt to cite their sources, of course—many people have never been taught how to properly footnote, and wikimarkup can be daunting even to academics. Just pasting a URL after a block of quoted text is enough of a pointer for a wikignome to use, and demonstrates the intention to give proper credit.)
Of course, I'm not sure that it's always wise to look back to the way things were done in 2002 to govern how we ought to manage things now. Just as one might say "another editor will come along and add the quote marks" now, back then someone might say "another editor will come and fact-check John Seigenthaler's biography". Out of that mess, we got a dreadful amount of bad press, and harsh policy imposed rapidly from on high: WP:BLP.
Do I expect this to become another Seigenthaler incident? Well, probably not. The press don't tend to have the patience or the appreciation for nuance to present this type of issue—but I could be mistaken. Still, this does have the potential to have a slow, steady, erosive effect on Wikipedia's reputation, and is likely to be most damaging among the experts whom we most want to recruit to our project. At some point – and I think it should be now – we have to stop saying "someone will fix it eventually" and start saying "let's start cleaning up, and let's not let it get any worse". TenOfAllTrades(talk) 23:01, 20 June 2008 (UTC)[reply]
Agreed. I also just found Category:Attribution templates (some useful ones there - there are a lot more than just the 1911 ones), another ANI debate here and I remembered my example of an on-wiki collection of PD-material - see here. Your comment (in the edit summary) that we need to step up is quite true. Similar sentiments were expressed here:

"When Wikipedia was young, the threshold for copying was similar to a blog or diary. Now that Wikipedia is established, firm and harsh rules must apply. Wikipedia must follow the same rules as print encyclopedias. No copying, no plagiarism, no moving a few words around. Those who do must be notified and asked to stop. We have to start acting like a trustworthy group, not a band of kids writing half-copied term papers. We also need to have good customer service and courtesy, not gossip, IRC, etc." Model710 (talk) 18:16, 19 June 2008 (UTC)

I agree absolutely with what Model710 said. Carcharoth (talk) 23:27, 20 June 2008 (UTC)[reply]
(edit conflict) To TenofAllTrades: After 1 warning? Boy, you are strict. We don't do that for vandals.
By the way, I agree that the EB 1911 import is horrible, and the pesky tag should be deprecated in some way. As far as I can tell we didn't have history back then (I am a 2006 newbie) so it is not always clear even to an editor where EB ends and Wikipedians begin. This is how I like to extensively copy from public domain sources, if I really have to, which is more or less how the Manual of Style tells me to.
We already have several policies and guidelines that cover public domain plagiarism and attribution in some detail. (I added some links and quotes above.) Maybe we need to add a brief sentence at or near Wikipedia:Copyright problems#Plagiarism that does not infringe copyright reminding people that changing a few words in a sentence or stream of thought is still copying? Instead of a brand new guideline page, perhaps what might be useful is a tutorial to help people teach themselves the difference between original writing and original research, as that is something distinctive to encyclopedia writing that you won't get in school.
After all that, all I am saying is: this is much easier to do than negotiating a new guideline page, getting consensus for its contents, and getting it read by new editors.
--Hroðulf (or Hrothulf) (Talk) 23:46, 20 June 2008 (UTC)[reply]
That is an interesting link you provided to that astronaut bio with large chunks of quoted text. The "1911" way can be seen in the articles using the templates at Category:Attribution templates. Some examples: Template:Factbook, Template:Catholic, Template:1728, Template:Appletons, Template:Harper's Encyclopedia, Template:A Short Biographical Dictionary of English Literature, Template:USDA, and so on. It seems there was never really a debate over which way was better. Carcharoth (talk) 23:59, 20 June 2008 (UTC)[reply]
For what it's worth, when I said 'after a warning', what I really meant was 'after whatever warning framework we decide is appropriate has been employed'. I also didn't state that we would start with an indefinite block. Still, from the standpoint of protecting the integrity of the project, there are actually sound reasons why we might treat (some) instances of plagiarism more harshly than simple childish vandalism.
Detecting plagiarism in the first place is not necessarily straightforward. Often, material is plagiarised because it is good: well-written, detailed, professionally copyedited. As editors, we're sensitive to the insertion of content that is sloppy—stuff that introduces errors of spelling, grammar, style, or fact. We're usually quite grateful to see a few paragraphs of clean prose. Moreover, unless the contributing editor has left obvious web page formatting features behind (manual carriage returns at the end of each line, etc.) it can be very awkward to approach an editor to ask "Gee, your edit to foo looked awfully good. Did you really write that?"
As well, it is rare for a plagiarist to be caught after the first instance. An editor may copy material into Wikipedia for months or even years without the problem coming to light. Plagiarism is not as conspicuous as page-blanking vandalism. I have seen instances where a 'helpful' editor goes through a copy-pasted block of text to correct formatting and style and to add wikilinks. (WP:AGF is usually a Good Thing, but honestly—once a block of text is wikified, it is unlikely to receive further scrutiny.)
Enforcement is more challenging than for vandalism. A plagiarist may only be editing a few articles per week. Unlike the vandal who runs through three warnings in a couple of hours, followup is more difficult. I know of few admins whose memories are sufficiently reliable that they could consistently say, "Okay, I warned User:Plagiarist two weeks ago to start using citations; I guess I had better go back and check all his contributions since then." It is much less likely that other editors will spontaneously pick up on plagiarism compared to simple vandalism. Warnings may be less effective—in blind naivete, a good-faith editor may say "But I've been writing this way for years—why am I being hassled now? The admin's probably wrong." (The bad-faith editor might also observe that "Only one guy's noticed in two years...I can ignore the warning.")
We just don't have an infrastructure that is well-suited to catching plagiarised text. With images on Wikipedia, we deal with a discrete chunk of data; it has source info attached, and can be checked up on. (Perhaps instead of mentioning WP:BLP above, I should have brought up WP:NFCC as an example of where we were doing something that wasn't right by the lights of the project, and we finally got around to doing something about it.) With bare facts in the article body, we can demand citations to support claims, we can add {{fact}} tags, we can edit out unsourced statements. With plain prose, we have no straightforward mechanism for weeding out plagiarised stuff. It sits there, getting more and more deeply embedded—Wikipedia's dirty little secret. TenOfAllTrades(talk) 14:43, 21 June 2008 (UTC)

It may have been decided which was better without a debate. Someone wrote the block quote way into WP:V and WP:MOS and somehow it gained consensus. It is decent, honest and reasonable. Other people saw the 1911 example and continued to emulate it. (That was how Piers Sellers looked before I added the block quotes.) We should think of a kind and gentle way to stop them.

The 1911 example makes me uncomfortable for a number of reasons, but plagiarism is not the main one (after all the tag puts our collective hand up to that). The main issues for me are verifiability and accuracy (as none of those imports that I have seen yet cited EB 1911's sources!), and the stuff that was added after the import that never got attributed. This, to me, seems worse than the newcomer who comes along and writes a new article out of his head because at least we know that it did come from his head, and we should check it and fix it.

--Hroðulf (or Hrothulf) (Talk) 00:21, 21 June 2008 (UTC)[reply]

I've gone ahead and made a hack-job of a start for Wikipedia:Plagiarism. I do think it's an important enough topic to rate its own guideline, so it can be easily linked. OTOH I won't shed a single tear if that mess I made gets reverted back. Oh yeah, can anyone spot the bit I plagiarised? :) Franamax (talk) 00:40, 21 June 2008 (UTC)
Started a new section below to start discussing the actual wording of the page. Carcharoth (talk) 08:03, 21 June 2008 (UTC)[reply]

Come to think of it, this is why we need it (some context). MER-C 13:41, 21 June 2008 (UTC)[reply]

Wording of the proposed policy or guideline (1)

Thanks to Franamax for being bold and making a start on this. Some of the previous discussions and links in the "resources" section above will be useful, so we should mine stuff from there (um, with attribution within reason - attributing stuff from other Wikipedia pages is honoured more in the breach than in reality - altering stuff within a page is dealt with in the page history - acknowledging movement of text between pages is, and always has been, more problematic - many people don't see the difference). Anyway, my changes are offered up below for review if needed:

  • Example of how to quote and attribute material from other Wikipedia pages. [1] The astronaut example should also be given, I think, as that is more relevant here.
  • This bit here is a critical point. Is it right? Could it be made clearer?
  • I'm uncertain how to phrase the common facts and data bit. My attempt is here. Improvements and corrections welcomed.
  • I tried to address plagiarism of copyrighted works with this bit here. Again, improvements and corrections welcomed.
  • What else is needed?
    • The overall structure needs an overhaul - some stuff in the lead section needs summarising and moving to its own section below.
    • The resources section needs writing, with suitable external links - what links would be best?
    • The set of templates used needs to be overhauled and tidied up. Probably some new ones are needed as well, though that can swiftly become very bureaucratic.

Anything else? Carcharoth (talk) 08:03, 21 June 2008 (UTC)[reply]

Cripes Carch, you go on at length, I expand at the slightest provocation. We might as well bring in FT2 and ask NYB to do a cameo :) Which one of the Norns was it who held the scissors and cut the yarn when it was time? Or put another way, nice work, but we do have to watch the length to make this remotely accessible when it's done, so I hope we will all wield a judicious sword beside the pen. I'm happy to see this underway though :) Franamax (talk) 08:55, 21 June 2008 (UTC)[reply]
If we go all Greek, it was Atropos (yes, I had to look that up). I did write directly into the lead section, partly because the bits below were, well, not ready to be used yet! :-) Once the structure settles down, the lead section should be much shorter and most people will only read that. Some people will only read the nutshell. Do you agree with the concept of nutshells? Want to write the first attempt at a nutshell? :-) Carcharoth (talk) 09:14, 21 June 2008 (UTC)[reply]
I'm more from the Nordic tradition, there was one crone with a big foot from spinning the wheel, one with a big thumb from arranging the multi-coloured threads of the lives of men (women too I guess, they didn't language-ize in those days), and the third one with the big scissors, when the thread of a man's life was done, she cut the thread. Most mythological, also helpful at funerals. OTOH, Atropos gave us a useful drug. I'm iffy on nutshells, especially because they have to fit inside something small, but I gave it a try. The wonderful thing about Wikipedia is that you can be sure someone will come along sooner or later and fix up your lame efforts. And if they don't, maybe it wasn't so lame after all. :) Franamax (talk) 09:38, 21 June 2008 (UTC)[reply]

I dumped in a few examples on how to properly attribute PD material and sketched out some other stuff. What I'd like is

  • How to spot plagiarism, especially of the uncited type.
  • Cookie-cutter user warning templates
  • We need to say that Wikipedia is a scholarly work, etc in the intro. After all, this is why we're discussing this.
  • A good, free (software) plagiarism detector. Unfortunately, I don't think I'm 1337 enough for this task and my (offline) Wikipedia to-do list is embarrassingly long as it is.
  • An article issue template/associated category, something like (once again, a crude sketch):
  • What do we do with articles which show up on WP:SCV and consist entirely of one copied sentence?

I think that's enough for the time being. Eviscerate away. MER-C 13:19, 21 June 2008 (UTC)[reply]
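
As an illustration of the "plagiarism detector" item in the wish list above, here is a minimal sketch of one common approach: comparing runs of consecutive words (often called "shingles") between a candidate passage and a known source. It is not an existing Wikipedia tool, and everything in it is an assumption made for illustration: the function names `shingles` and `overlap_ratio`, the five-word window, and the sample strings (the source string is the phrase quoted earlier on this page plus made-up filler). A real detector would also need a way to find candidate sources in the first place, which this sketch does not attempt.

```python
import re


def shingles(text, n=5):
    """Return the set of word n-grams ("shingles") in text, lowercased.

    Hypothetical helper: punctuation and hyphens are treated as word breaks.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(candidate, source, n=5):
    """Fraction of the candidate's shingles that also occur in the source.

    Returns 0.0 when the candidate is shorter than n words.
    """
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & shingles(source, n)) / len(cand)


# Made-up sample data, for illustration only.
source = "in the rock crevices and water-receiving depressions of the plateau"
copied = "It grows in the rock crevices and water-receiving depressions"
rewritten = "The plant is found in cracks and hollows where rain collects"

print(overlap_ratio(copied, source))     # high: most five-word runs match
print(overlap_ratio(rewritten, source))  # no five-word run in common
```

A high ratio only flags text for a human to look at; it cannot tell whether the passage is properly quoted and attributed, which is the question this page is actually about.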

Incorporating other free content into Wikipedia is not plagiarism

The use of attribution templates to mark when free content has been incorporated into Wikipedia is a longstanding practice. As a free content project, we permit others to use our content if they attribute it to us - this doesn't mean they have to quote us, it means they can literally take our text and use it without quotation, provided they follow the terms of the GFDL. Similarly, provided that we give attribution in line with the license of the free material we use, it's perfectly acceptable for us to use other people's free content as part of our articles. We have always done so, both for images and for text.

This isn't plagiarism, it's simply the normal free-content process. If I copy some source code from one GPL program into another, and follow the terms of the GPL in doing so, I have not "plagiarized" the original program. Similarly, if I copy text from one GFDL project to Wikipedia and give attribution per the GFDL, I am not "plagiarizing" the original, I'm doing exactly what the author of the other content encouraged me to do when he or she made that content GFDL. This free-content model is very different than the academic model in which plagiarism is a concern. — Carl (CBM · talk) 02:38, 22 June 2008 (UTC)[reply]

I think this is a very important point, and I agree totally, as long as it is clear to everyone that "free content" in this context means content that is explicitly made free by its creator, not just content that has lost copyright protection. The 1911 Britannica is not "free content"; it may be freely copied, but to copy it verbatim without attribution and quotation is plagiarism, since the authors never intended that it be "free content" beyond the fact that its copyright would eventually expire. And certainly copying it verbatim without quotation in Wikipedia doesn't convert it into free content.
Arguably, material licensed by the Creative Commons share-alike license (derivatives allowed, no attribution required) could be used verbatim with no attribution. Is that plagiarism? How does Wikipedia simultaneously support academic integrity and free content? I don't see a conflict between the two. Because we're talking about plagiarism here, and not copyright, I think it's appropriate to clarify that this is a case where free content can be freer than public domain.--Curtis Clark (talk) 03:52, 22 June 2008 (UTC)[reply]
I think we're trying to be very clear here that copying free content is OK as long as it is attributed properly. It's actually exactly the same as the academic model - don't pass off work as your own; if you copied it, no problem, just say you copied it. Academic journal articles are packed full of references to other people's work; it's when you leave the references off that trouble arises. Curtis Clark raises an interesting point though: with cc-sa, if no attribution is required, how do we handle that? That will need to be addressed in the guideline too I suppose; no-one said this was going to be easy! Franamax (talk) 04:14, 22 June 2008 (UTC)
We also copy stuff from Wikipedia, but citing Wikipedia as a source is forbidden.(diff) -- SEWilco (talk) 04:44, 22 June 2008 (UTC)[reply]
That is partly a misconception about what we're trying to accomplish here, and partly a very valid point. No-one is trying to say that copying free content is wrong, we're trying to establish the parameters of how you copy things. No you can not cite Wikipedia as a source per se, but if you copy a chunk of text around en:wiki, it goes in the new spot under your own name/nym - now it's important to attribute the original authorship of that text, so people aren't misled into thinking those are your own sparkling pearls of prose. Keep in mind the distinction between using the wiki as a source for itself, which we just plain can't do; and copying around chunks of text within the wiki itself, which we do all the time, but need to be careful about.
And that said, I doubt that I've ever written a totally original FUR - I very commonly find a good one somewhere else, copy it and change the details. That's actually to be encouraged since it helps with learning and efficiency. The question we need to answer is - what are the boundaries and how do we properly attribute our copying? Franamax (talk) 05:09, 22 June 2008 (UTC)[reply]
Carl, you talk about GFDL, which is a good point, but what about public domain stuff? Material written by employees of the US government and released into the public domain on websites, for instance, should that not be attributed? Does releasing into the public domain remove the ethical responsibility of us, as Wikipedia editors, making clear who wrote what? I think the basic problem is that the way Wikipedia is written, it is exceedingly difficult to tell who wrote what - which bits were Wikipedia editors, and which bits are external bits imported in. And I think that distinction is important to maintain for editorial integrity, if nothing else. Carcharoth (talk) 09:25, 22 June 2008 (UTC)[reply]

Replying to several comments: Public domain text is one type of free content, which we can use; GFDL and CC-BY-SA are also acceptable to us. Regardless of how a piece of free content is free, we should attribute it. I think that if these practices are followed, there are few concerns:

  • Anything directly copied from an external source should be attributed, either as a quotation if brief or with an attribution template otherwise. Of course only free content can be directly copied outside of brief quotations. The dividing line is: if the external content is being used to reference our own writing, then it should be treated like any other reference. If the external content is simply becoming part of a WP article, then it should get an attribution template.
  • Extensive paraphrasing from a single document should be attributed similar to a direct quote.
  • When material is copied from one WP article to another, a comment should be left in the edit summary to mark this.

— Carl (CBM · talk) 11:31, 22 June 2008 (UTC)[reply]

So effectively you're saying that even CC-SA should be attributed? That would certainly make the rules easier to follow, and I support as long as everyone understands that that's what we're doing.--Curtis Clark (talk) 19:54, 22 June 2008 (UTC)[reply]
Yes, I think that the same sort of attribution should be used as for all other free content that is incorporated into our articles. — Carl (CBM · talk) 20:30, 22 June 2008 (UTC)[reply]
If you'll require attribution templates, you'll need an attribution template suitable for the 1902 "Blue skinks of western Guatemala" when someone starts articles by copying from it. It's not a government document nor is it 1911EB, so which mandatory template is proper? -- SEWilco (talk) 13:21, 22 June 2008 (UTC)[reply]
Clearly the onus is on the person adding the material. There are a large number of templates available at Category:Attribution templates. If none of those will suffice, the person can create a new one, or just make a note in plain text. — Carl (CBM · talk) 13:25, 22 June 2008 (UTC)[reply]
A "note in plain text" violates the proposed practice of using an attribution template. Maybe you should be referring to attribution rather than attribution templates. -- SEWilco (talk) 18:46, 22 June 2008 (UTC)[reply]

Attribution templates and when to remove them

I'm a little unclear on some of Carcharoth's (I think) wording, specifically in the lead: "subsequent rewritings should not lose the sense of the original, or lose track of where a concept, idea, or phrase originated from, unless the text has been so substantially rewritten as to be a new piece of work. A clear distinction should also be drawn between work submitted by Wikipedia editors as their own work (which can be "edited mercilessly") and Wikipedia editors submitting work written by other people (in which case, more care is needed)"

When the work is covered by an attribution template, I'd be leaning more toward yes it can be edited mercilessly, the sense of the original indeed can be changed, and eventually the origin of the concept, ideas and phrases will be smeared through the edit history, just like every other sentence we type in. Obviously this doesn't apply to blockquotes, which should only be wikilinked I guess - but even then, if you blockquote text containing "the planet Earth is flat in shape", I'm gonna have to do something about it!

Here's a simple case study, Anadyr River. The text originally copied from EB1911 is still there in ghostly form, but it has substantially changed in sense. "Barren and desolate" has changed with modern sensibilities to "tundra, with a rich variety of plant life" and "Reindeer...in considerable numbers" now includes "population...collapsed dramatically". Presumably, the EB1911 version has died a death of a thousand cuts due to the process of merciless editing.

Which leads to my other question, when should the 1911 attribution template come off, or should it stay there forever? If "reindeer" is still in the article, should that template remain? What's remaining in there? I can see "For nine months of the year the ground is covered with snow" - but I could write that with my eyes closed, it's Siberia people - so does that continued piece of text necessitate the template? There is another more compelling bit "the Ivashki or Ivachno", an obvious holdover from EB1911 (and quite likely not even right!). If I replace that, can I take the attribution template off? Please respond in the spirit intended, don't find another example to refute me :) Franamax (talk) 04:56, 22 June 2008 (UTC)[reply]

I'm happy with: "A clear distinction should also be drawn between work submitted by Wikipedia editors as their own work (which can be "edited mercilessly") and Wikipedia editors submitting work written by other people (in which case, more care is needed)" That is very similar to what Curtis Clark said above: "The 1911 Britannica is not "free content"; it may be freely copied, but to copy it verbatim without attribution and quotation is plagiarism, since the authors never intended that it be "free content" beyond the fact that its copyright would eventually expire. And certainly copying it verbatim without quotation in Wikipedia doesn't convert it into free content." He also said "this is a case where free content can be freer than public domain" and I'm going to try and work that into the article, while noting that explicitly releasing into the public domain is different ethically (maybe not legally, but then I'm not a lawyer) from something falling into the public domain (i.e. copyright expired). Having said that, there does come a point when the passage of time and change obscures things to such an extent that original attribution is no longer possible and no longer makes sense. T-shirts of the pyramids as compared to t-shirts of the Eiffel Tower, maybe? To get back to text examples, people using the plot of the Odyssey in their stories, as opposed to using the plot of the Da Vinci Code (let's not look at where the plot for that book came from). An author might genuinely not realise that he had rehashed Homer's Odyssey until someone points it out to him. Ditto for people writing an encyclopedia article on Wikipedia. They may not realise that the article they expanded and rewrote and took to FAC contains scattered sentences that are remnants from Britannica 1911 text. If the attribution template has been removed, and the sentences in question have not been put in quote marks and cited to the 1911 Britannica, I think that would be misleading.
Easily fixed, but still unfortunate.
I'm less happy with "subsequent rewritings should not lose the sense of the original, or lose track of where a concept, idea, or phrase originated from, unless the text has been so substantially rewritten as to be a new piece of work". Maybe something along the lines of: "Rewriting old public domain text is difficult, and care is needed. Distinctive phrases and original concepts and ideas should still be attributed after the rewriting and updating, as well as attributing the newer ideas and concepts introduced in the rewriting."? Carcharoth (talk) 09:20, 22 June 2008 (UTC)[reply]
Responding to your case study, I like your phrase "Presumably, the EB1911 version has died a death of a thousand cuts due to the process of merciless editing."! My view is that incremental change is possible, but that as newer sources are added, the attribution of the remaining bits of the older source (here 1911 Britannica) should change from a general attribution template at the bottom, to inline citations for the remaining possibly incorrect bits, or merely distinctive bits. The former should be checked and changed (and marking these bits as such helps later editors), and the latter should be placed in quote marks. Thus you make clear during the process which bits are from where, aiding both verification and attribution. Carcharoth (talk) 09:36, 22 June 2008 (UTC)[reply]
And comparing this with this (global diff here). Let's quote the initial 1911 entry in full:

ANADYR, (1) a gulf, and (2) a river, in the extreme N.E. of Siberia, in the Maritime Province. The gulf extends from Cape Chukchi on the north to Cape Navarin on the south, forming part of the Bering Sea. The river, taking its rise in the Stanovoi mountains as the Ivashki or Ivachno, about 67 deg. N. lat. and 173 deg. E. long., flows through the Chukchi country, at first south-west and then east, and enters the Gulf of Anadyr after a course of about 500 miles. The country through which it passes is thinly populated, barren and desolate. For nine months of the year the ground is covered with snow. Reindeer, upon which the inhabitants subsist, are found in considerable numbers.

The bits from the current article that remain appear to be: "The river rises in the Stanovoi Mountains as the Ivashki or Ivachno, about 67°N latitude and 173°E longitude, flows through Chukotka Autonomous Okrug, at first southwest and then east, and enters the Gulf of Anadyr of the Bering Sea after a course of about 800 kilometres (500 mi)." plus the other bits as well. It would help the reader if the bits from the 1911 text were explicitly marked as such. I may try and do that now. The other point is that the Gulf of Anadyr text ended up at Gulf of Anadyr, not in the original text, but with the merge at this point, leading to: "It is in the northwestern part of the Bering Sea and extends from Cape Chukchi on the north to Cape Navarin in the south, forming part of the Bering Sea." (compare to the text above), and now reads "It is in the northwestern part of the Bering Sea and extends from Cape Chukchi on the north to Cape Navarin in the south." That insertion should be cited as well. Now, I have no hope that this will be done throughout Wikipedia for all the 1911 or other imported PD-text (some of my articles suffer from vague attribution templates instead of specific inline cites), and it is long practice that vague cites at the end of an article are OK compared to specific inline cites, but the point still needs to be made that there is a difference between paraphrasing (with a general cite at the end) and copying verbatim and ending the article with a general wave of the hand to say "large bits of this text are from here". The point being that once other people start rewriting the article, the general citation becomes useless and you have to dig through the history to find out which bits come from where (again, this is also a general problem with Wikipedia as a whole - and was mentioned by one of the candidates in the WMF board election - in relation to BLPs I think). Carcharoth (talk) 09:59, 22 June 2008 (UTC)[reply]
See my edits here and here. Is this sort of improved citation needed or not? I think it is, but some may disagree. Carcharoth (talk) 10:28, 22 June 2008 (UTC)[reply]
While you can add these if you like, they are in no way required. The attribution template clearly and directly says that some text was taken from the EB 1911 - nobody is being deceived. This is why I edited the proposal yesterday to point out that there is no immediate need to change articles that use attribution templates. The practice you describe as "copying verbatim and ending the article with a general wave of the hand to say "large bits of this text are from here"." is perfectly fine provided that the material added is not a copyright violation; see my comments above about free content.
Especially in the case of GFDL attribution templates, the attribution template should never be removed, as the entire article is a derived work of the other document as well, regardless whether the text has been edited out over time. — Carl (CBM · talk) 11:38, 22 June 2008 (UTC)[reply]
What about the intermediate stages where some rewriting has occurred and it is no longer clear which bits of text are from where? The licensing concerns are covered by the attribution template and the page history, but the specific attribution of fragmentary remnants of text is not covered this way. Imagine that the article has been printed out. Carcharoth (talk) 12:00, 22 June 2008 (UTC)[reply]
I don't see that as a problem. If we have incorporated other free content into WP, that content literally becomes part of WP, so there is no reason that it needs to be possible to tell which parts of the article are descended from the old source and which are not. The attribution note at the bottom (which should appear in a printout as well) is a clear notice that some of our content is shared with the other source. I think it's important to keep in mind that in these situations we are not using the other material as a reference, we are relying on the same references that the other material did, and just using the text from the other material. So there is no reason to think that that text should be marked in the same way as a reference. — Carl (CBM · talk) 12:13, 22 June 2008 (UTC)[reply]
It depends on the context. A skilled reader can detect bits of 1911 or Catholic Encyclopedia text sticking out of an article like a sore thumb. Others won't, and that will deceive some readers. Take Pope Agapetus I. There are two attribution templates at the bottom. Can you tell which bits are from which source? If so, how? My point is that best practice will take a more conservative approach than just using attribution templates. At any point where the reader is left wondering "who is saying that?", some more specific attribution of the text is needed. And as far as "we are relying on the same references that the other material did, and just using the text from the other material" - that is OK as long as we can tell which bits in the article are from where. Carcharoth (talk) 12:35, 22 June 2008 (UTC)[reply]

(unindent) The simple solution would have been to require the attribution template to link to the diff showing the actual text being added and/or to a wikisource version of the text. That would allow an easier comparison. It still doesn't get over the problem of "authorial or editorial voice", and the need to accurately make clear which sources are "talking" in a particular paragraph or sentence (with the default being that the editorial voice is Wikipedia's unless stated otherwise). Carcharoth (talk) 12:37, 22 June 2008 (UTC)[reply]

A further point: "If we have incorporated other free content into WP, that content literally becomes part of WP, so there is no reason that it needs to be possible to tell which parts of the article are descended from the old source and which are not." - that is fine for Wikipedia editors who have clicked "save" and agreed to their text being mercilessly edited. It is not fine for text written by people who never agreed to have their text mercilessly edited. Carcharoth (talk) 12:43, 22 June 2008 (UTC)[reply]

Re the last several comments: I don't think it matters whether we can tell which parts are from which source. The reader doesn't have to wonder who is saying what: Wikipedia is saying all of it. This is what I mean about the difference between referencing text and incorporating it. If we were using the text as a reference, it would be necessary to be precise about what we are referencing. But if we are using the text in our own voice as part of our article, we don't need to mark it in any special way. The attribution template gives credit that the work of others has been used to help build Wikipedia.
Compare programming: there's no reason that I would go out of my way to mark which lines of code I took from another GPL program when writing my own program. I would, however, make a note that I have used some code from the other program. In the end, the code I copied and the code I wrote myself form a unified program, and someone reading that code doesn't usually worry about which parts came from where. The underlying premise of free content is to share (both give and take) with other free projects. — Carl (CBM · talk) 12:47, 22 June 2008 (UTC)[reply]
"The underlying premise of free content is to share (both give and take) with other free projects." - this runs straight into one of the major differences between free content and copyright-expired public domain material where the authors had never even heard of the free content movement. Let me give you an example. If some famous speeches fall into the public domain, how would you suggest the reuse of such material is handled? People attribute in those cases because they know they are using the power of someone else's words to give power to their own ("as Martin Luther King said, "I have a dream that one day this nation will rise up...", and that is how I feel about <insert issue here>"). Similarly, if we are using the power or credibility of someone else's text, that should be acknowledged at both the article level and the individual "distinctive phrase" and "individual research" level. Don't get me wrong, if the text used to present an idea or concept is rewritten enough in your own words, then just an inline citation is all that is needed. But it should be clear, if you don't cite your sources, you are either expressing your own opinion, or you are plagiarising someone, even if you are not fully aware that this is what you are doing. And even if you cite the source, you may still be plagiarising if the material is badly paraphrased or rewritten at any point after the text is added to the article (an oversimplified example of this is removing quote marks from a quote, but a more insidious change is introducing a modern idea next to an old one, and not making clear what has been changed and which bits come from where).
To expand on that last point - if a distinctive opinion or phrase is used from an old public domain text, should we not be more specific about whose voice is talking? Used in isolation, without the rest of the text, you would rightly insist such an "opinion" paragraph is quoted and attributed. Used as part of a huge chunk of text, you say that a general attribution template is sufficient. I'm saying that if that large chunk of text is rewritten until only the distinctive sentence remains, then the attribution template is insufficient and we need to directly quote and attribute the fragment as if it had been placed in there on its own. Two different routes to the same result, but inconsistent templating and citing of the end result, depending on the route taken. Is that any clearer? Carcharoth (talk) 13:28, 22 June 2008 (UTC)[reply]
If only one sentence remains of the original, and you want to add more specific attribution for it, that is of course fine. But that doesn't mean that we should change the general practice of incorporating free content (which includes both public-domain content and content that was explicitly placed under a free license) into our articles, marked by general attribution templates. These do not rely on the power or credibility of the original text; they simply rely on the same credibility as the rest of the wikipedia article, the credibility that the wikipedia editors have done a good job with the article.
The attribution template is not meant to give extra authority to the article, it simply gives credit to the authors of some text that has been used in the article. For very old texts, I expect that a lot of editing will need to be done to bring the content up to date, so I would give more credibility to the contemporary wikipedia version than the original. I think that very little of the content marked by attribution templates consists of distinctive, memorable phrases. Most of it is just ordinary prose about the topic at hand. — Carl (CBM · talk) 16:25, 22 June 2008 (UTC)[reply]
That's the point, though. A general attribution template does not give specific credit, because it doesn't say what it is giving credit for, but only gives general credit for some unspecified amount of text that might not even be there any more. But leaving that to one side for the moment, what do you think about attribution templates having a "link to the diff showing the actual text being added and/or to a wikisource version of the text", as I suggested above? wikisource:1911 Encyclopædia Britannica exists (though it is incomplete). Indeed, at wikisource:1911 Encyclopædia Britannica/Vol 1:16, there is a link for "Anadyr". Browsing through there, I see that wikisource:1911 Encyclopædia Britannica/Amber (resin) is there. I looked around a bit and found Template:1911EB, and added that to our amber article. Now imagine that this could be done for all public domain materials added using the attribution templates. People could compare the two articles much more easily. Surely this is a better method than using attribution templates and then losing the original text in a series of changes and rewrites that leave it unclear which is which? Carcharoth (talk) 17:05, 22 June 2008 (UTC)[reply]
In the case of Anadyr River, I just hit "earliest" as it seemed the logical thing to do. I had been thinking of that issue though, how do you know in general what is the "template attributed" portion of the article? I kept quiet about it because of the immense volume of previously imported material. However, I do like the idea of specifying in some way exactly what the PD content was, before it went through the normal wiki slice-and-dice. Franamax (talk) 18:03, 22 June 2008 (UTC)[reply]
It should all be identified and stuck on Wikisource and then proofed again. Failing that, a new source should be found and there should be an effort to link to wikisource versions of the original text. While checking some of the Anadyr River stuff, I've discovered that the link to Stanovoi Mountains (the supposed source of the river) takes you about 3000 kilometres to the south-west, down near Lake Baikal. It seems the names used in the EB 1911 are rather out-of-date as well. I've been trying to sort out exactly what mountains are which, but it is a confusing mess. And I'm still no closer to finding out what the current name is of the mountains that the Anadyr rises in. Hopefully the RefDesk people will be able to help. Carcharoth (talk) 18:11, 22 June 2008 (UTC)[reply]
There's another excellent reason to carefully attribute the text - so we can say it was someone else's mistake! Looks like the source is in the Khrebet Kolymskiy to me. I'd been thinking about this a little more, seems to me there is some distinction between the grizillion articles we have stuffed with unattributed factoids and the few where virtually every sentence has a reliable source. Eventually we will get all of them to the high-quality state and at that point, we will be referencing every sentence that came from EB1911. Until then, we're in a kind of indeterminate state, although that of course is why we're trying to create this page. Franamax (talk) 18:31, 22 June 2008 (UTC)[reply]
A requirement for the original to be stuck on Wikisource seems at conflict with our acceptance of sources which are not online. Do we also have to stop accepting printed sources because their text is not available online? I've reused PD text where even I didn't have the original in a version that could be put on Wikisource, because I did not type in the text exactly as it was on paper (such as due to rephrasing or replacing "f" with "s"). -- SEWilco (talk) 18:55, 22 June 2008 (UTC)[reply]
I am only talking about complete documents for a wikisource link. If you've typed out a complete PD-document, it should go on wikisource anyway (if they will take it). If you've only typed out a small bit, then you just quote and cite a source as normal. Rule of thumb: if it is not your work, don't hit save in such a way that it implies that you wrote it. It is also good practice to provide the original, and only then make changes, however minor. Carcharoth (talk) 19:29, 22 June 2008 (UTC)[reply]
And certainly, if you've accessed the source online, it's incumbent on you to provide the link, even if it's a TIFF scan of a printed document. The key themes here are to make sure people are aware of your source (for instance, you may be rewriting the PD source in modern English, but the structure and content of the writing are not yours); and to provide the original raw reference online if it can possibly be done. Franamax (talk) 19:37, 22 June 2008 (UTC)[reply]
So now we'll be able to be blocked if someone finds an online source and thinks we violated the requirement to provide the link. -- SEWilco (talk) 20:46, 22 June 2008 (UTC)[reply]
Not at all. This is still a proposal, and if you provide a citation without a link, that is fine. It is just typing in the text without saying where it came from that is a problem. Public domain doesn't mean we don't need to say where things come from. If someone else finds an online link, they can add it to what you wrote and point it out to you, and you might then thank them for finding that online source. Carcharoth (talk) 21:34, 22 June 2008 (UTC)[reply]

Re Carcharoth:

"A general attribution template does not give specific credit, because it doesn't say what it is giving credit for, but only gives general credit for some unspecified amount of text that might not even be there any more."

This is all that we promise our own contributors, as well - that other people will give them some sort of general attribution when the content is reused. The GFDL doesn't require that we actually provide diffs to the content that each editor on WP has contributed (the history page is for our convenience, but isn't required for GFDL compliance). I don't see why we would need to do differently for other free content.

There's nothing dishonest or sneaky about explicitly saying, "this article uses text from X", any more than it would be dishonest for a program to say "this program uses some source code from program Y". I don't see that there is a need to track down which parts of which articles came from which sources. If I don't like the text in an article, I can always improve it, regardless where it came from. Knowing the original source of the text isn't particularly important, provided it isn't a copyright violation. — Carl (CBM · talk) 18:28, 22 June 2008 (UTC)[reply]

That's an interesting statement. It seems like a lot of attention is paid to maintaining correct article histories, often with the specific motivation of "required by the GFDL". Franamax (talk) 18:35, 22 June 2008 (UTC)[reply]
I think the GFDL is invoked because the article history is often the only record of the editor, and thus is the record of the holder of the copyright for that edit. When public domain text is reused, there is no copyright tracking requirement of the original text and the editor only has obvious copyright on whatever changes he made (the unchanged PD text technically has no copyright protection, but in practice it is hard to identify whether changes have been made). The original text has no legal protection from being changed, and as Wikipedia points out once it's in Wikipedia it may be edited mercilessly. Requiring original text be quoted and obsessively identified leads to walling off text such as this; if you look at the current article you can recognize some of the original text exists, although much has been rearranged (such as putting descriptions of one part of the building together). -- SEWilco (talk) 19:15, 22 June 2008 (UTC)[reply]
But this is not about legal protection and copyright. It is about plagiarism and academic honesty and making clear who said what. When we use the text of a long-dead author who never gave permission for that work to be "mercilessly edited" (the bit on the edit screen says "If you don't want your writing to be edited mercilessly" (my emphasis)), then we have a moral responsibility to take more care, and this also holds for living authors of public domain text. Carcharoth (talk) 19:22, 22 June 2008 (UTC)[reply]
The original authors can be protected from our damage by not mentioning them. There is conflict between giving the authors the credit which they deserve and protecting them from being blamed for the result of our work. We have to be able to edit things to improve them, but that risks damaging someone else's work. -- SEWilco (talk) 20:54, 22 June 2008 (UTC)[reply]
Our attribution does not claim we are making the same arguments as the original author, or that the original author would agree with our article. It only claims that we have taken some text written by the original author, and possibly edited it significantly since then. Its purpose is to provide partial credit for writing text, not to associate us with the author's opinions or associate our opinions with the author. — Carl (CBM · talk) 20:58, 22 June 2008 (UTC)[reply]
Keep in mind that an attribution template tends to behave like having a similar citation as a Reference, where the citation merely states that a certain book was used as a source but does not identify what parts of the article have information from the book. Such a citation only states that someplace in the article are concepts or text from the cited source. Both concepts and text might vanish during later editing; at what point does one decide that concepts or text have changed "too much" for a cited source to be obsolete in the article? How does one measure text changes? -- SEWilco (talk) 19:15, 22 June 2008 (UTC)[reply]
That's a weakness of the wiki-system, not a strength, and is why the page history is so vital to piece together what happened. Carcharoth (talk) 19:18, 22 June 2008 (UTC)[reply]
(to CBM) What if there is a possible mistake? Knowing where the original text came from is useful then. That is an argument for having a link to a wikisource version of the text, which would address some of my concerns. It still doesn't address the concern that the model of taking a chunk of PD-text and rewriting it on-wiki can cause confusion if done poorly. Again, having a link to a wikisource version (or a permanent record to the moment the PD-text was added) would allow people to compare the changes. You will say that this is no different to being able to do this for any text anyone adds, but the difference here is between a Wikipedia editor adding their own text, and adding work done by others. When you add work done by others, it is important to link to a clean version of the original text, and to ensure that more care is taken to integrate the text as compared to that written by other Wikipedia editors. It is the difference between a Wikipedia editor writing something and saying "what I've written is based on this source", and a Wikipedia editor copying and pasting something and saying "I've copied this from this source, now I'm going to leave it here for anyone to edit as they see fit". It's not the same process because the former is someone creating a paraphrase of the sources and then leaving it for others to edit, while the latter is taking a chunk of original text direct from a source, and leaving it for others to edit. Carcharoth (talk) 19:17, 22 June 2008 (UTC)[reply]
I don't see that there is a large difference. Whether free content is originally written by a WP editor, or written elsewhere and then incorporated into WP, I think we should give attribution and then simply treat the text as part of the article to be improved by others. This is the very purpose of free content - that it is not necessary to redo everything over and over again. The work of others can and should be reused to create new works, and once incorporated into a new work it can and should just be edited as part of that new work. Those who create content know perfectly well that, under our system of copyright, their work becomes public domain under certain circumstances, and they implicitly agree to this when they publish their work.
As for providing links to the original source, I believe we already often do so via our attribution templates. But I don't see that there is a significant benefit in giving explicit diffs and quotes (if there was, it would have been evident when the EB material was added in 2002, no?). Because we aren't using the incorporated text as a reference for our own claims, but are simply repeating the same claims that the incorporated text made, checking our claims will require sources other than the text that was incorporated. If we change our article to no longer agree with the original incorporated text, this is no problem, since we are only claiming that our text was originally taken from the source, not that our text agrees with the source or continues to rely on the source in any way. — Carl (CBM · talk) 20:45, 22 June 2008 (UTC)[reply]
Reusing old works is indeed the basis of "To promote the Progress of Science and useful Arts…". -- SEWilco (talk) 20:59, 22 June 2008 (UTC)[reply]
But Wikipedia is not based on the US Constitution, nor the US Bill of Rights. Many, many of those protections are not present here. To take an example, Wikipedia fair-use is not the same as US constitutional fair-use at all, and many challenges to blocking based on free-speech rights have been batted down out-of-hand. We have the opportunity and the right here to develop a global resource that does not rely solely on the wording of the constitution of one country. Franamax (talk) 21:49, 22 June 2008 (UTC)[reply]
First off, "Those who create content know perfectly well that, under our system of copyright, their work becomes public domain" does not apply to the content creators of EB1911 in any way. Secondly, I think what Carcharoth is saying is that it is important to denote which exact text was originally copied from an external source, in the case under discussion, a source which has become public domain only through the expiration of copyright, not through explicit granting of rights by the creators. Take a look at Anadyr River and its talk page, we've actually discovered some inaccuracies brought on by time. As with all of Wikipedia, we don't claim it to be a definitive resource, we only claim it to be a resource to find the original sources. In the example of that article, it becomes important to let the user discover why it is claimed to rise in the Stanovoi Mountains. We constantly harangue the younger ones to be aware of article history, talk page discussions, the necessity of checking sources. Here we have a case of a blanket attribution to EB1911, not available online, and now seemingly demonstrated to be incorrect because terminology has changed. Easily available references to the original source are indeed beneficial here - not everyone is so accustomed to the wiki-hunt. Franamax (talk) 22:12, 22 June 2008 (UTC)[reply]

More EB 1911 stuff

Found Wikipedia:1911 Encyclopaedia Britannica. May be of interest. Carcharoth (talk) 22:15, 22 June 2008 (UTC)[reply]

Carcharoth has asked me to join in, so here are a few quick comments before I come up to speed. For the EB1911, Wikisource has an entire wikiproject dedicated to the transcription, and a complete set of pagescans with an index to help people find the article they want. See s:User:Tim_Starling. We also have tips on how to obtain the text without transcribing it, as there are many online copies; sadly, they are all either of very poor quality or not faithful to the original (i.e. they have done unspecified value adding, and are usually vague about what they have added; in short they have injected enough copyright material in order for them to enjoy the benefits of copyfraud). We have a few people who regularly add a few pages per month, and the style guideline has recently been updated to gel with current best practices. In short, a Wikipedian should be able to put their first EB1911 article on Wikisource with only about 5 hours' worth of stumbling along and asking questions as they go.
s:Category:Encyclopedias contains other large encyclopedias being transcribed, and s:Wikisource:Biographies#Collections has a number of other similar works, like Dictionary of National Biography, Appleton's Cyclopedia, A Short Biographical Dictionary of English Literature, etc.
John Vandenberg (chat) 22:45, 22 June 2008 (UTC)[reply]

The need for specific attribution in relation to plagiarism

This is another attempt to explain what I see as the fundamental problem in relation to plagiarism and dumps of PD-text into Wikipedia articles. Suppose an academic, whose career depends on their academic integrity and intellectual honesty, wrote any sort of resource or document or encyclopedia article, and based it largely on a copy of text from PD sources like the 1911 Encyclopedia Britannica, made only a few changes and expansions, and then stuck a footnote at the bottom of the new article saying "This article incorporates text from the Encyclopædia Britannica Eleventh Edition, a publication now in the public domain". They would quite simply get laughed out of the building, no matter how much they insisted "but I was co-writing the article with the authors of the 1911 Encyclopedia Britannica". It is the difference between doing the research needed to write a proper article, paraphrasing from your sources rather than copying them, and taking a lazy shortcut by reusing the work of others.

The critical thing is deciding at what point the article becomes our own work again, independent from the work of the 1911 Encyclopedia Britannica authors. That can only be answered by having the original text and comparing it side-by-side with the Wikipedia article. Then you look for bits of unaltered text, and additions of new text, and corrections of old text. At some point, the old text will have been rewritten, rephrased, reordered and corrected enough to count as "our" work and not "their" work. This is the normal process of writing from sources that occurs on articles every day on Wikipedia, the rewriting taking place in Wikipedia editors' heads between their books or other sources, and the Wikipedia servers. With the 1911 and other PD text, the rewriting from the sources takes place not in people's heads, but live, on the wiki.

At the end of this rewriting process, some fragments of the original text may remain, and those can be placed in quote marks and specifically attributed. At that point, the article should have reached the stage where we can honestly claim it is our work. Until then, though, the article will always be open to charges of being a work plagiarised from the Encyclopædia Britannica Eleventh Edition. A generalised attribution template at the bottom of the article is sufficient to cover licensing concerns, but not plagiarism concerns. Ditto for any other articles based largely or solely on PD-text. The exception is when the entire article is quoted verbatim, but that is the domain of Wikisource, not Wikipedia.

Does that make things any clearer, and does anyone think this is a concern or not? Carcharoth (talk) 23:01, 22 June 2008 (UTC)[reply]

Short version: a spectrum exists from a pure, original PD-text (e.g. on Wikisource) to a fully rewritten and corrected article (here on Wikipedia). The process in between is where we are open to accusations of plagiarism, unless the changes are specifically attributed to the new sources and it is made clear at every stage that the rest of the text is still from the original source. Otherwise we risk having articles based on the work of others rather than our work, or mixing the two without making clear what the differences are. Legally, there is no problem, but there can be problems in terms of intellectual integrity and honesty. Carcharoth (talk) 23:01, 22 June 2008 (UTC)[reply]

Actually, I should probably make clear that I'm talking here about a "muddled sources" type of plagiarism, rather than a full-blown "claiming the works of others as your own" plagiarism. They are both still plagiarism, but the latter (intending to deceive) is more serious than the former (being incompetent). Carcharoth (talk) 23:08, 22 June 2008 (UTC)[reply]