Talk:Reduced instruction set computer
Reduced instruction set computer is a former featured article. Please see the links under Article milestones below for its original nomination page (for older articles, check the nomination archive) and why it was removed.
Current status: Former featured article
Modern x86 is in fact RISC
The section of the text that discusses RISC and x86 fails to divulge that all modern x86 CPUs are in fact RISC cores wrapped in x86 decoders. To make matters worse it also alludes to a notion that RISC has been defeated by x86 as if RISC is some sort of product. RISC is a processor design strategy/architecture and not a technology in itself. The fact that x86 processors since the K5 and P6 have had internal RISC-like cores is a good indication that the design strategy of RISC has eclipsed the design strategy of CISC.
There are numerous articles dispersed across the Internet that back up this claim. Glancing at the block diagrams of Intel and AMD processors also verifies this. A few links culled from Google...
http://www.hardwaresecrets.com/article/324/7
http://www.hardwaresecrets.com/article/235/4
http://arstechnica.com/cpu/4q99/risc-cisc/rvc-6.html
http://www.osnews.com/story.php/3997/Analysis-x86-Vs-PPC/
Computer Architecture by Hennessy and Patterson makes note of the x86 RISC core. I own the third edition and can provide citations.
Some talk could also be devoted to the blatantly RISC processors used in embedded devices, gaming consoles, and the like.
--BrandonEdens 06:38, 19 September 2007 (UTC)
- Um, x86 chips have always had a decode step as far as I know (a step where instructions are expanded into simpler instructions). The G5 PPC RISC chips also have a decode step now, so are they RISC chips with a CISC shell? ;)
- Claiming that modern x86 chips are RISC at the core is more about marketing than anything else. By claiming to have a RISC core, marketers figured that they could surf on the RISC wave, which was all the rage back then.
- That said, new discussion should go to the bottom.
- --Anss123 07:59, 19 September 2007 (UTC)
- Intel x86 processors through the 486 and the so-called 586 break are not only considered RICH (CISC), but their heavy load of legacy support and awkward numeric storage were major impetuses to move toward RISC technology.
- What is impressive is that Intel engineers were brilliant enough to make their CISC processors compete with much of the then-current RISC competition.
- --UnicornTapestry (talk) 16:47, 6 December 2007 (UTC)
- Correct. Intel processors up to the Pentium or so may have been microcoded, but they did not contain anything like a RISC core! The newer x86 processors do have a RISC-like core, which the x86 instructions are decoded into. However, since you can't directly write code for this core... it seems a stretch to say that a modern x86 processor is RISC! ǝɹʎℲxoɯ (contrib) 16:56, 6 December 2007 (UTC)
- Not being load/store oriented, current x86 cores are not particularly RISC like. Some x86 CPUs did indeed contain cores that could be called RISC, but that is no longer the case.--Anss123 (talk) 18:56, 7 December 2007 (UTC)
Better RISC Definition
The general guideline for the RISC computer architecture is: (changed to #ed points for ease of below discussion -Moxfyre 00:58, 19 December 2007 (UTC))
- one instruction per cycle
- fixed instruction length
- only load and store instructions access memory
- simplified addressing modes
- fewer, simpler operations
- delayed loads and branches
- prefetch and speculative execution
- offload work to the compiler
BrandonEdens 06:37, 19 September 2007 (UTC)
- I agree with the first 4 as guidelines.
- The 5th entry is confusing; in fact, RISC computers typically load and execute many more instructions than CISC machines, because RICH overhead is hidden in microcode.
- Items 6 and 7 have been typical of 'modern' CISC architecture since the 1960s, especially in supercomputers of the era, the CDC 6600 being a prime example. Simpler CISC architectures, naturally, didn't bother with pipelining.
- The last item, number 8, is misleading. Since a single RICH instruction could embody the equivalent of several (sometimes dozens and occasionally hundreds of) RISC instructions, machine-level programming became a far more onerous and unwieldy task within a RISC computer. Therefore, even though it wasn't mandatory, RISC instruction sets made assembly code impractical for anything but the lowest level programming.
- --UnicornTapestry (talk) 17:06, 6 December 2007 (UTC)
- The developers of RISC OS might disagree with that last point; much of it was written in ARM assembler language[1]. Guy Harris (talk) 18:54, 6 December 2007 (UTC)
- We're in agreement here. Speaking as an OS developer, I stated "lowest level programming", which by definition includes operating systems. Systems programming languages have been developed even for CISC environments, such as PL/S, Paul Abraham's SPLASH, and most famous of all, C, the latter used in Unix development.
- The gist of my point was that for half a century, the majority of applications have been written in high-level languages, reverting to assembly language for purposes of speed or space (or sometimes availability). RISC instruction sets made this 'casual' use more difficult to do.
- --UnicornTapestry (talk) 22:22, 6 December 2007 (UTC)
- I totally concur on the difficulty. I implemented a MIPS core in a chip and had to write rudimentary validation "code" for chip verification. It was written in assembler since it had to be concise and I remember the nuances of code sequencing. Almost as much work as the core (not really). Napablue (talk) 19:23, 17 December 2007 (UTC)
- First, great interaction to all editors here. I'm new (so be gentle) and read great passion into each contribution. If I ever unearth my notes from coursework with David Patterson I'll post the "RISC guidelines" from the early days. This would be before any pubs since we were just developing the paradigm. Now my thoughts on this topic.
- Viewing Item 5 from a chip designer perspective, I read it as meaning that internal (chip) implementation is simplified. This would be accurate from a purely RISC vs. CISC research point of view. Commercial implementations and evolution may have muddied the waters. If I recall, we used Princeton's DDL (Digital Design Language) to describe the logic (as compared to Verilog and VHDL used today) in some exercises and the DDL code was far simpler for a RISC processor as compared to a CISC core.
- Items 6 and 7 are indeed less RISC vs. CISC issues.
- Item 8 is maybe more the reliance of the chip designer on the software team since less work was designed into (and being done by) the hardware core. Napablue (talk) 19:23, 17 December 2007 (UTC)
older entries
Rationalized Instruction Set? Was this an IEEE change? Who actually uses this definition?
Stanford doesn't. http://cse.stanford.edu/class/sophomore-college/projects-00/risc/risccisc/
In fact, a Google search reveals that only this Wiki entry uses it. I'm setting it back to the textbook definition.
So this is a featured article? Wikipedia is ridiculous. --
By David Kanter
IBM's POWER architecture is used as the basis of the iSeries (formerly AS/400 midrange systems) and pSeries (formerly RS/6000). The IBM zSeries (formerly S/360, S/390, etc.) all run natively on a CISC chip that IS NOT POWER compatible at all.
--
"and the fastest CPU in SPECfp is the IBM Power 5 processor."
Is there a link to these benchmarks?
Removed:
- Pure RISC machines have failed in the general computer marketplace. They are not used in the majority of PCs or business computers. Notably, RISCs succeeded in digital signal processing and graphics computation. Because of these facts, many commentators believe that they failed in many applications because they have required larger, more expensive memory systems. Some RISC designers (Such as Acorn and IBM) have successfully responded to this criticism by producing RISC machines that decompress code on the fly.
Pure RISC machines have *not* failed in the marketplace, unless you count the PPC Macs as a failure. RISC architectures are used in virtually everything that isn't an x86 PC (including virtually all the Unix servers out there). The reason why they didn't replace the x86 PC was that they didn't offer a compelling enough price-performance advantage (thanks partly to the sheer implementation skill of Intel) to warrant the extra costs, the limited availability of OTS software, and the lack of backward compatibility. --Robert Merkel
--
Actually pure RISC has failed in the PC marketplace, as they make up less than 3% of the market and shrinking. While I think the original statement was over the top, the statements by Merkel are slightly short of the mark. Most unix servers are now linux servers powered by x86, and x86 now makes up the majority of the server market. And it's not price/performance that killed RISC on the PC, it's lack of software mainly; poor price/performance was a lesser drawback.
--
I suggest the paragraph beginning with "This is surprising in view of the domination of the Intel x86..." be edited. Although I largely agree with the point presented, I feel the tone of the paragraph editorializes the matter.
--
I greatly changed the pre-RISC section as it incorrectly discussed register set size as the main delimiter between RISC and CISC philosophies.
RISC Instruction Sets have failed in the marketplace, but not the design philosophy. Since all major x86 designs are very RISC-like internally (PentiumPro, Athlon), the design principles have affected all computer vendors.
--
I removed the addition example; I had already made some copy-editing improvements to it, but it was still completely pointless IMHO. There was no attempt to connect the example to the rest of the article, or show why describing how addition works is relevant to RISC. I'm fine if someone wants to add it back, but please integrate it into the article with more care. -- Neilc 19:57, 25 Jul 2004 (UTC)
--
Reading this article one would think that the IBM 390/zseries mainframes are built on Power RISC architecture hardware. This is incorrect. AFAIK, the IBM mainframes still execute the 360 ISA natively. Dyl 00:57, Oct 27, 2004 (UTC)
--
- Some more practical-minded engineers now refer to RISC as "Relegate Important Stuff to the Compiler"
This is hardly a necessary remark unless the reasons behind it are clarified. The rest of the article doesn't seem to support it at all, even making points to the contrary, i.e. when discussing homogeneous register sets.
Section order incorrect
The "Pre-RISC design philosophy" section should precede the "RISC design philosophy" section --Surturz 07:41, 23 June 2006 (UTC)
RISC have more instructions than CISC
The statement "RISC is a computer CPU design philosophy that favors a smaller and simpler set of instructions" is actually a misnomer. While I am not intimately familiar with many RISC architectures, I do have significant experience with PowerPC and some exposure to MIPS. PowerPC documentation lists over 200 instructions and I believe that MIPS has even more. One consistancy that I have noticed among the RISC processors is that the instructions are typically of a fixed size, while CISC processors, such as the x86, tend to have variable length instructions. The result of this is that PowerPC code typically occupies more memory than functionally similar x86 code. The idea is that a fixed instruction size will result in a faster, more predictable, execution time, in general. -- jimwilliams57
- The term reduced in RISC refers more to a single instruction than to the entire set of instructions. While the set of instructions for a RISC may not be small, the amount of work that a single instruction accomplishes is reduced. This enables the hardware to treat instructions in a more uniform manner and benefits techniques like pipelining. A second point: If you count the instructions of an ISA with all possible modes (address modes, operand sizes, etc) the size of the PowerPC ISA is still small compared to x86. So the statement you criticize is probably still correct. --Stefan
- No that is not what it refers to, RISC processors have a reduced instruction set, and thus the name is accurate. The design of MIPS or PowerPC does not negate the fact that most RISC processors have a smaller instruction set than most CISC processors.
- Read this: http://arstechnica.com/cpu/4q99/risc-cisc/rvc-4.html —The preceding unsigned comment was added by 68.118.218.191 (talk) 11:39, 11 December 2006 (UTC).
This is what I find confusing. A given RISC instruction accomplishes less and the difference is made up by running the clock faster.
- Sometimes. But part of the difference is also made up by the fact that a RISC CPU can execute more instructions per clock tick than a CISC CPU. 4.255.42.124 22:46, 3 September 2005 (UTC)
- You guys are focusing too much on small details. RISC really refers to a design philosophy that became popular with system designers because modern CPUs (known as CISC) were getting too difficult to build. By simplifying the hardware design using "RISC", they hoped to reduce the complexity and make designing more feasible and faster. This simplified design helped reduce die size which enabled additional functionality (like pipelines, register sets, cache, etc). This is what helped improved performance, not just the reduced instruction set. Improvements in compiler technology also played a part in RISC, it wasn't just hardware. --Pelladon 02:28, 8 July 2006 (UTC)
Yet advertised speeds for RISC chips are always slower than a comprable x86. Have I got it backwards or something?
- The problem of looking at just the clock rate is that while modern implementations of x86 such as the Pentium 4 run x86, they do not directly execute x86 instructions. Instead, each x86 instruction is translated at run-time into a short sequence of micro-ops, which are more like RISC instructions. This is done in order to support legacy programs, while taking advantage of the simplicity of a RISC implementation (in processor design, simpler usually means faster). Having said that, you might want to compare the execution time of a single RISC instruction to that of a single micro-op (In which case, keep reading :P).
- Another part of the answer has to do with pipelining, a technique all modern processors use. You can think of a single instruction as needing to go through every stage in a pipeline for it to be completely executed (this might not be exactly true if an instruction does not use the later stages in a pipeline). Each stage takes one cycle. A CISC which translates code into micro-ops usually has a much longer pipeline than a true RISC processor. For example, the Pentium 4 has a notoriously long pipeline of 20+ stages, whereas the classic RISC pipeline has only 5 stages. This means that a Pentium 4 running at 4 GHz executes a single (micro-op) instruction in the same time as a classical RISC does running at only 1 GHz. This is rather a trivial example though, because we are usually more concerned with instruction throughput (the rate of instructions coming out the end of the pipeline per unit time), rather than instruction latency. There are many factors that affect throughput.
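- To make the latency-versus-throughput arithmetic above concrete, here is a minimal back-of-the-envelope sketch in C. The stage counts and clock rates are the same hypothetical round numbers used above, not measured figures for any real chip:

    #include <stdio.h>

    int main(void) {
        double cisc_stages = 20, cisc_ghz = 4.0;  /* long pipeline, fast clock   */
        double risc_stages = 5,  risc_ghz = 1.0;  /* short classic RISC pipeline */

        /* Latency: time for one instruction to traverse the whole pipeline. */
        printf("latency: %.1f ns vs %.1f ns\n",
               cisc_stages / cisc_ghz, risc_stages / risc_ghz);   /* 5.0 vs 5.0 */

        /* Throughput: with the pipeline full, up to one instruction completes
         * per cycle, so the clock rate sets the ideal instructions per second. */
        printf("ideal throughput: %.1f vs %.1f G instr/s\n", cisc_ghz, risc_ghz);
        return 0;
    }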
The article also mentions that the 8086 has about 400 instructions. I would expect modern extensions of the architecture to have at least double that number. Comparatively, PowerPC easily counts as smaller and simpler.
- Explication
- Rarely has RISC architecture had fewer instructions than CISC (RICH). Going back to classic examples, both the IBM/370 and the Transputer had more than a couple of hundred instructions. The difference was in how the instructions were implemented (which Brandon Edens attempted to address in the Better RISC Definition section.)
- From the introduction of the term, RISC has confused almost everyone struggling for a clear definition. Stefan (above near the beginning of this section) has an explanation for the outcome, if not the intent, of the original designers. Think 'simple' instead of 'reduced', and you're closer to the truth.
- CISC supplanted the acronym RICH (Rich Instruction CHip) in the 1980s, although I consider RICH a more accurate (and more clever) term. RICH meant that individual machine instructions were potentially extremely powerful, sometimes competing with compiler statements.
- (For example, in the venerable IBM/360, just 2 machine instructions converted between binary and decimal. Just one machine instruction could translate one string of text into another.)
- In a more extreme dichotomy, the DEC PDP-8 had only 8 basic instructions! (One op-code was reserved for a number of sub-instructions.) Imagine any RISC architecture trying to accomplish anything with only 8 machine instructions!
- --UnicornTapestry (talk) 23:19, 6 December 2007 (UTC)
Programmer's historical Point of View
- Let me tell you a story. Years ago I was teaching this course about basic microprocessor operation in TUT, and amongst the undergraduates, there was this middle-aged woman, who was studying for a new degree. One day, after class, she talked to me about her history in the industry. She had been a programmer already in the late 1960's. She told great stories, like how they loaded software from a deck of punch cards and then somebody fell down in the stairs and spilled the card deck on the floor. Then, as they were gathering the cards from the floor, they thought that "maybe it would be a good idea to number these cards..." as they had great difficulty in figuring out in which order the cards should be. And, on another occasion, they managed to print out the company's paychecks in hexadecimal and the bank was obviously a bit confused about them.
- Now comes the important part. She told me about the greatest advances she remembered about the operation of computers in their company. One was when their old Datasaab was replaced by a new IBM computer (and later a VAX, if I remember correctly) and they got video terminals instead of the older paper terminals. Obviously, with a video terminal, it's easier to edit software, because you can see the data in front of your eyes, move the cursor around, and edit the data on the screen. They did a lot of the programming in hexadecimal, directly in machine language, because running an assembler was so slow. This created some problems, sometimes a quick fix in the hex code was not properly updated in the assembly language source codes, so a bug would reappear when the code was reassembled. All in all, they loved the new CISC machine because it was so easy to program it in hexadecimal, because the instruction set was powerful. You could do complex matrix arithmetic and vector calculations in a single instruction. It seems that the instruction sets of the pre-microprocessor age mainframes were indeed powerful.
- I think you can imagine the way programmers worked. They did not have compilers, and it seems that they avoided even using an assembler whenever they could. It was faster and easier to work directly with machine language. When the microprocessor was invented, it was a crude machine at first, something like computers had been 20 years earlier, because you could not fit such complexity into a single chip. But in time the microprocessors got better and better, resembling more and more their mainframe paragons. They become increasingly cheaper and more people start using them. But they are still crude and difficult to program so languages and compilers get developed. The advent of RISC is the result of this development. With compilers around, you don't need to program in machine language anymore. So why have all these complex instructions that make the processor slow and difficult to engineer? Let the compiler deal with the hardware so we can develop the processor to be "lean and mean".
- I think we all basically agree about what the general idea of RISC is. In light of later development, I think it's more to the point to think that "Reduced Instructions" means roughly the same as "Simpler Instructions". Obviously a CISC machine has a "Complex set of Complex instructions". Early RISCs had a "Reduced set of Reduced instructions". A modern RISC might indeed have a "Complex set of Reduced Instructions" and it's still called a RISC. But I think that the name 'RISC' was more a marketing decision than a technical decision. And I think that the marketing strategy behind that choice of name was to convince programmers that they should consider dropping the idea of writing complex software directly in machine language and use compilers instead.
- Panu-Kristian Poiksalo (talk) 08:10, 23 March 2008 (UTC)
- Actually compiled languages were the predominant mode of programming long before microprocessors, much less RISC. RISC may have been a promotional term, but it wasn't to promote the use of compiled languages. -R. S. Shaw (talk) 22:05, 24 March 2008 (UTC)
This is the best article I have ever seen on wikipedia
This article explains the subject sooooo well, if only more math and science articles could be this clear, especially in expounding the history and reasoning behind a development or innovation.
RISC = RNA induced silencing complex
I just ran across this article in search of something completely different... I'm a biology student and for me RISC means RNA induced silencing complex ... after some more searching I found it here: RNA-induced silencing complex but it is a little bit confusing not mentioning it on the RISC page. Can we somehow crosslink these pages? Or use RISC as an abbreviation page to point to the other two? --Damir Perisa 08:16, 15 September 2005 (UTC)
- I reckon RISC is more widely used as a computing term (I've never run into your use of the acronym before, but then I'm probably biased, being a CS student). I added a link to the page you mentioned, so both senses are now presented. --Lorkki 13:28, 18 September 2005 (UTC)
- RISC is an incredibly important component involved in RNAi. RISC definitely needed the disambiguation. --G3pro 13:42, 18 September 2005 (UTC)
Berkeley RISC article
Some information here may belong in Berkeley RISC. Please discuss on Talk:Berkeley RISC#RISC article. StuartBrady 13:37, 12 January 2006 (UTC)
6502 and Z80 comparison
I think that the comparison between the 6502 and the Z80 may be misleading. As far as I've been able to determine, a 6502 at 1 MHz and a Z80 at 4 MHz are similar performance, but it really depends on what you're doing. Instructions on the 6502 use fewer cycles... but you sometimes need to use more instructions (due to the lack of any 16-bit registers), and the 4MHz Z80 is clocked faster... Memory contention is a problem for Z80-based systems, but there are ways of mitigating the problem. Anyway, I'd like to see a more detailed study — I'm not saying the comparison is false, but I don't know that it's correct. --StuartBrady (Talk) 12:17, 5 July 2006 (UTC)
Useful background in Berkeley RISC
There's a lot of useful, general info on RISC in that article. Merge some of it, maybe? MOXFYRE (contrib)
'Other solutions' assumptions
I have problems within the section "Other solutions" regarding conclusions. The troublesome spot is in the middle of a paragraph in the middle of the section, the paragraph beginning "These techniques relied on increasing speed..." (I've flagged the end of the spot with a comment: "This is a false premise and conclusion. In fact, the first computers to deploy parallelism and pipelining used then-current CISC (or RICH) technologies.")
Pipelining and parallelism both appeared a quarter century before RISC arrived on the scene, and are hardly unique to RISC computers. The article contends "RISC was tailor-made to take advantage of these techniques", an assertion not borne out by the facts. RISC can be made to take advantage, but tailor-made, no. That's a matter of design.
Parallelism, pipelining, and word-alignment can help either architecture to achieve faster speeds, and these enhancements are not unique to, nor dependent upon, either architecture model. I urge the paragraph be changed to reflect this, as indeed the section title seems to suggest.
--UnicornTapestry (talk) 17:35, 6 December 2007 (UTC)
Word-aligned operations
The "Other solutions" section says:
- Yet another technique was the use of word-aligned operations, primarily for data memory access, but in some processors for instruction fetch as well, the technology characterized by the early CDC 6600, the IBM System/360 Model 44, and the DEC Alpha a quarter of a century later. For example, the IBM/370 MVC and MVCL instructions could directly move an arbitrary byte string from any legal memory location to any other, regardless of alignment. However, even within the 370, working with word-aligned data locations was considerably faster since precise byte alignment took additional overhead. RISC machines, such as the CDC and the DEC Alpha, relied upon block moves of the greater part of the data, deploying register shift instructions to handle the beginning and end (head and tail) of long strings. ("Word" in this case is hardware defined: For the DEC Alpha, a word was 8 bytes (64 bits); in early Intel processors, a word was 2 bytes (16 bits); within the IBM 370, a halfword was 2 bytes (16 bits), a fullword was 4 bytes (32 bits), and a doubleword was 8 bytes (64 bits), the longer length alignments being the most efficient. The CDC series used 6-bit characters rather than 8-bit bytes and a word length of 60 bits.)
- These techniques relied on increasing speed by adding complexity to the basic layout of the CPU, as opposed to the instructions running on them. With chip space being a finite quantity, in order to include these features something else would have to be removed to make room. RISC was tailor-made to take advantage of these techniques, because the core logic of a RISC CPU was considerably simpler than in CISC designs. Although the first RISC designs had marginal performance, they were able to quickly add these new design features and by the late 1980s they were significantly outperforming their CISC counterparts. In time this would be addressed as process improved to the point where all of this could be added to a CISC design and still fit on a single chip, but this took most of the late-80s and early 90s.
(It originally said "IBM/370 model 44", but no such model existed; the Model 44 was an S/360, not an S/370.)
In the original S/360, instructions were always aligned on 2-byte boundaries, and, unless the machine had the "byte-oriented operand feature", the operands of fixed-point, floating-point, and bitwise operations had to be aligned on the appropriate boundaries[2]. S/370 made that a standard feature; in S/360's with that feature, and S/370's, unaligned references, while handled transparently in hardware/firmware, were, indeed, faster.
In some machines, operand references were by definition aligned, as they didn't have byte addressability, just word addressability; the CDC 6600 was such a machine. (Multiple instructions were packed into a single word; the target of a branch instruction had to be the first instruction in a word[3].)
Instructions in most if not all of the original RISC processors fit in a 32-bit word, and had to be aligned; operands had to be aligned on their natural boundaries. I think most PowerPC processors could handle unaligned operands in hardware, and some other RISC processors may have supported that as well. (SPARC still doesn't.) MIPS had special "load word left", "load word right", "store word left", and "store word right" instructions from which to synthesize unaligned loads and stores.
The DEC Alpha took this further, by not having 8-bit or 16-bit load or store instructions; loading an 8-bit or 16-bit quantity was done by loading the word containing that quantity and doing shifts and masks with the aid of special instructions to assist that.
As far as I know, adding support for unaligned operations adds complexity to the CPU, so "the use of word-aligned operations" doesn't add complexity. Guy Harris (talk) 00:55, 7 December 2007 (UTC)
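For illustration, here is a minimal C sketch of the kind of synthesized sub-word load described above: one aligned word load followed by a shift and truncation. It mimics the general idea rather than any actual Alpha or MIPS instruction sequence, and it assumes a little-endian machine and a 16-bit value that does not straddle an 8-byte boundary:

    #include <stdint.h>

    /* Synthesize a 16-bit load from an aligned 64-bit load plus shift/mask.
     * Illustrative only: assumes little-endian byte order, that the halfword
     * lies entirely within one aligned 8-byte word, and ignores C strict-
     * aliasing rules (hardware has no such concern). */
    static uint16_t load_u16_from_aligned_word(const uint8_t *p) {
        uintptr_t addr    = (uintptr_t)p;
        uintptr_t aligned = addr & ~(uintptr_t)7;       /* enclosing 8-byte word   */
        unsigned  offset  = (unsigned)(addr - aligned); /* byte offset in the word */

        uint64_t word = *(const uint64_t *)aligned;     /* single aligned load     */
        return (uint16_t)(word >> (offset * 8));        /* shift down and truncate */
    }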
- Guy is correct on all counts, including the S/360 44. I am responsible for the first of the two paragraphs, which intends to make the point that word-alignment is a technique used to speed throughput, or conversely, microcode to perform byte alignment sucks cycles. Word-alignment has been a traditional design criterion in many RISC machines, but adopting word-alignment can speed CISC-architecture programs.
- I have voiced an objection to wording in the second paragraph in the preceding section above, "Other solutions assumptions". (See above.)
- Further note of interest:
- The CDC 6600 actually padded code with NOPs to get word alignment for branches (jumps).
'Other solutions' correction
After giving thought to the problematic paragraph mentioned in the two sections immediately above, I believe it has wrong analysis and problems beyond what was originally argued. Virtually every sentence bears a fallacious premise or conclusion without a supporting reference, which makes it hard to tell what the paragraph was trying to say. (Most of its issues are documented almost from the beginning of this discussion page.)
I'm not beating up on the author here; my guess is that this one paragraph received less attention than the overall excellent body of work. This is the paragraph in question:
These techniques relied on increasing speed by adding complexity to the basic layout of the CPU, as opposed to the instructions running on them. With chip space being a finite quantity, in order to include these features something else would have to be removed to make room. RISC was tailor-made to take advantage of these techniques, because the core logic of a RISC CPU was considerably simpler than in CISC designs. Although the first RISC designs had marginal performance, they were able to quickly add these new design features and by the late 1980s they were significantly outperforming their CISC counterparts. In time this would be addressed as process improved to the point where all of this could be added to a CISC design and still fit on a single chip, but this took most of the late-80s and early 90s.
The essential valid points I've worked to capture within the article:
These techniques relied either on additional complexity within other regions of the CPU (pipelining), CPU topologies (parallelism), or in the software programs (word-alignment, instruction multiplicities).
RISC performance improved through the 1980s as typified in commercial machines by MIPS, DEC, IBM, and Motorola. CISC manufacturers, led by Intel and AMD, countered with RICH x86 advances of their own within a single chip by the early 1990s.
- I just deleted the entire paragraph about alignment, and the other mention of word alignment, given that it's misleading. Processors with word addressing, by definition, require alignment, as addresses can't refer to something not on a word boundary, as they refer to words (you might have specialized instructions with byte pointers, but fixed-point and floating-point instructions take word addresses). The byte-addressed S/360 had required alignment by default with an option to allow unaligned accesses (with a warning that unaligned accesses might incur a performance penalty); S/370 made that option standard, again with a performance penalty. Thus, at least in the case of S/3x0, it's not as if alignment requirements were added; instead, they were removed. It's also not as if the performance penalty was added as a result of changes to speed the processors up; it was always there. Guy Harris (talk) 23:25, 8 December 2007 (UTC)
- I wish you had not deleted that. There's a fundamental misunderstanding regarding word alignment versus a word-addressable computer. An unmodified IBM/360 used word alignment for math calculations, but NOT for other operations, to wit, MVI, CLI, TM, STC, MVC, CLC, PACK, and perhaps a hundred more.
- Word-alignment is elemental in RISC design but optional in CISC. Word alignment was near the top of the Alpha's development list. They could have made the machine work on individual bytes, but they didn't, being antithetical to RISC philosophy. Byte addressing was implied, but you couldn't directly operate on a single byte. A RISC machine is not the same as the older, word-addressable computers which were often considered 'scientific' machines.
- There's an engineering quirk with many computer architectures: The more zeros in the low-order addressing bits, the faster the access. That's one of the reasons RISC shuns byte-addressability.
- It's possible the paragraph could have used improvement. I'm new to Wikipedia, but an old hand with CISC and RISC. If I can explain this better, feel free to contact me, but please don't undo someone else's work.
- respectfully, --UnicornTapestry (talk) 21:02, 9 December 2007 (UTC)
- The work was wrong, and should have been undone. It spoke of "the use of word-aligned operands" in a section that started with
- While the RISC philosophy was coming into its own, new ideas about how to dramatically increase performance of the CPUs were starting to develop.
- The notion of word-aligned operands being more efficient on byte-addressable processors long antedates RISC (dating back at least as far as S/360), so it's not a notion that was "starting to develop" "while the RISC philosophy was coming into its own". What happened with most RISC processors is that they got rid of support for unaligned loads and stores, saving some hardware.
- Word-alignment of data is not "elemental" in RISC design. PowerPC processors, for example, do unaligned loads and stores in hardware. And operating on individual bytes is not "antithetical to RISC philosophy"; Alpha was the exception, not the rule, and with the BWX extension, it ceased to be an exception. MIPS, SPARC, POWER/PowerPC/Power Architecture, and PA-RISC, for example, all support byte and 16-bit load and store instructions.
- As for alignment of instructions, S/3x0 requires alignment of instructions - it's just that it has variable-length instructions, the length quantum of which is 16 bits, so all instructions are aligned on 16-bit boundaries. RISC processors tend to have fixed-length instructions, usually 32 bits, so the length quantum is 32 bits and all instructions are aligned on 32-bit boundaries.
- The issue of block moves and string/decimal instructions is a separate issue from the issue of alignment of integral and floating-point operands. Yes, most RISC processors didn't have single instructions for doing block moves, string operations, and decimal operations; those were done by subroutines (which, I suspect, used algorithms similar to what was used in microcode for those instructions on processors that implemented them in microcode). That's just one component of the standard RISC vs. CISC argument; it's not an issue of requiring word alignment.
- As for the confusion between word addressability and word alignment, that confusion was present in the paragraph (given the CDC 6600 citations), which is another reason why it needed to go.
- A discussion of the issue of word alignment might be useful, but it doesn't belong in the "other solutions" section, and needs to accurately describe what was done in RISC architectures (and not just in Alpha, which was one of the later RISC architectures - and which, for better or worse, ended up abandoning the "purity" of not having byte and 16-bit load and store instructions). Guy Harris (talk) 23:52, 9 December 2007 (UTC)
>The notion of word-aligned operands being more efficient on byte-addressable processors long antedates RISC…
Of course, it did, as did parallelism and pipelining. That's not at issue.
>What happened with most RISC processors is that they got rid of support for unaligned loads and stores, saving some hardware.
RISC wasn't about saving hardware; its quest was a search for speed.
>Word-alignment of data is not "elemental" in RISC design. PowerPC processors, for example, do unaligned loads and stores in hardware. And operating on individual bytes is not "antithetical to RISC philosophy";
Word-alignment goes to the heart of RISC design, but keep in mind the topic is "Other Solutions". Deliberate word-alignment dates back to the CDC-6600 (early RISC) and similar-era CISC machines, where byte addressing in most architectures comes at the cost of speed. The whole concept of RISC is to move operands into registers, whereupon the data can be manipulated. While some RISC architectures can and do manipulate bytes in memory, memory-memory is not part of the elemental RISC concept, but memory-register is.
>As for alignment of instructions, S/3x0 requires alignment of instructions - it's just that it has variable-length instructions, the length quantum of which is 16 bits, so all instructions are aligned on 16-bit boundaries. RISC processors tend to have fixed-length instructions, usually 32 bits, so the length quantum is 32 bits and all instructions are aligned on 32-bit boundaries.
The same hardware that finds it less of a burden to fetch aligned data also prefers instructional alignment. This is a minor point, but manifests itself in two ways, both involving branches (and cases in which one instruction references another). On a CDC-6600, Compass was designed to pad out a word with NOPs following branches as a way to assist pipelining. By itself, that's not noteworthy.
More interesting is that branch targets execute faster when aligned to the width of the data path, i.e, 32, 64 bits, or whatever. The reason is that fetch mechanisms (generally) load from the data width granularity boundary. Thus, exploiting your familiarity of an IBM S/3x0 with a 64-bit data path and a target address of, say, 0xnnnnnnnE, the fetch mechanism actually loads from 0xnnnnnnn8 and discards the first 6 bytes until it arrives at the intended instruction. If the previous bytes contain instructions that pass through the target, then not a lot can be done about it since padding NOPs may not be cost effective. However, if the branch target is not encumbered by IC pass-through, then it makes sense to align the target to the next boundary, typically a doubleword boundary to take advantage of the maximum granularity. It's not an accident that module boundaries (CSECT on 360/370) are aligned on data width boundaries, and good system software designers exploit this with branches to data-width granular boundaries.
Does this fairly illuminate the concept? Instruction sets with single-byte or two-byte instructions have to be mindful of both these constraints. As I said, instruction alignment isn't a crucial point, but clearly falls within the "Other Solutions" under discussion.
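A tiny C sketch of the fetch-boundary arithmetic described above, using an invented 8-byte fetch path and an invented target address (neither tied to any specific machine):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t target      = 0x0000100E;                   /* unaligned branch target */
        uint32_t fetch_width = 8;                             /* data-path width, bytes  */
        uint32_t fetch_from  = target & ~(fetch_width - 1u);  /* 0x00001008              */
        uint32_t discarded   = target - fetch_from;           /* 6 bytes thrown away     */

        printf("fetch starts at 0x%08X; %u byte(s) discarded before the target\n",
               fetch_from, discarded);
        return 0;
    }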
>The issue of block moves and string/decimal instructions is a separate issue from the issue of alignment of integral and floating-point operands. Yes, most RISC processors didn't have single instructions for doing block moves, string operations, and decimal operations; those were done by subroutines (which, I suspect, used algorithms similar to what was used in microcode for those instructions on processors that implemented them in microcode).
This is part of the issue, here. I'm trying to gently explain how microcode works. Why be hostile to the knowledge and experience of someone who is familiar with processors at the development level?
>That's just one component of the standard RISC vs. CISC argument; it's not an issue of requiring word alignment.
We're at risk of losing sight of the "Other Solutions" heading and introductory paragraph: While the RISC philosophy was coming into its own, new ideas about how to dramatically increase performance of the CPUs were starting to develop.
The point is, whether RISC or CISC, odd byte boundaries are costly. Exploiting your excellent knowledge of the Sys/3x0 line, consider a 370 store multiple STM instruction to an odd-byte boundary. Now, consider what it would take to obtain the same effect on the IBM/360. The 370 carries a portmanteau of extra microcode, just to handle that alignment. Whether CISC or RISC, word alignment (actually path width alignment) generally improves performance.
>As for the confusion between word addressability and word alignment, that confusion was present in the paragraph (given the CDC 6600 citations), which is another reason why it needed to go.
I've offered a tutorial regarding the difference and significance in the preceding paragraphs. I've shared with you the reasoning, rationale, and effect of word-alignment. Please don't feel entrenched.
The purpose of talk pages is to discuss changes, not to unilaterally announce the deletion of work someone else put in. You have, I believe, vastly more experience with Wikipedia than many of us, but you should not conclude that the rest of us have vastly less technical experience.
>A discussion of the issue of word alignment might be useful, but it doesn't belong in the "other solutions" section, and needs to accurately describe what was done in RISC architectures (and not just in Alpha, which was one of the later RISC architectures - and which, for better or worse, ended up abandoning the "purity" of not having byte and 16-bit load and store instructions).
The purity? Byte addressability is not a RISC fiefdom.
Keep in mind: "Other Solutions". Not every RISC processor had parallelism, just as not every RISC processor enforced word-alignment. Contrarily, certain CISC processors deployed pipelining, parallelism, and word-alignment in a bid for greater throughput. Word-alignment (data path alignment), while not as sexy or expensive as paralleling or pipelining, was one more tool in the quest for speed. However, before expensive approaches, hardware designers first look to inexpensive solutions, the basics that make a difference.
We haven't begun to mention other concepts considered for RISC as well, such as op-code ordering, used by the Transputer and, I think, the 88100.
(We're tending to speak in absolutes, but if this talk page demonstrates anything, it's that RISC/CISC characteristics aren't absolute, but blurred between architectures.)
>The work was wrong, and should have been undone. (with reference to BE BOLD page)
If you want contributions from people who've worked at this level, hear them out. On the one hand, you confess that you do not have knowledge of hardware logic and the microcode level, and yet you insist others with experience must be wrong? The page you refer to says Be Bold, not Be Imperious, neither being a reason to stamp out the contributions of others.
Clearly the man with the delete key wields power over others. I consider it a token of respect to myself and others for you to restore what you deleted.
--UnicornTapestry (talk) 10:59, 10 December 2007 (UTC)
- The paragraph in question started out talking about alignment in general:
- and then started talking about memory-to-memory instructions:
- For example, the IBM/370 MVC and MVCL instructions could directly move an arbitrary byte string from any memory legal location to any other, regardless of alignment. However, even within the 370, working with word-aligned data locations was considerably faster since precise byte alignment took additional overhead.
- and then spoke of something that's more of an issue of the way that memory-to-memory operations (as opposed to just memory-to-memory instructions) are implemented (for example, C has memory-to-memory operations such as memcpy(), strcpy(), strcmp(), etc., which might be implemented on some platforms as a single instruction, on other platforms as some setup instructions plus a single instruction, and on other platforms as a loop made up from simpler instructions):
- RISC machines, such as the CDC and the DEC Alpha, relied upon block moves of the greater part of the data, deploying register shift instructions to handle the beginning and end (head and tail) of long strings.
- I would not be surprised to hear that at least some processors with block-move/string instructions did the same thing in the implementation of those instructions, i.e. did most of a long move a "word" at a time, and to special-case the beginning and end of the operation (if not, why not?).
- If the technique that paragraph intended to discuss was the removal of memory-to-memory instructions, and synthesizing memory-to-memory operations as loops of memory-to-register and register-to-memory operations, it should speak of that as the technique, not "the use of word-aligned operations" - on all but one commercial RISC processor, and on the S/360 Model 44, you could do, for example, string moves as a loop of byte load and store instructions (that wouldn't be the fastest possible implementation, but it would be a correct implementation).
- If the technique was that of doing the bulk of memory-to-memory operations as a loop (possibly completely unrolled if the length is known at code generation time) of word-at-a-time operations with the beginning and end handled specially, and with operands at different alignments handled specially, it should speak of that as the technique - a technique that could even be used for MVC/MVCL on S/3xx, with the microcode implementing in that fashion.
- The two are perhaps somewhat connected, in that doing the former might induce one to do it in the latter fashion, but are not the same, as one could do the former without the latter (and probably leave some performance on the table as a result), and do the latter without the former (for example, by doing the latter in the microcode for MVC/MVCL). Guy Harris (talk) 01:41, 11 December 2007 (UTC)
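- As a concrete illustration of the word-at-a-time technique discussed above, here is a minimal C sketch of a block copy that handles the head and tail a byte at a time and the bulk a word at a time. It is only a sketch (non-overlapping buffers assumed, 8-byte words chosen arbitrarily), not how any particular processor's microcode or runtime library actually implements it:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static void block_copy(void *dst, const void *src, size_t n) {
        uint8_t       *d = dst;
        const uint8_t *s = src;

        /* Head: byte copies until the destination reaches a word boundary. */
        while (n > 0 && ((uintptr_t)d & 7) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: aligned words; memcpy stands in for a word load/store pair. */
        while (n >= 8) {
            uint64_t w;
            memcpy(&w, s, 8);
            memcpy(d, &w, 8);
            d += 8; s += 8; n -= 8;
        }
        /* Tail: whatever bytes remain. */
        while (n > 0) {
            *d++ = *s++;
            n--;
        }
    }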
RISC, to recap, embodied a number of theories, first and foremost,
- 1. the unbundling of microcode
which was the only true RISC innovation: removal of what had previously been considered an advance in RICH development. Also floated were a number of technologies and techniques that had already been in use or were considered for CISC, to wit:
- 2. word-alignment
- 3. pipelining
and eventually another innovation from CISC,
- 4. paralleling
Still another technique was promoted by INMOS (whether or not they actually innovated it),
- 5. op-code ordering
You clearly are passionate about the subject and I have knowledge to fill in the gaps. With your Wikipedia experience, I would consider it an honor if we were to put our heads together on this. I don't want to leave the topic with a USA Today 'hilites' feel, when I know more was involved.
What do you say?
--UnicornTapestry (talk) 14:46, 11 December 2007 (UTC)
- The first four of those five techniques do seem to be at the core of RISC, although I wouldn't characterize the removal of microcode as much of an innovation; after all there were more than a few pre-microcode machines. RISC doesn't seem to be a technique in itself but rather a design emphasis - that of emphasizing performance over ISA functionality. The central realization enabling this is that providing broad functionality in the ISA has substantial drawbacks and little advantage. (The advantage is mainly allowing designers to appear responsive to the desires of ISA-users.)
- One view is that by minimizing the supported ISA functionality, RISC allows a design to make more efficient use of resources, particularly two: chip real estate and designer attention. By requiring less of each for basic functionality, more of each can be allocated to other areas which advance performance, such as caches, pipelining, branch prediction, etc. There is a synergy as well with some of these: the simplicity of the basic functionality requires less effort in the design of some performance techniques such as pipelining than would a more complex ISA.
- The advantages of RISC were strongest when hardware technology was young and improving most quickly. New process technology could be used first with smaller designs and thus RISC could take advantage of that exactly because it had a simpler core. This is why some RISC designs have had some success in spite of the large bandwagon-effect advantage enjoyed by the CISC x86.
- -R. S. Shaw (talk) 01:44, 12 December 2007 (UTC)
- "Unbundling of microcode" might alternately be described as providing as instructions the micro-operations implemented by the processor, rather than providing instructions that involve multiple micro-operations, such as memory-to-register arithmetic instructions (composed of a load micro-operation and an arithmetic micro-operation) or auto-increment/auto-decrement addressing modes or block move/string/variable-length decimal operations. Some machines, such as the GE-600 series and Honeywell 6000 series, implemented quite complex instruction sets without microcode, and the original Pyramid Technology minicomputers, as I remembered, were considered RISC but were implemented using microcode. (In addition, even on RISC processors some instructions might involve multiple micro-operations, such as integer multiply and divide instructions if done iteratively, shifts if not done with a barrel shifter, POWER/PowerPC/Power Architecture load multiple/store multiple instructions, and some floating-point operations.)
- That might also be a bit clearer than "unbundling" of microcode (especially given that, as I remember, IBM, for example, used "unbundling" to refer to selling software separately from hardware, i.e. I'm not sure it would be immediately obvious what "unbundling" means).
- As per my earlier comments, "word-alignment" isn't sufficient to indicate what is meant. If you mean implementing block move/string/etc. operations with as many word-at-a-time operations as possible, that needs to be clarified. Guy Harris (talk) 09:30, 12 December 2007 (UTC)
- The most basic logic building blocks include NANDs and NORs, the 'atoms' of computing. Those can be assembled to create 'molecules', such as flip-flops and half-adders. By themselves, they still aren't particularly useful, but they can be arranged to form, say, 4 instructions that will load 2 values into registers, add them, and store the result into memory again.
- Those four instructions are essential and represent the simplest useful instructions for a machine language programmer. We might show the algorithm in pseudo-code as:
- LD R1,location1 ; load
- LD R2,location2 ; load
- AD R1,R2 ; add
- ST R1,location1 ; store
- If this takes place on a machine that doesn't require word-alignment (that is, the operands themselves may be unaligned), the pseudo-code might expand to something like this, where x and y are byte offsets within a 32-bit word:
- LD R1,location1-x ; load left partial
- LD R3,location1+4-x ; load right partial
- SL R1,(x*8) ; shift left partial
- SR R3,32-x*8 ; shift right partial
- OR R1,R3 ; OR them for operand 1
-
- LD R2,location2-y ; load left partial
- LD R3,location2+4-y ; load right partial
- SL R2,(y*8) ; shift left partial
- SR R3,32-y*8 ; shift right partial
- OR R2,R3 ; OR them for operand 2
-
- AD R1,R2 ; Whew, NOW we add them!
- LR R2,R1 ; copy the result
-
- SR R1,(x*8) ; shift left partial
- LD R3,location1-x ; load left fragment
- SR R3,(x*8) ; shift out former value
- SL R3,(x*8) ; shift zeros in place
- OR R1,R3 ; left partial ready
- ST R1,location1-x ; store left partial
-
- SL R2,32-x*8 ; shift right partial
- LD R3,location1+4-x ; load right fragment
- SL R3,32-x*8 ; shift out former value
- SR R3,32-x*8 ; shift zeros in place
- OR R2,R3 ; left partial ready
- ST R2,location1+4-x ; store right side
- (It's possible some of the steps can be done with masking. If there's a math error or bug in the above, I DON'T want to hear about it. This is for illustration purposes.)
- Microcode bundled (or combined) multiple instructions into a higher-level instruction, which we'll call ADD.
- ADD location1,location2
- To an assembly language programmer, it looks like one instruction, but internally, it's at least 3 and as many as 20-odd basic instructions utilising registers unavailable to the general public. (In practice, microcode would probably exploit internal double-width registers and masking which cuts the steps in half.)
- As you can surmise, this RICH microcode was a major boon for assembly language programmers, reducing both coding and debugging time. However, these complex combination instructions (CISC) created overhead which RISC developers sought to reduce.
- Guy, I have demonstrated my point in more than one way and I've held out an olive branch. Restore the section, let's laugh about it, and move on.
- I don't see the relevance of the above half-adder analysis to the alignment paragraph question, but in any case, I think I should say that article is better off without the alignment paragraph (IMO of course). Memory alignment has been recognized as important to performance from the very earliest computers, so it does not qualify as one of the "new ideas about how to dramatically increase performance of the CPUs" that were starting to develop. -R. S. Shaw (talk) 23:42, 12 December 2007 (UTC)
- Memory alignment was the only choice for a couple of decades. Subsequently, byte addressing only became a commercial reality in the 1960s, and the 'importance' of word alignment wasn't widely clear until the 1980s. So no, at the risk of alienating everyone, the importance was not recognized since the earliest computers.
- The half-adder and surrounding text was mentioned as part of a rough explanation of hardware layering constructs for software people (in response to the previous post), i.e,
- 1. gates (NANDs, NORs)
- 2. logic (adders, etc)
- 3. basic instructions
- 4. microcode
- At a time when Wikipedia is criticized for inaccuracy, I'm troubled that opinion is overriding fact. For example, op-code ordering was summarily dismissed above as a throughput-boosting technique, even though it was a key transputer inclusion. Are we to say if we didn't happen to read it in Wired, it didn't exist?
- I've gone to a lot of trouble to document and explain, and I hope educate a little. It would be great if other engineers with experience at this level would contribute. But, it's up to you guys. If the group wants a less complete (in my opinion, of course!) picture of 'Other Solutions', then public opinion trumps.
- The introduction of sub-word addressing, notably with System 360, included direct recognition of the importance of word alignment for word-sized operands like the binary numeric types. The later introduction of unaligned operands (in 370s) was certainly not done for performance, but for unrelated benefits such as programmer productivity or competitive advantage through feature introduction. To say that 'importance' of alignment was not understood before the 1980s seems more like opinion than fact. -R. S. Shaw (talk) 20:48, 13 December 2007 (UTC)
- With all due respect, I did not say 'understood' at all. The point that was addressed much earlier in this section referred to a quest for speed, which required challenging accepted practices.
- I've been struggling to explain what happens at the microcode level, not at the programming or even application level, i.e, the process not the result. It's a long section, but I recommend perusing it.
- I've also attempted without success to offer a technical/historical perspective of innovations within the industry from the 6600 through the Alpha which some (especially DEC) considered the peak of RISC development.
- BTW, it occurred to me that when you spoke of "the very earliest computers", that could mean very different things to someone who came on-line (computationally) in the 1990s, 1970s, 1950s, or ... the 1930s.
- Long but interesting read. Reminds me of pigeons in the park, tapestry. Build a rock solid foundation, but if they can't knock you off the pedestal, then they'll crap all over you.
- One thing I never saw explained: what is op code ordering? --198.161.33.146 (talk) 00:38, 3 January 2008 (UTC)
- INMOS, for example, realized that some instructions were used far more often than others. They ran an analysis of numerous programs to determine which instructions were used the most and which the least. That information was used to determine which instructions received highest priority and, if I recall correctly, the shortest instruction length. (That was a long time ago!)
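- A toy C sketch of that kind of frequency analysis, with invented opcode names and counts: tally how often each operation appears in a trace, then hand the most frequent ones the shortest (highest-priority) encodings.

    #include <stdio.h>
    #include <stdlib.h>

    struct op { const char *name; long count; };   /* invented trace statistics */

    static int by_count_desc(const void *a, const void *b) {
        const struct op *x = a, *y = b;
        return (y->count > x->count) - (y->count < x->count);
    }

    int main(void) {
        struct op trace[] = {
            { "load", 5200 }, { "add", 4700 }, { "store", 3100 },
            { "branch", 2600 }, { "mul", 300 }, { "div", 40 },
        };
        size_t n = sizeof trace / sizeof trace[0];

        qsort(trace, n, sizeof trace[0], by_count_desc);

        /* Rank 0 gets the shortest encoding, rank 1 the next shortest, and so on. */
        for (size_t i = 0; i < n; i++)
            printf("encoding rank %zu: %-6s (%ld uses)\n",
                   i, trace[i].name, trace[i].count);
        return 0;
    }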