Distributed computing
Distributed computing is decentralized, parallel computing that uses two or more computers communicating over a network to accomplish a common objective or task. The types of hardware, programming languages, operating systems and other resources involved may vary drastically. It is similar to computer clustering, the main difference being a wide geographic dispersion of the resources.
Organization
Organizing the interaction between the participating computers is of prime importance. To accommodate the widest possible range and types of computers, the protocol or communication channel should not rely on information that certain machines may not understand. Special care must also be taken that messages are indeed delivered correctly and that invalid messages are rejected; an invalid message that is accepted could otherwise bring down the system, and perhaps the rest of the network.
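As an illustration, a minimal Python sketch of such message validation might look like the following; the field names and the SHA-256 checksum scheme are illustrative assumptions, not part of any standard protocol.

```python
import hashlib
import json

def validate_message(raw: bytes) -> dict:
    """Parse a message, rejecting anything malformed rather than
    letting a bad message propagate and destabilize the system."""
    try:
        msg = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"undecodable message: {exc}")

    # Require exactly the fields that every participating machine understands.
    required = {"sender", "payload", "checksum"}
    if set(msg) != required:
        raise ValueError(f"unexpected fields: {set(msg) ^ required}")

    # Verify integrity: the checksum must match the payload.
    digest = hashlib.sha256(msg["payload"].encode("utf-8")).hexdigest()
    if digest != msg["checksum"]:
        raise ValueError("checksum mismatch: message corrupted in transit")
    return msg

# A well-formed message passes; a corrupted one raises ValueError.
payload = "hello"
good = json.dumps({"sender": "node-1", "payload": payload,
                   "checksum": hashlib.sha256(payload.encode()).hexdigest()})
print(validate_message(good.encode())["payload"])  # -> hello
```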
Another important factor is the ability to send software to another computer in a portable way, so that it can execute and interact with the existing network. This may not always be possible or practical when the hardware and resources differ, in which case other methods must be used, such as cross-compiling or manually porting the software.
Goals and advantages
There are many different types of distributed computing systems and many challenges to overcome in successfully designing one. The main goal of a distributed computing system is to connect users and resources in a transparent, open, and scalable way. Ideally this arrangement is drastically more fault-tolerant and more powerful than many combinations of stand-alone computer systems.
Openness
Openness is the property of distributed systems such that each subsystem is continually open to interaction with other systems (see references). Web Services protocols are standards which enable distributed systems to be extended and scaled. In general, an open system that scales has an advantage over a perfectly closed and self-contained system.
Consequently, open distributed systems are required to meet the following challenges:
- Monotonicity — Once something is published in an open distributed system, it cannot be taken back.
- Pluralism — Different subsystems of an open distributed system include heterogeneous, overlapping and possibly conflicting information. There is no central arbiter of truth in open distributed systems.
- Unbounded nondeterminism — Different subsystems can come up and go down, and communication links between them can appear and disappear, all asynchronously. The time that it will take to complete an operation therefore cannot be bounded in advance (see unbounded nondeterminism).
Scalability
A scalable system is one that can easily be altered to accommodate changes in the number of users, resources and computing entities assigned to it. Scalability can be measured along three different dimensions:
- Load scalability — A distributed system should make it easy to expand and contract its resource pool to accommodate heavier or lighter loads.
- Geographic scalability — A geographically scalable system maintains its usefulness and usability regardless of how far apart its users or resources are.
- Administrative scalability — No matter how many different organizations need to share a single distributed system, it should remain easy to use and manage.
Some loss of performance may occur in a system that allows itself to scale in one or more of these dimensions.
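As a sketch of load scalability, the fragment below serves a light or heavy load with the same code simply by resizing its worker pool; the task function and worker counts are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(task_id: int) -> str:
    # Placeholder for real work (I/O, computation, etc.).
    return f"task {task_id} done"

def run(tasks, workers: int) -> list:
    # Load scalability: the resource pool expands or contracts
    # to match the load, without changing the surrounding code.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, tasks))

print(len(run(range(4), workers=2)))     # light load, small pool
print(len(run(range(100), workers=32)))  # heavier load, larger pool
```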
Drawbacks and disadvantages
Leslie Lamport describes a distributed system like this: "You know you have one when the crash of a computer you've never heard of stops you from getting any work done."
Architecture
Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of several loosely coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system.
- Client-server — Smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change. (A minimal sketch of this pattern follows this list.)
- 3-tier architecture — Three tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are 3-Tier.
- N-tier architecture — N-Tier refers typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
- Tightly coupled (clustered) — refers typically to a set of highly integrated machines that run the same process in parallel, subdividing the task in parts that are made individually by each one, and then put back together to make the final result.
- Peer-to-peer — an architecture where there is no special machine or machines that provide a service or manage the network resources. Instead all responsibilities are uniformly divided among all machines, known as peers.
- Service-oriented — The system is organized as a set of highly reusable services offered through standardized interfaces.
- Mobile code — Based on the architectural principle of moving processing as close as possible to the source of data.
- Replicated repository — The repository is replicated across the distributed system to support online/offline processing, provided that the resulting lag in data updates is acceptable.
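The client-server sketch referenced above, in Python; the address, port and the data being served are illustrative assumptions, and both roles run in one process purely so the example is self-contained.

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 9099  # hypothetical local address for the demo

srv = socket.create_server((HOST, PORT))  # server is listening from here on

def serve_one():
    # The server owns the authoritative data; the client merely asks for it.
    conn, _ = srv.accept()
    with conn:
        key = conn.recv(1024).decode()
        data = {"temperature": "21C"}  # stand-in for real server-side state
        conn.sendall(data.get(key, "unknown").encode())

threading.Thread(target=serve_one, daemon=True).start()

# The "smart client": contacts the server for data, then formats it
# for the user; only a permanent change would be committed back.
with socket.create_connection((HOST, PORT)) as client:
    client.sendall(b"temperature")
    print("Current temperature:", client.recv(1024).decode())
srv.close()
```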
Concurrency
Distributed computing implements a kind of concurrency.
Multiprocessor systems
A multiprocessor system is simply a computer that has more than one CPU on its motherboard. If the operating system is built to take advantage of this, it can run different processes on different CPUs, or different threads belonging to the same process.
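A minimal Python sketch of this idea: each worker below is a separate OS process, which the operating system is free to schedule on a different CPU; the worker count and workload are illustrative.

```python
import multiprocessing as mp
import os

def work(n: int):
    # Each call runs in its own process, so a multiprocessor
    # machine can execute several of them truly in parallel.
    return os.getpid(), sum(i * i for i in range(n))

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool:  # ideally one process per CPU
        for pid, result in pool.map(work, [10**5] * 4):
            print(f"process {pid} computed {result}")
```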
Over the years, many different multiprocessing options have been explored for use in distributed computing. Intel CPUs employ a technology called Hyper-Threading that allows more than one thread (usually two) to run on the same CPU. The most recent Sun UltraSPARC T1, Athlon 64 X2 and Intel Pentium D processors feature multiple processor cores to further increase the number of concurrent threads they can run.
Multicomputer systems
A multicomputer system is a system made up of several independent computers interconnected by a telecommunications network.
Multicomputer systems can be homogeneous or heterogeneous. A homogeneous distributed system is one where all CPUs are similar and are connected by a single type of network. Such systems are often used for parallel computing, a kind of distributed computing in which every computer works on a different part of a single problem.
In contrast, a heterogeneous distributed system can be made up of all sorts of different computers, potentially with vastly differing memory sizes, processing power and even basic underlying architecture. Heterogeneous systems are in widespread use today, with many companies adopting them because of the speed with which hardware becomes obsolete and the cost of upgrading a whole system simultaneously.
Computing taxonomies
The types of distributed computers are based on Flynn's taxonomy of systems: single instruction, single data (SISD); multiple instruction, single data (MISD); single instruction, multiple data (SIMD); and multiple instruction, multiple data (MIMD). Other taxonomies and architectures are described at Computer architecture and in Category:Computer architecture.
Computer clusters
A cluster consists of multiple stand-alone machines acting in parallel across a local high-speed network. Distributed computing differs from cluster computing in that computers in a distributed computing environment are typically not exclusively running "group" tasks, whereas clustered computers are usually much more tightly coupled. This difference makes distributed computing attractive: when properly configured, it can use computational resources that would otherwise go unused, and it can enable computations that would otherwise be impossible.
The Second Life grid is a heterogeneous multicomputer and so are most Beowulf clusters.
Grid computing
A grid uses the resources of many separate computers connected by a network (usually the Internet) to solve large-scale computation problems. Most grids use the idle time of many thousands of computers throughout the world. Such arrangements permit handling of data that would otherwise require the power of expensive supercomputers or would have been impossible to analyze.
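The coordination pattern behind such grids can be sketched as follows; the function names are illustrative, not a real grid API, and in an actual grid each work unit would run on a different idle machine rather than in a local loop.

```python
# Split one large problem into independent work units, farm them out,
# and merge the partial results.

def split_into_units(data, unit_size):
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]

def process_unit(unit):
    # Stand-in for the expensive computation a volunteer machine performs.
    return sum(x * x for x in unit)

def merge(partial_results):
    return sum(partial_results)

data = list(range(1_000_000))
units = split_into_units(data, unit_size=100_000)
# In a real grid, each of these calls would be dispatched to a
# different idle computer and the results collected asynchronously.
print(merge(process_unit(u) for u in units))
```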
Distributed computing projects also often involve competition with other distributed systems. This competition may be for prestige, or it may be a means of enticing users to donate processing power to a specific project. For example, stat races are a measure of the work a distributed computing project has been able to compute over the past day or week. This has been found to be so important in practice that virtually all distributed computing projects offer online statistical analyses of their performances, updated at least daily if not in real-time.
See List of distributed computing projects for more information on specific projects.
Examples
An example of a distributed system is the World Wide Web. As you are reading a web page, you are actually using the distributed system that comprises the site. As you are browsing the web, your web browser running on your own computer communicates with different web servers that provide web pages. Possibly, your browser uses a proxy server to access the web contents stored on web servers faster and more securely. To find these servers, it also uses the distributed domain name system. Your web browser communicates with all of these servers over the Internet, via a system of routers which are themselves part of a large distributed system.
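The distributed pieces a browser touches can be made concrete with a short Python sketch (it needs network access; example.com is a reserved demonstration domain): first the distributed domain name system resolves the name, then an HTTP request travels across the Internet's routers to one of the site's web servers.

```python
import socket
import urllib.request

# Step 1: the distributed DNS maps the host name to addresses.
host = "example.com"
addresses = {info[4][0] for info in socket.getaddrinfo(host, 80)}
print(f"DNS resolved {host} to {addresses}")

# Step 2: fetch a page from one of the site's web servers, over a
# network of routers the client never sees directly.
with urllib.request.urlopen(f"http://{host}/") as response:
    page = response.read()
print(f"Fetched {len(page)} bytes from {host}")
```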
See also
- Algorithms
- Memory and communication models
- History
- Physical computing hardware topology
- Render farms — The rendering of 3D computer images and movies is often spread between several computers to speed up the process.
- Network of Workstations
- Supercomputer
- Flash mob computing
- Lists
- List of distributed computing publications
- List of distributed computing projects
- Distributed computing category
- Other
- Parallel computing
- Application server
- Software componentry
- Distributed computing environment
- Fallacies of Distributed Computing
References
- Davies, Antony (2004). "Computational Intermediation and the Evolution of Computation as a Commodity" (PDF). Applied Economics.
- Kornfeld, William (1981). "The Scientific Community Metaphor". MIT AI Memo 641.
- Hewitt, Carl (1983). "Analyzing the Roles of Descriptions and Actions in Open Systems". Proceedings of the National Conference on Artificial Intelligence.
- Hewitt, Carl (1985). "The Challenge of Open Systems". Byte Magazine.
- Hewitt, Carl (October 1990). "Towards Open Information Systems Semantics". Proceedings of the 10th International Workshop on Distributed Artificial Intelligence. Bandera, Texas.
- Hewitt, Carl (1991). "Open Information Systems Semantics". Journal of Artificial Intelligence.
Lists of noteworthy things
Distributed computing infrastructure
- Moab Grid Suite — Cluster workload management, reporting tools, and end user submission portal
- Remote procedure call — This high-level communication mechanism allows processes on different machines to communicate using procedure calls even though they don't share the same address space. (A minimal XML-RPC sketch follows this list.)
- Distributed objects — Systems like CORBA, Microsoft D/COM, Java RMI, ReplicaNet, and others that try to map object-oriented design onto the network.
- SOAP
- XML-RPC
- GLOBE — Maintained by Andrew S. Tanenbaum and others.
- Group Communication
- Acute — Distributed functional programming with migration, based on OCaml.
- PYRO — Python Remote Objects, a free-software distributed object technology.
- RPyC — Remote Python Call, a fully transparent, symmetrical RPC and distributed-computing mechanism for Python.
- Distributed Computing Toolbox for MATLAB — enables distributing MATLAB applications using technologies like MPI or LSF
- BOINC — Berkeley Open Infrastructure for Network Computing
- Globus Toolkit — an open source software toolkit used for building Grid systems and applications
- Beowulf clusters — Linux-based parallel computing using commodity hardware.
- Cosm
- Proprietary infrastructure
- Cluster Resources, Inc. — Software workload and resource management tools
- DataSynapse
- Digipede — Has a commercial, all-.NET distributed computing solution for Windows.
- Entropia — (Defunct) Vendor of distributed computing software technologies.
- ICE
- OfficeGRID — A grid solution from MESH-Technologies A/S.
- Parabon Computation — One of the largest commercial distributed computing networks.
- Popular Power — (Defunct) Built a platform for Internet-wide distributed computing.
- United Devices — One of the largest commercial distributed computing networks.
- Xgrid — Software developed by Apple's Advanced Computation Group.
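The remote-procedure-call sketch referenced above, using Python's standard-library XML-RPC modules; the function name, port and single-process setup are illustrative, chosen only to keep the example self-contained.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Server side: expose an ordinary function under a remote name.
server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: call add() as if it were local; the call actually
# travels over the network into the server's address space.
proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))  # -> 5
server.shutdown()
```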
Distributed computing conferences and journals
- The International Conference on Dependable Systems and Networks
- ACM Symposium on Principles of Distributed Computing
- Journal of Parallel and Distributed Computing
- IEEE Transactions on Parallel and Distributed Systems
- Distributed Computing
- ACM/IEEE Distributed Computing on Sensor Systems
Notable people
The following is a list of people who have contributed significantly to distributed computing research:
- Foundations and Principles
Gul Agha, Henry Baker, James Aspnes, Hagit Attiya, Will Clinger, Danny Dolev, Shlomi Dolev, Ian Foster, Michael J. Fischer, Vassos Hadzilacos, Carl Hewitt, Leslie Lamport, Nancy Lynch, Michael Merritt, Paul Spirakis, Sam Toueg, Aki Yonezawa
- Systems
Ken Birman, Frans Kaashoek, Barbara Liskov, Andrew Tanenbaum
Currently operating projects
List of distributed computing projects
External links
- DCS-Wiki Scientific Wiki for Complex and Distributed Systems
- distributedcomputing.info
- How-To: Join Distributed Computing projects that benefit humanity
- A primer on distributed computing
- Distributed Systems Department
- Distributed.net: Node Zero
- Cluster Builder
- Majestic-12 Distributed Search Engine
- Distributed Systems at the University of Patras
- http://www.climateprediction.net
- Project Statistics provided by Team Free-DC