Software testing

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Kgf0 (talk | contribs) at 23:23, 27 July 2006 (Agile vs. Traditional: copyedit, add book refs). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Software testing is the process used to help identify the correctness, completeness, security and quality of developed computer software. Testing is a process of executing a program or application with the intent of finding errors. With that in mind, testing can never completely establish the correctness of arbitrary computer software. In other words, testing is criticism or comparison: comparing the actual value with an expected one.

There are many approaches to software testing, but effective testing of complex products is essentially a process of investigation, not merely a matter of creating and following rote procedure. One definition of testing is "the process of questioning a product in order to evaluate it", where the "questions" are things the tester tries to do with the product, and the product answers with its behavior in reaction to the probing of the tester. Although most of the intellectual processes of testing are nearly identical to those of review or inspection, the word testing connotes the dynamic analysis of the product: putting the product through its paces. A good test is one that finds an as yet undiscovered error. The quality of an application can, and normally does, vary widely from system to system, but some of the common quality attributes include reliability, stability, portability, maintainability and usability. Refer to the ISO standard ISO 9126 for a more complete list of attributes and criteria.


Introduction

In general, software engineers distinguish software faults from software failures. In a failure, the software does not do what the user expects. A fault is a programming error that may or may not actually manifest as a failure. A fault can also be described as an error in the semantics of a computer program. A fault becomes a failure only if the exact computation conditions are met, one of them being that the faulty portion of the software executes on the CPU. A fault can also turn into a failure when the software is ported to a different hardware platform or a different compiler, or when the software is extended.
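The distinction between a fault and a failure can be illustrated with a minimal sketch. The function average below is a hypothetical example, not taken from any particular product: it contains a fault (empty input is not handled) that only becomes a failure for one specific input.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers.

    Fault: an empty list is not handled, so the division below can fail.
    """
    return sum(values) / len(values)


# The faulty line executes here, but the conditions for a failure are not met.
print(average([2, 4, 6]))

# Only this input turns the latent fault into an observable failure.
try:
    average([])
except ZeroDivisionError as exc:
    print("failure observed:", exc)
```

A test suite that never passes an empty list would execute the faulty line many times without ever observing a failure.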

Software testing may be viewed as a sub-field of software quality assurance (SQA) but typically exists independently (and some companies have no SQA function at all). In SQA, software process specialists and auditors take a broader view of software and its development. They examine and change the software engineering process itself to reduce the number of faults that end up in the code, or to deliver software faster.

Regardless of the methods used or the level of formality involved, the desired result of testing is a level of confidence that the software has an acceptable defect rate. What constitutes an acceptable defect rate depends on the nature of the software. An arcade video game designed to simulate flying an airplane would presumably have a much higher tolerance for defects than software used to control an actual airliner.

A problem with software testing is that the number of defects in a software product can be very large, and the number of configurations of the product larger still. Bugs that occur infrequently are difficult to find in testing. A rule of thumb is that a system that is expected to function without faults for a certain length of time must have already been tested for at least that length of time. This has severe consequences for projects to write long-lived reliable software.

A common practice is for software testing to be performed by an independent group of testers after the software product is finished and before it is shipped to the customer. This practice often results in the testing phase being used as a project buffer to compensate for project delays. Another practice is to start software testing at the moment the project starts and to continue it until the project finishes.

Another common practice is for test suites to be developed during technical support escalation procedures. Such tests are then maintained in regression testing suites to ensure that future updates to the software don't repeat any of the known mistakes.

It is commonly believed that the earlier a defect is found the cheaper it is to fix it.

In counterpoint, some emerging software disciplines, such as extreme programming and the agile software development movement, adhere to a "test-driven software development" model. In this process unit tests are written first, by the programmers (often with pair programming in the extreme programming methodology). These tests are expected to fail initially. Then, as code is written, it passes incrementally larger portions of the test suites. The test suites are continuously updated as new failure conditions and corner cases are discovered, and they are integrated with any regression tests that are developed.
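The test-first cycle can be sketched with Python's standard unittest module. The function is_leap_year and its tests are hypothetical examples chosen for illustration. In a test-first workflow the test class would be written before the function body, and would fail until the function was implemented.

```python
import unittest


def is_leap_year(year):
    """Hypothetical unit under test, written after the tests below existed."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTest(unittest.TestCase):
    def test_ordinary_year(self):
        self.assertFalse(is_leap_year(2023))

    def test_divisible_by_four(self):
        self.assertTrue(is_leap_year(2024))

    def test_century_corner_cases(self):
        # Corner cases added as they were discovered.
        self.assertFalse(is_leap_year(1900))
        self.assertTrue(is_leap_year(2000))


def run_suite():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LeapYearTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

Calling run_suite() from a build script is one possible way to integrate such tests into the build process.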

Unit tests are maintained along with the rest of the software source code and generally integrated into the build process (with inherently interactive tests being relegated to a partially manual build acceptance process).

The software, tools, samples of data input and output, and configurations are all referred to collectively as a test harness.

White-box and black-box testing

In the terminology of testing professionals (software and some hardware), the phrases "white box" (or "glass box") and "black box" testing refer to whether the test case developer has access to the source code of the software under test, and to whether the testing is done through (simulated) user interfaces or through application programming interfaces, either published by the target or internal to it.

In white box testing the test developer has access to the source code and can write code that links into the libraries which are linked into the target software. This is typical of unit tests, which only test parts of a software system. They ensure that components used in the construction are functional and robust to some degree.

In black box testing the test engineer only accesses the software through the same interfaces that the customer or user would, or possibly through remotely controllable automation interfaces that connect another computer or another process into the target of the test. For example, a test harness might push virtual keystrokes and mouse or other pointer operations into a program through any inter-process communications mechanism, with the assurance that these events are routed through the same code paths as real keystrokes and mouse clicks.
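The difference between the two styles can be sketched on a small scale. The Counter class below is a hypothetical component. The black box check uses only its public interface, while the white box check relies on knowledge of the source code (here, a private attribute).

```python
class Counter:
    """Hypothetical component under test."""

    def __init__(self):
        self._count = 0  # internal state, not part of the public interface

    def increment(self):
        self._count += 1

    def value(self):
        return self._count


def black_box_test():
    """Use only the interface that a customer or user would use."""
    counter = Counter()
    counter.increment()
    counter.increment()
    return counter.value() == 2


def white_box_test():
    """Rely on knowledge of the implementation (the private attribute)."""
    counter = Counter()
    counter.increment()
    return counter._count == 1
```

If the internal representation changed, the white box check would have to be rewritten even though the externally visible behavior stayed the same.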

In recent years the term grey box testing has come into common usage. The typical grey box tester is permitted to set up or manipulate the testing environment, for example by seeding a database, and can view the state of the product after their actions, for example by performing a SQL query on the database to be certain of the values of columns. The term is used almost exclusively for client-server testers or others who use a database as a repository of information, but it can also apply to a tester who has to manipulate XML files (a DTD or an actual XML file) or configuration files directly. It can also be applied to testers who know the internal workings or algorithm of the software under test and can write tests specifically for the anticipated results.
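A grey box test of this kind can be sketched with Python's standard sqlite3 module. The table layout and the function register_user are assumptions made for this example. The test seeds a database, exercises the product code, and then inspects the stored state with a SQL query.

```python
import sqlite3


def register_user(conn, name):
    """Hypothetical product code under test: stores a new, active user."""
    conn.execute("INSERT INTO users (name, active) VALUES (?, 1)", (name,))
    conn.commit()


# Set up the environment: an in-memory database seeded with one known row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, active INTEGER)")
conn.execute("INSERT INTO users (name, active) VALUES ('seed', 0)")

# Exercise the product through its normal entry point.
register_user(conn, "alice")

# Inspect the resulting state directly with a SQL query.
rows = conn.execute("SELECT name, active FROM users ORDER BY name").fetchall()
assert rows == [("alice", 1), ("seed", 0)]
```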

Alpha, Beta, and Gamma testing

In the first phase of alpha testing, developers test the software using white box techniques. Additional inspection is then performed using black box or grey box techniques, usually by a dedicated testing team. This is often known as the second stage of alpha testing.

Once the alpha phase is complete, development enters the beta phase. Versions of the software, known as beta versions, are released to a limited audience outside of the company so that further testing can ensure the product has few faults or bugs. Sometimes beta versions are made available to the general public to gather feedback from as many future users as possible.

Testing during the beta phase, informally called beta testing, is generally constrained to black box techniques, although a core of test engineers is likely to continue with white box testing in parallel with the beta tests. Thus the term beta test can refer to the stage of the software (closer to release than being "in alpha") or to the particular group and process active at that stage. So a tester might continue to work in white box testing while the software is "in beta" (a stage) but would then not be part of "the beta test" (group/activity).

Gamma testing is a little-known informal phrase that refers derisively to the release of "buggy" (defect-ridden) products. It is not a term of art among testers, but rather an example of referential humor. Cynics have referred to all software releases as "gamma testing", since defects are eventually found in almost all commercial, commodity and publicly available software. (Some classes of embedded and highly specialized process control software are tested far more thoroughly and subjected to other forms of rigorous software quality assurance, particularly those that control "life critical" equipment where a failure can result in injury or death; see Ivars Peterson's Fatal Defect for counterexamples.)

Where alpha and beta refer to stages of the software before release (and also, implicitly, to the size of the testing community and the constraints on the testing methods), white box, black box, and grey box refer to the ways in which the tester accesses the target.

System testing

Main article: System testing

Most software produced today is modular. System testing is a phase of software testing in which testers check for communication flaws between modules, where information is either not passed or passed incorrectly.

More generally, system testing attempts to discover defects that are properties of the entire system rather than of its individual components.

Regression testing

Main article: Regression testing

A regression test re-runs previous tests against the changed software to ensure that the changes made in the current software do not affect the functionality of the existing software. Regression testing can be performed either by hand or by software that automates the process. Regression testing can be performed at unit, module, system or project level.
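An automated regression suite can be as simple as a table of previously failing inputs that is re-run after every change. In the sketch below, the function parse_price and the recorded cases are hypothetical and stand in for defects that would have come from real bug reports.

```python
def parse_price(text):
    """Hypothetical function that once failed on inputs with surrounding spaces."""
    return round(float(text.strip().lstrip("$")), 2)


# Each entry records an input that exposed a defect in the past, together
# with the result that the fixed code is expected to produce.
REGRESSION_CASES = [
    ("$4.50", 4.5),
    (" 4.50 ", 4.5),  # hypothetical bug report: whitespace was rejected
    ("$0", 0.0),
]


def run_regression_suite():
    """Return the inputs whose results no longer match the recorded ones."""
    return [text for text, expected in REGRESSION_CASES
            if parse_price(text) != expected]


if __name__ == "__main__":
    print("regressions:", run_regression_suite())
```

An empty list means that none of the known mistakes has reappeared in the changed software.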

Test cases, suites, scripts and scenarios

Black box testers usually write test cases for the majority of their testing activities. A test case is usually a single step and its expected result, along with various additional pieces of information. It can occasionally be a series of steps with one expected result or expected outcome. The optional fields are a test case ID, test step or order of execution number, related requirement(s), depth, test category, author, and check boxes for whether the test is automatable and has been automated. Larger test cases may also contain prerequisite states or steps, and descriptions. A test case should also contain a place for the actual result. Test cases can be stored in a word processor document, spreadsheet, database or other common repository. A database system may also record past test results, who generated them, and the system configuration used to generate them. These past results would usually be stored in a separate table.
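The fields listed above map naturally onto a record type. The following sketch uses a Python dataclass. The class name TestCaseRecord and the field names are assumptions chosen for this example, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TestCaseRecord:
    # Required: a single step and its expected result.
    step: str
    expected_result: str
    # A place for the actual result, filled in when the test is executed.
    actual_result: Optional[str] = None
    # Optional bookkeeping fields.
    case_id: Optional[str] = None
    execution_order: Optional[int] = None
    requirements: List[str] = field(default_factory=list)
    category: Optional[str] = None
    author: Optional[str] = None
    automatable: bool = False
    automated: bool = False

    def passed(self) -> Optional[bool]:
        """Return None if the case has not been run, else compare the results."""
        if self.actual_result is None:
            return None
        return self.actual_result == self.expected_result
```

For example, a record created with TestCaseRecord(step="Click Save", expected_result="File written") reports passed() as None until an actual result is recorded.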

The most common term for a collection of test cases is a test suite. The test suite often also contains more detailed instructions or goals for each collection of test cases. It definitely contains a section where the tester identifies the system configuration used during testing. A group of test cases may also contain prerequisite states or steps, and descriptions of the following tests.

Collections of test cases are sometimes incorrectly termed a test plan. They may also be called a test script, or even a test scenario.

Most white box testers write and use test scripts in unit, system, and regression testing. Test scripts should be written for the modules with the highest risk of failure and the highest impact if that risk becomes an issue. Most companies that use automated testing call the code used for it their test scripts.

A scenario test is a test based on a hypothetical story used to help a person think through a complex problem or system. A scenario can be as simple as a diagram for a testing environment, or it can be a description written in prose. The ideal scenario test has five key characteristics. It is (a) a story that is (b) motivating, (c) credible, (d) complex, and (e) easy to evaluate. Scenarios usually differ from test cases in that test cases are single steps whereas scenarios cover a number of steps. Test suites and scenarios can be used in concert for complete system tests. See An Introduction to Scenario Testing.

Scenario testing is similar to, but not the same as session-based testing, which is more closely related to exploratory testing, but the two concepts can be used in conjunction. See Adventures in Session-Based Testing and Session-Based Test Management.

A sample testing cycle

Although testing varies between organizations, there is a cycle to testing:

  1. Requirements Analysis: Testing should begin in the requirements phase of the software development life cycle.
  2. Design Analysis: During the design phase, testers work with developers to determine what aspects of a design are testable and under what parameters those tests work.
  3. Test Planning: Test Strategy, Test Plan(s), Test Bed creation.
  4. Test Development: Test Procedures, Test Scenarios, Test Cases, Test Scripts to use in testing software.
  5. Test Execution: Testers execute the software based on the plans and tests and report any errors found to the development team.
  6. Test Reporting: Once testing is completed, testers generate metrics and make final reports on their test effort and whether or not the software tested is ready for release.
  7. Retesting the Defects

Not all errors or defects reported must be fixed by a software development team. Some may be caused by errors in configuring the test software to match the development or production environment. Some defects can be handled by a workaround in the production environment. Others might be deferred to future releases of the software, or the deficiency might be accepted by the business user. Still other reports may be rejected by the development team, with due reason, if the team judges that the reported behavior is not actually a defect.

Code coverage

Main article: Code coverage

Code coverage is inherently a white box testing activity. The target software is built with special options or libraries, and/or run under a special environment, such that every function that is exercised (executed) in the program(s) is mapped back to the function points in the source code. This process allows developers and quality assurance personnel to look for parts of a system that are rarely or never accessed under normal conditions (error handling and the like) and helps reassure test engineers that the most important conditions (function points) have been tested.

Test engineers can look at code coverage test results to help them devise test cases and input or configuration sets that will increase the code coverage over vital functions. Two common forms of code coverage used by testers are statement (or line) coverage and branch (or edge) coverage. Line coverage reports on the execution footprint of testing in terms of which lines of code were executed to complete the test. Edge coverage reports which branches, or code decision points, were executed to complete the test. Both report a coverage metric, measured as a percentage.
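The difference between line coverage and edge coverage can be shown with a hand-instrumented sketch. Real coverage tools insert equivalent probes automatically. The function clamp_to_zero and the probe names below are assumptions made for illustration only.

```python
# The statements and branch outcomes that exist in the function below.
ALL_STATEMENTS = {"if", "assign", "return"}
ALL_BRANCHES = {("if", True), ("if", False)}

statements_hit = set()
branches_hit = set()


def clamp_to_zero(x):
    """Return x, or 0 if x is negative. Probes record what was executed."""
    statements_hit.add("if")
    negative = x < 0
    branches_hit.add(("if", negative))  # which way the decision went
    if negative:
        statements_hit.add("assign")
        x = 0
    statements_hit.add("return")
    return x


def percent(hit, total):
    """Coverage metric: share of known items that were executed."""
    return 100.0 * len(hit & total) / len(total)


# A single test with a negative input executes every statement ...
clamp_to_zero(-5)
statement_coverage = percent(statements_hit, ALL_STATEMENTS)
# ... but exercises only one of the two outcomes of the decision.
branch_coverage = percent(branches_hit, ALL_BRANCHES)
print(statement_coverage, branch_coverage)
```

With this single test, statement coverage is 100% while edge coverage is 50%. Adding a test with a non-negative input, such as clamp_to_zero(3), covers the remaining branch.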

Generally code coverage tools and libraries exact a performance and/or memory or other resource cost which is unacceptable to normal operations of the software. Thus they are only used in the lab. As one might expect there are classes of software that cannot be feasibly subjected to these coverage tests, though a degree of coverage mapping can be approximated through analysis rather than direct testing.

There are also some sorts of defects which are affected by such tools. In particular some race conditions or similar real time sensitive operations can be masked when run under code coverage environments; and conversely some of these defects may become easier to find as a result of the additional overhead of the testing code.

Controversy

There is considerable controversy among testing writers and consultants about what constitutes responsible software testing. Members of the "context-driven" school of testing believe that there are no "best practices" of testing, but rather that testing is a set of skills that allow the tester to select or invent testing practices to suit each unique situation. This belief directly contradicts standards such as the IEEE 829 test documentation standard, and the position of organizations such as the Food and Drug Administration that promote them.

Some of the major controversies include:

Agile vs. Traditional

Starting around 1990, a new style of writing about testing began to challenge what had come before. The seminal work in this regard is widely considered to be Testing Computer Software, by Cem Kaner (1988, ISBN 083069563X; as of 1999 in a 3rd edition, ISBN 1850329087). Instead of assuming that testers have full access to source code and complete specifications, these writers, including Kaner and James Bach, argued that testers must learn to work under conditions of uncertainty and constant change. Meanwhile, an opposing trend toward process "maturity" also gained ground, in the form of the Capability Maturity Model (CMM). The agile testing movement (which includes but is not limited to forms of testing practiced on agile development projects) is popular mainly in commercial circles, whereas the CMM was embraced by government and military software providers.

Exploratory vs. Scripted

Exploratory testing means simultaneous learning, test design, and test execution. Scripted testing means that learning and test design happen prior to test execution, and quite often the learning has to be done again during test execution. Exploratory testing is very common, but in most writing and training about testing it is barely mentioned and generally misunderstood. Some writers consider it a primary and essential practice. Structured exploratory testing is a compromise that can be used when the testers are familiar with the software. A loosely specified test plan, known as a test charter, is written up, describing what functionality needs to be tested but not how, allowing the individual testers to choose the method and steps of testing.

There are two main disadvantages associated with a primarily exploratory testing approach. The first is that the opportunity to prevent defects is lost: designing tests in advance serves as a form of structured static testing that often reveals problems in system requirements and design. The second is that, even with test charters, demonstrating test coverage and achieving repeatability of tests is difficult with a purely exploratory approach. For this reason, a blended approach of scripted and exploratory testing is often used to reap the benefits of both while mitigating each approach's disadvantages.

Manual vs. Automated

Some writers believe that test automation is so expensive relative to its value that it should be used sparingly. Others, such as advocates of agile development, recommend automating 100% of all tests. A challenge with automation is that automated testing requires automated test oracles (an oracle is a mechanism or principle by which a problem in the software can be recognized). Test automation tools have value in load testing software (by signing on to an application with hundreds or thousands of instances simultaneously) and in checking for intermittent errors in software. The success of automated software testing depends on complete and comprehensive test planning. Software development strategies such as test-driven development are highly compatible with the idea of devoting a large part of an organization's testing resources to automated testing. Many large software organizations perform automated testing. Some have developed their own automated testing environments specifically for internal development, and not for resale.
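One common form of automated oracle compares the output of the software under test with that of a trusted reference. In the sketch below, insertion_sort is a hypothetical implementation under test and Python's built-in sorted serves as the reference. Both choices are assumptions made for illustration.

```python
import random


def insertion_sort(items):
    """Hypothetical implementation under test."""
    result = []
    for item in items:
        pos = len(result)
        # Walk left past every element that is larger than the new item.
        while pos > 0 and result[pos - 1] > item:
            pos -= 1
        result.insert(pos, item)
    return result


def oracle(items, output):
    """Recognize a problem by comparing with a trusted reference (sorted)."""
    return output == sorted(items)


def run_automated_checks(trials=200, seed=0):
    """Generate random inputs and return those that the oracle rejects."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if not oracle(data, insertion_sort(data)):
            failures.append(data)
    return failures


if __name__ == "__main__":
    print("inputs rejected by the oracle:", run_automated_checks())
```

Without such an oracle, the harness could generate and run inputs but would have no automatic way to decide whether any result was wrong.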

Certification

Many certification programs exist to support the professional aspirations of software testers. These include the CSQE program offered by the American Society for Quality, the CSTE/CSQA programs offered by the Quality Assurance Institute (QAI), and the ISTQB certifications offered by the International Software Testing Qualifications Board (ISTQB). No certification currently offered actually requires the applicant to demonstrate the ability to test software. No certification is based on a widely accepted body of knowledge. This has led some to declare that the testing field is not ready for certification.

Custodiet Ipsos Custodes

One principle in software testing is best summed up by the classical Latin question posed by Juvenal: Quis Custodiet Ipsos Custodes (Who watches the watchmen?). It is also referred to informally as the "Heisenbug" concept, by analogy with the popular reading of Heisenberg's uncertainty principle: any form of observation is also an interaction, so the act of testing can affect that which is being tested.

In practical terms the test engineer is testing software (and sometimes hardware or firmware) with other software (and hardware and firmware). The tools can have their own defects, and the process can fail in ways that are not the result of defects in the target but are artifacts of the harness.

Metrics are being developed to measure the effectiveness of testing. One method is to analyze code coverage. This is highly controversial, but it at least lets everyone agree on which areas are not being covered at all, so that coverage of those areas can be improved.

Finally, there is the analysis of historical find-rates. By measuring how many bugs are found and comparing them to predicted numbers (based on past experience with similar projects), certain assumptions regarding the effectiveness of testing can be made. While not an absolute measurement of quality, if a project is halfway complete and there have been no defects found, then changes may be needed to the procedures being employed by QA.
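The find-rate comparison amounts to simple arithmetic. The sketch below assumes, purely for illustration, that defects are expected to be found at a constant rate over the project. The function name and the numbers are hypothetical.

```python
def find_rate_ratio(defects_found, predicted_total, fraction_complete):
    """Compare defects found so far with the number expected at this point.

    Assumes a constant discovery rate, which is a simplification.
    """
    expected_so_far = predicted_total * fraction_complete
    if expected_so_far <= 0:
        raise ValueError("no defects are expected yet; ratio is undefined")
    return defects_found / expected_so_far


# Halfway through a project predicted to yield 120 defects, 60 are expected.
print(find_rate_ratio(45, 120, 0.5))  # 45 of the 60 expected were found
print(find_rate_ratio(0, 120, 0.5))   # none found: the procedures need review
```

A ratio far below 1 does not by itself prove that the software is good or that the testing is poor, but it indicates that the assumptions behind the test procedures should be reviewed.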

Quotes

  • "An effective way to test code is to exercise it at its natural boundaries." -- Brian Kernighan
  • "Program testing can be used to show the presence of bugs, but never to show their absence!" -- Edsger Dijkstra
  • "Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth

References

  • Boris Beizer: Software Testing Techniques. Second Edition, International Thomson Computer Press, 1990, ISBN 1-850-32880-3
  • Rex Black: Managing the Testing Process. Second Edition, John Wiley and Sons, 2002, ISBN 0-471-22398-0
  • Mark Fewster, Dorothy Graham: Software Test Automation. Addison Wesley, 1999, ISBN 0-201-33140-3
  • Cem Kaner, Jack Falk, Hung Quoc Nguyen: Testing Computer Software. Second Edition, John Wiley and Sons, 1993, ISBN 0-471-35846-0
  • Cem Kaner, James Bach, Bret Pettichord: Lessons Learned in Software Testing. A Context-Driven Approach. John Wiley & Sons, 2001, ISBN 0-471-08112-4
  • Glenford J. Myers: The Art of Software Testing. John Wiley and Sons, 1979, ISBN 0-471-04328-1
  • Hung Nguyen, Robert Johnson, Michael Hackett: Testing Applications on the Web (2nd Edition): Test Planning for Mobile and Internet-Based Systems ISBN 0-471-20100-6
  • Robert V. Binder: Testing Object-Oriented Systems: Objects, Patterns, and Tools. Addison-Wesley Professional, 1999, ISBN 0-201-80938-9

Further reading

  • James A. Whittaker: How to Break Web Software: Functional and Security Testing of Web Applications and Web Services, Addison-Wesley Professional, February 2, 2006. ISBN 0321369440