seminar surveyer · 12-01-2011, 05:12 PM

Finding Bugs in Web Applications Using
Dynamic Test Generation and Explicit-State
Model Checking

Shay Artzi, Adam Kieżun, Julian Dolby, Frank Tip, Danny Dig,
Amit Paradkar, Senior Member, IEEE, and Michael D. Ernst

Abstract
Web script crashes and malformed dynamically generated webpages are common errors, and they seriously impact the usability of Web applications. Current tools for webpage validation cannot handle the dynamically generated pages that are ubiquitous on today’s Internet. We present a dynamic test generation technique for the domain of dynamic Web applications. The technique utilizes both combined concrete and symbolic execution and explicit-state model checking. The technique generates tests automatically, runs the tests capturing logical constraints on inputs, and minimizes the conditions on the inputs to failing tests so that the resulting bug reports are small and useful in finding and fixing the underlying faults. Our tool Apollo implements the technique for the PHP programming language. Apollo generates test inputs for a Web application, monitors the application for crashes, and validates that the output conforms to the HTML specification. This paper presents Apollo’s algorithms and implementation, and an experimental evaluation that revealed 673 faults in six PHP Web applications.

INTRODUCTION

Dynamic test generation tools, such as DART, CUTE, and EXE, generate tests by executing an application on concrete input values, and then creating additional input values by solving symbolic constraints derived from exercised control-flow paths. To date, such approaches have not been practical in the domain of Web applications, which pose special challenges due to the dynamism of the programming languages, the use of implicit input parameters, their use of persistent state, and their complex patterns of user interaction.
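The generate-run-solve loop described above can be illustrated with a minimal Python sketch. It is not the code of DART, CUTE, EXE, or Apollo: the toy `program`, the `check` helper, and the tiny `solve` function (which only handles "parameter equals constant" conditions) are assumptions made for illustration, where a real tool would instrument an interpreter and call a constraint solver.

```python
from collections import deque

def program(inputs, trace):
    """Toy program under test; `check` records each branch decision in `trace`."""
    def check(name, expected):
        taken = inputs.get(name) == expected
        trace.append((name, expected, taken))
        return taken

    if check("page", "admin"):
        if check("action", "delete"):
            raise RuntimeError("call to undefined function")
        return "admin page"
    return "home page"

def solve(constraints):
    """Return an input dict satisfying the constraints, or None if they conflict."""
    inputs = {}
    for name, expected, taken in constraints:
        if taken:
            if name in inputs and inputs[name] != expected:
                return None
            inputs[name] = expected
    for name, expected, taken in constraints:
        # A "not taken" condition is satisfied by any other value, including absence.
        if not taken and inputs.get(name) == expected:
            return None
    return inputs

def generate_tests():
    """Start from the empty input and flip one recorded branch at a time."""
    worklist = deque([{}])
    seen = set()
    failures = []
    while worklist:
        inputs = worklist.popleft()
        key = frozenset(inputs.items())
        if key in seen:
            continue
        seen.add(key)
        trace = []
        try:
            program(inputs, trace)
        except RuntimeError as exc:
            failures.append((inputs, str(exc)))
        for i, (name, expected, taken) in enumerate(trace):
            # Keep the path prefix, negate the i-th condition, and solve for inputs.
            flipped = trace[:i] + [(name, expected, not taken)]
            new_inputs = solve(flipped)
            if new_inputs is not None:
                worklist.append(new_inputs)
    return failures
```

Calling `generate_tests()` explores the three paths of the toy program and reports the one input that reaches the failing branch.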

This paper extends dynamic test generation to the domain of web applications that dynamically create web (HTML) pages during execution, which are typically presented to the user in a browser. Apollo applies these techniques in the context of the scripting language PHP, one of the most popular languages for server-side Web programming. According to the Internet research service Netcraft, PHP powered 21 million domains as of April 2007, including large, well-known websites such as Wikipedia and WordPress. In addition to dynamic content, modern Web applications may also generate significant application logic, typically in the form of JavaScript code that is executed on the client side. Our techniques are primarily focused on server-side PHP code, although we do some minimal analysis of client-side code to determine how it invokes additional server code through user-interface mechanisms such as forms.

Our goal is to find two kinds of failures in web applications: execution failures that are manifested as crashes or warnings during program execution, and HTML failures that occur when the application generates malformed HTML. Execution failures may occur, for example, when a web application calls an undefined function or reads a nonexistent file. In such cases, the HTML output contains an error message and execution of the application may be halted, depending on the severity of the failure. HTML failures occur when output is generated that is not syntactically well-formed HTML (e.g., when an opening tag is not accompanied by a matching closing tag). HTML failures are generally not as important as execution failures because Web browsers are designed to tolerate some degree of malformedness in HTML, but they are undesirable for several reasons. First and most serious is that browsers’ attempts to compensate for malformed webpages may lead to crashes and security vulnerabilities. Second, standard HTML renders faster. Third, malformed HTML is less portable across browsers and is vulnerable to breaking or looking strange when displayed by browser versions on which it is not tested. Fourth, a browser might succeed in displaying only part of a malformed webpage, while silently discarding important information. Fifth, search engines may have trouble indexing malformed pages.
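To make the "opening tag without a matching closing tag" case concrete, here is a small Python sketch of a tag-balance check built on the standard `html.parser` module. It is only a toy stand-in for a real HTML validator: the class name `TagBalanceChecker` and the function `find_html_failures` are made up for this example, and the check is stricter than the HTML specification, which allows some end tags (such as `</li>` or `</p>`) to be omitted.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records tags that do not pair up."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"unexpected </{tag}>")
            return
        # Pop back to the matching opener; anything skipped was never closed.
        while True:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.problems.append(f"unclosed <{top}>")

def find_html_failures(html):
    """Return a list of tag-balance problems found in `html` (empty if none)."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.open_tags]
```

For instance, `find_html_failures("<div><b>bold</div>")` flags the `<b>` element that is never closed, while a balanced page yields an empty list.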

Web developers widely recognize the importance of creating legal HTML. Many websites are checked using HTML validators. However, HTML validators can only point out problems in HTML pages, and are by themselves incapable of finding faults in applications that generate HTML pages. Checking dynamic Web applications (i.e., applications that generate pages during execution) requires checking that the application creates a valid HTML page on every possible execution path. In practice, even professionally developed and thoroughly tested applications often contain multiple faults (see Section 6).

There are two general approaches to finding faults in web applications: static analysis and dynamic analysis (testing). In the context of Web applications, static approaches have limited potential because 1) Web applications are often written in dynamic scripting languages that enable on-the-fly creation of code, and 2) control in a Web application typically flows via the generated HTML text (e.g., buttons and menus that require user interaction to execute), rather than solely via the analyzed code. Both of these issues pose significant challenges to approaches based on static analysis. Testing of dynamic Web applications is also challenging because the input space is large and applications typically require multiple user interactions. The state of the practice in validation for Web-standard compliance of real Web applications involves the use of programs such as HTML Kit that validate each generated page, but require manual generation of inputs that lead to displaying different pages. We know of no tool that automatically generates inputs that exercise different control-flow paths in a Web application, and validates the HTML pages that the application generates when those paths are executed.

This paper presents an automated technique for finding failures in HTML-generating web applications. Our technique is based on dynamic test generation, using combined concrete and symbolic (concolic) execution, and constraint solving. We created a tool, Apollo, that implements our technique in the context of the publicly available PHP interpreter. Apollo first executes the Web application under test with an empty input. During each execution, Apollo monitors the program to record path constraints that reflect how input values affect control flow. Additionally, for each execution, Apollo determines whether execution failures or HTML failures occur (for HTML failures, an HTML validator is used as an oracle). Apollo automatically and iteratively creates new inputs using the recorded path constraints to create inputs that exercise different control flow. Most previous approaches for concolic execution only detect “standard errors” such as crashes and assertion failures. Our approach detects such standard errors as well, but also uses an oracle to detect specification violations in the application’s output. Another novelty in our work is the inference of input parameters, which are not manifested in the source code, but which are interactively supplied by the user (e.g., by clicking buttons in generated HTML pages). The desired behavior of a PHP application is usually achieved by a series of interactions between the user and the server (e.g., a minimum of five user actions are needed from opening the main Amazon page to buying a book). We handle this problem by enhancing the combined concrete and symbolic execution technique with explicit-state model checking based on automatic dynamic simulation of user interactions. In order to simulate user interaction, Apollo stores the state of the environment (database, sessions, and cookies) after each execution, analyzes the output of the execution to detect the possible user options that are available, and restores the environment state before executing a new script based on a detected user option.
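The store-analyze-restore cycle for simulating user interaction can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Apollo's implementation: the three scripts, the `SCRIPTS` table, the `explore` function, and the use of a plain dictionary for the environment are all hypothetical, and user options are detected with a simple regular expression over form `action` attributes.

```python
import copy
import re
from collections import deque

def index(state):
    return '<form action="login"></form>'

def login(state):
    state["session"]["user"] = "alice"
    return '<form action="cart"></form>'

def cart(state):
    # Raises KeyError unless the session written by `login` has been restored.
    return "<p>Cart of " + state["session"]["user"] + "</p>"

# Hypothetical application: script name -> handler taking the environment state.
SCRIPTS = {"index": index, "login": login, "cart": cart}

def explore(entry="index"):
    """Run scripts reachable through forms, restoring saved state before each run."""
    initial = {"database": {}, "session": {}, "cookies": {}}
    worklist = deque([(entry, initial)])
    visited = set()
    executed = []
    while worklist:
        script, saved = worklist.popleft()
        key = (script, repr(saved))
        if key in visited:
            continue
        visited.add(key)
        state = copy.deepcopy(saved)            # restore the stored environment
        output = SCRIPTS[script](state)         # execute the script
        executed.append(script)
        # Detect the user options offered by the generated page.
        for target in re.findall(r'<form\s+action="([^"]+)"', output):
            if target in SCRIPTS:
                # Store the environment as it is after this execution.
                worklist.append((target, copy.deepcopy(state)))
    return executed
```

In this toy run the `cart` script only succeeds because it is executed from the state saved after `login`, which is the reason for carrying the environment along with each detected user option.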

Techniques based on combined concrete and symbolic execution may create multiple inputs that expose the same fault. In contrast to previous techniques, to avoid overwhelming the developer, our technique automatically identifies the minimal part of the input that is responsible for triggering the failure. This step is similar in spirit to Delta Debugging. However, since Delta Debugging is a general, black box input minimization technique, it is oblivious to the properties of inputs. In contrast, our technique is white box: it uses the information that certain inputs induce partially overlapping control-flow paths. By intersecting these paths, our technique significantly minimizes the constraints on the inputs.
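The intersection idea can be shown with a short Python sketch. It assumes each failing execution is summarized as a list of condition strings and that all of the given executions expose the same failure; the function name `minimize_conditions` and the example conditions are invented for illustration. A practical tool would presumably also re-run the application to confirm that the reduced conditions still trigger the failure, a step this sketch leaves out.

```python
def minimize_conditions(failing_paths):
    """Keep only the conditions shared by every failing path, in original order."""
    if not failing_paths:
        return []
    common = set(failing_paths[0])
    for path in failing_paths[1:]:
        common &= set(path)
    return [cond for cond in failing_paths[0] if cond in common]

# Two failing executions that differ only in a condition irrelevant to the fault.
path_a = ['page == "admin"', 'lang == "en"', 'action == "delete"']
path_b = ['page == "admin"', 'lang != "en"', 'action == "delete"']
minimal = minimize_conditions([path_a, path_b])
```

Here `minimal` keeps the `page` and `action` conditions and drops the `lang` condition, since the failure occurs regardless of its value.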

Read more:
http://docs.googleviewer?a=v&q=cache:NN9...JEEfnCa-_g
