seminar surveyer · 12-01-2011, 05:12 PM

Finding Bugs in Web Applications Using
Dynamic Test Generation and Explicit-State
Model Checking

Shay Artzi, Adam Kieżun, Julian Dolby, Frank Tip, Danny Dig,
Amit Paradkar, Senior Member, IEEE, and Michael D. Ernst

Abstract
Web script crashes and malformed dynamically generated webpages are common errors, and they seriously impact the usability of Web applications. Current tools for webpage validation cannot handle the dynamically generated pages that are ubiquitous on today’s Internet. We present a dynamic test generation technique for the domain of dynamic Web applications. The technique utilizes both combined concrete and symbolic execution and explicit-state model checking. The technique generates tests automatically, runs the tests capturing logical constraints on inputs, and minimizes the conditions on the inputs to failing tests so that the resulting bug reports are small and useful in finding and fixing the underlying faults. Our tool Apollo implements the technique for the PHP programming language. Apollo generates test inputs for a Web application, monitors the application for crashes, and validates that the output conforms to the HTML specification. This paper presents Apollo’s algorithms and implementation, and an experimental evaluation that revealed 673 faults in six PHP Web applications.

INTRODUCTION

Dynamic test generation tools, such as DART, Cute, and EXE, generate tests by executing an application on concrete input values, and then creating additional input values by solving symbolic constraints derived from exercised control-flow paths. To date, such approaches have not been practical in the domain of Web applications, which pose special challenges due to the dynamism of the programming languages, the use of implicit input parameters, their use of persistent state, and their complex patterns of user interaction.

This paper extends dynamic test generation to the domain of web applications that dynamically create web (HTML) pages during execution, which are typically presented to the user in a browser. Apollo applies these techniques in the context of the scripting language PHP, one of the most popular languages for server-side Web programming. According to the Internet research service Netcraft, PHP powered 21 million domains as of April 2007, including large, well-known websites such as Wikipedia and WordPress. In addition to dynamic content, modern Web applications may also generate significant application logic, typically in the form of JavaScript code that is executed on the client side. Our techniques are primarily focused on server-side PHP code, although we do some minimal analysis of client-side code to determine how it invokes additional server code through user-interface mechanisms such as forms.

Our goal is to find two kinds of failures in web applications: execution failures that are manifested as crashes or warnings during program execution, and HTML failures that occur when the application generates malformed HTML. Execution failures may occur, for example, when a web application calls an undefined function or reads a nonexistent file. In such cases, the HTML output contains an error message and execution of the application may be halted, depending on the severity of the failure. HTML failures occur when output is generated that is not syntactically well-formed HTML (e.g., when an opening tag is not accompanied by a matching closing tag). HTML failures are generally not as important as execution failures because Web browsers are designed to tolerate some degree of malformedness in HTML, but they are undesirable for several reasons. First and most serious is that browsers’ attempts to compensate for malformed webpages may lead to crashes and security vulnerabilities. Second, standard HTML renders faster. Third, malformed HTML is less portable across browsers and is vulnerable to breaking or looking strange when displayed by browser versions on which it is not tested. Fourth, a browser might succeed in displaying only part of a malformed webpage, while silently discarding important information. Fifth, search engines may have trouble indexing malformed pages.
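The idea of an HTML failure (an opening tag without a matching closing tag) can be illustrated with a small checker. This is only a sketch in Python using the standard-library `html.parser` module; it is not the validator Apollo uses, and the names `TagBalanceChecker` and `html_failures` are made up for this illustration. It is also stricter than the HTML specification, which lets some closing tags (for example `</p>` or `</li>`) be omitted.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical stand-in for a validator: only checks tag nesting."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open (non-void) elements
        self.problems = []   # human-readable failure descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in self.stack:
            self.problems.append(f"stray </{tag}>")
            return
        # Pop until the matching opener; anything skipped was left unclosed.
        while True:
            top = self.stack.pop()
            if top == tag:
                break
            self.problems.append(f"unclosed <{top}>")


def html_failures(page):
    """Return a list of nesting problems found in the given HTML text."""
    checker = TagBalanceChecker()
    checker.feed(page)
    checker.close()
    leftover = [f"unclosed <{t}>" for t in reversed(checker.stack)]
    return checker.problems + leftover
```

Under these assumptions, `html_failures("<div><b>bold</div>")` reports the unclosed `<b>`, while a balanced page yields an empty list.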

Web developers widely recognize the importance of creating legal HTML. Many websites are checked using HTML validators. However, HTML validators can only point out problems in HTML pages, and are by themselves incapable of finding faults in applications that generate HTML pages. Checking dynamic Web applications (i.e., applications that generate pages during execution) requires checking that the application creates a valid HTML page on every possible execution path. In practice, even professionally developed and thoroughly tested applications often contain multiple faults (see Section 6).

There are two general approaches to finding faults in web applications: static analysis and dynamic analysis (testing). In the context of Web applications, static approaches have limited potential because 1) Web applications are often written in dynamic scripting languages that enable on-the-fly creation of code, and 2) control in a Web application typically flows via the generated HTML text (e.g., buttons and menus that require user interaction to execute), rather than solely via the analyzed code. Both of these issues pose significant challenges to approaches based on static analysis. Testing of dynamic Web applications is also challenging because the input space is large and applications typically require multiple user interactions. The state of the practice in validation for Web-standard compliance of real Web applications involves the use of programs such as HTML Kit that validate each generated page, but require manual generation of inputs that lead to displaying different pages. We know of no automated tool that automatically generates inputs that exercise different control-flow paths in a Web application, and validates the dynamically generated HTML pages that the Web application generates when those paths are executed.

This paper presents an automated technique for finding failures in HTML-generating web applications. Our technique is based on dynamic test generation, using combined concrete and symbolic (concolic) execution, and constraint solving. We created a tool, Apollo, that implements our technique in the context of the publicly available PHP interpreter. Apollo first executes the Web application under test with an empty input. During each execution, Apollo monitors the program to record path constraints that reflect how input values affect control flow. Additionally, for each execution, Apollo determines whether execution failures or HTML failures occur (for HTML failures, an HTML validator is used as an oracle). Apollo automatically and iteratively creates new inputs using the recorded path constraints to create inputs that exercise different control flow. Most previous approaches for concolic execution only detect “standard errors” such as crashes and assertion failures. Our approach detects such standard errors as well, but also uses an oracle to detect specification violations in the application’s output.

Another novelty in our work is the inference of input parameters, which are not manifested in the source code, but which are interactively supplied by the user (e.g., by clicking buttons in generated HTML pages). The desired behavior of a PHP application is usually achieved by a series of interactions between the user and the server (e.g., a minimum of five user actions are needed from opening the main Amazon page to buying a book). We handle this problem by enhancing the combined concrete and symbolic execution technique with explicit-state model checking based on automatic dynamic simulation of user interactions. In order to simulate user interaction, Apollo stores the state of the environment (database, sessions, and cookies) after each execution, analyzes the output of the execution to detect the possible user options that are available, and restores the environment state before executing a new script based on a detected user option.

Techniques based on combined concrete and symbolic executions may create multiple inputs that expose the same fault. In contrast to previous techniques, to avoid overwhelming the developer, our technique automatically identifies the minimal part of the input that is responsible for triggering the failure. This step is similar in spirit to Delta Debugging. However, since Delta Debugging is a general, black-box input minimization technique, it is oblivious to the properties of inputs. In contrast, our technique is white-box: It uses the information that certain inputs induce partially overlapping control-flow paths. By intersecting these paths, our technique significantly minimizes the constraints on the inputs.
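The intersection step can be pictured with a few lines of Python. This is a rough sketch under stated assumptions, not Apollo's data structures: each path constraint is modeled as a set of condition strings, and the function name `common_conditions` and the sample conditions are invented for the example.

```python
def common_conditions(path_constraints):
    """Return the conditions shared by every execution that exposed one fault.

    Each path constraint is given as an iterable of condition strings
    (an assumed, simplified representation).
    """
    sets = [frozenset(pc) for pc in path_constraints]
    if not sets:
        return frozenset()
    return frozenset.intersection(*sets)


# Three hypothetical failing runs that all expose the same fault.
failing_runs = [
    ["page == 'view'", "id != ''", "admin == '1'"],
    ["page == 'view'", "id != ''", "admin != '1'"],
    ["page == 'view'", "id != ''", "sort == 'asc'"],
]

shared = common_conditions(failing_runs)
```

Here `shared` keeps only the two conditions present in all three runs, so a report built from it would not mention `admin` or `sort`, which vary across the failing executions.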

Read more:
http://docs.googleviewer?a=v&q=cache:NN9...JEEfnCa-_g

seminar class · 03-05-2011, 03:13 PM

Abstract—
Web script crashes and malformed dynamically-generated web pages
are common errors, and they seriously impact the usability of web applications.
Current tools for web-page validation cannot handle the dynamically
generated pages that are ubiquitous on today’s Internet. We present a
dynamic test generation technique for the domain of dynamic web applications.
The technique utilizes both combined concrete and symbolic
execution and explicit-state model checking. The technique generates tests
automatically, runs the tests capturing logical constraints on inputs, and
minimizes the conditions on the inputs to failing tests, so that the resulting
bug reports are small and useful in finding and fixing the underlying faults.
Our tool Apollo implements the technique for the PHP programming
language. Apollo generates test inputs for a web application, monitors the
application for crashes, and validates that the output conforms to the HTML
specification. This paper presents Apollo’s algorithms and implementation,
and an experimental evaluation that revealed 302 faults in 6 PHP web
applications.
General Terms Reliability, Verification
Index Terms—Software Testing, Web Applications, Dynamic Analysis, PHP
1 INTRODUCTION
Dynamic test generation tools, such as DART [18], Cute [36],
and EXE [7], generate tests by executing an application on
concrete input values, and then creating additional input values
by solving symbolic constraints derived from exercised control
flow paths. To date, such approaches have not been practical in
the domain of web applications, which pose special challenges
due to the dynamism of the programming languages, the use
of implicit input parameters, their use of persistent state, and
their complex patterns of user interaction.
This paper extends dynamic test generation to the domain of
web applications that dynamically create web (HTML) pages
during execution, which are typically presented to the user in a
browser. Apollo applies these techniques in the context of the
scripting language PHP, one of the most popular languages for
web programming. According to the internet research service,
Netcraft¹, PHP powered 21 million domains as of April 2007,
including large, well-known websites such as Wikipedia and
WordPress.
1. See http://news.netcraft.
Our goal is to find two kinds of failures in web applications:
execution failures that are manifested as crashes or warnings
during program execution, and HTML failures that occur when
the application generates malformed HTML. As an example,
execution failures may occur when a web application calls an
undefined function or reads a nonexistent file. In such cases,
the HTML output contains an error message and execution
of the application may be halted, depending on the severity
of the failure. HTML failures occur when output is generated
that is not syntactically well-formed HTML (e.g., when an
opening tag is not accompanied by a matching closing tag).
Although web browsers are designed to tolerate some degree
of malformedness in HTML, several kinds of problems may
occur. First and most serious is that browsers’ attempts to
compensate for malformed web pages may lead to crashes
and security vulnerabilities². Second, standard HTML renders
faster³. Third, malformed HTML is less portable across
browsers and is vulnerable to breaking or looking strange
when displayed by browser versions on which it is not tested.
Fourth, a browser might succeed in displaying only part of
a malformed webpage, while silently discarding important
information. Fifth, search engines may have trouble indexing
malformed pages [45].
Web developers widely recognize the importance of creating
legal HTML. Many websites are checked using HTML
validators⁴. However, HTML validators can only point out
problems in HTML pages, and are by themselves incapable
of finding faults in applications that generate HTML pages.
Checking dynamic web applications (i.e., applications that
generate pages during execution) requires checking that the
application creates a valid HTML page on every possible
execution path. In practice, even professionally developed and
thoroughly tested applications often contain multiple faults
(see Section 6).
2. See bug reports 269095, 320459, and 328937 at
https://bugzilla.mozilla.org/show_bug.cgi?
3. See http://weblogs.mozillazinehyatt/archives/2003 03.html#002904.
According to a Mozilla developer, one reason why malformed
HTML renders slower is that “improper tag nesting [. . . ] triggers residual
style handling to try to produce the expected visual result, which can be very
expensive” [33].
4. http://validator.w3.org, http://htmlhelptools/validator
There are two general approaches to finding faults in web
applications: static analysis and dynamic analysis (testing). In
the context of web applications, static approaches have limited
potential because (i) web applications are often written in
dynamic scripting languages that enable on-the-fly creation
of code, and (ii) control in a web application typically flows
via the generated HTML text (e.g., buttons and menus that
require user interaction to execute), rather than solely via
the analyzed code. Both of these issues pose significant
challenges to approaches based on static analysis. Testing of
dynamic web applications is also challenging, because the
input space is large and applications typically require multiple
user interactions. The state-of-the-practice in validation for
web-standard compliance of real web applications involves
the use of programs such as HTML Kit⁵ that validate each
generated page, but require manual generation of inputs that
lead to displaying different pages. We know of no automated
tool for the validation of web applications that dynamically
generate HTML pages.
This paper presents an automated technique for finding
failures in HTML-generating web applications. Our technique
is based on dynamic test generation, using combined concrete
and symbolic (concolic) execution and constraint solving [7],
[18], [36]. We created a tool, Apollo, that implements our technique
in the context of the publicly available PHP interpreter.
Apollo first executes the web application under test with
an empty input. During each execution, Apollo monitors the
program to record the dependence of control-flow on input.
Additionally, for each execution Apollo determines whether
execution failures or HTML failures occur (for HTML failures,
an HTML validator is used as an oracle). Apollo automatically
and iteratively creates new inputs using the recorded
dependence to create inputs that exercise different control
flow. Most previous approaches for concolic execution only
detect “standard errors” such as crashes and assertion failures.
Our approach also detects such standard errors, but is to our
knowledge the first to use an oracle to detect specification
violations in the application’s output.
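The generate-run-record loop described in this paragraph can be sketched as a toy in Python. Everything here is an assumption made for illustration: the script `app`, the `Trace` recorder, the tiny `solve` function, and the `well_formed` oracle are hypothetical, and only equality tests against string constants are supported. A real concolic engine tracks symbolic expressions inside the interpreter and calls a proper constraint solver.

```python
from collections import deque


class Trace:
    """Records every equality test on an input parameter during one run."""

    def __init__(self, inputs):
        self.inputs = inputs
        self.path = []  # list of (param, value, outcome)

    def eq(self, param, value):
        outcome = self.inputs.get(param) == value
        self.path.append((param, value, outcome))
        return outcome


def app(t):
    """Hypothetical script under test; returns its HTML output."""
    out = "<html><body>"
    if t.eq("page", "admin"):
        if t.eq("mode", "edit"):
            out += "<form><input name='x'>"  # fault: </form> is missing
        else:
            out += "<p>admin</p>"
    else:
        out += "<p>home</p>"
    return out + "</body></html>"


def well_formed(html):
    """Toy oracle: every <form> must have a matching </form>."""
    return html.count("<form") == html.count("</form>")


def solve(constraints):
    """Find inputs satisfying a conjunction of == / != tests, or None."""
    inputs = {}
    for param, value, must_equal in constraints:
        if must_equal:
            if inputs.get(param, value) != value:
                return None  # two different required values
            inputs[param] = value
    for param, value, must_equal in constraints:
        if not must_equal and inputs.get(param) == value:
            return None  # required and forbidden at once
    return inputs


def explore(program, oracle):
    worklist = deque([{}])  # start from the empty input
    seen_paths, failures = set(), []
    while worklist:
        inputs = worklist.popleft()
        trace = Trace(inputs)
        output = program(trace)
        path = tuple(trace.path)
        if path in seen_paths:
            continue
        seen_paths.add(path)
        if not oracle(output):
            failures.append((inputs, path))
        # Flip each branch in turn, keeping the prefix that led to it.
        for i, (param, value, outcome) in enumerate(path):
            flipped = list(path[:i]) + [(param, value, not outcome)]
            new_inputs = solve(flipped)
            if new_inputs is not None:
                worklist.append(new_inputs)
    return failures


failures = explore(app, well_formed)
```

Starting from the empty input, the loop reaches three distinct paths through `app` and flags the one input, `page=admin` with `mode=edit`, whose output fails the oracle.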
Another novelty in our work is the inference of input
parameters, which are not manifested in the source code, but
which are interactively supplied by the user (e.g., by clicking
buttons in generated HTML pages). The desired behavior of a
PHP application is usually achieved by a series of interactions
between the user and the server (e.g., a minimum of five user
actions are needed from opening the main Amazon page to
buying a book). We handle this problem by enhancing the
combined concrete and symbolic execution technique with
explicit-state model checking based on automatic dynamic
simulation of user interactions. In order to simulate user interaction,
Apollo stores the state of the environment (database,
sessions, cookies) after each execution, analyzes the output
of the execution to detect the possible user options that are
available, and restores the environment state before executing
a new script based on a detected user option.
5. http://htmlkit.com
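The store, analyze, and restore cycle for simulating user interaction might look roughly like the Python sketch below. It rests on several assumptions that are not from the paper: the environment is a plain dictionary standing in for database and session data, the two scripts `login` and `cart` are invented, and "possible user options" are reduced to `href` targets of links found in the output.

```python
import copy
from collections import deque
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collects href targets as a stand-in for detected user options."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.targets.append(href)


def login(state):
    state["session"]["user"] = "alice"
    return "<a href='cart'>cart</a>"


def cart(state):
    if "user" not in state["session"]:
        return "<p>please log in</p><a href='login'>login</a>"
    state["db"]["visits"] = state["db"].get("visits", 0) + 1
    return "<p>cart of alice</p>"


SCRIPTS = {"login": login, "cart": cart}  # hypothetical application


def explore(start, max_depth=3):
    initial = {"session": {}, "db": {}}
    queue = deque([(start, initial, [start])])
    visited = []
    while queue:
        script, saved_state, history = queue.popleft()
        state = copy.deepcopy(saved_state)  # restore the stored environment
        html = SCRIPTS[script](state)       # execute the script
        visited.append((history, html))
        if len(history) >= max_depth:
            continue
        finder = LinkFinder()
        finder.feed(html)                   # analyze output for user options
        for target in finder.targets:
            if target in SCRIPTS:
                # Store the environment as it was after this execution.
                snapshot = copy.deepcopy(state)
                queue.append((target, snapshot, history + [target]))
    return visited


visited = explore("cart")
```

In this toy run, the cart page is reached twice: once without a session, and once after the simulated login, where the restored session state changes what the same script outputs.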
Techniques based on combined concrete and symbolic executions
[7], [18], [36] may create multiple inputs that expose
the same fault. In contrast to previous techniques, to avoid
overwhelming the developer, our technique automatically identifies
the minimal part of the input that is responsible for
triggering the failure. This step is similar in spirit to Delta
Debugging [8]. However, since Delta Debugging is a general,
black-box input minimization technique, it is oblivious to the
properties of inputs. In contrast, our technique is white-box:
it uses the information that certain inputs induce partially
overlapping control flow paths. By intersecting these paths,
our technique minimizes the constraints on the inputs within
fewer program runs.
The contributions of this paper are the following:
• We adapt the established technique of dynamic test
generation, based on combined concrete and symbolic
execution [7], [18], [36], to the domain of PHP web
applications. This involved the following innovations: (i)
using an HTML verifier as an oracle, (ii) inferring input
parameters that are not manifested in the source code,
(iii) dealing with datatypes and operations specific to the
PHP language, (iv) tracking the use of persistent state and
how input flows through it, and (v) simulating user input
for interactive applications.
• We created a tool, Apollo, that implements the technique
for PHP.
• We evaluated our tool by applying it to 6 real web applications
and comparing the results with random testing.
We show that dynamic test generation can be effective
when adapted to the domain of web applications written
in PHP: Apollo identified 302 faults while achieving line
coverage of 50.2%.
• We present a detailed classification of the faults found by
Apollo.
The remainder of this paper is organized as follows. Section
2 presents an overview of PHP, introduces our running
example, and discusses classes of failures in PHP web applications.
Section 3 presents a simplified version of the algorithm
and illustrates it on an example program. Section 4 presents
the complete algorithm handling stateful execution with the
simulation of interactive user inputs, and illustrates it on an
example program. Section 5 discusses our Apollo implementation.
Section 6 presents our experimental evaluation of Apollo
on open-source web applications. Section 7 gives an overview
of related work, and Section 8 presents conclusions.
2 CONTEXT: PHP WEB APPLICATIONS
2.1 The PHP Scripting Language
This section briefly reviews the PHP scripting language, focusing
on those aspects of PHP that differ from mainstream
languages. Readers familiar with PHP may skip to the discussion
of the running example in Section

DOWNLOAD FULL REPORT
http://citeseerx.ist.psu.edu/viewdoc/dow...1&type=pdf

seminar paper · 14-02-2012, 12:55 PM

For information about the topic "web application" (full report, PPT) and related topics, refer to the page links below:

http://studentbank.in/report-web-application-project

http://studentbank.in/report-ajax-a-new-...plications

http://studentbank.in/report-ajax-a-new-...ons?page=2

http://studentbank.in/report-finding-bug...-state-mod
