Reduction of bleed-through in scanned manuscript documents

By: Eric Dubois and Anita Pathak
School of Information Technology and Engineering,
University of Ottawa, Ottawa, ON Canada K1N 6N5

Abstract
Many old manuscript documents were written on both sides of the paper, and the bleed-through from one side of the document to the other increases the difficulty in reading or deciphering the information on the page. This paper presents techniques for reducing such bleed-through distortion using digital image processing. Both sides of the document are scanned, maintaining full spatial and amplitude resolution (8 bits/sample). The bleed-through is reduced by processing both sides of the document simultaneously. First, the verso side is flipped from left to right, and then the recto and flipped verso images are registered. This registration is necessary since it is impossible to perfectly align the front and back when scanning the document, and the scanner may not be perfectly uniform. We used a six-parameter affine transformation to register the two sides, determining the parameters using an optimization method. Once the two sides have been registered, areas consisting primarily of bleed-through are identified and replaced by the background color or intensity. The method has been tested on a number of documents, including documents we generated under controlled conditions and some original manuscripts; the readability of documents with heavy bleed-through has been greatly improved by this method.
1. Introduction
Many documents written or printed on both sides of the page suffer from bleed-through, which can significantly impair the readability of the document. Fig. 1 shows an extract of the corresponding portions of the front (recto) and rear (verso) of a typical eighteenth-century manuscript document, where the bleed-through clearly makes the task of reading the document more challenging and fatiguing. There is thus great interest in removing this bleed-through using digital image processing techniques. Since the darkness of some of the bleed-through is comparable to the darkness of some of the desired writing, a simple thresholding operation will not be successful in removing the bleed-through. However, by processing both sides of the document together, it is possible to identify regions of the image that are due to bleed-through and replace them with an estimate of the background. Techniques of this type are reported in [1, 2] for reducing show-through in scanned documents; the basic idea is presented in [1] and a restoration technique using adaptive filtering is presented in [2]. In order to adequately remove bleed-through, the recto and left-right flipped verso images must be registered; this did not receive much attention in [1, 2]. This paper presents a method to carry out this registration along with a proposed method to reduce the bleed-through. Section 2 presents the general formulation of the problem and describes the registration and bleed-through removal algorithms. Section 3 gives experimental results with a test document.

2. Bleed-through Removal Algorithm
2.1. Assumptions
In this paper, we assume that the original document consists of some type of paper on which ink has been applied to both sides, either through writing or printing. Ink may simply show through from one side to the other, or it may have actually “bled” through to the other side. Both sides of the document are digitized in order to apply the bleed-through removal algorithm. The sampled recto and verso images are denoted $f_r(x, y)$ and $f_v(x, y)$ respectively, where the sample points $(x, y)$ lie on a two-dimensional rectangular sampling structure $L$. In this article, we assume that 8-bit gray-scale versions of the image of size $p_w$ by $p_h$ are acquired, which are normalized such that $0 \le f_r(x, y) \le 1$ and $0 \le f_v(x, y) \le 1$, with gray-level 0 corresponding to white and 1 corresponding to black. Color information may be helpful and will be addressed in future work.
We assume that there exist ideal recto and verso images representing the writing applied to the front and the back of the paper, denoted $f_{wr}(x, y)$ and $f_{wv}(x, y)$ respectively; these are zero where there is no writing. Similarly, we assume that there is an ideal background $f_{br}(x, y)$ and $f_{bv}(x, y)$ corresponding to the image of the paper without writing. The measured recto image combines the back [...] The measured verso image is obtained in a similar fashion. However, since the two sides are scanned in separate operations, the two scanning rasters will not be aligned; they will differ by some offset, rotation and possible skew. The ideal measured verso image with perfect registration is given by
$f_v^I(x, y) = C(f_{bv}(x, y), f_{wv}(x, y), R f_{wr}(x, y))$ (5)
where the coordinate systems of the recto and verso are perfectly aligned. However, the actual measured verso image is
$f_v(x, y) = A_p f_v^I(x, y)$ (6)
where $A_p$ is a linear operator that models the geometric distortion between the two scanning lattices.
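The intensity convention stated above (0 for white, 1 for black) and the left-right flip of the verso can be illustrated with a minimal sketch. It assumes, which the text does not state, that the scanner delivers 8-bit samples with 255 as white and 0 as black, and it stores images as plain Python lists of rows to stay dependency-free. The function names are hypothetical.

```python
def normalize_gray(image_8bit):
    """Map 8-bit samples to [0, 1] with 0 = white and 1 = black.

    Assumption (not from the paper): the scanner outputs 255 for white
    and 0 for black, so the scale is inverted during normalization.
    """
    return [[(255 - value) / 255.0 for value in row] for row in image_8bit]


def flip_left_right(image):
    """Mirror each row so the verso shares the recto's orientation."""
    return [row[::-1] for row in image]


# Tiny 2 x 3 example: one white row and one row fading to black.
verso_raw = [[255, 255, 255],
             [255, 128, 0]]
verso = flip_left_right(normalize_gray(verso_raw))
```

After these two operations the flipped verso still differs from the recto by the geometric distortion modeled by the operator in (6).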
In our work, we have assumed that $A_p$ is an affine transformation specified by six parameters $p = (p_{11}, p_{12}, p_{13}, p_{21}, p_{22}, p_{23})$, defined by $g = A_p f$:
$g(x, y) = f(p_{11} x + p_{12} y + p_{13}, \; p_{21} x + p_{22} y + p_{23})$. (7)
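As an illustration of (7), the following sketch evaluates g at every sample point by looking up f at the transformed coordinates. Two choices here are assumptions, not taken from the paper: non-integer coordinates are resolved with bilinear interpolation, and samples that map outside the image receive a fill value (0, i.e. white). The function name apply_affine is hypothetical.

```python
import math


def apply_affine(f, p, fill=0.0):
    """Return g with g(x, y) = f(p11*x + p12*y + p13, p21*x + p22*y + p23).

    f is a list of rows indexed as f[y][x]; p = (p11, p12, p13, p21, p22, p23).
    Bilinear interpolation and the fill value are illustrative assumptions.
    """
    p11, p12, p13, p21, p22, p23 = p
    height, width = len(f), len(f[0])
    g = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u = p11 * x + p12 * y + p13  # source column
            v = p21 * x + p22 * y + p23  # source row
            if not (0 <= u <= width - 1 and 0 <= v <= height - 1):
                continue  # maps outside the image: keep the fill value
            x0, y0 = int(math.floor(u)), int(math.floor(v))
            x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
            a, b = u - x0, v - y0  # fractional offsets in [0, 1)
            top = (1 - a) * f[y0][x0] + a * f[y0][x1]
            bottom = (1 - a) * f[y1][x0] + a * f[y1][x1]
            g[y][x] = (1 - b) * top + b * bottom
    return g


f = [[0, 1, 2],
     [3, 4, 5]]
identity = apply_affine(f, (1, 0, 0, 0, 1, 0))
shifted = apply_affine(f, (1, 0, 1, 0, 1, 0))  # g(x, y) = f(x + 1, y)
```

With p = (1, 0, 0, 0, 1, 0) the operator is the identity; setting p13 = 1 reads each sample from one column to the right, so the last column of the result falls outside f and takes the fill value.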
2.2. Problem formulation
With these assumptions, the bleed-through removal problem can be stated as follows: given the sampled recto and verso images $f_r$ and $f_v$, estimate the restored images
$\hat{f}_r(x, y) = C(f_{br}(x, y), f_{wr}(x, y), 0)$ (8)
$\hat{f}_v(x, y) = C(f_{bv}(x, y), f_{wv}(x, y), 0)$. (9)
The problem can be broken down into two steps:
1. Estimate $f_v^I$ using $f_v$ and $f_r$ (registration).
2. Estimate $\hat{f}_r$ and $\hat{f}_v$ from $f_r$ and $\hat{f}_v^I$ (restoration).
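The two-step decomposition can be made concrete with a deliberately simplified sketch. It is not the authors' algorithm: registration is reduced to an exhaustive search over integer translations (instead of the six-parameter affine model) with cross-correlation as an assumed matching score, and restoration uses a hypothetical rule that treats a recto pixel as bleed-through when the registered verso is dark there and darker than the recto. The verso is assumed to be already flipped left to right, and all function names, the threshold, and the constant background value are illustrative.

```python
def shift(image, dx, dy, fill=0.0):
    """Return g with g(x, y) = image(x + dx, y + dy); fill outside the image."""
    height, width = len(image), len(image[0])
    return [
        [image[y + dy][x + dx]
         if 0 <= x + dx < width and 0 <= y + dy < height else fill
         for x in range(width)]
        for y in range(height)
    ]


def register_translation(recto, verso, max_shift=1):
    """Step 1 (simplified): integer offset that best aligns verso with recto.

    The score is the cross-correlation of the two images (an assumption).
    """
    best_offset, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = shift(verso, dx, dy)
            score = sum(r * v
                        for row_r, row_v in zip(recto, candidate)
                        for r, v in zip(row_r, row_v))
            if score > best_score:
                best_offset, best_score = (dx, dy), score
    return best_offset


def remove_bleed_through(recto, verso_registered, threshold=0.5, background=0.0):
    """Step 2 (hypothetical rule): replace likely bleed-through by background."""
    return [
        [background if v > threshold and r < v else r
         for r, v in zip(row_r, row_v)]
        for row_r, row_v in zip(recto, verso_registered)
    ]


# Recto: genuine writing at (0, 0), faint bleed-through at (1, 1).
recto = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
# Measured verso: its writing is displaced by one sample in x and y.
verso = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]

dx, dy = register_translation(recto, verso)
restored = remove_bleed_through(recto, shift(verso, dx, dy))
```

In this toy case the search recovers the one-sample displacement, the bleed-through pixel is reset to the background, and the genuine recto writing is left untouched. Real documents would need the full affine search and a background estimate taken from the paper itself rather than a constant.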
For more information about this topic, please follow the link:
http://site.uottawa.ca/~edubois/document...01pics.pdf