
AUGMENTED REALITY LEARNING
SEMINAR REPORT
Submitted by
SREERAM SASIKUMAR
Seventh Semester
B. Tech
Applied Electronics and Instrumentation
College Of Engineering, Trivandrum
2007-11 batch



ABSTRACT
Augmented reality (AR) is a field of computer research that deals with the combination of the real world and computer-generated data. At present, most AR research is concerned with the use of live video imagery that is digitally processed and "augmented" by the addition of computer-generated graphics. Augmented reality adds graphics, sounds and haptic feedback to the natural world as it exists. This technology will have countless applications, such as gaming, teleconferencing, education and military training. With AR displays, which will eventually look like a normal pair of glasses, informative graphics will appear in the user's field of view and audio will coincide with whatever the user sees. These enhancements will be refreshed continually to reflect the movements of the user's head.










TABLE OF CONTENTS

1. INTRODUCTION

2. HOW IS AUGMENTING DONE
2.1 What makes a picture 3D
2.2 How to make it look real
2.3 How GPS receivers work
3. ALGORITHM TO PROVIDE BEST THIRD PERSON PERSPECTIVE
3.1 Description of the system
4. APPLICATIONS IN EDUCATION
4.1 Magic book
4.2 Augmented classroom
4.3 Magic mirror
4.4 AR supported English learning system
5. LATEST DEVELOPMENTS IN AR
5.1 Sixth sense
5.2 Working of sixth sense technology
5.3 Applications
6. CONCLUSION
7. BIBLIOGRAPHY





CHAPTER 1
INTRODUCTION

The growing use of computer and communication technologies, together with advances in education and learning technologies, has led to the emergence of Augmented Reality (AR) technologies. Augmented reality (AR) is a term for a live, direct or indirect, view of a physical real-world environment whose elements are augmented by virtual, computer-generated imagery. It is related to a more general concept called mediated reality, in which a view of reality is modified by a computer. As a result, the technology functions by enhancing one's current perception of reality.
Additionally, Paul Milgram and Fumio Kishino defined Milgram's Reality-Virtuality Continuum in 1994. They describe a continuum that spans from the real environment to a purely virtual environment. In between lie Augmented Reality (closer to the real environment) and Augmented Virtuality (closer to the virtual environment).

Fig 1 Video see-through HMD conceptual diagram
The main hardware components for augmented reality are: display, tracking, input devices, and computer. The combination of a powerful CPU, camera, accelerometers, GPS and solid-state compass is often present in modern smartphones, which makes them promising platforms. There are three major display techniques for Augmented Reality:
• Head Mounted Displays
• Handheld Displays
• Spatial Displays

Head Mounted Displays
A Head Mounted Display (HMD) places images of both the physical world and registered virtual graphical objects over the user's view of the world. HMDs are either optical see-through or video see-through in nature.

Handheld Displays
Handheld Augmented Reality employs a small computing device with a display that fits in a user's hand. All handheld AR solutions to date have employed video see-through techniques to overlay graphical information on the physical world.

Spatial Displays
Instead of the user wearing or carrying the display, as with head-mounted displays or handheld devices, Spatial Augmented Reality (SAR) makes use of digital projectors to display graphical information onto physical objects.

Modern mobile augmented reality systems use one or more of the following tracking technologies: digital cameras and/or other optical sensors, RFID, wireless sensors, etc. Most important is the tracking of the pose and position of the user's head for the augmentation of the user's view. AR also has real potential to help people with a variety of disabilities. Some current and future AR applications make use of a smartphone as a mobile computing platform.


CHAPTER 2
HOW IS AUGMENTING DONE

• The camera provides a live view of the surroundings
• GPS tells the device where you are
• The compass tells the device which direction you are facing
• Software such as Layar and Yelp gathers information about the surrounding area
• Total Immersion software recognizes the image and displays it as a 3D figure
2.1 What Makes a Picture 3-D?
A picture that has, or appears to have, height, width and depth is three-dimensional (3-D). A picture that has height and width but no depth is two-dimensional (2-D). Some pictures are 2-D on purpose: 2-D graphics are good at communicating something simple very quickly, whereas 3-D graphics tell a more complicated story but have to carry much more information to do it. If making a 2-D picture into a 3-D image requires adding a lot of information, then the step from a 3-D still picture to images that move realistically requires far more. Part of the problem is that we have become spoiled: we expect a high degree of realism in everything we see.
Games and movies made with computer-generated images have to go through three major steps to create and present a realistic 3-D scene:
1. Creating a virtual 3-D world.
2. Determining what part of the world will be shown on the screen.
3. Determining how every pixel on the screen will look so that the whole image appears as realistic as possible.
A virtual 3-D world isn't the same thing as one picture of that world. This is true of our real world also. Take a very small part of the real world -- your hand and a desktop under it. Your hand has qualities that determine how it can move and how it can look. The finger joints bend toward the palm, not away from it. If you slap your hand on the desktop, the desktop doesn't splash -- it's always solid and it's always hard. Your hand can't go through the desktop. You can't prove that these things are true by looking at any single picture. But no matter how many pictures you take, you will always see that the finger joints bend only toward the palm, and the desktop is always solid, not liquid, and hard, not soft. That's because in the real world, this is the way hands are and the way they will always behave. The objects in a virtual 3-D world, though, don’t exist in nature, like your hand. They are totally synthetic. The only properties they have are given to them by software. Programmers must use special tools and define a virtual 3-D world with great care so that everything in it always behaves in a certain way.
2.2 How to Make It Look Real
No matter how large or rich the virtual 3-D world, a computer can depict that world only by putting pixels on the 2-D screen. The most important elements in making an image look real are shapes, surface textures, lighting, perspective, depth of field and anti-aliasing.
Shapes
When we look out our windows, we see scenes made up of all sorts of shapes, with straight lines and curves in many sizes and combinations. Similarly, when we look at a 3-D graphical image on our computer monitor, we see images made up of a variety of shapes, although most of them are built from straight lines. We see squares, rectangles, parallelograms, circles and rhomboids, but most of all we see triangles. However, in order to build images that look as though they have the smooth curves often found in nature, some of the shapes must be very small, and a complex image, say a human body, might require thousands of these shapes to be put together into a structure called a wireframe. At this stage the structure might be recognizable as the outline of whatever it will eventually picture, but the next major step is important: the wireframe has to be given a surface.
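To make the idea of a wireframe concrete, the following is a minimal sketch (not taken from the report) of how a triangle mesh might be represented in Python; the pyramid vertices and face indices are invented purely for illustration.

# Minimal sketch of a wireframe (triangle mesh): a list of 3-D vertices plus
# a list of triangles, each triangle referring to three vertex indices.
vertices = [
    (0.0, 0.0, 0.0),   # base corner 0
    (1.0, 0.0, 0.0),   # base corner 1
    (1.0, 1.0, 0.0),   # base corner 2
    (0.0, 1.0, 0.0),   # base corner 3
    (0.5, 0.5, 1.0),   # apex
]

triangles = [
    (0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4),  # four sloping sides
    (0, 1, 2), (0, 2, 3),                        # square base split into two triangles
]

def edges(tris):
    """Collect the unique edges of the wireframe from the triangle list."""
    result = set()
    for a, b, c in tris:
        for u, v in ((a, b), (b, c), (c, a)):
            result.add((min(u, v), max(u, v)))
    return sorted(result)

print(len(vertices), "vertices,", len(triangles), "triangles,", len(edges(triangles)), "edges")

A real model of, say, a human body would follow the same structure, only with thousands of vertices and triangles instead of five and six.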
Surface Textures
When we meet a surface in the real world, we can get information about it in two key ways. We can look at it, sometimes from several angles, and we can touch it to see whether it's hard or soft. In a 3-D graphic image, however, we can only look at the surface to get all the information possible. All that information breaks down into three areas:
• Colour: What colour is it? Is it the same colour all over?
• Texture: Does it appear to be smooth, or does it have lines, bumps, craters or some other irregularity on the surface?
• Reflectance: How much light does it reflect? Are reflections of other items in the surface sharp or fuzzy?
One way to make an image look "real" is to have a wide variety of these three features across the different parts of the image. Look around you now: your computer keyboard has a different colour/texture/reflectance than your desktop, which has a different colour/texture/reflectance than your arm. For realistic colour, it is important for the computer to be able to choose from millions of different colours for the pixels making up an image. Variety in texture comes both from mathematical models for surfaces (ranging from frog skin to Jell-O gelatin) and from stored "texture maps" that are applied to surfaces. We also associate qualities that we can't see -- soft, hard, warm, cold -- with particular combinations of colour, texture and reflectance. If one of them is wrong, the illusion of reality is shattered.
Lighting and Perspective
One technique, called ray tracing, plots the path that imaginary light rays take as they leave the bulb, bounce off mirrors, walls and other reflecting surfaces, and finally land on items at different intensities from varying angles. It's complicated enough when you think about the rays from a single light bulb, but most rooms have multiple light sources: several lamps, ceiling fixtures, windows, candles and so on.

Lighting plays a key role in two effects that give the appearance of weight and solidity to objects: shading and shadows. The first, shading, takes place when the light shining on an object is stronger on one side than on the other. This shading is what makes a ball look round, high cheekbones seem striking and the folds in a blanket appear deep and soft. These differences in light intensity work with shape to reinforce the illusion that an object has depth as well as height and width. The illusion of weight comes from the second effect, shadows. Solid bodies cast shadows when a light shines on them. You can see this when you observe the shadow that a sundial or a tree casts onto a sidewalk. And because we're used to seeing real objects and people cast shadows, seeing the shadows in a 3-D image reinforces the illusion that we're looking through a window into the real world, rather than at a screen of mathematically generated shapes.

When all of the objects in a scene look like they will eventually converge at a single point in the distance, that's perspective. There are variations, but most 3-D graphics use the "single point perspective" just described.

The most common technique for deciding which surfaces are visible is the Z-buffer. The Z-buffer gets its name from the common label for the axis, or imaginary line, going from the screen back through the scene to the horizon. (There are two other common axes to consider: the x-axis, which measures the scene from side to side, and the y-axis, which measures the scene from top to bottom.) The Z-buffer assigns to each polygon a number based on how close an object containing the polygon is to the front of the scene. Generally, lower numbers are assigned to items closer to the screen, and higher numbers to items closer to the horizon. In the real world, our eyes can't see objects behind others, so we don't have the problem of figuring out what we should be seeing. But the computer faces this problem constantly and solves it in a straightforward way. As each object is created, its Z-value is compared to that of other objects that occupy the same x- and y-values. The object with the lowest z-value is fully rendered, while objects with higher z-values aren't rendered where they intersect. The result ensures that we don't see background items appearing through the middle of characters in the foreground. Since the z-buffer is employed before objects are fully rendered, pieces of the scene that are hidden behind characters or objects don't have to be rendered at all. This speeds up graphics performance.
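The Z-buffer idea can be illustrated with a tiny sketch, assuming a small ASCII "frame buffer" and axis-aligned rectangles standing in for polygons; none of this comes from a specific graphics library.

# Toy Z-buffer: for every pixel, keep the smallest z (closest to the screen)
# seen so far, and only overwrite the colour when a nearer surface arrives.
WIDTH, HEIGHT = 8, 4
FAR = float("inf")

z_buffer = [[FAR] * WIDTH for _ in range(HEIGHT)]
frame    = [["."] * WIDTH for _ in range(HEIGHT)]

def draw_rect(x0, y0, x1, y1, z, colour):
    """Rasterize an axis-aligned rectangle at constant depth z."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            if z < z_buffer[y][x]:      # nearer than anything drawn here so far
                z_buffer[y][x] = z
                frame[y][x] = colour

draw_rect(0, 0, 6, 4, z=5.0, colour="B")  # background rectangle
draw_rect(2, 1, 8, 3, z=2.0, colour="F")  # foreground rectangle hides part of it

for row in frame:
    print("".join(row))

The foreground rectangle wins only where its z-value is lower, which is exactly why background items never show through foreground characters.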
Depth of Field
Another optical effect successfully used to create 3-D is depth of field. Computer animators use this depth of field effect for two purposes. The first is to reinforce the illusion of depth in the scene you're watching; the second is to draw your attention to the part of the scene that is meant to be in focus. It's certainly possible for the computer to make sure that every item in a scene, no matter how near or far it's supposed to be, is perfectly in focus. Since we're used to seeing the depth of field effect, though, having items in focus regardless of distance would seem foreign and would disturb the illusion of watching a scene in the real world.
Antialiasing
A technique that also relies on fooling the eye is anti-aliasing. Digital graphics systems are very good at creating lines that go straight up and down the screen, or straight across. But when curves or diagonal lines show up (and they show up pretty often in the real world), the computer might produce lines that resemble stair steps instead of smooth flows. So to fool your eye into seeing a smooth curve or line, the computer can add graduated shades of the line's colour to the pixels surrounding the line. These "grayed-out" pixels will fool your eye into thinking that the jagged stair steps are gone. This process of adding additional coloured pixels to fool the eye is called anti-aliasing.
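As a rough illustration of this "graduated shades" idea, the sketch below anti-aliases a diagonal edge by supersampling: each screen pixel is split into sub-samples, and the pixel's grey level is the fraction of sub-samples that fall on the dark side of the edge. The grid size, the edge and the ASCII shading are all invented for illustration.

# Supersampling anti-aliasing sketch: pixels straddling the diagonal edge
# y = x get intermediate grey levels instead of a hard stair-step.
SIZE, SUB = 8, 4   # 8x8 pixel grid, 4x4 sub-samples per pixel

def coverage(px, py):
    """Fraction of sub-samples inside the region below the line y = x."""
    inside = 0
    for sy in range(SUB):
        for sx in range(SUB):
            x = px + (sx + 0.5) / SUB
            y = py + (sy + 0.5) / SUB
            if y < x:
                inside += 1
    return inside / (SUB * SUB)

for py in range(SIZE):
    row = ""
    for px in range(SIZE):
        c = coverage(px, py)
        row += " .:-=+*#"[int(c * 7.999)]   # map coverage to an ASCII shade
    print(row)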
Making 3-D Graphics Move
When you go to see a movie at the local theatre, a sequence of images called frames runs in front of your eyes at a rate of 24 frames per second. Since your retina will retain an image for a bit longer than 1/24th of a second, most people's eyes will blend the frames into a single, continuous image of movement and action. If you think of this from the other direction, it means that each frame of a motion picture is a photograph taken at an exposure of 1/24 of a second. If you look at a single frame from a movie about racing, you see that some of the cars are "blurred" because they moved during the time that the camera shutter was open. This blurring of things that are moving fast is something that we're used to seeing, and it's part of what makes an image look real to us when we see it on a screen. However, since digital 3-D images are not photographs at all, no blurring occurs when an object moves during a frame. To make images look more realistic, blurring has to be explicitly added by programmers. Some designers feel that "overcoming" this lack of natural blurring requires more than 30 frames per second, and have pushed their games to display 60 frames per second. While this allows each individual image to be rendered in great detail, and movements to be shown in smaller increments, it dramatically increases the number of frames that must be rendered for a given sequence of action. As an example, think of a chase that lasts six and one-half minutes. A motion picture would require 24 (frames per second) x 60 (seconds) x 6.5 (minutes) or 9,360 frames for the chase. A digital 3-D image at 60 frames per second would require 60 x 60 x 6.5, or 23,400 frames for the same length of time.
Creative Blurring
The blurring that programmers add to boost realism in a moving image is called "motion blur" (a form of temporal anti-aliasing). Copies of the moving object are left behind in its wake, with the copies growing ever less distinct and intense as the object moves farther away. The length of the object's trail, how quickly the copies fade away and other details will vary depending on exactly how fast the object is supposed to be moving, how close to the viewer it is, and the extent to which it is the focus of attention. As you can see, there are a lot of decisions to be made and many details to be programmed to make an object appear to move realistically. There are other parts of an image where the precise rendering of a computer must be sacrificed for the sake of realism. This applies both to still and moving images.
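A minimal sketch of the "fading copies" form of motion blur described above follows: a moving dot is drawn several times along its recent path with decreasing intensity, and the copies are accumulated into one frame. The 1-D "frame", copy count and fall-off factor are invented for illustration.

# Motion-blur sketch: accumulate several copies of a moving object along its
# path, each copy fainter than the last, so the object leaves a fading trail.
FRAME_LEN = 20
N_COPIES  = 5          # how many ghost copies to leave behind
FALLOFF   = 0.5        # each older copy is half as intense as the next

def render_with_blur(position, velocity):
    frame = [0.0] * FRAME_LEN
    for i in range(N_COPIES):
        x = int(position - i * velocity)     # step back along the path
        if 0 <= x < FRAME_LEN:
            frame[x] += FALLOFF ** i         # older copies are fainter
    return frame

frame = render_with_blur(position=12, velocity=2)
print(" ".join(f"{v:.2f}" for v in frame))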
2.3 How GPS Receivers Work
The Global Positioning System (GPS) is actually a constellation of 27 Earth-orbiting satellites (24 in operation and three extras in case one fails). The U.S. military developed and implemented this satellite network as a military navigation system, but soon opened it up to everybody else. Each of these 3,000- to 4,000-pound solar-powered satellites circles the globe at about 12,000 miles (19,300 km), making two complete rotations every day. The orbits are arranged so that at any time, anywhere on Earth, there are at least four satellites "visible" in the sky. A GPS receiver's job is to locate four or more of these satellites, figure out the distance to each, and use this information to deduce its own location. This operation is based on a simple mathematical principle called trilateration. Trilateration in three-dimensional space can be a little tricky.
GPS Calculations
At a particular time (say, midnight), the satellite begins transmitting a long digital pattern called a pseudo-random code. The receiver begins running the same digital pattern at exactly the same time. When the satellite's signal reaches the receiver, its transmission of the pattern lags a bit behind the receiver's playing of the pattern. The length of the delay is equal to the signal's travel time. The receiver multiplies this time by the speed of light to determine how far the signal travelled. Assuming the signal travelled in a straight line, this is the distance from the receiver to the satellite.

In order to make this measurement, the receiver and satellite both need clocks that can be synchronized down to the nanosecond. To make a satellite positioning system using only synchronized clocks, you would need atomic clocks not only on all the satellites, but also in the receiver itself. But atomic clocks cost somewhere between $50,000 and $100,000, which makes them just a bit too expensive for everyday consumer use. Every satellite contains an expensive atomic clock, but the receiver itself uses an ordinary quartz clock, which it constantly resets. In a nutshell, the receiver looks at incoming signals from four or more satellites and gauges its own inaccuracy. The correct time value will cause all of the signals that the receiver is receiving to align at a single point in space. That time value is the time value held by the atomic clocks in all of the satellites. So the receiver sets its clock to that time value, and it then has the same time value that all the atomic clocks in all of the satellites have. The GPS receiver gets atomic clock accuracy "for free."

When you measure the distance to four located satellites, you can draw four spheres that all intersect at one point. Three spheres will intersect even if your numbers are way off, but four spheres will not intersect at one point if you've measured incorrectly. Since the receiver makes all its distance measurements using its own built-in clock, the distances will all be proportionally incorrect. The receiver can easily calculate the necessary adjustment that will cause the four spheres to intersect at one point. Based on this, it resets its clock to be in sync with the satellites' atomic clocks. The receiver does this constantly whenever it's on, which means it is nearly as accurate as the expensive atomic clocks in the satellites.

In order for the distance information to be of any use, the receiver also has to know where the satellites actually are. This isn't particularly difficult because the satellites travel in very high and predictable orbits. The GPS receiver simply stores an almanac that tells it where every satellite should be at any given time.
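To make the sphere-intersection argument concrete, here is a small sketch (assumptions: known satellite positions, simulated pseudo-ranges, coordinates in kilometres) that solves for the receiver position and its clock error with a Gauss-Newton iteration using NumPy. It is illustrative only, not a real GPS solver; the satellite positions and receiver state are invented.

import numpy as np

C = 299_792.458  # speed of light in km/s

# Assumed (invented) satellite positions in km and the receiver's true state.
sats = np.array([[15600.0,  7540.0, 20140.0],
                 [18760.0,  2750.0, 18610.0],
                 [17610.0, 14630.0, 13480.0],
                 [19170.0,   610.0, 18390.0]])
true_pos  = np.array([-40.0, -20.0, 6370.0])
true_bias = 1e-4            # receiver clock error in seconds

# Pseudo-ranges: geometric distance plus the range error caused by the clock bias.
rho = np.linalg.norm(sats - true_pos, axis=1) + C * true_bias

# Gauss-Newton: solve for (x, y, z, clock bias) from the four pseudo-ranges.
x = np.zeros(4)             # initial guess: Earth's centre, zero bias
for _ in range(10):
    d = np.linalg.norm(sats - x[:3], axis=1)
    residual = rho - (d + C * x[3])
    J = np.hstack([-(sats - x[:3]) / d[:, None],      # d(pseudo-range)/d(position)
                   np.full((len(sats), 1), C)])       # d(pseudo-range)/d(bias)
    x += np.linalg.lstsq(J, residual, rcond=None)[0]

print("estimated position (km):", np.round(x[:3], 3))
print("estimated clock bias (s):", x[3])

The fourth unknown, the clock bias, is exactly the "adjustment that will cause the four spheres to intersect at one point" described above.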














CHAPTER 3
ALGORITHM TO PROVIDE BEST THIRD PERSON PERSPECTIVE


Fig 2 Left: Schema of the disposition of the cameras (in blue) in the room in order to cover the whole space (their angles of view are represented by lines in different colors); each camera is coupled with a picture representing a snapshot of the video flow it sends onto the network. Right: Schema of the system architecture.

A well-known way to reduce marker occlusions consists in working with multiple cameras. Indeed, even if the marker is hidden from one camera, the other cameras (due to their strategic positions) should still be able to detect it. The second issue, registration, is one of the main issues in AR software and may be a source of motion sickness. It has been shown that the use of a fixed camera considerably reduces this problem. We will therefore use several fixed cameras to provide the third-person perspective (3PP) when the user is moving, and the first-person perspective (1PP) for fine manipulation with a camera mounted on the user's HMD.

3.1 Description of the system
The goal of the system is to provide the "best view" to a user who can move about in several rooms and manipulate objects in augmented reality. These two actions require different perspectives: third-person perspective (3PP) for navigation tasks and first-person perspective (1PP) while manipulating an object with the hands. In our case, instead of having a camera that follows the user for the 3PP, we decided to use multiple fixed cameras. Consequently, the user does not need to worry about collisions of a cumbersome backpack with walls, ceilings, doors, etc.

As there are multiple cameras, we need a system that will automatically detect which camera should be activated for the user's best view. For this simulation, we work in an area of two adjacent rooms in which three cameras have already been placed at strategic positions. The system considers two networks of three cameras each (one network per room). Our system first has to localize the user, i.e. determine in which room he or she currently is. Once that is done, depending on the visibility of the markers and on the displacement of the user, our system chooses which video stream to provide to the user.

Video clients are linked to a webcam. They have three tasks to perform in parallel: acquire the webcam video flow with the help of the Digital Signal Video Library and process it with ARToolKit to detect whether there are visible markers; stream the processed video flow onto their network continuously; and connect to the server of their network to transmit the marker-visibility status (ARToolKit). The servers also have three main tasks to perform: detect and accept all three video clients (plus the user) of their network in order to receive their marker-visibility status; choose the best video stream to indicate to the user client; and inform the user client which video stream to connect to. The user client is composed of a notebook (running Windows Vista, with a Wi-Fi antenna to connect to the network server) linked to a webcam, an inertial tracker (user movements), a Wi-Fi USB adapter (user location), and a video see-through HMD (user video feedback). The main rules of the algorithm are: no movement detected by the inertial tracker on the user directly leads to choosing 1PP, and detected movement leads to proposing 3PP.
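The perspective-selection rule can be sketched as follows (movement leads to 3PP from a fixed camera that sees the user's marker, no movement leads to 1PP from the HMD camera); the camera names, the marker-visibility data structure and the function name are invented for illustration and do not reproduce the actual implementation.

# Sketch of the perspective-selection rule: if the inertial tracker reports
# movement, serve a third-person view from a fixed camera that sees the
# user's marker; otherwise serve the first-person HMD camera view.
def choose_stream(moving, marker_visibility, hmd_camera="hmd_cam"):
    """
    moving            -- True if the inertial tracker detects user movement
    marker_visibility -- dict mapping fixed camera id -> True/False
                         (does that camera currently see the user's marker?)
    Returns the id of the video stream the user client should connect to.
    """
    if not moving:
        return hmd_camera                       # fine manipulation: 1PP
    candidates = [cam for cam, visible in marker_visibility.items() if visible]
    if candidates:
        return candidates[0]                    # a camera that sees the marker: 3PP
    return hmd_camera                           # fall back to 1PP if no camera sees it

# Example: the user is walking and cameras A and C currently see the marker.
print(choose_stream(True,  {"cam_A": True, "cam_B": False, "cam_C": True}))
print(choose_stream(False, {"cam_A": True, "cam_B": False, "cam_C": True}))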










CHAPTER 4
APPLICATIONS IN EDUCATION


Fig 3
4.1 MAGIC BOOK
Many applications have been implemented using AR technology in education; one of the most interesting is the Magic Book, developed by Mark Billinghurst. In the Magic Book experiment, normal books were used as the main interface objects. The user can read the book pages, look at the pictures and turn the pages without any additional technology. In another work, a Narrative-based, Immersive, Collaborative Environment aimed to explore virtual reality as a learning medium. The project was designed for children aged 6 to 10 to learn biology subjects. The children, as shown in the figure, are represented in the virtual space by avatars while moving around in the space. The children meet a talking signpost that helps to direct them in the virtual world as they explore the garden.



Fig 4


Geometry is the area of mathematics that has attracted the most interest for AR applications in education. Kaufmann used an AR application to allow the use of Construct3D in classrooms and to support teacher-student interaction. Another idea in the project presents the Augmented Classroom, shown in the figure below, where the students can interact with the teacher and the subject using AR as a learning medium. The evaluation of this application showed that students achieved very positive and encouraging results. Kaufmann also presented research titled "The Potential of AR in Dynamic Geometry Education", which summarizes his developments in the areas of dynamic 3D geometry, usability design, spatial abilities, pedagogy in VR/AR environments and low-cost VR systems; the research shows that Construct3D is easy to use, requires little time to learn, encourages learners to explore geometry and can be used in a consistent way.

Figure: Components of the Magic Book using a handheld display interface.
4.2 AUGMENTED CLASSROOM

Fig 5
4.3 MAGIC MIRROR

Fig 6 A screen shot of magic mirror
The magic mirror consists of a PC with a display, a webcam, and a small wooden pedestal with an embedded RFID antenna. A child can use the magic mirror by simply placing a figure on the pedestal: the mirror will then switch from displaying the child's reflection (using the webcam) to displaying the learning modules associated with this figure. In order to recognize the child, we provide them with a personal magic item (e.g., a brooch, a magic card, or a figurine) that they need to wear or place next to the mirror in order to activate it. Each magic item contains an RFID transponder that identifies the child in the game environment, thus supporting an individual learning history for each child in order to keep track of his or her progress.
The magic mirror then displays the educational content available for this play figure. Depending on the available content, the children can select from a number of different learning modules and retrieve information about the figure as well as facts about the Middle Ages in general. In contrast to the verbal commentaries, the magic mirror is much more powerful in terms of feedback (i.e., text, pictures, and videos). The main advantage, however, is the higher level of interactivity: while the verbal commentaries during play allow for some interaction with the children, the magic mirror is capable of more sophisticated selection and feedback processes, such as quizzes and puzzles based on the previously displayed educational content.

In addition to the magic mirror, a PDA-based solution was implemented. This approach is mainly motivated by two trends regarding mobile phones and similar devices: first, these devices are steadily becoming more powerful, with novel capabilities being added and old ones improved constantly. Second, the number of children in possession of mobile phones is continuously growing, even at elementary school age. In other words, small mobile devices must be considered seriously when designing pervasive (computing) environments, even for children.
The PDA can be seen as a pocket magic mirror and principally offers the same functionality as the magic mirror. The problem is that a PDA has limited resources by comparison, and a standard browser on the PDA is often not sufficient, since it usually displays the original website with scroll bars; this necessitates a custom-built user interface. We thus implemented our own user interface with Microsoft Visual Studio .NET. The PDA, an HP iPAQ hx2400, has integrated WiFi and an attached RFID reader. This device is capable of displaying almost all learning modules; the multimedia content is dynamically adjusted to the I/O capabilities of the device (e.g., images and videos are resized accordingly).


Fig 7 Entity-relationship diagram of the system (first and second parts)
With regard to the augmented toy environment and the individual interests, we can derive the following use cases: Children can interact with the AKC and retrieve educational content using the magic mirror or the PDA.
Parents and educators can selectively activate educational content for the children; for them, a web-based user interface is provided. Furthermore, they can review each child's individual learning history and progress (i.e., what modules the child has viewed, what quizzes have been solved, etc.). Developers can create and modify learning modules and integrate multimedia content using a content management system.

Central to the infrastructure are the play objects and the associated educational content. The educational content is delivered in learning modules, with each module representing a complete and closed topic. Each module is divided into several consecutive levels, which reflect the gradually increasing complexity and difficulty of the content. Each level is associated with multimedia content. A child can select any level of a module that is unlocked. While most levels of a module are available from the beginning, a level can have a quiz that must be solved in order to proceed to (unlock) the next level; these levels cannot be skipped. This not only ensures a revisable learning progress but is even necessary when a higher level depends on the information provided in previous levels. In addition, each module might be available in several languages. Parents and educators configure which languages should be available, and the children can switch between these selected languages by simply pressing a button. The presentation of the same educational content in different languages can help the children learn foreign languages in a playful manner.

The modules are associated with play objects using keywords. To this end, each figure and each learning module has a set of keywords associated with it. The keywords are meant to describe them as precisely as possible. They are assigned by the designers of the figures and the developers of the learning modules, respectively. A keyword can be labelled sensitive by the designer if the content is potentially critical (e.g., learning modules about weapons, torture methods, sexuality, etc.) and also has a minimum-age attribute. Additionally, the keywords can be weighted to reflect their relevance for a figure or learning module. A learning module is relevant for a figure if one or more keywords match. This approach has several advantages compared to fixed associations:
• There can be many designers of educational content and figures; no synchronization or coordination is required.
• Newly created learning modules can easily be associated with figures already in existence.
• The number of learning modules is not limited; for each figure there is potentially a great variety of different modules to pick from.
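As a sketch of the keyword-based association just described, the function below matches a figure's keywords against each module's keywords, filters out sensitive content and modules above the child's age, and ranks the remainder by summed keyword weights. All names, fields and example data are invented for illustration, not taken from the actual system.

# Sketch of keyword-based matching between play figures and learning modules:
# a module is relevant if at least one keyword matches, sensitive keywords can
# be filtered out, and the minimum-age attribute is respected.
def relevant_modules(figure_keywords, modules, child_age, allow_sensitive=False):
    """figure_keywords: dict keyword -> weight; modules: list of dicts."""
    results = []
    for module in modules:
        if module["min_age"] > child_age:
            continue
        if module["sensitive"] and not allow_sensitive:
            continue
        # Sum the weights of the keywords the figure and the module share.
        score = sum(figure_keywords[k] * w
                    for k, w in module["keywords"].items()
                    if k in figure_keywords)
        if score > 0:                      # at least one keyword matched
            results.append((score, module["title"]))
    return [title for score, title in sorted(results, reverse=True)]

knight = {"knight": 1.0, "castle": 0.8, "middle ages": 0.5}
modules = [
    {"title": "Life in a castle",  "keywords": {"castle": 1.0}, "min_age": 6,  "sensitive": False},
    {"title": "Medieval weapons",  "keywords": {"knight": 0.7}, "min_age": 10, "sensitive": True},
]
print(relevant_modules(knight, modules, child_age=8))   # -> ['Life in a castle']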

4.4 AR SUPPORTED ENGLISH LEARNING SYSTEM
Many positioning technologies have been used to obtain context-aware information, such as 802.11, IrDA, RFID and GPS. In addition, 2D barcode technology has many benefits and has the potential to be applied in various fields; a comparison of the aforementioned positioning technologies is shown in Table 1. 2D barcodes have many advantages, including large storage capacity, high information density, strong encoding, strong error correction, high reliability, low cost, and ease of printing. A variant of this technology, the Quick Response (QR) Code, was created by the Japanese corporation Denso-Wave in 1994 and placed in the public domain. The QR Code was designed for rapid reading using CCD array cameras and image processing technology. More recently, the inclusion of QR Code reading software in camera phones in Taiwan has become more and more popular. Therefore, this study used QR Code technology to conduct context-aware learning, placing a number of QR barcodes around campuses.

The fundamental concept of the proposed English learning system in this study is that the digitized learning materials are stored in a learning server. The link information between context-aware materials and learning zones is encoded in 2D barcodes. Each student carries a PDA phone with a video camera and wireless local area network (WLAN) access. A student uses the guide map displayed on the PDA screen to approach a zone on campus and uses the PDA phone to decode the 2D barcode, which is printed on paper and attached to a wall or information board in the zone. The decoded 2D barcode information is then sent to the learning server to query content via the WLAN. The learning server sends context-aware content to the student's handheld learning device. Then, using AR technology, the learning device superimposes a 3D animated virtual learning partner (VLP) over the real-world zone image to create a new image presented on the learning device. The student completes the context-aware learning process by talking to the VLP. The accompanying figure shows the learning device and the QR barcode.

This study integrates 2D barcodes, the Internet, mobile computing, wireless communication and database technologies to construct an interactive, mobile and context-aware learning environment, the Handheld English Language Learning Organization (HELLO). HELLO consists of two subsystems: the HELLO server, which is a learning server, and m-Tools, which is a software application. Teachers and system administrators can utilize personal computers to connect with the HELLO server via the Internet. The functions of the HELLO server are:
• Content Management (CM) unit: College affairs staff stipulate self-study courses and store the learning materials in the Content Database (CDB).
• Assessment Management (AM) unit: College teachers can give assessments to students to evaluate their learning results.
• Portfolio Management (PM) unit: Students can upload their portfolios onto the Evaluation Database (EDB); teachers can then review students' portfolios and give grades through the PM unit.
• Forum unit: Through this unit, teachers can instruct students and classmates can share learning experiences with each other.
• Push unit: The push unit automatically delivers a daily sentence to students to offer practical conversation material.

Students use a PDA phone installed with m-Tools to pursue mobile English learning. The functionalities of m-Tools are:
• Listening and Reading: The m-Player can be used to download course materials; the student can then read articles and news, or listen to conversations from the HELLO server.
• Playing: The m-Player can be used to play learning games or English songs.
• Speaking: To improve speaking ability, students can utilize the m-Speaker. The m-Speaker superimposes the VLP on the learning zone image (captured from the m-Camera), so students feel as if they are talking to a person in the real world. The m-Speaker automatically stores students' speaking patterns and displays a graph comparing their sounds with those of the virtual teacher. By referring to this graph, students can correct their pronunciation.
• Writing: Students can utilize the m-Writer to pen an article or diary in English.
• Context-awareness: When a student holds the PDA phone near a zone with an attached 2D barcode, the m-Reader on the PDA phone decodes the internal code, which the PDA phone sends to the HELLO server. The HELLO server then downloads context-aware content to the PDA phone.
• Evaluation: Each student can use the m-Test to take tests and evaluate his or her learning achievement. Moreover, learning records can be stored in the m-Portfolio through the Human Control Interface (HCI) after each student finishes his or her learning tasks. Upon completion, the student's learning portfolio can be uploaded to the EDB of the HELLO server for the teacher's review.
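The end-to-end flow of the context-aware lookup (decode the 2D barcode, send the zone code to the learning server, receive that zone's content) might be sketched as follows; the zone codes, the server-side table and the function names are all hypothetical stand-ins, not the actual HELLO implementation.

# Sketch of HELLO's context-aware lookup: the PDA decodes a QR barcode to a
# zone code, sends it to the learning server, and receives the content for
# that learning zone. The data and names below are purely illustrative.
ZONE_CONTENT = {                      # stand-in for the server's Content Database
    "ZONE-LIBRARY":   {"dialogue": "Where can I borrow this book?",       "vlp": "librarian.obj"},
    "ZONE-CAFETERIA": {"dialogue": "I would like a cup of tea, please.",  "vlp": "waiter.obj"},
}

def server_lookup(zone_code):
    """Learning-server side: return the context-aware material for a zone."""
    return ZONE_CONTENT.get(zone_code, {"dialogue": "Zone not found.", "vlp": None})

def handle_scanned_barcode(decoded_text):
    """PDA side: called after the m-Reader has decoded the QR barcode."""
    content = server_lookup(decoded_text.strip())
    # In the real system the 3-D virtual learning partner (VLP) would be
    # superimposed on the camera image here; this sketch only prints the dialogue.
    print("VLP model:", content["vlp"])
    print("VLP says:", content["dialogue"])

handle_scanned_barcode("ZONE-LIBRARY")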



Fig 8




CHAPTER 5
LATEST DEVELOPMENTS IN AR


5.1 SIXTH SENSE
SixthSense is a wearable, gesture-based device that augments the physical world with digital information and lets people use natural hand gestures to interact with that information. It was developed by Pranav Mistry, a PhD student in the Fluid Interfaces Group at the MIT Media Lab, and caused a storm when it was unveiled. SixthSense allows us to interact with our world like never before: we can get information on anything we want, from anywhere, within a few moments. We will be able to interact on a whole new level not only with things but also with people. One notable capability of the device is its ability to scan objects or even people and project information about what you are looking at.











Fig 9 Recent Prototype


The SixthSense prototype is composed of a pocket projector, a mirror and a camera. The hardware components are coupled in a pendant-like mobile wearable device. Both the projector and the camera are connected to the mobile computing device in the user's pocket. Sixth Sense technology can be considered a blend of the computer and the cell phone: the device is hung around the user's neck, and the projection is produced by the micro projector attached to it. In effect, the wearer becomes a moving computer, with the fingers acting as a mouse and a keyboard. The prototype was built from an ordinary webcam and a battery-powered 3M projector, with an attached mirror, all connected to an Internet-enabled mobile phone. The setup, which costs less than $350, allows the user to project information from the phone onto any surface: walls, the body of another person, or even your own hand. Mistry wore the device on a lanyard around his neck, and coloured Magic Marker caps on four fingers (red, blue, green and yellow) helped the camera distinguish the four fingers and recognize his hand gestures with software that Mistry created.
5.2 WORKING OF SIXTH SENSE TECHNOLOGY
Components
The hardware components are coupled in a pendant-like mobile wearable device:
• Camera
• Projector
• Mirror
• Mobile Component
• Colour Markers

Camera: A webcam captures and recognises an object in view and tracks the user's hand gestures using computer-vision based techniques. It sends the data to the smartphone. The camera, in a sense, acts as a digital eye, seeing what the user sees. It also tracks the movements of the thumbs and index fingers of both of the user's hands. The camera recognizes objects around you instantly, with the micro projector overlaying the information on any surface, including the object itself or your hand.

Projector: The projector opens up interaction and sharing. It contains a battery with about three hours of battery life and projects visual information, enabling surfaces, walls and physical objects around us to be used as interfaces. The aim is to merge the digital information with the physical world in a real physical sense: you are touching an object and projecting information onto that object, so the information looks as if it is part of the object. A tiny LED projector displays data sent from the smartphone on any surface in view: an object, a wall, or a person.

Mirror: The mirror is significant because the projector dangles from the neck pointing downwards; the mirror redirects the projection onto the surface in front of the user.

Mobile Component: Mobile devices such as smartphones in our pockets transmit and receive voice and data anywhere and to anyone via the mobile Internet. An accompanying smartphone runs the SixthSense software and handles the connection to the Internet. A web-enabled smartphone in the user's pocket processes the video data, and other software searches the Web and interprets the hand gestures.

Colour Markers: These are placed at the tips of the user's fingers. Marking the user's fingers with red, yellow, green, and blue tape helps the webcam recognize gestures. The movements and arrangements of these markers are interpreted into gestures that act as interaction instructions for the projected application interfaces.


Fig 10
Working
• The hardware that makes SixthSense work is a pendant-like mobile wearable interface
• It has a camera, a mirror and a projector, and is connected wirelessly to a smartphone (via Bluetooth, 3G or Wi-Fi) that can slip comfortably into one's pocket
• The camera recognizes individuals, images, pictures, and gestures one makes with one's hands
• Information is sent to the smartphone for processing
• The downward-facing projector projects the output image onto the mirror
• The mirror reflects the image onto the desired surface
• Thus, digital information is freed from its confines and placed in the physical world

The software program analyses the video data captured by the camera and tracks the locations of the coloured markers using simple computer-vision techniques. One can have any number of hand gestures and movements, as long as they can all be reliably identified and differentiated for the system to interpret, preferably through unique and varied fiducials. This is possible only because the SixthSense device supports multi-touch and multi-user interaction.
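A minimal sketch of fingertip-marker tracking by colour thresholding follows, written with OpenCV (assuming the opencv-python package and a webcam at index 0); the HSV ranges for the coloured caps are rough guesses that would need tuning, and this is only an illustration of the general technique, not Mistry's actual software.

import cv2
import numpy as np

# Rough HSV ranges for the coloured fingertip caps (values are guesses).
MARKER_RANGES = {
    "red":    ((0, 120, 120),  (10, 255, 255)),
    "green":  ((45, 80, 80),   (75, 255, 255)),
    "blue":   ((100, 120, 80), (130, 255, 255)),
    "yellow": ((20, 120, 120), (35, 255, 255)),
}

def find_markers(frame_bgr):
    """Return {colour: (x, y)} centroids of the coloured caps in one frame."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    positions = {}
    for colour, (lo, hi) in MARKER_RANGES.items():
        mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
        m = cv2.moments(mask)
        if m["m00"] > 500:                      # enough pixels matched this colour
            positions[colour] = (int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"]))
    return positions

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    print(find_markers(frame))                  # e.g. {'red': (312, 240), ...}
cap.release()

A gesture recognizer would then run on the sequence of marker positions over time, for example detecting a pinch when two fingertip centroids move towards each other.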
The software recognizes three kinds of gestures:
• Multitouch gestures, like the ones seen on Microsoft Surface or the iPhone, where you touch the surface and make a map move by pinching and dragging.
• Freehand gestures, such as framing a scene with your fingers to take a picture, or a namaste gesture to start the projection on the wall.
• Iconic gestures, drawing an icon in the air: for example, drawing a star to show the weather, or a magnifying glass to show the map. Other everyday gestures can be used as well; the system is very customizable.
The technology is mainly based on hand gesture recognition, image capturing, processing, and manipulation, etc. The map application lets the user navigate a map displayed on a nearby surface using hand gestures, similar to gestures supported by multi-touch based systems, letting the user zoom in, zoom out or pan using intuitive hand movements. The drawing application lets the user draw on any surface by tracking the fingertip movements of the user’s index finger.



5.3 APPLICATIONS
The SixthSense prototype implements several applications that demonstrate the usefulness, viability and flexibility of the system. The SixthSense device has a huge number of potential applications; the following are a few of them.



Fig 11 Multimedia reading experiences Fig 12 Checking time


Fig 13 Taking pictures Fig 14 Feed information on people

• Make a call
• Call up a map
• Check the time
• Create multimedia reading experiences
• Drawing application
• Zooming features
• Get product information
• Get book information
• Get flight updates
• Feed information on people
• Take pictures
• Check email
















CHAPTER 6
CONCLUSION


AR has significant effects on the process of learning. The SixthSense prototype implements several applications that demonstrate the usefulness, viability and flexibility of the system. Clearly, it has the potential to become the ultimate "transparent" user interface for accessing information about everything around us, provided the coloured finger caps can be eliminated and the system moves beyond the initial development phase. Even as it is now, it may change the way we interact with the real world and give everyone a more complete awareness of the environment around them. There are also privacy concerns: image-recognition software coupled with AR will, quite soon, allow us to point our phones at people, even strangers, and instantly see information from their Facebook, Twitter, Amazon, LinkedIn or other online profiles. The future of augmented reality is clearly bright, even as it has already found its way into our cell phones and other areas.
















CHAPTER 7
BIBLIOGRAPHY


• Waleed Fayiz Maqableh, Manjit Singh Sidhu, "From Boards to Augmented Reality Learning", IEEE, 2010.
• Steve Hinske, Marc Langheinrich, "An Infrastructure for Interactive and Playful Learning in Augmented Toy Environments", IEEE, 2009.
• Tsung-Yu Liu, Tan-Hsu Tan, Yu-Ling Chu, "2D Barcode and Augmented Reality Supported English Learning System", IEEE, 2007.
• http://en.wikipedia.org/wiki/Augmented_reality
