
    Elliott Coleshill, University of Guelph, CIS, Guelph, Ont, Canada

    Dr. Alex Ferworn, Ryerson University, NCART, Toronto, Ont, Canada

    Dr. Deborah Stacey, University of Guelph, CIS, Guelph, Ont, Canada

    Abstract – Image enhancement within machine vision systems has been performed in a variety of ways, with the end goal of producing a better scene for further processing. Most approaches to enhancing an image scene and producing higher-quality images use only a single image. An alternative approach, yet to be fully explored, is to use image sequences, drawing on multiple image frames of information to generate an enhanced image for processing within a machine vision system.

    This paper describes a new approach to image enhancement, called Frame Extraction Through Time (FETT), for controlling the lighting characteristics within an image. Using multiple image frames of data, FETT can perform a number of lighting enhancement tasks, such as scene normalization, nighttime traffic surveillance, and shadow removal for video conferencing systems.

    Keywords: Traffic Monitoring, Vision Systems, Image Processing, Computer Vision

    Introduction

    Lighting characteristics within an image scene are often a major source of problems for today's machine vision systems. Most vision systems discard an image that has poor lighting [1] and capture a new one after some amount of time. Other vision systems attempt to adjust the lighting within the image using techniques such as histogram equalization and lighting normalization [2][3]. The difficulty with these and similar approaches is that they tend to lose key details and information required by post-processing techniques such as object or edge detection.

    One of the main reasons for this loss of detail and information is the lack of data provided to the system: most approaches employ only a single image for processing. There has been some investigation into using multiple images; however, it has been mainly in the areas of background subtraction and motion detection [4]. The new FETT design employs multiple images, providing more data for generating a new image with superior lighting characteristics for post-processing purposes.

    The Theory of FETT

    From a sequence of images of a scene taken through time by a video camera, a single image of the scene can be created whose lighting characteristics help to prevent failures in vision system detection algorithms.

    FETT can be divided into three main steps: acquiring images, pixel selection, and finally the generation of a superior image that represents the scene with better lighting and contrast characteristics.

    Step #1: Acquiring Images

    A sequence of images is taken over time to provide the FETT algorithm with the changing lighting characteristics of the scene. The number of images required depends on how frequently the lighting over the scene changes. Figure 1 shows a sample of input images for FETT.

    Figure 1: Lighting Sequence Example

    T. Sobh (ed.), Advances in Computer and Information Sciences and Engineering, 131–135. © Springer Science+Business Media B.V. 2008

    Image Enhancement Using Frame Extraction Through Time

    Step #2: Pixel Selection

    Average pixel intensity values for each input image are used to generate the scale required to extract the necessary pixel intensities. Each of the input images i through n is aligned, and a grid is defined such that point [1,1] in image 1 corresponds to the same scene position as point [1,1] in image n. A scale is then generated from the pixel intensities using the following equations:

    $$\varpi = \sum_{c \in \{R,G,B\}} c = R + G + B \qquad (1)$$

    $$\psi = \frac{\sum_{i=0}^{n} \varpi}{w \times h} \qquad (2)$$

    Using (1), the Red, Green, and Blue pixel intensities (R, G, B) at each pixel position within an image are summed and used as the true pixel value ϖ for that position. The next step is to calculate, using equation (2), the average ψ of the current pixel position over all the input images provided. This is done by summing all the calculated pixel values across the input sequence and dividing by the size of the scene, where w is the width of the scene in pixels and h is its height in pixels.
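    A minimal sketch of equations (1) and (2) in plain Python, assuming an image is a list of rows of (R, G, B) tuples with 8-bit channels. The text describes ψ both as an average per pixel position and as a sum divided by the scene size w × h; this sketch follows the opening sentence of Step #2 and computes ψ once per input frame. The names `true_pixel_value` and `frame_average` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), 8-bit channels assumed
Image = List[List[Pixel]]      # h rows of w pixels each


def true_pixel_value(pixel: Pixel) -> int:
    """Equation (1): the true pixel value is R + G + B."""
    r, g, b = pixel
    return r + g + b


def frame_average(image: Image) -> float:
    """Equation (2), read (by assumption) as a per-frame mean: the sum of
    true pixel values over the whole frame divided by the scene size w x h."""
    h = len(image)
    w = len(image[0])
    total = sum(true_pixel_value(p) for row in image for p in row)
    return total / (w * h)


frame = [[(10, 20, 30), (255, 255, 255)],
         [(0, 0, 0), (100, 100, 100)]]
print(frame_average(frame))  # 281.25
```

    Here the four true values are 60, 765, 0 and 300, so the frame average is 1125 / 4.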

    Using the average input intensity calculated above, a scale is generated by selecting the high (τ) and low (σ) pixel values and then calculating the midpoint (ς) of the scale using equation (3).

    $$\varsigma = \frac{\tau - \sigma}{n} \qquad (3)$$
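    A sketch of the scale construction under stated assumptions. The paper does not say over which values τ and σ are taken or exactly what n denotes in equation (3); here τ and σ are assumed to be the highest and lowest per-frame averages ψ, and n the number of input frames. The function name `build_scale` is hypothetical, and the midpoint follows equation (3) literally.

```python
from typing import List, Tuple


def build_scale(frame_averages: List[float]) -> Tuple[float, float, float]:
    """Return (tau, sigma, varsigma) from per-frame averages psi.

    Assumptions not fixed by the paper: tau and sigma are the highest and
    lowest frame averages, and n in equation (3) is the number of frames.
    """
    if not frame_averages:
        raise ValueError("at least one frame average is required")
    tau = max(frame_averages)      # high end of the scale
    sigma = min(frame_averages)    # low end of the scale
    n = len(frame_averages)
    varsigma = (tau - sigma) / n   # equation (3), applied literally
    return tau, sigma, varsigma


print(build_scale([100.0, 400.0, 250.0]))  # (400.0, 100.0, 100.0)
```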

    Step #3: Generation of FETT Image

    The final step of the FETT algorithm is to extract all the “good” information from the input image sequence and reconstruct a new, optimized image of the scene with better lighting characteristics. This is accomplished by selecting, from the input sequence {ϖ_i, ..., ϖ_n}, the pixel (φ) that best represents the calculated midpoint (ς):

    $$\varphi = \left(\sigma \le \varpi \le \tau\right) \cong \varsigma \qquad (4)$$

    The selected pixel (φ) is then copied into the final image at the same grid position as defined in the input image.
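    A sketch of Step #3, assuming all frames are already aligned to the same grid and share one (σ, τ, ς) scale. Each output pixel is copied from the frame whose true value R + G + B falls inside [σ, τ] and lies closest to ς, reading equation (4) as a nearest-to-midpoint search. The fallback used when no frame falls inside the range, and the name `fett_reconstruct`, are assumptions of this sketch rather than details from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]


def fett_reconstruct(frames: List[Image], sigma: float, tau: float,
                     varsigma: float) -> Image:
    """Build the output image one grid position at a time."""
    h, w = len(frames[0]), len(frames[0][0])
    out: Image = []
    for y in range(h):
        row: List[Pixel] = []
        for x in range(w):
            # The same grid position in every aligned frame.
            pixels = [f[y][x] for f in frames]
            # Candidates whose true value R + G + B lies on the scale [sigma, tau].
            in_range = [p for p in pixels if sigma <= sum(p) <= tau]
            # Assumed fallback: if nothing is in range, consider every frame.
            candidates = in_range or pixels
            # phi: the candidate whose true value is closest to the midpoint.
            best = min(candidates, key=lambda p: abs(sum(p) - varsigma))
            row.append(best)
        out.append(row)
    return out


f1 = [[(255, 255, 255), (50, 50, 50)]]
f2 = [[(100, 100, 100), (250, 250, 250)]]
print(fett_reconstruct([f1, f2], sigma=0, tau=700, varsigma=300))
# [[(100, 100, 100), (50, 50, 50)]]
```

    In the example, the saturated pixels (true values 765 and 750) fall outside the assumed scale, so each position takes its pixel from the other frame.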

    Testing Results

    A series of controlled and uncontrolled datasets was generated to test and verify the new FETT theory. These datasets contained image sequences and videos ranging from simple scenes of individual objects, with controlled dynamic light moving across the object, to real-world applications such as space-based imagery, traffic surveillance, and video conferencing, where lighting changes cannot be controlled. Dynamic light, as defined herein, is bright light introduced into the scene that causes the camera to saturate and features to be lost.

    To measure the success of the FETT algorithm, a Root Mean Square Error (RMSE) measure and a histogram comparison were used. For each of the controlled test cases, a picture was taken with standard lighting (i.e. no introduced dynamic light), called the “Normal”. This Normal image was used as the “ground truth” for comparison. Figure 2 shows an example of a “Normal” image (b) alongside sample input images (a).

    Figure 2: Normal Example

    The RMSE was calculated for the average pixel intensity across the input images and for the final FETT Optimized image, and the two were compared. The graph in Figure 3 shows that the RMSE values calculated for the FETT Optimized images

    COLESHILL ET AL. 132

    are greater than the average of the input images in most cases. In the cases where the FETT Optimized RMSE was less than the input average, it was determined that shadowing effects introduced during the FETT process were reducing the average pixel intensity values. However, all dynamic lighting was removed, showing that the average pixel intensity of the FETT Optimized image was closer to the Normal than the average of the input images.
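    The paper does not state exactly which quantities enter the RMSE, so the sketch below assumes a standard per-pixel RMSE against the Normal image, computed on the true value R + G + B, together with the mean of that RMSE over the raw input frames for comparison. The names `rmse` and `mean_input_rmse` are hypothetical.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]


def rmse(image: Image, normal: Image) -> float:
    """Root mean square error between two equally sized images, computed
    on the true value R + G + B of each pixel (an assumed choice)."""
    if len(image) != len(normal) or any(
            len(a) != len(b) for a, b in zip(image, normal)):
        raise ValueError("images must have the same dimensions")
    squared = [
        (sum(p) - sum(q)) ** 2
        for row_p, row_q in zip(image, normal)
        for p, q in zip(row_p, row_q)
    ]
    return math.sqrt(sum(squared) / len(squared))


def mean_input_rmse(inputs: List[Image], normal: Image) -> float:
    """Average RMSE of the raw input frames against the Normal image."""
    return sum(rmse(f, normal) for f in inputs) / len(inputs)


normal = [[(0, 0, 0), (0, 0, 0)]]
bright = [[(1, 1, 1), (1, 1, 1)]]
print(rmse(bright, normal))                      # 3.0
print(mean_input_rmse([bright, normal], normal))  # 1.5
```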

    Figure 3: RMSE Graph

    A comparison of the image histograms was also performed. The Normal image, the FETT Optimized image, and one of the input images were used to compare the pixel graph, mean, and standard deviation. Figures 4 and 5 show the pixel intensity mean and standard deviation for a sample set from the controlled dataset. Again, the overall mean and standard deviation of the intensity in the FETT Optimized image move closer to those of the Normal image, showing that the dynamic light is being filtered out and the scene is being reconstructed to resemble the Normal.
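    A sketch of the mean and standard deviation comparison. It assumes the grey level of a pixel is the mean of R, G and B and uses the population standard deviation; the paper specifies neither, and the function names are hypothetical.

```python
import statistics
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]


def intensity_stats(image: Image) -> Tuple[float, float]:
    """Mean and population standard deviation of the grey levels,
    where grey level = (R + G + B) / 3 (an assumption)."""
    values = [sum(p) / 3 for row in image for p in row]
    return statistics.fmean(values), statistics.pstdev(values)


def distance_to_normal(image: Image, normal: Image) -> Tuple[float, float]:
    """Absolute gaps in mean and standard deviation from the Normal image;
    smaller gaps mean the image statistics are closer to the Normal."""
    m_img, s_img = intensity_stats(image)
    m_ref, s_ref = intensity_stats(normal)
    return abs(m_img - m_ref), abs(s_img - s_ref)


img = [[(0, 0, 0), (30, 30, 30)]]
ref = [[(15, 15, 15), (15, 15, 15)]]
print(distance_to_normal(img, ref))  # (0.0, 15.0)
```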

    Figure 4: Histogram Mean Graph

    Figure 5: Histogram Standard Deviation Graph

    A simple review of the histogram pixel plots also shows the improvement in overall scene lighting. Figure 6 contains histograms of a single input image (a) and of the FETT Optimized image (b). The spike of pixels at the white end of the scale is removed in the FETT Optimized image.
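    The white-end spike can be quantified with a simple histogram count. This sketch builds a 256-bin grey-level histogram (grey level assumed to be the integer mean of R, G and B) and reports the fraction of pixels at or above a threshold; the default of 250 and the function names are arbitrary illustrative choices.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), 8-bit channels assumed
Image = List[List[Pixel]]


def grey_histogram(image: Image) -> List[int]:
    """256-bin histogram of grey levels, using the integer mean of R, G, B."""
    bins = [0] * 256
    for row in image:
        for r, g, b in row:
            bins[(r + g + b) // 3] += 1
    return bins


def saturated_fraction(image: Image, threshold: int = 250) -> float:
    """Fraction of pixels in the white end of the histogram (grey >= threshold)."""
    hist = grey_histogram(image)
    return sum(hist[threshold:]) / sum(hist)


img = [[(255, 255, 255), (0, 0, 0)],
       [(252, 251, 250), (100, 100, 100)]]
print(saturated_fraction(img))  # 0.5
```

    Comparing this fraction for an input frame and for the FETT Optimized image gives one number for the spike that the histograms show visually.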

    Figure 6: Histogram Pixel Graphs

    The histogram pixel plots also show the FETT Optimized image plot converging towards the Normal image plot. Figure 7 provides another test case, showing the histogram of the Normal (b) against that of a FETT Optimized image (a). Apart from the extra pixels being redistributed and replaced with better-represented pixels, the overall plot is very similar to the Normal, demonstrating the FETT algorithm attempting to recreate the “Normal” image from the information in the image sequence.

    Figure 7: FETT vs. Normal Histograms

    Because of the nature of the uncontrolled dataset, a Normal image could not be produced. Standard edge detection was therefore used to demonstrate the effects of FETT as a pre-processing tool: edge detection was performed on a single input image and on the FETT Optimized image. Onboard the International Space Station, there are several black and white dots (Figure 8) strategically placed around the structure. These dots are used as part of the Space Vision System (SVS) for determining the position and orientation of modules during installation.

    Figure 8: SVS Target Dots

    When saturated with light, the SVS target dots cannot be detected using edge detection, as seen in Figure 9 (a). FETT was applied to a sequence of images taken over time as a pre-processing step, producing a view with better lighting and contrast characteristics so that the SVS target dots could be detected correctly. As Figure 9 (b) shows, all the targets that were only faintly detected in (a) are represented better.
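    The paper does not name the edge detector it used. As a stand-in, the sketch below applies the Sobel operator to a grey-level image and thresholds the gradient magnitude; the threshold value, the border handling, and the name `sobel_edges` are illustrative assumptions.

```python
import math
from typing import List

Grey = List[List[float]]   # rows of grey-level values


def sobel_edges(grey: Grey, threshold: float) -> List[List[bool]]:
    """Mark pixels whose Sobel gradient magnitude exceeds `threshold`.

    Border pixels are left as non-edges for simplicity.
    """
    h, w = len(grey), len(grey[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (grey[y - 1][x + 1] + 2 * grey[y][x + 1] + grey[y + 1][x + 1]
                  - grey[y - 1][x - 1] - 2 * grey[y][x - 1] - grey[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (grey[y + 1][x - 1] + 2 * grey[y + 1][x] + grey[y + 1][x + 1]
                  - grey[y - 1][x - 1] - 2 * grey[y - 1][x] - grey[y - 1][x + 1])
            edges[y][x] = math.hypot(gx, gy) > threshold
    return edges


step = [[0.0, 0.0, 255.0, 255.0] for _ in range(3)]
print(sobel_edges(step, 100.0)[1])  # [False, True, True, False]
```

    Running the same detector on a single input frame and on the FETT Optimized image allows the detected edges to be compared directly, which is the kind of comparison Figure 9 illustrates.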

    Figure 9: SVS Edge Detection Example

    Note: The picture is shown inverted for better visibility.

    Applicability and Future Enhancements

    To date, the FETT algorithm has been tested in multiple application domains, such as target detection on the International Space Station [5], traffic surveillance [6], and video conferencing systems [7]. In these domains we have shown that FETT can remove dynamic lighting and reconstruct targets correctly, reconstruct nighttime traffic in a daytime setting, and remove presenters’ shadows cast onto presentation material. However, in all these test cases we have assumed that the video camera does not move. In the future we plan to extend the FETT algorithm to handle repositioned cameras, multiple cameras at different angles, and moving objects within the scene.

    Conclusion

    This paper has described a novel method for image enhancement that uses a sequence of images through time to generate a view with better lighting and contrast characteristics. With Frame Extraction Through Time (FETT), dynamic lighting information can be extracted and a new image reconstructed, reducing the effects of dynamic lighting as a pre-processing step for machine vision system applications without loss of detail and information within the scene.

    References

    [1] S. Miller, “Eye in the Sky,” Engineering Dimensions, November–December 1999.

    [2] Y.-T. Kim, “Contrast Enhancement Using Brightness Preserving Bi-Histogram Equalization,” IEEE Transactions on Consumer Electronics, vol. 43, no. 1, pp. 1–8, 1997.

    [3] Y. Matsushita, K. Nishino, K. Ikeuchi, and M. Sakauchi, “Illumination Normalization with Time-Dependent Intrinsic Images for Video Surveillance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 10, pp. 1336–1347, 2004.

    [4] M. Irani and S. Peleg, “Motion Analysis for Image Enhancement: Resolution, Occlusion, and Transparency,” Journal of Visual Communication and Image Representation, vol. 4, no. 4, pp. 324–335, 1993.

    [5] J. E. Coleshill, A. Ferworn, and D. Stacey, “Feature Extraction Through Time,” 57th International Astronautical Congress, IAC-06-B4.4.03, Valencia, Spain, Oct. 2–6, 2006.

    [6] E. Coleshill, A. Ferworn, and D. Stacey, “Traffic Safety using Frame Extraction Through Time,” SoSE 2007, San Antonio, TX, USA, April 16–18, 2007.

    [7] E. Coleshill, A. Ferworn, and D. Stacey, “Obstruction Removal using Feature Extraction Through Time for Video Conferencing Processing,” CISSE 2006, online, Dec. 4–14, 2006.
