An Integrated WWW Image Retrieval System


Guojun Lu and Ben Williams, Gippsland School of Computing and Information Technology, Monash University, Churchill, Vic 3842, Guojun.lu@infotech.monash.edu.au


Abstract

Effective WWW image retrieval systems are required to locate relevant images as more and more images are used in HTML documents. We describe an approach that integrates text-based and content-based techniques to take advantage of their complementary strengths. Our experimental results show that the integrated approach has higher retrieval performance than either the text-based or the content-based technique alone.

Introduction

More and more images are used in HTML documents on the WWW. There is an urgent need for an effective image search engine or system that can retrieve relevant images quickly on demand. This paper describes such a system.

Image content on the WWW is described both by text in HTML documents and by the image data itself. Thus an effective image retrieval system should make use of both text and image data, by integrating text-based and content-based image retrieval techniques. These two families of techniques complement each other. Text-based techniques can capture high-level abstractions and concepts, and text queries are easy to issue. But text descriptions are sometimes subjective and incomplete, and cannot depict complicated image features well; text-based techniques also cannot accept pictorial queries (query by example). Content-based techniques, on the other hand, can capture low-level image features and accept pictorial queries, but they cannot capture high-level concepts. The pictorial query process is also hard to start, as the user has to specify the query image by selecting an existing image or drawing a sketch. Both methods are difficult to use, because the user normally does not have access to images suitable as queries.

We propose to integrate the text-based and content-based techniques into one system. Such a system can capture both high and low level features. Users can start their search process by issuing a text query. From the initially returned images, they can select one or more as content-based queries. The final returned images are based on the combined matching scores of the text-based and content-based searches, incorporating the user's preference weighting.

There are many content-based image retrieval techniques, such as those based on colour, texture and shape (Swain & Ballard 91; Finlayson 92; Niblack et al 93; Stricker & Orengo 95). In our prototype system, we implemented the colour-based retrieval technique, for two reasons. First, the colour-based technique has been reported to produce good retrieval performance. Second, it is simple to implement: unlike the texture-based and shape-based methods, it does not require image segmentation, which is itself a hard image processing problem.

The key issues in designing the integrated image retrieval system are how to use the structure of HTML documents to achieve effective text-based image retrieval, how to implement the colour-based image retrieval technique, and how to combine the retrieval results of the two techniques to obtain meaningful final results. The contributions of this paper lie in these three areas. In the following, we describe these three issues and present some experimental results.


Text-based Image Indexing and Retrieval

Text-based image retrieval can be based on the traditional text information retrieval (IR) technique (Salton 83; Chua et al 94). However, to improve retrieval performance, we should make use of the structure of HTML documents. This is because words or terms appearing at different locations of an HTML document have different levels of importance or relevance to related images. Therefore, we have to assign term weights based on term positions.

We classify terms into eight groups based on their locations in an HTML document, and assign each location a weight reflecting its importance for image retrieval; the location weights used in our system appear in the example calculations below.

The final weight of each term for each image is the sum of the weights assigned based on the locations in which the term appears and its frequency. However, a problem arises from a straightforward calculation of these weights: HTML documents with terms in only a few of the designated locations will have lower weights than HTML documents with text in most or all locations. Consider two HTML documents on the topic of trains. One consists of no text other than a single heading 'Pictures of Trains'; the other contains text in all eight of the locations. With straightforward addition of term weights, the former document will be poorly represented by image/term weights, despite the fact that the heading, and in particular the term 'trains', is likely to describe the linked images well. To solve this problem we designed an algorithm that redistributes the weights of locations that are left empty.

The principle of the weighting scheme is to calculate a factor that increases the weights of terms extracted from HTML documents that have little text. The factor is calculated from the empty locations in the HTML text and the weights available for those empty locations, using the following formula:

EmptyFactor = 1 + (SumOfEmptyLocationWeights / SumOfNonEmptyLocationWeights) × r

where r is a scaling factor between 0 and 1 that controls the proportion of unused location weights redistributed to non-empty locations. The scaling factor should not be too high, because it is not appropriate to allocate the full weight of empty locations to non-empty locations. In our implementation, r is set to 0.5.

After EmptyFactor is found for a document, the weights of its non-empty locations are multiplied by that factor. We use an example to illustrate the algorithm. Suppose a document only has text in the Meta data, Title and Image URL locations. Then

SumOfEmptyLocationWeights = 0.2 + 0.3 + 0.4 + 0.4 + 0.2 = 1.5

SumOfNonEmptyLocationWeights = 0.2 + 0.2 + 0.4 = 0.8

EmptyFactor = 1 + (1.5 / 0.8) × 0.5 ≈ 1.94

The weights of the non-empty locations Meta data, Title and Image URL will be increased to 0.388, 0.388 and 0.776 respectively.
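
The following is a minimal sketch of this redistribution in Python. The three named locations come from the worked example above; the names loc4 to loc8 are placeholders for the remaining five locations, since only their weights appear in the text.

```python
def redistribute_weights(location_weights, non_empty, r=0.5):
    """Boost the weights of a document's non-empty locations by EmptyFactor.

    location_weights: base weight for every term location.
    non_empty: the set of locations that actually contain text.
    r: scaling factor in (0, 1); 0.5 in our implementation.
    """
    empty_sum = sum(w for loc, w in location_weights.items() if loc not in non_empty)
    filled_sum = sum(w for loc, w in location_weights.items() if loc in non_empty)
    if filled_sum == 0:  # document contains no text at all
        return {}
    empty_factor = 1 + (empty_sum / filled_sum) * r
    return {loc: location_weights[loc] * empty_factor for loc in non_empty}

# Worked example: only Meta data, Title and Image URL contain text.
# loc4..loc8 are placeholder names for the remaining five locations.
weights = {"meta": 0.2, "title": 0.2, "image_url": 0.4,
           "loc4": 0.2, "loc5": 0.3, "loc6": 0.4, "loc7": 0.4, "loc8": 0.2}
boosted = redistribute_weights(weights, {"meta", "title", "image_url"})
# EmptyFactor = 1 + (1.5 / 0.8) * 0.5 ≈ 1.94
# boosted ≈ {"meta": 0.388, "title": 0.388, "image_url": 0.776}
```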

Another problem with the straightforward weight calculation is that excessive repetition of a term in the same location distorts the term weight. To solve this problem, we assign weights to a term appearing many times in the same location (group) as follows: the first appearance receives the full location weight, the second appearance receives half of that, the third a quarter, and so forth. This strategy discourages authors from deliberately repeating terms to inflate hit rates, while still rewarding legitimate multiple appearances of a term.
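
A minimal sketch of this halving scheme follows. Because the per-appearance weights form a geometric series, the total weight a term can earn in one location is bounded above by twice the location weight, a bound used in the score normalization later.

```python
def term_weight(location_weight, occurrences):
    """Weight earned by a term appearing `occurrences` times in one location.

    Each appearance earns half the weight of the previous one, so the total
    is bounded above by 2 * location_weight (geometric series).
    """
    return sum(location_weight / 2 ** i for i in range(occurrences))

# term_weight(0.4, 1) == 0.4
# term_weight(0.4, 3) == 0.4 + 0.2 + 0.1 == 0.7 (approaches 0.8 as repeats grow)
```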

Colour-based Image Indexing and Retrieval

In the colour-based image retrieval technique, each image in the database is normally represented using the three primaries of the chosen colour space. Each colour channel is quantized into m intervals, so the total number of discrete colour combinations (called bins) n is equal to m³. For example, each channel is commonly quantized into 16 intervals, giving 4096 bins in total. A colour histogram H(M) is a vector (h1, h2, ..., hn), where each element hj is the number of pixels of image M falling in bin j. These histograms are the feature vectors stored as the index of the image database.
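
A minimal sketch of this conventional histogram construction, assuming pixels are given as (r, g, b) tuples with 8-bit channels:

```python
def colour_histogram(pixels, m=16):
    """Build an n = m**3 bin colour histogram from (r, g, b) pixels in 0..255.

    Each channel is quantized into m intervals; m = 16 gives the 4096 bins
    mentioned above.
    """
    step = 256 // m
    hist = [0] * (m ** 3)
    for r, g, b in pixels:
        hist[(r // step) * m * m + (g // step) * m + (b // step)] += 1
    return hist
```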

During image retrieval, a histogram is found for the query image or estimated from the user's query. A metric is used to measure the distance between the histograms of the query image and images in the database. (If images are of different size, their histograms are normalized.) Images with a distance smaller than a pre-defined threshold are retrieved from the database and presented to the user.
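
A sketch of the matching step follows. The text does not fix a particular metric, so the L1 (city-block) distance is assumed here for illustration; histograms are normalized so images of different sizes are comparable, and the threshold value is an arbitrary placeholder.

```python
def l1_distance(h1, h2):
    """L1 distance between two size-normalized histograms (range 0..2)."""
    n1, n2 = float(sum(h1)), float(sum(h2))
    return sum(abs(a / n1 - b / n2) for a, b in zip(h1, h2))

def retrieve(query_hist, database, threshold=0.3):
    """Return ids of images whose histogram is within `threshold` of the query."""
    return [img_id for img_id, hist in database.items()
            if l1_distance(query_hist, hist) <= threshold]
```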

In our previous study, we identified some limitations of this approach and proposed solutions to overcome them (Lu & Phillips 98). We briefly describe the limitations and solutions in the following.

To achieve high storage and retrieval efficiency, the number of histogram bins used is normally much smaller than the total number of colours used to represent images. Therefore, a number of colours have to be grouped into one bin. This is called colour quantization.

There are three main problems associated with colour quantization. The first is that, due to quantization error, perceptually similar colours may be quantized into different bins while perceptually different colours may be quantized into the same bin. This problem can be partially overcome by using the CIE L*u*v* uniform colour space.

The second problem is that quantization is done on each colour channel independently, rather than being based on the overall perceptual colour difference between pixels. Different colour channels contribute differently to the final perceptual colour, so simply dividing each channel by the same value is not an effective method.

The third problem is that a colour may be similar to the colours of more than one bin, but it is normally quantized into only one bin. As a result, images consisting of similar colours can have a very large histogram distance.

To solve the above problems, instead of dividing each colour channel by a constant (quantization step) when obtaining the histogram, we propose to find representative colours in the CIE L*u*v* colour space. The number of representative colours is equal to the required number of bins, and they are uniformly distributed in the CIE L*u*v* space. While building the histogram, the ten perceptually most similar representative colours are found for each pixel. The distances between the pixel and these ten representative colours are calculated, and weights are assigned to the ten representative colours inversely proportional to the colour distances, with the total weight for each pixel equal to 1. In this way we obtain a so-called perceptually weighted histogram (PWH) for each image. It has been shown that the PWH-based method has higher image retrieval performance than the normal colour histogram based method (Lu & Phillips 98).
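
A minimal sketch of PWH construction, assuming pixels have already been converted to CIE L*u*v* and that `reps` holds the representative colours (one per bin) uniformly distributed in that space; the guard against zero distance is an implementation detail assumed here:

```python
import math

def perceptually_weighted_histogram(pixels_luv, reps, k=10):
    """Build a PWH: each pixel spreads a total weight of 1 over its k
    perceptually nearest representative colours, with weights inversely
    proportional to the colour distances.

    pixels_luv: pixel colours as (L, u, v) tuples.
    reps: representative colours (one per histogram bin) in L*u*v*.
    """
    hist = [0.0] * len(reps)
    for p in pixels_luv:
        # Euclidean distance in L*u*v* approximates perceptual difference.
        nearest = sorted((math.dist(p, c), j) for j, c in enumerate(reps))[:k]
        inv = [(1.0 / max(d, 1e-9), j) for d, j in nearest]  # guard exact matches
        total = sum(w for w, _ in inv)
        for w, j in inv:
            hist[j] += w / total  # the k weights for each pixel sum to 1
    return hist
```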

Image Retrieval Combining Text- and Colour-based Techniques

The text-based and colour-based methods return two independent lists of images with different weights. These two lists must be combined in a meaningful way to give the user a single image list. During the combining process, the user's preference weighting for text and colour should be incorporated; the default weighting is 50% for each method, but the user can choose other weightings.

The major difficulty in combining the two weighted lists is that the two lists are obtained using two totally different weighting schemes (one based on term weights and the other on colour distribution). So we cannot simply add the weights of each image in the two lists to obtain a combined weight.

The solution we used is to normalize the similarities calculated from text and from colour histograms so that both fall within the common range of 0 to 1. The normalized similarity is equal to the similarity before normalization divided by the maximum possible similarity for that technique. In the text-based retrieval technique, the maximum similarity is the number of terms in the query times the maximum weight of each term. Under our weight allocation method, the maximum term weight is achieved when the term occurs many times in every location, and is equal to twice the sum of the location weights. Therefore, if k terms are used in a query, the maximum similarity is 4.6k (k × 2 × (0.2 + 0.2 + 0.2 + 0.3 + 0.4 + 0.4 + 0.4 + 0.2)).

In the colour histogram based technique, the maximum similarity is twice the number of pixels in the images.
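
A minimal sketch of the normalization and combination step, using the maxima just derived; the 50/50 default weighting is taken from the description above.

```python
def combined_score(text_sim, colour_sim, k_terms, n_pixels, text_weight=0.5):
    """Normalize both similarities to [0, 1] and mix them by user preference.

    Maximum text similarity: 4.6 per query term (twice the 2.3 sum of the
    eight location weights). Maximum colour similarity: twice the number
    of pixels, as stated above.
    """
    text_norm = text_sim / (4.6 * k_terms)
    colour_norm = colour_sim / (2.0 * n_pixels)
    return text_weight * text_norm + (1 - text_weight) * colour_norm
```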


Implementation and Experimental Results

We have implemented the prototype integrated image retrieval system. It can be accessed at http://www-mugc.cc.monash.edu.au/~guojunl/search.cgi.

Figure 1 shows the user interface of the system and the retrieval results obtained by first using "flower" as the query, and then using "flower" together with the colour histogram of the first returned image (weighted 50% each) as the query.

To compare the retrieval performance of the integrated technique with those of the text-based and colour-based techniques, we tested the three techniques with four queries over a collection of about 600 images. The common aim of the experiments was to retrieve images of snow, sunset, flowers and buildings (Figure 2). To test the text-based technique, the four terms "snow", "sunset", "flower" and "building" were used as queries. To test the colour-based technique, the four corresponding images were used as queries. To test the integrated technique, the four terms and their corresponding images were used together as four queries.

The search effectiveness is determined by the relevance of the set of images returned to the user. The user is most concerned with the first set of results presented, and is unlikely to search through extra screens of results, especially if the first set is largely irrelevant. For this reason, we calculated the percentage of relevant images (precision) among the first 9 and the first 18 results for each of the three search types. Our experimental results show that the integrated technique retrieves more relevant images than the text-based and the colour-based techniques when the first 9 and 18 items are returned (Figure 3). That is, the integrated technique has higher retrieval precision than either the text-based or the colour-based technique.
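
For reference, a minimal sketch of the precision measure used here:

```python
def precision_at(ranked_results, relevant_ids, cutoff):
    """Fraction of the first `cutoff` ranked results that are relevant."""
    top = ranked_results[:cutoff]
    return sum(1 for img_id in top if img_id in relevant_ids) / len(top)

# precision_at(results, relevant, 9) and precision_at(results, relevant, 18)
# correspond to the first one and two screens of results.
```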


Figure 1 The user interface and sample retrieval results

Figure 2 The four query images used in our experiment: image #83 ("snow"), image #602 ("sunset"), image #229 ("flower") and image #561 ("building")

Figure 3 Comparison of precision for displayed results

Discussion and Conclusion

In this paper, we described an integrated WWW image search system. Our experimental results show that the integrated system has higher retrieval performance than either the text-based or the colour-based technique alone.

There are three benefits of combined searches that are not apparent in the results of Figure 3. The first is that combined searches using both text and colour return results that are more relevant than searches based on text or colour alone. This higher relevance does not affect the precision values in Figure 3, since images are judged only as relevant or not relevant, irrespective of how relevant they are. Text searching alone finds images based on the semantic meaning of terms, while colour matching alone finds images based on a low-level comparison of colour distributions, so images returned by either search are likely to match only one kind of criterion. Combined searches, on the other hand, return results that match both the semantic meaning of the terms and the low-level colour features, which can produce more relevant images. For example, a search on the term 'flower' will return images of flowers of any colour, and a search on a query image of a green and yellow flower is likely to return images of green plants with yellow features, including flowers and other types. A search on the term 'flower' combined with the colour characteristics of that query image is much more likely to return images with both characteristics, and therefore results more relevant to the user's requirements.

The second benefit is that a larger set of results is produced for combined searches. Images that match either one or both of the specified search criteria for terms and colour matching are retrieved. In almost every case, the user is able to obtain a higher number of relevant images for combined searches than for individual searches alone.

The third benefit is the ease of carrying out an image search. The user can start with a text-only query and then use retrieved images as queries in combination with text. Image retrieval is thus an iterative process in which the user refines a search using retrieved relevant images as new queries, narrowing the range of results until a more precise set of images is found.

References

Swain M. J. & Ballard D. H. (1991), "Color indexing", Int. J. Comput. Vision, 7:11-32.

Finlayson G. D. (1992), Colour Object Recognition, MSc Thesis, Simon Fraser University.

Niblack W. et al (1993), "The QBIC project: querying images by content, using colour, texture, and shape", Proceedings of Conference on Storage and Retrieval for Image and Video Databases, 1-3 Feb. 1993, San Jose, California, USA, SPIE Vol. 1908, pp. 173-187.

Stricker M. & Orengo M. (1995), "Similarity of color images", Proceedings of Conference on Storage and Retrieval for Image and Video Database III, 9-10 Feb. 1995, San Jose, California, SPIE Vol. 2420, pp. 381-392.

Salton G. (1983), Introduction to Modern Information Retrieval, McGraw-Hill Book Company.

Chua T. S. et al (1994), "A Concept-based Image Retrieval System", Proceedings of the 27th Annual Hawaii International Conference on System Sciences, Maui, Hawaii, January 4-7 1994, Vol. 3, pp. 590-598.

Lu G. & Phillips J. (1998), "Using perceptually weighted histograms for colour-based image retrieval", Proceedings of Fourth International Conference on Signal Processing, 12-16 October 1998, Beijing, China, pp. 1150-1153.

Copyright

Guojun Lu and Ben Williams, (c) 1999. The author assigns to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The author also grants a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.

