This paper presents an application of mobile agent technology to the problem of web-based face recognition. To achieve high performance in terms of reduced processing time, we introduce an innovative four-layer structural model and a three-dimensional operational model. In addition, the proposed system integrates the computing advantages of mobile agents with several improved face recognition algorithms to enhance system robustness. Preliminary experimental results demonstrate the advantages and potential of our approach for face recognition on the web.
Keywords: Mobile agent, Face recognition, World Wide Web.
Web-based face recognition systems have received more and more attention in recent years. Aslandogan and Yu [1] developed a web-based image search agent named Diogenes, which takes advantage of the text/HTML structure of web pages as well as visual analysis of the images for personal image search and identification. Diogenes works well with web pages that contain a facial image accompanied by a body of text that contains a textual identification of the image.
Another web-based system combining intelligent software agents with face recognition engines has been developed by the ANSER team [2]. The Missing Children Locator Agent is an example of one of ANSER's systems. The software agents continuously search and retrieve facial images and relevant information from web sites on the Internet. A back-end face recognition engine analyses each image file to detect and match each face against a set of stored face images of missing children. The user can also submit an input image to be matched against the database. Successful implementation of the system demonstrates its ability to speed up labor-intensive tasks and automate the processes required for face recognition.
However, in the above two systems, the intelligent agent has no real mobility as it traverses web pages: it cannot migrate from host to host in a heterogeneous Internet environment. This disadvantage makes it difficult to integrate such agents with legacy systems when image databases in different formats must be accessed over the Internet.
Aoki et al. [3] report an active tracking vision system that can detect human faces in sequential images. By integrating multiple sources of sensing information from real-time and semi-real-time agents, their system can track people in a room reliably and efficiently. The real-time agents extract and transfer useful information from captured images for quickly detecting and tracking target faces in real time. The semi-real-time agents work at the back end, extracting the skin-colour region, computing the total facial-probability evaluation values for all face-like objects, and detecting faces in sequential images. The system design is straightforward and the experimental results are promising. However, Aoki et al.'s project involves only a face detection scheme; it has no facial feature extraction or graph matching schemes, which are two essential parts of a face recognition system.
In this paper, we propose a web-based face recognition system using mobile agent technology. Both structural and operational models will be discussed. It is a network-oriented system involving not only face detection but also facial feature extraction and graph matching schemes. Mobile agents contribute their full computing advantages in all system sub-layers.
This paper is organized as follows. Section 2 reviews the related mobile agent computing paradigm, and Section 3 discusses the proposed system framework, highlighting its four-layer structural model and three-dimensional operational model. Section 4 reports the experimental results. Finally, the conclusion is presented in Section 5.
Compared with earlier paradigms in distributed computing such as process migration or remote evaluation, the mobile agent model is becoming popular for network-centric programming. The traditional client/server paradigm relies on a handshake mechanism to communicate over a network: the client requests information, the server responds, and each request/response requires a complete round trip on the network. The emerging mobile agent paradigm has redefined the way Internet-based applications work. As an autonomous software entity with pre-defined functionality and a certain degree of intelligence, a mobile agent is capable of migrating autonomously from one host machine to another, making its requests to the server directly and performing tasks on behalf of its master. Furthermore, following certain working patterns, multiple agents can cooperate to accomplish a more complicated task, providing a dynamic and flexible platform for a wide range of software applications.
During the past few years, more than a dozen Java-based agent systems have been developed. The Java Virtual Machine, the standard security manager and two other functional facilities, namely object serialisation and remote method invocation, have made it simple to build a mobile agent workbench. Of these Java-based systems, ObjectSpace's Voyager [HREF1], General Magic's Odyssey [HREF2] and IBM's Aglets [HREF3] are three leading commercial ones. A detailed discussion of current commercial and research-based agent systems can be found in [4].
The Aglet system developed by IBM is chosen as the implementation platform for our proposed system. Although it is not yet a full-fledged platform, it has received the most press coverage and shows promise as a functional technology that fits very well into the Java world. An aglet can be characterised by several features: lightweight object migration, built-in persistence support, and an event-driven model. Central to the aglet architecture is the context, which is the server environment for aglet execution. When an aglet has finished its work in a context, its state and data are serialised to a stream of bytes and exported to the new context through the Agent Transfer Protocol. In the reverse process, the state of the aglet is reconstructed from the stream of bytes and becomes active at the new context [5]. The basic migration paradigm of an aglet is illustrated in Figure 1.
Figure 1: Basic migration paradigm of an aglet
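The serialise, export and reconstruct cycle can be illustrated with a minimal Python sketch. The actual implementation is Java-based and uses the Agent Transfer Protocol; the `FaceSearchAgent` class, its fields and the host names below are purely hypothetical stand-ins for an aglet and its contexts.

```python
import pickle

class FaceSearchAgent:
    """Toy stand-in for a mobile agent: carries its state and a task."""
    def __init__(self, visited, results):
        self.visited = visited    # hosts already processed
        self.results = results    # matches found so far

    def work_at(self, host):
        self.visited.append(host)

# At the origin context: run, then serialise state and data to bytes,
# as ATP exports an aglet when it has finished its work.
agent = FaceSearchAgent(visited=[], results=[])
agent.work_at("hostA")
byte_stream = pickle.dumps(agent)

# At the destination context: reconstruct the agent from the stream
# of bytes and let it become active again.
clone = pickle.loads(byte_stream)
clone.work_at("hostB")
print(clone.visited)
```

The essential property mirrored here is that the agent's accumulated state (`visited`, `results`) survives the migration, so it can resume work at the new context rather than restarting.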
To achieve high performance and better robustness, we propose a multi-agent system. The system framework can be viewed in two ways: one is the structural model, and the other is the operational model. Figure 2 illustrates the four-layer system structural model:
Figure 2: Four-layer system structural model
From an operational point of view, we propose a three-dimensional model consisting of three blocks, as shown in Figure 3:
Figure 3: Three-dimensional system operational model
To make accessing the system as easy as possible, a point-to-point input/output layer is established that explicitly connects the input/output device and the application server. All recognition schemes are hidden from the end user; a user interface accepts the input image and returns the recognition results. The structure of the point-to-point input/output layer is shown in Figure 4.
Figure 4: The structure of the point-to-point input/output layer
The central controller layer is the main part of the whole system connecting all sub-systems. A line-like intelligent detection scheme is implemented in this layer. Four agents reside at the central controller host: the interface agent, the chroma processing agent, the skin-color detection agent and the model matching agent. The structure of the line-like central controller layer is illustrated in Figure 5.
Figure 5: The structure of the line-like central controller layer
Chroma processing agent
The chroma chart is the key element for automatic skin-colour region extraction. For example, in the work of Cai et al. [6], information about skin colour (embedded in a chroma chart) is used to find image regions where faces are likely to exist, and knowledge about facial patterns (the distribution of non-skin sub-regions) is then used to determine instances of faces within such regions. However, in their work, the non-uniformity of the skin-colour distribution weakens the minority skin colours because the statistical results are smoothed directly with Gauss filters. To solve this problem, we encapsulate a novel algorithm in the proposed chroma processing agent: once the statistical value for a chroma exceeds the selected threshold, all skin colours are treated equally. This improvement is illustrated in Figure 6.

Figure 6: a. Facial images
b. Statistical result
c. Threshold result (0.125)
d, e. Cai et al.'s results under two variance settings
f. Our result
In Figure 6, it can be noted that under the same smaller variance, Cai et al.'s algorithm incorrectly weakens the lower part of the statistical result, whereas ours is consistent with the statistical result. When the variance increases, we obtain a larger coverage area. With this chroma chart, a colour image can be transformed into a gray-scale image, and the face regions can be determined after thresholding and morphological filtering.
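The thresholding idea can be sketched as follows: build a 2-D chroma histogram from skin-pixel samples, then binarise it at a threshold instead of smoothing it with a Gauss filter, so that minority skin colours are not weakened. The (Cr, Cb) layout, bin resolution, threshold and sample data below are illustrative assumptions, not the trained chart from the experiments.

```python
import numpy as np

def build_chroma_chart(cr, cb, bins=64, threshold=0.125):
    """Chroma chart as a binary lookup table over (Cr, Cb).

    cr, cb: arrays of chroma values (0..255) sampled from skin pixels.
    Returns a bins x bins boolean chart: True = skin colour.
    """
    hist, _, _ = np.histogram2d(cr, cb, bins=bins, range=[[0, 256], [0, 256]])
    hist /= hist.max()            # normalise the statistical result to [0, 1]
    return hist >= threshold      # colours above threshold all count equally

# Toy skin samples: a dominant cluster around (Cr, Cb) = (150, 110)
# plus a minority skin tone at (180, 90).
rng = np.random.default_rng(0)
cr = np.concatenate([rng.normal(150, 5, 1000), np.full(200, 180.0)])
cb = np.concatenate([rng.normal(110, 5, 1000), np.full(200, 90.0)])
chart = build_chroma_chart(cr, cb)

# The minority tone survives thresholding instead of being smoothed away.
print(chart[int(180 // 4), int(90 // 4)])
```

The key design point is that any chroma bin whose statistical value clears the threshold is mapped to full skin-likelihood, so rare but genuine skin colours are not suppressed relative to the majority cluster.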
Figure 7 further compares the two algorithms: the peaks in the histogram of Figure 7 (e) are much more distinct than those in Figure 7 (d), making it easier to determine the threshold.

Figure 7: a. A color image
b. Gray image using Cai et al.’s chroma chart
c. Gray image using our chroma chart
d. Histogram of b
e. Histogram of c
Skin-color detection agent
Mathematical morphology analyses images using concepts from algebra and geometry, such as set theory, translation and convexity. Since the pioneering work of Matheron [7] and Serra [8], it has been successfully used in contour retrieval, segmentation, shape analysis, etc. In our proposed system, we introduce mathematical morphology into the skin-color detection agent in order to suppress various kinds of noise and improve the reliability of the detection result.
The skin-color detection agent uses morphological operators to separate different convex objects, remove regions that are too small, and recover the regions' sizes while preserving the topological structure. The processing steps are illustrated in Figure 8.

Figure 8: a. Skin-color regions
b. The result of dilation
c. Erosion
d. Extreme erosion
e. SKIZ expansion
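The noise-removal part of these steps, an opening (erosion followed by dilation), can be sketched with a plain NumPy implementation. The 3x3 structuring element and the toy mask are illustrative; the extreme-erosion and SKIZ-expansion steps of Figure 8 are not reproduced here.

```python
import numpy as np

def dilate(img):
    """Binary dilation with a 3x3 structuring element.

    Edges wrap around via np.roll, which is adequate for regions that
    stay in the image interior, as in this toy example."""
    out = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out

def erode(img):
    """Binary erosion as the complement of dilating the complement
    (valid because the 3x3 structuring element is symmetric)."""
    return ~dilate(~img)

# A skin-colour mask with a 3x3 face-like blob and one noise pixel.
mask = np.zeros((9, 9), dtype=bool)
mask[3:6, 3:6] = True    # face-like region
mask[0, 8] = True        # isolated noise

opened = dilate(erode(mask))   # opening removes the isolated pixel
print(opened.sum(), mask.sum())
```

Erosion shrinks every region, deleting the regions that are too small to contain the structuring element; the subsequent dilation recovers the size of the surviving regions without changing their topology.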
Model matching agent
The majority of face-detection methods are based on model matching, which combines prior knowledge about the face with lower-level processing results. In our proposed model matching agent, we introduce a model face and a matching function with emphasis on topological characteristics, which reduces the computation cost and improves matching efficiency.
The matching function includes two parts: one for the topological constraints, the other for the geometric constraints of each facial feature. For each topological or geometric attribute, the expected value and variance can be determined through experiments. By changing the size of the Gauss filter, this method can detect faces of different sizes in an image, avoiding noise from lower-level processing and forming a complete description of the face. The similarity item is defined in formula (1), where x is the difference between the measured and expected values of each topological attribute and σ is its reasonable fluctuation range:

s(x) = exp(-x²/(2σ²))    (1)

The matching function is defined as the weighted sum of all these similarity items.
Figure 9: Model face and similarity item
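The weighted-sum structure of the matching function can be sketched as follows. A Gaussian form exp(-x²/(2σ²)) is assumed for each similarity item, consistent with the described roles of x (measured-minus-expected difference) and the fluctuation range; the attributes, weights and numeric values are hypothetical.

```python
import math

def similarity(x, sigma):
    """Similarity item: 1 at a perfect match, decaying as the
    measured-vs-expected difference x grows; sigma is the reasonable
    fluctuation range. The Gaussian form is an illustrative assumption."""
    return math.exp(-x * x / (2.0 * sigma * sigma))

def matching_score(diffs, sigmas, weights):
    """Matching function: weighted sum of all similarity items."""
    return sum(w * similarity(x, s) for x, s, w in zip(diffs, sigmas, weights))

# Two hypothetical attributes: eye distance and eye-mouth distance.
diffs   = [0.0, 1.0]    # measured minus expected, in pixels
sigmas  = [2.0, 2.0]    # reasonable fluctuation ranges
weights = [0.6, 0.4]
score = matching_score(diffs, sigmas, weights)
print(round(score, 3))
```

A candidate face region that matches the model face exactly scores the sum of the weights; each deviation beyond an attribute's fluctuation range rapidly discounts that attribute's contribution.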
To deal with the complexity of facial feature contours, we introduce the Multi-GVF (Gradient Vector Flow) snakes paradigm in this layer. This approach uses open snakes to represent smooth contours and the cross points of two snakes for sharp corners. It outperforms the single-snake model in accuracy, but at the penalty of a long execution time: all the facial features (mouth, nose, eyes and so on) have to be extracted separately and sequentially.
To address this problem, we employ a group of external computation agents and adopt a "divide and conquer" strategy in this layer. The central agent controller dissects the large amount of computation data into several operational portions and dispatches these smaller computation units to neighbouring worker agents for parallel execution. The global result is computed after all the sub-results are sent back to the central host for assembly. The structure of the star-like external assistant layer is illustrated in Figure 10.
Figure 10: The structure of the star-like external assistant layer
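The dissect, dispatch and assemble pattern can be sketched as follows. In the real system the workers are aglets dispatched to neighbouring hosts; here local threads merely stand in for them, and the portion names and placeholder computation are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_feature(portion):
    """Stand-in for a worker agent extracting one facial feature
    (mouth, nose, eye, ...) from its assigned portion of data."""
    name, data = portion
    return name, sum(data)          # placeholder for snake-based extraction

# Central controller: dissect the computation data into portions ...
portions = {"left_eye": [1, 2], "right_eye": [2, 1], "nose": [3], "mouth": [4, 5]}

# ... dispatch the smaller computation units to workers in parallel ...
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(extract_feature, portions.items()))

# ... and assemble the global result on the central host.
print(sorted(results))
```

Because each facial feature is extracted by an independent worker, the sequential bottleneck of the single-host Multi-GVF-snakes model is removed; only the final assembly remains centralised.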
Gradient Vector Flow was introduced by Xu and Prince to overcome the limitations of the traditional snake model in initialisation and in convergence to concave boundaries [9]. It strongly improves the convergence of the classic snake towards the desired solution and has a large capture range, barring interference from other objects. However, its performance degrades in the presence of corners because of diffusion effects, so a single closed snake can extract a contour robustly only when the features are smooth. In our proposed star-like layer, we introduce the Multi-GVF-snakes paradigm and integrate it with the mobile agent worker model to address these limitations and achieve high efficiency.
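The GVF field itself can be sketched as the standard iterative diffusion of the edge-map gradient. This is a simplified explicit scheme after Xu and Prince; the parameter mu, the iteration count and the toy image are illustrative, and the multi-snake machinery built on top of the field is not shown.

```python
import numpy as np

def laplacian(a):
    """5-point Laplacian with replicated borders (via edge padding)."""
    p = np.pad(a, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * a

def gvf(edge_map, mu=0.1, iters=200):
    """Gradient Vector Flow field (u, v) of an edge map: the gradient
    is diffused into homogeneous regions while staying anchored to the
    edge-map gradient where edges are strong."""
    fy, fx = np.gradient(edge_map)
    mag2 = fx ** 2 + fy ** 2
    u, v = fx.copy(), fy.copy()
    for _ in range(iters):
        u = u + mu * laplacian(u) - mag2 * (u - fx)
        v = v + mu * laplacian(v) - mag2 * (v - fy)
    return u, v

# Edge map of a bright square: GVF spreads the edge forces into the
# flat interior, which is what enlarges the snake's capture range.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
edges = np.hypot(*np.gradient(img))
u, v = gvf(edges)
print(u.shape)
```

The diffusion term is also what blurs the field around sharp corners, which is precisely the weakness that motivates representing corners as cross points of two open snakes rather than one closed snake.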
A remote graph-matching scheme is implemented in this layer. Given that large image databases in different formats may be located in different places over the network, it makes more sense for the mobile agent to move to the remote data source for searching and matching than to transfer large volumes of data over the network for processing. In this layer, we explicitly create a matching agent, initialise it with matching algorithms and dispatch it to the Internet. Upon reaching a new host, the matching agent interacts with remote agents and communicates with the back-end databases for searching and matching.
A two-step matching scheme is performed: geometric-based coarse matching and dynamic-link-architecture-based accurate matching. After the matching agent has achieved its pre-defined goal, it migrates to the next host until it returns home with the results. The advantages of this scheme include reduced design work, better bandwidth usage and a broader searching range. The structure of the oval-like remote application layer is shown in Figure 11.
Figure 11: The structure of the oval-like remote application layer
Geometric-based coarse matching
Geometric, feature-based face recognition is among the earliest algorithms proposed [10]. As image databases grow larger, however, accurate recognition using this scheme alone becomes impossible. Nevertheless, we embed this algorithm in our proposed matching agent as an effective pre-filter for the second-step accurate matching.
In this scheme, the overall geometrical configuration of the face features is described by a vector of numerical data representing the position and size of the main facial features, e.g. eyes, nose and mouth, supplemented by the shape of the face outline [11]. A Nearest Neighbor (NN) classifier can be used to perform general, non-parametric classification. Its performance is a function of the number of classes to be discriminated (people to be recognized) and of the number of examples per class.
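The coarse-matching step can be sketched as NN classification over such geometric feature vectors. The three features, the gallery entries and the labels below are hypothetical examples, not data from the experiments.

```python
import numpy as np

def nearest_neighbor(query, gallery, labels):
    """NN classifier over geometric feature vectors (positions and
    sizes of the main facial features): return the label of the
    gallery vector closest to the query in Euclidean distance."""
    dists = np.linalg.norm(gallery - query, axis=1)
    return labels[int(np.argmin(dists))]

# Hypothetical 3-D feature vectors: (eye distance, nose length, mouth width).
gallery = np.array([[60.0, 40.0, 50.0],    # person A
                    [70.0, 45.0, 55.0],    # person B
                    [62.0, 38.0, 48.0]])   # person A, second example
labels = ["A", "B", "A"]

print(nearest_neighbor(np.array([61.0, 39.0, 49.0]), gallery, labels))
```

As a pre-filter, the agent would keep only the k nearest gallery identities and hand just those candidates to the more expensive DLA-based accurate matching.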
Dynamic Link Architecture-based accurate matching
The Dynamic Link Architecture (DLA) is initially proposed
to solve certain conceptual problems of conventional artificial neural
networks. The power of DLA can best be demonstrated by applying it to a
complex problem like position and distortion invariant object recognition
[12, 13]. Elastic matching of an object graph
to an image graph
is a
pattern classification strategy, which clearly accounts for local distortions.
The cost function, which combines the topological and feature terms into
a measure of distance between the object domain and the image domain, is
used to evaluate the quality of a match. Gabor wavelet is chosen as a suitable
data format for object recognition, which proved to be invariant with respect
to background, translation, distortion and size.
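The two-term cost function can be sketched on a tiny graph. The node features here are plain vectors standing in for Gabor jets, the graph and the weighting lam are hypothetical, and the search over candidate image graphs is omitted.

```python
import numpy as np

def match_cost(obj_pos, obj_feat, img_pos, img_feat, edges, lam=0.5):
    """Elastic graph matching cost: a feature term (distance between
    corresponding node features) plus a topology term penalising
    distortion of graph edges; lam weights the two terms."""
    feature = sum(np.linalg.norm(obj_feat[i] - img_feat[i])
                  for i in range(len(obj_pos)))
    topology = sum(
        np.linalg.norm((obj_pos[i] - obj_pos[j]) - (img_pos[i] - img_pos[j]))
        for i, j in edges)
    return feature + lam * topology

# Tiny 3-node graph (two eyes and a mouth); features are hypothetical.
obj_pos  = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
obj_feat = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = [(0, 1), (0, 2), (1, 2)]

# A rigidly translated image graph matches perfectly: cost 0.
img_pos = obj_pos + np.array([10.0, 5.0])
print(match_cost(obj_pos, obj_feat, img_pos, obj_feat, edges))
```

Because the topology term compares edge vectors rather than absolute positions, translation is free, while stretching or shearing the graph accrues cost in proportion to the local distortion, which is the property that makes the matching "elastic".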
To implement and evaluate the system, we use IBM Aglets as our implementation platform. Three sets of tests are conducted: the Intelligent Detection Test (IDT) in the line-like layer, the Feature Extraction Test (FET) in the star-like layer, and the Remote Graph-matching Test (RGT) in the oval-like layer. The IDT evaluates the robustness of the algorithms in the proposed detection agents under different shooting conditions. The FET measures the speedup efficiency of our agent-based Multi-GVF-snakes model under different worker patterns. Finally, the RGT examines the correctness and effectiveness of the remote graph-matching scheme in a distributed environment.
In order to obtain a stable chroma chart with fuzzy character, more than 2000 face images with different shooting conditions and different skin types were chosen as training samples for the chroma processing agent. More than 200 face images were used for the IDT. The experimental results illustrated in Figure 12 and Table 1 show that the improved algorithms in our proposed detection agents are robust to uneven lighting, multiple faces, moderately tilted faces and partial occlusion.
Figure 12: Intelligent Detection Test
Table 1: Intelligent Detection Test
To examine the speedup efficiency of the proposed agent-based Multi-GVF-snakes model, a group of external assistant agents is employed. The central agent controller is responsible for dissecting the task and assembling the final result. The processing result is illustrated in Figure 13.

Figure 13: Processing result of the agent-based Multi-GVF-snakes model

In the timing model (formula (7)), latencies include network transmission and data packaging and unpacking, and the total processing time combines the workers' computation time with these latencies.
The speedup ratio of different worker patterns is illustrated in Figure 14:
Figure 14: The speedup ratio of different worker patterns
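The qualitative behaviour of the speedup ratio can be sketched with a generic parallel-time model: the task is split evenly across workers and each dispatch/return pays a fixed latency. This is an illustrative model, not the paper's formula (7), and the work and latency figures below are hypothetical, not measured values.

```python
def total_time(work, n_workers, latency):
    """Illustrative worker-pattern model: parallel computation time
    plus a fixed latency for network transmission and data
    packaging/unpacking."""
    return work / n_workers + latency

work, latency = 12.0, 0.5           # hypothetical seconds
t1 = total_time(work, 1, 0.0)       # single-host baseline, no migration
for n in (2, 3, 4):
    tn = total_time(work, n, latency)
    print(n, round(t1 / tn, 2))     # speedup ratio of each worker pattern
```

The model reproduces the expected shape of Figure 14: speedup grows with the number of workers but stays below linear, because the fixed migration latency is paid regardless of how finely the task is dissected.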
To examine the correctness and effectiveness of the remote graph matching scheme in the oval-like layer, two sets of tests are conducted: the viewing perspective test and the pattern occlusion and distortion test.
Viewing Perspectives Test
In this test, about 100 test patterns with different viewing perspectives are examined. Sample patterns and recognition results are presented in Figure 15 and Table 2. The overall correct recognition rate is about 85%.
Figure 15: Viewing perspective test pattern
Table 2: Viewing perspectives test
Pattern Occlusion and Distortion Test
In this test, another 100 test patterns with partial occlusion, various expressions or accessories are examined. Sample patterns and recognition results are presented in Figure 16 and Table 3. Note that the classification rate of the partial occlusion test depends heavily on which portion of the facial image is hidden and how much of it.
Figure 16: Occlusion and distortion test pattern
Table 3: Occlusion and distortion test
This paper presents a web-based face recognition system using mobile agent technology. In contrast to current face recognition models, which suffer from slow performance and platform dependence, a four-layer system structural model and a three-dimensional system operational model are introduced to achieve high performance and better flexibility. The proposed system model with several improved algorithms has been tested through experiments, demonstrating its feasibility and effectiveness. Coupled with other supporting schemes, the system is potentially useful in a wide range of Internet-based face recognition services.
[2] H. Wisniewski, "Face Recognition and Intelligent Software Agents - an Integration System," Prepared statement for the U.S. Senate Committee on Commerce, Science and Transportation, May 12 1999.
[3] Y. Aoki, K. Hisatomi and S. Hashimoto, "Robust and Active Human Face Tracking Vision Using Multiple Information," Proceedings of SCI'99 (World Multiconference on Systemics, Cybernetics and Informatics), Vol. 5, pp. 28-33, Aug. 1999, Orlando.
[4] J. Kiniry and D. Zimmerman, "Special Feature: A Hands-on Look at Java Mobile Agents," IEEE Internet Computing, Vol. 1, No.4, pp.21-30, July/August 1997.
[5] D. B. Lange and M. Oshima, Programming and Deploying Java Mobile Agents with Aglets, Addison-Wesley, 1998.
[6] J. Cai, A. Goshtasby and C. Yu, "Detecting Human Faces in Color Images," Image and Vision Computing, 18(1): 63-75, 2000.
[7] G. Matheron, Random Sets and Integral Geometry, John Wiley & Sons, New York, 1975.
[8] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, London, 1982.
[9] C. Xu and J. L. Prince, "Snakes, Shapes and Gradient Vector Flow," IEEE Transactions on Image Processing, Vol. 7, No. 3, Mar 1998.
[10] T. Kanade, "Picture Processing by Computer Complex and Recognition of Human Faces," Technical report, Kyoto University, Dept. of Information Science, 1973.
[11] Y. Kaya and K. Kobayashi, "A Basic Study on Human Face Recognition," In Satosi Watanabe, editor, Frontiers of Pattern Recognition, pp. 265--289. Academic Press, New York, NY, 1972.
[12] J. Buhmann, M. Lades and C. von der Malsburg, "Size and Distortion Invariant Object Recognition by Hierarchical Graph Matching," in IJCNN International Conference on Neural Networks, San Diego, pp. 411-416, IEEE, 1990.
[13] M. Lades, J. C. Vorbruggen and J. Buhmann, "Distortion Invariant Object Recognition in the Dynamic Link Architecture," IEEE Transactions on Computers, Vol. 42, No. 3, Mar. 1993.
[HREF1] ObjectSpace’s Voyager: http://www.objectspace.com/product/voyager
[HREF2] General Magic’s Odyssey: http://www.genmagic.com/agents/odyssey.html
[HREF3] IBM’s Aglets: http://www.trl.ibm.co.jp/aglets/index.html