TOPIC ESTIMATION OF WEB SEARCH TRANSACTION LOG QUERIES USING MONTE-CARLO SIMULATION

Dr. Seda Ozmutlu [HREF1], Assistant Professor, Industrial Engineering Department [HREF2], Muhendislik-Mimarlik Fakultesi, Uludag University [HREF3], Gorukle, Bursa, 16059, Turkey. seda@uludag.edu.tr

Dr. H. Cenk Ozmutlu [HREF4], Associate Professor, Industrial Engineering Department [HREF5], Muhendislik-Mimarlik Fakultesi, Uludag University [HREF6], Gorukle, Bursa, 16059, Turkey. hco@uludag.edu.tr

Dr. Amanda Spink [HREF7], Professor of Information Technology, Faculty of Information Technology [HREF8], Queensland University of Technology [HREF9], Gardens Point Campus, 2 George St, GPO Box 2434. ah.spink@qut.edu.au

Abstract

A user’s single session with a Web search engine may consist of seeking information on a single topic or on multiple topics. Limited research has focused on multitasking search query sessions. The objective of the study is to provide a detailed analysis of multitasking sessions and to attempt to identify the topic of subsequent queries. The analysis covers not only which topics the users are interested in, but also from which topics to which topics the users are switching; hence we form topic transition matrices. Using this knowledge, Monte-Carlo simulation is used to identify the topic of upcoming queries. Findings include: (1) the number of topic shifts is small compared to the number of topic continuations in the dataset, (2) the most frequently detected topics in the dataset are general information, entertainment and computers, followed by sexual, hobbies, shopping and travel in both portions of the dataset, and (3) Monte-Carlo simulation and the use of conditional probabilities for subsequent queries have not performed favorably for topical estimation of subsequent queries.

Introduction

Web users interact with search engines extensively to retrieve information from the Web. Query analysis is essential to understand Web users’ behavior, and is one of the mainstream research directions of Web mining. Many researchers have conducted large-scale studies of search engine datalogs, such as Silverstein et al. (1999), Cooley, Mobasher and Srivastava (1999), Spink et al. (2000, 2001, 2002a) and Ozmutlu et al. (2003b, 2003c). Most of these studies are based on statistical or linguistic characteristics of the search queries (Pu et al., 2002). Studies on content analysis are few, generally because of the effort required to manually process the queries for topic identification. However, content analysis is a growing area (Pu et al., 2002).

Some researchers, such as Silverstein et al. (1999) and Spink et al. (2001), have performed content analysis of search engine data logs at the term level, analyzing the frequency of terms and term pairs in search engine queries. All these researchers have observed that the highest ranking terms are related to the topics of pornography, entertainment and education. In another study, Spink et al. (2002a) analyzed Excite datasets from 1997, 1999 and 2001 for content and found that human information needs and search content evolved from 1997 to 2001; users’ interests shifted from entertainment and pornography to travel and commerce. Ozmutlu et al. (2004b) performed an hourly statistical and topical analysis of an Excite query log and a FAST query log of about 1 million queries each. They found that the popularity of topics varies throughout the day. For example, topics such as finance, business and education are more popular during the earlier hours of the day, whereas entertainment and pornography are more prevalent during the evening. Ozmutlu et al. (2004b) and Beitzel et al. (2004) have also shown that the popularity of some topical categories depends on the hour of the day. Wang et al. (2003) analyzed a query log from the website of the University of Tennessee at Knoxville and observed that the vocabulary of Web users is comparatively small, and that terms and term pairs have similar frequency trends. However, Wang et al. (2003) mention that such studies are at the lexical level and that they intend to carry their studies to a conceptual level by using lexical databases.

At a further level of contextual studies, several studies have proposed query clustering algorithms, where the queries are grouped in several clusters based on their topics. Pu et al. (2002) developed an automatic classification methodology to classify search queries into broad subject categories. They formed a subject taxonomy and fit each search query into one of the categories in the taxonomy. Muresan and Harper (2004) proposed a topic modeling system for developing mediated queries. They performed a statistical analysis of terms in documents available in a source collection and a statistical representation of the lexicographic model of the query. This step is followed by context analysis, which relates topics and terms considering weights, and then developing mediated queries based on the similarity of the terms to specific topics.

Another dimension of topic-related information retrieval is multitasking. Multitasking is performing more than one task simultaneously. In terms of information retrieval, multitasking information seeking and searching processes are defined as ‘‘the process of searches over time in relation to more than one, possibly evolving, set of information problems including changes or shifts in beliefs, cognitive, affective, and/or situational states’’ (Spink et al., 2002b). Spink et al. (2002b) first identified information multitasking processes in four studies conducted within different information environments, including library use. They found that people often seek information in a library on more than one information task during a single or multiple library use episodes. Spink (2004) found library use episodes that included up to 17 information task switches. Spink and Park (2005) suggest that multitasking, and information and non-information task switching, may be affected by many factors, including: the nature and complexity of content in relation to the information seeker’s domain knowledge; the amount and depth of information processing required for different information tasks; the information seeker’s level of interest, attention and focus in the information task; the level of planning and priorities of the information seeker in relation to their information tasks; the pros and cons, or the effects on effectiveness, efficiency and productivity, of information task switching; serendipity by the information seeker that is prompted by visual information cues; and task prioritizing.

Through the analysis of transaction logs, Spink et al. (1999) found that eleven (3.8%) of the 287 Excite users responding to a Web-based survey reported multitasking searches. Shortly after, Spink et al. (2002b), Ozmutlu et al. (2003a) and Ozmutlu et al. (2003d) found that between 11.4% and 31.8% of the Excite and FAST search engine users performed multitasking searches. Spink et al. (2006) studied the transaction logs of the AltaVista search engine, and observed that 81% of two-query sessions and 91% of sessions with three or more queries included multiple topics, that there is a broad variety of topics in multitasking search sessions, and that sessions with three or more queries sometimes contained frequent topic changes.

Within the concept of multitasking, several researchers have attempted to determine whether users would change topics during their search session, and hence have performed automatic new topic identification. Ozmutlu (2006) applied multiple factor regression to automatically identify topic changes, and showed that there is a valid relationship between non-semantic characteristics of user queries and topic shifts and continuations. Ozmutlu (2006) showed that the non-semantic factors of time interval, search pattern and query position in the user session, as well as the interaction of search pattern and time interval, have a statistically significant effect on topic shifts. These results provide statistical evidence that Web users behave in a characteristic way when they are about to make a topic shift or continue on a topic, and that this behavior is more pronounced when a certain combination of search pattern and time interval occurs. In the same direction, Ozmutlu and Cavdur (2005a) used a new topic identification algorithm based on Dempster-Shafer theory (Shafer, 1976) and genetic algorithms, and applied the methodology to Excite data. Ozmutlu and Cavdur (2005b) and Ozmutlu et al. (2004a) proposed artificial neural networks to automatically identify topic changes, and showed that neural networks successfully provided new topic identification.

The purpose of this study is to analyze multitasking Web search sessions, where multiple topics are integrated into a single search session, and to determine the prevalence and characteristics of multitasking Web searching. Beyond the topic analysis of the queries, the study seeks to learn from which topic to which topic search engine users are switching, and thus to develop a “from topic-to topic” matrix, or topic transition matrix. Such an analysis of queries might be helpful in estimating the topic of subsequent queries in a Web search session, and has not yet been performed in the information science literature. In addition, the paper attempts to determine the topics of subsequent queries and the event of a topic shift in a test dataset using Monte-Carlo simulation and the topic transition matrix.

Research Design

The dataset

The transaction log used in the study comes from the Excite Web search engine. All queries were submitted to the Excite search engine on May 4, 2001. The entire dataset consists of approximately 1.7 million queries. A sample of 10,256 queries was selected from the entire dataset using Poisson sampling (Ozmutlu et al., 2002a) to provide a sample dataset that is both representative of the dataset and small enough to be analyzed conveniently. The sample was kept relatively small, since evaluating the performance of the algorithm requires a human expert to go over all the queries. In the Excite transaction log structure, the entries are given in the order they arrive. New user sessions were identified through a user ID, and each query is given a time stamp in hours, minutes and seconds.

Monte-Carlo Simulation

Monte-Carlo simulation is a static simulation scheme that employs random numbers, and is used for solving stochastic or deterministic problems where time plays no substantial role (Law and Kelton, 1991). Monte-Carlo simulation is used to solve many problems that are analytically complex. In the Monte-Carlo technique, artificial data is generated via the use of a random number generator and the cumulative distribution of interest (Pegden et al., 1995). An acceptable random number generator is important, since the random numbers generated are not actually random but pseudorandom, meaning that the random number sequence is reproducible (Pegden et al., 1995). Reproducibility is required to be able to repeat the experiment if necessary. For Monte-Carlo simulation, random numbers are usually generated from the Uniform(0,1) distribution, and the relevant response is selected based on the random number. Consider the example of tossing a coin. In order to perform a Monte-Carlo simulation of the coin problem, we draw a random number from U(0,1). If the random number generated is below 0.5, we assign a heads response, otherwise a tails response (or vice versa, depending on how the random number ranges are assigned to responses). Monte-Carlo simulation can also be used for solving other analytically complex problems, such as determining the area of an irregular figure.
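The coin example can be sketched in a few lines of Python. This is only an illustration and not code from the study: the function name `simulate_coin_tosses` and the seed value are assumptions, and Python's standard `random.Random` generator stands in for whichever pseudorandom generator is considered acceptable. Fixing the seed makes the sequence reproducible, which is the property discussed above.

```python
import random


def simulate_coin_tosses(n_tosses, seed=12345):
    """Monte-Carlo simulation of a fair coin using U(0,1) draws.

    The seed is an arbitrary, illustrative value; reusing it reproduces
    exactly the same pseudorandom sequence.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_tosses):
        u = rng.random()  # uniform draw in [0, 1)
        # Draws below 0.5 are assigned to heads, the rest to tails.
        outcomes.append("heads" if u < 0.5 else "tails")
    return outcomes


if __name__ == "__main__":
    tosses = simulate_coin_tosses(10_000)
    print("share of heads:", tosses.count("heads") / len(tosses))
```

With a large number of tosses the share of heads should settle near 0.5, and calling the function twice with the same seed returns the same sequence.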

Methodology

The methodology used in this paper is described below.

• Processing the transaction logs to determine topics of queries: All the queries in the transaction log were manually assigned to topic categories. Manual analysis is required to determine the correct topic of the queries. 17 topic categories were used in the analysis. We used the topic categories in Ozmutlu et al. (2003d) and Spink et al. (2002b): news (category 1), government/politics (category 2), business (category 3), medical (category 4), arts and humanities (category 5), hobbies (category 6), entertainment (category 7), employment (category 8), education (category 9), shopping (category 10), computers (category 11), individual/family (category 12), sexual (category 13), science (category 14), travel (category 15), general information (category 16) and unexplicit (category 17). Then, we determined from which topic to which topic the users switched, to form the “from topic-to topic matrix”, i.e. a topic transition matrix. In the matrix, we also added the categories “begin” and “end” to include the first and last queries of each session in the analysis.

• Dividing the data into two sets: The transaction log is divided into two almost equal parts for the application of Monte-Carlo simulation. The first half of the dataset is used to determine the conditional probabilities of switching from one topic to another, and the second half is used to test the performance of Monte-Carlo simulation. Hence, the first half of the transaction log is used to make estimates of topic shifts in the second half. In general, the two portions need not contain the same number of queries: the split is placed so that the user session containing the middle query of the dataset is kept whole, so the sizes of the portions depend on the size of that session. In this study the two halves happen to contain an equal number of queries, but this is due to pure chance. The sizes of the dataset portions are shown in Table 1.

Table 1: Size of the dataset portions used in the study

Entire dataset: 1.7 million queries
Sample set: 10,256 queries
1st half of the sample set, used for determining the topic transition matrix: 5,128 queries
2nd half of the sample set, used for estimating the topic of subsequent queries: 5,128 queries

• Applying Monte-Carlo simulation: Using the first portion of the dataset, we determine the conditional probabilities for the topic of a subsequent query. Hence, the conditional probability that a subsequent query belongs to a certain category, given that the current query belongs to a certain category, is determined. Then, the conditional probabilities are used to estimate the topic of subsequent queries in a separate dataset, which is the second portion of the sample transaction log in this study. Random numbers are used to estimate the topics of subsequent queries in the second portion of the dataset. The details of this procedure are as follows: The queries were labeled with respect to their topics previously. Using this information, the topic transition matrix is formed for the first half of the dataset. The topic transition matrix can be explained as follows. Consider the following row of data in Table 2 from a certain topic transition matrix.

Table 2: Sample row of data from a topic transition matrix

To topic          1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   end  total
From topic 1     15    0    4    4    2    2    3    2    0    0    0    0    0    2    0    0    2    10     46

This row of the topic transition matrix shows that there were a total of 46 queries that belong to topic category 1, i.e. these queries were on news. 15 out of the 46 queries were again followed by a query on news. Four out of the 46 queries were followed by a query in topic category “3” (business), and another four by a query in topic category “4” (medical). No queries on government (category 2) were made after a query on news. The following step is to create a cumulative topic transition matrix as in Table 3. The cumulative topic transition matrix shows the cumulative number of queries following a certain query.

Table 3: Cumulative topic transition matrix for a sample row of data

To topic          1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   end  total
From topic 1     15  ---   19   23   25   27   30   32  ---  ---  ---  ---  ---   34  ---  ---   36    46     46

Random numbers are generated from a uniform distribution and mapped to topics using the cumulative topic transition matrix. The Monte-Carlo simulation is tested on the second portion of the dataset. Let us assume that we retrieve the first query of the test dataset, and that this query belongs to topic category “1”. In this case, a random number is generated between 0 and 46. If the random number is less than 15, then the subsequent query (2nd query) is labeled as category “1”; if the random number is between 15 and 19, the subsequent query is labeled as category “3”; and so on. If the random number is between 36 and 46, we estimate that the session is about to end. These procedures are applied to the second portion of the transaction log, i.e. the test dataset.

• Validation of the results: The estimated topic categories are compared with the actual categories of the queries, which were previously analyzed by the human expert. Then, the number of correctly and incorrectly labeled queries, as well as the number of correctly estimated topic shifts, is determined to show the performance of the algorithm.
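The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`count_transitions`, `cumulative_row`, `estimate_next`), the representation of a session as a list of topic numbers, and the seed are all hypothetical. The sample row is the one shown in Table 2, and the lookup reproduces the interval rule described for Table 3 (a draw below 15 gives category 1, a draw between 15 and 19 gives category 3, a draw between 36 and 46 gives "end").

```python
import random
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate

TOPICS = list(range(1, 18))  # topic categories 1..17
TARGETS = TOPICS + ["end"]   # possible labels for the subsequent query


def count_transitions(sessions):
    """Count from-topic/to-topic transitions, adding 'begin' and 'end'.

    `sessions` is assumed to be a list of sessions, each a list of the
    manually assigned topic numbers of its queries, in order.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for topics in sessions:
        path = ["begin"] + list(topics) + ["end"]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return counts


def cumulative_row(row_counts):
    """Turn one row of counts (ordered as TARGETS) into cumulative counts."""
    return list(accumulate(row_counts))


def estimate_next(cumulative, u):
    """Map a draw u in [0, row total) to the label whose interval contains it."""
    # bisect_right skips zero-count categories, whose intervals are empty.
    index = min(bisect_right(cumulative, u), len(cumulative) - 1)
    return TARGETS[index]


# Sample row from Table 2 (from topic 1): topics 1..17, then 'end'.
TABLE2_ROW = [15, 0, 4, 4, 2, 2, 3, 2, 0, 0, 0, 0, 0, 2, 0, 0, 2, 10]

if __name__ == "__main__":
    cum = cumulative_row(TABLE2_ROW)
    rng = random.Random(2001)      # arbitrary seed, for reproducibility
    u = rng.random() * cum[-1]     # random number between 0 and 46
    print("draw:", round(u, 2), "-> estimated next:", estimate_next(cum, u))
```

In a full run, `count_transitions` would be applied to the first half of the sample, each row converted with `cumulative_row`, and `estimate_next` called once per query of the second half using the row of that query's topic; the estimates would then be compared with the expert's labels.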

Results

As a result of manual topic analysis, we formed the topic transition matrices for the Excite 2001 dataset, shown in Tables 4 and 5. Tables 4 and 5 are the topic transition matrices for the first and second halves of the Excite 2001 transaction log, respectively. Category “begin” refers to the initial query of a session. Hence, the cell of the matrix corresponding to “begin” and “1” shows the number of initial queries that have topic “1”, i.e. news. Similar logic applies to the category “end”, which refers to the final query of a session. The cell of the matrix corresponding to “1” and “end” shows the number of final queries that have topic “1”, i.e. news.

The most striking result in Tables 4 and 5 is that topic shifts constitute only a small portion of the dataset, which is comparable to previous studies on automatic topic identification (Ozmutlu et al., 2004a; Ozmutlu and Cavdur, 2005a, 2005b; Ozmutlu, 2006). 309 out of 5128 queries in Table 4 (first portion of the dataset) and 390 out of 5128 queries in Table 5 (second portion of the dataset) represent a topic shift.
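The topic shift counts can be cross-checked against Tables 4 and 5: for each topic row, the queries followed by a different topic are the row total minus the same-topic (diagonal) cell minus the “end” cell. The short Python sketch below does this arithmetic with values copied from the two tables; the variable and function names are illustrative only.

```python
# Per-topic values copied from Tables 4 and 5, ordered as topics 1..17.
TABLE4 = {
    "same_topic": [20, 47, 108, 67, 149, 156, 392, 131, 51,
                   238, 410, 5, 332, 128, 228, 482, 17],
    "end": [24, 36, 72, 52, 103, 129, 245, 68, 41,
            101, 232, 8, 192, 60, 134, 337, 24],
    "total": [54, 88, 200, 128, 279, 301, 670, 215, 99,
              363, 677, 14, 540, 198, 386, 870, 46],
}
TABLE5 = {
    "same_topic": [54, 64, 132, 77, 204, 142, 524, 102, 53,
                   191, 438, 18, 328, 156, 258, 248, 15],
    "end": [24, 34, 59, 45, 111, 64, 317, 75, 30,
            100, 248, 10, 176, 86, 159, 179, 17],
    "total": [84, 105, 212, 139, 348, 220, 905, 195, 96,
              309, 735, 30, 523, 260, 452, 473, 42],
}


def count_topic_shifts(table):
    """Queries followed by a query on a different topic."""
    return sum(
        total - same - end
        for total, same, end in zip(table["total"], table["same_topic"], table["end"])
    )


if __name__ == "__main__":
    print("first half:", count_topic_shifts(TABLE4))   # 309
    print("second half:", count_topic_shifts(TABLE5))  # 390
```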

We can also observe that the most frequently detected topics in the dataset are general information, entertainment and computers in both portions of the dataset. Other popular topics include sexual, hobbies, shopping and travel. The topic distributions of the two portions of the dataset appear to resemble each other. To test this, we compute the correlation between the topic distributions of the first and second portions of the dataset, and obtain a correlation coefficient of 0.89. This indicates a strong linear relationship between the topic distributions of the first and second halves of the dataset. The dispersion graph of the topic categories in the two portions of the dataset is shown in Figure 1, where the linear relationship is visible. There is one outlier point, which corresponds to category 16 (general information): there are more queries on general information in the first portion of the dataset than in the second.
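As an illustration of this comparison, the sketch below computes a Pearson correlation coefficient between the per-topic query counts taken from the “Total” rows of Tables 4 and 5. It is an assumption that these are the quantities that were correlated, and the function name `pearson` is illustrative, so the result should be expected to be close to the reported coefficient rather than to match it exactly.

```python
import math

# Per-topic query counts (topics 1..17) from the Total rows of Tables 4 and 5.
FIRST_HALF = [54, 88, 200, 128, 279, 301, 670, 215, 99,
              363, 677, 14, 540, 198, 386, 870, 46]
SECOND_HALF = [84, 105, 212, 139, 348, 220, 905, 195, 96,
               309, 735, 30, 523, 260, 452, 473, 42]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print("correlation:", round(pearson(FIRST_HALF, SECOND_HALF), 2))
```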

Table 4: Topic transition matrix for the first half of the Excite 2001 transaction log

Rows: from topic. Columns: to topic.

From      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   end  total
1        20    0    0    1    1    0    0    0    1    0    0    0    0    0    5    2    0    24     54
2         0   47    0    0    0    0    1    0    1    1    1    0    0    0    0    1    0    36     88
3         1    0  108    1    2    0    3    1    1    1    5    0    1    0    2    1    1    72    200
4         0    0    0   67    1    1    1    0    1    2    0    0    1    1    0    1    0    52    128
5         1    0    4    0  149    2    7    0    1    0    3    0    1    1    3    4    0   103    279
6         0    0    2    1    2  156    2    0    1    2    1    0    0    0    1    4    0   129    301
7         0    1    2    2    4    3  392    0    0    2    5    0    1    2    1   10    0   245    670
8         0    1    1    1    1    2    0  131    3    0    1    1    0    0    0    5    0    68    215
9         0    0    0    0    0    1    1    1   51    1    1    0    2    0    0    0    0    41     99
10        0    0    1    3    1    0    1    0    0  238    3    0    4    1    2    7    1   101    363
11        1    0    1    1    1    0    6    2    0    6  410    0    4    1    5    6    1   232    677
12        0    0    0    0    0    0    1    0    0    0    0    5    0    0    0    0    0     8     14
13        0    0    0    1    2    3    4    2    0    0    0    0  332    0    2    1    1   192    540
14        0    0    1    0    0    1    0    0    1    1    0    0    0  128    1    4    1    60    198
15        0    0    1    2    2    1    0    4    0    5    4    0    1    2  228    1    1   134    386
16        1    0    1    2    4    3    5    3    3    5   10    0    6    2    6  482    0   337    870
17        0    0    0    0    0    1    1    0    0    0    0    0    1    0    0    2   17    24     46
begin    30   39   78   46  109  127  245   71   35   99  233    8  186   60  130  339   23   ---
Total    54   88  200  128  279  301  670  215   99  363  677   14  540  198  386  870   46         5128

Table 5: Topic transition matrix for the second half of the Excite 2001 transaction log

Rows: from topic. Columns: to topic.

From      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   end  total
1        54    0    0    0    0    0    2    0    0    1    1    0    0    1    1    0    0    24     84
2         0   64    0    0    0    0    3    0    0    1    1    0    0    1    1    0    0    34    105
3         0    0  132    1    1    0    2    0    3    3    2    0    1    1    3    2    2    59    212
4         0    0    1   77    1    0    5    0    1    2    0    1    1    0    0    5    0    45    139
5         1    0    2    0  204    2    9    1    0    3    3    1    2    0    4    5    0   111    348
6         0    0    0    0    2  142    5    1    1    0    1    0    1    0    3    0    0    64    220
7         5    0    2    4    9    3  524    2    1    2    9    0    8    4    6    9    0   317    905
8         1    0    1    0    1    0    5  102    1    1    4    0    0    0    3    1    0    75    195
9         0    0    2    1    1    2    1    2   53    0    2    0    0    0    1    1    0    30     96
10        0    1    2    2    0    0    5    2    1  191    2    0    1    0    1    1    0   100    309
11        2    0    0    2    5    1    9    2    1    1  438    0    9    3    7    6    1   248    735
12        0    0    1    0    1    0    0    0    0    0    0   18    0    0    0    0    0    10     30
13        0    1    0    0    1    1    4    0    0    1    4    0  328    1    2    3    1   176    523
14        0    0    0    2    1    1    5    0    0    0    6    0    1  156    0    0    0    86    260
15        2    0    2    1    6    2    4    5    0    3    1    0    2    1  258    6    0   159    452
16        0    1    5    2    5    2    8    3    1    1    5    0    1    4    7  248    1   179    473
17        0    0    1    0    0    0    2    0    1    1    0    0    2    0    1    2   15    17     42
begin    19   38   61   47  110   64  312   75   32   98  256   10  166   88  154  182   22   ---
Total    84  105  212  139  348  220  905  195   96  309  735   30  523  260  452  473   42         5128

Figure 1: Dispersion graph for the topic category distribution of two portions of the dataset

After applying Monte-Carlo simulation to the second portion of the dataset, we observe the following results. There are 390 topic shifts in this portion, meaning that after 390 queries the users changed the topic of their queries. For two of these 390 queries, both the upcoming topic shift and the topic of the subsequent query were estimated correctly. For 34 of the 390, the upcoming topic shift was estimated correctly, but the topic of the subsequent query was not. For the remaining 354, the upcoming topic shift was not estimated correctly (i.e. the algorithm could not estimate that the user would change topics in the subsequent query). In addition, 291 queries were estimated as topic shifts although they were in fact topic continuations. The application of Monte-Carlo simulation with conditional probabilities is therefore not very successful in identifying topic shifts and the topic of subsequent queries. However, this does not show that conditional probabilities would be unsuccessful when used in combination with other estimation parameters. Such combinations are directions for future research.
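These counts can be summarized as rates. The sketch below is only arithmetic on the figures reported above; the terms "recall" and "precision" are labels introduced here for illustration and are not used in the study, and the function name is hypothetical.

```python
# Counts reported for the second portion of the dataset.
ACTUAL_SHIFTS = 390
SHIFT_AND_TOPIC_CORRECT = 2     # shift and subsequent topic both estimated
SHIFT_CORRECT_TOPIC_WRONG = 34  # shift estimated, subsequent topic wrong
SHIFT_MISSED = 354              # shift not estimated
FALSE_SHIFTS = 291              # continuations estimated as shifts


def shift_detection_summary():
    """Summarize the reported counts as simple rates."""
    detected = SHIFT_AND_TOPIC_CORRECT + SHIFT_CORRECT_TOPIC_WRONG
    return {
        # Share of actual shifts for which a shift was estimated.
        "recall": detected / ACTUAL_SHIFTS,
        # Share of estimated shifts that were actual shifts.
        "precision": detected / (detected + FALSE_SHIFTS),
        # Share of actual shifts where the new topic was also correct.
        "topic_correct_rate": SHIFT_AND_TOPIC_CORRECT / ACTUAL_SHIFTS,
    }


if __name__ == "__main__":
    for name, value in shift_detection_summary().items():
        print(f"{name}: {value:.1%}")
```

Under these definitions, roughly 9% of the actual topic shifts were flagged as shifts, and roughly 11% of the flagged shifts were actual shifts.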

Conclusion

The objective of the study is to provide a detailed analysis of multitasking sessions, where multiple topics are integrated into a single search session, and to determine the prevalence and characteristics of multitasking Web searching. The paper also attempts to identify the topic of subsequent queries. The analysis covers not only which topics the users are interested in, but also from which topics to which topics the users are switching; hence a topic transition matrix is formed. The topic transition matrix can be used to calculate the conditional probability of the topic of a subsequent query, given the topic of the current query. Using the conditional probabilities, Monte-Carlo simulation is employed to identify the topic of upcoming queries. The transaction log used in the study comes from the Excite search engine and was retrieved in 2001. The dataset is divided into two equal parts, where the first part is used to compute the conditional probabilities of the topic of subsequent queries, and the second part is used to apply Monte-Carlo simulation. Findings include: (1) the number of topic shifts is small compared to the number of topic continuations in the dataset, (2) the most frequently detected topics in the dataset are general information, entertainment and computers, followed by sexual, hobbies, shopping and travel in both portions of the dataset, and (3) Monte-Carlo simulation and the use of conditional probabilities for subsequent queries have not performed favorably for topical estimation of subsequent queries. However, these results do not show that conditional probabilities would be unsuccessful when used in combination with other estimation parameters. Such combinations are directions for future research.

References

Beitzel, S.M., Jensen, E.C., Chowdhury, A., Grossman, D. and Frieder, O. (2004). "Efficiency and Scaling: Hourly Analysis of a Very Large Topically Categorized Web Query Log" in Proceedings of the 27th Inter. Conf. on Research and Development in Information Retrieval, Sheffield, UK, p.321-328.

Cooley, R., Mobasher, B. and Srivastava, J. (1999). "Data preparation for mining world wide web browsing patterns" in Knowledge and Information Systems, v.1, p.5–32.

Law, A.M. and Kelton, W.D. (1991). Simulation Modeling and Analysis, McGraw-Hill, New York

Muresan, G. and Harper, D. J. (2004). "Topic modeling for mediated access to very large document collections" in Journal of the American Society for Information Science and Technology, v.55, p.892–910.

Ozmutlu, S. (2006). "Automatic new topic identification using multiple linear regression" in Information Processing and Management, v.42, p.934–950

Ozmutlu, H.C. and Cavdur, F. (2005a). "Application of automatic topic identification on excite web search engine data logs" in Information Processing and Management, v.41, n.5, p.1243-1262

Ozmutlu, H.C. and Cavdur, F. (2005b). "Neural network applications for automatic new topic identification" in Online Information Review, v.29, p.35–53.

Ozmutlu, H.C., Cavdur, F., Ozmutlu, S. and Spink, A. (2004a). "Neural Network Applications for Automatic New Topic Identification on Excite Web search engine datalogs" in Proceedings of ASIST 2004, Providence, RI, p.310-316.

Ozmutlu, S., Ozmutlu, H.C. and Spink, A. (2003a). "Multitasking Web searching and implications for design" in Proceedings of ASIST 2003, Long Beach, CA, p.416-421.

Ozmutlu, S., Ozmutlu, H.C. and Spink, A. (2003b). "Are people asking questions of general web search engines" in Online Information Review, v.27, p.396-406.

Ozmutlu, H.C., Ozmutlu, S. and Spink, A. (2003d). "A Study of Multitasking Web Search" in Proceedings of the International Conference on Information Technology: Computers and Communications (ITCC 03)

Ozmutlu, S., Ozmutlu, H.C. and Spink, A. (2004b). "A day in the life of Web searching: an exploratory study" in Information Processing and Management, v.40, p.319-345.

Ozmutlu, S., Spink, A. and Ozmutlu, H.C. (2002a). "Analysis of large data logs: an application of Poisson sampling on excite web queries" in Information Processing and Management, v.38, p.473-490.

Ozmutlu, S., Spink, A. and Ozmutlu, H. C. (2003c). "Trends in multimedia web searching: 1997-2001" in Information Processing and Management, v.39, p.611-621.

Pegden, C.D., Shannon, R.E. and Sadowski, R.P. (1995). Introduction to Simulation using Siman, McGraw-Hill, New York

Pu, H. T., Chuang Shui-Lung and Yang, C. (2002). "Subject categorization of query terms for exploring web users' search interests" in Journal of the American Society for Information Science and Technology, v.53, p.617–630.

Shafer, G. (1976). A mathematical theory of evidence. Princeton, NJ: Princeton University Press.

Silverstein, C., Henzinger, M., Marais, H. and Moricz, M. (1999). "Analysis of a very large Web search engine query log" in ACM SIGIR Forum, v.33 n.1 p.6-12.

Spink, A. (2004), "Multitasking information behavior and information task switching: an exploratory study" in Journal of Documentation, v.60 n.4 p.336-45.

Spink, A., Jansen, B.J. and Ozmultu, H.C. (2000). "Use of query reformulation and relevance feedback by Excite users" in Internet Research: Electronic Networking Applications and Policy, v.10, p.317-328.

Spink, A., Jansen, B.J., Wolfram, D. and Saracevic, T. (2002a). "From e-sex to e-commerce: Web search changes" in IEEE Computer, v.35 n.3, p.133-135.

Spink, A., Ozmutlu, H.C. and Ozmutlu, S. (2002b). "Multitasking information seeking and searching processes" in Journal of the American Society for Information Science and Technology, v.53 n.8 p.639-652.

Spink, A. and Park, M. (2005), "Information and non-information multitasking interplay", in Journal of Documentation, v.61 n.4 p.548-554.

Spink, A., Park, M., Jansen, B.J. and Pedersen, J. (2005). "Multitasking during Web search sessions" in Information Processing and Management, v.42, p.264–275.

Spink, A., Wolfram, D., Jansen, B.J. and Saracevic, T., (2001). "Searching the Web: The public and their queries" in Journal of the American Society for Information Science and Technology, v.53 n.2, p.226–234.

Wang, X., Mohanty, N. and McCallum, A. (2005). "Group and topic discovery from relations and text" in Proceedings of the 11th ACM SIGKDD International conference on knowledge discovery and data mining workshop on link discovery: Issues, approaches and applications (LinkKDD-05), Chicago, IL, USA p. 28–35.

Hypertext References

HREF1
http://www20.uludag.edu.tr/~seda
HREF2
http://www20.uludag.edu.tr/~endustri
HREF3
http://www.uludag.edu.tr
HREF4
http://www20.uludag.edu.tr/~hco
HREF5
http://www20.uludag.edu.tr/~endustri
HREF6
http://www.uludag.edu.tr
HREF7
http://sky.fit.qut.edu.au/~spinkah/
HREF8
http://www.fit.qut.edu.au
HREF9
http://www.qut.edu.au

Copyright

Seda Ozmutlu, H. Cenk Ozmutlu and Amanda Spink, © 2006. The authors assign to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.