Building a Semantically Rich Legal Case Repository in OWL

Yueting Shen, Research Assistant, Department of Computer Systems, Faculty of Information Technology, University of Technology, Sydney, PO Box 123, Broadway NSW 2007, Australia. ritashen@it.uts.edu.au

Dr. Robert Steele, Associate Professor, Department of Computer Systems, Faculty of Information Technology, University of Technology, Sydney, PO Box 123, Broadway NSW 2007, Australia. rsteele@it.uts.edu.au

John Murphy, PhD Candidate, Department of Computer Systems, Faculty of Information Technology, University of Technology, Sydney, PO Box 123, Broadway NSW 2007, Australia. John.E.Murphy@it.uts.edu.au

Abstract

The retrieval of conceptual information from legal documents depends on the construction of a knowledge representation of the document. A number of approaches to legal case knowledge representation have been proposed, including frame-like structures, semantic nets and dimensions. However, these works have limitations: some offer little inferencing capability, some ignore contextual information essential to conceptual retrieval, and some give no consideration to semantic interoperability. Our work addresses these limitations by using an open standard ontology language and a refined ontology architecture. The resulting ontology is easy to maintain, reuse and extend, and supports rich inferencing and reasoning. In addition, we propose a framework that integrates heterogeneous knowledge representations and reduces the manual effort of annotation by making the annotation process semi-automatic.

Key words

Legal case, ontology, conceptual retrieval, OWL, knowledge representation

1 Introduction

The improvement of on-line legal publishing services through more effective and efficient methods of legal knowledge management is becoming increasingly important with the rapid accumulation of legal documents and the increasing availability of on-line legal information services. Most existing on-line legal publishing systems store information in plain text. Because applications do not understand the semantics of that text, these systems rely heavily on the presence of words rather than on the concepts behind them. As a result, they retrieve too much irrelevant data or fail to identify all the relevant documents in their database [1]. To enable applications to search intelligently, the underlying semantics of documents have to be made available to applications in some form. This requires effective knowledge representation methods, particularly for the legal domain. Many approaches [2-11] have been proposed to address the representation of legal cases, legal statutes or legal knowledge in general. Our interest concentrates on those approaches that specifically target legal case representation, such as the semantic net based conceptual structure proposed in [2], the quadrant structure developed in [3, 4], the frame-like structure used in [5, 6, 7], and the dimensions or vectors used in [8, 9]. However, these approaches have limitations: [5, 6] offer little inferencing capability; [2-4, 8, 9] give no consideration to web-related issues such as semantic interoperation of heterogeneous sources; and [7] ignores certain contextual information needed for conceptual searching.

This paper presents an approach for building a legal case knowledge base. The open standard language OWL (Web Ontology Language) is used to model the legal case. A framework for mapping other data models to the proposed legal case model is also proposed, to address the integration of heterogeneous knowledge representations from distributed web sites. The proposed approach targets web documents and is motivated in particular by Berners-Lee's vision for the future of the World Wide Web, the Semantic Web.

This paper is organized as follows: Section 2 introduces the background knowledge; Section 3 presents the approach applied for designing the legal case ontology; Section 4 describes the process of case repository population; Section 5 demonstrates how and why the developed ontology can enable semantic searching capabilities and discusses some remaining issues; Section 6 concludes.

2 Semantic Web and Legal Case Representation

The Semantic Web is an extension of the World Wide Web in which web information is expressed in a machine-processable form [12]. Its most important part is its family of markup languages, which includes the Resource Description Framework (RDF), RDF Schema (RDFS) and the Web Ontology Language (OWL). RDF is a data model in which statements about Web resources are expressed as triples (subject, predicate, object) [13]. RDFS is a vocabulary for describing properties and classes of RDF-based resources. OWL, which became a W3C Recommendation in 2004 [18], is a relatively new knowledge representation language. OWL builds on RDF and RDFS and facilitates greater machine interpretability of Web content by providing additional vocabulary along with a formal semantics [14]. OWL’s semantics derive from description logics, so OWL has attractive and well-understood computational properties [18].
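
As a minimal illustration of the triple data model described above, the following Python sketch stores statements as (subject, predicate, object) tuples and matches them against a pattern. It uses only the standard library. The ex: identifiers, the Triple type and the match helper are hypothetical names chosen for illustration; they are not taken from any RDF toolkit or from our ontology.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements; the ex: names are illustrative only.
GRAPH = [
    Triple("ex:case1", "rdf:type", "ex:LegalCase"),
    Triple("ex:case1", "ex:hasIssue", "ex:Banking"),
    Triple("ex:case2", "rdf:type", "ex:LegalCase"),
]


def match(graph: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)]


# All resources typed as legal cases.
cases = [t.subject for t in match(GRAPH, p="rdf:type", o="ex:LegalCase")]
```

A query is thus a triple with some positions left open, which is the basic idea that RDF query languages build on.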

In the legal case representation field, a number of approaches [5-7] have been built on web markup languages. Both SALOMON [6] and JUSTICE [5] proposed a data model using a DTD (Document Type Definition) for their legal case knowledge base. Concepts such as jurisdiction, judge, parties and facts are extracted from case head-notes or abstracts to form a case descriptor. The case knowledge base is composed of XML or SGML instances marked up according to the proposed DTD. However, data models built on a DTD can achieve only limited semantic annotation, because XML provides a surface syntax for structured documents but imposes no semantic constraints on the meaning of these documents [14]. On the other hand, Wyner’s approach [7] uses OWL to model legal cases for case-based reasoning. Like many case-based reasoning approaches, the model uses legal issues and factors to represent a legal case. However, many contextual features essential to conceptual searching are missing from the model. The approach also makes little attempt at text concept analysis, which is a major stumbling block in developing conceptual retrieval systems [15].

3 Legal Case Ontology Design

The model created by our approach encodes both domain-dependent and domain-independent semantics into the case text. Contextual features are captured, so richer information is exposed to applications to enable conceptual searching. In addition, OWL is used to code our model because it is an open and standardized web ontology description language with a rich set of available inferencing mechanisms. This section details the four steps used in our approach to building a legal case ontology.

3.1 Project Motivation and Ontology Scope

One requirement of our work is to generate marked-up case reports that target electronic storage and automated or semi-automated production of paper-based reports. To enable the generation of paper-based reports, the case ontology needs to cover the surface information of a case, such as the dates, the title, the roles and the decisions. Furthermore, the case ontology is also required to facilitate intelligent searching of legal cases by extracting the deep semantics contained in legal case reports.

3.2 Knowledge Source Selection

The legal case report is a major means of recording court events or legal cases in the legal system. The content of precedent cases is first manually recorded. Then a summary (also called a head-note) is added and the content is reorganized by legal professionals to form a case report for paper-based print. The manually added head-note represents the points that the legal professionals consider to be important. For instance, it may consist of the case name, jurisdiction, hearing dates, parties, judges, legal representatives, cited cases, legislation and a decision summary. Readers typically use the head-note as a guide to identify the cases of interest to them. Therefore, our case ontology treats the head-note as one of its major knowledge sources. On the other hand, a legal case is the practice of law applied to resolve disputes among parties. The rights and duties attached to the roles that the parties took in a particular circumstance are the focus of the parties’ disputes. The arguments of the case often centre on the statutes that define those rights and duties in terms of the role relationship [2]. As such, statutes are another important knowledge source for our case ontology.

3.3 Ontology Partitions

Instead of using one monolithic ontology, breaking an ontology into several loosely coupled sub-ontologies brings benefits such as lower maintenance cost and higher reusability. A design that maximizes modularity and reusability enables sharing across different domains as well as within a domain.

As mentioned in Section 3.2, legal cases describe how legal rules are applied to resolve disputes between parties. Cases are usually categorized by legal issues (see http://www.thomson.com.au for an example). Legal issues are the focus of both cases and statutes. The issues, together with what is held on them, are thus important parts of a legal case. This indicates the need for a separate ontology to encode legal issues alongside the legal case ontology. The legal issue ontology in this paper has, however, been simplified and focuses only on the hierarchy of legal issues within each category of legal statutes.

In addition, legal role relations are the foundation of case events, since all the disputes in a legal case are raised around the duties and rights of roles [2]. When a user searches for cases on a specific topic, the role pairs associated with the parties in a specific case situation, such as seller and consumer, employer and employee, or landlord and tenant, are usually more meaningful than the parties’ names. Integrating legal role relationships into the ontology therefore contributes to intelligent information searching. This kind of information, however, is not explicitly disclosed in most case reports, because the roles of the two parties are often conflated with plaintiff and defendant (or claimant and respondent). Moreover, this kind of information describes relationships associated with social events and can therefore be treated independently of legal cases. A separate legal role ontology also enables potential reuse of this information (e.g., reusing role relationships in a statute ontology).
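
The benefit of recording legal roles separately from procedural roles can be sketched in a few lines of Python. This is only an illustration under assumed names: the Party and Case types, the find_by_role_pair helper and the sample data are hypothetical and do not reproduce the classes of our legal role ontology.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Party:
    name: str
    procedural_role: str  # e.g. "plaintiff" or "defendant"
    legal_role: str       # e.g. "cheque holder", "bank", "tenant"


@dataclass
class Case:
    title: str
    parties: List[Party]


def find_by_role_pair(cases: List[Case], role_a: str, role_b: str) -> List[str]:
    """Return titles of cases in which both legal roles are present."""
    wanted = {role_a, role_b}
    return [c.title for c in cases
            if wanted <= {p.legal_role for p in c.parties}]


# Hypothetical sample data.
CASES = [
    Case("A v B", [Party("A", "plaintiff", "cheque holder"),
                   Party("B", "defendant", "bank")]),
    Case("C v D", [Party("C", "plaintiff", "tenant"),
                   Party("D", "defendant", "landlord")]),
]

hits = find_by_role_pair(CASES, "bank", "cheque holder")
```

Both sample cases involve a plaintiff and a defendant, so a search on procedural roles could not tell them apart, whereas the legal role pair identifies the relevant situation directly.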

In order to reveal the deep semantics embedded in case text, a mechanism for encoding linguistic semantic relationships together with legal terminology is needed. A semantic relationship ontology is proposed to interpret the semantics embedded in statements of case facts and orders. Furthermore, we also consider the possibility of reusing existing ontologies by, for instance, importing WordNet into the semantic relationship ontology and importing a people ontology and an organization ontology into the legal role ontology.

Figure 1 shows the relationship among these ontologies. The case ontology acts as a primary ontology while others are secondary ontologies.

[Figure 1: Ontology architecture]

3.4 Conceptual Structure Construction

The construction of the conceptual structure is a step that includes extracting concepts, identifying classes and properties, and building class relationships.

3.4.1 Extracting Concepts

In order to identify the concepts for the aforementioned ontologies, we group concepts into three categories according to their semantic type and the source of knowledge commitment: context-related concepts, domain-specific semantically related concepts, and generic semantically related concepts. The first category includes those concepts representing surface features or other features derived from legal case reports. The second category consists of those concepts with domain-specific semantics. The third category includes the concepts for interpreting the linguistic semantics of a statement in a case report.

One straightforward way to identify the first category of concepts is to use the knowledge resource at hand, namely a case report repository. Some concepts, such as case title, jurisdiction, legislation, decision, citation, judge, lawyer, plaintiff and defendant, can be extracted directly from legal case reports. Other concepts, such as supportable, unsupportable, favorable, unfavorable, implied and negligent, cannot be extracted directly from a case report and require conceptual deduction and abstraction.

The concepts belonging to the second category are drawn mostly from the statutes applied in legal cases. This kind of information is used to express the legal issues and the associated legal role relationships in a concrete dispute situation. Each statute refines the rights and duties of legal roles with regard to specific legal issues. The legal roles might shift the focus from one issue to another. This is common when multiple legal issues are raised in one case. To capture concepts of this kind, the structure of the legal statute categories and their sub-issues needs to be defined, and the roles related to these issues also need to be extracted. In banking services, for example, these concepts include banking, liability of bank, duty of bank, duty of confidential, duty of notification, drawer, holder, and endorser.

In order to extract the third category of concepts, we represent a legal statement as a triple: {Subject; Relation; Object}. This triple connects two concepts through a semantic relationship. In the legal domain, the most important statements concern the judgment of a matter. The value of the judgment is usually described using pairs of contradictory extremes such as true or not true, guilty or not guilty. Such negative and positive relationships are identified as being of significant legal importance and are therefore encoded in the semantic relationship ontology. In addition, the semantic relationship ontology can import existing ontologies such as WordNet (http://wordnet.princeton.edu/) to reuse semantic relations between verbs, nouns and adjectives, such as synonyms, hyponyms and related terms, which can be used to assist a user in constructing a query or to interpret the user’s query automatically.
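
The {Subject; Relation; Object} form with a positive or negative value can be sketched as follows. This is a minimal sketch, assuming that polarity is stored as a boolean flag on the statement; the Statement type, its field names and the contradicts helper are hypothetical and are not part of the semantic relationship ontology itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    relation: str
    obj: str
    positive: bool = True  # False encodes the negated extreme, e.g. "not guilty"

    def negated(self) -> "Statement":
        """Return the same statement with the opposite polarity."""
        return Statement(self.subject, self.relation, self.obj, not self.positive)

    def render(self) -> str:
        """Write the statement in the {Subject; Relation; Object} form."""
        value = self.obj if self.positive else "not " + self.obj
        return "{" + "; ".join([self.subject, self.relation, value]) + "}"


def contradicts(a: Statement, b: Statement) -> bool:
    """Two statements contradict if they differ only in polarity."""
    return ((a.subject, a.relation, a.obj) == (b.subject, b.relation, b.obj)
            and a.positive != b.positive)


claim = Statement("bank", "is", "negligent")   # what a party alleges
finding = claim.negated()                      # what the court holds
```

Keeping polarity explicit lets an application distinguish a claim from a finding that rejects it, even though both mention the same concepts.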

3.4.2 Class and Property Identification

A class provides an abstraction mechanism for grouping resources with similar characteristics [16], whilst a property is often used to identify the non-hierarchical relationships between a domain and a range (denoted as [Formula 1]). OWL defines two types of properties: datatype properties (referred to here as data properties) and object properties. A data property plays the role of an attribute, linking an individual to a data value, while an object property is a binary relationship between individuals of two classes.

Some concepts can be represented either by a class or by a property. For example, the concept "lawyer" can be identified as a lawyer class ([Formula 2]) or as an object property of the class plaintiff ([Formula 3]). If we are more interested in the relationship (a plaintiff has a lawyer) than in other details of the lawyer, then representing the concept lawyer as a property is sufficient and therefore preferred. The choice depends on whether the focus of interest is at the concept level or the relationship level.

Figure 2 shows a fragment of our case ontology. Concepts such as title, judge, jurisdiction, case and statute citation are represented by data properties. The relationships between the case class and other classes such as legal role, order and legal issue are represented through object properties.

[Figure 2: OWL model of the case ontology (fragment)]

3.4.3 Class Taxonomy

A class taxonomy is the hierarchy of super-class and sub-class relationships. Three approaches for building the taxonomy are proposed by Uschold and Grüninger [17]: top-down, bottom-up and middle-out. We use a top-down approach to identify the structure of legal issues.

Figure 3 shows part of the legal issue taxonomy. Legal issue can be decomposed into contract, insurance, agency, banking & finance, intellectual property and so on. Each of these sub-issues can also be decomposed into finer issues. In this taxonomy, the intellectual property issue is further refined into copyright, layout, trademark and patent issues.

[Figure 3: Legal issue taxonomy]
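
The top-down taxonomy described above can be sketched as a child-to-parent table, from which the super-classes and sub-classes of an issue are derived. This is a simplified stand-in for OWL sub-class reasoning, not the mechanism of any particular reasoner; the issue names come from the text, while the PARENT table and the helper functions are hypothetical.

```python
from typing import Dict, List

# child -> parent; issue names follow the taxonomy described in the text.
PARENT: Dict[str, str] = {
    "contract": "legal issue",
    "insurance": "legal issue",
    "agency": "legal issue",
    "banking & finance": "legal issue",
    "intellectual property": "legal issue",
    "copyright": "intellectual property",
    "layout": "intellectual property",
    "trademark": "intellectual property",
    "patent": "intellectual property",
}


def ancestors(issue: str) -> List[str]:
    """Walk up the hierarchy from an issue to the root."""
    chain = []
    while issue in PARENT:
        issue = PARENT[issue]
        chain.append(issue)
    return chain


def descendants(issue: str) -> List[str]:
    """Collect all direct and indirect sub-issues of an issue."""
    result = []
    for child, parent in PARENT.items():
        if parent == issue:
            result.append(child)
            result.extend(descendants(child))
    return result
```

With such a hierarchy, a query on "intellectual property" can be expanded to its sub-issues, so that a case annotated only with "patent" is still retrieved.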

3.4.4 Implementation in OWL

Protégé (http://protege.stanford.edu/) is used as the authoring tool for coding our ontologies in OWL. A partial definition of the case ontology in OWL is given in Figure 4. In this fragment, three ontologies are imported by the case ontology, and some properties such as affirmedBy, approvedBy, hasIssue, hasJudgment and contains are explicitly defined.

[Figure 4: Partial OWL definition of the case ontology]
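
To give a rough idea of what such a definition contains, the Python sketch below prints a small Turtle fragment with three owl:imports statements and the five properties named above. It is not a reproduction of Figure 4. The http://example.org/legal/ namespace and the ontology names are placeholders, and declaring every property as an owl:ObjectProperty is an assumption made only for this illustration.

```python
# Placeholder namespace; the real ontology IRIs are not given in the text.
BASE = "http://example.org/legal/"

# Assumed to be the three secondary ontologies of Section 3.3.
IMPORTS = ["legal-issue", "legal-role", "semantic-relationship"]

# Property names taken from the text; their OWL property type is assumed.
PROPERTIES = ["affirmedBy", "approvedBy", "hasIssue", "hasJudgment", "contains"]


def case_ontology_turtle() -> str:
    """Build a Turtle fragment declaring the imports and properties."""
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        f"@prefix : <{BASE}case#> .",
        "",
        f"<{BASE}case> a owl:Ontology ;",
    ]
    imports = [f"    owl:imports <{BASE}{name}>" for name in IMPORTS]
    lines.append(" ;\n".join(imports) + " .")
    lines.append("")
    for prop in PROPERTIES:
        lines.append(f":{prop} a owl:ObjectProperty .")
    return "\n".join(lines)


fragment = case_ontology_turtle()
print(fragment)
```

In practice an authoring tool such as Protégé generates these declarations, so the sketch only shows the shape of the output.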

4 Population of OWL Repository

Building a legal case OWL repository based on our legal case ontology involves an annotation process and a storing process. Converting from one data model to another also requires a mapping process, which enables the integration of heterogeneous knowledge representations. In addition, we want to make use of the structural information already contained in the SGML-tagged case reports in our existing repository. The mapping process in our framework meets both of these needs.

Figure 5 gives an overview of our framework, which integrates the three aforementioned processes. The annotation process and the mapping process run in parallel. The annotation process takes plain legal case texts as input and annotates them into OWL instances based on our legal case OWL ontology. The mapping process takes legal case texts tagged according to an input ontology and automatically transforms them into OWL instances through a translation layer. The mapping from one data model to another has to be defined manually, but this is a build-once task. The storing process stores the OWL instances in a knowledge base on which SQL-based queries or XQuery queries can be performed. Note that the framework is generic to applications that use mark-up languages as data model languages, so it is not limited to legal case data models.

[Figure 5: Framework overview]
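
The mapping process can be sketched as a hand-built table that a translation function applies to each tagged document. The Python sketch below is a minimal illustration, assuming well-formed XML as a stand-in for the SGML-tagged reports. The tag names, the property names hasTitle, hasJudge and hasJurisdiction, and the map_case helper are hypothetical and do not describe our actual translation layer.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical mapping from source tags to ontology properties.
# It is written by hand once and then reused for every document.
TAG_TO_PROPERTY = {
    "title": "hasTitle",
    "judge": "hasJudge",
    "jurisdiction": "hasJurisdiction",
}


def map_case(xml_text: str, case_id: str) -> List[Tuple[str, str, str]]:
    """Translate a tagged case report into (subject, property, value) triples."""
    root = ET.fromstring(xml_text)
    triples = [(case_id, "rdf:type", "LegalCase")]
    for elem in root.iter():
        prop = TAG_TO_PROPERTY.get(elem.tag)
        # Tags without a mapping entry are skipped rather than guessed.
        if prop and elem.text and elem.text.strip():
            triples.append((case_id, prop, elem.text.strip()))
    return triples


SAMPLE = "<case><title>A v B</title><judge>Smith J</judge><note>unmapped</note></case>"
triples = map_case(SAMPLE, "case1")
```

Only the table has to change when a new source data model is added, which is what makes the manual mapping a build-once task.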

5 Discussion

The case knowledge base built on the developed case ontology can facilitate conceptual information retrieval. Figure 6 presents part of an OWL instance for the case "Hunter BNZ Finance LTD V C G Maloney PTY LTD And Others" [1]. In this example, more accurate results can be achieved when searching on surface features such as parties, title and citations. The finely encoded semantics also enable more intelligent searching. For example, through the encoded role relations, the OWL instance given in Figure 6 can differentiate the various situations (e.g., cheque holder v bank or cheque endorser v bank) in which the parties are bound. Given a vague query with the keywords "bank, cheque, and negligence", a user can also locate exact situations, such as one in which a bank is accused of negligence in checking a forged cheque, or one in which the claim that the bank was negligent in processing a cheque is not held to be true by the court (implying that the loss might have resulted from the holder’s negligence).
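
The difference between keyword matching and searching on encoded semantics can be sketched with two hypothetical cases that contain the same keywords but differ in what the court held. The repository contents, the Fact type, the negligentIn relation and both search functions are invented for this illustration and are not taken from Figure 6.

```python
from typing import List, NamedTuple


class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str
    held_true: bool  # whether the court held the statement to be true


# Hypothetical annotated cases.
REPOSITORY = {
    "case-1": {
        "text": "bank cheque negligence forged",
        "facts": [Fact("bank", "negligentIn", "checking cheque", True)],
    },
    "case-2": {
        "text": "bank cheque negligence holder loss",
        "facts": [Fact("bank", "negligentIn", "processing cheque", False)],
    },
}


def keyword_search(words: List[str]) -> List[str]:
    """Return cases whose text contains every keyword."""
    return sorted(cid for cid, case in REPOSITORY.items()
                  if all(w in case["text"].split() for w in words))


def concept_search(subject: str, relation: str, held_true: bool) -> List[str]:
    """Return cases that contain a matching encoded statement."""
    return sorted(cid for cid, case in REPOSITORY.items()
                  if any(f.subject == subject and f.relation == relation
                         and f.held_true == held_true for f in case["facts"]))


by_keyword = keyword_search(["bank", "cheque", "negligence"])
by_concept = concept_search("bank", "negligentIn", False)
```

The keyword query returns both cases, while the concept query isolates the case in which the negligence claim against the bank was rejected.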

In addition, our legal case ontology is designed for web systems, particularly Semantic Web systems. The features inherent in OWL, such as extensibility, reusability, interoperability and its status as an open standard, bring many benefits to our ontology. The ontology can be extended easily to adapt to frequently updated legal knowledge. It can be made freely and publicly accessible, and it can be shared on the web and merged with heterogeneous web sources.

Some issues need to be noted. Firstly, the text has to be structured in certain ways to enable automatic or semi-automatic annotation. This implies that the knowledge fragment contained in each meta-tag pair has to be individually identified by legal professionals. While such manual identification gives high quality, it requires extensive expert knowledge and work. Techniques such as natural language processing and machine learning can be used to make this process semi-automatic and so assist human experts. Secondly, the introduction of meta-information adds to the computational complexity of our data model as the granularity of semantics becomes finer. On the other hand, finer-grained semantics can provide broader knowledge coverage. A trade-off between the granularity of semantics and the computational complexity is therefore always needed.

[Figure 6: Part of the OWL instance for the example case]

6 Conclusion

This paper presents a new approach for building a legal case knowledge base. By encoding both domain-dependent and domain-independent semantics into our legal case ontology, the legal case knowledge base can provide more accurate results than traditional systems and enables intelligent information retrieval. Furthermore, by using OWL as the ontology representation language, our approach is designed for web-based systems, particularly the Semantic Web. Our case ontology is easy to maintain, reuse and extend. It also enables the merging of heterogeneous knowledge representations through the proposed framework. The proposed approach offers an alternative for overcoming the poor retrieval performance associated with existing on-line legal search systems.

Acknowledgements

This research is supported by the Council of Law Reporting for NSW and funded by Australian Research Council Linkage Grant LP0562623.

References

[1] Zeleznikow, J. and D. Hunter. (1994). "Representation and Reasoning in Law" in Computer Law Series 13, 1994.

[2] Hafner, C. D. (1987). "Conceptual Organization of case law knowledge bases" in Proc. 1st International Conference on Artificial Intelligence and Law, Boston, 1987. Available online [HREF1]

[3] Gelbart, D. and Smith, J. C. (1991). "Beyond Boolean Search: FLEXICON, a Legal Text-Based Intelligent System" in ICAIL, pp. 225-234, 1991. Available online [HREF2]

[4] Gelbart, D. and Smith, J. C. (1993). "FLEXICON: An Evaluation of a Statistical Ranking Model Adapted to Intelligent Legal Text Management" in ICAIL, pp. 142-151, 1993. Available online [HREF3]

[5] Osborn, J. and Sterling, L. (1999). "JUSTICE: A Judicial Search Tool Using Intelligent Concept Extraction" in ICAIL, pp. 173-181, 1999. Available online [HREF4]

[6] Moens, M. F., Uyttendaele, C. and Dumortier, J. (1997). "Abstracting of legal cases: the SALOMON experience" in Proceedings of the 6th International Conference on Artificial Intelligence and Law, pp. 114-122, 1997. Available online [HREF5]

[7] Wyner, A. (2007). "An Ontology in OWL for Legal Case-based Reasoning" in Atkinson, K. (ed.) JURIX Workshop on Modelling Legal Cases, 2007.

[8] Ashley, K. (1992). "Case-based Reasoning and Its Implication for Legal Expert Systems" in Artificial Intelligence and Law, pp. 113-208, 1992.

[9] Salton, G. and McGill, M. (1983). "Introduction to Modern Information Retrieval", McGraw-Hill, 1983.

[10] Stamper, R. K. (1980). "LEGOL: Modelling Legal Rules by Computer" in Computer Science and Law, Bryan Niblett (ed.) Cambridge University Press, Cambridge, UK, 1980.

[11] Stamper, R. K. (1991). "The Role of Semantics in Legal Expert Systems and Legal Reasoning" in Ratio Juris, Vol. 4, No. 2, pp. 219-244, 1991.

[12] Berners-Lee, T. "Semantic Web Road map". Available online [HREF6]

[13] Dumbill, E. "The Semantic Web: A Primer". Available online [HREF7]

[14] OWL Web Ontology Language Overview, W3C Recommendation 10 February 2004. Available online [HREF8]

[15] Dick, J. P. (1991). "Representation of Legal Text for Conceptual Retrieval" in ICAIL, pp. 244-253, 1991. Available online [HREF9]

[16] OWL Web Ontology Language Reference, W3C Recommendation 10 February 2004. Available online [HREF10]

[17] Uschold, M. and Gruninger, M. (1996). "ONTOLOGIES: Principles, Methods and Applications" in Knowledge Engineering Review, Vol. 11, No. 2, pp. 93-155, 1996. Available online [HREF11]

[18] Wikipedia. "Web Ontology Language". Available online [HREF12]

Hypertext References

HREF1
http://portal.acm.org/citation.cfm?id=41735.41740
HREF2
http://portal.acm.org/citation.cfm?id=112646.112674
HREF3
http://portal.acm.org/citation.cfm?id=158994
HREF4
http://delivery.acm.org/10.1145/330000/323792/p173-osborn.pdf
HREF5
http://www.law.kuleuven.ac.be/icri/publications/43AILAW.pdf
HREF6
http://www.w3.org/DesignIssues/Semantic.html
HREF7
http://www.hotcoding.com/xmls/metadata/33048.html
HREF8
http://www.w3.org/TR/owl-features/
HREF9
http://portal.acm.org/citation.cfm?id=112676
HREF10
http://www.w3.org/TR/owl-ref/
HREF11
http://citeseer.ist.psu.edu/cache/papers/cs/3214
HREF12
http://en.wikipedia.org/wiki/Web_Ontology_Language

Copyright

Yueting Shen, Robert Steele and John Murphy, © 2008. The authors assign to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.


[1] 18 NSWLR 420