A Framework for Efficient Trust and Provenance Representation in Knowledge Bases

Info: 3091 words (12 pages) Example Research Project
Published: 26th Nov 2021


1 Background

Establishing provenance and trust for acquired knowledge within a knowledge base (KB) has been a critical challenge for many researchers. The trustworthiness of the knowledge held in a knowledge base is an important measure of how useful that knowledge base is. While trust acquisition is an important area of concern, the storage and retrieval protocols for trust and provenance within a KB are receiving an increasing amount of attention.

A system that uses knowledge representation (KR) to persist acquired knowledge, so that it can reason with that knowledge, typically does so using ontology languages such as Resource Description Framework (RDF), Web Ontology Language (OWL), and other Description Logic (DL) languages. Provenance and trust information about the knowledge is typically encoded as part of the same knowledge base. Some established techniques rely on meta-knowledge (i.e. knowledge about knowledge) attached to various elements of the knowledge base, such as ontologies and axioms, to store trust and provenance information about them (Dividino, Schenk, Sizov, & Staab, 2009). At present, KR schemas and language structures are considered inefficient at storing such meta-knowledge for later update and querying by reasoners (McGlothlin & Khan, 2010). Some ontologies use schema annotations to store such meta-knowledge within a knowledge base (Dividino, Schenk, et al., 2009; Krötzsch, Marx, Ozaki, & Thost, 2018). However, others disagree with using such annotations because they can introduce ambiguity with other elements of the schema, such as properties (Bock et al., 2012). Some ontologies, such as those based on RDF, do not support attaching annotations within their schema (Zimmermann et al., 2005). Further research is deemed necessary to quantify the performance and capabilities of these techniques in large, scaled-out, complex, domain-agnostic KBs based on different types of ontologies (Dividino, Schenk, et al., 2009; Dividino, Sizov, Staab, & Schueler, 2009; Khan, Qadir, Abbas, & Afzal, 2017; McGlothlin & Khan, 2010). There is also a strong case for identifying alternate methods for trust and provenance representation within KBs that perform better and are adaptable to different types of ontologies.
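To make the storage overhead of in-graph meta-knowledge concrete, the following is a minimal sketch of reification, the standard RDF vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) for describing a statement so that further properties can be attached to it. It is written in plain Python without an RDF library, and it is not claimed to be the technique used in any of the works cited above. The ex: identifiers, the statement name and the trust value are hypothetical.

```python
# Plain-Python sketch of RDF-style reification; no RDF library is used.
# All ex: names and the trust value are hypothetical illustration data.

def reify(statement_id, triple, meta):
    """Return the extra triples needed to describe `triple` and attach `meta` to it."""
    subject, predicate, obj = triple
    extra = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]
    # Each item of meta-knowledge becomes one more triple about the statement.
    extra += [(statement_id, key, value) for key, value in meta.items()]
    return extra


fact = ("ex:Patient42", "ex:hasDiagnosis", "ex:Influenza")
graph = [fact]
graph += reify("ex:stmt1", fact, {"ex:source": "ex:ClinicA", "ex:trust": 0.8})

print(len(graph))  # 7: one fact plus six triples of meta-knowledge
```

In this sketch a single annotated fact grows from one triple to seven, and a reasoner that wants the trust value has to join across the reification triples to find it, which illustrates the kind of inefficiency discussed above.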

This research project will aim to identify and baseline inefficiencies within the current methods and techniques of trust and provenance representation, while also providing an alternate framework to store, query and manage such information within a knowledge base.

In addition, it will also provide a comparative analysis between the efficiencies within the proposed alternate framework, and existing methodologies.

2 Problem Statement

As detailed above, it is yet to be proven that a consistent, ontology-agnostic method exists for holding meta-knowledge, such as trust and provenance information, within knowledge bases. It also remains to be proven that currently established techniques can scale to the requirements of large KBs, which are becoming increasingly prevalent in various domains.

More specifically, the following research questions need to be addressed:

Are the current methodologies and techniques sufficient to address the ever-growing need for larger, scaled-out knowledge bases annotated with trust and provenance information?
Can a consistent, ontology-agnostic method be identified for storing, querying and managing trust and provenance information within such KBs?

3 Objectives

The long term goal of this research project is to develop a consistent, scalable, ontology agnostic method for managing trust and provenance information within a knowledge base. The objective of this study is to provide a comprehensive review of the literature and practice of existing methodologies for the purpose of provenance and trust management within a KB, and to outline an alternate framework for the same.

In particular, the following sub-objectives will be a part of this study:

To provide a comprehensive review of current methodologies and techniques available to represent trust and provenance information within a KB.

To develop an alternate, ontology-agnostic method for representing, querying and managing meta-knowledge, such as trust and provenance information, within knowledge bases.

4 Methodology

The primary research method for this study will be a comprehensive literature review of existing methodologies in a specific domain, followed by a conceptual outline of the proposed new methodology. By choosing a specific domain, such as clinical diagnosis support, consistent data can be acquired for building the knowledge base for the study, and a clear baseline can be established for comparative analysis. OWL2, which some researchers, such as Antoniou and van Harmelen (2009), recommend for its greater expressiveness compared with RDF/S (RDF Schema), will be used to construct the ontologies for the knowledge bases used in the study.

The study will first review existing literature on provenance and trust representation in general. A more focused literature review will then be conducted on provenance and trust representation in clinical diagnosis support, the selected domain for the study, specifically around knowledge bases which use OWL2-based ontologies. The outcomes of the review are expected to provide the following information:

Existing methodologies used in the domain specific knowledge bases for provenance and trust representation.
Performance benchmarks for queries executed against knowledge bases which use these methodologies for representation of trust and provenance.

Based on the outcomes of the review, a knowledge base built on an OWL2 ontology will be constructed from domain-specific data acquired using one or more of the following methods:

Paper or electronic medical records
Clinical data collection forms
Electronic data capture (EDC) systems
Patient surveys

In the second stage of this study, a copy of the constructed knowledge base will be encoded with provenance and trust information, using existing methodologies derived from the outcomes of the literature review. Key Performance Indicators (KPIs), such as the accuracy of knowledge retrieval under uncertainty and probabilistic queries, and the performance of the knowledge base and reasoners when such queries are executed, will be measured for these existing methodologies and set aside for comparative analysis in the later stages of the study.
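As an illustration of how such KPIs might be collected, the sketch below times a query and scores its results against an expected answer set. It is only a sketch under stated assumptions: precision and recall are used here as stand-ins for "accuracy in knowledge retrieval", and the function name measure, the stand-in query and the diagnosis labels are all hypothetical; the study itself does not prescribe these measures.

```python
import time


def measure(query, expected):
    """Run `query` once and report retrieval accuracy and elapsed wall-clock time.

    `query` is any zero-argument callable returning an iterable of results;
    `expected` is the set of results a correct answer would contain.
    """
    start = time.perf_counter()
    returned = set(query())
    elapsed = time.perf_counter() - start

    hits = returned & expected
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(expected) if expected else 0.0
    return {"precision": precision, "recall": recall, "seconds": elapsed}


# Hypothetical stand-in for a query executed against the knowledge base.
def example_query():
    return ["Influenza", "Migraine"]


result = measure(example_query, expected={"Influenza", "Pneumonia"})
print(result["precision"], result["recall"])  # 0.5 0.5
```

Running the same harness against the existing methodologies and the new framework would give directly comparable figures for the later comparative analysis.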

Next, a conceptual outline for a new framework will be proposed and detailed in the form of a design specification. The framework will focus on the following key areas of concern:

A trust representation that can be managed, and queried independently of the knowledge representation.
A method to query and combine trust and provenance information with the represented knowledge, to provide a single view of the knowledge base.
Ability to propagate trust and provenance updates in the knowledge base efficiently and immediately.
A method to extract 'explanations' for results returned by queries, based on provenance and trust information in the trust representation.
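The four areas of concern above can be sketched in a few lines of code. The following is a hypothetical illustration, not the design specification the study will produce: the class name TrustStore, its methods, the source names and the trust values are all assumptions made for this example. Trust is held per source, outside the knowledge itself, so a single update is visible to every statement from that source.

```python
from dataclasses import dataclass, field


@dataclass
class TrustStore:
    """Trust and provenance kept apart from the knowledge (hypothetical design)."""
    source_of: dict = field(default_factory=dict)  # triple -> source name
    trust_of: dict = field(default_factory=dict)   # source name -> trust in [0, 1]

    def record(self, triple, source):
        self.source_of[triple] = source

    def set_trust(self, source, trust):
        # One write; every statement from this source sees the new value at once.
        self.trust_of[source] = trust

    def view(self, knowledge, min_trust=0.0):
        """Join knowledge with trust data into a single view, filtered by trust."""
        rows = []
        for triple in knowledge:
            source = self.source_of.get(triple)
            trust = self.trust_of.get(source, 0.0)
            if trust >= min_trust:
                rows.append((triple, source, trust))
        return rows

    def explain(self, triple):
        """Return a short explanation of why a statement is (or is not) trusted."""
        source = self.source_of.get(triple)
        if source is None:
            return f"{triple}: no provenance recorded"
        return f"{triple}: asserted by {source} (trust {self.trust_of.get(source, 0.0)})"


# The knowledge itself carries no trust data.
knowledge = {
    ("Patient42", "hasDiagnosis", "Influenza"),
    ("Patient42", "hasSymptom", "Fever"),
}
store = TrustStore()
store.record(("Patient42", "hasDiagnosis", "Influenza"), "ClinicA")
store.record(("Patient42", "hasSymptom", "Fever"), "PatientSurvey")
store.set_trust("ClinicA", 0.9)
store.set_trust("PatientSurvey", 0.4)

trusted = store.view(knowledge, min_trust=0.5)        # only the ClinicA statement
store.set_trust("PatientSurvey", 0.7)                 # knowledge is not touched
trusted_after = store.view(knowledge, min_trust=0.5)  # both statements
print(store.explain(("Patient42", "hasSymptom", "Fever")))
```

Keying trust by source rather than by statement is one possible design choice; it makes propagation a constant-time write, at the cost of not expressing different trust levels for individual statements from the same source.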

In the final stage of the study, the proposed framework will be implemented within a knowledge base, and a comparative analysis of the KPIs, between the existing and the new methodologies, will be recorded and will be made a part of a final research report detailing the following:

A review of current literature on trust and provenance representation within knowledge bases, focused on a specific domain, clinical diagnosis support, and specific ontology language, such as OWL2.
A conceptual outline for a new framework for representing provenance and trust information within a knowledge base.
A comparative analysis of KPIs collected using existing methodologies and implementation of the new framework.

5 Further Considerations

Due to the nature of the data that will be handled by the project team during the course of this study, the team will comply fully with academic, professional and ethical norms.

Clinical data, which will be a part of the knowledge base upon which the research will be conducted, will be acquired with open and full knowledge of the research participants and the organisations associated with the collection of the data - such as health services, surgeries and clinics.

All research participants from whom data will be collected as part of the study will be provided with sufficient information about the project, including its purpose and goals. An appropriate level of anonymisation will be applied to the data in order to remove any personally identifiable information about the participants.

As the project leader, I will be responsible for:

Ensuring that consent is received from the research ethics committee before commencing data collection from research participants.
Ensuring that regular audits of data access and data collection methods are completed during the lifetime of the study.
Ensuring that, where possible, social, personal and professional biases are removed from data collection or data analysis performed during the course of the study.
Ensuring that sufficient explanation has been given to the research participants about how their data will be used during the course of the study.
Ensuring that participation is completely voluntary: all participants are given the choice not to take part in the research, in which case their data is not collected.
Ensuring that any personally identifiable information about the participants, where not required, is redacted, and that, should participants withdraw consent to the use of their data at any point during the study, their data is immediately removed from the knowledge base.

6 Risk analysis

Table 1 details the risk assessment conducted for this study. It covers risk analysis across the following, broad, areas of concern:

Financial
Legal and Ethical
Methodological
Impacts on timeline

Table 1: Risk Register

ID	Type	Description	Probability (1-3)	Impact (1-3)	P-I Score	Mitigation
1	Financial	Funding issues due to unavailability of matching funds, last minute budget changes, funding delays, or failure to deliver promised funds.				Ensure that proposal is submitted and approved. Ensure regular meetings with funders to identify potential funding issues. Schedule regular project meetings with the project team.
2	Financial	Project budget overspend				Schedule regular discussions on monthly finance reports. Designate budget holders. Ensure overspends are reviewed and sanctioned. Ensure all expenditure is strictly monitored.
3	Legal	Unintended compliance / GDPR issues with the clinical data acquired for research leading to legal and financial penalties.	1	3	3	Ensure appropriate data security is applied to acquired data. Ensure strict anonymisation of data to purge personally identifiable information. Ensure auditing of access to clinical data.
4	Methodology	Limited or low quality response to surveys and clinic data collection forms.	1	2	2	Ensure regular review of data collection methodologies. Implement a strategy to raise awareness among interest groups prior to commencement of data collection. Ensure alternate forms of data collection where appropriate (such as paper, online, emails etc.). Ensure broad sampling to avoid biases.
5	Delays	Rejection of the project, or amendment of project scope by the research ethics committee.	1	3	3	Ensure sufficient input is sought from the ethics committee before commencement of the project. Implement recommendations from the ethics committee members where appropriate and resubmit project proposal if required.
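
The scored rows in Table 1 are consistent with the P-I Score being the product of probability and impact, each rated from 1 to 3. That reading is an assumption drawn from the figures in the table rather than a stated definition, and the function name pi_score is hypothetical. A minimal sketch:

```python
def pi_score(probability, impact):
    """Probability-impact score, assumed here to be the product of the two ratings."""
    if not (1 <= probability <= 3 and 1 <= impact <= 3):
        raise ValueError("probability and impact must each be between 1 and 3")
    return probability * impact


# (probability, impact) for the scored rows of Table 1, keyed by risk ID.
risks = {3: (1, 3), 4: (1, 2), 5: (1, 3)}
scores = {risk_id: pi_score(p, i) for risk_id, (p, i) in risks.items()}
print(scores)  # {3: 3, 4: 2, 5: 3}
```

Under this assumption the scores range from 1 to 9, so the three scored risks all sit in the lower part of the scale.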

7 Project Plan

An outline project plan has been provided in Figure 1, detailing the estimated timelines for various phases with key milestones identified. In summary, the following milestones have been identified for the study:

Milestone	Estimated Date
Project Kick-Off	01/01/2022
Completion of a comprehensive literature review	29/04/2022
Completion of clinical data collection	01/09/2022
Preparation of conceptual outline for the new framework	01/11/2022
Knowledge base preparation and KPI collection	08/12/2022
Submission of final report	19/12/2022

8 Conclusion

The acquisition of provenance and trust and their representation can be considered two sides of the same coin when it comes to building efficient and complete ontologies and knowledge bases. While numerous researchers have attempted to solve the problem of efficient trust management and representation, a generic, ontology-agnostic methodology remains a challenge that requires further research.

Using a specific domain as a baseline, this study will propose a novel framework to represent trust and provenance information within a knowledge base which can be managed and queried independent of the underlying knowledge, but referentially integrated with it. Additionally, the framework will also propose a method for querying for "explanations" about provenance and trust information within a knowledge base.

References

Antoniou, G., & van Harmelen, F. (2009). Web Ontology Language: OWL. In Handbook on ontologies (pp. 91–110). Berlin, Heidelberg: Springer Berlin Heidelberg. Retrieved from http://link.springer.com/10.1007/978-3-540-92673-3_4 doi: 10.1007/978-3-540-92673-3_4

Bock, C., Fokoue, A., Haase, P., Hoekstra, R., Horrocks, I., Ruttenberg, A., ... Smith, M. (2012). OWL 2 Web Ontology Language - Structural Specification and Functional-Style Syntax (Second Edition) (Tech. Rep.). Retrieved from http://www.w3.org/TR/2012/PER-owl2-syntax-20121018/

Dividino, R., Schenk, S., Sizov, S., & Staab, S. (2009). Provenance, trust, explanations and all that other meta knowledge (Vol. 23; Tech. Rep. No. 2).

Dividino, R., Sizov, S., Staab, S., & Schueler, B. (2009). Querying for provenance, trust, uncertainty and other meta knowledge in RDF. Journal of Web Semantics, 7(3), 204–219. doi: https://doi.org/10.1016/j.websem.2009.07.004

Khan, S. A., Qadir, M. A., Abbas, M. A., & Afzal, M. T. (2017, June). OWL2 benchmarking for the evaluation of knowledge based systems. PLoS ONE, 12(6). doi: 10.1371/journal.pone.0179578

Krötzsch, M., Marx, M., Ozaki, A., & Thost, V. (2018). Attributed description logics: Reasoning on knowledge graphs (Vol. 2018-July; Tech. Rep.). doi: 10.24963/ijcai.2018/743

McGlothlin, J. P., & Khan, L. (2010). Efficient RDF data management including provenance and uncertainty. In ACM International Conference Proceeding Series (pp. 193–198). doi: 10.1145/1866480.1866508

Zimmermann, A., Hogan, A., Luk, G., Polleres, A., Straccia, U., & Decker, S. (2005). 1 A Need for RDF Annotations. Framework, 1–5. Retrieved from http://www.w3.org/2005/Incubator/ssn/charter2http://www.w3.org/2005/Incubator/urw3/111