Data integration is the process of facilitating access to data residing at multiple heterogeneous data sources and providing users and applications with a unified view of these data. Data integration is crucial in enterprises that store and exchange data in disparate and incompatible formats, and it is one of the big challenges of the World Wide Web, where millions of heterogeneous data sources are available. This book contributes to several aspects of the design of modern data integration systems in the context of the Web. On the one hand, the work contributes to the Semantic Integration research trend, which addresses the problem of reconciling data from autonomous sources using ontologies and other semantics-based tools. The work proposes a novel solution to XML-RDF semantic integration and also contributes to the problem of Ontology Alignment, defining a rigorous and scalable semantic similarity measure for labelled directed RDF graphs. On the other hand, the book proposes a novel solution to the problem of translating a user query, targeting a logical mediated schema, into queries over a set of autonomous data sources that expose only restricted web interfaces.
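To make the notion of graph similarity concrete, here is a minimal illustration of one simple baseline, Jaccard similarity over edge triples. This is not the book's measure; the function name and example triples are assumptions for illustration only.

```python
# Minimal illustration (NOT the book's measure): Jaccard similarity
# over the edge triples of two labelled directed graphs.
def triple_jaccard(g1: set, g2: set) -> float:
    """Each graph is a set of (subject, predicate, object) triples,
    as in RDF; similarity is shared triples over all distinct triples."""
    if not g1 and not g2:
        return 1.0
    return len(g1 & g2) / len(g1 | g2)

a = {("s1", "knows", "s2"), ("s2", "type", "Person")}
b = {("s1", "knows", "s2"), ("s3", "type", "Person")}
print(triple_jaccard(a, b))  # one shared triple out of three distinct ones
```

A rigorous measure of the kind the book defines must go beyond exact triple overlap, accounting for semantically related but non-identical labels; the baseline above only shows the shape of the problem.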
This book offers novel ideas for solving power system reliability estimation problems by providing scalable and enhanced remote services in a heterogeneous environment. An innovative and comprehensive strategy has been implemented for power system reliability data representation: it can access power system data from any data source and convert them into a common format using a pre-defined XML document template. An XML annotation scheme is adopted for the power system reliability data generation service. The XMLised representation of power system data enables reliable data exchange between legacy power system applications.
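The core idea, normalizing source-specific records onto one pre-defined XML template, can be sketched as follows. The element names and record fields here are hypothetical, not the book's actual template.

```python
import xml.etree.ElementTree as ET

def to_common_xml(record: dict) -> str:
    """Map a source-specific reliability record onto a pre-defined
    XML template (hypothetical element names) so that legacy
    applications can share one exchange format."""
    root = ET.Element("ReliabilityData")
    comp = ET.SubElement(root, "Component", id=str(record["id"]))
    ET.SubElement(comp, "FailureRate").text = str(record["failure_rate"])
    ET.SubElement(comp, "RepairTime").text = str(record["repair_time"])
    return ET.tostring(root, encoding="unicode")

# Records from different sources (CSV files, database rows, ...) all
# normalize to the same XML shape:
print(to_common_xml({"id": 7, "failure_rate": 0.02, "repair_time": 4.5}))
```

One adapter per data source feeds this common representation, so consuming applications need to understand only the shared template rather than every source format.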
The aim of this research-based book is to highlight the importance of real-time data warehousing (RTDW) for increasing operational business intelligence in large organizations. We present a state-of-the-art survey and a comprehensive comparative analysis in three areas: (1) RTDW frameworks, (2) real-time data acquisition technologies, and (3) data warehouse multidimensional conceptual modeling (MCM) techniques. We also discuss future research directions and open problems related to RTDW. In our view, semi-structured databases (i.e., XML) have a greater capability to incorporate real-time data from operational sources than relational databases. Based on recent research articles, this book can help data warehouse practitioners and researchers gain a deep understanding of the issues related to RTDW and propose new solutions for the highlighted problems. In the future, we aim to provide an RTDW framework based on the virtualization concept and a novel semi-structured MCM suitable for both OLTP and OLAP applications. This research is being conducted at Punjab University College of Information Technology (PUCIT), University of the Punjab, Lahore, Pakistan.
Today's Internet is accessible to diverse end devices through a wide variety of network types. To react to this diversity of usage contexts, new media codecs include adaptation support in the codec design. Scalable media codecs make it possible to easily retrieve different qualities of the media content by simply disregarding certain media segments. The MPEG-21 Digital Item Adaptation (DIA) standard enables codec-agnostic adaptation by specifying a set of descriptions of content, adaptation possibilities, and usage context in the XML domain. This book extends the DIA approach towards dynamic and distributed environments. To achieve this, novel mechanisms for the fragmentation, storage, and transport of XML metadata are introduced. Additionally, a mechanism based on a novel binary header to enable codec-agnostic adaptation is specified. This Generic Scalability Header (GSH) enables codec-agnostic adaptation at a considerably lower performance cost than the DIA approach. An adaptation node based on these novel mechanisms is implemented and evaluated for several types of scalable media. A concluding discussion analyzes the results of the evaluation of both mechanisms.
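The appeal of a binary scalability header is that an adaptation node can drop enhancement-layer segments by inspecting a few fixed bytes, without parsing XML or understanding the codec. The following sketch illustrates that idea; the header layout is an assumption for illustration, not the actual GSH format.

```python
import struct

# Illustrative header layout (an assumption, NOT the GSH specification):
# 1 byte layer id, 1 byte total layers, 2 bytes payload length.
HDR = struct.Struct("!BBH")

def adapt(packets: list, keep_layers: int) -> list:
    """Codec-agnostic adaptation: keep only segments whose layer id is
    below the target quality, inspecting nothing but the binary header."""
    kept = []
    for pkt in packets:
        layer, _total, _length = HDR.unpack_from(pkt)
        if layer < keep_layers:
            kept.append(pkt)
    return kept

base = HDR.pack(0, 3, 100) + b"\x00" * 100  # base layer
enh = HDR.pack(2, 3, 50) + b"\x00" * 50     # enhancement layer
print(len(adapt([base, enh], keep_layers=1)))  # only the base layer remains
```

Because the decision needs only a fixed-size header read, this style of adaptation avoids the XML parsing and transformation cost of a description-driven approach, which is the performance argument the book makes for the GSH.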
The emerging Second-Generation Web is based entirely on XML and related technologies. This new version of the Web introduces a multitude of novel concepts, terms, and acronyms. The goal of this dictionary is not just to define the meaning of new words but to develop a proper understanding of leading-edge Web technologies. It will be an invaluable reference for all Internet professionals and practitioners as well as students and ordinary Web users. Key topics: XML syntax and core technologies; all the major members of the XML family of technologies; numerous XML-based domain-specific languages; the concept and architecture of the Semantic Web; key Semantic Web technologies; and Web services. Features and benefits: over 1,800 terms and definitions from a newly emerged area; over 200 illustrations to promote an understanding of the latest technologies; clear and accessible definitions and a unique writing style bridging the gap between definition and explanation; extensive cross-referencing of terms; and a CD-ROM containing a fully searchable version of the dictionary.
This book constitutes the refereed proceedings of the Second International Conference on the Theory of Information Retrieval, ICTIR 2009, held in Cambridge, UK, in September 2009. The 18 revised full papers, 14 short papers, and 11 posters presented together with one invited talk were carefully reviewed and selected from 82 submissions. The papers are categorized into four main themes: novel IR models, evaluation, efficiency, and new perspectives in IR. Twenty-one papers fall into the general theme of novel IR models, ranging from various retrieval models, query and term selection models, Web IR models, developments in novelty and diversity, to the modeling of user aspects. There are four papers on new evaluation methodologies, e.g., modeling score distributions, evaluation over sessions, and an axiomatic framework for XML retrieval evaluation. Three papers focus on the issue of efficiency and offer solutions to improve the tractability of PageRank, data cleansing practices for training classifiers, and approximate search for distributed IR. Finally, four papers look into new perspectives of IR and shed light on some new emerging areas of interest, such as the application and adoption of quantum theory in IR.
This book describes novel software architectures for the integration of deep and shallow natural language processing (NLP) components in language technology. The generic markup language XML and the XML transformation language XSLT are used for the flexible combination of linguistic markup produced by multiple NLP components. Shallow NLP components such as tokenizers, part-of-speech taggers, named entity recognizers and shallow parsers are combined with a deep parser operating on grammars written in the spirit of the Head-Driven Phrase Structure Grammar (HPSG) theory. The integration paradigm enables synergy, leading to more robust deep parsing with increased coverage. It also constitutes a division of labor: the deep grammar models general, correct language use, while shallow systems are responsible for domain-specific extensions. Applications are presented in question answering, information extraction, natural language understanding, ontologies and the Semantic Web. The book is addressed to software engineers, computational linguists and language technology engineers.
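The combination idea can be sketched as merging independently produced annotation layers into one XML document, keyed on character offsets, which is the kind of recombination XSLT performs declaratively. The element and attribute names below are illustrative assumptions, not the book's actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup from two components (names are illustrative):
# a shallow layer with POS-tagged tokens, and a deep layer with a
# constituent produced by an HPSG-style parser.
shallow = ET.fromstring(
    '<tokens><t pos="NNP" start="0" end="5">Alice</t>'
    '<t pos="VBZ" start="6" end="11">reads</t></tokens>')
deep = ET.fromstring(
    '<parse><constituent cat="S" start="0" end="11"/></parse>')

# Merge both layers into one standoff-annotation document; character
# offsets (start/end) let consumers relate the layers to each other.
merged = ET.Element("annotations")
merged.extend(list(shallow) + list(deep))
print(ET.tostring(merged, encoding="unicode"))
```

Because every component emits and consumes XML, new shallow components can be slotted in, or the deep parser can consult shallow results (e.g., named entities as lexical fallbacks), without changing the other components.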
The emerging Second-Generation Web is based entirely on XML and related technologies. It is intended to result in the creation of the Semantic Web, on which computers will be able to deal with the meaning ("semantics") of Web data and hence to process them in a more effective and autonomous way. This new version of the Web introduces a multitude of novel concepts, terms, and acronyms. Purpose, scope and methods: this dictionary is an effort to specify the terminological basis of emerging XML and Semantic Web technologies. The ultimate goal of this dictionary is even broader than just to define the meaning of new words: it aims to develop a proper understanding of these leading-edge technologies. To achieve this, comprehensible definitions of technical terms are supported by numerous diagrams and code snippets, clearly annotated and explained. The main areas covered in this dictionary are: (1) XML syntax and core technologies, such as Namespaces, Infoset and XML Schema; (2) all the major members of the XML family of technologies, such as XSLT, XPath and XLink; (3) numerous XML-based domain-specific languages, such as NewsML (News Markup Language); (4) the concept and architecture of the Semantic Web; (5) key Semantic Web technologies, such as RDF (Resource Description Framework), RDF Schema and OWL (Web Ontology Language); and (6) Web services, including WSDL (Web Services Description Language) and SOAP (Simple Object Access Protocol).
Creating scientific workflow applications is a very challenging task due to the complexity of the distributed computing environments involved, the complex control and data flow requirements of scientific applications, and the lack of high-level language and tool support. In particular, sophisticated expertise in distributed computing is commonly required to determine the software entities that perform the computations of workflow tasks, the computers on which workflow tasks are to be executed, the actual execution order of workflow tasks, and the data transfer between them. Qin and Fahringer present a novel workflow language called Abstract Workflow Description Language (AWDL) and the corresponding standards-based, knowledge-enabled tool support, which simplifies the development of scientific workflow applications. AWDL is an XML-based language for describing scientific workflow applications at a high level of abstraction. It is designed in a way that allows users to concentrate on specifying such workflow applications without dealing with either the complexity of distributed computing environments or any specific implementation technology. This research monograph is organized into five parts: overview, programming, optimization, synthesis, and conclusion, and is complemented by an appendix and an extensive reference list. The topics covered in this book will be of interest to both computer science researchers (e.g., in distributed programming, grid computing, or large-scale scientific applications) and domain scientists who need to apply workflow technologies in their work, as well as engineers who want to develop distributed and high-throughput workflow applications, languages and tools.
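The separation of concerns that an abstract, XML-based workflow language provides can be sketched as follows: the description states only the activities and their dependencies, while an enactment engine derives the execution order. The fragment below is an AWDL-like illustration with invented element names, not the actual AWDL schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical AWDL-like fragment; element and attribute names are
# illustrative assumptions, not the actual AWDL schema.
wf = ET.fromstring("""
<workflow name="simulate">
  <activity name="prepare"/>
  <activity name="compute" depends="prepare"/>
  <activity name="visualize" depends="compute"/>
</workflow>""")

# The abstract description says only *what* runs and on what it depends;
# an enactment engine decides *where* and *how*, deriving a valid
# execution order from the dependencies:
order, done = [], set()
acts = wf.findall("activity")
while len(order) < len(acts):
    for a in acts:
        name, dep = a.get("name"), a.get("depends")
        if name not in done and (dep is None or dep in done):
            order.append(name)
            done.add(name)
print(order)  # ['prepare', 'compute', 'visualize']
```

Keeping resource mapping and scheduling out of the description is what lets users specify a workflow once and run it on different distributed environments without touching the specification.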