Prof. Christian S. Jensen
List of Publications


This page contains a list of research publications with
abstracts and, generally, links to full paper versions.

Due to the copyright restrictions of some publishers, not all documents are available online on this page. If you cannot access a file you are interested in, please feel free to contact me.



2023

Dingming Wu, Erjia Xiao, Yi Zhu, Christian S. Jensen, Kezhong Lu, "Efficient Retrieval of the Top-k Most Relevant Event-Partner Pairs" in IEEE Transactions on Knowledge and Data Engineering, 2023

The proliferation of event-based social networking (EBSN) motivates studies on topics such as event, venue, and friend recommendation as well as event creation and organization. In this setting, the notion of event-partner recommendation has attracted attention. When recommending an event to a user, this functionality allows the recommendation of partners with whom to attend the event. However, in existing proposals, recommendations are pushed to users at the system's initiative. In contrast, EBSNs provide users with keyword-based search functionality. This way, users may retrieve information in pull mode. We propose a new way of accessing information in EBSNs that combines pull and push, thus allowing users to not only conduct ad-hoc searches for events, but also to receive partner recommendations for retrieved events. Specifically, we define and study top-k event-partner (kEP) pair retrieval querying that integrates keyword-based search for events with event-partner recommendation. This type of query retrieves event-partner pairs, taking into account the relevance of events to user-supplied keywords and so-called together preferences that indicate the extent of a user's preference to attend an event with a given partner. To compute kEP queries efficiently, we propose a rank-join based framework with three optimizations. Results of empirical studies with implementations of the proposed techniques demonstrate that they are capable of excellent performance.
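
To make the query semantics concrete, here is a minimal, naive Python sketch of kEP retrieval that scores every event-partner pair by combining keyword relevance with a together preference and keeps the top-k pairs in a heap. The data layout, the weighting parameter alpha, and the toy scores are illustrative assumptions; the sketch does not implement the paper's optimized rank-join framework.

```python
import heapq

def top_k_event_partner_pairs(event_relevance, together_pref, k, alpha=0.5):
    """Naive kEP retrieval: score every (event, partner) pair and keep the top k.

    event_relevance: dict mapping event -> relevance of the event to the query keywords
    together_pref:   dict mapping event -> {partner: together preference of the querying user}
    alpha:           assumed weight balancing keyword relevance against together preference
    """
    scored = (
        (alpha * rel + (1 - alpha) * pref, event, partner)
        for event, rel in event_relevance.items()
        for partner, pref in together_pref.get(event, {}).items()
    )
    return heapq.nlargest(k, scored)

# Toy example with made-up relevance and preference scores.
events = {"concert": 0.9, "hackathon": 0.4}
prefs = {"concert": {"alice": 0.7, "bob": 0.2}, "hackathon": {"alice": 0.9}}
print(top_k_event_partner_pairs(events, prefs, k=2))
```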


Bolong Zheng, Qi Hu, Lingfeng Ming, Jilin Hu, Lu Chen, Kai Zheng, Christian S. Jensen, "SOUP: Spatial-Temporal Demand Forecasting and Competitive Supply in Transportation" in IEEE Transactions on Knowledge and Data Engineering, 2023

We consider a setting with an evolving set of requests for transportation from an origin to a destination before a deadline and a set of agents capable of servicing the requests. In this setting, an authority assigns agents to requests such that the average idle time of the agents is minimized. An example is the scheduling of taxis (agents) to meet incoming passenger requests for trips while ensuring that the taxis are empty as little as possible. We address the problem of spatial-temporal demand forecasting and competitive supply (SOUP) in two steps. First, we build a granular model that provides spatial-temporal predictions of requests. Specifically, we propose a Spatial-Temporal Graph Convolutional Sequential Learning (ST-GCSL) model that predicts requests across locations and time slots. Second, we provide means of routing agents to request origins while avoiding competition among the agents. In particular, we develop a demand-aware route planning (DROP) algorithm that considers both the spatial-temporal predictions and the supply-demand state. We report on extensive experiments with real-world data that offer insight into the performance of the solution and show that it is capable of outperforming the state-of-the-art proposals.


2022

Yan Zhao, Liwei Deng, Xuanhao Chen, Chenjuan Guo, Bin Yang, Tung Kieu, Feiteng Huang, Torben Bach Pedersen, Kai Zheng, Christian S. Jensen, "A Comparative Study on Unsupervised Anomaly Detection for Time Series: Experiments and Analysis." in arXiv, 2022

The continued digitization of societal processes translates into a proliferation of time series data that cover applications such as fraud detection, intrusion detection, and energy management, where anomaly detection is often essential to enable reliability and safety. Many recent studies target anomaly detection for time series data. Indeed, the area of time series anomaly detection is characterized by diverse data, methods, and evaluation strategies, and comparisons in existing studies consider only part of this diversity, which makes it difficult to select the best method for a particular problem setting. To address this shortcoming, we introduce taxonomies for data, methods, and evaluation strategies, provide a comprehensive overview of unsupervised time series anomaly detection using the taxonomies, and systematically evaluate and compare state-of-the-art traditional as well as deep learning techniques. In the empirical study using nine publicly available datasets, we apply the most commonly used performance evaluation metrics to typical methods under a fair implementation standard. Based on the structuring offered by the taxonomies, we report on empirical studies and provide guidelines, in the form of comparative tables, for choosing the methods most suitable for particular application settings. Finally, we propose research directions for this dynamic field.


Tung Kieu, Bin Yang, Chenjuan Guo, Razvan-Gabriel Cirstea, Yan Zhao, Yale Song, Christian S. Jensen (corresponding author), "Anomaly Detection in Time Series with Robust Variational Quasi-Recurrent Autoencoders" in 38th International Conference on Data Engineering (ICDE), 2022

We propose variational quasi-recurrent autoencoders (VQRAEs) to enable robust and efficient anomaly detection in time series in unsupervised settings. The proposed VQRAEs employ a judiciously designed objective function based on robust divergences, including the α-, β-, and γ-divergences, making it possible to separate anomalies from normal data without relying on anomaly labels, thus achieving robustness and fully unsupervised training. To better capture temporal dependencies in time series data, VQRAEs are built upon quasi-recurrent neural networks, which employ convolution and gating mechanisms to avoid the inefficient recursive computations used by classic recurrent neural networks. Further, VQRAEs can be extended to bi-directional BiVQRAEs that utilize bi-directional information to further improve the accuracy. The above design choices make VQRAEs not only robust and thus accurate, but also efficient at detecting anomalies in streaming settings. Experiments on five real-world time series offer insight into the design properties of VQRAEs and demonstrate that VQRAEs are capable of outperforming state-of-the-art methods.
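
For readers unfamiliar with reconstruction-based detection, the sketch below illustrates only the underlying principle: fit a model to (mostly) normal data and flag points whose reconstruction error is large. It uses a plain PCA reconstruction as a stand-in for the paper's variational quasi-recurrent autoencoders, and the contamination threshold is an assumption.

```python
import numpy as np

def reconstruction_anomaly_scores(X, n_components=2):
    """Score each row of X by its reconstruction error under a rank-limited PCA."""
    Xc = X - X.mean(axis=0)
    # Principal directions via SVD; keep the leading components only.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    X_rec = Xc @ V @ V.T                     # project onto the subspace and back
    return np.linalg.norm(Xc - X_rec, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
X[::50] += 6.0                               # inject a few obvious anomalies
scores = reconstruction_anomaly_scores(X)
threshold = np.percentile(scores, 98)        # assumed contamination level
print("flagged rows:", np.flatnonzero(scores > threshold))
```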


Zezhi Shao, Zhao Zhang, Wei Wei, Fei Wang, Yongjun Xu, Xin Cao, Christian S. Jensen, "Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting." in arXiv, 2022

We all depend on mobility, and vehicular transportation affects the daily lives of most of us. Thus, the ability to forecast the state of traffic in a road network is an important functionality and a challenging task. Traffic data is often obtained from sensors deployed in a road network. Recent proposals on spatial-temporal graph neural networks have achieved great progress at modeling complex spatial-temporal correlations in traffic data, by modeling traffic data as a diffusion process. However, intuitively, traffic data encompasses two different kinds of hidden time series signals, namely the diffusion signals and inherent signals. Unfortunately, nearly all previous works coarsely consider traffic signals entirely as the outcome of the diffusion, while neglecting the inherent signals, which impacts model performance negatively. To improve modeling performance, we propose a novel Decoupled Spatial-Temporal Framework (DSTF) that separates the diffusion and inherent traffic information in a data-driven manner, which encompasses a unique estimation gate and a residual decomposition mechanism. The separated signals can be handled subsequently by the diffusion and inherent modules separately. Further, we propose an instantiation of DSTF, Decoupled Dynamic Spatial-Temporal Graph Neural Network (D2STGNN), that captures spatial-temporal correlations and also features a dynamic graph learning module that targets the learning of the dynamic characteristics of traffic networks. Extensive experiments with four real-world traffic datasets demonstrate that the framework is capable of advancing the state-of-the-art.


Zezhi Shao, Zhao Zhang, Wei Wei, Fei Wang, Yongjun Xu, Xin Cao, Christian S. Jensen, "Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting." in Proceedings of the VLDB Endowment, 2022

We all depend on mobility, and vehicular transportation affects the daily lives of most of us. Thus, the ability to forecast the state of traffic in a road network is an important functionality and a challenging task. Traffic data is often obtained from sensors deployed in a road network. Recent proposals on spatial-temporal graph neural networks have achieved great progress at modeling complex spatial-temporal correlations in traffic data, by modeling traffic data as a diffusion process. However, intuitively, traffic data encompasses two different kinds of hidden time series signals, namely the diffusion signals and inherent signals. Unfortunately, nearly all previous works coarsely consider traffic signals entirely as the outcome of the diffusion, while neglecting the inherent signals, which impacts model performance negatively. To improve modeling performance, we propose a novel Decoupled Spatial-Temporal Framework (DSTF) that separates the diffusion and inherent traffic information in a data-driven manner, which encompasses a unique estimation gate and a residual decomposition mechanism. The separated signals can be handled subsequently by the diffusion and inherent modules separately. Further, we propose an instantiation of DSTF, Decoupled Dynamic Spatial-Temporal Graph Neural Network (D2STGNN), that captures spatial-temporal correlations and also features a dynamic graph learning module that targets the learning of the dynamic characteristics of traffic networks. Extensive experiments with four real-world traffic datasets demonstrate that the framework is capable of advancing the state-of-the-art.


Dingming Wu, Ilkcan Keles, Song Wu, Hao Zhou, Simonas Saltenis, Christian S. Jensen, Kezhong Lu (corresponding author), "Density-Based Top-K Spatial Textual Clusters Retrieval" in IEEE Transactions on Knowledge and Data Engineering, 2022

So-called spatial web queries retrieve web content representing points of interest, such that the points of interest have descriptions that are relevant to query keywords and are located close to a query location. Two broad categories of such queries exist. The first encompasses queries that retrieve single spatial web objects that each satisfy the query arguments. Most proposals belong to this category. The second category, to which this paper's proposal belongs, encompasses queries that support exploratory user behavior and retrieve sets of objects that represent regions of space that may be of interest to the user. Specifically, the paper proposes a new type of query, the top-k spatial textual cluster retrieval (k-STC) query that returns the top-k clusters that (i) are located close to a query location, (ii) contain objects that are relevant with regard to given query keywords, and (iii) have an object density that exceeds a given threshold. To compute this query, we propose a DBSCAN-based approach and an OPTICS-based approach that rely on on-line density-based clustering and that exploit early stop conditions. Empirical studies on real data sets offer evidence that the paper's proposals can find good quality clusters and are capable of excellent performance.
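
As a rough illustration of density-based cluster retrieval (not the paper's k-STC algorithms with early-stop conditions), the sketch below filters objects by an assumed keyword-relevance threshold, clusters them with scikit-learn's DBSCAN, and ranks the resulting clusters by the distance of their centroids to the query location; eps, min_samples, and the ranking function are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def top_k_clusters(points, relevances, query_loc, k, min_relevance=0.3,
                   eps=0.05, min_samples=5):
    """Cluster keyword-relevant points and return the k clusters closest to query_loc.

    points: (n, 2) array of object locations; relevances: (n,) array of keyword relevances.
    """
    keep = relevances >= min_relevance                 # keyword-relevance filter (assumed)
    pts = points[keep]
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(pts).labels_
    clusters = []
    for lab in set(labels) - {-1}:                     # label -1 marks noise points
        members = pts[labels == lab]
        dist = np.linalg.norm(members.mean(axis=0) - query_loc)
        clusters.append((dist, members))
    clusters.sort(key=lambda c: c[0])                  # closest clusters first
    return clusters[:k]
```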


Dalin Zhang, Kaixuan Chen, Yan Zhao, Bin Yang, Lina Yao, Christian S. Jensen, "Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey." in arXiv, 2022

Deep learning technologies have demonstrated remarkable effectiveness in a wide range of tasks, and deep learning holds the potential to advance a multitude of applications, including in edge computing, where deep models are deployed on edge devices to enable instant data processing and response. A key challenge is that while the application of deep models often incurs substantial memory and computational costs, edge devices typically offer only very limited storage and computational capabilities that may vary substantially across devices. These characteristics make it difficult to build deep learning solutions that unleash the potential of edge devices while complying with their constraints. A promising approach to addressing this challenge is to automate the design of effective deep learning models that are lightweight, require only a little storage, and incur only low computational overheads. This survey offers comprehensive coverage of studies of design automation techniques for deep learning models targeting edge computing. It offers an overview and comparison of key metrics that are used commonly to quantify the proficiency of models in terms of effectiveness, lightness, and computational costs. The survey then proceeds to cover three categories of the state-of-the-art of deep model design automation techniques: automated neural architecture search, automated model compression, and joint automated design and compression. Finally, the survey covers open issues and directions for future research.


Christian S. Jensen ,"Digitalization in the Service of Society: The Case of Big Vehicle Trajectory Data."

The ongoing, sweeping digitalization of societal processes generates massive volumes of data that capture the underlying processes at an unprecedented level of detail, in turn enabling us to better understand and improve those processes. Put differently, if harnessed properly, data holds the potential to enable value creation throughout society. Considering primarily vehicle trajectory data, this talk puts focus on the important process of transportation: While we all depend on it for mobility, transportation has adverse effects on (i) our productivity due to lack of predictability and congestion, (ii) the climate due to greenhouse gas emissions, and (iii) our health and safety due to air and noise pollution and accidents. Thus, it makes good sense to invent techniques capable of leveraging big data for the improvement of transportation. The talk describes how the availability of massive trajectory data renders the traditional routing paradigm, where a road network is modeled as an edge-weighted graph, inadequate. Instead, new paradigms that thrive on massive trajectory data are called for. The talk covers several such paradigms, including path-centric, on-the-fly, and cost-oblivious routing [2, 3, 4, 10, 11, 12]. As even massive volumes of trajectory data are sparse in these settings, the talk also covers means of making good use of available data [6, 7, 13]. Finally, trajectory data has many uses beyond routing; the talk covers several such uses [1, 5, 8, 9].


Huan Li, Lanjing Yi, Bo Tang, Hua Lu, Christian S. Jensen, "Efficient and Error-bounded Spatiotemporal Quantile Monitoring in Edge Computing Environments" in 48th International Conference on Very Large Data Bases, VLDB 2022, 2022

Underlying many types of data analytics, a spatiotemporal quantile monitoring (SQM) query continuously returns the quantiles of a dataset observed in a spatiotemporal range. In this paper, we study SQM in an Internet of Things (IoT) based edge computing environment, where concurrent SQM queries share the same infrastructure asynchronously. To minimize query latency while providing result accuracy guarantees, we design a processing framework that virtualizes edge-resident data sketches for quantile computing. In the framework, a coordinator edge node manages edge sketches and synchronizes edge sketch processing and query executions. The coordinator also controls the processed data fractions of edge sketches, which helps to achieve the optimal latency with error-bounded results for each single query. To support concurrent queries, we employ a grid to decompose queries into subqueries and process them efficiently using shared edge sketches. We also devise a relaxation algorithm to converge to optimal latencies for those subqueries whose result errors are still bounded. We evaluate our proposals using two high-speed streaming datasets in a simulated IoT setting with edge nodes. The results show that our proposals achieve efficient, scalable, and error-bounded SQM.
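
The query semantics can be illustrated with a deliberately simple sketch: each edge node contributes a sample of values observed in a given spatiotemporal cell, and a coordinator merges the samples and reads off quantiles with NumPy. A real deployment would use mergeable, error-bounded sketches rather than raw samples; the cell granularity and quantile levels below are assumptions.

```python
import numpy as np

def merged_quantiles(edge_samples, qs=(0.5, 0.9, 0.99)):
    """Merge per-edge-node value samples for one spatiotemporal cell and compute quantiles."""
    merged = np.concatenate([np.asarray(s, dtype=float) for s in edge_samples])
    return dict(zip(qs, np.quantile(merged, qs)))

# Values observed by three edge nodes within the same spatiotemporal range (toy data).
edge_samples = [np.random.default_rng(i).exponential(scale=10, size=1000) for i in range(3)]
print(merged_quantiles(edge_samples))
```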


Lu Chen, Yunjun Gao, Xingrui Huang, Christian S. Jensen, Bolong Zheng, "Efficient Distributed Clustering Algorithms on Star-Schema Heterogeneous Graphs" in IEEE Transactions on Knowledge and Data Engineering, 2022

Clustering graphs is able to provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be considered, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized pagerank (PPR) as a unified distance measure that captures both structural and attribute similarities. We employ DBSCAN for clustering, and update edge weights iteratively to balance the importance of different attributes. The rapidly growing volume of data nowadays challenges traditional clustering algorithms, and thus, a distributed method is required. Hence, we adopt Blogel, a popular distributed graph computing system, based on which we develop four exact and approximate approaches that enable efficient PPR score computation when edge weights are updated. To improve the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. Also, we present a game theory based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets demonstrate the effectiveness and efficiency of our proposals.
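
The unified distance measure rests on personalized PageRank (PPR); the sketch below computes PPR scores for a single seed node by textbook power iteration on a weighted adjacency matrix. It is meant only to make the measure concrete and is not the paper's distributed exact or approximate algorithms; the damping factor and tolerance are assumptions.

```python
import numpy as np

def personalized_pagerank(adj, seed, alpha=0.15, tol=1e-10, max_iter=1000):
    """PPR scores with respect to a single seed node, via power iteration.

    adj: (n, n) nonnegative edge-weight matrix; row i holds weights of edges leaving node i.
    """
    n = adj.shape[0]
    row_sums = adj.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix (rows with no out-edges stay all-zero).
    P = np.divide(adj, row_sums, out=np.zeros_like(adj, dtype=float), where=row_sums > 0)
    e = np.zeros(n)
    e[seed] = 1.0
    p = e.copy()
    for _ in range(max_iter):
        p_next = alpha * e + (1 - alpha) * (P.T @ p)   # restart at the seed with prob. alpha
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

# Tiny star-schema-like example: node 0 is an entity node, nodes 1-3 are attribute nodes.
adj = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], dtype=float)
print(personalized_pagerank(adj, seed=0).round(3))
```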


Tianyi Li, Christian S. Jensen, Torben Bach Pedersen, Yunjun Gao, Jilin Hu, "Evolutionary Clustering of Moving Objects" in 38th IEEE International Conference on Data Engineering, ICDE 2022, 2022

The widespread deployment of smartphones, networked in-vehicle devices with geo-positioning capabilities, and vessel tracking technologies renders it feasible to collect the evolving geo-locations of populations of land- and sea-based moving objects. The continuous clustering of such data can enable a variety of real-time services, such as road traffic management and vessel collision risk assessment. However, little attention has so far been given to the quality of moving-object clusters. For example, it is beneficial to smooth short-term fluctuations in clusters to achieve robustness to exceptional data and to improve existing applications. We propose the notion of evolutionary clustering of moving objects, abbreviated ECM, that enhances the quality of moving object clustering by means of temporal smoothing that prevents abrupt changes in clusters across successive timestamps. Employing the notions of snapshot and historical costs, we formalize ECM and formulate ECM as an optimization problem. We prove that ECM can be performed approximately in linear time, thus eliminating iterative processes employed in previous studies. Further, we propose a minimal-group structure and a seed-point shifting strategy to facilitate temporal smoothing. Finally, we present all algorithms underlying ECM along with a set of optimization techniques. Extensive experiments with three real-life datasets offer insights into ECM and show that it outperforms state-of-the-art solutions in terms of both clustering quality and clustering efficiency.
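
The evolutionary-clustering objective can be made concrete with a small sketch that combines a snapshot cost (how well a candidate clustering fits the current object positions) with a historical cost (how much the clustering deviates from the previous timestamp); the specific cost functions and the smoothing weight are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def snapshot_cost(points, labels):
    """Sum of squared distances from points to their cluster centroids at the current timestamp."""
    cost = 0.0
    for lab in np.unique(labels):
        members = points[labels == lab]
        cost += ((members - members.mean(axis=0)) ** 2).sum()
    return cost

def historical_cost(labels, prev_labels):
    """Fraction of objects whose cluster membership changed since the previous timestamp."""
    return float(np.mean(labels != prev_labels))

def evolutionary_cost(points, labels, prev_labels, smooth=0.5):
    # Lower is better: fit the current snapshot, but penalize abrupt changes across timestamps.
    return snapshot_cost(points, labels) + smooth * historical_cost(labels, prev_labels)
```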


Lu Chen, Yunjun Gao, Xuan Song, Zheng Li, Yifan Zhu, Xiaoye Miao, Christian S. Jensen (corresponding author), "Indexing Metric Spaces for Exact Similarity Search" in ACM Computing Surveys, 2022

With the continued digitization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity, and variety. Many studies address volume or velocity, while fewer studies concern the variety. Metric spaces are ideal for addressing variety because they can accommodate any data as long as it can be equipped with a distance notion that satisfies the triangle inequality. To accelerate search in metric spaces, a collection of indexing techniques for metric data have been proposed. However, existing surveys offer limited coverage, and a comprehensive empirical study has yet to be reported. We offer a comprehensive survey of existing metric indexes that support exact similarity search: we summarize existing partitioning, pruning, and validation techniques used by metric indexes to support exact similarity search; we provide the time and space complexity analyses of index construction; and we offer an empirical comparison of their query processing performance. Empirical studies are important when evaluating metric indexing performance, because performance can depend highly on the effectiveness of available pruning and validation as well as on the data distribution, which means that complexity analyses often offer limited insights. This article aims at revealing strengths and weaknesses of different indexing techniques to offer guidance on selecting an appropriate indexing technique for a given setting, and to provide directions for future research on metric indexing.


Xuanhao Chen, Yan Zhao, Kai Zheng, Bin Yang, Christian S. Jensen (corresponding author), "Influence-aware Task Assignment in Spatial Crowdsourcing" in 38th International Conference on Data Engineering (ICDE), 2022

With the widespread diffusion of smartphones, Spatial Crowdsourcing (SC), which aims to assign spatial tasks to mobile workers, has drawn increasing attention in both academia and industry. One of the major issues is how to best assign tasks to workers. Given a worker and a task, the worker will choose to accept the task based on her affinity towards the task, and the worker can propagate the information of the task to attract more workers to perform it. These factors can be measured as worker-task influence. Since workers' affinities towards tasks are different and task issuers may ask workers who performed tasks to propagate the information of tasks to attract more workers to perform them, it is important to analyze worker-task influence when making assignments. We propose and solve a novel influence-aware task assignment problem in SC, where tasks are assigned to workers in a manner that achieves high worker-task influence. In particular, we aim to maximize the number of assigned tasks and worker-task influence. To solve the problem, we first determine workers' affinities towards tasks by identifying workers' historical task-performing patterns. Next, a Historical Acceptance approach is developed to measure workers' willingness of performing a task, i.e., the probability of workers visiting the location of the task when they are informed. Next, we propose a Random reverse reachable-based Propagation Optimization algorithm that exploits reverse reachable sets to calculate the probability of workers being informed about tasks in a social network. Based on worker-task influence derived from the above three factors, we propose three influence-aware task assignment algorithms that aim to maximize the number of assigned tasks and worker-task influence. Extensive experiments on two real-world datasets offer detailed insight into the effectiveness of our solutions.


Xuanhao Chen, Yan Zhao, Kai Zheng, Bin Yang, Christian S. Jensen, "Influence-aware Task Assignment in Spatial Crowdsourcing (Technical Report)." in arXiv, 2022

With the widespread diffusion of smartphones, Spatial Crowdsourcing (SC), which aims to assign spatial tasks to mobile workers, has drawn increasing attention in both academia and industry. One of the major issues is how to best assign tasks to workers. Given a worker and a task, the worker will choose to accept the task based on her affinity towards the task, and the worker can propagate the information of the task to attract more workers to perform it. These factors can be measured as worker-task influence. Since workers' affinities towards tasks are different and task issuers may ask workers who performed tasks to propagate the information of tasks to attract more workers to perform them, it is important to analyze worker-task influence when making assignments. We propose and solve a novel influence-aware task assignment problem in SC, where tasks are assigned to workers in a manner that achieves high worker-task influence. In particular, we aim to maximize the number of assigned tasks and worker-task influence. To solve the problem, we first determine workers' affinities towards tasks by identifying workers' historical task-performing patterns. Next, a Historical Acceptance approach is developed to measure workers' willingness of performing a task, i.e., the probability of workers visiting the location of the task when they are informed. Next, we propose a Random reverse reachable-based Propagation Optimization algorithm that exploits reverse reachable sets to calculate the probability of workers being informed about tasks in a social network. Based on worker-task influence derived from the above three factors, we propose three influence-aware task assignment algorithms that aim to maximize the number of assigned tasks and worker-task influence. Extensive experiments on two real-world datasets offer detailed insight into the effectiveness of our solutions.


Anton Dignös, Michael H. Böhlen, Johann Gamper, Christian S. Jensen, Peter Moser (corresponding author), "Leveraging range joins for the computation of overlap joins" in VLDB Journal, 2022

Joins are essential and potentially expensive operations in database management systems. When data is associated with time periods, joins commonly include predicates that require pairs of argument tuples to overlap in order to qualify for the result. Our goal is to enable built-in systems support for such joins. In particular, we present an approach where overlap joins are formulated as unions of range joins, which are more general purpose joins compared to overlap joins, i.e., are useful in their own right, and are supported well by B+-trees. The approach is sufficiently flexible that it also supports joins with additional equality predicates, as well as open, closed, and half-open time periods over discrete and continuous domains, thus offering both generality and simplicity, which is important in a system setting. We provide both a stand-alone solution that performs on par with the state-of-the-art and a DBMS embedded solution that is able to exploit standard indexing and clearly outperforms existing DBMS solutions that depend on specialized indexing techniques. We offer both analytical and empirical evaluations of the proposals. The empirical study includes comparisons with pertinent existing proposals and offers detailed insight into the performance characteristics of the proposals.
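
The abstract's key idea, formulating an overlap join as a union of range joins, can be sketched for half-open periods [start, end): two periods overlap iff r.start falls in [s.start, s.end) or s.start falls strictly inside (r.start, r.end), and each disjunct is a range join over period start points, here emulated with sorted lists and bisect instead of a B+-tree. The tuple layout and driving sides are assumptions.

```python
from bisect import bisect_left, bisect_right

def range_join(probe, build_sorted, lo_open=False):
    """For each probe period, return build periods whose start lies in the probe's range."""
    starts = [b[0] for b in build_sorted]                 # build side sorted by start
    out = []
    for p in probe:
        lo = bisect_right(starts, p[0]) if lo_open else bisect_left(starts, p[0])
        hi = bisect_left(starts, p[1])
        out.extend((p, build_sorted[i]) for i in range(lo, hi))
    return out

def overlap_join(R, S):
    """All (r, s) pairs of half-open periods that overlap, as a union of two disjoint range joins."""
    R_sorted, S_sorted = sorted(R), sorted(S)
    # Case 1: r.start lies in [s.start, s.end)  ->  probe with S, search the starts of R.
    pairs = [(r, s) for s, r in range_join(S_sorted, R_sorted)]
    # Case 2: s.start lies strictly in (r.start, r.end)  ->  probe with R, search the starts of S.
    pairs += range_join(R_sorted, S_sorted, lo_open=True)
    return pairs

R = [(1, 5), (6, 9)]
S = [(4, 7), (9, 12)]
print(sorted(overlap_join(R, S)))   # [((1, 5), (4, 7)), ((6, 9), (4, 7))]
```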


Pengfei Jin, Lu Chen, Yunjun Gao, Xueqin Chang, Zhanyu Liu, Shu Shen, Christian S. Jensen (corresponding author), "Maximizing the influence of bichromatic reverse k nearest neighbors in geo-social networks" in World Wide Web, 2022

Geo-social networks offer opportunities for the marketing and promotion of geo-located services. In this setting, we explore a new problem, called Maximizing the Influence of Bichromatic Reverse k Nearest Neighbors (MaxInfBRkNN). The objective is to find a set of points of interest (POIs), which are geo-textually and socially relevant to social influencers who are expected to largely promote the POIs online. In other words, the problem aims to detect an optimal set of POIs with the largest word-of-mouth (WOM) marketing potential. This functionality is useful in various real-life applications, including social advertising, location-based viral marketing, and personalized POI recommendation. However, solving MaxInfBRkNN with theoretical guarantees is challenging because of the prohibitive overheads on BRkNN retrieval in geo-social networks, and the NP- and #P-hardness of finding the optimal POI set. To achieve practical solutions, we present a framework with carefully designed indexes, efficient batch BRkNN processing algorithms, and alternative POI selection policies that support both approximate and heuristic solutions. Extensive experiments on real and synthetic datasets demonstrate the good performance of our proposed methods.


Pengfei Jin, Lu Chen, Yunjun Gao, Xueqin Chang, Zhanyu Liu, Christian S. Jensen, "Maximizing the Influence of Bichromatic Reverse k Nearest Neighbors in Geo-Social Networks." in arXiv, 2022

Geo-social networks offer opportunities for the marketing and promotion of geo-located services. In this setting, we explore a new problem, called Maximizing the Influence of Bichromatic Reverse k Nearest Neighbors (MaxInfBRkNN). The objective is to find a set of points of interest (POIs), which are geo-textually and socially attractive to social influencers who are expected to largely promote the POIs through online influence propagation. In other words, the problem aims to detect an optimal set of POIs with the largest word-of-mouth (WOM) marketing potential. This functionality is useful in various real-life applications, including social advertising, location-based viral marketing, and personalized POI recommendation. However, solving MaxInfBRkNN with theoretical guarantees is challenging, because of the prohibitive overheads on BRkNN retrieval in geo-social networks, and the NP and #P-hardness in finding the optimal POI set. To achieve practical solutions, we present a framework with carefully designed indexes, efficient batch BRkNN processing algorithms, and alternative POI selection policies that support both approximate and heuristic solutions. Extensive experiments on real and synthetic datasets demonstrate the good performance of our proposed methods.


Karl Aberer, Christian S. Jensen, Kian Lee Tan, "Message from the Test-of-Time Committee" in 23rd IEEE International Conference on Mobile Data Management, MDM 2022, 2022

Presents the conference keynote speech or messages from conference chairs.


Mohamed F. Mokbel, Mahmoud Attia Sakr, Li Xiong, Andreas Züfle, Jussara M. Almeida, Taylor Anderson, Walid G. Aref, Gennady L. Andrienko, Natalia V. Andrienko, Yang Cao, Sanjay Chawla, Reynold Cheng, Panos K. Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, Dimitrios Gunopulos, Christian S. Jensen, Joon-Sook Kim, Kyoung-Sook Kim, Peer Kröger, John Krumm, Johannes Lauer, Amr Magdy, Mario A. Nascimento, Siva Ravada, Matthias Renz, Dimitris Sacharidis, Cyrus Shahabi, Flora D. Salim, Mohamed Sarwat, Maxime Schoemans, Bettina Speckmann, Egemen Tanin, Yannis Theodoridis, Kristian Torp, Goce Trajcevski, Marc J. van Kreveld, Carola Wenk, Martin Werner, Raymond Chi-Wing Wong, Song Wu, Jianqiu Xu, Moustafa Youssef, Demetris Zeinalipour, Mengxuan Zhang, Esteban Zimányi, "Mobility Data Science: Dagstuhl Seminar 22021" in Dagstuhl Seminar 22021, 2022

This report documents the program and the outcomes of Dagstuhl Seminar 22021 "Mobility Data Science". This seminar was held January 9-14, 2022, including 47 participants from industry and academia. The goal of this Dagstuhl Seminar was to create a new research community of mobility data science in which the whole is greater than the sum of its parts by bringing together established leaders as well as promising young researchers from all fields related to mobility data science. Specifically, this report summarizes the main results of the seminar by (1) defining Mobility Data Science as a research domain, (2) by sketching its agenda in the coming years, and by (3) building a mobility data science community. (1) Mobility data science is defined as spatiotemporal data that additionally captures the behavior of moving entities (human, vehicle, animal, etc.). To understand, explain, and predict behavior, we note that a strong collaboration with research in behavioral and social sciences is needed. (2) Future research directions for mobility data science described in this report include a) mobility data acquisition and privacy, b) mobility data management and analysis, and c) applications of mobility data science. (3) We identify opportunities towards building a mobility data science community, towards collaborations between academia and industry, and towards a mobility data science curriculum.


Yan Zhao, Xuanhao Chen, Liwei Deng, Tung Kieu, Chenjuan Guo, Bin Yang, Kai Zheng, Christian S. Jensen, "Outlier Detection for Streaming Task Assignment in Crowdsourcing." in 31st ACM Web Conference, WWW 2022, 2022

Crowdsourcing aims to enable the assignment of available resources to the completion of tasks at scale. The continued digitization of societal processes translates into increased opportunities for crowdsourcing. For example, crowdsourcing enables the assignment of computational resources of humans, called workers, to tasks that are notoriously hard for computers. In settings faced with malicious actors, detection of such actors holds the potential to increase the robustness of crowdsourcing platforms. We propose a framework called Outlier Detection for Streaming Task Assignment that aims to improve robustness by detecting malicious actors. In particular, we model the arrival of workers and the submission of tasks as evolving time series and provide means of detecting malicious actors by means of outlier detection. We propose a novel socially aware Generative Adversarial Network (GAN) based architecture that is capable of contending with the complex distributions found in time series. The architecture includes two GANs that are designed to adversarially train an autoencoder to learn the patterns of distributions in worker and task time series, thus enabling outlier detection based on reconstruction errors. A GAN structure encompasses a game between a generator and a discriminator, where it is desirable that the two can learn to coordinate towards socially optimal outcomes, while avoiding being exploited by selfish opponents. To this end, we propose a novel training approach that incorporates social awareness into the loss functions of the two GANs. Additionally, to improve task assignment efficiency, we propose an efficient greedy algorithm based on degree reduction that transforms task assignment into a bipartite graph matching. Extensive experiments offer insight into the effectiveness and efficiency of the proposed framework.


Yifan Zhu, Lu Chen, Yunjun Gao, Christian S. Jensen, "Pivot selection algorithms in metric spaces: a survey and experimental study" in VLDB Journal, 2022

Similarity search in metric spaces is used widely in areas such as multimedia retrieval, data mining, data integration, to name but a few. To accelerate metric similarity search, pivot-based indexing is often employed. Pivot-based indexing first computes the distances between data objects and pivots and then exploits filtering techniques that use the triangle inequality on pre-computed distances to prune search space during search. The performance of pivot-based indexing depends on the quality of the pivots used, and many algorithms have been proposed for selecting high-quality pivots. We present a comprehensive empirical study of pivot selection algorithms. Specifically, we classify all existing algorithms into three categories according to the types of distances they use for selecting pivots. We also propose a new pivot selection algorithm that exploits the power law probabilistic distribution. Next, we report on a comprehensive empirical study of the search performance enabled by different pivot selection approaches, using different datasets and indexes, thus contributing new insight into the strengths and weaknesses of existing selection techniques. Finally, we offer advice on how to select appropriate pivot selection algorithms for different settings.
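
The filtering that pivot-based indexing relies on is compact enough to show directly: for a pivot p with precomputed distances d(p, o), the triangle inequality gives |d(q, p) - d(p, o)| <= d(q, o), so an object o can be pruned from a range query whenever that lower bound exceeds the radius, without ever computing d(q, o). The single-pivot sketch below assumes Euclidean vectors as the metric data; it illustrates the filtering principle only, not any particular pivot selection algorithm.

```python
import numpy as np

def range_query_with_pivot(objects, pivot, pivot_dists, query, radius):
    """Range query using one pivot: prune via the triangle inequality, verify the rest.

    objects:     list of NumPy vectors
    pivot_dists: pivot_dists[i] = distance(pivot, objects[i]), precomputed at index time
    """
    d_qp = np.linalg.norm(query - pivot)
    results = []
    for obj, d_po in zip(objects, pivot_dists):
        if abs(d_qp - d_po) > radius:            # lower bound on d(query, obj) exceeds radius
            continue                              # -> pruned without a distance computation
        if np.linalg.norm(query - obj) <= radius: # survivor: verify with the real distance
            results.append(obj)
    return results
```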


Bolong Zheng, Xi Zhao, Lianggui Weng, Quoc Viet Hung Nguyen, Hang Liu, Christian S. Jensen (corresponding author), "PM-LSH: a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search" in VLDB Journal, 2022

Nearest neighbor (NN) search is inherently computationally expensive in high-dimensional spaces due to the curse of dimensionality. As a well-known solution, locality-sensitive hashing (LSH) is able to answer c-approximate NN (c-ANN) queries in sublinear time with constant probability. Existing LSH methods focus mainly on building hash bucket-based indexing such that the candidate points can be retrieved quickly. However, existing coarse-grained structures fail to offer accurate distance estimation for candidate points, which translates into additional computational overhead when having to examine unnecessary points. This in turn reduces the performance of query processing. In contrast, we propose a fast and accurate in-memory LSH framework, called PM-LSH, that aims to compute the c-ANN query on large-scale, high-dimensional datasets. First, we adopt a simple yet effective PM-tree to index the data points. Second, we develop a tunable confidence interval to achieve accurate distance estimation and guarantee high result quality. Third, we propose an efficient algorithm on top of the PM-tree to improve the performance of computing c-ANN queries. In addition, we extend PM-LSH to support closest pair (CP) search in high-dimensional spaces. Here, we again adopt the PM-tree to organize the points in a low-dimensional space, and we propose a branch and bound algorithm together with a radius pruning technique to improve the performance of computing c-approximate closest pair (c-ACP) queries. Extensive experiments with real-world data offer evidence that PM-LSH is capable of outperforming existing proposals with respect to both efficiency and accuracy for both NN and CP search.
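
For context, a classic LSH family for Euclidean distance hashes a point x to floor((a·x + b) / w) for a Gaussian vector a and a random offset b, so that nearby points collide with higher probability. The sketch below builds plain hash tables from such functions for candidate generation; it illustrates bucket-based LSH in general, not PM-LSH's PM-tree index or tunable confidence interval, and all parameter values are assumptions.

```python
import numpy as np
from collections import defaultdict

class E2LSH:
    """Minimal bucket-based LSH for Euclidean c-ANN candidate generation."""

    def __init__(self, dim, n_tables=8, n_hashes=4, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(n_tables, n_hashes, dim))    # random projection vectors
        self.b = rng.uniform(0, w, size=(n_tables, n_hashes))  # random offsets
        self.w = w
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _keys(self, x):
        # One compound key per table: the concatenation of its n_hashes bucket ids.
        return [tuple(np.floor((self.a[t] @ x + self.b[t]) / self.w).astype(int))
                for t in range(len(self.tables))]

    def insert(self, idx, x):
        for table, key in zip(self.tables, self._keys(x)):
            table[key].append(idx)

    def candidates(self, q):
        cands = set()
        for table, key in zip(self.tables, self._keys(q)):
            cands.update(table[key])
        return cands   # verify these with exact distances afterwards
```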


Yan Zhao, Kai Zheng, Yunchuan Li, Jinfu Xia, Bin Yang, Torben Bach Pedersen, Rui Mao, Christian S. Jensen, Xiaofang Zhou, "Profit Optimization in Spatial Crowdsourcing: Effectiveness and Efficiency" in IEEE Transactions on Knowledge and Data Engineering, 2022

In spatial crowdsourcing, mobile users perform spatio-temporal tasks that involve travel to specified locations. Spatial crowdsourcing (SC) is enabled by SC platforms that support mobile worker recruitment and retention, as well as task assignment, which is essential to maximize profits that are accrued from serving task requests. Specifically, how to best achieve task assignment in a cost-effective manner while contending with spatio-temporal constraints is a key challenge in SC. To address this challenge, we formalize and study a novel Profit-driven Task Assignment problem. We first establish a task reward pricing model that takes into account the temporal constraints (i.e., expected completion time and deadline) of tasks. Then we adopt an optimal algorithm based on tree decomposition to achieve an optimal task assignment and propose greedy algorithms based on Random Tuning Optimization to improve the computational efficiency. To balance effectiveness and efficiency, we also provide a heuristic task assignment algorithm based on Ant Colony Optimization that assigns tasks by simulating the behavior of ant colonies foraging for food. Finally, we conduct extensive experiments using real and synthetic data, offering detailed insight into the effectiveness and efficiency of the proposed methods.


Tobias Skovgaard Jepsen, Christian S. Jensen, Thomas Dyhre Nielsen (corresponding author), "Relational Fusion Networks: Graph Convolutional Networks for Road Networks" in IEEE Transactions on Intelligent Transportation Systems, 2022

The application of machine learning techniques in the setting of road networks holds the potential to facilitate many important intelligent transportation applications. Graph Convolutional Networks (GCNs) are neural networks that are capable of leveraging the structure of a network. However, many implicit assumptions of GCNs do not apply to road networks. We introduce the Relational Fusion Network (RFN), a novel type of Graph Convolutional Network (GCN) designed specifically for road networks. In particular, we propose methods that outperform state-of-the-art GCN architectures by up to 21-40% on two machine learning tasks in road networks. Furthermore, we show that state-of-the-art GCNs may fail to effectively leverage road network structure and may not generalize well to other road networks.


Tung Kieu, Bin Yang, Chenjuan Guo, Christian S. Jensen, Yan Zhao, Feiteng Huang, Kai Zheng (corresponding author), "Robust and Explainable Autoencoders for Unsupervised Time Series Outlier Detection"

Time series data occurs widely, and outlier detection is a fundamental problem in data mining, which has numerous applications. Existing autoencoder-based approaches deliver state-of-the-art performance on challenging real-world data but are vulnerable to outliers and exhibit low explainability. To address these two limitations, we propose robust and explainable unsupervised autoencoder frameworks that decompose an input time series into a clean time series and an outlier time series using autoencoders. Improved explainability is achieved because clean time series are better explained with easy-to-understand patterns such as trends and periodicities. We provide insight into this by means of a post-hoc explainability analysis and empirical studies. In addition, since outliers are separated from clean time series iteratively, our approach offers improved robustness to outliers, which in turn improves accuracy. We evaluate our approach on five real-world datasets and report improvements over the state-of-the-art approaches in terms of robustness and explainability.


Tung Kieu, Bin Yang, Chenjuan Guo, Christian S. Jensen, Yan Zhao, Feiteng Huang, Kai Zheng, "Robust and Explainable Autoencoders for Unsupervised Time Series Outlier Detection - Extended Version." in CoRR, 2022

Time series data occurs widely, and outlier detection is a fundamental problem in data mining, which has numerous applications. Existing autoencoder-based approaches deliver state-of-the-art performance on challenging real-world data but are vulnerable to outliers and exhibit low explainability. To address these two limitations, we propose robust and explainable unsupervised autoencoder frameworks that decompose an input time series into a clean time series and an outlier time series using autoencoders. Improved explainability is achieved because clean time series are better explained with easy-to-understand patterns such as trends and periodicities. We provide insight into this by means of a post-hoc explainability analysis and empirical studies. In addition, since outliers are separated from clean time series iteratively, our approach offers improved robustness to outliers, which in turn improves accuracy. We evaluate our approach on five real-world datasets and report improvements over the state-of-the-art approaches in terms of robustness and explainability. This is an extended version of "Robust and Explainable Autoencoders for Unsupervised Time Series Outlier Detection", to appear in IEEE ICDE 2022.


Huan Li, Bo Tang, Hua Lu, Muhammad Aamir Cheema, Christian S. Jensen, "Spatial Data Quality in the IoT Era: Management and Exploitation" in 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022, 2022

Within the rapidly expanding Internet of Things (IoT), growing amounts of spatially referenced data are being generated. Due to the dynamic, decentralized, and heterogeneous nature of the IoT, spatial IoT data (SID) quality has attracted considerable attention in academia and industry. How to invent and use technologies for managing spatial data quality and exploiting low-quality spatial data are key challenges in the IoT. In this tutorial, we highlight the SID consumption requirements in applications and offer an overview of spatial data quality in the IoT setting. In addition, we review pertinent technologies for quality management and low-quality data exploitation, and we identify trends and future directions for quality-aware SID management and utilization. The tutorial aims to not only help researchers and practitioners to better comprehend SID quality challenges and solutions, but also offer insights that may enable innovative research and applications.


Ziquan Fang, Yuntao Du, Xinjun Zhu, Danlei Hu, Lu Chen, Yunjun Gao, Christian S. Jensen, "Spatio-Temporal Trajectory Similarity Learning in Road Networks" in 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022, 2022

Deep learning based trajectory similarity computation holds the potential for improved efficiency and adaptability over traditional similarity computation. However, existing learning-based trajectory similarity learning solutions prioritize spatial similarity over temporal similarity, making them suboptimal for time-aware analyses. To this end, we propose ST2Vec, a representation learning based solution that considers fine-grained spatial and temporal relations between trajectories to enable spatio-temporal similarity computation in road networks. Specifically, ST2Vec encompasses two steps: (i) spatial and temporal modeling that encode spatial and temporal information of trajectories, where a generic temporal modeling module is proposed for the first time; and (ii) spatio-temporal co-attention fusion, where two fusion strategies are designed to enable the generation of unified spatio-temporal embeddings of trajectories. Further, under the guidance of triplet loss, ST2Vec employs curriculum learning in model optimization to improve convergence and effectiveness. An experimental study offers evidence that ST2Vec outperforms state-of-the-art competitors substantially in terms of effectiveness and efficiency, while showing low parameter sensitivity and good model robustness. Moreover, similarity involved case studies including top-k querying and DBSCAN clustering offer further insight into the capabilities of ST2Vec.
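
Since training is guided by a triplet loss, a minimal NumPy version of that loss may help make the setup concrete: an anchor trajectory embedding is pulled toward a spatio-temporally similar positive and pushed away from a dissimilar negative by at least a margin. The embeddings and the margin below are placeholders, not ST2Vec outputs.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors (batch along axis 0)."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)   # distance anchor-positive
    d_neg = np.linalg.norm(anchor - negative, axis=-1)   # distance anchor-negative
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(32, 64)) for _ in range(3))  # a batch of toy embeddings
print(triplet_loss(a, p, n))
```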


Bezaye Tesfaye, Nikolaus Augsten, Mateusz Pawlik, Michael H. Böhlen, Christian S. Jensen (corresponding author), "Speeding Up Reachability Queries in Public Transport Networks Using Graph Partitioning" in Information Systems Frontiers, 2022

Computing path queries such as the shortest path in public transport networks is challenging because the path costs between nodes change over time. A reachability query from a node at a given start time on such a network retrieves all points of interest (POIs) that are reachable within a given cost budget. Reachability queries are essential building blocks in many applications, for example, group recommendations, ranking spatial queries, or geomarketing. We propose an efficient solution for reachability queries in public transport networks. Currently, there are two options to solve reachability queries. (1) Execute a modified version of Dijkstra’s algorithm that supports time-dependent edge traversal costs; this solution is slow since it must expand edge by edge and does not use an index. (2) Issue a separate path query for each single POI, i.e., a single reachability query requires answering many path queries. None of these solutions scales to large networks with many POIs. We propose a novel and lightweight reachability index. The key idea is to partition the network into cells. Then, in contrast to other approaches, we expand the network cell by cell. Empirical evaluations on synthetic and real-world networks confirm the efficiency and the effectiveness of our index-based reachability query solution.
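
Option (1) in the abstract, a Dijkstra variant that supports time-dependent edge costs, can be sketched compactly: each edge carries a function from departure time to traversal cost, and the search settles nodes by earliest arrival. The graph encoding and the toy cost functions are assumptions, and the sketch is the slow index-free baseline, not the paper's cell-based reachability index.

```python
import heapq

def time_dependent_dijkstra(graph, source, start_time):
    """Earliest arrival times from source when departing at start_time.

    graph[u] = list of (v, cost_fn) where cost_fn(departure_time) -> traversal cost.
    """
    arrival = {source: start_time}
    heap = [(start_time, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, float("inf")):
            continue                                  # stale queue entry
        for v, cost_fn in graph.get(u, []):
            t_v = t + cost_fn(t)                      # cost depends on the departure time
            if t_v < arrival.get(v, float("inf")):
                arrival[v] = t_v
                heapq.heappush(heap, (t_v, v))
    return arrival

# Toy network: the cost of edge (A, B) doubles during a "peak" window.
graph = {
    "A": [("B", lambda t: 10 if 8 <= t < 9 else 5), ("C", lambda t: 3)],
    "C": [("B", lambda t: 4)],
}
print(time_dependent_dijkstra(graph, "A", start_time=8))   # {'A': 8, 'C': 11, 'B': 15}
```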


Tobias Skovgaard Jepsen, Christian S. Jensen, Thomas Dyhre Nielsen, "UniTE - The Best of Both Worlds - Unifying Function-Fitting and Aggregation-Based Approaches to Travel Time and Travel Speed Estimation." in Transactions on Spatial Algorithms and Systems, 2022

Travel time and speed estimation are part of many intelligent transportation applications. Existing estimation approaches rely on either function fitting or data aggregation and represent different tradeoffs between generalizability and accuracy. Function-fitting approaches learn functions that map feature vectors of, e.g., routes to travel time or speed estimates, which enables generalization to unseen routes. However, mapping functions are imperfect and offer poor accuracy in practice. Aggregation-based approaches instead form estimates by aggregating historical data, e.g., traversal data for routes. This enables very high accuracy given sufficient data. However, they rely on simplistic heuristics when insufficient data is available, yielding poor generalizability. We present a Unifying approach to Travel time and speed Estimation (UniTE) that combines function-fitting and aggregation-based approaches into a unified framework that aims to achieve the generalizability of function-fitting approaches and the accuracy of aggregation-based approaches when data is available. We demonstrate empirically that an instance of UniTE can improve the accuracies of travel speed and travel time estimation by 40–64% and 3–23%, respectively, compared to using only function fitting or data aggregation.
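
The unification idea can be illustrated with a deliberately simple hybrid estimator: aggregate historical traversals when enough of them exist, and otherwise fall back to a fitted function over route features. The observation threshold, the linear feature model, and the route key are assumptions, not UniTE's actual framework.

```python
import numpy as np

def hybrid_travel_time(route_key, features, history, model, min_obs=30):
    """Estimate travel time for a route.

    history: dict mapping route_key -> list of observed travel times
    model:   callable mapping a feature vector to a travel-time estimate (function fitting)
    """
    obs = history.get(route_key, [])
    if len(obs) >= min_obs:
        return float(np.mean(obs))        # enough data: aggregation-based estimate
    return float(model(features))         # sparse data: generalize via the fitted function

# Usage with a made-up linear model over [length_km, n_intersections].
model = lambda f: 60.0 * f[0] / 40.0 + 0.2 * f[1]   # minutes, assuming 40 km/h free flow
print(hybrid_travel_time("r42", np.array([5.0, 7.0]), {"r42": [8.1, 7.9]}, model))
```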


Sean Bin Yang, Chenjuan Guo, Jilin Hu, Bin Yang, Jian Tang, Christian S. Jensen, "Weakly-supervised Temporal Path Representation Learning with Contrastive Curriculum Learning"

In step with the digitalization of transportation, we are witnessing a growing range of path-based smart-city applications, e.g., travel-time estimation and travel path ranking. A temporal path (TP) that includes temporal information, e.g., departure time, into the path is fundamental to enabling such applications. In this setting, it is essential to learn generic temporal path representations (TPRs) that consider spatial and temporal correlations simultaneously and that can be used in different applications, i.e., downstream tasks. Existing methods fail to achieve the goal since (i) supervised methods require large amounts of task-specific labels when training and thus fail to generalize the obtained TPRs to other tasks; (ii) though unsupervised methods can learn generic representations, they disregard the temporal aspect, leading to sub-optimal results. To contend with the limitations of existing solutions, we propose a Weakly-Supervised Contrastive learning model. We first propose a temporal path encoder that encodes both the spatial and temporal information of a temporal path into a TPR. To train the encoder, we introduce weak labels that are easy and inexpensive to obtain, and are relevant to different tasks, e.g., temporal labels indicating peak vs. off-peak hours from departure times. Based on the weak labels, we construct meaningful positive and negative temporal path samples by considering both spatial and temporal information, which facilitates training the encoder using contrastive learning by pulling closer the positive samples' representations while pushing away the negative samples' representations. To better guide the contrastive learning, we propose a learning strategy based on Curriculum Learning such that learning proceeds from easy to hard training instances. Experimental studies involving three downstream tasks, i.e., travel time estimation, path ranking, and path recommendation, on three road networks offer strong evidence that the proposal is superior to state-of-the-art unsupervised and supervised methods and that it can be used as a pre-training approach to enhance supervised TPR learning.


Sean Bin Yang, Chenjuan Guo, Jilin Hu, Bin Yang, Jian Tang, Christian S. Jensen, "Weakly-supervised Temporal Path Representation Learning with Contrastive Curriculum Learning - Extended Version." in arXiv, 2022

In step with the digitalization of transportation, we are witnessing a growing range of path-based smart-city applications, e.g., travel-time estimation and travel path ranking. A temporal path (TP) that includes temporal information, e.g., departure time, into the path is fundamental to enable such applications. In this setting, it is essential to learn generic temporal path representations (TPRs) that consider spatial and temporal correlations simultaneously and that can be used in different applications, i.e., downstream tasks. Existing methods fail to achieve the goal since (i) supervised methods require large amounts of task-specific labels when training and thus fail to generalize the obtained TPRs to other tasks; (ii) though unsupervised methods can learn generic representations, they disregard the temporal aspect, leading to sub-optimal results. To contend with the limitations of existing solutions, we propose a Weakly-Supervised Contrastive (WSC) learning model. We first propose a temporal path encoder that encodes both the spatial and temporal information of a temporal path into a TPR. To train the encoder, we introduce weak labels that are easy and inexpensive to obtain and are relevant to different tasks, e.g., temporal labels indicating peak vs. off-peak hours from departure times. Based on the weak labels, we construct meaningful positive and negative temporal path samples by considering both spatial and temporal information, which facilitates training the encoder using contrastive learning by pulling closer the positive samples' representations while pushing away the negative samples' representations. To better guide contrastive learning, we propose a learning strategy based on Curriculum Learning such that learning proceeds from easy to hard training instances. Experimental studies verify the effectiveness of the proposed method.


Jingyi Wan, Yongyong Gao, Yong Ma, Kai Huang, Xiaofang Zhou, Christian S. Jensen, Bolong Zheng, "Workload-Aware Shortest Path Distance Querying in Road Networks" in 38th IEEE International Conference on Data Engineering, ICDE 2022, 2022

Computing shortest-path distances in road networks is core functionality in a range of applications. To enable the efficient computation of such distance queries, existing proposals frequently apply 2-hop labeling that constructs a label for each vertex and enables the computation of a query by performing only a linear scan of labels. However, few proposals take into account the spatio-temporal characteristics of query workloads. We observe that real-world workloads exhibit (1) spatial skew, meaning that only a small subset of vertices are queried frequently, and (2) temporal locality, meaning that adjacent time intervals have similar query distributions. We propose a Workload-aware Core-Forest label index (WCF) to exploit spatial skew in workloads. In addition, we develop a Reinforcement Learning based Time Interval Partitioning (RL-TIP) algorithm that exploits temporal locality to partition workloads to achieve further performance improvements. Extensive experiments with real-world data offer insights into the performance of the proposals, showing that they achieve 62% speedup on average for query processing with less preprocessing time and space overhead when compared with the state-of-the-art proposals.
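
The 2-hop labeling that the abstract builds on admits a very small sketch: each vertex stores a label mapping selected hub vertices to their distances, and a distance query scans the two labels and returns the minimum of L[u][h] + L[v][h] over shared hubs. Label construction, and in particular the paper's workload-aware core-forest design, is omitted; the toy labels below are assumed to be correct.

```python
def hop2_distance(labels, u, v):
    """Shortest-path distance from 2-hop labels: min over hubs shared by both labels."""
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):                # scan the smaller label, probe the larger one
        lu, lv = lv, lu
    best = float("inf")
    for hub, d_u in lu.items():
        d_v = lv.get(hub)
        if d_v is not None:
            best = min(best, d_u + d_v)  # path through this common hub
    return best

# Toy labels over hubs {a, b}; every vertex also covers itself with distance 0.
labels = {
    "u": {"u": 0, "a": 2, "b": 5},
    "v": {"v": 0, "a": 4, "b": 1},
}
print(hop2_distance(labels, "u", "v"))   # 6, via hub a (2 + 4) or hub b (5 + 1)
```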


2021

Xinle Wu, Dalin Zhang, Chenjuan Guo, Chaoyang He, Bin Yang, Christian S. Jensen, "AutoCTS: Automated Correlated Time Series Forecasting" in Proceedings of the VLDB Endowment, 2021

Correlated time series (CTS) forecasting plays an essential role in many cyber-physical systems, where multiple sensors emit time series that capture interconnected processes. Solutions based on deep learning that deliver state-of-the-art CTS forecasting performance employ a variety of spatio-temporal (ST) blocks that are able to model temporal dependencies and spatial correlations among time series. However, two challenges remain. First, ST-blocks are designed manually, which is time consuming and costly. Second, existing forecasting models simply stack the same ST-blocks multiple times, which limits the model potential. To address these challenges, we propose AutoCTS that is able to automatically identify highly competitive ST-blocks as well as forecasting models with heterogeneous ST-blocks connected using diverse topologies, as opposed to the same ST-blocks connected using simple stacking. Specifically, we design both a micro and a macro search space to model possible architectures of ST-blocks and the connections among heterogeneous ST-blocks, and we provide a search strategy that is able to jointly explore the search spaces to identify optimal forecasting models. Extensive experiments on eight commonly used CTS forecasting benchmark datasets justify our design choices and demonstrate that AutoCTS is capable of automatically discovering forecasting models that outperform state-of-the-art human-designed models.


Xinle Wu, Dalin Zhang, Chenjuan Guo, Chaoyang He, Bin Yang, Christian S. Jensen ,"AutoCTS - Automated Correlated Time Series Forecasting - Extended Version."




Christian S. Jensen (Editor), Ee-Peng Lim (Editor), De-Nian Yang (Editor), Wang-Chien Lee (Editor), Vincent S. Tseng (Editor), Vana Kalogeraki (Editor), Jen-Wei Huang (Editor), Chih-Ya Shen (Editor) ,"Database Systems for Advanced Applications: 26th International Conference, DASFAA 2021, Taipei, Taiwan, April 11-14, 2021, Proceedings, Part II" in 26th International Conference, DASFAA 2021, 2021

The three-volume set LNCS 12681-12683 constitutes the proceedings of the 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021, held in Taipei, Taiwan, in April 2021. The total of 156 papers presented in this three-volume set was carefully reviewed and selected from 490 submissions. The topic areas of the selected papers include information retrieval, search and recommendation techniques; RDF, knowledge graphs, semantic web, and knowledge management; and spatial, temporal, sequence, and streaming data management, while the dominant keywords are network, recommendation, graph, learning, and model. These topic areas and keywords shed light on the directions in which DASFAA research is moving. Due to the COVID-19 pandemic, the event was held virtually.


Christian S. Jensen (Editor), Ee-Peng Lim (Editor), De-Nian Yang (Editor), Wang-Chien Lee (Editor), Vincent S. Tseng (Editor), Vana Kalogeraki (Editor), Jen-Wei Huang (Editor), Chih-Ya Shen (Editor) ,"Database Systems for Advanced Applications: 26th International Conference, DASFAA 2021, Taipei, Taiwan, April 11-14, 2021, Proceedings, Part I" in 26th International Conference, DASFAA 2021, 2021

The three-volume set LNCS 12681-12683 constitutes the proceedings of the 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021, held in Taipei, Taiwan, in April 2021. The total of 156 papers presented in this three-volume set was carefully reviewed and selected from 490 submissions. The topic areas of the selected papers include information retrieval, search and recommendation techniques; RDF, knowledge graphs, semantic web, and knowledge management; and spatial, temporal, sequence, and streaming data management, while the dominant keywords are network, recommendation, graph, learning, and model. These topic areas and keywords shed light on the directions in which DASFAA research is moving. Due to the COVID-19 pandemic, the event was held virtually.


Christian S. Jensen (Editor), Ee-Peng Lim (Editor), De-Nian Yang (Editor), Wang-Chien Lee (Editor), Vincent S. Tseng (Editor), Vana Kalogeraki (Editor), Jen-Wei Huang (Editor), Chih-Ya Shen (Editor) ,"Database Systems for Advanced Applications: 26th International Conference, DASFAA 2021, Taipei, Taiwan, April 11-14, 2021, Proceedings, Part III" in 26th International Conference, DASFAA 2021, 2021

The three-volume set LNCS 12681-12683 constitutes the proceedings of the 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021, held in Taipei, Taiwan, in April 2021. The total of 156 papers presented in this three-volume set was carefully reviewed and selected from 490 submissions. The topic areas of the selected papers include information retrieval, search and recommendation techniques; RDF, knowledge graphs, semantic web, and knowledge management; and spatial, temporal, sequence, and streaming data management, while the dominant keywords are network, recommendation, graph, learning, and model. These topic areas and keywords shed light on the directions in which DASFAA research is moving. Due to the COVID-19 pandemic, the event was held virtually.


Christian S. Jensen (Editor), Ee-Peng Lim (Editor), De-Nian Yang (Editor), Chia-Hui Chang (Editor), Jianliang Xu (Editor), Wen-Chih Peng (Editor), Jen-Wei Huang (Editor), Chih-Ya Shen (Editor) ,"Database Systems for Advanced Applications. DASFAA 2021 International Workshops: BDQM, GDMA, MLDLDSA, MobiSocial, and MUST, Taipei, Taiwan, April 11-14, 2021, Proceedings" in 26th International Conference, DASFAA 2021, 2021

This volume constitutes the papers of several workshops which were held in conjunction with the 26th International Conference on Database Systems for Advanced Applications, DASFAA 2021, held in Taipei, Taiwan, in April 2021. The 29 revised full papers presented in this book were carefully reviewed and selected from 84 submissions. DASFAA 2021 presents the following five workshops: the 6th International Workshop on Big Data Quality Management (BDQM 2021); the 5th International Workshop on Graph Data Management and Analysis (GDMA 2021); the First International Workshop on Machine Learning and Deep Learning for Data Security Applications (MLDLDSA 2021); the 6th International Workshop on Mobile Data Management, Mining, and Computing on Social Network (MobiSocial 2021); and the 2021 International Workshop on Mobile Ubiquitous Systems and Technologies (MUST 2021). Due to the COVID-19 pandemic, the event was held virtually.


Ziquan Fang, Lu Chen, Yunjun Gao, Lu Pan, Christian S. Jensen (corresponding author) ,"Dragoon: a hybrid and efficient big trajectory management system for offline and online analytics" in VLDB Journal, 2021

With the explosive use of GPS-enabled devices, increasingly massive volumes of trajectory data capturing the movements of people and vehicles are becoming available, which is useful in many application areas, such as transportation, traffic management, and location-based services. As a result, many trajectory data management and analytic systems have emerged that target either offline or online settings. However, some applications call for both offline and online analyses. For example, in traffic management scenarios, offline analyses of historical trajectory data can be used for traffic planning purposes, while online analyses of streaming trajectories can be adopted for congestion monitoring purposes. Existing trajectory-based systems tend to perform offline and online trajectory analysis separately, which is inefficient. In this paper, we propose a hybrid and efficient framework, called Dragoon, based on Spark, to support both offline and online big trajectory management and analytics. The framework features a mutable resilient distributed dataset model, including RDD Share, RDD Update, and RDD Mirror, which enables hybrid storage of historical and streaming trajectories. It also contains a real-time partitioner capable of efficiently distributing trajectory data and supporting both offline and online analyses. Therefore, Dragoon provides a hybrid analysis pipeline. Support for several typical trajectory queries and mining tasks demonstrates the flexibility of Dragoon. An extensive experimental study using both real and synthetic trajectory datasets shows that Dragoon (1) offers offline trajectory query performance similar to that of the state-of-the-art system UlTraMan; (2) decreases storage overhead during trajectory editing by up to a factor of two compared with UlTraMan; (3) achieves at least a 40% improvement in scalability compared with popular stream processing frameworks (i.e., Flink and Spark Streaming); and (4) offers on average a twofold performance improvement for online trajectory data analytics.


Tianyi Li, Lu Chen, Christian S. Jensen, Torben Bach Pedersen, Jilin Hu ,"Evolutionary Clustering of Streaming Trajectories."




Yan Zhao, Kai Zheng, Jiannan Guo, Bin Yang, Torben Bach Pedersen, Christian S. Jensen (corresponding author) ,"Fairness-aware task assignment in spatial crowdsourcing: Game-theoretic approaches" in 37th IEEE International Conference on Data Engineering, ICDE 2021, 2021

The widespread diffusion of smartphones offers a capable foundation for the deployment of Spatial Crowdsourcing (SC), where mobile users, called workers, perform location-dependent tasks assigned to them. A key issue in SC is how best to assign tasks, e.g., the delivery of food and packages, to appropriate workers. Specifically, we study the problem of Fairness-aware Task Assignment (FTA) in SC, where tasks are to be assigned in a manner that achieves some notion of fairness across workers. In particular, we aim to minimize the payoff difference among workers while maximizing the average worker payoff. To solve the problem, we first generate so-called Valid Delivery Point Sets (VDPSs) for each worker according to an approach that exploits dynamic programming and distance-constrained pruning. Next, we show that FTA is NP-hard and proceed to propose two heuristic algorithms, a Fairness-aware Game-Theoretic (FGT) algorithm and an Improved Evolutionary Game-Theoretic (IEGT) algorithm. More specifically, we formulate FTA as a multi-player game. In this setting, the FGT approach represents a best-response method with sequential and asynchronous updates of workers' strategies, given by the VDPSs, that achieves a satisfying task assignment when a pure Nash equilibrium is reached. Next, the IEGT approach considers a setting with a large population of workers that repeatedly engage in strategic interactions. The IEGT approach exploits replicator dynamics that cause the whole population to evolve and choose better resources, i.e., VDPSs. Using the property of evolutionary equilibrium, a satisfying task assignment is obtained that corresponds to a stable state with similar payoffs among workers and good average worker payoff. Extensive experiments offer insight into the effectiveness and efficiency of the proposed solutions.


Zhida Chen, Lisi Chen, Gao Cong, Christian S. Jensen (corresponding author) ,"Location- and keyword-based querying of geo-textual data: a survey" in VLDB Journal, 2021

With the broad adoption of mobile devices, notably smartphones, keyword-based search for content has seen increasing use by mobile users, who are often interested in content related to their geographical location. We have also witnessed a proliferation of geo-textual content that encompasses both textual and geographical information. Examples include geo-tagged microblog posts, yellow pages, and web pages related to entities with physical locations. Over the past decade, substantial research has been conducted on integrating location into keyword-based querying of geo-textual content in settings where the underlying data is assumed to be either relatively static or is assumed to stream into a system that maintains a set of continuous queries. This paper offers a survey of both the research problems studied and the solutions proposed in these two settings. As such, it aims to offer the reader a first understanding of key concepts and techniques, and it serves as an “index” for researchers who are interested in exploring the concepts and techniques underlying proposed solutions to the querying of geo-textual data.


Bolong Zheng, Xi Zhao, Lianggui Weng, Nguyen Quoc Viet Hung, Hang Liu, Christian S. Jensen ,"PM-LSH - a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search."




Zhe Li, Tsz Nam Chan, Man Lung Yiu, Christian S. Jensen ,"PolyFit: Polynomial-based indexing approach for fast approximate range aggregate queries" in Advances in Database Technology - 24th International Conference on Extending Database Technology, EDBT 2021, 2021

Range aggregate queries find frequent application in data analytics. In many use cases, approximate results are preferred over accurate results if they can be computed rapidly and satisfy approximation guarantees. Inspired by a recent indexing approach, we provide means of representing a discrete point dataset by continuous functions that can then serve as compact index structures. More specifically, we develop a polynomial-based indexing approach, called PolyFit, for processing approximate range aggregate queries. PolyFit is capable of supporting multiple types of range aggregate queries, including COUNT, SUM, MIN and MAX aggregates, with guaranteed absolute and relative error bounds. Experimental results show that PolyFit is faster and more accurate and compact than existing learned index structures.
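
As a rough, one-dimensional analogue of the idea of replacing discrete data with a fitted continuous function, the sketch below fits a polynomial to an empirical cumulative count and answers approximate COUNT range queries by evaluating it at the range endpoints. It does not implement PolyFit's multi-aggregate support or its error guarantees; the degree and data are arbitrary.

```python
import numpy as np

def build_count_index(keys, degree=4):
    """Fit a polynomial to the cumulative count function of sorted keys."""
    keys = np.sort(np.asarray(keys, dtype=float))
    ranks = np.arange(1, len(keys) + 1, dtype=float)
    coeffs = np.polyfit(keys, ranks, degree)   # least-squares fit
    return np.poly1d(coeffs)

def approx_count(poly, lo, hi):
    """Approximate COUNT of keys in [lo, hi] via two polynomial evaluations."""
    return max(0.0, poly(hi) - poly(lo))

rng = np.random.default_rng(1)
keys = rng.normal(loc=50.0, scale=10.0, size=100_000)
poly = build_count_index(keys)
print("approx:", round(approx_count(poly, 40.0, 60.0)))
print("exact: ", int(((keys >= 40.0) & (keys <= 60.0)).sum()))
```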


Bolong Zheng, Lianggui Weng, Xi Zhao, Kai Zeng, Xiaofang Zhou, Christian S. Jensen ,"REPOSE: Distributed Top-k Trajectory Similarity Search with Local Reference Point Tries"

Trajectory similarity computation is a fundamental component in a variety of real-world applications, such as ridesharing, road planning, and transportation optimization. Recent advances in mobile devices have enabled an unprecedented increase in the amount of available trajectory data such that efficient query processing can no longer be supported by a single machine. As a result, means of performing distributed in-memory trajectory similarity search are called for. However, existing distributed proposals either suffer from computing resource waste or are unable to support the range of similarity measures that are being used. We propose a distributed in-memory management framework called REPOSE for processing top-k trajectory similarity queries on Spark. We develop a reference point trie (RP-Trie) index to organize trajectory data for local search. In addition, we design a novel heterogeneous global partitioning strategy to eliminate load imbalance in distributed settings. We report on extensive experiments with real-world data that offer insight into the performance of the solution, and show that the solution is capable of outperforming the state-of-the-art proposals.


Bolong Zheng, Lianggui Weng, Xi Zhao, Kai Zeng, Xiaofang Zhou, Christian S. Jensen ,"REPOSE: Distributed top-k trajectory similarity search with local reference point tries" in 37th IEEE International Conference on Data Engineering, ICDE 2021, 2021

Trajectory similarity computation is a fundamental component in a variety of real-world applications, such as ridesharing, road planning, and transportation optimization. Recent advances in mobile devices have enabled an unprecedented increase in the amount of available trajectory data such that efficient query processing can no longer be supported by a single machine. As a result, means of performing distributed in-memory trajectory similarity search are called for. However, existing distributed proposals either suffer from computing resource waste or are unable to support the range of similarity measures that are being used. We propose a distributed in-memory management framework called REPOSE for processing top-k trajectory similarity queries on Spark. We develop a reference point trie (RP-Trie) index to organize trajectory data for local search. In addition, we design a novel heterogeneous global partitioning strategy to eliminate load imbalance in distributed settings. We report on extensive experiments with real-world data that offer insight into the performance of the solution, and show that the solution is capable of outperforming the state-of-the-art proposals.


Qi Hu, Lingfeng Ming, Ruijie Xi, Lu Chen, Christian S. Jensen, Bolong Zheng (corresponding author) ,"SOUP: A fleet management system for passenger demand prediction and competitive taxi supply" in 37th IEEE International Conference on Data Engineering, ICDE 2021, 2021

Online car-hailing services have gained substantial popularity. An effective taxi fleet management strategy should not only increase taxi utilization by reducing taxi idle time, but should also improve passenger satisfaction by minimizing passenger waiting time. We demonstrate a fleet management system called SOUP that aims at minimizing taxi idle time and that monitors the fleet movement status. SOUP includes a passenger request prediction model called ST-GCSL that predicts the number of requests in the near future, and it includes a demand-aware route planning algorithm called DROP that provides idle taxis with search routes to serve potential requests. In addition, SOUP supports visualizing and analyzing historical passenger requests, simulating fleet movement, and computing evaluation metrics. We demonstrate how SOUP accurately predicts passenger demand and significantly reduces taxi idle time.


Lei Bi, Juan Cao, Guohui Li, Nguyen Quoc Viet Hung, Christian S. Jensen, Bolong Zheng (corresponding author) ,"SpeakNav: A voice-based navigation system via route description language understanding" in 37th IEEE International Conference on Data Engineering, ICDE 2021, 2021

Many navigation applications take natural language speech as input, which frees users from typing with their hands and decreases the occurrence of traffic accidents. We propose the SpeakNav navigation system that enables users to describe intended routes via speech and supports clue-based route retrieval. SpeakNav includes a route description language understanding model for determining POIs and distances along expected routes, and it includes an efficient algorithm to compute desired routes. In addition, SpeakNav supports basic POI and location search and location-based route navigation. We demonstrate how SpeakNav accurately recognizes users' intentions and recommends appropriate routes in real application scenarios.


Bolong Zheng, Lei Bi, Juan Cao, Hua Chai, Jun Fang, Lu Chen, Yunjun Gao, Xiaofang Zhou, Christian S. Jensen ,"Speaknav: Voice-based route description language understanding for template-driven path search" in 47th International Conference on Very Large Data Bases, VLDB 2021, 2021

Many navigation applications take natural language speech as input, which avoids users typing in words and thus improves traffic safety. However, navigation applications often fail to understand a user’s free-form description of a route. In addition, they only support input of a specific source or destination, which does not enable users to specify additional route requirements. We propose a SpeakNav framework that enables users to describe intended routes via speech and then recommends appropriate routes. Specifically, we propose a novel Route Template based Bidirectional Encoder Representation from Transformers (RT-BERT) model that supports the understanding of natural language route descriptions. The model enables extraction of information of intended POI keywords and related distances. Then we formalize a template-driven path query that uses the extracted information. To enable efficient query processing, we develop a hybrid label index for computing network distances between POIs, and we propose a branch-and-bound algorithm along with a pivot reverse B-tree (PB-tree) index. Experiments with real and synthetic data indicate that RT-BERT offers high accuracy and that the proposed algorithm is capable of outperforming baseline algorithms.


Ziquan Fang, Yuntao Du, Xinjun Zhu, Lu Chen, Yunjun Gao, Christian S. Jensen ,"ST2Vec - Spatio-Temporal Trajectory Similarity Learning in Road Networks."




Hao Huang, Qian Yan, Lu Chen, Yunjun Gao, Christian S. Jensen ,"Statistical Inference of Diffusion Networks" in IEEE Transactions on Knowledge and Data Engineering, 2021

To infer structures in diffusion networks, existing approaches mostly need to know not only the final infection statuses of network nodes, but also the exact times when infections occur. In contrast, in many real-world settings, such as disease propagation, monitoring exact infection times is often infeasible due to a high cost. We investigate the problem of how to learn diffusion network structures based on only the final infection statuses of nodes. Instead of utilizing sequences of timestamps to determine potential parent-child influence relationships between nodes, we propose to find influence relationships with high statistical significance. To this end, we design a probabilistic generative model of the final infection statuses to quantitatively measure the likelihood of potential structures of the objective diffusion network, taking into account network complexity. Based on this model, we can infer an appropriate number of most probable parent nodes for each node in the network. Furthermore, to reduce redundant inference computations, we are able to preclude insignificant candidate parent nodes from being considered during inference, if their infections have little correlation with the infections of the corresponding child nodes. Extensive experiments on both synthetic and real-world networks offer evidence that the proposed approach is effective and efficient.


Tianyi Li, Lu Chen, Christian S. Jensen, Torben Bach Pedersen ,"TRACE: Real-time Compression of Streaming Trajectories in Road Networks" in Proceedings of the VLDB Endowment, 2021

The deployment of vehicle location services generates increasingly massive vehicle trajectory data, which incurs high storage and transmission costs. A range of studies target offline compression to reduce the storage cost. However, to enable online services such as real-time traffic monitoring, it is attractive to also reduce transmission costs by being able to compress streaming trajectories in real-time. Hence, we propose a framework called TRACE that enables compression, transmission, and querying of network-constrained streaming trajectories in a fully online fashion. We propose a compact two-stage representation of streaming trajectories: a speed-based representation removes redundant information, and a multiple-reference-based referential representation exploits subtrajectory similarities. In addition, the online referential representation is extended with reference selection, deletion, and rewriting functions that further improve the compression performance. An efficient data transmission scheme is provided for achieving low transmission overhead. Finally, indexing and filtering techniques support efficient real-time range queries over compressed trajectories. Extensive experiments with real-life and synthetic datasets evaluate the different parts of TRACE, offering evidence that it is able to outperform the existing representative methods in terms of both compression ratio and transmission cost.


David Campos, Tung Kieu, Chenjuan Guo, Feiteng Huang, Kai Zheng, Bin Yang, Christian S. Jensen ,"Unsupervised Time Series Outlier Detection with Diversity-Driven Convolutional Ensembles." in Proceedings of the VLDB Endowment, 2021

With the sweeping digitalization of societal, medical, industrial, and scientific processes, sensing technologies are being deployed that produce increasing volumes of time series data, thus fueling a plethora of new or improved applications. In this setting, outlier detection is frequently important, and while solutions based on neural networks exist, they leave room for improvement in terms of both accuracy and efficiency. With the objective of achieving such improvements, we propose a diversity-driven, convolutional ensemble. To improve accuracy, the ensemble employs multiple basic outlier detection models built on convolutional sequence-to-sequence autoencoders that can capture temporal dependencies in time series. Further, a novel diversity-driven training method maintains diversity among the basic models, with the aim of improving the ensemble's accuracy. To improve efficiency, the approach enables a high degree of parallelism during training. In addition, it is able to transfer some model parameters from one basic model to another, which reduces training time. We report on extensive experiments using real-world multivariate time series that offer insight into the design choices underlying the new approach and offer evidence that it is capable of improved accuracy and efficiency.


David Campos, Tung Kieu, Chenjuan Guo, Feiteng Huang, Kai Zheng, Bin Yang, Christian S. Jensen ,"Unsupervised Time Series Outlier Detection with Diversity-Driven Convolutional Ensembles - Extended Version."




2020 Top

Zhong Yang, Bolong Zheng, Guohui Li, Zhao Xi, Xiaofang Zhou, Christian S. Jensen ,"Adaptive Top-k Overlap Set Similarity Joins" in 36th IEEE International Conference on Data Engineering, 2020

The set similarity join (SSJ) is core functionality in a range of applications, including data cleaning, near-duplicate object detection, and data integration. Threshold-based SSJ queries return all pairs of sets with similarity no smaller than a given threshold. As results, and their utility, are very sensitive to the choice of threshold value, it is problematic that choosing an appropriate value is difficult. Doing so requires prior knowledge of the data, which users often do not have. To avoid this problem, we propose a solution to the top-k overlap set similarity join (TkOSSJ) that returns k pairs of sets with the highest overlap similarities. The state-of-the-art solution disregards the effect of the so-called step size, which is the number of elements accessed in each iteration of the algorithm. This affects its performance negatively. To address this issue, we first propose an algorithm that uses a fixed step size, thus taking advantage of the benefits of a large step size, and then we present an adaptive step size algorithm that is capable of automatically adjusting the step size, thus reducing redundant computations. An extensive empirical study offers insight into the new algorithms and indicates that they are capable of outperforming the state-of-the-art method on real, large-scale data sets.
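
To make the query semantics concrete, the following brute-force sketch returns the k set pairs with the largest overlaps using a min-heap. It conveys what a TkOSSJ query computes, but none of the step-size or pruning machinery of the proposed algorithms.

```python
import heapq
from itertools import combinations

def topk_overlap_join(sets, k):
    """Return the k pairs of sets with the largest overlap (brute force)."""
    heap = []  # min-heap of (overlap, (i, j))
    for (i, a), (j, b) in combinations(enumerate(sets), 2):
        overlap = len(a & b)
        item = (overlap, (i, j))
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif overlap > heap[0][0]:
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)

# Toy records represented as element sets.
records = [
    {"a", "b", "c", "d"},
    {"b", "c", "d", "e"},
    {"a", "e", "f"},
    {"c", "d", "e", "f"},
]
print(topk_overlap_join(records, k=2))  # the two pairs with overlap 3
```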


Simon Aagaard Pedersen, Bin Yang, Christian S. Jensen ,"A Hybrid Learning Approach to Stochastic Routing" in International Conference on Data Engineering, 2020

Increasingly available trajectory data enables detailed capture of traffic conditions. We consider an uncertain road network graph, where each graph edge is associated with a travel time distribution, and we study probabilistic budget routing that aims to find the path with the highest probability of arriving within a given time budget. In this setting, a fundamental operation is to compute the travel cost distribution of a path from the cost distributions of the edges in the path. Solutions that rely on convolution generally assume independence among the edges' distributions, which often does not hold and thus incurs poor accuracy. We propose a hybrid approach that combines convolution and machine learning-based estimation to take into account dependencies among distributions in order to improve accuracy. Next, we propose an efficient routing algorithm that is able to utilize the hybrid approach and that features effective pruning techniques to enable faster routing. Empirical studies on a substantial real-world trajectory set offer insight into the properties of the proposed solution, indicating that it is promising.
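
To illustrate the convolution step mentioned above, the sketch below convolves two discretized edge travel-time histograms under the independence assumption and evaluates the probability of arriving within a budget. The hybrid, learning-based estimation for dependent edges is not modeled, and the histograms are made up.

```python
import numpy as np

def convolve_travel_times(p_edge1, p_edge2):
    """Convolve two discrete travel-time distributions (1-minute bins).

    Index i of each array is the probability that traversal takes i minutes.
    Assumes independence, which is exactly the assumption the hybrid
    approach relaxes for dependent edges.
    """
    return np.convolve(p_edge1, p_edge2)

def prob_within_budget(path_dist, budget_minutes):
    """P(total travel time <= budget)."""
    return float(path_dist[: budget_minutes + 1].sum())

# Hypothetical per-edge histograms: edge 1 takes 2-4 min, edge 2 takes 3-5 min.
edge1 = np.array([0.0, 0.0, 0.5, 0.3, 0.2])
edge2 = np.array([0.0, 0.0, 0.0, 0.6, 0.3, 0.1])
path = convolve_travel_times(edge1, edge2)
print(prob_within_budget(path, budget_minutes=7))  # -> 0.89
```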


Bezaye Tesfaye, Nikolaus Augsten, Mateusz Pawlik, Michael Hanspeter Böhlen, Christian S. Jensen ,"An Efficient Index for Reachability Queries in Public Transport Networks" in European Conference on Advances in Databases and Information Systems 2020, 2020

Computing path queries such as the shortest path in public transport networks is challenging because the path costs between nodes change over time. A reachability query from a node at a given start time on such a network retrieves all points of interest (POIs) that are reachable within a given cost budget. Reachability queries are essential building blocks in many applications, for example, group recommendations, ranking spatial queries, or geomarketing. We propose an efficient solution for reachability queries in public transport networks. Currently, there are two options to solve reachability queries. (1) Execute a modified version of Dijkstra’s algorithm that supports time-dependent edge traversal costs; this solution is slow since it must expand edge by edge and does not use an index. (2) Issue a separate path query for each single POI, i.e., a single reachability query requires answering many path queries. None of these solutions scales to large networks with many POIs. We propose a novel and lightweight reachability index. The key idea is to partition the network into cells. Then, in contrast to other approaches, we expand the network cell by cell. Empirical evaluations on synthetic and real-world networks confirm the efficiency and the effectiveness of our index-based reachability query solution.
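
For orientation, the sketch below answers a reachability query with the baseline edge-by-edge expansion the paper improves on: a Dijkstra-style search that stops at the cost budget and reports the reachable POIs. Static edge costs are used for brevity, whereas the paper handles time-dependent costs, and the cell-based index is not modeled.

```python
import heapq

def reachable_pois(graph, source, budget, pois):
    """Return POIs reachable from `source` within `budget` total cost.

    `graph` maps node -> list of (neighbor, cost).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")) or d > budget:
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd <= budget and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return {p: dist[p] for p in pois if p in dist and dist[p] <= budget}

# Toy network with hypothetical costs; POIs are nodes "c" and "d".
network = {"a": [("b", 5), ("c", 12)], "b": [("c", 4), ("d", 10)], "c": [("d", 3)]}
print(reachable_pois(network, source="a", budget=12, pois={"c", "d"}))
```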


Bolong Zheng, Kai Zheng, Christian S. Jensen, Quoc Viet Hung Nguyen, Han Su, Guohui Li, Xiaofang Zhou ,"Answering Why-Not Group Spatial Keyword Queries" in IEEE Transactions on Knowledge and Data Engineering, 2020

With the proliferation of geo-textual objects on the web, extensive efforts have been devoted to improving the efficiency of top-k spatial keyword queries in different settings. However, comparatively much less work has been reported on enhancing the quality and usability of such queries. In this context, we propose means of enhancing the usability of a top-k group spatial keyword query, where a group of users aim to find k objects that contain given query keywords and are nearest to the users. Specifically, when users receive the result of such a query, they may find that one or more objects that they expect to be in the result are in fact missing, and they may wonder why. To address this situation, we develop a so-called why-not query that is able to minimally modify the original query into a query that returns the expected, but missing, objects, in addition to other objects. Specifically, we formalize the why-not query in relation to the top-k group spatial keyword query, called the Why-not Group Spatial Keyword Query (WGSK) that is able to provide a group of users with a more satisfactory query result. We propose a three-phase framework for efficiently computing the WGSK. The first phase substantially reduces the search space for the subsequent phases by retrieving a set of objects that may affect the ranking of the user-expected objects. The second phase provides an incremental sampling algorithm that generates candidate weightings of more promising queries. The third phase determines the penalty of each refined query and returns the query with minimal penalty, i.e., the minimally modified query. Extensive experiments with real and synthetic data offer evidence that the proposed solution excels over baselines with respect to both effectiveness and efficiency.


Simon Aagaard Pedersen, Bin Yang, Christian S. Jensen ,"Anytime Stochastic Routing with Hybrid Learning" in 2020 International Conference on Very Large Databases PhD Workshop, VLDB-PhD 2020, 2020

Increasingly massive volumes of vehicle trajectory data hold the potential to enable higher-resolution traffic services than hitherto possible. We use trajectory data to create a high-resolution, uncertain road-network graph, where edges are associated with travel-time distributions. In this setting, we study probabilistic budget routing that aims to find the path with the highest probability of arriving at a destination within a given time budget. A key challenge is to compute accurately and efficiently the travel-time distribution of a path from the travel-time distributions of the edges in the path. Existing solutions that rely on convolution assume independence among the distributions to be convolved, but as distributions are often dependent, the result distributions exhibit poor accuracy. We propose a hybrid approach that combines convolution with estimation based on machine learning to account for dependencies among distributions in order to improve accuracy. Since the hybrid approach cannot rely on the independence assumption that enables effective pruning during routing, naive use of the hybrid approach is costly. To address the resulting efficiency challenge, we propose an anytime routing algorithm that is able to return a “good enough” path at any time and that eventually computes a high-quality path. Empirical studies involving a substantial real-world trajectory set offer insight into the design properties of the proposed solution, indicating that it is practical in real-world settings.


Ziquan Fang, Yunjun Gao, Lu Pan, Lu Chen, Xiaoye Miao, Christian S. Jensen ,"CoMing: A Real-time Co-Movement Mining System for Streaming Trajectories" in ACM SIGMOD International Conference on Management of Data 2020, 2020

The aim of real-time co-movement pattern mining for streaming trajectories is to discover co-moving objects that satisfy specific spatio-temporal constraints in real time. This functionality serves a range of real-world applications, such as traffic monitoring and management. However, little work targets the visualization and interaction with such co-movement detection on streaming trajectories. To this end, we develop CoMing, a real-time co-movement pattern mining system, to handle streaming trajectories. CoMing leverages ICPE, a real-time distributed co-movement pattern detection framework, and thus inherits its capacity for good performance. This demonstration offers hands-on experience with CoMing's visual and user-friendly interface. Moreover, several applications in the traffic domain, including object monitoring and traffic statistics visualization, are also provided to users.


Tianyi Li, Ruikai Huang, Lu Chen, Christian S. Jensen, Torben Bach Pedersen ,"Compression of Uncertain Trajectories in Road Networks" in Proceedings of the VLDB Endowment, 2020

Massive volumes of uncertain trajectory data are being generated by GPS devices. Due to the limitations of GPS data, these trajectories are generally uncertain. This state of affairs renders it attractive to be able to compress uncertain trajectories and to query them efficiently without the need for (full) decompression. Unlike existing studies that target accurate trajectories, we propose a framework that accommodates uncertain trajectories in road networks. To address the large cardinality of instances of a single uncertain trajectory, we exploit the similarity between uncertain trajectory instances and provide a referential representation. First, we propose a reference selection algorithm based on the notion of Fine-grained Jaccard Distance to efficiently select trajectory instances as references. Then we provide referential representations of the different types of information contained in trajectories to achieve high compression ratios. In particular, a new compression scheme for temporal information is presented to take into account variations in sample intervals. Finally, we propose an index and develop filtering techniques to support efficient queries over compressed uncertain trajectories. Extensive experiments with real-life datasets offer insight into the properties of the framework and suggest that it is capable of outperforming the existing state-of-the-art method in terms of both compression ratio and efficiency.
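
The reference-selection step can be illustrated with a small sketch: among the instances of an uncertain trajectory (represented here simply as sets of road-network edge IDs), pick as reference the instance with the smallest total Jaccard distance to the others. This uses plain Jaccard distance rather than the paper's Fine-grained Jaccard Distance and omits the referential encoding itself.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two edge-ID sets."""
    union = a | b
    return 1.0 - (len(a & b) / len(union)) if union else 0.0

def select_reference(instances):
    """Pick the instance minimizing total Jaccard distance to all others."""
    best_idx, best_cost = None, float("inf")
    for i, inst in enumerate(instances):
        cost = sum(jaccard_distance(inst, other)
                   for j, other in enumerate(instances) if j != i)
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

# Three instances of one uncertain trajectory, as sets of edge IDs.
instances = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}]
print(select_reference(instances))  # -> 0 (instances 0 and 1 tie; the first wins)
```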


Chenjuan Guo, Bin Yang, Jilin Hu, Christian S. Jensen, Lu Chen (corresponding author) ,"Context-aware, preference-based vehicle routing" in VLDB Journal, 2020

Vehicle routing is an important service that is used by both private individuals and commercial enterprises. Drivers may have different contexts that are characterized by different routing preferences. For example, during different times of day or weather conditions, drivers may make different routing decisions such as preferring or avoiding highways. The increasing availability of vehicle trajectory data yields an increasingly rich data foundation for context-aware, preference-based vehicle routing. We aim to improve routing quality by providing new, efficient routing techniques that identify and take contexts and their preferences into account. In particular, we first provide means of learning contexts and their preferences, and we apply these to enhance routing quality while ensuring efficiency. Our solution encompasses an off-line phase that exploits a contextual preference tensor to learn the relationships between contexts and routing preferences. Given a particular context for which trajectories exist, we learn a routing preference. Then, we transfer learned preferences from contexts with trajectories to similar contexts without trajectories. In the on-line phase, given a context, we identify the corresponding routing preference and use it for routing. To achieve efficiency, we propose preference-based contraction hierarchies that are capable of speeding up both off-line learning and on-line routing. Empirical studies with vehicle trajectory data offer insight into the properties of the proposed solution, indicating that it is capable of improving quality and is efficient.


Christian S. Jensen ,"Editorial: Updates to the Editorial Board" in A C M Transactions on Database Systems, 2020




Jianzhong Qi, Guanli Liu, Christian S. Jensen, Lars Kulik ,"Effectively Learning Spatial Indices" in Proceedings of the VLDB Endowment, 2020

Machine learning, especially deep learning, is used increasingly to enable better solutions for data management tasks previously solved by other means, including database indexing. A recent study shows that a neural network can not only learn to predict the disk address of the data value associated with a one-dimensional search key but also outperform B-tree-based indexing, thus promising to speed up a broad range of database queries that rely on B-trees for efficient data access. We consider the problem of learning an index for two-dimensional spatial data. A direct application of a neural network is unattractive because there is no obvious ordering of spatial point data. Instead, we introduce a rank space based ordering technique to establish an ordering of point data and group the points into blocks for index learning. To enable scalability, we propose a recursive strategy that partitions a large point set and learns indices for each partition. Experiments on real and synthetic data sets with more than 100 million points show that our learned indices are highly effective and efficient. Query processing using our indices is more than an order of magnitude faster than the use of R-trees or a recently proposed learned index.
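
A rank space based ordering can be sketched as follows: replace each point's coordinates by their ranks, interleave the rank bits into a Z-order key, sort by that key, and group the sorted points into blocks for index learning. This is only one plausible illustration of the ordering step under these assumptions; the learned models and the recursive partitioning strategy are not shown.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of two rank values into a Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def rank_space_order(points, block_size=2):
    """Order 2D points by a Z-order curve over their coordinate ranks,
    then group them into fixed-size blocks for index learning."""
    xs = sorted(range(len(points)), key=lambda i: points[i][0])
    ys = sorted(range(len(points)), key=lambda i: points[i][1])
    x_rank = {i: r for r, i in enumerate(xs)}
    y_rank = {i: r for r, i in enumerate(ys)}
    order = sorted(range(len(points)),
                   key=lambda i: interleave_bits(x_rank[i], y_rank[i]))
    ordered = [points[i] for i in order]
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]

# Toy points; output is a list of blocks in rank-space Z-order.
pts = [(2.0, 3.0), (0.5, 9.0), (7.1, 1.2), (4.4, 6.6), (8.0, 8.0)]
print(rank_space_order(pts))
```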


Jiehuan Luo, Xin Cao, Xike Xie, Qiang Qu, Zhiqiang Xu, Christian S. Jensen ,"Efficient Attribute-Constrained Co-Located Community Search" in 36th IEEE International Conference on Data Engineering, 2020

Networked data, notably social network data, often comes with a rich set of annotations, or attributes, such as documents (e.g., tweets) and locations (e.g., check-ins). Community search in such attributed networks has been studied intensively due to its many applications in friends recommendation, event organization, advertising, etc. We study the problem of attribute-constrained co-located community (ACOC) search, which returns a community that satisfies three properties: i) structural cohesiveness: the members in the community are densely connected; ii) spatial co-location: the members are close to each other; and iii) attribute constraint: a set of attributes are covered by the attributes associated with the members. The ACOC problem is shown to be NP-hard. We develop four efficient approximation algorithms with guaranteed error bounds in addition to an exact solution that works on relatively small graphs. Extensive experiments conducted with both real and synthetic data offer insight into the efficiency and effectiveness of the proposed methods, showing that they outperform three adapted state-of-the-art algorithms by an order of magnitude. We also find that the approximation algorithms are much faster than the exact solution and yet offer high accuracy.


Xinjue Wang, Ke Deng, Jianxing Li, Jeffery Xu Yu, Christian S. Jensen, Xiaochun Yang ,"Efficient targeted influence minimization in big social networks" in World Wide Web, 2020

An online social network can be used for the diffusion of malicious information like derogatory rumors, disinformation, hate speech, revenge pornography, etc. This motivates the study of influence minimization, which aims to prevent the spread of malicious information. Unlike previous influence minimization work, this study considers influence minimization in relation to a particular group of social network users, called targeted influence minimization. Thus, the objective is to protect a set of users, called target nodes, from malicious information originating from another set of users, called active nodes. This study also addresses two fundamental, but largely ignored, issues in different influence minimization problems: (i) the impact of a budget on the solution; (ii) robust sampling. To this end, two scenarios are investigated, namely unconstrained and constrained budget. Given an unconstrained budget, we provide an optimal solution; given a constrained budget, we show the problem is NP-hard and develop a greedy algorithm with a (1−1/e)-approximation guarantee. More importantly, in order to solve the influence minimization problem in large, real-world social networks, we propose a robust sampling-based solution with a desirable theoretic bound. Extensive experiments using real social network datasets offer insight into the effectiveness and efficiency of the proposed solutions.
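
As a rough illustration of a greedy strategy for the constrained-budget case, the sketch below blocks up to `budget` intermediate nodes so as to minimize how many target nodes remain reachable from the active nodes. It uses simple deterministic reachability instead of a probabilistic diffusion model and omits the paper's robust sampling, so it should be read as a toy analogue rather than the proposed algorithm.

```python
from collections import deque

def reachable_targets(graph, active, targets, blocked):
    """Count targets reachable from any active node while avoiding blocked nodes."""
    seen = set(active) - blocked
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return len(seen & targets)

def greedy_block(graph, active, targets, budget):
    """Greedily pick up to `budget` nodes to block, maximizing protected targets."""
    candidates = set(graph) - set(active) - set(targets)
    blocked = set()
    for _ in range(budget):
        base = reachable_targets(graph, active, targets, blocked)
        best, best_gain = None, 0
        for c in candidates - blocked:
            gain = base - reachable_targets(graph, active, targets, blocked | {c})
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        blocked.add(best)
    return blocked

# Toy network: active node "a" can reach targets "t1" and "t2" via "m1" and "m2".
g = {"a": ["m1", "m2"], "m1": ["t1"], "m2": ["t2"], "t1": [], "t2": []}
print(greedy_block(g, active={"a"}, targets={"t1", "t2"}, budget=1))
```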


Simon Aagaard Pedersen, Bin Yang, Christian S. Jensen ,"Fast stochastic routing under time-varying uncertainty" in The VLDB Journal, 2020

Data are increasingly available that enable detailed capture of travel costs associated with the movements of vehicles in road networks, notably travel time, and greenhouse gas emissions. In addition to varying across time, such costs are inherently uncertain, due to varying traffic volumes, weather conditions, different driving styles among drivers, etc. In this setting, we address the problem of enabling fast route planning with time-varying, uncertain edge weights. We initially present a practical approach to transforming GPS trajectories into time-varying, uncertain edge weights that guarantee the first-in-first-out property. Next, we propose time-dependent uncertain contraction hierarchies (TUCHs), a generic speed-up technique that supports a wide variety of stochastic route planning functionality in the paper’s setting. In particular, we propose query processing methods based on TUCH for two representative types of stochastic routing: non-dominated routing and probabilistic budget routing. Experimental studies with a substantial GPS data set offer insight into the design properties of the paper’s proposals and suggest that they are capable of enabling efficient stochastic routing.


Shuo Shang, Lisi Chen, Christian S. Jensen, Panos Kalnis ,"Introduction to Spatio-temporal data management and analytics for Smart City research" in Geoinformatica, 2020

This special issue of the GeoInformatica journal covers recent advances in spatio-temporal data management and analytics in the context of smart city and urban computing. It contains 11 articles that present solid research studies and innovative ideas in the area of spatio-temporal data management for smart city research. All of the 11 papers went through several rounds of rigorous reviews by the guest editors and invited reviewers.

Geo-textual query processing has been receiving much attention in the area of spatio-temporal data management. The paper, by Xinyu Chen et al., “S2R-tree: a pivot-based indexing structure for semantic-aware spatial keyword search,” proposes a pivot-based hierarchical indexing structure to integrate spatial and semantic information in a seamless way. The proposed index is able to return accurate query results that take the semantic meaning of geo-textual objects into consideration. Another paper, by Zhongpu Chen et al., “ITISS: an efficient framework for querying big temporal data,” proposes an in-memory based two-level index structure in Spark, which is easily understood and implemented, but without loss of effectiveness and efficiency. Additionally, the paper, by Xiaozhao Song et al., “Collective spatial keyword search on activity trajectories,” presents an effective and efficient collective spatial keyword query processing algorithm on activity trajectories. Finally, Lisi Chen et al., “Spatial keyword search: a survey,” present a survey of existing studies regarding spatial keyword search.

Location-based social networks (LBSNs) are becoming increasingly indispensable in smart cities. Hao Wang and Ziyu Lu develop the first unified and generic framework to support user-preference based sequence matching in their paper “Preference-aware sequence matching for location-based services.” Yanhui Li et al. propose an approach to extracting similar user patterns from LBSNs and annotating semantic tags of locations in their paper “Annotating semantic tags of locations in location-based social networks.” The problem is solved by training a binary ELM classifier for each tag in the tag space to support multi-label classification.

Spatial crowdsourcing (SC) is an emerging research direction in spatio-temporal data analytics. Tianshu Song et al. focus on solving a fundamental issue in SC, assigning tasks to suitable workers to obtain multiple global objectives, in their paper “Multi-skill aware task assignment in real-time spatial crowdsourcing.” They define the multi-skill aware task assignment problem in real-time SC, which is proven to be NP-hard, and propose an online greedy algorithm that iteratively assigns optimal workers. Yiming Li et al., in their paper “Two-sided online bipartite matching in spatial data: experiments and analysis,” present a comprehensive evaluation and analysis of representative algorithms for the two-sided online bipartite matching problem, which is widely studied in the area of spatio-temporal data management.

Furthermore, the paper, by Yuliang Ma et al., “Graph simulation on large scale temporal graphs,” investigates the problem of temporal bounded simulation on temporal graphs, which is a fundamental problem in urban computing. It presents a simulation matching framework consisting of pattern segmentation, temporal bounded simulation of pattern segments, and result integration. Mengqing Mei et al. focus on another fundamental problem in urban computing, identifying the correlation between features and labels in multi-label urban datasets, in their paper “An innovative multi-label learning based algorithm for city data computing.” In particular, they propose a multi-label learning algorithm that learns separate subspaces for features and labels by maximizing the independence between the components in each subspace.

Finally, the paper, by Jihai Yang et al., “Joint hyperspectral unmixing for urban computing,” focuses on an important problem related to urban computing: joint hyperspectral unmixing. Specifically, it presents an algorithm to process two hyperspectral images simultaneously and makes full use of the available information when most of the signals at the two end points are similar.

These papers represent a variety of directions in the fast-growing area of spatio-temporal data management and analytics in smart city applications. We hope that these papers will foster the development of smart cities and inspire more research in this promising area.


Bolong Zheng, Chenze Huang, Christian S. Jensen, Lu Chen, Nguyen Quoc Viet Hung, Guanfeng Liu, Guohui Li, Kai Zheng ,"Online Trichromatic Pickup and Delivery Scheduling in Spatial Crowdsourcing" in International Conference on Data Engineering, 2020

In Pickup-and-Delivery problems (PDP), mobile workers are employed to pick up and deliver items with the goal of reducing travel and fuel consumption. Unlike most existing efforts that focus on finding a schedule that enables the delivery of as many items as possible at the lowest cost, we consider trichromatic (worker-item-task) utility that encompasses worker reliability, item quality, and task profitability. Moreover, we allow customers to specify keywords for desired items when they submit tasks, which may result in multiple pickup options, thus further increasing the difficulty of the problem. Specifically, we formulate the problem of Online Trichromatic Pickup and Delivery Scheduling (OTPD) that aims to find optimal delivery schedules with highest overall utility. In order to quickly respond to submitted tasks, we propose a greedy solution that finds the schedule with the highest utility-cost ratio. Next, we introduce a skyline kinetic tree-based solution that materializes intermediate results to improve the result quality. Finally, we propose a density-based grouping solution that partitions streaming tasks and efficiently assigns them to the workers with high overall utility. Extensive experiments with real and synthetic data offer evidence that the proposed solutions excel over baselines with respect to both effectiveness and efficiency.


Lisi Chen, Shuo Shang, Christian S. Jensen, Bin Yao, Panos Kalnis ,"Parallel Semantic Trajectory Similarity Join" in International Conference on Data Engineering, 2020

Matching similar pairs of trajectories, called trajectory similarity join, is a fundamental functionality in spatial data management. We consider the problem of semantic trajectory similarity join (STS-Join). Each semantic trajectory is a sequence of Points-of-interest (POIs) with both location and text information. Thus, given two sets of semantic trajectories and a threshold θ, the STS-Join returns all pairs of semantic trajectories from the two sets with spatio-textual similarity no less than θ. This join targets applications such as term-based trajectory near-duplicate detection, geo-text data cleaning, personalized ridesharing recommendation, keyword-aware route planning, and travel itinerary recommendation. With these applications in mind, we provide a purposeful definition of spatio-textual similarity. To enable efficient STS-Join processing on large sets of semantic trajectories, we develop trajectory pair filtering techniques and consider the parallel processing capabilities of modern processors. Specifically, we present a two-phase parallel search algorithm. We first group semantic trajectories based on their text information. The algorithm's per-group searches are independent of each other and thus can be performed in parallel. For each group, the trajectories are further partitioned based on the spatial domain. We generate spatial and textual summaries for each trajectory batch, based on which we develop batch filtering and trajectory-batch filtering techniques to prune unqualified trajectory pairs in a batch mode. Additionally, we propose an efficient divide-and-conquer algorithm to derive bounds of spatial similarity and textual similarity between two semantic trajectories, which enable us to prune dissimilar trajectory pairs without the need of computing the exact value of spatio-textual similarity. An experimental study with large semantic trajectory data confirms that our semantic trajectory join algorithm is capable of outperforming a well-designed baseline by a factor of 8-12.


Bolong Zheng, Zhao Xi, Lianggui Weng, Nguyen Quoc Viet Hung, Hang Liu, Christian S. Jensen ,"PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search" in Proceedings of the VLDB Endowment, 2020

Nearest neighbor (NN) search in high-dimensional spaces is inherently computationally expensive due to the curse of dimensionality. As a well-known solution to approximate NN search, locality-sensitive hashing (LSH) is able to answer c-approximate NN (c-ANN) queries in sublinear time with constant probability. Existing LSH methods focus mainly on building hash bucket based indexing such that the candidate points can be retrieved quickly. However, existing coarse-grained structures fail to offer accurate distance estimation for candidate points, which translates into additional computational overhead when having to examine unnecessary points. This in turn reduces the performance of query processing. In contrast, we propose a fast and accurate LSH framework, called PM-LSH, that aims to compute the c-ANN query on large-scale, high-dimensional datasets. First, we adopt a simple yet effective PM-tree to index the data points. Second, we develop a tunable confidence interval to achieve accurate distance estimation and guarantee high result quality. Third, we propose an efficient algorithm on top of the PM-tree to improve the performance of computing c-ANN queries. Extensive experiments with real-world data offer evidence that PM-LSH is capable of outperforming existing proposals with respect to both efficiency and accuracy.
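
A minimal sketch of the general LSH-with-verification pattern: project the points with Gaussian random projections (as in LSH for Euclidean distance), rank points by projected distance to the query, and verify the best candidates with exact distances. PM-LSH instead organizes the projected points in a PM-tree and uses a tunable confidence interval; the parameters and data below are arbitrary.

```python
import numpy as np

def build_projections(data, m=8, seed=0):
    """Project d-dimensional points to m dimensions with Gaussian projections."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(data.shape[1], m))
    return proj, data @ proj

def approx_nn(query, data, proj, projected, n_candidates=50):
    """Rank points by projected distance, then verify the best candidates
    with exact distances in the original space."""
    q_proj = query @ proj
    proj_dists = np.linalg.norm(projected - q_proj, axis=1)
    candidates = np.argsort(proj_dists)[:n_candidates]
    exact = np.linalg.norm(data[candidates] - query, axis=1)
    return candidates[int(np.argmin(exact))]

rng = np.random.default_rng(42)
data = rng.normal(size=(10_000, 128))
query = rng.normal(size=128)
idx = approx_nn(query, data, *build_projections(data))
print("approximate NN index:", idx)
```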


Dingming Wu, Can Hou, Erjia Xiao, Christian S. Jensen ,"Semantic Region Retrieval from Spatial RDF Data" in International Conference on Database Systems for Advanced Applications, 2020

The top-k most relevant Semantic Place retrieval (kSP) query on spatial RDF data combines keyword-based and location-based retrieval. The query returns semantic places that are subgraphs rooted at a place entity with an associated location. The relevance to the query keywords of a semantic place is measured by a looseness score that aggregates the graph distances between the place (root) and the occurrences of the keywords in the nodes of the tree. We observe that kSP queries may retrieve semantic places that are spatially close to the query location, but with very low keyword relevance. When any single nearby place has low relevance, returning multiple relevant places instead may be helpful. Hence, we propose a generalization of semantic place retrieval, namely semantic region (SR) retrieval. An SR query aims to return multiple places that are spatially close to the query location such that each place is relevant to one or more query keywords. An algorithm and optimization techniques are proposed for the efficient processing of SR queries. Extensive empirical studies with two real datasets offer insight into the performance of the proposals.


Jilin Hu, Bin Yang, Chenjuan Guo, Christian S. Jensen, Hui Xiong ,"Stochastic Origin-Destination Matrix Forecasting Using Dual-Stage Graph Convolutional, Recurrent Neural Networks" in International Conference on Data Engineering, 2020

Origin-destination (OD) matrices are used widely in transportation and logistics to record the travel cost (e.g., travel speed or greenhouse gas emission) between pairs of OD regions during different intervals within a day. We model a travel cost as a distribution because when traveling between a pair of OD regions, different vehicles may travel at different speeds even during the same interval, e.g., due to different driving styles or different waiting times at intersections. This yields stochastic OD matrices. We consider an increasingly pertinent setting where a set of vehicle trips is used for instantiating OD matrices. Since the trips may not cover all OD pairs for each interval, the resulting OD matrices are likely to be sparse. We then address the problem of forecasting complete, near future OD matrices from sparse, historical OD matrices. To solve this problem, we propose a generic learning framework that (i) employs matrix factorization and graph convolutional neural networks to contend with the data sparseness while capturing spatial correlations and that (ii) captures spatio-temporal dynamics via recurrent neural networks extended with graph convolutions. Empirical studies using two taxi trajectory data sets offer detailed insight into the properties of the framework and indicate that it is effective.


Lisi Chen, Shuo Shang, Christian S. Jensen, Jianliang Xu, Panos Kalnis, Bin Yao, Ling Shao ,"Top-k term publish/subscribe for geo-textual data streams" in VLDB Journal, 2020

Massive amounts of data that contain spatial, textual, and temporal information are being generated at a rapid pace. With streams of such data available, including check-ins and geo-tagged tweets, users may be interested in being kept up to date on which terms are popular in the streams in a particular region of space. To enable this functionality, we aim at efficiently processing two types of general top-k term subscriptions over streams of spatio-temporal documents: region-based top-k spatial-temporal term (RST) subscriptions and similarity-based top-k spatio-temporal term (SST) subscriptions. RST subscriptions continuously maintain the top-k most popular trending terms within a user-defined region. SST subscriptions free users from defining a region and maintain top-k locally popular terms based on a ranking function that combines term frequency, term recency, and term proximity. To solve the problem, we propose solutions that are capable of supporting real-life location-based publish/subscribe applications that process large numbers of SST and RST subscriptions over a realistic stream of spatio-temporal documents. The performance of our proposed solutions is studied in extensive experiments using two spatio-temporal datasets.
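
To illustrate how an SST-style ranking might combine the three signals, the toy function below scores a term by a weighted sum of frequency, an exponential recency decay, and an inverse-distance proximity term, and keeps the top-k terms with a heap. The weights, decay, and combination are invented for illustration and are not the paper's ranking function.

```python
import heapq
import math

def term_score(freq, seconds_since_post, km_from_user,
               w_freq=1.0, w_recency=1.0, w_prox=1.0):
    """Toy score combining term frequency, recency, and proximity."""
    recency = math.exp(-seconds_since_post / 3600.0)   # 1-hour decay
    proximity = 1.0 / (1.0 + km_from_user)
    return w_freq * freq + w_recency * recency + w_prox * proximity

def top_k_terms(observations, k=3):
    """observations: term -> (frequency, seconds since last post, km away)."""
    scored = [(term_score(*stats), term) for term, stats in observations.items()]
    return heapq.nlargest(k, scored)

# Hypothetical per-term statistics aggregated from a geo-textual stream.
stream_stats = {
    "concert": (120, 300, 0.5),
    "roadwork": (40, 7200, 2.0),
    "festival": (95, 900, 1.2),
}
print(top_k_terms(stream_stats, k=2))
```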


2019 Top

Simon Aagaard Pedersen, Bin Yang, Christian S. Jensen ,"A Hybrid Learning Approach to Stochastic Routing"

Emerging disruptive innovations in transportation, e.g., autonomous vehicles and transportation-as-a-service, will benefit from high-resolution routing, where travel-time uncertainty is captured accurately.


Robert Waury, Peter Dolog, Christian S. Jensen, Kristian Torp ,"Analyzing trajectories using a path-based API" in 16th International Symposium on Spatial and Temporal Databases, SSTD 2019, 2019

Large vehicle trajectory data sets can give detailed insight into traffic and congestion that is useful for routing as well as transportation planning. Making information from such data sets available to more users can enable applications that reduce travel time and fuel consumption. However, extracting such information efficiently requires deep knowledge of the underlying schema and indexing methods. To enable more users to extract information from trajectory data, we have developed an API that removes the need to be familiar with the schema. Furthermore, when giving access to trajectory data, privacy concerns often call for the application of anonymization methods before analysis results are made available. In our demonstration, owners of trajectory data are able to experiment with different levels of anonymization to see how this affects the quality of different types of trajectory analysis services implemented on top of a large trajectory data set.


Bolong Zheng, Kai Zheng, Christian S. Jensen, Nguyen Quoc Viet Hung, Han Su, Guohui Li, Xiaofang Zhou ,"Answering Why-Not Group Spatial Keyword Queries (Extended Abstract)" in 35th IEEE International Conference on Data Engineering, ICDE 2019, 2019

With the proliferation of geo-textual objects on the web, extensive efforts have been devoted to improving the efficiency of top-k spatial keyword queries in different settings. However, comparatively much less work has been reported on enhancing the quality and usability of such queries. In this context, we propose means of enhancing the usability of a top-k group spatial keyword query, where a group of users aim to find k objects that contain given query keywords and are nearest to the users. Specifically, when users receive the result of such a query, they may find that one or more objects that they expect to be in the result are in fact missing, and they may wonder why. To address this situation, we develop a so-called why-not query that is able to minimally modify the original query into a query that returns the expected, but missing, objects, in addition to other objects. Specifically, we formalize the why-not query in relation to the top-k group spatial keyword query, called the Why-not Group Spatial Keyword Query (WGSK), which is able to provide a group of users with a more satisfactory query result. We propose a three-phase framework for efficiently computing the WGSK. Extensive experiments with real and synthetic data offer evidence that the proposed solution excels over baselines with respect to both effectiveness and efficiency.


Robert Waury, Christian S. Jensen, Kristian Torp ,"A NUMA-aware Trajectory Store for Travel-Time Estimation" in International Conference on Advances in Geographic Information Systems, 2019

The increasingly massive volumes of vehicle trajectory data that are becoming available hold the potential to enable more accurate vehicle travel-time estimation than hitherto possible. To enable such uses, we present a multi-threaded, in-memory trajectory store that supports efficient and accurate travel-time estimation for road-network paths based on network-constrained trajectories. The trajectory store employs advanced indexing to support so-called strict-path queries that retrieve all trajectories that traverse a given path to provide accurate travel-time estimations. As a key novel feature, the store is designed and implemented to exploit modern non-uniform memory access (NUMA) systems. We provide a detailed experimental study of the performance of the trajectory store using a synthetic trajectory data set based on real traffic data. The study shows that query latency can be halved compared to our baseline system.
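
The strict-path predicate at the core of the store can be illustrated with a minimal sketch (a linear scan over hypothetical edge-id sequences; the store itself answers this via purpose-built indexes): a trajectory qualifies if the query path occurs as a contiguous subsequence of its edge sequence.

    # Sketch of the strict-path predicate, not the store's indexed implementation:
    # trajectories and paths are lists of road-segment ids.
    def traverses(trajectory_edges, path_edges):
        n, m = len(trajectory_edges), len(path_edges)
        return any(trajectory_edges[i:i + m] == path_edges for i in range(n - m + 1))

    def strict_path_query(trajectories, path_edges):
        return [t for t in trajectories if traverses(t, path_edges)]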


Lisi Chen, Shuo Shang, Christian S. Jensen, Bin Yao, Zhiwei Zhang, Ling Shao ,"Effective and Efficient Reuse of Past Travel Behavior for Route Recommendation" in ACM Conference on Knowledge Discovery and Data Mining , 2019

With the increasing availability of moving-object tracking data, use of this data for route search and recommendation is increasingly important. To this end, we propose a novel parallel split-and-combine approach to enable route search by locations (RSL-Psc). Given a set of routes, a set of places to visit O, and a threshold θ, we retrieve the route composed of sub-routes that (i) has similarity to O no less than θ and (ii) contains the minimum number of sub-route combinations. The resulting functionality targets a broad range of applications, including route planning and recommendation, ridesharing, and location-based services in general. To enable efficient and effective RSL-Psc computation on massive route data, we develop novel search space pruning techniques and enable use of the parallel processing capabilities of modern processors. Specifically, we develop two parallel algorithms, Fully-Split Parallel Search (FSPS) and Group-Split Parallel Search (GSPS). We divide the route split-and-combine task into ∑_{k=0}^{M} S(|O|, k+1) sub-tasks, where M is the maximum number of combinations and S(⋅) is the Stirling number of the second kind. In each sub-task, we use network expansion and exploit spatial similarity bounds for pruning. The algorithms split candidate routes into sub-routes and combine them to construct new routes. The sub-tasks are independent and are performed in parallel. Extensive experiments with real data offer insight into the performance of the algorithms, indicating that our RSL-Psc problem can generate high-quality results and that the two algorithms are capable of achieving high efficiency and scalability.
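
The sub-task count quoted above can be computed directly from the standard recurrence for Stirling numbers of the second kind; the sketch below (function names are ours) illustrates the formula rather than the FSPS/GSPS algorithms themselves.

    # Sketch: number of split-and-combine sub-tasks, sum over k = 0..M of
    # S(|O|, k+1), using S(n, k) = k*S(n-1, k) + S(n-1, k-1).
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def stirling2(n, k):
        if n == k:
            return 1                       # covers S(0, 0) = 1
        if n == 0 or k == 0:
            return 0
        return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

    def num_subtasks(num_places, max_combinations):
        return sum(stirling2(num_places, k + 1) for k in range(max_combinations + 1))

    # e.g. |O| = 4, M = 2: S(4,1) + S(4,2) + S(4,3) = 1 + 7 + 6 = 14 sub-tasks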


Lu Chen, Yunjun Gao, Yuanliang Zhang, Christian S. Jensen, Bolong Zheng ,"Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs" in The 35th IEEE International Conference on Data Engineering (ICDE), 2019

Many datasets including social media data and bibliographic data can be modeled as graphs. Clustering such graphs is able to provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized PageRank (PPR) as a unified distance measure that captures both structural and attribute similarity. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. To improve the efficiency of the clustering, we develop two incremental approaches that aim to enable efficient PPR score computation when edge weights are updated. To boost the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game theory based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insight into the effectiveness and efficiency of our proposals, compared with existing methods.
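
The unified distance idea can be illustrated with a plain power-iteration version of personalized PageRank over the star-schema graph (the paper's contribution is the incremental recomputation when edge weights change; this sketch and its input format are only illustrative):

    # Power-iteration sketch of personalized PageRank from a seed node over a
    # weighted graph given as {node: {neighbour: weight}}; every neighbour id is
    # assumed to also appear as a key. Not the paper's incremental algorithms.
    def personalized_pagerank(graph, seed, alpha=0.15, iterations=100):
        scores = {v: (1.0 if v == seed else 0.0) for v in graph}
        for _ in range(iterations):
            nxt = {v: (alpha if v == seed else 0.0) for v in graph}
            for v, neighbours in graph.items():
                total_w = sum(neighbours.values())
                if total_w == 0:
                    continue
                for u, w in neighbours.items():
                    nxt[u] += (1 - alpha) * scores[v] * w / total_w
            scores = nxt
        return scores   # higher score = smaller PPR distance to the seed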


Tianming Zhang, Yunjun Gao, Lu Chen, Wei Guo, Shiliang Pu, Baihua Zheng, Christian S. Jensen ,"Efficient distributed reachability querying of massive temporal graphs" in VLDB Journal, 2019

Reachability computation is a fundamental graph functionality with a wide range of applications. In spite of this, little work has as yet been done on efficient reachability queries over temporal graphs, which are used extensively to model time-varying networks, such as communication networks, social networks, and transportation schedule networks. Moreover, we are faced with increasingly large real-world temporal networks that may be distributed across multiple data centers. This state of affairs motivates the paper’s study of efficient reachability queries on distributed temporal graphs. We propose an efficient index, called Temporal Vertex Labeling (TVL), which is a labeling scheme for distributed temporal graphs. We also present algorithms that exploit TVL to achieve efficient support for distributed reachability querying over temporal graphs in Pregel-like systems. The algorithms exploit several optimizations that hinge upon non-trivial lemmas. Extensive experiments using massive real and synthetic temporal graphs are conducted to provide detailed insight into the efficiency and scalability of the proposed methods, covering both index construction and query processing. Compared with the state-of-the-art methods, the TVL based query algorithms are capable of up to an order of magnitude speedup with lower index construction overhead.


Dingming Wu, Dexin Luo, Christian S. Jensen, Joshua Zhexu Huang ,"Efficiently Mining Maximal Diverse Frequent Itemsets" in International Conference on Database Systems for Advanced Applications, 2019

Given a database of transactions, where each transaction is a set of items, maximal frequent itemset mining aims to find all itemsets that are frequent, meaning that they consist of items that co-occur in transactions more often than a given threshold, and that are maximal, meaning that they are not contained in other frequent itemsets. Such itemsets are the most interesting ones in a meaningful sense. We study the problem of efficiently finding such itemsets with the added constraint that only the top-k most diverse ones should be returned. An itemset is diverse if its items belong to many different categories according to a given hierarchy of item categories. We propose a solution that relies on a purposefully designed index structure called the FP*-tree and an accompanying bound-based algorithm. An extensive experimental study offers insight into the performance of the solution, indicating that it is capable of outperforming an existing method by orders of magnitude and of scaling to large databases of transactions.


Kaiyu Feng, Gao Cong, Christian S. Jensen, Tao Guo ,"Finding Attribute-Aware Similar Region for Data Analysis" in Proceedings of the VLDB Endowment, 2019

With the proliferation of mobile devices and location-based services, increasingly massive volumes of geo-tagged data are becoming available. This data typically also contains non-location information. We study how to use such information to characterize a region and then how to find a region of the same size and with the most similar characteristics. This functionality enables a user to identify regions that share characteristics with a user-supplied region that the user is familiar with and likes. More specifically, we formalize and study a new problem called the attribute-aware similar region search (ASRS) problem. We first define so-called composite aggregators that are able to express aspects of interest in terms of the information associated with a user-supplied region. When applied to a region, an aggregator captures the region's relevant characteristics. Next, given a query region and a composite aggregator, we propose a novel algorithm called DS-Search to find the most similar region of the same size. Unlike any previous work on region search, DS-Search repeatedly discretizes and splits regions until a split region either satisfies a drop condition or is guaranteed not to contribute to the result. In addition, we extend DS-Search to solve the ASRS problem approximately. Finally, we report on extensive empirical studies that offer insight into the efficiency and effectiveness of the paper's proposals.


Tobias Skovgaard Jepsen, Christian S. Jensen, Thomas Dyhre Nielsen (corresponding author) ,"Graph Convolutional Networks for Road Networks" in 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2019

The application of machine learning techniques in the setting of road networks holds the potential to facilitate many important transportation applications. Graph Convolutional Networks (GCNs) are neural networks that are capable of leveraging the structure of a network. However, many implicit assumptions of GCNs do not apply to road networks. We introduce the Relational Fusion Network (RFN), a novel type of GCN designed specifically for road networks. In particular, we propose methods that substantially outperform state-of-the-art GCNs on two machine learning tasks in road networks. Furthermore, we show that state-of-the-art GCNs fail to effectively leverage road network structure on these tasks.


Congcong Ge, Yunjun Gao, Xiaoye Miao, Lu Chen, Christian S. Jensen, Ziyuan Zhu ,"IHCS: An Integrated Hybrid Cleaning System" in 45th International Conference on Very Large Data Bases, 2019

Data cleaning is a prerequisite to subsequent data analysis, and is known to often be time-consuming and labor-intensive. We present IHCS, a hybrid data cleaning system that integrates error detection and repair to contend effectively with multiple error types. In a preprocessing step that precedes the data cleaning, IHCS formats an input dataset to be cleaned, and transforms applicable data quality rules into a unified format. Then, an MLN index structure is formed according to the unified rules, enabling IHCS to handle multiple error types simultaneously. During the cleaning, IHCS first tackles abnormalities through an abnormal group process, and then, it generates multiple data versions based on the MLN index. Finally, IHCS eliminates conflicting values across the multiple versions, and derives the final unified clean data. A visual interface enables cleaning process monitoring and cleaning result analysis.


Robert Waury, Christian S. Jensen, Satoshi Koide, Yoshiharu Ishikawa, Chuan Xiao ,"Indexing Trajectories for Travel-Time Histogram Retrieval" in 22nd International Conference on Extending Database Technology, EDBT 2019, 2019

A key service in vehicular transportation is routing according to estimated travel times. With the availability of massive volumes of vehicle trajectory data, it has become increasingly feasible to estimate travel times, which are typically modeled as probability distributions in the form of histograms. An earlier study shows that use of a carefully selected, context-dependent subset of available trajectories when estimating a travel-time histogram along a user-specified path can significantly improve the accuracy of the estimates. This selection of trajectories cannot occur in a pre-processing step, but must occur online—it must be integrated into the routing itself. It is then a key challenge to be able to select very efficiently the "right" subset of trajectories that offer the best accuracy when the cost of a route is to be assessed. To address this challenge, we propose a solution that applies novel indexing to all available trajectories and that then is capable of selecting the most relevant trajectories and of computing a travel-time distribution based on these trajectories. Specifically, the solution utilizes an in-memory trajectory index and a greedy algorithm to identify and retrieve the relevant trajectories. The paper reports on an extensive empirical study with a large real-world GPS data set that offers insight into the accuracy and efficiency of the proposed solution. The study shows that the proposed online selection of trajectories can be performed efficiently and is able to provide highly accurate travel-time distributions.


Dingming Wu, Yi Zhu, Christian S. Jensen ,"In Good Company: Efficient Retrieval of the Top-k Most Relevant Event-Partner Pairs" in International Conference on Database Systems for Advanced Applications, 2019

The proliferation of event-based social networking (EBSN) motivates a range of studies on topics such as event, venue, and friend recommendation and event creation and organization. In this setting, the notion of event-partner recommendation has recently attracted attention. When recommending an event to a user, this functionality allows recommendation of a partner with whom to attend the event. However, existing proposals are push-based: recommendations are pushed to users at the system's initiative. In contrast, EBSNs provide users with keyword-based search functionality. This way, users may retrieve information in pull mode. We propose a new way of accessing information in EBSNs that combines push and pull, thus allowing users to not only conduct ad-hoc searches for events, but also to receive partner recommendations for retrieved events. Specifically, we define and study the top-k event-partner (kEP) pair retrieval query that integrates event-partner recommendation and keyword-based search for events. The query retrieves event-partner pairs, taking into account the relevance of events to user-supplied keywords and so-called together preferences that indicate the extent of a user's preference to attend an event with a given partner. In order to compute kEP queries efficiently, we propose a rank-join based framework with three optimizations. Results of empirical studies with implementations of the proposed techniques demonstrate that the proposed techniques are capable of excellent performance.
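
The rank-join idea can be sketched as follows (an assumed additive scoring function and hypothetical input structures; the paper's framework adds three further optimizations): events arrive sorted by descending keyword relevance, partners of each event are sorted by descending together preference, and the scan stops once even the best unseen event cannot enter the current top-k.

    # Simplified rank-join sketch, not the paper's full framework. events is a
    # list of (relevance, event_id) sorted descending; partners_of maps an event
    # to (preference, partner_id) pairs sorted descending; scores are additive.
    import heapq
    from itertools import count

    def top_k_event_partner(events, partners_of, k, max_preference):
        top, tie = [], count()           # min-heap of (score, tie, event, partner)
        for relevance, event in events:
            if len(top) == k and relevance + max_preference <= top[0][0]:
                break                    # no unseen event can reach the top-k
            for preference, partner in partners_of.get(event, []):
                score = relevance + preference
                if len(top) < k:
                    heapq.heappush(top, (score, next(tie), event, partner))
                elif score > top[0][0]:
                    heapq.heapreplace(top, (score, next(tie), event, partner))
                else:
                    break                # partners sorted, so no better pair here
        return sorted(((s, e, p) for s, _, e, p in top),
                      key=lambda t: t[0], reverse=True)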


Christian S. Jensen ,"Letter from the Impact Award Winner"




Christian S. Jensen, Dik Lee, Ling Liu ,"Message from the General Co-Chairs" in 20th International Conference on Mobile Data Management, MDM 2019, 2019

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.


Wenfei Fan, Xuemin Lin, Divesh Srivastava, Christian S. Jensen, Lionel M. Ni, M. Tamer Özsu ,"Message from the ICDE 2019 Chairs" in The 35th IEEE International Conference on Data Engineering (ICDE), 2019

Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.


Alvis Logins, Panagiotis Karras, Christian S. Jensen ,"Multicapacity Facility Selection in Networks" in The 35th IEEE International Conference on Data Engineering (ICDE), 2019

Consider the task of selecting a set of facilities, e.g., hotspots, shops, or utility stations, each with a capacity to serve a certain number of customers. Given a set of customer locations, we have to minimize a cumulative distance between each customer and the facility earmarked to serve this customer within its capacity. This problem is known as the Capacitated k-Median (CKM) problem. In a data-intensive variant, distances are calculated over a network, while a data set associates each candidate facility location with a different capacity. In other words, going beyond positioning facilities in a metric space, the problem is to select a small subset out of a large data set of candidate network-based facilities with capacity constraints. We call this variant the Multicapacity Facility Selection (MCFS) problem. Linear Programming solutions are unable to contend with the network sizes and supplies of candidate facilities encountered in real-world applications; yet the problem may need to be solved scalably and repeatedly, as in applications requiring the dynamic reallocation of customers to facilities. We present the first, to our knowledge, solution to the MCFS problem that achieves both scalability and high quality, the Wide Matching Algorithm (WMA). WMA iteratively assigns customers to candidate facilities and leverages a data-driven heuristic for the SETCOVER problem inherent to the MCFS problem. An extensive experimental study with real-world and synthetic networks demonstrates that WMA scales gracefully to million-node networks and large facility and customer data sets; further, WMA provides a solution quality superior to scalable baselines (also proposed in the paper) and competitive vis-à-vis the optimal solution, returned by an off-the-shelf solver that runs only on small facility databases.


Qiang Qu, Ildar Nurgaliev, Muhammad Muzammal, Christian S. Jensen, Jianping Fan ,"On spatio-temporal blockchain query processing" in Future Generation Computer Systems, 2019

Recent advances in blockchain technology suggest that the technology has potential for use in applications in a variety of new domains including spatio-temporal data management. The reliability and immutability of blockchains combined with the support for decentralized, trustless data processing offer new opportunities for applications in such domains. However, current blockchain proposals do not support spatio-temporal data processing, and the block-based sequential access in blockchain hinders efficient query processing. We propose spatio-temporal blockchain technology that supports fast query processing. More specifically, we propose blockchain technology that records time and location attributes for the transactions, maintains data integrity, and supports fast spatial queries by the introduction of a cryptographically signed tree data structure, the Merkle Block Space Index (BSI), which is a modification of the Merkle KD-tree. We consider Bitcoin-like near-uniform block generation, and we process temporal queries by means of a block-DAG data structure, called Temporal Graph Search (TGS), without the need for temporal indexes. To enable the experiments, we propose a random graph model to generate a block-DAG topology for an abstract peer-to-peer network. We perform a comprehensive evaluation to offer insight into the applicability and effectiveness of the proposed technology. The evaluation indicates that TGS-BSI is a promising solution for efficient spatio-temporal query processing on blockchains.
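
The role of the cryptographically signed tree can be illustrated with a generic Merkle-root computation (the paper's BSI is a Merkle KD-tree variant with spatial partitioning; this sketch only shows the hashing principle that lets a client verify records against a trusted root digest):

    # Generic Merkle-root sketch, not the BSI itself: leaves are hashed records
    # (strings here), each internal node hashes its children's digests, and the
    # root digest authenticates the whole record set.
    import hashlib

    def _h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(records):
        level = [_h(r.encode()) for r in records]
        if not level:
            return _h(b"")
        while len(level) > 1:
            if len(level) % 2 == 1:
                level.append(level[-1])        # duplicate the last node on odd levels
            level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    # merkle_root(["tx1|lat=57.05|lon=9.92|t=1554710400", "tx2|..."]).hex()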


Tung Kieu, Bin Yang, Chenjuan Guo, Christian S. Jensen ,"Outlier Detection for Time Series with Recurrent Autoencoder Ensembles" in the 28th International Joint Conference on Artificial Intelligence, 2019

We propose two solutions to outlier detection in time series based on recurrent autoencoder ensembles. The solutions exploit autoencoders built using sparsely-connected recurrent neural networks (S-RNNs). Such networks make it possible to generate multiple autoencoders with different neural network connection structures. The two solutions are ensemble frameworks, specifically an independent framework and a shared framework, both of which combine multiple S-RNN based autoencoders to enable outlier detection. This ensemble-based approach aims to reduce the effects of some autoencoders being overfitted to outliers, this way improving overall detection quality. Experiments with two large real-world time series data sets, including univariate and multivariate time series, offer insight into the design properties of the proposed frameworks and demonstrate that the resulting solutions are capable of outperforming both baselines and the state-of-the-art methods.


Walid Aref, Michela Bertolotto, Panagiotis Bouros, Christian S. Jensen, Ahmed Mahmood, Kjetil Nørvåg, Dimitris Sacharidis, Mohamed Sarwat ,"Preface" in 16th International Symposium on Spatial and Temporal Databases, SSTD 2019, 2019

The symposium brought together, for three days, researchers, practitioners, and developers for the presentation and discussion of current research on concepts, tools, and techniques related to spatial and temporal databases. SSTD 2019 was the 16th in a series of biannual events. Previous symposia were held in Santa Barbara (1989), Zurich (1991), Singapore (1993), Portland (1995), Berlin (1997), Hong Kong (1999), Los Angeles (2001), Santorini, Greece (2003), Angra dos Reis (2005), Boston (2007), Aalborg (2009), Minneapolis (2011), Munich (2013), Hong Kong (2015), and Arlington (2017).


Walid Aref (Editor), Michela Bertolotto (Editor), Panagiotis Bouros (Editor), Christian S. Jensen (Editor), Ahmed Mahmood (Editor), Kjetil Nørvåg (Editor), Dimitris Sacharidis (Editor), Mohammed Sarwat (Editor) ,"Proceedings of the 16th International Symposium on Spatial and Temporal Databases" in 16th International Symposium on Spatial and Temporal Databases, SSTD 2019, 2019




Lu Chen, Yunjun Gao, Ziquan Fang, Xiaoye Miao, Christian S. Jensen, Chenjuan Guo ,"Real-time Distributed Co-Movement Pattern Detection on Streaming Trajectories" in Proceedings of the VLDB Endowment, 2019

With the widespread deployment of mobile devices with positioning capabilities, increasingly massive volumes of trajectory data are being collected that capture the movements of people and vehicles. This data enables co-movement pattern detection, which is important in applications such as trajectory compression and future-movement prediction. Existing co-movement pattern detection studies generally consider historical data and thus propose offline algorithms. However, applications such as future movement prediction need real-time processing over streaming trajectories. Thus, we investigate real-time distributed co-movement pattern detection over streaming trajectories. Existing off-line methods assume that all data is available when the processing starts. Nevertheless, in a streaming setting, unbounded data arrives in real time, making pattern detection challenging. To this end, we propose a framework based on Apache Flink, which is designed for efficient distributed streaming data processing. The framework encompasses two phases: clustering and pattern enumeration. To accelerate the clustering, we use a range join based on two-layer indexing, and provide techniques that eliminate unnecessary verifications. To perform pattern enumeration efficiently, we present two methods FBA and VBA that utilize id-based partitioning. When coupled with bit compression and candidate-based enumeration techniques, we reduce the enumeration cost from exponential to linear. Extensive experiments offer insight into the efficiency of the proposed framework and its constituent techniques compared with existing methods.


Gao Cong, Christian Søndergaard Jensen ,"Spatio-Textual Data"




Jilin Hu, Chenjuan Guo, Bin Yang, Christian Søndergaard Jensen ,"Stochastic Weight Completion for Road Networks using Graph Convolutional Networks" in The 35th IEEE International Conference on Data Engineering (ICDE), 2019

Innovations in transportation, such as mobility-on-demand services and autonomous driving, call for high-resolution routing that relies on an accurate representation of travel time throughout the underlying road network. Specifically, the travel time of a road-network edge is modeled as a time-varying distribution that captures the variability of traffic over time and the fact that different drivers may traverse the same edge at the same time at different speeds. Such stochastic weights may be extracted from data sources such as GPS and loop detector data. However, even very large data sources are incapable of covering all edges of a road network at all times. Yet, high-resolution routing needs stochastic weights for all edges. We solve the problem of filling in the missing weights. To achieve that, we provide techniques capable of estimating stochastic edge weights for all edges from traffic data that covers only a fraction of all edges. We propose a generic learning framework called Graph Convolutional Weight Completion (GCWC) that exploits the topology of a road network graph and the correlations of weights among adjacent edges to estimate stochastic weights for all edges. Next, we incorporate contextual information into GCWC to further improve accuracy. Empirical studies using loop detector data from a highway toll gate network and GPS data from a large city offer insight into the design properties of GCWC and its effectiveness.
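
The propagation idea behind weight completion can be sketched with a single graph-convolution step over the adjacency of road edges (a minimal NumPy illustration with hypothetical inputs, not GCWC itself): observed weight features spread to adjacent, uncovered edges.

    # Minimal sketch of one graph-convolution step, not GCWC itself. adj is an
    # (n, n) 0/1 adjacency over road edges (edges act as nodes here, linked when
    # adjacent in the road network); features is (n, d) with zeros for uncovered
    # edges; weight is a (d, d') learnable matrix.
    import numpy as np

    def graph_conv_step(adj, features, weight):
        a_hat = adj + np.eye(adj.shape[0])            # add self-loops
        d_inv = np.diag(1.0 / a_hat.sum(axis=1))      # row-normalize
        return np.maximum(d_inv @ a_hat @ features @ weight, 0.0)   # ReLU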


Christian S. Jensen ,"Value Creation from Massive Data in Transportation - The Case of Vehicle Routing."

Innovations in transportation, such as mobility-on-demand services and autonomous driving, call for high-resolution routing that relies on an accurate representation of travel time throughout the underlying road network. Specifically, the travel time of a road-network edge is modeled as a time-varying distribution that captures the variability of traffic over time and the fact that different drivers may traverse the same edge at the same time at different speeds. Such stochastic weights may be extracted from data sources such as GPS and loop detector data. However, even very large data sources are incapable of covering all edges of a road network at all times. Yet, high-resolution routing needs stochastic weights for all edges. We solve the problem of filling in the missing weights. To achieve that, we provide techniques capable of estimating stochastic edge weights for all edges from traffic data that covers only a fraction of all edges. We propose a generic learning framework called Graph Convolutional Weight Completion (GCWC) that exploits the topology of a road network graph and the correlations of weights among adjacent edges to estimate stochastic weights for all edges. Next, we incorporate contextual information into GCWC to further improve accuracy. Empirical studies using loop detector data from a highway toll gate network and GPS data from a large city offer insight into the design properties of GCWC and its effectiveness.


2018 Top

Robert Waury, Christian Søndergaard Jensen, Kristian Torp ,"Adaptive Travel-Time Estimation: A Case for Custom Predicate Selection" in 19th IEEE International Conference on Mobile Data Management, MDM 2018, 2018

Travel-time estimation for paths in a road network often relies on pre-computed histograms that are usually available on a road segment level. Then the pre-computed histograms of the segments of a path are convolved to obtain a histogram that estimates the travel time. With the growing sizes of trajectory datasets, it becomes possible to compute histograms for increasingly longer sub-paths. Since pre-computation is infeasible for all sub-paths in a road network, we propose computing histograms on-the-fly, i.e., during routing. Such an on-the-fly method must filter the underlying trajectory dataset by spatio-temporal predicates to obtain the relevant trajectories and offers the opportunity to apply additional filtering predicates to the trajectories with little overhead. We report on a study showing that considerable improvements in accuracy of the histograms obtained for paths can be obtained by choosing filtering predicates that not only adapt to the intended start of a trip, but also to the driver and the weather. We also make the case for a sub-path partitioning based on segment categories since there are significant differences between road types when applying our on-the-fly method.
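
The baseline convolution step mentioned above is easy to make concrete: assuming independence between segments, the path histogram is the convolution of the per-segment histograms (a sketch with hypothetical histogram dictionaries; the paper's contribution is choosing better filtering predicates, not this step).

    # Sketch of the baseline: convolve per-segment travel-time histograms,
    # each given as {travel_time_bucket: probability}, independence assumed.
    from collections import defaultdict
    from functools import reduce

    def convolve(h1, h2):
        out = defaultdict(float)
        for t1, p1 in h1.items():
            for t2, p2 in h2.items():
                out[t1 + t2] += p1 * p2
        return dict(out)

    def path_histogram(segment_histograms):
        return reduce(convolve, segment_histograms)   # assumes at least one segment

    # path_histogram([{30: 0.5, 60: 0.5}, {20: 1.0}]) -> {50: 0.5, 80: 0.5}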


Jianzhong Qi, Rui Zhang, Christian Søndergaard Jensen, Ramamohanarao Kotagiri, Jiayuan He ,"Continuous Spatial Query Processing: A Survey of Safe Region Based Techniques" in A C M Computing Surveys, 2018

In the past decade, positioning-system-enabled devices such as smartphones have become highly prevalent. This development has driven the increasing popularity of location-based services in business as well as in daily applications such as navigation, targeted advertising, and location-based social networking. Continuous spatial queries serve as a building block for location-based services. As an example, an Uber driver may want to be kept aware of the nearest customers or service stations. Continuous spatial queries require updates to the query result as the query or data objects are moving. This poses challenges to the query efficiency, which is crucial to the user experience of a service. A large number of approaches address this efficiency issue using the concept of safe region. A safe region is a region within which arbitrary movement of an object leaves the query result unchanged. Such a region helps reduce the frequency of query result update and hence improves query efficiency. As a result, safe region-based approaches have been popular for processing various types of continuous spatial queries. Safe regions have interesting theoretical properties and are worth in-depth analysis. We provide a comparative study of safe region-based approaches. We describe how safe regions are computed for different types of continuous spatial queries, showing how they improve query efficiency. We compare the different safe region-based approaches and discuss possible further improvements.
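
For a continuous nearest-neighbor query, the safe-region idea reduces to a simple membership test (a sketch of the concept only, not of any particular surveyed technique): as long as the moving query point stays closer to its current nearest object than to every other object, i.e., inside that object's Voronoi cell, the result cannot change and no recomputation is needed.

    # Sketch of the safe-region concept for a continuous nearest-neighbour query.
    import math

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def still_in_safe_region(query_pos, current_nn, other_objects):
        # True: the cached nearest neighbour is still valid; False: re-evaluate.
        d_nn = dist(query_pos, current_nn)
        return all(d_nn <= dist(query_pos, o) for o in other_objects)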


Michael Hanspeter Böhlen, Anton Dignös, Johann Gamper, Christian Søndergaard Jensen ,"Database Technology for Processing Temporal Data" in 25th International Symposium on Temporal Representation and Reasoning, 2018

Despite the ubiquity of temporal data and considerable research on processing such data, database systems largely remain designed for processing the current state of some modeled reality. More recently, we have seen an increasing interest in processing historical or temporal data. The SQL:2011 standard introduced some temporal features, and commercial database management systems have started to offer temporal functionalities in a step-by-step manner. There has also been a proposal for a more fundamental and comprehensive solution for sequenced temporal queries, which allows a tight integration into relational database systems, thereby taking advantage of existing query optimization and evaluation technologies. New challenges for processing temporal data arise with multiple dimensions of time and the increasing amounts of data, including time series data that represent a special kind of temporal data.


Xiucheng Li, Kaiqi Zhao, Gao Cong, Christian Søndergaard Jensen, Wei Wei ,"Deep representation learning for trajectory similarity computation" in 34th IEEE International Conference on Data Engineering, ICDE 2018, 2018

Trajectory similarity computation is fundamental functionality with many applications such as animal migration pattern studies and vehicle trajectory mining to identify popular routes and similar drivers. While a trajectory is a continuous curve in some spatial domain, e.g., 2D Euclidean space, trajectories are often represented by point sequences. Existing approaches that compute similarity based on point matching suffer from the problem that they treat two different point sequences differently even when the sequences represent the same trajectory. This is particularly a problem when the point sequences are non-uniform, have low sampling rates, and have noisy points. We propose the first deep learning approach to learning representations of trajectories that is robust to low data quality, thus supporting accurate and efficient trajectory similarity computation and search. Experiments show that our method is capable of higher accuracy and is at least one order of magnitude faster than the state-of-the-art methods for k-nearest trajectory search.


Tung Kieu, Bin Yang, Chenjuan Guo, Christian S. Jensen ,"Distinguishing Trajectories from Different Drivers using Incompletely Labeled Trajectories" in 27th ACM International Conference on Information and Knowledge Management, 2018

We consider a scenario that occurs often in the auto insurance industry. We are given a large collection of trajectories that stem from many different drivers. Only a small number of the trajectories are labeled with driver identifiers, and only some drivers are used in labels. The problem is to label correctly the unlabeled trajectories with driver identifiers. This is important in auto insurance to detect possible fraud and to identify the driver in, e.g., pay-as-you-drive settings when a vehicle has been involved in an incident. To solve the problem, we first propose a Trajectory-to-Image (T2I) encoding scheme that captures both geographic features and driving behavior features of trajectories in 3D images. Next, we propose a multi-task, deep learning model called T2INet for estimating the total number of drivers in the unlabeled trajectories, and then we partition the unlabeled trajectories into groups so that the trajectories in a group belong to the same driver. Experimental results on a large trajectory data set offer insight into the design properties of T2INet and demonstrate that T2INet is capable of outperforming baselines and the state-of-the-art method.


Christian Søndergaard Jensen ,"Editorial: Updates to the Editorial Board" in A C M Transactions on Database Systems, 2018




Ilkcan Keles, Christian Søndergaard Jensen, Simonas Saltenis ,"Extracting Rankings for Spatial Keyword Queries from GPS Data" in 14th International Conference on Location Based Services, 2018

Studies suggest that many search engine queries have local intent. We consider the evaluation of ranking functions important for such queries. The key challenge is to be able to determine the “best” ranking for a query, as this enables evaluation of the results of ranking functions. We propose a model that synthesizes a ranking of points of interest (PoI) for a given query using historical trips extracted from GPS data. To extract trips, we propose a novel PoI assignment method that makes use of distances and temporal information. We also propose a PageRank-based smoothing method to be able to answer queries for regions that are not covered well by trips. We report experimental results on a large GPS dataset that show that the proposed model is capable of capturing the visits of users to PoIs and of synthesizing rankings.


Qing Liu, Zijin Feng, Xike Xi, Jianliang Xu, Xin Lin, Christian Søndergaard Jensen ,"IZone: Efficient influence zone evaluation over geo-textual data" in 34th IEEE International Conference on Data Engineering, ICDE 2018, 2018

Owing to the widespread use of location-aware devices and the increased popularity of micro-blogging applications, we are witnessing a rapid proliferation of geo-textual data. In this demonstration, we present iZone, an efficient system for determining influence zones over geo-textual data. Specifically, iZone allows users to browse geo-textual objects, evaluate the influence zones of specified geo-textual objects, and obtain explanations of the evaluation results. The iZone system adopts a browser-server model. The server side integrates two types of spatial keyword search, namely top-k spatial keyword query and reverse top-k keyword-based location query, to support the functionality of the system. A variety of spatial indexes are employed to enhance the efficiency of the system. The browser side provides a map-based GUI interface, which enables convenient and user-friendly interaction with the system. Using a real hotel dataset from Hong Kong, iZone offers hands-on experience with influence zone evaluation in real-life applications.


Chenjuan Guo, Bin Yang, Jilin Hu, Christian Søndergaard Jensen (corresponding author) ,"Learning to route with sparse trajectory sets" in 34th IEEE International Conference on Data Engineering, ICDE 2018, 2018

Motivated by the increasing availability of vehicle trajectory data, we propose learn-to-route, a comprehensive trajectory-based routing solution. Specifically, we first construct a graph-like structure from trajectories as the routing infrastructure. Second, we enable trajectory-based routing given an arbitrary (source, destination) pair. In the first step, given a road network and a collection of trajectories, we propose a trajectory-based clustering method that identifies regions in a road network. If a pair of regions are connected by trajectories, we maintain the paths used by these trajectories and learn a routing preference for travel between the regions. As trajectories are skewed and sparse, many region pairs are not connected by trajectories. We thus transfer routing preferences from region pairs with sufficient trajectories to such region pairs and then use the transferred preferences to identify paths between the regions. In the second step, we exploit the above graph-like structure to achieve a comprehensive trajectory-based routing solution. Empirical studies with two substantial trajectory data sets offer insight into the proposed solution, indicating that it is practical. A comparison with a leading routing service offers evidence that the paper's proposal is able to enhance routing quality.


Lisi Chen, Shuo Shang, Zhiwei Zhang, Xin Cao, Christian Søndergaard Jensen, Panos Kalnis ,"Location-aware top-k term publish/subscribe" in 34th IEEE International Conference on Data Engineering, ICDE 2018, 2018

Massive amounts of data that contain spatial, textual, and temporal information are being generated at a rapid pace. These spatio-temporal documents cover a wide range of topics in local areas. Users are interested in receiving local popular terms from spatio-temporal documents published within a specified region. We consider the Top-k Spatial-Temporal Term (ST2) Subscription. Given an ST2 subscription, we continuously maintain the up-to-date top-k most popular terms over a stream of spatio-temporal documents. The ST2 subscription takes into account both the frequency and the recency of a term generated from spatio-temporal document streams in evaluating its popularity. We propose an efficient solution to process a large number of ST2 subscriptions over a stream of spatio-temporal documents. The performance of processing ST2 subscriptions is studied in extensive experiments based on two real spatio-temporal datasets.
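
A popularity score that combines frequency and recency can be sketched with exponential decay (the paper's exact ranking function and index structures differ; names here are hypothetical):

    # Sketch: exponentially decayed term counts, so frequent and recent terms
    # rank high; not the paper's ranking function or data structures.
    import heapq, math

    class DecayedTermCounter:
        def __init__(self, half_life_seconds=3600.0):
            self.decay = math.log(2) / half_life_seconds
            self.scores, self.last_update = {}, {}

        def observe(self, term, timestamp):
            prev = self.scores.get(term, 0.0)
            t0 = self.last_update.get(term, timestamp)
            self.scores[term] = prev * math.exp(-self.decay * (timestamp - t0)) + 1.0
            self.last_update[term] = timestamp

        def top_k(self, k, now):
            decayed = {t: s * math.exp(-self.decay * (now - self.last_update[t]))
                       for t, s in self.scores.items()}
            return heapq.nlargest(k, decayed.items(), key=lambda kv: kv[1])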


Tobias S. Jepsen, Christian Søndergaard Jensen, Thomas Dyhre Nielsen, Kristian Torp ,"On Network Embedding for Machine Learning on Road Networks: A Case Study on the Danish Road Network" in 2018 IEEE International Conference on Big Data, 2018

Road networks are a type of spatial network, where edges may be associated with qualitative information such as road type and speed limit. Unfortunately, such information is often incomplete; for instance, OpenStreetMap only has speed limits for 13% of all Danish road segments. This is problematic for analysis tasks that rely on such information for machine learning. To enable machine learning in such circumstances, one may consider the application of network embedding methods to extract structural information from the network. However, these methods have so far mostly been used in the context of social networks, which differ significantly from road networks in terms of, e.g., node degree and level of homophily (which are key to the performance of many network embedding methods). We analyze the use of network embedding methods, specifically node2vec, for learning road segment embeddings in road networks. Due to the often limited availability of information on other relevant road characteristics, the analysis focuses on leveraging the spatial network structure. Our results suggest that network embedding methods can indeed be used for deriving relevant network features (that may, e.g., be used for predicting speed limits), but that the qualities of the embeddings differ from embeddings for social networks.


Tung Kieu, Bin Yang, Christian Søndergaard Jensen (corresponding author) ,"Outlier Detection for Multidimensional Time Series using Deep Neural Networks" in 19th IEEE International Conference on Mobile Data Management, MDM 2018, 2018

Due to the continued digitization of industrial and societal processes, including the deployment of networked sensors, we are witnessing a rapid proliferation of time-ordered observations, known as time series. For example, the behavior of drivers can be captured by GPS or accelerometer as a time series of speeds, directions, and accelerations. We propose a framework for outlier detection in time series that, for example, can be used for identifying dangerous driving behavior and hazardous road locations. Specifically, we first propose a method that generates statistical features to enrich the feature space of raw time series. Next, we utilize an autoencoder to reconstruct the enriched time series. The autoencoder performs dimensionality reduction to capture, using a small feature space, the most representative features of the enriched time series. As a result, the reconstructed time series only capture representative features, whereas outliers often have non-representative features. Therefore, deviations of the enriched time series from the reconstructed time series can be taken as indicators of outliers. We propose and study autoencoders based on convolutional neural networks and long short-term memory neural networks. In addition, we show that embedding of contextual information into the framework has the potential to further improve the accuracy of identifying outliers. We report on empirical studies with multiple time series data sets, which offer insight into the design properties of the proposed framework, indicating that it is effective at detecting outliers.
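
The reconstruction-error principle can be shown with a linear stand-in (the paper uses CNN- and LSTM-based autoencoders; here a PCA reconstruction via NumPy's SVD merely illustrates why poorly reconstructed windows are flagged as outliers):

    # Principle sketch only: reconstruct feature-enriched time-series windows
    # from a low-dimensional code and use the reconstruction error as the
    # outlier score. PCA stands in for the paper's deep autoencoders.
    import numpy as np

    def reconstruction_outlier_scores(X, n_components=3):
        # X: (n_windows, n_features); n_components must not exceed either dimension
        mu = X.mean(axis=0)
        Xc = X - mu
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        basis = vt[:n_components]                   # the low-dimensional "code"
        X_rec = Xc @ basis.T @ basis + mu           # decode back to feature space
        return np.linalg.norm(X - X_rec, axis=1)    # large error => likely outlier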


Bin Yang, Jian Dai, Chenjuan Guo, Christian S. Jensen, Jilin Hu (corresponding author) ,"PACE: a PAth-CEntric paradigm for stochastic path finding" in VLDB Journal, 2018

With the growing volumes of vehicle trajectory data, it becomes increasingly possible to capture time-varying and uncertain travel costs, e.g., travel time, in a road network. The current paradigm for doing so is edge-centric: it represents a road network as a weighted graph and splits trajectories into small fragments that fit the underlying edges to assign time-varying and uncertain weights to edges. It then applies path finding algorithms to the resulting, weighted graph. We propose a new PAth-CEntric paradigm, PACE, that targets more accurate and more efficient path cost estimation and path finding. By assigning weights to paths, PACE avoids splitting trajectories into small fragments. We solve two fundamental problems to establish the PACE paradigm: (i) how to compute accurately the travel cost distribution of a path and (ii) how to conduct path finding for a source–destination pair. To solve the first problem, given a departure time and a query path, we show how to select an optimal set of paths that cover the query path and such that the weights of the paths enable the most accurate joint cost distribution estimation for the query path. The joint cost distribution models well the travel cost dependencies among the edges in the query path, which in turn enables accurate estimation of the cost distribution of the query path. We solve the second problem by showing that the resulting path cost distribution estimation method satisfies an incremental property that enables the method to be integrated seamlessly into existing stochastic path finding algorithms. Further, we propose a new stochastic path finding algorithm that fully explores the improved accuracy and efficiency provided by PACE. Empirical studies with trajectory data from two different cities offer insight into the design properties of the PACE paradigm and offer evidence that PACE is accurate, efficient, and effective in real-world settings.


Shuo Shang, Lisi Chen, Zhewei Wei, Christian S. Jensen, Kai Zheng, Panos Kalnis (corresponding author) ,"Parallel trajectory similarity joins in spatial networks" in VLDB Journal, 2018

The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider two cases of trajectory similarity joins (TS-Joins), including a threshold-based join (Tb-TS-Join) and a top-k TS-Join (k-TS-Join), where the objects are trajectories of vehicles moving in road networks. Given two sets of trajectories and a threshold θ, the Tb-TS-Join returns all pairs of trajectories from the two sets with similarity above θ. In contrast, the k-TS-Join does not take a threshold as a parameter, and it returns the top-k most similar trajectory pairs from the two sets. The TS-Joins target diverse applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide purposeful definitions of similarity. To enable efficient processing of the TS-Joins on large sets of trajectories, we develop search space pruning techniques and enable use of the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer search framework that lays the foundation for the algorithms for the Tb-TS-Join and the k-TS-Join that rely on different pruning techniques to achieve efficiency. For each trajectory, the algorithms first find similar trajectories. Then they merge the results to obtain the final result. The algorithms for the two joins exploit different upper and lower bounds on the spatiotemporal trajectory similarity and different heuristic scheduling strategies for search space pruning. Their per-trajectory searches are independent of each other and can be performed in parallel, and the mergings have constant cost. An empirical study with real data offers insight into the performance of the algorithms and demonstrates that they are capable of outperforming well-designed baseline algorithms by an order of magnitude.


Shuo Shang, Lisi Chen, Kai Zheng, Christian S. Jensen, Zhewei Wei, Panos Kalnis ,"Parallel Trajectory-to-Location Join" in IEEE Transactions on Knowledge and Data Engineering, 2018

The matching between trajectories and locations, called Trajectory-to-Location join (TL-Join), is fundamental functionality in spatiotemporal data management. Given a set of trajectories, a set of locations, and a threshold θ, the TL-Join finds all (trajectory, location) pairs from the two sets with spatiotemporal correlation above θ. This join targets diverse applications, including location recommendation, event tracking, and trajectory activity analyses. We address three challenges in relation to the TL-Join: how to define the spatiotemporal correlation between trajectories and locations, how to prune the search space effectively when computing the join, and how to perform the computation in parallel. Specifically, we define new metrics to measure the spatiotemporal correlation between trajectories and locations. We develop a novel parallel collaborative (PCol) search method based on a divide-and-conquer strategy. For each location o, we retrieve the trajectories with high spatiotemporal correlation to o, and then we merge the results. An upper bound on the spatiotemporal correlation and a heuristic scheduling strategy are developed to prune the search space. The trajectory searches from different locations are independent and are performed in parallel, and the result merging cost is independent of the degree of parallelism. Studies of the performance of the developed algorithms using large spatiotemporal data sets are reported.


Lu Chen, Qilu Zhong, Xiaokui Xiao, Yunjun Gao, Pengfei Jin, Christian Søndergaard Jensen ,"Price-and-Time-Aware Dynamic Ridesharing" in 34th IEEE International Conference on Data Engineering, ICDE 2018, 2018

Ridesharing refers to a transportation scenario where travellers with similar itineraries and time schedules share a vehicle for a trip and split the travel cost, which may include fuel, tolls, and parking fees. Ridesharing is popular among travellers because it can reduce their travel costs, and it also holds the potential to reduce travel time, congestion, air pollution, and overall fuel consumption. However, existing ridesharing systems often offer each traveller only one choice that aims to minimize system-wide vehicle travel distance or time. We propose a solution that offers more options. Specifically, we do this by considering both pick-up time and price, so that travellers are able to choose the vehicle that matches their preferences best. In order to quickly identify vehicles that satisfy incoming ridesharing requests, we propose two efficient matching algorithms that follow the single-side and dual-side search paradigms, respectively. To further accelerate the matching, indexes on the road network and vehicles are developed, based on which several pruning heuristics are designed. Extensive experiments on a large Shanghai taxi dataset offer insights into the performance of our proposed techniques and compare with a baseline that extends the state-of-the-art method.


Lu Chen, Yunjun Gao, Zixian Liu, Xiaokui Xiao, Christian Søndergaard Jensen, Yifan Zhu ,"PTRider: A Price-and-Time-Aware Ridesharing System" in Proceedings of the VLDB Endowment, 2018

Ridesharing is popular among travellers because it can reduce their travel costs, and it also holds the potential to reduce travel time, congestion, air pollution, and overall fuel consumption. Existing ridesharing systems (e.g., lyft, uberPOOL) often offer each traveller only one choice that aims to minimize system-wide vehicle travel distance or time. In this demonstration, we present a price-and-time-aware ridesharing system, termed as PTRider, which provides more options. It considers both pick-up time and price, so that travellers are able to choose the vehicle matching their preferences best. To answer the ridesharing request in real time, PTRider builds indexes on the road network and vehicles separately, and utilizes corresponding efficient matching methods. A real-life dataset that contains 432,327 trips extracted from 17,000 Shanghai taxis for one day (May 29, 2009) is used to demonstrate that PTRider can return various options for every ridesharing request in real time.


Jilin Hu, Bin Yang, Chenjuan Guo, Christian Søndergaard Jensen (corresponding author) ,"Risk-aware path selection with time-varying, uncertain travel costs: a time series approach" in VLDB Journal, 2018

We address the problem of choosing the best paths among a set of candidate paths between the same origin–destination pair. This functionality is used extensively when constructing origin–destination matrices in logistics and flex transportation. Because the cost of a path, e.g., travel time, varies over time and is uncertain, there is generally no single best path. We partition time into intervals and represent the cost of a path during an interval as a random variable, resulting in an uncertain time series for each path. When facing uncertainties, users generally have different risk preferences, e.g., risk-loving or risk-averse, and thus prefer different paths. We develop techniques that, for each time interval, are able to find paths with non-dominated lowest costs while taking the users’ risk preferences into account. We represent risk by means of utility function categories and show how the use of first-order and two kinds of second-order stochastic dominance relationships among random variables makes it possible to find all paths with non-dominated lowest costs. We report on empirical studies with large uncertain time series collections derived from a 2-year GPS data set. The study offers insight into the performance of the proposed techniques, and it indicates that the best techniques combine to offer an efficient and robust solution.
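
The first-order dominance test used to discard candidate paths can be sketched directly on discrete cost distributions (the paper additionally uses two second-order variants and utility-function categories): path A dominates path B if, at every cost threshold, A is at least as likely to be at or below it.

    # Sketch of first-order stochastic dominance for travel-cost distributions
    # given as {cost: probability}; lower cost is better.
    def first_order_dominates(dist_a, dist_b):
        cdf_a = cdf_b = 0.0
        for x in sorted(set(dist_a) | set(dist_b)):
            cdf_a += dist_a.get(x, 0.0)
            cdf_b += dist_b.get(x, 0.0)
            if cdf_a < cdf_b - 1e-12:      # B is more likely to stay below x
                return False
        return True                        # A is never worse at any threshold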


Shuo Shang, Lisi Chen, Christian Søndergaard Jensen, Ji-Rong Wen, Panos Kalnis ,"Searching Trajectories by Regions of Interest" in 34th IEEE International Conference on Data Engineering, ICDE 2018, 2018

We propose and investigate a novel query type named trajectory search by regions of interest (TSR query). Given an argument set of trajectories, a TSR query takes a set of regions of interest as a parameter and returns the trajectory in the argument set with the highest spatial-density correlation to the query regions. This type of query is useful in applications such as trip planning and recommendation. To process the TSR query, a set of new metrics are defined to model spatial-density correlations. An efficient trajectory search algorithm is developed that exploits upper and lower bounds to prune the search space and that adopts a query-source selection strategy, as well as integrates a heuristic search strategy based on priority ranking to schedule multiple query sources. The performance of TSR query processing is studied in extensive experiments based on real and synthetic spatial data.


Xinjue Wang, Ke Deng, Jianxin Li, Jeffery Xu Yu, Christian Søndergaard Jensen, Xiaochun Yang ,"Targeted Influence Minimization in Social Networks" in 22nd Pacific-Asia Conference, 2018

An online social network can be used for the diffusion of malicious information like derogatory rumors, disinformation, hate speech, revenge pornography, etc. This motivates the study of influence minimization, which aims to prevent the spread of malicious information. Unlike previous influence minimization work, this study considers influence minimization in relation to a particular group of social network users, called targeted influence minimization. Thus, the objective is to protect a set of users, called target nodes, from malicious information originating from another set of users, called active nodes. This study also addresses two fundamental, but largely ignored, issues in different influence minimization problems: (i) the impact of a budget on the solution; (ii) robust sampling. To this end, two scenarios are investigated, namely unconstrained and constrained budgets. Given an unconstrained budget, we provide an optimal solution; given a constrained budget, we show the problem is NP-hard and develop a greedy algorithm with a (1−1/e)-approximation. More importantly, in order to solve the influence minimization problem in large, real-world social networks, we propose a robust sampling-based solution with a desirable theoretic bound. Extensive experiments using real social network datasets offer insight into the effectiveness and efficiency of the proposed solutions.
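
The constrained-budget case follows the standard greedy scheme behind (1−1/e) guarantees for monotone submodular objectives; the sketch below is that generic scheme, with the influence estimator left as a stand-in callback (the paper estimates it via robust sampling):

    # Generic greedy sketch: repeatedly block the candidate whose removal gives
    # the largest marginal drop in expected influence reaching the target nodes.
    # expected_target_influence(blocked_set) is an assumed estimator callback.
    def greedy_blocking(candidates, budget, expected_target_influence):
        blocked = set()
        base = expected_target_influence(blocked)
        for _ in range(budget):
            best, best_gain = None, 0.0
            for v in set(candidates) - blocked:
                gain = base - expected_target_influence(blocked | {v})
                if gain > best_gain:
                    best, best_gain = v, gain
            if best is None:
                break                     # no candidate still reduces influence
            blocked.add(best)
            base -= best_gain
        return blocked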


Michael Hanspeter Böhlen, Anton Dignös, Johann Gamper, Christian Søndergaard Jensen ,"Temporal Data Management—An Overview" in European Business Intelligence and Big Data Summer School, 2018

Despite the ubiquity of temporal data and considerable research on the effective and efficient processing of such data, database systems largely remain designed for processing the current state of some modeled reality. More recently, we have seen an increasing interest in the processing of temporal data that captures multiple states of reality. The SQL:2011 standard incorporates some temporal support, and commercial DBMSs have started to offer temporal functionality in a step-by-step manner, such as the representation of temporal intervals, temporal primary and foreign keys, and the support for so-called time-travel queries that enable access to past states. This tutorial gives an overview of state-of-the-art research results and technologies for storing, managing, and processing temporal data in relational database management systems. Following an introduction that offers a historical perspective, we provide an overview of basic temporal database concepts. Then we survey the state-of-the-art in temporal database research, followed by a coverage of the support for temporal data in the current SQL standard and the extent to which the temporal aspects of the standard are supported by existing systems. The tutorial ends by covering a recently proposed framework that provides comprehensive support for processing temporal data and that has been implemented in PostgreSQL.


Lei Chen, Yafei Li, Jianliang Xu, Christian S. Jensen ,"Towards Why-Not Spatial Keyword Top-k Queries: A Direction-Aware Approach" in IEEE Transactions on Knowledge and Data Engineering, 2018

With the continued proliferation of location-based services, a growing number of web-accessible data objects are geo-tagged and have text descriptions. An important query over such web objects is the direction-aware spatial keyword query that aims to retrieve the top-k objects that best match query parameters in terms of spatial distance and textual similarity in a given query direction. In some cases, it can be difficult for users to specify appropriate query parameters. After getting a query result, users may find some desired objects are unexpectedly missing and may therefore question the entire result. Enabling why-not questions in this setting may aid users to retrieve better results, thus improving the overall utility of the query functionality. This paper studies the direction-aware why-not spatial keyword top-k query problem. We propose efficient query refinement techniques to revive missing objects by minimally modifying users' direction-aware queries. We prove that the best refined query directions lie in a finite solution space for a special case and reduce the search for the optimal refinement to a linear programming problem for the general case. Extensive experimental studies demonstrate that the proposed techniques outperform a baseline method by two orders of magnitude and are robust in a broad range of settings.


Xin Ding, Lu Chen, Yunjun Gao, Christian Søndergaard Jensen, Hujun Bao ,"UlTraMan: A Unified Platform for Big Trajectory Data Management and Analytics" in Proceedings of the VLDB Endowment, 2018

Massive trajectory data is being generated by GPS-equipped devices, such as cars and mobile phones, and such data is used increasingly in transportation, location-based services, and urban computing. As a result, a variety of methods have been proposed for trajectory data management and analytics. However, traditional systems and methods are usually designed for very specific data management or analytics needs, which forces users to stitch together heterogeneous systems to analyze trajectory data in an inefficient manner. Targeting the overall data pipeline of big trajectory data management and analytics, we present a unified platform, termed UlTraMan. In order to achieve scalability, efficiency, persistence, and flexibility, (i) we extend Apache Spark with respect to both data storage and computing by seamlessly integrating a key-value store, and (ii) we enhance the MapReduce paradigm to allow flexible optimizations based on random data access. We study the resulting system's flexibility using case studies on data retrieval, aggregation analyses, and pattern mining. Extensive experiments on real and synthetic trajectory data are reported to offer insight into the scalability and performance of UlTraMan.


Xin Ding, Rui Chen, Lu Chen, Yunjun Gao, Christian Søndergaard Jensen ,"VIPTRA: Visualization and Interactive Processing on Big Trajectory Data" in 19th IEEE International Conference on Mobile Data Management, MDM 2018, 2018

Massive trajectory data is being collected and used widely in many applications such as transportation, location-based services, and urban computing. As a result, abundant methods and systems have been proposed for managing and processing trajectory data. However, it remains difficult for users to interact well with data management and processing, due to the lack of efficient data processing methods and effective visualization techniques for big trajectory data. In this demonstration, we present a new framework, VIPTRA, to process big trajectory data visually and interactively. VIPTRA builds upon UlTraMan, a distributed in-memory system for big trajectory data, and thus benefits from its high performance. The demonstration shows the efficiency of data processing and the user-friendly visualization and interaction techniques provided in VIPTRA, via several scenarios of visual analysis and trajectory editing tasks.


2017 Top

Robert Waury, Jilin Hu, Bin Yang, Christian S. Jensen ,"Assessing the accuracy benefits of on-the-fly trajectory selection in fine-grained travel-time estimation" in 18th IEEE International Conference on Mobile Data Management, MDM 2017, 2017

Today's one-size-fits-all approach to travel-time computation in spatial networks proceeds in two steps. In a preparatory off-line step, a set of distributions, e.g., one per hour of the day, is computed for each network segment. Then, when a path and a departure time are provided, a distribution for the path is computed on-line from pertinent pre-computed distributions. Motivated by the availability of massive trajectory data from vehicles, we propose a completely on-line approach, where distributions are computed from trajectories on-the-fly, i.e., when a query arrives. This new approach makes it possible to use arbitrary sets of underlying trajectories for a query. Specifically, we study the potential for accuracy improvements over the one-size-fits-all approach that can be obtained using the on-the-fly approach and report findings from an empirical study that suggest that the on-the-fly approach is able to improve accuracy significantly and has the potential to replace the current one-size-fits-all approach.
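
Illustration only: a small Python sketch of the on-the-fly idea, in which the travel-time sample for a queried path is assembled at query time from trajectories that cover the path and depart close to the query's departure time. The trajectory record format and the 30-minute window are assumptions for the example, not the paper's design.

    from datetime import datetime, timedelta

    def covers(traversed, path):
        """True if the segment sequence `path` occurs contiguously in `traversed`."""
        n, m = len(traversed), len(path)
        return any(traversed[i:i + m] == path for i in range(n - m + 1))

    def on_the_fly_distribution(trajectories, path, departure, window_minutes=30):
        """Empirical travel-time sample (seconds) for `path` near `departure`."""
        lo = departure - timedelta(minutes=window_minutes)
        hi = departure + timedelta(minutes=window_minutes)
        sample = [t["travel_time"] for t in trajectories
                  if lo <= t["departure"] <= hi and covers(t["segments"], path)]
        return sorted(sample)

    if __name__ == "__main__":
        trajs = [
            {"segments": ["e1", "e2", "e3"], "departure": datetime(2017, 5, 1, 8, 5), "travel_time": 320},
            {"segments": ["e2", "e3"],       "departure": datetime(2017, 5, 1, 8, 20), "travel_time": 210},
            {"segments": ["e1", "e2", "e3"], "departure": datetime(2017, 5, 1, 17, 0), "travel_time": 400},
        ]
        print(on_the_fly_distribution(trajs, ["e2", "e3"], datetime(2017, 5, 1, 8, 0)))  # [210, 320]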


Junling Liu, Ke Deng, Huanliang Sun, Yu Ge, Xiaofang Zhou, Christian Søndergaard Jensen ,"Clue-based Spatio-textual Query" in Proceedings of the VLDB Endowment, 2017




Shuo Shang, Lisi Chen, Zhewei Wei, Christian S. Jensen, Ji Rong Wen, Panos Kalnis ,"Collective travel planning in spatial networks" in 33rd IEEE International Conference on Data Engineering, ICDE 2017, 2017




Jinpeng Chen, Hua Lu, Ilkcan Keles, Christian S. Jensen ,"Crowdsourcing Based Evaluation of Ranking Approaches for Spatial Keyword Querying" in 18th IEEE International Conference on Mobile Data Management, MDM 2017, 2017




Lei Chen, Yafei Li, Jianliang Xu, Christian S. Jensen ,"Direction-Aware why-not spatial keyword Top-k queries" in 33rd IEEE International Conference on Data Engineering, ICDE 2017, 2017

With the continued proliferation of location-based services, a growing number of web-accessible data objects are geo-tagged and have text descriptions. An important query over such web objects is the direction-aware spatial keyword query that aims to retrieve the top-k objects that best match query parameters in terms of spatial distance and textual similarity in a given query direction. In some cases, it can be difficult for users to specify appropriate query parameters. After getting a query result, users may find some desired objects are unexpectedly missing and may therefore question the entire result. Enabling why-not questions in this setting may aid users to retrieve better results, thus improving the overall utility of the query functionality. This paper studies the direction-aware why-not spatial keyword top-k query problem. We propose efficient query refinement techniques to revive missing objects by minimally modifying users' direction-aware queries. Experimental studies demonstrate the efficiency and effectiveness of the proposed techniques.


Christian Søndergaard Jensen ,"Editorial: Updates to the Editorial Board" in A C M Transactions on Database Systems, 2017




Lu Chen, Yunjun Gao, Xinhan Li, Christian S. Jensen, Gang Chen ,"Efficient Metric Indexing for Similarity Search and Similarity Joins" in IEEE Transactions on Knowledge and Data Engineering, 2017

Spatial queries including similarity search and similarity joins are useful in many areas, such as multimedia retrieval, data integration, and so on. However, they are not supported well by commercial DBMSs. This may be due to the complex data types involved and the needs for flexible similarity criteria seen in real applications. In this paper, we propose a versatile and efficient disk-based index for metric data, the Space-filling curve and Pivot-based B+-tree (SPB-tree). This index leverages the B+-tree and uses a space-filling curve to cluster data into compact regions, thus achieving storage efficiency. It utilizes a small set of so-called pivots to significantly reduce the number of distance computations when using the index. Further, it makes use of a separate random access file to support a broad range of data. By design, it is easy to integrate the SPB-tree into an existing DBMS. We present efficient algorithms for processing similarity search and similarity joins, as well as corresponding cost models based on SPB-trees. Extensive experiments using both real and synthetic data show that, compared with state-of-the-art competitors, the SPB-tree has much lower construction cost, smaller storage size, and supports more efficient similarity search and similarity joins, with accurate cost models.
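
Illustration (a simplified in-memory stand-in, not the published index): the pivot-mapping idea behind the SPB-tree in Python. Objects are mapped to distances from a few pivots, the discretized distance vectors are interleaved into a space-filling-curve key (here a Morton/Z-order code), and the triangle inequality prunes candidates in a range query. All function names are hypothetical, and a real SPB-tree searches key ranges in a B+-tree rather than scanning a sorted table.

    def interleave(coords, bits=8):
        """Z-order (Morton) key from small non-negative integer coordinates."""
        key = 0
        for b in range(bits):
            for i, c in enumerate(coords):
                key |= ((c >> b) & 1) << (b * len(coords) + i)
        return key

    def build(objects, pivots, dist, scale=1):
        """Map each object to (z-key, pivot-distance vector, object), sorted by key."""
        table = []
        for o in objects:
            vec = [dist(o, p) for p in pivots]
            key = interleave([int(d * scale) for d in vec])
            table.append((key, vec, o))
        return sorted(table)

    def range_query(table, pivots, dist, q, r):
        """Objects within distance r of q, pruned via pivot distances."""
        qvec = [dist(q, p) for p in pivots]
        hits = []
        for _, vec, o in table:
            # lower bound from the triangle inequality: max_p |d(q,p) - d(o,p)| <= d(q,o)
            if max(abs(a - b) for a, b in zip(qvec, vec)) <= r and dist(q, o) <= r:
                hits.append(o)
        return hits

    if __name__ == "__main__":
        euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
        pts = [(1, 1), (2, 3), (8, 8), (9, 1), (4, 4)]
        pivots = [(0, 0), (10, 0)]
        tbl = build(pts, pivots, euclid)
        print(range_query(tbl, pivots, euclid, (3, 3), 2.0))   # [(4, 4), (2, 3)]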


Jilin Hu, Bin Yang, Christian Søndergaard Jensen, Yu Ma ,"Enabling time-dependent uncertain eco-weights for road networks" in Geoinformatica, 2017

Reduction of greenhouse gas (GHG) emissions from transportation is an essential part of the efforts to prevent global warming and climate change. Eco-routing, which enables drivers to use the most environmentally friendly routes, is able to substantially reduce GHG emissions from vehicular transportation. The foundation of eco-routing is a weighted-graph representation of a road network in which road segments, or edges, are associated with eco-weights that capture the GHG emissions caused by traversing the edges. Due to the dynamics of traffic, the eco-weights are best modeled as being time dependent and uncertain. We formalize the problem of assigning a time-dependent, uncertain eco-weight to each edge in a road network based on historical GPS records. In particular, a sequence of histograms is employed to describe the uncertain eco-weight of an edge at different time intervals. Compression techniques, including histogram merging and bucket reduction, are proposed to maintain compact histograms while retaining their accuracy. In addition, to better model real traffic conditions, virtual edges and extended virtual edges are proposed in order to represent adjacent edges with highly dependent travel costs. Based on the techniques above, different histogram aggregation methods are proposed to accurately estimate time-dependent GHG emissions for routes. Based on a 200-million GPS record data set collected from 150 vehicles in Denmark over two years, a comprehensive empirical study is conducted in order to gain insight into the effectiveness and efficiency of the proposed approach.
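
Illustration only: a Python sketch of representing an edge's eco-weight in one time interval as a histogram and compressing it by greedily merging adjacent buckets. The merge criterion used here (smallest combined probability mass) is an assumption for the example and is not claimed to be the paper's technique.

    def merge_pair(h, i):
        """Merge buckets i and i+1 of histogram h = [((lo, hi), prob), ...]."""
        (lo1, _), p1 = h[i]
        (_, hi2), p2 = h[i + 1]
        return h[:i] + [((lo1, hi2), p1 + p2)] + h[i + 2:]

    def compress(h, budget):
        """Greedily merge the adjacent bucket pair with the smallest combined
        probability mass until at most `budget` buckets remain."""
        while len(h) > budget:
            i = min(range(len(h) - 1), key=lambda j: h[j][1] + h[j + 1][1])
            h = merge_pair(h, i)
        return h

    if __name__ == "__main__":
        # hypothetical emissions (grams CO2) for one edge in the 7:00-8:00 interval
        hist = [((100, 120), 0.10), ((120, 140), 0.35), ((140, 160), 0.30),
                ((160, 180), 0.15), ((180, 200), 0.10)]
        print(compress(hist, 3))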


Ilkcan Keles, Matthias Schubert, Peer Kröger, Simonas Saltenis, Christian Søndergaard Jensen ,"Extracting Visited Points of Interest from Vehicle Trajectories"

Identifying visited points of interest (PoIs) from vehicle trajectories remains an open problem that is difficult due to vehicles parking often at some distance from the visited PoI and due to some regions having a high PoI density. We propose a visited PoI extraction (VPE) method that identifies visited PoIs using a Bayesian network. The method considers stay duration, weekday, arrival time, and PoI category to compute the probability that a PoI is visited. We also provide a method to generate labeled data from unlabeled GPS trajectories. An experimental evaluation shows that VPE achieves a precision@3 value of 0.8, indicating that VPE is able to model the relationship between the temporal features of a stop and the category of the visited PoI.
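
Illustration (a simplified stand-in): the paper uses a Bayesian network over stop features; the Python sketch below instead uses a plain naive Bayes model over discretized stay duration, weekday, and arrival hour to rank PoI categories for a stop, only to show the probabilistic ranking step. The feature encodings, the smoothing constant, and all names are assumptions.

    from collections import Counter, defaultdict

    def train(labeled_stops):
        """labeled_stops: list of (feature dict, visited PoI category)."""
        prior = Counter(cat for _, cat in labeled_stops)
        cond = defaultdict(Counter)        # (feature name, category) -> value counts
        for feats, cat in labeled_stops:
            for name, value in feats.items():
                cond[(name, cat)][value] += 1
        return prior, cond

    def rank_categories(prior, cond, feats, alpha=1.0):
        """Categories sorted by smoothed posterior probability for one stop."""
        total = sum(prior.values())
        scores = {}
        for cat, n in prior.items():
            score = n / total
            for name, value in feats.items():
                counts = cond[(name, cat)]
                # Laplace smoothing; assumes roughly 10 distinct values per feature
                score *= (counts[value] + alpha) / (sum(counts.values()) + alpha * 10)
            scores[cat] = score
        z = sum(scores.values())
        return sorted(((s / z, c) for c, s in scores.items()), reverse=True)

    if __name__ == "__main__":
        data = [({"duration": "short", "weekday": "sat", "hour": "noon"}, "restaurant"),
                ({"duration": "short", "weekday": "mon", "hour": "noon"}, "restaurant"),
                ({"duration": "long", "weekday": "mon", "hour": "morning"}, "office"),
                ({"duration": "long", "weekday": "tue", "hour": "morning"}, "office")]
        prior, cond = train(data)
        stop = {"duration": "short", "weekday": "tue", "hour": "noon"}
        for prob, cat in rank_categories(prior, cond, stop):
            print(f"{cat}: {prob:.2f}")        # restaurant should rank first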


Saad Aljubayrin, Jianzhong Qi, Christian S. Jensen, Rui Zhang, Zhen He, Yuan Li ,"Finding lowest-cost paths in settings with safe and preferred zones" in VLDB Journal, 2017

We define and study Euclidean and spatial network variants of a new path finding problem: given a set of safe or preferred zones with zero or low cost, find paths that minimize the cost of travel from an origin to a destination. In this problem, the entire space is passable, with preference given to safe or preferred zones. Existing algorithms for problems that involve unsafe regions to be avoided strictly are not effective for this new problem. To solve the Euclidean variant, we devise a transformation of the continuous data space with safe zones into a discrete graph upon which shortest path algorithms apply. A naive transformation yields a large graph that is expensive to search. In contrast, our transformation exploits properties of hyperbolas in Euclidean space to safely eliminate graph edges, thus improving performance without affecting correctness. To solve the spatial network variant, we propose a different graph-to-graph transformation that identifies critical points that serve the same purpose as do the hyperbolas, thus also avoiding the extraneous edges. Having solved the problem for safe zones with zero costs, we extend the transformations to the weighted version of the problem, where travel in preferred zones has nonzero costs. Experiments on both real and synthetic data show that our approaches outperform baseline approaches by more than an order of magnitude in graph construction time, storage space, and query response time.


Lu Chen, Yunjun Gao, Aoxiao Zhong, Christian S. Jensen, Gang Chen, Baihua Zheng ,"Indexing metric uncertain data for range queries and range joins" in VLDB Journal, 2017

Range queries and range joins in metric spaces have applications in many areas, including GIS, computational biology, and data integration, where metric uncertain data exist in different forms, resulting from circumstances such as equipment limitations, high-throughput sequencing technologies, and privacy preservation. We represent metric uncertain data by using an object-level model and a bi-level model, respectively. Two novel indexes, the uncertain pivot B+-tree (UPB-tree) and the uncertain pivot B+-forest (UPB-forest), are proposed in order to support probabilistic range queries and range joins for a wide range of uncertain data types and similarity metrics. Both index structures use a small set of effective pivots chosen based on a newly defined criterion and employ the B+-tree(s) as the underlying index. In addition, we present efficient metric probabilistic range query and metric probabilistic range join algorithms, which utilize validation and pruning techniques based on derived probability lower and upper bounds. Extensive experiments with both real and synthetic data sets demonstrate that, compared against existing state-of-the-art indexes for metric uncertain data, the UPB-tree and the UPB-forest incur much lower construction costs, consume less storage space, and can support more efficient metric probabilistic range queries and metric probabilistic range joins.


Christian Søndergaard Jensen, Dan Lin, Beng Chin Ooi ,"Indexing of Moving Objects, Bx-Tree"




Sadegh Nobari, Qiang Qu, Christian Søndergaard Jensen ,"In-Memory Spatial Join: The Data Matters!" in 20th International Conference on Extending Database Technology, 2017




Johannes Lindhart Borresen, Ove Andersen, Christian Søndergaard Jensen, Kristian Torp ,"Interactive Intersection Analysis using Trajectory Data" in 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2017




Francesco Lettich, Salvatore Orlando, Claudio Silvestri, Christian S. Jensen ,"Manycore GPU processing of repeated range queries over streams of moving objects observations" in Concurrency Computation, 2017

The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised systems to feature low latency and high scalability. In this paper, we focus on a specific data-intensive problem concerning the repeated processing of huge amounts of range queries over massive sets of moving objects, where the spatial extent of queries and objects is continuously modified over time. To tackle this problem and significantly accelerate query processing, we devise a hybrid CPU/GPU pipeline that compresses data output and saves query processing work. The devised system relies on an ad-hoc spatial index leading to a problem decomposition that results in a set of independent data-parallel tasks. The index is based on a point-region quadtree space decomposition and makes it possible to tackle effectively a broad range of spatial object distributions, even very skewed ones. Also, to deal with the architectural peculiarities and limitations of the GPUs, we adopt non-trivial GPU data structures that avoid the need for locked memory accesses while favouring coalesced memory accesses, thus enhancing the overall memory throughput. To the best of our knowledge, this is the first work that exploits GPUs to efficiently solve repeated range queries over massive sets of continuously moving objects, possibly characterized by highly skewed spatial distributions. In comparison with state-of-the-art CPU-based implementations, our method achieves significant speedups on the order of 10–20×, depending on the dataset.
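
Illustration only: a CPU-side Python sketch of the point-region quadtree decomposition that underlies the pipeline, with insertion and range search. The GPU-specific aspects (data layout, coalesced accesses, wait-free output handling) are deliberately omitted; class and parameter names are hypothetical.

    class PRQuadTree:
        def __init__(self, x0, y0, x1, y1, capacity=4):
            self.box = (x0, y0, x1, y1)
            self.capacity = capacity
            self.points = []
            self.children = None           # four sub-trees once the node splits

        def insert(self, p):
            if self.children is None:
                self.points.append(p)
                if len(self.points) > self.capacity:
                    self._split()
            else:
                self._child(p).insert(p)

        def _split(self):
            x0, y0, x1, y1 = self.box
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            self.children = [PRQuadTree(x0, y0, mx, my, self.capacity),
                             PRQuadTree(mx, y0, x1, my, self.capacity),
                             PRQuadTree(x0, my, mx, y1, self.capacity),
                             PRQuadTree(mx, my, x1, y1, self.capacity)]
            pts, self.points = self.points, []
            for p in pts:
                self._child(p).insert(p)

        def _child(self, p):
            x0, y0, x1, y1 = self.box
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

        def range(self, qx0, qy0, qx1, qy1, out):
            x0, y0, x1, y1 = self.box
            if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
                return out                 # query box does not overlap this cell
            if self.children is None:
                out += [p for p in self.points
                        if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
            else:
                for c in self.children:
                    c.range(qx0, qy0, qx1, qy1, out)
            return out

    if __name__ == "__main__":
        tree = PRQuadTree(0, 0, 100, 100)
        for p in [(10, 10), (12, 15), (80, 80), (55, 40), (11, 14), (13, 12)]:
            tree.insert(p)
        print(tree.range(5, 5, 20, 20, []))    # the four points near (10, 10)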


Christian Søndergaard Jensen, Dan Lin, Beng Chin Ooi ,"Maximum Update Interval in Moving Objects Databases"




Lu Chen, Yunjun Gao, Baihua Zheng, Christian Søndergaard Jensen, Hanyu Yang, Keyu Yang ,"Pivot-based Metric Indexing" in Proceedings of the VLDB Endowment, 2017




Xike Xie, Xin Lin, Jianliang Xu, Christian S. Jensen ,"Reverse keyword-based location search" in 33rd IEEE International Conference on Data Engineering, ICDE 2017, 2017

The proliferation of geo-textual data gives prominence to spatial keyword search. The basic top-k spatial keyword query returns k geo-textual objects that rank the highest according to their textual relevance and spatial proximity to query keywords and a query location. We define, study, and provide means of computing the reverse top-k keyword-based location query. This new type of query takes a set of keywords, a query object q, and a number k as arguments, and it returns a spatial region such that any top-k spatial keyword query with the query keywords and a location in this region would contain object q in its result. This query targets applications in market analysis, geographical planning, and location optimization, and it may support applications related to safe zones and influence zones that are used widely in location-based services. We show that computing an exact query result requires evaluating and merging a set of weighted Voronoi cells, which is expensive. We therefore devise effective algorithms that approximate result regions with quality guarantees. We develop novel pruning techniques on top of an index, and we offer a series of optimization techniques that aim to further accelerate query processing. Empirical studies suggest that the proposed query processing is efficient and scalable.


Jingwen Zhao, Yunjun Gao, Gang Chen, Christian S. Jensen, Rui Chen, Deng Cai ,"Reverse Top-k geo-social keyword queries in road networks" in 33rd IEEE International Conference on Data Engineering, ICDE 2017, 2017

Identifying prospective customers is an important aspect of marketing research. In this paper, we provide support for a new type of query, the Reverse Top-k Geo-Social Keyword (RkGSK) query. This query takes into account spatial, textual, and social information, and finds prospective customers for geo-tagged objects. As an example, a restaurant manager might apply the query to find prospective customers. To address this, we propose a hybrid index, the GIM-Tree, which indexes the locations, keywords, and social information of geo-tagged users and objects, and then, using the GIM-Tree, we present efficient RkGSK query processing algorithms that exploit several pruning strategies. The effectiveness of RkGSK retrieval is characterized via a case study, and extensive experiments using real datasets offer insight into the efficiency of the proposed index and algorithms.


Shuo Shang, Lisi Chen, Christian S. Jensen, Ji-Rong Wen, Panos Kalnis ,"Searching trajectories by regions of interest" in IEEE Transactions on Knowledge and Data Engineering, 2017

With the increasing availability of moving-object tracking data, trajectory search is increasingly important. We propose and investigate a novel query type named trajectory search by regions of interest (TSR query). Given an argument set of trajectories, a TSR query takes a set of regions of interest as a parameter and returns the trajectory in the argument set with the highest spatial-density correlation to the query regions. This type of query is useful in many popular applications such as trip planning and recommendation, and location-based services in general. TSR query processing faces three challenges: how to model the spatial-density correlation between query regions and data trajectories, how to effectively prune the search space, and how to effectively schedule multiple so-called query sources. To tackle these challenges, a series of new metrics are defined to model spatial-density correlations. An efficient trajectory search algorithm is developed that exploits upper and lower bounds to prune the search space and that adopts a query-source selection strategy, as well as integrates a heuristic search strategy based on priority ranking to schedule multiple query sources. The performance of TSR query processing is studied in extensive experiments based on real and synthetic spatial data.


Nectaria Tryfona, Christian Søndergaard Jensen ,"Spatiotemporal Database Modeling with an Extended Entity-Relationship Model"




Willi Mann, Nikolaus Augsten, Christian Søndergaard Jensen ,"SWOOP: Top-k Similarity Joins over Set Streams" in 43rd International Conference on Very Large Data Bases, VLDB 2017, 2017




Shuo Shang, Lisi Chen, Zhewei Wei, Christian Søndergaard Jensen, Kai Zheng, Panos Kalnis ,"Trajectory Similarity Join in Spatial Networks" in Proceedings of the VLDB Endowment, 2017




Christian Søndergaard Jensen ,"Updates to the TODS Editorial Board" in SIGMOD Record, 2017




Lei Chen (editor), Christian Søndergaard Jensen (editor), Cyrus Shahabi (editor), Xiaochun Yang (editor), Xiang Lian (editor) ,"Web and Big Data: First International Joint Conference, APWeb-WAIM 2017, Part II"




Lei Chen (editor), Christian Søndergaard Jensen (editor), Cyrus Shahabi (editor), Xiaochun Yang (editor), Xiang Lian (editor) ,"Web and Big Data: First International Joint Conference, APWeb-WAIM 2017, Proceedings, Part I"




2016 Top

Xie, X., P. Jin, M.-L. Yiu, J. Du, M. Yuan, C. S. Jensen ,"Enabling Scalable Geographic Service Sharing with Weighted Imprecise Voronoi Cells" in IEEE Transactions on Knowledge and Data Engineering, 28(2): 439–453, 2016

Publication
Online at IEEE Xplore Digital Library

We provide techniques that enable a scalable so-called Volunteered Geographic Services system. This system targets the increasing populations of online mobile users, e.g., smartphone users, enabling such users to provide location-based services to each other, thus enabling citizen reporter or citizen as a sensor scenarios. More specifically, the system allows users to register as service volunteers, or micro-service providers, by accepting service descriptions and periodically updated locations from such volunteers; and the system allows users to subscribe to notifications of available, nearby relevant services by accepting subscriptions, formalized as continuous queries, that take service preferences and user locations as arguments and return relevant services. Services are ranked according to their relevance and distance to a query, and the highest ranked services are returned. The key challenge addressed is that of scalably providing up-to-date results to queries when the query locations change continuously. This is achieved by the proposal of a new so-called safe-zone model. With safe zones, query results are accompanied by safe zones with the property that a query result remains the same for all locations in its safe zone. Then, query users need only notify the system when they exit their current safe zone. Existing safe-zone models fall short in the paper's setting. The new model is enabled by (i) weighted and (ii) set weighted imprecise Voronoi cells. The paper covers underlying concepts, properties, and algorithms, and it covers applications in VGS tracking and presents findings of empirical performance studies.


Lu, H., C. Guo, B. Yang, C. S. Jensen ,"Finding Frequently Visited Indoor POIs Using Symbolic Indoor Tracking Data" in Proceedings of the 19th International Conference on Extending Database Technology, Bordeaux, France, pp. 449–460, 2016

Publication
Online at OpenProceedings

Indoor tracking data is being amassed due to the deployment of indoor positioning technologies. Analysing such data discloses useful insights that are otherwise hard to obtain. For example, by studying tracking data from an airport, we can identify the shops and restaurants that are most popular among passengers. In this paper, we study two query types for finding frequently visited Points of Interest (POIs) from symbolic indoor tracking data. The snapshot query finds those POIs that were most frequently visited at a given time point, whereas the interval query finds such POIs for a given time interval. A typical example of symbolic tracking is RFID-based tracking, where an object with an RFID tag is detected by an RFID reader when the object is in the reader’s detection range. A symbolic indoor tracking system deploys a limited number of proximity detection devices, like RFID readers, at preselected locations, covering only part of the host indoor space. Consequently, symbolic tracking data is inherently uncertain and only enables the discrete capture of the trajectories of indoor moving objects in terms of coarse regions. We provide uncertainty analyses of the data in relation to the two kinds of queries. The outcomes of the analyses enable us to design processing algorithms for both query types. An experimental evaluation with both real and synthetic data suggests that the framework and algorithms enable efficient and scalable query processing.


2015 Top

Kaul, M., R. C.-W. Wong, C. S. Jensen ,"New Lower and Upper Bounds for Shortest Distance Queries on Terrains" in Proceedings of the VLDB Endowment, 9(3): 168–179, 2015

Publication
Online at VLDB

The increasing availability of massive and accurate laser data enables the processing of spatial queries on terrains. As shortest-path computation, an integral element of query processing, is inherently expensive on terrains, a key approach to enabling efficient query processing is to reduce the need for exact shortest-path computation in query processing. We develop new lower and upper bounds on terrain shortest distances that are provably tighter than any existing bounds. Unlike existing bounds, the new bounds do not rely on the quality of the triangulation. We show how use of the new bounds speeds up query processing by reducing the need for exact distance computations. Speedups of nearly an order of magnitude are demonstrated empirically for well-known spatial queries.


Chen, L., Y. Gao, Z. Xing, C. S. Jensen, G. Chen ,"I2RS: A Distributed Geo-Textual Image Retrieval and Recommendation System" in Proceedings of the VLDB Endowment, 8(12): 1884–1887 (demo paper), 2015

Publication
Online at VLDB

Massive amounts of geo-tagged and textually annotated images are provided by online photo services such as Flickr and Zommr. However, most existing image retrieval engines only consider text annotations. We present I2RS, a system that allows users to view geo-textual images on Google Maps, find hot topics within a specific geographic region and time period, retrieve images similar to a query image, and receive recommended images that they might be interested in. I2RS is a distributed geo-textual image retrieval and recommendation system that employs SPB-trees to index geo-textual images, and that utilizes metric similarity queries, including top-m spatio-temporal range and k nearest neighbor queries, to support geo-textual image retrieval and recommendation. The system adopts the browser-server model, where the server is deployed in a distributed environment that enables efficiency and scalability for huge amounts of data and requests. A rich set of 100 million geo-textual images crawled from Flickr is used to demonstrate that I2RS can return high-quality answers in an interactive way and support efficient updates for high image arrival rates.


Jin, P., X. Xie, C. S. Jensen, Y. Jin, L. Yue ,"HAG: An Energy-Proportional Data Storage Scheme for Disk Array Systems" in Journal of Computer Science and Technology, 30(4): 679–695, 2015

Publication
Online at Springer

Energy consumption has been a critical issue for data storage systems, especially for modern data centers. A recent survey has shown that power costs amount to about 50% of the total cost of ownership in a typical data center, with about 27% of the system power being consumed by storage systems. This paper aims to provide an effective solution to reducing the energy consumed by disk storage systems. Differing from previous approaches, we adopt two new designs. 1) We introduce a hotness-aware and group-based system model (HAG) to organize the disks, in which all disks are partitioned into a hot group and a cold group. We only migrate files between the two groups and avoid migration within a single group, so that we are able to reduce the total cost of file migration. 2) We use an on-demand approach to reorganize files among the disks based on workload changes as well as changes in data hotness. We conduct trace-driven experiments involving two real and nine synthetic traces, and we make detailed comparisons between our method and competitor methods according to different metrics. The results show that our method can dynamically select hot files and disks when the workload changes and that it is able to reduce energy consumption for all the traces. Furthermore, its time performance is comparable to that of the compared algorithms. In general, our method exhibits the best energy efficiency in all experiments, and it is capable of maintaining an improved trade-off between performance and energy consumption.
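
Illustration only: a Python sketch of the hot/cold grouping idea, in which files are ranked by access counts, the hottest files are assigned to the hot disk group, and migration is computed only between the two groups when the workload shifts. The function names, capacity, and access counts are assumptions for the example.

    def partition(files, hot_capacity):
        """files: {name: access_count}. Return (hot set, cold set)."""
        ranked = sorted(files, key=files.get, reverse=True)
        return set(ranked[:hot_capacity]), set(ranked[hot_capacity:])

    def on_demand_migration(current_hot, files, hot_capacity):
        """Recompute the hot set from the latest access counts and report which
        files must move between the hot and cold groups."""
        new_hot, _ = partition(files, hot_capacity)
        return new_hot, new_hot - current_hot, current_hot - new_hot

    if __name__ == "__main__":
        counts = {"a": 90, "b": 75, "c": 10, "d": 5, "e": 60, "f": 2}
        hot, cold = partition(counts, hot_capacity=3)
        print("hot:", sorted(hot))                          # ['a', 'b', 'e']
        counts["c"] = 120                                    # workload shifts
        hot, promote, demote = on_demand_migration(hot, counts, 3)
        print("promote:", promote, "demote:", demote)        # {'c'} / {'e'}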


Skovsgaard, A., C. S. Jensen ,"Finding top-k relevant groups of spatial web objects" in The VLDB Journal, 24(4): 537–555, 2015

Publication
Online at Springer

The web is increasingly being accessed from geo-positioned devices such as smartphones, and rapidly increasing volumes of web content are geo-tagged. In addition, studies show that a substantial fraction of all web queries has local intent. This development motivates the study of advanced spatial keyword-based querying of web content. Previous research has primarily focused on the retrieval of the top-k individual spatial web objects that best satisfy a query specifying a location and a set of keywords. This paper proposes a new type of query functionality that returns top-k groups of objects while taking into account aspects such as group density, distance to the query, and relevance to the query keywords. To enable efficient processing, novel indexing and query processing techniques for single and multiple keyword queries are proposed. Empirical performance studies with an implementation of the techniques and real data suggest that the proposals are viable in practical settings.


Guo, C., B. Yang, O. Andersen, C. S. Jensen, K. Torp ,"EcoMark 2.0: empowering eco-routing with vehicular environmental models and actual vehicle fuel consumption data" in GeoInformatica, 19(3): 567–599, 2015

Publication
Online at Springer

Eco-routing is a simple yet effective approach to substantially reducing the environmental impact, e.g., fuel consumption and greenhouse gas (GHG) emissions, of vehicular transportation. Eco-routing relies on the ability to reliably quantify the environmental impact of vehicles as they travel in a spatial network. The procedure of quantifying such vehicular impact for road segments of a spatial network is called eco-weight assignment. EcoMark 2.0 proposes a general framework for eco-weight assignment to enable eco-routing. It studies the abilities of six instantaneous and five aggregated models to estimate vehicular environmental impact. In doing so, it utilizes travel information derived from GPS trajectories (i.e., velocities and accelerations) and actual fuel consumption data obtained from vehicles. The framework covers analyses of actual fuel consumption, impact model calibration, and experiments for assessing the utility of the impact models in assigning eco-weights. The application of EcoMark 2.0 indicates that the instantaneous model EMIT and the aggregated model SIDRA-Running are suitable for assigning eco-weights under varying circumstances. In contrast, other instantaneous models should not be used for assigning eco-weights, and other aggregated models can be used for assigning eco-weights under certain circumstances.


Cao, X., G. Cong, C. S. Jensen ,"Efficient Processing of Spatial Group Keyword Queries" in ACM Transactions on Database Systems, 40(2), Article 13, 48 pages, 2015

Publication
ACM Author-Izer

With the proliferation of geo-positioning and geo-tagging techniques, spatio-textual objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group together satisfy a query. We define the problem of retrieving a group of spatio-textual objects such that the group's keywords cover the query's keywords and such that the objects are nearest to the query location and have the smallest inter-object distances. Specifically, we study three instantiations of this problem, all of which are NP-hard. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. In addition, we solve the problem of retrieving the top-k groups for the three instantiations, and we study a weighted version of the problem that incorporates object weights. We present empirical studies that offer insight into the efficiency of the solutions, as well as the accuracy of the approximate solutions.
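
Illustration (not the paper's algorithms): a hedged Python sketch of a greedy heuristic that builds a group of objects covering the query keywords, preferring objects that cover many uncovered keywords and are close to the query location. The paper's exact and approximation algorithms optimize richer cost functions; this only illustrates the covering constraint.

    from math import hypot

    def greedy_group(objects, q_loc, q_keywords):
        """objects: list of (id, (x, y), keyword set). Returns a covering group
        of object ids, or None if the query keywords cannot be covered."""
        uncovered, group = set(q_keywords), []
        candidates = list(objects)
        while uncovered:
            scored = [(len(kw & uncovered), -hypot(x - q_loc[0], y - q_loc[1]), oid, kw)
                      for oid, (x, y), kw in candidates if kw & uncovered]
            if not scored:
                return None                          # keywords cannot be covered
            gain, _, oid, kw = max(scored)           # most new keywords, then nearest
            group.append(oid)
            uncovered -= kw
            candidates = [o for o in candidates if o[0] != oid]
        return group

    if __name__ == "__main__":
        objs = [("cafe", (1, 1), {"coffee", "wifi"}),
                ("library", (2, 0), {"wifi", "books"}),
                ("bookshop", (9, 9), {"books", "coffee"})]
        print(greedy_group(objs, (0, 0), {"coffee", "wifi", "books"}))  # ['cafe', 'library']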


Shang, S., K. Zheng, C. S. Jensen, B. Yang, P. Kalnis, G. Li, J.-R. Wen ,"Discovery of Path Nearby Clusters in Spatial Networks" in IEEE Transactions on Knowledge and Data Engineering, 27(6): 1505–1518, 2015

Publication
Online at IEEE

The discovery of regions of interest in large cities is an important challenge. We propose and investigate a novel query called the path nearby cluster (PNC) query that finds regions of potential interest (e.g., sightseeing places and commercial districts) with respect to a user-specified travel route. Given a set of spatial objects O (e.g., POIs, geo-tagged photos, or geo-tagged tweets) and a query route q, if a cluster c has high spatial-object density and is spatially close to q, it is returned by the query (a cluster is a circular region defined by a center and a radius). This query aims to bring important benefits to users in popular applications such as trip planning and location recommendation. Efficient computation of the PNC query faces two challenges: how to prune the search space during query processing, and how to identify clusters with high density effectively. To address these challenges, a novel collective search algorithm is developed. Conceptually, the search process is conducted in the spatial and density domains concurrently. In the spatial domain, network expansion is adopted, and a set of vertices are selected from the query route as expansion centers. In the density domain, clusters are sorted according to their density distributions and they are scanned from the maximum to the minimum. A pair of upper and lower bounds are defined to prune the search space in the two domains globally. The performance of the PNC query is studied in extensive experiments based on real and synthetic spatial data.


Yang, B., C. Guo, Y. Ma, C. S. Jensen ,"Towards Personalized, Context-Aware Routing" in The VLDB Journal, 24(2): 297–318, 2015

Publication
Online at Springer

A driver’s choice of a route to a destination may depend on the route’s length and travel time, but a multitude of other, possibly hard-to-formalize aspects, may also factor into the driver’s decision. There is evidence that a driver’s choice of route is context dependent, e.g., varies across time, and that route choice also varies from driver to driver. In contrast, conventional routing services support little in the way of context dependence, and they deliver the same routes to all drivers. We study how to identify context-aware driving preferences for individual drivers from historical trajectories, and thus how to provide foundations for personalized navigation, but also professional driver education and traffic planning. We provide techniques that are able to capture time-dependent and uncertain properties of dynamic travel costs, such as travel time and fuel consumption, from trajectories, and we provide techniques capable of capturing the driving behaviors of different drivers in terms of multiple dynamic travel costs. Further, we propose techniques that are able to identify a driver’s contexts and then to identify driving preferences for each context using historical trajectories from the driver. Empirical studies with a large trajectory data set offer insight into the design properties of the proposed techniques and suggest that they are effective.


Wu, D., B. Choi, J. Xu, C. S. Jensen ,"Authentication of Moving Top-k Spatial Keyword Queries" in IEEE Transactions on Knowledge and Data Engineering, 27(4): 922–935, 2015

Publication
Online at IEEE

A moving top-k spatial keyword (MkSK) query, which takes into account a continuously moving query location, enables a mobile client to be continuously aware of the top-k spatial web objects that best match a query with respect to location and text relevance. The increasing mobile use of the web and the proliferation of geo-positioning render it of interest to consider a scenario where spatial keyword search is outsourced to a separate service provider capable of handling the voluminous spatial web objects available from various sources. A key challenge is that the service provider may return inaccurate or incorrect query results (intentionally or not), e.g., due to cost considerations or attacks by hackers. Therefore, it is attractive to be able to authenticate the query results at the client side. Existing authentication techniques are either inefficient or inapplicable for the kind of query we consider. We propose new authentication data structures, the MIR-tree and MIR*-tree, that enable the authentication of MkSK queries at low computation and communication costs. We design a verification object for authenticating MkSK queries, and we provide algorithms for constructing verification objects and using these for verifying query results. A thorough experimental study on real data shows that the proposed techniques are capable of outperforming two baseline algorithms by orders of magnitude.
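
Illustration only: the authentication machinery builds on Merkle-style hash digests. The Python sketch below shows the generic building block, a Merkle hash tree with membership proofs, so a client can re-derive a signed root digest from a returned object and the sibling digests in a verification object. It is not the MIR-tree or MIR*-tree, which additionally embed spatial and textual information.

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def build_levels(leaves):
        """All tree levels, bottom-up; each level is a list of digests."""
        level = [h(x) for x in leaves]
        levels = [level]
        while len(level) > 1:
            if len(level) % 2:
                level = level + [level[-1]]          # duplicate last node if odd
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            levels.append(level)
        return levels

    def prove(levels, index):
        """Sibling digests from leaf `index` up to the root (the VO skeleton)."""
        proof = []
        for level in levels[:-1]:
            if len(level) % 2:
                level = level + [level[-1]]
            sibling = index ^ 1
            proof.append((level[sibling], sibling < index))
            index //= 2
        return proof

    def verify(root, leaf, proof):
        digest = h(leaf)
        for sibling, sibling_is_left in proof:
            digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
        return digest == root

    if __name__ == "__main__":
        objects = [b"obj-17", b"obj-23", b"obj-42", b"obj-99", b"obj-7"]
        levels = build_levels(objects)
        root = levels[-1][0]                         # published/signed by the data owner
        proof = prove(levels, 2)                     # prove membership of b"obj-42"
        print(verify(root, b"obj-42", proof))        # True
        print(verify(root, b"obj-43", proof))        # False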


Keles, I., S. Saltenis, C. S. Jensen ,"Synthesis of Partial Rankings of Points of Interest Using Crowdsourcing" in Proceedings of the Ninth Workshop on Geographic Information Retrieval, Paris, France, article 15, 10 pages, 2015

Publication
ACM Author-Izer

The web is increasingly being accessed from mobile devices, and studies suggest that a large fraction of keyword-based search engine queries have local intent, meaning that users are interested in local content and that the underlying ranking function should take into account both relevance to the query keywords and the query location. A key challenge in being able to make progress on the design of ranking functions is to be able to assess the quality of the results returned by ranking functions. We propose a model that synthesizes a ranking of points of interest from answers to crowdsourced pairwise relevance questions. To evaluate the model, we propose an innovative methodology that enables evaluation of the quality of synthesized rankings in a simulated setting. We report on an experimental evaluation based on the methodology that shows that the proposed model produces promising results in pertinent settings and that it is capable of outperforming an approach based on majority voting.
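
Illustration (a simple stand-in, not the proposed model): a Python sketch that synthesizes a ranking from crowdsourced pairwise relevance answers by majority voting per pair followed by a Copeland-style win count. It only illustrates the input/output shape of the task; all names and the toy answers are made up.

    from collections import Counter, defaultdict

    def synthesize_ranking(answers):
        """answers: list of (poi_a, poi_b, winner). Returns PoIs ranked by the
        number of pairwise duels they win under per-pair majority voting."""
        votes = defaultdict(Counter)                  # unordered pair -> winner votes
        for a, b, winner in answers:
            votes[frozenset((a, b))][winner] += 1
        wins = Counter()
        for pair, counts in votes.items():
            for poi in pair:
                wins[poi] += 0                        # ensure every PoI appears
            winner, _ = counts.most_common(1)[0]
            wins[winner] += 1
        return [poi for poi, _ in wins.most_common()]

    if __name__ == "__main__":
        crowd = [("museum", "park", "museum"), ("museum", "park", "museum"),
                 ("museum", "park", "park"),   ("park", "cafe", "park"),
                 ("museum", "cafe", "museum")]
        print(synthesize_ranking(crowd))   # ['museum', 'park', 'cafe']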


Silvestri, C., F. Lettich, S. Orlando, C. S. Jensen ,"A wait-free output data structure for GPU-based streaming query processing" in Proceedings of the 23rd Italian Symposium on Advanced Database Systems, Gaeta, Italy, pp. 232–239, 2015

Publication

The performance of GPU-based algorithms can be reduced significantly by contention among memory accesses and by locking. We focus on high-volume output in GPU-based algorithms for streaming query processing: a very large number of cores process input streams and simultaneously produce a sustained output stream whose volume is sometimes orders of magnitude larger than that of the input streams. In this context, several cores can produce results simultaneously that must be written to the output buffer according to some order and without conflicts with other writers. To enable this behavior, we propose a wait-free bitmap-based data structure and a usage pattern that combine to obviate the use of locks and atomic operations. In our experiments, where the GPU-based algorithm considered is otherwise unchanged, the introduction of the new wait-free data structure entails a performance improvement of one order of magnitude.


Čeikute, V., C. S. Jensen ,"Vehicle Routing With User-Generated Trajectory Data" in Proceedings of the Sixteenth IEEE International Conference on Mobile Data Management - Volume I, Pittsburgh, PA, pp. 14–23, 2015

Publication
Online at IEEE

Rapidly increasing volumes of GPS data collected from vehicles provide new and increasingly comprehensive insight into the routes that drivers prefer. While routing services generally compute shortest or fastest routes, recent studies suggest that local drivers often prefer routes that are neither shortest nor fastest, indicating that drivers value route properties that are diverse and hard to quantify or even identify. We propose a routing service that uses an existing routing service while exploiting the availability of historical route usage data from local drivers. Given a source and destination, the service recommends a corresponding route that is most preferred by local drivers. It uses a route preference function that takes into account the number of distinct drivers and the number of trips associated with a route, as well as temporal aspects of the trips. The paper provides empirical studies with real route usage data and an existing online routing service.


Chen, L., Y. Gao, C. S. Jensen, X. Li, B. Zheng, G. Chen ,"Indexing Metric Uncertain Data for Range Queries" in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Vic., Australia, pp. 951–965, 2015

Publication
ACM Author-Izer

Range queries in metric spaces have applications in many areas such as multimedia retrieval, computational biology, and location-based services, where metric uncertain data exists in different forms, resulting from equipment limitations, high-throughput sequencing technologies, privacy preservation, or others. In this paper, we represent metric uncertain data by using an object-level model and a bi-level model, respectively. Two novel indexes, the uncertain pivot B+-tree (UPB-tree) and the uncertain pivot B+-forest (UPB-forest), are proposed accordingly in order to support probabilistic range queries w.r.t. a wide range of uncertain data types and similarity metrics. Both index structures use a small set of effective pivots chosen based on a newly defined criterion, and employ the B+-tree(s) as the underlying index. By design, they are easy to integrate into any existing DBMS. In addition, we present efficient metric probabilistic range query algorithms, which utilize the validation and pruning techniques based on our derived probability lower and upper bounds. Extensive experiments with both real and synthetic data sets demonstrate that, compared against existing state-of-the-art indexes for metric uncertain data, the UPB-tree and UPB-forest incur much lower construction costs, consume less storage space, and can support more efficient metric probabilistic range queries.


Guo, C., B. Yang, O. Andersen, C. S. Jensen, K. Torp ,"EcoSky: An Eco Routing System for Reducing the Vehicular Environmental Impact" in Proceedings of the 31st IEEE International Conference on Data Engineering, Seoul, South Korea, pp. 1412–1415, 2015

Publication
Online at IEEE

Reduction in greenhouse gas emissions from transportation attracts increasing interest from governments, fleet managers, and individual drivers. Eco-routing, which enables drivers to use eco-friendly routes, is a simple and effective approach to reducing emissions from transportation. We present EcoSky, a system that annotates edges of a road network with time dependent and uncertain eco-weights using GPS data and that supports different types of eco-routing. Basic eco-routing returns the most eco-friendly routes; skyline eco-routing takes into account not only fuel consumption but also travel time and distance when computing eco-routes; and personalized eco-routing considers each driver's past behavior and accordingly suggests different routes to different drivers.
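
Illustration only: a Python sketch of the skyline step in skyline eco-routing, keeping the candidate routes that are not dominated on fuel consumption, travel time, and distance. Route names and costs are made up for the example.

    def dominates(a, b):
        """a dominates b if it is at least as good in every dimension (lower is
        better) and strictly better in at least one."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def skyline(routes):
        """routes: {name: (fuel_l, time_min, dist_km)} -> non-dominated routes."""
        return {r: c for r, c in routes.items()
                if not any(dominates(other, c) for o, other in routes.items() if o != r)}

    if __name__ == "__main__":
        candidates = {
            "R1": (1.8, 22, 18.0),    # eco-friendly but slower
            "R2": (2.4, 17, 16.5),    # fastest
            "R3": (2.5, 23, 19.0),    # dominated by both R1 and R2
        }
        print(sorted(skyline(candidates)))   # ['R1', 'R2']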


Aljubayrin, S., J. Qi, C. S. Jensen, R. Zhang, Z. He, Z. Wen ,"The Safest Path via Safe Zones" in Proceedings of the 31st IEEE International Conference on Data Engineering, Seoul, South Korea, pp. 531–542, 2015

Publication
Online at IEEE

We define and study Euclidean and spatial network variants of a new path finding problem: given a set of safe zones, find paths that minimize the distance traveled outside the safe zones. In this problem, the entire space with the exception of the safe zones is unsafe, but passable, and it differs from problems that involve unsafe regions to be strictly avoided. As a result, existing algorithms are not effective solutions to the new problem. To solve the Euclidean variant, we devise a transformation of the continuous data space with safe zones into a discrete graph upon which shortest path algorithms apply. A naive transformation yields a very large graph that is expensive to search. In contrast, our transformation exploits properties of hyperbolas in the Euclidean space to safely eliminate graph edges, thus improving performance without affecting the shortest path results. To solve the spatial network variant, we propose a different graph-to-graph transformation that identifies critical points that serve the same purpose as do the hyperbolas, thus avoiding the creation of extraneous edges. This transformation can be extended to support a weighted version of the problem, where travel in safe zones has non-zero cost. We conduct extensive experiments using both real and synthetic data. The results show that our approaches outperform baseline approaches by more than an order of magnitude in graph construction time, storage space and query response time.


Chen, L., Y. Gao, X. Li, C. S. Jensen, G. Chen ,"Efficient Metric Indexing for Similarity Search" in Proceedings of the 31st IEEE International Conference on Data Engineering, Seoul, South Korea, pp. 591–602, 2015

Publication
Online at IEEE

The goal in similarity search is to find objects similar to a specified query object given a certain similarity criterion. Although useful in many areas, such as multimedia retrieval, pattern recognition, and computational biology, to name but a few, similarity search is not yet supported well by commercial DBMSs. This may be due to the complex data types involved and the needs for flexible similarity criteria seen in real applications. We propose an efficient disk-based metric access method, the Space-filling curve and Pivot-based B+-tree (SPB-tree), to support a wide range of data types and similarity metrics. The SPB-tree uses a small set of so-called pivots to reduce significantly the number of distance computations, uses a space-filling curve to cluster the data into compact regions, thus improving storage efficiency, and utilizes a B+-tree with minimum bounding box information as the underlying index. The SPB-tree also employs a separate random access file to efficiently manage large and complex data. By design, it is easy to integrate the SPB-tree into an existing DBMS. We present efficient similarity search algorithms and corresponding cost models based on the SPB-tree. Extensive experiments using real and synthetic data show that the SPB-tree has much lower construction cost, smaller storage size, and supports more efficient similarity queries, with accurate cost models, than competing techniques. Moreover, the SPB-tree scales sublinearly with growing dataset size.


Chen, L., X. Lin, H. Hu, C. S. Jensen, J. Xu ,"Answering Why-Not Questions on Spatial Keyword Top-k Queries" in Proceedings of the 31st IEEE International Conference on Data Engineering, Seoul, South Korea, pp. 279–290, 2015

Publication
Online at IEEE

Large volumes of geo-tagged text objects are available on the web. Spatial keyword top-k queries retrieve k such objects with the best score according to a ranking function that takes into account a query location and query keywords. In this setting, users may wonder why some known object is unexpectedly missing from a result; and understanding why may aid users in retrieving better results. While spatial keyword querying has been studied intensively, no proposals exist for how to offer users explanations of why such expected objects are missing from results. We provide techniques that allow the revision of spatial keyword queries such that their results include one or more desired, but missing objects. In doing so, we adopt a query refinement approach to provide a basic algorithm that reduces the problem to a two-dimensional geometrical problem. To improve performance, we propose an index-based ranking estimation algorithm that prunes candidate results early. Extensive experimental results offer insight into design properties of the proposed techniques and suggest that they are efficient in terms of both running time and I/O cost.


Jensen, C. S., C. Jermaine, X. Zhou, editors ,"Special Section on the International Conference on Data Engineering" in IEEE Transactions on Knowledge and Data Engineering, 27(7), 99 pages, 2015

Publication
Online at IEEE



Jensen, C. S., X. Xie, V. I. Zadorozhny, S. Madria, E. Pitoura, B. Zheng, C.-Y. Chow, editors, in Proceedings of the Sixteenth International Conference on Mobile Data Management - Volume I, Pittsburgh, PA, USA, 332+xxvii pages, 2015

Online at IEEE



Jensen, C. S., X. Xie, V. I. Zadorozhny, S. Madria, E. Pitoura, B. Zheng, C.-Y. Chow, editors, in Proceedings of the Sixteenth International Conference on Mobile Data Management - Volume II, Pittsburgh, PA, USA, 130+xiv pages, 2015

Online at IEEE



Candan, K. S., C. S. Jensen, M. Parashar, K. D. Ryu, H. Yeom, editors, in Proceedings of the 2015 IEEE International Conference on Cloud Engineering, Tempe, AZ, USA, 514+xxix pages, 2015

Online at IEEE



Jensen, C. S., ,"Keyword-Based Querying of Geo-Tagged Web Content" in in Proceedings of the Fifth International Conference on Model & Data Engineering, Rhodes, Greece, p. XIII,, 2015

Publication
Online at Springer

The web is being accessed increasingly by users for whom an accurate geo-location is available, and increasing volumes of geo-tagged content are available on the web, including web pages, points of interest, and microblog posts. Studies suggest that each week, several billions of keyword-based queries are issued that have some form of local intent and that target geo-tagged web content with textual descriptions. This state of affairs gives prominence to spatial web data management, and it opens a research area full of new and exciting opportunities and challenges. A prototypical spatial web query takes a user location and user-supplied keywords as arguments, and it returns content that is spatially and textually relevant to these arguments. Due perhaps to the rich semantics of geographical space and its importance to our daily lives, many different kinds of relevant spatial web query functionality may be envisioned. Based on recent and ongoing work by the speaker and his colleagues, the talk presents key functionality, concepts, and techniques relating to spatial web querying; it presents functionality that addresses different kinds of user intent; and it outlines directions for the future development of keyword-based spatial web querying.


Jensen, C. S., ,"Querying of Geo-TextualWeb Content: Concepts and Techniques" in in Proceedings of the Sixteenth IEEE International Conference on Mobile Data Management - Volume II, Pittsburgh, PA, pp. 1–2,, 2015

Publication
Online at IEEE



Qu, Q., C. Chen, C. S. Jensen, A. Skovsgaard ,"Space-Time Aware Behavioral Topic Modeling for Microblog Posts" in X. Zhou (ed.): Special Issue on Location-based Social Media Analysis, IEEE Data Engineering Bulletin, 38(2): 58–67, invited paper, 2015

Publication

How can we automatically identify the topics of microblog posts? This question has received substantial attention in the research community and has led to the development of different topic models, which are mathematically well-founded statistical models that enable the discovery of topics in document collections. Such models can be used for topic analyses according to the interests of user groups, time, geographical locations, or social behavior patterns. The increasing availability of microblog posts with associated users, textual content, timestamps, geo-locations, and user behaviors offers an opportunity to study space-time dependent behavioral topics. Such a topic is described by a set of words, the distribution of which varies according to the time, geo-location, and behaviors (that capture how a user interacts with other users by using functionality such as reply or re-tweet) of users. This study jointly models user topic interest and behaviors considering both space and time at a fine granularity. We focus on the modeling of microblog posts like Twitter tweets, where the textual content is short, but where associated information in the form of timestamps, geo-locations, and user interactions is available. The model aims to have applications in location inference, link prediction, online social profiling, etc. We report on experiments with tweets that offer insight into the design properties of the paper's proposal.


Candan, K. S., C. S. Jensen, M. Parashar, K. D. Ryu, H. Yeom, ,"Guest Editors’ Introduction: Cloud Engineering" in IEEE Cloud Computing, 2(5): 6–8,, 2015

Publication
Online at IEEE

Cloud engineering leverages innovations from a diverse spectrum of disciplines, from computer science and engineering to business informatics, toward the holistic treatment of key technical and business issues related to clouds.


Donald, K., A. Ailamaki, M. Balazinska, K. S. Candan, Y. Diao, C. Dyreson, Y. Ioanidis, C. S. Jensen, T. Milo, F. Spinola, "Letter from the SIGMOD Executive Committee" in ACM SIGMOD Record, 44(3): 5–6, 2015

Online at SIGMOD



Jensen, C. S., C. Jermaine, X. Zhou, ,"Guest Editorial: Special Section on the International Conference on Data Engineering" in IEEE Transactions on Knowledge and Data Engineering, 27(7): 1739–1740,, 2015

Publication



Jensen, C. S., ,"Editorial: The Best of Two Worlds – Present Your TODS Paper at SIGMOD" in ACM Transactions on Database Systems, 40(2), Article 7, 2 pages,, 2015

Publication
ACM Author-Izer



Jensen, C. S., X. Xie, V. I. Zadorozhny, "Message from the General Co-chairs" in Proceedings of the Sixteenth International Conference on Mobile Data Management - Volume 1, Pittsburgh, PA, USA, pp. xi–xii, 2015

Publication
Online at IEEE



Jensen, C. S., ,"Changes to the TODS Editorial Board" in ACM SIGMOD Record, 44(1): 5,, 2015

Publication
ACM Author-Izer



Jensen, C. S., M. Parashar, H. Yeom, "IC2E 2015: Message from the Program Chairs" in Proceedings of the 2015 IEEE International Conference on Cloud Engineering, Tempe, AZ, USA, p. xiv, 2015

Publication
Online at IEEE



Jensen, C. S., ,"Editorial: Updates to the Editorial Board" in ACM Transactions on Database Systems, 40(1), article 1e, 1 pages,, 2015

Publication
Online at ACM Digital Library



Dai, J., B. Yang, C. Guo, C. S. Jensen, "Efficient and Accurate Path Cost Estimation Using Trajectory Data" in Technical Report, October 2015, 16 pages. arXiv:1510.02886 [cs.DB], 2015

Online at Cornell University Library

Using the growing volumes of vehicle trajectory data, it becomes increasingly possible to capture time-varying and uncertain travel costs in a road network, including travel time and fuel consumption. The current paradigm represents a road network as a graph, assigns weights to the graph's edges by fragmenting trajectories into small pieces that fit the underlying edges, and then applies a routing algorithm to the resulting graph. We propose a new paradigm that targets more accurate and more efficient estimation of the costs of paths by associating weights with sub-paths in the road network. The paper provides a solution to a foundational problem in this paradigm, namely that of computing the time-varying cost distribution of a path. The solution consists of several steps. We first learn a set of random variables that capture the joint distributions of sub-paths that are covered by sufficient trajectories. Then, given a departure time and a path, we select an optimal subset of learned random variables such that the random variables' corresponding paths together cover the path. This enables accurate joint distribution estimation of the path, and by transferring the joint distribution into a marginal distribution, the travel cost distribution of the path is obtained. The use of multiple learned random variables contends with data sparseness, and the use of multi-dimensional histograms enables compact representation of arbitrary joint distributions that fully capture the travel cost dependencies among the edges in paths. Empirical studies with substantial trajectory data from two different cities offer insight into the design properties of the proposed solution and suggest that the solution is effective in real-world settings.
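To make the combination step concrete, the minimal sketch below combines per-sub-path travel-cost histograms into a path cost distribution by convolution, under the simplifying assumption that the selected sub-path costs are independent; the paper's actual solution learns joint distributions precisely to avoid that assumption, and all names below are illustrative.

```python
from itertools import product

def convolve(dist_a, dist_b):
    """Combine two discrete travel-cost distributions (cost -> probability),
    assuming independence, by convolving them."""
    out = {}
    for (ca, pa), (cb, pb) in product(dist_a.items(), dist_b.items()):
        out[ca + cb] = out.get(ca + cb, 0.0) + pa * pb
    return out

def path_cost_distribution(subpath_dists):
    """Estimate the cost distribution of a path covered by a sequence of
    sub-paths, each with its own histogram-like cost distribution."""
    result = {0: 1.0}
    for dist in subpath_dists:
        result = convolve(result, dist)
    return result

# Two sub-paths with travel-time histograms (minutes -> probability).
p1 = {4: 0.6, 6: 0.4}
p2 = {3: 0.5, 5: 0.5}
print(path_cost_distribution([p1, p2]))  # {7: 0.3, 9: 0.5, 11: 0.2}
```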


2014 Top

Guo, C., C. S. Jensen, B. Yang, "Towards Total Traffic Awareness" in ACM SIGMOD Record, 43(3): 18–23, 2014

Publication
ACM Author-Izer

A combination of factors renders the transportation sector a highly desirable area for data management research. The transportation sector receives substantial investments and is of high societal interest across the globe. Since there is limited room for new roads, smarter use of the existing infrastructure is of the essence. The combination of the continued proliferation of sensors and mobile devices with the drive towards open data will result in rapidly increasing volumes of data becoming available. The data management community is well positioned to contribute to building a smarter transportation infrastructure. We believe that efficient management and effective analysis of big transportation data will enable us to extract transportation knowledge, which will bring significant and diverse benefits to society. We describe the data, present key challenges related to the extraction of thorough, timely, and trustworthy traffic knowledge to achieve total traffic awareness, and we outline services that may be enabled. It is thus our hope that the paper will inspire data management researchers to address some of the many challenges in the transportation area.


Šidlauskas, D., C. S. Jensen, "Spatial Joins in Main Memory: Implementation Matters!" in Proceedings of the VLDB Endowment, 8(1): 97–100, (Experiment and Analysis Paper), 2014

Publication
Online at ACM Digital Library

A recent PVLDB paper reports on experimental analyses of ten spatial join techniques in main memory. We build on this comprehensive study to raise awareness of the fact that empirical running time performance findings in main-memory settings are results of not only the algorithms and data structures employed, but also their implementation, which complicates the interpretation of the results. In particular, we re-implement the worst performing technique without changing the underlying high-level algorithm, and we then offer evidence that the resulting re-implementation is capable of outperforming all the other techniques. This study demonstrates that in main memory, where no time-consuming I/O can mask variations in implementation, implementation details are very important; and it offers a concrete illustration of how difficult it is to draw conclusions about the data structures and algorithms studied from empirical running time performance findings in main-memory settings.


Šidlauskas, D., S. Šaltenis, C. S. Jensen, "Processing of Extreme Moving-Object Update and Query Workloads in Main Memory" in The VLDB Journal, 23(5): 817–841, (Extended version of [146].), 2014

Publication
Online at Springer

The efficient processing of workloads that interleave moving-object updates and queries is challenging. In addition to the conflicting needs for update-efficient versus query-efficient data structures, the increasing parallel capabilities of multi-core processors yield challenges. To prevent concurrency anomalies and to ensure correct system behavior, conflicting update and query operations must be serialized. In this setting, it is a key concern to avoid that operations are blocked, which leaves processing cores idle. To enable efficient processing, we first examine concurrency degrees from traditional transaction processing in the context of our target domain and propose new semantics that enable a high degree of parallelism and ensure up-to-date query results. We define the new semantics for range and k-nearest neighbor queries. Then, we present a main-memory indexing technique called parallel grid that implements the proposed semantics as well as two other variants supporting different semantics. This enables us to quantify the effects that different degrees of consistency have on performance. We also present an alternative time-partitioning approach. Empirical studies with the above and three existing proposals conducted on modern processors show that our proposals scale near-linearly with the number of hardware threads and thus are able to benefit from increasing on-chip parallelism.


Yang, B., M. Kaul, C. S. Jensen, "Using Incomplete Information for Complete Weight Annotation of Road Networks" in IEEE Transactions on Knowledge and Data Engineering, 26(5): 1267–1279, 2014

Publication
Online at IEEE

We are witnessing increasing interest in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using a graph model for routing that all edges have weights. Weights that capture travel times and GHG emissions can be extracted from GPS trajectory data collected from the network. However, GPS trajectory data typically lack the coverage needed to assign weights to all edges. This paper formulates and addresses the problem of annotating all edges in a road network with travel cost based weights from a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost. A general framework is proposed to solve the problem. Specifically, the problem is modeled as a regression problem and solved by minimizing a judiciously designed objective function that takes into account the topology of the road network. In particular, the use of weighted PageRank values of edges is explored for assigning appropriate weights to all edges, and the property of directional adjacency of edges is also taken into account to assign weights. Empirical studies with weights capturing travel time and GHG emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark) offer insight into the design properties of the proposed techniques and offer evidence that the techniques are effective.


Shang, S., R. Ding, K. Zheng, C. S. Jensen, P. Kalnis, X. Zhou, "Personalized Trajectory Matching in Spatial Networks" in The VLDB Journal, 23(3): 449–468, 2014

Publication
Online at Springer

With the increasing availability of moving-object tracking data, trajectory search and matching is increasingly important. We propose and investigate a novel problem called personalized trajectory matching (PTM). In contrast to conventional trajectory similarity search by spatial distance only, PTM takes into account the significance of each sample point in a query trajectory. A PTM query takes a trajectory with user-specified weights for each sample point in the trajectory as its argument. It returns the trajectory in an argument data set with the highest similarity to the query trajectory. We believe that this type of query may bring significant benefits to users in many popular applications such as route planning, carpooling, friend recommendation, traffic analysis, urban computing, and location-based services in general. PTM query processing faces two challenges: how to prune the search space during the query processing and how to schedule multiple so-called expansion centers effectively. To address these challenges, a novel two-phase search algorithm is proposed that carefully selects a set of expansion centers from the query trajectory and exploits upper and lower bounds to prune the search space in the spatial and temporal domains. An efficiency study reveals that the algorithm explores the minimum search space in both domains. Second, a heuristic search strategy based on priority ranking is developed to schedule the multiple expansion centers, which can further prune the search space and enhance the query efficiency. The performance of the PTM query is studied in extensive experiments based on real and synthetic trajectory data sets.


Cao, X., G. Cong, C. S. Jensen, M. L. Yiu, "Retrieving Regions of Interest for User Exploration" in Proceedings of the VLDB Endowment, 7(9): 733–744, 2014

Publication
Online at VLDB

We consider an application scenario where points of interest (PoIs) each have a web presence and where a web user wants to identify a region that contains PoIs that are relevant to a set of keywords, e.g., in preparation for deciding where to go to conveniently explore the PoIs. Motivated by this, we propose the length-constrained maximum-sum region (LCMSR) query that returns a spatial-network region that is located within a general region of interest, that does not exceed a given size constraint, and that best matches query keywords. Such a query maximizes the total weight of the PoIs in it w.r.t. the query keywords. We show that it is NP-hard to answer this query. We develop an approximation algorithm with a (5 + ε) approximation ratio utilizing a technique that scales node weights into integers. We also propose a more efficient heuristic algorithm and a greedy algorithm. Empirical studies on real data offer detailed insight into the accuracy of the proposed algorithms and show that the proposed algorithms are capable of computing results efficiently and effectively.
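As a rough illustration of a greedy strategy in the spirit of the greedy algorithm mentioned above (not the paper's exact method), the sketch below grows a connected set of road-network edges from a seed edge, always adding the adjacent edge with the best keyword weight per unit length until the length budget is exhausted; the graph representation and scoring are illustrative assumptions.

```python
def greedy_region(graph, weights, lengths, seed, budget):
    """Greedily grow a connected set of road-network edges around `seed`,
    maximizing total keyword weight under a total-length constraint.

    graph:   edge -> set of adjacent edges
    weights: edge -> relevance weight w.r.t. the query keywords
    lengths: edge -> edge length
    """
    region = {seed}
    used = lengths[seed]
    while True:
        frontier = {e for r in region for e in graph[r]} - region
        # Candidate edges that still fit within the length budget.
        fitting = [e for e in frontier if used + lengths[e] <= budget]
        if not fitting:
            return region
        # Pick the edge with the highest weight per unit length.
        best = max(fitting, key=lambda e: weights[e] / lengths[e])
        region.add(best)
        used += lengths[best]
```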


Skovsgaard, A., C. S. Jensen, "Top-k Point of Interest Retrieval Using Standard Indexes" in Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, TX, USA, pp. 172–182, 2014

Publication
ACM Author-Izer

With the proliferation of Internet-connected, location-aware mobile devices, such as smartphones, we are also witnessing a proliferation and increased use of map-based services that serve information about relevant Points of Interest (PoIs) to their users. We provide an efficient and practical foundation for the processing of queries that take a keyword and a spatial region as arguments and return the k most relevant PoIs that belong to the region, which may be the part of the map covered by the user's screen. The paper proposes a novel technique that encodes the spatio-textual part of a PoI as a compact bit string. This technique extends an existing spatial encoding to also encode the textual aspect of a PoI in compressed form. The resulting bit strings may then be indexed using index structures such as B-trees or hashing that are standard in DBMSs and key-value stores. As a result, it is straightforward to support the proposed functionality using existing data management systems. The paper also proposes a novel top-k query algorithm that merges partial results while providing an exact result. An empirical study with real-world data indicates that the proposed techniques enable excellent indexing and query execution performance on a standard DBMS.
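The following sketch illustrates the general idea of packing the spatio-textual part of a PoI into a single sortable key, here by interleaving the bits of a discretized location (a Z-order code) and appending a small keyword signature; the encoding used in the paper differs, and the bit widths and hashing below are illustrative assumptions.

```python
def interleave_bits(x, y, bits=16):
    """Morton (Z-order) interleaving of two `bits`-bit grid coordinates."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def keyword_signature(keywords, sig_bits=16):
    """Tiny Bloom-filter-like signature of the PoI's keywords."""
    sig = 0
    for kw in keywords:
        sig |= 1 << (hash(kw) % sig_bits)
    return sig

def encode_poi(grid_x, grid_y, keywords, sig_bits=16):
    """Pack the spatial code and keyword signature into one integer key
    that can be stored in a standard B-tree or key-value store."""
    spatial = interleave_bits(grid_x, grid_y)
    return (spatial << sig_bits) | keyword_signature(keywords, sig_bits)

key = encode_poi(1024, 2048, ["coffee", "wifi"])
```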


Rishede, J., M. L. Yiu, C. S. Jensen, "Concise Caching of Driving Instructions" in Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, TX, USA, pp. 23–32, 2014

Publication
ACM Author-Izer

Online driving direction services offer fundamental functionality to mobile users, and such services see substantial and increasing loads as mobile access continues to proliferate. Cache servers can be deployed in order to reduce the resulting network traffic. We define so-called concise shortest paths that are equivalent to driving instructions. A concise shortest path occupies much less space than a shortest path; yet it provides sufficient navigation information to mobile users. Then we propose techniques that enable the caching of concise shortest paths in order to improve the cache hit ratio. Interestingly, the use of concise shortest paths in caching has two opposite effects on the cache hit ratio. The cache can accommodate a larger number of concise paths, but each individual concise path contains fewer nodes and so may answer fewer shortest path queries. The challenge is to strike a balance between these two effects in order to maximize the overall cache hit ratio. In this paper, we revisit two classes of caching methods and develop effective caching techniques for concise paths. Empirical results on real trajectory-induced workloads confirm the effectiveness of the proposed techniques.
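A simplified sketch of the caching idea follows: a concise path keeps only instruction-relevant nodes, and a cached concise path can answer a query if both endpoints lie on it in order. The cache layout, eviction policy, and hit test are illustrative assumptions, not the paper's data structures.

```python
class ConcisePathCache:
    """Cache of concise shortest paths, i.e., node subsequences retaining
    only instruction-relevant nodes (origin, turns, destination)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.paths = []  # list of tuples of node ids

    def lookup(self, source, target):
        """Return a cached concise path containing the source before the
        target, if any (a cache hit)."""
        for path in self.paths:
            if source in path and target in path:
                i, j = path.index(source), path.index(target)
                if i <= j:
                    return path[i:j + 1]
        return None

    def admit(self, concise_path):
        """Insert a new concise path, evicting the oldest one if full."""
        if len(self.paths) >= self.capacity:
            self.paths.pop(0)
        self.paths.append(tuple(concise_path))
```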


Qu, Q., S. Liu, C. S. Jensen, F. Zhu, C. Faloutsos, "Interestingness-Driven Diffusion Process Summarization in Dynamic Networks" in Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Part II, LNCS 8725, Nancy, France, pp. 597–613, 2014

Publication
Online at Springer

The widespread use of social networks enables the rapid diffusion of information, e.g., news, among users in very large communities. It is a substantial challenge to be able to observe and understand such diffusion processes, which may be modeled as networks that are both large and dynamic. A key tool in this regard is data summarization. However, few existing studies aim to summarize graphs/networks for dynamics. Dynamic networks raise new challenges not found in static settings, including time sensitivity and the needs for online interestingness evaluation and summary traceability, which render existing techniques inapplicable. We study the topic of dynamic network summarization: how to summarize dynamic networks with millions of nodes by only capturing the few most interesting nodes or edges over time, and we address the problem by finding interestingness-driven diffusion processes. Based on the concepts of diffusion radius and scope, we define interestingness measures for dynamic networks, and we propose OSNet, an online summarization framework for dynamic networks. We report on extensive experiments with both synthetic and real-life data. The study offers insight into the effectiveness and design properties of OSNet.


Skovsgaard, A., D. Šidlauskas, C. S. Jensen, "A Clustering Approach to the Discovery of Points of Interest from Geo-Tagged Microblog Posts" in Proceedings of the Fifteenth IEEE International Conference on Mobile Data Management, Brisbane, Australia, pp. 178–189, 2014

Publication
Online at IEEE

Points of interest (PoI) data serves an important role as a foundation for a wide variety of location-based services. Such data is typically obtained from an authoritative source or from users through crowd sourcing. It can be costly to maintain an up-to-date authoritative source, and data obtained from users can vary greatly in coverage and quality. We are also witnessing a proliferation of both GPS-enabled mobile devices and geotagged content generated by users of such devices. This state of affairs motivates the paper's proposal of techniques for the automatic discovery of PoI data from geo-tagged microblog posts. Specifically, the paper proposes a new clustering technique that takes into account both the spatial and textual attributes of microblog posts to obtain clusters that represent PoIs. The technique expands clusters based on a proposed quality function that enables clusters of arbitrary shape and density. An empirical study with a large database of real geo-tagged microblog posts offers insight into the properties of the proposed techniques and suggests that they are effective at discovering real-world points of interest.
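A minimal sketch of a DBSCAN-style cluster expansion that uses a combined spatial and textual neighborhood criterion appears below; the paper's quality function is more elaborate, and the post representation and thresholds here are illustrative assumptions.

```python
import math

def neighbours(posts, p, radius, min_shared_terms):
    """Posts that are both spatially close to post p and share enough terms."""
    px, py, pterms = posts[p]
    out = []
    for q, (qx, qy, qterms) in enumerate(posts):
        if q == p:
            continue
        close = math.hypot(px - qx, py - qy) <= radius
        similar = len(pterms & qterms) >= min_shared_terms
        if close and similar:
            out.append(q)
    return out

def expand_cluster(posts, seed, radius, min_shared_terms, min_points):
    """Grow a cluster from a seed post, DBSCAN-style, using the combined
    spatial and textual neighborhood criterion."""
    cluster, frontier, seen = set(), [seed], {seed}
    while frontier:
        p = frontier.pop()
        nbrs = neighbours(posts, p, radius, min_shared_terms)
        if len(nbrs) + 1 >= min_points:      # p is dense enough to expand
            cluster.add(p)
            for q in nbrs:
                cluster.add(q)
                if q not in seen:
                    seen.add(q)
                    frontier.append(q)
    return cluster

# posts: list of (x, y, set_of_terms) tuples.
posts = [(0.0, 0.0, {"pizza"}), (0.1, 0.1, {"pizza", "pasta"}), (5.0, 5.0, {"gym"})]
print(expand_cluster(posts, seed=0, radius=0.5, min_shared_terms=1, min_points=2))
```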


Qu, Q., S. Liu, B. Yang, C. S. Jensen, ,"Efficient Top-k Spatial Locality Search for Co-located Spatial Web Objects" in in Proceedings of the Fifteenth IEEE International Conference on Mobile Data Management, Brisbane, Australia, pp. 269–278,, 2014

Publication
Online at IEEE

In step with the web being used widely by mobile users, user location is becoming an essential signal in services, including local intent search. Given a large set of spatial web objects consisting of a geographical location and a textual description (e.g., online business directory entries of restaurants, bars, and shops), how can we find sets of objects that are both spatially and textually relevant to a query? Most existing studies solve the problem by requiring that all query keywords are covered by the returned objects and then ranking the sets by spatial proximity. The need to identify sets with more textually relevant objects renders these studies inapplicable. We propose locality search, a query that returns top-k sets of spatial web objects and integrates spatial distance and textual relevance in one ranking function. We show that computing the query is NP-hard, and we present two efficient exact algorithms and one generic approximate algorithm based on greedy strategies for computing the query. We report on findings from an empirical study with three real-life datasets. The study offers insight into the efficiency and effectiveness of the proposed algorithms.


Qu, Q., S. Liu, B. Yang, C. S. Jensen, ,"Integrating Non-Spatial Preferences into Spatial Location Queries" in in Proceedings of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, article no. 8, 12 pages,, 2014

Publication
ACM Author-Izer

Increasing volumes of geo-referenced data are becoming available. This data includes so-called points of interest that describe businesses, tourist attractions, etc. by means of a geo-location and properties such as a textual description or ratings. We propose and study the efficient implementation of a new kind of query on points of interest that takes into account both the locations and properties of the points of interest. The query takes a result cardinality, a spatial range, and property-related preferences as parameters, and it returns a compact set of points of interest with the given cardinality and in the given range that satisfies the preferences. Specifically, the points of interest in the result set cover so-called allying preferences and are located far from points of interest that possess so-called alienating preferences. A unified result rating function integrates the two kinds of preferences with spatial distance to achieve this functionality. We provide efficient exact algorithms for this kind of query. To enable queries on large datasets, we also provide an approximate algorithm that utilizes a nearest-neighbor property to achieve scalable performance. We develop and apply lower and upper bounds that enable search-space pruning and thus improve performance. Finally, we provide a generalization of the above query and also extend the algorithms to support the generalization. We report on an experimental evaluation of the proposed algorithms using real point of interest data from Google Places for Business that offers insight into the performance of the proposed solutions.


Ma, Y., B. Yang, C. S. Jensen, "Enabling Time-Dependent Uncertain Eco-Weights For Road Networks" in Proceedings of the 2014 Workshop on Managing and Mining Enriched Geo-Spatial Data, Snowbird, UT, USA, 6 pages, 2014

Publication
ACM Author-Izer

Reduction of greenhouse gas (GHG) emissions from transportation is an essential part of the efforts to prevent global warming and climate change. Eco-routing, which enables drivers to use the most environmentally friendly routes, is able to substantially reduce GHG emissions from vehicular transportation. The foundation of eco-routing is a weighted-graph representation of a road network in which road segments, or edges, are associated with eco-weights that capture the GHG emissions caused by traversing the edges. Due to the dynamics of traffic, the eco-weights are typically time dependent and uncertain. We formalize the problem of assigning a time-dependent, uncertain eco-weight to each edge in a road network. In particular, a sequence of histograms is employed to describe the uncertain eco-weight during different time intervals for each edge. Various compression techniques, including histogram merging and bucket reduction, are proposed to maintain compact histograms while achieving good accuracy. Histogram aggregation methods are proposed that use the histograms to accurately estimate GHG emissions for routes. A comprehensive empirical study is conducted based on two years of GPS data from vehicles in order to gain insight into the effectiveness and efficiency of the proposed approach.
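One of the compression steps, merging adjacent histogram buckets, can be sketched as follows; the merge criterion used here (merge the adjacent pair with the closest mean costs) is an illustrative assumption rather than the paper's exact rule.

```python
def merge_buckets(hist, target_buckets):
    """Reduce a histogram to `target_buckets` buckets by repeatedly merging
    the adjacent pair of buckets with the closest mean values.

    hist: list of (low, high, frequency, mean_cost) buckets, sorted by range.
    """
    hist = list(hist)
    while len(hist) > target_buckets:
        # Find the adjacent pair whose mean costs are closest.
        i = min(range(len(hist) - 1),
                key=lambda k: abs(hist[k][3] - hist[k + 1][3]))
        (lo1, _, f1, m1), (_, hi2, f2, m2) = hist[i], hist[i + 1]
        merged_freq = f1 + f2
        merged_mean = (f1 * m1 + f2 * m2) / merged_freq
        hist[i:i + 2] = [(lo1, hi2, merged_freq, merged_mean)]
    return hist

# Eco-weight histogram for one edge and time interval: (low, high, freq, mean).
hist = [(0, 10, 5, 6.0), (10, 20, 7, 12.0), (20, 30, 2, 25.0)]
print(merge_buckets(hist, target_buckets=2))
```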


Radaelli, L., Y. Moses, C. S. Jensen, "Using Cameras to Improve Wi-Fi Based Indoor Positioning" in Proceedings of the Thirteenth International Symposium on Web and Wireless Geographical Information Systems, Seoul, South Korea, pp. 166–183, 2014

Publication
Online at Springer

Indoor positioning systems are increasingly being deployed to enable indoor navigation and other indoor location-based services. Systems based on Wi-Fi and video cameras rely on different technologies and techniques and have so far been developed independently by different research communities; we show that integrating information provided by a video system into a Wi-Fi based system increases its maintainability and avoids drops in accuracy over time. Specifically, we consider a Wi-Fi system that uses fingerprint measurements collected in the space for positioning. We improve the system’s room-level accuracy by means of automatic, video-driven collection of fingerprints. Our method is able to relate a Wi-Fi user to unidentified movements detected by cameras by exploiting the existing Wi-Fi system, thus generating fingerprints automatically. This use of video for fingerprint collection reduces the need for manual collection and allows online updating of fingerprints, hence increasing system accuracy. We report on an empirical study that shows that automatic fingerprinting induces only a few false positives and yields a substantial accuracy improvement.


Skovsgaard, A., D. Šidlauskas, C. S. Jensen, "Scalable Top-k Spatio-Temporal Term Querying" in Proceedings of the 30th IEEE International Conference on Data Engineering, Chicago, IL, USA, pp. 148–159, 2014

Publication
Online at IEEE

With the rapidly increasing deployment of Internet-connected, location-aware mobile devices, very large and increasing amounts of geo-tagged and timestamped user-generated content, such as microblog posts, are being generated. We present indexing, update, and query processing techniques that are capable of providing the top-k terms seen in posts in a user-specified spatio-temporal range. The techniques enable interactive response times in the millisecond range in a realistic setting where the arrival rate of posts exceeds today's average tweet arrival rate by a factor of 4-10. The techniques adaptively maintain the most frequent items at various spatial and temporal granularities. They extend existing frequent item counting techniques to maintain exact counts rather than approximations. An extensive empirical study with a large collection of geo-tagged tweets shows that the proposed techniques enable online aggregation and query processing at scale in realistic settings.
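A bare-bones sketch of maintaining exact term counts per spatio-temporal cell and answering top-k term queries over a range of cells is shown below; the paper's adaptive multi-granularity structures are considerably more involved, and the cell and time-slot keying here is an illustrative assumption.

```python
from collections import Counter, defaultdict

class TermAggregator:
    """Exact term counts keyed by (spatial cell, time slot)."""

    def __init__(self):
        self.cells = defaultdict(Counter)

    def add_post(self, cell, time_slot, terms):
        """Register the terms of one geo-tagged, timestamped post."""
        self.cells[(cell, time_slot)].update(terms)

    def top_k(self, cells, time_slots, k):
        """Top-k terms over the given spatio-temporal range."""
        total = Counter()
        for c in cells:
            for t in time_slots:
                total += self.cells.get((c, t), Counter())
        return total.most_common(k)

agg = TermAggregator()
agg.add_post(cell=(3, 7), time_slot=42, terms=["concert", "tickets"])
agg.add_post(cell=(3, 7), time_slot=42, terms=["concert"])
print(agg.top_k(cells=[(3, 7)], time_slots=[42], k=1))  # [('concert', 2)]
```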


Yang, B., C. Guo, C. S. Jensen, M. Kaul, S. Shang, "Stochastic Skyline Route Planning under Time-Varying Uncertainty" in Proceedings of the 30th IEEE International Conference on Data Engineering, Chicago, IL, USA, pp. 136–147, 2014

Publication
Online at IEEE

Different uses of a road network call for the consideration of different travel costs: in route planning, travel time and distance are typically considered, and greenhouse gas (GHG) emissions are increasingly being considered. Further, travel costs such as travel time and GHG emissions are time-dependent and uncertain. To support such uses, we propose techniques that enable the construction of a multi-cost, time-dependent, uncertain graph (MTUG) model of a road network based on GPS data from vehicles that traversed the road network. Based on the MTUG, we define stochastic skyline routes that consider multiple costs and time-dependent uncertainty, and we propose efficient algorithms to retrieve stochastic skyline routes for a given source-destination pair and a start time. Empirical studies with three road networks in Denmark and a substantial GPS data set offer insight into the design properties of the MTUG and the efficiency of the stochastic skyline routing algorithms.


Silvestri, C., F. Lettich, S. Orlando, C. S. Jensen, "GPU-based Computing of Repeated Range Queries over Moving Objects" in Proceedings of the 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Turin, Italy, pp. 640–647, 2014

Publication
Online at IEEE

In this paper we investigate the use of GPUs to solve a data-intensive problem that involves huge amounts of moving objects. The scenario we focus on concerns objects that continuously move in a 2D space, where a large percentage of them also issue range queries. The processing of these queries entails returning the large quantities of objects that fall within the query ranges. In order to solve this problem while maintaining a suitable throughput, we partition time into ticks and defer the parallel processing of all the object events (location updates and range queries) occurring in a given tick to the next tick, thus slightly delaying the overall computation. We process in parallel all the events of each tick by adopting a hybrid approach, based on the combined use of CPU and GPU, and show the suitability of the method by discussing performance results. The exploitation of a GPU allows us to achieve a speedup of more than 20× on several datasets with respect to the best sequential algorithm solving the same problem. More importantly, we show that the adoption of the new bitmap-based intermediate data structure that we propose to avoid memory access contention entails a 10× speedup with respect to naive GPU-based solutions.
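The tick-based deferral can be sketched independently of the GPU specifics: events arriving during one tick are buffered and processed as a batch at the next tick boundary. The event representation and the sequential batch processing below are simplifying assumptions made for illustration.

```python
class TickProcessor:
    """Buffer location updates and range queries arriving during a tick and
    process them together at the next tick boundary."""

    def __init__(self):
        self.pending = []     # events buffered for the next tick
        self.positions = {}   # object id -> (x, y)

    def submit(self, event):
        """event is ('update', (obj_id, x, y)) or ('query', (x1, y1, x2, y2))."""
        self.pending.append(event)

    def run_tick(self):
        """Apply all buffered updates, then answer all buffered range queries."""
        batch, self.pending = self.pending, []
        for kind, payload in batch:
            if kind == "update":
                obj_id, x, y = payload
                self.positions[obj_id] = (x, y)
        results = []
        for kind, payload in batch:
            if kind == "query":
                x1, y1, x2, y2 = payload
                results.append([o for o, (x, y) in self.positions.items()
                                if x1 <= x <= x2 and y1 <= y <= y2])
        return results
```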


Jensen, C. S., H. Lu, T. B. Pedersen, C. Thomsen, K. Torp, editors, Proceedings of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, 2014

Online at ACM Digital Library



Bhowmick, S., C. E. Dyreson, C. S. Jensen, M. L. Lee, A. Muliantara, B. Thalheim, editors, Proceedings of the 19th International Conference on Database Systems for Advanced Applications, Part I, LNCS 8421, Bali, Indonesia, 514+xxv pages, 2014

Online at Springer



Bhowmick, S., C. E. Dyreson, C. S. Jensen, M. L. Lee, A. Muliantara, B. Thalheim, editors, Proceedings of the 19th International Conference on Database Systems for Advanced Applications, Part II, LNCS 8422, Bali, Indonesia, 558+xxvi pages, 2014

Online at Springer



Jensen, C. S., A. Friis-Christensen, T. B. Pedersen, D. Pfoser, S. Šaltenis, N. Tryfona, "Location-Based Services—A Database Perspective", Chapter 6, pp. 82–93 in Breaking New Ground - Dedicated to Finn Kjærsdam, edited by L. Dirckinck-Holmfeld, N.-H. Gylstorff, H. K. Krogstrup, L. Lange, E. H. Nielsen, E. Toft, and R. Ærø, Aalborg University Press, Reprint of [421] with a foreword, 2014

Publication

We are heading rapidly towards a global computing and information infrastructure that will contain billions of wirelessly connected devices, many of which will offer so-called location-based services to their mobile users always and everywhere. Indeed, users will soon take ubiquitous wireless access to information and services for granted. This scenario is made possible by the rapid advances in the underlying hardware technologies, which continue to follow variants of Moore’s Law. Examples of location-based services include tracking, way-finding, traffic management, safety-related services, and mixed-reality games, to name but a few. This paper outlines a general usage scenario for location-based services, in which data warehousing plays a central role, and it describes central challenges to be met by the involved software technologies in order for them to reach their full potential for usage in location-based services.


Jensen, C. S., ,"Foreword to Invited Paper Issue" in ACM Transactions on Database Systems, 39(4), article 26, 2 pages,, 2014

Publication
Online at ACM Digital Library



Sheng, Q. Z., J. He, G. Wang, C. S. Jensen, "Guest editorial: Web technologies and applications" in World Wide Web, 17(4): 455–456, 2014

Publication
Online at Springer



Jensen, C. S., ,"Foreword" in ACM Transactions on Database Systems, 39(3), article 18, 1 page,, 2014

Publication
Online at ACM Digital Library



Jensen, C. S., H. Lu, T. B. Pedersen, C. Thomsen, K. Torp, "Foreword" in Proceedings of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, 2 pages, 2014

Publication
Online at ACM Digital Library

The International Conference on Scientific and Statistical Database Management (SSDBM) brings together scientific domain experts, database researchers, practitioners, and developers for the presentation and exchange of current research results on concepts, tools, and techniques for scientific and statistical database applications. This year, the 26th SSDBM takes place in Aalborg, Denmark, from June 30 to July 2, 2014.


Bhowmick, S., C. E. Dyreson, C. S. Jensen, ,"Preface" in in Proceedings of the 19th International Conference on Database Systems for Advanced Applications, Parts I and II, LNCS 8421, Bali, Indonesia, pp. v–vii,, 2014

Publication
Online at Springer



Lettich, F., S. Orlando, C. Silvestri, C. S. Jensen, "Manycore processing of repeated range queries over massive moving objects observations" in Technical Report, 36 pages. arXiv:1411.3212v1 [cs.DB] 12 Nov 2014, 2014

Online at Cornell University Library

The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised systems to feature low latency and high scalability. In this paper we focus on a specific data-intensive problem, concerning the repeated processing of huge amounts of range queries over massive sets of moving objects, where the spatial extents of queries and objects are continuously modified over time. To tackle this problem and significantly accelerate query processing, we devise a hybrid CPU/GPU pipeline that compresses data output and saves query processing work. The devised system relies on an ad-hoc spatial index leading to a problem decomposition that results in a set of independent data-parallel tasks. The index is based on a point-region quadtree space decomposition and makes it possible to effectively tackle a broad range of spatial object distributions, even very skewed ones. Also, to deal with the architectural peculiarities and limitations of GPUs, we adopt non-trivial GPU data structures that avoid the need for locked memory accesses and favour coalesced memory accesses, thus enhancing the overall memory throughput. To the best of our knowledge this is the first work that exploits GPUs to efficiently solve repeated range queries over massive sets of continuously moving objects, characterized by highly skewed spatial distributions. In comparison with state-of-the-art CPU-based implementations, our method achieves significant speedups on the order of 14x-20x, depending on the dataset, even when considering very cheap GPUs.


2013 Top

Li, X., V. Čeikute, C. S. Jensen, K.-L. Tan, "Effective Online Group Discovery in Trajectory Databases" in IEEE Transactions on Knowledge and Data Engineering, 25(12): 2752–2766, 2013

Publication
Online at IEEE

GPS-enabled devices are pervasive nowadays. Finding movement patterns in trajectory data streams is gaining in importance. We propose a group discovery framework that aims to efficiently support the online discovery of moving objects that travel together. The framework adopts a sampling-independent approach that makes no assumptions about when positions are sampled, gives no special importance to sampling points, and naturally supports the use of approximate trajectories. The framework's algorithms exploit state-of-the-art, density-based clustering (DBScan) to identify groups. The groups are scored based on their cardinality and duration, and the top-k groups are returned. To avoid returning similar subgroups in a result, notions of domination and similarity are introduced that enable the pruning of low-interest groups. Empirical studies on real and synthetic data sets offer insight into the effectiveness and efficiency of the proposed framework.


Kaul, M., R. C.-W. Wong, B. Yang, C. S. Jensen, "Finding Shortest Paths on Terrains by Killing Two Birds with One Stone" in Proceedings of the VLDB Endowment, 7(1): 73–84, 2013

Publication
Online at VLDB

With the increasing availability of terrain data, e.g., from aerial laser scans, the management of such data is attracting increasing attention in both industry and academia. In particular, spatial queries, e.g., k-nearest neighbor and reverse nearest neighbor queries, in Euclidean and spatial network spaces are being extended to terrains. Such queries all rely on an important operation, that of finding shortest surface distances. However, shortest surface distance computation is very time consuming. We propose techniques that enable efficient computation of lower and upper bounds of the shortest surface distance, which enable faster query processing by eliminating expensive distance computations. Empirical studies show that our bounds are much tighter than the best-known bounds in many cases and that they enable speedups of up to 43 times for some well-known spatial queries.


Bøgh, K. S., A. Skovsgaard, C. S. Jensen, "GroupFinder: A New Approach to Top-K Point-of-Interest Group Retrieval" in Proceedings of the VLDB Endowment, 6(12): 1226–1229, 2013

Publication
Online at VLDB

The notion of point-of-interest (PoI) has existed since paper road maps began to include markings of useful places such as gas stations, hotels, and tourist attractions. With the introduction of geopositioned mobile devices such as smartphones and mapping services such as Google Maps, the retrieval of PoIs relevant to a user’s intent has become a problem of automated spatio-textual information retrieval. Over the last several years, substantial research has gone into the invention of functionality and efficient implementations for retrieving nearby PoIs. However, with a couple of exceptions, existing proposals retrieve results at single-PoI granularity. We assume that a mobile device user issues queries consisting of keywords and an automatically supplied geo-position, and we target the common case where the user wishes to find nearby groups of PoIs that are relevant to the keywords. Such groups are relevant to users who wish to conveniently explore several options before making a decision such as to purchase a specific product. Specifically, we demonstrate a practical proposal for finding top-k PoI groups in response to a query. We show how problem parameter settings can be mapped to options that are meaningful to users. Further, although this kind of functionality is prone to combinatorial explosion, we will demonstrate that the functionality can be supported efficiently in practical settings.


Yang, B., C. Guo, C. S. Jensen, ,"Travel Cost Inference from Sparse, Spatio-Temporally Correlated Time Series Using Markov Models" in in Proceedings of the VLDB Endowment, 6(9): 769–780,, 2013

Publication
Online at VLDB

The monitoring of a system can yield a set of measurements that can be modeled as a collection of time series. These time series are often sparse, due to missing measurements, and spatio-temporally correlated, meaning that spatially close time series exhibit temporal correlation. The analysis of such time series offers insight into the underlying system and enables prediction of system behavior. While the techniques presented in the paper apply more generally, we consider the case of transportation systems and aim to predict travel cost from GPS tracking data from probe vehicles. Specifically, each road segment has an associated travel-cost time series, which is derived from GPS data. We use spatio-temporal hidden Markov models (STHMM) to model correlations among different traffic time series. We provide algorithms that are able to learn the parameters of an STHMM while contending with the sparsity, spatio-temporal correlation, and heterogeneity of the time series. Using the resulting STHMM, near future travel costs in the transportation network, e.g., travel time or greenhouse gas emissions, can be inferred, enabling a variety of routing services, e.g., eco-routing. Empirical studies with a substantial GPS data set offer insight into the design properties of the proposed framework and algorithms, demonstrating the effectiveness and efficiency of travel cost inferencing.


Wu, D., M. L. Yiu, C. S. Jensen, "Moving Spatial Keyword Queries: Formulation, Methods, and Analysis" in ACM Transactions on Database Systems, 38(1), 45 pages, (Extended version of [153].), 2013

Publication
ACM Author-Izer

Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content. We study the efficient processing of continuously moving top-k spatial keyword (MkSK) queries over spatial text data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within the safe zone associated with a result. However, existing safe-zone methods focus solely on spatial locations and ignore text relevancy. We propose two algorithms for computing safe zones that guarantee correct results at any time and that aim to optimize the server-side computation as well as the communication between the server and the client. We exploit tight and conservative approximations of safe zones and aggressive computational space pruning. We present techniques that aim to compute the next safe zone efficiently, and we present two types of conservative safe zones that aim to reduce the communication cost. Empirical studies with real data suggest that the proposals are efficient. To understand the effectiveness of the proposed safe zones, we study analytically the expected area of a safe zone, which indicates on average for how long a safe zone remains valid, and we study the expected number of influence objects needed to define a safe zone, which gives an estimate of the average communication cost. The analytical modeling is validated through empirical studies.


Chen, L., G. Cong, C. S. Jensen, D. Wu, "Spatial Keyword Query Processing: An Experimental Evaluation" in Proceedings of the VLDB Endowment, 6(3): 217–228, 2013

Publication
Online at VLDB

Geo-textual indices play an important role in spatial keyword querying. The existing geo-textual indices have not been compared systematically under the same experimental framework. This makes it difficult to determine which indexing technique best supports specific functionality. We provide an all-around survey of 12 state-of-the-art geo-textual indices. We propose a benchmark that enables the comparison of the spatial keyword query performance. We also report on the findings obtained when applying the benchmark to the indices, thus uncovering new insights that may guide index selection as well as further research.


Tzoumas, K., A. Deshpande, C. S. Jensen, "Efficiently Adapting Graphical Models for Cardinality Estimation" in The VLDB Journal, 22(1): 3–27, (Special issue on best papers of VLDB 2011. Extended version of [38].), 2013

Publication
Online at ACM Digital Library

Query optimizers rely on statistical models that succinctly describe the underlying data. Models are used to derive cardinality estimates for intermediate relations, which in turn guide the optimizer to choose the best query execution plan. The quality of the resulting plan is highly dependent on the accuracy of the statistical model that represents the data. It is well known that small errors in the model estimates propagate exponentially through joins, and may result in the choice of a highly sub-optimal query execution plan. Most commercial query optimizers make the attribute value independence assumption: all attributes are assumed to be statistically independent. This reduces the statistical model of the data to a collection of one-dimensional synopses (typically in the form of histograms), and it permits the optimizer to estimate the selectivity of a predicate conjunction as the product of the selectivities of the constituent predicates. However, this independence assumption is more often than not wrong, and is considered to be the most common cause of sub-optimal query execution plans chosen by modern query optimizers. We take a step towards a principled and practical approach to performing cardinality estimation without making the independence assumption. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy. We show how to efficiently construct such a graphical model from the database using only two-way join queries, and we show how to perform selectivity estimation in a highly efficient manner. We integrate our algorithms into the PostgreSQL DBMS. Experimental results indicate that estimation errors can be greatly reduced, leading to orders of magnitude more efficient query execution plans in many cases. Optimization time is kept in the range of tens of milliseconds, making this a practical approach for industrial-strength query optimizers.
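The effect of dropping the attribute value independence assumption can be illustrated with a toy selectivity calculation; the relation, column names, and numbers below are made up and are not taken from the paper.

```python
from collections import Counter

# A tiny relation with two correlated attributes: (model, color).
rows = [("sedan", "red"), ("sedan", "red"), ("sedan", "blue"),
        ("truck", "blue"), ("truck", "blue"), ("truck", "red")]
n = len(rows)

# Selectivity of the conjunction (model = 'sedan' AND color = 'red').
sel_model = sum(1 for m, _ in rows if m == "sedan") / n   # 0.5
sel_color = sum(1 for _, c in rows if c == "red") / n     # 0.5

# Independence assumption: multiply the one-dimensional selectivities.
independence_estimate = sel_model * sel_color              # 0.25

# A small two-dimensional distribution captures the correlation directly.
joint = Counter(rows)
joint_estimate = joint[("sedan", "red")] / n               # ~0.33

print(independence_estimate, joint_estimate)
```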


Radaelli, L., C. S. Jensen, "Towards Fully Organic Indoor Positioning" in Proceedings of the Fifth ACM SIGSPATIAL International Workshop on Indoor Spatial Awareness, Orlando, FL, USA, pp. 16–20, 2013

Publication
ACM Author-Izer

Indoor positioning systems based on fingerprinting techniques generally require costly initialization and maintenance by trained surveyors. Organic positioning systems aim to eliminate these deficiencies by managing their own accuracy and obtaining input from users and other sources. Such systems introduce new challenges, e.g., detection and filtering of erroneous user input, estimation of the positioning accuracy, and means of obtaining user input when necessary. We envision a fully organic indoor positioning system, where all available sources of information are exploited in order to provide room-level accuracy with no active intervention of users. For example, such systems can exploit pre-installed cameras to associate a user's location with a Wi-Fi fingerprint from the user's phone; and it can use a calendar to determine whether a user is in the room reported by the positioning system. Numerous possibilities for integration exist that may provide better indoor positioning.


Li, X., V. Čeikute, C. S. Jensen, K.-L. Tan, "Trajectory Based Optimal Segment Computation in Road Network Databases" in Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Orlando, FL, USA, pp. 386–389, 2013

Publication
ACM Author-Izer

Finding a location for a new facility s.t. the facility attracts the maximal number of customers is a challenging problem. Existing studies either model customers as static sites and thus do not consider customer movement, or they focus on theoretical aspects and do not provide solutions that are shown empirically to be scalable. Given a road network, a set of existing facilities, and a collection of customer route traversals, an optimal segment query returns the optimal road network segment(s) for a new facility. We propose a practical framework for computing this query, where each route traversal is assigned a score that is distributed among the road segments covered by the route according to a score distribution model. We propose two algorithms that adopt different approaches to computing the query. Empirical studies with real data sets demonstrate that the algorithms are capable of offering high performance in realistic settings.
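A minimal sketch of the score distribution step follows: each route traversal's score is split among the road segments it covers according to a pluggable distribution model, and the segment with the highest total is a candidate optimal segment. The even-split model and the data layout are illustrative assumptions, not the paper's exact models.

```python
from collections import defaultdict

def segment_scores(traversals, distribute):
    """Aggregate per-segment scores from customer route traversals.

    traversals: list of (segments, score) pairs, one per route traversal.
    distribute: function mapping (segments, score) to {segment: share}.
    """
    totals = defaultdict(float)
    for segments, score in traversals:
        for seg, share in distribute(segments, score).items():
            totals[seg] += share
    return totals

# An illustrative distribution model: split the score evenly over the segments.
even_split = lambda segments, score: {s: score / len(segments) for s in segments}

traversals = [(["s1", "s2"], 1.0), (["s2", "s3", "s4"], 1.0)]
scores = segment_scores(traversals, even_split)
best = max(scores, key=scores.get)   # candidate optimal segment under this model
```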


Kjærgaard, M. B., M. V. Krarup, A. Stisen, T. S. Prentow, H. Blunck, K. Grønbæk, C. S. Jensen, "Indoor Positioning using Wi-Fi—How Well Is the Problem Understood?" in Proceedings of the 2013 International Conference on Indoor Positioning and Indoor Navigation, Montbéliard-Belfort, France, 6 pages, 2013

Publication
Online at Scholar

The past decade has witnessed substantial research on methods for indoor Wi-Fi positioning. While much effort has gone into achieving high positioning accuracy and easing fingerprint collection, it is our contention that the general problem is not sufficiently well understood, thus preventing deployments and their usage by applications from becoming more widespread. Based on our own and published experiences on indoor Wi-Fi positioning deployments, we hypothesize the following: Current indoor Wi-Fi positioning systems and their utilization in applications are hampered by the lack of understanding of the requirements present in real-world deployments. In this paper, we report findings from qualitatively studying organisational requirements for indoor Wi-Fi positioning. The studied cases and deployments cover both company and public-sector settings and the deployment and evaluation of several types of indoor Wi-Fi positioning systems over durations of up to several years. The findings suggest, among other things, a need to support all case-specific user groups, provide software platform independence and low maintenance, and allow positioning of all user devices, regardless of platform and form factor. Furthermore, the findings also vary significantly across organisations, for instance in terms of need for coverage, which motivates the design of orthogonal solutions.


Brucato, M., L. Derczynski, H. Llorens, K. Bontcheva, C. S. Jensen, "Recognising and Interpreting Named Temporal Expressions" in Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, pp. 113–121, 2013

Publication
Online at RANLP

This paper introduces a new class of temporal expression – named temporal expressions – and methods for recognising and interpreting its members. The commonest temporal expressions typically contain date and time words, like April or hours. Research into recognising and interpreting these typical expressions is mature in many languages. However, there is a class of expressions that are less typical, very varied, and difficult to automatically interpret. These indicate dates and times, but are harder to detect because they often do not contain time words and are not used frequently enough to appear in conventional temporally-annotated corpora – for example Michaelmas or Vasant Panchami. Using Wikipedia and linked data, we automatically construct a resource of English named temporal expressions, and use it to extract training examples from a large corpus. These examples are then used to train and evaluate a named temporal expression recogniser. We also introduce and evaluate rules for automatically interpreting these expressions, and we observe that use of the rules improves temporal annotation performance over existing corpora.
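A toy sketch of the interpretation step appears below: a gazetteer maps a recognised named temporal expression to a month and day, which is then anchored to the year of the document's timestamp. The gazetteer contents and the anchoring rule are illustrative assumptions; many such expressions (e.g., Vasant Panchami) are not fixed in the Gregorian calendar and require calendar-specific rules.

```python
from datetime import date

# Illustrative gazetteer of named temporal expressions with fixed Gregorian dates.
GAZETTEER = {
    "michaelmas": (9, 29),
    "boxing day": (12, 26),
}

def interpret(expression, document_date):
    """Resolve a recognised named temporal expression to a concrete date,
    anchored to the year of the containing document."""
    entry = GAZETTEER.get(expression.lower())
    if entry is None:
        return None
    month, day = entry
    return date(document_date.year, month, day)

print(interpret("Michaelmas", date(2013, 8, 1)))  # 2013-09-29
```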


Čeikute, V., C. S. Jensen, "Routing Service Quality—Local Driver Behavior Versus Routing Services" in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, pp. 97–106, 2013

Publication
Online at IEEE

Mobile location-based services constitute a very successful class of services that are used frequently by users with GPS-enabled mobile devices such as smartphones. This paper presents a study of how to exploit GPS trajectory data, which is available in increasing volumes, for the assessment of the quality of one kind of location-based service, namely routing services. Specifically, the paper presents a framework that enables the comparison of the routes provided by routing services with the actual driving behaviors of local drivers. Comparisons include route length, travel time, and also route popularity, which are enabled by common driving behaviors found in available trajectory data. The ability to evaluate the quality of routing services enables service providers to improve the quality of their services and enables users to identify the services that best serve their needs. The paper covers experiments with real vehicle trajectory data and an existing online navigation service. It is found that the availability of information about previous trips enables better prediction of route travel time and makes it possible to provide the users with more popular routes than does a conventional navigation service.


Kaul, M., B. Yang, C. S. Jensen, "Building Accurate 3D Spatial Networks to Enable Next Generation Intelligent Transportation Systems" in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, pp. 137–146, 2013

Publication
Online at IEEE

The use of accurate 3D spatial network models can enable substantial improvements in vehicle routing. Notably, such models enable eco-routing, which reduces the environmental impact of transportation. We propose a novel filtering and lifting framework that augments a standard 2D spatial network model with elevation information extracted from massive aerial laser scan data and thus yields an accurate 3D model. We present a filtering technique that is capable of pruning irrelevant laser scan points in a single pass, but assumes that the 2D network fits in internal memory and that the points are appropriately sorted. We also provide an external-memory filtering technique that makes no such assumptions. During lifting, a triangulated irregular network (TIN) surface is constructed from the remaining points. The 2D network is projected onto the TIN, and a 3D network is constructed by means of interpolation. We report on a large-scale empirical study that offers insight into the accuracy, efficiency, and scalability properties of the framework.


Radaelli, L., D. Sabonis, H. Lu, C. S. Jensen, "Identifying Typical Movements Among Indoor Objects—Concepts and Empirical Study" in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, pp. 197–206, 2013

Publication
Online at IEEE

With the proliferation of mobile computing, positioning systems are becoming available that enable indoor location-based services. As a result, indoor tracking data is also becoming available. This paper puts focus on one use of such data, namely the identification of typical movement patterns among indoor moving objects. Specifically, the paper presents a method for the identification of movement patterns. Leveraging concepts from sequential pattern mining, the method takes into account the specifics of spatial movement and, in particular, the specifics of tracking data that captures indoor movement. For example, the paper's proposal supports spatial aggregation and utilizes the topology of indoor spaces to achieve better performance. The paper reports on empirical studies with real and synthetic data that offer insights into the functional and computational aspects of its proposal.


Baniukevic, A., C. S. Jensen, H. Lu, "Hybrid Indoor Positioning With Wi-Fi and Bluetooth: Architecture and Performance" in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, pp. 207–216, 2013

Publication
Online at IEEE

Reliable indoor positioning is an important foundation for emerging indoor location based services. Most existing indoor positioning proposals rely on a single wireless technology, e.g., Wi-Fi, Bluetooth, or RFID. A hybrid positioning system combines such technologies and achieves better positioning accuracy by exploiting the different capabilities of the different technologies. In a hybrid system based on Wi-Fi and Bluetooth, the former works as the main infrastructure to enable fingerprint based positioning, while the latter (via hotspot devices) partitions the indoor space as well as a large Wi-Fi radio map. As a result, the Wi-Fi based online position estimation is improved in a divide-and-conquer manner. We study three aspects of such a hybrid indoor positioning system. First, to avoid large positioning errors caused by similar reference positions that are hard to distinguish, we design a deployment algorithm that identifies and separates such positions into different smaller radio maps by deploying Bluetooth hotspots at particular positions. Second, we design methods that improve the partition switching that occurs when a user leaves the detection range of a Bluetooth hotspot. Third, we propose three architectural options for placement of the computation workload. We evaluate all proposals using both simulation and walkthrough experiments in two indoor environments of different sizes. The results show that our proposals are effective and efficient in achieving very good indoor positioning performance.


Andersen, O., C. S. Jensen, K. Torp, B. Yang, "EcoTour: Reducing the Environmental Footprint of Vehicles Using Eco-Routes" in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, pp. 338–340, 2013

Publication
Online at IEEE

Reduction in greenhouse gas emissions from transportation is essential in combating global warming and climate change. Eco-routing enables drivers to use the most eco-friendly routes and is effective in reducing vehicle emissions. The EcoTour system assigns eco-weights to a road network based on GPS and fuel consumption data collected from vehicles to enable eco-routing. Given an arbitrary source-destination pair in Denmark, EcoTour returns the shortest route, the fastest route, and the eco-route, along with statistics for the three routes. EcoTour also serves as a testbed for exploring advanced solutions to a range of challenges related to eco-routing.


Derczynski, L. R. A., B. Yang, C. S. Jensen, "Towards Context-Aware Search and Analysis on Social Media Data" in Proceedings of the 16th International Conference on Extending Database Technology, Genoa, Italy, pp. 137–142, 2013

Publication
ACM Author-Izer

Social media has changed the way we communicate. Social media data capture our social interactions and utterances in machine readable format. Searching and analysing massive and frequently updated social media data brings significant and diverse rewards across many different application domains, from politics and business to social science and epidemiology. A notable proportion of social media data comes with explicit or implicit spatial annotations, and almost all social media data has temporal metadata. We view social media data as a constant stream of data points, each containing text with spatial and temporal contexts. We identify challenges relevant to each context, which we intend to subject to context aware querying and analysis, specifically including longitudinal analyses on social media archives, spatial keyword search, local intent search, and spatio-temporal intent search. Finally, for each context, emerging applications and further avenues for investigation are discussed.


Yang, B., N. Fantini, C. S. Jensen, "iPark: Identifying Parking Spaces from Trajectories" in Proceedings of the 16th International Conference on Extending Database Technology, Genoa, Italy, pp. 705–708, 2013

Publication
ACM Author-Izer

A wide variety of desktop and mobile Web applications involve geo-tagged content, e.g., photos and (micro-) blog postings. Such content, often called User Generated Geo-Content (UGGC), plays an increasingly important role in many applications. However, a great demand also exists for "core" UGGC where the geo-spatial aspect is not just a tag on other content, but is the primary content, e.g., a city street map with up-to-date road construction data. Along these lines, the iPark system aims to turn volumes of GPS data obtained from vehicles into information about the locations of parking spaces, thus enabling effective parking search applications. In particular, we demonstrate how iPark helps ordinary users annotate an existing digital map with two types of parking, on-street parking and parking zones, based on vehicular tracking data.


Jensen, C. S., C. Jermaine, X. Zhou, editors, Proceedings of the 29th IEEE International Conference on Data Engineering, Brisbane, QLD, Australia, 2013

Online at IEEE



Gonzalez, H., P. Venetis, C. S. Jensen, A. Y. Halevy, co-inventors, "Directions-based ranking of places returned by local search queries" in United States Patent No. 8538973 B1 (filed June 4, 2010), 2013

Online at Google Inc.

A system and a method for ranking search results of local search queries. A local search query and a current location of a user are received. Next, two or more places that satisfy the local search query are identified, and for each respective place a corresponding distance from the current location of the user to the respective place is also identified. The two or more places are then ranked in accordance with scores that are based, at least in part, on popularity of the two or more places and the corresponding distances from the current location of the user, to produce a set of ranked places. The ranked set of places is then provided to the user.


Jensen, C. S., "Spatial Keyword Querying of Geo-Tagged Web Content" in Proceedings of the Seventh International Workshop on Ranking in Databases, Riva del Garda, Italy, article no. 1, 4 pages. Invited paper, 2013

Publication
ACM Author-Izer

The web is increasingly being used by mobile users, and it is increasingly possible to accurately geo-position mobile users. In addition, increasing volumes of geo-tagged web content are becoming available. Further, indications are that a substantial fraction of web keyword queries target local content. When combined, these observations suggest that spatial keyword querying is important and indeed gaining in importance. A prototypical spatial keyword query takes a user location and user-supplied keywords as parameters and returns web content that is spatially and textually relevant to these parameters. The paper reviews key concepts related to spatial keyword querying and reviews recent proposals by the author and his colleagues for spatial keyword querying functionality that is easy to use, relevant to users, and can be supported efficiently.


Moreira, J., C. S. Jensen, P. Dias, P. Mesquita, "Creating data representations for moving objects with extent from images", presented at the COST MOVE Workshop on Moving Objects at Sea, Brest, France, 4 pages, 2013

Publication



Atzeni, P., C. S. Jensen, G. Orsi, S. Ram, L. Tanca, R. Torlone, "The relational model is dead, SQL is dead, and I don’t feel so good myself" in ACM SIGMOD Record, 42(2): 64–68, 2013

Publication
Online at ACM Digital Library

We report the opinions expressed by well-known database researchers on the future of the relational model and SQL during a panel at the International Workshop on Non-Conventional Data Access (NoCoDa 2012), held in Florence, Italy in October 2012 in conjunction with the 31st International Conference on Conceptual Modeling. The panelists include: Paolo Atzeni (Università Roma Tre, Italy), Umeshwar Dayal (HP Labs, USA), Christian S. Jensen (Aarhus University, Denmark), and Sudha Ram (University of Arizona, USA). Quotations from movies are used as a playful though effective way to convey the dramatic changes that database technology and research are currently undergoing.


Jensen, C. S., "Querying the Web with Local Intent" in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, p. 1. Invited abstract, 2013

Publication
Online at IEEE

In step with the rapid proliferation of mobile devices with Internet access, the Web is increasingly being accessed by mobile-device users on the move. Further, it is increasingly possible to accurately geo-position mobile devices, and increasing volumes of geo-positioned content, e.g., Web pages, business directory entries, and microblog posts, are becoming available on the Web. In short, an increasingly mobile and spatial Web is fast emerging. This development enables Web queries with local intent, i.e., keyword-based queries issued by users who are looking for Web content near them. In addition, it implies an increasing demand for query functionality that supports local intent.


Hu, H., C. S. Jensen, D. Wu, "Message from the LBS n.0 Workshop Organizers" in Proceedings of the 14th International Conference on Mobile Data Management, Milan, Italy, Volume 2, p. xiii, 2013

Publication
Online at IEEE



Jensen, C. S., C. Jermaine, R. Kotagiri, B. C. Ooi, "Message from the ICDE 2013 Program Committee and General Chairs" in Proceedings of the 29th IEEE International Conference on Data Engineering, Brisbane, QLD, Australia, pp. i–ii, 2013

Publication
Online at IEEE



Yang, B., M. Kaul, C. S. Jensen, "Using Incomplete Information for Complete Weight Annotation of Road Networks—Extended Version" in Technical Report, 17 pages, CoRR cs.DB/1308.0484 (2013) (extended version of [22]), 2013

Publication
Online at Cornell University Library

We are witnessing increasing interest in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using a graph model for routing that all edges have weights. Weights that capture travel times and GHG emissions can be extracted from GPS trajectory data collected from the network. However, GPS trajectory data typically lack the coverage needed to assign weights to all edges. This paper formulates and addresses the problem of annotating all edges in a road network with travel cost based weights from a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost. A general framework is proposed to solve the problem. Specifically, the problem is modeled as a regression problem and solved by minimizing a judiciously designed objective function that takes into account the topology of the road network. In particular, the use of weighted PageRank values of edges is explored for assigning appropriate weights to all edges, and the property of directional adjacency of edges is also taken into account to assign weights. Empirical studies with weights capturing travel time and GHG emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark) offer insight into the design properties of the proposed techniques and offer evidence that the techniques are effective.
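
As a rough illustration of how network topology can compensate for missing coverage, the toy sketch below spreads observed edge costs to uncovered edges by averaging over adjacent edges. It is a deliberately simplified stand-in for the paper's PageRank-based regression formulation, and the edge identifiers and adjacency are hypothetical.

```python
# Illustrative only: propagates observed travel costs from covered edges to
# uncovered edges by repeatedly averaging over topologically adjacent edges.
def annotate_edges(adjacency, observed, iterations=50):
    """adjacency: dict edge -> set of adjacent edges;
    observed: dict edge -> ground-truth cost (covered edges only)."""
    weights = dict(observed)
    mean_cost = sum(observed.values()) / len(observed)
    for e in adjacency:
        weights.setdefault(e, mean_cost)      # start uncovered edges at the mean
    for _ in range(iterations):
        updated = {}
        for e, neighbors in adjacency.items():
            if e in observed:                 # keep ground truth fixed
                updated[e] = observed[e]
            elif neighbors:
                updated[e] = sum(weights[n] for n in neighbors) / len(neighbors)
            else:
                updated[e] = weights[e]
        weights = updated
    return weights

if __name__ == "__main__":
    adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    observed = {"a": 10.0, "d": 30.0}
    print(annotate_edges(adjacency, observed))
```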


Li, X., V. Čeikutė, C. S. Jensen, K.-L. Tan, "Trajectory Based Optimal Segment Computation in Road Network Databases" in Technical Report, 28 pages, CoRR cs.DB/1303.2310 (2013) (extended version of [102]), 2013

Publication
Online at Cornell University Library

Finding a location for a new facility such that the facility attracts the maximal number of customers is a challenging problem. Existing studies either model customers as static sites and thus do not consider customer movement, or they focus on theoretical aspects and do not provide solutions that are shown empirically to be scalable. Given a road network, a set of existing facilities, and a collection of customer route traversals, an optimal segment query returns the optimal road network segment(s) for a new facility. We propose a practical framework for computing this query, where each route traversal is assigned a score that is distributed among the road segments covered by the route according to a score distribution model. We propose two algorithms that adopt different approaches to computing the query. Empirical studies with real data sets demonstrate that the algorithms are capable of offering high performance in realistic settings.
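
The query semantics can be illustrated with a small sketch that assumes the simplest possible score distribution model, namely spreading each route's score uniformly over its segments; the paper's framework supports other models and far more efficient algorithms.

```python
# Hypothetical sketch of the optimal segment query: each customer route carries
# a score that is distributed uniformly over its segments, and the k segments
# with the highest accumulated score are returned.
from collections import defaultdict
import heapq

def optimal_segments(routes, k):
    """routes: list of (segments, score) pairs, segments being road-segment ids."""
    totals = defaultdict(float)
    for segments, score in routes:
        share = score / len(segments)        # uniform score distribution model
        for seg in segments:
            totals[seg] += share
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])

if __name__ == "__main__":
    routes = [(["s1", "s2", "s3"], 3.0), (["s2", "s3"], 2.0), (["s3", "s4"], 1.0)]
    print(optimal_segments(routes, k=2))
```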


2012 Top

Cao, X., G. Cong, C. S. Jensen, J. J. Ng, B. C. Ooi, N.-T. Phan, D. Wu, "SWORS: A System for the Efficient Retrieval of Relevant Spatial Web Objects" in Proceedings of the VLDB Endowment, 5(12): 1914–1917, 2012

Publication
Online at VLDB

Spatial web objects that possess both a geographical location and a textual description are gaining in prevalence. This gives prominence to spatial keyword queries that exploit both location and textual arguments. Such queries are used in many web services such as yellow pages and maps services. We present SWORS, the Spatial Web Object Retrieval System, that is capable of efficiently retrieving spatial web objects that satisfy spatial keyword queries. Specifically, SWORS supports two types of queries: a) the location-aware top-k text retrieval (LkT) query that retrieves k individual spatial web objects taking into account query location proximity and text relevancy; b) the spatial keyword group (SKG) query that retrieves a group of objects that cover the query keywords and are nearest to the query location and have the shortest inter-object distances. SWORS provides browser-based interfaces for desktop and laptop computers and provides a client application for mobile devices. The interfaces and the client enable users to formulate queries and view the query results on a map. The server side stores the data and processes the queries. We use three real-life data sets to demonstrate the functionality and performance of SWORS.


Guo, C., Y. Ma, B. Yang, C. S. Jensen, M. Kaul, "EcoMark: Evaluating Models of Vehicular Environmental Impact" in Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, pp. 269–278, 2012

Publication
ACM Author-Izer

The reduction of greenhouse gas (GHG) emissions from transportation is essential for achieving politically agreed upon emissions reduction targets that aim to combat global climate change. So-called eco-routing and eco-driving are able to substantially reduce GHG emissions caused by vehicular transportation. To enable these, it is necessary to be able to reliably quantify the emissions of vehicles as they travel in a spatial network. Thus, a number of models have been proposed that aim to quantify the emissions of a vehicle based on GPS data from the vehicle and a 3D model of the spatial network the vehicle travels in. We develop an evaluation framework, called EcoMark, for such environmental impact models. In addition, we survey all eleven state-of-the-art impact models known to us. To gain insight into the capabilities of the models and to understand the effectiveness of the EcoMark, we apply the framework to all models.


Sheng, Q. Z., G. Wang, C. S. Jensen, G. Xu, editors, Proceedings of the 14th Asia-Pacific Web Conference, Kunming, China, 799+xix pages, 2012

Online at Springer



Cao, X., L. Chen, G. Cong, C. S. Jensen, Q. Qu, A. Skovsgaard, D. Wu, M. L. Yiu, "Spatial Keyword Querying" in Proceedings of the 31st International Conference on Conceptual Modeling, Florence, Italy, pp. 16–29. Invited paper, 2012

Publication
Online at Springer

The web is increasingly being used by mobile users. In addition, it is increasingly becoming possible to accurately geo-position mobile users and web content. This development gives prominence to spatial web data management. Specifically, a spatial keyword query takes a user location and user-supplied keywords as arguments and returns web objects that are spatially and textually relevant to these arguments. This paper reviews recent results by the authors that aim to achieve spatial keyword querying functionality that is easy to use, relevant to users, and can be supported efficiently. The paper covers different kinds of functionality as well as the ideas underlying their definition.


Jensen, C. S., "Data management on the Spatial Web" in Proceedings of the VLDB Endowment, 5(12): 1696. Invited abstract, 2012

Publication
Online at ACM Digital Library

Due in part to the increasing mobile use of the web and the proliferation of geo-positioning, the web is fast acquiring a significant spatial aspect. Content and users are being augmented with locations that are used increasingly by location-based services. Studies suggest that each week, several billion web queries are issued that have local intent and target spatial web objects. These are points of interest with a web presence, and they thus have locations as well as textual descriptions. This development has given prominence to spatial web data management, an area ripe with new and exciting opportunities and challenges. The research community has embarked on inventing and supporting new query functionality for the spatial web. Different kinds of spatial web queries return objects that are near a location argument and are relevant to a text argument. To support such queries, it is important to be able to rank objects according to their relevance to a query. And it is important to be able to process the queries with low latency. The talk offers an overview of key aspects of the spatial web. Based on recent results obtained by the speaker and his colleagues, the talk explores new query functionality enabled by the setting. Further, the talk offers insight into the data management techniques capable of supporting such functionality.


Jensen, C. S., "Internettet – nu med en geografisk dimension" in Årsskrift 2011, Villum Fonden and Velux Fonden, pp. 38–41. Invited article, 2012

Publication
Publication in Danish

The amount of data in electronic form is currently growing exponentially. At the same time, the IT infrastructure that we use daily, including the Internet, is evolving rapidly. For example, at one end of the infrastructure, smartphones are spreading quickly while mobile bandwidth keeps growing; at the other end, so-called data centers are emerging: buildings with large numbers of processors and hard disks that make it possible to manage enormous amounts of data as cheaply as possible. This development continually creates new challenges and opportunities. Christian S. Jensen has received the Villum Kann Rasmussen Annual Award in Science and Technology for, among other things, his contributions to the efficient storage of, and search in, spatio-temporal data, i.e., data in which time and location play a role. Some of these contributions aim to give the Internet a geographic dimension. According to Christian S. Jensen, the award of DKK 2,500,000 will be used to enable further research into the foundation of the Internet of the future.


Jensen, C. S., "Foreword" in The Knowledge Grid—in Cyber-Physical Society, by H. Zhuge, 2nd Edition, World Scientific, p. vii, 2012

Publication



Bernstein, P. A., C. S. Jensen, K. L. Tan, "A Call for Surveys" in ACM SIGMOD Record, 41(2): 47, 2012

Publication
Online at ACM Digital Library



Sheng, Q. Z., G. Wang, C. S. Jensen, "Message from the Program Chairs" in Proceedings of the 14th Asia-Pacific Web Conference, Kunming, China, p. vii, 2012

Publication
Online at Springer



Jensen, C. S., E. Ofek, E. Tanin, "Highlights from ACM SIGSPATIAL GIS 2011—The 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (Chicago, Illinois, November 1–4, 2011)" in The SIGSPATIAL Special, 4(1): 2–4, 2012

Publication
ACM Author-Izer

ACM SIGSPATIAL GIS 2011 was the 19th gathering of the premier event on spatial information and Geographic Information Systems (GIS). It is also the fourth year that the conference was held under the auspices of ACM's most recent special interest group, SIGSPATIAL. Since its start in 1993, the conference has targeted researchers, developers, and users whose work relates to spatial information and GIS, and it has a tradition of interdisciplinary discussions and presentations. It provides a forum for original research contributions that cover conceptual, design, and implementation aspects of spatial information systems and GIS.


Sidlauskas, D., C. S. Jensen, S. Šaltenis, "A comparison of the use of virtual versus physical snapshots for supporting update-intensive workloads" in Proceedings of the Eighth International Workshop on Data Management on New Hardware (DaMoN '12), 2012

Publication
ACM Author-Izer

Deployments of networked sensors fuel online applications that feed on real-time sensor data. This scenario calls for techniques that support the management of workloads that contain queries as well as very frequent updates. This paper compares two well-chosen approaches to exploiting the parallelism offered by modern processors for supporting such workloads. A general approach to avoiding contention among parallel hardware threads and thus exploiting the parallelism available in processors is to maintain two copies, or snapshots, of the data: one for the relatively long-duration queries and one for the frequent and very localized updates. The snapshot that receives the updates is frequently made available to queries, so that queries see up-to-date data. The snapshots may be physical or virtual. Physical snapshots are created using the C library memcpy function. Virtual snapshots are created by the fork system function that creates a new process that initially has the same data snapshot as the process it was forked from. When the new process carries out updates, this triggers the actual memory copying in a copy-on-write manner at memory page granularity. This paper characterizes the circumstances under which each technique is preferable. The use of physical snapshots is surprisingly efficient.
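
The two strategies can be mimicked in a few lines on a POSIX system; the sketch below uses an explicit copy as a stand-in for the memcpy-based physical snapshot and fork-based copy-on-write for the virtual snapshot. It only illustrates the visibility semantics, not the performance trade-off studied in the paper.

```python
# Minimal sketch, assuming a POSIX system (os.fork is unavailable on Windows).
import copy
import os

data = {i: i for i in range(5)}              # toy in-memory data store

# Physical snapshot: pay the full copy cost up front, then query the copy.
physical = copy.deepcopy(data)
data[0] = 999                                # update applied after the snapshot
assert physical[0] == 0                      # queries on the snapshot see old data

# Virtual snapshot: the child process sees the state as of fork time; later
# updates in the parent trigger page-level copy-on-write and stay invisible.
pid = os.fork()
if pid == 0:                                 # child: runs the long-duration query
    print("virtual snapshot still sees", data[1])
    os._exit(0)
else:                                        # parent: keeps applying updates
    data[1] = -1
    os.waitpid(pid, 0)
```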


Cao, X., G. Cong, B. Cui, C. S. Jensen, Q. Yuan, "Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives" in ACM Transactions on Information Systems, 34 pages, 2012

Publication
ACM Author-Izer

Community Question Answering (CQA) is a popular type of service where users ask questions and where answers are obtained from other users or from historical question-answer pairs. CQA archives contain large volumes of questions organized into a hierarchy of categories. As an essential function of CQA services, question retrieval in a CQA archive aims to retrieve historical question-answer pairs that are relevant to a query question. This article presents several new approaches to exploiting the category information of questions for improving the performance of question retrieval, and it applies these approaches to existing question retrieval models, including a state-of-the-art question retrieval model. Experiments conducted on real CQA data demonstrate that the proposed techniques are effective and efficient and are capable of outperforming a variety of baseline methods significantly.


Wu, D., G. Cong, C. S. Jensen, "A Framework for Efficient Spatial Web Object Retrieval" in The VLDB Journal, 25 pages, 2012

Publication
Online at Springer

The conventional Internet is acquiring a geospatial dimension. Web documents are being geo-tagged and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables new kinds of queries that take into account both location proximity and text relevancy. This paper proposes a new indexing framework for top-k spatial text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within this framework. The framework encompasses algorithms that utilize the proposed indexes for computing location-aware as well as region-aware top-k text retrieval queries, thus taking into account both text relevancy and spatial proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper’s proposal is capable of excellent performance.


Li, X., P. Karras, L. Shi, K.-L. Tan, C. S. Jensen, "Cooperative Scalable Moving Continuous Query Processing" in Proceedings of the Thirteenth International Conference on Mobile Data Management, Bengaluru, India, 2012




Rishede, J., M. L. Yiu, C. S. Jensen, "Effective Caching of Shortest Paths for Location-Based Services" in Proceedings of the 2012 ACM SIGMOD International Conference on the Management of Data, Scottsdale, AZ, USA, pp. 313–324, 2012

Publication
ACM Author-Izer

Web search is ubiquitous in our daily lives. Caching has been extensively used to reduce the computation time of the search engine and reduce the network traffic beyond a proxy server. Another form of web search, known as online shortest path search, is popular due to advances in geo-positioning. However, existing caching techniques are ineffective for shortest path queries. This is due to several crucial differences between web search results and shortest path results, in relation to query matching, cache item overlapping, and query cost variation. Motivated by this, we identify several properties that are essential to the success of effective caching for shortest path search. Our cache exploits the optimal subpath property, which allows a cached shortest path to answer any query with source and target nodes on the path. We utilize statistics from query logs to estimate the benefit of caching a specific shortest path, and we employ a greedy algorithm for placing beneficial paths in the cache. Also, we design a compact cache structure that supports efficient query matching at runtime. Empirical results on real datasets confirm the effectiveness of our proposed techniques.
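
The optimal-subpath property at the heart of the cache can be illustrated as follows; the compact cache structure, benefit estimation, and greedy placement from the paper are omitted, and the node identifiers are hypothetical.

```python
# Sketch: a cached shortest path P can answer any query whose source and target
# both lie on P, by returning the corresponding subpath (itself shortest).
def lookup(cache, source, target):
    """cache: list of shortest paths, each a list of node ids."""
    for path in cache:
        if source in path and target in path:
            i, j = path.index(source), path.index(target)
            if i <= j:                       # subpath in the cached direction
                return path[i:j + 1]
    return None                              # cache miss: fall back to the server

if __name__ == "__main__":
    cache = [["a", "b", "c", "d", "e"]]
    print(lookup(cache, "b", "d"))           # hit: ['b', 'c', 'd']
    print(lookup(cache, "e", "a"))           # treated as a miss here (direction)
```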


Sidlauskas, D., S. Šaltenis, C. S. Jensen, "Parallel Main-Memory Indexing for Moving-Object Query and Update Workloads" in Proceedings of the 2012 ACM SIGMOD International Conference on the Management of Data, Scottsdale, AZ, USA, pp. 37–48, 2012

Publication
ACM Author-Izer

We are witnessing a proliferation of Internet-worked, geo-positioned mobile devices such as smartphones and personal navigation devices. Likewise, location-related services that target the users of such devices are proliferating. Consequently, server-side infrastructures are needed that are capable of supporting the location-related query and update workloads generated by very large populations of such moving objects. This paper presents a main-memory indexing technique that aims to support such workloads. The technique, called PGrid, uses a grid structure that is capable of exploiting the parallelism offered by modern processors. Unlike earlier proposals that maintain separate structures for updates and queries, PGrid allows both long-running queries and rapid updates to operate on a single data structure and thus offers up-to-date query results. Because PGrid does not rely on creating snapshots, it avoids the stop-the-world problem that occurs when workload processing is interrupted to perform such snapshotting. Its concurrency control mechanism relies instead on hardware-assisted atomic updates as well as object-level copying, and it treats updates as non-divisible operations rather than as combinations of deletions and insertions; thus, the query semantics guarantee that no objects are missed in query results. Empirical studies demonstrate that PGrid scales near-linearly with the number of hardware threads on four modern multi-core processors. Since both updates and queries are processed on the same current data-store state, PGrid outperforms snapshot-based techniques in terms of both query freshness and CPU cycle-wise efficiency.
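
The grid layout underlying such an index can be sketched as follows; concurrency control, hardware-assisted atomic updates, and object-level copying, which are the core of PGrid, are deliberately left out.

```python
# Sketch of a main-memory grid for moving objects: buckets objects by cell,
# treats an update as one move rather than delete + insert, and answers
# range queries by scanning only the overlapping cells.
from collections import defaultdict

class Grid:
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(dict)       # (cx, cy) -> {object_id: (x, y)}
        self.index = {}                      # object_id -> (cx, cy)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, oid, x, y):
        new_cell = self._cell(x, y)
        old_cell = self.index.get(oid)
        if old_cell is not None and old_cell != new_cell:
            del self.cells[old_cell][oid]
        self.cells[new_cell][oid] = (x, y)
        self.index[oid] = new_cell

    def range_query(self, x1, y1, x2, y2):
        cx1, cy1 = self._cell(x1, y1)
        cx2, cy2 = self._cell(x2, y2)
        result = []
        for cx in range(cx1, cx2 + 1):
            for cy in range(cy1, cy2 + 1):
                for oid, (x, y) in self.cells[(cx, cy)].items():
                    if x1 <= x <= x2 and y1 <= y <= y2:
                        result.append(oid)
        return result

if __name__ == "__main__":
    g = Grid(cell_size=100)
    g.update("o1", 40, 60)
    g.update("o2", 450, 120)
    g.update("o1", 130, 70)                  # o1 moves to another cell
    print(g.range_query(0, 0, 200, 200))     # ['o1']
```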


Lu, H., X. Cao, C. S. Jensen, "A Foundation for Efficient Indoor Distance-Aware Query Processing" in Proceedings of the 28th IEEE International Conference on Data Engineering, 12 pages, 2012

Indoor spaces accommodate large numbers of spatial objects, e.g., points of interest (POIs), and moving populations. A variety of services, e.g., location-based services and security control, are relevant to indoor spaces. Such services can be improved substantially if they are capable of utilizing indoor distances. However, existing indoor space models do not account well for indoor distances. To address this shortcoming, we propose a data management infrastructure that captures indoor distance and facilitates distance-aware query processing. In particular, we propose a distance-aware indoor space model that integrates indoor distance seamlessly. To enable the use of the model as a foundation for query processing, we develop accompanying, efficient algorithms that compute indoor distances for different indoor entities like doors as well as locations. We also propose an indexing framework that accommodates indoor distances that are pre-computed using the proposed algorithms. On top of this foundation, we develop efficient algorithms for typical indoor, distance-aware queries. The results of an extensive experimental evaluation demonstrate the efficacy of the proposals.


Lu, H., C. S. Jensen, "Upgrading Uncompetitive Products Economically" in Proceedings of the 28th IEEE International Conference on Data Engineering, Washington, DC, USA, 2012

The skyline of a multidimensional point set consists of the points that are not dominated by other points. In a scenario where product features are represented by multidimensional points, the skyline points may be viewed as representing competitive products. A product provider may wish to upgrade uncompetitive products to become competitive, but wants to take into account the upgrading cost. We study the top-k product upgrading problem. Given a set P of competitor products, a set T of products that are candidates for upgrade, and an upgrading cost function f that applies to T, the problem is to return the k products in T that can be upgraded to not be dominated by any products in P at the lowest cost. This problem is nontrivial due not only to the large data set sizes, but also to the many possibilities for upgrading a product. We identify and provide solutions for the different options for upgrading an uncompetitive product, and combine the solutions into a single solution. We also propose a spatial join-based solution that assumes P and T are indexed by an R-tree. Given a set of products in the same R-tree node, we derive three lower bounds on their upgrading costs. These bounds are employed by the join approach to prune upgrade candidates with uncompetitive upgrade costs. Empirical studies with synthetic and real data show that the join approach is efficient and scalable.


2011 Top

Jensen, C. S., K.-J. Li, S. Winter, "ISA 2010 Workshop Report The Other 87%: A Report on the Second International Workshop on Indoor Spatial Awareness (San Jose, California, November 2, 2010)", March 2011

Publication
ACM Author-Izer

With the increasing deployment of location-based services, geographic information systems, and ubiquitous computing, technologies and services that target indoor spaces are receiving increasing attention. This development is quite understandable because, as a paper presented at ISA 2010 points out, studies show that we lead most of our lives, 87% to be specific, in indoor settings. Those 87% are the focus of ISA 2010.


Jeung, H., M. L. Yiu, C. S. Jensen, "Trajectory Pattern Mining" in Computing with Spatial Trajectories, pp. 143–177, 2011

Online at Springer

In step with the rapidly growing volumes of available moving-object trajectory data, there is also an increasing need for techniques that enable the analysis of trajectories. Such functionality may benefit a range of application areas and services, including transportation, the sciences, sports, and prediction-based and social services, to name but a few. The chapter first provides an overview of trajectory patterns and a categorization of trajectory patterns from the literature. Next, it examines relative motion patterns, which serve as fundamental background for the chapter's subsequent discussions. Relative patterns enable the specification of patterns to be identified in the data that refer to the relationships of motion attributes among moving objects. The chapter then studies disc-based and density-based patterns, which address some of the limitations of relative motion patterns. The chapter also reviews indexing structures and algorithms for trajectory pattern mining.


Lin, D., C. S. Jensen, R. Zhang, L. Xiao, J. Lu, "A Moving-Object Index for Efficient Query Processing with Peer-Wise Location Privacy" in Proceedings of the VLDB Endowment, 5(1): 37–48, 2011

Publication
Online at VLDB

With the growing use of location-based services, location privacy attracts increasing attention from users, industry, and the research community. While considerable effort has been devoted to inventing techniques that prevent service providers from knowing a user's exact location, relatively little attention has been paid to enabling so-called peer-wise privacy---the protection of a user's location from unauthorized peer users. This paper identifies an important efficiency problem in existing peer-privacy approaches that simply apply a filtering step to identify users that are located in a query range, but that do not want to disclose their location to the querying peer. To solve this problem, we propose a novel, privacy-policy enabled index called the PEB-tree that seamlessly integrates location proximity and policy compatibility. We propose efficient algorithms that use the PEB-tree for processing privacy-aware range and kNN queries. Extensive experiments suggest that the PEB-tree enables efficient query processing.


Sidlauskas, D., K. A. Ross, C. S. Jensen, S. Šaltenis, "Thread-Level Parallel Indexing of Update Intensive Moving-Object Workloads" in Proceedings of the Twelfth International Symposium on Spatial and Temporal Databases, Minneapolis, MN, pp. 186–204, 2011

Publication
Online at Springer

Modern processors consist of multiple cores that each support parallel processing by multiple physical threads, and they offer ample main-memory storage. This paper studies the use of such processors for the processing of update-intensive moving-object workloads that contain very frequent updates as well as contain queries. The non-trivial challenge addressed is that of avoiding contention between long-running queries and frequent updates. Specifically, the paper proposes a grid-based indexing technique. A static grid indexes a near up-to-date snapshot of the data to support queries, while a live grid supports updates. An efficient cloning technique that exploits the memcpy system call is used to maintain the static grid. An empirical study conducted with three modern processors finds that very frequent cloning, on the order of tens of milliseconds, is feasible, that the proposal scales linearly with the number of hardware threads, and that it significantly outperforms the previous state-of-the-art approach in terms of update throughput and query freshness.


Cao, X., G. Cong, C. S. Jensen, B. C. Ooi, "Collective Spatial Keyword Querying" in Proceedings of the 2011 ACM SIGMOD International Conference on the Management of Data, Athens, Greece, pp. 373–384, 2011

Publication
ACM Author-Izer

With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group collectively satisfy a query. We define the problem of retrieving a group of spatial web objects such that the group's keywords cover the query's keywords and such that objects are nearest to the query location and have the lowest inter-object distances. Specifically, we study two variants of this problem, both of which are NP-complete. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. We present empirical studies that offer insight into the efficiency and accuracy of the solutions.
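
One simple way to approach such a group query is a greedy cover heuristic, sketched below; it is not one of the paper's algorithms and ignores the inter-object distance component of the cost, but it conveys the flavor of collectively covering the query keywords.

```python
# Greedy sketch: repeatedly pick the object covering the most uncovered query
# keywords, breaking ties by proximity to the query location.
import math

def greedy_group(objects, q_loc, q_keywords):
    """objects: list of (location, keyword_set); q_loc: (x, y)."""
    uncovered = set(q_keywords)
    group = []
    while uncovered:
        best = None
        for loc, kws in objects:
            gain = len(uncovered & kws)
            if gain == 0 or (loc, kws) in group:
                continue
            key = (-gain, math.dist(loc, q_loc))
            if best is None or key < best[0]:
                best = (key, (loc, kws))
        if best is None:                     # keywords cannot be covered
            return None
        group.append(best[1])
        uncovered -= best[1][1]
    return group

if __name__ == "__main__":
    objs = [((1, 1), {"coffee", "wifi"}), ((5, 5), {"pizza"}), ((2, 0), {"pizza", "wifi"})]
    print(greedy_group(objs, (0, 0), {"coffee", "pizza"}))
```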


Baniukevic, A., D. Sabonis, C. S. Jensen, H. Lu, "Improving Wi-Fi Based Indoor Positioning Using Bluetooth Add-Ons" in Proceedings of the Twelfth International Conference on Mobile Data Management, Luleå, Sweden, pp. 246–255, 2011

Publication

Location-Based Services (LBSs) constitute one of the most popular classes of mobile services. However, while current LBSs typically target outdoor settings, we lead large parts of our lives indoors. The availability of easy-to-use and low-cost indoor positioning services is essential in also enabling indoor LBSs. Existing indoor positioning services typically use a single technology such as Wi-Fi, RFID or Bluetooth. Wi-Fi based indoor positioning is relatively easy to deploy, but often does not offer good positioning accuracy. In contrast, the use of RFID or Bluetooth for positioning requires considerable investments in equipment in order to ensure good positioning accuracy. Motivated by these observations, we propose a hybrid approach to indoor positioning. In particular, we introduce Bluetooth hotspots into an indoor space with an existing Wi-Fi infrastructure such that better positioning is achieved than what can be achieved by each technology in isolation. We design a flexible and extensible system architecture with an effective online position estimation algorithm for the hybrid system. The system is evaluated empirically in the building of our department. The results show that the hybrid approach improves positioning accuracy markedly.


Vicente, C. R., I. Assent, C. S. Jensen, "Effective Privacy-Preserving Online Route Planning" in Proceedings of the Twelfth International Conference on Mobile Data Management, Luleå, Sweden, pp. 119–128, 2011

Publication

An online Route Planning Service (RPS) computes a route from one location to another. Current RPSs such as Google Maps require the use of precise locations. However, some users may not want to disclose their source and destination locations due to privacy concerns. An approach that supplies fake locations to an existing service incurs a substantial loss of quality of service, and the service may well return a result that is not helpful to the user. We propose a solution that is able to return accurate route planning results when source and destination regions are used in order to achieve privacy. The solution re-uses a standard online RPS rather than replicate this functionality, and it needs no trusted third party. The solution is able to compute the exact results without leaking the exact locations to the RPS or untrusted parties. In addition, we provide heuristics that reduce the number of times that the RPS needs to be queried, and we also describe how the accuracy and privacy requirements can be relaxed to achieve better performance. An empirical study offers insight into key properties of the approach.


Wu, D., M. L. Yiu, G. Cong, C. S. Jensen, "Joint Top-K Spatial Keyword Query Processing" in IEEE Transactions on Knowledge and Data Engineering, 16 pages, 2011

Publication

Web users and content are increasingly being geo-positioned, and increased focus is being given to serving local content in response to web queries. This development calls for spatial keyword queries that take into account both the locations and textual descriptions of content. We study the efficient, joint processing of multiple top-k spatial keyword queries. Such joint processing is attractive during high query loads and also occurs when multiple queries are used to obfuscate a user’s true query. We propose a novel algorithm and index structure for the joint processing of top-k spatial keyword queries. Empirical studies show that the proposed solution is efficient on real datasets. We also offer analytical studies on synthetic datasets to demonstrate the efficiency of the proposed solution.


Tzoumas, K., A. Deshpande, C. S. Jensen, "Lightweight Graphical Models for Selectivity Estimation Without Independence Assumptions" in Proceedings of the VLDB Endowment, 4(7), 12 pages, 2011

Publication

As a result of decades of research and industrial development, modern query optimizers are complex software artifacts. However, the quality of the query plan chosen by an optimizer is largely determined by the quality of the underlying statistical summaries. Small selectivity estimation errors, propagated exponentially, can lead to severely sub-optimal plans. Modern optimizers typically maintain one-dimensional statistical summaries and make the attribute value independence and join uniformity assumptions for efficiently estimating selectivities. Therefore, selectivity estimation errors in today’s optimizers are frequently caused by missed correlations between attributes. We present a selectivity estimation approach that does not make the independence assumptions. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution of all the attributes in the database into small, usually two-dimensional distributions. We describe several optimizations that can make selectivity estimation highly efficient, and we present a complete implementation inside PostgreSQL’s query optimizer. Experimental results indicate an order of magnitude better selectivity estimates, while keeping optimization time in the range of tens of milliseconds.


Lu, H., C. S. Jensen, Z. Zhang, "Flexible and Efficient Resolution of Skyline Query Size Constraints" in IEEE Transactions on Knowledge and Data Engineering, 23(7): 991–1005, 2011

Publication
Online at IEEE

Given a set of multidimensional points, a skyline query returns the interesting points that are not dominated by other points. It has been observed that the actual cardinality (s) of a skyline query result may differ substantially from the desired result cardinality (k), which has prompted studies on how to reduce s for the case where k < s. Based on these observations, the paper proposes a new approach, called skyline ordering, that forms a skyline-based partitioning of a given data set such that an order exists among the partitions. Then, set-wide maximization techniques may be applied within each partition. Efficient algorithms are developed for skyline ordering and for resolving size constraints using the skyline order. The results of extensive experiments show that skyline ordering yields a flexible framework for the efficient and scalable resolution of arbitrary size constraints on skyline queries.
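
For reference, the baseline skyline computation that such size-constraint resolution builds on can be sketched in a few lines (minimization in every dimension is assumed); the skyline-ordering partitioning itself is not shown.

```python
# A point is in the skyline if no other point dominates it, i.e., is at least
# as good in every dimension and strictly better in at least one.
def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

if __name__ == "__main__":
    pts = [(1, 5), (2, 2), (4, 1), (3, 3), (5, 5)]
    print(skyline(pts))                      # [(1, 5), (2, 2), (4, 1)]
```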


Yiu, M. L., C. S. Jensen, J. Møller, H. Lu, "Design and Analysis of a Ranking Approach to Private Location-Based Services" in ACM Transactions on Database Systems, 36(2), article 10, 43 pages, 2011

Publication
ACM Author-Izer

Users of mobile services wish to retrieve nearby points of interest without disclosing their locations to the services. This article addresses the challenge of optimizing the query performance while satisfying given location privacy and query accuracy requirements. The article's proposal, SpaceTwist, aims to offer location privacy for k nearest neighbor (kNN) queries at low communication cost without requiring a trusted anonymizer. The solution can be used with a conventional DBMS as well as with a server optimized for location-based services. In particular, we believe that this is the first solution that expresses the server-side functionality in a single SQL statement. In its basic form, SpaceTwist utilizes well-known incremental NN query processing on the server. When augmented with a server-side granular search technique, SpaceTwist is capable of exploiting relaxed query accuracy guarantees for obtaining better performance. We extend SpaceTwist with so-called ring ranking, which improves the communication cost, delayed termination, which improves the privacy afforded the user, and the ability to function in spatial networks in addition to Euclidean space. We report on analytical and empirical studies that offer insight into the properties of SpaceTwist and suggest that our proposal is indeed capable of offering privacy with very good performance in realistic settings.


Vicente, C. R., D. Freni, C. Bettini, C. S. Jensen, "Location-Related Privacy in Geo-Social Networks" in IEEE Internet Computing, 15(3): 20–27, 2011

Publication
Online at IEEE

Geo-social networks (GeoSNs) provide context-aware services that help associate location with users and content. The proliferation of GeoSNs indicates that they're rapidly attracting users. GeoSNs currently offer different types of services, including photo sharing, friend tracking, and "check-ins." However, this ability to reveal users' locations causes new privacy threats, which in turn call for new privacy-protection methods. The authors study four privacy aspects central to these social networks - location, absence, co-location, and identity privacy - and describe possible means of protecting privacy in these circumstances.


Venetis, P., H. Gonzalez, C. S. Jensen, A. Halevy, "Hyper-Local, Directions-Based Ranking of Places" in Proceedings of the VLDB Endowment, 4(5): 290–301, 2011

Publication
Online at VLDB

Studies find that at least 20% of web queries have local intent; and the fraction of queries with local intent that originate from mobile properties may be twice as high. The emergence of standardized support for location providers in web browsers, as well as of providers of accurate locations, enables so-called hyper-local web querying where the location of a user is accurate at a much finer granularity than with IP-based positioning. This paper addresses the problem of determining the importance of points of interest, or places, in local-search results. In doing so, the paper proposes techniques that exploit logged directions queries. A query that asks for directions from a location a to a location b is taken to suggest that a user is interested in traveling to b and thus is a vote that location b is interesting. Such user-generated directions queries are particularly interesting because they are numerous and contain precise locations. Specifically, the paper proposes a framework that takes a user location and a collection of near-by places as arguments, producing a ranking of the places. The framework enables a range of aspects of directions queries to be exploited for the ranking of places, including the frequency with which places have been referred to in directions queries. Next, the paper proposes an algorithm and accompanying data structures capable of ranking places in response to hyper-local web queries. Finally, an empirical study with very large directions query logs offers insight into the potential of directions queries for the ranking of places and suggests that the proposed algorithm is suitable for use in real web search engines.
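
A toy version of the ranking signal might combine a popularity term derived from logged directions queries with a proximity term, as sketched below; the weighting is illustrative and not the framework proposed in the paper, and the place names and log are hypothetical.

```python
# Sketch: a directions query ending at place b is treated as a vote for b,
# and votes are blended with proximity to the user's (hyper-local) location.
import math
from collections import Counter

def rank_places(places, directions_log, user_loc, alpha=0.5):
    """places: dict name -> (x, y); directions_log: list of destination names."""
    votes = Counter(directions_log)
    max_votes = max(votes.values()) if votes else 1
    scores = {}
    for name, loc in places.items():
        popularity = votes[name] / max_votes
        proximity = 1.0 / (1.0 + math.dist(loc, user_loc))
        scores[name] = alpha * popularity + (1 - alpha) * proximity
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    places = {"cafe": (1, 1), "museum": (6, 6), "park": (2, 3)}
    log = ["museum", "museum", "cafe", "museum", "park"]
    print(rank_places(places, log, user_loc=(0, 0)))
```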


Olsen, M. G., D. Susar, A. Nietzio, M. Snaprud, C. S. Jensen, "Global Web Accessibility Analysis of National Government Portals and Ministry Web Sites" in Journal of Information Technology and Politics, 8(1): 41–67, 2011

Publication
Online at Informaworld

Equal access to public information and services for all is an essential part of the United Nations Declaration of Human Rights. Today, the Web plays an important role in providing information and services to citizens. Unfortunately, many government Web sites are poorly designed and have accessibility barriers that prevent people with disabilities from using them. This paper combines current Web accessibility benchmarking methodologies with a sound strategy for comparing Web accessibility among countries and continents. Furthermore, the paper presents the first global analysis of the Web accessibility of 192 United Nations member states made publicly available. The paper also identifies common properties of member states that have accessible and inaccessible Web sites and shows that implementing anti-disability discrimination laws is highly beneficial for the accessibility of Web sites, while signing the United Nations Rights and Dignity of Persons with Disabilities has had no such effect yet. The paper demonstrates that, despite the commonly held assumption to the contrary, mature high-quality Web sites are more accessible than lower quality ones. Moreover, Web accessibility conformance claims by Web site owners are generally exaggerated.


Yiu, M. L., I. Assent, C. S. Jensen, P. Kalnis, "Outsourced Similarity Search on Metric Data Assets" in IEEE Transactions on Knowledge and Data Engineering, 24(2): 338–352, 2012

Publication
Online at IEEE

This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.


2010 Top

Tzoumas, K., A. Deshpande, C. S. Jensen, "Sharing-Aware Horizontal Partitioning for Exploiting Correlations during Query Processing" in Proceedings of the VLDB Endowment, 3(1): 542–554, 2010

Publication
Online at VLDB

Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query, one plan being optimal for a particular combination of data partitions. This scenario calls for the sharing of state among plans, so that common intermediate results are not recomputed. We study this problem in a setting with a routing-based query execution engine based on eddies [1]. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plans. We define the notion of a conditional join plan, a novel representation of the search space that enables us to address the problem in a principled way. We present a low-overhead greedy algorithm that uses statistical summaries based on graphical models. Experimental results suggest an order of magnitude faster execution time over traditional optimization for high correlations, while maintaining the same performance for low correlations. [1] R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, pp. 261–272, 2000.


Cao, X., G. Cong, C. S. Jensen, "Retrieving Top-k Prestige-Based Relevant Spatial Web Objects" in Proceedings of the VLDB Endowment, 3(1): 373–384, 2010

Publication
Online at VLDB

The location-aware keyword query returns ranked objects that are near a query location and that have textual descriptions that match query keywords. This query occurs inherently in many types of mobile and traditional web services and applications, e.g., Yellow Pages and Maps services. Previous work considers the potential results of such a query as being independent when ranking them. However, a relevant result object with nearby objects that are also relevant to the query is likely to be preferable over a relevant object without relevant nearby objects. The paper proposes the concept of prestige-based relevance to capture both the textual relevance of an object to a query and the effects of nearby objects. Based on this, a new type of query, the Location-aware top-k Prestige-based Text retrieval (LkPT) query, is proposed that retrieves the top-k spatial web objects ranked according to both prestige-based relevance and location proximity. We propose two algorithms that compute LkPT queries. Empirical studies with real-world spatial data demonstrate that LkPT queries are more effective in retrieving web objects than a previous approach that does not consider the effects of nearby objects; and they show that the proposed algorithms are scalable and outperform a baseline approach significantly.
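
The intuition behind prestige-based relevance can be sketched as a score that blends an object's own text relevance with that of its neighbors; the neighborhood radius, decay, and combination below are placeholders rather than the paper's definitions.

```python
# Illustrative only: each object's score mixes its own text relevance with the
# average relevance of objects within a fixed radius.
import math

def prestige_scores(objects, radius=1.0, alpha=0.5):
    """objects: list of (location, text_relevance)."""
    scores = []
    for i, (loc_i, rel_i) in enumerate(objects):
        nearby = [rel_j for j, (loc_j, rel_j) in enumerate(objects)
                  if j != i and math.dist(loc_i, loc_j) <= radius]
        boost = sum(nearby) / len(nearby) if nearby else 0.0
        scores.append(((1 - alpha) * rel_i + alpha * boost, loc_i))
    return sorted(scores, reverse=True)

if __name__ == "__main__":
    objs = [((0, 0), 0.9), ((0.5, 0.5), 0.7), ((5, 5), 0.9)]
    print(prestige_scores(objs))
```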


Cao, X., G. Cong, C. S. Jensen, "Mining Significant Semantic Locations From GPS Data" in Proceedings of the VLDB Endowment, 3(1): 1009–1020, 2010

Publication
Online at VLDB

With the increasing deployment and use of GPS-enabled devices, massive amounts of GPS data are becoming available. We propose a general framework for the mining of semantically meaningful, significant locations, e.g., shopping malls and restaurants, from such data. We present techniques capable of extracting semantic locations from GPS data. We capture the relationships between locations and between locations and users with a graph. Significance is then assigned to locations using random walks over the graph that propagates significance among the locations. In doing so, mutual reinforcement between location significance and user authority is exploited for determining significance, as are aspects such as the number of visits to a location, the durations of the visits, and the distances users travel to reach locations. Studies using up to 100 million GPS records from a confined spatio-temporal region demonstrate that the proposal is effective and is capable of outperforming baseline methods and an extension of an existing proposal.
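
The mutual reinforcement between user authority and location significance can be illustrated with a small HITS-style iteration; the paper's random-walk formulation additionally incorporates visit counts, durations, and travel distances, which are collapsed into a single edge weight here.

```python
# Toy mutual-reinforcement sketch: significant locations are visited by
# authoritative users, and authoritative users visit significant locations.
def reinforce(visits, iterations=20):
    """visits: list of (user, location, weight) triples."""
    users = {u for u, _, _ in visits}
    locations = {l for _, l, _ in visits}
    authority = {u: 1.0 for u in users}
    significance = {l: 1.0 for l in locations}
    for _ in range(iterations):
        significance = {l: sum(authority[u] * w for u, l2, w in visits if l2 == l)
                        for l in locations}
        norm = sum(significance.values()) or 1.0
        significance = {l: s / norm for l, s in significance.items()}
        authority = {u: sum(significance[l] * w for u2, l, w in visits if u2 == u)
                     for u in users}
        norm = sum(authority.values()) or 1.0
        authority = {u: a / norm for u, a in authority.items()}
    return significance

if __name__ == "__main__":
    visits = [("u1", "mall", 3.0), ("u1", "cafe", 1.0), ("u2", "mall", 2.0)]
    print(reinforce(visits))
```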


Jeung, H., M. L. Yiu, X. Zhou, C. S. Jensen, "Path Prediction and Predictive Range Querying in Road Network Databases" in The VLDB Journal, 19(4): 585–602, 2010

Publication
Online at Springer

In automotive applications, movement-path prediction enables the delivery of predictive and relevant services to drivers, e.g., reporting traffic conditions and gas stations along the route ahead. Path prediction also enables better results of predictive range queries and reduces the location update frequency in vehicle tracking while preserving accuracy. Existing moving-object location prediction techniques in spatial-network settings largely target short-term prediction that does not extend beyond the next road junction. To go beyond short-term prediction, we formulate a network mobility model that offers a concise representation of mobility statistics extracted from massive collections of historical object trajectories. The model aims to capture the turning patterns at junctions and the travel speeds on road segments at the level of individual objects. Based on the mobility model, we present a maximum likelihood and a greedy algorithm for predicting the travel path of an object (for a time duration h into the future). We also present a novel and efficient server-side indexing scheme that supports predictive range queries on the mobility statistics of the objects. Empirical studies with real data suggest that our proposals are effective and efficient.
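
A greedy variant of path prediction from such a mobility model can be sketched as follows, assuming hypothetical per-edge turning statistics; the paper also presents a maximum-likelihood algorithm and a predictive-range-query index, which are not shown.

```python
# Greedy sketch: at each junction, follow the historically most likely turn
# until the prediction horizon (in steps) is reached.
def predict_path(turn_probs, start_edge, steps):
    """turn_probs: dict edge -> dict of next_edge -> probability."""
    path = [start_edge]
    current = start_edge
    for _ in range(steps):
        options = turn_probs.get(current)
        if not options:
            break
        current = max(options, key=options.get)   # most probable turn
        path.append(current)
    return path

if __name__ == "__main__":
    turn_probs = {
        "e1": {"e2": 0.7, "e3": 0.3},
        "e2": {"e4": 0.6, "e5": 0.4},
    }
    print(predict_path(turn_probs, "e1", steps=3))  # ['e1', 'e2', 'e4']
```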


Freni, D., C. Bettini, C. R. Vicente, S. Mascetti, C. S. Jensen, "Preserving Location and Absence Privacy in Geo-Social Networks" in Proceedings of the 19th ACM Conference on Information and Knowledge Management, Toronto, Canada, pp. 309–318, 2010

Publication
ACM Author-Izer

Online social networks often involve very large numbers of users who share very large volumes of content. This content is increasingly being tagged with geo-spatial and temporal coordinates that may then be used in services. For example, a service may retrieve photos taken in a certain region. The resulting geo-aware social networks (GeoSNs) pose privacy threats beyond those found in location-based services. Content published in a GeoSN is often associated with references to multiple users, without the publisher being aware of the privacy preferences of those users. Moreover, this content is often accessible to multiple users. This renders it difficult for GeoSN users to control which information about them is available and to whom it is available. This paper addresses two privacy threats that occur in GeoSNs: location privacy and absence privacy. The former concerns the availability of information about the presence of users in specific locations at given times, while the latter concerns the availability of information about the absence of an individual from specific locations during given periods of time. The challenge addressed is that of supporting privacy while still enabling useful services. We believe this is the first paper to formalize these two notions of privacy and to propose techniques for enforcing them. The techniques offer privacy guarantees, and the paper reports on empirical performance studies of the techniques.


Hansen, R., R. Wind, C. S. Jensen, B. Thomsen, "Algorithmic Strategies for Adapting to Environmental Changes in 802.11 Location Fingerprinting" in Proceedings of the 2010 International Conference on Indoor Positioning and Indoor Navigation, Zurich, Switzerland, 10 pages, 2010

Publication
Online at IEEE

This paper studies novel algorithmic strategies that enable 802.11 location fingerprinting to adapt to environmental changes. A long-standing challenge in location fingerprinting has been that dynamic changes, such as people presence, opening/closing of doors, or changing humidity levels, may influence the 802.11 signal strengths to an extent where a static radio map is rendered useless. To counter this effect, related research efforts propose to install additional sensors in order to adapt a previously built radio map to the circumstances at a given time. Although effective, this is not a viable solution for ubiquitous positioning where localization is required in many different buildings. Instead, we propose algorithmic strategies for dealing with changing environmental dynamics. We have performed an evaluation of our algorithms on signal strength data collected over a two month period at Aalborg University. The results show a vast improvement over using traditional static radio maps.


Gonzalez, H., A. Halevy, C. S. Jensen, A. Langen, J. Madhavan, R. Shapley, W. Shen, J. Goldberg-Kidon, "Google Fusion Tables: Web-Centered Data Management and Collaboration" in Proceedings of the 2010 ACM Symposium on Cloud Computing, Indianapolis, IN, USA, pp. 175-180, 2010

Publication
ACM Author-Izer

It has long been observed that database management systems focus on traditional business applications, and that few people use a database management system outside their workplace. Many have wondered what it will take to enable the use of data management technology by a broader class of users and for a much wider range of applications. Google Fusion Tables represents an initial answer to the question of how data management functionality that focussed on enabling new users and applications would look in today's computing environment. This paper characterizes such users and applications and highlights the resulting principles, such as seamless Web integration, emphasis on ease of use, and incentives for data sharing, that underlie the design of Fusion Tables. We describe key novel features, such as the support for data acquisition, collaboration, visualization, and web-publishing.


Goldberg-Kidon, J., H. Gonzalez, A. Halevy, C. S. Jensen, A. Langen, J. Madhavan, R. Shapley, "Google Fusion Tables: Data Management, Integration and Collaboration in the Cloud" in Proceedings of the 2010 ACM SIGMOD International Conference on the Management of Data, Indianapolis, IN, USA, pp. 1061-1066, 2010

Publication
ACM Author-Izer

Google Fusion Tables is a cloud-based service for data management and integration. Fusion Tables enables users to upload tabular data files (spreadsheets, CSV, KML), currently of up to 100MB. The system provides several ways of visualizing the data (e.g., charts, maps, and timelines) and the ability to filter and aggregate the data. It supports the integration of data from multiple sources by performing joins across tables that may belong to different users. Users can keep the data private, share it with a select set of collaborators, or make it public and thus crawlable by search engines. The discussion feature of Fusion Tables allows collaborators to conduct detailed discussions of the data at the level of tables and individual rows, columns, and cells. This paper describes the inner workings of Fusion Tables, including the storage of data in the system and the tight integration with the Google Maps infrastructure.


Jensen, C. S., H. Lu, B. Yang, "Indoor-A New Data Management Frontier" in M. Mokbel (ed.): Special Issue on New Frontiers in Spatial and Spatio-temporal Database Systems, IEEE Data Engineering Bulletin, 33(2): 12-17, 2010

Publication
Online

Much research has been conducted on the management of outdoor moving objects. In contrast, relatively little research has been conducted on indoor moving objects. The indoor setting differs from outdoor settings in important ways, including the following two. First, indoor spaces exhibit complex topologies. They are composed of entities that are unique to indoor settings, e.g., rooms and hallways that are connected by doors. As a result, conventional Euclidean distance and spatial network distance are inapplicable in indoor spaces. Second, accurate, GPS-like positioning is typically unavailable in indoor spaces. Rather, positioning is achieved through the use of technologies such as Bluetooth, Infrared, RFID, or Wi-Fi. This typically results in much less reliable and accurate positioning. This paper covers some preliminary research that explicitly targets an indoor setting. Specifically, we describe a graph-based model that enables the effective and efficient tracking of indoor objects using proximity-based positioning technologies like RFID and Bluetooth. Furthermore, we categorize objects according to their position-related states, present an on-line hash-based object indexing scheme, and conduct an uncertainty analysis for indoor objects. We end by identifying several interesting and important directions for future research.


Jensen, C. S., S. Madria, "Message from the General Chairs" in Proceedings of the Eleventh International Conference on Mobile Data Management, Kansas City, MO, USA, p. xii, 2010

Publication



Jensen, C. S., H. Lu, "The Great Indoors: A Data Management Frontier" in Proceedings of the Second Workshop on Research Directions in Situational-aware Self-managed Proactive Computing in Wireless Adhoc Networks, Kansas City, MO, USA, 3 pages, 2010

Publication

Much of the research on data management for moving objects has assumed an outdoor setting in which objects move in Euclidean space (possibly constrained) or some form of spatial network and in which GPS or GPS-like positioning is assumed explicitly or implicitly. That body of research provides part of an enabling foundation for the growing Location-Based Services industry. However, we lead large parts of our lives in indoor spaces: homes, office buildings, shopping and leisure facilities, and collective transportation infrastructures. The latter may be large: For example, each day in 2009, London Heathrow Airport, UK had on average 180,000 passengers, and the Tokyo Subway (Tokyo Metro and Toei Subway), Japan delivered a daily average of 8.7 million passenger rides in 2008. Tokyo's Shinjuku Station alone was used by an average of 3.64 million passengers per day in 2007. Indoor differs from outdoor in important ways and thus calls for new research. The remainder of this paper covers selected differences between indoor and outdoor and discusses the implications for research.


Cao, X., G. Cong, B. Cui, C. S. Jensen, "A Generalized Framework of Exploring Category Information for Question Retrieval in Community Question Answer Archives" in Proceedings of the Nineteenth International World Wide Web Conference, Raleigh, NC, USA, pp. 201-210, 2010

Publication
ACM Author-Izer

Community Question Answering (CQA) has emerged as a popular type of service where users ask and answer questions and access historical question-answer pairs. CQA archives contain very large volumes of questions organized into a hierarchy of categories. As an essential function of CQA services, question retrieval in a CQA archive aims to retrieve historical question-answer pairs that are relevant to a query question. In this paper, we present a new approach to exploiting category information of questions for improving the performance of question retrieval, and we apply the approach to existing question retrieval models, including a state-of-the-art question retrieval model. Experiments conducted on real CQA data demonstrate that the proposed techniques are capable of outperforming a variety of baseline methods significantly.


Yang, B., H. Lu, C. S. Jensen, "Probabilistic Threshold k Nearest Neighbor Queries over Moving Objects in Symbolic Indoor Space" in Proceedings of the Thirteenth International Conference on Extending Database Technology, Lausanne, Switzerland, pp. 335-346, 2010

Publication
ACM Author-Izer

The availability of indoor positioning renders it possible to deploy location-based services in indoor spaces. Many such services will benefit from the efficient support for k nearest neighbor (kNN) queries over large populations of indoor moving objects. However, existing kNN techniques fall short in indoor spaces because these differ from Euclidean and spatial network spaces and because of the limited capabilities of indoor positioning technologies. To contend with indoor settings, we propose the new concept of minimal indoor walking distance (MIWD) along with algorithms and data structures for distance computing and storage; and we differentiate the states of indoor moving objects based on a positioning device deployment graph, utilize these states in effective object indexing structures, and capture the uncertainty of object locations. On these foundations, we study the probabilistic threshold kNN (PTkNN) query. Given a query location q and a probability threshold T, this query returns all subsets of k objects that have probability larger than T of containing the kNN query result of q. We propose a combination of three techniques for processing this query. The first uses the MIWD metric to prune objects that are too far away. The second uses fast probability estimates to prune unqualified objects and candidate result subsets. The third uses efficient probability evaluation for computing the final result on the remaining candidate subsets. An empirical study using both synthetic and real data shows that the techniques are efficient.
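
The query semantics can be illustrated, independently of the paper's pruning-based processing techniques, by a small Monte Carlo sketch: object locations are sampled from simple rectangular uncertainty regions, the kNN set is computed per sample using Euclidean distance rather than the MIWD metric, and the k-subsets whose relative frequency exceeds T are reported. All names and parameters below are illustrative.

```python
import random
from collections import Counter

def ptknn_monte_carlo(query, objects, k, threshold, samples=2000):
    """Estimate the PTkNN result by sampling object locations from axis-aligned
    uncertainty boxes (a stand-in for indoor uncertainty regions)."""
    counts = Counter()
    for _ in range(samples):
        drawn = {oid: (random.uniform(x1, x2), random.uniform(y1, y2))
                 for oid, (x1, y1, x2, y2) in objects.items()}
        ranked = sorted(drawn, key=lambda oid: (drawn[oid][0] - query[0]) ** 2
                                               + (drawn[oid][1] - query[1]) ** 2)
        counts[frozenset(ranked[:k])] += 1
    return {subset: c / samples for subset, c in counts.items() if c / samples > threshold}

# Toy example: three objects with rectangular uncertainty regions.
objs = {"o1": (0, 0, 1, 1), "o2": (0.5, 0.5, 1.5, 1.5), "o3": (5, 5, 6, 6)}
print(ptknn_monte_carlo((0.0, 0.0), objs, k=2, threshold=0.5))   # {o1, o2} with probability ~1
```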


Jensen, C. S., "Foreword" in Moving Objects Management-Models, Techniques and Applications, by X. Meng and J. Chen, Springer Verlag, 2010

Publication
Online at Springer



Lu, H., C. S. Jensen, Z. Zhang, "Skyline Ordering: A Flexible Framework for Efficient Resolution of Size Constraints on Skyline Queries" in DB Technical Report TR-27, Department of Computer Science, Aalborg University, 28 pages, 2010

Publication

Given a set of multi-dimensional points, a skyline query returns the interesting points that are not dominated by other points. It has been observed that the actual cardinality (s) of a skyline query result may differ substantially from the desired result cardinality (k), which has prompted studies on how to reduce s for the case where k < s. This paper goes further by addressing the general case where the relationship between k and s is not known beforehand. Due to their complexity, the existing pointwise ranking and set-wide maximization techniques are not well suited for this problem. Moreover, the former often incurs too many ties in its ranking, and the latter is inapplicable for k > s. Based on these observations, the paper proposes a new approach, called skyline ordering, that forms a skyline-based partitioning of a given data set, such that an order exists among the partitions. Then set-wide maximization techniques may be applied within each partition. Efficient algorithms are developed for skyline ordering and for resolving size constraints using the skyline order. The results of extensive experiments show that skyline ordering yields a flexible framework for the efficient and scalable resolution of arbitrary size constraints on skyline queries.
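
The core idea of an ordered, skyline-based partitioning can be made concrete with a small sketch, assuming points for which smaller values are preferred: successive skylines are peeled off to form ordered partitions, and a size constraint k is resolved by taking whole partitions first. The set-wide maximization within the final, partially used partition is simplified here to an arbitrary choice, so this is only an illustration of the framework's structure.

```python
def dominates(p, q):
    # p dominates q if p is at least as good in every dimension and better in one.
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def skyline_order(points):
    """Partition the data into successive skylines S1, S2, ...: S1 is the skyline
    of the data, S2 the skyline of the remainder, and so on."""
    remaining, layers = list(points), []
    while remaining:
        layer = skyline(remaining)
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers

def resolve_size_constraint(points, k):
    """Return exactly k points: whole skyline partitions first, then (as a
    placeholder for set-wide maximization) arbitrary points of the next partition."""
    result = []
    for layer in skyline_order(points):
        if len(result) + len(layer) <= k:
            result.extend(layer)
        else:
            result.extend(layer[: k - len(result)])
            break
    return result

pts = [(1, 9), (2, 8), (3, 3), (4, 4), (9, 1), (8, 2), (7, 7)]
print([len(layer) for layer in skyline_order(pts)])   # partition sizes, here [5, 1, 1]
print(resolve_size_constraint(pts, 5))
```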


Yiu, M. L., G. Ghinita, C. S. Jensen, P. Kalnis, "Enabling Search Services on Outsourced Private Spatial Data" in The VLDB Journal, 19(3): 363-384, 2010

Publication
Online at Springer

Cloud computing services enable organizations and individuals to outsource the management of their data to a service provider in order to save on hardware investments and reduce maintenance costs. Only authorized users are allowed to access the data. Nobody else, including the service provider, should be able to view the data. For instance, a real-estate company that owns a large database of properties wants to allow its paying customers to query for houses according to location. On the other hand, the untrusted service provider should not be able to learn the property locations and, e.g., sell the information to a competitor. To tackle the problem, we propose to transform the location datasets before uploading them to the service provider. The paper develops a spatial transformation that re-distributes the locations in space, and it also proposes a cryptographic-based transformation. The data owner selects the transformation key and shares it with authorized users. Without the key, it is infeasible to reconstruct the original data points from the transformed points. The proposed transformations present distinct trade-offs between query efficiency and data confidentiality. In addition, we describe attack models for studying the security properties of the transformations. Empirical studies demonstrate that the proposed methods are efficient and applicable in practice.


Ruxanda, M. M., A. Nanopoulos, C. S. Jensen, "Flexible Fusion of Relevance and Importance in Music Ranking" in Journal of New Music Research, 39(1): 35-45, 2010

Publication
Online at informaworld

Due to the proliferation of audio files on the Web and in large digital music collections, the ranking of the retrieved music becomes an important issue in Music Information Retrieval. This paper proposes a music-ranking strategy that can identify and flexibly fuse music audio, ranging from similar and potentially serendipitous music to authoritative or mainstream music. The notions of similar and authoritative music are identified in a two-fold manner, based on user preference data and on acoustic features extracted from the audio. The music ranking employs kernel functions that users can control through intuitive parameter tuning. A research prototype system that incorporates the ranking mechanism is developed, and a user study is conducted on its use. The results of the study indicate user satisfaction with the proposed music-ranking strategy and suggest its real-world applicability.


2009 Top

Tiesyte, D., C. S. Jensen, "Assessing the Predictability of Scheduled-Vehicle Travel Times" in Proceedings of the Seventeenth ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, pp. 416-419, 2009

Publication
ACM Author-Izer

One of the most desired and challenging services in collective transport systems is the real-time prediction of the near-future travel times of scheduled vehicles, especially public buses, thus improving the experience of the transportation users, who may be able to better schedule their travel, and also enabling system operators to perform real-time monitoring. While travel-time prediction has been researched extensively during the past decade, the accuracies of existing techniques fall short of what is desired, and proposed mathematical prediction models are often not transferable to other systems because the properties of the travel-time-related data of vehicles are highly context-dependent, making the models difficult to fit. We propose a framework for evaluating various predictability types of the data independently of the model, and we also compare predictability analysis results of travel times with the actual prediction errors for real bus trajectories. We have applied the proposed framework to real-time data collected from buses operating in Copenhagen, Denmark.


Yang, B., H. Lu, C. S. Jensen, "Scalable Continuous Range Monitoring of Moving Objects in Symbolic Indoor Space" in Proceedings of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, pp. 671-680, 2009

Publication
ACM Author-Izer

Indoor spaces accommodate large populations of individuals. The continuous range monitoring of such objects can be used as a foundation for a wide variety of applications, e.g., space planning, way finding, and security. Indoor space differs from outdoor space in that symbolic locations, e.g., rooms, rather than Euclidean positions or spatial network locations are important. In addition, positioning based on presence sensing devices, rather than, e.g., GPS, is assumed. Such devices report the objects in their activation ranges. We propose an incremental, query-aware continuous range query processing technique for objects moving in this setting. A set of critical devices is determined for each query, and only the observations from those devices are used to continuously maintain the query result. Due to the limitations of the positioning devices, queries contain certain and uncertain results. A maximum-speed constraint on object movement is used to refine the latter results. A comprehensive experimental study with both synthetic and real data suggests that our proposal is efficient and scalable.


Cao, X., G. Cong, B. Cui, C. S. Jensen, C. Zhang, "The use of categorization information in language models for question retrieval" in Proceedings of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, pp. 265-274, 2009

Publication
ACM Author-Izer

Community Question Answering (CQA) has emerged as a popular type of service meeting a wide range of information needs. Such services enable users to ask and answer questions and to access existing question-answer pairs. CQA archives contain very large volumes of valuable user-generated content and have become important information resources on the Web. To make the body of knowledge accumulated in CQA archives accessible, effective and efficient question search is required. Question search in a CQA archive aims to retrieve historical questions that are relevant to new questions posed by users. This paper proposes a category-based framework for search in CQA archives. The framework embodies several new techniques that use language models to exploit categories of questions for improving question-answer search. Experiments conducted on real data from Yahoo! Answers demonstrate that the proposed techniques are effective and efficient and are capable of outperforming baseline methods significantly.
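
The general flavor of category-aware language-model retrieval can be sketched as a mixture that smooths word probabilities estimated from a candidate question with estimates from its category and from the whole collection. The particular mixture and the weights below are illustrative only and are not the models proposed in the paper.

```python
import math
from collections import Counter

def lm_score(query, doc_words, cat_words, coll_words, lam=0.5, mu=0.3):
    """Log-probability of the query under a mixture of document, category,
    and collection language models (maximum-likelihood estimates)."""
    doc, cat, coll = Counter(doc_words), Counter(cat_words), Counter(coll_words)
    score = 0.0
    for w in query:
        p_doc = doc[w] / max(len(doc_words), 1)
        p_cat = cat[w] / max(len(cat_words), 1)
        p_coll = (coll[w] + 1) / (len(coll_words) + len(coll))   # add-one smoothing
        score += math.log(lam * p_doc + mu * p_cat + (1 - lam - mu) * p_coll)
    return score

collection = "how to fix python error install package travel visa europe".split()
query = ["python", "error"]
d1 = "fix python import error".split()            # historical question, category "programming"
d2 = "cheap travel to europe".split()             # historical question, category "travel"
cat_prog = "python error install package fix".split()
cat_travel = "travel visa europe flight".split()
print(lm_score(query, d1, cat_prog, collection) > lm_score(query, d2, cat_travel, collection))  # True
```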


Jensen, C. S., H. Lu, B. Yang, "Indexing the Trajectories of Moving Objects in Symbolic Indoor Space" in Proceedings of the Eleventh International Symposium on Spatial and Temporal Databases, Aalborg, Denmark, pp. 208-227, 2009

Publication
Online at Springer

Indoor spaces accommodate large populations of individuals. With appropriate indoor positioning, e.g., Bluetooth and RFID, in place, large amounts of trajectory data result that may serve as a foundation for a wide variety of applications, e.g., space planning, way finding, and security. This scenario calls for the indexing of indoor trajectories. Based on an appropriate notion of indoor trajectory and definitions of pertinent types of queries, the paper proposes two R-tree based structures for indexing object trajectories in symbolic indoor space. The RTR-tree represents a trajectory as a set of line segments in a space spanned by positioning readers and time. The TP2R-tree applies a data transformation that yields a representation of trajectories as points with extension along the time dimension. The paper details the structure, node organization strategies, and query processing algorithms for each index. An empirical performance study suggests that the two indexes are effective, efficient, and robust. The study also elicits the circumstances under which our proposals perform the best.


Hansen, R., R. Wind, C. S. Jensen, B. Thomsen, "Pretty Easy Pervasive Positioning" in Proceedings of the Eleventh International Symposium on Spatial and Temporal Databases, Aalborg, Denmark, pp. 417-421, 2009

Publication
Online at Springer

With the increasing availability of positioning based on GPS, Wi-Fi, and cellular technologies and the proliferation of mobile devices with GPS, Wi-Fi and cellular connectivity, ubiquitous positioning is becoming a reality. While offerings by companies such as Google, Skyhook, and Spotigo render positioning possible in outdoor settings, including urban environments with limited GPS coverage, they remain unable to offer accurate indoor positioning. We will demonstrate a software infrastructure that makes it easy for anybody to build support for accurate Wi-Fi based positioning in buildings. All that is needed is a building with Wi-Fi coverage, access to the building, a floor plan of the building, and a Wi-Fi enabled device. Specifically, we will explain the software infrastructure and the steps that must be completed to obtain support for positioning. And we will demonstrate the positioning obtained, including how it interoperates with outdoor GPS positioning.


Vicente, C. R., M. Kirkpatrick, G. Ghinita, E. Bertino, C. S. Jensen, "Towards location-based access control in healthcare emergency response" in Proceedings of the Second SIGSPATIAL ACM GIS International Workshop on Security and Privacy in GIS and LBS, Seattle, WA, USA, pp. 22-26, 2009

Publication
ACM Author-Izer

Recent advances in positioning and tracking technologies have led to the emergence of novel location-based applications that allow participants to access information relevant to their spatio-temporal context. Traditional access control models, such as role-based access control (RBAC), are not sufficient to address the new challenges introduced by these location-based applications. Several recent research efforts have enhanced RBAC with spatio-temporal features. Nevertheless, the state-of-the-art does not deal with mobility of both subjects and objects, and does not support complex access control decisions based on spatio-temporal relationships among subjects and objects. Furthermore, such relationships change frequently in dynamic environments, requiring efficient mechanisms to monitor and re-evaluate access control decisions. In this position paper, we present a healthcare emergency response scenario which highlights the novel challenges that arise when enforcing access control in an environment with moving subjects and objects. To address a realistic application scenario, we consider movement on road networks, and we identify complex access control decisions relevant to such settings. We overview the main technical issues to be addressed, and we describe the architecture for policy decision and enforcement points.


Tzoumas, K., M. L. Yiu, C. S. Jensen, "Workload-Aware Indexing of Continuously Moving Objects" in Proceedings of the VLDB Endowment, Vol. 2, No. 1-2, pp. 1186-1197, 2009

Publication
Online at VLDB

The increased deployment of sensors and data communication networks yields data management workloads with update loads that are intense, skewed, and highly bursty. Query loads resulting from location-based services are expected to exhibit similar characteristics. In such environments, index structures can easily become performance bottlenecks. We address the need for indexing that is adaptive to the workload characteristics, called workload-aware, in order to cover the space in between maintaining an accurate index, and having no index at all. Our proposal, QU-Trade, extends R-tree type indexing and achieves workload-awareness by controlling the underlying index's filtering quality. QU-Trade safely drops index updates, increasing the overlap in the index when the workload is update-intensive, and it restores the filtering capabilities of the index when the workload becomes query-intensive. This is done in a non-uniform way in space so that the quality of the index remains high in frequently queried regions, while it deteriorates in frequently updated regions. The adaptation occurs online, without the need for a learning phase. We apply QU-Trade to the R-tree and the TPR-tree, and we offer analytical and empirical studies. In the presence of substantial workload skew, QU-Trade can achieve index update costs close to zero and can also achieve virtually the same query cost as the underlying index.


Cong, G., C. S. Jensen, D. Wu, "Efficient Retrieval of the Top-k Most Relevant Spatial Web Objects" in Proceedings of the VLDB Endowment, Vol. 2, No. 1-2, pp. 337-348, 2009

Publication
Online at VLDB

The conventional Internet is acquiring a geo-spatial dimension. Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables a new kind of top-k query that takes into account both location proximity and text relevancy. To our knowledge, only naive techniques exist that are capable of computing a general web information retrieval query while also taking location into account. This paper proposes a new indexing framework for location-aware top-k text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within the framework. The framework encompasses algorithms that utilize the proposed indexes for computing the top-k query, thus taking into account both text relevancy and location proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper's proposal offers scalability and is capable of excellent performance.
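
The ranking that such queries rely on can be illustrated with a brute-force sketch that linearly combines a crude text-relevance score with normalized spatial proximity; the weighting parameter alpha and all names are illustrative, and the paper's hybrid indexes exist precisely to avoid the exhaustive scan shown here.

```python
import heapq
import math

def topk_spatial_keyword(query_loc, query_terms, objects, k, alpha=0.5):
    """Brute-force top-k ranking combining text relevance and spatial proximity."""
    max_dist = max(math.dist(query_loc, o["loc"]) for o in objects) or 1.0
    def score(o):
        overlap = len(set(query_terms) & set(o["text"].lower().split()))
        text_rel = overlap / len(query_terms)                        # crude relevance
        proximity = 1.0 - math.dist(query_loc, o["loc"]) / max_dist  # 1 = at the query
        return alpha * text_rel + (1 - alpha) * proximity
    return heapq.nlargest(k, objects, key=score)

pois = [
    {"id": 1, "loc": (0.1, 0.2), "text": "italian pizza restaurant"},
    {"id": 2, "loc": (5.0, 5.0), "text": "pizza takeaway"},
    {"id": 3, "loc": (0.2, 0.1), "text": "sushi bar"},
]
print([o["id"] for o in topk_spatial_keyword((0, 0), ["pizza"], pois, k=2)])   # [1, 2]
```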


Zhang, M., S. Chen, C. S. Jensen, B. C. Ooi, Z. Zhang, "Effectively Indexing Uncertain Moving Objects for Predictive Queries" in Proceedings of the VLDB Endowment, Vol. 2, No. 1-2, pp. 1198-1209, 2009

Publication
Online at VLDB

Moving object indexing and query processing is a well studied research topic, with applications in areas such as intelligent transport systems and location-based services. While much existing work explicitly or implicitly assumes a deterministic object movement model, real-world objects often move in more complex and stochastic ways. This paper investigates the possibility of a marriage between moving-object indexing and probabilistic object modeling. Given the distributions of the current locations and velocities of moving objects, we devise an efficient inference method for the prediction of future locations. We demonstrate that such prediction can be seamlessly integrated into existing index structures designed for moving objects, thus improving the meaningfulness of range and nearest neighbor query results in highly dynamic and uncertain environments. The paper reports on extensive experiments on the B*-tree that offer insights into the properties of the paper's proposal.


Jensen, C. S., H. Lu, B. Yang, "Graph Model Based Indoor Tracking" in Proceedings of the Tenth International Conference on Mobile Data Management, Taipei, Taiwan, pp. 122-131, 2009

Publication
Online at IEEE

The tracking of the locations of moving objects in large indoor spaces is important, as it enables a range of applications related to, e.g., security and indoor navigation and guidance. This paper presents a graph model based approach to indoor tracking that offers a uniform data management infrastructure for different symbolic positioning technologies, e.g., Bluetooth and RFID. More specifically, the paper proposes a model of indoor space that comprises a base graph and mappings that represent the topology of indoor space at different levels. The resulting model can be used for one or several indoor positioning technologies. Focusing on RFID-based positioning, an RFID specific reader deployment graph model is built from the base graph model. This model is then used in several algorithms for constructing and refining trajectories from raw RFID readings. Empirical studies with implementations of the models and algorithms suggest that the paper's proposals are effective and efficient.


Pedersen, T. B., A. Shoshani, J. Gu, C. S. Jensen, "Object-Extended OLAP Querying" in Data and Knowledge Engineering, Vol. 68, No. 5, pp. 453-480, 2009

Publication
Online by Elsevier at ScienceDirect

On-line analytical processing (OLAP) systems based on a dimensional view of data have found widespread use in business applications and are being used increasingly in nonstandard applications. These systems provide good performance and ease-of-use. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This paper presents the concepts and techniques underlying a flexible, "multi-model" federated system that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. This allows data analysis on the OLAP data to be significantly enriched by the use of additional object data. Additionally, physical integration of the OLAP and the object data can be avoided. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally support queries that involve data in object databases. The language permits selection criteria that reference object data, queries that return combinations of OLAP and object data, and queries that group dimensional data according to object data. The system is designed to be aggregation-safe, in the sense that it exploits the aggregation semantics of the data to prevent incorrect or meaningless query results. These capabilities may also be integrated into existing languages. It is shown how to integrate relational and XML data using the technology. A prototype implementation of the system is reported, along with performance measurements that show that the approach is a viable alternative to a physically integrated data warehouse.


Ruxanda, M. M., B. Y. Chua, A. Nanopoulos, C. S. Jensen, "Emotion-based Music Retrieval on a Well-reduced Audio Feature Space" in Proceedings of the 34th International Conference on Acoustics, Speech, and Signal Processing, Taipei, Taiwan, pp. 181-184, 2009

Publication
Online at IEEE

Music expresses emotion. A number of audio-extracted features influence the perceived emotional expression of music. These audio features generate a high-dimensional space in which music similarity retrieval can be performed effectively with respect to human perception of the music emotion. However, real-time systems that retrieve music over large music databases can achieve an order-of-magnitude performance increase by applying multidimensional indexing over a dimensionally reduced audio feature space. To enable this, the paper conducts extensive studies of a number of dimensionality reduction algorithms, including both classic and novel approaches. The paper identifies which dimensionality reduction techniques, when applied to the considered audio feature space, preserve, on average, the accuracy of emotion-based music retrieval.


Zhou, Y., G. Cong, B. Cui, C. S. Jensen, J. Yao, "Routing Questions to the Right Users in Online Communities" in Proceedings of the 25th International Conference on Data Engineering, Shanghai, China, pp. 700-711, 2009

Publication
Online at IEEE

Online forums contain huge amounts of valuable user-generated content. In current forum systems, users have to passively wait for other users to visit the forum systems and read/answer their questions. The user experience for question answering suffers from this arrangement. In this paper, we address the problem of "pushing" the right questions to the right persons, the objective being to obtain quick, high-quality answers, thus improving user satisfaction. We propose a framework for the efficient and effective routing of a given question to the top-k potential experts (users) in a forum, by utilizing both the content and structures of the forum system. First, we compute the expertise of users according to the content of the forum system, i.e., we estimate the probability of a user being an expert for a given question based on the user's previous question answering. Specifically, we design three models for this task, including a profile-based model, a thread-based model, and a cluster-based model. Second, we re-rank the user expertise measured in probability by utilizing the structural relations among users in a forum system. The results of the two steps can be integrated naturally in a probabilistic model that computes a final ranking score for each user. Experimental results show that the proposals are very promising.


Yiu, M. L., G. Ghinita, C. S. Jensen, P. Kalnis, "Outsourcing Search Services on Private Spatial Data" in Proceedings of the 25th International Conference on Data Engineering, Shanghai, China, pp. 1140-1143, 2009

Publication
Online at IEEE

Social networking and content sharing service providers, e.g., Facebook and Google Maps, enable their users to upload and share a variety of user-generated content, including location data such as points of interest. Users wish to share location data through an (untrusted) service provider such that trusted friends can perform spatial queries on the data. We solve the problem by transforming the location data before uploading them. We contribute spatial transformations that redistribute locations in space and a transformation that employs cryptographic techniques. The data owner selects transformation keys and shares them with the trusted friends. Without the keys, it is infeasible for an attacker to reconstruct the exact original data points from the transformed points. These transformations achieve different tradeoffs between query efficiency and data security. In addition, we describe an attack model for studying the security properties of the transformations. Empirical studies suggest that the proposed methods are secure and efficient.


Tradišauskas, N., J. Juhl, H. Lahrmann, C. S. Jensen, "Map Matching for Intelligent Speed Adaptation" in IET Intelligent Transport Systems, Vol. 3, No. 1, pp. 57-66, 2009

Publication
Online at IET Digital Library

The availability of Global Navigation Satellite Systems (GNSS) enables sophisticated vehicle guidance and advisory systems such as Intelligent Speed Adaptation (ISA) systems. In ISA systems, it is essential to be able to position vehicles within a road network. Because digital road networks as well as GNSS positioning are often inaccurate, a technique known as map matching is needed that aims to use this inaccurate data for determining a vehicle's real road-network position. Then, knowing this position, an ISA system can compare speed with the speed limit in effect and take measures against speeding. An on-line map-matching algorithm is presented with an extensive number of weighting parameters that allow better determination of a vehicle's road network position. The algorithm uses a certainty value to express its belief in the correctness of its results. The algorithm was designed and implemented to be used in the large-scale ISA project "Spar på farten". Using test data and data collected from project participants, the algorithm's performance is evaluated. It is shown that the algorithm performs correctly 95% of the time and is capable of handling GNSS positioning errors in a conservative manner.
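
The weighting idea can be sketched as follows; this is an illustration, not the algorithm of the paper: each candidate road segment is scored by a weighted combination of distance to the GNSS fix and heading agreement, and the gap between the best and second-best scores serves as a simple certainty value. The weights and the certainty definition are assumptions made for the example.

```python
import math

def match_position(fix, heading_deg, segments, w_dist=0.7, w_head=0.3):
    """Return the best-matching segment id plus a crude certainty value in [0, 1]."""
    def seg_score(seg):
        d = math.dist(fix, seg["point"])                        # meters to the segment
        dist_score = 1.0 / (1.0 + d)
        diff = abs((heading_deg - seg["bearing"] + 180.0) % 360.0 - 180.0)
        head_score = 1.0 - diff / 180.0                         # 1 = headings agree
        return w_dist * dist_score + w_head * head_score

    scored = sorted((seg_score(s), s["id"]) for s in segments)
    best_score, best_id = scored[-1]
    runner_up = scored[-2][0] if len(scored) > 1 else 0.0
    certainty = 1.0 - runner_up / best_score if best_score > 0 else 0.0
    return best_id, certainty

segments = [
    {"id": "A", "point": (2.0, 0.0), "bearing": 90.0},   # nearby, similar heading
    {"id": "B", "point": (0.0, 15.0), "bearing": 0.0},   # farther, wrong heading
]
print(match_position((0.0, 0.0), 85.0, segments))        # ('A', certainty around 0.6)
```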


Hansen, R., R. Wind, C. S. Jensen, B. Thomsen, "Seamless Indoor/Outdoor Positioning Handover for Location-Based Services in Streamspin" in Proceedings of the Tenth International Conference on Mobile Data Management, Taipei, Taiwan, pp. 267-272, 2009

Publication
Online at IEEE

This paper presents the implementation of a novel seamless indoor/outdoor positioning service for mobile users. The service is being made available in the Streamspin system (www.streamspin.com), an open platform for the creation and delivery of location-based services. Streamspin seeks to enable the delivery of truly ubiquitous location-based services by integrating GPS and Wi-Fi location fingerprinting. The paper focuses on key aspects of the seamless handover from outdoor to indoor positioning. Several different handover solutions are presented, and their applicability is evaluated with respect to positioning accuracy and battery consumption of the mobile device.


Jensen, C. S., "Data Management Infrastructure for the Mobile Web" in Proceedings of the Fifth International Conference on Semantics, Knowledge and Grid, Zhuhai, China, p. 1. Invited paper, 2009

Publication

The Internet is going mobile, and indications are that the mobile Internet will be "bigger" than the conventional Internet. Due to aspects such as user mobility, much more diverse use situations, and the form factor of mobile devices, context awareness is important on the mobile Internet. Focusing on geo-spatial context awareness, this talk covers research that aims to build infrastructure for mobile data management.


Shestakov, N. A., C. S. Jensen, "Extending Mobile Service Context with User Routes in the Streamspin Platform" in Bulletin of the Tomsk Polytechnic University, Vol. 314, No. 5, pp. 170-175 (in Russian), 2009

Publication



Jensen, C. S., R. T. Snodgrass, "Absolute Time" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 1, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Applicability Period" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 98-99, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Bitemporal Interval" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 243, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Bitemporal Relation" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 243-244, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Calendar" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 304-305, 2009

Online at Springer



Dyreson, C., C. S. Jensen, R. T. Snodgrass, "Calendric System" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 305, 2009

Online at Springer



Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, "Current Semantics" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 544-545, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Event in Temporal Databases" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1045-1046, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Fixed Time Span" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 1141, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Forever" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 1161, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "History in Temporal Databases" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 1319, 2009

Online at Springer



Šaltenis, S., C. S. Jensen, "Indexing of the Current and Near-Future Positions of Moving Objects" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1458-1463, 2009

Online at Springer



Jensen, C. S., "Lifespan" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1612-1613, 2009

Online at Springer



Jensen, C. S., N. Tradišauskas, "Map Matching" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1692-1696, 2009

Online at Springer



Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, "Nonsequenced Semantics" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1913-1915, 2009

Online at Springer



Dyreson, C., C. S. Jensen, R. T. Snodgrass, "Now in Temporal Databases" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1920-1924, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Relative Time" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2376-2377, 2009

Online at Springer



Böhlen, M. H., C. S. Jensen, "Sequenced Semantics" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2619-2621, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Snapshot Equivalence" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 2659, 2009

Online at Springer



Böhlen, M. H., J. Gamper, C. S. Jensen, R. T. Snodgrass, "SQL-Based Temporal Query Languages" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2762-2768, 2009

Online at Springer



Böhlen, M. H., J. Gamper, C. S. Jensen, "Temporal Aggregation" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2924-2929, 2009

Online at Springer



Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, "Temporal Compatibility" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2936-2939, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Temporal Database" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2957-2960, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Temporal Data Models" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2952-2957, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Temporal Element" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 2966, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Temporal Expression" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 2967, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Temporal Generalization" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2967-2968, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Temporal Homogeneity" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 2973, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Temporal Projection" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 3008, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Temporal Query Languages" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3009-3012, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Temporal Specialization" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3017-3018, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Time Instant" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 3112, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Time Interval" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3112-3113, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Time Span" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 3119, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Timeslice Operator" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3120-3121, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Transaction Time" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3162-3163, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "User-Defined Time" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 3252, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Valid Time" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3253-3254, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Variable Time Span" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 3259, 2009

Online at Springer



Jensen, C. S., R. T. Snodgrass, "Weak Equivalence" in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3455-3456, 2009

Online at Springer



Jensen, C. S., H. Lu, M. L. Yiu, "Location Privacy Techniques in Client-Server Architectures", Chapter 2 in Privacy in Location-Based Applications, edited by C. Bettini, S. Jajodia, P. Samarati, and X. Sean Wang, Lecture Notes in Computer Science Vol. 5599, Springer Verlag, pp. 31-58, 2009

Publication
Online at Springer

A typical location-based service returns nearby points of interest in response to a user location. As such services are becoming increasingly available and popular, location privacy emerges as an important issue. In a system that does not offer location privacy, users must disclose their exact locations in order to receive the desired services. We view location privacy as an enabling technology that may lead to increased use of location-based services. In this chapter, we consider location privacy techniques that work in traditional client-server architectures without any trusted components other than the client's mobile device. Such techniques have important advantages. First, they are relatively easy to implement because they do not rely on any trusted third-party components. Second, they have potential for wide application, as the client-server architecture remains dominant for web services. Third, their effectiveness is independent of the distribution of other users, unlike the k-anonymity approach. The chapter characterizes the privacy models assumed by existing techniques and categorizes these according to their approach. The techniques are then covered in turn according to their category. The first category of techniques enlarge the client's position into a region before it is sent to the server. Next, dummy-based techniques hide the user's true location among fake locations, called dummies. In progressive retrieval, candidate results are retrieved iteratively from the server, without disclosing the exact user location. Finally, transformation-based techniques employ cryptographic transformations so that the service provider is unable to decipher the exact user locations. We end by pointing out promising directions and open problems.


2008 Top

Redoutey, M., E. Scotti, C. S. Jensen, C. Ray, C. Claramunt, "Efficient Vessel Tracking with Accuracy Guarantees" in Proceedings of the Eighth International Symposium on Web and Wireless Geographical Information Systems, Shanghai, China, pp. 140-151, 2008

Publication
Online at Springer

Safety and security are top concerns in maritime navigation, particularly as maritime traffic continues to grow and as crew sizes are reduced. The Automatic Identification System (AIS) plays a key role in regard to these concerns. This system, whose objective is in part to identify and locate vessels, transmits location-related information from vessels to ground stations that are part of a so-called Vessel Traffic Service (VTS), thus enabling these to track the movements of the vessels. This paper presents techniques that improve the existing AIS by offering better and guaranteed tracking accuracies at lower communication costs. The techniques employ movement predictions that are shared between vessels and the VTS. Empirical studies with a prototype implementation and real vessel data demonstrate that the techniques are capable of significantly improving the AIS.
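
The prediction-sharing idea behind such tracking can be sketched under a simplifying dead-reckoning assumption: the vessel and the shore maintain the same linear prediction, and the vessel transmits a new report only when its true position deviates from that shared prediction by more than an agreed accuracy threshold. All names and the threshold are illustrative.

```python
import math

def track_with_threshold(true_positions, dt, threshold):
    """Simulate prediction-based reporting: the vessel reports only when its true
    position drifts more than `threshold` from the shared dead-reckoning prediction."""
    reports = [(0, true_positions[0])]
    last_pos, velocity = true_positions[0], (0.0, 0.0)
    for t, pos in enumerate(true_positions[1:], start=1):
        steps = t - reports[-1][0]
        predicted = (last_pos[0] + velocity[0] * steps * dt,
                     last_pos[1] + velocity[1] * steps * dt)
        if math.dist(pos, predicted) > threshold:
            # report, and update the prediction model shared with the shore station
            velocity = ((pos[0] - last_pos[0]) / (steps * dt),
                        (pos[1] - last_pos[1]) / (steps * dt))
            last_pos = pos
            reports.append((t, pos))
    return reports

# A vessel on a straight course needs only the initial report plus one velocity update.
positions = [(i * 10.0, 0.0) for i in range(20)]
print(track_with_threshold(positions, dt=1.0, threshold=25.0))   # two reports in total
```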


Jensen, C. S., C. R. Vicente, R. Wind, "User-Generated Content-The Case for Mobile Services" in IEEE Computer, Vol. 41, No. 12, pp. 116-118, 2008

Publication
Online at IEEE

Enabling user-generated services could help fuel the mobile revolution. Web sites that enable the sharing of user-generated content such as photos and videos are immensely popular, and their use is on the rise. Technologies that enable Web sites to support the creation, sharing, and deployment of user-generated mobile services could be key factors in the spread of the mobile Internet.


Tiesyte, D., C. S. Jensen, "Similarity-Based Prediction of Travel Times for Vehicles Traveling on Known Routes" in Proceedings of the Sixteenth ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Irvine, CA, USA, pp. 105-114, 2008

Publication
ACM Author-Izer

The use of centralized, real-time position tracking is proliferating in the areas of logistics and public transportation. Real-time positions can be used to provide up-to-date information to a variety of users, and they can also be accumulated for uses in subsequent data analyses. In particular, historical data in combination with real-time data may be used to predict the future travel times of vehicles more accurately, thus improving the experience of the users who rely on such information. We propose a Nearest-Neighbor Trajectory (NNT) technique that identifies the historical trajectory that is the most similar to the current, partial trajectory of a vehicle. The historical trajectory is then used for predicting the future movement of the vehicle. The paper's specific contributions are two-fold. First, we define distance measures and a notion of nearest neighbor that are specific to trajectories of vehicles that travel along known routes. In empirical studies with real data from buses, we evaluate how well the proposed distance functions are capable of predicting future vehicle movements. Second, we propose a main-memory index structure that enables incremental similarity search and that is capable of supporting varying-length nearest neighbor queries.
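
The nearest-neighbor idea for route-constrained vehicles can be sketched by representing each completed trip as a vector of per-segment travel times: the historical trip whose prefix is closest to the observed prefix of the current trip supplies the predicted times for the remaining segments. The plain Euclidean prefix distance used here is simpler than the distance measures studied in the paper, and all data are illustrative.

```python
import math

def predict_remaining(current_prefix, history):
    """Return predicted travel times for the remaining route segments, taken from
    the historical trip whose prefix is closest to the current partial trip."""
    n = len(current_prefix)
    def prefix_dist(trip):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(trip[:n], current_prefix)))
    nearest = min(history, key=prefix_dist)
    return nearest[n:]

# Segment travel times (seconds) of three historical trips on the same route.
history = [
    [60, 65, 70, 80, 75],     # normal traffic
    [90, 95, 100, 110, 105],  # congested
    [55, 60, 62, 70, 68],     # light traffic
]
observed = [88, 97]           # the current trip so far resembles the congested trip
print(predict_remaining(observed, history))   # [100, 110, 105]
```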


Combi, C., S. Degani, C. S. Jensen, "Capturing Temporal Constraints in Temporal ER Models" in Proceedings of the 27th International Conference on Conceptual Modeling, Barcelona, Spain, pp. 397-411, 2008

Publication
Online at Springer

A wide range of database applications manage information that varies over time. The conceptual modeling of databases is frequently based on one of the several versions of the ER model. As this model does not provide built-in means for capturing temporal aspects of data, the resulting diagrams are unnecessarily obscure and inadequate for documentation purposes. The TimeER model extends the ER model with suitable constructs for modeling time-varying information, easing the design process, and leading to easy-to-understand diagrams. In a temporal ER model, support for the specification of advanced temporal constraints would be desirable, allowing the designer to specify, e.g., that the value of an attribute must not change over time. This paper extends the TimeER model by introducing the notation, and the associated semantics, for the specification of new temporal constraints.


Ruxanda, M. M., C. S. Jensen, "Flexible Query Framework for Music Data and Playlist Manipulation" in Proceedings of the Third International Workshop on Flexible Database and Information System Technology, Turin, Italy, pp. 693-697, 2008

Publication
Online at IEEE

Motivated by the explosion of digital music on the Web and the increasing popularity of music recommender systems, this paper presents a relational query framework for flexible music retrieval and effective playlist manipulation. A generic song representation model is introduced, which captures heterogeneous categories of musical information and serves as a foundation for query operators that offer a practical solution to playlist management. A formal definition of the proposed query operators is provided, together with real usage scenarios and a prototype implementation.


Diao, Y., C. S. Jensen, editors, Proceedings of the Fifth International Workshop on Data Management for Sensor Networks, Auckland, New Zealand, 55+viii pages, 2008

Online at ACM Digital Library



Jeung, H., M. L. Yiu, X. Zhou, C. S. Jensen, H. T. Shen, "Discovery of Convoys in Trajectory Databases" in Proceedings of the VLDB Endowment, Auckland, New Zealand, Vol. 1, No. 1, pp. 1068-1080, 2008

Publication
Online at ACM Digital Library

As mobile devices with positioning capabilities continue to proliferate, data management for so-called trajectory databases that capture the historical movements of populations of moving objects becomes important. This paper considers the querying of such databases for convoys, a convoy being a group of objects that have traveled together for some time. More specifically, this paper formalizes the concept of a convoy query using density-based notions, in order to capture groups of arbitrary extents and shapes. Convoy discovery is relevant for real-life applications in throughput planning of trucks and carpooling of vehicles. Although there has been extensive research on trajectories in the literature, none of it can be applied to correctly retrieve exact convoy result sets. Motivated by this, we develop three efficient algorithms for convoy discovery that adopt the well-known filter-refinement framework. In the filter step, we apply line-simplification techniques on the trajectories and establish distance bounds between the simplified trajectories. This permits efficient convoy discovery over the simplified trajectories without missing any actual convoys. In the refinement step, the candidate convoys are further processed to obtain the actual convoys. Our comprehensive empirical study offers insight into the properties of the paper's proposals and demonstrates that the proposals are effective and efficient on real-world trajectory data.
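
To illustrate the convoy notion itself, rather than the paper's filter-refinement algorithms, the sketch below groups objects per timestamp with a crude distance-based clustering standing in for a density-based method such as DBSCAN, and it reports object sets that stay grouped for k consecutive timestamps. The parameters eps, m, and k and all other details are simplifications.

```python
def clusters_at(points, eps):
    """Crude single-linkage grouping at one timestamp: an object joins a group
    if it lies within eps of some group member (a stand-in for DBSCAN)."""
    groups = []
    for oid, (x, y) in points.items():
        placed = False
        for g in groups:
            if any((x - points[o][0]) ** 2 + (y - points[o][1]) ** 2 <= eps ** 2 for o in g):
                g.add(oid)
                placed = True
                break
        if not placed:
            groups.append({oid})
    return groups

def convoys(snapshots, eps, m, k):
    """Report object sets of size >= m that share a cluster in every one of
    k consecutive snapshots."""
    found = set()
    per_time = [clusters_at(s, eps) for s in snapshots]
    for start in range(len(snapshots) - k + 1):
        window = per_time[start:start + k]
        common = [c for c in window[0] if len(c) >= m]
        for clusters in window[1:]:
            common = [c & d for c in common for d in clusters if len(c & d) >= m]
        for c in common:
            found.add(frozenset(c))
    return found

# Objects o1 and o2 travel together over all three timestamps; o3 drifts away.
snapshots = [
    {"o1": (0, 0), "o2": (1, 0), "o3": (2, 0)},
    {"o1": (5, 0), "o2": (6, 0), "o3": (9, 0)},
    {"o1": (10, 0), "o2": (11, 0), "o3": (20, 0)},
]
print(convoys(snapshots, eps=1.5, m=2, k=3))   # {frozenset({'o1', 'o2'})}
```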


Chen, S., C. S. Jensen, D. Lin, "A Benchmark for Evaluating Moving Object Indexes" in Proceedings of the VLDB Endowment, Auckland, New Zealand, Vol. 1, No. 2, pp. 1574-1585, 2008

Publication
Online at ACM Digital Library

Progress in science and engineering relies on the ability to measure, reliably and in detail, pertinent properties of artifacts under design. Progress in the area of database-index design thus relies on empirical studies based on prototype implementations of indexes. This paper proposes a benchmark that targets techniques for the indexing of the current and near-future positions of moving objects. This benchmark enables the comparison of existing and future indexing techniques. It covers important aspects of such indexes that have not previously been covered by any benchmark. Notable aspects covered include update efficiency, query efficiency, concurrency control, and storage requirements. Next, the paper applies the benchmark to half a dozen notable moving-object indexes, thus demonstrating the viability of the benchmark and offering new insight into the performance properties of the indexes.


Hansen, R., C. S. Jensen, B. Thomsen, R. Wind, "Seamless Indoor/Outdoor Positioning with Streamspin" in Proceedings of the Fifth Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, Dublin, Ireland, 2 pages, 2008

Publication

This paper presents the implementation of a novel seamless indoor/outdoor positioning service for mobile users. When users are not within GPS range, the service exploits the Wi-Fi access point infrastructure for positioning. A central server stores Wi-Fi radio maps and map images that are then sent to user terminals based on the MAC addresses of nearby access points. The positioning service is available in Streamspin (www.streamspin.com), which is an open and scalable platform for the creation and delivery of location-based services. With this new service, the system enables the easy creation and deployment of mobile services that rely on seamless indoor/outdoor positioning.


Jensen, C. S., D. Tiesyte, "TransDB-GPS Data Management with Applications in Collective Transport" in Proceedings of the First International Workshop on Computational Transportation Science, Dublin, Ireland, 6 pages, 2008

Publication

Recent and continuing advances in geo-positioning, mobile communications, and computing electronics combine to offer opportunities for advanced and affordable collective transport services. As the roads in many parts of the world are facing increasing congestion, it becomes increasingly important to establish collective transport solutions, such as bus services, that are competitive in comparison to the use of private cars. One important ingredient in the provisioning of such solutions is an information system that is always aware of the current location and expected future locations of each bus and that is capable of utilizing this information in real time as well as off-line, e.g., for offering the users accurate arrival information and for creating safe, realistic, and environmentally friendly bus schedules. This paper introduces an ongoing project that explores the advanced data management techniques needed to create an efficient, accurate, and yet inexpensive information system for collective transport monitoring. The focus is on bus travel time prediction and on the communication between the vehicles and the surrounding infrastructure.


Böhlen, M. H., J. Gamper, C. S. Jensen, "Towards General Temporal Aggregation" in Proceedings of the Twenty-fifth British National Conference on Databases, Lecture Notes in Computer Science 5071, Cardiff, Wales, UK, pp. 257-269, 2008

Publication
Online at Springer

Most database applications manage time-referenced, or temporal, data. Temporal data management is difficult when using conventional database technology, and many contributions have been made for how to better model, store, and query temporal data. Temporal aggregation illustrates well the problems associated with the management of temporal data. Indeed, temporal aggregation is complex and among the most difficult, and thus interesting, temporal functionality to support. This paper presents a general framework for temporal aggregation that accommodates existing kinds of aggregation, and it identifies open challenges within temporal aggregation.


Tzoumas, K., T. Sellis, C. S. Jensen, "A Reinforcement Learning Approach for Adaptive Query Processing" in DB Technical Report TR-22, Department of Computer Science, Aalborg University, June 2008, 27 pages, 2008

Publication

In adaptive query processing, query plans are improved at runtime by means of feedback. In the very flexible approach based on so-called eddies, query execution is treated as a process of routing tuples to the query operators that combine to compute a query. This makes it possible to alter query plans at the granularity of tuples. Further, the complex task of searching the query plan space for a suitable plan now resides in the routing policies used. These policies must adapt to the changing execution environment and must converge at a near-optimal plan when the environment stabilizes. This paper advances adaptive query processing in two respects. First, it proposes a general framework for the routing problem that may serve the same role for adaptive query processing as does the framework of search in query plan space for conventional query processing. It thus offers an improved foundation for research in adaptive query processing. The framework leverages reinforcement learning theory and formalizes a tuple routing policy as a mapping from a state space to an action space, capturing query semantics as well as routing constraints. In effect, the framework transforms query optimization from a search problem in query plan space to an unsupervised learning problem with quantitative rewards that is tightly coupled with the query execution. The framework covers selection queries as well as joins that use all proposed join execution mechanisms (SHJs, SteMs, STAIRs). Second, in addition to showing how existing routing policies can fit into the framework, the paper demonstrates new routing policies that build on advances in reinforcement learning. By means of empirical studies, it is shown that the proposed policies embody the desired adaptivity and convergence characteristics, and that they are capable of clearly outperforming existing policies.
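
The sketch below illustrates, under simplifying assumptions, how a tuple-routing policy can be cast as a mapping from states to actions with learned values: the state is the set of operators a tuple has already passed, the action is the next operator, and an epsilon-greedy rule balances exploration and exploitation. The state encoding, reward, and update rule are illustrative and not the paper's exact formalization.

    import random
    from collections import defaultdict

    # Illustrative sketch of an eddy-style routing policy cast as reinforcement
    # learning; all parameter names and the reward signal are assumptions.

    class EddyRoutingPolicy:
        def __init__(self, operators, epsilon=0.1, alpha=0.5):
            self.operators = operators          # candidate operators (the actions)
            self.epsilon = epsilon              # exploration rate
            self.alpha = alpha                  # learning rate
            self.q = defaultdict(float)         # value per (state, operator) pair

        def choose(self, done):
            """done: frozenset of operators already applied to the tuple (the state)."""
            candidates = [op for op in self.operators if op not in done]
            if random.random() < self.epsilon:
                return random.choice(candidates)                      # explore
            return max(candidates, key=lambda op: self.q[(done, op)]) # exploit

        def update(self, done, op, reward):
            """Reward could be, e.g., negative cost or selectivity-based feedback."""
            key = (done, op)
            self.q[key] += self.alpha * (reward - self.q[key])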


Speicys, L., C. S. Jensen, "Enabling Location-Based Services - Multi-Graph Representation of Transportation Networks" in GeoInformatica, Vol. 12, No. 2, pp. 219-253, 2008

Publication
Online at Springer

Advances in wireless communications, positioning technologies, and consumer electronics combine to enable a range of applications that use a mobile user's geo-spatial location to deliver on-line, location-enhanced services, often referred to as location-based services. This paper assumes that the service users are constrained to a transportation network, and it delves into the modeling of such networks, points of interest, and the service users with the objective of supporting location-based services. In particular, the paper presents a framework that encompasses two interrelated models: a two-dimensional, spatial representation and a multi-graph representation. The former, high-fidelity model may be used for the positioning of content and users in the infrastructure (e.g., using map matching). The latter type of model is recognized as an ideal basis for a variety of query processing tasks, e.g., route and distance computations. Together, the two models capture central aspects of the problem domain needed in order to support the different types of queries that underlie location-based services. Notably, the framework is capable of capturing roads with lanes, lane shift and u-turn regulations, and turn restrictions. As part of the framework, the paper constructively demonstrates how it is possible to map instances of the semantically rich two-dimensional model to instances of the graph model that preserve the topology of the two-dimensional model instances. In doing so, the paper demonstrates how a wealth of previously proposed query processing techniques based on graphs are applicable even in the context of complex transportation networks. The paper also presents means of compacting graphs while preserving aspects of the graphs that are important for the intended applications.


Ruxanda, M. M., A. Nanopoulos, C. S. Jensen, Y. Manolopoulos, "Ranking Music Data by Relevance and Importance" in Proceedings of the IEEE International Conference on Multimedia and Expo, Hannover, Germany, 4 pages, 2008

Publication
Online at IEEE

Due to the rapidly increasing availability of audio files on the Web, it is relevant to augment search engines with advanced audio search functionality. In this context, the ranking of the retrieved music is an important issue. This paper proposes a music ranking method that flexibly fuses the relevance and the importance of music. The fusion is controlled by a single parameter, which can be intuitively tuned by the user. The notion of authoritative music among relevant music is introduced, and social media mined from the Web is used in an innovative manner to determine both the relevance and importance of music. The proposed method may support users with diverse needs when searching for music.
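
A minimal sketch of the fusion idea, assuming a simple linear combination controlled by a single parameter alpha; the actual method may combine the two scores differently.

    # Hedged sketch: fuse relevance and importance with one user-tunable parameter.

    def rank_music(items, alpha=0.5):
        """items: list of (track_id, relevance, importance), scores in [0, 1].
        alpha = 1.0 ranks purely by relevance, alpha = 0.0 purely by importance."""
        scored = [(alpha * rel + (1 - alpha) * imp, track) for track, rel, imp in items]
        return [track for _, track in sorted(scored, reverse=True)]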


Demri, S., C. S. Jensen, editors, Proceedings of the Fifteenth International Symposium on Temporal Representation and Reasoning, Montreal, Canada, 174+x pages, 2008

Online at IEEE



Lu, H., C. S. Jensen, M. L. Yiu, "PAD: Privacy-Area Aware, Dummy-Based Location Privacy in Mobile Services" in Proceedings of the Seventh ACM International Workshop on Data Engineering for Wireless and Mobile Access, Vancouver, Canada, pp. 16-23, 2008

Publication
ACM Author-Izer

Location privacy in mobile services has the potential to become a serious concern for service providers and users. Existing privacy protection techniques that use k-anonymity convert an original query into an anonymous query that contains the locations of multiple users. Such techniques, however, generally fail in offering guaranteed large privacy regions at reasonable query processing costs. In this paper, we propose the PAD approach that is capable of offering privacy-region guarantees. To achieve this, PAD uses so-called dummy locations that are deliberately generated according to either a virtual grid or circle. These cover a user's actual location, and their spatial extents are controlled by the generation algorithms. The PAD approach only requires a lightweight server-side front-end in order for it to be integrated into an existing client/server mobile service system. In addition, query results are organized according to a compact format on the server, which not only reduces communication cost, but also facilitates the result refinement on the client side. An empirical study shows that our proposal is effective in terms of offering location privacy, and efficient in terms of computation and communication costs.
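
The sketch below illustrates grid-based dummy generation in the spirit of PAD: a virtual grid that covers the user's actual location is returned, so the privacy region is determined by the grid's extent. The parameter names and the placement rule are assumptions, not the paper's generation algorithms.

    import random

    # Hedged sketch of grid-based dummy generation; the true location occupies a
    # randomly chosen cell so it cannot be singled out among the dummies.

    def grid_dummies(x, y, cell, n=3):
        """Return n*n candidate locations on a virtual grid of spacing `cell`
        that covers the user's actual location (x, y)."""
        ox, oy = random.randrange(n), random.randrange(n)    # cell of the true position
        base_x, base_y = x - ox * cell, y - oy * cell        # grid origin
        return [(base_x + i * cell, base_y + j * cell)
                for i in range(n) for j in range(n)]

For example, grid_dummies(100.0, 200.0, cell=500.0) yields nine locations spanning roughly a 1 km by 1 km privacy region that is guaranteed to contain the true position.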


Jensen, C. S., R. T. Snodgrass, editors, "Temporal Database Entries for the Springer Encyclopedia of Database Systems" in TimeCenter Technical Report TR-90, 337+v pages, 2008

TimeCenter
Publication



Tiesyte, D., C. S. Jensen, ,"Efficient Cost-Based Tracking of Scheduled Vehicle Journeys" in in Proceedings of the Ninth International Conference on Mobile Data Management, Beijing, China, pp. 9-16 ,, 2008

Publication
Online at IEEE

Applications in areas such as logistics, cargo delivery, and collective transport involve the management of fleets of vehicles that are expected to travel along known routes according to schedules. There is a fundamental need by the infrastructure surrounding the vehicles to know the actual status of the vehicles. Since the vehicles deviate from their schedules due to road construction, accidents, and other unexpected conditions, it is necessary for the vehicles to communicate with the infrastructure. Frequent updates introduce high communication costs, and server-side updates easily become a bottleneck. This paper presents techniques that enable the tracking of vehicle positions and arrival times at scheduled stops with little communication, while still offering the infrastructure the desired accuracy regarding the status of the vehicles. Experimental results with real GPS data from buses show that the proposed techniques are capable of reducing the number of updates significantly compared to a state-of-the-art approach where vehicles issue updates at pre-defined positions along their routes.


Yiu, M. L., C. S. Jensen, X. Huang, H. Lu, "SpaceTwist: Managing the Trade-Offs Among Location Privacy, Query Performance, and Query Accuracy in Mobile Services" in Proceedings of the Twentyfourth IEEE International Conference on Data Engineering, Cancun, Mexico, pp. 366-375, 2008

Publication
Online at IEEE

In a mobile service scenario, users query a server for nearby points of interest, but they may not want to disclose their locations to the service. Intuitively, location privacy may be obtained at the cost of query performance and query accuracy. The challenge addressed is how to obtain the best possible performance, subject to given requirements for location privacy and query accuracy. Existing privacy solutions that use spatial cloaking employ complex server query processing techniques and entail the transmission of large quantities of intermediate results. Solutions that use transformation-based matching generally fall short in offering practical query accuracy guarantees. Our proposed framework, called SpaceTwist, rectifies these shortcomings for k nearest neighbor (kNN) queries. Starting with a location different from the user's actual location, nearest neighbors are retrieved incrementally until the query is answered correctly by the mobile terminal. This approach is flexible, needs no trusted middleware, and requires only well-known incremental NN query processing on the server. The framework also includes a server-side granular search technique that exploits relaxed query accuracy guarantees for obtaining better performance. The paper reports on empirical studies that elicit key properties of SpaceTwist and suggest that the framework offers very good performance and high privacy, at low communication cost.
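
For the case k = 1, the following sketch conveys the incremental retrieval and termination idea: points of interest are fetched in increasing distance from an anchor location, and retrieval stops once the triangle inequality guarantees that no unseen point can be closer to the user than the current best. The function names are illustrative, and the paper's granular search and general-k handling are omitted.

    import math

    # Hedged sketch of SpaceTwist-style retrieval for k = 1; names are illustrative.

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def space_twist_nn(user_loc, anchor, incremental_nn):
        """incremental_nn(anchor) yields points ordered by distance to the anchor."""
        best, best_d = None, float("inf")
        for p in incremental_nn(anchor):
            d_user = dist(user_loc, p)
            if d_user < best_d:
                best, best_d = p, d_user
            # Any unseen point lies at least dist(anchor, p) from the anchor and
            # hence at least dist(anchor, p) - dist(user_loc, anchor) from the user.
            if dist(anchor, p) - dist(user_loc, anchor) >= best_d:
                return best
        return best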


Skyt, J., C. S. Jensen, T. B. Pedersen, "Specification-Based Data Reduction in Dimensional Data Warehouses" in Information Systems, Vol. 33, No. 1, pp. 36-63, 2008

Publication
Online by Elsevier at ScienceDirect

Many data warehouses contain massive amounts of data, accumulated over long periods of time. In some cases, it is necessary or desirable to either delete "old" data or to maintain the data at an aggregate level. This may be due to privacy concerns, in which case the data are aggregated to levels that ensure anonymity. Another reason is the desire to maintain a balance between the uses of data that change as the data age and the size of the data, thus avoiding overly large data warehouses. This paper presents effective techniques for data reduction that enable the gradual aggregation of detailed data as the data ages. With these techniques, data may be aggregated to higher levels as they age, enabling the maintenance of more compact, consolidated data and the compliance with privacy requirements. Special care is taken to avoid semantic problems in the aggregation process. The paper also describes the querying of the resulting data warehouses and an implementation strategy based on current database technology.


Jensen, C. S., D. Lin, B. C. Ooi, ,"Indexing of Moving Objects, B+-Tree" in in Encyclopedia of GIS, edited by S. Shekhar and H. Xiong, Springer Verlag, pp. 512-518,, 2008




Jensen, C. S., D. Lin, B. C. Ooi, ,"Maximum Update Interval in Moving Objects Databases" in in Encyclopedia of GIS, edited by S. Shekhar and H. Xiong, Springer Verlag, p. 651,, 2008




Jensen, C. S., L. Speicys, "Road Network Data Models" in Encyclopedia of GIS, edited by S. Shekhar and H. Xiong, Springer Verlag, pp. 972-976, 2008




Tryfona, N., C. S. Jensen, "Spatio-temporal Database Modeling with an Extended Entity-Relationship Model" in Encyclopedia of GIS, edited by S. Shekhar and H. Xiong, Springer Verlag, pp. 1115-1121, 2008




2007 Top

Jensen, C. S., D. Lin, B. C. Ooi, ,"Continuous Clustering of Moving Objects" in IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 9, pp. 1161-1174,, 2007

Publication
Online at IEEE

This paper considers the problem of efficiently maintaining a clustering of a dynamic set of data points that move continuously in two-dimensional Euclidean space. This problem has received little attention and introduces new challenges to clustering. The paper proposes a new scheme that is capable of incrementally clustering moving objects. This proposal employs a notion of object dissimilarity that considers object movement across a period of time, and it employs clustering features that can be maintained efficiently in incremental fashion. In the proposed scheme, a quality measure for incremental clusters is used for identifying clusters that are not compact enough after certain insertions and deletions. An extensive experimental study shows that the new scheme performs significantly faster than traditional ones that frequently rebuild clusters. The study also shows that the new scheme is effective in preserving the quality of moving-object clusters.
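
The sketch below illustrates one way to define a movement-aware dissimilarity between two objects whose positions are linear functions of time, here approximated by averaging the distance over sampled time points; the paper's measure is a closed-form counterpart of this idea, and the representation of objects is an assumption.

    # Hedged sketch of a movement-aware dissimilarity over a time window.

    def position(obj, t):
        """obj = (x0, y0, vx, vy, t0): reference position and velocity."""
        x0, y0, vx, vy, t0 = obj
        return x0 + vx * (t - t0), y0 + vy * (t - t0)

    def dissimilarity(a, b, t_start, t_end, samples=10):
        total = 0.0
        for i in range(samples + 1):
            t = t_start + (t_end - t_start) * i / samples
            (ax, ay), (bx, by) = position(a, t), position(b, t)
            total += ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
        return total / (samples + 1)   # average distance over the time window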


Urgun, B., C. E. Dyreson, N. Kline, J. K. Miller, R. T. Snodgrass, M. D. Soo, C. S. Jensen, "Integrating Multiple Calendars using tZaman" in Software: Practice and Experience, Vol. 37, No. 3, pp. 267-308, 2007

Publication
Online at Wiley InterScience

Programmers world-wide are interested in developing applications that can be used internationally. Part of the internationalization effort is the ability to engineer applications to use dates and times that conform to local calendars yet can inter-operate with dates and times in other calendars, for instance between the Gregorian and Islamic calendars. tZAMAN is a system that provides a natural-language and calendar-independent framework for integrating multiple calendars. tZAMAN performs "runtime-binding" of calendars and language support. A running tZAMAN system dynamically loads calendars and language support tables from XML-formatted files. Loading a calendar integrates it with other, already loaded calendars, enabling users of tZAMAN to add, compare, and convert times between multiple calendars. tZAMAN also provides a flexible, calendar-independent framework for parsing temporal literals. Literals can be input and output in XML or plain text, using user-defined formats, and in different languages and character sets. Finally, tZAMAN is a client/server system, enabling shared access to calendar servers spread throughout the Web. This paper describes the architecture of tZAMAN and experimentally quantifies the cost of using a calendar server to translate and manipulate dates.


Brilingaite, A., C. S. Jensen, ,"Enabling Routes of Road Network Constrained Movements as Mobile Service Context" in GeoInformatica, Vol. 11, No. 1, pp. 55-102,, 2007

Publication
Online at Springer

With the continuing advances in wireless communications, geo-positioning, and portable electronics, an infrastructure is emerging that enables the delivery of on-line, location-enabled services to very large numbers of mobile users. A typical usage situation for mobile services is one characterized by a small screen and no keyboard, and by the service being only a secondary focus of the user. Under such circumstances, it is particularly important to deliver the "right" information and service at the right time, with as little user interaction as possible. This may be achieved by making services context aware. Mobile users frequently follow the same route to a destination as they did during previous trips to the destination, and the route and destination constitute important aspects of the context for a range of services. This paper presents key concepts underlying a software component that identifies and accumulates the routes of a user along with their usage patterns and that makes the routes available to services. The problems associated with route recording are analyzed, and algorithms that solve the problems are presented. Experiences from using the component on logs of GPS positions acquired from vehicles traveling within a real road network are reported.


Biveinis, L., S. Šaltenis, C. S. Jensen, "Main-Memory Operation Buffering for Efficient R-Tree Update" in Proceedings of the Thirtythird International Conference on Very Large Data Bases, Vienna, Austria, pp. 591-602, 2007

Publication
Online at VLDB

Emerging communication and sensor technologies enable new applications of database technology that require database systems to efficiently support very high rates of spatial-index updates. Previous works in this area require the availability of large amounts of main memory, do not exploit all the main memory that is indeed available, or do not support some of the standard index operations. Assuming a setting where the index updates need not be written to disk immediately, we propose an R-tree-based indexing technique that does not exhibit any of these drawbacks. This technique exploits the buffering of update operations in main memory as well as the grouping of operations to reduce disk I/O. In particular, operations are performed in bulk so that multiple operations are able to share I/O. The paper presents an analytical cost model that is shown to be accurate by empirical studies. The studies also show that, in terms of update I/O performance, the proposed technique improves on the state of the art in settings with frequent updates.
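
A minimal sketch of the buffering idea, assuming a disk-based R-tree that exposes hypothetical choose_leaf and apply_bulk helpers: operations accumulate in main memory and are grouped before being flushed so that operations destined for the same node share I/O. The flush policy and grouping key are assumptions, not the paper's algorithm.

    from collections import defaultdict

    # Hedged sketch of main-memory operation buffering for an R-tree.

    class BufferedRTree:
        def __init__(self, rtree, capacity=10000):
            self.rtree = rtree                  # underlying disk-based R-tree (assumed API)
            self.capacity = capacity            # maximum number of buffered operations
            self.buffer = []                    # list of ('insert'|'delete', oid, rect)

        def insert(self, oid, rect):
            self._add(('insert', oid, rect))

        def delete(self, oid, rect):
            self._add(('delete', oid, rect))

        def _add(self, op):
            self.buffer.append(op)
            if len(self.buffer) >= self.capacity:
                self.flush()

        def flush(self):
            groups = defaultdict(list)
            for kind, oid, rect in self.buffer:
                leaf = self.rtree.choose_leaf(rect)   # assumed helper on the index
                groups[leaf].append((kind, oid, rect))
            for leaf, ops in groups.items():
                self.rtree.apply_bulk(leaf, ops)      # one I/O-sharing bulk application (assumed)
            self.buffer.clear()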


Jensen, C. S., S. Pakalnis, "TRAX - Real-World Tracking of Moving Objects" in Proceedings of the Thirtythird International Conference on Very Large Data Bases, Vienna, Austria, pp. 1362-1365, 2007

Publication
Online at VLDB

A range of mobile services rely on knowing the current positions of populations of so-called moving objects. In the ideal setting, the positions of all objects are known always and exactly. While this is not possible in practice, it is possible to know each object's position with a certain guaranteed accuracy. This paper presents the TRAX tracking system that supports several techniques capable of tracking the current positions of moving objects with guaranteed accuracies at low update and communication costs in real-world settings. The techniques are readily relevant for practical applications, but they also have implications for continued research. The tracking techniques offer a realistic setting for existing query processing techniques that assume that it is possible to always know the exact positions of moving objects. The techniques enable studies of trade-offs between querying and update, and the accuracy guarantees they offer may be exploited by query processing techniques to offer perfect recall.


Tradišauskas, N., J. Juhl, H. Lahrmann, C. S. Jensen, "Map Matching Algorithm for the Spar På Farten Intelligent Speed Adaptation Project" in 2007 Annual Transport Conference at Aalborg University, Aalborg, Denmark, 10 pages, 2007

Publication

The availability of Global Navigation Satellite Systems (GNSS) enables sophisticated vehicle guidance and advisory systems such as Intelligent Speed Adaptation (ISA) systems. In ISA systems, it is essential to be able to position vehicles within a road network. Because digital road networks as well as GNSS positioning are often inaccurate, a technique known as map matching is needed that aims to use this inaccurate data for determining a vehicle's real road-network position. Then, knowing this position, an ISA system can compare the vehicle's speed with the speed limit in effect and take measures against speeding. This paper presents an on-line map matching algorithm with an extensive number of weighting parameters that allow better determination of a vehicle's road network position. The algorithm uses a certainty value to express its belief in the correctness of its results. The algorithm was designed and implemented to be used in the large-scale ISA project "Spar på farten". Using test data and data collected from project participants, the algorithm's performance is evaluated. It is shown that the algorithm performs correctly 95% of the time and is capable of handling GNSS positioning errors in a conservative manner.
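
The sketch below conveys the weighting idea with just two illustrative weights (distance to a candidate edge and heading agreement) and a simple certainty value derived from the score margin; the actual algorithm uses considerably more parameters, and all constants here are assumptions.

    import math

    # Hedged sketch of weight-based on-line map matching.

    def match(position, heading, candidate_edges, w_dist=0.7, w_head=0.3):
        """candidate_edges: list of (edge_id, distance_to_edge_m, edge_heading_deg)."""
        scores = []
        for edge_id, d, edge_heading in candidate_edges:
            head_diff = abs((heading - edge_heading + 180) % 360 - 180)   # 0..180 degrees
            # 25.0 m is an illustrative distance scale, not a calibrated parameter.
            score = w_dist * math.exp(-d / 25.0) + w_head * (1 - head_diff / 180.0)
            scores.append((score, edge_id))
        scores.sort(reverse=True)
        best_score, best_edge = scores[0]
        runner_up = scores[1][0] if len(scores) > 1 else 0.0
        certainty = best_score - runner_up        # belief that the chosen edge is correct
        return best_edge, certainty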


Huang, X., C. S. Jensen, H. Lu, S. Šaltenis, "S-GRID: A Versatile Approach to Efficient Query Processing in Spatial Networks" in Proceedings of the Tenth International Symposium on Spatial and Temporal Databases, Boston, MA, USA, pp. 93-111, 2007

Publication
Online at Springer

Mobile services are emerging as an important application area for spatio-temporal database management technologies. Service users are often constrained to a spatial network, e.g., a road network, through which points of interest, termed data points, are accessible. Queries that implement services will often concern data points of some specific type, e.g., Thai restaurants or art museums. As a result, relatively few data points are relevant to a query in comparison to the number of network edges, meaning that queries, e.g., k nearest-neighbor queries, must access large portions of the network. Existing query processing techniques pre-compute distances between data points and network vertices to improve performance. However, pre-computation becomes problematic when the network or data points must be updated, possibly concurrently with the querying; and if the data points are moving, the existing techniques are inapplicable. In addition, multiple pre-computed structures must be maintained - one for each type of data point. We propose a versatile pre-computation approach for spatial network data. This approach uses a grid for pre-computing a simplified network. The above-mentioned shortcomings are avoided by making the pre-computed data independent of the data points. Empirical performance studies show that the structure is competitive with respect to the existing, more specialized techniques.


Lu, H., Z. Huang, C. S. Jensen, L. Xu, "Distributed, Concurrent Range Monitoring of Spatial-Network Constrained Mobile Objects" in Proceedings of the Tenth International Symposium on Spatial and Temporal Databases, Boston, MA, USA, pp. 366-384, 2007

Publication
Online at Springer

The ability to continuously monitor the positions of mobile objects is important in many applications. While most past work has been set in Euclidean spaces, the mobile objects relevant in many applications are constrained to spatial networks. This paper addresses the problem of range monitoring of mobile objects in this setting, where network distance is of concern. An architecture is proposed where the mobile clients and a central server share computation, the objective being to obtain scalability by utilizing the capabilities of the clients. The clients issue location reports to the server, which is in charge of data storage and query processing. The server associates each range monitoring query with the network-edge portions it covers. This enables incremental maintenance of each query, and it also enables shared maintenance of concurrent queries by identifying the overlaps among such queries. The mobile clients contribute to the query processing by encapsulating their host edge portion identifiers in their reports to the server. Extensive empirical studies indicate that the paper's proposal is efficient and scalable, in terms of both query load and moving-object load.


Huang, Z., C. S. Jensen, H. Lu, B. C. Ooi, ,"Collaborative Spatial Data Sharing Among Mobile Lightweight Devices" in in Proceedings of the Tenth International Symposium on Spatial and Temporal Databases, Boston, MA, USA, pp. 403-422,, 2007

Publication
Online at Springer

Mobile devices are increasingly being equipped with wireless peer-to-peer (P2P) networking interfaces, rendering the sharing of data among mobile devices feasible and beneficial. In comparison to the traditional client/server wireless channel, the P2P channels have considerably higher bandwidth. Motivated by these observations, we propose a collaborative spatial data sharing scheme that exploits the P2P capabilities of mobile devices. Using carefully maintained routing tables, this scheme enables mobile devices not only to use their local storage for query processing, but also to collaborate with nearby mobile peers to exploit their data. This scheme is capable of reducing the cost of the communication between mobile clients and the server as well as the query response time. The paper details the design of the data sharing scheme, including its routing table maintenance, query processing, and update handling. An analytical cost model sensitive to user mobility is proposed to guide the storage content replacement and routing table maintenance. The results of extensive simulation studies based on an implementation of the scheme demonstrate that the scheme is efficient in processing location-dependent queries and is robust to data updates.


Tradišauskas, N., J. Juhl, H. Lahrmann, C. S. Jensen, "Map Matching for Intelligent Speed Adaptation" in Proceedings of the Sixth European Congress and Exhibition on Intelligent Transport Systems and Services, Aalborg, Denmark, 12 pages, 2007

Publication

The availability of Global Navigation Satellite Systems enables sophisticated vehicle guidance and advisory systems such as Intelligent Speed Adaptation (ISA) systems. In ISA systems, it is essential to be able to position vehicles within a road network. Because digital road networks as well as GPS positioning are often inaccurate, a technique known as map matching is needed that aims to use this inaccurate data for determining a vehicle's real road-network position. Then, knowing this position, an ISA system can compare the vehicle's speed with the speed limit in effect and react appropriately. This paper presents an on-line map matching algorithm with an extensive number of weighting parameters that allow better determination of a vehicle's road network position. The algorithm uses a certainty value to express its belief in the correctness of its results. The algorithm was designed and implemented for use in the large-scale ISA project "Spar på farten." Using test data and data collected from project participants, the algorithm's performance is evaluated. It is shown that the algorithm performs correctly 95% of the time and is capable of handling GPS/DR errors in a conservative manner.


Tiesyte, D., C. S. Jensen, ,"Recovery of Vehicle Trajectories from Tracking Data for Analysis Purposes" in in Proceedings of the Sixth European Congress and Exhibition on Intelligent Transport Systems and Services, Aalborg, Denmark, 12 pages ,, 2007

Publication

A number of transportation-related applications involve the accumulation of position data from vehicles. Examples include real-time tracking applications that relate to taxis, police and emergency vehicles, and collective transportation. Further, position data from historical vehicle trajectories may be utilized for improving many of these services. For example, better travel time prediction and scheduling algorithms can be developed. However, the position data being obtained from vehicles are often relatively infrequent and do not capture the vehicle trajectories accurately and systematically. This paper proposes techniques that systematically recover close-to-actual vehicle trajectories from the positions obtained during the tracking of the vehicles. The focus is on scenarios where the vehicles traverse known routes.


Wind, R., C. S. Jensen, K. Torp, ," An Open Platform for the Creation and Deployment of Transport-Related Mobile Data Services " in in Proceedings of the Sixth European Congress and Exhibition on Intelligent Transport Systems and Services, Aalborg, Denmark, 8 pages ,, 2007

Publication
Online at ITS Sweden

Advanced mobile computing devices with wireless communication and geo-positioning capabilities are finding increasingly widespread use in Europe and beyond. Example devices include smart phones, PDA phones, and navigation systems. It is thus becoming increasingly relevant and attractive to utilize these devices and the related communication infrastructure for the deployment of transport-related mobile services. This paper describes the architecture of a service platform that enables users to create their own mobile services. The platform is based on standard hardware and software technologies and offers integration of transportation-related services with other services.


Wind, R., C. S. Jensen, K. H. Pedersen, K. Torp, "A Testbed for the Exploration of Novel Concepts in Mobile Service Delivery" in Proceedings of the Eighth International Conference on Mobile Data Management, Mannheim, Germany, pp. 218-220, 2007

Publication
Online at IEEE

This paper describes an open, extendable, and scalable system that supports the delivery of context-dependent content to mobile users. The system enables users to receive content from multiple content providers that matches their demographic data, active profiles, and context such as location and time. The system also allows users to subscribe to specific services. In addition, it allows users to provide their own content and services, by either using the system's publicly available interface or by filling out one of the service-configuration templates.


Tiesyte, D., C. S. Jensen, ,"Challenges in the Tracking and Prediction of Scheduled-Vehicle Journeys" in in Proceedings of the First International Workshop on Pervasive Transportation Systems, White Plains, NY, USA, 6 pages ,, 2007

Publication
Online at IEEE

A number of applications in areas such as logistics, cargo delivery, and collective transport involve the management of fleets of vehicles that are expected to travel along known routes according to fixed schedules. Due to road construction, accidents, and other unanticipated conditions, the vehicles deviate from their schedules. At the same time, there is a need for the infrastructure surrounding the vehicles to continually know the actual status of the vehicles. For example, anticipated arrival times of buses may have to be displayed at bus stops. It is a fundamental challenge to maintain this type of knowledge with minimal cost. This paper characterizes the problem of real-time vehicle tracking using wireless communication, and of predicting the future status of the vehicles when their movements are restricted to given routes and when they follow schedules with a best effort. The paper discusses challenges related to tracking, to the prediction of future travel times, and to historical data analysis. It also suggests approaches to addressing the challenges.


Jensen, C. S., ,"When the Internet Hits the Road" in in Proceedings of the Twelfth GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web, Aachen, Germany, pp. 2-16 ,, 2007

Publication
Online at BTW

The Internet has recovered from the dot-com crash of the early 2000's and now features an abundance of new, innovative technologies and services. We are also witnessing the emergence of a communication and computing infrastructure that encompasses millions of people with mobile devices, such as mobile phones, with Internet connectivity. This infrastructure will soon enable the Internet to go mobile. This paper describes the background and aspirations of a new research project that is concerned with data management aspects of innovative mobile Internet services. It is argued that mobile services will be context aware, and the project devotes particular attention to geographical context awareness. The project will adopt a prototyping approach where services are built and exposed to users, and where data management challenges are identified and addressed. The paper describes the evolving service platform that supports the approach chosen, it describes some of the data management techniques being integrated into the service platform, and it describes research guidelines that the project aims to follow.


Jensen, C. S., ,"Sensor Networks - the Case of Intelligent Transport Systems" in in Proceedings of the NSF Workshop on Data Management for Mobile Sensor Networks, Pittsburgh, PA, USA, 2 pages,, 2007

Publication
Online at the workshop site



Civilis, A., C. S. Jensen, S. Pakalnis, "Tracking of Moving Objects With Accuracy Guarantees" in Chapter 13, pp. 285-309 in Spatial Data on the Web - Modeling and Management, edited by A. Belussi, B. Catania, E. Clementini, and E. Ferrari, Springer Verlag, 2007

Publication
Online at Springer



Wind, R., C. S. Jensen, K. Torp, ,"Windows Mobile Programming" in Chapter 8, pp. 207-235 in Mobile Phone Programming and its Application to Wireless Networks, edited by F. H. P. Fitzek and F. Reichert, Springer Verlag,, 2007

Publication
Online at Springer



Becker, C., C. S. Jensen, D. Nicklas, J. Su, editors, Proceedings of the Eighth International Conference on Mobile Data Management, Mannheim, Germany, 232+viii pages, 2007

Online at IEEE



Haas, L. M., C. S. Jensen, M. L. Kersten, editors, "Special issue: best papers of VLDB 2005" in The VLDB Journal, Vol. 16, No. 1, 164 pages, 2007

Online at Springer



Haas, L. M., C. S. Jensen, M. L. Kersten, ,"Special issue: best papers of VLDB 2005" in The VLDB Journal, Vol. 16, No. 1, pp. 1-3,, 2007

Publication
Online at Springer



Huang, X., C. S. Jensen, ,"A Streams-Based Framework for Defining Location-Based Queries" in DB Technical Report, TR-19, 19 pages,, 2007

DB Technical Report
Publication

An infrastructure is emerging that supports the delivery of on-line, location-enabled services to mobile users. Such services involve novel database queries, and the database research community is quite active in proposing techniques for the efficient processing of such queries. In parallel to this, the management of data streams has become an active area of research. While most research in mobile services concerns performance issues, this paper aims to establish a formal framework for defining the semantics of queries encountered in mobile services, most notably the so-called continuous queries that are particularly relevant in this context. Rather than inventing an entirely new framework, the paper proposes a framework that builds on concepts from data streams and temporal databases. Definitions of example queries demonstrate how the framework enables clear formulation of query semantics and the comparison of queries. The paper also proposes a categorization of location-based queries.


2006 Top

Pelanis, M., S. Šaltenis, C. S. Jensen, "Indexing the Past, Present and Anticipated Future Positions of Moving Objects" in ACM Transactions on Database Systems, Vol. 31, No. 1, pp. 255-298, 2006

Publication
ACM Author-Izer

With the proliferation of wireless communications and geo-positioning, e-services are envisioned that exploit the positions of a set of continuously moving users to provide context-aware functionality to each individual user. Because advances in disk capacities continue to outperform Moore's Law, it becomes increasingly feasible to store on-line all the position information obtained from the moving e-service users. With the much slower advances in I/O speeds and many concurrent users, indexing techniques are of essence in this scenario. Existing indexing techniques come in two forms. Some techniques capture the position of an object up until the time of the most recent position sample, while other techniques represent an object's position as a constant or linear function of time and capture the position from the current time and into the (near) future. This paper offers an indexing technique capable of capturing the positions of moving objects at all points in time. The index substantially extends partial persistence techniques, which support transaction time, to support valid time for monitoring applications. The performance of a timeslice query is independent of the number of past position samples stored for an object. No existing index offers these characteristics.


Benetis, R., C. S. Jensen, G. Karčiauskas, S. Šaltenis, "Nearest and Reverse Nearest Neighbor Queries for Moving Objects" in The VLDB Journal, Vol. 15, No. 3, pp. 229-249, 2006

Publication
Online at Springer

With the continued proliferation of wireless communications and advances in positioning technologies, algorithms for efficiently answering queries about large populations of moving objects are gaining interest. This paper proposes algorithms for k nearest and reverse k nearest neighbor queries on the current and anticipated future positions of points moving continuously in the plane. The former type of query returns k objects nearest to a query object for each time point during a time interval, while the latter returns the objects that have a specified query object as one of their k closest neighbors, again for each time point during a time interval. In addition, algorithms for so-called persistent and continuous variants of these queries are provided. The algorithms are based on the indexing of object positions represented as linear functions of time. The results of empirical performance experiments are reported.
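
The core geometric primitive can be sketched as follows: for two points moving linearly, the squared distance is a quadratic function of time, so its minimum over a query interval lies at an interval endpoint or at the parabola's vertex. Ranking objects by this minimum is a simplification of the paper's time-parameterized results and ignores its index-based pruning; the object representation is an assumption.

    # Hedged sketch of the moving-point distance primitive and an interval-based kNN.

    def min_dist_sq(p, q, t1, t2):
        """p, q = (x0, y0, vx, vy) at reference time 0; query interval [t1, t2]."""
        dx, dy = p[0] - q[0], p[1] - q[1]
        dvx, dvy = p[2] - q[2], p[3] - q[3]
        a = dvx * dvx + dvy * dvy                       # quadratic coefficient
        b = 2 * (dx * dvx + dy * dvy)
        c = dx * dx + dy * dy
        candidates = [t1, t2]
        if a > 0:
            t_v = -b / (2 * a)                          # vertex of the parabola
            if t1 <= t_v <= t2:
                candidates.append(t_v)
        return min(a * t * t + b * t + c for t in candidates)

    def knn_over_interval(query, objects, k, t1, t2):
        return sorted(objects, key=lambda o: min_dist_sq(query, o, t1, t2))[:k]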


Böhlen, M. H., J. Gamper, C. S. Jensen, ,"An Algebraic Framework for Temporal Attribute Characteristics" in Annals of Mathematics and Artificial Intelligence, Vol. 46, No. 3, pp. 349-374,, 2006

Publication
Online at Springer

Most real-world database applications manage temporal data, i.e., data with associated time references that capture a temporal aspect of the data, typically either when the data is valid or when the data is known. Such applications abound in, e.g., the financial, medical, and scientific domains. In contrast to this, current database management systems offer precious little built-in query language support for temporal data management. This situation persists although an active temporal database research community has demonstrated that application development can be simplified substantially by built-in temporal support. This paper's contribution is motivated by the observation that existing temporal data models and query languages generally make the same rigid assumption about the semantics of the association of data and time, namely that if a subset of the time domain is associated with some data then this implies the association of any further subset with the data. This paper offers a comprehensive, general framework where alternative semantics may co-exist and that supports so-called malleable and atomic temporal associations, in addition to the conventional ones mentioned above, which are termed constant. To demonstrate the utility of the framework, the paper defines a characteristics-enabled temporal algebra, termed CETA, which defines the traditional relational operators in the new framework. This contribution demonstrates that it is possible to provide built-in temporal support while making less rigid assumptions about the data, without jeopardizing the degree of the support. This may move temporal support closer to practical applications.


Jensen, C. S., K. Torp, "GPS baseret tracking af mobile objekter" in Geoforum Perspektiv, Vol. 9, pp. 21-26, 2006

Publication
Online at Geoforum Danmark

This article describes how existing technology, including the Global Positioning System and General Packet Radio Service, can be used to efficiently track mobile objects, e.g., vehicles, with a guaranteed accuracy. First, the technological platform is described. Then three different techniques for tracking mobile objects are presented; the techniques become progressively more advanced. The three techniques are evaluated, and the cost of tracking a mobile object with an accuracy of approximately 150 meters is estimated at less than 1 DKK per day, based on prices from a trial conducted in 2004.


Ruxanda, M. M., C. S. Jensen, "Efficient Similarity Retrieval In Music Databases" in Proceedings of the Thirteenth International Conference on Management of Data, Delhi, India, pp. 56-67, 2006

Publication

Audio music is increasingly becoming available in digital form, and the digital music collections of individuals continue to grow. Addressing the need for effective means of retrieving music from such collections, this paper proposes new techniques for content-based similarity search. Each music object is modeled as a time sequence of high-dimensional feature vectors, and dynamic time warping (DTW) is used as the similarity measure. To accomplish this, the paper extends techniques for time-series-length reduction and lower bounding of DTW distance to the multi-dimensional case. Further, the Vector Approximation file is adapted to the indexing of time sequences and to use a lower bound on the DTW distance. Using these techniques, the paper exploits the lack of a ground truth for queries to efficiently compute query results that differ only slightly from results that may be more accurate but also are much more expensive to compute. In particular, the paper demonstrates that aggressive use of time-series-length reduction together with query expansion results in significant performance improvements while providing good, approximative query results.
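
The sketch below shows two of the ingredients in simplified form: piecewise averaging that reduces the length of a multi-dimensional feature sequence, and plain dynamic time warping on the reduced sequences. The lower-bounding scheme and the adapted Vector Approximation file are omitted, and parameter names are illustrative.

    # Hedged sketch: multi-dimensional length reduction plus basic DTW.

    def reduce_length(seq, target_len):
        """seq: list of feature vectors (tuples of floats); averages fixed windows."""
        n, dim = len(seq), len(seq[0])
        window = max(1, n // target_len)
        reduced = []
        for start in range(0, n, window):
            chunk = seq[start:start + window]
            reduced.append(tuple(sum(v[d] for v in chunk) / len(chunk) for d in range(dim)))
        return reduced

    def dtw(a, b):
        inf = float("inf")
        cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
        cost[0][0] = 0.0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
                cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
        return cost[len(a)][len(b)]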


Jensen, C. S., D. Lin, B. C. Ooi, R. Zhang, "Effective Density Queries of Continuously Moving Objects" in Proceedings of the Twentysecond International Conference on Data Engineering, Atlanta, GA, USA, 11 pages, 2006

Publication
Online at IEEE

This paper assumes a setting where a population of objects move continuously in the Euclidean plane. The position of each object, modeled as a linear function from time to points, is assumed known. In this setting, the paper studies the querying for dense regions. In particular, the paper defines a particular type of density query with desirable properties and then proceeds to propose an algorithm for the efficient computation of density queries. While the algorithm may exploit any existing index for the current and near-future positions of moving objects, the Bx-tree is used. The paper reports on an extensive empirical study, which elicits the performance properties of the algorithm.


Huang, Z., C. S. Jensen, H. Lu, B. C. Ooi, ,"Skyline Queries Against Mobile Lightweight Devices in MANETs" in in Proceedings of the Twentysecond International Conference on Data Engineering, Atlanta, GA, USA, 11 pages ,, 2006

Publication
Online at IEEE

Skyline queries are well suited when retrieving data according to multiple criteria. While most previous work has assumed a centralized setting, this paper considers skyline querying in a mobile and distributed setting, where each mobile device is capable of holding only a portion of the whole dataset; where devices communicate through mobile ad hoc networks; and where a query issued by a mobile user concerns only the user's local area, although a query generally involves data stored on many mobile devices due to the storage limitations. We present techniques that aim to reduce the costs of communication among mobile devices and reduce the execution time on each single mobile device. For the former, skyline query requests are forwarded among mobile devices in a deliberate way, such that the amount of data to be transferred is reduced. For the latter, specific optimization measures are proposed for resource-constrained mobile devices. We conduct extensive experiments to show that our proposal performs efficiently on real mobile devices and simulated wireless ad hoc networks.
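
For reference, the dominance-based semantics of a skyline query can be sketched as below (assuming smaller values are preferred in every dimension); the paper's contribution lies in evaluating this query across many resource-constrained devices, which the sketch does not attempt.

    # Hedged sketch of skyline semantics via pairwise dominance checks.

    def dominates(a, b):
        """a dominates b if a is no worse in every dimension and better in at least one."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def skyline(points):
        result = []
        for p in points:
            if any(dominates(q, p) for q in points if q != p):
                continue
            result.append(p)
        return result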


Schmidt, A., C. S. Jensen, S. Šaltenis, "Expiration Times for Data Management" in Proceedings of the Twentysecond International Conference on Data Engineering, Atlanta, GA, USA, 11 pages, 2006

Publication
Online at IEEE

This paper describes an approach to incorporating the notion of expiration time into data management based on the relational model. Expiration times indicate when tuples cease to be current in a database. The paper presents a formal data model and a query algebra that handle expiration times transparently and declaratively. In particular, expiration times are exposed to users only on insertion and update, and when triggers fire due to the expiration of a tuple; for queries, they are handled behind the scenes and do not concern the user. Notably, tuples are removed automatically from (materialised) query results as they expire in the (base) relations. For application developers, the benefits of using expiration times are leaner application code, lower transaction volume, smaller databases, and higher consistency for replicated data with lower overhead. Expiration times turn out to be especially useful in open architectures and loosely-coupled systems, which abound on the World Wide Web as well as in mobile networks, be it as Web Services or as ad hoc and intermittent networks of mobile devices.


Böhlen, M. H., J. Gamper, C. S. Jensen, ,"Multi-Dimensional Aggregation for Temporal Data" in in Proceedings of the Tenth Intenational Conference on Extending Database Technology, Lecture Notes on Computer Science 3896, Munich, Germany, pp. 257-275 ,, 2006

Publication
Online at Springer

Business Intelligence solutions, encompassing technologies such as multi-dimensional data modeling and aggregate query processing, are being applied increasingly to non-traditional data. This paper extends multi-dimensional aggregation to apply to data with associated interval values that capture when the data hold. In temporal databases, intervals typically capture the states of reality that the data apply to, or capture when the data are, or were, part of the current database state. This paper proposes a new aggregation operator that addresses several challenges posed by interval data. First, the intervals to be associated with the result tuples may not be known in advance, but depend on the actual data. Such unknown intervals are accommodated by allowing result groups that are specified only partially. Second, the operator contends with the case where an interval associated with data expresses that the data holds for each point in the interval, as well as the case where the data holds only for the entire interval, but must be adjusted to apply to sub-intervals. The paper reports on an implementation of the new operator and on an empirical study that indicates that the operator scales to large data sets and is competitive with respect to other temporal aggregation algorithms.


Jensen, C. S., D. Tiesyte, N. Tradišauskas, ,"The COST Benchmark - Comparison and Evaluation of Spatio-Temporal Indexes" in in Proceedings of the Eleventh International Conference on Database Systems for Advanced Applications, Lecture Notes on Computer Science 3882, Singapore, pp. 125-140 ,, 2006

Publication
Online at Springer

An infrastructure is emerging that enables the positioning of populations of on-line, mobile service users. In step with this, research in the management of moving objects has attracted substantial attention. In particular, quite a few proposals now exist for the indexing of moving objects, and more are underway. As a result, there is an increasing need for an independent benchmark for spatio-temporal indexes. This paper characterizes the spatio-temporal indexing problem and proposes a benchmark for the performance evaluation and comparison of spatio-temporal indexes. Notably, the benchmark takes into account that the available positions of the moving objects are inaccurate, an aspect largely ignored in previous indexing research. The concepts of data and query enlargement are introduced for addressing inaccuracy. As proof of concepts of the benchmark, the paper covers the application of the benchmark to three spatio-temporal indexes - the TPR-, TPR*-, and Bx-trees. Representative experimental results and consequent guidelines for the usage of these indexes are reported.


Schmidt, A., C. S. Jensen, ,"Efficient Maintenance of Ephemeral Data" in in Proceedings of the Eleventh International Conference on Database Systems for Advanced Applications, Lecture Notes on Computer Science 3882, Singapore, pp. 141-155 ,, 2006

Publication
Online at Springer

Motivated by the increasing prominence of loosely-coupled systems, such as mobile and sensor networks, the characteristics of which include intermittent connectivity and volatile data, we study the tagging of data with so-called expiration times. More specifically, when data are inserted into a database, they may be stamped with time values indicating when they expire, i.e., when they are regarded as stale or invalid and thus are no longer considered part of the database. In a number of applications, expiration times are known and can be assigned at insertion time. We present data structures and algorithms for online management of data stamped with expiration times. The algorithms are based on fully functional treaps, which are a combination of binary search trees with respect to a primary attribute and heaps with respect to a secondary attribute. The primary attribute implements primary keys, and the secondary attribute stores expiration times in a minimum heap, thus keeping a priority queue of tuples to expire. A detailed and comprehensive experimental study demonstrates the well-behavedness and scalability of the approach as well as its efficiency with respect to a number of competitors.
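
A compact sketch of the treap idea described above: nodes form a binary search tree on the primary key and a min-heap on the expiration time, so the tuples that expire next cluster at the root and can be removed by repeated root deletion. This is a simplified illustration, not the paper's fully functional (persistent) variant.

    # Hedged sketch of an expiration-time treap: BST on key, min-heap on expiration.

    class Node:
        def __init__(self, key, expires, value):
            self.key, self.expires, self.value = key, expires, value
            self.left = self.right = None

    def rotate_right(n):
        l = n.left
        n.left, l.right = l.right, n
        return l

    def rotate_left(n):
        r = n.right
        n.right, r.left = r.left, n
        return r

    def insert(root, key, expires, value):
        if root is None:
            return Node(key, expires, value)
        if key < root.key:
            root.left = insert(root.left, key, expires, value)
            if root.left.expires < root.expires:      # restore the min-heap property
                root = rotate_right(root)
        else:
            root.right = insert(root.right, key, expires, value)
            if root.right.expires < root.expires:
                root = rotate_left(root)
        return root

    def delete_root(root):
        # Rotate the child that expires first into the root's place, then recurse.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        if root.left.expires < root.right.expires:
            root = rotate_right(root)
            root.right = delete_root(root.right)
        else:
            root = rotate_left(root)
            root.left = delete_root(root.left)
        return root

    def expire(root, now):
        # The root always holds the smallest expiration time, so all expired
        # tuples can be removed by repeated root deletion.
        while root is not None and root.expires <= now:
            root = delete_root(root)
        return root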


Jensen, C. S., ,"Geo-Enabled, Mobile Services - A Tale of Routes, Detours, and Dead Ends" in in Proceedings of the Eleventh International Conference on Database Systems for Advanced Applications, Lecture Notes on Computer Science 3882, Singapore, pp. 6-19 ,, 2006

Publication
Online at Springer

We are witnessing the emergence of a global infrastructure that enables the widespread deployment of geo-enabled, mobile services in practice. At the same time, the research community has also paid increasing attention to data management aspects of mobile services. This paper offers me an opportunity to characterize this research area and to describe some of its challenges and pitfalls, and it affords me an opportunity to look back and reflect upon some of the general challenges we face as researchers. I hope that my views and experiences as expressed in this paper may enable others to challenge their own views about this exciting research area and about how to best carry out their research in their own unique contexts.


Brilingaite, A., C. S. Jensen, ,"Online Route Prediction for Automotive Applications" in in Proceedings of the Thirteenth World Congress and Exhibition on Intelligent Transport Systems and Services, London, UK, 8 pages ,, 2006

Publication

An information and communication technology infrastructure is rapidly emerging that enables the delivery of location-based services to vast numbers of mobile users. Services will benefit from being aware of not only the user's location, but also the user's current destination and route towards the destination. This paper describes a component that enables the use of geo-context. Using GPS data, the component gathers a driver's routes and associates them with usage meta-data. Other services may then provide the component with a driver ID, the time of the day, and a location, in return obtaining the likely routes for the driver.


Huang, X., C. S. Jensen, S. Šaltenis, ,"Multiple k Nearest Neighbor Query Processing in Spatial Network Databases" in in Proceedings of the Tenth East-European Conference on Advances In Databases and Information Systems, Lecture Notes on Computer Science 4152, Thessaloniki, Greece, pp. 266-281 ,, 2006

Publication
Online at Springer

This paper concerns the efficient processing of multiple k nearest neighbor queries in a road-network setting. The assumed setting covers a range of scenarios such as the one where a large population of mobile service users that are constrained to a road network issue nearest-neighbor queries for points of interest that are accessible via the road network. Given multiple k nearest neighbor queries, the paper proposes progressive techniques that selectively cache query results in main memory and subsequently reuse these for query processing. The paper initially proposes techniques for the case where an upper bound on k is known a priori and then extends the techniques to the case where this is not so. Based on empirical studies with real-world data, the paper offers insight into the circumstances under which the different proposed techniques can be used with advantage for multiple k nearest neighbor query processing.


Jensen, C. S., D. Tiesyte, N. Tradišauskas, "Robust B+-Tree-Based Indexing of Moving Objects" in Proceedings of the Seventh International Conference on Mobile Data Management, Nara, Japan, 9 pages, 2006

Publication
Online at IEEE

With the emergence of an infrastructure that enables the geo-positioning of on-line, mobile users, the management of so-called moving objects has emerged as an active area of research. Among the indexing techniques for efficiently answering predictive queries on moving-object positions, the recent Bx-tree is based on the B+-tree and is relatively easy to integrate into an existing DBMS. However, the Bx-tree is sensitive to data skew. This paper proposes a new query processing algorithm for the Bx-tree that fully exploits the available data statistics to reduce the query enlargement that is needed to guarantee perfect recall, thus significantly improving robustness. The new technique is empirically evaluated and compared with four other approaches and with the TPR-tree, a competitor that is based on the R*-tree. The results indicate that the new index is indeed more robust than its predecessor - it significantly reduces the number of I/O operations per query for the workloads considered. In many settings, the TPR-tree is outperformed as well.
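
The enlargement idea can be sketched as follows: because objects are indexed by their positions at a reference time, a predictive range query must be expanded by the distance objects may have traveled in the meantime. Using tighter, statistics-derived speed bounds instead of a single global maximum is the kind of reduction the paper targets; the function below is an illustrative simplification.

    # Hedged sketch of query-window enlargement for predictive range queries.

    def enlarge_query(query_rect, t_query, t_ref, max_speed):
        """query_rect = (x_lo, y_lo, x_hi, y_hi) at time t_query; returns the rectangle
        to search at reference time t_ref, assuming objects move at most max_speed."""
        r = max_speed * abs(t_query - t_ref)
        x_lo, y_lo, x_hi, y_hi = query_rect
        return (x_lo - r, y_lo - r, x_hi + r, y_hi + r)

A smaller, more accurate max_speed bound (e.g., one maintained per spatial partition) directly shrinks the enlarged window and hence the number of index pages touched per query.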


Chrysanthis, P. K., C. S. Jensen, V. Kumar, A. Labrinidis, editors, Proceedings of the Fifth ACM International Workshop on Data Engineering for Wireless and Mobile Access, Chicago, Illinois, USA, 92+viii pages, 2006

Online at ACM Digital Library



Chrysanthis, P. K., C. S. Jensen, V. Kumar, A. Labrinidis, ,"Foreword" in in Proceedings of the Fifth ACM International Workshop on Data Engineering for Wireless and Mobile Access, Chicago, Illinois, USA, p. iii ,, 2006

Publication



Alonso, G., C. S. Jensen, B. Mitschang, editors, "Data Always and Everywhere - Management of Mobile, Ubiquitous, Pervasive, and Sensor Data" in Dagstuhl Seminar Proceedings 05421, Dagstuhl, Germany, 37 pages, 2006

Online at Dagstuhl



Alonso, G., C. S. Jensen, B. Mitschang, ,"05421 Abstracts Collection - Data Always and Everywhere" in in Dagstuhl Seminar Proceedings 05421, Dagstuhl, Germany, 19 pages,, 2006

Publication
Online at Dagstuhl

From 16.10.05 to 21.10.05, the Dagstuhl Seminar 05421, Data Always and Everywhere - Management of Mobile, Ubiquitous, Pervasive, and Sensor Data, was held in the International Conference and Research Center, Schloss Dagstuhl. During the seminar, all participants were given the opportunity to present their current research, and ongoing activities and open problems were discussed. This document is a collection of the abstracts of the presentations given during the seminar. Some abstracts offer links to extended abstracts, full papers, and other supporting documents. A separate companion document summarizes the seminar. The authors wish to acknowledge Victor Teixeira de Almeida, who served as collector for the seminar and thus played a key role in collecting materials from the seminar participants.


Alonso, G., C. S. Jensen, B. Mitschang, ," 05421 Executive Summary - Data Always and Everywhere - Management of Mobile, Ubiquitous, Pervasive, and Sensor Data " in Dagstuhl Seminar Proceedings 05421, Dagstuhl, Germany, 6 pages,, 2006

Publication
Online at Dagstuhl

This report summarizes the important aspects of the workshop on "Management of Mobile, Ubiquitous, Pervasive, and Sensor Data," which took place from October 16th to October 21st, 2005. Thirty-seven participants from thirteen countries met during that week and discussed a broad range of topics related to the management of data in relation to mobile, ubiquitous, and pervasive applications of information technology. The wealth of the contributions is available at the seminar page at the Dagstuhl server. Here, we provide a short overview.


Böhlen, M. H., J. Gamper, C. S. Jensen, "How Would You Like to Aggregate Your Temporal Data?" in Proceedings of the Thirteenth International Symposium on Temporal Representation and Reasoning, Budapest, Hungary, pp. 121-136, 2006

Publication
Online at IEEE

Real-world data management applications generally manage temporal data, i.e., they manage multiple states of time-varying data. Many contributions have been made by the research community on how to better model, store, and query temporal data. In particular, several dozen temporal data models and query languages have been proposed. Motivated in part by the emergence of non-traditional data management applications and the increasing proliferation of temporal data, this paper focuses on the aggregation of temporal data. In particular, it provides a general framework of temporal aggregation concepts, and it discusses the abilities of five approaches to the design of temporal query languages with respect to temporal aggregation. Rather than providing focused, polished results, the paper's aim is to explore the inherent support for temporal aggregation in an informal manner that may serve as a foundation for further exploration.
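
As a small illustration of the kind of operation discussed (not tied to any one of the five surveyed language designs), the sketch below computes an "instant" temporal aggregate, here a COUNT, at every time point at which the set of valid tuples changes. Tuple periods are assumed closed-open [start, end); the data is invented.

# Illustrative sketch: instant temporal COUNT over valid-time periods.
def instant_count(periods):
    """Return (time point, count of tuples valid at that point) at every
    point where the count may change."""
    events = sorted({t for s, e in periods for t in (s, e)})
    return [(t, sum(1 for s, e in periods if s <= t < e)) for t in events]

employed = [(1, 5), (3, 9), (4, 6)]     # valid-time periods of three tuples
print(instant_count(employed))
# [(1, 1), (3, 2), (4, 3), (5, 2), (6, 1), (9, 0)]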


Pedersen, T. B., C. S. Jensen, C. Dyreson, "Method and System for Making OLAP Hierarchies Summarisable" in United States Patent No. 7,133,865 B1, 27 pages, 2006

Online at WIPO

A method, a computer system, and a computer program product for transforming general On-line Analytical Processing (OLAP) hierarchies into summarizable hierarchies that enable pre-aggregation are disclosed, making fast response times for aggregation queries possible without excessive storage use, even when the hierarchies are originally irregular. Pre-aggregation is essential for ensuring adequate response time during data analysis. Most OLAP systems adopt the practical pre-aggregation approach, as opposed to full pre-aggregation, of materializing only select combinations of aggregates and then reusing these for efficiently computing other aggregates. However, this reuse of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints. The present invention significantly extends the scope of practical pre-aggregation by transforming irregular dimension hierarchies and fact-dimension relationships into well-behaved structures that enable practical pre-aggregation.
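
As a toy illustration of one kind of transformation in this spirit (a hypothetical example, not the patented method), the sketch below pads an irregular, non-covering hierarchy with placeholder parents so that every member reaches the next level. Regular hierarchies of this kind are what allow lower-level aggregates to be reused safely when computing higher-level ones. The member names are invented.

# Toy illustration: attaching parentless members to placeholder parents so that
# the child -> parent mapping covers every member.
hierarchy = {             # child -> parent; None marks a missing parent
    "Aalborg": "Denmark",
    "Bonn": None,         # irregular: city with no country recorded
    "Denmark": "EU",
}

def make_covering(parent_of):
    """Attach children lacking a parent to a placeholder member."""
    fixed = {}
    for child, parent in parent_of.items():
        fixed[child] = parent if parent is not None else f"UNKNOWN-PARENT({child})"
    return fixed

print(make_covering(hierarchy))
# {'Aalborg': 'Denmark', 'Bonn': 'UNKNOWN-PARENT(Bonn)', 'Denmark': 'EU'}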


Huang, X., C. S. Jensen, S. Šaltenis, "The Islands Approach to Nearest Neighbor Querying in Spatial Networks" in DB Technical Report, TR-16, 23 pages, 2006

DB Technical Report
Publication

Much research has recently been devoted to the data management foundations of location-based mobile services. In one important scenario, the service users are constrained to a transportation network. As a result, query processing in spatial road networks is of interest. In this paper, we propose a versatile approach to k nearest neighbor computation in spatial networks, termed the Islands approach. By offering flexible yet simple means of balancing re-computation and pre-computation, this approach is able to manage the trade-off between query and update performance, and it offers better overall query and update performance than do its predecessors. The result is a single, efficient, and versatile approach to k nearest neighbor computation that obviates the need for using several k nearest neighbor approaches for supporting a single service scenario. The experimental comparison with the existing techniques uses real-world road network data and considers both I/O and CPU performance, for both queries and updates.


2005 Top

Böhm, K., C. S. Jensen, L. M. Haas, M. L. Kersten, P.-Å. Larson, B. C. Ooi, editors, Proceedings of the Thirty-first International Conference on Very Large Data Bases, Trondheim, Norway, 1372+xxiv pages, 2005

Online at DBLP



Bernstein, P. A., S. Chaudhuri, D. DeWitt, A. Heuer, Z. Ives, C. S. Jensen, H. Meyer, M. T. Özsu, R. T. Snodgrass, K. Y. Whang, J. Widom, "Database Publication Practices" in Proceedings of the Thirty-first International Conference on Very Large Data Bases, Trondheim, Norway, pp. 1241-1246, 2005

Publication
Online at VLDB

There has been a growing interest in improving the publication processes for database research papers. This panel reports on recent changes in those processes and presents an initial cut at historical data for the VLDB Journal and ACM Transactions on Database Systems.


Bernstein, P. A., E. Bertino, A. Heuer, C. S. Jensen, H. Meyer, M. T. Özsu, R. T. Snodgrass, K. Y. Whang, "An Apples-to-Apples Comparison of Two Database Journals" in ACM SIGMOD Record, Vol. 34, No. 4, pp. 61-64, 2005

Publication
ACM Author-Izer

This paper defines a collection of metrics on manuscript reviewing and presents historical data for ACM Transactions on Database Systems and The VLDB Journal.


Huang, X., C. S. Jensen, S. Šaltenis, "The Islands Approach to Nearest Neighbor Querying in Spatial Networks" in Proceedings of the Ninth International Symposium on Spatial and Temporal Databases, Angra, Brazil, published as Lecture Notes in Computer Science, Volume 3633, pp. 73-90, 2005

Publication
Online at Springer

Much research has recently been devoted to the data management foundations of location-based mobile services. In one important scenario, the service users are constrained to a transportation network. As a result, query processing in spatial road networks is of interest. We propose a versatile approach to k nearest neighbor computation in spatial networks, termed the Islands approach. By offering flexible yet simple means of balancing re-computation and pre-computation, this approach is able to manage the trade-off between query and update performance. The result is a single, efficient, and versatile approach to k nearest neighbor computation that obviates the need for using several k nearest neighbor approaches for supporting a single service scenario. The experimental comparison with the existing techniques uses real-world road network data and considers both I/O and CPU performance, for both queries and updates.


Schmidt, A., C. S. Jensen, "Efficient Management of Short-Lived Data" in TimeCenter Technical Report TR-82 and CoRR cs.DB/0505038 (2005), 24 pages, 2005

TimeCenter
cs.DB/0505038 (2005)
Publication
Publication at CoRR

Motivated by the increasing prominence of loosely-coupled systems, such as mobile and sensor networks, which are characterised by intermittent connectivity and volatile data, we study the tagging of data with so-called expiration times. More specifically, when data are inserted into a database, they may be tagged with time values indicating when they expire, i.e., when they are regarded as stale or invalid and thus are no longer considered part of the database. In a number of applications, expiration times are known and can be assigned at insertion time. We present data structures and algorithms for online management of data tagged with expiration times. The algorithms are based on fully functional, persistent treaps, which are a combination of binary search trees with respect to a primary attribute and heaps with respect to a secondary attribute. The primary attribute implements primary keys, and the secondary attribute stores expiration times in a minimum heap, thus keeping a priority queue of tuples to expire. A detailed and comprehensive experimental study demonstrates the well-behavedness and scalability of the approach as well as its efficiency with respect to a number of competitors.
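
The sketch below is a simplified, non-persistent stand-in for the paper's functional treaps: a dictionary provides primary-key lookup while a min-heap on expiration time plays the role of the priority queue of tuples to expire. It only illustrates the intended behavior, namely that expired tuples silently cease to be part of the database; all names are invented.

# Simplified sketch of the behaviour described above (not the paper's data structure).
import heapq

class ExpiringStore:
    def __init__(self):
        self._data = {}      # key -> (expiration_time, value)
        self._heap = []      # (expiration_time, key)

    def insert(self, key, value, expires_at):
        self._data[key] = (expires_at, value)
        heapq.heappush(self._heap, (expires_at, key))

    def _purge(self, now):
        # Lazily drop tuples whose expiration time has passed.
        while self._heap and self._heap[0][0] <= now:
            exp, key = heapq.heappop(self._heap)
            if key in self._data and self._data[key][0] == exp:
                del self._data[key]

    def get(self, key, now):
        self._purge(now)
        entry = self._data.get(key)
        return entry[1] if entry else None

store = ExpiringStore()
store.insert("sensor-7", 21.5, expires_at=100)
print(store.get("sensor-7", now=50))   # 21.5
print(store.get("sensor-7", now=150))  # None: the tuple has expired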


Civilis, A., C. S. Jensen, S. Pakalnis, "Techniques for Efficient Tracking of Road-Network-Based Moving Objects" in IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, pp. 698-712, 2005

Publication
Online at IEEE

With the continued advances in wireless communications, geo-positioning, and consumer electronics, an infrastructure is emerging that enables location-based services that rely on the tracking of the continuously changing positions of entire populations of service users, termed moving objects. This scenario is characterized by large volumes of updates, for which reason location update technologies become important. A setting is assumed in which a central database stores a representation of each moving object's current position. This position is to be maintained so that it deviates from the user's real position by at most a given threshold. To do so, each moving object stores locally the central representation of its position. Then an object updates the database whenever the deviation between its actual position (as obtained from a GPS device) and the database position exceeds the threshold. The main issue considered is how to represent the location of a moving object in a database so that tracking can be done with as few updates as possible. The paper proposes to use the road network within which the objects are assumed to move for predicting their future positions. The paper presents algorithms that modify an initial road-network representation, so that it works better as a basis for predicting an object's position; it proposes to use known movement patterns of the object, in the form of routes; and it proposes to use acceleration profiles together with the routes. Using real GPS-data and a corresponding real road network, the paper offers empirical evaluations and comparisons that include three existing approaches and all the proposed approaches.
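
The client-side update rule described above is simple to state; the paper's contribution lies in the prediction policies. The sketch below shows only that rule, with a constant-position prediction standing in for the paper's road-network, route, and acceleration-based policies; the threshold and coordinates are invented.

# Minimal sketch of threshold-based tracking: the server and the client share a
# prediction of the object's position; the client reports only when GPS reality
# drifts too far from that prediction.
import math

THRESHOLD = 200.0  # metres of tolerated deviation (illustrative value)

def predicted_position(shared_state, t):
    # Simplest policy: the position last sent to the server.
    return shared_state["x"], shared_state["y"]

def maybe_update(shared_state, gps_x, gps_y, t, send_update):
    px, py = predicted_position(shared_state, t)
    if math.hypot(gps_x - px, gps_y - py) > THRESHOLD:
        shared_state.update(x=gps_x, y=gps_y, t=t)
        send_update(shared_state)          # one message to the server

shared = {"x": 0.0, "y": 0.0, "t": 0.0}
maybe_update(shared, 50.0, 40.0, 1.0, send_update=print)   # within threshold: silent
maybe_update(shared, 500.0, 10.0, 2.0, send_update=print)  # deviation too large: update sent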


Pfoser, D., C. S. Jensen, "Trajectory Indexing Using Movement Constraints" in Geoinformatica, Vol. 9, No. 2, pp. 93-115, 2005

Publication
Online at Springer

With the proliferation of mobile computing, the ability to index efficiently the movements of mobile objects becomes important. Objects are typically seen as moving in two-dimensional (x,y) space, which means that their movements across time may be embedded in the three-dimensional (x,y,t) space. Further, the movements are typically represented as trajectories, sequences of connected line segments. In certain cases, movement is restricted; specifically, in this paper, we aim at exploiting that movements occur in transportation networks to reduce the dimensionality of the data. Briefly, the idea is to reduce movements to occur in one spatial dimension. As a consequence, the movement occurs in two-dimensional (x,t) space. The advantages of considering such lower-dimensional trajectories are that the overall size of the data is reduced and that lower-dimensional data must be indexed. Since off-the-shelf database management systems typically do not offer higher-dimensional indexing, this reduction in dimensionality allows us to use existing DBMSes to store and index trajectories. Moreover, we argue that, given the right circumstances, indexing these dimensionality-reduced trajectories can be more efficient than using a three-dimensional index. A decisive factor here is the fractal dimension of the network: the lower it is, the more efficient the proposed approach is. This hypothesis is verified by an experimental study that incorporates trajectories stemming from real and synthetic road networks.
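
A minimal sketch of the reduction idea, under simplifying assumptions: each road edge is assigned a contiguous interval on a single axis, so a network position given as (edge, offset) maps to a scalar x, and a trajectory sample (x, y, t) collapses to (x, t). The edge lengths and sample values are invented; the paper's actual mapping and its handling of network topology are richer.

# Illustrative sketch: reducing network-constrained movement to one spatial dimension.
edges = {"e1": 120.0, "e2": 80.0, "e3": 200.0}   # edge id -> length in metres

# Assign each edge a disjoint interval [start, start + length) on the x-axis.
starts, cursor = {}, 0.0
for eid, length in edges.items():
    starts[eid] = cursor
    cursor += length

def to_1d(edge_id, offset):
    """Map a network position (edge, offset along the edge) to a scalar."""
    return starts[edge_id] + offset

trajectory_3d = [("e1", 30.0, 10), ("e1", 110.0, 20), ("e2", 25.0, 30)]
trajectory_2d = [(to_1d(e, o), t) for e, o, t in trajectory_3d]
print(trajectory_2d)   # [(30.0, 10), (110.0, 20), (145.0, 30)]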


Pfoser, D., N. Tryfona, C. S. Jensen, "Indeterminacy and Spatiotemporal Data: Basic Definitions and Case Study" in Geoinformatica, Vol. 9, No. 3, pp. 211-236, 2005

Publication
Online at Springer

For some spatiotemporal applications, it can be assumed that the modeled world is precise and bounded, and that our record of it is precise. While these simplifying assumptions are sufficient in applications like a land information system, they are unnecessarily crude for many other applications that manage data with spatial and/or temporal extents, such as navigational applications. This work explores fuzziness and uncertainty, subsumed under the term indeterminacy, in the spatiotemporal context. To better illustrate the basic spatiotemporal concepts of change or evolution, it is shown how the fundamental modeling concepts of spatial objects, attributes, and relationships, as well as time points and periods, are influenced by indeterminacy, and how they can be combined. In particular, the focus is on the change of spatial objects and their geometries across time. Four change scenarios are outlined, which concern discrete versus continuous change and asynchronous versus synchronous measurement, and it is shown how to model indeterminacy for each. A case study illustrates the applicability of the paper's general proposal by describing the uncertainty related to the management of the movements of point objects, such as the management of vehicle positions in a fleet management system.


Friis-Christensen, A., C. S. Jensen, J. P. Nytun, D. Skogan, "A Conceptual Schema Language for the Management of Multiple Representations of Geographic Entities" in Transactions in GIS, Vol. 9, No. 3, pp. 345-380, 2005

Publication
Online at Blackwell Synergy

Multiple representation of geographic information occurs when a real-world entity is represented more than once in the same or different databases. This occurs frequently in practice, and it invariably results in the occurrence of inconsistencies among the different representations of the same entity. In this paper, we propose an approach to the modeling of multiply represented entities, which is based on the relationships among the entities and their representations. Central to our approach is the Multiple Representation Schema Language that, by intuitive and declarative means, is used to specify rules that match objects representing the same entity, maintain consistency among these representations, and restore consistency if necessary. The rules configure a Multiple Representation Management System, the aim of which is to manage multiple representations over a number of autonomous federated databases. We present a graphical and a lexical binding to the schema language. The graphical binding is built on an extension to the Unified Modeling Language and the Object Constraint Language. We demonstrate that it is possible to implement the constructs of the schema language in the object-relational model of a commercial RDBMS.


Gao, D., C. S. Jensen, R. T. Snodgrass, M. D. Soo, "Join Operations in Temporal Databases" in The VLDB Journal, Vol. 14, No. 1, pp. 2-29, 2005

Publication
Online at Springer

Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the evaluation of joins with equality predicates rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally varying data dramatically increases the size of a database. These factors indicate that specialized techniques are needed to efficiently evaluate temporal joins. We address this need for efficient join evaluation in temporal databases. Our purpose is twofold. We first survey all previously proposed temporal join operators. While many temporal join operators have been defined in previous work, this work has been done largely in isolation from competing proposals, with little, if any, comparison of the various operators. We then address evaluation algorithms, comparing the applicability of various algorithms to the temporal join operators and describing a performance study involving algorithms for one important operator, the temporal equijoin. Our focus, with respect to implementation, is on non-index-based join algorithms. Such algorithms do not rely on auxiliary access paths but may exploit sort orderings to achieve efficiency.
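
The sketch below only illustrates the semantics of one of the operators discussed, the temporal equijoin: tuples join when they agree on the join attribute and their valid-time periods overlap, and the result carries the intersection of the two periods. It is a naive nested loop, not one of the paper's evaluation algorithms; relation contents are invented and periods are closed-open.

# Semantics-only sketch of a temporal equijoin.
def temporal_equijoin(r, s, key):
    out = []
    for a in r:
        for b in s:
            if a[key] != b[key]:
                continue
            start = max(a["vt_start"], b["vt_start"])
            end = min(a["vt_end"], b["vt_end"])
            if start < end:                       # valid-time periods overlap
                out.append({key: a[key], "vt_start": start, "vt_end": end})
    return out

emp = [{"dept": "D1", "vt_start": 1, "vt_end": 10}]
mgr = [{"dept": "D1", "vt_start": 5, "vt_end": 20}]
print(temporal_equijoin(emp, mgr, "dept"))
# [{'dept': 'D1', 'vt_start': 5, 'vt_end': 10}]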


Lomet, D., R. T. Snodgrass, C. S. Jensen, "Using the Lock Manager to Choose Timestamps" in Proceedings of the Ninth International Database Engineering and Applications Symposium, Montreal, Canada, pp. 357-368, 2005

Publication
Online at IEEE

Our goal is to support transaction-time functionality that enables the coexistence of ordinary, non-temporal tables with transaction-time tables. In such a system, each transaction updating a transaction-time or snapshot table must include a timestamp for its updated data that correctly reflects the serialization order of the transactions, including transactions on ordinary tables. A serious issue is coping with SQL CURRENT_TIME functions, which should return a time consistent with a transaction's timestamp and serialization order. Prior timestamping techniques cannot support such functions with the desired semantics. We show how to compatibly extend conventional database functionality for transaction-time support by exploiting the database system lock manager and by utilizing a spectrum of optimizations.


Frank, L., C. Frank, C. S. Jensen, T. B. Pedersen, "Dimensional Modeling By Using a New Response to Slowly Changing Dimensions" in Proceedings of the Second International Conference on Information Technology, Amman, Jordan, pp. 7-10, 2005

Publication

Dimensions are defined as dynamic or slowly changing if the attributes or relationships of a dimension can be updated. Aggregations over dynamic dimensions can be misleading if the measures are aggregated without regard for the changes to the dimensions. Kimball et al. have described three classic solutions, or responses, for handling the aggregation problems caused by slowly changing dimensions. In this paper, we describe a fourth solution. A special aspect of our new response is that it should be applied before the other responses, as it changes the design of the data warehouse. Afterwards, it may be necessary to use the classic responses to improve the design further.


Jensen, C. S., K.-J. Lee, S. Pakalnis, S. Šaltenis, "Advanced Tracking of Vehicles" in Proceedings of the Fifth European Congress and Exhibition on Intelligent Transport Systems, Hannover, Germany, 12 pages, 2005

Publication

With the continued advances in wireless communications, geo-location technologies, and consumer electronics, it is becoming possible to accurately track the time-varying location of each vehicle in a population of vehicles. This paper reports on ongoing research whose objective is to develop efficient tracking techniques. More specifically, while almost all commercially available tracking solutions simply offer time-based sampling of positions, this paper's techniques aim to offer a guaranteed tracking accuracy for each vehicle at the lowest possible cost in terms of network traffic and server-side updates. This is achieved by designing, prototyping, and testing novel tracking techniques that exploit knowledge of the road network and past movement. The resulting tracking techniques are intended to support mobile services that rely on the existence of a central server that continuously tracks the current positions of vehicles.


Lin, D., C. S. Jensen, B. C. Ooi, S. Šaltenis, "Efficient Indexing of the Historical, Present, and Future Positions of Moving Objects" in Proceedings of the Sixth International Conference on Mobile Data Management, Ayia Napa, Cyprus, pp. 59-66, 2005

Publication
ACM Author-Izer

Although significant effort has been put into the development of efficient spatio-temporal indexing techniques for moving objects, little attention has been given to the development of techniques that efficiently support queries about the past, present, and future positions of objects. The provisioning of such techniques is challenging, both because of the nature of the data, which reflects continuous movement, and because of the types of queries to be supported. This paper proposes the BBx-index structure, which indexes the positions of moving objects, given as linear functions of time, at any time. The index stores linearized moving-object locations in a forest of B+-trees. The index supports queries that select objects based on temporal and spatial constraints, such as queries that retrieve all objects whose positions fall within a spatial range during a set of time intervals. Empirical experiments are reported that offer insight into the query and update performance of the proposed technique.


Damsgaard, J., J. Hørlück, C. S. Jensen, "IT-Driven Customer Service or Customer-Driven IT Service: Does IT Matter?" in teaching case, 24 pages, European Case Clearing House reference number: 905-002-1, 2005

Publication

At the end of 2004, the Nykredit Group was doing well. In accordance with the overall plan of diversifying Nykredit from a mortgage bank to a retail financial institution, Nykredit had just successfully acquired another mortgage bank. The company portfolio of Nykredit was now close to being complete. Mortgage banking, retail banking, an insurance company, a real estate brokerage chain, and a real estate investor company comprised the Nykredit Group, making it a modern financial supermarket. The deregulation of the Danish banking industry in the 1990s caused a lot of turmoil within the entire industry and forced Nykredit into a radical reorientation of the company. From this, Nykredit emerged not only as a survivor, but also as a clear winner. The remarkable competence of the IT staff of the Nykredit Group in maintaining, integrating, and developing its multi-faceted portfolio of IT systems across the various constituent companies into a modern multi-channelled and multi-tiered IT infrastructure had accentuated the success of Nykredit's strategy. In 2004, competition in the financial industry was again concentrated on gaining competitive advantage through differentiation and cost reduction. Everybody agreed that IT was the answer to achieving both cost reduction and differentiation, but how could Nykredit be sure it would always have the right IT? Some argued that the customer side should drive the IT development in order to ensure that Nykredit would have the most relevant IT systems. Others held that radical business innovation leading to a competitive advantage could only be achieved through in-depth knowledge of what new and emerging IT could do and of how this could be linked with existing IT systems. Acknowledging the importance of both sides, Nykredit had combined IT service and customer service into one powerful business development department. The creation of a central department had been extremely successful during the radical changes that Nykredit had been forced into during the 1990s. But could the success of the past be extended into the future? The public debate on the value of IT as a competitive weapon had further stimulated this discussion.


Damsgaard, J., J. Hørlück, C. S. Jensen, "IT-Driven Customer Service or Customer-Driven IT Service: Does IT Matter?" in teaching case, 8 pages, European Case Clearing House reference number: 905-002-8, 2005

Publication

This case deals with a large European financial institution that has built an extensive IT infrastructure to serve its multi-channel approach to its customers while changing into a modern financial supermarket with a large portfolio of almost all financial services. Experience has shown that in this industry, IT does matter. As an example: a few days after the takeover of a competitor, this competitor's previous owners - 105 small banks - sold Nykredit's products through their 1150 branches. The case can thus be used in a discussion of Nicholas Carr's article "IT Doesn't Matter" (Carr, Nicholas G. (2003). "IT Doesn't Matter." Harvard Business Review (May): 41-49). Traditionally, mortgage banking would mean either building an extensive branch network backed by central staff functions or joining forces with an existing retail financial institution. However, the Internet made online presence and a call center equipped with the latest CRM tools an inevitable alternative. But this was not viable if the existing IT infrastructure could not be transformed into a modern, streamlined, multi-tiered infrastructure accessible from the Internet. The IT infrastructure was - for historical reasons - based on a variety of systems encompassing systems developed in house as well as acquired best-of-suite and best-of-breed systems. In order to implement a multi-channel customer approach, the financial institution was both rearranging the old IT systems and building new Internet-ready systems. This case is open-ended and does not have a set solution. The business perspective is to discuss alternatives to a financial institution based on branches and especially what this requires in terms of IT support. It is designed to encourage discussion on issues such as a physical distribution network versus a strong net presence; the changing role of the IT department, from being a supplier of back office systems to delivering the storefront; the challenge of transforming several hundred existing legacy systems into a coherent, multi-layered, Internet-ready IT infrastructure; and modern software development and project management.


Huang, X., C. S. Jensen, "In-Route Skyline Querying for Location-Based Services" in Workshop on Web and Wireless Geographic Information Systems, post-workshop proceedings, published as Lecture Notes in Computer Science, Volume 3428, pp. 120-135, 2005

Publication
Online at Springer

With the emergence of an infrastructure for location-aware mobile services, the processing of advanced, location-based queries that are expected to underlie such services is gaining in relevance. While much work has assumed that users move in Euclidean space, this paper assumes that movement is constrained to a road network and that points of interest can be reached via the network. More specifically, the paper assumes that the queries are issued by users moving along routes towards destinations. The paper defines in-route nearest-neighbor skyline queries in this setting and considers their efficient computation. The queries take into account several spatial preferences, and they intuitively return a set of most interesting results for each result returned by the corresponding non-skyline queries. The paper also covers a performance study of the proposed techniques based on real point-of-interest and road network data.


Civilis, A., C. S. Jensen, S. Pakalnis, "Techniques for Efficient Tracking of Road-Network-Based Moving Objects" in DB Technical Report, TR-10, 37 pages, 2005

DB Technical Report
Publication

With the continued advances in wireless communications, geo-positioning, and consumer electronics, an infrastructure is emerging that enables location-based services that rely on the tracking of the continuously changing positions of entire populations of service users, termed moving objects. This scenario is characterized by large volumes of updates, for which reason location update technologies become important. A setting is assumed in which a central database stores a representation of each moving object's current position. This position is to be maintained so that it deviates from the user's real position by at most a given threshold. To do so, each moving object stores locally the central representation of its position. Then an object updates the database whenever the deviation between its actual position (as obtained from a GPS device) and the database position exceeds the threshold. The main issue considered is how to represent the location of a moving object in a database so that tracking can be done with as few updates as possible. The paper proposes to use the road network within which the objects are assumed to move for predicting their future positions. The paper presents algorithms that modify an initial road-network representation, so that it works better as a basis for predicting an object's position; it proposes to use known movement patterns of the object, in the form of routes; and it proposes to use acceleration profiles together with the routes. Using real GPS-data and a corresponding real road network, the paper offers empirical evaluations and comparisons that include three existing approaches and all the proposed approaches.


Brilingaite, A., C. S. Jensen, N. Zokaite, "Enabling Routes as Context in Mobile Services" in DB Technical Report, TR-9, 42 pages, 2005

DB Technical Report
Publication
ACM Author-Izer

With the continuing advances in wireless communications, geo-positioning, and portable electronics, an infrastructure is emerging that enables the delivery of on-line, location-enabled services to very large numbers of mobile users. A typical usage situation for mobile services is one characterized by a small screen and no keyboard, and by the service being only a secondary focus of the user. Under such circumstances, it is particularly important to deliver the "right" information and service at the right time, with as little user interaction as possible. This may be achieved by making services context aware. Mobile users frequently follow the same route to a destination as they did during previous trips to the destination, and the route and destination constitute important aspects of the context for a range of services. This paper presents key concepts underlying a software component that identifies and accumulates the routes of a user along with their usage patterns and that makes the routes available to services. Experiences from using the component on logs of GPS positions acquired from vehicles traveling within a real road network are reported.


2004 Top

Breunig, M., C. S. Jensen, M. Klein, A. Zeitz, G. Koloniari, J. Grünbauer, P. J. Marrón, C. Panayiotou, S. Boll, S. Šaltenis, K.-U. Sattler, M. Hauswirth, W. Lehner, O. Wolfson, "Research Issues in Mobile Querying" in Proceedings of the Dagstuhl Seminar 04441 on Mobile Information Management, Schloss Dagstuhl, Wadern, Germany, 6 pages, 2004

Publication
Online at Dagstuhl

This document reports on key aspects of the discussions conducted within the working group. In particular, the document aims to offer a structured and somewhat digested summary of the group's discussions. The document first offers concepts that enable characterization of "mobile queries" as well as the types of systems that enable such queries. It explores the notion of context in mobile queries. The document ends with a few observations, mainly regarding challenges.


Boll, S., M. Breunig, N. Davies, C. S. Jensen, B. König-Ries, R. Malaka, F. Matthes, C. Panayiotou, S. Šaltenis, T. Schwarz, "Towards a Handbook for User-Centred Mobile Application Design" in Proceedings of the Dagstuhl Seminar 04441 on Mobile Information Management, Schloss Dagstuhl, Wadern, Germany, 8 pages, 2004

Publication
Online at Dagstuhl

Why do we have difficulties designing mobile apps? Is there a "Mobile RUP"?


Jensen, C. S., A. Kligys, T. B. Pedersen, I. Timko, "Multidimensional Data Modeling For Location-Based Services" in The VLDB Journal, Vol. 13, No. 1, pp. 1-21, 2004

Publication
ACM Author-Izer

With the recent and continuing advances in areas such as wireless communications and positioning technologies, mobile, location-based services are becoming possible. Such services deliver location-dependent content to their users. More specifically, these services may capture the movements and requests of their users in multidimensional databases, i.e., data warehouses, and content delivery may be based on the results of complex queries on these data warehouses. Such queries aggregate detailed data in order to find useful patterns, e.g., in the interaction of a particular user with the services. The application of multidimensional technology in this context poses a range of new challenges. The specific challenge addressed here concerns the provision of an appropriate multidimensional data model. In particular, the paper extends an existing multidimensional data model and algebraic query language to accommodate spatial values that exhibit partial containment relationships instead of the total containment relationships normally assumed in multidimensional data models. Partial containment introduces imprecision in aggregation paths. The paper proposes a method for evaluating the imprecision of such paths. The paper also offers transformations of dimension hierarchies with partial containment relationships to simple hierarchies, to which existing precomputation techniques are applicable.
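
As a toy illustration of why partial containment makes roll-ups approximate (the containment degrees and all names below are invented, and the paper's model and imprecision evaluation are considerably richer): a fact recorded at a lower-level spatial value that is only partially contained in several higher-level values can only be attributed fractionally to each of them.

# Toy illustration: weighted roll-up under partial containment.
containment = {            # (child area, parent district) -> degree of containment
    ("cell-A", "district-1"): 0.7,
    ("cell-A", "district-2"): 0.3,
    ("cell-B", "district-2"): 1.0,
}
requests = {"cell-A": 100, "cell-B": 40}   # facts recorded per cell

rollup = {}
for (child, parent), degree in containment.items():
    rollup[parent] = rollup.get(parent, 0.0) + degree * requests.get(child, 0)

print(rollup)   # {'district-1': 70.0, 'district-2': 70.0}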


Torp, K., C. S. Jensen, R. T. Snodgrass, "Modification Semantics in Now-relative Databases" in Information Systems, Vol. 29, No. 8, pp. 653-683, 2004

Publication
Online by Elsevier at ScienceDirect

Most real-world databases record time-varying information. In such databases, the notion of "the current time," or NOW, occurs naturally and prominently. For example, when capturing the past states of a relation using begin and end time columns, tuples that are part of the current state have some past time as their begin time and NOW as their end time. While the semantics of such databases has been described in detail and is well understood, the modification of these databases remains unexplored. This paper defines the semantics of modifications involving the variable NOW. More specifically, the problems with modifications in the presence of NOW are explored, illustrating that the main problems concern modifications of tuples that reach into the future. The paper defines the semantics of modifications, including insertions, deletions, and updates, of databases without NOW, with NOW, and with values of the type NOW + D, where D is a non-variable time duration. To accommodate these semantics, three new timestamp values are introduced. Finally, implementation is explored. We show how to represent the variable NOW with columns of standard SQL data types and give a mapping from SQL on NOW-relative data to standard SQL on these columns. The paper thereby completes the semantics, the querying, and the modification of now-relative databases.
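
A minimal sketch of the underlying idea that a NOW-bound end time is not a fixed instant but is resolved relative to the time at which a query is asked. The paper maps such values to columns of standard SQL types; the Python sentinel and the NOW + D tuple below are only illustrative stand-ins.

# Illustrative sketch: resolving NOW-relative end bounds at query time.
NOW = "NOW"                      # sentinel for a now-relative bound

def effective_end(end, current_time):
    """Resolve an end bound of the form NOW or (NOW, D) at query time."""
    if end == NOW:
        return current_time
    if isinstance(end, tuple) and end[0] == NOW:
        return current_time + end[1]          # NOW + D
    return end

row = {"name": "Joe", "begin": 10, "end": NOW}        # tuple in the current state
print(effective_end(row["end"], current_time=42))     # 42
print(effective_end((NOW, 5), current_time=42))       # 47: NOW + 5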


Huang, X., C. S. Jensen, "In-Route Skyline Querying for Location-Based Services" in Proceedings of the Fourth International Workshop on Web and Wireless Geographic Information Systems, Goyang, South Korea, pp. 223-238, 2004

Publication

With the emergence of an infrastructure for location-aware mobile services, the processing of advanced, location-based queries that are expected to underlie such services is gaining in relevance. While most work has assumed that mobile objects move in Euclidean space, this paper considers the case where movement is constrained to a road network and where points of interest can be reached via the network. More specifically, the paper assumes that the queries are issued by objects moving along routes towards destinations. The paper defines in-route nearest-neighbor skyline queries in this setting and considers their efficient implementation. These skyline queries take into account several spatial preferences, and they intuitively return a set of most interesting results for each result returned by the corresponding non-skyline queries. The paper also reports on an empirical performance evaluation of the proposed implementations based on real road network and point-of-interest data.


Brilingaite, A., C. S. Jensen, N. Zokaite, "Enabling Routes as Context in Mobile Services" in Proceedings of the Twelfth ACM International Symposium on Advances in Geographic Information Systems, Washington DC, USA, pp. 127-136, 2004

Publication
Online at ACM Digital Library

With the continuing advances in wireless communications, geo-positioning, and portable electronics, an infrastructure is emerging that enables the delivery of on-line, location-enabled services to very large numbers of mobile users. A typical usage situation for mobile services is one characterized by a small screen and no keyboard, and by the service being only a secondary focus of the user. It is therefore particularly important to deliver the "right" information and service at the right time, with as little user interaction as possible. This may be achieved by making services context aware. Mobile users frequently follow the same route to a destination as they did during previous trips to the destination, and the route and destination are important aspects of the context for a range of services. This paper presents key concepts underlying a software component that discovers the routes of a user along with their usage patterns and that makes the accumulated routes available to services. Experiences from using the component with real GPS logs are reported.


Huang, X., C. S. Jensen, "Towards A Streams-Based Framework for Defining Location-Based Queries" in Proceedings of the Second International Workshop on Spatio-Temporal Database Management, pp. 73-80, 2004

Publication

An infrastructure is emerging that supports the delivery of on-line, location-enabled services to mobile users. Such services involve novel database queries, and the database research community is quite active in proposing techniques for the efficient processing of such queries. In parallel to this, the management of data streams has become an active area of research. While most research in mobile services concerns performance issues, this paper aims to establish a formal framework for defining the semantics of queries encountered in mobile services, most notably the so-called continuous queries that are particularly relevant in this context. Rather than inventing an entirely new framework, the paper proposes a framework that builds on concepts from data streams and temporal databases. Definitions of example queries demonstrate how the framework enables clear formulation of query semantics and the comparison of queries. The paper also proposes a categorization of location-based queries.


Jensen, C. S., D. Lin, B. C. Ooi, "Query and Update Efficient B+-Tree Based Indexing of Moving Objects" in Proceedings of the Thirtieth International Conference on Very Large Data Bases, pp. 768-779, 2004

Publication
Online at VLDB

A number of emerging applications of data management technology involve the monitoring and querying of large quantities of continuous variables, e.g., the positions of mobile service users, termed moving objects. In such applications, large quantities of state samples obtained via sensors are streamed to a database. Indexes for moving objects must support queries efficiently, but must also support frequent updates. Indexes based on minimum bounding regions (MBRs) such as the R-tree exhibit high concurrency overheads during node splitting, and each individual update is known to be quite costly. This motivates the design of a solution that enables the B+-tree to manage moving objects. We represent moving-object locations as vectors that are timestamped based on their update time. By applying a novel linearization technique to these values, it is possible to index the resulting values using a single B+-tree that partitions values according to their timestamp and otherwise preserves spatial proximity. We develop algorithms for range and k nearest neighbor queries, as well as continuous queries. The proposal can be grafted into existing database systems cost effectively. An extensive experimental study explores the performance characteristics of the proposal and also shows that it is capable of substantially outperforming the R-tree based TPR-tree for both single and concurrent access scenarios.
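
The sketch below gives a simplified picture of what "linearization" can look like: a partition number derived from the update timestamp is prepended to a space-filling-curve value of the position's grid cell, producing a single integer key that an ordinary B+-tree can index while roughly preserving spatial proximity within a partition. The plain Z-order interleaving and all parameters are stand-ins and are not claimed to be the paper's exact linearization.

# Simplified sketch: composing a B+-tree key from a timestamp partition and a
# Z-order (bit-interleaved) value of a position's grid cell.
GRID_BITS = 16          # grid resolution per axis (illustrative)
PARTITION_LEN = 60      # seconds covered by one timestamp partition (illustrative)

def interleave(x, y, bits=GRID_BITS):
    """Z-order: interleave the bits of two grid coordinates."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return z

def key(x_cell, y_cell, t):
    partition = int(t) // PARTITION_LEN
    return (partition << (2 * GRID_BITS)) | interleave(x_cell, y_cell)

# Nearby cells updated within the same partition receive nearby keys.
print(key(100, 200, t=125), key(101, 200, t=130))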


Friis-Christensen, A., J. V. Christensen, C. S. Jensen, "A Framework for Conceptual Modeling of Geographic Data Quality" in Proceedings of the Eleventh International Symposium on Spatial Data Handling, pp. 605-616, 2004

Publication
Online at Springer

The notion of data quality is of particular importance to geographic data. One reason is that such data is often inherently imprecise. Another is that the usability of the data is in large part determined by how "good" the data is, as different applications of geographic data require that different qualities of the data be met. Such qualities concern the object level as well as the attribute level of the data. This paper presents a systematic and integrated approach to the conceptual modeling of geographic data and quality. The approach integrates quality information with the basic model constructs. This results in a model that enables object-oriented specification of quality requirements and of acceptable quality levels. More specifically, it extends the Unified Modeling Language with new modeling constructs based on standard classes, attributes, and associations that include quality information. A case study illustrates the utility of the quality-enabled model.


Civilis, A., C. S. Jensen, J. Nenortaite, S. Pakalnis, "Efficient Tracking of Moving Objects with Precision Guarantees" in Proceedings of the First Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services, pp. 164-173, 2004

Publication
Online at IEEE

Sustained advances in wireless communications, geo-positioning, and consumer electronics pave the way to a kind of location-based service that relies on the tracking of the continuously changing positions of an entire population of service users. This type of service is characterized by large volumes of updates, giving prominence to techniques for location representation and update. This paper presents several representations, along with associated update techniques, that predict the present and future positions of moving objects. An update occurs when the deviation between the predicted and the actual position of an object exceeds a given threshold. For the case where the road network, in which an object is moving, is known, we propose a so-called segment-based policy that predicts an object's movement according to the road's shape. Map matching is used for determining the road on which an object is moving. Empirical performance studies based on a real road network and GPS logs from cars are reported.
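
A minimal sketch of the segment-based idea mentioned above: instead of predicting straight-line movement, the shared prediction advances the object along the polyline of the road it was matched to, at its reported speed. The polyline, offset, and speed below are invented, and the map-matching step is assumed to have happened already.

# Illustrative sketch: predicting a position by advancing along a road polyline.
import math

def advance_along(polyline, start_offset, speed, dt):
    """Return the (x, y) reached by moving speed * dt metres along the
    polyline, starting start_offset metres from its first vertex."""
    remaining = start_offset + speed * dt
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        seg = math.hypot(x2 - x1, y2 - y1)
        if remaining <= seg:
            f = remaining / seg
            return x1 + f * (x2 - x1), y1 + f * (y2 - y1)
        remaining -= seg
    return polyline[-1]                      # clamp at the end of the road

road = [(0, 0), (100, 0), (100, 50)]         # an L-shaped road segment
print(advance_along(road, start_offset=0, speed=10, dt=12))   # (100.0, 20.0)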


Gregersen, H., C. S. Jensen, "Conceptual Modeling of Time-Varying Information" in Proceedings of the International Conference on Computing, Communications and Control Technologies, pp. 248-255, 2004

Publication

A wide range of database applications manage information that varies over time. Many of the underlying database schemas of these applications were designed using the Entity-Relationship (ER) model. In the research community as well as in industry, it is common knowledge that the temporal aspects of the mini-world are important but difficult to capture using the ER model. Several enhancements to the ER model have been proposed in an attempt to support the modeling of temporal aspects of information. Common to the existing temporally extended ER models is that few or no specific requirements for the models were given by their designers. With the existing proposals, an ontological foundation, and novel requirements as its basis, this paper defines a graphical, temporally extended ER model. The ontological foundation serves to ensure an orthogonal design, and the requirements aim, in part, at ensuring a design that naturally extends the syntax and semantics of the regular ER model. The result is a novel model that satisfies an array of properties not satisfied by any single previously proposed model.


Pedersen, T. B., C. S. Jensen, ,"Multidimensional Databases" in Chapter 12, pp. 12-1 - 12-13, in Industrial Information Technology Handbook, edited by R. Zurawski, CRC Press,, 2004

Publication



Jensen, C. S., ,"Database Aspects of Location-Based Services" in Chapter 5, pp. 115-147 in Location-Based Services, edited by J. Schiller and A. Voisard, Morgan Kaufmann Publishers ,, 2004

Publication

Adopting a data management perspective on location-based services, this chapter explores central challenges to data management posed by location-based services. Because service users typically travel in, and are constrained to, transportation infrastructures, such structures must be represented in the databases underlying high-quality services. Several integrated representations - which capture different aspects of the same infrastructure - are needed. Further, all other content that can be related to geographical space must be integrated with the infrastructure representations. The chapter describes the general concepts underlying one approach to data modeling for location-based services. The chapter also covers techniques that are needed to keep a database for location-based services up to date with the reality it models. As part of this, caching is touched upon briefly. The notion of linear referencing plays an important role in the chapter's approach to data modeling. Thus, the chapter offers an overview of linear referencing concepts and describes the support for linear referencing in Oracle.


Jensen, C. S., H. Lahrmann, S. Pakalnis, J. Runge, "The INFATI Data" in TimeCenter Technical Report TR-79 and CoRR cs.DB/0410001 (2004), 10 pages, 2004

TimeCenter
cs.DB/0410001 (2004)
Publication
Publication at CoRR

The ability to perform meaningful empirical studies is of essence in research in spatio-temporal query processing. Such studies are often necessary to gain detailed insight into the functional and performance characteristics of proposals for new query processing techniques. We present a collection of spatio-temporal data, collected during an intelligent speed adaptation project, termed INFATI, in which some two dozen cars equipped with GPS receivers and logging equipment took part. We describe how the data was collected and how it was "modified" to afford the drivers some degree of anonymity. We also present the road network in which the cars were moving during data collection. The GPS data is publicly available for non-commercial purposes. It is our hope that this resource will help the spatio-temporal research community in its efforts to develop new and better query processing techniques.


Bohm, M., E. Bonnerup, C. Elberling, C. S. Jensen, C. P. Knudsen, L. Leffland, H. H. Lund, N. Olsen, L. Pallesen, K. Sørensen, "Det begynder i skolen - En ATV-rapport om naturfagenes vilkår og fremtidige udviklingsmuligheder i grundskolen" in Danish Academy of Technical Sciences, 81 pages, 2004

Publication
Online at ATV

With this discussion paper, ATV offers for the first time its take on how science teaching specifically, and academic standards in general, can be strengthened in the Danish primary and lower secondary school. We do so because the Danish primary school is struggling academically - and particularly so in the sciences. Teaching in Nature and Technology is given low priority, the science environments at many schools have little impact, and far too many pupils lose interest in the sciences during their school years. This is not a satisfactory situation for the largest and most important educational and cultural institution in Denmark. It is in primary school that we must act to nurture and preserve pupils' interest in science and technology. Pupils should be able to carry this interest with them on their way through the education system, into the labor market, and onward into society. Insight into nature and technology is necessary in order to function in a knowledge society. This applies to everyone, regardless of education, job, and way of life. A basic knowledge of science gives the individual pupil good opportunities to later choose an exciting course of education and, subsequently, an exciting career. Equally important, however, is the ability to understand, form opinions about, and act on all the areas of life in which science and technology play a role, with respect to the individual as well as to the organization of society and its future prospects. Denmark's welfare begins in primary school. We want primary school to become a better workplace for teachers and a more interesting school for pupils. We invite a constructive and open-minded dialogue. The discussion paper addresses two overall areas of action: 1. The possibilities for creating a science focus area in the Danish primary school and for optimizing the use of the resources currently spent on it. 2. The possibilities for strengthening general academic standards in primary school and ensuring better coherence between teaching in primary school and in the subsequent youth education programs. ATV's Science Education Committee is a committee under ATV's Think Tank. In parallel, the Think Tank is working on a project on the conditions of the sciences in upper secondary school as it will look after the new upper secondary school reform. ATV's Think Tank works with technical and scientific topics and issues of societal relevance and importance.


Pelanis, M., S. Šaltenis, C. S. Jensen, "Indexing the Past, Present and Anticipated Future Positions of Moving Objects" in TimeCenter Technical Report TR-78, 30 pages, 2004

TimeCenter
Publication

With the proliferation of wireless communications and geo-positioning, e-services are envisioned that exploit the positions of a set of continuously moving users to provide context-aware functionality to each individual user. Because advances in disk capacities continue to outperform Moore's Law, it becomes increasingly feasible to store on-line all the position information obtained from the moving e-service users. With the much slower advances in I/O speeds and many concurrent users, indexing techniques are of essence in this scenario. Past indexing techniques capture the position of an object up until the time of the most recent position sample, or they represent an object's position as a constant or linear function of time and capture the position from the current time and into the (near) future. This paper offers an indexing technique capable of capturing the positions of moving objects at all points in time. The index substantially extends partial persistence techniques, which support transaction time, to support valid time for monitoring applications. The performance of a query is independent of the number of past position samples stored for an object. No existing indices offer these characteristics.


Lee, M. L., W. Hsu, C. S. Jensen, B. Cui, "Supporting Frequent Updates in R-Trees: A Bottom-Up Approach" in DB Technical Report TR-6, 23 pages (also TRA4/04, School of Computing, National University of Singapore), 2004

DB Technical Report
Publication

Advances in hardware-related technologies promise to enable new data management applications that monitor continuous processes. In these applications, enormous amounts of state samples are obtained via sensors and are streamed to a database. Further, updates are very frequent and may exhibit locality. While the R-tree is the index of choice for multi-dimensional data with low dimensionality, and is thus relevant to these applications, R-tree updates are also relatively inefficient. We present a bottom-up update strategy for R-trees that generalizes existing update techniques and aims to improve update performance. It has different levels of reorganization - ranging from global to local - during updates, avoiding expensive top-down updates. A compact main-memory summary structure that allows direct access to the R-tree index nodes is used together with efficient bottom-up algorithms. Empirical studies indicate that the bottom-up strategy outperforms the traditional top-down technique, leads to indices with better query performance, achieves higher throughput, and is scalable.
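
The sketch below captures only the core intuition under assumed, simplified structures (it is not the paper's algorithm): a main-memory table maps each object directly to the leaf holding its entry, and if the new position still lies inside that leaf's MBR, the entry is patched in place, avoiding a top-down delete-and-reinsert.

# Illustrative sketch of a bottom-up R-tree update path.
def contains(mbr, x, y):
    xmin, ymin, xmax, ymax = mbr
    return xmin <= x <= xmax and ymin <= y <= ymax

def bottom_up_update(oid, x, y, leaf_of, top_down_update):
    leaf = leaf_of.get(oid)                  # direct access, no root descent
    if leaf is not None and contains(leaf["mbr"], x, y):
        leaf["entries"][oid] = (x, y)        # cheap local, in-place update
        return "in-place"
    return top_down_update(oid, x, y)        # fall back to delete + reinsert

leaf = {"mbr": (0, 0, 100, 100), "entries": {7: (10.0, 10.0)}}
leaf_of = {7: leaf}
print(bottom_up_update(7, 12.0, 11.0, leaf_of, top_down_update=lambda *a: "top-down"))
print(bottom_up_update(7, 250.0, 90.0, leaf_of, top_down_update=lambda *a: "top-down"))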


Pfoser, D., N. Tryfona, C. S. Jensen, "Indeterminacy and Spatiotemporal Data: the Moving Point Object Case" in Technical Report No. 2004/02/03, Computer Technology Institute, 2004

Publication



Civilis, A., C. S. Jensen, J. Nenortaite, S. Pakalnis, "Efficient Tracking of Moving Objects with Precision Guarantees" in DB Technical Report TR-5, 23 pages, 2004

DB Technical Report
Publication

We are witnessing continued improvements in wireless communications and geo-positioning. In addition, the performance/price ratio for consumer electronics continues to improve. These developments pave the way to a kind of location-based service that relies on the tracking of the continuously changing positions of the entire population of service users. This type of service is characterized by large volumes of updates, giving prominence to techniques for location representation and update. In this paper, we present several representations, along with associated update techniques, that predict the future positions of moving objects. For all representations, the predicted position of a moving object is updated whenever the deviation between it and the actual position of the object exceeds a given threshold. For the case where the road network, in which the object is moving, is known, we propose a so-called segment-based policy that represents and predicts an object's movement according to the road's shape. Map matching is used for determining the road on which an object is moving. Empirical performance studies and comparisons of the proposed techniques based on a real road network and GPS logs from cars are reported.


2003 Top

Skyt, J., C. S. Jensen, L. Mark, "A Foundation for Vacuuming Temporal Databases" in Data & Knowledge Engineering, Vol. 44, No. 1, pp. 1-29, 2003

Publication
Online by Elsevier at ScienceDirect

A wide range of real-world database applications, including financial and medical applications, are faced with accountability and traceability requirements. These requirements lead to the replacement of the usual update-in-place policy by an append-only policy that retains all previous states in the database. This policy results in so-called transaction-time databases, which are ever-growing. A variety of physical storage structures and indexing techniques as well as query languages have been proposed for transaction-time databases, but the support for physical removal of data, termed vacuuming, has received only little attention. Such vacuuming is called for by, e.g., the laws of many countries and the policies of many businesses. Although necessary, vacuuming may compromise the database's perfect recollection of the past via, e.g., selective removal of records pertaining to past states. This paper provides a semantic foundation for the vacuuming of transaction-time databases. The main focus is to establish a foundation for the correct processing of queries and updates against vacuumed databases. However, options for user, application, and database interactions in response to queries and updates against vacuumed data are also outlined.


Nytun, J. P., C. S. Jensen, V. A. Oleshchuk, "Towards a Data Consistency Modeling and Testing Framework for MOF Defined Languages" in Norsk informatikkonferanse 2003, Oslo, Norway, 12 pages, 2003

Publication

The number of online data sources is continuously increasing, and related data are often available from several sources. However, accessing data from multiple sources is hindered by the use of different languages and schemas at the sources, as well as by inconsistencies among the data. There is thus a growing need for tools that enable the testing of consistency among data from different sources. This paper puts forward the concept of a framework that supports the integration of UML models and ontologies written in languages such as the W3C Web Ontology Language (OWL). The framework will be based on the Meta Object Facility (MOF); a MOF metamodel (e.g., a metamodel for OWL) can be input as a specification, and the framework will then allow the user to instantiate the specified metamodel. Consistency requirements are specified using a special modeling technique that is characterized by its use of special Boolean class attributes, termed consistency attributes, to which OCL expressions are attached. The framework makes it possible to apply the modeling technique to two or more legacy models and in this way specify consistency between the models. The output of the consistency modeling is called an integration model, which consists of the legacy models and the consistency model. The resulting integration model enables the testing of consistency between instances of the legacy models; the consistency model is automatically instantiated, and consistency attribute values that are false indicate inconsistencies.


Speicys, L., C. S. Jensen, A. Kligys, "Computational Data Modeling for Network-Constrained Moving Objects" in Proceedings of the Eleventh International Symposium on Advances in Geographic Information Systems, New Orleans, LA, pp. 118-125, 2003

Publication
ACM Author-Izer

Advances in wireless communications, positioning technology, and other hardware technologies combine to enable a range of applications that use a mobile user's geo-spatial data to deliver online, location-enhanced services, often referred to as location-based services. Assuming that the service users are constrained to a transportation network, this paper develops data structures that model road networks, the mobile users, and stationary objects of interest. The proposed framework encompasses two supplementary road network representations, namely a two-dimensional representation and a graph representation. These capture aspects of the problem domain that are required in order to support the querying that underlies the envisioned location-based services.


Jensen, C. S., J. Kolar, T. B. Pedersen, I. Timko, "Nearest Neighbor Queries in Road Networks" in Proceedings of the Eleventh International Symposium on Advances in Geographic Information Systems, New Orleans, LA, pp. 1-8, 2003

Publication
ACM Author-Izer

With wireless communications and geo-positioning being widely available, it becomes possible to offer new e-services that provide mobile users with information about other mobile objects. This paper concerns active, ordered k-nearest neighbor queries for query and data objects that are moving in road networks. Such queries may be of use in many services. Specifically, we present an easily implementable data model that serves well as a foundation for such queries. We also present the design of a prototype system that implements the queries based on the data model. The algorithm used for the nearest neighbor search in the prototype is presented in detail. In addition, the paper reports on results from experiments with the prototype system.
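
As a hedged illustration of network-distance nearest-neighbor search in general, and not of the data model or algorithm presented in the paper, the sketch below runs a Dijkstra-style expansion from a query vertex over a weighted road graph and stops once the k closest data-bearing vertices have been settled. The graph encoding and the vertex-based simplification are assumptions made for the example.

```python
# Illustrative sketch: k nearest neighbors by network distance on a road graph.
# The graph, names, and vertex-based simplification are assumptions for this example.
import heapq
from typing import Dict, List, Set, Tuple

def k_nearest_in_network(graph: Dict[str, List[Tuple[str, float]]],
                         source: str,
                         data_vertices: Set[str],
                         k: int) -> List[Tuple[str, float]]:
    """Dijkstra-style expansion; stops once k data-bearing vertices are settled."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled: Set[str] = set()
    result: List[Tuple[str, float]] = []
    while heap and len(result) < k:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue
        settled.add(v)
        if v in data_vertices:
            result.append((v, d))
        for w, weight in graph.get(v, []):
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return result

# Example: a tiny road graph with data objects at vertices b and d.
graph = {"a": [("b", 2.0), ("c", 5.0)], "b": [("a", 2.0), ("d", 1.0)],
         "c": [("a", 5.0), ("d", 2.0)], "d": [("b", 1.0), ("c", 2.0)]}
print(k_nearest_in_network(graph, "a", {"b", "d"}, k=2))  # [('b', 2.0), ('d', 3.0)]
```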


Pfoser, D., C. S. Jensen, "Indexing of Network Constrained Moving Objects" in Proceedings of the Eleventh International Symposium on Advances in Geographic Information Systems, New Orleans, LA, pp. 25-32, 2003

Publication
ACM Author-Izer

With the proliferation of mobile computing, the ability to index efficiently the movements of mobile objects becomes important. Objects are typically seen as moving in two-dimensional (x,y) space, which means that their movements across time may be embedded in the three-dimensional (x,y,t) space. Further, the movements are typically represented as trajectories, sequences of connected line segments. In certain cases, movement is restricted, and specifically in this paper, we aim at exploiting that movements occur in transportation networks to reduce the dimensionality of the data. Briefly, the idea is to reduce movements to occur in one spatial dimension. As a consequence, the movement data becomes two-dimensional (x,t). The advantages of considering such lower-dimensional trajectories are the reduced overall size of the data and the lower-dimensional indexing challenge. Since off-the-shelf database management systems typically do not offer higher-dimensional indexing, this reduction in dimensionality allows us to use such DBMSes to store and index trajectories. Moreover, we argue that, given the right circumstances, indexing these dimensionality-reduced trajectories can be more efficient than using a three-dimensional index. This hypothesis is verified by an experimental study that incorporates trajectories stemming from real and synthetic road networks.
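
A minimal sketch of the dimensionality-reduction idea, under assumptions that simplify the paper's setting: each position on a road is mapped to a one-dimensional measure along that road, so a three-dimensional sample (x, y, t) becomes a two-dimensional sample (measure, t) that an ordinary two-dimensional index can store.

```python
# Sketch only: reduce (x, y, t) samples on a road network to (measure, t).
# The road representation and the offset computation are simplifying assumptions.
import math
from typing import Dict, Tuple

# Each road is a polyline; a single segment per road keeps the sketch short.
roads: Dict[str, Tuple[Tuple[float, float], Tuple[float, float]]] = {
    "r1": ((0.0, 0.0), (100.0, 0.0)),
    "r2": ((100.0, 0.0), (100.0, 50.0)),
}

def to_measure(road_id: str, x: float, y: float) -> float:
    """Project a point on a road to its distance from the road's start point."""
    (x0, y0), _ = roads[road_id]
    return math.hypot(x - x0, y - y0)

def reduce_sample(road_id: str, x: float, y: float, t: float) -> Tuple[str, float, float]:
    """Return the 2D representation (road, measure, t) of a 3D sample (x, y, t)."""
    return road_id, to_measure(road_id, x, y), t

# A trajectory of (road, x, y, t) samples becomes (road, measure, t) samples,
# which can be stored in a per-road two-dimensional (measure, time) index.
trajectory = [("r1", 10.0, 0.0, 1.0), ("r1", 60.0, 0.0, 2.0), ("r2", 100.0, 20.0, 3.0)]
print([reduce_sample(r, x, y, t) for r, x, y, t in trajectory])
```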


Nytun, J. P., C. S. Jensen, "Modeling and Testing Legacy Data Consistency Requirements" in Proceedings of the Sixth International Conference on the Unified Modeling Language, San Francisco, CA, USA, pp. 341-355, 2003

Publication
Online at Springer

An increasing number of data sources are available on the Internet, many of which offer semantically overlapping data, but based on different schemas, or models. While it is often of interest to integrate such data sources, the lack of consistency among them makes this integration difficult. This paper addresses the need for new techniques that enable the modeling and consistency checking for legacy data sources. Specifically, the paper contributes to the development of a framework that enables consistency testing of data coming from different types of data sources. The vehicle is UML and its accompanying XMI. The paper presents techniques for modeling consistency requirements using OCL and other UML modeling elements: it studies how models that describe the required consistencies among instances of legacy models can be designed in standard UML tools that support XMI. The paper also considers the automatic checking of consistency in the context of one of the modeling techniques. The legacy model instances that are inputs to the consistency check must be represented in XMI.


Lee, M. L., W. Hsu, C. S. Jensen, B. Cui, K. L. Teo, "Supporting Frequent Updates in R-Trees: A Bottom-Up Approach" in Proceedings of the Twenty-Ninth International Conference on Very Large Data Bases, Berlin, Germany, pp. 608-619, 2003

Publication

Advances in hardware-related technologies promise to enable new data management applications that monitor continuous processes. In these applications, enormous amounts of state samples are obtained via sensors and are streamed to a database. Further, updates are very frequent and may exhibit locality. While the R-tree is the index of choice for multi-dimensional data with low dimensionality, and is thus relevant to these applications, R-tree updates are also relatively inefficient. We present a bottom-up update strategy for R-trees that generalizes existing update techniques and aims to improve update performance. It has different levels of reorganization, ranging from global to local, during updates, avoiding expensive top-down updates. A compact main-memory summary structure that allows direct access to the R-tree index nodes is used together with efficient bottom-up algorithms. Empirical studies indicate that the bottom-up strategy outperforms the traditional top-down technique, leads to indices with better query performance, achieves higher throughput, and is scalable.
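
The sketch below only illustrates the general bottom-up idea under simplifying assumptions, not the paper's actual structures: a main-memory table maps each object identifier directly to the leaf holding it, an update that stays inside the leaf's bounding box is applied in place, and only an update that leaves the box falls back to a (here heavily simplified) top-down reinsertion.

```python
# Sketch of the bottom-up update idea; the leaf table, box handling, and the
# top-down fallback stub are assumptions, not the paper's actual algorithms.
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

class Leaf:
    def __init__(self, box: Box):
        self.box = box
        self.entries: Dict[str, Tuple[float, float]] = {}  # object id -> position

def contains(box: Box, x: float, y: float) -> bool:
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

class BottomUpIndex:
    def __init__(self):
        self.leaf_of: Dict[str, Leaf] = {}  # main-memory summary: id -> leaf

    def insert(self, oid: str, leaf: Leaf, x: float, y: float) -> None:
        leaf.entries[oid] = (x, y)
        self.leaf_of[oid] = leaf

    def update(self, oid: str, x: float, y: float) -> str:
        leaf = self.leaf_of[oid]
        if contains(leaf.box, x, y):
            leaf.entries[oid] = (x, y)       # cheap, local, bottom-up update
            return "bottom-up"
        del leaf.entries[oid]                 # otherwise fall back to the usual
        self.top_down_insert(oid, x, y)       # top-down delete plus reinsert
        return "top-down"

    def top_down_insert(self, oid: str, x: float, y: float) -> None:
        # Stand-in for a full R-tree reinsertion; here we just pick any fitting leaf.
        for leaf in set(self.leaf_of.values()):
            if contains(leaf.box, x, y):
                self.insert(oid, leaf, x, y)
                return

idx = BottomUpIndex()
a, b = Leaf((0.0, 0.0, 10.0, 10.0)), Leaf((10.0, 0.0, 20.0, 10.0))
idx.insert("o1", a, 2.0, 2.0)
idx.insert("o2", b, 15.0, 5.0)
print(idx.update("o1", 3.0, 3.0))   # bottom-up
print(idx.update("o1", 12.0, 3.0))  # top-down
```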


Hage, C., C. S. Jensen, T. B. Pedersen, L. Speicys, I. Timko, "Integrated Data Management for Mobile Services in the Real World" in Proceedings of the Twenty-Ninth International Conference on Very Large Data Bases, Berlin, Germany, pp. 1019-1030, 2003

Publication

Market research companies predict a huge market for services to be delivered to mobile users. Services include route guidance, point-of-interest search, metering services such as road pricing and parking payment, traffic monitoring, etc. We believe that no single such service will be the killer service, but that suites of integrated services are called for. Such integrated services reuse integrated content obtained from multiple content providers. This paper describes concepts and techniques underlying the data management system deployed by a Danish mobile content integrator. While geo-referencing of content is important, it is even more important to relate content to the transportation infrastructure. The data management system thus relies on several sophisticated, integrated representations of the infrastructure, each of which supports its own kind of use. The paper covers data modeling, querying, and update, as well as the applications using the system.


Friis-Christensen, A., C. S. Jensen, "Object-Relational Management of Multiply Represented Geographic Entities" in Proceedings of the Fifteenth International Conference on Scientific and Statistical Database Management, Cambridge, MA, USA, pp. 183-192, 2003

Publication

Multiple representation occurs when information about the same geographic entity is represented electronically more than once. This occurs frequently in practice, and it invariably results in the occurrence of inconsistencies among the different representations. We propose to resolve this situation by introducing a multiple representation management system (MRMS), the schema of which includes rules that specify how to identify representations of the same entity, rules that specify consistency requirements, and rules used to restore consistency when necessary. In this paper, we demonstrate by means of a prototype and a real-world case study that it is possible to implement a multiple representation schema language on top of an object-relational database management system. Specifically, it is demonstrated how it is possible to map the constructs of the language used for specifying the multiple representation schema to functionality available in Oracle. Though some limitations exist, Oracle has proven to be a suitable platform for implementing an MRMS.


Böhlen, M. H., C. S. Jensen, "Temporal Data Model and Query Language Concepts" in Encyclopedia of Information Systems, Vol. 4, pp. 437-453, Academic Press, Inc., 2003

Publication



Sellis, T., M. Koubarakis, A. Frank, S. Grumbach, R. H. Güting, C. S. Jensen, N. Lorentzos, Y. Manolopoulos, E. Nardelli, B. Pernici, H.-J. Schek, M. Scholl, B. Theodoulidis, N. Tryfona, editors, "Spatiotemporal Databases: The Chorochronos Approach" in Lecture Notes in Computer Science, Volume 2520, Springer-Verlag, 352+xiv pages, 2003




Jensen, C. S., editor, Special Issue of the IEEE Data Engineering Bulletin on Infrastructure for Research in Spatio-Temporal Query Processing, 26(2), 54 pages, 2003

Ed. letter
Online at Microsoft Research



Tryfona, N., R. Price, C. S. Jensen, "Conceptual Models for Spatio-temporal Applications" in T. Sellis et al., editors, Spatiotemporal Databases: The Chorochronos Approach, Lecture Notes in Computer Science, Volume 2520, Springer-Verlag, pp. 79-116, 2003

Publication
Online at Springer



Güting, R. H., M. H. Böhlen, M. Erwig, L. Forlizzi, C. S. Jensen, N. Lorentzos, E. Nardelli, M. Schneider, M. Vazirgiannis, "Spatiotemporal Models and Languages: An Approach Based on Data Types" in T. Sellis et al., editors, Spatiotemporal Databases: The Chorochronos Approach, Lecture Notes in Computer Science, Volume 2520, Springer-Verlag, pp. 117-176, 2003

Publication
Online at Springer



Di Pasquale, A., L. Forlizzi, C. S. Jensen, Y. Manolopoulos, E. Nardelli, D. Pfoser, G. Proietti, S. Šaltenis, Y. Theodoridis, T. Tzouramanis, M. Vassilakopoulos, "Access Methods and Query Processing" in T. Sellis et al., editors, Spatiotemporal Databases: The Chorochronos Approach, Lecture Notes in Computer Science, Volume 2520, Springer-Verlag, pp. 203-261, 2003

Publication
Online at Springer



M. Breunig, T. Can, M. H. Böhlen, S. Dieker, R. H. Güting, C. S. Jensen, L. Relly, P. Rigaux, H.-J. Schek, M. Scholl, "Architecture and Implementation of Spatio-Temporal DBMS" in T. Sellis et al., editors, Spatiotemporal Databases: The Chorochronos Approach, Lecture Notes in Computer Science, Volume 2520, Springer-Verlag, pp. 263-318, 2003

Publication
Online at Springer



C. S. Jensen, T. B. Pedersen, L. Speicys, I. Timko, "Data Modeling for Mobile Services in the Real World" in Proceedings of the Eighth International Symposium on Spatial and Temporal Databases, Santorini, Greece, pp. 1-9, Lecture Notes in Computer Science, Volume 2750, 2003

Publication
Online at Springer

Research contributions on data modeling, data structures, query processing, and indexing for mobile services may have an impact in the longer term, but each contribution typically offers an isolated solution to one small part of the practical problem of delivering mobile services in the real world. In contrast, this paper describes holistic concepts and techniques for mobile data modeling that are readily applicable in practice. Focus is on services to be delivered to mobile users, such as route guidance, point-of-interest search, road pricing, parking payment, traffic monitoring, etc. While geo-referencing of content is important, it is even more important to relate content to the transportation infrastructure. In addition, several sophisticated, integrated representations of the infrastructure are needed.


Jensen, C. S., A. Schmidt, "Spatio-Temporal Data Exchange Standards" in C. S. Jensen, editor: Special Issue on Infrastructure for Research in Spatio-Temporal Query Processing, IEEE Data Engineering Bulletin, 26(2), pp. 51-55, 2003

Publication

We believe that research that concerns aspects of spatio-temporal data management may benefit from taking into account the various standards for spatio-temporal data formats. For example, this may contribute to rendering prototype software "open" and more readily useful. This paper thus identifies and briefly surveys standardization in relation to primarily the exchange and integration of spatio-temporal data. An overview of several data exchange languages is offered, along with reviews of their potential for facilitating the collection of test data and the leveraging of prototypes. The standards, most of which are XML-based, lend themselves to the integration of prototypes into middleware architectures, e.g., as Web services.


Jensen, C. S., ,"Introduction to Special Issue with Best Papers from EDBT 2002" in Information Systems, 28(1-2), pp. 1-2,, 2003

Publication
Online by Elsevier at ScienceDirect



2002 Top

Agarwal, P. K., L. J. Guibas, H. Edelsbrunner, J. Erickson, M. Isard, S. Har-Peled, J. Hershberger, C. S. Jensen, L. E. Kavraki, P. Koehl, M. Lin, D. Manocha, D. N. Metaxas, B. Mirtich, D. M. Mount, S. Muthukrishnan, D. K. Pai, E. Sacks, J. Snoeyink, S. Suri, O. Wolfson, "Algorithmic Issues in Modeling Motion" in ACM Computing Surveys, Vol. 34, No. 4, pp. 550-572, 2002

Publication
ACM Author-Izer

This article is a survey of research areas in which motion plays a pivotal role. The aim of the article is to review current approaches to modeling motion together with related data structures and algorithms, and to summarize the challenges that lie ahead in producing a more unified theory of motion representation that would be useful across several disciplines.


Skyt, J., C. S. Jensen, ,"Persistent Views-A Mechanism for Managing Ageing Data" in The Computer Journal, Vol. 45, No. 5, pp. 481-493,, 2002

Publication
Online at Oxford Journals

Enabled by the continued advances in storage technologies, the amounts of on-line data grow at a rapidly increasing pace. This development is witnessed in many data warehouse-type applications, including so-called data webhouses that accumulate click streams from portals. The presence of very large and continuously growing amounts of data introduces new challenges, one of them being the need for effective management of aged data. In very large and growing databases, some data eventually becomes inaccurate or outdated and may be of reduced interest to the database applications. This paper offers a mechanism, termed persistent views, that aids in flexibly reducing the volume of data, for example, by enabling the replacement of such 'low-interest', detailed data with aggregated data. The paper motivates persistent views and precisely defines and contrasts these with the related mechanisms of views, snapshots and physical deletion. The paper also offers a provably correct foundation for implementing persistent views.


Šaltenis, S., C. S. Jensen, ,"Indexing of now-relative spatio-bitemporal data" in The VLDB Journal, Vol. 11, No. 1, pp. 1-16,, 2002

Publication
Online at Springer

Real-world entities are inherently spatially and temporally referenced, and database applications increasingly exploit databases that record the past, present, and anticipated future locations of entities, e.g., the residences of customers obtained by the geo-coding of addresses. Indices that efficiently support queries on the spatio-temporal extents of such entities are needed. However, past indexing research has progressed in largely separate spatial and temporal streams. Adding time dimensions to spatial indices, as if time were a spatial dimension, neither supports nor exploits the special properties of time. On the other hand, temporal indices are generally not amenable to extension with spatial dimensions. This paper proposes the first efficient and versatile index for a general class of spatio-temporal data: the discretely changing spatial aspect of an object may be a point or may have an extent; both transaction time and valid time are supported, and a generalized notion of the current time, now, is accommodated for both temporal dimensions. The index is based on the R*-tree and provides means of prioritizing space versus time, which enables it to adapt to spatially and temporally restrictive queries. Performance experiments are reported that evaluate pertinent aspects of the index.


Snaprud, M. H., C. S. Jensen, N. Ulltveit-Moe, J. P. Nytun, M. E. Rafoshei-Klev, A. Sawicka, O. Hanssen, "Towards a Web Accessibility Monitor" in Proceedings of the Second European Medical and Biological Engineering Conference, Vienna, Austria, 2002

Publication

A tool for the assessment and monitoring of web content accessibility is proposed. The experimental prototype utilises an Internet robot and stores the collected accessibility data in a data warehouse for further analysis. The evaluation is partly based on the Web Accessibility guidelines from W3C.


Jensen, C. S., A. Kligys, T. B. Pedersen, I. Timko, "Multidimensional Data Modeling for Location-Based Services" in Proceedings of the Tenth ACM International Symposium on Advances in Geographic Information Systems, McLean, VA, pp. 55-61, 2002

Publication
Online at ACM Digital Library

With the recent and continuing advances in areas such as wireless communications and positioning technologies, mobile, location-based services are becoming possible. Such services deliver location-dependent content to their users. More specifically, these services may capture the movements of their users in multidimensional databases, and their delivery of content in response to user requests may be based on the issuing of complex, multidimensional queries. The application of multidimensional technology in this and other contexts poses a range of new challenges. This paper aims to provide an appropriate multidimensional data model. In particular, the paper extends an existing multidimensional data model to accommodate spatial values that exhibit partial containment relationships instead of the total containment relationships normally assumed in multidimensional data models.


Friis-Christensen, A. A., D. Skogan, C. S. Jensen, G. Skagenstein, N. Tryfona, "Management of Multiply Represented Geographic Entities" in Proceedings of the 2002 International Data Engineering and Applications Symposium, Edmonton, Canada, pp. 150-159, 2002

Publication

Multiple representation of geographic information occurs when a real-world entity is represented more than once in the same or different databases. In this paper, we propose a new approach to the modeling of multiply represented entities and the relationships among the entities and their representations. A Multiple Representation Management System is outlined that can manage multiple representations consistently over a number of autonomous databases. Central to our approach is the Multiple Representation Schema Language that is used to configure the system. It provides an intuitive and declarative means of modeling multiple representations and specifying rules that are used to maintain consistency, match objects representing the same entity, and restore consistency if necessary.


Benetis, R., C. S. Jensen, G. Karčiauskas, S. Šaltenis, "Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects" in Proceedings of the 2002 International Data Engineering and Applications Symposium, Edmonton, Canada, pp. 44-53, 2002

Publication

With the proliferation of wireless communications and the rapid advances in technologies for tracking the positions of continuously moving objects, algorithms for efficiently answering queries about large numbers of moving objects increasingly are needed. One such query is the reverse nearest neighbor (RNN) query that returns the objects that have a query object as their closest object. While algorithms have been proposed that compute RNN queries for non-moving objects, there have been no proposals for answering RNN queries for continuously moving objects. Another such query is the nearest neighbor (NN) query, which has been studied extensively and in many contexts. Like the RNN query, the NN query has not been explored for moving query and data points. This paper proposes an algorithm for answering RNN queries for continuously moving points in the plane. As a part of the solution to this problem and as a separate contribution, an algorithm for answering NN queries for continuously moving points is also proposed. The results of performance experiments are reported.
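
Purely as a baseline illustration of the two query types, and not of the moving-object algorithms proposed in the paper, the sketch below evaluates NN and RNN queries by brute force over a static snapshot of point positions.

```python
# Brute-force NN and RNN over a snapshot of positions; this only illustrates the
# query semantics, not the moving-point algorithms proposed in the paper.
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def dist(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest_neighbor(query: Point, data: Dict[str, Point]) -> str:
    return min(data, key=lambda oid: dist(query, data[oid]))

def reverse_nearest_neighbors(qid: str, points: Dict[str, Point]) -> List[str]:
    """Objects whose closest other object is the query object qid."""
    result = []
    for oid, p in points.items():
        if oid == qid:
            continue
        others = {o: q for o, q in points.items() if o != oid}
        if nearest_neighbor(p, others) == qid:
            result.append(oid)
    return result

points = {"q": (0.0, 0.0), "a": (1.0, 0.0), "b": (4.0, 0.0)}
print(nearest_neighbor(points["q"], {o: p for o, p in points.items() if o != "q"}))  # a
print(reverse_nearest_neighbors("q", points))  # ['a']: b is closer to a than to q
```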


Šaltenis, S., C. S. Jensen, ,"Indexing of Moving Objects for Location-Based Services" in in Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, pp. 463-472,, 2002

Publication
Online at IEEE

Visionaries predict that the Internet will soon extend to billions of wireless devices, or objects, a substantial fraction of which will offer their changing positions to location-based services. This paper assumes an Internet-service scenario where objects that have not reported their position within a specified duration of time are expected to no longer be interested in, or of interest to, the service. Due to the possibility of many "expiring" objects, a highly dynamic database results. The paper presents an R-tree based technique for the indexing of the current positions of such objects. Different types of bounding regions are studied, and new algorithms are provided for maintaining the tree structure. Performance experiments indicate that, when compared to the approach where the objects are not assumed to expire, the new indexing technique can improve search performance by a factor of two or more without sacrificing update performance.


Skyt, J., C. S. Jensen, T. B. Pedersen, "Specification-Based Data Reduction in Dimensional Data Warehouses" in Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, p. 278, 2002

Publication



Jensen, C. S., editor, Special Issue of the IEEE Data Engineering Bulletin on Indexing of Moving Objects, Vol. 25, No. 2, 60 pages, 2002

Ed. letter
Online at Microsoft Research



Jensen, C. S., K. Jeffrey, J. Pokorny, S. Šaltenis, E. Bertino, K. Böhm, M. Jarke, editors, "Advances in Database Technology" in Eighth International Conference on Extending Database Technology, Prague, Czech Republic, Lecture Notes in Computer Science, Volume 2287, Springer-Verlag, 776+xvi pages, 2002

Online at Springer



Šaltenis, S., C. S. Jensen, ,"Indexing of Objects on the Move" in Mining Spatio-Temporal Information Systems, pp. 21-41,, 2002

Mining Spatio-Temporal Information Systems
Publication

Visionaries predict that the Internet will soon extend to billions of wireless devices, or objects, a substantial fraction of which will offer their changing positions to location-enabled services. This chapter assumes an Internet-service scenario where objects that have not reported their position within a specified duration of time are expected to no longer be interested in, or of interest to, the service. Due to the possibility of many "expiring" objects, a highly dynamic database results. The chapter describes two R-tree based techniques for the indexing of the current positions of such objects. These indexing techniques accommodate object positions that are described by linear functions of time. They employ novel types of bounding regions, as well as new algorithms for maintaining their tree structure. The results of quite encouraging performance experiments are described briefly.
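
The following sketch is a hedged illustration of the representation described in the abstract, with parameter names and the exact expiration rule chosen for the example: each object stores a reference position, a velocity, and the time of its last report, so its position can be computed for any query time, and objects that have been silent longer than a maximum period are treated as expired.

```python
# Sketch of time-parameterized positions with expiration; names and the
# expiration rule are illustrative assumptions.
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MAX_SILENCE = 10.0  # objects silent longer than this are considered expired

@dataclass
class MovingObject:
    x0: float
    y0: float
    vx: float
    vy: float
    t_report: float  # time of the most recent position report

    def position_at(self, t: float) -> Optional[Tuple[float, float]]:
        if t - self.t_report > MAX_SILENCE:
            return None  # expired: no longer of interest to the service
        dt = t - self.t_report
        return (self.x0 + self.vx * dt, self.y0 + self.vy * dt)

objects: Dict[str, MovingObject] = {
    "car1": MovingObject(0.0, 0.0, 1.0, 0.5, t_report=100.0),
    "car2": MovingObject(5.0, 5.0, 0.0, 0.0, t_report=85.0),
}
t_query = 104.0
print({oid: o.position_at(t_query) for oid, o in objects.items()})
# car1 -> (4.0, 2.0); car2 -> None (expired)
```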


Price, R., N. Tryfona, C. S. Jensen, ,"Extending UML for Space and Time-Dependent Applications" in Chapter XVII, pp. 342-366, in Advanced Topics in Database Research, Vol. 1, edited by K. Siau, Idea Group Publishing,, 2002

Publication



Jensen, C. S., ,"Location-Enabled Services - A Data Management Perspective" in in The Nordic GIS Conference 2002: GI - Communication and Perspective, Aalborg, Denmark, p. 16,, 2002




Jensen, C. S., J. P. Nytun, M. Snaprud, "Towards Virtual Worlds and Augmented Realities: A Research Agenda" in M. Pätzold (ed.), Proceedings of the Second International Workshop on Research Directions in Mobile Communications and Services, Grimstad, Norway, pp. 19-22, 2002

Publication

Powerful drivers combine to enable the capture of reality in computers as well as the ubiquitous delivery of information content and services, based on the captured reality. We outline key drivers, exemplify application areas related to the research agenda, and describe briefly general software challenges as well as one specific data representation challenge to software technologies posed by the research agenda.


Jensen, C. S., S. Šaltenis, ,"Towards Increasingly Update Efficient Moving-Object Indexing" in in Special Issue on Indexing of Moving Objects, IEEE Data Engineering Bulletin, Vol. 25, No. 2, edited by C. S. Jensen, pp. 35-40,, 2002

Publication

Current moving-object indexing concentrates on point-objects capable of continuous movement in one-, two-, and three-dimensional Euclidean spaces, and most approaches are based on well-known, conventional spatial indices. Approaches that aim at indexing the current and anticipated future positions of moving objects generally must contend with very large update loads because of the agility of the objects indexed. At the same time, conventional spatial indices were often originally proposed in settings characterized by few updates and focus on query performance. In this paper, we characterize the challenge of moving-object indexing and discuss a range of techniques, the use of which may lead to better update performance.


Jensen, C. S., ,"Location-Based Services - A Data Management Perspective" in in MapDays 2002 (Kartdagar 2002), Jönköping, Sweden, p. 44,, 2002

Publication

We are heading rapidly towards a global computing and information infrastructure that will contain billions of wirelessly connected devices, many of which will offer so-called location-based services to their mobile users always and everywhere. Indeed, users will soon take ubiquitous wireless access to information and services for granted. This scenario is made possible by the rapid advances in the underlying hardware technologies, which continue to follow variants of Moore's Law. Examples of location-based services include tracking, way-finding, traffic management, safety-related services, and mixed-reality games, to name but a few. This paper outlines a general usage scenario for location-based services, in which data warehousing plays a central role, and it describes central challenges to be met by the involved software technologies in order for them to reach their full potential for usage in location-based services.


Jensen, C. S., S. Šaltenis, ,"Data Representation and Indexing in Location-Enabled M-Services" in in National Science Foundation Workshop on Context-Aware Mobile Database Management, Providence, RI, USA, 3 pages,, 2002

Publication

Rapid, sustained advances in key computing technologies combine to enable a new class of computing services that aim to meet the needs of mobile users. These ubiquitous and intelligent services adapt to each user's particular preferences and current circumstances; they are personalized. The services exploit data available from multiple sources, including data on past interactions with the users, data accessible via the Internet, and data obtained from sensors. The user's geographical location is particularly central to these services. We outline some of the research challenges that aim to meet the data representation and indexing needs of such services.


Slivinskas, G., C. S. Jensen, R. T. Snodgrass, ,"Bringing Order to Query Optimization" in ACM SIGMOD Record 31(2), pp. 5-14,, 2002

Publication
ACM Author-Izer

A variety of developments combine to highlight the need for respecting order when manipulating relations. For example, new functionality is being added to SQL to support OLAP-style querying in which order is frequently an important aspect. The set- or multiset-based frameworks for query optimization that are currently being taught to database students are increasingly inadequate. This paper presents a foundation for query optimization that extends existing frameworks to also capture ordering. A list-based relational algebra is provided along with three progressively stronger types of algebraic equivalences, concrete query transformation rules that obey the different equivalences, and a procedure for determining which types of transformation rules are applicable for optimizing a query. The exposition follows the style chosen by many textbooks, making it relatively easy to teach this material in continuation of the material covered in the textbooks, and to integrate this material into the textbooks.
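
To make the notion of progressively stronger equivalences concrete, the informal sketch below compares two query results as lists, as multisets, and as sets; it illustrates why order and duplicates matter, but it is not the paper's algebra or its transformation rules.

```python
# Informal illustration of order- and duplicate-sensitive result comparison;
# not the paper's list-based algebra or its formal equivalence definitions.
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, int]

def equivalent_as_lists(r1: List[Row], r2: List[Row]) -> bool:
    return r1 == r2                      # order and duplicates both matter

def equivalent_as_multisets(r1: List[Row], r2: List[Row]) -> bool:
    return Counter(r1) == Counter(r2)    # duplicates matter, order does not

def equivalent_as_sets(r1: List[Row], r2: List[Row]) -> bool:
    return set(r1) == set(r2)            # neither order nor duplicates matter

plan_a = [("ale", 5), ("ale", 5), ("ipa", 3)]   # e.g., output sorted by name
plan_b = [("ipa", 3), ("ale", 5), ("ale", 5)]   # same rows, different order
plan_c = [("ipa", 3), ("ale", 5)]               # duplicates removed

print(equivalent_as_lists(plan_a, plan_b))      # False
print(equivalent_as_multisets(plan_a, plan_b))  # True
print(equivalent_as_sets(plan_a, plan_c))       # True
```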


Jensen, C. S., ,"Research Challenges in Location-Enabled M-Services" in in Proceedings of the Third International Conference on Mobile Data Management, Singapore, pp. 3-7,, 2002

Publication

Rapid, sustained advances in key computing hardware technologies combine to enable a new class of computing services that aim to meet the needs of mobile users. These ubiquitous and intelligent services adapt to each user's particular preferences and current circumstances; they are personalized. The services exploit data available from multiple sources, including data on past interactions with the users, data accessible via the Internet, and data obtained from sensors. The user's geographical location is particularly central to these services. We outline some of the research challenges that aim to meet the computing needs of such services. In particular, focus is on update and query processing in the context of geo-referenced data, where certain challenges related to the data representation, indexing, and precomputation are described.


Gao, D., C. S. Jensen, R. T. Snodgrass, M. D. Soo, "Join Operations in Temporal Databases" in TimeCenter Technical Report TR-71, 50 pages, 2002

TimeCenter
Publication

Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the evaluation of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally-varying data dramatically increases the size of the database. These factors indicate that specialized techniques are needed to efficiently evaluate temporal joins. We address this need for efficient join evaluation in temporal databases. Our purpose is two-fold. We first survey all previously proposed temporal join operators. While many temporal join operators have been defined in previous work, this work has been done largely in isolation from competing proposals, with little, if any, comparison of the various operators. We then address evaluation algorithms, comparing the applicability of various algorithms to the temporal join operators, and describing a performance study involving algorithms for one important operator, the temporal equijoin. Our focus, with respect to implementation, is on non-index based join algorithms. Such algorithms do not rely on auxiliary access paths, but may exploit sort orderings to achieve efficiency.
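
As a minimal, hedged illustration of a valid-time equijoin, the sketch below uses a nested-loop baseline rather than any of the specialized algorithms studied in the report: tuples are joined on an equality attribute, and the intersection of their valid-time intervals is output whenever the intervals overlap.

```python
# Nested-loop valid-time equijoin; a baseline illustration only, not the
# specialized temporal join algorithms evaluated in the report.
from typing import List, Tuple

# A tuple is (key, payload, valid_start, valid_end), with the end exclusive.
Tup = Tuple[str, str, int, int]

def temporal_equijoin(r: List[Tup], s: List[Tup]) -> List[Tuple[str, str, str, int, int]]:
    out = []
    for k1, a, s1, e1 in r:
        for k2, b, s2, e2 in s:
            start, end = max(s1, s2), min(e1, e2)
            if k1 == k2 and start < end:        # equal keys and overlapping intervals
                out.append((k1, a, b, start, end))
    return out

r = [("emp1", "dept=A", 1, 10), ("emp2", "dept=B", 3, 8)]
s = [("emp1", "salary=50", 5, 12), ("emp2", "salary=40", 1, 4)]
print(temporal_equijoin(r, s))
# [('emp1', 'dept=A', 'salary=50', 5, 10), ('emp2', 'dept=B', 'salary=40', 3, 4)]
```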


Jensen, C. S., A. Kligys, T. B. Pedersen, I. Timko, "Multidimensional Data Modeling For Location-Based Services" in DB Technical Report TR-2, 30 pages, 2002

DB Technical Report
Publication

With the recent and continuing advances in areas such as wireless communications and positioning technologies, mobile, location-based services are becoming possible. Such services deliver location- dependent content to their users. More specifically, these services may capture the movements of their users in multidimensional databases, and their delivery of content in response to user requests may be based on the issuing of complex, multidimensional queries. The application of multidimensional technology in this context poses a range of new challenges. The specific challenge addressed here concerns the provision of an appropriate multidimensional data model. In particular, the paper extends an existing multidimensional data model and algebraic query language to accommodate spatial values that exhibit partial containment relationships instead of the total containment relationships normally assumed in multidimensional data models. Partial containment introduces imprecision in aggregation paths. The paper proposes a method for evaluating the imprecision of such paths. The paper also offers transformations of dimension hierarchies with partial containment relationships to simple hierarchies, to which existing precomputation techniques are applicable.
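
A hedged sketch of the partial-containment idea, where the weights and the rollup rule are assumptions chosen for illustration rather than the paper's model: when a spatial value is only partly contained in each of several parent values, a measure attached to it can be distributed over the parents according to containment fractions before aggregation.

```python
# Sketch: rolling a measure up a dimension with partial containment, where a
# child maps to several parents with fractional weights. Weights are assumed.
from collections import defaultdict
from typing import Dict, List, Tuple

# child -> list of (parent, fraction of the child contained in that parent)
partial_containment: Dict[str, List[Tuple[str, float]]] = {
    "cell_17": [("district_A", 0.7), ("district_B", 0.3)],
    "cell_18": [("district_B", 1.0)],
}

measures = {"cell_17": 200.0, "cell_18": 50.0}  # e.g., number of users per cell

def roll_up(measures: Dict[str, float],
            mapping: Dict[str, List[Tuple[str, float]]]) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    for child, value in measures.items():
        for parent, fraction in mapping[child]:
            totals[parent] += value * fraction
    return dict(totals)

print(roll_up(measures, partial_containment))
# {'district_A': 140.0, 'district_B': 110.0}
```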


Slivinskas, G., C. S. Jensen, R. T. Snodgrass, ,"Bringing Order to Query Optimization" in DB Technical Report TR-1, 19 pages,, 2002

DB Technical Report
Publication

A variety of developments combine to highlight the need for respecting order when manipulating relations. For example, new functionality is being added to SQL to support OLAP-style querying in which order is frequently an important aspect. The set- or multiset-based frameworks for query optimization that are currently being taught to database students are increasingly inadequate. This paper presents a foundation for query optimization that extends existing frameworks to also capture ordering. A list-based relational algebra is provided along with three progressively stronger types of algebraic equivalences, concrete query transformation rules that obey the different equivalences, and a procedure for determining which types of transformation rules are applicable for optimizing a query. The exposition follows the style chosen by many textbooks, making it relatively easy to teach this material in continuation of the material covered in the textbooks, and to integrate this material into the textbooks.


Slivinskas, G., C. S. Jensen, ,"Enhancing an Extensible Query Optimizer with Support for Multiple Equivalence Types" in TimeCenter Technical Report TR-70, 21 pages,, 2002

TimeCenter
Publication

Database management systems are continuously being extended with support for new types of data and more advanced querying capabilities. In large part because of this, query optimization has remained a very active area of research throughout the past two decades. At the same time, current commercial optimizers are hard to modify, to incorporate desired changes in, e.g., query algebras, transformation rules, search strategies. This has led to a number of research contributions that aim at creating extensible query optimizers. Examples include Starburst, Volcano, and OPT++. This paper reports on a study that has enhanced Volcano to support a relational algebra with added temporal operators, such as temporal join and aggregation. This includes the handling of algorithms and cost formulas for these new operators, six types of query equivalences, and accompanying query transformation rules. The paper shows how the Volcano search-space generation and plan-search algorithms were extended to support the six equivalence types, describes other key implementation tasks, and evaluates the extensibility of Volcano.


Gregersen, H., C. S. Jensen, "On the Ontological Expressiveness of Temporal Extensions to the Entity-Relationship Model" in TimeCenter Technical Report TR-69, 21 pages, 2002

TimeCenter
Publication

It is widely recognized that temporal aspects of database schemas are prevalent, but also difficult to capture using the ER model. The database research community's response has been to develop temporally enhanced ER models. However, these models have not been subjected to systematic evaluation. In contrast, the evaluation of modeling methodologies for information systems development is a very active area of research in the information systems engineering community, where the need for systematic evaluations of modeling methodologies is well recognized. Based on a framework from information systems engineering, this paper evaluates the ontological expressiveness of three different temporal enhancements to the ER model: the Entity-Relation-Time model, the TERC+ model, and the Time Extended ER model. Each of these temporal ER model extensions is well-documented, and together the models represent a substantial range of the design space for temporal ER extensions. The evaluation considers the uses of the models for both analysis and design, and the focus is on how well the models capture temporal aspects of reality as well as of relational database designs.


2001 Top

Dyreson, C. E., M. H. Böhlen, C. S. Jensen, "MetaXPath" in Journal of Digital Information, Vol. 2, No. 2, 2001

Publication
Online at JoDI

This paper presents the METAXPath data model and query language. METAXPath extends XPath with support for XML metadata. XPath is a specification language for locations in an XML document. It serves as the basis for XML query languages like XSLT and the XML Query Algebra. The METAXPath data model is a nested XPath tree. Each level of metadata induces a new level of nesting. The data model separates metadata and data into different dataspaces, supports meta-metadata, and enables sharing of metadata common to a group of nodes without duplication. The METAXPath query language has a level shift operator to shift a query from a data level to a metadata level. METAXPath maximally reuses XPath; hence, the changes needed to support metadata are few. METAXPath is fully compatible with XPath.


Pedersen, T. B., C. S. Jensen, ,"Multidimensional Databases" in IEEE Computer, Vol. 34, No. 12, pp. 40-46,, 2001

Publication

Multidimensional database technology is a key factor in the interactive analysis of large amounts of data for decision-making purposes. In contrast to previous technologies, these databases view data as multidimensional cubes that are particularly well suited for data analysis. Multidimensional models categorize data either as facts with associated numerical measures or as textual dimensions that characterize the facts. Queries aggregate measure values over a range of dimension values to provide results such as total sales per month of a given product. Multidimensional database technology is being applied to distributed data and to new types of data that current technology often cannot adequately analyze. For example, classic techniques such as preaggregation cannot ensure fast query response times when data, such as that obtained from sensors or GPS-equipped moving objects, changes continuously. Multidimensional database technology will increasingly be applied where analysis results are fed directly into other systems, thereby eliminating humans from the loop. When coupled with the need for continuous updates, this context poses stringent performance requirements not met by current technology.
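
As a small, hedged illustration of the fact-and-dimension view described above, written in plain Python rather than against any particular OLAP engine, the sketch below aggregates a numeric measure over one dimension level, computing total sales per month for a given product.

```python
# Minimal illustration of aggregating a measure over a dimension level;
# a plain Python stand-in, not a multidimensional database engine.
from collections import defaultdict
from typing import Dict, List, Tuple

# Facts: (product, date 'YYYY-MM-DD', sales amount)
facts: List[Tuple[str, str, float]] = [
    ("beer", "2001-01-03", 120.0),
    ("beer", "2001-01-17", 80.0),
    ("beer", "2001-02-02", 150.0),
    ("wine", "2001-01-05", 60.0),
]

def sales_per_month(facts: List[Tuple[str, str, float]], product: str) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    for prod, date, amount in facts:
        if prod == product:
            month = date[:7]          # roll the date dimension up to month level
            totals[month] += amount
    return dict(totals)

print(sales_per_month(facts, "beer"))  # {'2001-01': 200.0, '2001-02': 150.0}
```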


Pedersen, T. B., C. S. Jensen, C. E. Dyreson, ,"A Foundation for Capturing and Querying Complex Multidimensional Data" in Information Systems (special issue on data warehousing), Vol. 26, No. 5, pp. 383-423,, 2001

Publication
Online by Elsevier at ScienceDirect

On-line analytical processing (OLAP) systems considerably improve data analysis and are finding wide-spread use. OLAP systems typically employ multidimensional data models to structure their data. This paper identifies 11 modeling requirements for multidimensional data models. These requirements are derived from an assessment of complex data found in real-world applications. A survey of 14 multidimensional data models reveals shortcomings in meeting some of the requirements. Existing models do not support many-to-many relationships between facts and dimensions, lack built-in mechanisms for handling change and time, lack support for imprecision, and are generally unable to insert data with varying granularities. This paper defines an extended multidimensional data model and algebraic query language that address all 11 requirements. The model reuses the common multidimensional concepts of dimension hierarchies and granularities to capture imprecise data. For queries that cannot be answered precisely due to the imprecise data, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. In addition, alternative queries unaffected by imprecision are offered. The data model and query evaluation techniques discussed in this paper can be implemented using relational database technology. The approach is also capable of exploiting multidimensional query processing techniques like pre-aggregation. This yields a practical solution with low computational overhead.


Slivinskas, G., C. S. Jensen, R. T. Snodgrass, ,"A Foundation for Conventional and Temporal Query Optimization Addressing Duplicates and Ordering" in IEEE Transactions on Knowledge and Data Engineering (special issue with extended versions of best papers from ICDE'2000), Vol. 13, No. 1, pp. 21-49,, 2001

Publication

Most real-world databases contain substantial amounts of time-referenced, or temporal, data. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query representation, optimization, and processing mechanisms must be provided. This paper presents a foundation for query optimization that integrates conventional and temporal query optimization and is suitable for both conventional DBMS architectures and ones where the temporal support is obtained via a layer on top of a conventional DBMS. This foundation captures duplicates and ordering for all queries, as well as coalescing for temporal queries, thus generalizing all existing approaches known to the authors. It includes a temporally extended relational algebra to which SQL and temporal SQL queries may be mapped, six types of algebraic equivalences, concrete query transformation rules that obey different equivalences, a procedure for determining which types of transformation rules are applicable for optimizing a query, and a query plan enumeration algorithm. The presented approach partitions the work required by the database implementor to develop a provably correct query optimizer into four stages: The database implementor has to 1) specify operations formally, 2) design and prove correct appropriate transformation rules that satisfy any of the six equivalence types, 3) augment the mechanism that determines when the different types of rules are applicable to ensure that the enumeration algorithm applies the rules correctly, and 4) ensure that the mapping generates a correct initial query plan.


Price, R., N. Tryfona, C. S. Jensen, ,"Modeling Topological Constraints in Spatial Part-Whole Relationships" in in Proceedings of the Twentieth International Conference on Conceptual Modeling, Yokohama, Japan, pp. 27-40,, 2001

Publication
Online at Springer

To facilitate development of spatial applications, we investigate the problem of modeling topological constraints in part-whole relationships between spatial objects, where the related objects may themselves be composite. An example would be countries that belong to a supranational organization, where the countries are themselves composed of states. Current topological classification schemes are restricted to simple, bounded, regular, and/or 0-2D spatial data types; do not support the set-based topological constraints required to describe inter-part relationships such as those between members of a supranational organization; and focus primarily on query rather than design. We propose an approach to modeling topological relationships that allows specification of binary and set-based topological constraints on composite spatial objects. This approach does not depend on restricting the type of spatial objects, can be used to describe part-whole and inter-part relationships, and is at a level of detail suitable for use in conceptual modeling.


Friis-Christensen, A., N. Tryfona, C. S. Jensen, "Requirements and Research Issues in Geographic Data Modeling" in Proceedings of the Ninth ACM International Symposium on Advances in Geographic Information Systems, Atlanta, GA, USA, pp. 2-8, 2001

Publication
ACM Author-Izer

It is well-documented in the literature that geographic data have special characteristics that make the use of extensions to standard modeling languages and techniques, such as the Unified Modeling Language, attractive. Based on a real-world application from the Danish National Survey and Cadastre, this paper presents requirements for geographic data modeling notations. Existing notations are then evaluated against the requirements, and a case study is carried out. The result is an identification of pertinent aspects of geographic data modeling, including roles of geographic objects, constraints on objects, and quality of data, that are not handled satisfactorily by existing proposals.


Dyreson, C. E., M. H. Böhlen, C. S. Jensen, "MetaXPath" in Proceedings of the 2001 International Conference on Dublin Core and Metadata Applications, Tokyo, Japan, pp. 17-23, 2001

Publication
Online at NII

This paper presents the MetaXPath data model and query language. MetaXPath extends XPath with support for XML metadata. XPath is a specification language for locations in an XML document. It serves as the basis for XML query languages like XSLT and the XML Query Algebra. The MetaXPath data model is a nested XPath tree. Each level of metadata induces a new level of nesting. The data model separates metadata and data into different dataspaces, supports meta-metadata, and enables sharing of metadata common to a group of nodes without duplication. The MetaXPath query language has a level shift operator to shift a query from a data level to a metadata level. MetaXPath maximally reuses XPath; hence, the changes needed to support metadata are few. MetaXPath is fully compatible with XPath.


Slivinskas, G., C. S. Jensen, ,"Enhancing an Extensible Query Optimizer with Support for Multiple Equivalence Types" in in Proceedings of the Fifth East-European Conference on Advances in Databases and Information Systems, Vilnius, Lithuania, pp. 55-69,, 2001

Publication
Online at Springer

Database management systems are continuously being extended with support for new types of data and advanced querying capabilities. In large part because of this, query optimization has remained a very active area of research throughout the past two decades. At the same time, current commercial optimizers are hard to modify, to incorporate desired changes in, e.g., query algebras or transformation rules. This has led to a number of research contributions aiming to create extensible query optimizers, such as Starburst, Volcano, and OPT++. This paper reports on a study that has enhanced Volcano to support a relational algebra with added temporal operators, such as temporal join and aggregation. These enhancements include the introduction of algorithms and cost formulas for the new operators, six types of query equivalences, and accompanying query transformation rules. The paper describes extensions to Volcano's structure and algorithms and summarizes implementation experiences.


Lomet, D., C. S. Jensen, "Transaction Timestamping in (Temporal) Databases" in Proceedings of the 27th International Conference on Very Large Databases, Rome, Italy, pp. 441-450, 2001

Publication

Many database applications need accountability and traceability that necessitate retaining previous database states. For a transaction-time database supporting this, the choice of times used to timestamp database records, to establish when records are or were current, needs to be consistent with a committed transaction serialization order. Previous solutions have chosen timestamps at commit time, selecting a time that agrees with the commit order. However, SQL standard databases can require an earlier choice because a statement within a transaction may request "current time." Managing timestamps chosen before a serialization order is established is the challenging problem we solve here. By building on two-phase locking concurrency control, we can delay a transaction's choice of a timestamp, reducing the chance that transactions may need to be aborted in order to keep timestamps consistent with a serialization order. Also, while timestamps stored with records in a transaction-time database make it possible to directly identify write-write and write-read conflicts, handling read-write conflicts requires more. Our simple auxiliary structure conservatively detects read-write conflicts, and hence provides transaction timestamps that are consistent with a serialization order.
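
Purely as a hedged illustration of the general idea of delaying the timestamp choice, and not of the paper's protocol, the sketch below keeps a lower and an upper bound on a transaction's timestamp, narrows the bounds as conflicts are observed, and picks a concrete timestamp only at commit.

```python
# Sketch: delayed timestamp selection with per-transaction bounds; a
# simplification for illustration, not the protocol described in the paper.
class Transaction:
    def __init__(self, name: str):
        self.name = name
        self.lower = 0.0            # the timestamp must lie strictly after this
        self.upper = float("inf")   # and strictly before this

    def must_follow(self, t: float) -> None:
        """A conflict shows this transaction is serialized after time t."""
        self.lower = max(self.lower, t)

    def must_precede(self, t: float) -> None:
        """A conflict shows this transaction is serialized before time t."""
        self.upper = min(self.upper, t)

    def commit(self, commit_time: float) -> float:
        """Pick a timestamp inside the allowed range, or abort if it is empty."""
        candidate = min(max(commit_time, self.lower + 1e-9), self.upper - 1e-9)
        if not (self.lower < candidate < self.upper):
            raise RuntimeError(f"{self.name}: no consistent timestamp, abort")
        return candidate

t1 = Transaction("T1")
t1.must_follow(10.0)    # e.g., T1 read a record version written at time 10
t1.must_precede(20.0)   # e.g., a conflicting writer has already chosen time 20
print(t1.commit(commit_time=15.0))  # a timestamp strictly between 10 and 20
```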


Slivinskas, G., C. S. Jensen, R. T. Snodgrass, ,"Adaptable Query Optimization and Evaluation in Temporal Middleware" in in Proceedings of the 2001 ACM SIGMOD International Conference on the Management of Data, Santa Barbara, CA, USA, pp. 127-138,, 2001

Publication
ACM Author-Izer

Time-referenced data are pervasive in most real-world databases. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query optimization and evaluation mechanisms must be provided, either within the DBMS proper or as a source level translation from temporal queries to conventional SQL. This paper proposes a new approach: using a middleware component on top of a conventional DBMS. This component accepts temporal SQL statements and produces a corresponding query plan consisting of algebraic as well as regular SQL parts. The algebraic parts are processed by the middleware, while the SQL parts are processed by the DBMS. The middleware uses performance feedback from the DBMS to adapt its partitioning of subsequent queries into middleware and DBMS parts. The paper describes the architecture and implementation of the temporal middleware component, termed TANGO, which is based on the Volcano extensible query optimizer and the XXL query processing library. Experiments with the system demonstrate the utility of the middleware's internal processing capability and its cost-based mechanism for apportioning the processing between the middleware and the underlying DBMS.


Pfoser, D., C. S. Jensen, ,"Querying the Trajectories of On-Line Mobile Objects" in in Proceedings of the Second ACM International Workshop on Data Engineering for Wireless and Mobile Access, Santa Barbara, California, USA, pp. 66-73,, 2001

Publication
ACM Author-Izer

Position data is expected to play a central role in a wide range of mobile computing applications, including advertising, leisure, safety, security, tourist, and traffic applications. Applications such as these are characterized by large quantities of wirelessly Internet-worked, position-aware mobile objects that receive services where the objects' position is essential. The movement of an object is captured via sampling, resulting in a trajectory consisting of a sequence of connected line segments for each moving object. This paper presents a technique for querying these trajectories. The technique uses indices for the processing of spatiotemporal range queries on trajectories. If object movement is constrained by the presence of infrastructure, e.g., lakes, park areas, etc., the technique is capable of exploiting this to reduce the range query, the purpose being to obtain better query performance. Specifically, an algorithm is proposed that segments the original range query based on the infrastructure contained in its range. The applicability and limitations of the proposal are assessed via empirical performance studies with varying datasets and parameter settings.
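
As a hedged, purely spatial illustration of the query-segmentation idea, restricted to axis-aligned rectangles rather than the spatiotemporal ranges and real infrastructure handled in the paper, the sketch below subtracts a blocked rectangle, such as a lake, from a range query and returns the remaining sub-rectangles that still need to be evaluated against the index.

```python
# Sketch: split an axis-aligned range query around one blocked rectangle
# (e.g., a lake). A 2D simplification of the paper's spatiotemporal setting.
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def subtract(query: Rect, blocked: Rect) -> List[Rect]:
    qx1, qy1, qx2, qy2 = query
    bx1, by1, bx2, by2 = blocked
    # Clip the blocked rectangle to the query; if they do not overlap, keep the query.
    ix1, iy1, ix2, iy2 = max(qx1, bx1), max(qy1, by1), min(qx2, bx2), min(qy2, by2)
    if ix1 >= ix2 or iy1 >= iy2:
        return [query]
    pieces = [
        (qx1, qy1, qx2, iy1),  # strip below the blocked area
        (qx1, iy2, qx2, qy2),  # strip above
        (qx1, iy1, ix1, iy2),  # strip to the left
        (ix2, iy1, qx2, iy2),  # strip to the right
    ]
    return [(x1, y1, x2, y2) for x1, y1, x2, y2 in pieces if x1 < x2 and y1 < y2]

query = (0.0, 0.0, 10.0, 10.0)
lake = (4.0, 4.0, 6.0, 12.0)   # partly outside the query window
print(subtract(query, lake))
# [(0.0, 0.0, 10.0, 4.0), (0.0, 4.0, 4.0, 10.0), (6.0, 4.0, 10.0, 10.0)]
```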


Pedersen, T. B., C. S. Jensen, C. E. Dyreson, ,"Preaggregation for irregular OLAP Hierarchies with the TreeScape System" in in Demo Proceedings of the Seventeenth International Conference on Data Engineering, Heidelberg, Germany, pp. 1-3,, 2001

Publication

We present the TreeScape system, which, unlike any other system known to the authors, enables the reuse of pre-computed aggregate query results involving the kinds of irregular dimension hierarchies that occur frequently in practice. The system establishes a foundation for obtaining high-performance query processing while precomputing only a few aggregates. It is demonstrated how this reuse of aggregates is enabled through dimension transformations that occur transparently to the user.


Jensen, C. S., M. Schneider, B. Seeger, V. J. Tsotras, editors, "Advances in Spatial and Temporal Databases" in Seventh International Symposium, SSTD 2001, Redondo Beach, CA, USA, Lecture Notes in Computer Science, Volume 2121, Springer-Verlag, 2001

Online at Springer



Jensen, C. S., ,"Temporal Database Concepts" in in Encyclopedia of Microcomputers, Volume 27, Supplement 6, pp. 371-391, edited by A. Kent and J. Williams, published by Marcel Dekker, New York, NY,, 2001

Publication



Agarwal, P. K., L. J. Guibas, H. Edelsbrunner, J. Erickson, M. Isard, S. Har-Peled, J. Hershberger, C. S. Jensen, L. Kavraki, P. Koehl, M. Lin, D. Manocha, D. Metaxas, B. Mirtich, D. Mount, S. Muthukrishnan, D. Pai, E. Sacks, J. Snoeyink, S. Suri, O. Wolfson, "Algorithmic Issues in Modeling Motion" in Report from the National Science Foundation/Army Research Office Workshop on Motion Algorithms. The workshop was held in August 2000. 27 pages, 2001

Publication

This report presents the results of the workshop on Algorithmic Issues in Modeling Motion, funded by NSF and ARO, held on August 6 and 7, 2000 at Duke University, Durham, NC. The report identifies research areas in which motion plays a pivotal role, summarizes the challenges that lie ahead in dealing with motion, and makes a number of specific recommendations to address some of the challenges presented.


Jensen, C. S., A. Friis-Christensen, T. B. Pedersen, D. Pfoser, S. Šaltenis, N. Tryfona, "Location-Based Services - A Database Perspective" in Proceedings of the Eighth Scandinavian Research Conference on Geographical Information Sciences, Norway, pp. 59-68, 2001

Eighth Scandinavian Research Conference on Geographical Information Science
Publication

We are heading rapidly towards a global computing and information infrastructure that will contain billions of wirelessly connected devices, many of which will offer so-called location-based services to their mobile users always and everywhere. Indeed, users will soon take ubiquitous wireless access to information and services for granted. This scenario is made possible by the rapid advances in the underlying hardware technologies, which continue to follow variants of Moore's Law. Examples of location-based services include tracking, way-finding, traffic management, safety-related services, and mixed-reality games, to name but a few. This paper outlines a general usage scenario for location-based services, in which data warehousing plays a central role, and it describes central challenges to be met by the involved software technologies in order for them to reach their full potential for usage in location-based services.


Jensen, C. S., T. B. Pedersen, "Mobile E-Services and Their Challenges to Data Warehousing" in Proceedings of the Workshop des Arbeitskreises 'Konzepte des Data Warehousing', Oldenburg, Germany, 9 pages, 2001

Publication

Continued advances in hardware technologies combine to create a new class of information services, termed mobile e-services, or simply m-services, which exploits the advances in, among others, wireless communications, positioning, and miniaturization. Because the users do not merely interact with the services from behind stationary desktop computers, but from a variety of increasingly unobtrusive information appliances while on the move, location information plays a fundamental role, and new types of services become of interest. Such services include tracking, way-finding, traffic management, safety-related services, and mixed-reality games, to name but a few. Data warehousing has the potential for playing an essential part in m-services. However, for data warehousing to be successful in an m-service scenario, new challenges must be met by data warehousing technologies. Such challenges include support for non-standard dimension hierarchies and imprecision and varying precision in the data; transportation networks; continuous change; closed-loop usage; and dynamic services. This paper outlines a general m-services scenario and describes central challenges to be met by data warehousing in order for it to reach its full potential for usage in m-services.


Damsgaard, J., J. Hørlück, C. S. Jensen, "From Financial Wholesale to Retail - Preparing an IT-infrastructure for e-commerce" in Teaching case, 28 pages, 2001

In the beginning of 2001, the newly employed Internet director of the largest Danish mortgage company, Nykredit, was on his way to a meeting to review and revise Nykredit's Internet strategy. The principal consideration was how to distribute Nykredit's many products and services in the near future. One basic question summarized the pending decision well: Should Nykredit primarily rely on a call-center and a strong web presence, or should Nykredit extend its present branch network, focus on building up a finely meshed physical network, and distribute its services mainly through branches and offices? Nykredit was in a unique position in that it did not have many physical offices, and therefore Nykredit did in fact have a real choice that many other financial companies did not have in practice. On the other hand, the choices were only real if Nykredit could meet the technological challenge and if it, by the use of technology, could convince the customers that personal service was possible without a strong physical presence. The meeting was to take place at Nykredit's Data Center in Aalborg. As he was walking through the building's long passageway, he was thinking to himself how IT was once a remote back-office support service and how it had now moved to center stage and had become the business storefront. The fact that the strategy meeting took place at the data center underlined the importance of IT to Nykredit. Any decisions on distribution had to be carefully aligned not only with traditional considerations such as Nykredit's own organization, the market, and the competitors' initiatives, but also with Nykredit's existing IT infrastructure and its IT department's capabilities to produce modern software fast.


Damsgaard, J., J. Hørlück, C. S. Jensen ,"From Financial Wholesale to Retail - Preparing an IT-infrastructure for e-commerce" in Teaching note accompanying the above, 8 pages, 2001




Jensen, C. S. ,"Virtual Worlds and Augmented Realities" in Vision statement invited by the Danish Technical Research Council, 2 pages, 2001




Benetis, R., C. S. Jensen, G. Karčiauskas, S. Šaltenis ,"Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects" in TimeCenter Technical Report TR-66, 21 pages, 2001

Publication

With the proliferation of wireless communications and the rapid advances in technologies for tracking the positions of continuously moving objects, algorithms for efficiently answering queries about large numbers of moving objects are increasingly needed. One such query is the reverse nearest neighbor (RNN) query that returns the objects that have a query object as their closest object. While algorithms have been proposed that compute RNN queries for non-moving objects, there have been no proposals for answering RNN queries for continuously moving objects. Another such query is the nearest neighbor (NN) query, which has been studied extensively and in many contexts. Like the RNN query, the NN query has not been explored for moving query and data points. This paper proposes an algorithm for answering RNN queries for continuously moving points in the plane. As a part of the solution to this problem and as a separate contribution, an algorithm for answering NN queries for continuously moving points is also proposed. The results of performance experiments are reported.
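
As a rough illustration of the query semantics only (brute force, static points; the paper's contribution is algorithms for continuously moving points, which this does not capture), the following Python sketch defines NN and RNN queries; the names and structure are illustrative, not from the paper.

# Illustrative brute-force NN and RNN queries over static 2D points.
# Names and structure are ours; the paper treats continuously moving points.
from math import hypot

def nearest_neighbor(q, points):
    """Return the point in `points` closest to query point q."""
    return min(points, key=lambda p: hypot(p[0] - q[0], p[1] - q[1]))

def reverse_nearest_neighbors(q, points):
    """Return the points whose nearest neighbor, among the other points and q, is q."""
    result = []
    for p in points:
        others = [o for o in points if o != p] + [q]
        if nearest_neighbor(p, others) == q:
            result.append(p)
    return result

points = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
print(nearest_neighbor((1.0, 0.0), points))           # (0.0, 0.0); ties broken by list order
print(reverse_nearest_neighbors((1.0, 0.0), points))  # points whose nearest neighbor is the query point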


Skyt, J., C. S. Jensen ,"Persistent Views - a Mechanism for Managing Aging Data" in TimeCenter Technical Report TR-65, 29 pages, 2001

Publication

Enabled by the continued advances in storage technologies, the amounts of on-line data grow at a rapidly increasing pace. This development is witnessed in, e.g., so-called data webhouses that accumulate click streams from portals, and in other data warehouse-type applications. The presence of very large and continuously growing amounts of data introduces new challenges, one of them being the need for effective management of aged data. In very large and growing databases, some data eventually becomes inaccurate or outdated, and may be of reduced interest to the database applications. This paper offers a mechanism, termed persistent views, that aids in flexibly reducing the volume of data, e.g., by enabling the replacement of such "low-interest," detailed data with aggregated data. The paper motivates persistent views and precisely defines and contrasts these with the related mechanisms of views, snapshots, and physical deletion. The paper also offers a provably correct foundation for implementing persistent views.
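
As a rough illustration of the underlying idea only (not the persistent-view mechanism itself), the following Python sketch replaces detail rows older than a cutoff with per-group aggregates while keeping recent detail; the data layout and names are illustrative assumptions.

# A toy illustration of reducing the volume of aged data: detail rows older
# than a cutoff are replaced by one aggregated row per group, recent detail is kept.
from collections import defaultdict

def age_out(rows, cutoff):
    """rows: (day, page, clicks). Returns recent detail plus per-page totals of old rows."""
    old_totals = defaultdict(int)
    recent = []
    for day, page, clicks in rows:
        if day < cutoff:
            old_totals[page] += clicks
        else:
            recent.append((day, page, clicks))
    aggregated = [("<aggregated before day %d>" % cutoff, page, total)
                  for page, total in sorted(old_totals.items())]
    return aggregated + recent

clickstream = [(1, "/home", 10), (2, "/home", 7), (2, "/news", 3), (9, "/home", 4)]
print(age_out(clickstream, cutoff=5))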


Šaltenis, S., C. S. Jensen ,"Indexing of Moving Objects for Location-Based Services" in TimeCenter Technical Report TR-63, 24 pages, 2001

Publication

With the continued proliferation of wireless networks, e.g., based on such evolving standards as WAP and Bluetooth, visionaries predict that the Internet will soon extend to billions of wireless devices, or objects. A substantial fraction of these will offer their changing positions to the (location-based) services they either use or support. As a result, software technologies that enable the management of the positions of objects capable of continuous movement are in increasingly high demand. This paper assumes what we consider a realistic Internet-service scenario where objects that have not reported their position within a specified duration of time are expected to no longer be interested in, or of interest to, the service. In this scenario, the possibility of substantial quantities of "expiring" objects introduces a new kind of implicit update, which contributes to rendering the database highly dynamic. The paper presents an R-tree based technique for the indexing of the current positions of such objects. Extensive performance experiments explore the properties of the types of bounding regions that are candidates for being used in the internal entries of the index, and they show that, when compared to the approach where the objects are not assumed to expire, the new indexing technique can improve the search performance by as much as a factor of two or more without sacrificing update performance.


Pfoser, D., C. S. Jensen ,"Querying the Trajectories of On-Line Mobile Objects" in TimeCenter Technical Report TR-57, 19 pages, 2001

Publication

Position data is expected to play a central role in a wide range of mobile computing applications, including advertising, leisure, safety, security, tourist, and traffic applications. Applications such as these are characterized by large quantities of wirelessly Internet-worked, position-aware mobile objects that receive services where the objects' position is essential. The movement of an object is captured via sampling, resulting in a trajectory consisting of a sequence of connected line segments for each moving object. This paper presents a technique for querying these trajectories. The technique uses indices for the processing of spatiotemporal range queries on trajectories. If object movement is constrained by the presence of infrastructure, e.g., lakes, park areas, etc., the technique is capable of exploiting this to reduce the range query, the purpose being to obtain better query performance. Specifically, an algorithm is proposed that segments the original range query based on the infrastructure contained in its range. The applicability and limitations of the proposal are assessed via empirical performance studies with varying datasets and parameter settings.


Slivinskas, G., C. S. Jensen, R. T. Snodgrass ,"Adaptable Query Optimization and Evaluation in Temporal Middleware" in TimeCenter Technical Report TR-56, 28 pages, 2001

Publication

Time-referenced data are pervasive in most real-world databases. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query optimization and evaluation mechanisms must be provided, either within the DBMS proper or as a source-level translation from temporal queries to conventional SQL. This paper proposes a new approach: using a middleware component on top of a conventional DBMS. This component accepts temporal SQL statements and produces a corresponding query plan consisting of algebraic as well as regular SQL parts. The algebraic parts are processed by the middleware, while the SQL parts are processed by the DBMS. The middleware uses performance feedback from the DBMS to adapt its partitioning of subsequent queries into middleware and DBMS parts. The paper describes the architecture and implementation of the temporal middleware component, termed TANGO, which is based on the Volcano extensible query optimizer and the XXL query processing library. Experiments with the system demonstrate the utility of the middleware's internal processing capability and its cost-based mechanism for apportioning the processing between the middleware and the underlying DBMS.


2000 Top

Price, R., N. Tryfona, C. S. Jensen ,"Extended Spatiotemporal UML: Motivations, Requirements, and Constructs" in Journal of Database Management (special issue: Systems Analysis and Design Using UML), Vol. 11, No. 4, pp. 13-27, 2000

Publication



Böhlen, M. H., C. S. Jensen, R. T. Snodgrass ,"Temporal Statement Modifiers" in ACM Transactions on Database Systems, Vol. 25, No. 4, pp. 407-456, 2000

Publication
ACM Author-Izer

A wide range of database applications manage time-varying data. Many temporal query languages have been proposed, each one the result of many carefully made yet subtly interacting design decisions. In this article we advocate a different approach to articulating a set of requirements, or desiderata, that directly imply the syntactic structure and core semantics of a temporal extension of an (arbitrary) nontemporal query language. These desiderata facilitate transitioning applications from a nontemporal query language and data model, which has received only scant attention thus far. The paper then introduces the notion of statement modifiers that provide a means of systematically adding temporal support to an existing query language. Statement modifiers apply to all query language statements, for example, queries, cursor definitions, integrity constraints, assertions, views, and data manipulation statements. We also provide a way to systematically add temporal support to an existing implementation. The result is a temporal query language syntax, semantics, and implementation that derives from first principles. We exemplify this approach by extending SQL-92 with statement modifiers. This extended language, termed ATSQL, is formally defined via a denotational-semantics-style mapping of temporal statements to expressions using a combination of temporal and conventional relational algebraic operators.


Güting, R. H., M. Böhlen, M. Erwig, C. S. Jensen, N. Lorentzos, M. Schneider, M. Vazirgiannis ,"A Foundation for Representing and Querying Moving Objects" in ACM Transactions on Database Systems, Vol. 25, No. 1, pp. 1-42, 2000

Publication
ACM Author-Izer

Spatio-temporal databases deal with geometries changing over time. The goal of our work is to provide a DBMS data model and query language capable of handling such time-dependent geometries, including those changing continuously that describe moving objects. Two fundamental abstractions are moving point and moving region, describing objects for which only the time-dependent position, or position and extent, respectively, are of interest. We propose to represent such time-dependent geometries as attribute data types with suitable operations, that is, to provide an abstract data type extension to a DBMS data model and query language. This paper presents a design of such a system of abstract data types. It turns out that besides the main types of interest, moving point and moving region, a relatively large number of auxiliary data types are needed. For example, one needs a line type to represent the projection of a moving point into the plane, or a moving real to represent the time-dependent distance of two points. It then becomes crucial to achieve (i) orthogonality in the design of the system, i.e., type constructors can be applied uniformly; (ii) genericity and consistency of operations, i.e., operations range over as many types as possible and behave consistently; and (iii) closure and consistency between structure and operations of nontemporal and related temporal types. Satisfying these goals leads to a simple and expressive system of abstract data types that may be integrated into a query language to yield a powerful language for querying spatio-temporal data, including moving objects. The paper formally defines the types and operations, offers detailed insight into the considerations that went into the design, and exemplifies the use of the abstract data types using SQL. The paper offers a precise and conceptually clean foundation for implementing a spatio-temporal DBMS extension.


Torp, K., R. T. Snodgrass, C. S. Jensen ,"Effective Timestamping in Databases" in The VLDB Journal, Vol. 8, No. 4, pp. 267-288, 2000

Publication
Online at Springer

Many existing database applications place various timestamps on their data, rendering temporal values such as dates and times prevalent in database tables. During the past two decades, several dozen temporal data models have appeared, all with timestamps being integral components. The models have used timestamps for encoding two specific temporal aspects of database facts, namely transaction time, when the facts are current in the database, and valid time, when the facts are true in the modeled reality. However, with few exceptions, the assignment of timestamp values has been considered only in the context of individual modification statements. This paper takes the next logical step: It considers the use of timestamping for capturing transaction and valid time in the context of transactions. The paper initially identifies and analyzes several problems with straightforward timestamping, then proceeds to propose a variety of techniques aimed at solving these problems. Timestamping the results of a transaction with the commit time of the transaction is a promising approach. The paper studies how this timestamping may be done using a spectrum of techniques. While many database facts are valid until now, the current time, this value is absent from the existing temporal types. Techniques that address this problem using different substitute values are presented. Using a stratum architecture, the performance of the different proposed techniques is studied. Although querying and modifying time-varying data is accompanied by a number of subtle problems, we present a comprehensive approach that provides application programmers with simple, consistent, and efficient support for modifying bitemporal databases in the context of user transactions.
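
As a minimal sketch of the commit-time timestamping idea (illustrative only, not the paper's implementation), the following Python fragment stamps all rows inserted by a transaction with the transaction's commit time and uses a sentinel for the open-ended "until changed" value; names such as UNTIL_CHANGED are our assumptions.

# Minimal sketch of stamping all rows touched by a transaction with its commit time.
UNTIL_CHANGED = None  # stand-in for the open-ended "until changed" value

class Transaction:
    def __init__(self, table):
        self.table = table       # list of dicts with tt_start / tt_end attributes
        self.pending = []        # rows inserted by this transaction, not yet stamped

    def insert(self, data):
        self.pending.append(dict(data))

    def commit(self, commit_time):
        # All pending rows receive the same transaction-time start: the commit time.
        for row in self.pending:
            row["tt_start"] = commit_time
            row["tt_end"] = UNTIL_CHANGED
            self.table.append(row)
        self.pending.clear()

table = []
t = Transaction(table)
t.insert({"name": "Alice", "dept": "R&D"})
t.insert({"name": "Bob", "dept": "Sales"})
t.commit(commit_time=17)   # both rows are stamped with time 17, not their insertion times
print(table)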


Price, R., N. Tryfona, C. S. Jensen ,"Modeling part-whole relationships for spatial data" in Proceedings of the Eighth International Symposium of ACM GIS, Washington DC, pp. 1-8, 2000

Publication
ACM Author-Izer

Spatial applications must manage part-whole (PW) relationships between spatial objects, for example, the division of an administrative region into zones based on land use. Support for conceptual modeling of relationships between parts and wholes, such as aggregation and membership, has been well researched in the object-oriented (OO) community; however, spatial data has generally not been considered. We propose here a practical approach to integrating support for spatial PW relationships into conceptual modeling languages. Three different types of relationships - spatial part, spatial membership, and spatial inclusion - that are of general utility in spatial applications are identified and formally defined using a consistent classification framework based on spatial derivation and constraint relationships. An extension of the Unified Modeling Language (UML) for spatiotemporal data, namely Extended Spatiotemporal UML, is used to demonstrate the feasibility of using such an approach to define modeling constructs supporting spatial PW relationships.


Pedersen, T. B., A. Shoshani, J. Gu, C. S. Jensen ,"Extending OLAP Querying to External Object Databases" in Proceedings of the Ninth International Conference on Information and Knowledge Management, Washington, DC, pp. 405-413, 2000

Publication
ACM Author-Izer

On-Line Analytical Processing (OLAP) systems based on a multidimensional view of data have found widespread use in business applications and are being used increasingly in non-standard applications. These systems provide good performance and ease-of-use. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This paper presents the concepts and techniques underlying a flexible, "multi-model" federated system that enables OLAP users to exploit simultaneously the features of OLAP and object database systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for multidimensional data and object database systems for more complex, general data. Additionally, physical data integration can be avoided. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally support queries that involve data in object databases. The language permits selection criteria that reference object data, queries that return combinations of OLAP and object data, and queries that group multidimensional data according to object data. The system is designed to be aggregation-safe, in the sense that it exploits the aggregation semantics of the data to prevent incorrect or meaningless query results. A prototype implementation of the system is reported.


Skyt, J., C. S. Jensen ,"Managing Aging Data Using Persistent Views (Extended Abstract)" in Proceedings of the Fifth IFCIS International Conference on Cooperative Information Systems, Eilat, Israel, pp. 132-137. The proceedings were published as Lecture Notes in Computer Science 1901, 2000

Publication

Enabled by the continued advances in storage technologies, the amounts of on-line data grow at a rapidly increasing pace. For example, this development is witnessed in the so-called data webhouses that accumulate data derived from clickstreams. The presence of very large and continuously growing amounts of data introduces new challenges, one of them being the need for effectively managing aging data that is perhaps inaccurate, partly outdated, and of reduced interest. This paper describes a new mechanism, persistent views, that aids in flexibly reducing the volume of data, e.g., by enabling the replacement of such "low-interest," detailed data with aggregated data; and it outlines a strategy for implementing persistent views.


Pedersen, T. B., C. S. Jensen, C. E. Dyreson ,"The TreeScape System: Reuse of Pre-Computed Aggregates over Irregular OLAP Hierarchies" in Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt, pp. 595-598 (demo), 2000

Publication

We present the TreeScape system that, unlike any other system known to the authors, enables the reuse of pre-computed aggregate query results for irregular dimension hierarchies, which occur frequently in practice. The system establishes a foundation for obtaining high query processing performance while pre-computing only limited aggregates. The paper shows how this reuse of aggregates is enabled through dimension transformations that occur transparently to the user.


Pfoser, D., C. S. Jensen, Y. Theodoridis ,"Novel Approaches in Query Processing for Moving Objects" in Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt, pp. 395-406, 2000

Publication

The domain of spatiotemporal applications is a treasure trove of new types of data and queries. However, work in this area is guided by related research from the spatial and temporal domains, so far, with little attention towards the true nature of spatiotemporal phenomena. In this work, the focus is on a spatiotemporal sub-domain, namely the trajectories of moving point objects. We present new types of spatiotemporal queries, as well as algorithms to process those. Further, we introduce two access methods for this kind of data, namely the Spatio-Temporal R-tree (STR-tree) and the Trajectory-Bundle tree (TB-tree). The former is an R-tree based access method that considers the trajectory identity in the index as well, while the latter is a hybrid structure, which preserves trajectories as well as allows for R-tree typical range search in the data. We present performance studies that compare the two indices with the R-tree (appropriately modified, for a fair comparison) under a varying set of spatiotemporal queries, and we provide guidelines for a successful choice among them.


Bliujute, R., C. S. Jensen, S. Šaltenis, G. Slivinskas ,"Light-Weight Indexing of General Bitemporal Data" in Proceedings of the Twelfth International Conference on Scientific and Statistical Database Management, Berlin, Germany, pp. 125-138, 2000

Publication

Most data managed by existing, real-world database applications is time referenced. Often, two temporal aspects of data are of interest, namely valid time, when data is true in the mini-world, and transaction time, when data is current in the database, resulting in so-called bitemporal data. Like spatial data, bitemporal data thus has associated two-dimensional regions. Such data is in part naturally now relative: some data is true until the current time, and some data is part of the current database state. Therefore, unlike for spatial data, bitemporal data regions may grow continuously. Existing indices, e.g., B+- and R-trees, typically do not contend well with even small amounts of now-relative data. In contrast, the 4-R index presented in the paper is capable of indexing general bitemporal data efficiently. The different kinds of growing data regions are transformed into stationary regions, which are then indexed by R*-trees. Queries are also transformed to counter the data transformations, yielding a technique with perfect precision and recall. Performance studies indicate that the technique is competitive with the best existing index; and unlike this existing index, the new technique does not require extension of the DBMS kernel.


Šaltenis, S., C. S. Jensen, S. Leutenegger, M. Lopez ,"Indexing the Positions of Continuously Moving Objects" in Proceedings of the 2000 ACM SIGMOD International Conference on the Management of Data, Dallas, TX, USA, pp. 331-342, 2000

Publication
ACM Author-Izer

The coming years will witness dramatic advances in wireless communications as well as positioning technologies. As a result, tracking the changing positions of objects capable of continuous movement is becoming increasingly feasible and necessary. The present paper proposes a novel, R*-tree based indexing technique that supports the efficient querying of the current and projected future positions of such moving objects. The technique is capable of indexing objects moving in one-, two-, and three-dimensional space. Update algorithms enable the index to accommodate a dynamic data set, where objects may appear and disappear, and where changes occur in the anticipated positions of existing objects. A comprehensive performance study is reported.
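
As a hedged illustration of the underlying motion model (not the index itself), the following Python sketch stores each object as a reference position plus a velocity and answers a window query at a future time by brute force; all names and parameters are illustrative assumptions.

# A sketch of the linear-motion representation that such an index queries: each
# object stores a reference position and a velocity, and its position at time t
# is extrapolated. This is only a brute-force check of which objects fall in a
# query window at time t, not the indexing technique.
def position_at(obj, t):
    x0, y0, vx, vy, t_ref = obj
    return (x0 + vx * (t - t_ref), y0 + vy * (t - t_ref))

def window_query(objects, window, t):
    x_lo, y_lo, x_hi, y_hi = window
    hits = []
    for obj in objects:
        x, y = position_at(obj, t)
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            hits.append(obj)
    return hits

# Each object: (x0, y0, vx, vy, reference time)
objects = [(0, 0, 1.0, 0.0, 0), (10, 10, -0.5, -0.5, 0)]
print(window_query(objects, window=(4, -1, 8, 6), t=5))  # which objects are in the window at time 5?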


Tryfona, N., C. S. Jensen ,"Using Abstractions for Spatio-Temporal Conceptual Modeling" in Proceedings of the 2000 ACM Symposium on Applied Computing, Villa Olmo, Como, Italy, pp. 313-322, 2000

Publication
ACM Author-Izer

Conceptual data modeling for complex applications, such as multimedia and spatiotemporal applications, often results in large, complicated and difficult-to-comprehend diagrams. These diagrams frequently involve repetition of autonomous, semantically meaningful parts that capture similar situations and characteristics. By recognizing such parts and treating them as modeling units, it is possible to simplify the diagrams, as well as the conceptual modeling process. In this paper, based on requirements drawn from real applications, we present a set of modeling units that capture spatial, temporal, and spatiotemporal aspects. To facilitate the conceptual design process, these units are abbreviated in the conceptual diagrams by corresponding spatial, temporal, and spatiotemporal modeling abstractions. The result is more elegant and less-detailed diagrams that are easier to comprehend (for the user) and to construct (for the designer), yet semantically rich. An extension of the Entity-Relationship model serves as the context for this study. An example from a real cadastral application illustrates the benefits of using an abstraction-based conceptual model.


Slivinskas, G., C. S. Jensen, R. T. Snodgrass ,"Query Plans for Conventional and Temporal Queries Involving Duplicates and Ordering" in Proceedings of the Sixteenth IEEE International Conference on Data Engineering, San Diego, CA, USA, pp. 547-558, 2000

Publication

Most real-world database applications contain a substantial portion of time-referenced, or temporal, data. Recent advances in temporal query languages show that such database applications could benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query representation, optimization, and processing mechanisms must be provided. This paper presents a general, algebraic foundation for query optimization that integrates conventional and temporal query optimization and is suitable for providing temporal support both via a stand-alone temporal DBMS and via a layer on top of a conventional DBMS. By capturing duplicate removal and retention and order preservation for all queries, as well as coalescing for temporal queries, this foundation formalizes and generalizes existing approaches.


Jensen, C. S., R. T. Snodgrass ,"Temporally Enhanced Database Design" in Chapter 7, pp. 163-193, in Advances in Object-Oriented Data Modeling, edited by M. P. Papazoglou, S. Spaccapietra, and Z. Tari, MIT Press, 367+xxv pages, 2000

Advances in Object-Oriented Data Modeling
Publication



Pedersen, T. B., C. S. Jensen ,"Advanced Implementation Techniques for Scientific Data Warehouses" in Proceedings of the Workshop of Management and Integration of Biochemical Data, Villa Bosch, Heidelberg, Germany, 9 pages, 2000

Publication

Data warehouses using a multidimensional view of data have become very popular in both business and science in recent years. Data warehouses for scientific purposes such as medicine and bio-chemistry pose several great challenges to existing data warehouse technology. Data warehouses usually use pre-aggregated data to ensure fast query response. However, pre-aggregation cannot be used in practice if the dimension structures or the relationships between facts and dimensions are irregular. A technique for overcoming this limitation and some experimental results are presented. Queries over scientific data warehouses often need to reference data that is external to the data warehouse, e.g., data that is too complex to be handled by current data warehouse technology, data that is "owned" by other organizations, or data that is updated frequently. Examples of this are the public genome databases, such as Swissprot. This paper presents a federation architecture that allows the integration of multidimensional warehouse data with complex external data.


Jensen, C. S. ,"Themes and Challenges in Temporal Databases" in pp. 175-176, in H. Schmidt, editor, Modellierung Betrieblicher Informationssysteme (Proceedings der MOBIS-Fachtagung, Universität Siegen, Germany), 2000

Publication



Jensen, C. S. ,"Review - The Logical Access Path Schema of a Database" in ACM SIGMOD Digital Review, Vol. 2, 2000

Online at DBLP



Price, R., N. Tryfona, C. S. Jensen ,"Supporting Conceptual Modeling of Complex Spatial Relationships" in Chorochronos Technical Report CH-00-5, 2000

Geographic Information Systems must manage complex relationships between spatial entities such as the division of administrative regions into zones based on land use. Support for conceptual modeling of complex relationships such as aggregation and membership has been well-researched in the object-oriented community; however, spatial data has generally not been considered. In this paper, we propose a practical approach to integrating support for complex spatial relationships into a conceptual modeling language. Three different types of complex spatial relationships - spatial part, overlay, membership, and inclusion - which are of general utility in GIS applications are identified and formally defined using a consistent classification framework based on spatial derivation and constraint relationships. An extension of the Unified Modeling Language (UML) for spatiotemporal data, Extended Spatiotemporal UML, is used to demonstrate the feasibility of using such an approach to define modeling constructs supporting complex spatial relationships.


Slivinskas, G., C. S. Jensen, R. T. Snodgrass ,"A Foundation for Conventional and Temporal Query Optimization Addressing Duplicates and Ordering" in TimeCenter Technical Report TR-49, 44 pages, 2000

Publication

Most real-world databases contain substantial amounts of time-referenced, or temporal, data. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query representation, optimization, and processing mechanisms must be provided. This paper presents a foundation for query optimization that integrates conventional and temporal query optimization and is suitable for both conventional DBMS architectures and ones where the temporal support is obtained via a layer on top of a conventional DBMS. This foundation captures duplicates and ordering for all queries, as well as coalescing for temporal queries, thus generalizing all existing approaches known to the authors. It includes a temporally extended relational algebra to which SQL and temporal SQL queries may be mapped, six types of algebraic equivalences, concrete query transformation rules that obey different equivalences, a procedure for determining which types of transformation rules are applicable for optimizing a query, and a query plan enumeration algorithm. The presented approach partitions the work required by the database implementor to develop a provably correct query optimizer into four stages: the database implementor has to (1) specify operations formally; (2) design and prove correct appropriate transformation rules that satisfy any of the six equivalence types; (3) augment the mechanism that determines when the different types of rules are applicable to ensure that the enumeration algorithm applies the rules correctly; and (4) ensure that the mapping generates a correct initial query plan.


Pfoser, D., C. S. Jensen, Y. Theodoridis ,"Novel Approaches in Query Processing for Moving Objects" in Chorochronos Technical Report CH-00-03, 26 pages, 2000

Publication

The domain of spatiotemporal applications is a treasure trove of new types of data as well as queries. However, work in this area is guided by related research from the spatial and temporal domains, so far, with little attention towards the true nature of spatiotemporal phenomena. In this work the focus is on a spatiotemporal sub-domain, namely moving point objects. We present new types of spatiotemporal queries, as well as algorithms to process those. Further, we introduce two access methods to index these kinds of data, namely the Spatio-Temporal R-tree (STR-tree) and the Trajectory-Bundle tree (TB-tree). The former is an R-tree based access method that considers the trajectory identity in the index as well, while the latter is a hybrid structure, which preserves trajectories as well as allows for R-tree typical range search in the data. We present performance studies that compare the two indices with the R-tree (appropriately modified, for a fair comparison) under a varying set of spatiotemporal queries and provide guidelines for a successful choice among them.


1999 Top

Tryfona, N., C. S. Jensen ,"Conceptual Data Modeling for Spatiotemporal Applications" in Geoinformatica, Vol. 3, No. 3, pp. 245-268, 1999

Publication
Online at Springer

Many exciting potential application areas for database technology manage time-varying, spatial information. In contrast, existing database techniques, languages, and associated tools provide little built-in support for the management of such information. The focus of this paper is on enhancing existing conceptual data models with new constructs, improving their ability to conveniently model spatiotemporal aspects of information. The goal is to speed up the data modeling process and to make diagrams easier to comprehend and maintain. Based on explicitly formulated ontological foundations, the paper presents a small set of new, generic modeling constructs that may be introduced into different conceptual data models. The ER model is used as the concrete context for presenting the constructs. The semantics of the resulting spatiotemporal ER model, STER, is given in terms of the underlying ER model. STER is accompanied by a textual counterpart, and a CASE tool based on STER is currently being implemented, using the textual counterpart as its internal representation.


Gregersen, H., C. S. Jensen ,"Temporal Entity-Relationship Models - a Survey" in IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 3, pp. 464-497, 1999

Publication

The Entity-Relationship (ER) Model, using varying notations and with some semantic variations, is enjoying a remarkable, and increasing, popularity in the research community, the computer science curriculum, and industry. In step with the increasing diffusion of relational platforms, ER modeling is growing in popularity. It has been widely recognized that temporal aspects of database schemas are prevalent and difficult to model using the ER model. As a result, how to enable the ER model to properly capture time-varying information has for a decade and a half been an active area of research in the database community. This has led to the proposal of almost a dozen temporally enhanced ER models. This paper surveys all temporally enhanced ER models known to the authors. It is the first paper to provide a comprehensive overview of temporal ER modeling, and it thus meets a need for consolidating and providing easy access to the research in temporal ER modeling. In the presentation of each model, the paper examines how the time-varying information is captured in the model and presents the new concepts and modeling constructs of the model. A total of 19 different design properties for temporally enhanced ER models are defined, and each model is characterized according to these properties.


Jensen, C. S., R. T. Snodgrass ,"Temporal Data Management" in IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 1, pp. 36-44, 1999

Publication

A wide range of database applications manage time-varying information. Existing database technology currently provides little support for managing such data. The research area of temporal databases has made important contributions in characterizing the semantics of such information and in providing expressive and efficient means to model, store, and query temporal data. This paper introduces the reader to temporal data management, surveys state-of-the-art solutions to challenging aspects of temporal data management, and points to research directions.


Gregersen, H., C. S. Jensen ,"On the Ontological Expressiveness of Temporal Extensions to the Entity-Relationship Model" in Proceedings of the First International Workshop on Evolution and Change in Data Management, Versailles, France, pp. 110-121, 1999

Publication

It is widely recognized that temporal aspects of database schemas are prevalent, but also difficult to capture using the ER model. The database research community's response has been to develop temporally enhanced ER models. However, these models have not been subjected to systematic evaluation. In contrast, the evaluation of modeling methodologies for information systems development is a very active area of research in the information systems engineering community, where the need for systematic evaluations of modeling methodologies is well recognized. Based on a framework from information systems engineering, this paper evaluates the ontological expressiveness of three different temporal enhancements to the ER model, the Entity-Relation-Time model, the TERC+ model, and the Time Extended ER model. Each of these temporal ER model extensions is well-documented, and together the models represent a substantial range of the design space for temporal ER extensions. The evaluation considers the uses of the models for both analysis and design, and the focus is on how well the models capture temporal aspects of reality as well as of relational database designs.


Dyreson, C. E., M. H. Böhlen, C. S. Jensen ,"Capturing and Querying Multiple Aspects of Semistructured Data" in Proceedings of the Twenty-fifth International Conference on Very Large Databases, pp. 290-301, 1999

Publication

Motivated to a large extent by the substantial and growing prominence of the World-Wide Web and the potential benefits that may be obtained by applying database concepts and techniques to web data management, new data models and query languages have emerged that contend with web data. These models organize data in graphs where nodes denote objects or values and edges are labeled with single words or phrases. Nodes are described by the labels of the paths that lead to them, and these descriptions serve as the basis for querying. This paper proposes an extensible framework for capturing and querying meta-data properties in a semistructured data model. Properties such as temporal aspects of data, prices associated with data access, quality ratings associated with the data, and access restrictions on the data are considered. Specifically, the paper defines an extensible data model and an accompanying query language that provides new facilities for matching, slicing, collapsing, and coalescing properties. It also briefly introduces an implemented, SQL-like query language for the extended data model that includes additional constructs for the effective querying of graphs with properties.


Pedersen, T. B., C. S. Jensen, C. E. Dyreson ,"Extending Practical Pre-Aggregation in On-Line Analytical Processing" in Proceedings of the Twenty-fifth International Conference on Very Large Databases, pp. 663-674, 1999

Publication

On-Line Analytical Processing (OLAP) based on a dimensional view of data is being used increasingly for the purpose of analyzing very large amounts of data. To improve query performance, modern OLAP systems use a technique known as practical pre-aggregation, where select combinations of aggregate queries are materialized and re-used to compute other aggregates; full preaggregation, where all combinations of aggregates are materialized, is infeasible. However, this reuse of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints, which severely limits the scope of practical preaggregation. This paper significantly extends the scope of practical pre-aggregation to cover a much wider range of realistic situations. Specifically, algorithms are given that transform irregular dimension hierarchies and fact-dimension relationships, which often occur in real-world OLAP applications, into well-behaved structures that, when used by existing OLAP systems, enable practical pre-aggregation. The algorithms have low computational complexity and may be applied incrementally to reduce the cost of updating OLAP structures.
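
As a toy illustration of the flavor of such transformations (not the paper's algorithms), the following Python sketch pads an irregular hierarchy with placeholder values so that every value has a parent on the level immediately above; the representation and names are our assumptions.

# A toy transformation: an irregular hierarchy where some low-level values link
# directly to a higher level is padded with placeholder values so that every
# value has a parent on the level immediately above it.
def make_covering(levels, parent):
    """levels: list of sets, bottom level first. parent: dict mapping a value to its
    (possibly higher-level) parent. Returns a new mapping with placeholders inserted."""
    fixed = dict(parent)
    for i in range(len(levels) - 1):
        for v in list(levels[i]):
            p = fixed[v]
            if p not in levels[i + 1]:              # the parent skips level i + 1
                placeholder = ("placeholder", v, i + 1)
                levels[i + 1].add(placeholder)
                fixed[v] = placeholder
                fixed[placeholder] = p              # the placeholder links up to the old parent
    return fixed

levels = [{"shop1", "shop2"}, {"cityA"}, {"countryX"}]
parent = {"shop1": "cityA", "shop2": "countryX",     # shop2 skips the city level
          "cityA": "countryX"}
print(make_covering(levels, parent))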


Tryfona, N., S. Andersen, S. R. Mogensen, C. S. Jensen ,"A Methodology and a Tool for Spatiotemporal Database Design" in Proceedings of the Seventh Hellenic Conference on Informatics, Ioannina, Greece, pp. III 53-60, 1999

Publication

This paper concerns a methodology and its supporting prototype tool for database design of spatiotemporal applications. The methodology focuses on the main phases of conceptual and logical modeling with each phase being accompanied by models specifically constructed to handle spatiotemporal peculiarities. A database design tool that guides the designer through the conceptual and logical modeling as well as implementation, while dealing with applications involving space and time, is further presented. Starting from the conceptual modeling phase, the tool provides a specific environment to support the SpatioTemporal Entity-Relationship (STER) model, an extension of the Entity-Relationship model, towards the spatial and temporal dimension. An intermediate representation phase, namely, the logical phase, follows; in this, conceptual schemata are mapped into maps and relations, using an extension of the relational model, the SpatioTemporal Relational model (STR). Translation rules from conceptual to logical schemata are given. The resulting logical schemata are further translated into different underlying target DBMSs with spatial support; Oracle and the Spatial Data Option are used as a prototype. The STER and STR models, as well as the proposed tool are tested with extended examples from real applications.


Pfoser, D., C. S. Jensen ,"Incremental Join of Time-Oriented Data" in Proceedings of the Eleventh International Conference on Scientific and Statistical Database Management, Cleveland, Ohio, pp. 232-243, 1999

Publication

Data warehouses as well as a wide range of other databases exhibit a strong temporal orientation: it is important to track the temporal variation of data over several months or years. In addition, databases often exhibit append-only characteristics where old data is retained while new data is appended. Performing joins efficiently on large databases such as these is essential to obtain good overall query processing performance. This paper presents a sort-merge-based incremental algorithm for time-oriented data. While incremental computation techniques have proven competitive in many settings, they also introduce a space overhead in the form of differential files. For the temporal data explored here, this overhead is avoided because the differential files are already part of the database. In addition, data is naturally sorted, leaving only merging. The incremental algorithm works in a partitioned storage environment and does not assume the availability of indices, making it a competitor to sort-based and nested-loop joins. The paper presents analytical as well as simulation-based characterizations of the performance of the join.
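
As a rough sketch of incremental join computation on append-only data (shown with a simple hash join rather than the paper's sort-merge approach), the following Python fragment computes only the result tuples contributed by newly appended tuples; names and data layout are illustrative.

# When deltas dA and dB are appended, the new result tuples are dA x B_old,
# A_old x dB, and dA x dB, so the previously computed result need not be redone.
def join(r, s, key=lambda t: t[0]):
    index = {}
    for t in s:
        index.setdefault(key(t), []).append(t)
    return [(a, b) for a in r for b in index.get(key(a), [])]

def incremental_join(result, a_old, b_old, d_a, d_b):
    result = list(result)
    result += join(d_a, b_old) + join(a_old, d_b) + join(d_a, d_b)
    return result

a_old, b_old = [(1, "x")], [(1, "u")]
result = join(a_old, b_old)                       # initial join
d_a, d_b = [(2, "y")], [(1, "v"), (2, "w")]       # newly appended tuples
print(incremental_join(result, a_old, b_old, d_a, d_b))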


Pedersen, T. B., C. S. Jensen, C. E. Dyreson ,"Supporting Imprecision in Multidimensional Databases Using Granularities" in Proceedings of the Eleventh International Conference on Scientific and Statistical Database Management, Cleveland, Ohio, pp. 90-101, 1999

Publication

On-Line Analytical Processing (OLAP) technologies are being used widely, but the lack of effective means of handling data imprecision, which occurs when exact values are not known precisely or are entirely missing, represents a major obstacle in applying these technologies in many domains. This paper develops techniques for handling imprecision that aim to maximally reuse existing OLAP modeling constructs such as dimension hierarchies and granularities. With imprecise data available in the database, queries are tested to determine whether or not they may be answered precisely given the available data; if not, alternative queries unaffected by the imprecision are suggested. When processing queries affected by imprecision, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. The approach is capable of exploiting existing OLAP query processing techniques such as pre-aggregation, yielding an effective approach with low computational overhead and that may be implemented using current technology.
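
As a minimal, hedged illustration of one ingredient only (not the paper's full technique), the following Python sketch checks whether a query grouped at a given granularity can be answered precisely, given the granularities at which the individual facts are known; the granularity lattice is simplified here to a single chain.

# Each fact carries the granularity at which its dimension value is known; a
# query grouped at granularity g can be answered precisely only if every fact
# is known at g or at a finer granularity.
GRANULARITY_ORDER = ["day", "month", "year"]   # finer to coarser

def answerable_precisely(facts, query_granularity):
    q = GRANULARITY_ORDER.index(query_granularity)
    return all(GRANULARITY_ORDER.index(g) <= q for _, g in facts)

facts = [("2024-03-12", "day"), ("2024-03", "month")]
print(answerable_precisely(facts, "day"))    # False: one fact is only known per month
print(answerable_precisely(facts, "month"))  # True: both facts roll up to a unique month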


Pfoser, D., C. S. Jensen ,"Capturing the Uncertainty of Moving-Object Representations" in Proceedings of the Sixth International Symposium on Spatial Databases, Hong Kong, pp. 111-132, 1999

Publication
Online at Springer

Spatiotemporal applications, such as fleet management and air traffic control, involving continuously moving objects are increasingly at the focus of research efforts. The representation of the continuously changing positions of the objects is fundamentally important in these applications. This paper reports on on-going research in the representation of the positions of moving-point objects. More specifically, object positions are sampled using the Global Positioning System, and interpolation is applied to determine positions in-between the samples. Special attention is given in the representation to the quantification of the position uncertainty introduced by the sampling technique and the interpolation. In addition, the paper considers the use for query processing of the proposed representation in conjunction with indexing. It is demonstrated how queries involving uncertainty may be answered using the standard filter-and-refine approach known from spatial query processing.
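
As a rough sketch of the representation idea (the uncertainty model below is a placeholder, not the paper's), the following Python fragment linearly interpolates between position samples and attaches an uncertainty radius that grows with the temporal distance to the nearest sample.

# Positions are sampled; positions in between are linearly interpolated, and an
# uncertainty radius is attached to interpolated positions.
def interpolate(samples, t, uncertainty_per_second=0.5):
    """samples: list of (t, x, y) sorted by t. Returns ((x, y), radius) at time t."""
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            f = (t - t0) / (t1 - t0)
            pos = (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
            # the temporal distance to the nearest sample governs the assumed uncertainty
            radius = uncertainty_per_second * min(t - t0, t1 - t)
            return pos, radius
    raise ValueError("time outside the sampled trajectory")

gps = [(0, 0.0, 0.0), (10, 100.0, 0.0), (20, 100.0, 50.0)]
print(interpolate(gps, 5))    # halfway along the first segment, radius 2.5
print(interpolate(gps, 10))   # exactly at a sample, radius 0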


Bliujute, R., S. Šaltenis, G. Slivinskas, C. S. Jensen ,"Developing a DataBlade for a New Index" in Proceedings of the Fifteenth IEEE International Conference on Data Engineering, Sydney, Australia, pp. 314-323, 1999

Publication

In order to better support current and new applications, the major DBMS vendors are stepping beyond uninterpreted binary large objects, termed BLOBs, and are beginning to offer extensibility features that allow external developers to extend the DBMS with, e.g., their own data types and accompanying access methods. Existing solutions include DB2 extenders, Informix DataBlades, and Oracle cartridges. Extensible systems offer new and exciting opportunities for researchers and third-party developers alike. This paper reports on an implementation of an Informix DataBlade for the GR-tree, a new R-tree based index. This effort represents a stress test of the perhaps currently most extensible DBMS, in that the new DataBlade aims to achieve better performance, not just to add functionality. The paper provides guidelines for how to create an access method DataBlade, describes the sometimes surprising challenges that must be negotiated during DataBlade development, and evaluates the extensibility of the Informix Dynamic Server.


Pedersen, T. B., C. S. Jensen ,"Multidimensional Data Modeling for Complex Data" in Proceedings of the Fifteenth IEEE International Conference on Data Engineering, Sydney, Australia, pp. 336-345, 1999

Publication

On-Line Analytical Processing (OLAP) systems considerably ease the process of analyzing business data and have become widely used in industry. Such systems primarily employ multidimensional data models to structure their data. However, current multidimensional data models fall short in their abilities to model the complex data found in some real-world application domains. The paper presents nine requirements to multidimensional data models, each of which is exemplified by a real-world, clinical case study. A survey of the existing models reveals that the requirements not currently met include support for many-to-many relationships between facts and dimensions, built-in support for handling change and time, and support for uncertainty as well as different levels of granularity in the data. The paper defines an extended multidimensional data model, and an associated algebra, which address all nine requirements.


Böhlen, M. H., C. S. Jensen, M. Scholl, editors ,"Spatio-Temporal Database Management" in Proceedings of the International Workshop on Spatio-Temporal Database Management, Edinburgh, Scotland, September, Lecture Notes in Computer Science, Volume 1678, Springer-Verlag, 1999

Online at Springer



Frank, A., R. H. Güting, C. S. Jensen, M. Koubarakis, N. Lorentzos, Y. Manolopoulos, E. Nardelli, B. Pernici, H.-J. Schek, M. Scholl, T. Sellis, B. Theodoulidis, P. Widmayer ,"Chorochronos: A Research Network for Spatiotemporal Database Systems" in ACM SIGMOD Record, Vol. 28, No. 3, pp. 12-21. (Submissions go through a mini-review process.), 1999

Publication
ACM Author-Izer



Jensen, C. S. ,"Review - Equivalence of Relational Algebra and Relational Calculus Query Languages Having Aggregate Functions" in ACM SIGMOD Digital Review, Vol. 1, 1999

Online at DBLP



Jensen, C. S. ,"Review - Multi-Step Processing of Spatial Joins" in ACM SIGMOD Digital Review, Vol. 1, 1999

Online at DBLP



Jensen, C. S. ,"Review - R-Trees: A Dynamic Index Structure for Spatial Searching" in ACM SIGMOD Digital Review, Vol. 1, 1999

Online at DBLP



Jensen, C. S. ,"Databasen er strategisk" in (in Danish), Computerworld, November 5, page 27, 1999




Price, R., N. Tryfona, C. S. Jensen ,"A Conceptual Modeling Language for Spatiotemporal Applications" in Chorochronos Technical Report CH-99-20, 1999

Publication

This paper presents a conceptual modeling language for spatiotemporal applications that offers built-in support for capturing geo-referenced, time-varying information. More specifically, the well-known object-oriented Unified Modeling Language (UML) is extended to capture the semantics of space and time as they appear in spatiotemporal applications. Language clarity and simplicity are maintained in the new language, the Extended Spatiotemporal UML, which introduces a small base set of modeling constructs, namely, the spatial, temporal and thematic constructs, which can then be combined and applied at different levels (i.e., attribute, association, object class) in the object-oriented model. An example is used to illustrate the simplicity and flexibility of this approach, and a formal functional specification of the semantic constructs and their symbolic combinations is given.


Šaltenis, S., C. S. Jensen, S. Leutenegger, M. Lopez ,"Indexing the Positions of Continuously Moving Objects" in TimeCenter Technical Report TR-44, November 1999, 27 pages, and Chorochronos Technical Report CH-99-19, 26 pages, 1999

Publication

The coming years will witness dramatic advances in wireless communications as well as positioning technologies. As a result, tracking the changing positions of objects capable of continuous movement is becoming increasingly feasible and necessary. The present paper proposes a novel, R*-tree based indexing technique that supports the efficient querying of the current and projected future positions of such moving objects. The technique is capable of indexing objects moving in one-, two-, and three-dimensional space. Update algorithms enable the index to accommodate a dynamic data set, where objects may appear and disappear, and where changes occur in the anticipated positions of existing objects. In addition, a bulkloading algorithm is provided for building and rebuilding the index. A comprehensive performance study is reported.


Šaltenis, S., C. S. Jensen ,"R-Tree Based Indexing of General Spatio-Temporal Data" in TimeCenter Technical Report TR-45, December 1999, 23 pages, and Chorochronos Technical Report CH-99-18, 22 pages, 1999

Publication

Real-world objects are inherently spatially and temporally referenced, and many database applications rely on databases that record the past, present, and anticipated future locations of, e.g., people or land parcels. As a result, indices that efficiently support queries on the spatio-temporal extents of objects are needed. In contrast, past indexing research has progressed in largely separate spatial and temporal streams. In the former, focus has been on one-, two-, or three-dimensional space; and in the latter, focus has been on one or both of the temporal aspects, or dimensions, of data known as transaction time and valid time. Adding time dimensions to spatial indices, as if time was a spatial dimension, neither supports nor exploits the special properties of time. On the other hand, temporal indices are generally not amenable to extension with spatial dimensions. This paper proposes an efficient and versatile technique for the indexing of spatio-temporal data with discretely changing spatial extents: the spatial aspect of an object may be a point or may have an extent; both the transaction time and valid time are supported; and a generalized notion of the current time, now, is accommodated for the temporal dimensions. The technique extends the previously proposed R*-tree and borrows from the GR-tree, and it provides means of prioritizing space versus time, enabling it to adapt to spatially and temporally restrictive queries. Performance experiments were performed to evaluate different aspects of the proposed indexing technique, and are included in the paper.


Torp, K., C. S. Jensen, R. T. Snodgrass ,"Modification of Now-Relative Databases" in TimeCenter Technical Report TR-43, 37 pages, 1999

Publication

Most real-world databases record time-varying information. In such databases, the notion of "the current time," or "now", occurs naturally and prominently. For example, when capturing the past states of a relation using begin and end time attributes, tuples that are part of the current state have some past time as their begin time and "now" as their end time. While the semantics of such variable databases has been described in detail and is well understood, the modification of variable databases remains unexplored. This paper defines the semantics of modifications involving the variable "now". More specifically, the problems with modifications in the presence of "now" are explored, illustrating that the main problems are with modifications and tuples that reach into the future. The paper defines the semantics of modifications - including insertions, deletions, and updates - of databases without "now", with "now", and with values of the type "now" + Delta, where Delta is a non-variable time duration. To accommodate these semantics, three new timestamp values are introduced. An approximate semantics that does not rely on new timestamp values is also provided. Finally, implementation is explored.
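
As a hedged, simplified illustration (not the paper's semantics, which also covers "now" + Delta and new timestamp values), the following Python sketch represents the variable "now" with a sentinel and shows a current-state update that closes the old version at the update time; all names are ours.

# Representing "now" with a sentinel and updating the current state of a table
# with valid-time attributes: the old version is closed at the update time and a
# new version opens with end time "now".
NOW = None   # sentinel for the variable "now"

def current_rows(table, t):
    return [r for r in table
            if r["vt_begin"] <= t and (r["vt_end"] is NOW or t < r["vt_end"])]

def update_current(table, key, new_value, t):
    for r in current_rows(table, t):
        if r["id"] == key:
            r["vt_end"] = t                                   # close the old version
    table.append({"id": key, "value": new_value, "vt_begin": t, "vt_end": NOW})

table = [{"id": 1, "value": "assistant", "vt_begin": 0, "vt_end": NOW}]
update_current(table, key=1, new_value="professor", t=7)
print(current_rows(table, 8))   # only the "professor" row is current at time 8
print(current_rows(table, 3))   # the "assistant" row was current at time 3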


Pedersen, T. B., C. S. Jensen, C. E. Dyreson ,"Extending Practical Pre-Aggregation in On-Line Analytical Processing" in Technical Report R-99-5004, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 33 pages, 1999

Publication

On-Line Analytical Processing (OLAP) based on a dimensional view of data is being used increasingly in traditional business applications as well as in applications such as health care for the purpose of analyzing very large amounts of data. Pre-aggregation, the prior materialization of aggregate queries for later use, is an essential technique for ensuring adequate response time during data analysis. Full pre-aggregation, where all combinations of aggregates are materialized, is infeasible. Instead, modern OLAP systems adopt the practical pre-aggregation approach of materializing only select combinations of aggregates and then re-use these for efficiently computing other aggregates. However, this re-use of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints. This severely limits the scope of the practical pre-aggregation approach. This paper significantly extends the scope of practical pre-aggregation to cover a much wider range of realistic situations. Specifically, algorithms are given that transform irregular dimension hierarchies and fact-dimension relationships, which often occur in real-world OLAP applications, into well-behaved structures that, when used by existing OLAP systems, enable practical pre-aggregation. The algorithms have low computational complexity and may be applied incrementally to reduce the cost of updating OLAP structures.


Pfoser, D., Y. Theodoridis, C. S. Jensen ,"Indexing Trajectories of Moving Point Objects" in Chorochronos Technical Report CH-99-3, 23 pages, 1999

Publication

Spatiotemporal applications attract more and more attention from both researchers and application developers. Especially the peculiarities of spatiotemporal data are the focus of an increasing research effort. In this paper we extend the well-known R-tree method to handle trajectory data stemming from moving point objects. The resulting access method, termed (Spatio-Temporal) STR-tree, differs from the R-tree in that it stores additional information in the entries at the leaf level and, further, has modified insertion and split algorithms. Besides the description of the STR-tree algorithms, we provide an extensive performance study examining the behaviour of the new method as compared to the original R-tree under a varying set of queries and datasets. The collection of queries comprises the typical point and range queries as well as pure spatiotemporal queries based on the semantics of objects' trajectories, the so-called trajectory and navigational queries.


Pedersen, T. B., C. S. Jensen, C. E. Dyreson ,"Supporting Imprecision in Multidimensional Databases Using Granularities" in Technical Report R-99-5003, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 29+iii pages, 1999

Publication

On-Line Analytical Processing (OLAP) technologies are being used widely for business-data analysis, and these technologies are also being used increasingly in medical applications, e.g., for patient-data analysis. The lack of effective means of handling data imprecision, which occurs when exact values are not known precisely or are entirely missing, represents a major obstacle in applying OLAP technology to the medical domain, as well as many other domains. OLAP systems are mainly based on a multidimensional model of data and include constructs such as dimension hierarchies and granularities. This paper develops techniques for the handling of imprecision that aim at maximally reusing these already existing constructs. With imprecise data now available in the database, queries are tested to determine whether or not they may be answered precisely given the available data; if not, alternative queries that are unaffected by the imprecision are suggested. When a user elects to proceed with a query that is affected by imprecision, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. The approach is capable of exploiting existing multidimensional query processing techniques such as pre-aggregation, yielding an effective approach with low computational overhead and that may be implemented using current technology. The paper illustrates how to implement the approach using SQL databases.


1998 Top

Torp, K., L. Mark, C. S. Jensen, "Efficient Differential Timeslice Computation" in IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 4, pp. 599-611, 1998

Publication

Transaction-time databases support access to not only the current database state, but also previous database states. Supporting access to previous database states requires large quantities of data and necessitates efficient temporal query processing techniques. In previous work, we have presented a log-based storage structure and algorithms for the differential computation of previous database states. Timeslices - i.e., previous database states - are computed by traversing a log of database changes, using previously computed and cached timeslices as outsets. When computing a new timeslice, the cache will contain two candidate outsets: an earlier outset and a later outset. The new timeslice can be computed by either incrementally updating the earlier outset or decrementally "downdating" the later outset using the log. The cost of this computation is determined by the size of the log between the outset and the new timeslice. This paper proposes an efficient algorithm that identifies the cheaper outset for the differential computation. The basic idea is to compute the sizes of the two pieces of the log by maintaining and using a tree structure on the timestamps of the database changes in the log. The lack of a homogeneous node structure, of a controllable and high fill-factor for nodes, and of appropriate node allocation renders existing tree structures (e.g., B+-trees, Monotonic B+-trees, and Append-only trees) unsuited for our use. Consequently, a specialized tree structure, the Pointer-less Insertion tree, is developed to support the algorithm. As a proof of concept, we have implemented a main memory version of the algorithm and its tree structure.
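
The outset-selection idea can be sketched as follows; this toy version counts log entries with a binary search over a sorted list of change timestamps, whereas the paper maintains a dedicated tree structure (the Pointer-less Insertion tree) for this purpose.

    # Minimal sketch of the outset choice: given a cached earlier outset, a cached
    # later outset, and a log of timestamped changes, pick the outset with the
    # smaller stretch of log to traverse.

    import bisect

    def cheaper_outset(log_times, t_earlier, t_later, t_query):
        """Return 'earlier' or 'later' depending on which piece of the log is smaller.
        log_times is the sorted list of change timestamps."""
        forward = bisect.bisect_right(log_times, t_query) - bisect.bisect_right(log_times, t_earlier)
        backward = bisect.bisect_right(log_times, t_later) - bisect.bisect_right(log_times, t_query)
        return "earlier" if forward <= backward else "later"

    log = [1, 3, 4, 7, 9, 12, 15, 20]       # timestamps of logged changes
    print(cheaper_outset(log, t_earlier=3, t_later=20, t_query=16))   # -> 'later'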


Bliujute, R., C. S. Jensen, S. Šaltenis, G. Slivinskas, "R-tree-based Indexing of Now-Relative Bitemporal Data" in Proceedings of the 24th International Conference on Very Large Databases, New York City, NY, pp. 345-356, 1998

Publication

The databases of a wide range of applications, e.g., in data warehousing, store multiple states of time-evolving data. These databases contain a substantial part of now-relative data: data that became valid at some past time and remains valid until the current time. More specifically, two temporal aspects of data are frequently of interest, namely valid time, when data is true, and transaction time, when data is current in the database, leading to bitemporal data. Little work, based mostly on R-trees, has addressed the indexing of bitemporal data. No indices exist that contend well with now-relative data, which leads to temporal data regions that are continuous functions of time. The paper proposes two extended R*-trees that permit the indexing of data regions that grow continuously over time, by also letting the internal bounding regions grow. Internal bounding regions may be triangular as well as rectangular. New heuristics for the algorithms that govern the index structure are provided. As a result, dead space and overlap, now also functions of time, are reduced. Performance studies indicate that the best extended index is typically 3-5 times faster than the existing R-tree based indices.
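
The growing-region phenomenon itself is easy to illustrate: the valid-time extent of a now-relative tuple, and hence any bounding region that encloses it, is a function of the current time. A minimal sketch, with integer times and None standing in for "now":

    # Minimal sketch: the extent of a now-relative interval at wall-clock time t
    # is [start, t], so a bounding region enclosing such data grows with t.

    def extent_at(start, end, t):
        """end is None for now-relative data ('valid until the current time')."""
        return (start, t if end is None else end)

    tuples = [(3, None), (1, 4)]        # (valid_start, valid_end)

    def bounding_region(tuples, t):
        exts = [extent_at(s, e, t) for (s, e) in tuples]
        return (min(s for s, _ in exts), max(e for _, e in exts))

    print(bounding_region(tuples, t=10))   # (1, 10) -- grows as t advances
    print(bounding_region(tuples, t=20))   # (1, 20)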


Torp, K., C. S. Jensen, R. T. Snodgrass, "Stratum Approaches to Temporal DBMS Implementation" in Proceedings of the 1998 International Database Engineering and Applications Symposium, Cardiff, Wales, U.K., pp. 4-13, IEEE Computer Society, 1998

Publication

Previous approaches to implementing temporal DBMSs have assumed that a temporal DBMS must be built from scratch, employing an integrated architecture and using new temporal implementation techniques such as temporal indexes and join algorithms. However, this is a very large and time-consuming task. The paper explores approaches to implementing a temporal DBMS as a stratum on top of an existing non-temporal DBMS, rendering implementation more feasible by reusing much of the functionality of the underlying conventional DBMS. More specifically, the paper introduces three stratum meta-architectures, each with several specific architectures. Based on a new set of evaluation criteria, advantages and disadvantages of the specific architectures are identified. The paper also classifies all existing temporal DBMS implementations according to the specific architectures they employ. It is concluded that a stratum architecture is the best short, medium, and perhaps even long-term, approach to implementing a temporal DBMS.


Pedersen, T. B., C. S. Jensen, "Research Issues in Clinical Data Warehousing" in Proceedings of the Tenth International Conference on Scientific and Statistical Database Management, Capri, Italy, pp. 43-52, IEEE Computer Society, 1998

Publication

Medical informatics has been an important area for the application of computing and database technology for at least four decades. This area may benefit from the functionality offered by data warehousing. However, the special nature of clinical applications poses new and different requirements to data warehousing technologies beyond those posed by conventional data warehouse applications. This article presents a number of exciting new research challenges posed by clinical applications, to be met by the database research community. These include the need for complex-data modeling features, advanced temporal support, advanced classification structures, continuously valued data, dimensionally reduced data, and the integration of very complex data. In addition, support for clinical treatment protocols and medical research constitutes an interesting area for research.


Pedersen, T. B., C. S. Jensen, "Clinical Data Warehousing - A Survey" in Proceedings of the VIII Mediterranean Conference on Medical and Biological Engineering and Computing, Lemesos, Cyprus, Section 20.3, 6 pages (CD-rom proceedings), 1998

Publication

In this article, we present the concept of data warehousing and its use in the clinical area. Clinical data warehousing will become very important in the near future, as healthcare enterprises need to gain more information from their clinical, administrative, and financial data in order to improve quality and reduce costs. Adoption of data warehousing in health care has been slowed by a lack of understanding of the benefits offered by the technology. This paper contributes the needed understanding by introducing the opportunities offered by data warehousing, describing current efforts in the area, and providing criteria for comparing clinical data warehouse systems.


Bliujute, R., S. Šaltenis, G. Slivinskas, C. S. Jensen, "Systematic Change Management in Dimensional Data Warehousing" in Proceedings of the Third International Baltic Workshop on DB and IS, Riga, Latvia, pp. 27-41, 1998

Publication

With the widespread and increasing use of data warehousing in industry, the design of effective data warehouses and their maintenance has become a focus of attention. Independently of this, the area of temporal databases has been an active area of research for well beyond a decade. This article identifies shortcomings of so-called star schemas, which are widely used in industrial warehousing, in their ability to handle change and subsequently studies the application of temporal techniques for solving these shortcomings. Star schemas represent a new approach to database design and have gained widespread popularity in data warehousing, but while they have many attractive properties, star schemas do not contend well with so-called slowly changing dimensions and with state-oriented data. We study the use of so-called temporal star schemas that may provide a solution to the identified problems while not fundamentally changing the database design approach. More specifically, we study the relative database size and query performance when using regular star schemas and their temporal counterparts for state-oriented data. We also offer some insight into the relative ease of understanding and querying databases with regular and temporal star schemas.
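
A minimal illustration of the temporal star schema idea, using plain SQL via sqlite3 (table and column names are hypothetical, and this is not the paper's full design): dimension rows carry explicit valid-time columns, and facts join to the dimension version that was valid on the fact's date.

    # Minimal sketch: a dimension table that keeps row versions with valid-time
    # columns instead of overwriting them (a "slowly changing dimension").

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE customer_dim (
        customer_id INTEGER,
        region      TEXT,
        valid_from  TEXT,       -- inclusive
        valid_to    TEXT        -- exclusive; '9999-12-31' means 'until changed'
    );
    CREATE TABLE sales_fact (
        customer_id INTEGER,
        sale_date   TEXT,
        amount      REAL
    );
    """)
    con.executemany("INSERT INTO customer_dim VALUES (?,?,?,?)",
                    [(1, "North", "1997-01-01", "1998-06-01"),
                     (1, "South", "1998-06-01", "9999-12-31")])
    con.execute("INSERT INTO sales_fact VALUES (1, '1998-03-15', 100.0)")

    # Join each fact to the dimension row valid on the sale date.
    rows = con.execute("""
    SELECT f.sale_date, d.region, f.amount
    FROM sales_fact f JOIN customer_dim d
      ON f.customer_id = d.customer_id
     AND f.sale_date >= d.valid_from AND f.sale_date < d.valid_to
    """).fetchall()
    print(rows)    # [('1998-03-15', 'North', 100.0)]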


Böhlen, M. H., C. S. Jensen, B. Skjellaug, "Spatio-Temporal Database Support for Legacy Applications" in Proceedings of the 1998 ACM Symposium on Applied Computing, Atlanta, Georgia, pp. 226-234, 1998

Publication
ACM Author-Izer

In areas such as finance, marketing, and property and resource management, many database applications manage spatio-temporal data. These applications typically run on top of a relational DBMS and manage spatio-temporal data either using the DBMS, which provides little support, or employ the services of a proprietary system that co-exists with the DBMS, but is separate from and not integrated with the DBMS. This wealth of applications may benefit substantially from built-in, integrated spatio-temporal DBMS support. Providing a foundation for such support is an important and substantial challenge. This paper initially defines technical requirements to a spatio-temporal DBMS aimed at protecting business investments in the existing legacy applications and at reusing personnel expertise. These requirements provide a foundation for making it economically feasible to migrate legacy applications to a spatio-temporal DBMS. The paper next presents the design of the core of a spatio-temporal, multi-dimensional extension to SQL-92, called STSQL, that satisfies the requirements. STSQL does so by supporting so-called upward compatible, dimensional upward compatible, reducible, and non-reducible queries. In particular, dimensional upward compatibility and reducibility were designed to address migration concerns and complement proposals based on abstract data types.


Böhlen, M. H., R. Busatto, C. S. Jensen, "Point versus Interval-based Temporal Data Models" in Proceedings of the Fourteenth IEEE International Conference on Data Engineering, Orlando, Florida, pp. 192-200, 1998

Publication

The association of timestamps with various data items such as tuples or attribute values is fundamental to the management of time-varying information. Using intervals in timestamps, as do most data models, leaves a data model with a variety of choices for giving a meaning to timestamps. Specifically, some such data models claim to be point-based while other data models claim to be interval-based. The meaning chosen for timestamps is important: it has a pervasive effect on most aspects of a data model, including database design, a variety of query language properties, and query processing techniques, e.g., the availability of query optimization opportunities. This paper precisely defines the notions of point-based and interval-based temporal data models, thus providing a new, formal basis for characterizing temporal data models and obtaining new insights into the properties of their query languages. Queries in point-based models treat snapshot equivalent argument relations identically. This renders point-based models insensitive to coalescing. In contrast, queries in interval-based models give significance to the actual intervals used in the timestamps, thus generally treating non-identical, but possibly snapshot equivalent, relations differently. The paper identifies the notion of time-fragment preservation as the essential defining property of an interval-based data model.
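
Snapshot equivalence can be illustrated with a small sketch (integer time points and half-open intervals assumed): two relations with differently split intervals have identical timeslices at every point, so a point-based semantics cannot tell them apart, while an interval-based semantics can.

    # Minimal sketch of snapshot equivalence: relations are snapshot equivalent if
    # their timeslices agree at every time point.

    def snapshot(relation, t):
        """Facts valid at time point t; tuples are (fact, start, end) with end exclusive."""
        return {fact for (fact, s, e) in relation if s <= t < e}

    r1 = {("Bob works", 1, 5)}
    r2 = {("Bob works", 1, 3), ("Bob works", 3, 5)}      # same content, split intervals

    print(all(snapshot(r1, t) == snapshot(r2, t) for t in range(0, 7)))   # True
    print(r1 == r2)                                                       # False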


Snodgrass, R. T., M. H. Böhlen, C. S. Jensen, A. Steiner, "Transitioning Temporal Support in TSQL2 to SQL3" in O. Etzion, S. Jajodia, and S. Sripada, editors, Temporal Databases: Research and Practice, Lecture Notes in Computer Science 1399, Springer-Verlag, pp. 150-194 (Proceedings of the Dagstuhl Seminar on Temporal Databases, Schloss Dagstuhl, Germany), 1998

Publication

This document summarizes the proposals before the SQL3 committees to allow the addition of tables with valid-time and transaction-time support into SQL/Temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to one encompassing temporal support. Initially, important requirements to a temporal system that may facilitate such a transition are motivated and discussed. The proposal then describes the language additions necessary to add valid-time support to SQL3 while fulfilling these requirements. The constructs of the language are divided into four levels, with each level adding increased temporal functionality to its predecessor. A prototype system implementing these constructs on top of a conventional DBMS is publicly available.


C. S. Jensen, C. E. Dyreson (editors, with multiple other contributors), "A Consensus Glossary of Temporal Database Concepts - February 1998 Version" in O. Etzion, S. Jajodia, and S. Sripada, editors, Temporal Databases: Research and Practice, Lecture Notes in Computer Science 1399, Springer-Verlag, pp. 367-405, 1998

Publication

This document contains definitions of a wide range of concepts specific to and widely used within temporal databases. In addition to providing definitions, the document also includes explanations of concepts as well as discussions of the adopted names. The consensus effort that led to this glossary was initiated in early 1992. Earlier versions appeared in SIGMOD Record in September 1992 and March 1994. The present glossary subsumes all the previous documents. The glossary meets the need for creating a higher degree of consensus on the definition and naming of temporal database concepts. Two sets of criteria are included. First, all included concepts were required to satisfy four relevance criteria, and, second, the naming of the concepts was resolved using a set of evaluation criteria. The concepts are grouped into three categories: concepts of general database interest, of temporal database interest, and of specialized interest.


Pedersen, T. B., C. S. Jensen, "Clinical Data Warehousing - A Survey" presented at Conference on Healthcare Computing 1998, Harrogate, United Kingdom, 6 pages, 1998

Publication

In this article, we present the concept of data warehousing and its use in the clinical area. Clinical data warehousing will become very important in the near future, as healthcare enterprises need to gain more information from their clinical, administrative, and financial data in order to improve quality and reduce costs. Adoption of data warehousing in health care has been slowed by a lack of understanding of the benefits offered by the technology. This paper contributes the needed understanding by introducing the opportunities offered by data warehousing, describing current efforts in the area, and providing criteria for comparing clinical data warehouse systems.


Tsotras, V., C. S. Jensen, R. T. Snodgrass, "An Extensible Notation for Spatiotemporal Index Queries" in ACM SIGMOD Record, Vol. 27, No. 1, pp. 47-53, 1998

Publication
ACM Author-Izer

Temporal, spatial and spatiotemporal queries are inherently multidimensional, combining predicates on explicit attributes with predicates on time dimension(s) and spatial dimension(s). Much confusion has prevailed in the literature on access methods because no consistent notation exists for referring to such queries. As a contribution towards eliminating this problem, we propose a new and simple notation for spatiotemporal queries. The notation aims to address the selection-based spatiotemporal queries commonly studied in the literature of access methods. The notation is extensible and can be applied to more general multidimensional, selection-based queries.


Gregersen, H., L. Mark, C. S. Jensen, "Mapping Temporal ER Diagrams to Relational Schemas" in TimeCenter Technical Report TR-39, 37 pages, 1998

Publication

Many database applications manage information that varies over time, and most of the database schemas for these applications were designed using one of the several versions of the Entity-Relationship (ER) model. In the research community as well as in industry, it is common knowledge that temporal aspects of data are pervasive and important to the applications, but are also difficult to capture using the ER model. The research community has developed temporal ER models, in an attempt to provide modeling constructs that more naturally and elegantly support capturing the temporal aspects. Specifically, the temporal models provide enhanced support for capturing aspects such as lifespans, valid time, and transaction time of data. Because commercial database management systems support neither the ER model nor any temporal ER model as a model for data manipulation, but rather support various versions of the relational model for this purpose, we provide a two-step transformation from temporal ER diagrams, with built-in support for lifespans and valid and transaction time, to relational schemas. The first step of the algorithm translates a temporal ER diagram into relations in a surrogate-based relational target model; and the second step further translates this relational schema into a schema in a lexically-based relational target model.


Tryfona, N., C. S. Jensen, "A Component-Based Conceptual Model for Spatiotemporal Application Design" in Chorochronos Technical Report CH-98-10, 1998

Publication

Conceptual data modeling for complex applications, such as multimedia and spatiotemporal applications, often results in large, complicated and difficult-to-comprehend diagrams. One reason for this is that these diagrams frequently involve repetition of autonomous, semantically meaningful parts that capture similar situations and characteristics. By recognizing such parts and treating them as units, it is possible to simplify the diagrams, as well as the conceptual modeling process. We propose to capture autonomous and semantically meaningful excerpts of diagrams that occur frequently as modeling patterns. Specifically, the paper concerns modeling patterns for conceptual design of spatiotemporal databases. Based on requirements drawn from real applications, it presents a set of modeling patterns that capture spatial, temporal, and spatiotemporal aspects. To facilitate the conceptual design process, these patterns are abbreviated by corresponding spatial, temporal, and spatiotemporal pattern abstractions, termed components. The result is more elegant and less-detailed diagrams that are easier to comprehend, but yet semantically rich. The Entity-Relationship model serves as the context for this study. An extensive example from a real cadastral application illustrates the benefits of using a component-based conceptual model.


Tryfona, N., C. S. Jensen, "Conceptual Modeling for Spatiotemporal Applications" in Chorochronos Technical Report CH-98-8, 1998

Publication

Many exciting potential application areas for database technology manage time-varying, spatial information. In contrast, existing database techniques, languages, and associated tools provide little built-in support for the management of such information. The focus of this paper is on enhancing existing conceptual data models with new constructs, improving their ability to conveniently model spatiotemporal aspects of information. The goal is to speed up the data modeling process and to make diagrams easier to comprehend and maintain. Based on explicitly formulated ontological foundations, the paper presents a small set of new, generic modeling constructs that may be introduced into different conceptual data models. The ER model is used as the concrete context for presenting the constructs. The semantics of the resulting spatiotemporal ER model, STER, is given in terms of the underlying ER model. STER is accompanied by a textual counterpart, and a CASE tool based on STER is currently being implemented, using the textual counterpart as its internal representation.


Pedersen, T. B., C. S. Jensen, "Multidimensional Data Modeling for Complex Data" in TimeCenter Technical Report TR-37, 25 pages, 1998

Publication

Systems for On-Line Analytical Processing (OLAP) considerably ease the process of analyzing business data and have become widely used in industry. OLAP systems primarily employ multidimensional data models to structure their data. However, current multidimensional data models fall short in their ability to model the complex data found in some real-world application domains. The paper presents nine requirements to multidimensional data models, each of which is exemplified by a real-world, clinical case study. A survey of the existing models reveals that the requirements not currently met include support for many-to-many relationships between facts and dimensions, built-in support for handling change and time, and support for uncertainty as well as different levels of granularity in the data. The paper defines an extended multidimensional data model, which addresses all nine requirements. Along with the model, we present an associated algebra, and outline how to implement the model using relational databases.


Dyreson, C. E., M. H. Böhlen, C. S. Jensen, "Capturing and Querying Multiple Aspects of Semistructured Data" in TimeCenter Technical Report TR-36, 21 pages, 1998

Publication

Motivated to a large extent by the substantial and growing prominence of the World-Wide Web and the potential benefits that may be obtained by applying database concepts and techniques to web data management, new data models and query languages have emerged that contend with the semistructured nature of web data. These models organize data in graphs. The nodes in a graph denote objects or values, and each edge is labeled with a single word or phrase. Nodes are described by the labels of the paths that lead to them, and these descriptions serve as the basis for querying. This paper proposes an extensible framework for capturing more data semantics in semistructured data models. Inspired by the multidimensional paradigm that finds application in on-line analytical processing and data warehousing, the framework makes it possible to associate values drawn from an extensible set of dimensions with edges. The paper considers dimensions that capture temporal aspects of data, prices associated with data access, quality ratings associated with the data, and access restrictions on the data. In this way, it accommodates notions from temporal databases, electronic commerce, information quality, and database security. The paper defines the extensible data model and an accompanying query language that provides new facilities for matching, slicing, and collapsing the enriched paths and for coalescing edges. The paper describes an implemented, SQL-like query language for the extended data model that includes additional constructs for the effective querying of graphs with enriched paths.
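
As a rough sketch of the general idea, rather than the paper's model or language, the fragment below attaches a dictionary of dimension values to each edge of a labeled graph and filters a label-path match by one such dimension; all names and dimensions are illustrative.

    # Minimal sketch: a labeled graph whose edges carry extra "dimension" values
    # (here, a valid-from year and an access price).

    edges = [
        # (from_node, to_node, label, dimensions)
        ("root", "movie1", "movie", {"valid_from": 1998, "price": 0.0}),
        ("movie1", "title1", "title", {"valid_from": 1998, "price": 0.0}),
        ("movie1", "review1", "review", {"valid_from": 1999, "price": 1.5}),
    ]

    def match_path(edges, start, labels, at_year):
        """Follow a label path from start, keeping only edges valid in at_year."""
        frontier = {start}
        for label in labels:
            frontier = {dst for (src, dst, lab, dims) in edges
                        if src in frontier and lab == label
                        and dims.get("valid_from", at_year) <= at_year}
        return frontier

    print(match_path(edges, "root", ["movie", "review"], at_year=1998))   # set()
    print(match_path(edges, "root", ["movie", "review"], at_year=1999))   # {'review1'}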


Gregersen, H., C. S. Jensen, "Conceptual Modeling of Time-Varying Information" in TimeCenter Technical Report TR-35, 31 pages, 1998

Publication

A wide range of database applications manage information that varies over time. Many of the underlying database schemas of these were designed using one of the several versions, with varying syntax and semantics, of the Entity-Relationship (ER) model. In the research community as well as in industry, it is common knowledge that the temporal aspects of the mini-world are pervasive and important, but are also difficult to capture using the ER model. Not surprisingly, several enhancements to the ER model have been proposed in an attempt to more naturally and elegantly support the modeling of temporal aspects of information. Common to the existing temporally extended ER models is that few or no specific requirements to the models were given by their designers. With the existing proposals, an ontological foundation, and novel requirements as its basis, this paper formally defines a graphical, temporally extended ER model. The ontological foundation serves to aid in ensuring a maximally orthogonal design, and the requirements aim, in part, at ensuring a design that naturally extends the syntax and semantics of the regular ER model. The result is a novel model that satisfies an array of properties not satisfied by any single previously proposed model.


Güting, R. H., M. H. Böhlen, M. Erwig, C. S. Jensen, N. Lorentzos, M. Schneider, M. Vazirgiannis, "A Foundation for Representing and Querying Moving Objects" in Informatik Berichte 238 - 9/1998, FernUniversität Hagen, Fachbereich Informatik, 49 pages, 1998

Publication

Spatio-temporal databases deal with geometries changing over time. The goal of our work is to provide a DBMS data model and query language capable of handling such time-dependent geometries, including those changing continuously which describe moving objects. Two fundamental abstractions are moving point and moving region, describing objects for which only the time-dependent position, or position and extent, are of interest, respectively. We propose to represent such time-dependent geometries as attribute data types with suitable operations, that is, to provide an abstract data type extension to a DBMS data model and query language. This paper presents a design of such a system of abstract data types. It turns out that besides the main types of interest, moving point and moving region, a relatively large number of auxiliary data types is needed. For example, one needs a line type to represent the projection of a moving point into the plane, or a "moving real" to represent the time-dependent distance between two moving points. It then becomes crucial to achieve (i) orthogonality in the design of the type system, i.e., type constructors can be applied uniformly, (ii) genericity and consistency of operations, i.e., operations range over as many types as possible and behave consistently, and (iii) closure and consistency between structure and operations of non-temporal and related temporal types. Satisfying these goals leads to a simple and expressive system of abstract data types that may be integrated into a query language to yield a powerful language for querying spatio-temporal data, including moving objects. The paper formally defines the types and operations, offers detailed insight into the considerations that went into the design, and exemplifies the use of the abstract data types using SQL. The paper offers a precise and conceptually clean foundation for implementing a spatio-temporal DBMS extension.
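
The abstract-data-type view can be sketched as follows: a moving point is modeled as a function from time to a position, and the distance between two moving points is then a "moving real". Linear interpolation between samples is assumed; this is an illustration, not the paper's type system.

    # Minimal sketch of "moving point" and "moving real" as functions of time.

    import math

    def moving_point(samples):
        """samples: sorted list of (t, x, y); returns position(t) by linear interpolation."""
        def at(t):
            for (t1, x1, y1), (t2, x2, y2) in zip(samples, samples[1:]):
                if t1 <= t <= t2:
                    f = (t - t1) / (t2 - t1)
                    return (x1 + f * (x2 - x1), y1 + f * (y2 - y1))
            raise ValueError("t outside the point's lifespan")
        return at

    def moving_distance(p, q):
        """Moving real: the time-dependent distance between two moving points."""
        return lambda t: math.dist(p(t), q(t))

    car = moving_point([(0, 0.0, 0.0), (10, 10.0, 0.0)])
    bus = moving_point([(0, 0.0, 5.0), (10, 10.0, 5.0)])
    print(moving_distance(car, bus)(4))    # 5.0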


Pfoser, D., C. S. Jensen, "Incremental Join of Time-Oriented Data" in TimeCenter Technical Report TR-34, 22 pages, 1998

Publication

Data warehouses as well as a wide range of other databases exhibit a strong temporal orientation: it is important to track the temporal variation of data over several months or, often, years. In addition, data warehouses and databases often exhibit append-only characteristics where old data pertaining to the past is retained while new data pertaining to the present is appended. Performing joins on large databases such as these can be very costly, and the efficient processing of joins is essential to obtain good overall query processing performance. This paper presents a sort-merge-based incremental algorithm for time-oriented data. While incremental computation techniques have proven competitive in many settings, they also introduce a space overhead in the form of differential files. However, for the temporal data explored here, this overhead is avoided because the differential files are already part of the database. In addition, data is naturally sorted, leaving only merging. The incremental algorithm works in a partitioned storage environment and does not assume the availability of indices, making it a competitor to sort-based and nested-loop joins. The paper presents analytical cost formulas as well as simulation-based studies that characterize the performance of the join.
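
A toy sketch of the incremental flavor of the join (not the paper's algorithm): the cached result of the previous run is kept, and only the newly appended, already sorted runs are merge-joined; for brevity, only the new-versus-new portion of the incremental computation is shown, and join keys are assumed unique within each run.

    # Minimal sketch: merge-join two sorted, append-only runs and add the result
    # to a cached result from the previous run.

    def merge_join(run_r, run_s):
        """Sort-merge join of two runs sorted on the join key; tuples are (key, payload)."""
        out, i, j = [], 0, 0
        while i < len(run_r) and j < len(run_s):
            kr, ks = run_r[i][0], run_s[j][0]
            if kr < ks:
                i += 1
            elif kr > ks:
                j += 1
            else:
                out.append((kr, run_r[i][1], run_s[j][1]))
                i += 1
                j += 1        # simplistic: join keys assumed unique in each run
        return out

    previous_result = [(3, "r3", "s3")]                 # cached from the last run
    new_r = [(4, "r4"), (6, "r6")]                      # appended since then
    new_s = [(4, "s4"), (5, "s5"), (6, "s6")]
    print(previous_result + merge_join(new_r, new_s))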


Skyt, J., C. S. Jensen, "Vacuuming Temporal Databases" in TimeCenter Technical Report TR-32, 20 pages, 1998

Publication

A wide range of real-world database applications, including financial and medical applications, are faced with accountability and traceability requirements. These requirements lead to the replacement of the usual update-in-place policy by an append-only policy, yielding so-called transaction-time databases. With logical deletions being implemented as insertions at the physical level, these databases retain all previously current states and are ever-growing. A variety of physical storage structures and indexing techniques as well as query languages have been proposed for transaction-time databases, but the support for physical deletion, termed vacuuming, has received precious little attention. Such vacuuming is called for by, e.g., the laws of many countries. Although necessary, with vacuuming, the database's previously perfect and reliable recollection of the past may be manipulated via, e.g., selective removal of records pertaining to past states. This paper provides a semantic framework for the vacuuming of transaction-time databases. The main focus is to establish a foundation for the correct and user-friendly processing of queries and updates against vacuumed databases. Queries that may return results affected by vacuuming are intercepted, and the user is presented with the option of issuing similar queries that are not affected by vacuuming.
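
The query-interception idea might look roughly as follows, assuming a single vacuumed prefix of the transaction-time axis and half-open query windows; the function name and rule are hypothetical simplifications of the paper's framework.

    # Minimal sketch: intercept queries whose transaction-time window overlaps the
    # vacuumed (physically deleted) span and offer an unaffected alternative.

    def intercept(query_from, query_to, vacuumed_until):
        """Periods are half-open [from, to); data with transaction time < vacuumed_until is gone."""
        if query_to <= vacuumed_until:
            return None, "query lies entirely in the vacuumed region"
        if query_from < vacuumed_until:
            return (vacuumed_until, query_to), "result would be affected; restricted query offered"
        return (query_from, query_to), "unaffected by vacuuming"

    print(intercept(1990, 1995, vacuumed_until=1997))
    print(intercept(1994, 2000, vacuumed_until=1997))
    print(intercept(1998, 2000, vacuumed_until=1997))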


Bliujute, R., C. S. Jensen, S. Šaltenis, G. Slivinskas, "Light-Weight Indexing of General Bitemporal Data" in TimeCenter Technical Report TR-30, 20 pages, 1998

Publication

Most data managed by existing, real-world database applications is time referenced. Data warehouses are good examples. Often, two temporal aspects of data are of interest, namely valid time, when data is true in the mini-world, and transaction time, when data is current in the database, resulting in so-called bitemporal data. Like spatial data, bitemporal data thus has associated two-dimensional regions. Such data is in part naturally now-relative: some data is currently true in the mini-world or is part of the current database state. So, unlike for spatial data, the regions of now-relative bitemporal data grow continuously. Existing indices, including commercially available indices such as B+- and R-trees, typically do not contend well with even small amounts of now-relative data. This paper proposes a new indexing technique that indexes general bitemporal data efficiently. The technique eliminates the different kinds of growing data regions by means of transformations and then indexes the resulting stationary data regions with four R*-trees, and queries on the original data are mapped to corresponding queries on the transformed data. Extensive performance studies are reported that provide insight into the characteristics and behavior of the four trees storing differently-shaped regions, and they indicate that the new technique yields a performance that is competitive with the best existing index; and unlike this existing index, the new technique does not require extension of the kernel of the DBMS.


Bliujute, R., S. Šaltenis, G. Slivinskas, C. S. Jensen, "Developing a DataBlade for a New Index" in TimeCenter Technical Report TR-29, 20 pages, 1998

Publication

Many current and potential applications of database technology, e.g., geographical, medical, spatial, and multimedia applications, require efficient support for the management of data with new, complex data types. As a result, the major DBMS vendors are stepping beyond the support for uninterpreted binary large objects, termed BLOBs, and are beginning to offer extensibility features that allow external developers to extend the DBMS with, e.g., their own data types and accompanying access methods. Existing solutions include DB2 extenders, Informix DataBlades, and Oracle cartridges. Extensible systems offer new and exciting opportunities for researchers and third-party developers alike. This paper reports on an implementation of an Informix DataBlade for the GR-tree, a new R-tree based index. This effort represents a stress test of what is perhaps currently the most extensible DBMS, in that the new DataBlade aims to achieve better performance, not just to add functionality. The paper provides guidelines for how to create an access method DataBlade, describes the sometimes surprising challenges that must be negotiated during DataBlade development, and evaluates the extensibility of the Informix Dynamic Server.


Bliujute, R., C. S. Jensen, S. Šaltenis, G. Slivinskas, "R-tree-based Indexing of Now-Relative Bitemporal Data" in TimeCenter Technical Report TR-25, 21 pages, 1998

Publication

The databases of a wide range of applications, e.g., in data warehousing, store multiple states of time-evolving data. These databases contain a substantial part of now-relative data: data that became valid at some past time and remains valid until the current time. More specifically, two temporal aspects of data are frequently of interest, namely valid time, when data is true, and transaction time, when data is current in the database. The latter aspect is essential in all applications where accountability or traceability is required. When both aspects are captured, data is termed bitemporal. A number of indices have been devised for the efficient support of operations on time-varying data with one time dimension, but little work, based mostly on R-trees, has addressed the indexing of two- or higher-dimensional temporal data. No indices exist that contend well with now-relative data, which leads to temporal data regions that are continuous functions of time. The paper proposes two extended R-tree based indices.


Bliujute, R., S. Šaltenis, G. Slivinskas, C. S. Jensen, "Systematic Change Management in Dimensional Data Warehousing" in TimeCenter Technical Report TR-23, 14 pages, 1998

Publication

With the widespread and increasing use of data warehousing in industry, the design of effective data warehouses and their maintenance has become a focus of attention. Independently of this, the area of temporal databases has been an active area of research for well beyond a decade. This article identifies shortcomings of so-called star schemas, which are widely used in industrial warehousing, in their ability to handle change and subsequently studies the application of temporal techniques for solving these shortcomings. Star schemas represent a new approach to database design and have gained widespread popularity in data warehousing, but while they have many attractive properties, star schemas do not contend well with so-called slowly changing dimensions and with state-oriented data. We study the use of so-called temporal star schemas that may provide a solution to the identified problems while not fundamentally changing the database design approach. More specifically, we study the relative database size and query performance when using regular star schemas and their temporal counterparts for state-oriented data. We also offer some insight into the relative ease of understanding and querying databases with regular and temporal star schemas.


Böhlen, M., R. Busatto, C. S. Jensen, "Point versus Interval-based Temporal Data Models" in TimeCenter Technical Report TR-21, 14 pages, 1998

Publication

The association of timestamps with various data items such as tuples or attribute values is fundamental to the management of time-varying information. Using intervals in timestamps, as do most data models, leaves a data model with a variety of choices for giving a meaning to timestamps. Specifically, some such data models claim to be point-based while other data models claim to be interval-based. The meaning chosen for timestamps is important: it has a pervasive effect on most aspects of a data model, including database design, a variety of query language properties, and query processing techniques, e.g., the availability of query optimization opportunities. This paper precisely defines the notions of point-based and interval-based temporal data models, thus providing a new, formal basis for characterizing temporal data models and obtaining new insights into the properties of their query languages. Queries in point-based models treat snapshot equivalent argument relations identically. This renders point-based models insensitive to coalescing. In contrast, queries in interval-based models give significance to the actual intervals used in the timestamps, thus generally treating non-identical, but possibly snapshot equivalent, relations differently. The paper identifies the notion of time-fragment preservation as the essential defining property of an interval-based data model.


1997 Top

Clifford, J., C. Dyreson, T. Isakowitz, C. S. Jensen, R. T. Snodgrass, "On the Semantics of Now in Databases" in ACM Transactions on Database Systems, Vol. 22, No. 2, pp. 171-214, 1997

Publication
ACM Author-Izer

Although "now" is expressed in SQL as CURRENT_TIMESTAMP within queries, this value cannot be stored in the database. However, this notion of an ever-increasing current-time value has been reflected in some temporal data models by inclusion of database-resident variables, such as "now", "until-changed," "x," "@," and "-". Time variables are very desirable, but their use also leads to a new type of database, consisting of tuples with variables, termed a variable database. This article proposes a framework for defining the semantics of the variable databases of the relational and temporal relational data models. A framework is presented because several reasonable meanings may be given to databases that use some of the specific temporal variables that have appeared in the literature. Using the framework, the article defines a useful semantics for such databases. Because situations occur where the existing time variables are inadequate, two new types of modeling entities that address these shortcomings, timestamps that we call now-relative and now-relative indeterminate, are introduced and defined within the framework. Moreover, the article provides a foundation, using algebraic bind operators, for the querying of variable databases via existing query languages. The transition to variable databases presented here requires minimal change to the query processor. Finally, to underline the practical feasibility of variable databases, we show that database variables can be precisely specified and efficiently implemented in conventional query languages, such as SQL, and in temporal query languages, such as TSQL2.
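
A minimal sketch of the bind idea: before evaluation, the variable "now" in stored tuples is bound to a reference time, turning the variable relation into an ordinary one. Integer times and closed intervals are assumed for simplicity.

    # Minimal sketch of a bind operator for the variable "now".

    NOW = "now"

    def bind(relation, reference_time):
        """Replace NOW by the reference time in (fact, start, end) tuples."""
        return {(fact, s, reference_time if e == NOW else e) for (fact, s, e) in relation}

    def valid_at(relation, t, reference_time):
        return {fact for (fact, s, e) in bind(relation, reference_time) if s <= t <= e}

    emp = {("Alice in Sales", 10, NOW), ("Bob in Toys", 5, 8)}
    print(valid_at(emp, 12, reference_time=15))   # {'Alice in Sales'}
    print(valid_at(emp, 12, reference_time=11))   # set() -- 12 lies beyond 'now'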


Bair, J., M. H. Böhlen, C. S. Jensen, R. T. Snodgrass, "Notions of Upward Compatibility of Temporal Query Languages" in Wirtschaftsinformatik, Vol. 39, No. 1, pp. 25-34, 1997

Publication
Online at Wirtschaftsinformatik

Migrating applications from conventional to temporal database management technology has received scant mention in the research literature. This paper formally defines three increasingly restrictive notions of upward compatibility which capture properties of a temporal SQL with respect to conventional SQL that, when satisfied, provide for a smooth migration of legacy applications to a temporal system. The notions of upward compatibility dictate the semantics of conventional SQL statements and constrain the semantics of extensions to these statements. The paper evaluates the seven extant temporal extensions to SQL, all of which are shown to complicate migration through design decisions that violate one or more of these notions. We then outline how SQL-92 can be systematically extended to become a temporal query language that satisfies all three notions.


Gregersen, H., C. S. Jensen, L. Mark, "Evaluating Temporally Extended ER-Models" in Proceedings of the CAiSE'97/IFIP 8.1 International Workshop on Evaluation of Modeling Methods in Systems Analysis and Design, Barcelona, Spain, 12 pages, 1997

Publication

The Entity-Relationship (ER) model is enjoying a remarkable popularity in industry. It has been widely recognized that while temporal aspects of data play a prominent role in database applications, these aspects are difficult to capture using the ER model. Some industrial users have responded to this deficiency by ignoring all temporal aspects in their ER diagrams and simply supplementing the diagrams with phrases like "full temporal support." The research community has responded by developing about a dozen proposals for temporally extended ER models. These temporally extended ER models were accompanied by only few or no specific criteria for designing them, making it difficult to appreciate their properties and to conduct an insightful comparison of the models. This paper defines a set of design criteria that may be used for evaluating and comparing the existing temporally extended ER models.


Torp, K., C. S. Jensen, M. H. Böhlen, "Layered Temporal DBMSs - Concepts and Techniques" in Fifth International Conference on Database Systems for Advanced Applications, Melbourne, Australia, pp. 371-380, 1997

Publication

A wide range of database applications manage time-varying data, and it is well-known that querying and correctly updating time-varying data is difficult and error-prone when using standard SQL. Temporal extensions of SQL offer substantial benefits over SQL when managing time-varying data. The topic of this paper is the effective implementation of temporally extended SQLs. Traditionally, it has been assumed that a temporal DBMS must be built from scratch, utilizing new technologies for storage, indexing, query optimization, concurrency control, and recovery. In contrast, this paper explores the concepts and techniques involved in implementing a temporally enhanced SQL while maximally reusing the facilities of an existing SQL implementation. The topics covered span the choice of an adequate timestamp domain that includes the time variable NOW, a comparison of query processing architectures, and transaction processing, the latter including how to ensure ACID properties and assign timestamps to updates.


Tryfona, N., C. S. Jensen, "Conceptual Design of Spatio-Temporal Applications: Requirements and Solutions (extended abstract)" in Proceedings of the First Chorochronos Intensive Workshop on Spatio-Temporal Database Systems, Petronell-Carnuntum, Austria, 6 pages, 1997

Publication



Jensen, C. S., R. T. Snodgrass, "TimeCenter Prospectus" in TimeCenter Technical Report, Internal TR-1, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 18 pages, 1997




Böhlen, M. H., C. S. Jensen, B. Skjellaug, "Spatio-Temporal Database Support for Legacy Applications" in TimeCenter Technical Report TR-20, 21 pages, 1997

Publication

In areas such as finance, marketing, and property and resource management, many database applications manage spatio-temporal data. These applications typically run on top of a relational DBMS and manage spatio-temporal data either using the DBMS, which provides little support, or employ the services of a proprietary system that co-exists with the DBMS, but is separate from and not integrated with the DBMS. This wealth of applications may benefit substantially from built-in, integrated spatio-temporal DBMS support. Providing a foundation for such support is an important and substantial challenge. This paper initially defines technical requirements to a spatio-temporal DBMS aimed at protecting business investments in the existing legacy applications and at reusing personnel expertise. These requirements provide a foundation for making it economically feasible to migrate legacy applications to a spatio-temporal DBMS. The paper next presents the design of the core of a spatio-temporal extension to SQL 92, called STSQL, that satisfies the requirements. STSQL supports multiple temporal as well as spatial dimensions. Queries may ignore any dimension; this provides an important kind of upward compatibility with SQL 92. Queries may also view the tables in a dimensional fashion, where the DBMS provides so-called snapshot reducible query processing for each dimension. Finally, queries may view dimension attributes as if they are no different from other attributes.


Jensen, C. S., R. T. Snodgrass, "Temporal Data Management" in TimeCenter Technical Report TR-17, 12 pages, 1997

Publication

A wide range of database applications manage time-varying information. Existing database technology currently provides little support for managing such data. The research area of temporal databases has made important contributions in characterizing the semantics of such information and in providing expressive and efficient means to model, store, and query temporal data. This paper introduces the reader to temporal data management, surveys state-of-the-art solutions to challenging aspects of temporal data management, and points to research directions.


Tsotras, V., C. S. Jensen, R. T. Snodgrass, "A Notation for Spatiotemporal Queries" in TimeCenter Technical Report TR-10, 13 pages, 1997

Publication

Temporal, spatial and spatiotemporal queries are inherently multidimensional, combining predicates on time dimension(s) with predicates on explicit attributes and/or several spatial dimensions. In the past there was no consistent way to refer to temporal or spatiotemporal queries, thus leading to considerable confusion. In an attempt to eliminate this problem, we propose a new notation for such queries. Our notation is simple and extensible and can be easily applied to multidimensional queries in general.


Snodgrass, R. T., M. H. Böhlen, C. S. Jensen, A. Steiner, "Transitioning Temporal Support in TSQL2 to SQL3" in TimeCenter Technical Report TR-8, 28 pages, 1997

Publication

This document summarizes the proposals before the SQL3 committees to allow the addition of tables with valid-time and transaction-time support into SQL/Temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to one encompassing temporal support. Initially, important requirements to a temporal system that may facilitate such a transition are motivated and discussed. The proposal then describes the language additions necessary to add valid-time support to SQL3 while fulfilling these requirements. The constructs of the language are divided into four levels, with each level adding increased temporal functionality to its predecessor. A prototype system implementing these constructs on top of a conventional DBMS is publicly available.


Torp, K., C. S. Jensen, R. T. Snodgrass, "Stratum Approaches to Temporal DBMS Implementation" in TimeCenter Technical Report TR-5, 18 pages, 1997

Publication

Previous approaches to temporal databases have assumed that a temporal database management system (temporal DBMS) must be implemented from scratch, as an integrated architecture to yield adequate performance and to use new temporal implementation techniques, such as temporal indexes and join algorithms. However, this is a very large and time-consuming task. In this paper we explore how a temporal DBMS can be implemented in a stratum on top of an existing non-temporal DBMS, rendering the task more feasible because it reuses much of the functionality of the underlying conventional DBMS. At the outset, we discuss the advantages and disadvantages of the stratum architecture compared to the integrated architecture, and we present a set of criteria for a stratum architecture. Subsequently, three different meta architectures for implementing a temporal DBMS in a stratum are identified. Each meta architecture contains several specific architectures, which are examined in turn. Existing temporal DBMS implementations are classified according to the specific architectures identified. Finally, the specific architectures are evaluated according to our criteria. We conclude that a stratum architecture is the best short, medium, and perhaps even long-term, approach to implementing a temporal DBMS. Further, it is possible to integrate existing conventional DBMSs with new temporal implementation techniques, blurring the differences between integrated and stratum architectures.


Torp, K., R. T. Snodgrass, C. S. Jensen, "Correct and Efficient Timestamping of Temporal Data" in TimeCenter Technical Report TR-4, 20 pages, 1997

Publication

Previous approaches to timestamping temporal data have implicitly assumed that transactions have no duration. In this paper we identify several situations where a sequence of operations over time within a single transaction can violate ACID properties. It has been previously shown that the transaction-time dimension must be timestamped after commit. This time is not known within the transaction. We describe how to correctly implement most queries that make explicit reference to this (unknown) transaction time, and argue that the rest, which can be syntactically identified, can only be answered with an approximation of the correct value. The drawback of timestamping after commit is that it requires revisiting tuples. We show that this expensive revisiting step is required only before any queries or modifications in subsequent transactions that access prior states; in most cases, revisiting tuples can be postponed, and when to revisit can be syntactically determined. We propose several strategies for revisiting tuples, and we empirically evaluate these strategies in order to determine under which circumstances each is best.
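
The timestamp-after-commit discipline can be sketched as follows: tuples written by a transaction first carry a placeholder transaction-time value, and a revisiting step stamps them with the commit time once it is known. The in-memory structures are illustrative only.

    # Minimal sketch: placeholder transaction-time values are fixed up after commit.

    UC = None                        # placeholder: "timestamp unknown until commit"

    store = []                       # tuples: [fact, tt_start]
    pending = []                     # indexes of tuples written by the open transaction

    def insert(fact):
        store.append([fact, UC])
        pending.append(len(store) - 1)

    def commit(commit_time):
        # revisiting step: stamp everything the transaction wrote with the commit time
        for i in pending:
            store[i][1] = commit_time
        pending.clear()

    insert("Alice in Sales")
    insert("Bob in Toys")
    commit(commit_time=1042)
    print(store)    # [['Alice in Sales', 1042], ['Bob in Toys', 1042]]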


1996 Top

Jensen, C. S., R. T. Snodgrass, M. D. Soo, "Extending Existing Dependency Theory to Temporal Databases" in IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 4, pp. 563-582, 1996

Publication



Jensen, C. S., R. T. Snodgrass, "Semantics of Time-Varying Information" in Information Systems, Vol. 21, No. 4, pp. 311-352, 1996

Publication
Online by Elsevier at ScienceDirect

This paper provides a systematic and comprehensive study of the underlying semantics of temporal databases, summarizing the results of an intensive collaboration between the two authors over the last five years. We first examine how facts may be associated with time, most prominently with one or more dimensions of valid time and transaction time. One common case is that of a bitemporal relation, in which facts are associated with timestamps from exactly one valid-time and one transaction-time dimension. These two times may be related in various ways, yielding temporal specialization. Multiple transaction times arise when a fact is stored in one database, then later replicated or transferred to another database. By retaining the transaction times, termed temporal generalization, the original relation can be effectively queried by referencing only the final relation. We attempt to capture the essence of time-varying information via a very simple data model, the bitemporal conceptual data model. Emphasis is placed on the notion of snapshot equivalence of the information content of relations of different data models. The logical design of temporal databases is a natural next topic. Normal forms play a central role during the design of conventional relational databases. We show how to extend the existing relational dependency theory, including the dependencies themselves, keys, normal forms, and schema decomposition algorithms, to apply to temporal relations. However, this theory does not fully take into account the temporal semantics of the attributes of temporal relations. To address this deficiency, we study the semantics of individual attributes. One aspect is the observation and update patterns of attributes - when an attribute changes value and when the changes are recorded in the database, respectively. A related aspect is when an attribute has some value, termed its lifespan. Yet another aspect is the values themselves of attributes - how to derive a value for an attribute at any point in time from stored values, termed temporal derivation. This study of attribute semantics leads to the formulation of temporal guidelines for logical database design.


Jensen, C. S., R. T. Snodgrass, "Semantics of Time-Varying Information" in Technical Report R-96-2008, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 50 pages, 1996




Snodgrass, R. T., M. Böhlen, C. S. Jensen, A. Steiner, "Adding Valid Time to SQL/Temporal" in ANSI Expert's Contribution, ANSI X3H2-96-501r1, ISO/IEC JTC1/SC21/WG3 DBL MAD-146r2, International Organization for Standardization, 77 pages, 1996

Publication
Online at University of Arizona

This change proposal specifies the addition of tables with valid-time support into SQL/Temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to a temporal system. Initially, important requirements to a temporal system that may facilitate such a transition are motivated and discussed. The proposal then describes the language additions necessary to add valid-time support to SQL3 while fulfilling these requirements. The constructs of the language are divided into four levels, with each level adding increased temporal functionality to its predecessor. The proposal formally defines the semantics of the query language by providing a denotational semantics mapping to well-defined algebraic expressions. Several alternatives for implementing the language constructs are listed. A prototype system implementing these constructs on top of a conventional DBMS is publicly available.


Snodgrass, R. T., M. Böhlen, C. S. Jensen, A. Steiner, "Adding Transaction Time to SQL/Temporal" in ANSI Expert's Contribution, ANSI X3H2-96-502r2, ISO/IEC JTC1/SC21/WG3 DBL MAD-147r2, International Organization for Standardization, 47 pages, 1996

Publication

Transaction time identifies when data was asserted in the database. If transaction time is supported, the states of the database at all previous points of time are retained. This change proposal specifies the addition of transaction time, in a fashion consistent with that already proposed for valid time. In particular, constructs to create tables with valid-time and transaction-time support and to query such tables with temporal upward compatibility, sequenced semantics, and nonsequenced semantics, orthogonally for valid and transaction time, are defined. These constructs can also be used in modifications, assertions, cursors, and views.


Böhlen, M. H., C. S. Jensen, "Seamless Integration of Time into SQL" in Technical Report R-96-2049, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 51 pages, 1996




Gregersen, H., C. S. Jensen, "Temporal Entity-Relationship Models - A Survey" in Technical Report R-96-2039, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 41 pages. Also TimeCenter Technical Report TR-3, 1996

Publication

The Entity-Relationship (ER) model, using varying notations and with some semantic variations, is enjoying a remarkable, and increasing, popularity in the research community, the computer science curriculum, and industry. In step with the increasing diffusion of relational platforms, ER modeling is growing in popularity. It has been widely recognized that temporal aspects of database schemas are prevalent and difficult to model using the ER model. As a result, how to enable the ER model to properly capture time-varying information has for a decade and a half been an active area of the database research community. This has led to the proposal of almost a dozen temporally enhanced ER models. This paper surveys all temporally enhanced ER models known to the authors. It is the first paper to provide a comprehensive overview of temporal ER modeling, and it thus meets a need for consolidating and providing easy access to the research in temporal ER modeling. In the presentation of each model, the paper examines how the time-varying information is captured in the model and presents the new concepts and modeling constructs of the model. A total of 20 different design properties for temporally enhanced ER models are defined, and each model is characterized according to these properties.


Bair, J., M. Böhlen, C. S. Jensen, R. T. Snodgrass, ,"Notions of Upward Compatibility of Temporal Query Languages" in Technical Report R-96-2038, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 25 pages. Also TimeCenter Technical Report TR-6,, 1996

Publication

Migrating applications from conventional to temporal database management technology has received scant mention in the research literature. This paper formally defines three increasingly restrictive notions of upward compatibility which capture properties of a temporal SQL with respect to conventional SQL that, when satisfied, provide for a smooth migration of legacy applications to a temporal system. The notions of upward compatibility dictate the semantics of conventional SQL statements and constrain the semantics of extensions to these statements. The paper evaluates the seven extant temporal extensions to SQL, all of which are shown to complicate migration through design decisions that violate one or more of these notions. We then outline how SQL-92 can be systematically extended to become a temporal query language that satisfies all three notions.


Torp, K., C. S. Jensen, M. Böhlen, ,"Layered Implementation of Temporal DBMSs - Concepts and Techniques" in Technical Report R-96-2037, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 25 pages. Also TimeCenter Technical Report TR-2,, 1996

Publication

A wide range of database applications manage time-varying data. Examples include accounting, personnel, scheduling, and data warehousing applications. At the same time, it is well-known that querying and correctly updating time-varying data is difficult and error-prone when using standard SQL. As a result of a decade of intensive exploration, temporal extensions of SQL have reached a level of maturity and sophistication where it is clear that they offer substantial benefits over SQL when managing time-varying data. The topic of this paper is the effective implementation of temporally extended SQLs. Traditionally, it has been assumed that a temporal DBMS must be built from scratch, utilizing new technologies for storage, indexing, query optimization, concurrency control, and recovery. This paper adopts a quite different approach. Specifically, it explores the concepts and techniques involved in implementing a temporally enhanced SQL while maximally reusing the facilities of an existing SQL implementation, e.g., Oracle or DB2. The topics covered span the choice of an adequate timestamp domain that includes the time variable "now," a comparison of alternative query processing architectures including a partial parser approach, update processing, and transaction processing, the latter including how to ensure ACID properties and assign correct timestamps.


1995 Top

Jensen, C. S., R. T. Snodgrass, ,"Semantics of Time-Varying Attributes and Their Use for Temporal Database Design" in Fourteenth International Conference on Object-Oriented and Entity Relationship Modeling, Queensland, Australia, pp. 366-377,, 1995

Publication

Based on a systematic study of the semantics of temporal attributes of entities, this paper provides new guidelines for the design of temporal relational databases. The notions of observation and update patterns of an attribute capture when the attribute changes value and when the changes are recorded in the database. A lifespan describes when an attribute has a value, and derivation functions describe how the values of an attribute for all times within its lifespan are computed from stored values. The implications for temporal database design of the semantics that may be captured using these concepts are formulated as schema decomposition rules.
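
To make the interplay of these notions concrete, here is a small, hypothetical Python sketch of an attribute with stored observations, a lifespan, and a derivation function. The stepwise-constant derivation used below is just one possible choice, and all names and values are invented for illustration.

    # Stored observations of a salary attribute: (observation time, value). Invented data.
    observations = [(1990, 30000), (1993, 34000), (1996, 39000)]
    lifespan = (1990, 1998)   # outside this interval the attribute has no value

    def stepwise_constant(obs, t):
        """Derivation function: the value at time t is the most recently observed value."""
        value = None
        for time, v in obs:
            if time <= t:
                value = v
        return value

    def salary_at(t):
        if not (lifespan[0] <= t <= lifespan[1]):
            return None                       # outside the lifespan
        return stepwise_constant(observations, t)

    print(salary_at(1994))  # 34000 -- derived from stored values, not stored itself
    print(salary_at(1989))  # None  -- before the lifespan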


Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, ,"Evaluating the Completeness of TSQL2" in in Proceedings of the International Workshop on Temporal Databases, Zürich, Switzerland, pp. 153-172. The proceedings are entitled Recent Advances in Temporal Databases and are published by Springer-Verlag in their Workshops in Computing Series,, 1995

Publication

The question of what is a well-designed temporal data model and query language is a difficult, but also an important one. The consensus temporal query language TSQL2 attempts to take advantage of the accumulated knowledge gained from designing and studying many of the earlier models and languages. In this sense, TSQL2 represents a constructive answer to this question. Others have provided analytical answers by developing criteria, formulated as completeness properties, for what is a good model and language. This paper applies important existing completeness notions to TSQL2 in order to evaluate the design of TSQL2. It is shown that TSQL2 satisfies only a subset of these completeness notions.


Snodgrass, R. T., C. S. Jensen, C. E. Dyreson, W. Käefer, N. Kline, J. F. Roddick, ,"A Second Example" in Chapter 4, pp. 47-70, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Jensen, C. S., R. T. Snodgrass, ,"The Surrogate Data Type" in Chapter 9, pp. 149-152, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Jensen, C. S., R. T. Snodgrass, M. D. Soo, ,"The TSQL2 Data Model" in Chapter 10, pp. 153-238, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Snodgrass, R. T., C. S. Jensen, M. D. Soo, ,"Schema Specification" in Chapter 11, pp. 239-242, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Snodgrass, R. T., C. S. Jensen, F. Grandi, ,"The From Clause" in Chapter 12, pp. 243-248, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Hsu, S., C. S. Jensen, R. T. Snodgrass, ,"Valid-Time Selection and Projection" in Chapter 13, pp. 249-296, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Leung, T. Y. C., C. S. Jensen, R. T. Snodgrass, ,"Modification" in Chapter 14, pp. 297-302, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Jensen, C. S., R. T. Snodgrass, T. Y. C. Leung, ,"Cursors" in Chapter 15, pp. 303-308, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Clifford, J., C. E. Dyreson, R. T. Snodgrass, T. Isakowitz, C. S. Jensen, ,"Now" in Chapter 20, pp. 383-392, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Jensen, C. S., ,"Vacuuming" in Chapter 23, pp. 447-458, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Soo, M. D., C. S. Jensen, R. T. Snodgrass, ,"An Architectural Framework" in Chapter 24, pp. 461-470, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Soo, M. D., C. S. Jensen, R. T. Snodgrass, ,"An Algebra for TSQL2" in Chapter 27, pp. 501-541, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Snodgrass, R. T., I. Ahn, G. Ariav, D. S. Batory, J. Clifford, C. E. Dyreson, R. Elmasri, F. Grandi, C. S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T. Y. C. Leung, N. Lorentzos, J. F. Roddick, A. Segev, M. D. Soo, S. M. Sripada, ,"Language Syntax" in Chapters 28-38, pp. 549-629, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages,, 1995

Publication



Segev, A., C. S. Jensen, R. T. Snodgrass, ,"Report on The 1995 International Workshop on Temporal Databases" in in ACM SIGMOD Record, Vol. 24, No. 4, pp. 46-52,, 1995

Publication
ACM Author-Izer

This paper provides an overview of the 1995 International Workshop on Temporal Databases. It summarizes the technical papers and related discussions, and three panels: "Whither TSQL3?", "Temporal Data Management in Financial Applications," and "Temporal Data Management Infrastructure & Beyond."


Snodgrass, R. T., M. H. Böhlen, C. S. Jensen, A. Steiner, ,"Change Proposal for SQL/Temporal: Adding Valid Time-Part A" in Technical Report R-95-2024, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 36 pages,, 1995

Publication

The effective management of time-varying information is of the essence in a wide range of applications. Such applications may benefit substantially from built-in temporal support in the database management system. It is also important to be able to transition effectively from a non-temporal to a temporal DBMS. This change proposal specifies the addition of temporal tables into SQL/Temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to a temporal system. Initially, important requirements for a temporal system that may facilitate a smooth transition are motivated and discussed. The proposal then describes the language additions necessary to add temporal support to SQL3 while fulfilling these requirements. The constructs of the language are divided into four levels, with each level adding increased temporal functionality to its predecessor. The proposal formally defines the semantics of the query language by providing a denotational semantics mapping to well-defined algebraic expressions. Several alternatives for implementing the language constructs are listed, ranging from minimal extensions of SQL3 systems to alternatives that may exploit temporal query processing techniques and indices to achieve better performance. A prototype system implementing these constructs on top of a conventional DBMS is publicly available.


Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, ,"Evaluating and Enhancing the Completeness of TSQL2" in Technical Report 95-5, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 26 pages,, 1995

Publication

The question of what is a well-designed temporal data model and query language is a difficult, but also an important one. The consensus temporal query language TSQL2 attempts to take advantage of the accumulated knowledge gained from designing and studying many of the earlier models and languages. In this sense, TSQL2 represents a constructive answer to this question. Others have provided analytical answers by developing criteria, formulated as completeness properties, for what is a good model and language. This paper applies important existing completeness notions to TSQL2 in order to evaluate the design of TSQL2. It is shown that TSQL2 satisfies only a subset of these completeness notions. In response to this, a minimally modified version of TSQL2, termed Applied TSQL2, is proposed; this new language satisfies the notions of temporal semi-completeness and completeness which are not satisfied by TSQL2. An outline of the formal semantics for Applied TSQL2 is given.


Jensen, C. S., R. T. Snodgrass, ,"Semantics of Time-Varying Attributes and Their Use for Temporal Database Design" in Technical Report R-95-2012, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 28 pages,, 1995

Publication

This paper concerns the design of temporal relational database schemas. Normal forms play a central role during the design of conventional relational databases, and we have previously extended all existing relational normal forms to apply to temporal relations. However, these normal forms are all atemporal in nature and do not fully take into account the temporal semantics of the attributes of temporal relations. Consequently, additional guidelines for the design of temporal relations are required. This paper presents a systematic study of important aspects of the temporal semantics of attributes. One such aspect is the observation and update patterns of attributes - when an attribute changes value and when the changes are recorded in the database. A related aspect is when the attributes have values. A third aspect is the values themselves of attributes - how to derive a value for an attribute at any point in time from stored values. Guidelines for the design of the logical schema of a temporal database are introduced, and implications of the temporal-attribute semantics for the design of views and the physical schema are considered. The Bitemporal Conceptual Data Model, the data model of the consensus temporal query language TSQL2, serves as the context for the study.


1994 Top

Jensen, C. S., M. D. Soo, R. T. Snodgrass, ,"Unifying Temporal Data Models via a Conceptual Model" in Information Systems, Vol. 19, No. 7, pp. 513-547,, 1994

Publication
Online by Elsevier at ScienceDirect

To add time support to the relational model, both first normal form (1NF) and non-1NF data models have been proposed. Each has associated advantages and disadvantages. For example, remaining within 1NF when time support is added may introduce data redundancy. On the other hand, well-established storage organization and query evaluation techniques require atomic attribute values, and are thus intended for 1NF models; utilizing a non-1NF model may degrade performance. This paper describes a new temporal data model designed with the single purpose of capturing the time-dependent semantics of data. Here, tuples of bitemporal relations are stamped with sets of two-dimensional chronons in transaction-time/valid-time space. We use the notion of snapshot equivalence to map temporal relation instances and temporal operators of one existing model to equivalent instances and operators of another. We examine five previously proposed schemes for representing bitemporal data: two are tuple-timestamped 1NF representations, one is a Backlog relation composed of 1NF timestamped change requests, and two are non-1NF attribute value-timestamped representations. The mappings between these models are possible using mappings to and from the new conceptual model. The framework of well-behaved mappings between models, with the new conceptual model at the center, illustrates how it is possible to use different models for display and storage purposes in a temporal database system. Some models provide rich structure and are useful for display of temporal data, while other models provide regular structure useful for storing temporal data. The equivalence mappings effectively move the distinction between the investigated data models from a semantic basis to a display-related or a physical, performance-relevant basis, thereby allowing the exploitation of different data models by using each for the task(s) for which they are best suited.
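
To make the central construction tangible, the following Python sketch models a bitemporal element as a set of (transaction-time, valid-time) chronon pairs, defines the snapshot operator, and tests snapshot equivalence between two representations of the same information. The data and function names are illustrative assumptions, not the paper's formal notation.

    # A bitemporal element: a set of (transaction-time chronon, valid-time chronon) pairs.
    def bitemporal_element(tt_range, vt_range):
        return {(tt, vt) for tt in tt_range for vt in vt_range}

    # Conceptual relation: each tuple is (attribute values, bitemporal element). Invented data.
    r1 = [(("Ann", "Toys"), bitemporal_element(range(10, 15), range(10, 20)))]

    # A second representation of the same information, split into two value-equivalent tuples.
    r2 = [(("Ann", "Toys"), bitemporal_element(range(10, 15), range(10, 15))),
          (("Ann", "Toys"), bitemporal_element(range(10, 15), range(15, 20)))]

    def snapshot(relation, tt, vt):
        """The snapshot at (tt, vt): the ordinary tuples whose element contains that chronon pair."""
        return {values for values, element in relation if (tt, vt) in element}

    def snapshot_equivalent(a, b, tt_domain, vt_domain):
        """Two relation instances are snapshot equivalent if all of their snapshots agree."""
        return all(snapshot(a, tt, vt) == snapshot(b, tt, vt)
                   for tt in tt_domain for vt in vt_domain)

    print(snapshot_equivalent(r1, r2, range(0, 30), range(0, 30)))  # True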


Jensen, C. S., R. Snodgrass, ,"Temporal Specialization and Generalization" in IEEE Transactions on Knowledge and Data Engineering, Vol. 6, No. 6, pp. 954-974,, 1994

Publication

A standard relation has two dimensions: attributes and tuples. A temporal relation contains two additional orthogonal time dimensions: valid time records when facts are true in the modeled reality, and transaction time records when facts are stored in the temporal relation. Although there are no restrictions between the valid time and transaction time associated with each fact, in many practical applications the valid and transaction times exhibit restricted interrelationships that define several types of specialized temporal relations. This paper examines areas where different specialized temporal relations are present. In application systems with multiple, interconnected temporal relations, multiple time dimensions may be associated with facts as they flow from one temporal relation to another. The paper investigates several aspects of the resulting generalized temporal relations, including the ability to query a predecessor relation from a successor relation. The presented framework for generalization and specialization allows one to precisely characterize and compare temporal relations and the application systems in which they are embedded. The framework's comprehensiveness and its use in understanding temporal relations are demonstrated by placing previously proposed temporal data models within the framework. The practical relevance of the defined specializations and generalizations is illustrated by sample realistic applications in which they occur. The additional semantics of specialized relations are especially useful for improving the performance of query processing.


Soo, M. D., R. T. Snodgrass, C. S. Jensen, ,"Efficient Evaluation of the Valid-Time Natural Join" in in Proceedings of the Tenth IEEE International Conference on Data Engineering, Houston, TX, pp. 282-292,, 1994

Publication

Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the optimization of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally-varying data dramatically increases the size of the database. These factors require new techniques to efficiently evaluate valid-time joins. We address this need for efficient join evaluation in databases supporting valid-time. A new temporal-join algorithm based on tuple partitioning is introduced. This algorithm avoids the quadratic cost of nested-loop evaluation methods; it also avoids sorting. Performance comparisons between the partition-based algorithm and other evaluation methods are provided. While we focus on the important valid-time natural join, the techniques presented are also applicable to other valid-time joins.
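
For reference, the Python sketch below shows what the valid-time natural join computes: tuples with equal join attributes and overlapping valid times, with the result timestamped by the intersection of the two intervals. The sketch uses a naive nested loop purely to spell out the semantics; the paper's contribution is a partition-based algorithm that avoids exactly this quadratic cost. All relation contents and names are invented for the example.

    def valid_time_natural_join(r, s):
        """r, s: lists of (key, payload, vt_start, vt_end) with half-open valid-time intervals.
        Pairs tuples with equal keys and overlapping valid times; each result tuple carries
        the intersection of the two intervals."""
        out = []
        for k1, a, s1, e1 in r:
            for k2, b, s2, e2 in s:
                start, end = max(s1, s2), min(e1, e2)
                if k1 == k2 and start < end:
                    out.append((k1, a, b, start, end))
        return out

    # Invented sample relations.
    emp_dept = [("Ann", "Toys", 1994, 1998)]
    emp_mgr  = [("Ann", "Joe", 1993, 1996), ("Ann", "Kim", 1996, 2000)]
    print(valid_time_natural_join(emp_dept, emp_mgr))
    # [('Ann', 'Toys', 'Joe', 1994, 1996), ('Ann', 'Toys', 'Kim', 1996, 1998)]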


Snodgrass, R. T., I. Ahn, G. Ariav, D. S. Batory, J. Clifford, C. E. Dyreson, R. Elmasri, F. Grandi, C. S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T. Y. C. Leung, N. Lorentzos, J. F. Roddick, A. Segev, M. D. Soo, S. M. Sripada, ,"A TSQL2 Tutorial" in in ACM SIGMOD Record, Vol. 23, No. 3, pp. 27-33,, 1994

Publication
ACM Author-Izer



Snodgrass, R. T., I. Ahn, G. Ariav, D. S. Batory, J. Clifford, C. E. Dyreson, R. Elmasri, F. Grandi, C. S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T. Y. C. Leung, N. Lorentzos, J. F. Roddick, A. Segev, M. D. Soo, S. M. Sripada, ,"TSQL2 Language Specification" in in ACM SIGMOD Record, Vol. 23, No. 1, pp. 65-86,, 1994

Publication
ACM Author-Izer



Dahl, K., H. Gregersen, C. A. Have, C. S. Jensen, J. Sigurdsson, J. S. Winter, ,"Databasebenchmarks" in in PROSA-bladet, No. 5, pp. 15-17 (in Danish),, 1994

Publication



Jensen, C. S. et al. (editors, with multiple other contributors), ,"A Consensus Glossary of Temporal Database Concepts" in in ACM SIGMOD Record, Vol. 23, No. 1, pp. 52-65 (Special Section: Temporal Database Infrastructure),, 1994

Publication
Online at ACM Digital Library

This document contains definitions of a wide range of concepts specific to and widely used within temporal databases. In addition to providing definitions, the document also includes separate explanations of many of the defined concepts. Two sets of criteria are included. First, all included concepts were required to satisfy four relevance criteria, and, second, the naming of the concepts was resolved using a set of evaluation criteria. The concepts are grouped into three categories: concepts of general database interest, of temporal database interest, and of specialized interest. This document is a digest of a full version of the glossary. In addition to the material included here, the full version includes substantial discussions of the naming of the concepts. The consensus effort that led to this glossary was initiated in early 1992. Earlier status documents appeared in March 1993 and December 1992 and included terms proposed after an initial glossary appeared in SIGMOD Record in September 1992. The present glossary subsumes all the previous documents. It was most recently discussed at the "ARPA/NSF International Workshop on an Infrastructure for Temporal Databases," in Arlington, TX, June 1993, and is recommended by a significant part of the temporal database community. The glossary meets a need for creating a higher degree of consensus on the definition and naming of temporal database concepts.


Snodgrass, R. T. (ed.), I. Ahn, G. Ariav, P. Bayer, J. Clifford, C. E. Dyreson, F. Grandi, L. Hermosilla, C. S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T. Y. C. Leung, N. Lorentzos, Y. Mitsopoulos, J. F. Roddick, M. D. Soo, S. M. Sripada, ,"An Evaluation of TSQL2" in 53 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

This document evaluates the TSQL2 language against a sizable consensus test suite of temporal database queries. The test suite consists of a database schema, an instance for the schema, and a set of approximately 150 queries on this database. The reader is cautioned that the queries have not been independently validated nor tested on an implementation of TSQL2. Given the number and complexity of some of the queries, there are most certainly errors.


Soo, M. D., C. S. Jensen, R. T. Snodgrass, ,"An Algebra for TSQL2" in Technical Report R-94-2053, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 33 pages,, 1994

Publication



Hsu, S., C. S. Jensen, R. T. Snodgrass, ,"Valid-time Selection and Projection in TSQL2" in Technical Report R-94-2052, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 36 pages,, 1994

Publication

Temporal databases have now been studied for more than a decade. During that period of time, numerous query languages have been proposed for temporal databases. One of the essential components of a temporal query language is valid-time selection, which allows the user to retrieve tuples according to their valid-time relationship. Often, this component is closely tied to another important component, valid-time projection, which defines the timestamps of the tuples in query results. Here, nine different temporal query languages, primarily SQL and QUEL extensions, are examined with a focus on valid-time selection and projection. Based on that survey, the specific design of the valid-time selection and projection components of the consensus temporal query language TSQL2 is presented.


Jensen, C. S., R. T. Snodgrass, M. D. Soo, ,"The TSQL2 Data Model" in Technical Report R-94-2051, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 60 pages,, 1994

Publication



Torp, K., L. Mark, C. S. Jensen, ,"Efficient Differential Timeslice Computation" in Technical Report R-94-2055, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 26 pages,, 1994

Publication

Transaction-time databases record all previous database states and are ever-growing, leading to potentially huge quantities of data. For that reason, efficient query processing and the utilization of cheap write-once storage media is of particular importance. This is facilitated by adopting a log-based storage structure. Timeslices, i.e., relation states or snapshots, are computed by traversing the logs, using previously computed and cached timeslices as outsets. When computing a new timeslice, the cache will contain two candidate outsets: an earlier outset and a later outset. The new timeslice can be computed by either incrementally updating the earlier outset or decrementally downdating the later outset. This paper proposes an algorithm that efficiently identifies the cheaper outset. Perhaps the most obvious algorithm uses the proximity in time between the earlier and later outsets and the new timeslice as the basis for its measure of cost. Unfortunately, this is not a reliable measure of the number of changes recorded in the logs between each of the two outsets and the new timeslice. The amount of change to the database may vary substantially over time. We subsequently investigated a number of index structures on the timestamps in the logs, including B+-trees, Monotonic B+-trees, and Append-only trees. The fundamental idea was that determining the relative positioning of the timestamps of the earlier, the new, and the later timeslices in an index would allow a computation of the corresponding number of changes recorded in the logs between these times. Unfortunately, the lack in these index structures of either a homogeneous node structure, a controllable fill-factor for nodes, or an appropriate node allocation algorithm greatly complicated the computation. Consequently, a specialized index structure was developed to support the algorithm. We present and analyze two variations of this index structure, the Insertion tree (I-tree) and the Pointer-less Insertion tree (PLI-tree). The cost of using one of these trees for picking the optimal outset for timeslice computation is only slightly lower than that of using a B+-tree. However, being sparse and packed, I-trees and PLI-trees require little space overhead, and they are cheap to maintain as the underlying relations are updated. The trees also provide a basis for an algorithm that precisely and efficiently predicts the actual costs of computing timeslices in advance. This is useful for query optimization and can be essential in real-time applications. Finally, it is demonstrated how the trees can be used in the computation of other types of queries. As a proof of the functionality of the I-tree and the PLI-tree, we have implemented main-memory versions of both.
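
As a rough illustration of the underlying idea (not of the I-tree or PLI-tree themselves), the following Python sketch computes a timeslice from whichever cached outset has fewer intervening log entries, rolling the earlier outset forward or the later outset backward. The log format, the cache contents, and the cost measure (a simple entry count) are simplifying assumptions.

    # Transaction-time log of changes to one relation: (commit time, operation, tuple).
    # The log contents and cached outsets below are invented for illustration.
    log = [(1, "ins", "a"), (2, "ins", "b"), (4, "del", "a"), (5, "ins", "c"), (7, "del", "b")]

    def entries_between(t1, t2):
        return [e for e in log if t1 < e[0] <= t2]

    def roll_forward(state, entries):
        # Incrementally update an earlier outset.
        state = set(state)
        for _, op, x in entries:
            if op == "ins":
                state.add(x)
            else:
                state.discard(x)
        return state

    def roll_backward(state, entries):
        # Decrementally downdate a later outset by undoing changes in reverse order.
        state = set(state)
        for _, op, x in reversed(entries):
            if op == "ins":
                state.discard(x)
            else:
                state.add(x)
        return state

    def timeslice(t, earlier, later):
        # earlier, later: cached (time, state) outsets with earlier[0] <= t <= later[0].
        # Pick the outset with the fewer intervening log entries.
        fwd = entries_between(earlier[0], t)
        bwd = entries_between(t, later[0])
        if len(fwd) <= len(bwd):
            return roll_forward(earlier[1], fwd)
        return roll_backward(later[1], bwd)

    cached_early = (2, {"a", "b"})   # the relation state as of time 2
    cached_late = (7, {"c"})         # the relation state as of time 7
    print(timeslice(4, cached_early, cached_late))  # {'b'} -- rolled forward: one log entry away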


Jensen, C. S., R. T. Snodgrass, M. D. Soo, ,"Extending Existing Dependency Theory to Temporal Databases" in Technical Report R-94-2050, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 58 pages,, 1994

Publication

Normal forms play a central role in the design of relational databases. Several normal forms for temporal relational databases have been proposed. These definitions are particular to specific temporal data models, which are numerous and incompatible. This paper attempts to rectify this situation. We define a consistent framework of temporal equivalents of the important conventional database design concepts: functional dependencies, primary keys, and third and Boyce-Codd normal forms. This framework is enabled by making a clear distinction between the logical concept of a temporal relation and its physical representation. As a result, the role played by temporal normal forms during temporal database design closely parallels that of normal forms during conventional database design. These new normal forms apply equally well to all temporal data models that have timeslice operators, including those employing tuple timestamping, backlogs, and attribute value timestamping. As a basis for our research, we conduct a thorough examination of existing proposals for temporal dependencies, keys, and normal forms. To demonstrate the generality of our approach, we outline how normal forms and dependency theory can also be applied to spatial and spatiotemporal databases.
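
As a concrete reading of one of the framework's ideas, the sketch below checks a temporal functional dependency on a tuple-timestamped representation by requiring the ordinary dependency to hold in every timeslice. The relation layout and the instant-by-instant check are simplifying assumptions made for illustration, not the paper's formal definitions.

    # A temporal relation, tuple-timestamped: (X value, Y value, vt_start, vt_end). Invented data.
    rows = [("e1", "Toys", 1994, 1996), ("e1", "Shoes", 1996, 1999), ("e2", "Toys", 1994, 1999)]

    def snapshot(rel, t):
        return [(x, y) for x, y, start, end in rel if start <= t < end]

    def temporal_fd_holds(rel, times):
        """A temporal FD X -> Y: the ordinary FD X -> Y must hold in every snapshot."""
        for t in times:
            seen = {}
            for x, y in snapshot(rel, t):
                if seen.setdefault(x, y) != y:
                    return False
        return True

    print(temporal_fd_holds(rows, range(1994, 1999)))  # True: X determines Y at each instant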


Jensen, C. S., ,"Vacuuming in TSQL2" in Technical Report R-94-2049, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 17 pages,, 1994

Publication

Updates, including (logical) deletions, to temporal tables that support transaction time result in insertions at the physical level. Despite the continuing decrease in cost of data storage, it is still, for various reasons, not always acceptable that all data be retained forever. Therefore, there is a need for a new mechanism for the vacuuming, i.e., physical deletion, of data when such tables are being managed. We propose syntax and informal semantics for vacuuming of data from temporal tables in TSQL2 which support transaction time. The mechanism allows - at schema definition time, as well as later, during the life span of a table - for the specification of so-called cut-off points. A cut-off point for a table is a timestamp that evaluates to a time instant. The timestamp may be either absolute or a bound or unbound now-relative timestamp. Conceptually, the cut-off point indicates that all data, current in the table solely before the (current value of the) timestamp, has been physically deleted. Vacuuming based on cut-off points is an example of a more general notion of vacuuming where arbitrary subsets of data may be physically deleted.
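
The cut-off mechanism can be pictured with a small Python sketch: rows of a transaction-time table whose transaction-time periods end before the (possibly now-relative) cut-off point are physically deleted. Half-open transaction-time intervals and the particular table contents are assumptions made for the example.

    UC = float("inf")   # transaction-time end of rows that are still current

    # A transaction-time table: (data, tt_start, tt_end), half-open intervals. Invented data.
    table = [("a", 1990, 1993), ("b", 1991, UC), ("c", 1994, 1997), ("d", 1996, UC)]

    def vacuum(rows, cutoff):
        """Physically delete rows that were current solely before the cut-off point."""
        return [row for row in rows if row[2] > cutoff]

    def now_relative_cutoff(now, offset):
        return now - offset        # e.g. "now - 3 years", re-evaluated as time passes

    print(vacuum(table, 1995))                          # absolute cut-off: drops ('a', 1990, 1993)
    print(vacuum(table, now_relative_cutoff(2000, 3)))  # cut-off at 1997: also drops ('c', 1994, 1997)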


Clifford, J., C. Dyreson, T. Isakowitz, C. S. Jensen, R. T. Snodgrass, ,"On the Semantics of Now in Temporal Databases" in Technical Report R-94-2047, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 50 pages,, 1994

Publication

Most databases record time-varying data, and significant efforts have been devoted to the convenient and efficient management of such data. Perhaps most prominently, numerous data models with varying degrees of built-in support for the temporal dimension of data have been proposed. Some models are quite restricted and simply support uninterpreted attribute domains for times and dates. Other models incorporate either a valid-time dimension, recording when the stored data is true, or a transaction-time dimension, recording when the stored data is current in the database. Bitemporal data models incorporate both valid and transaction time. The special temporal notion of an ever-increasing current-time value has been reflected in some of these data models by inclusion of current-time variables, such as "now," "until-changed," "1," "@" and "-." As timestamp values associated with facts in temporal databases, such variables may be conveniently used for indicating that a fact is currently valid. Although the services of time variables are very desirable, their use leads to a new type of database, consisting of tuples with variables, termed variable databases. This paper proposes a framework for defining the semantics of the variable databases of temporal relational data models. A framework is presented because several reasonable meanings may be given to databases that use some of the specific temporal variables that have appeared in the literature. Using the framework, the paper defines a useful semantics for such databases. Because situations occur where the existing time variables are inadequate, two new types of modeling entities that address these shortcomings, timestamps which we call now-relative and now-relative indeterminate, are introduced and defined within the framework. Moreover, the paper provides a foundation, using algebraic bind operators, for the querying of variable databases via existing query languages. This transition to variable databases presented here requires minimal change to the query processor. Finally, to underline the practical feasibility of variable databases, we show that variables may be represented and manipulated efficiently, incurring little space or execution time overhead.
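
As a toy illustration of variable databases and of binding, the Python sketch below stores a now-relative timestamp as a variable and grounds it with a bind operation at query-evaluation time. The class and function names are invented, and the paper treats such variables within the timestamp representation itself rather than as wrapper objects.

    import datetime

    class NowRelative:
        """A variable timestamp, 'now + offset', stored in the database and bound at query time."""
        def __init__(self, offset_days=0):
            self.offset = datetime.timedelta(days=offset_days)
        def bind(self, reference_time):
            return reference_time + self.offset

    NOW = NowRelative()

    # Invented sample tuple: a valid-time end point may be a ground value or a variable such as NOW.
    row = ("Ann is with the Toys department", datetime.date(1994, 1, 1), NOW)

    def bind_row(r, reference_time):
        """The bind operator replaces variables by ground values for a given evaluation time."""
        return tuple(v.bind(reference_time) if isinstance(v, NowRelative) else v for v in r)

    print(bind_row(row, datetime.date(1995, 6, 1)))
    print(bind_row(row, datetime.date(1996, 6, 1)))   # the same stored tuple yields a later end point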


Snodgrass, R. T., I. Ahn, G. Ariav, D. Batory, J. Clifford, C. E. Dyreson, R. Elmasri, F. Grandi, C. S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T. Y. C. Leung, N. Lorentzos, J. F. Roddick, A. Segev, M. D. Soo, S. M. Sripada, ,"The TSQL2 Language Specification" in (the language specification proper) 68 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication



Jensen, C. S., R. T. Snodgrass, ,"The Surrogate Data Type in TSQL2" in 4 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

This document proposes syntax and informal semantics for the inclusion of a SURROGATE data type in the TSQL2 query language.


Jensen, C. S., R. T. Snodgrass, M. D. Soo, ,"The TSQL2 Data Model" in 62 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication



Snodgrass, R. T., C. S. Jensen, F. Grandi, ,"Schema Specification in TSQL2" in 4 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

This document proposes syntax and informal semantics for extended Create and Alter statements that permit valid-time tables to be defined.


Snodgrass, R. T., C. S. Jensen, ,"The From Clause in TSQL2" in 6 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

This document proposes syntax and informal semantics for an extended From clause in the Select statement.


Hsu, S., C. S. Jensen, R. T. Snodgrass, ,"A Survey of Valid-time Selection and Projection in Temporal Query Languages" in 23 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

Temporal databases have now been studied for more than a decade. During that period of time, numerous query languages have been proposed for temporal databases. One of the essential parts of a temporal query language is valid-time selection, which allows the user to retrieve tuples according to their valid-time relationships. Valid-time projection is another important ingredient which defines the timestamps of the tuples in query results. Here, nine different temporal query languages are examined with a focus on valid-time selection and projection. This document is intended to provide an ideal foundation for designing the valid-time selection and projection components of the consensus query language TSQL2 that is currently being designed.


Hsu, S., C. S. Jensen, R. T. Snodgrass, ,"Valid-time Selection in TSQL2" in 14 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

Temporal databases have now been studied for more than a decade, and numerous temporal query languages have been proposed. One of the essential parts of a temporal query language is valid-time selection, which allows the user to retrieve tuples based on their underlying valid-times. We have previously surveyed valid-time selection and projection in nine temporal query languages, primarily SQL and Quel extensions. Based on that survey, this document proposes a specific design of the valid-time selection component of the consensus temporal query language TSQL2 that is currently being designed.


Hsu, S., C. S. Jensen, R. T. Snodgrass, ,"Valid-time Projection in TSQL2" in 10 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

Temporal databases have now been studied for more than a decade, and numerous temporal query languages have been proposed. Valid-time projection, which defines the timestamps of the tuples in query results, is an important ingredient of a temporal query language. Often, valid-time projection is closely tied to another important component, namely valid-time selection, which allows the user to retrieve tuples based on their underlying valid-times. We have previously surveyed valid-time selection and projection in nine temporal query languages, primarily SQL and Quel extensions. Based on that survey, this document proposes a specific design of the valid-time projection component of the consensus temporal query language TSQL2 that is currently being designed.


Leung, T. Y. C., C. S. Jensen, R. T. Snodgrass, ,"Update in TSQL2" in 5 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

This document proposes syntax and informal semantics for update in TSQL2.


Jensen, C. S., R. T. Snodgrass, T. Y. C. Leung, ,"Cursors in TSQL2" in 4 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

This document proposes syntax and informal semantics for cursors in TSQL2.


Clifford, J., C. E. Dyreson, R. T. Snodgrass, T. Isakowitz, C. S. Jensen, ,"Now in TSQL2" in 12 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

"Now" is a distinguished timestamp value used by many temporal data model proposals. In this paper, we propose a new kind of event, a now-relative event, that more accurately captures the semantics of "now". We discuss query language constructs, representation, and query processing strategies for such values. We demonstrate that these values incur no storage overhead and nominal additional query execution cost. The related concepts of "infinite future" and "infinite past" are also considered.


Jensen, C. S., ,"Vacuuming in TSQL2" in 10 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication

Updates, including (logical) deletions, to temporal tables that support transaction time result in insertions at the physical level. Despite the continuing decrease in cost of data storage, it is still, for various reasons, not always acceptable that all data be retained forever. Therefore, there is a need for a new mechanism for the vacuuming, i.e., physical deletion, of data when such tables are being managed. We propose syntax and informal semantics for vacuuming of data from temporal tables in TSQL2 which support transaction time. The mechanism allows - at schema definition time, as well as later, during the life span of a table - for the specification of so-called cut-off points. A cut-off point for a table is a timestamp that evaluates to a time instant. The timestamp may be either absolute or a bound or unbound now-relative timestamp. Conceptually, the cut-off point indicates that all data, current in the table solely before the (current value of the) timestamp, has been physically deleted. Vacuuming based on cut-off points is an example of a more general notion of vacuuming where arbitrary subsets of data may be physically deleted.


Soo, M. D., C. S. Jensen, R. T. Snodgrass, ,"An Algebra for TSQL2" in 36 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages,, 1994

Publication



Torp, K., L. Mark, C. S. Jensen, ,"Efficient Differential Timeslice Computation" in Technical Report GIT-CC-94-12, Georgia Institute of Technology, College of Computing, Atlanta, Georgia 30332, USA, 15 pages,, 1994

Publication

Transaction-time databases record all previous database states and are ever-growing, leading to potentially huge quantities of data. For that reason, efficient query processing is of particular importance. Due to the large size of transaction-time relations, it is advantageous to utilize cheap write-once storage media for storage. This is facilitated by adopting a log-based storage structure. Timeslices, i.e., relation states or snapshots, are computed by traversing the logs, using previously computed and cached timeslices as outsets. When computing a new timeslice, the cache will contain two candidate outsets: an earlier outset and a later outset. We provide efficient means of always picking the optimal one. Specifically, we define and investigate the use of a new data structure, the B+-tree-like Insertion Tree (I-tree), for this purpose. The cost of using an I-tree for picking the optimal outset is similar to that of using a B+-tree. Being sparse, I-trees require little space overhead, and they are cheap to maintain as the underlying relations are updated. I-trees also provide a basis for precisely and efficiently estimating the costs of performing timeslices in advance. This is useful for query optimization and can be essential in real-time applications. Finally, it is demonstrated how I-trees can be used in the computation of other types of queries.


1993 Top

Jensen, C. S., L. Mark, N. Roussopoulos, T. Sellis, ,"Using Differential Techniques to Efficiently Support Transaction Time" in The VLDB Journal, Vol. 2, No. 1, pp. 75-111,, 1993

Publication

We present an architecture for query processing in the relational model extended with transaction time. The architecture integrates standard query optimization and computation techniques with new differential computation techniques. Differential computation computes a query incrementally or decrementally from the cached and indexed results of previous computations. The use of differential computation techniques is essential to provide efficient processing of queries that access very large temporal relations. Alternative query plans are integrated into a state transition network, where the state space includes backlogs of base relations, cached results from previous computations, a cache index, and intermediate results; the transitions include standard relational algebra operators, operators for constructing differential files, operators for differential computation, and combined operators. A rule set is presented to prune away parts of state transition networks that are not promising, and dynamic programming techniques are used to identify the optimal plans from the remaining state transition networks. An extended logical access path serves as a "structuring" index on the cached results and contains, in addition, vital statistics for the query optimization process (including statistics about base relations, backlogs, and queries - previously computed and cached, previously computed, or just previously estimated).


Jensen, C. S., R. T. Snodgrass, ,"Three Proposals for a Third-Generation Temporal Data Model" in in Proceedings of the International Workshop on an Infrastructure for Temporal Databases, Arlington, TX, pp. T-1-T10,, 1993

Publication

We present three general proposals for a next-generation temporal data model. Each of these proposals expresses a synthesis of a variety of contributions from diverse sources within temporal databases. We believe that the proposals may aid in bringing consensus to the area of temporal data models. The current plethora of diverse and incompatible temporal data models has an impeding effect on the design of a consensus temporal data model. A single data model is highly desirable, both to the temporal database community and to the database user community at large. It is our contention that the simultaneous foci on the modeling, presentation, representation, and querying of temporal data have been a major cause of the proliferation of models. We advocate instead a separation of concerns. As the next step, we propose a data model for the single, central task of temporal data modeling. In this model, tuples are stamped with bitemporal elements, i.e., sets of pairs of valid and transaction time chronons. This model is not intended to be suitable for the other tasks, where existing models may perhaps be more appropriate. However, this model does capture time-varying data in a natural way. Finally, we argue that flexible support for physical deletion is needed in bitemporal databases. Physical deletion requires special attention in order not to compromise the correctness of query processing.


Jensen, C. S., M. D. Soo, R. T. Snodgrass, ,"Unification of Temporal Data Models" in in Proceedings of the Ninth IEEE International Conference on Data Engineering, Vienna, Austria, pp. 262-271,, 1993

Publication

To add time support to the relational model, both first normal form (1NF) and non-1NF approaches have been proposed. Each has associated difficulties. Remaining within 1NF when time support is added may introduce data redundancy. The non-1NF models may not be capable of directly using existing relational storage structures or query evaluation strategies. This paper describes a new, conceptual temporal data model that better captures the time-dependent semantics of the data while permitting multiple data models at the representation level. This conceptual model effectively moves the distinction between the various existing data models from a semantic basis to a physical, performance-relevant basis. We define a conceptual notion of a bitemporal relation where tuples are stamped with sets of two-dimensional chronons in transaction-time/valid-time space. We introduce a tuple-timestamped 1NF representation to exemplify how the conceptual bitemporal data model is related, by means of snapshot equivalence, with representational models. We then consider querying within the two-level framework. We first define an algebra at the conceptual level. We proceed to map this algebra to the sample representational model in such a way that new operators compute equivalent results for different representations of the same conceptual bitemporal relation. This demonstrates that the representational model is faithful to the semantics of the conceptual data model, with many choices available that may be exploited to improve performance.


Jensen, C. S., L. Mark, ,"Differential Query Processing in Transaction-Time Databases" in Chapter 19, pp. 457-491, in Temporal Databases: Theory, Design, and Implementation, edited by A. Tansel et al., Benjamin/Cummings Publishers, Database Systems and Applications Series,, 1993

Publication



Jensen, C. S., J. Clifford, S. K. Gadia, A. Segev, R. T. Snodgrass, ,"A Glossary of Temporal Database Concepts" in Appendix A, pp. 621-633, in Temporal Databases: Theory, Design, and Implementation, edited by A. Tansel et al., Benjamin/Cummings Publishers, Database Systems and Applications Series,, 1993

Publication
ACM Author-Izer



Jensen, C. S. (editor, with multiple other contributors), ,"Proposed Temporal Database Concepts - May 1993" in in Proceedings of the International Workshop on an Infrastructure for Temporal Databases, Arlington, TX, pp. A-1-A-24,, 1993

Publication

This document contains the complete set of glossary entries proposed by members of the temporal database community from Spring 1992 until May 1993. It is part of an initiative aimed at establishing an infrastructure for temporal databases. As such, the proposed concepts will be discussed during "International Workshop on an Infrastructure for Temporal Databases," in Arlington, TX, June 1993, with the specific purpose of defining a consensus glossary of temporal database concepts and names. Earlier status documents appeared in March 1993 and December 1992 and included terms proposed after an initial glossary appeared in SIGMOD Record in September 1992. This document subsumes all the previous documents. Additional information related to the initiative may be found at cs.arizona.edu in the tsql directory, accessible via anonymous ftp.


Jensen, C. S. (editor, with multiple other contributors), ,"Addendum to 'Proposed Temporal Database Concepts - May 1993'" in in Proceedings of the International Workshop on an Infrastructure for Temporal Databases, Arlington, TX, pp. A-25-A-29,, 1993

Publication

The paper "Proposed Temporal Database Concepts - May 1993" contained a complete set of glossary entries proposed by members of the temporal database community from Spring 1992 until May 1993. The aim of the proposal was to define a consensus glossary of temporal database concepts and names. Several glossary entries (Section 3) were included in the proposal, but were still unresolved at the time of the deadline. This addendum reflects on-going discussions and contains revised versions of several unresolved entries. The entries here thus supersede the corresponding entries in Section 3 of the proposal.


Jensen, C. S. (editor, with multiple other contributors), ,"The TSQL2 Benchmark" in in Proceedings of the International Workshop on an Infrastructure for Temporal Databases, Arlington, TX, pp. QQ-1-QQ-28,, 1993

Publication

This document presents the temporal database community with an extensive, consensus benchmark for temporal query languages. The benchmark is semantic in nature. It is intended to be helpful when evaluating the user-friendliness of temporal query languages, including proposals for the consensus temporal SQL that is currently being developed. The benchmark consists of a database schema, an instance for the schema, and a set of queries on this database. The queries are classified according to a taxonomy, which is also part of the benchmark.


Jensen, C. S. (ed.) et al., ,"A Consensus Test Suite of Temporal Database Queries" in Technical Report R-93-2034, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 45 pages,, 1993

Publication

This document presents the temporal database community with a sizable consensus test suite of temporal database queries. The test suite is intended to be helpful when evaluating the user-friendliness of temporal relational query languages. The test suite consists of a database schema, an instance for the schema, and a set of approximately 150 queries on this database. The queries are classified according to a taxonomy, which is also included in the document.


Snodgrass, R. T., C. E. Dyreson, C. S. Jensen, N. Kline, L. Soo, M. D. Soo, ,"The MultiCal System" in Manual and Systems Documentation, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, ca. 300 pages,, 1993




Dyreson, C., R. Snodgrass, C. S. Jensen, ,"On the Semantics of `Now' in Temporal Databases" in TempIS Technical Report 42, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 22 pages,, 1993

Publication

"Now" is a distinguished timestamp value used by many temporal data model proposals. In this paper, we examine the different semantics given this familiar term, and propose two new timestamp values, a deterministic now-relative value and an indeterminate now-relative value, that more accurately capture the semantics of "now." We discuss query language constructs, representation, and query processing strategies for such values. Both the valid-time and transaction-time dimensions are considered. We demonstrate that these values incur no storage overhead and nominal additional query execution cost. The related concepts of "indefinite future" and "indefinite now" are also considered.


Jensen, C. S. et al. (eds.), ,"A Consensus Glossary of Temporal Database Concepts" in Technical Report R-93-2035, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 55 pages,, 1993

Publication

This document contains definitions of a wide range of concepts specific to and widely used within temporal databases. In addition to providing definitions, the document also includes explanations of concepts as well as discussions of the adopted names. The consensus effort that led to this glossary was initiated in early 1992. Earlier status documents appeared in March 1993 and December 1992 and included terms proposed after an initial glossary appeared in SIGMOD Record in September 1992. The present glossary subsumes all the previous documents. It was most recently discussed at the "ARPA/NSF International Workshop on an Infrastructure for Temporal Databases," in Arlington, TX, June 1993, and the present glossary is recommended by a significant part of the temporal database community. The glossary meets a need for creating a higher degree of consensus on the definition and naming of temporal database concepts. Two sets of criteria are included. First, all included concepts were required to satisfy four relevance criteria, and, second, the naming of the concepts was resolved using a set of evaluation criteria. The concepts are grouped into three categories: concepts of general database interest, of temporal database interest, and of specialized interest.


Jensen, C. S., M. D. Soo, R. T. Snodgrass, ,"Unifying Temporal Data Models via a Conceptual Model" in Technical Report 93-31, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 36 pages,, 1993

Publication

To add time support to the relational model, both first normal form (1NF) and non-1NF approaches have been proposed. Each has associated difficulties. Remaining within 1NF when time support is added may introduce data redundancy. The non-1NF models may be incapable of directly using existing relational storage structures or query evaluation technologies. This paper describes a new, conceptual temporal data model that better captures the time-dependent semantics of the data, while permitting multiple data models at the representation level. This conceptual model effectively moves the distinction between the various existing data models from a semantic basis to a physical, performance-relevant basis. We define a conceptual notion of a bitemporal relation where tuples are stamped with sets of two-dimensional chronons in transaction-time/valid-time space. Next, we describe five representation schemes that support both valid and transaction time; these representations include both 1NF and non-1NF models. We use snapshot equivalence to relate the representation data models with the bitemporal conceptual data model. We then consider querying within the two-level framework. To do so, we define an algebra at the conceptual level. We then map this algebra to the representation level in such a way that new operators compute equivalent results for different representations of the same bitemporal conceptual relation. This demonstrates that all of these representations are faithful to the semantics of the conceptual data model, with many choices available that may be exploited to improve performance.


Soo, M. D., R. T. Snodgrass, C. S. Jensen, ,"Efficient Evaluation of the Valid-Time Natural Join" in Technical Report 93-17, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 26 pages,, 1993

Publication

Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the optimization of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally-varying data dramatically increases the size of the database. These factors require new techniques to efficiently evaluate valid-time joins. We address this need for efficient join evaluation in databases supporting valid-time. A new temporal-join algorithm based on tuple partitioning is introduced. This algorithm avoids the quadratic cost of nested-loop evaluation methods; it also avoids sorting. Performance comparisons between the partition-based algorithm and other evaluation methods are provided. While we focus on the important valid-time natural join, the techniques presented are also applicable to other valid-time joins.


Jensen, C. S. (ed.), ,"Proposed Glossary Entries - March 1993" in Status report for the Terminology Subtask of the TSQL2 Design Initiative. Distributed to the TSQL Mailing List, 25 pages,, 1993

Publication

This document describes the current status, as of March 30, 1993, of an initiative aimed at creating a consensus glossary of temporal database concepts and names. An earlier status document appeared in December 1992 and included terms proposed after an initial glossary appeared in SIGMOD Record. This document contains a set of new terms, proposed since December 1992, and the terms from the December 1992 document. To provide a context, the terms from the initial glossary are included in an appendix in dictionary format, and criteria for evaluation of glossary entries are also listed in the appendix. The document is intended to help future contributors of glossary entries. Proposed glossary entries should be sent to tsql@cs.arizona.edu. Other information related to the initiative may be found at cs.arizona.edu in the tsql directory, accessible via anonymous ftp.


Soo, M. D., R. Snodgrass, C. E. Dyreson, N. Kline, C. S. Jensen, ,"Architectural Extensions to Support Multiple Calendars" in The MULTICAL Project, Department of Computer Science, The University of Arizona, Tucson, Arizona 85721, USA, 78 pages,, 1993

Publication

We describe in detail a system architecture for supporting a time-stamp attribute domain in conventional relational database management systems. This architecture underlies previously proposed temporal modifications to SQL. We describe the major components of the system and how they interact. For each component of the system, we provide specifications for the routines exported by that component. Finally, we describe a preliminary design for a toolkit that aids in the generation of the components of the database management system that support time.


1992 Top

Jensen, C. S., L. Mark, ,"Queries on Change in an Extended Relational Model" in IEEE Transactions on Knowledge and Data Engineering, Vol. 4, No. 2, pp. 192-200,, 1992

Publication

A data model that allows for the storage of detailed change history in so-called backlog relations is described. Its extended relational algebra, in conjunction with the extended data structures, provides a powerful tool for the retrieval of patterns and exceptions in change history. An operator, Σ, based on the notion of compact active domain, is introduced. It groups data not in predefined groups but in groups that fit the data. This operator further expands the retrieval capabilities of the algebra. The expressive power of the algebra is demonstrated by examples, some of which show how patterns and exceptions in change history can be detected. Sample applications of this work are statistical and scientific databases, monitoring (of databases, manufacturing plants, power plants, etc.), CAD, and CASE.
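
One way to read grouping over a compact active domain is as splitting a sorted attribute domain wherever the gap between neighbouring values exceeds a threshold. The sketch below follows that reading; it is an illustration of "groups that fit the data", not the paper's formal definition of the Σ operator.

```python
# Illustrative sketch of grouping that "fits" the data rather than using
# predefined groups: sorted attribute values are split wherever the gap between
# neighbours exceeds a threshold. This gap rule is one possible reading of
# compact active domains, not the paper's formal definition of Sigma.

def fitted_groups(values, max_gap):
    groups, current = [], []
    for v in sorted(values):
        if current and v - current[-1] > max_gap:
            groups.append(current)
            current = []
        current.append(v)
    if current:
        groups.append(current)
    return groups

salaries = [100, 102, 105, 300, 301, 990]
print(fitted_groups(salaries, max_gap=50))
# [[100, 102, 105], [300, 301], [990]]
```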


Jensen, C. S., R. Snodgrass, ,"Temporal Specialization" in in Proceedings of the Eighth IEEE International Conference on Data Engineering, Phoenix, AZ, pp. 594-603,, 1992

Publication

In temporal specialization, the database designer restricts the relationship between the valid time-stamp (recording when something is true in the reality being modeled) and the transaction time-stamp (recording when a fact is stored in the database). An example is a retroactive temporal event relation, where the event must have occurred before it was stored, i.e., the valid time-stamp is restricted to be less than the transaction time-stamp. We discuss many useful restrictions, defining a large number of specialized types of temporal relations, and indicate some of their applications. We present a detailed taxonomy of specialized temporal relations. This taxonomy may be employed during database design to specify the particular time semantics of temporal relations.
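
One such specialization, the retroactive event relation, can be illustrated with a minimal sketch that enforces the restriction at insertion time; the class and check below are hypothetical and capture only this single specialization.

```python
# Illustrative sketch: enforcing one specialization, a retroactive event relation,
# where each event's valid time must precede its transaction time.
# The class and its insert check are hypothetical, not the paper's formalism.

import time

class RetroactiveEventRelation:
    def __init__(self):
        self.tuples = []

    def insert(self, event, valid_time):
        transaction_time = time.time()         # time at which the fact is stored
        if not valid_time < transaction_time:  # the retroactive restriction
            raise ValueError("retroactive relation: valid time must precede transaction time")
        self.tuples.append((event, valid_time, transaction_time))

r = RetroactiveEventRelation()
r.insert("shipment received", valid_time=time.time() - 3600)  # occurred an hour ago: accepted
```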


Jensen, C. S., J. Clifford, S. K. Gadia, A. Segev, R. T. Snodgrass, ,"A Glossary of Temporal Database Concepts" in in ACM SIGMOD Record, Vol. 21, No. 3, pp. 35-43,, 1992

Publication
Online at ACM Digital Library

This glossary contains concepts specific to temporal databases that are well-defined, well understood, and widely used. In addition to defining and naming the concepts, the glossary also explains the decisions made. It lists competing alternatives and discusses the pros and cons of these. It also includes evaluation criteria for the naming of concepts. This paper is a structured presentation of the results of e-mail discussions initiated during the preparation of the first book on temporal databases, Temporal Databases: Theory, Design, and Implementation, published by Benjamin/Cummings, to appear January 1993. Independently of the book, an initiative aimed at designing a consensus Temporal SQL is under way. The paper is a contribution towards establishing common terminology, an initial subtask of this initiative.


Jensen, C. S., R. Snodgrass, ,"Proposal for a Data Model for the Temporal Structured Query Language" in TempIS Technical Report 37, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 32 pages,, 1992

Publication

Adding time to the relational model has been a daunting task. More than two dozen time-extended relational data models have been proposed over the last fifteen years. We feel that the reason why so many temporal data models have been proposed is that these models attempt to simultaneously retain the simplicity of the relational model, present all the information concerning an object in one tuple, and ensure ease of implementation and query evaluation efficiency. We advocate instead a separation of concerns. We propose a new, conceptual temporal data model that better captures the time-dependent semantics of the data while permitting multiple data models at the representation level. This conceptual model effectively moves the distinction between the various existing data models from a semantic basis to a physical, performance-relevant basis. We compare our model with the previously proposed temporal data models.


Jensen, C. S., M. D. Soo, ,"Temporal Joins in a Two-Level Data Model" in Unpublished manuscript, 24 pages,, 1992

This paper focuses on the outer natural join of temporal relations and the simpler temporal joins by which it is defined. Joins are of particular interest because they are both frequently used and computationally expensive. The temporal joins are defined within a model that completely separates the conceptual notion of a temporal relation from the actual representation of the temporal relation. Conceptually, tuples in a temporal relation have two associated times - valid time, recording changes in reality, and transaction time, recording changes in the database. We have adopted a representation scheme where temporal relations are embedded in conventional relations. One conceptual temporal relation may be represented by many such embeddings. The join operators are representation independent in that they compute equivalent results for different representations of the same temporal relation. The separation of a temporal relation from its representation is conceptually desirable; it allows significant flexibility when implementing the join operators which, in turn, may be exploited to gain better performance. The joins proposed are natural generalizations of their counterparts in the conventional relational algebra. Just as the natural join is used when creating loss-less join decompositions in conventional relational database design, the generalized counterpart, the temporal natural join, plays the identical role when designing temporal relational database schemas.


Soo, M. D., R. T. Snodgrass, C. E. Dyreson, C. S. Jensen, N. Kline, ,"Architectural Extensions to Support Multiple Calendars" in TempIS Technical Report 32, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 74 pages,, 1992

Publication

This paper is a detailed description of the design of the architectural extensions to a database management system (DBMS) supporting multiple calendars. The paper contains descriptions of the modules comprising the architectural extensions, with detailed descriptions of the services provided by each module. In addition, the data structures used by each system module are included.


Jensen, C. S. (ed.), ,"Proposed Glossary Entries - December 1992" in Status report for the Terminology Subtask of the TSQL2 Design Initiative. Distributed to the TSQL Mailing List, 14 pages,, 1992

Publication

This document describes the current status, as of December 15, 1992, of an initiative aimed at creating a consensus glossary of temporal database concepts and names. It contains the set of currently proposed, complete glossary entries. Existing terms and criteria for evaluation of glossary entries are contained in appendices. The document is intended to help future contributors of glossary entries. Proposed glossary entries should be sent to tsql@cs.arizona.edu. Other information related to the initiative may be found at cs.arizona.edu in the tsql directory, accessible via anonymous ftp.


Jensen, C. S., R. T. Snodgrass, M. D. Soo, ,"Extending Normal Forms to Temporal Relations" in Technical Report 92-17, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 24 pages,, 1992

Publication

Normal forms play a central role in the design of relational databases. Recently, several normal forms for temporal relational databases have been proposed. The result is a number of isolated and sometimes contradictory contributions that only apply within specialized settings. This paper attempts to rectify this situation. We define a consistent framework of temporal equivalents of all the important conventional database design concepts: functional and multivalued dependencies, primary keys, and third, Boyce-Codd, and fourth normal forms. This framework is enabled by making a clear distinction between the logical concept of a temporal relation and its physical representation. As a result, the role played by temporal normal forms during temporal database design closely parallels that of normal forms during conventional database design. We compare our approach with previously proposed definitions of temporal normal forms and temporal keys. To demonstrate the generality of our approach, we outline how normal forms and dependency theory can also be applied to spatial databases, as well as to spatial-temporal databases.
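
A snapshot-based reading of temporal dependencies can be sketched as follows: a temporal functional dependency X → Y is taken to hold if the ordinary dependency holds in every valid-time snapshot of the relation. The tuple layout and the check below are assumptions for illustration, not the paper's formal framework.

```python
# Illustrative sketch of a temporal functional dependency check: X -> Y holds
# temporally if the ordinary FD holds in every valid-time snapshot.
# The tuple layout (attrs dict plus a half-open [start, end) interval) is an assumption.

def snapshot_at(relation, t):
    return [r["attrs"] for r in relation if r["start"] <= t < r["end"]]

def fd_holds(rows, X, Y):
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True

def temporal_fd_holds(relation, X, Y):
    # Any two overlapping tuples coexist at the later of their start times,
    # so checking each start time covers every overlapping pair.
    times = sorted({r["start"] for r in relation})
    return all(fd_holds(snapshot_at(relation, t), X, Y) for t in times)

rel = [
    {"attrs": {"emp": "e1", "dept": "Sales"}, "start": 0, "end": 10},
    {"attrs": {"emp": "e1", "dept": "R&D"},   "start": 10, "end": 20},  # dept changes over time; FD holds per snapshot
]
print(temporal_fd_holds(rel, X=["emp"], Y=["dept"]))   # True
```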


Jensen, C. S., M. D. Soo, R. T. Snodgrass, ,"Unification of Temporal Data Models" in Technical Report TR 92-15, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 28 pages,, 1992

Publication

To add time support to the relational model, both first normal form (1NF) and non-1NF approaches have been proposed. Each has associated difficulties. Remaining within 1NF when time support is added may introduce data redundancy. The non-1NF models may not be capable of directly using existing relational storage structures or query evaluation technologies. This paper describes a new, conceptual temporal data model that better captures the time-dependent semantics of the data while permitting multiple data models at the representation level. This conceptual model effectively moves the distinction between the various existing data models from a semantic basis to a physical, performance-relevant basis. We define a conceptual notion of a bitemporal relation where tuples are stamped with sets of two-dimensional chronons in transaction-time/valid-time space. Next, we describe three representation schemes: a tuple-timestamped 1NF representation, a backlog relation composed of 1NF timestamped change requests, and a non-1NF attribute value-timestamped representation. We further investigate several variants of these representations. We use snapshot equivalence to relate the representation data models with the conceptual bitemporal data model. We then consider querying within the two-level framework. To do so, we define first an algebra at the conceptual level. We proceed to map this algebra to the representation level in such a way that new operators compute equivalent results for different representations of the same conceptual bitemporal relation. This demonstrates that all of these representations are faithful to the semantics of the conceptual data model, with many choices available that may be exploited to gain improved performance.


1991 Top

Jensen, C. S., L. Mark, N. Roussopoulos, ,"Incremental Implementation Model for Relational Databases with Transaction Time" in IEEE Transactions on Knowledge and Data Engineering, Vol. 3, No. 4, pp. 461-473,, 1991

Publication

An implementation model for the standard relational data model extended with transaction time is presented. The implementation model integrates techniques of view materialization, differential computation, and deferred update into a coherent whole. It is capable of storing any view (reflecting past or present states) and subsequently using stored views as starting points for incremental and decremental computations of requested views, making it more flexible than previously proposed partitioned storage models. The workings and expressiveness of the model are demonstrated by sample queries that show how historical data are retrieved.
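
The interplay of materialized views and differential computation can be sketched as deriving a requested state from a cached state plus the relevant suffix of a backlog of timestamped change requests. The structure and names below are assumptions for illustration, not the implementation model itself.

```python
# Illustrative sketch of differential computation: a requested state is derived
# incrementally from a cached (materialized) state plus the suffix of a backlog
# of timestamped change requests. Structure and names are assumptions.

def apply_backlog(state, backlog, from_t, to_t):
    """Roll a cached state forward by applying backlog entries with from_t < t <= to_t."""
    state = set(state)
    for t, op, row in backlog:
        if from_t < t <= to_t:
            if op == "insert":
                state.add(row)
            else:
                state.discard(row)
    return state

backlog = [
    (1, "insert", ("e1", "Sales")),
    (2, "insert", ("e2", "R&D")),
    (3, "delete", ("e1", "Sales")),
]

cached_at_1 = {("e1", "Sales")}                   # materialized view as of time 1
print(apply_backlog(cached_at_1, backlog, 1, 3))  # {('e2', 'R&D')}
```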


Jensen, C. S., R. Snodgrass, ,"Temporal Specialization and Generalization" in Technical Report R 91-45, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, August, 47 pages (also Technical Report 91-25, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA),, 1991

Publication

A standard relation is two-dimensional with attributes and tuples as dimensions. A temporal relation contains two additional, orthogonal time dimensions, namely valid time and transaction time. Valid time records when facts are true in the modeled reality, and transaction time records when facts are stored in the temporal relation. While, in general, there are no restrictions between the valid time and transaction time associated with each fact, in many practical applications the valid and transaction times exhibit more or less restricted interrelationships that define several types of specialized temporal relations. The paper examines five different areas where a variety of types of specialized temporal relations are present. In application systems with multiple, interconnected temporal relations, multiple time dimensions may be associated with facts as they flow from one temporal relation to another. For example, a fact may have multiple associated transaction times that tell when it was stored in previous temporal relations. The paper investigates several aspects of the resulting generalized temporal relations, including the ability to query a predecessor relation from a successor relation. The presented framework for generalization and specialization allows researchers as well as database and system designers to precisely characterize, compare, and thus better understand temporal relations and the application systems in which they are embedded. The framework's comprehensiveness and its use in understanding temporal relations are demonstrated by placing previously proposed temporal data models within the framework. The practical relevance of the defined specializations and generalizations is illustrated by sample realistic applications in which they occur. The additional semantics of specialized relations are especially useful for improving the performance of query processing.


Jensen, C. S., R. Snodgrass, ,"Specialized Temporal Relations" in Technical Report R 91-26, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 27 pages,, 1991

Publication

In temporal specialization, the database designer restricts the relationship between the valid time-stamp (recording when something is true in the reality being modeled) and the transaction time-stamp (recording when a fact is stored in the database). An example is a retroactive temporal event relation, where the event must have occurred before it was stored, i.e., the valid time-stamp is restricted to be less than the transaction time-stamp. We discuss some two dozen useful restrictions, defining as many specialized types of temporal relations, and indicate some of their applications. We present a detailed taxonomy of specialized temporal relations. This taxonomy may be employed during database design to specify the particular time semantics of temporal relations. Additionally, the DBMS may exploit such a characterization to more efficiently store and access those temporal relations. Many previous research efforts that considered only one kind of time also apply to certain specialized temporal relations with both kinds of time. We classify a wide range of such efforts, identifying the particular specialization each concerns. Similarly, implementation approaches that assume only one kind of time often also apply to specialized temporal relations. We analyze the extent to which each technique may be modified to work with temporal relations, thereby achieving improved performance when supporting such relations. An implication of this work is that much of previous and current research that heretofore has applied only to rollback or historical databases is also relevant to restricted forms of temporal databases.


1990 Top

Jensen, C. S., L. Mark, ,"Replication Gives High Performance Query Processing in Relational Models Extended with Transaction Time" in The First IEEE Workshop on the Management of Replicated Data, Houston, TX, 4 pages,, 1990

Publication



Jensen, C. S., L. Mark, ,"A Framework for Vacuuming Temporal Databases" in Technical Report CS-TR-2516, UMIACS-TR-90-105, Department of Computer Science, University of Maryland, College Park, MD 20742, 46 pages,, 1990

Publication

In conventional databases, the amount of data typically reaches a certain level and is then relatively stable. In databases supporting transaction time, old data are retained, and the amount of data is ever growing. Even with continued advances in mass storage technology, vacuuming (i.e., deletion or off-line storage of data) will eventually be necessary. At the same time, the fundamental principle of transaction-time databases, that history cannot be changed, must be obeyed. This paper provides a framework for vacuuming subsystems for relational transaction-time databases. Our main focus is to establish a foundation for correct and cooperative query processing through the modification of queries that cannot be processed due to vacuuming. In doing this, we provide language facilities for specifying vacuuming; we present three classifications of vacuuming specifications; and we define correctness criteria for vacuuming specifications. Based on the classifications, we provide a comprehensive set of rules for expressing modified queries. For some of the classes, modified queries can be expressed using relational algebra - for others, this is impossible, and an extended, tagged relational algebra is used instead. The framework is a useful tool for designers of specific vacuuming subsystems. The framework is presented in the context of a previously developed relational model with transaction time support, DM/T.
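
Cooperative query modification under vacuuming can be sketched as clipping a transaction-time range query to the retained portion of the relation and reporting what was excluded. The function below is a hypothetical simplification of the rule-based modification described in the paper.

```python
# Illustrative sketch of cooperative query modification under vacuuming:
# a query over transaction time is clipped to the retained portion of the
# relation, and the user is told what was vacuumed away. Names are assumptions.

def modify_query(query_start, query_end, vacuumed_before):
    """Clip a transaction-time range query to data retained after vacuuming."""
    if query_end <= vacuumed_before:
        return None, "query cannot be answered: the requested period was vacuumed"
    start = max(query_start, vacuumed_before)
    note = None
    if query_start < vacuumed_before:
        note = f"data before t={vacuumed_before} was vacuumed and is excluded"
    return (start, query_end), note

print(modify_query(1990, 2000, vacuumed_before=1995))
# ((1995, 2000), 'data before t=1995 was vacuumed and is excluded')
```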


C. S. Jensen, L. Mark, N. Roussopoulos, T. Sellis, ,"Using Caching, Cache Indexing, and Differential Techniques to Efficiently Support Transaction Time" in Technical Report CS-TR-2413, UMIACS-TR-90-25, Department of Computer Science, University of Maryland, College Park, MD 20742, 28 pages,, 1990

Publication

We present a framework for query processing in the relational model extended with transaction time. The framework integrates standard query optimization and computation techniques with new differential computation techniques. Differential computation incrementally or decrementally computes a query from the cached and indexed results of previous computations. The use of differential computation techniques is essential in order to provide efficient processing of queries that access very large temporal relations. Alternative query plans are integrated into a state transition network, where the state space includes backlogs of base relations, cached results from previous computations, a cache index, and intermediate results; the transitions include standard relational algebra operators, operators for constructing differential files, operators for differential computation, and combined operators. A rule set is presented to prune away parts of state transition networks that are not promising, and dynamic programming techniques are used to identify the optimal plans from the remaining state transition networks. An extended logical access path serves as a "structuring" index on the cached results and, in addition, contains vital statistics for the query optimization process, including statistics about base relations and backlogs, and about queries that were previously computed and cached, previously computed, or merely previously estimated.
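
The central planning decision, whether to recompute a query from its backlog or to differentially update a cached earlier result, can be sketched with a toy cost comparison. The cost model below is entirely hypothetical and stands in for the dynamic-programming search over the state transition network.

```python
# Illustrative sketch of the core planning choice: recompute a query from its
# backlog or differentially update a cached earlier result, picking whichever
# has the lower estimated cost. The cost model is entirely hypothetical.

def plan(backlog_size, cached, delta_size, cost_per_row=1.0, cost_per_delta=1.2):
    recompute_cost = backlog_size * cost_per_row
    differential_cost = float("inf") if cached is None else delta_size * cost_per_delta
    if differential_cost < recompute_cost:
        return "differential", differential_cost
    return "recompute", recompute_cost

print(plan(backlog_size=1_000_000, cached=True, delta_size=500))   # ('differential', 600.0)
print(plan(backlog_size=200, cached=None, delta_size=0))           # ('recompute', 200.0)
```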


1989 Top

C. S. Jensen, L. Mark, N. Roussopoulos, T. Sellis, ,"A Framework for Efficient Query Processing Using Caching, Cache Indexing, and Differential Techniques in the Relational Model Extended with Transaction Time" in Technical Report R-89-45, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark,, 1989

Publication

We present a framework for query processing in the relational model extended with transaction time. The framework integrates standard techniques for query optimization and computation with techniques for incremental and decremental, i.e., differential, computation from cached and indexed results of previous computations in order to provide efficient processing of queries on very large temporal relations. Alternative query plans are integrated into a state transition network, where the state space includes backlogs of base relations, cached results from previous computations, the cache index, and intermediate results; transitions include standard relational algebra operators, operators for constructing differential files, operators for differential computation, and combined operators. A rule set is presented to prune away parts of the state transition network that are not promising, and dynamic programming techniques are used to identify the optimal plan in the resulting state transition network. An extended logical access path serves as a "structuring" index on the cached results and, in addition, contains vital statistics for the query optimization process, including statistics about base relations and backlogs, and about queries that were previously computed and cached, previously computed, or merely previously estimated. The framework exploits lazy, threshold-triggered, and eager propagation of updates to ensure consistency between base data and cached data. It integrates previously proposed approaches to supporting views, i.e., recomputation, storage of data snapshots, and storage of pointer structures, and it generalizes incremental computation techniques to differential computation techniques.


C. S. Jensen, L. Mark, ,"Queries on Change in an Extended Relational Model" in Technical Report CS-TR-2299, UMIACS-TR-89-80, Department of Computer Science, University of Maryland, College Park, MD 20742, 37 pages,, 1989

Publication

A data model is a means of modeling, communicating about, and managing part of reality. In our understanding, one of the most fundamental characteristics of reality is change; whereas change is fundamental, stability is relative and temporary. Change is an often critical aspect of database systems applications; in many applications, change itself and previous states are of interest. Change presupposes the concept of time. We provide a data model that allows for the storage of detailed historical data in so-called backlog relations. The query language extends the standard relational algebra to take advantage of the additional data. In particular, we introduce an operator Sigma based on the notion of compact active domain. This operator groups data, not in predefined groups, but in groups that "fit" the data. The expressive power of the operator is demonstrated by examples showing how patterns and exceptions in change history can be detected. Sample applications of this work are statistical and scientific databases, monitoring (of production systems, databases, power plants, etc.), CAD, and CASE.


C. S. Jensen, L. Mark, N. Roussopoulos, ,"Incremental Implementation Model for Relational Databases with Transaction Time" in Technical Report CS-TR-2275, UMIACS-TR-8963, Department of Computer Science, University of Maryland, College Park, MD 20742, 28 pages,, 1989

Publication

The database literature contains numerous contributions to the understanding of time in relational database systems. In the past, the focus has been on data model issues, and only recently has efficient implementation been addressed. We present an implementation model for the standard relational data model extended with transaction time. The implementation model exploits techniques of view materialization, incremental computation, and deferred update. It is more flexible than previously presented partitioned storage models. A new and interesting class of detailed queries on the change behavior of the database is supported.