\chapter{Introduction}
\label{sec:intro}
\chaptermark{Introduction: Matched Data and Data Fusion}
\section{Data Settings}
Performing a tractable analysis on data obtained from disparate sources (such as multiple sensors) is challenging. The increasing variety of sensor technologies and the large number of sensors introduce challenges but also hold promise for effective inference. One of our contributions is the development of simple, well-defined settings that provide intuition about the right approaches to data fusion and lead to inference methods that are useful in practice.
\begin{figure}
\centering
\includegraphics[scale=0.75]{gen-model-orig-proj.pdf}
\caption{The multiple-sensor setting\label{fig:gen-model}}
\end{figure}
Our world view of data fusion from multiple sensors is depicted in \autoref{fig:gen-model}. We refer to the entities of interest for pattern recognition as \emph{objects}. These might be real objects or abstract concepts. The data consist of measurements for a collection of these objects.
We assume that these objects lie in some ``object'' space $\Xi$ and that each sensor provides its own ``view'' of the objects. The measurements recorded by the $k^{th}$ sensor lie in a ``measurement space'' $\Xi_k$. The usual approach in pattern
recognition is to apply feature extractors to these spaces to obtain a feature representation in Euclidean space and then to use classical pattern recognition tools for the exploitation task. An alternative approach is to acquire the dissimilarities between pairs of objects and to use them either to find an embedding in a low-dimensional Euclidean space, for which classical statistical tools are available for inference, or to use dissimilarity-based versions of pattern recognition tools~\cite{duin2005dissimilarity}.
We adopt the embedding approach: a low embedding dimension helps us avoid the ``curse of dimensionality'' while still allowing the use of classical statistical tools. Additionally, the embeddings of dissimilarities from different conditions need to be ``commensurate'' so that sensor measurements can be compared in a meaningful way (i.e., the degree of (dis)similarity can be inferred) or used jointly in inference. This is accomplished by maps $\rho_k,\onespace k=1,\ldots,K$ from the measurement spaces $\Xi_k$ to a low-dimensional commensurate space $\mathcal{X}$, as visualized in \autoref{fig:gen-model}. Learning these maps from data is an important part of our novel approach.
\label{sec:data}
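To make the role of the maps $\rho_k$ concrete, the following is a minimal sketch, not the method developed in this dissertation: assuming two $n \times n$ dissimilarity matrices (the names \texttt{Delta1} and \texttt{Delta2} are hypothetical) for the same $n$ matched objects, it embeds each matrix separately with classical multidimensional scaling and then aligns the second configuration to the first with an orthogonal Procrustes rotation, yielding approximately commensurate embeddings.
<<commensurate-embedding-sketch>>=
## Minimal sketch of commensurate maps rho_1, rho_2: separate
## classical MDS embeddings followed by an orthogonal Procrustes
## alignment. Delta1, Delta2: n x n dissimilarity matrices for the
## same n matched objects; d: the common embedding dimension.
embed_commensurate <- function(Delta1, Delta2, d = 2) {
  X1 <- cmdscale(Delta1, k = d)  # embedding of condition 1
  X2 <- cmdscale(Delta2, k = d)  # embedding of condition 2
  ## Rotate X2 onto X1; both configurations are already centered
  ## by cmdscale, so no translation is needed.
  s <- svd(t(X2) %*% X1)
  W <- s$u %*% t(s$v)            # optimal orthogonal transformation
  list(X1 = X1, X2 = X2 %*% W)
}
@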
\subsection{Exploitation Task\label{subsec:expl_task}}
Data fusion is a very general concept, so here we clarify the specific meaning of data fusion and the setting that we have in mind. The exploitation task in which we are interested may involve (perhaps notional) complex objects or abstract entities that are not practically representable. The objects are members of a (perhaps notional) ``object'' space, $\Xi$ in \autoref{fig:gen-model}. From these objects we extract different ``views'', ``measurements'', or ``data modalities'' (which we refer to as ``conditions''); these observations are elements of the measurement space for the corresponding condition ($\Xi_k$ for the $k^{th}$ condition). Each object has an observation under each condition, and the corresponding observations across different conditions are ``matched''. Given new observations from these different conditions, is it possible
to determine whether they are ``matched''? If a group of observations from each condition are ``matched'' to each other but the specific correspondences are unknown, is it possible to recover the true correspondences? This dissertation proposes different approaches to address these questions.
\label{subsec:task}
\section{Dissimilarity representation\label{sec:dissim_repr}}
Significant progress has been made in the theory and applications of pattern recognition, particularly in problem settings in which the data are available, or assumed to be available, as vectors in metric spaces. There are still many problems for which, due to the nature of the setting, one only has access to dissimilarities, proximities, or distances between measurements, or to a subjective assessment of the similarities of objects. While our approach naturally depends on the representation, the inference task itself is agnostic to it. The gap between the two kinds of data representation can be bridged using various techniques, such as embedding methods, and by computing dissimilarities between entities.
\cite{duin2005dissimilarity} is an excellent resource that compiles the research on learning from dissimilarity-based representation. In the introduction, the authors clarify the distinction between statistical and structural (syntactic) pattern recognition, which was discussed previously in \cite{NadlerSmith1993}. Statistical pattern recognition addresses the analysis of features, which are measured values for object attributes. Syntactic pattern recognition uses a relational view of objects for representation. In both cases, the task of discrimination can rely on distances (however they are defined). P{\k{e}}kalska and Duin suggest that dissimilarity measures are a natural bridge between these types of information, and their applicability to multiple settings motivates our use of dissimilarity representation in information fusion.
For feature-based representation, the features are either raw or processed measurements from sensors that observe the objects, and the representation of each object is a single point in the representation space, each dimension corresponding to a feature. Dissimilarity-based representation relies on a dissimilarity measure, a way of quantifying the dissimilarity, proximity, or similarity between any two objects. Preferably, the dissimilarity is \emph{designed for} the inference task at hand.
%\cite{Duin} argue this is a reason dissimilarities can be superior, since similarities can encode concepts of class membership directly in class discrimination problems.
There are multiple ways of comparing entities (some more natural than others); this observation underlies one of the arguments for our approach to information fusion from disparate data sources, including separate sources of the same modality. When the data come from separate sensors of the same type, the same measurements might yield different dissimilarity representations, depending on subjective judgments or the choice of dissimilarity measure.
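As a small illustration of this point, the following sketch (with a hypothetical feature matrix \texttt{X}) computes two different dissimilarity representations of the same measurements; both are symmetric with zero diagonals, yet they can rank the same pairs of objects very differently.
<<two-measures-sketch>>=
## Illustration only: the same measurements admit different
## dissimilarity representations. X is a hypothetical n x p matrix
## of feature measurements (one row per object).
set.seed(1)
X <- matrix(rnorm(20 * 5), nrow = 20)
D_euclid <- as.matrix(dist(X))  # Euclidean distances between rows
D_corr   <- 1 - cor(t(X))       # correlation-based dissimilarity
@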
\section{Match Detection}
We will now provide a formal description of the problem that was mentioned in \autoref{subsec:expl_task}, which was the initial motivation for our investigations, along with a few general remarks. We will describe this problem in more detail in \autoref{chap:match_detection}.
Consider $n$ distinct objects, each described by a finite number of measurements. Each measurement $x_{ik}$ lies in the corresponding space $\Xi_k$, and the measurements $x_{ik}$ with the same object index $i$ are matched across conditions.
\[ \begin{array}{cccc}
 & \Xi_1 & \cdots & \Xi_K\\
\text{Object } 1 & \bm{x}_{11} & \sim \cdots \sim & \bm{x}_{1K} \\
\vdots & \vdots & \vdots & \vdots \\
\text{Object } n & \bm{x}_{n1} & \sim \cdots \sim & \bm{x}_{nK}
\end{array}
\]
To each pair of measurements $x_{ik},x_{jk}$ in the same space, we can assign a dissimilarity value $\delta_{ijk}=\delta\{x_{ik},x_{jk}\}$, which depends on the space $\Xi_k$. We assume each dissimilarity is symmetric and non-negative, with $\delta_{ijk}=0$ if the two arguments $x_{ik},x_{jk}$ are the same and $\delta_{ijk}>0$ if they are different.
We exploit this training set of dissimilarities to perform inference on the following exploitation task:
Given the dissimilarities between $K$ new measurements/observations ($\bm{y}_{k};\, k \in [K]$) and the previous $n$ objects under $K$ conditions,
we test the null hypothesis that ``these measurements are from the same object'' against the alternative hypothesis that ``they are not from the same object''~\cite{JOFC}:
\[
\begin{array}{l}
%\hspace{-2em}
H_0: \bm{y}_{1} \sim \bm{y}_{2} \sim \cdots \sim \bm{y}_{K}
\text{ versus }
H_A: \exists i, j , 1\leq i < j \leq K :\bm{y}_{i} \nsim \bm{y}_{j}
\end{array}
\]
The null hypothesis can be restated as the case in which the dissimilarities are \emph{matched}, and the alternative can be restated as the case in which they are \emph{not matched}.
We represent the dissimilarities between the $n$ objects in the form of $n \times n$ dissimilarity matrices $\{\Delta_k; k=1,\ldots,K\}$, where the entries of the $k^{th}$ dissimilarity matrix are $\delta^{(k)}_{ij}=\delta_{ijk}$ for $i,j=1,\ldots,n$. For the matching task, we are given $K$ vectors of new dissimilarities $\{\mathcal{D}_k; k=1,\ldots,K\}$, where $\mathcal{D}_k$ has entries $\{\delta_{i,new}^{(k)}; i=1,\ldots,n\}$ and $\delta_{i,new}^{(k)}$ is the dissimilarity between $x_{ik}$ and $y_k$.
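Assuming, purely for illustration, that each condition supplies feature vectors (in practice only the dissimilarities themselves may be given, per the remarks below), the structures above could be built as follows; \texttt{X\_list} and \texttt{y\_list} are hypothetical names for the in-sample measurements and the new observations.
<<dissimilarity-structures-sketch, eval=FALSE>>=
## Sketch of the structures above. X_list[[k]]: hypothetical n x p_k
## matrix of in-sample measurements for condition k; y_list[[k]]: the
## new measurement y_k for condition k.
Delta <- lapply(X_list, function(Xk) as.matrix(dist(Xk)))  # Delta_k
D_new <- mapply(function(Xk, yk) {
  ## delta_{i,new}^{(k)}: dissimilarity between x_{ik} and y_k
  apply(Xk, 1, function(xi) sqrt(sum((xi - yk)^2)))
}, X_list, y_list, SIMPLIFY = FALSE)
@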
For the hypothesis testing problem, we are to compute the test statistics for the objects represented by the given dissimilarities. In order to compute the test statistics, it is necessary to obtain a collection of mappings (one from each condition) to a lower-dimensional space such that new observations from each condition are made commensurate when they are mapped to this space.
These mappings do not need to be explicitly defined; they can be the results of the embedding operation for a particular dataset. If the embedding of the in-sample dissimilarities ($\{\Delta_k;k=1,\ldots,K\}$) results in a unique mapping, out-of-sample (OOS) embeddings could be adjoined to the embedding of in-sample dissimilarities.
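As a minimal sketch of one possible pipeline, not the procedure developed in later chapters, each new observation can be embedded out of sample by least-squares stress minimization against an in-sample configuration, and the spread of the $K$ embedded points can serve as a test statistic that is small under the null hypothesis. Here \texttt{X\_list\_aligned} denotes hypothetical commensurate in-sample configurations (e.g., from the earlier Procrustes sketch), and the choice of statistic is only one plausible option.
<<oos-match-sketch>>=
## Embed a new observation out of sample: find y in R^d whose
## distances to the rows of the in-sample configuration X best match
## the new dissimilarity vector d_new (least-squares stress).
oos_embed <- function(X, d_new) {
  stress <- function(y) sum((sqrt(colSums((t(X) - y)^2)) - d_new)^2)
  optim(colMeans(X), stress)$par
}
## Test statistic: total pairwise distance among the K out-of-sample
## embeddings; small values support H0 ("matched").
match_statistic <- function(X_list_aligned, D_new) {
  Y <- mapply(oos_embed, X_list_aligned, D_new)  # d x K matrix
  sum(dist(t(Y)))
}
@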
A few points should be mentioned to distinguish our approach from related approaches and emphasize the specific challenges of the inference task.
\begin{remark}
Because the data sources are ``disparate'', it is not immediately obvious how a dissimilarity between an object under one condition and another object under a different condition can be computed or even meaningfully defined. In general, these between-condition, between-object dissimilarities are not available.
\label{rem:between_cond_diss}
\end{remark}
\begin{remark}
Whether the data for each condition are collected directly as dissimilarities or are computed from feature observations is not relevant to our exploitation task. We assume that dissimilarities for each condition are made available for inference purposes (perhaps by experts in the problem domain).
\end{remark}
\begin{remark}
The exploitation task under consideration is \emph{not} the accurate reconstruction of the feature observations, even when such a feature representation exists.
If the embeddings are good enough to be useful for the inference task, their quality is considered acceptable. The quality of our representation is therefore governed by a bias-variance tradeoff: by choosing a low-dimensional representation, we might introduce more model bias, but the representation will be more robust to noise, which might result in smaller errors in the inference task.
\end{remark}
We will use this inference problem to elucidate two concepts that we introduce in \autoref{chap:FidComm}. Our novel solution to this matching problem will use those concepts as two error criteria to be minimized. We seek the mappings from each condition to the common low-dimensional space that minimize these error criteria and are most appropriate for the inference task.