Please use this identifier to cite or link to this item: http://dx.doi.org/10.18419/opus-4771
Authors: Hardt, Florian
Title: Robuste Bilderkennung mit lokalen linearen Abbildungen und elastischer Graphenanpassung
Other Titles: Robust object recognition using local linear maps and elastic graph matching
Issue Date: 2006
Type: Dissertation
URI: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-26282
http://elib.uni-stuttgart.de/handle/11682/4788
http://dx.doi.org/10.18419/opus-4771
Abstract: Elastic Graph Matching (EGM) is a well-known method of object recognition. Labeling the graphs with biologically motivated features yields a model that is invariant to translations and small perturbations and can recognize objects with sufficiently rich structure. In elastic graph matching, each node of the model graph is assigned a point of the image graph in an iterative process by means of a reference vector. The method is thus a low-dimensional version of Dynamic Link Matching, a fully neural model that can map two patterns onto each other via weighted all-to-all connections. This work remedies two fundamental weaknesses of Elastic Graph Matching: first, a preprocessing stage was introduced that allows an initial image graph to be set up without prior knowledge of the object's orientation, size, position, and deformation; second, the method was extended so that conformal deformations, i.e. locally varying rotations and stretchings, can be described. The preprocessing removes the previous need to normalize the image data in a suitable way before recognition. The approach assumes that local image features exist that are robust to a deformation, and that matching this subset of points provides a good estimate for the matching of all image points. It could be shown that a high proportion of the corners found with a detector based on End-Stopped Cells meets these requirements. To match these corners, the local features were compared across different size scales and a one-to-one mapping was generated. Incorrect assignments (outliers) were excluded with an image-flow and symmetry filter introduced here.
Only the principles of neighborhood preservation and feature similarity, already used in Elastic Graph Matching and Dynamic Link Matching, were employed. The combination of both filtering processes yields a highly reliable matching of relevant corners. The corner matching alone already reveals whether the sought model object is present in the image region. Extending labeled graph matching to rotation, stretching, and conformal mappings was achieved, on the one hand, through a modified graph similarity function together with a correspondingly adapted search in parameter space and, on the other hand, through the concept of Local Linear Maps (LLMs): the modified graph similarity function normalizes the contributions of stretched edges, and in addition to the node positions, a local stretching and rotation parameter is varied. This makes it possible to adapt the graph to locally varying deformations. The range within which the stretching and rotation parameters may vary at each node is prescribed by the LLMs of that node and its neighbors, so that the deformation differences from node to node are continuous. Each iteration step of the graph matching is thus controlled by the LLM, whose values in turn result from the previous iteration. This procedure proved robust against perturbations and partial occlusions of objects. In a consistent manner it can be determined not only whether and where a sought object is depicted, but also which regions, if any, are occluded. The system presented here is moreover able to find deformed objects even in a complex scene containing several objects. The correspondence problem is solved by approximating the mapping function f with a number of LLMs whose values are obtained from the graph matching.
Dynamic Link Matching (DLM) is a well-known model for constructing a correspondence-based mapping between a template and an image. Local features of the image are extracted and matched onto similar parts of the template. Depending on the type of features used, the model has a limited tolerance against deformations. To extend the model to stronger deformations, the use of Local Linear Maps (LLMs) was suggested. LLMs approximate the local deformation of the image; an LLM therefore serves as a local estimator of the global transformation. This allows the deformation-sensitive features to be modified according to the LLM, which makes the matching tolerant to all deformations within the range of the LLM employed. Within this work it is assumed that the deformation of an image can be described by a continuous nonlinear map, i.e. the existence of a function f mapping the original image onto the deformed image is assumed. Solving the correspondence problem, i.e. finding the correct mapping between the two images, corresponds to finding the mapping function f. This, however, is a hard problem, since without knowledge of f it is difficult to decide which part of a deformed image corresponds to a given part of the template image. The concept of an LLM is to approximate the global mapping f by a local linear approximation: for each image part, f is replaced with its linear approximation, so the LLM is composed of the hyperplanes tangential to f. The strength of this approach lies in the fact that the number of degrees of freedom of the deformation is greatly reduced. It could be shown that only four (position-dependent) parameters (two stretchings and two rotations) fully describe every possible LLM. In this manner the complexity of LLMs can be classified according to the number and type of parameters required. The LLM can then be used to modify non-invariant features required to establish a correct matching.
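As an illustration, the conformal special case of an LLM (one rotation plus one isotropic stretching, the restriction discussed in the next paragraph) can be written as a single 2×2 matrix applied to displacements around a reference point. The function names below are hypothetical, so this is only a sketch of the parameterization, not code from the thesis:

```python
import numpy as np

def conformal_llm(scale, angle):
    """2x2 matrix of a conformal Local Linear Map:
    one isotropic stretching composed with one rotation."""
    c, s = np.cos(angle), np.sin(angle)
    return scale * np.array([[c, -s], [s, c]])

def apply_llm(points, center, scale, angle):
    """Approximate the global deformation f near `center` by its
    tangential linear map: f(x) ~ f(center) + A (x - center)."""
    A = conformal_llm(scale, angle)
    return center + (points - center) @ A.T

# Example: stretch by 2 and rotate by 90 degrees around the origin.
pts = np.array([[1.0, 0.0], [0.0, 1.0]])
mapped = apply_llm(pts, center=np.zeros(2), scale=2.0, angle=np.pi / 2)
```

A general LLM would replace the conformal matrix with one built from two stretchings and two rotations, as stated above.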
This yields effectively distortion-invariant features. Restricting the LLM to Conformal Mappings (described by one rotation and an isotropic stretching) results in a mapping that is still quite general and allows for straightforward adaptation of the features employed. Image data is usually stored in a large array, with each coefficient representing the light intensity at one specific image point (in the case of color images, one coefficient is required per color channel). While useful for image display on a computer, this data structure is not suited to a correspondence-based matching system. This is due to the fact that the properties of each data unit (here: one coefficient) are likely to be shared with a large number of other image points. Additionally, there is no (direct) correlation between the values of two neighbouring image points. For a correspondence-based system striving to establish point-to-point connections between corresponding data units, each data unit must hold enough information to avoid ambiguities. It is also desirable that the properties of these data units be somewhat robust against changes in illumination. For that reason the image data is preprocessed and organized into larger data units. These data units are modelled after the properties of biological neurons: the early visual system of humans and animals is relatively well understood compared to other parts of the brain. Ganglion cells in the retina and the lateral geniculate nucleus connect topologically to the primary visual cortex. These cells already enhance contrast and dynamically adapt to spatial and temporal correlations in the image data. Ganglion cells connect to Simple Cells. The responses of Simple Cells in the cortex can be modelled by a convolution of the image data with admissible Gabor functions, i.e. Gabor functions modified in such a manner that constant illumination does not stimulate the cell.
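A common form of such an admissible (DC-free) Gabor kernel from the Gabor-wavelet literature subtracts a constant from the oscillatory part so that a uniform image produces zero response. The concrete parameter choices below (sigma = 2*pi, a 41×41 kernel) are illustrative assumptions, not values taken from the thesis:

```python
import numpy as np

def admissible_gabor(size, k, sigma=2.0 * np.pi):
    """DC-free Gabor kernel: a Gaussian envelope times a plane wave
    with wave vector k, minus a small constant so that a constant
    image yields (approximately) zero response."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2 = k[0] ** 2 + k[1] ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    wave = np.exp(1j * (k[0] * x + k[1] * y))
    dc = np.exp(-sigma ** 2 / 2.0)  # removes the mean of the wave term
    return envelope * (wave - dc)

kern = admissible_gabor(41, (np.pi / 2, 0.0))
```

Convolving the image with a family of such kernels (several orientations and frequencies of k) gives the Simple-Cell responses described above.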
Simple Cells come in two varieties and can be described as simple detectors for specifically oriented bars (symmetric cells) and edges (anti-symmetric cells). From the amplitudes of Simple Cells another cell type, the Complex Cell, can be constructed. Complex Cells respond to both bars and edges. Several Complex Cells connect to the same area of the retina but have different orientation and frequency preferences, and therefore represent the image characteristics in a small area. The responses of all Complex Cells centered on a given image point can be arranged in a vector called a "jet". Jets represent the basic data unit for describing the image data in a way suited to tackling the correspondence problem, as the similarity between two jets (and therefore image points) can be described by their normed scalar product. Additionally, the responses of Complex Cells can be used to model the behaviour of another important cell type, the End-Stopped Cell. Like Complex Cells, End-Stopped Cells respond to bars and edges in the image. They also share the property of orientation selectivity, i.e. the bar or edge has to have a specific orientation determined by the preferences of the cell. Moving the bar perpendicularly to the preferred orientation does not alter the response of an End-Stopped Cell, provided the bar stays within the cell's receptive field. However, the responses of End-Stopped Cells do not increase for longer bars. Instead, an End-Stopped Cell is only active when a properly oriented bar terminates within the receptive field. To activate a Single End-Stopped Cell it suffices that the bar terminates in one direction; for Double End-Stopped Cells the bar has to terminate on both sides. This behaviour can be modelled with the first and second derivatives of a layer of Complex Cells. In order to solve the correspondence problem, i.e. to answer the question of which points of two given images correspond to each other, if any, a two-fold strategy is employed.
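The normed scalar product of two jets mentioned above is straightforward to write down; this sketch assumes jets given as vectors of real Complex-Cell amplitudes:

```python
import numpy as np

def jet_similarity(j1, j2):
    """Similarity of two jets (vectors of Complex-Cell amplitudes)
    as their normed scalar product: 1.0 for parallel jets,
    0.0 for orthogonal ones."""
    j1 = np.asarray(j1, dtype=float)
    j2 = np.asarray(j2, dtype=float)
    return float(j1 @ j2 / (np.linalg.norm(j1) * np.linalg.norm(j2)))
```

Because the product is normed, a uniform change of contrast scales both jets equally and leaves the similarity unchanged, which is one reason jets are fairly robust to illumination changes.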
First, the image data is stored in data units containing enough information to minimize the number of points with identical local features. This is achieved by representing the data as jets. Secondly, solving the correspondence problem for a special subset of points facilitates finding a solution for every point. The subset has to consist of points that are reliably detectable and reasonably robust against image distortion. Corners fulfill these criteria. The two types of End-Stopped Cells described above can be used to construct a robust corner detector: several End-Stopped Cells sharing the same receptive field but responding to different orientations and frequencies are combined to describe the "angularity" of the point. False responses have to be suppressed by a surround inhibition process, which reduces the signal of the corner detector along straight bars and edges. Local maxima of the corner detector responses are considered corners if the response averaged over all frequency scales exceeds a threshold value. Matching a subset of image points invariant with respect to the deformation employed can be used as a first step towards a complete mapping. It is demonstrated that corners are sufficiently robust against conformal mappings. In order to match the corners detected in the original image onto the corners detected in the deformed image, some features invariant against the deformation have to be found. While under a Conformal Mapping (in continuous space) corners remain corners, the features extracted at corresponding corners change. Therefore a strategy to account for those changes is developed: by modifying the Gabor filters according to a local approximation of the deformation employed, the feature changes due to that deformation are effectively nullified, rendering the feature jets invariant to distortion. The previously introduced LLMs serve to describe the local approximation.
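The thresholded local-maximum criterion for corner candidates might be sketched as follows. This is a simplification: the End-Stopped responses and the surround inhibition are assumed to have been computed already and are passed in as one response map per frequency scale:

```python
import numpy as np

def detect_corners(responses, threshold):
    """Corner candidates from a stack of corner-detector response
    maps (one per frequency scale): a pixel is accepted if it is a
    local maximum of the scale-averaged response and that average
    exceeds `threshold`."""
    avg = responses.mean(axis=0)  # average over frequency scales
    corners = []
    h, w = avg.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = avg[y - 1:y + 2, x - 1:x + 2]
            if avg[y, x] >= threshold and avg[y, x] == patch.max():
                corners.append((y, x))
    return corners
```

Averaging over scales before thresholding is what makes isolated single-scale responses (typical false positives) fall below the threshold.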
The similarity of two jets can then be defined by the normed scalar product. Optimal sampling parameters for stretchings are determined, and a simple matching algorithm is used to connect corners with similar features. False connections (outliers) are removed by two filtering procedures based on symmetry and topological constraints; the number of false positives is drastically reduced. Based on possible discrepancies between linked corners, models not present in the distorted image can be excluded at an early stage. Labeled graphs are a well-known method of representing an object for recognition purposes. A graph is composed of edges and nodes. Nodes are labeled with a jet and therefore correspond to an image point, while the position of one jet with regard to another is encoded by the edges. Matching a model graph onto an image graph is equivalent to finding the mapping between two images. This standard model is extended to allow for local stretchings by normalizing the edge information with the local stretching. A template graph is extracted by placing a regular grid on the salient parts of the template image; nodes are labeled with jets extracted at these image points. A similarity function comparing two graphs is composed of a positive term totalling all jet similarities and a topology-enforcing negative term totalling the length deviations of the edges. The initial placement of the image graph is approximated by interpolating the translations of the linked corner pairs. In the same manner, local stretching and rotation parameters are estimated. Standard labeled graph matching is extended to include scale and rotation in the search space: position, scale, and rotation of each jet are modified within a search window, and the position with the highest graph similarity is accepted as the new node position in feature space. As a second step, the LLM is updated, taking the node pairs as new estimators of the image distortion.
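The structure of such a graph similarity function, including the normalization of edges by the local stretching, could look like the following sketch. The weight `lam` and the use of the mean stretching of an edge's two endpoints are illustrative assumptions, not values or choices documented in the abstract:

```python
import numpy as np

def graph_similarity(model_pos, image_pos, model_jets, image_jets,
                     edges, stretch, lam=0.1):
    """Graph similarity: a positive term totalling node jet
    similarities minus a topology term totalling edge-length
    deviations. Each image edge is divided by the local stretching
    so that a uniformly rescaled graph is not penalized."""
    jet_term = sum(
        (mj @ ij) / (np.linalg.norm(mj) * np.linalg.norm(ij))
        for mj, ij in zip(model_jets, image_jets))
    topo_term = 0.0
    for a, b in edges:
        d_model = model_pos[b] - model_pos[a]
        s_ab = 0.5 * (stretch[a] + stretch[b])  # mean local stretching
        d_image = (image_pos[b] - image_pos[a]) / s_ab
        topo_term += np.sum((d_image - d_model) ** 2)
    return jet_term - lam * topo_term

# Example: an image graph that is an exact 2x-scaled copy of the
# model graph incurs no topology penalty when stretch = 2 everywhere.
model_pos = np.array([[0.0, 0.0], [1.0, 0.0]])
image_pos = 2.0 * model_pos
jets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
sim = graph_similarity(model_pos, image_pos, jets, jets,
                       edges=[(0, 1)], stretch=[2.0, 2.0])
```

During matching, a search over node position, scale, and rotation would accept the candidate maximizing this similarity, as described above.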
Depending on the quality of the initial placement, the graph and the LLM converge to the correct position within a few iterations. Nodes of the model graph are then linked to corresponding points in the distorted image with very good accuracy. It could be shown that this method is reasonably robust to strong distortions and inaccurate initial graph placements, proving the modified labeled graph matching scheme to be an object recognition method well suited to the problem at hand. Additionally, the system can detect and match model graphs to distorted objects in a complex scene. To this end, a model graph is extracted from an otherwise empty image, taking care to place nodes only on inner points of the object (i.e., jets from cells whose receptive fields do not extend beyond the boundary of the object). This ensures that those jets are not affected by neighboring objects or background.
Appears in Collections: 08 Fakultät Mathematik und Physik

Files in This Item:
File: Prom.pdf (2,9 MB, Adobe PDF) View/Open


Items in OPUS are protected by copyright, with all rights reserved, unless otherwise indicated.