Yongmin Li    PhD, MEng, BEng
Click on the links below or scroll down for details.
  1. Retinal Image Analysis
  2. Road Sign Recognition
  3. Foveated Ray Tracing
  4. Set-Membership Filtering
  5. Reconstructing Background of DNA Microarray Imagery
  6. Dynamic Face Models: Construction and Applications
  7. Semantic Video Analysis
  8. Incremental and Robust Subspace Learning
  9. Robust Panoramic Scene Construction from MPEG Video

  1. Retinal Image Analysis

    We analyse retinal images by first detecting eye structures such as the blood vessels and the optic disc, and then examining the image for suspicious lesion regions, if any. Blood vessel segmentation is based on the Graph Cut technique. In contrast to the traditional graph formulation, which is normally ineffective for long, thin structures such as blood vessels, we incorporate flux vectors into the graph construction and achieve better results on a number of public datasets. We have also addressed optic disc segmentation based on the prior blood vessel segmentation. Two different methods have been developed and evaluated: a direct extension of the Graph Cut method with a compensation factor to eliminate interference from the blood vessels, and background reconstruction over the blood vessels using a Markov Random Field. Based on the blood vessel and optic disc results, suspicious lesion areas in the retina are then detected and classified.
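    The vessel/background separation step can be illustrated with a toy graph cut. The sketch below builds a tiny source/sink graph over a 1-D scanline, folds a crude stand-in for the flux term into the neighbour-link capacities, and labels the dark pixels as vessel via a standard Edmonds-Karp minimum cut. All weights and thresholds here are invented for illustration and are not those of the published method.

```python
from collections import defaultdict, deque

def min_cut_source_side(capacity, s, t):
    """Edmonds-Karp max-flow; returns the source side of the minimum cut."""
    res = defaultdict(dict)
    for u in capacity:
        for v, c in capacity[u].items():
            res[u][v] = res[u].get(v, 0) + c
            res[v].setdefault(u, 0)
    while True:
        # BFS for a shortest augmenting path
        parent, q = {s: None}, deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
    # source side = nodes still reachable in the residual graph
    side, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v, c in res[u].items():
            if c > 0 and v not in side:
                side.add(v)
                q.append(v)
    return side

# toy 1-D scanline: a dark vessel crossing a bright background
I = [200, 198, 60, 55, 58, 201, 199]
S, T = "S", "T"
cap = defaultdict(dict)
for i, v in enumerate(I):
    cap[S][i] = 10 if v < 120 else 1    # t-link: vessel (dark) likelihood
    cap[i][T] = 10 if v >= 120 else 1   # t-link: background likelihood
for i in range(len(I) - 1):
    grad = I[i + 1] - I[i]
    # smoothness term plus a crude stand-in for the flux contribution
    w = (4 if abs(grad) < 30 else 1) + max(0, -grad) // 50
    cap[i][i + 1] = w
    cap[i + 1][i] = w

fg = min_cut_source_side(cap, S, T)
vessel = sorted(i for i in fg if i != S)
```

The cut lands on the sharp dark-to-bright transitions, labelling pixels 2-4 as vessel.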

  2. Road Sign Recognition

    Road sign recognition is of great interest in Intelligent Transportation and Autonomous Vehicles. However, the problem is non-trivial owing to limitations such as the low resolution of input video, poor lighting conditions, cluttered backgrounds, and changing scales/views while vehicles are moving. This work presents a comprehensive approach to online detection and recognition of traffic signs. The process comprises the typical three stages of detection, tracking and recognition, as commonly used in many object recognition problems. At the detection stage, a quad-tree operation is first performed on the densities of sign-specific colour gradients in order to locate the regions of interest. A regular polygon detector or a boosted classifier cascade is then used to detect the candidate signs. We have also developed a Confidence-Weighted Mean Shift algorithm to refine the often redundant detection results. For recognition, we have developed and evaluated several approaches, including the model-based method of class-specific discriminative features and the data-driven methods of SimBoost and similarity-learning kernel regression trees. We have also demonstrated that these methods can be applied to general object recognition.
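    The quad-tree stage can be sketched as a recursive subdivision that keeps only regions whose colour-gradient density is high enough. The code below is a minimal illustration with an invented density map and threshold, not the detector's actual criteria.

```python
def quadtree_rois(density, x, y, w, h, thresh, min_size=2):
    """Recursively split a region of a density map; return leaf regions
    (x, y, w, h) whose mean density exceeds `thresh`."""
    total = sum(density[j][i] for j in range(y, y + h) for i in range(x, x + w))
    mean = total / (w * h)
    if mean < thresh:
        return []                       # too sparse: prune this subtree
    if w <= min_size or h <= min_size:
        return [(x, y, w, h)]           # dense enough and small: a region of interest
    hw, hh = w // 2, h // 2
    rois = []
    for dx, dy, sw, sh in [(0, 0, hw, hh), (hw, 0, w - hw, hh),
                           (0, hh, hw, h - hh), (hw, hh, w - hw, h - hh)]:
        rois += quadtree_rois(density, x + dx, y + dy, sw, sh, thresh, min_size)
    return rois

# toy 8x8 gradient-density map with one dense 2x2 patch at the top left
density = [[0.0] * 8 for _ in range(8)]
for j in range(2):
    for i in range(2):
        density[j][i] = 1.0
rois = quadtree_rois(density, 0, 0, 8, 8, thresh=0.05)
```

The subdivision homes in on the dense patch and discards the empty quadrants without examining them pixel by pixel.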

  3. Foveated Real-Time Ray Tracing

    Ray tracing can generate highly realistic images but struggles to achieve real-time performance. We have presented an approach that significantly improves the real-time performance of ray tracing. This is done by combining foveated rendering, driven by eye tracking, with reprojection rendering from previous frames, in order to drastically reduce the number of new image samples required per frame. To reproject samples, a coarse geometry is reconstructed from a G-Buffer. Possible errors introduced by this reprojection, as well as regions critical to perception, are scheduled for resampling. Additionally, a coarse colour buffer provides an initial image, which is smoothly refined with more samples where needed.

    Evaluations and user tests show that our method achieves real-time frame rates, while visual differences from fully rendered images are hardly perceptible. The method is well suited to wide-FOV Head-Mounted Displays with eye tracking and can be used in many Virtual Reality applications.
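    A minimal model of the foveated sample budget: spend full ray-tracing effort at the tracked gaze point and let the per-pixel sample fraction decay with eccentricity down to a small floor. The falloff constants below are invented for illustration and say nothing about the published system's tuning.

```python
import math

def sample_density(px, py, gaze, falloff=0.05, floor=0.05):
    """Fraction of the full per-pixel sample budget spent at (px, py),
    decaying with distance from the tracked gaze point (invented model)."""
    d = math.hypot(px - gaze[0], py - gaze[1])
    return max(floor, math.exp(-falloff * d))

# total budget over a coarse 320x240 frame versus sampling every point fully
gaze = (160, 120)
budget = sum(sample_density(x, y, gaze)
             for y in range(0, 240, 8) for x in range(0, 320, 8))
full = (240 // 8) * (320 // 8)      # grid points at full effort
```

Even this crude falloff spends well under a fifth of the full budget, which is where the real-time headroom comes from.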

  4. Set-Membership Filtering, EPSRC EP/C007654/1

    In many real-world scenarios, tracking non-rigid objects (e.g. human faces) from video or image sequences, or in other words estimating the hidden state of a dynamic system from noisy visual observations, is a non-trivial task. Owing to the severe non-linearity arising from the intrinsic characteristics of the objects themselves, their dynamics, and external complications in the measurement environment, traditional techniques (e.g. the Kalman Filter, the Extended Kalman Filter and the Unscented Kalman Filter) are inappropriate for this problem. Other methods such as Particle Filtering suffer from intensive computation and model degeneration. To overcome these limitations, we have developed Set-Membership Filtering methods, in which the state estimate is guaranteed to lie within a set specified by an ellipsoidal bound on the state vector. Recursive algorithms have also been developed for fast computation.
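    The idea of a guaranteed state bound can be illustrated in one dimension, where the ellipsoid degenerates to an interval: propagate the bound through the dynamics, inflate it by the process-noise bound, and intersect it with the measurement-consistent set. This scalar sketch is only an analogue of the ellipsoidal algorithms; the system and bounds are invented.

```python
def smf_step(interval, a, q, y, r):
    """One predict/update step of a scalar set-membership filter.
    interval: (lo, hi) guaranteed bound on the current state.
    Dynamics x' = a*x + w with |w| <= q; measurement y = x' + v with |v| <= r."""
    lo, hi = interval
    plo = min(a * lo, a * hi) - q          # predicted bound, inflated by noise
    phi = max(a * lo, a * hi) + q
    mlo, mhi = y - r, y + r                # set of states consistent with y
    nlo, nhi = max(plo, mlo), min(phi, mhi)
    if nlo > nhi:
        raise ValueError("empty intersection: model and bounds are inconsistent")
    return (nlo, nhi)

b1 = smf_step((-1.0, 1.0), a=0.9, q=0.1, y=0.5, r=0.3)
b2 = smf_step(b1, a=0.9, q=0.1, y=0.4, r=0.3)
```

The returned interval is a hard guarantee, not a confidence region: the true state cannot leave it as long as the noise bounds hold, which is the property the ellipsoidal filters provide in higher dimensions.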

  5. Reconstructing Background of DNA Microarray Imagery, EPSRC EP/C524586/1

    DNA microarray technology has enabled biologists to study all the genes within an entire organism and obtain a global view of gene interaction and regulation. However, the technology is still at an early stage of development, and errors may be introduced at each of the main stages of the microarray process: spotting, hybridisation, and scanning. Consequently, the collected microarray image data often contain errors and noise, which are then propagated through all later stages of processing and analysis. To realise the potential of the technology, it is therefore crucial to obtain high-quality image data that truly reflect the underlying biology in the samples; otherwise many of the subtle, low-level gene expression signals, which are often of biological significance, will not be analysed. Although there has recently been much research on detecting and eliminating these variations and errors, progress has been slow. We have initiated research into a novel way of processing microarray image data by reconstructing the background noise of the microarray chip, which has shown early promise in extracting high-quality cDNA image data. Instead of the standard approach of correcting anomalies in the signal, we focus on estimating the noise as accurately as possible, to the extent that we almost ignore the signal until the last stage of processing. The project brings together expertise from the disparate fields of image processing, data mining and molecular biology in an interdisciplinary attempt to advance the state of the art in this important area. It is particularly timely, since there is an urgent need for image analysis software that can save both time and labour as well as provide high-quality image data.
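    The background-first philosophy can be sketched as: mask the pixels believed to carry spot signal and reconstruct the background underneath them from the surrounding background pixels. The median-of-neighbours fill below is a crude stand-in for the actual reconstruction model, and the image and mask are toy data.

```python
from statistics import median

def reconstruct_background(img, mask, radius=2):
    """Fill pixels flagged as spot signal (mask True) with the median of
    nearby unmasked pixels, giving a background estimate under the spot."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neigh = [img[j][i]
                     for j in range(max(0, y - radius), min(h, y + radius + 1))
                     for i in range(max(0, x - radius), min(w, x + radius + 1))
                     if not mask[j][i]]
            if neigh:
                out[y][x] = median(neigh)
    return out

# toy 5x5 chip region: background level 10, one bright 3x3 spot
img = [[10] * 5 for _ in range(5)]
mask = [[False] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        img[y][x] = 200
        mask[y][x] = True
bg = reconstruct_background(img, mask)
```

Once the background is reconstructed, the spot intensity is read off as the difference between the observed image and the estimated background.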

  6. Dynamic Face Models: Construction and Applications

    A comprehensive framework for face detection, head pose estimation, tracking, and recognition is presented in this work. Statistical learning methods and, in particular, Support Vector Machines (SVMs) are applied to multi-view face detection and 3D head pose estimation. A dynamic multi-view face model is designed to extract the identity and geometrical information of moving faces from video inputs. Kernel Discriminant Analysis (KDA), a non-linear method to maximise the between-class variance and minimise the within-class variance, is proposed to represent face patterns. The facial identity structures across views and over time, referred to as Identity Surfaces, are constructed for face recognition.
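    KDA applies Fisher's criterion, maximising between-class scatter against within-class scatter, in a kernel-induced feature space. The sketch below implements the linear special case in 2-D to show the criterion itself; the toy clusters are invented and the kernel machinery is omitted.

```python
def fisher_direction(X1, X2):
    """Linear Fisher discriminant: direction w = S_w^-1 (m1 - m2), where
    S_w is the pooled within-class scatter (KDA applies the same criterion
    in a kernel feature space)."""
    m1 = [sum(p[0] for p in X1) / len(X1), sum(p[1] for p in X1) / len(X1)]
    m2 = [sum(p[0] for p in X2) / len(X2), sum(p[1] for p in X2) / len(X2)]
    S = [[1e-9, 0.0], [0.0, 1e-9]]          # tiny ridge for invertibility
    for X, m in ((X1, m1), (X2, m2)):
        for p in X:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    S[i][j] += d[i] * d[j]
    b = [m1[0] - m2[0], m1[1] - m2[1]]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    w = [(S[1][1] * b[0] - S[0][1] * b[1]) / det,
         (S[0][0] * b[1] - S[1][0] * b[0]) / det]
    n = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / n, w[1] / n]

# two toy classes separated along the first axis
X1 = [(-0.1, -0.1), (0.1, 0.1), (-0.1, 0.1), (0.1, -0.1)]
X2 = [(1.9, -0.1), (2.1, 0.1), (1.9, 0.1), (2.1, -0.1)]
w = fisher_direction(X1, X2)
p1 = [px * w[0] + py * w[1] for px, py in X1]
p2 = [px * w[0] + py * w[1] for px, py in X2]
```

The learned direction aligns with the axis separating the classes, and the 1-D projections of the two classes do not overlap.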

  7. Semantic Video Analysis

    With the rapidly growing mass of video data from media services, the Internet, and home digital cameras/camcorders, automatic methods for semantic video analysis have become essential for parsing, indexing, retrieval, and summarisation of these data. The task can take various forms depending on the granularity of the semantics and the application scenario, e.g. from video genre classification and scene segmentation to specific event/person detection and behaviour analysis. The aim of this research is to develop a multi-layered framework for semantic analysis of raw video. (1) At the lowest level, acoustic and visual features, e.g. mel-frequency cepstral coefficients (MFCCs), colour, texture, shape and motion, are integrated to provide a fundamental description of the video content. (2) At the intermediate layer, the low-level audio/visual features are processed and combined into so-called "atomic semantic features", which represent semantic concepts over a minimal temporal period. (3) At the highest level, user-oriented semantic analysis is performed using statistical models such as the Bayesian Network and the Hidden Markov Model.
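    Higher-level inference with a Hidden Markov Model can be illustrated by the standard forward algorithm, which scores an observation sequence under a model. The two "semantic" states and all probabilities below are invented for illustration only.

```python
def forward(obs, states, start, trans, emit):
    """HMM forward algorithm: likelihood of an observation sequence."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit[s][o] * sum(alpha[p] * trans[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())

# invented two-state model: "dialogue" scenes tend to sound quiet, "action" loud
states = ("dialogue", "action")
start = {"dialogue": 0.5, "action": 0.5}
trans = {"dialogue": {"dialogue": 0.9, "action": 0.1},
         "action": {"dialogue": 0.1, "action": 0.9}}
emit = {"dialogue": {"quiet": 0.8, "loud": 0.2},
        "action": {"quiet": 0.2, "loud": 0.8}}

p_sustained = forward(["loud", "loud", "loud"], states, start, trans, emit)
p_flicker = forward(["loud", "quiet", "loud"], states, start, trans, emit)
```

Because the states are sticky, a sustained run of loud observations is far more likely than a flickering one, which is exactly the temporal smoothing that makes HMMs useful for scene-level inference.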

  8. Incremental and Robust Subspace Learning

    Principal Component Analysis (PCA) has been of great interest in computer vision and pattern recognition. In particular, incrementally learning a PCA model, which is computationally efficient for large-scale problems and adaptable to the varying state of a dynamic system, is an attractive research topic with numerous applications such as adaptive background modelling and active object recognition. In addition, conventional PCA, in the sense of least mean-squared-error minimisation, is susceptible to outlying measurements. Unfortunately, these two issues have only been addressed separately in previous studies. In this work, we present a novel algorithm for incremental PCA and then extend it to robust PCA. In contrast to most previous studies, where robust PCA is solved by intensive iterative algorithms, we use the current PCA model at each update step to evaluate the likelihood that an element of a new observation is an outlier, so that the robust analysis is efficiently embedded in the incremental updating framework. This is the key idea of our algorithm. Compared with previous studies on robust PCA, our algorithm is computationally more efficient. We have applied the method to dynamic background modelling and multi-view face modelling, with very encouraging results.
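    The key idea, using the current model to gate probable outliers inside an incremental update, can be sketched with a first-principal-direction tracker. The Oja-style update below is a simplification of the published algorithm, and the gate and data are invented: a sample whose reconstruction residual lies far outside the running residual scale is simply skipped.

```python
def robust_incremental_pc(stream, eta=0.05, kappa=3.0):
    """Track the first principal direction of a 2-D data stream with an
    outlier gate based on the current model's reconstruction residual."""
    w = [1.0, 0.0]          # current principal direction estimate
    mean = [0.0, 0.0]       # running mean
    var = 1.0               # running scale of the reconstruction residual
    k = 0                   # number of accepted (inlier) samples
    for x in stream:
        d = [xi - m for xi, m in zip(x, mean)]
        y = d[0] * w[0] + d[1] * w[1]       # projection coefficient
        r2 = sum((di - y * wi) ** 2 for di, wi in zip(d, w))
        if k > 5 and r2 > kappa ** 2 * var:
            continue                         # probable outlier: no model update
        k += 1
        mean = [m + (xi - m) / k for m, xi in zip(mean, x)]
        var += (r2 - var) / k
        w = [wi + eta * y * (di - y * wi) for wi, di in zip(w, d)]   # Oja step
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        w = [wi / norm for wi in w]
    return w

# data on the line through (1, 1), with occasional gross outliers off-axis
stream = []
for t in range(300):
    s = ((t % 21) - 10) / 10.0
    stream.append((5.0, -5.0) if t % 50 == 49 else (s, s))
w = robust_incremental_pc(stream)
```

Despite the injected outliers, the tracked direction settles near (1, 1)/sqrt(2), because the gate rejects samples the current subspace cannot explain.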

  9. Robust Panoramic Scene Construction from MPEG Video

    Coarse macroblock motion vectors can be extracted from MPEG video with minimal decompression. With a reasonable MPEG encoder, most motion vectors reflect the real motion in a video scene, even though they are coded for compression purposes. Based on this observation, we have developed a method of image mosaicking from MPEG video. The main idea is that global motion estimation from MPEG motion vectors can be formulated as a robust parameter estimation problem which treats the "good" motion vectors as inliers and the "bad" ones as outliers. The bi-directional motion information in B-frames provides multiple routes for warping a frame to its previous anchor frame. A Least Median of Squares based algorithm is adopted for robust motion estimation. When a large proportion of outliers is present, we detect possible algorithm failure and then re-estimate along a different route or interpolate the transform from neighbouring frames. We have also developed a simplified method for constructing a static background panorama and a dynamic foreground panorama.
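    The robust estimation step can be sketched with Least Median of Squares for a pure translation: every motion vector proposes a global model, and the proposal with the smallest median squared residual wins, so a large minority of "bad" vectors cannot corrupt the estimate. The vectors below are toy data, and the full method estimates a richer transform than a translation.

```python
def lmeds_translation(vectors):
    """Least Median of Squares: each motion vector proposes a global 2-D
    translation; keep the proposal with the smallest median residual."""
    best, best_med = None, float("inf")
    for cx, cy in vectors:
        residuals = sorted((vx - cx) ** 2 + (vy - cy) ** 2 for vx, vy in vectors)
        med = residuals[len(residuals) // 2]    # median squared residual
        if med < best_med:
            best, best_med = (cx, cy), med
    return best

# 9 consistent vectors near (3, 1) plus 4 grossly wrong ones
inliers = [(3 + dx * 0.1, 1 + dy * 0.1) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
outliers = [(-20, 5), (14, -9), (0, 0), (7, 30)]
vectors = inliers + outliers

t = lmeds_translation(vectors)
avg = (sum(v[0] for v in vectors) / len(vectors),
       sum(v[1] for v in vectors) / len(vectors))   # naive mean, for contrast
```

The LMedS estimate recovers the true translation while the naive average is pulled far off by the outliers; the median criterion tolerates up to half the vectors being wrong.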

    References:
    • Y. Li, L-Q. Xu, G. Morrison, C. Nightingale and J. Morphett. Robust panorama from MPEG video. In Proc. IEEE International Conference on Multimedia and Expo (ICME2003), Baltimore, USA, July 2003.