[1] arXiv:2006.10042 [pdf]
Learning to Detect 3D Reflection Symmetry for Single-View Reconstruction
3D reconstruction from a single RGB image is a challenging problem in computer vision. Previous methods are usually solely data-driven, which leads to inaccurate 3D shape recovery and limited generalization capability. In this work, we focus on object-level 3D reconstruction and present a geometry-based end-to-end deep learning framework that first detects the mirror plane of reflection symmetry that commonly exists in man-made objects and then predicts depth maps by finding the intra-image pixel-wise correspondence of the symmetry. Our method fully utilizes the geometric cues from symmetry at test time by building plane-sweep cost volumes, a powerful tool that has been used in multi-view stereopsis. To our knowledge, this is the first work that uses the concept of cost volumes in the setting of single-image 3D reconstruction. We conduct extensive experiments on the ShapeNet dataset and find that our reconstruction method significantly outperforms the previous state-of-the-art single-view 3D reconstruction networks in terms of the accuracy of camera poses and depth maps, without requiring objects to be completely symmetric. Code is available at this https URL.
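The symmetry cue in this abstract rests on a standard geometric fact: a surface point and its mirror image are related by reflection across the symmetry plane. The following is a minimal sketch of that reflection only, not the authors' code; the function name `reflect_point` and the plane parameterization (normal vector plus offset) are assumptions made for illustration.

```python
import math

def reflect_point(point, normal, offset):
    """Reflect a 3D point across the plane {x : normal . x + offset = 0}."""
    length = math.sqrt(sum(n * n for n in normal))
    unit = [n / length for n in normal]
    # Signed distance from the point to the plane (normal rescaled to unit length).
    distance = sum(u * p for u, p in zip(unit, point)) + offset / length
    # Move the point twice that distance along the normal, toward the other side.
    return [p - 2.0 * distance * u for p, u in zip(point, unit)]

# Usage: mirror a point across the plane z = 1 (normal (0, 0, 2), offset -2).
mirrored = reflect_point([1.0, 2.0, 3.0], [0.0, 0.0, 2.0], -2.0)
```

Reflecting the result a second time returns the original point, which is a quick sanity check on any plane parameters.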
[2] arXiv:2006.10027 [pdf]
Deep Learning Meets SAR
Deep learning in remote sensing has become an international hype, but it is mostly limited to the evaluation of optical data. Although deep learning has been introduced in SAR data processing and first attempts have been successful, its huge potential remains locked. For example, to the best of the authors' knowledge, there is no single example of deep learning in SAR that has been developed up to operational processing of big data or integrated into the production chain of any satellite mission. In this paper, we provide an introduction to the most relevant deep learning models and concepts, point out possible pitfalls by analyzing special characteristics of SAR data, review the state of the art of deep learning applied to SAR in depth, summarize available benchmarks, and recommend some important future research directions. With this effort, we hope to stimulate more research in this interesting yet under-exploited research field.
[3] arXiv:2006.09902 [pdf]
Vision-Aided Dynamic Blockage Prediction for 6G Wireless Communication Networks
Unlocking the full potential of millimeter-wave and sub-terahertz wireless communication networks hinges on realizing unprecedented low-latency and high-reliability requirements. The challenge in meeting those requirements lies partly in the sensitivity of signals in the millimeter-wave and sub-terahertz frequency ranges to blockages. One promising way to tackle that challenge is to help a wireless network develop a sense of its surroundings using machine learning. This paper attempts to do that by utilizing deep learning and computer vision. It proposes a novel solution that proactively predicts dynamic link blockages. More specifically, it develops a deep neural network architecture that learns from observed sequences of RGB images and beamforming vectors how to predict possible future link blockages. The proposed architecture is evaluated on a publicly available dataset that represents a synthetic dynamic communication scenario with multiple moving users and blockages. It scores a link-blockage prediction accuracy in the neighborhood of 86%, a performance that is unlikely to be matched without utilizing visual data.
[4] arXiv:2006.09883 [pdf]
Near-Infrared Search for Fundamental-mode RR Lyrae Stars Toward the Inner Bulge by Deep Learning
Aiming to extend the census of RR Lyrae stars to highly reddened low-latitude regions of the central Milky Way, we performed a deep near-IR variability search using data from the VISTA Variables in the Vía Láctea (VVV) survey of the bulge, analyzing the photometric time series of over a hundred million point sources. In order to separate fundamental-mode RR Lyrae (RRab) stars from other periodically variable sources, we trained a deep bidirectional long short-term memory recurrent neural network (RNN) classifier using VVV survey data and catalogs of RRab stars discovered and classified by optical surveys. Our classifier attained a ~99% precision and recall for light curves with signal-to-noise ratio above 60, and is comparable to the best-performing classifiers trained on accurate optical data. Using our RNN classifier, we identified over 4300 hitherto unknown bona fide RRab stars toward the inner bulge. We provide their photometric catalog and VVV J,H,Ks photometric time-series.
[5] arXiv:2006.09853 [pdf]
Shallow Feature Based Dense Attention Network for Crowd Counting
While the performance of crowd counting via deep learning has improved dramatically in recent years, it remains an ingrained problem due to cluttered backgrounds and varying scales of people within an image. In this paper, we propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images, which diminishes the impact of backgrounds by incorporating a shallow-feature-based attention model and, meanwhile, captures multi-scale information by densely connecting hierarchical image features. Specifically, inspired by the observation that backgrounds and human crowds generally have noticeably different responses in shallow features, we build our attention model upon shallow-feature maps, which results in accurate background-pixel detection. Moreover, considering that the most representative features of people across different scales can appear in different layers of a feature extraction network, to better keep them all, we propose to densely connect hierarchical image features of different layers and subsequently encode them for estimating crowd density. Experimental results on three benchmark datasets clearly demonstrate the superiority of SDANet when dealing with different scenarios. In particular, on the challenging UCF_CC_50 dataset, our method outperforms other existing methods by a large margin, as is evident from a remarkable 11.9% drop in Mean Absolute Error (MAE) for SDANet.
[6] arXiv:2006.09835 [pdf]
Wireless 3D Point Cloud Delivery Using Deep Graph Neural Networks
In typical point cloud delivery, a sender uses octree-based digital video compression to send three-dimensional (3D) points and color attributes over band-limited links. However, digital-based schemes suffer from an issue called the cliff effect, where the 3D reconstruction quality is a step function of wireless channel quality. To prevent the cliff effect under channel quality fluctuation, we have proposed a soft point cloud delivery scheme called HoloCast. Although HoloCast realizes graceful quality improvement according to wireless channel quality, it requires large communication overheads. In this paper, we propose a novel scheme for soft point cloud delivery to simultaneously realize better quality and lower communication overheads. The proposed scheme introduces an end-to-end deep learning framework based on a graph neural network (GNN) to reconstruct high-quality point clouds from their distorted observations under wireless fading channels. We demonstrate that the proposed GNN-based scheme can reconstruct clean 3D point clouds with low overheads by removing fading and noise effects.
[7] arXiv:2006.09772 [pdf]
Mitosis Detection Under Limited Annotation: A Joint Learning Approach
Mitotic counting is a vital prognostic marker of tumor proliferation in breast cancer. Deep learning-based mitotic detection is on par with pathologists, but it requires a large amount of labeled data for training. We propose a deep classification framework for enhancing mitosis detection by leveraging class label information, via softmax loss, and spatial distribution information among samples, via distance metric learning. We also investigate strategies for steadily providing informative samples to boost the learning. The efficacy of the proposed framework is established through evaluation on ICPR 2012 and AMIDA 2013 mitotic data. Our framework significantly improves detection with small training data and achieves on-par or superior performance compared to state-of-the-art methods when using the entire training data.
[8] arXiv:2006.09766 [pdf]
Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification
Deep learning architectures based on self-attention have recently achieved and surpassed state-of-the-art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering in order to improve topical aspects learned from newsgroups-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation, we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.
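The filtering step described in this abstract can be sketched as a simple threshold on a classifier's in-domain probability. This is an illustrative sketch only: `filter_in_domain`, `toy_prob`, the keyword set, and the threshold value are all hypothetical stand-ins, and the abstract does not say which classifier or cutoff the authors use.

```python
from typing import Callable, List

def filter_in_domain(
    sentences: List[str],
    prob_in_domain: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only sentences whose in-domain probability reaches the threshold."""
    return [s for s in sentences if prob_in_domain(s) >= threshold]

def toy_prob(sentence: str) -> float:
    """Hypothetical stand-in for a trained classifier: share of domain keywords."""
    domain_words = {"gpu", "driver", "kernel"}  # assumed vocabulary, for illustration
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    if not tokens:
        return 0.0
    return sum(t in domain_words for t in tokens) / len(tokens)

# Usage: only the first sentence survives and would be passed on for training.
kept = filter_in_domain(
    ["The GPU driver crashed.", "See you tomorrow."], toy_prob, threshold=0.25
)
```

In practice `toy_prob` would be replaced by the predicted probability of any probabilistic classifier trained on the outer and target datasets.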
[9] arXiv:2006.09747 [pdf]
Research and development of MolAICal for drug design via deep learning and classical programming
Deep learning methods have permeated the research area of computer-aided drug design. A deep learning generative model and classical algorithms can be used together for three-dimensional (3D) drug design in the 3D pocket of the receptor. Here, three aspects of MolAICal are illustrated for drug design. In the first part, MolAICal uses a genetic algorithm, the Vinardo score, and a deep learning generative model trained by a generative adversarial net (GAN) for drug design. In the second part, the deep learning generative model is trained on drug-like molecules from a drug database such as the ZINC database, and MolAICal invokes the deep learning generative model and molecular docking to perform drug virtual screening automatically. In the third part, useful drug tools are added for calculating related properties such as pan-assay interference compounds (PAINS), Lipinski's rule of five, synthetic accessibility (SA), and so on. In addition, structural similarity search, quantitative structure-activity relationship (QSAR) modeling, and other tools are embedded in MolAICal for calculating drug properties. MolAICal will continue to optimize and develop current and new modules for drug design. MolAICal can help scientists, pharmacists and biologists rationally design 3D drugs in the receptor pocket through the deep learning model and classical programming. MolAICal is free of charge for any academic and educational purposes, and it can be downloaded from the website this https URL.
[10] arXiv:2006.09590 [pdf]
Deep Learning with Functional Inputs
We present a methodology for integrating functional data into deep densely connected feed-forward neural networks. The model is defined for scalar responses with multiple functional and scalar covariates. A by-product of the method is a set of dynamic functional weights that can be visualized during the optimization process. This visualization leads to greater interpretability of the relationship between the covariates and the response relative to conventional neural networks. The model is shown to perform well in a number of contexts including prediction of new data and recovery of the true underlying functional weights; these results were confirmed through real applications and simulation studies. A forthcoming R package is developed on top of a popular deep learning library (Keras) allowing for general use of the approach.
[11] arXiv:2006.09550 [pdf]
An encoder-decoder deep surrogate for reverse time migration in seismic imaging under uncertainty
Seismic imaging faces challenges due to the presence of several uncertainty sources. Uncertainties exist in data measurements, source positioning, and subsurface geophysical properties. Reverse time migration (RTM) is a high-resolution depth migration approach useful for extracting information such as reservoir localization and boundaries. RTM, however, is time-consuming and data-intensive, as it requires solving the wave equation twice to generate and store an imaging condition. RTM, when embedded in an uncertainty quantification algorithm (like the Monte Carlo method), shows a many-fold increase in its computational complexity due to the high input-output dimensionality. In this work, we propose an encoder-decoder deep learning surrogate model for RTM under uncertainty. The inputs are an ensemble of velocity fields, expressing the uncertainty, and the outputs are the seismic images. We show by numerical experimentation that the surrogate model can reproduce the seismic images accurately and, more importantly, the uncertainty propagation from the input velocity fields to the image ensemble.
[12] arXiv:2006.09545 [pdf]
Neural Optimal Control for Representation Learning
The intriguing connections recently established between neural networks and dynamical systems have invited deep learning researchers to tap into the well-explored principles of differential calculus. Notably, the adjoint sensitivity method used in neural ordinary differential equations (Neural ODEs) has cast the training of neural networks as a control problem in which neural modules operate as continuous-time homeomorphic transformations of features. Typically, these methods optimize a single set of parameters governing the dynamical system for the whole data set, forcing the network to learn complex transformations that are functionally limited and computationally heavy. Instead, we propose learning a data-conditioned distribution of optimal controls over the network dynamics, emulating a form of input-dependent fast neural plasticity. We describe a general method for training such models as well as convergence proofs assuming mild hypotheses about the ODEs and show empirically that this method leads to simpler dynamics and reduces the computational cost of Neural ODEs. We evaluate this approach for unsupervised image representation learning; our new "functional" auto-encoding model with ODEs, AutoencODE, achieves state-of-the-art image reconstruction quality on CIFAR-10, and exhibits substantial improvements in unsupervised classification over existing auto-encoding models.
[13] arXiv:2006.09535 [pdf]
Multipole Graph Neural Operator for Parametric Partial Differential Equations
One of the main challenges in using deep learning-based methods for simulating physical systems and solving partial differential equations (PDEs) is formulating physics-based data in the desired structure for neural networks. Graph neural networks (GNNs) have gained popularity in this area since graphs offer a natural way of modeling particle interactions and provide a clear way of discretizing the continuum models. However, the graphs constructed for approximating such tasks usually ignore long-range interactions due to unfavorable scaling of the computational complexity with respect to the number of nodes. The errors due to these approximations scale with the discretization of the system, thereby not allowing for generalization under mesh-refinement. Inspired by the classical multipole methods, we propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity. Our multi-level formulation is equivalent to recursively adding inducing points to the kernel matrix, unifying GNNs with multi-resolution matrix factorization of the kernel. Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
[14] arXiv:2006.09501 [pdf]
On the Inference of Soft Biometrics from Typing Patterns Collected in a Multi-device Environment
In this paper, we study the inference of gender, major/minor (computer science, non-computer science), typing style, age, and height from the typing patterns collected from 117 individuals in a multi-device environment. The inference of the first three identifiers was considered as classification tasks, while the rest as regression tasks. For classification tasks, we benchmark the performance of six classical machine learning (ML) and four deep learning (DL) classifiers. On the other hand, for regression tasks, we evaluated three ML and four DL-based regressors. The overall experiment consisted of two text-entry (free and fixed) and four device (Desktop, Tablet, Phone, and Combined) configurations. The best arrangements achieved accuracies of 96.15%, 93.02%, and 87.80% for typing style, gender, and major/minor, respectively, and mean absolute errors of 1.77 years and 2.65 inches for age and height, respectively. The results are promising considering the variety of application scenarios that we have listed in this work.
[15] arXiv:2006.09464 [pdf]
Visualization for Histopathology Images using Graph Convolutional Neural Networks
With the increase in the use of deep learning for computer-aided diagnosis in medical images, the criticism of the black-box nature of the deep learning models is also on the rise. The medical community needs interpretable models for both due diligence and advancing the understanding of disease and treatment mechanisms. In histology, in particular, while there is rich detail available at the cellular level and that of spatial relationships between cells, it is difficult to modify convolutional neural networks to point out the relevant visual features. We adopt an approach to model histology tissue as a graph of nuclei and develop a graph convolutional network framework based on an attention mechanism and node occlusion for disease diagnosis. The proposed method highlights the relative contribution of each cell nucleus in the whole-slide image. Our visualization of such networks, trained to distinguish between invasive and in-situ breast cancers and between Gleason 3 and 4 prostate cancers, generates interpretable visual maps that correspond well with our understanding of the structures that are important to experts for their diagnosis.
[16] arXiv:2006.09410 [pdf]
Fast Correlated-Photon Imaging Enhanced by Deep Learning
Correlated photon pairs, carrying strong quantum correlations, have been harnessed to bring quantum advantages to various fields from biological imaging to range finding. Such inherent non-classical properties support extracting more valid signals to build photon-limited images even at low flux levels, where shot noise becomes dominant as the light source decreases to the single-photon level. Optimization by numerical reconstruction algorithms is possible but requires thousands of photon-sparse frames and is thus unavailable in real time. Here, we present experimental fast correlated-photon imaging enhanced by deep learning, showing an intelligent computational strategy to discover deeper structure in big data. A convolutional neural network is found to be able to efficiently solve image inverse problems associated with strong shot noise and background noise (electronic noise, scattered light). Our results fill the key gap in incompatibility between imaging speed and image quality by pushing low-light imaging techniques to the regime of real-time and single-photon level, opening up an avenue to deep learning-enhanced quantum imaging for real-life applications.
[17] arXiv:2006.09310 [pdf]
Deep Multimodal Transfer-Learned Regression in Data-Poor Domains
In many real-world applications of deep learning, estimation of a target may rely on various types of input data modes, such as audio-video, image-text, etc. This task can be further complicated by a lack of sufficient data. Here we propose a Deep Multimodal Transfer-Learned Regressor (DMTL-R) for multimodal learning of image and feature data in a deep regression architecture effective at predicting target parameters in data-poor domains. Our model is capable of fine-tuning a given set of pre-trained CNN weights on a small amount of training image data, while simultaneously conditioning on feature information from a complementary data mode during network training, yielding more accurate single-target or multi-target regression than can be achieved using the images or the features alone. We present results using phase-field simulation microstructure images with an accompanying set of physical features, using pre-trained weights from various well-known CNN architectures, which demonstrate the efficacy of the proposed multimodal approach.
[18] arXiv:2006.09276 [pdf]
How Secure is Distributed Convolutional Neural Network on IoT Edge Devices?
Convolutional Neural Networks (CNNs) have found successful adoption in many applications. The deployment of CNNs on resource-constrained edge devices has proved challenging, so distributed deployment of CNNs across different edge devices has been adopted. In this paper, we propose Trojan attacks on CNNs deployed across different nodes of a distributed edge network. We propose five stealthy attack scenarios for distributed CNN inference. These attacks are divided into trigger and payload circuitry. These attacks are tested on deep learning models (LeNet, AlexNet). The results show the degree of vulnerability of individual layers and how critical they are to the final classification.
[19] arXiv:2006.09245 [pdf]
Deep learning approaches for fast radio signal prediction
The aim of this work is the prediction of power coverage in a dense urban environment given building and transmitter locations. Conventionally, ray-tracing is regarded as the most accurate method to predict energy distribution patterns in the area in the presence of diverse radio propagation phenomena. However, ray-tracing simulations are time-consuming and require extensive computational resources. We propose deep neural network models to learn from ray-tracing results and predict the power coverage dynamically from building and transmitter properties. The proposed UNET model with strided convolutions and inception modules provides highly accurate results that are close to the ray-tracing output on 32x32 frames. This model will allow practitioners to search for the best transmitter locations effectively and reduce the design time significantly.
[20] arXiv:2006.09238 [pdf]
Foreground-Background Imbalance Problem in Deep Object Detectors: A Review
Recent years have witnessed the remarkable developments made by deep learning techniques for object detection, a fundamentally challenging problem of computer vision. Nevertheless, there are still difficulties in training accurate deep object detectors, one of which is owing to the foreground-background imbalance problem. In this paper, we survey the recent advances in solutions to the imbalance problem. First, we analyze the characteristics of the imbalance problem in different kinds of deep detectors, including one-stage and two-stage ones. Second, we divide the existing solutions into two categories, sampling heuristics and non-sampling schemes, and review them in detail. Third, we experimentally compare the performance of some state-of-the-art solutions on the COCO benchmark. Promising directions for future work are also discussed.
[21] arXiv:2006.09225 [pdf]
DSDANet: Deep Siamese Domain Adaptation Convolutional Neural Network for Cross-domain Change Detection
Change detection (CD) is one of the most vital applications in remote sensing. Recently, deep learning has achieved promising performance in the CD task. However, deep models are task-specific and CD data set bias often exists, so deep CD models inevitably suffer degraded performance when transferred from the original CD data set to new ones. This makes manually labeling numerous samples in the new data set unavoidable, which costs a large amount of time and human labor. How can we learn a transferable CD model from a data set with enough labeled data (original domain) that can also detect changes well in another data set without labeled data (target domain)? This is defined as the cross-domain change detection problem. In this paper, we propose a novel deep siamese domain adaptation convolutional neural network (DSDANet) architecture for cross-domain CD. In DSDANet, a siamese convolutional neural network first extracts spatial-spectral features from multi-temporal images. Then, through multi-kernel maximum mean discrepancy (MK-MMD), the learned feature representation is embedded into a reproducing kernel Hilbert space (RKHS), in which the distributions of the two domains can be explicitly matched. By optimizing the network parameters and kernel coefficients with the source labeled data and target unlabeled data, DSDANet can learn a transferable feature representation that can bridge the discrepancy between two domains. To the best of our knowledge, this is the first time that such a domain adaptation-based deep network has been proposed for CD. The theoretical analysis and experimental results demonstrate the effectiveness and potential of the proposed method.
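To make the MK-MMD idea concrete, the sketch below computes a biased estimate of the squared maximum mean discrepancy between two small feature sets using a weighted sum of Gaussian kernels. It is a generic illustration rather than the DSDANet implementation: the bandwidths and the uniform kernel weights are assumptions (the abstract says the kernel coefficients are optimized), and the function names are hypothetical.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))

def multi_kernel(x, y, bandwidths, weights):
    """Weighted combination of Gaussian kernels with different bandwidths."""
    return sum(w * gaussian_kernel(x, y, b) for w, b in zip(weights, bandwidths))

def mk_mmd_squared(source, target, bandwidths=(0.5, 1.0, 2.0), weights=None):
    """Biased estimate of squared MMD between two lists of feature vectors."""
    if weights is None:
        # Assumption: uniform weights; a learned model would tune these.
        weights = [1.0 / len(bandwidths)] * len(bandwidths)

    def mean_kernel(a, b):
        total = sum(multi_kernel(x, y, bandwidths, weights) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Usage: well-separated feature sets give a large discrepancy.
gap = mk_mmd_squared([[0.0, 0.0], [0.1, 0.0]], [[3.0, 3.0], [3.1, 3.0]])
```

The estimate is zero when both sets are identical and grows as the two feature distributions move apart, which is why it can serve as a training penalty for domain alignment.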
[22] arXiv:2006.09220 [pdf]
MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation
With the success of deep learning in classifying short trimmed videos, more attention has been focused on temporally segmenting and classifying activities in long untrimmed videos. State-of-the-art approaches for action segmentation utilize several layers of temporal convolution and temporal pooling. Despite the capabilities of these approaches in capturing temporal dependencies, their predictions suffer from over-segmentation errors. In this paper, we propose a multi-stage architecture for the temporal action segmentation task that overcomes the limitations of the previous approaches. The first stage generates an initial prediction that is refined by the next ones. In each stage we stack several layers of dilated temporal convolutions covering a large receptive field with few parameters. While this architecture already performs well, lower layers still suffer from a small receptive field. To address this limitation, we propose a dual dilated layer that combines both large and small receptive fields. We further decouple the design of the first stage from the refining stages to address the different requirements of these stages. Extensive evaluation shows the effectiveness of the proposed model in capturing long-range dependencies and recognizing action segments. Our models achieve state-of-the-art results on three datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.
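The claim that stacked dilated convolutions cover a large receptive field with few parameters can be checked with a short calculation. The sketch below assumes stride-1 convolutions with kernel size 3 and dilation doubling at each layer; the abstract does not state these values, so they are illustrative assumptions, as is the function name.

```python
def receptive_field(dilations, kernel_size=3):
    """Frames seen by one output of a stack of stride-1 dilated 1D convolutions."""
    # Each layer widens the field by (kernel_size - 1) * dilation frames.
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Usage: ten layers with dilations 1, 2, 4, ..., 512 (assumed schedule).
doubling = [2 ** layer for layer in range(10)]
frames_dilated = receptive_field(doubling)       # 2047 frames
frames_plain = receptive_field([1] * 10)         # 21 frames without dilation
```

Both stacks have the same number of weights per layer, so the dilation schedule alone accounts for the difference in temporal coverage. The same formula also shows why the lowest layers, taken on their own, see only a few frames.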
[23] arXiv:2006.09163 [pdf]
Mapping the Design Space of Photonic Topological States via Deep Learning
Topological states in photonics offer novel prospects for guiding and manipulating photons and facilitate the development of modern optical components for a variety of applications. Over the past few years, photonic topology physics has evolved and unveiled various unconventional optical properties in these topological materials, such as silicon photonic crystals. However, the design of such topological states still poses a significant challenge. Conventional optimization schemes often fail to capture their complex high-dimensional design space. In this manuscript, we develop a deep learning framework to map the design space of topological states in photonic crystals. This framework overcomes the limitations of existing deep learning implementations. Specifically, it reconciles the dimension mismatch between the input (topological properties) and output (design parameters) vector spaces and the non-uniqueness that arises from one-to-many function mappings. We use a fully connected deep neural network (DNN) architecture for the forward model and a cyclic convolutional neural network (cCNN) for the inverse model. The inverse architecture contains the pre-trained forward model in tandem, thereby reducing the prediction error significantly.
[24] arXiv:2006.09117 [pdf]
End-to-End Real-time Catheter Segmentation with Optical Flow-Guided Warping during Endovascular Intervention
Accurate real-time catheter segmentation is an important pre-requisite for robot-assisted endovascular intervention. Most of the existing learning-based methods for catheter segmentation and tracking are only trained on small-scale datasets or synthetic data due to the difficulties of ground-truth annotation. Furthermore, the temporal continuity in intraoperative imaging sequences is not fully utilised. In this paper, we present FW-Net, an end-to-end and real-time deep learning framework for endovascular intervention. The proposed FW-Net has three modules: a segmentation network with encoder-decoder architecture, a flow network to extract optical flow information, and a novel flow-guided warping function to learn the frame-to-frame temporal continuity. We show that by effectively learning temporal continuity, the network can successfully segment and track the catheters in real-time sequences using only raw ground-truth for training. Detailed validation results confirm that our FW-Net outperforms state-of-the-art techniques while achieving real-time performance.
[25] arXiv:2006.09104 [pdf]
New Interpretations of Normalization Methods in Deep Learning
In recent years, a variety of normalization methods have been proposed to help train neural networks, such as batch normalization (BN), layer normalization (LN), weight normalization (WN), group normalization (GN), etc. However, mathematical tools to analyze all these normalization methods are lacking. In this paper, we first propose a lemma to define some necessary tools. Then, we use these tools to make a deep analysis of popular normalization methods and obtain the following conclusions: 1) Most of the normalization methods can be interpreted in a unified framework, namely normalizing pre-activations or weights onto a sphere; 2) Since most of the existing normalization methods are scaling invariant, we can conduct optimization on a sphere with scaling symmetry removed, which can help stabilize the training of the network; 3) We prove that training with these normalization methods can make the norm of weights increase, which could cause adversarial vulnerability as it amplifies the attack. Finally, a series of experiments are conducted to verify these claims.
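The "normalizing onto a sphere" view and the scaling invariance mentioned in conclusions 1) and 2) can be illustrated with a few lines of code. This is a minimal sketch of the general idea, not the paper's lemma or analysis; `project_to_sphere` is a hypothetical helper name.

```python
import math

def project_to_sphere(weights):
    """Rescale a weight vector to unit Euclidean norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

# Usage: rescaling the weights by any positive factor leaves the projection unchanged.
original = project_to_sphere([3.0, 4.0])
rescaled = project_to_sphere([30.0, 40.0])
```

Because the projected vector ignores the overall scale of the weights, any quantity computed from it is unaffected by that scale, which is the symmetry the abstract proposes to remove by optimizing directly on the sphere.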
[26] arXiv:2006.09050 [pdf]
Multi-Objective CNN Based Algorithm for SAR Despeckling
Deep learning (DL) in remote sensing has nowadays become an effective operational tool: it is widely used in applications such as change detection, image restoration, segmentation, detection and classification. In the synthetic aperture radar (SAR) domain, the application of DL techniques is not straightforward due to the non-trivial interpretation of SAR images, caused especially by the presence of speckle. Several deep learning solutions for SAR despeckling have been proposed in the last few years. Most of these solutions focus on the definition of different network architectures with similar cost functions that do not involve SAR image properties. In this paper, a convolutional neural network (CNN) with a multi-objective cost function that takes into account the spatial and statistical properties of the SAR image is proposed. This is achieved by defining a dedicated loss function obtained as the weighted combination of three different terms. Each of these terms is dedicated mainly to one of the following SAR image characteristics: spatial details, speckle statistical properties and strong scatterer preservation. Their combination makes it possible to balance these effects. Moreover, a specifically designed architecture is proposed to effectively extract distinctive features within the considered framework. Experiments on simulated and real SAR images show the accuracy of the proposed method compared to state-of-the-art despeckling algorithms, from both quantitative and qualitative points of view. Considering such SAR properties in the cost function is crucial for correct noise rejection and object preservation in different underlying scenarios, such as homogeneous, heterogeneous and extremely heterogeneous.
[27] arXiv:2006.09049 [pdf]
Multi-Precision Policy Enforced Training (MuPPET): A precision-switching strategy for quantised fixed-point training of CNNs
Large-scale convolutional neural networks (CNNs) suffer from very long training times, spanning from hours to weeks, limiting the productivity and experimentation of deep learning practitioners. As networks grow in size and complexity, training time can be reduced through low-precision data representations and computations. However, in doing so the final accuracy suffers due to the problem of vanishing gradients. Existing state-of-the-art methods combat this issue by means of a mixed-precision approach utilising two different precision levels, FP32 (32-bit floating-point) and FP16/FP8 (16-/8-bit floating-point), leveraging the hardware support of recent GPU architectures for FP16 operations to obtain performance gains. This work pushes the boundary of quantised training by employing a multilevel optimisation approach that utilises multiple precisions including low-precision fixed-point representations. The novel training strategy, MuPPET, combines the use of multiple number representation regimes together with a precision-switching mechanism that decides at run time the transition point between precision regimes. Overall, the proposed strategy tailors the training process to the hardware-level capabilities of the target hardware architecture and yields improvements in training time and energy efficiency compared to state-of-the-art approaches. Applying MuPPET on the training of AlexNet, ResNet18 and GoogLeNet on ImageNet (ILSVRC12) and targeting an NVIDIA Turing GPU, MuPPET achieves the same accuracy as standard full-precision training with training-time speedup of up to 1.84$\times$ and an average speedup of 1.58$\times$ across the networks.
[28] arXiv:2006.09034 [pdf]
Deep Learning based Segmentation of Fish in Noisy Forward Looking MBES Images
In this work, we investigate a Deep Learning (DL) approach to fish segmentation in a small dataset of noisy low-resolution images generated by a forward-looking multibeam echosounder (MBES). We build on recent advances in DL and Convolutional Neural Networks (CNNs) for semantic segmentation and demonstrate an end-to-end approach for a fish/non-fish probability prediction for all range-azimuth positions projected by an imaging sonar. We use self-collected datasets from the Danish Sound and the Faroe Islands to train and test our model and present techniques to obtain satisfactory performance and generalization even with a low-volume dataset. We show that our model achieves the desired performance and has learned to harness the importance of semantic context and take this into account to separate noise and non-targets from real targets. Furthermore, we present techniques to deploy models on low-cost embedded platforms to obtain higher performance fit for edge environments - where compute and power are restricted by size/cost - for testing and prototyping.
[29] arXiv:2006.08933 [pdf]
Plug-and-Play Anomaly Detection with Expectation Maximization Filtering
Anomaly detection in crowds enables early rescue response. A plug-and-play smart camera for crowd surveillance has numerous constraints different from typical anomaly detection: the training data cannot be used iteratively; there are no training labels; and training and classification need to be performed simultaneously. We tackle all these constraints with our approach in this paper. We propose a Core Anomaly-Detection (CAD) neural network which learns the motion behavior of objects in the scene with an unsupervised method. On average over standard datasets, CAD with a single epoch of training shows a percentage increase in Area Under the Curve (AUC) of 4.66% and 4.9% compared to the best results with convolutional autoencoders and convolutional LSTM-based methods, respectively. With a single epoch of training, our method improves the AUC by 8.03% compared to the convolutional LSTM-based approach. We also propose an Expectation Maximization filter which chooses samples for training the core anomaly-detection network. The overall framework improves the AUC compared to the future frame prediction-based approach by 24.87% when crowd anomaly detection is performed on a video stream. We believe our work is the first step towards using deep learning methods with autonomous plug-and-play smart cameras for crowd anomaly detection.
[30] arXiv:2006.08925 [pdf]
Improving the Performance of Deep Learning for Wireless Localization
Indoor localization systems are most commonly based on Received Signal Strength Indicator (RSSI) measurements of either WiFi or Bluetooth-Low-Energy (BLE) beacons. In such systems, the two most common techniques are trilateration and fingerprinting, with the latter providing higher accuracy. In the fingerprinting technique, Deep Learning (DL) algorithms are often used to predict the location of the receiver based on the RSSI measurements of multiple beacons received at the receiver. In this paper, we address two practical issues with applying Deep Learning to wireless localization -- transfer of solution from one wireless environment to another and small size of labelled data set. First, we apply automatic hyperparameter optimization to a deep neural network (DNN) system for indoor wireless localization, which makes the system easy to port to new wireless environments. Second, we show how to augment a typically small labelled data set using the unlabelled data set. We observed improved performance in DL by applying the two techniques. Additionally, all relevant code has been made freely available.
[31] arXiv:2006.08924 [pdf]
GCNs-Net: A Graph Convolutional Neural Network Approach for Decoding Time-resolved EEG Motor Imagery Signals
Towards developing effective and efficient brain-computer interface (BCI) systems, precise decoding of brain activity measured by electroencephalogram (EEG) is in high demand. Traditional works classify EEG signals without considering the topological relationship among electrodes. However, neuroscience research has increasingly emphasized network patterns of brain dynamics. Thus, the Euclidean structure of electrodes might not adequately reflect the interaction between signals. To fill the gap, a novel deep learning framework based on graph convolutional neural networks (GCNs) was presented to enhance the decoding performance of raw EEG signals during different types of motor imagery (MI) tasks while cooperating with the functional topological relationship of electrodes. Based on the absolute Pearson's matrix of overall signals, the graph Laplacian of EEG electrodes was built up. The GCNs-Net constructed by graph convolutional layers learns the generalized features. The subsequent pooling layers reduce dimensionality, and the fully-connected softmax layer derives the final prediction. The introduced approach has been shown to converge for both personalized and group-wise predictions. It has achieved the highest averaged accuracy, 93.056% and 88.57% (PhysioNet Dataset), 96.24% and 80.89% (High Gamma Dataset), at the subject and group level, respectively, compared with existing studies, which suggests adaptability and robustness to individual variability. Moreover, the performance was stably reproducible among repetitive experiments for cross-validation. To conclude, the GCNs-Net filters EEG signals based on the functional topological relationship, which manages to decode relevant features for brain motor imagery.
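The graph construction mentioned above (a graph Laplacian built from the absolute Pearson matrix) can be sketched as follows. The abstract does not say which Laplacian variant is used or how the diagonal is handled, so the symmetric normalized Laplacian, the removal of self-loops, the function name `normalized_laplacian`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def normalized_laplacian(signals):
    """Build L = I - D^{-1/2} A D^{-1/2} from signals of shape (n_electrodes, n_samples)."""
    corr = np.corrcoef(signals)        # Pearson correlation between electrode rows
    adjacency = np.abs(corr)           # absolute Pearson matrix as edge weights
    np.fill_diagonal(adjacency, 0.0)   # drop self-loops (an assumption)
    degree = adjacency.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    n = adjacency.shape[0]
    return np.eye(n) - d_inv_sqrt[:, None] * adjacency * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
eeg = rng.normal(size=(8, 256))        # synthetic stand-in for 8 electrodes
laplacian = normalized_laplacian(eeg)
eigenvalues = np.linalg.eigvalsh(laplacian)  # sorted ascending
```

With non-negative edge weights, the eigenvalues of this Laplacian lie in [0, 2] and the smallest one is zero, which is the property spectral graph convolutions usually rely on.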
[32] arXiv:2006.08903 [pdf]
Depth by Poking: Learning to Estimate Depth from Self-Supervised Grasping
Accurate depth estimation remains an open problem for robotic manipulation; even state of the art techniques including structured light and LiDAR sensors fail on reflective or transparent surfaces. We address this problem by training a neural network model to estimate depth from RGB-D images, using labels from physical interactions between a robot and its environment. Our network predicts, for each pixel in an input image, the z position that a robot's end effector would reach if it attempted to grasp or poke at the corresponding position. Given an autonomous grasping policy, our approach is self-supervised as end effector position labels can be recovered through forward kinematics, without human annotation. Although gathering such physical interaction data is expensive, it is necessary for training and routine operation of state of the art manipulation systems. Therefore, this depth estimator comes "for free" while collecting data for other tasks (e.g., grasping, pushing, placing). We show our approach achieves significantly lower root mean squared error than traditional structured light sensors and unsupervised deep learning methods on difficult, industry-scale jumbled bin datasets.
[33] arXiv:2006.08896 [pdf]
Model-Driven DNN Decoder for Turbo Codes: Design, Simulation and Experimental Results
This paper presents a novel model-driven deep learning (DL) architecture, called TurboNet, for turbo decoding that integrates DL into the traditional max-log-maximum a posteriori (MAP) algorithm. The TurboNet inherits the superiority of the max-log-MAP algorithm and DL tools and thus presents excellent error-correction capability with low training cost. To design the TurboNet, the original iterative structure is unfolded as deep neural network (DNN) decoding units, where trainable weights are introduced to the max-log-MAP algorithm and optimized through supervised learning. To efficiently train the TurboNet, a loss function is carefully designed to prevent the tricky vanishing-gradient issue. To further reduce the computational complexity and training cost of the TurboNet, we can prune it into TurboNet+. Compared with the existing black-box DL approaches, the TurboNet+ has a considerable advantage in computational complexity and is conducive to significantly reducing the decoding overhead. Furthermore, we also present a simple training strategy to address the overfitting issue, which enables efficient training of the proposed TurboNet+. Simulation results demonstrate TurboNet+'s superiority in error-correction ability, signal-to-noise ratio generalization, and computational overhead. In addition, an experimental system is established for an over-the-air (OTA) test with the help of a 5G rapid prototyping system and demonstrates TurboNet's strong learning ability and great robustness to various scenarios.
[34] arXiv:2006.08885 [pdf]
DeepCapture: Image Spam Detection Using Deep Learning and Data Augmentation
Image spam emails are often used to evade text-based spam filters that detect spam emails with their frequently used keywords. In this paper, we propose a new image spam email detection tool called DeepCapture using a convolutional neural network (CNN) model. There have been many efforts to detect image spam emails, but there is a significant performance degradation against entirely new and unseen image spam emails due to overfitting during the training phase. To address this challenging issue, we mainly focus on developing a more robust model to address the overfitting problem. Our key idea is to build a CNN-XGBoost framework consisting of eight layers only with a large number of training samples using data augmentation techniques tailored towards the image spam detection task. To show the feasibility of DeepCapture, we evaluate its performance with publicly available datasets consisting of 6,000 spam and 2,313 non-spam image samples. The experimental results show that DeepCapture is capable of achieving an F1-score of 88%, which is a 6% improvement over the best existing spam detection model CNN-SVM with an F1-score of 82%. Moreover, DeepCapture outperformed existing image spam detection solutions against new and unseen image datasets.
[35] arXiv:2006.08852 [pdf]
Counterexample-Guided Learning of Monotonic Neural Networks
The widespread adoption of deep learning is often attributed to its automatic feature construction with minimal inductive bias. However, in many real-world tasks, the learned function is intended to satisfy domain-specific constraints. We focus on monotonicity constraints, which are common and require that the function's output increases with increasing values of specific input features. We develop a counterexample-guided technique to provably enforce monotonicity constraints at prediction time. Additionally, we propose a technique to use monotonicity as an inductive bias for deep learning. It works by iteratively incorporating monotonicity counterexamples in the learning process. Contrary to prior work in monotonic learning, we target general ReLU neural networks and do not further restrict the hypothesis space. We have implemented these techniques in a tool called COMET. Experiments on real-world datasets demonstrate that our approach achieves state-of-the-art results compared to existing monotonic learners, and can improve the model quality compared to those that were trained without taking monotonicity constraints into account.
[36] arXiv:2006.08759 [pdf]
Semi-Streaming Architecture: A New Design Paradigm for CNN Implementation on FPGAs
The recent research advances in deep learning have led to the development of small and powerful Convolutional Neural Network (CNN) architectures. Meanwhile, Field Programmable Gate Arrays (FPGAs) have become a popular hardware target for their deployment, with implementations splitting into two main categories: streaming hardware architectures and single computation engine design approaches. The streaming hardware architectures generally require implementing every layer as a discrete processing unit, and are suitable for smaller software models that could fit in their unfolded versions into resource-constrained targets. On the other hand, single computation engines can be scaled to fit into a device to execute CNN models of different sizes and complexities; however, the achievable performance of one-size-fits-all implementations may vary across CNNs with different workload attributes, leading to inefficient utilization of hardware resources. By combining the advantages of both of the above methods, this work proposes a new design paradigm called semi-streaming architecture, where layer-specialized configurable engines are used for network realization. As a proof of concept, this paper presents a set of five layer-specialized configurable processing engines for implementing an 8-bit quantized MobileNetV2 CNN model. The engines are chained to partially preserve data streaming and tuned individually to efficiently process specific types of layers: normalized addition of residuals, depthwise, pointwise (expansion and projection), and standard 2D convolution layers, capable of delivering 5.4GOp/s, 16GOp/s, 27.2GOp/s, 27.2GOp/s and 89.6GOp/s, respectively, with an overall energy efficiency of 5.32GOp/s/W at a 100MHz system clock, requiring a total power of 6.2W on a XCZU7EV SoC FPGA.
[37] arXiv:2006.08742 [pdf]
Certifying Strategyproof Auction Networks
Optimal auctions maximize a seller's expected revenue subject to individual rationality and strategyproofness for the buyers. Myerson's seminal work in 1981 settled the case of auctioning a single item; however, subsequent decades of work have yielded little progress moving beyond a single item, leaving the design of revenue-maximizing auctions as a central open problem in the field of mechanism design. A recent thread of work in "differentiable economics" has used tools from modern deep learning to instead learn good mechanisms. We focus on the RegretNet architecture, which can represent auctions with arbitrary numbers of items and participants; it is trained to be empirically strategyproof, but the property is never exactly verified, leaving potential loopholes for market participants to exploit. We propose ways to explicitly verify strategyproofness under a particular valuation profile using techniques from the neural network verification literature. Doing so requires making several modifications to the RegretNet architecture in order to represent it exactly in an integer program. We train our network and produce certificates in several settings, including settings for which the optimal strategyproof mechanism is not known.
[38] arXiv:2006.08696 [pdf]
Skin Segmentation from NIR Images using Unsupervised Domain Adaptation through Generative Latent Search
Segmentation of the pixels corresponding to human skin is an essential first step in multiple applications ranging from surveillance to heart-rate estimation from remote-photoplethysmography. However, the existing literature considers the problem only in the visible range of the EM spectrum, which limits its utility in low or no light settings where the criticality of the application is higher. To alleviate this problem, we consider the problem of skin segmentation from near-infrared (NIR) images. However, deep learning-based state-of-the-art segmentation techniques demand large amounts of labelled data, which are unavailable for the current problem. Therefore we cast the skin segmentation problem as that of target-independent unsupervised domain adaptation (UDA) where we use the data from the Red-channel of the visible range to develop a skin segmentation algorithm on NIR images. We propose a method for target-independent segmentation where the 'nearest-clone' of a target image in the source domain is searched and used as a proxy in the segmentation network trained only on the source domain. We prove the existence of the 'nearest-clone' and propose a method to find it through an optimization algorithm over the latent space of a deep generative model based on variational inference. We demonstrate the efficacy of the proposed method for NIR skin segmentation over the state-of-the-art UDA segmentation methods on two newly created skin segmentation datasets in the NIR domain, despite not having access to the target NIR data.
[39] arXiv:2006.08658 [pdf]
ESL: Entropy-guided Self-supervised Learning for Domain Adaptation in Semantic Segmentation
While fully-supervised deep learning yields good models for urban scene semantic segmentation, these models struggle to generalize to new environments with, for instance, different lighting or weather conditions. In addition, producing the extensive pixel-level annotations that the task requires comes at a great cost. Unsupervised domain adaptation (UDA) is one approach that tries to address these issues in order to make such systems more scalable. In particular, self-supervised learning (SSL) has recently become an effective strategy for UDA in semantic segmentation. At the core of such methods lies 'pseudo-labeling', that is, the practice of assigning high-confidence class predictions as pseudo-labels, subsequently used as true labels, for target data. To collect pseudo-labels, previous works often rely on the highest softmax score, which we argue here is an unfavorable confidence measure. In this work, we propose Entropy-guided Self-supervised Learning (ESL), leveraging entropy as the confidence indicator for producing more accurate pseudo-labels. On different UDA benchmarks, ESL consistently outperforms strong SSL baselines and achieves state-of-the-art results.
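The idea of using entropy rather than the top softmax score as the confidence indicator can be sketched as below. This is only an illustration of entropy-thresholded pseudo-labeling, not the ESL procedure itself: the function name `entropy_pseudo_labels`, the normalized-entropy threshold of 0.5, and the ignore index 255 are assumptions.

```python
import numpy as np

def entropy_pseudo_labels(probs, max_entropy=0.5, ignore_index=255):
    """Keep argmax labels only where the normalized prediction entropy is low.

    probs: array of shape (..., n_classes) holding softmax outputs.
    Returns integer labels of shape (...); uncertain positions get ignore_index.
    """
    n_classes = probs.shape[-1]
    eps = 1e-12  # avoids log(0)
    # Shannon entropy divided by log(n_classes), so values lie in [0, 1].
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1) / np.log(n_classes)
    labels = np.argmax(probs, axis=-1)
    return np.where(entropy <= max_entropy, labels, ignore_index)

probs = np.array([
    [0.98, 0.01, 0.01],   # peaked distribution: low entropy
    [0.40, 0.30, 0.30],   # nearly uniform: high entropy
    [0.05, 0.90, 0.05],   # peaked distribution: low entropy
])
pseudo = entropy_pseudo_labels(probs)
```

In this toy input the first and third rows receive pseudo-labels 0 and 1, while the second row is marked with the ignore index because its distribution is close to uniform. Unlike the maximum softmax score, the entropy depends on the whole predicted distribution and not only on its largest entry.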
[40] arXiv:2006.08656 [pdf]
Multiscale Deep Equilibrium Models
We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ), suited to large-scale and highly hierarchical pattern recognition domains. An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously, using implicit differentiation to avoid storing intermediate states (and thus requiring only O(1) memory consumption). These simultaneously-learned multi-resolution features allow us to train a single model on a diverse set of tasks and loss functions, such as using a single MDEQ to perform both image classification and semantic segmentation. We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset. In both settings, MDEQs are able to match or exceed the performance of recent competitive computer vision models, the first time such performance and scale have been achieved by an implicit deep learning approach. The code and pre-trained models are at this https URL.
[41] arXiv:2006.08643 [pdf]
On the training dynamics of deep networks with $L_2$ regularization
We study the role of $L_2$ regularization in deep learning, and uncover simple relations between the performance of the model, the $L_2$ coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks. We derive the gradient flow dynamics of such networks, and compare the role of $L_2$ regularization in this context with that of linear models.
[42] arXiv:2006.08610 [pdf]
Autofocusing technologies for whole slide imaging and automated microscopy
Whole slide imaging (WSI) has moved digital pathology closer to diagnostic practice in recent years. Due to the inherent tissue topography variability, accurate autofocusing remains a critical challenge for WSI and automated microscopy systems. The traditional focus map surveying method is limited in its ability to acquire a large number of focus points while still maintaining high throughput. Real-time approaches decouple image acquisition from focusing, thus allowing for rapid scanning while maintaining continuous accurate focus. This work reviews the traditional focus map approach and discusses the choice of focus measure for focal plane determination. It also discusses various real-time autofocusing approaches including reflective-based triangulation, confocal pinhole detection, low-coherence interferometry, tilted sensor approach, independent dual sensor scanning, beam splitter array, phase detection, dual-LED illumination, and deep-learning approaches. The technical concepts, merits, and limitations of these methods are explained and compared to those of a traditional WSI system. This review may provide new insights for the development of high-throughput automated microscopy imaging systems that can be made broadly available and utilizable without loss of capacity.
[43] arXiv:2006.08601 [pdf]
Explaining Local, Global, And Higher-Order Interactions In Deep Learning
We present a simple yet highly generalizable method for explaining interacting parts within a neural network's reasoning process. In this work, we consider local, global, and higher-order statistical interactions. Generally speaking, local interactions occur between features within individual datapoints, while global interactions come in the form of universal features across the whole dataset. With deep learning, combined with some heuristics for tractability, we achieve state of the art measurement of global statistical interaction effects, including at higher orders (3-way interactions or more). We generalize this to the multidimensional setting to explain local interactions in multi-object detection and relational reasoning using the COCO annotated-image and Sort-Of-CLEVR toy datasets respectively. Here, we submit a new task for testing feature vector interactions, conduct a human study, propose a novel metric for relational reasoning, and use our interaction interpretations to innovate a more effective Relation Network. Finally, we apply these techniques on a real-world biomedical dataset to discover the higher-order interactions underlying Parkinson's disease clinical progression. Code for all experiments, fully reproducible, is available at this https URL.
[44] arXiv:2006.08600 [pdf]
Temporal Phenotyping using Deep Predictive Clustering of Disease Progression
Due to the wider availability of modern electronic health records, patient care data is often being stored in the form of time-series. Clustering such time-series data is crucial for patient phenotyping, anticipating patients' prognoses by identifying "similar" patients, and designing treatment guidelines that are tailored to homogeneous patient subgroups. In this paper, we develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest (e.g., adverse events, the onset of comorbidities). To encourage each cluster to have homogeneous future outcomes, the clustering is carried out by learning discrete representations that best describe the future outcome distribution based on novel loss functions. Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks and identifies meaningful clusters that can be translated into actionable information for clinical decision-making.
[45] arXiv:2006.08564 [pdf]
Post-Hoc Methods for Debiasing Neural Networks
As deep learning models become tasked with more and more decisions that impact human lives, such as hiring, criminal recidivism, and loan repayment, bias is becoming a growing concern. This has led to dozens of definitions of fairness and numerous algorithmic techniques to improve the fairness of neural networks. Most debiasing algorithms require retraining a neural network from scratch; however, this is not feasible in many applications, especially when the model takes days to train or when the full training dataset is no longer available. In this work, we present a study on post-hoc methods for debiasing neural networks. First we study the nature of the problem, showing that the difficulty of post-hoc debiasing is highly dependent on the initial conditions of the original model. Then we define three new fine-tuning techniques: random perturbation, layer-wise optimization, and adversarial fine-tuning. All three techniques work for any group fairness constraint. We give a comparison with six algorithms - three popular post-processing debiasing algorithms and our three proposed methods - across three datasets and three popular bias measures. We show that no post-hoc debiasing technique dominates all others, and we identify settings in which each algorithm performs the best. Our code is available at this https URL.
[46] arXiv:2006.08554 [pdf]
Now that I can see, I can improve: Enabling data-driven finetuning of CNNs on the edge
In today's world, a vast amount of data is being generated by edge devices that can be used as valuable training data to improve the performance of machine learning algorithms in terms of the achieved accuracy or to reduce the compute requirements of the model. However, due to user data privacy concerns as well as storage and communication bandwidth limitations, this data cannot be moved from the device to the data centre for further improvement of the model and subsequent deployment. As such there is a need for increased edge intelligence, where the deployed models can be fine-tuned on the edge, leading to improved accuracy and/or reducing the model's workload as well as its memory and power footprint. In the case of Convolutional Neural Networks (CNNs), both the weights of the network as well as its topology can be tuned to adapt to the data that it processes. This paper provides a first step towards enabling CNN finetuning on an edge device based on structured pruning. It explores the performance gains and costs of doing so and presents an extensible open-source framework that allows the deployment of such approaches on a wide range of network architectures and devices. The results show that on average, data-aware pruning with retraining can provide 10.2pp increased accuracy over a wide range of subsets, networks and pruning levels with a maximum improvement of 42.0pp over pruning and retraining in a manner agnostic to the data being processed by the network.
[47] arXiv:2006.08521 [pdf]
Go-CaRD -- Generic, Optical Car Part Recognition and Detection: Collection, Insights, and Applications
Systems for the automatic recognition and detection of automotive parts are crucial in several emerging research areas in the development of intelligent vehicles. They enable, for example, the detection and modelling of interactions between humans and the vehicle. In this paper, we present three suitable datasets and explore, quantitatively and qualitatively, the efficacy of state-of-the-art deep learning architectures for the localisation of 29 interior and exterior vehicle regions, independent of brand, model, and environment. A ResNet50 model achieved an F1 score of 93.67 % for recognition, while our best Darknet model achieved an mAP of 58.20 % for detection. We also experiment with joint and transfer learning approaches and point out potential applications of our systems.
[48] arXiv:2006.08517 [pdf]
The Limit of the Batch Size
Large-batch training is an efficient approach for current distributed deep learning systems. It has enabled researchers to reduce the ImageNet/ResNet-50 training from 29 hours to around 1 minute. In this paper, we focus on studying the limit of the batch size. We think it may provide guidance to AI supercomputer and algorithm designers. We provide detailed numerical optimization instructions for step-by-step comparison. Moreover, it is important to understand the generalization and optimization performance of huge batch training. Hoffer et al. introduced "ultra-slow diffusion" theory to large-batch training. However, our experiments show results that contradict the conclusion of Hoffer et al. We provide comprehensive experimental results and detailed analysis to study the limitations of batch size scaling and "ultra-slow diffusion" theory. For the first time we scale the batch size on ImageNet to at least an order of magnitude larger than all previous work, and provide detailed studies on the performance of many state-of-the-art optimization schemes under this setting. We propose an optimization recipe that is able to improve the top-1 test accuracy by 18% compared to the baseline.
[49] arXiv:2006.08509 [pdf]
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. To deal with the larger design space it brings, a promising approach is to train a quantization-aware accuracy predictor to quickly get the accuracy of the quantized model and feed it to the search engine to select the best fit. However, training this quantization-aware accuracy predictor requires collecting a large number of quantized pairs, which involves quantization-aware finetuning and thus is highly time-consuming. To tackle this challenge, we propose to transfer the knowledge from a full-precision (i.e., fp32) accuracy predictor to the quantization-aware (i.e., int8) accuracy predictor, which greatly improves the sample efficiency. Besides, collecting the dataset for the fp32 accuracy predictor only requires evaluating neural networks without any training cost by sampling from a pretrained once-for-all network, which is highly efficient. Extensive experiments on ImageNet demonstrate the benefits of our joint optimization approach. With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ. Compared to the separate optimization approach (ProxylessNAS+AMC+HAQ), APQ achieves 2.3% higher ImageNet accuracy while reducing GPU hours and CO2 emissions by orders of magnitude, pushing the frontier for green AI that is environmentally friendly. The code and video are publicly available.
[50] arXiv:2006.08495 [pdf]
Weighted Optimization: better generalization by smoother interpolation
We provide a rigorous analysis of how implicit bias towards smooth interpolations leads to low generalization error in the overparameterized setting. We provide the first case study of this connection through a random Fourier series model and weighted least squares. We then argue through this model and numerical experiments that normalization methods in deep learning such as weight normalization improve generalization in overparameterized neural networks by implicitly encouraging smooth interpolants.
[51] arXiv:2006.08486 [pdf]
Scientometric analysis and knowledge mapping of literature-based discovery (1986-2020)
Literature-based discovery (LBD) aims to discover valuable latent relationships between disparate sets of literatures. LBD research has evolved from an emerging area into a mature research field. Hence it is timely and necessary to summarize the LBD literature and scrutinize general bibliographic characteristics, current and future publication trends, and its intellectual structure. This paper presents the first inclusive scientometric overview of LBD research. We utilize a comprehensive scientometric approach incorporating CiteSpace to systematically analyze the literature on LBD from the last four decades (1986-2020). After manual cleaning, we retrieved a total of 409 documents from six bibliographic databases (Web of Science, Scopus, PubMed, IEEE Xplore, ACM Digital Library, and Springer Link) and two preprint servers (arXiv and bioRxiv). The results show that Thomas C. Rindflesch published the highest number of LBD papers, followed by Don R. Swanson. The United States plays a leading role in LBD research with the University of Chicago as the dominant institution. To go deeper, we also perform science mapping including cascading citation expansion. The knowledge base of LBD research has changed significantly since its inception, with emerging topics including deep learning and explainable artificial intelligence. The results indicate that LBD is still growing and evolving. Drawing on our insights, we now better understand the historical progress of LBD in the last 35 years and are able to improve publishing practices to contribute to the field in the future.
[52] arXiv:2006.08472 [pdf]
Physics informed deep learning for computational elastodynamics without labeled data
Numerical methods such as finite element have been flourishing in the past decades for modeling solid mechanics problems via solving governing partial differential equations (PDEs). A salient aspect that distinguishes these numerical methods is how they approximate the physical fields of interest. Physics-informed deep learning is a novel approach recently developed for modeling PDE solutions and shows promise to solve computational mechanics problems without using any labeled data. The philosophy behind it is to approximate the quantity of interest (e.g., PDE solution variables) by a deep neural network (DNN) and embed the physical law to regularize the network. To this end, training the network is equivalent to minimization of a well-designed loss function that contains the PDE residuals and initial/boundary conditions (I/BCs). In this paper, we present a physics-informed neural network (PINN) with mixed-variable output to model elastodynamics problems without resorting to labeled data, in which the I/BCs are imposed in a hard manner. In particular, both the displacement and stress components are taken as the DNN output, inspired by the hybrid finite element analysis, which largely improves the accuracy and trainability of the network. Since the conventional PINN framework augments all the residual loss components in a "soft" manner with Lagrange multipliers, the weakly imposed I/BCs cannot be well satisfied, especially when complex I/BCs are present. To overcome this issue, a composite scheme of DNNs is established based on multiple single DNNs such that the I/BCs can be satisfied forcibly in a "hard" manner. The proposed PINN framework is demonstrated on several numerical elasticity examples with different I/BCs, including both static and dynamic problems as well as wave propagation in truncated domains. Results show the promise of PINN in the context of computational mechanics applications.
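The difference between "soft" and "hard" imposition of I/BCs can be written out in a generic form. The notation below is an illustrative assumption (the symbols g, D, N, and the weights lambda are not taken from the paper), and the composite scheme in the paper may be constructed differently.

```latex
% Soft imposition: I/BC mismatches enter the loss as weighted penalty terms,
% so they are only approximately satisfied after training.
\mathcal{L}_{\text{soft}}(\theta)
  = \mathcal{L}_{\text{PDE}}(\theta)
  + \lambda_{\text{IC}}\,\mathcal{L}_{\text{IC}}(\theta)
  + \lambda_{\text{BC}}\,\mathcal{L}_{\text{BC}}(\theta)

% Hard imposition (one common construction): the output is built so that the
% condition holds for every value of the network parameters \theta.
% g matches the prescribed values on the boundary, and D vanishes there.
\hat{u}(x, t; \theta) = g(x, t) + D(x, t)\,\mathcal{N}(x, t; \theta),
\qquad D(x, t) = 0 \ \text{on the boundary}
\;\Longrightarrow\; \hat{u} = g \ \text{on the boundary}
```

With a construction of this kind, only the PDE residual remains in the loss, so no penalty weights for the I/BC terms need to be tuned.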
[53] arXiv:2006.08437 [pdf]
Depth Uncertainty in Neural Networks
Existing methods for estimating uncertainty in deep learning tend to require multiple forward passes, making them unsuitable for applications where computational resources are limited. To solve this, we perform probabilistic reasoning over the depth of neural networks. Different depths correspond to subnetworks which share weights and whose predictions are combined via marginalisation, yielding model uncertainty. By exploiting the sequential structure of feed-forward networks, we are able to both evaluate our training objective and make predictions with a single forward pass. We validate our approach on real-world regression and image classification tasks. Our approach provides uncertainty calibration, robustness to dataset shift, and accuracies competitive with more computationally expensive baselines.
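The marginalisation over depth described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the toy probabilities, and the depth posterior values are all assumptions made for the example.

```python
def marginal_prediction(per_depth_probs, depth_posterior):
    """Mix the class probabilities produced at each depth into one prediction.

    per_depth_probs[d][c] is p(y = c | x, depth = d) (hypothetical toy values below).
    depth_posterior[d] is the weight placed on depth d; the weights should sum to 1.
    """
    if len(per_depth_probs) != len(depth_posterior):
        raise ValueError("need exactly one probability vector per depth")
    num_classes = len(per_depth_probs[0])
    mixed = [0.0] * num_classes
    for probs, weight in zip(per_depth_probs, depth_posterior):
        for c in range(num_classes):
            mixed[c] += weight * probs[c]
    return mixed


# Three subnetwork depths, two classes (made-up numbers for illustration).
per_depth = [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]
posterior = [0.2, 0.5, 0.3]
prediction = marginal_prediction(per_depth, posterior)
```

Because every depth shares the earlier layers, the per-depth outputs can in principle be collected during one forward pass, which is the property the abstract highlights.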
[54] arXiv:2006.08391 [pdf]
Fast & Accurate Method for Bounding the Singular Values of Convolutional Layers with Application to Lipschitz Regularization
This paper tackles the problem of Lipschitz regularization of Convolutional Neural Networks. Lipschitz regularity is now established as a key property of modern deep learning with implications in training stability, generalization, robustness against adversarial examples, etc. However, computing the exact value of the Lipschitz constant of a neural network is known to be NP-hard. Recent attempts from the literature introduce upper bounds to approximate this constant that are either efficient but loose or accurate but computationally expensive. In this work, by leveraging the theory of Toeplitz matrices, we introduce a new upper bound for convolutional layers that is both tight and easy to compute. Based on this result we devise an algorithm to train Lipschitz regularized Convolutional Neural Networks.
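As background for why the singular values of a convolution are tractable at all, the sketch below handles the simplest possible case: a single-channel 1D convolution with circular (wrap-around) padding. In that case the layer is a circulant matrix, and its singular values are the magnitudes of the discrete Fourier transform of the zero-padded kernel. This is not the Toeplitz-based bound proposed in the paper; the function name and the example kernel are assumptions for illustration.

```python
import cmath


def circular_conv_spectral_norm(kernel, signal_length):
    """Largest singular value of a 1D single-channel circular convolution.

    The convolution matrix is circulant, so its singular values equal the
    magnitudes of the DFT coefficients of the zero-padded kernel.
    """
    if len(kernel) > signal_length:
        raise ValueError("kernel must not be longer than the signal")
    padded = list(kernel) + [0.0] * (signal_length - len(kernel))
    largest = 0.0
    for k in range(signal_length):
        coeff = sum(
            padded[n] * cmath.exp(-2j * cmath.pi * k * n / signal_length)
            for n in range(signal_length)
        )
        largest = max(largest, abs(coeff))
    return largest


# A two-tap averaging-style kernel applied to length-4 signals.
norm = circular_conv_spectral_norm([1.0, 1.0], 4)
```

For a linear layer, the largest singular value is its Lipschitz constant with respect to the Euclidean norm, which is why bounds of this kind are used for regularization.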
[55] arXiv:2006.08367 [pdf]
Tamil Vowel Recognition With Augmented MNIST-like Data Set
We report the generation of an MNIST [4] compatible data set [1] for Tamil vowels to enable building a classification DNN or other such ML/AI deep learning [2] models for Tamil OCR/handwriting applications. We report the capability of the 60,000-image grayscale, 28x28 pixel dataset to build a 4-layer CNN with 100,000+ parameters in TensorFlow that reaches 92% training accuracy and 82% cross-validation accuracy. We also report a top-1 classification accuracy of 70% and a top-2 classification accuracy of 92% on handwritten vowels for the same network.
[56] arXiv:2006.08357 [pdf]
CoDeNet: Algorithm-hardware Co-design for Deformable Convolution
Deploying deep learning models on embedded systems for computer vision tasks has been challenging due to limited compute resources and strict energy budgets. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects, and therefore require specialized convolutions to aggregate spatial information. To address this, recent work proposes dynamic deformable convolution to augment regular convolutions. Regular convolutions process a fixed grid of pixels across all the spatial locations in an image, while dynamic deformable convolution may access arbitrary pixels in the image, and the access pattern is input-dependent and varies per spatial location. These properties lead to inefficient memory accesses of inputs with existing hardware. In this work, we first investigate the overhead of the deformable convolution on embedded FPGA SoCs, and introduce a depthwise deformable convolution to reduce the total number of operations required. We then show the speed-accuracy tradeoffs for a set of algorithm modifications including irregular-access versus limited-range and fixed-shape. We evaluate these algorithmic changes with corresponding hardware optimizations. Results show a 1.36x and 9.76x speedup respectively for the full and depthwise deformable convolution on the embedded FPGA accelerator with minor accuracy loss on the object detection task. We then co-design an efficient network CoDeNet with the modified deformable convolution for object detection and quantize the network to 4-bit weights and 8-bit activations. Results show that our designs lie on the Pareto-optimal front of the latency-accuracy tradeoff for the object detection task on embedded FPGAs.
[57] arXiv:2006.08334 [pdf]
StackOverflow vs Kaggle: A Study of Developer Discussions About Data Science
Software developers are increasingly required to understand fundamental Data science (DS) concepts. Recently, the presence of machine learning (ML) and deep learning (DL) has dramatically increased in the development of user applications, whether they are leveraged through frameworks or implemented from scratch. These topics attract much discussion on online platforms. This paper conducts large-scale qualitative and quantitative experiments to study the characteristics of 197836 posts from StackOverflow and Kaggle. Latent Dirichlet Allocation topic modelling is used to extract twenty-four DS discussion topics. The main findings include that TensorFlow-related topics were most prevalent in StackOverflow, while meta discussion topics were the prevalent ones on Kaggle. StackOverflow tends to include lower-level troubleshooting, while Kaggle focuses on practicality and optimising leaderboard performance. In addition, across both communities, DS discussion is increasing at a dramatic rate. While TensorFlow discussion on StackOverflow is slowing, interest in Keras is rising. Finally, ensemble algorithms are the most mentioned ML/DL algorithms in Kaggle but are rarely discussed on StackOverflow. These findings can help educators and researchers to more effectively tailor and prioritise efforts in researching and communicating DS concepts towards different developer communities.
[58] arXiv:2006.08296 [pdf]
Deep-CAPTCHA: a deep learning based CAPTCHA solver for vulnerability assessment
[59] arXiv:2006.08254 [pdf]
Dermatologist vs Neural Network
Cancer, in general, is very deadly. Timely treatment of any cancer is the key to saving a life. Skin cancer is no exception. Thousands of skin cancer cases are registered every year all over the world, and 123,000 deadly melanoma cases have been detected in a single year. This huge number has been shown to result from the high amount of UV rays present in sunlight due to the degradation of the ozone layer. If not detected at an early stage, skin cancer can lead to the death of the patient. The unavailability of proper resources such as expert dermatologists, state-of-the-art testing facilities, and quick biopsy results has led researchers to develop technology that can solve the above problem. Deep learning is one such method that has offered extraordinary results. The convolutional neural network proposed in this study outperforms every pretrained model. We trained our model on the HAM10000 dataset, which offers 10015 images belonging to 7 classes of skin disease. The model we proposed gave an accuracy of 89%. This model can predict deadly melanoma skin cancer with great accuracy. Hopefully, this study can help save people's lives where proper dermatological resources are unavailable by bridging the gap using our proposed model.
[60] arXiv:2006.08228 [pdf]
Finding trainable sparse networks through Neural Tangent Transfer
Deep neural networks have dramatically transformed machine learning, but their memory and energy demands are substantial. The requirements of real biological neural networks are rather modest in comparison, and one feature that might underlie this austerity is their sparse connectivity. In deep learning, trainable sparse networks that perform well on a specific task are usually constructed using label-dependent pruning criteria. In this article, we introduce Neural Tangent Transfer, a method that instead finds trainable sparse networks in a label-free manner. Specifically, we find sparse networks whose training dynamics, as characterized by the neural tangent kernel, mimic those of dense networks in function space. Finally, we evaluate our label-agnostic approach on several standard classification tasks and show that the resulting sparse networks achieve higher classification performance while converging faster.
[61] arXiv:2006.08209 [pdf]
Multimodal fusion for sea level anomaly forecasting
The accumulated remote sensing data of altimeters and scatterometers have provided a new opportunity to forecast ocean states and improve knowledge of ocean/atmosphere exchanges. Few previous studies have focused on sea level anomaly (SLA) multi-step forecasting by multivariate deep learning for different modalities. In this paper, a novel multimodal fusion approach named MMFnet is used for SLA multi-step forecasting in the South China Sea (SCS). First, a grid forecasting network is trained by an improved Convolutional Long Short-Term Memory (ConvLSTM) network on daily multiple remote sensing data from 1993 to 2016. Then, an in-situ forecasting network is trained by an improved LSTM network, which is decomposed by the ensemble empirical mode decomposition (EEMD-LSTM), on real-time, in-situ and remote sensing data. Finally, the two single-modal networks are fused by an ocean data assimilation scheme. During the test period from 2017 to 2019, the average RMSE of MMFnet (single-modal ConvLSTM) is 4.03 cm (4.51 cm) and the 15th-day anomaly correlation coefficient is 0.78 (0.67). The performance of MMFnet is much higher than that of current state-of-the-art dynamical (HYCOM) and statistical (ConvLSTM, Persistence and daily Climatology) forecasting systems. Sensitivity experiments indicate that MMFnet, which adds CCMP SCAT products and OISST for SLA forecasting, has improved the forecast range over a week and can effectively produce 15-day SLA forecasts with reasonable accuracy. As an extension of the validation over the North Pacific Ocean, MMFnet can calculate the forecasting results in a few minutes, and we find good agreement in amplitude and distribution of SLA variability between MMFnet and other classical operational model products.
[62] arXiv:2006.08177 [pdf]
Dissimilarity Mixture Autoencoder for Deep Clustering
In this paper, we introduce the Dissimilarity Mixture Autoencoder (DMAE), a novel neural network model that uses a dissimilarity function to generalize a family of density estimation and clustering methods. It is formulated in such a way that it internally estimates the parameters of a probability distribution through gradient-based optimization. The proposed model can also leverage deep representation learning, since it consists of an encoder-decoder network that computes a probabilistic representation and is therefore straightforward to incorporate into deep learning architectures. Experimental evaluation was performed on image and text clustering benchmark datasets, showing that the method is competitive in terms of unsupervised classification accuracy and normalized mutual information. The source code to replicate the experiments is publicly available at this https URL
[63] arXiv:2006.08149 [pdf]
GNNGuard: Defending Graph Neural Networks against Adversarial Attacks
Deep learning methods for graphs achieve remarkable performance on many tasks. However, despite the proliferation of such methods and their success, recent findings indicate that small, unnoticeable perturbations of graph structure can catastrophically reduce performance of even the strongest and most popular Graph Neural Networks (GNNs). Here, we develop GNNGuard, a general defense approach against a variety of training-time attacks that perturb the discrete graph structure. GNNGuard can be straightforwardly incorporated into any GNN. Its core principle is to detect and quantify the relationship between the graph structure and node features, if one exists, and then exploit that relationship to mitigate negative effects of the attack. GNNGuard uses the network theory of homophily to learn how best to assign higher weights to edges connecting similar nodes while pruning edges between unrelated nodes. The revised edges then allow the underlying GNN to robustly propagate neural messages in the graph. GNNGuard introduces two novel components, the neighbor importance estimation and the layer-wise graph memory, and we show empirically that both components are necessary for a successful defense. Across five GNNs, three defense methods, and four datasets, including a challenging human disease graph, experiments show that GNNGuard outperforms existing defense approaches by 15.3% on average. Remarkably, GNNGuard can effectively restore the state-of-the-art performance of GNNs in the face of various adversarial attacks, including targeted and non-targeted attacks.
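The idea of up-weighting edges between similar nodes and pruning edges between unrelated ones can be illustrated with a small sketch. This is a simplified stand-in, not the GNNGuard algorithm: cosine similarity as the similarity measure, the fixed threshold `prune_below`, and the toy features are all assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def reweight_edges(edges, features, prune_below=0.5):
    """Return {edge: weight}, dropping edges whose endpoints look unrelated.

    prune_below is a hypothetical fixed threshold chosen for this illustration.
    """
    kept = {}
    for i, j in edges:
        similarity = cosine_similarity(features[i], features[j])
        if similarity >= prune_below:
            kept[(i, j)] = similarity
    return kept


# Nodes 0 and 1 have similar features; node 2 is unlike both.
features = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
edges = [(0, 1), (0, 2), (1, 2)]
weights = reweight_edges(edges, features)
```

In this toy graph only the edge between the two similar nodes survives, so a message-passing layer using `weights` would ignore the other two edges, for example if an attacker had inserted them.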
[64] arXiv:2006.08143 [pdf]
Anomalous Motion Detection on Highway Using Deep Learning
Research in visual anomaly detection draws much interest due to its applications in surveillance. Common datasets for evaluation are constructed using a stationary camera overlooking a region of interest. Previous research has shown promising results in detecting spatial as well as temporal anomalies in these settings. The advent of self-driving cars provides an opportunity to apply visual anomaly detection in a more dynamic application yet no dataset exists in this type of environment. This paper presents a new anomaly detection dataset - the Highway Traffic Anomaly (HTA) dataset - for the problem of detecting anomalous traffic patterns from dash cam videos of vehicles on highways. We evaluate state-of-the-art deep learning anomaly detection models and propose novel variations to these methods. Our results show that state-of-the-art models built for settings with a stationary camera do not translate well to a more dynamic environment. The proposed variations to these SoTA methods show promising results on the new HTA dataset.
[65] arXiv:2006.08129 [pdf]
Emotion Recognition in Audio and Video Using Deep Neural Networks
Humans are able to comprehend information from multiple domains, e.g., speech, text, and vision. With the advancement of deep learning technology, speech recognition has improved significantly. Recognizing emotion from speech is an important aspect, and with deep learning technology emotion recognition has improved in accuracy and latency. There are still many challenges to improving accuracy. In this work, we explore different neural networks to improve the accuracy of emotion recognition. Among the architectures explored, we find that a (CNN+RNN) + 3DCNN multi-model architecture, which processes audio spectrograms and corresponding video frames, gives an emotion prediction accuracy of 54.0% among 4 emotions and 71.75% among 3 emotions on the IEMOCAP [2] dataset.
[66] arXiv:2006.08003 [pdf]
CompressNet: Generative Compression at Extremely Low Bitrates
Compressing images at extremely low bitrates (< 0.1 bpp) has always been a challenging task since the quality of reconstruction significantly reduces due to the strong imposed constraint on the number of bits allocated for the compressed data. With the increasing need to transfer large amounts of images with limited bandwidth, compressing images to very low sizes is a crucial task. However, the existing methods are not effective at extremely low bitrates. To address this need, we propose a novel network called CompressNet which augments a Stacked Autoencoder with a Switch Prediction Network (SAE-SPN). This helps in the reconstruction of visually pleasing images at these low bitrates (< 0.1 bpp). We benchmark the performance of our proposed method on the Cityscapes dataset, evaluating over different metrics at extremely low bitrates to show that our method outperforms the other state-of-the-art. In particular, at a bitrate of 0.07, CompressNet achieves 22% lower Perceptual Loss and 55% lower Frechet Inception Distance (FID) compared to the deep learning SOTA methods.
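To give a sense of scale for bitrates below 0.1 bpp, the arithmetic can be worked through directly. The sketch below assumes a full-resolution Cityscapes frame of 2048x1024 pixels; whether the paper evaluates at that resolution is not stated in the abstract, and the function names are illustrative.

```python
def bit_budget(width, height, bits_per_pixel):
    """Total number of bits available for an image at a given bitrate (bpp)."""
    return width * height * bits_per_pixel


def bitrate_bpp(compressed_bytes, width, height):
    """Bitrate in bits per pixel of a compressed file of the given size."""
    return 8 * compressed_bytes / (width * height)


# Assumed image size: 2048 x 1024 pixels, at the 0.07 bpp quoted above.
budget_bits = bit_budget(2048, 1024, 0.07)
budget_kib = budget_bits / 8 / 1024

# For comparison, uncompressed 8-bit RGB uses 24 bits per pixel.
raw_kib = bit_budget(2048, 1024, 24) / 8 / 1024
```

Under these assumptions the whole image must fit in roughly 18 KiB, against 6144 KiB uncompressed, which is why reconstruction quality degrades so sharply in this regime.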
[67] arXiv:2006.07993 [pdf]
Road Mapping in Low Data Environments with OpenStreetMap
Roads are among the most essential components of any country's infrastructure. By facilitating the movement and exchange of people, ideas, and goods, they support economic and cultural activity both within and across local and international borders. A comprehensive, up-to-date mapping of the geographical distribution of roads and their quality thus has the potential to act as an indicator for broader economic development. Such an indicator has a variety of high-impact applications, particularly in the planning of rural development projects where up-to-date infrastructure information is not available. This work investigates the viability of high resolution satellite imagery and crowd-sourced resources like OpenStreetMap in the construction of such a mapping. We experiment with state-of-the-art deep learning methods to explore the utility of OpenStreetMap data in road classification and segmentation tasks. We also compare the performance of models in different mask occlusion scenarios as well as out-of-country domains. Our comparison raises important pitfalls to consider in image-based infrastructure classification tasks, and shows the need for local training data specific to regions of interest for reliable performance.
[68] arXiv:2006.07972 [pdf]
Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances
Sub-seasonal climate forecasting (SSF) focuses on predicting key climate variables such as temperature and precipitation in the 2-week to 2-month time scales. Skillful SSF would have immense societal value, in such areas as agricultural productivity, water resource management, transportation and aviation systems, and emergency planning for extreme weather events. However, SSF is considered more challenging than either weather prediction or even seasonal prediction. In this paper, we carefully study a variety of machine learning (ML) approaches for SSF over the US mainland. While atmosphere-land-ocean couplings and the limited amount of good quality data make it hard to apply black-box ML naively, we show that with carefully constructed feature representations, even linear regression models, e.g., Lasso, can be made to perform well. Among a broad suite of 10 ML approaches considered, gradient boosting performs the best, and deep learning (DL) methods show some promise with careful architecture choices. Overall, ML methods are able to outperform the climatological baseline, i.e., predictions based on the 30-year average at a given location and time. Further, based on studying feature importance, ocean (especially indices based on climatic oscillations such as El Nino) and land (soil moisture) covariates are found to be predictive, whereas atmospheric covariates are not considered helpful.
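The climatological baseline mentioned above is simple enough to sketch. The data layout (a dictionary keyed by location, year, and day of year), the function name, and the toy values are assumptions for illustration; the paper's exact averaging window and calendar handling may differ.

```python
from statistics import mean


def climatology_forecast(history, location, day_of_year, target_year, window=30):
    """Predict a value as the average over the preceding `window` years.

    history maps (location, year, day_of_year) -> observed value (hypothetical layout).
    Years with no observation for that location and day are skipped.
    """
    values = [
        history[(location, year, day_of_year)]
        for year in range(target_year - window, target_year)
        if (location, year, day_of_year) in history
    ]
    if not values:
        raise ValueError("no historical observations for this location and day")
    return mean(values)


# Toy record: one grid cell, day 180, values 0.0, 1.0, ..., 29.0 for 1990-2019.
history = {("cell_a", year, 180): float(year - 1990) for year in range(1990, 2020)}
baseline = climatology_forecast(history, "cell_a", 180, 2020)
```

Any learned model has to beat this kind of average to count as skillful, which is the comparison the abstract reports.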
[69] arXiv:2006.07934 [pdf]
Adversarial Attacks and Detection on Reinforcement Learning-Based Interactive Recommender Systems
Detecting adversarial attacks at an early stage poses significant challenges. We propose attack-agnostic detection on reinforcement learning-based interactive recommendation systems. We first craft adversarial examples to show their diverse distributions and then augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data. Finally, we study the attack strength and frequency of adversarial examples and evaluate our model on standard datasets with multiple crafting methods. Our extensive experiments show that most adversarial attacks are effective, and both attack strength and attack frequency impact the attack performance. The strategically-timed attack achieves comparable attack performance with only 1/3 to 1/2 of the attack frequency. Besides, our black-box detector trained with one crafting method generalizes over several crafting methods.
[70] arXiv:2006.07822 [pdf]
Proximal Mapping for Deep Regularization
Underpinning the success of deep learning are effective regularizations that allow a variety of priors in data to be modeled, for example, robustness to adversarial perturbations and correlations between multiple modalities. However, most regularizers are specified in terms of hidden layer outputs, which are not themselves optimization variables. In contrast to prevalent methods that optimize them indirectly through model weights, we propose inserting proximal mapping as a new layer to the deep network, which directly and explicitly produces well regularized hidden layer outputs. The resulting technique is shown to be well connected to kernel warping and dropout, and novel algorithms were developed for robust temporal learning and multiview modeling, both outperforming state-of-the-art methods.
[71] arXiv:2006.07810 [pdf]
Disentanglement for Discriminative Visual Recognition
Recent successes of deep learning-based recognition rely on maintaining the content related to the main-task label. However, how to explicitly dispel the noisy signals for better generalization in a controllable manner remains an open issue. For instance, various factors such as identity-specific attributes, pose, illumination and expression affect the appearance of face images. Disentangling the identity-specific factors is potentially beneficial for facial expression recognition (FER). This chapter systematically summarizes the detrimental factors as task-relevant/irrelevant semantic variations and unspecified latent variation. In this chapter, these problems are cast as either a deep metric learning problem or an adversarial minimax game in the latent space. For the former choice, a generalized adaptive (N+M)-tuplet clusters loss function together with the identity-aware hard-negative mining and online positive mining scheme can be used for identity-invariant FER. Better FER performance can be achieved by combining the deep metric loss and softmax loss in a unified framework with two fully connected layer branches via joint optimization. For the latter solution, it is possible to equip an end-to-end conditional adversarial network with the ability to decompose an input sample into three complementary parts. The discriminative representation inherits the desired invariance property guided by prior knowledge of the task, and is marginally independent of the task-relevant/irrelevant semantic and latent variations. The framework achieves top performance on a series of tasks, including lighting, makeup, disguise-tolerant face recognition and facial attributes recognition. This chapter systematically summarizes the popular and practical solutions for disentanglement to achieve more discriminative visual recognition.
[72] arXiv:2006.07794 [pdf]
PatchUp: A Regularization Technique for Convolutional Neural Networks
Large capacity deep learning models are often prone to a high generalization gap when trained with a limited amount of labeled training data. A recent class of methods to address this problem uses various ways to construct a new training sample by mixing a pair (or more) of training samples. We propose PatchUp, a hidden state block-level regularization technique for Convolutional Neural Networks (CNNs), that is applied on selected contiguous blocks of feature maps from a random pair of samples. Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches like Mixup and CutMix. Moreover, since we are mixing the contiguous block of features in the hidden space, which has more dimensions than the input space, we obtain more diverse samples for training towards different dimensions. Our experiments on CIFAR-10, CIFAR-100, and SVHN datasets with PreactResnet18, PreactResnet34, and WideResnet-28-10 models show that PatchUp improves upon, or equals, the performance of current state-of-the-art regularizers for CNNs. We also show that PatchUp can provide better generalization to affine transformations of samples and is more robust against adversarial attacks.
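The core operation, mixing a contiguous block of feature maps between a pair of samples, can be sketched on plain nested lists. This is a deliberately reduced illustration, not the PatchUp method itself: it exchanges one fixed square block on a single 2D feature map, while block selection, handling of multiple channels, and the corresponding treatment of labels are left out. The function name and arguments are assumptions.

```python
import copy


def swap_block(feature_a, feature_b, top, left, size):
    """Return copies of two 2D feature maps with one contiguous square block exchanged.

    top, left, and size locate the block; here they are fixed by the caller,
    whereas a training-time regularizer would choose them at random.
    """
    mixed_a = copy.deepcopy(feature_a)
    mixed_b = copy.deepcopy(feature_b)
    for row in range(top, top + size):
        for col in range(left, left + size):
            mixed_a[row][col] = feature_b[row][col]
            mixed_b[row][col] = feature_a[row][col]
    return mixed_a, mixed_b


# Two 4x4 feature maps from a pair of samples: all zeros and all ones.
zeros = [[0.0] * 4 for _ in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
mixed_zeros, mixed_ones = swap_block(zeros, ones, top=1, left=1, size=2)
```

Because the exchange happens on hidden feature maps and not on input pixels, the same routine could be applied after any convolutional stage of a network.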
[73] arXiv:2006.07771 [pdf]
Numerical Simulation of Exchange Option with Finite Liquidity: Controlled Variate Model
In this paper we develop numerical pricing methodologies for European style Exchange Options written on a pair of correlated assets, in a market with finite liquidity. In contrast to the standard multi-asset Black-Scholes framework, trading in our market model has a direct impact on the asset's price. The price impact is incorporated into the dynamics of the first asset through a specific trading strategy, as in the large trader liquidity model. A two-dimensional Milstein scheme is implemented to simulate the pair of asset prices. The option value is numerically estimated by Monte Carlo with the Margrabe option as controlled variate. The time complexity of these numerical schemes is included. Finally, we provide a deep learning framework to implement this model effectively in a production environment.
[74] arXiv:2006.07767 [pdf]
MixMOOD: A systematic approach to class distribution mismatch in semi-supervised learning using deep dataset dissimilarity measures
In this work, we propose MixMOOD - a systematic approach to mitigate the effect of class distribution mismatch in semi-supervised deep learning (SSDL) with MixMatch. This work is divided into two components: (i) an extensive out-of-distribution (OOD) ablation test bed for SSDL and (ii) a quantitative unlabelled dataset selection heuristic referred to as MixMOOD. In the first part, we analyze the sensitivity of MixMatch accuracy under 90 different distribution mismatch scenarios across three multi-class classification tasks. These are designed to systematically understand how OOD unlabelled data affects MixMatch performance. In the second part, we propose an efficient and effective method, called deep dataset dissimilarity measures (DeDiMs), to compare labelled and unlabelled datasets. The proposed DeDiMs are quick to evaluate and model agnostic. They use the feature space of a generic Wide-ResNet and can be applied prior to learning. Our test results reveal that supposed semantic similarity between labelled and unlabelled data is not a good heuristic for unlabelled data selection. In contrast, the strong correlation between MixMatch accuracy and the proposed DeDiMs allows us to quantitatively rank different unlabelled datasets ante hoc according to expected MixMatch accuracy. This is what we call MixMOOD. Furthermore, we argue that the MixMOOD approach can help standardize the evaluation of different semi-supervised learning techniques under real-world scenarios involving out-of-distribution data.
[75] arXiv:2006.07755 [pdf]
Recurrent Distillation based Crowd Counting
In recent years, with the progress of deep learning technologies, crowd counting has developed rapidly. In this work, we propose a simple yet effective crowd counting framework that is able to achieve state-of-the-art performance on various crowded scenes. In particular, we first introduce a perspective-aware density map generation method that produces ground-truth density maps from point annotations, so that a crowd counting model trained on them performs better than one trained with prior density map generation techniques. Besides, leveraging our density map generation method, we propose an iterative distillation algorithm to progressively enhance our model with identical network structures, without significantly sacrificing the dimension of the output density maps. In experiments, we demonstrate that, with our simple convolutional neural network architecture strengthened by our proposed training algorithm, our model is able to outperform or be comparable with the state-of-the-art methods. Furthermore, we also evaluate our density map generation approach and distillation algorithm in ablation studies.
[76] arXiv:2006.07744 [pdf]
Exploiting the ConvLSTM: Human Action Recognition using Raw Depth Video-Based Recurrent Neural Networks
As in many other fields, deep learning has become the main approach in most computer vision applications, such as scene understanding, object recognition, computer-human interaction or human action recognition (HAR). Research efforts within HAR have mainly focused on how to efficiently extract and process both spatial and temporal dependencies of video sequences. In this paper, we propose and compare two neural networks based on the convolutional long short-term memory unit, namely ConvLSTM, with differences in the architecture and the long-term learning strategy. The former uses a video-length adaptive input data generator (stateless), whereas the latter explores the stateful ability of general recurrent neural networks, applied to the particular case of HAR. This stateful property allows the model to accumulate discriminative patterns from previous frames without compromising computer memory. Experimental results on the large-scale NTU RGB+D dataset show that the proposed models achieve competitive recognition accuracies with lower computational cost compared with state-of-the-art methods and prove that, in the particular case of videos, the rarely used stateful mode of recurrent neural networks significantly improves the accuracy obtained with the standard mode. The recognition accuracies obtained are 75.26% (CS) and 75.45% (CV) for the stateless model, with an average time consumption per video of 0.21 s, and 80.43% (CS) and 79.91% (CV) with 0.89 s for the stateful version.
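The difference between stateless and stateful processing can be sketched with a toy scalar recurrent cell. This is only an illustration of the general idea, not the ConvLSTM models from the paper. The helper names `run_rnn` and `chunked`, the weights, and the input values are assumptions.

```python
import math


def run_rnn(frames, h0=0.0, w_in=0.5, w_rec=0.9):
    """Toy scalar recurrent cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = h0
    for x in frames:
        h = math.tanh(w_in * x + w_rec * h)
    return h


def chunked(frames, size):
    """Split a frame sequence into consecutive chunks of at most `size`."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]


video = [0.1, 0.4, -0.2, 0.8, 0.3, -0.5, 0.9, 0.2]  # one value per frame

# Reference: the whole video processed in a single pass.
full = run_rnn(video)

# Stateful: process short chunks, carrying the hidden state across them.
h = 0.0
for chunk in chunked(video, 3):
    h = run_rnn(chunk, h0=h)
stateful = h

# Stateless: the hidden state is reset for every chunk, so the final state
# only reflects the last chunk.
stateless = run_rnn(chunked(video, 3)[-1])

print(full, stateful, stateless)
```

Carrying the state across chunks reproduces the full-sequence result while only one short chunk is held in memory at a time, which is the memory argument made for the stateful mode.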
[77] arXiv:2006.07743 [pdf]
3DFCNN: Real-Time Action Recognition using 3D Deep Neural Networks with Raw Depth Information
Human action recognition is a fundamental task in artificial vision that has gained great importance in recent years due to its multiple applications in different areas, such as the study of human behavior, security or video surveillance. In this context, this paper describes an approach for real-time human action recognition from raw depth image sequences provided by an RGB-D camera. The proposal is based on a 3D fully convolutional neural network, named 3DFCNN, which automatically encodes spatio-temporal patterns from depth sequences without any costly pre-processing. Furthermore, the described 3D-CNN allows automatic feature extraction and action classification from the spatial and temporal encoded information of depth sequences. The use of depth data ensures that action recognition is carried out while protecting people's privacy, since their identities cannot be recognized from these data. 3DFCNN has been evaluated and its results compared to those from other state-of-the-art methods on three widely used datasets with different characteristics (resolution, sensor type, number of views, camera location, etc.). The obtained results validate the proposal, showing that it outperforms several state-of-the-art approaches based on classical computer vision techniques. Furthermore, it achieves action recognition accuracy comparable to deep learning based state-of-the-art methods at a lower computational cost, which allows its use in real-time applications.
[78] arXiv:2006.07721 [pdf]
Beyond Random Matrix Theory for Deep Networks
We investigate whether the Wigner semi-circle and Marcenko-Pastur distributions, often used for deep neural network theoretical analysis, match empirically observed spectral densities. We find that, even allowing for outliers, the observed spectral shapes strongly deviate from such theoretical predictions. This raises major questions about the usefulness of these models in deep learning. We further show that theoretical results, such as the layered nature of critical points, are strongly dependent on the use of the exact form of these limiting spectral densities. We consider two new classes of matrix ensembles: random Wigner/Wishart ensemble products and percolated Wigner/Wishart ensembles, both of which better match observed spectra. They also give large discrete spectral peaks at the origin, providing a theoretical explanation for the observation that various optima can be connected by one-dimensional paths of low loss values. We further show that, in the case of a random matrix product, the weight of the discrete spectral component at $0$ depends on the ratio of the dimensions of the weight matrices.
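The dependence of the spectral weight at $0$ on the dimension ratio can be illustrated with elementary linear algebra: a product of an n x m and an m x n matrix has rank at most m, so at least a fraction 1 - m/n of its singular values are exactly zero. The sketch below checks this for Gaussian matrices. It does not reproduce the paper's ensembles or its precise result. The helper names `matmul`, `rank` and `gaussian` and the dimensions are assumptions.

```python
import random


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r


def gaussian(rows, cols, rng):
    """Matrix with independent standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, inner = 12, 4  # outer and inner dimensions of the product
product = matmul(gaussian(n, inner, rng), gaussian(inner, n, rng))

# Fraction of singular values that are exactly zero: 1 - rank / n.
zero_fraction = 1 - rank(product) / n
print(zero_fraction, 1 - inner / n)
```

Shrinking `inner` relative to `n` increases the zero fraction, which mirrors the qualitative statement that the weight at the origin is controlled by the ratio of the weight-matrix dimensions.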
[79] arXiv:2006.07590 [pdf]
Missed calls, Automated Calls and Health Support: Using AI to improve maternal health outcomes by increasing program engagement
India accounts for 11% of maternal deaths globally; a woman there dies in childbirth every fifteen minutes. Lack of access to preventive care information is a significant problem contributing to high maternal morbidity and mortality numbers, especially in low-income households. We work with ARMMAN, a non-profit based in India, to further the use of call-based information programs by identifying early on the women who might not engage with these programs, which are proven to affect health parameters positively. We analyzed anonymized call records of over 300,000 women registered in an awareness program created by ARMMAN that uses cellphone calls to regularly disseminate health-related information. We built robust deep learning based models to predict short-term and long-term dropout risk from call logs and beneficiaries' demographic information. Our model performs 13% better than competitive baselines for short-term forecasting and 7% better for long-term forecasting. We also discuss the applicability of this method in the real world through a pilot validation that uses our method to perform targeted interventions.
[80] arXiv:2006.07489 [pdf]
Multispectral Biometrics System Framework: Application to Presentation Attack Detection
In this work, we present a general framework for building a biometrics system capable of capturing multispectral data from a series of sensors synchronized with active illumination sources. The framework unifies the system design for different biometric modalities and its realization on face, finger and iris data is described in detail. To the best of our knowledge, the presented design is the first to employ such a diverse set of electromagnetic spectrum bands, ranging from visible to long-wave-infrared wavelengths, and is capable of acquiring large volumes of data in seconds. Having performed a series of data collections, we run a comprehensive analysis on the captured data using a deep-learning classifier for presentation attack detection. Our study follows a data-centric approach attempting to highlight the strengths and weaknesses of each spectral band at distinguishing live from fake samples.
