[1] arXiv:2006.09977 [pdf]
A novel sentence embedding based topic detection method for micro-blog
Topic detection is a challenging task, especially without knowing the exact number of topics. In this paper, we present a novel approach based on neural network to detect topics in the micro-blogging dataset. We use an unsupervised neural sentence embedding model to map the blogs to an embedding space. Our model is a weighted power mean word embedding model, and the weights are calculated by attention mechanism. Experimental result shows our embedding method performs better than baselines in sentence clustering. In addition, we propose an improved clustering algorithm referred as relationship-aware DBSCAN (RADBSCAN). It can discover topics from a micro-blogging dataset, and the topic number depends on dataset character itself. Moreover, in order to solve the problem of parameters sensitive, we take blog forwarding relationship as a bridge of two independent clusters. Finally, we validate our approach on a dataset from sina micro-blog. The result shows that we can detect all the topics successfully and extract keywords in each topic.
[2] arXiv:2006.09963 [pdf]
GCC Graph Contrastive Coding for Graph Neural Network Pre-Training
Graph representation learning has emerged as a powerful technique for real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, graph classification, and link prediction. However, prior arts on graph representation learning focus on domain specific problems and train a dedicated model for each graph, which is usually non-transferable to out-of-domain data. Inspired by recent advances in pre-training from natural language processing and computer vision, we design Graph Contrastive Coding (GCC) -- an unsupervised graph representation learning framework -- to capture the universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph-level instance discrimination in and across networks and leverage contrastive learning to empower the model to learn the intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve competitive or better performance to its task-specific trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm presents great potential for graph representation learning.
[3] arXiv:2006.09815 [pdf]
A Deep Neural Network for Audio Classification with a Classifier Attention Mechanism
Audio classification is considered as a challenging problem in pattern recognition. Recently, many algorithms have been proposed using deep neural networks. In this paper, we introduce a new attention-based neural network architecture called Classifier-Attention-Based Convolutional Neural Network (CAB-CNN). The algorithm uses a newly designed architecture consisting of a list of simple classifiers and an attention mechanism as a classifier selector. This design significantly reduces the number of parameters required by the classifiers and thus their complexities. In this way, it becomes easier to train the classifiers and achieve a high and steady performance. Our claims are corroborated by the experimental results. Compared to the state-of-the-art algorithms, our algorithm achieves more than 10% improvements on all selected test scores.
[4] arXiv:2006.09662 [pdf]
MetaSDF Meta-learning Signed Distance Functions
Neural implicit shape representations are an emerging paradigm that offers many potential benefits over conventional discrete representations, including memory efficiency at a high spatial resolution. Generalizing across shapes with such neural implicit representations amounts to learning priors over the respective function space and enables geometry reconstruction from partial or noisy observations. Existing generalization methods rely on conditioning a neural network on a low-dimensional latent code that is either regressed by an encoder or jointly optimized in the auto-decoder framework. Here, we formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task. We demonstrate that this approach performs on par with auto-decoder based approaches while being an order of magnitude faster at test-time inference. We further demonstrate that the proposed gradient-based method outperforms encoder-decoder based methods that leverage pooling-based set encoders.
[5] arXiv:2006.09597 [pdf]
Cross-Correlated Attention Networks for Person Re-Identification
Deep neural networks need to make robust inference in the presence of occlusion, background clutter, pose and viewpoint variations -- to name a few -- when the task of person re-identification is considered. Attention mechanisms have recently proven to be successful in handling the aforementioned challenges to some degree. However previous designs fail to capture inherent inter-dependencies between the attended features; leading to restricted interactions between the attention blocks. In this paper, we propose a new attention module called Cross-Correlated Attention (CCA); which aims to overcome such limitations by maximizing the information gain between different attended regions. Moreover, we also propose a novel deep network that makes use of different attention mechanisms to learn robust and discriminative representations of person images. The resulting model is called the Cross-Correlated Attention Network (CCAN). Extensive experiments demonstrate that the CCAN comfortably outperforms current state-of-the-art algorithms by a tangible margin.
[6] arXiv:2006.09590 [pdf]
Deep Learning with Functional Inputs
We present a methodology for integrating functional data into deep densely connected feed-forward neural networks. The model is defined for scalar responses with multiple functional and scalar covariates. A by-product of the method is a set of dynamic functional weights that can be visualized during the optimization process. This visualization leads to greater interpretability of the relationship between the covariates and the response relative to conventional neural networks. The model is shown to perform well in a number of contexts including prediction of new data and recovery of the true underlying functional weights; these results were confirmed through real applications and simulation studies. A forthcoming R package is developed on top of a popular deep learning library (Keras) allowing for general use of the approach.
[7] arXiv:2006.09545 [pdf]
Neural Optimal Control for Representation Learning
The intriguing connections recently established between neural networks and dynamical systems have invited deep learning researchers to tap into the well-explored principles of differential calculus. Notably, the adjoint sensitivity method used in neural ordinary differential equations (Neural ODEs) has cast the training of neural networks as a control problem in which neural modules operate as continuous-time homeomorphic transformations of features. Typically, these methods optimize a single set of parameters governing the dynamical system for the whole data set, forcing the network to learn complex transformations that are functionally limited and computationally heavy. Instead, we propose learning a data-conditioned distribution of \emph{optimal controls} over the network dynamics, emulating a form of input-dependent fast neural plasticity. We describe a general method for training such models as well as convergence proofs assuming mild hypotheses about the ODEs and show empirically that this method leads to simpler dynamics and reduces the computational cost of Neural ODEs. We evaluate this approach for unsupervised image representation learning; our new "functional" auto-encoding model with ODEs, AutoencODE, achieves state-of-the-art image reconstruction quality on CIFAR-10, and exhibits substantial improvements in unsupervised classification over existing auto-encoding models.
[8] arXiv:2006.09543 [pdf]
Data Driven Control with Learned Dynamics Model-Based versus Model-Free Approach
This paper compares two different types of data-driven control methods, representing model-based and model-free approaches. One is a recently proposed method - Deep Koopman Representation for Control (DKRC), which utilizes a deep neural network to map an unknown nonlinear dynamical system to a high-dimensional linear system, which allows for employing state-of-the-art control strategy. The other one is a classic model-free control method based on an actor-critic architecture - Deep Deterministic Policy Gradient (DDPG), which has been proved to be effective in various dynamical systems. The comparison is carried out in OpenAI Gym, which provides multiple control environments for benchmark purposes. Two examples are provided for comparison, i.e., classic Inverted Pendulum and Lunar Lander Continuous Control. From the results of the experiments, we compare these two methods in terms of control strategies and the effectiveness under various initialization conditions. We also examine the learned dynamic model from DKRC with the analytical model derived from the Euler-Lagrange Linearization method, which demonstrates the accuracy in the learned model for unknown dynamics from a data-driven sample-efficient approach.
[9] arXiv:2006.09437 [pdf]
A Study of Compositional Generalization in Neural Models
Compositional and relational learning is a hallmark of human intelligence, but one which presents challenges for neural models. One difficulty in the development of such models is the lack of benchmarks with clear compositional and relational task structure on which to systematically evaluate them. In this paper, we introduce an environment called ConceptWorld, which enables the generation of images from compositional and relational concepts, defined using a logical domain specific language. We use it to generate images for a variety of compositional structures 2x2 squares, pentominoes, sequences, scenes involving these objects, and other more complex concepts. We perform experiments to test the ability of standard neural architectures to generalize on relations with compositional arguments as the compositional depth of those arguments increases and under substitution. We compare standard neural networks such as MLP, CNN and ResNet, as well as state-of-the-art relational networks including WReN and PrediNet in a multi-class image classification setting. For simple problems, all models generalize well to close concepts but struggle with longer compositional chains. For more complex tests involving substitutivity, all models struggle, even with short chains. In highlighting these difficulties and providing an environment for further experimentation, we hope to encourage the development of models which are able to generalize effectively in compositional, relational domains.
[10] arXiv:2006.09410 [pdf]
Fast Correlated-Photon Imaging Enhanced by Deep Learning
Correlated photon pairs, carrying strong quantum correlations, have been harnessed to bring quantum advantages to various fields from biological imaging to range finding. Such inherent non-classical properties support extracting more valid signals to build photon-limited images even in low flux-level, where the shot noise becomes dominant as light source decreases to single-photon level. Optimization by numerical reconstruction algorithms is possible but require thousands of photon-sparse frames, thus unavailable in real time. Here, we present an experimental fast correlated-photon imaging enhanced by deep learning, showing an intelligent computational strategy to discover deeper structure in big data. Convolutional neural network is found being able to efficiently solve image inverse problems associated with strong shot noise and background noise (electronic noise, scattered light). Our results fill the key gap in incompatibility between imaging speed and image quality by pushing low-light imaging technique to the regime of real-time and single-photon level, opening up an avenue to deep leaning-enhanced quantum imaging for real-life applications.
[11] arXiv:2006.09355 [pdf]
A Note on the Global Convergence of Multilayer Neural Networks in the Mean Field Regime
In a recent work, we introduced a rigorous framework to describe the mean field limit of the gradient-based learning dynamics of multilayer neural networks, based on the idea of a neuronal embedding. There we also proved a global convergence guarantee for three-layer (as well as two-layer) networks using this framework. In this companion note, we point out that the insights in our previous work can be readily extended to prove a global convergence guarantee for multilayer networks of any depths. Unlike our previous three-layer global convergence guarantee that assumes i.i.d. initializations, our present result applies to a type of correlated initialization. This initialization allows to, at any finite training time, propagate a certain universal approximation property through the depth of the neural network. To achieve this effect, we introduce a bidirectional diversity condition.
[12] arXiv:2006.09322 [pdf]
We present a novel domain adaptation framework that uses morphologic segmentation to translate images from arbitrary input domains (real and synthetic) into a uniform output domain. Our framework is based on an established image-to-image translation pipeline that allows us to first transform the input image into a generalized representation that encodes morphology and semantics - the edge-plus-segmentation map (EPS) - which is then transformed into an output domain. Images transformed into the output domain are photo-realistic and free of artifacts that are commonly present across different real (e.g. lens flare, motion blur, etc.) and synthetic (e.g. unrealistic textures, simplified geometry, etc.) data sets. Our goal is to establish a preprocessing step that unifies data from multiple sources into a common representation that facilitates training downstream tasks in computer vision. This way, neural networks for existing tasks can be trained on a larger variety of training data, while they are also less affected by overfitting to specific data sets. We showcase the effectiveness of our approach by qualitatively and quantitatively evaluating our method on four data sets of simulated and real data of urban scenes. Additional results can be found on the project website available at this http URL .
[13] arXiv:2006.09289 [pdf]
Isometric Autoencoders
High dimensional data is often assumed to be concentrated near a low-dimensional manifold. Autoencoders (AE) is a popular technique to learn representations of such data by pushing it through a neural network with a low dimension bottleneck while minimizing a reconstruction error. Using high capacity AE often leads to a large collection of minimizers, many of which represent a low dimensional manifold that fits the data well but generalizes poorly. Two sources of bad generalization are extrinsic, where the learned manifold possesses extraneous parts that are far from the data; and intrinsic, where the encoder and decoder introduce arbitrary distortion in the low dimensional parameterization. An approach taken to alleviate these issues is to add a regularizer that favors a particular solution; common regularizers promote sparsity, small derivatives, or robustness to noise. In this paper, we advocate an isometry (ie, distance preserving) regularizer. Specifically, our regularizer encourages (i) the decoder to be an isometry; and (ii) the encoder to be a pseudo-isometry, where pseudo-isometry is an extension of an isometry with an orthogonal projection operator. In a nutshell, (i) preserves all geometric properties of the data such as volume, length, angle, and probability density. It fixes the intrinsic degree of freedom since any two isometric decoders to the same manifold will differ by a rigid motion. (ii) Addresses the extrinsic degree of freedom by minimizing derivatives in orthogonal directions to the manifold and hence disfavoring complicated manifold solutions. Experimenting with the isometry regularizer on dimensionality reduction tasks produces useful low-dimensional data representations, while incorporating it in AE models leads to an improved generalization.
[14] arXiv:2006.09205 [pdf]
Visual Identification of Individual Holstein Friesian Cattle via Deep Metric Learning
Holstein Friesian cattle exhibit individually-characteristic black and white coat patterns visually akin to those arising from Turing's reaction-diffusion systems. This work takes advantage of these natural markings in order to automate visual detection and biometric identification of individual Holstein Friesians via convolutional neural networks and deep metric learning techniques. Using agriculturally relevant top-down imaging, we present methods for the detection, localisation, and identification of individual Holstein Friesians in an open herd setting, i.e. where changes in the herd do not require system re-training. We propose the use of SoftMax-based reciprocal triplet loss to address the identification problem and evaluate the techniques in detail against fixed herd paradigms. We find that deep metric learning systems show strong performance even under conditions where cattle unseen during system training are to be identified and re-identified - achieving 98.2% accuracy when trained on just half of the population. This work paves the way for facilitating the visual non-intrusive monitoring of cattle applicable to precision farming for automated health and welfare monitoring and to veterinary research in behavioural analysis, disease outbreak tracing, and more.
[15] arXiv:2006.09204 [pdf]
PlumeNet Large-Scale Air Quality Forecasting Using A Convolutional LSTM Network
This paper presents an engine able to forecast jointly the concentrations of the main pollutants harming people's health nitrogen dioxyde (NO2), ozone (O3) and particulate matter (PM2.5 and PM10, which are respectively the particles whose diameters are below 2.5 um and 10 um respectively). The forecasts are performed on a regular grid (the results presented in the paper are produced with a 0.5° resolution grid over Europe and the United States) with a neural network whose architecture includes convolutional LSTM blocks. The engine is fed with the most recent air quality monitoring stations measures available, weather forecasts as well as air quality physical and chemical model (AQPCM) outputs. The engine can be used to produce air quality forecasts with long time horizons, and the experiments presented in this paper show that the 4 days forecasts beat very significantly simple benchmarks. A valuable advantage of the engine is that it does not need much computing power the forecasts can be built in a few minutes on a standard GPU. Thus, they can be updated very frequently, as soon as new air quality measures are available (generally every hour), which is not the case of AQPCMs traditionally used for air quality forecasting. The engine described in this paper relies on the same principles as a prediction engine deployed and used by Plume Labs in several products aiming at providing air quality data to individuals and businesses.
[16] arXiv:2006.09141 [pdf]
Improving accuracy and speeding up Document Image Classification through parallel systems
This paper presents a study showing the benefits of the EfficientNet models compared with heavier Convolutional Neural Networks (CNNs) in the Document Classification task, essential problem in the digitalization process of institutions. We show in the RVL-CDIP dataset that we can improve previous results with a much lighter model and present its transfer learning capabilities on a smaller in-domain dataset such as Tobacco3482. Moreover, we present an ensemble pipeline which is able to boost solely image input by combining image model predictions with the ones generated by BERT model on extracted text by OCR. We also show that the batch size can be effectively increased without hindering its accuracy so that the training process can be sped up by parallelizing throughout multiple GPUs, decreasing the computational time needed. Lastly, we expose the training performance differences between PyTorch and Tensorflow Deep Learning frameworks.
[17] arXiv:2006.09107 [pdf]
Learning from Demonstration with Weakly Supervised Disentanglement
Robotic manipulation tasks, such as wiping with a soft sponge, require control from multiple rich sensory modalities. Human-robot interaction, aimed at teaching robots, is difficult in this setting as there is potential for mismatch between human and machine comprehension of the rich data streams. We treat the task of interpretable learning from demonstration as an optimisation problem over a probabilistic generative model. To account for the high-dimensionality of the data, a high-capacity neural network is chosen to represent the model. The latent variables in this model are explicitly aligned with high-level notions and concepts that are manifested in a set of demonstrations. We show that such alignment is best achieved through the use of labels from the end user, in an appropriately restricted vocabulary, in contrast to the conventional approach of the designer picking a prior over the latent variables. Our approach is evaluated in the context of a table-top robot manipulation task performed by a PR2 robot -- that of dabbing liquids with a sponge (forcefully pressing a sponge and moving it along a surface). The robot provides visual information, arm joint positions and arm joint efforts. We have made videos of the task and data available - see supplementary materials at this https URL
[18] arXiv:2006.09091 [pdf]
Flatness is a False Friend
Hessian based measures of flatness, such as the trace, Frobenius and spectral norms, have been argued, used and shown to relate to generalisation. In this paper we demonstrate that for feed forward neural networks under the cross entropy loss, we would expect low loss solutions with large weights to have small Hessian based measures of flatness. This implies that solutions obtained using $L2$ regularisation should in principle be sharper than those without, despite generalising better. We show this to be true for logistic regression, multi-layer perceptrons, simple convolutional, pre-activated and wide residual networks on the MNIST and CIFAR-$100$ datasets. Furthermore, we show that for adaptive optimisation algorithms using iterate averaging, on the VGG-$16$ network and CIFAR-$100$ dataset, achieve superior generalisation to SGD but are $30 \times$ sharper. This theoretical finding, along with experimental results, raises serious questions about the validity of Hessian based sharpness measures in the discussion of generalisation. We further show that the Hessian rank can be bounded by the a constant times number of neurons multiplied by the number of classes, which in practice is often a small fraction of the network parameters. This explains the curious observation that many Hessian eigenvalues are either zero or very near zero which has been reported in the literature.
[19] arXiv:2006.08947 [pdf]
SPLASH Learnable Activation Functions for Improving Accuracy and Adversarial Robustness
We introduce SPLASH units, a class of learnable activation functions shown to simultaneously improve the accuracy of deep neural networks while also improving their robustness to adversarial attacks. SPLASH units have both a simple parameterization and maintain the ability to approximate a wide range of non-linear functions. SPLASH units are 1) continuous; 2) grounded (f(0) = 0); 3) use symmetric hinges; and 4) the locations of the hinges are derived directly from the data (i.e. no learning required). Compared to nine other learned and fixed activation functions, including ReLU and its variants, SPLASH units show superior performance across three datasets (MNIST, CIFAR-10, and CIFAR-100) and four architectures (LeNet5, All-CNN, ResNet-20, and Network-in-Network). Furthermore, we show that SPLASH units significantly increase the robustness of deep neural networks to adversarial attacks. Our experiments on both black-box and open-box adversarial attacks show that commonly-used architectures, namely LeNet5, All-CNN, ResNet-20, and Network-in-Network, can be up to 31% more robust to adversarial attacks by simply using SPLASH units instead of ReLUs.
[20] arXiv:2006.08933 [pdf]
Plug-and-Play Anomaly Detection with Expectation Maximization Filtering
Anomaly detection in crowds enables early rescue response. A plug-and-play smart camera for crowd surveillance has numerous constraints different from typical anomaly detection the training data cannot be used iteratively; there are no training labels; and training and classification needs to be performed simultaneously. We tackle all these constraints with our approach in this paper. We propose a Core Anomaly-Detection (CAD) neural network which learns the motion behavior of objects in the scene with an unsupervised method. On average over standard datasets, CAD with a single epoch of training shows a percentage increase in Area Under the Curve (AUC) of 4.66% and 4.9% compared to the best results with convolutional autoencoders and convolutional LSTM-based methods, respectively. With a single epoch of training, our method improves the AUC by 8.03% compared to the convolutional LSTM-based approach. We also propose an Expectation Maximization filter which chooses samples for training the core anomaly-detection network. The overall framework improves the AUC compared to future frame prediction-based approach by 24.87% when crowd anomaly detection is performed on a video stream. We believe our work is the first step towards using deep learning methods with autonomous plug-and-play smart cameras for crowd anomaly detection.
[21] arXiv:2006.08925 [pdf]
Improving the Performance of Deep Learning for Wireless Localization
Indoor localization systems are most commonly based on Received Signal Strength Indicator (RSSI) measurements of either WiFi or Bluetooth-Low-Energy (BLE) beacons. In such systems, the two most common techniques are trilateration and fingerprinting, with the latter providing higher accuracy. In the fingerprinting technique, Deep Learning (DL) algorithms are often used to predict the location of the receiver based on the RSSI measurements of multiple beacons received at the receiver. In this paper, we address two practical issues with applying Deep Learning to wireless localization -- transfer of solution from one wireless environment to another \emph{and} small size of labelled data set. First, we apply automatic hyperparameter optimization to a deep neural network (DNN) system for indoor wireless localization, which makes the system easy to port to new wireless environments. Second, we show how to augment a typically small labelled data set using the unlabelled data set. We observed improved performance in DL by applying the two techniques. Additionally, all relevant code has been made freely available.
[22] arXiv:2006.08924 [pdf]
GCNs-Net A Graph Convolutional Neural Network Approach for Decoding Time-resolved EEG Motor Imagery Signals
Towards developing effective and efficient brain-computer interface (BCI) systems, precise decoding of brain activity measured by electroencephalogram (EEG), is highly demanded. Traditional works classify EEG signals without considering the topological relationship among electrodes. However, neuroscience research has increasingly emphasized network patterns of brain dynamics. Thus, the Euclidean structure of electrodes might not adequately reflect the interaction between signals. To fill the gap, a novel deep learning framework based on the graph convolutional neural networks (GCNs) was presented to enhance the decoding performance of raw EEG signals during different types of motor imagery (MI) tasks while cooperating with the functional topological relationship of electrodes. Based on the absolute Pearson's matrix of overall signals, the graph Laplacian of EEG electrodes was built up. The GCNs-Net constructed by graph convolutional layers learns the generalized features. The followed pooling layers reduce dimensionality, and the fully-connected softmax layer derives the final prediction. The introduced approach has been shown to converge for both personalized and group-wise predictions. It has achieved the highest averaged accuracy, 93.056% and 88.57% (PhysioNet Dataset), 96.24% and 80.89% (High Gamma Dataset), at the subject and group level, respectively, compared with existing studies, which suggests adaptability and robustness to individual variability. Moreover, the performance was stably reproducible among repetitive experiments for cross-validation. To conclude, the GCNs-Net filters EEG signals based on the functional topological relationship, which manages to decode relevant features for brain motor imagery.
[23] arXiv:2006.08901 [pdf]
Data assimilation empowered neural network parameterizations for subgrid processes in geophysical flows
In the past couple of years, there is a proliferation in the use of machine learning approaches to represent subgrid scale processes in geophysical flows with an aim to improve the forecasting capability and to accelerate numerical simulations of these flows. Despite its success for different types of flow, the online deployment of a data-driven closure model can cause instabilities and biases in modeling the overall effect of subgrid scale processes, which in turn leads to inaccurate prediction. To tackle this issue, we exploit the data assimilation technique to correct the physics-based model coupled with the neural network as a surrogate for unresolved flow dynamics in multiscale systems. In particular, we use a set of neural network architectures to learn the correlation between resolved flow variables and the parameterizations of unresolved flow dynamics and formulate a data assimilation approach to correct the hybrid model during their online deployment. We illustrate our framework in a application of the multiscale Lorenz 96 system for which the parameterization model for unresolved scales is exactly known. Our analysis, therefore, comprises a predictive dynamical core empowered by (i) a data-driven closure model for subgrid scale processes, (ii) a data assimilation approach for forecast error correction, and (iii) both data-driven closure and data assimilation procedures. We show significant improvement in the long-term perdition of the underlying chaotic dynamics with our framework compared to using only neural network parameterizations for future prediction. Moreover, we demonstrate that these data-driven parameterization models can handle the non-Gaussian statistics of subgrid scale processes, and effectively improve the accuracy of outer data assimilation workflow loops in a modular non-intrusive way.
[24] arXiv:2006.08896 [pdf]
Model-Driven DNN Decoder for Turbo Codes Design, Simulation and Experimental Results
This paper presents a novel model-driven deep learning (DL) architecture, called TurboNet, for turbo decoding that integrates DL into the traditional max-log-maximum a posteriori (MAP) algorithm. The TurboNet inherits the superiority of the max-log-MAP algorithm and DL tools and thus presents excellent error-correction capability with low training cost. To design the TurboNet, the original iterative structure is unfolded as deep neural network (DNN) decoding units, where trainable weights are introduced to the max-log-MAP algorithm and optimized through supervised learning. To efficiently train the TurboNet, a loss function is carefully designed to prevent tricky gradient vanishing issue. To further reduce the computational complexity and training cost of the TurboNet, we can prune it into TurboNet+. Compared with the existing black-box DL approaches, the TurboNet+ has considerable advantage in computational complexity and is conducive to significantly reducing the decoding overhead. Furthermore, we also present a simple training strategy to address the overfitting issue, which enable efficient training of the proposed TurboNet+. Simulation results demonstrate TurboNet+'s superiority in error-correction ability, signal-to-noise ratio generalization, and computational overhead. In addition, an experimental system is established for an over-the-air (OTA) test with the help of a 5G rapid prototyping system and demonstrates TurboNet's strong learning ability and great robustness to various scenarios.
[25] arXiv:2006.08885 [pdf]
DeepCapture Image Spam Detection Using Deep Learning and Data Augmentation
Image spam emails are often used to evade text-based spam filters that detect spam emails with their frequently used keywords. In this paper, we propose a new image spam email detection tool called DeepCapture using a convolutional neural network (CNN) model. There have been many efforts to detect image spam emails, but there is a significant performance degrade against entirely new and unseen image spam emails due to overfitting during the training phase. To address this challenging issue, we mainly focus on developing a more robust model to address the overfitting problem. Our key idea is to build a CNN-XGBoost framework consisting of eight layers only with a large number of training samples using data augmentation techniques tailored towards the image spam detection task. To show the feasibility of DeepCapture, we evaluate its performance with publicly available datasets consisting of 6,000 spam and 2,313 non-spam image samples. The experimental results show that DeepCapture is capable of achieving an F1-score of 88%, which has a 6% improvement over the best existing spam detection model CNN-SVM with an F1-score of 82%. Moreover, DeepCapture outperformed existing image spam detection solutions against new and unseen image datasets.
[26] arXiv:2006.08878 [pdf]
CNN Acceleration by Low-rank Approximation with Quantized Factors
The modern convolutional neural networks although achieve great results in solving complex computer vision tasks still cannot be effectively used in mobile and embedded devices due to the strict requirements for computational complexity, memory and power consumption. The CNNs have to be compressed and accelerated before deployment. In order to solve this problem the novel approach combining two known methods, low-rank tensor approximation in Tucker format and quantization of weights and feature maps (activations), is proposed. The greedy one-step and multi-step algorithms for the task of multilinear rank selection are proposed. The approach for quality restoration after applying Tucker decomposition and quantization is developed. The efficiency of our method is demonstrated for ResNet18 and ResNet34 on CIFAR-10, CIFAR-100 and Imagenet classification tasks. As a result of comparative analysis performed for other methods for compression and acceleration our approach showed its promising features.
[27] arXiv:2006.08852 [pdf]
Counterexample-Guided Learning of Monotonic Neural Networks
The widespread adoption of deep learning is often attributed to its automatic feature construction with minimal inductive bias. However, in many real-world tasks, the learned function is intended to satisfy domain-specific constraints. We focus on monotonicity constraints, which are common and require that the function's output increases with increasing values of specific input features. We develop a counterexample-guided technique to provably enforce monotonicity constraints at prediction time. Additionally, we propose a technique to use monotonicity as an inductive bias for deep learning. It works by iteratively incorporating monotonicity counterexamples in the learning process. Contrary to prior work in monotonic learning, we target general ReLU neural networks and do not further restrict the hypothesis space. We have implemented these techniques in a tool called COMET. Experiments on real-world datasets demonstrate that our approach achieves state-of-the-art results compared to existing monotonic learners, and can improve the model quality compared to those that were trained without taking monotonicity constraints into account.
[28] arXiv:2006.08798 [pdf]
Equilibrium Propagation for Complete Directed Neural Networks
Artificial neural networks, one of the most successful approaches to supervised learning, were originally inspired by their biological counterparts. However, the most successful learning algorithm for artificial neural networks, backpropagation, is considered biologically implausible. We contribute to the topic of biologically plausible neuronal learning by building upon and extending the equilibrium propagation learning framework. Specifically, we introduce a new neuronal dynamics and learning rule for arbitrary network architectures; a sparsity-inducing method able to prune irrelevant connections; a dynamical-systems characterization of the models, using Lyapunov theory.
[29] arXiv:2006.08601 [pdf]
Explaining Local, Global, And Higher-Order Interactions In Deep Learning
We present a simple yet highly generalizable method for explaining interacting parts within a neural network's reasoning process. In this work, we consider local, global, and higher-order statistical interactions. Generally speaking, local interactions occur between features within individual datapoints, while global interactions come in the form of universal features across the whole dataset. With deep learning, combined with some heuristics for tractability, we achieve state of the art measurement of global statistical interaction effects, including at higher orders (3-way interactions or more). We generalize this to the multidimensional setting to explain local interactions in multi-object detection and relational reasoning using the COCO annotated-image and Sort-Of-CLEVR toy datasets respectively. Here, we submit a new task for testing feature vector interactions, conduct a human study, propose a novel metric for relational reasoning, and use our interaction interpretations to innovate a more effective Relation Network. Finally, we apply these techniques on a real-world biomedical dataset to discover the higher-order interactions underlying Parkinson's disease clinical progression. Code for all experiments, fully reproducible, is available at this https URL.
[30] arXiv:2006.08529 [pdf]
Machine-learning approach to identification of coronal holes in solar disk images and synoptic maps
Identification of solar coronal holes (CHs) provides information both for operational space weather forecasting and long-term investigation of solar activity. Source data for the first problem are typically most recent solar disk observations, while for the second problem it is convenient to consider solar synoptic maps. Motivated by the idea that the concept of CHs should be similar for both cases we investigate universal models that can learn a CHs segmentation in disk images and reproduce the same segmentation in synoptic maps. We demonstrate that Convolutional Neural Networks (CNN) trained on daily disk images provide an accurate CHs segmentation in synoptic maps and their pole-centric projections. Using this approach we construct a catalog of synoptic maps for the period of 2010-20 based on SDO/AIA observations in the 193 Angstrom wavelength. The obtained CHs synoptic maps are compared with magnetic synoptic maps in the time-latitude and time-longitude diagrams. The initial results demonstrate that while in some cases the CHs are associated with magnetic flux transport events there are other mechanisms contributing to the CHs formation and evolution. To stimulate further investigations the catalog of synoptic maps is published in open access.
[31] arXiv:2006.08509 [pdf]
APQ Joint Search for Network Architecture, Pruning and Quantization Policy
We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. To deal with the larger design space it brings, a promising approach is to train a quantization-aware accuracy predictor to quickly get the accuracy of the quantized model and feed it to the search engine to select the best fit. However, training this quantization-aware accuracy predictor requires collecting a large number of quantized pairs, which involves quantization-aware finetuning and thus is highly time-consuming. To tackle this challenge, we propose to transfer the knowledge from a full-precision (i.e., fp32) accuracy predictor to the quantization-aware (i.e., int8) accuracy predictor, which greatly improves the sample efficiency. Besides, collecting the dataset for the fp32 accuracy predictor only requires to evaluate neural networks without any training cost by sampling from a pretrained once-for-all network, which is highly efficient. Extensive experiments on ImageNet demonstrate the benefits of our joint optimization approach. With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ. Compared to the separate optimization approach (ProxylessNAS+AMC+HAQ), APQ achieves 2.3% higher ImageNet accuracy while reducing orders of magnitude GPU hours and CO2 emission, pushing the frontier for green AI that is environmental-friendly. The code and video are publicly available.
[32] arXiv:2006.08464 [pdf]
Globally Injective ReLU Networks
We study injective ReLU neural networks. Injectivity plays an important role in generative models where it facilitates inference; in inverse problems with generative priors it is a precursor to well posedness. We establish sharp conditions for injectivity of ReLU layers and networks, both fully connected and convolutional. We make no architectural assumptions beyond the ReLU activations so our results apply to a very general class of neural networks. We show through a layer-wise analysis that an expansivity factor of two is necessary for injectivity; we also show sufficiency by constructing weight matrices which guarantee injectivity. Further, we show that global injectivity with iid Gaussian matrices, a commonly used tractable model, requires considerably larger expansivity which might seem counterintuitive. We then derive the inverse Lipschitz constants and study the approximation-theoretic properties of injective neural networks. Using arguments from differential topology we prove that, under mild technical conditions, any Lipschitz map can be approximated by an injective neural network. This justifies the use of injective neural networks in problems which a priori do not require injectivity. Our results establish a theoretical basis for the study of nonlinear inverse and inference problems using neural networks.
[33] arXiv:2006.08457 [pdf]
Interaction Networks Using a Reinforcement Learner to train other Machine Learning algorithms
The wiring of neurons in the brain is more flexible than the wiring of connections in contemporary artificial neural networks. It is possible that this extra flexibility is important for efficient problem solving and learning. This paper introduces the Interaction Network. Interaction Networks aim to capture some of this extra flexibility. An Interaction Network consists of a collection of conventional neural networks, a set of memory locations, and a DQN or other reinforcement learner. The DQN decides when each of the neural networks is executed, and on what memory locations. In this way, the individual neural networks can be trained on different data, for different tasks. At the same time, the results of the individual networks influence the decision process of the reinforcement learner. This results in a feedback loop that allows the DQN to perform actions that improve its own decision-making. Any existing type of neural network can be reproduced in an Interaction Network in its entirety, with only a constant computational overhead. Interaction Networks can then introduce additional features to improve performance further. These make the algorithm more flexible and general, but at the expense of being harder to train. In this paper, thought experiments are used to explore how the additional abilities of Interaction Networks could be used to improve various existing types of neural networks. Several experiments have been run to prove that the concept is sound. These show that the basic idea works, but they also reveal a number of challenges that do not appear in conventional neural networks, which make Interaction Networks very hard to train. Further research needs to be done to alleviate these issues. A number of promising avenues of research to achieve this are outlined in this paper.
[34] arXiv:2006.08456 [pdf]
A Machine Learning-Based Migration Strategy for Virtual Network Function Instances
With the growing demand for data connectivity, network service providers are faced with the task of reducing their capital and operational expenses while simultaneously improving network performance and addressing the increased demand. Although Network Function Virtualization (NFV) has been identified as a promising solution, several challenges must be addressed to ensure its feasibility. In this paper, we address the Virtual Network Function (VNF) migration problem by developing the VNF Neural Network for Instance Migration (VNNIM), a migration strategy for VNF instances. The performance of VNNIM is further improved through the optimization of the learning rate hyperparameter through particle swarm optimization. Results show that the VNNIM is very effective in predicting the post-migration server exhibiting a binary accuracy of 99.07% and a delay difference distribution that is centered around a mean of zero when compared to the optimization model. The greatest advantage of VNNIM, however, is its run-time efficiency highlighted through a run-time analysis.
[35] arXiv:2006.08453 [pdf]
Bayesian Neural Network via Stochastic Gradient Descent
The goal of bayesian approach used in variational inference is to minimize the KL divergence between variational distribution and unknown posterior distribution. This is done by maximizing the Evidence Lower Bound (ELBO). A neural network is used to parametrize these distributions using Stochastic Gradient Descent. This work extends the work done by others by deriving the variational inference models. We show how SGD can be applied on bayesian neural networks by gradient estimation techniques. For validation, we have tested our model on 5 UCI datasets and the metrics chosen for evaluation are Root Mean Square Error (RMSE) error and negative log likelihood. Our work considerably beats the previous state of the art approaches for regression using bayesian neural networks.
[36] arXiv:2006.08448 [pdf]
Deep unfolding of the weighted MMSE beamforming algorithm
Downlink beamforming is a key technology for cellular networks. However, computing the transmit beamformer that maximizes the weighted sum rate subject to a power constraint is an NP-hard problem. As a result, iterative algorithms that converge to a local optimum are used in practice. Among them, the weighted minimum mean square error (WMMSE) algorithm has gained popularity, but its computational complexity and consequent latency has motivated the need for lower-complexity approximations at the expense of performance. Motivated by the recent success of deep unfolding in the trade-off between complexity and performance, we propose the novel application of deep unfolding to the WMMSE algorithm for a MISO downlink channel. The main idea consists of mapping a fixed number of iterations of the WMMSE algorithm into trainable neural network layers, whose architecture reflects the structure of the original algorithm. With respect to traditional end-to-end learning, deep unfolding naturally incorporates expert knowledge, with the benefits of immediate and well-grounded architecture selection, fewer trainable parameters, and better explainability. However, the formulation of the WMMSE algorithm, as described in Shi et al., is not amenable to be unfolded due to a matrix inversion, an eigendecomposition, and a bisection search performed at each iteration. Therefore, we present an alternative formulation that circumvents these operations by resorting to projected gradient descent. By means of simulations, we show that, in most of the settings, the unfolded WMMSE outperforms or performs equally to the WMMSE for a fixed number of iterations, with the advantage of a lower computational load.
[37] arXiv:2006.08409 [pdf]
Machine Common Sense
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI). There is a wide range of strategies that can be employed to make progress on this challenge. This article deals with the aspects of modeling commonsense reasoning focusing on such domain as interpersonal interactions. The basic idea is that there are several types of commonsense reasoning one is manifested at the logical level of physical actions, the other deals with the understanding of the essence of human-human interactions. Existing approaches, based on formal logic and artificial neural networks, allow for modeling only the first type of common sense. To model the second type, it is vital to understand the motives and rules of human behavior. This model is based on real-life heuristics, i.e., the rules of thumb, developed through knowledge and experience of different generations. Such knowledge base allows for development of an expert system with inference and explanatory mechanisms (commonsense reasoning algorithms and personal models). Algorithms provide tools for a situation analysis, while personal models make it possible to identify personality traits. The system so designed should perform the function of amplified intelligence for interactions, including human-machine.
[38] arXiv:2006.08383 [pdf]
Pixel Invisibility Detecting Objects Invisible in Color Images
Despite recent success of object detectors using deep neural networks, their deployment on safety-critical applications such as self-driving cars remains questionable. This is partly due to the absence of reliable estimation for detectors' failure under operational conditions such as night, fog, dusk, dawn and glare. Such unquantifiable failures could lead to safety violations. In order to solve this problem, we created an algorithm that predicts a pixel-level invisibility map for color images that does not require manual labeling - that computes the probability that a pixel/region contains objects that are invisible in color domain, during various lighting conditions such as day, night and fog. We propose a novel use of cross modal knowledge distillation from color to infra-red domain using weakly-aligned image pairs from the day and construct indicators for the pixel-level invisibility based on the distances of their intermediate-level features. Quantitative experiments show the great performance of our pixel-level invisibility mask and also the effectiveness of distilled mid-level features on object detection in infra-red imagery.
[39] arXiv:2006.08381 [pdf]
DreamCoder Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning
Expert problem-solving is driven by powerful languages for thinking about problems and their solutions. Acquiring expertise means learning these languages -- systems of concepts, alongside the skills to use them. We present DreamCoder, a system that learns to solve problems by writing programs. It builds expertise by creating programming languages for expressing domain concepts, together with neural networks to guide the search for programs within these languages. A wake-sleep'' learning algorithm alternately extends the language with new symbolic abstractions and trains the neural network on imagined and replayed problems. DreamCoder solves both classic inductive programming tasks and creative tasks such as drawing pictures and building scenes. It rediscovers the basics of modern functional programming, vector algebra and classical physics, including Newton's and Coulomb's laws. Concepts are built compositionally from those learned earlier, yielding multi-layered symbolic representations that are interpretable and transferrable to new tasks, while still growing scalably and flexibly with experience.
[40] arXiv:2006.08380 [pdf]
Causal Inference with Deep Causal Graphs
Parametric causal modelling techniques rarely provide functionality for counterfactual estimation, often at the expense of modelling complexity. Since causal estimations depend on the family of functions used to model the data, simplistic models could entail imprecise characterizations of the generative mechanism, and, consequently, unreliable results. This limits their applicability to real-life datasets, with non-linear relationships and high interaction between variables. We propose Deep Causal Graphs, an abstract specification of the required functionality for a neural network to model causal distributions, and provide a model that satisfies this contract Normalizing Causal Flows. We demonstrate its expressive power in modelling complex interactions and showcase applications of the method to machine learning explainability and fairness, using true causal counterfactuals.
[41] arXiv:2006.08373 [pdf]
Self-supervised Learning for Precise Pick-and-place without Object Model
Flexible pick-and-place is a fundamental yet challenging task within robotics, in particular due to the need of an object model for a simple target pose definition. In this work, the robot instead learns to pick-and-place objects using planar manipulation according to a single, demonstrated goal state. Our primary contribution lies within combining robot learning of primitives, commonly estimated by fully-convolutional neural networks, with one-shot imitation learning. Therefore, we define the place reward as a contrastive loss between real-world measurements and a task-specific noise distribution. Furthermore, we design our system to learn in a self-supervised manner, enabling real-world experiments with up to 25000 pick-and-place actions. Then, our robot is able to place trained objects with an average placement error of 2.7 (0.2) mm and 2.6 (0.8)°. As our approach does not require an object model, the robot is able to generalize to unknown objects while keeping a precision of 5.9 (1.1) mm and 4.1 (1.2)°. We further show a range of emerging behaviors The robot naturally learns to select the correct object in the presence of multiple object types, precisely inserts objects within a peg game, picks screws out of dense clutter, and infers multiple pick-and-place actions from a single goal state.
[42] arXiv:2006.08332 [pdf]
An Augmented Translation Technique for low Resource language pair Sanskrit to Hindi translation
Neural Machine Translation (NMT) is an ongoing technique for Machine Translation (MT) using enormous artificial neural network. It has exhibited promising outcomes and has shown incredible potential in solving challenging machine translation exercises. One such exercise is the best approach to furnish great MT to language sets with a little preparing information. In this work, Zero Shot Translation (ZST) is inspected for a low resource language pair. By working on high resource language pairs for which benchmarks are available, namely Spanish to Portuguese, and training on data sets (Spanish-English and English-Portuguese) we prepare a state of proof for ZST system that gives appropriate results on the available data. Subsequently the same architecture is tested for Sanskrit to Hindi translation for which data is sparse, by training the model on English-Hindi and Sanskrit-English language pairs. In order to prepare and decipher with ZST system, we broaden the preparation and interpretation pipelines of NMT seq2seq model in tensorflow, incorporating ZST features. Dimensionality reduction of word embedding is performed to reduce the memory usage for data storage and to achieve a faster training and translation cycles. In this work existing helpful technology has been utilized in an imaginative manner to execute our NLP issue of Sanskrit to Hindi translation. A Sanskrit-Hindi parallel corpus of 300 is constructed for testing. The data required for the construction of parallel corpus has been taken from the telecasted news, published on Department of Public Information, state government of Madhya Pradesh, India website.
[43] arXiv:2006.08321 [pdf]
On the Preservation of Spatio-temporal Information in Machine Learning Applications
In conventional machine learning applications, each data attribute is assumed to be orthogonal to others. Namely, every pair of dimension is orthogonal to each other and thus there is no distinction of in-between relations of dimensions. However, this is certainly not the case in real world signals which naturally originate from a spatio-temporal configuration. As a result, the conventional vectorization process disrupts all of the spatio-temporal information about the order/place of data whether it be $1$D, $2$D, $3$D, or $4$D. In this paper, the problem of orthogonality is first investigated through conventional $k$-means of images, where images are to be processed as vectors. As a solution, shift-invariant $k$-means is proposed in a novel framework with the help of sparse representations. A generalization of shift-invariant $k$-means, convolutional dictionary learning, is then utilized as an unsupervised feature extraction method for classification. Experiments suggest that Gabor feature extraction as a simulation of shallow convolutional neural networks provides a little better performance compared to convolutional dictionary learning. Many alternatives of convolutional-logic are also discussed for spatio-temporal information preservation, including a spatio-temporal hypercomplex encoding scheme.
[44] arXiv:2006.08254 [pdf]
Dermatologist vs Neural Network
Cancer, in general, is very deadly. Timely treatment of any cancer is the key to saving a life. Skin cancer is no exception. There have been thousands of Skin Cancer cases registered per year all over the world. There have been 123,000 deadly melanoma cases detected in a single year. This huge number is proven to be a cause of a high amount of UV rays present in the sunlight due to the degradation of the Ozone layer. If not detected at an early stage, skin cancer can lead to the death of the patient. Unavailability of proper resources such as expert dermatologists, state of the art testing facilities, and quick biopsy results have led researchers to develop a technology that can solve the above problem. Deep Learning is one such method that has offered extraordinary results. The Convolutional Neural Network proposed in this study out performs every pretrained models. We trained our model on the HAM10000 dataset which offers 10015 images belonging to 7 classes of skin disease. The model we proposed gave an accuracy of 89%. This model can predict deadly melanoma skin cancer with a great accuracy. Hopefully, this study can help save people's life where there is the unavailability of proper dermatological resources by bridging the gap using our proposed study.
[45] arXiv:2006.08228 [pdf]
Finding trainable sparse networks through Neural Tangent Transfer
Deep neural networks have dramatically transformed machine learning, but their memory and energy demands are substantial. The requirements of real biological neural networks are rather modest in comparison, and one feature that might underlie this austerity is their sparse connectivity. In deep learning, trainable sparse networks that perform well on a specific task are usually constructed using label-dependent pruning criteria. In this article, we introduce Neural Tangent Transfer, a method that instead finds trainable sparse networks in a label-free manner. Specifically, we find sparse networks whose training dynamics, as characterized by the neural tangent kernel, mimic those of dense networks in function space. Finally, we evaluate our label-agnostic approach on several standard classification tasks and show that the resulting sparse networks achieve higher classification performance while converging faster.
[46] arXiv:2006.08177 [pdf]
Dissimilarity Mixture Autoencoder for Deep Clustering
In this paper, we introduce the Dissimilarity Mixture Autoencoder (DMAE), a novel neural network model that uses a dissimilarity function to generalize a family of density estimation and clustering methods. It is formulated in such a way that it internally estimates the parameters of a probability distribution through gradient-based optimization. Also, the proposed model can leverage from deep representation learning due to its straightforward incorporation into deep learning architectures, because, it consists of an encoder-decoder network that computes a probabilistic representation. Experimental evaluation was performed on image and text clustering benchmark datasets showing that the method is competitive in terms of unsupervised classification accuracy and normalized mutual information. The source code to replicate the experiments is publicly available at this https URL
[47] arXiv:2006.08162 [pdf]
Filter design for small target detection on infrared imagery using normalized-cross-correlation layer
In this paper, we introduce a machine learning approach to the problem of infrared small target detection filter design. For this purpose, similarly to a convolutional layer of a neural network, the normalized-cross-correlational (NCC) layer, which we utilize for designing a target detection/recognition filter bank, is proposed. By employing the NCC layer in a neural network structure, we introduce a framework, in which supervised training is used to calculate the optimal filter shape and the optimum number of filters required for a specific target detection/recognition task on infrared images. We also propose the mean-absolute-deviation NCC (MAD-NCC) layer, an efficient implementation of the proposed NCC layer, designed especially for FPGA systems, in which square root operations are avoided for real-time computation. As a case study we work on dim-target detection on mid-wave infrared imagery and obtain the filters that can discriminate a dim target from various types of background clutter, specific to our operational concept.
[48] arXiv:2006.08149 [pdf]
GNNGuard Defending Graph Neural Networks against Adversarial Attacks
Deep learning methods for graphs achieve remarkable performance on many tasks. However, despite the proliferation of such methods and their success, recent findings indicate that small, unnoticeable perturbations of graph structure can catastrophically reduce performance of even the strongest and most popular Graph Neural Networks (GNNs). Here, we develop GNNGuard, a general defense approach against a variety of training-time attacks that perturb the discrete graph structure. GNNGuard can be straightforwardly incorporated into any GNN. Its core principle is to detect and quantify the relationship between the graph structure and node features, if one exists, and then exploit that relationship to mitigate negative effects of the attack. GNNGuard uses network theory of homophily to learn how best assign higher weights to edges connecting similar nodes while pruning edges between unrelated nodes. The revised edges then allow the underlying GNN to robustly propagate neural messages in the graph. GNNGuard introduces two novel components, the neighbor importance estimation, and the layer-wise graph memory, and we show empirically that both components are necessary for a successful defense. Across five GNNs, three defense methods, and four datasets, including a challenging human disease graph, experiments show that GNNGuard outperforms existing defense approaches by 15.3% on average. Remarkably, GNNGuard can effectively restore the state-of-the-art performance of GNNs in the face of various adversarial attacks, including targeted and non-targeted attacks.
[49] arXiv:2006.08145 [pdf]
Classification for degraded images having various levels of degradation is very important in practical applications. This paper proposes a convolutional neural network to classify degraded images by using a restoration network and an ensemble learning. The results demonstrate that the proposed network can classify degraded images over various levels of degradation well. This paper also reveals how the image-quality of training data for a classification network affects the classification performance of degraded images.
[50] arXiv:2006.08131 [pdf]
An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks
With the widespread use of deep neural networks (DNNs) in high-stake applications, the security problem of the DNN models has received extensive attention. In this paper, we investigate a specific security problem called trojan attack, which aims to attack deployed DNN systems relying on the hidden trigger patterns inserted by malicious hackers. We propose a training-free attack approach which is different from previous work, in which trojaned behaviors are injected by retraining model on a poisoned dataset. Specifically, we do not change parameters in the original model but insert a tiny trojan module (TrojanNet) into the target model. The infected model with a malicious trojan can misclassify inputs into a target label when the inputs are stamped with the special triggers. The proposed TrojanNet has several nice properties including (1) it activates by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and could be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training efforts comparing to conventional trojan attack methods. The experimental results show that TrojanNet can inject the trojan into all labels simultaneously (all-label trojan attack) and achieves 100% attack success rate without affecting model accuracy on original tasks. Experimental analysis further demonstrates that state-of-the-art trojan detection algorithms fail to detect TrojanNet attack. The code is available at this https URL.
[51] arXiv:2006.08122 [pdf]
Classification and Recognition of Encrypted EEG Data Neural Network
With the rapid development of Machine Learning technology applied in electroencephalography (EEG) signals, Brain-Computer Interface (BCI) has emerged as a novel and convenient human-computer interaction for smart home, intelligent medical and other Internet of Things (IoT) scenarios. However, security issues such as sensitive information disclosure and unauthorized operations have not received sufficient concerns. There are still some defects with the existing solutions to encrypted EEG data such as low accuracy, high time complexity or slow processing speed. For this reason, a classification and recognition method of encrypted EEG data based on neural network is proposed, which adopts Paillier encryption algorithm to encrypt EEG data and meanwhile resolves the problem of floating point operations. In addition, it improves traditional feed-forward neural network (FNN) by using the approximate function instead of activation function and realizes multi-classification of encrypted EEG data. Extensive experiments are conducted to explore the effect of several metrics (such as the hidden neuron size and the learning rate updated by improved simulated annealing algorithm) on the recognition results. Followed by security and time cost analysis, the proposed model and approach are validated and evaluated on public EEG datasets provided by PhysioNet, BCI Competition IV and EPILEPSIAE. The experimental results show that our proposal has the satisfactory accuracy, efficiency and feasibility compared with other solutions.
[52] arXiv:2006.08115 [pdf]
Minimax Dynamics of Optimally Balanced Spiking Networks of Excitatory and Inhibitory Neurons
Excitation-inhibition (E-I) balance is ubiquitously observed in the cortex. Recent studies suggest an intriguing link between balance on fast timescales, tight balance, and efficient information coding with spikes. We further this connection by taking a principled approach to optimal balanced networks of excitatory (E) and inhibitory (I) neurons. By deriving E-I spiking neural networks from greedy spike-based optimizations of constrained minimax objectives, we show that tight balance arises from correcting for deviations from the minimax optimum. We predict specific neuron firing rates in the network by solving the minimax problem, going beyond statistical theories of balanced networks. Finally, we design minimax objectives for reconstruction of an input signal, associative memory, and storage of manifold attractors, and derive from them E-I networks that perform the computation. Overall, we present a novel normative modeling approach for spiking E-I networks, going beyond the widely-used energy minimizing networks that violate Dale's law. Our networks can be used to model cortical circuits and computations.
[53] arXiv:2006.08099 [pdf]
Iterative Algorithm Induced Deep-Unfolding Neural Networks Precoding Design for Multiuser MIMO Systems
Optimization theory assisted algorithms have received great attention for precoding design in multiuser multiple-input multiple-output (MU-MIMO) systems. Although the resultant optimization algorithms are able to provide excellent performance, they generally require considerable computational complexity, which gets in the way of their practical application in real-time systems. In this work, in order to address this issue, we first propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed in matrix form to better solve the problems in communication systems. Then, we implement the proposed deepunfolding framework to solve the sum-rate maximization problem for precoding design in MU-MIMO systems. An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed. Specifically, the iterative WMMSE algorithm is unfolded into a layer-wise structure, where a number of trainable parameters are introduced to replace the highcomplexity operations in the forward propagation. To train the network, a generalized chain rule of the IAIDNN is proposed to depict the recurrence relation of gradients between two adjacent layers in the back propagation. Moreover, we discuss the computational complexity and generalization ability of the proposed scheme. Simulation results show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
[54] arXiv:2006.07990 [pdf]
Optimal Lottery Tickets via SubsetSum Logarithmic Over-Parameterization is Sufficient
The strong {\it lottery ticket hypothesis} (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al.~\cite{MalachEtAl20} establishes the first theoretical analysis for the strong LTH one can provably approximate a neural network of width $d$ and depth $l$, by pruning a random one that is a factor $O(d^4l^2)$ wider and twice as deep. This polynomial over-parameterization requirement is at odds with recent experimental research that achieves good approximation with networks that are a small factor wider than the target. In this work, we close the gap and offer an exponential improvement to the over-parameterization requirement for the existence of lottery tickets. We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep. Our analysis heavily relies on connecting pruning random ReLU networks to random instances of the \textsc{SubsetSum} problem. We then show that this logarithmic over-parameterization is essentially optimal for constant depth networks. Finally, we verify several of our theoretical insights with experiments.
[55] arXiv:2006.07988 [pdf]
Joint Adaptive Feature Smoothing and Topology Extraction via Generalized PageRank GNNs
In many important applications, the acquired graph-structured data includes both node features and graph topology information. Graph neural networks (GNNs) are able to accurately process both the feature signals and graph topology individually. Nevertheless, they face the problem of trading-off the benefits of a shallow network architecture and deep multi-step information propagation when attempting to optimize their learning performance using both data components. Most existing GNN implementations based on node feature propagation are shallow due to the fact that a large number of propagation steps leads to feature over-smoothing and hence diminishes their discriminative power. In contrast, when processing topological information, it is common to use label propagation and PageRank methods that require a large number of message passing steps. We address these two contradictory requirements by combining GNNs with an adaptive generalized PageRank (GPR) scheme in a model termed GPR-GNN. GPR-GNN is the first known architecture that not only provably mitigates feature over-smoothing but also adaptively learns the weights of the GPR model to optimize topological information extraction. Our theoretical analysis of the GPR-GNN method is facilitated by novel synthetic benchmark datasets generated by the contextual stochastic block model. We also compare the performance of our NN architecture with that of several state-of-the-art GNNs on the problem of node-classification, using nine well-known benchmark datasets. The results demonstrate that GPR-GNN offers significant performance improvement compared to existing techniques on both synthetic and benchmark data.
[56] arXiv:2006.07982 [pdf]
ShapeFlow Learnable Deformations Among 3D Shapes
We present ShapeFlow, a flow-based model for learning a deformation space for entire classes of 3D shapes with large intra-class variations. ShapeFlow allows learning a multi-template deformation space that is agnostic to shape topology, yet preserves fine geometric details. Different from a generative space where a latent vector is directly decoded into a shape, a deformation space decodes a vector into a continuous flow that can advect a source shape towards a target. Such a space naturally allows the disentanglement of geometric style (coming from the source) and structural pose (conforming to the target). We parametrize the deformation between geometries as a learned continuous flow field via a neural network and show that such deformations can be guaranteed to have desirable properties, such as be bijectivity, freedom from self-intersections, or volume preservation. We illustrate the effectiveness of this learned deformation space for various downstream applications, including shape generation via deformation, geometric style transfer, unsupervised learning of a consistent parameterization for entire classes of shapes, and shape interpolation.
[57] arXiv:2006.07846 [pdf]
From Graph Low-Rank Global Attention to 2-FWL Approximation
Graph Neural Networks (GNNs) are known to have an expressive power bounded by that of the vertex coloring algorithm (Xu et al., 2019a; Morris et al., 2018). However, for rich node features, such a bound does not exist and GNNs can be shown to be universal, namely, have the theoretical ability to approximate arbitrary graph functions. It is well known, however, that expressive power alone does not imply good generalization. In an effort to improve generalization of GNNs we suggest the Low-Rank Global Attention (LRGA) module, taking advantage of the efficiency of low rank matrix-vector multiplication, that improves the algorithmic alignment (Xu et al., 2019b) of GNNs with the 2-folklore Weisfeiler-Lehman (FWL) algorithm; 2-FWL is a graph isomorphism algorithm that is strictly more powerful than vertex coloring. Concretely, we (i) formulate 2-FWL using polynomial kernels; (ii) show LRGA aligns with this 2-FWL formulation; and (iii) bound the sample complexity of the kernel's feature map when learned with a randomly initialized two-layer MLP. The latter means the generalization error can be made arbitrarily small when training LRGA to learn the 2-FWL algorithm. From a practical point of view, augmenting existing GNN layers with LRGA produces state of the art results on most datasets in a GNN standard benchmark.
[58] arXiv:2006.07818 [pdf]
Alternating ConvLSTM Learning Force Propagation with Alternate State Updates
Data-driven simulation is an important step-forward in computational physics when traditional numerical methods meet their limits. Learning-based simulators have been widely studied in past years; however, most previous works view simulation as a general spatial-temporal prediction problem and take little physical guidance in designing their neural network architectures. In this paper, we introduce the alternating convolutional Long Short-Term Memory (Alt-ConvLSTM) that models the force propagation mechanisms in a deformable object with near-uniform material properties. Specifically, we propose an accumulation state, and let the network update its cell state and the accumulation state alternately. We demonstrate how this novel scheme imitates the alternate updates of the first and second-order terms in the forward Euler method of numerical PDE solvers. Benefiting from this, our network only requires a small number of parameters, independent of the number of the simulated particles, and also retains the essential features in ConvLSTM, making it naturally applicable to sequential data with spatial inputs and outputs. We validate our Alt-ConvLSTM on human soft tissue simulation with thousands of particles and consistent body pose changes. Experimental results show that Alt-ConvLSTM efficiently models the material kinetic features and greatly outperforms vanilla ConvLSTM with only the single state update.
[59] arXiv:2006.07794 [pdf]
PatchUp A Regularization Technique for Convolutional Neural Networks
Large capacity deep learning models are often prone to a high generalization gap when trained with a limited amount of labeled training data. A recent class of methods to address this problem uses various ways to construct a new training sample by mixing a pair (or more) of training samples. We propose PatchUp, a hidden state block-level regularization technique for Convolutional Neural Networks (CNNs), that is applied on selected contiguous blocks of feature maps from a random pair of samples. Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches like Mixup and CutMix. Moreover, since we are mixing the contiguous block of features in the hidden space, which has more dimensions than the input space, we obtain more diverse samples for training towards different dimensions. Our experiments on CIFAR-10, CIFAR-100, and SVHN datasets with PreactResnet18, PreactResnet34, and WideResnet-28-10 models show that PatchUp improves upon, or equals, the performance of current state-of-the-art regularizers for CNNs. We also show that PatchUp can provide better generalization to affine transformations of samples and is more robust against adversarial attacks.
[60] arXiv:2006.07760 [pdf]
Spatial Mode Correction of Single Photons using Machine Learning
Spatial modes of light constitute valuable resources for a variety of quantum technologies ranging from quantum communication and quantum imaging to remote sensing. Nevertheless, their vulnerabilities to phase distortions, induced by random media upon propagation, impose significant limitations on the realistic implementation of numerous quantum-photonic technologies. Unfortunately, this problem is exacerbated at the single-photon level. Over the last two decades, this challenging problem has been tackled through conventional schemes that utilize optical nonlinearities, quantum correlations, and adaptive optics. In this article, we exploit the self-learning and self-evolving features of artificial neural networks to correct the complex spatial profile of distorted Laguerre-Gaussian modes at the single-photon level. Specifically, we demonstrate the possibility of correcting spatial modes distorted by thick atmospheric turbulence. Our results have important implications for real-time turbulence correction of structured photons and single-photon images.
[61] arXiv:2006.07743 [pdf]
3DFCNN Real-Time Action Recognition using 3D Deep Neural Networks with Raw Depth Information
Human actions recognition is a fundamental task in artificial vision, that has earned a great importance in recent years due to its multiple applications in different areas. %, such as the study of human behavior, security or video surveillance. In this context, this paper describes an approach for real-time human action recognition from raw depth image-sequences, provided by an RGB-D camera. The proposal is based on a 3D fully convolutional neural network, named 3DFCNN, which automatically encodes spatio-temporal patterns from depth sequences without %any costly pre-processing. Furthermore, the described 3D-CNN allows %automatic features extraction and actions classification from the spatial and temporal encoded information of depth sequences. The use of depth data ensures that action recognition is carried out protecting people's privacy% allows recognizing the actions carried out by people, protecting their privacy%\sout{of them} , since their identities can not be recognized from these data. %\st{ from depth images.} 3DFCNN has been evaluated and its results compared to those from other state-of-the-art methods within three widely used %large-scale NTU RGB+D datasets, with different characteristics (resolution, sensor type, number of views, camera location, etc.). The obtained results allows validating the proposal, concluding that it outperforms several state-of-the-art approaches based on classical computer vision techniques. Furthermore, it achieves action recognition accuracy comparable to deep learning based state-of-the-art methods with a lower computational cost, which allows its use in real-time applications.
[62] arXiv:2006.07721 [pdf]
Beyond Random Matrix Theory for Deep Networks
We investigate whether the Wigner semi-circle and Marcenko-Pastur distributions, often used for deep neural network theoretical analysis, match empirically observed spectral densities. We find that even allowing for outliers, the observed spectral shapes strongly deviate from such theoretical predictions. This raises major questions about the usefulness of these models in deep learning. We further show that theoretical results, such as the layered nature of critical points, are strongly dependent on the use of the exact form of these limiting spectral densities. We consider two new classes of matrix ensembles; random Wigner/Wishart ensemble products and percolated Wigner/Wishart ensembles, both of which better match observed spectra. They also give large discrete spectral peaks at the origin, providing a theoretical explanation for the observation that various optima can be connected by one dimensional of low loss values. We further show that, in the case of a random matrix product, the weight of the discrete spectral component at $0$ depends on the ratio of the dimensions of the weight matrices.
[63] arXiv:2006.07644 [pdf]
RoadNet-RT High Throughput CNN Architecture and SoC Design for Real-Time Road Segmentation
In recent years, convolutional neural network has gained popularity in many engineering applications especially for computer vision. In order to achieve better performance, often more complex structures and advanced operations are incorporated into the neural networks, which results very long inference time. For time-critical tasks such as autonomous driving and virtual reality, real-time processing is fundamental. In order to reach real-time process speed, a light-weight, high-throughput CNN architecture namely RoadNet-RT is proposed for road segmentation in this paper. It achieves 90.33% MaxF score on test set of KITTI road segmentation task and 8 ms per frame when running on GTX 1080 GPU. Comparing to the state-of-the-art network, RoadNet-RT speeds up the inference time by a factor of 20 at the cost of only 6.2% accuracy loss. For hardware design optimization, several techniques such as depthwise separable convolution and non-uniformed kernel size convolution are customized designed to further reduce the processing time. The proposed CNN architecture has been successfully implemented on an FPGA ZCU102 MPSoC platform that achieves the computation capability of 83.05 GOPS. The system throughput reaches 327.9 frames per second with image size 1216x176.
[64] arXiv:2006.07589 [pdf]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions, which are then used to augment the training of the model for improved robustness. While some recent works propose semi-supervised adversarial learning methods that utilize unlabeled data, they still require class labels. However, do we really need class labels at all, for adversarially robust training of deep neural networks? In this paper, we propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples. Further, we present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data, which aims to maximize the similarity between a random augmentation of a data sample and its instance-wise adversarial perturbation. We validate our method, Robust Contrastive Learning (RoCL), on multiple benchmark datasets, on which it obtains comparable robust accuracy over state-of-the-art supervised adversarial learning methods, and significantly improved robustness against the black box and unseen types of attacks. Moreover, with further joint fine-tuning with supervised adversarial loss, RoCL obtains even higher robust accuracy over using self-supervised learning alone. Notably, RoCL also demonstrate impressive results in robust transfer learning.
[65] arXiv:2006.07584 [pdf]
Uncertainty Estimation with Infinitesimal Jackknife, Its Distribution and Mean-Field Approximation
Uncertainty quantification is an important research area in machine learning. Many approaches have been developed to improve the representation of uncertainty in deep models to avoid overconfident predictions. Existing ones such as Bayesian neural networks and ensemble methods require modifications to the training procedures and are computationally costly for both training and inference. Motivated by this, we propose mean-field infinitesimal jackknife (mfIJ) -- a simple, efficient, and general-purpose plug-in estimator for uncertainty estimation. The main idea is to use infinitesimal jackknife, a classical tool from statistics for uncertainty estimation to construct a pseudo-ensemble that can be described with a closed-form Gaussian distribution, without retraining. We then use this Gaussian distribution for uncertainty estimation. While the standard way is to sample models from this distribution and combine each sample's prediction, we develop a mean-field approximation to the inference where Gaussian random variables need to be integrated with the softmax nonlinear functions to generate probabilities for multinomial variables. The approach has many appealing properties it functions as an ensemble without requiring multiple models, and it enables closed-form approximate inference using only the first and second moments of Gaussians. Empirically, mfIJ performs competitively when compared to state-of-the-art methods, including deep ensembles, temperature scaling, dropout and Bayesian NNs, on important uncertainty tasks. It especially outperforms many methods on out-of-distribution detection.
[66] arXiv:2006.07579 [pdf]
Stability Analysis using Quadratic Constraints for Systems with Neural Network Controllers
A method is presented to analyze the stability of feedback systems with neural network controllers. Two stability theorems are given to prove asymptotic stability and to compute an ellipsoidal inner-approximation to the region of attraction. The first theorem addresses linear time-invariant plant dynamics, and merges Lyapunov theory with local (sector) quadratic constraints to bound the nonlinear activation functions in the neural network. The second theorem allows the plant dynamics to include perturbations such as unmodeled dynamics, slope-restricted nonlinearities, and time delay, using integral quadratic constraint (IQCs) to capture their input/output behavior. Both results rely on semidefinite programming to compute Lyapunov functions and inner-estimates of the region of attraction. The method is illustrated on systems with neural networks trained to stabilize a nonlinear inverted pendulum as well as vehicle lateral dynamics with actuator uncertainty.
[67] arXiv:2006.07556 [pdf]
Neural Architecture Search using Bayesian Optimisation with Weisfeiler-Lehman Kernel
Bayesian optimisation (BO) has been widely used for hyperparameter optimisation but its application in neural architecture search (NAS) is limited due to the non-continuous, high-dimensional and graph-like search spaces. Current approaches either rely on encoding schemes, which are not scalable to large architectures and ignore the implicit topological structure of architectures, or use graph neural networks, which require additional hyperparameter tuning and a large amount of observed data, which is particularly expensive to obtain in NAS. We propose a neat BO approach for NAS, which combines the Weisfeiler-Lehman graph kernel with a Gaussian process surrogate to capture the topological structure of architectures, without having to explicitly define a Gaussian process over high-dimensional vector spaces. We also harness the interpretable features learnt via the graph kernel to guide the generation of new architectures. We demonstrate empirically that our surrogate model is scalable to large architectures and highly data-efficient; competing methods require 3 to 20 times more observations to achieve equally good prediction performance as ours. We finally show that our method outperforms existing NAS approaches to achieve state-of-the-art results on NAS datasets.
[68] arXiv:2006.07527 [pdf]
Inductive Graph Neural Networks for Spatiotemporal Kriging
Time series forecasting and spatiotemporal kriging are the two most important tasks in spatiotemporal data analysis. Recent research on graph neural networks has made substantial progress in time series forecasting, while little attention is paid to the kriging problem---recovering signals for unsampled locations/sensors. Most existing scalable kriging methods (e.g., matrix/tensor completion) are transductive, and thus full retraining is required when we have a new sensor to interpolate. In this paper, we develop an Inductive Graph Neural Network Kriging (IGNNK) model to recover data for unsampled sensors on a network/graph structure. To generalize the effect of distance and reachability, we generate random subgraphs as samples and reconstruct the corresponding adjacency matrix for each sample. By reconstructing all signals on each sample subgraph, IGNNK can effectively learn the spatial message passing mechanism. Empirical results on several real-world spatiotemporal datasets demonstrate the effectiveness of our model. In addition, we also find that the learned model can be successfully transferred to the same type of kriging tasks on an unseen dataset. Our results show that 1) GNN is an efficient and effective tool for spatial kriging; 2) inductive GNNs can be trained using dynamic adjacency matrices; and 3) a trained model can be transferred to new graph structures.
[69] arXiv:2006.07498 [pdf]
Multi-Modal Fingerprint Presentation Attack Detection Evaluation On A New Dataset
Fingerprint presentation attack detection is becoming an increasingly challenging problem due to the continuous advancement of attack preparation techniques, which generate realistic-looking fake fingerprint presentations. In this work, rather than relying on legacy fingerprint images, which are widely used in the community, we study the usefulness of multiple recently introduced sensing modalities. Our study covers front-illumination imaging using short-wave-infrared, near-infrared, and laser illumination; and back-illumination imaging using near-infrared light. Toward studying the effectiveness of each of these unconventional sensing modalities and their fusion for liveness detection, we conducted a comprehensive analysis using a fully convolutional deep neural network framework. Our evaluation compares different combination of the new sensing modalities to legacy data from one of our collections as well as the public LivDet2015 dataset, showing the superiority of the new sensing modalities in most cases. It also covers the cases of known and unknown attacks and the cases of intra-dataset and inter-dataset evaluations. Our results indicate that the power of our approach stems from the nature of the captured data rather than the employed classification framework, which justifies the extra cost for hardware-based (or hybrid) solutions. We plan to publicly release one of our dataset collections.
[70] arXiv:2006.07495 [pdf]
Open Questions in Creating Safe Open-ended AI Tensions Between Control and Creativity
Artificial life originated and has long studied the topic of open-ended evolution, which seeks the principles underlying artificial systems that innovate continually, inspired by biological evolution. Recently, interest has grown within the broader field of AI in a generalization of open-ended evolution, here called open-ended search, wherein such questions of open-endedness are explored for advancing AI, whatever the nature of the underlying search algorithm (e.g. evolutionary or gradient-based). For example, open-ended search might design new architectures for neural networks, new reinforcement learning algorithms, or most ambitiously, aim at designing artificial general intelligence. This paper proposes that open-ended evolution and artificial life have much to contribute towards the understanding of open-ended AI, focusing here in particular on the safety of open-ended search. The idea is that AI systems are increasingly applied in the real world, often producing unintended harms in the process, which motivates the growing field of AI safety. This paper argues that open-ended AI has its own safety challenges, in particular, whether the creativity of open-ended systems can be productively and predictably controlled. This paper explains how unique safety problems manifest in open-ended search, and suggests concrete contributions and research questions to explore them. The hope is to inspire progress towards creative, useful, and safe open-ended search algorithms.
[71] arXiv:2006.07487 [pdf]
Scalable Control Variates for Monte Carlo Methods via Stochastic Optimization
Control variates are a well-established tool to reduce the variance of Monte Carlo estimators. However, for large-scale problems including high-dimensional and large-sample settings, their advantages can be outweighed by a substantial computational cost. This paper considers control variates based on Stein operators, presenting a framework that encompasses and generalizes existing approaches that use polynomials, kernels and neural networks. A learning strategy based on minimising a variational objective through stochastic optimization is proposed, leading to scalable and effective control variates. Our results are both empirical, based on a range of test functions and problems in Bayesian inference, and theoretical, based on an analysis of the variance reduction that can be achieved.
[72] arXiv:2006.07484 [pdf]
dagger A Python Framework for Reproducible Machine Learning Experiment Orchestration
Many research directions in machine learning, particularly in deep learning, involve complex, multi-stage experiments, commonly involving state-mutating operations acting on models along multiple paths of execution. Although machine learning frameworks provide clean interfaces for defining model architectures and unbranched flows, burden is often placed on the researcher to track experimental provenance, that is, the state tree that leads to a final model configuration and result in a multi-stage experiment. Originally motivated by analysis reproducibility in the context of neural network pruning research, where multi-stage experiment pipelines are common, we present dagger, a framework to facilitate reproducible and reusable experiment orchestration. We describe the design principles of the framework and example usage.
[73] arXiv:2006.07464 [pdf]
Hypermodels for Exploration
We study the use of hypermodels to represent epistemic uncertainty and guide exploration. This generalizes and extends the use of ensembles to approximate Thompson sampling. The computational cost of training an ensemble grows with its size, and as such, prior work has typically been limited to ensembles with tens of elements. We show that alternative hypermodels can enjoy dramatic efficiency gains, enabling behavior that would otherwise require hundreds or thousands of elements, and even succeed in situations where ensemble methods fail to learn regardless of size. This allows more accurate approximation of Thompson sampling as well as use of more sophisticated exploration schemes. In particular, we consider an approximate form of information-directed sampling and demonstrate performance gains relative to Thompson sampling. As alternatives to ensembles, we consider linear and neural network hypermodels, also known as hypernetworks. We prove that, with neural network base models, a linear hypermodel can represent essentially any distribution over functions, and as such, hypernetworks are no more expressive.
[74] arXiv:2006.07405 [pdf]
O(1) Communication for Distributed SGD through Two-Level Gradient Averaging
Large neural network models present a hefty communication challenge to distributed Stochastic Gradient Descent (SGD), with a communication complexity of O(n) per worker for a model of n parameters. Many sparsification and quantization techniques have been proposed to compress the gradients, some reducing the communication complexity to O(k), where k << n. In this paper, we introduce a strategy called two-level gradient averaging (A2SGD) to consolidate all gradients down to merely two local averages per worker before the computation of two global averages for an updated model. A2SGD also retains local errors to maintain the variance for fast convergence. Our theoretical analysis shows that A2SGD converges similarly like the default distributed SGD algorithm. Our evaluation validates the theoretical conclusion and demonstrates that A2SGD significantly reduces the communication traffic per worker, and improves the overall training time of LSTM-PTB by 3.2x and 23.2x, respectively, compared to Top-K and QSGD. To the best of our knowledge, A2SGD is the first to achieve O(1) communication complexity per worker for distributed SGD.
[75] arXiv:2006.07356 [pdf]
Implicit bias of gradient descent for mean squared error regression with wide neural networks
We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. Focusing on 1D regression, we show that the solution of training a width-$n$ shallow ReLU network is within $n^{- 1/2}$ of the function which fits the training data and whose difference from initialization has smallest 2-norm of the second derivative weighted by $1/\zeta$. The curvature penalty function $1/\zeta$ is expressed in terms of the probability distribution that is utilized to initialize the network parameters, and we compute it explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and thence the solution function is the natural cubic spline interpolation of the training data. The statement generalizes to the training trajectories, which in turn are captured by trajectories of spatially adaptive smoothing splines with decreasing regularization strength.
[76] arXiv:2006.07327 [pdf]
GNN3DMOT Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning
3D Multi-object tracking (MOT) is crucial to autonomous systems. Recent work uses a standard tracking-by-detection pipeline, where feature extraction is first performed independently for each object in order to compute an affinity matrix. Then the affinity matrix is passed to the Hungarian algorithm for data association. A key process of this standard pipeline is to learn discriminative features for different objects in order to reduce confusion during data association. In this work, we propose two techniques to improve the discriminative feature learning for MOT (1) instead of obtaining features for each object independently, we propose a novel feature interaction mechanism by introducing the Graph Neural Network. As a result, the feature of one object is informed of the features of other objects so that the object feature can lean towards the object with similar feature (i.e., object probably with a same ID) and deviate from objects with dissimilar features (i.e., object probably with different IDs), leading to a more discriminative feature for each object; (2) instead of obtaining the feature from either 2D or 3D space in prior work, we propose a novel joint feature extractor to learn appearance and motion features from 2D and 3D space simultaneously. As features from different modalities often have complementary information, the joint feature can be more discriminate than feature from each individual modality. To ensure that the joint feature extractor does not heavily rely on one modality, we also propose an ensemble training paradigm. Through extensive evaluation, our proposed method achieves state-of-the-art performance on KITTI and nuScenes 3D MOT benchmarks. Our code will be made available at this https URL
[77] arXiv:2006.07326 [pdf]
CPR Classifier-Projection Regularization for Continual Learning
We propose a general, yet simple patch that can be applied to existing regularization-based continual learning methods called classifier-projection regularization (CPR). Inspired by both recent results on neural networks with wide local minima and information theory, CPR adds an additional regularization term that maximizes the entropy of a classifier's output probability. We demonstrate that this additional term can be interpreted as a projection of the conditional probability given by a classifier's output to the uniform distribution. By applying the Pythagorean theorem for KL divergence, we then prove that this projection may (in theory) improve the performance of continual learning methods. In our extensive experimental results, we apply CPR to several state-of-the-art regularization-based continual learning methods and benchmark performance on popular image recognition datasets. Our results demonstrate that CPR indeed promotes a wide local minima and significantly improves both accuracy and plasticity while simultaneously mitigating the catastrophic forgetting of baseline continual learning methods.
[78] arXiv:2006.07310 [pdf]
Reservoir Computing meets Recurrent Kernels and Structured Transforms
Reservoir Computing is a class of simple yet efficient Recurrent Neural Networks where internal weights are fixed at random and only a linear output layer is trained. In the large size limit, such random neural networks have a deep connection with kernel methods. Our contributions are threefold a) We rigorously establish the recurrent kernel limit of Reservoir Computing and prove its convergence. b) We test our models on chaotic time series prediction, a classic but challenging benchmark in Reservoir Computing, and show how the Recurrent Kernel is competitive and computationally efficient when the number of data points remains moderate. c) When the number of samples is too large, we leverage the success of structured Random Features for kernel approximation by introducing Structured Reservoir Computing. The two proposed methods, Recurrent Kernel and Structured Reservoir Computing, turn out to be much faster and more memory-efficient than conventional Reservoir Computing.
[79] arXiv:2006.07292 [pdf]
A Formal Language Approach to Explaining RNNs
This paper presents LEXR, a framework for explaining the decision making of recurrent neural networks (RNNs) using a formal description language called Linear Temporal Logic (LTL). LTL is the de facto standard for the specification of temporal properties in the context of formal verification and features many desirable properties that make the generated explanations easy for humans to interpret it is a descriptive language, it has a variable-free syntax, and it can easily be translated into plain English. To generate explanations, LEXR follows the principle of counterexample-guided inductive synthesis and combines Valiant's probably approximately correct learning (PAC) with constraint solving. We prove that LEXR's explanations satisfy the PAC guarantee (provided the RNN can be described by LTL) and show empirically that these explanations are more accurate and easier-to-understand than the ones generated by recent algorithms that extract deterministic finite automata from RNNs.
[80] arXiv:2006.07253 [pdf]
Dynamic Model Pruning with Feedback
Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression method that generates a sparse trained model without additional overhead by allowing (i) dynamic allocation of the sparsity pattern and (ii) incorporating feedback signal to reactivate prematurely pruned weights we obtain a performant sparse model in one single training pass (retraining is not needed, but can further improve the performance). We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models. Moreover, their performance surpasses that of models generated by all previously proposed pruning schemes.

You can also browse papers in other categories.