Deep neural network-based systems are now state-of-the-art in many robotics tasks, but their application in safety-critical domains remains dangerous without formal guarantees on network robustness. Small perturbations to sensor inputs (from noise or adversarial examples) are often enough to change network-based decisions, an effect that was recently shown to cause an autonomous vehicle to swerve into another lane. In light of these dangers, numerous algorithms have been developed as defensive mechanisms against these adversarial inputs, some of which provide formal robustness guarantees or certificates. This work leverages research on certified adversarial robustness to develop an online certifiably robust defense for deep reinforcement learning algorithms. The proposed defense computes guaranteed lower bounds on state-action values during execution to identify and choose a robust action under a worst-case deviation in input space due to possible adversaries or noise. Moreover, the resulting policy comes with a certificate of solution quality, even though the true state and optimal action are unknown to the certifier because of the perturbations. The approach is demonstrated on a deep Q-network (DQN) policy and is shown to increase robustness to noise and adversaries in pedestrian collision avoidance scenarios, a classic control task, and Atari Pong. This article extends our prior work with new performance guarantees, extensions to other reinforcement learning algorithms, expanded results aggregated across more scenarios, an extension into scenarios with adversarial behavior, comparisons with a more computationally expensive method, and visualizations that provide intuition about the robustness algorithm.
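As a minimal illustration of the action-selection rule this abstract describes, the sketch below propagates an ℓ∞ perturbation ball through a small ReLU Q-network with interval arithmetic to obtain guaranteed lower bounds on each action's value, then acts on the worst case. Interval bound propagation is a deliberately simple stand-in for the tighter certified bounds the abstract refers to, and the network weights, shapes, and radius epsilon are made-up placeholders.

```python
import numpy as np

def interval_bounds(weights, biases, obs, epsilon):
    """Propagate the l-infinity ball [obs - eps, obs + eps] through a ReLU MLP
    with interval arithmetic, yielding sound per-action bounds on Q(s, a)."""
    lo, hi = obs - epsilon, obs + epsilon
    for i, (W, b) in enumerate(zip(weights, biases)):
        mid, rad = (lo + hi) / 2.0, (hi - lo) / 2.0
        center = W @ mid + b
        spread = np.abs(W) @ rad          # worst-case growth of the interval
        lo, hi = center - spread, center + spread
        if i < len(weights) - 1:          # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def robust_action(weights, biases, obs, epsilon):
    # Choose the action whose certified worst-case (lower-bound) Q-value is
    # largest, rather than acting greedily on the possibly perturbed input.
    q_lo, _ = interval_bounds(weights, biases, obs, epsilon)
    return int(np.argmax(q_lo))

# Tiny usage example with made-up weights: 2 inputs, 3 hidden units, 2 actions.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(2, 3))]
biases = [np.zeros(3), np.zeros(2)]
print(robust_action(weights, biases, obs=np.array([0.5, -0.2]), epsilon=0.1))
```

The bounds are sound but loose; a verification backend based on linear relaxations would tighten them at extra cost, which is the trade-off the abstract's comparison with a more computationally expensive method speaks to.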
This article is concerned with the H∞ state estimation problem for a class of bidirectional associative memory (BAM) neural networks with binary mode switching, where distributed delays are included in the leakage terms. A pair of stochastic variables taking values of 1 or 0 is introduced to characterize the switching behavior between the redundant models of the BAM neural network, and a general type of neuron activation function (i.e., the sector-bounded nonlinearity) is considered. To prevent collisions among data transmissions, a periodic scheduling protocol (i.e., the round-robin protocol) is adopted to orchestrate the transmission order of the sensors. The purpose of this work is to develop a full-order estimator such that the error dynamics of the state estimation is exponentially mean-square stable and the H∞ performance requirement on the output estimation error is also achieved. Sufficient conditions for the existence of the required estimator are established by constructing a mode-dependent Lyapunov-Krasovskii functional. The desired estimator parameters are then obtained by solving a set of matrix inequalities. Finally, a numerical example is provided to show the effectiveness of the proposed estimator design method.

We study the distribution of successor states in Boolean networks (BNs). A state vector y is called a successor of x if y = F(x) holds, where x, y ∈ {0, 1}ⁿ are state vectors and F is an ordered set of Boolean functions describing the state transitions. This problem is motivated by analyzing how information propagates via hidden layers in Boolean threshold networks (a discrete model of neural networks) and how it is kept or lost during time evolution in BNs. In this article, we measure the distribution via entropy and study how the entropy changes under the transition from x to y, assuming that x is given uniformly at random. We focus on BNs consisting of exclusive OR (XOR) functions, canalyzing functions, and threshold functions. As a main result, we show that there exists a BN consisting of d-ary XOR functions that preserves the entropy if d is odd and n > d, whereas no such BN exists if d is even. We also show that there exists a specific BN consisting of d-ary threshold functions that preserves the entropy if n mod d = 0. Furthermore, we theoretically analyze upper and lower bounds of the entropy for BNs consisting of canalyzing functions and perform computational experiments using BN models of real biological networks.

Field-programmable gate array (FPGA)-based CNN hardware accelerators adopting a single-computing-engine (CE) architecture or a multi-CE architecture have attracted great attention in recent years. Actual accelerator throughput keeps rising but still falls far below the theoretical throughput, owing to inefficient computing-resource mapping, data-supply problems, and other issues. To solve these problems, a novel composite hardware CNN accelerator architecture is proposed in this article. To perform the convolution layers (CLs) efficiently, a novel multi-CE architecture based on a row-level pipelined streaming strategy is proposed. For each CE, an optimized mapping mechanism is proposed to improve its computing-resource utilization ratio, and an efficient data system with continuous data supply is designed to keep the CE from idling. In addition, a weight-data allocation strategy is proposed to relieve off-chip bandwidth pressure. To perform the fully connected layer (FCL), a single-CE architecture based on a batch-based computing method is proposed. Based on these design methods and strategies, visual geometry group network-16 (VGG-16) and ResNet-101 are both implemented on the XC7VX980T FPGA platform. The VGG-16 accelerator consumed 3395 multipliers and achieved a throughput of 1 TOPS at 150 MHz, about 98.15% of the theoretical throughput (2 × 3395 × 150 MOPS). Similarly, the ResNet-101 accelerator achieved 600 GOPS at 100 MHz, about 96.12% of the theoretical throughput (2 × 3121 × 100 MOPS).

In this article, a novel reinforcement learning (RL) method is developed to solve the optimal tracking control problem of unknown nonlinear multiagent systems (MASs). Unlike representative RL-based optimal control algorithms, the proposed internal reinforce Q-learning (IrQL) method introduces an internal reinforce reward (IRR) function for each agent to improve its ability to capture long-term information from the local environment. In the IrQL design, a Q-function is defined on the basis of the IRR function, and an iterative IrQL algorithm is developed to learn the optimal distributed control scheme, followed by rigorous convergence and stability analysis. Furthermore, a distributed online learning framework, namely reinforce-critic-actor neural networks, is established to implement the proposed approach, estimating the IRR function, the Q-function, and the optimal control scheme, respectively. The implemented procedure is data-driven and requires no knowledge of the system dynamics.
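For the state-estimation abstract above, the final synthesis step is solving matrix inequalities. As a toy sketch of that step only, not of the paper's mode-dependent Lyapunov-Krasovskii conditions, the snippet below checks a basic Lyapunov linear matrix inequality with cvxpy; the matrix A is a made-up stand-in for stable error dynamics.

```python
import numpy as np
import cvxpy as cp

# Toy stand-in for the matrix-inequality step: certify stability of error
# dynamics e' = A e by finding P > 0 with A^T P + P A < 0 (a basic Lyapunov
# LMI, far simpler than the paper's mode-dependent conditions).
A = np.array([[-1.0, 0.4],
              [0.2, -1.5]])            # hypothetical error-dynamics matrix
n = A.shape[0]
P = cp.Variable((n, n), symmetric=True)
eps = 1e-6
lyap = A.T @ P + P @ A
constraints = [P >> eps * np.eye(n),
               0.5 * (lyap + lyap.T) << -eps * np.eye(n)]  # symmetrized for cvxpy
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print("Lyapunov LMI:", prob.status)    # "optimal" means a certificate P exists
```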
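The parity-dependent XOR result in the Boolean-network abstract can be checked exhaustively for small n. The sketch below uses one hypothetical wiring (each output is the XOR of d cyclically consecutive inputs, a concrete choice of F rather than the paper's construction), which happens to preserve entropy for d = 3, n = 4 and loses exactly one bit for d = 2.

```python
from collections import Counter
from itertools import product
import math

def successor_entropy(update, n):
    """Entropy (bits) of y = F(x) when x is uniform over {0,1}^n."""
    counts = Counter(update(x) for x in product((0, 1), repeat=n))
    total = 2 ** n
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def cyclic_xor(n, d):
    # y_i = x_i XOR x_{i+1} XOR ... XOR x_{i+d-1}, indices mod n:
    # one concrete BN built entirely from d-ary XOR functions.
    def update(x):
        return tuple(sum(x[(i + j) % n] for j in range(d)) % 2
                     for i in range(n))
    return update

n = 4
print(successor_entropy(cyclic_xor(n, 3), n))  # 4.0 bits: entropy preserved (d odd)
print(successor_entropy(cyclic_xor(n, 2), n))  # 3.0 bits: one bit lost (d even)
```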
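The utilization percentages quoted in the accelerator abstract follow from the convention that each multiplier contributes two operations (a multiply and an accumulate) per cycle; the arithmetic can be reproduced directly:

```python
# Reproduce the theoretical-throughput arithmetic quoted in the abstract:
# 2 ops per multiplier per cycle, times the clock frequency.
def theoretical_gops(multipliers, freq_mhz):
    return 2 * multipliers * freq_mhz / 1000.0  # MOPS -> GOPS

vgg16 = theoretical_gops(3395, 150)       # 1018.5 GOPS theoretical
resnet101 = theoretical_gops(3121, 100)   # 624.2 GOPS theoretical
print(0.9815 * vgg16)      # ~999.7 GOPS, i.e. the reported ~1 TOPS
print(0.9612 * resnet101)  # ~600 GOPS, matching the reported figure
```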
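Finally, for the IrQL abstract, the snippet below is a schematic single-agent, tabular rendering of the layered idea: an internal reward table is learned from external rewards, and a Q-table is learned on top of it. The update rules, step sizes, and discount factor are illustrative assumptions, not the article's equations, which are defined for networked multiagent systems with neural-network function approximation.

```python
import numpy as np

# Schematic tabular sketch of the internal-reinforce idea: the internal
# reward R aggregates external rewards over time, and Q is updated against
# R instead of the raw reward. These update rules are assumptions made for
# illustration, not the IrQL equations from the article.
def irql_step(Q, R, s, a, r_ext, s_next, alpha=0.1, beta=0.1, gamma=0.95):
    # Internal reinforce reward: a learned, discounted running aggregate of
    # the external reward received in (s, a).
    R[s, a] += beta * (r_ext + gamma * R[s_next].max() - R[s, a])
    # Q-learning update driven by the internal reward signal.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    return Q, R

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
R = np.zeros((n_states, n_actions))
Q, R = irql_step(Q, R, s=0, a=1, r_ext=1.0, s_next=2)
```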