Various forms of noise are present in the brain. The role of noise in the exploration/exploitation trade-off is cast into the framework of reinforcement learning, for the complex task of motor learning. A neuro-controller, consisting of a linear transformation of the input to which Gaussian noise is added, is modeled as a stochastic controller that can be learned online in a "direct policy-gradient" scheme. The reward signal is derived from sensor information, so no direct or indirect model of the controlled system is needed. The chosen task (reaching with a multi-joint arm) is redundant and non-linear. The controller inputs are therefore projected into a feature space of higher dimension using a topographic coding based on Gaussian kernels. We show that, with a suitable noise level, it is possible to explore the environment so as to find good control solutions that can then be exploited. Moreover, the controller is able to adapt continuously to changes in the system dynamics. The general framework of this work will allow various noises and their effects to be studied, especially since it is compatible with more complex types of stochastic neuro-controllers, as demonstrated by other works on binary or spiking networks.
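The scheme summarized above (Gaussian-kernel feature coding, a noisy linear controller, and a model-free direct policy-gradient update from a sensor-based reward) can be sketched in a minimal one-dimensional toy form. This is an illustrative sketch only, not the paper's implementation: the kernel layout, noise level, learning rate, reward, and the "reaching" target below are all assumed for the example, and the update is a standard REINFORCE rule with a running baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Topographic coding: project a scalar input onto Gaussian (RBF) kernels,
# giving a higher-dimensional feature vector (layout and width assumed).
centers = np.linspace(-1.0, 1.0, 20)
sigma = 0.15

def features(x):
    """Gaussian-kernel activations of input x over the kernel centres."""
    return np.exp(-((x - centers) ** 2) / (2.0 * sigma ** 2))

# Stochastic linear controller: action = W . phi(x) + Gaussian noise.
W = np.zeros(len(centers))
noise_std = 0.3   # exploration noise level (held constant here)
alpha = 0.05      # learning rate
baseline = 0.0    # running reward baseline to reduce gradient variance

x, target = 0.5, 0.8   # toy one-dimensional "reaching" problem (assumed)
recent = []            # trailing mean actions, to check convergence
for t in range(2000):
    phi = features(x)
    mean = W @ phi
    action = mean + rng.normal(0.0, noise_std)   # noisy exploration
    reward = -(action - target) ** 2             # sensor-based reward only
    # Direct policy-gradient (REINFORCE) update, no model of the plant:
    # grad log pi(action | x) = (action - mean) / noise_std**2 * phi
    W += alpha * (reward - baseline) * (action - mean) / noise_std**2 * phi
    baseline += 0.1 * (reward - baseline)
    if t >= 1700:
        recent.append(mean)

print(round(float(np.mean(recent)), 2))  # learned mean action, near the target
```

Because the reward depends only on the observed outcome, the same loop applies unchanged if the plant dynamics drift: the noise keeps exploring and the gradient tracks the new optimum, which is the continual-adaptation property claimed above.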