Using EEG recorded in a pre/post-test learning paradigm, we investigated how naturalistic actions in a highly immersive, multimodal, interactive 3D virtual reality (VR) environment may enhance word encoding. While behavioral data have shown that coupling word encoding with gestures congruent with word meaning enhances learning, the neural underpinnings of this effect have yet to be elucidated. We coupled EEG recording with VR to examine whether “embodied learning” improves learning and creates linguistic representations that produce greater motor resonance. Participants learned action verbs in a second language (L2) in two conditions: Specific action (observing and performing congruent actions on virtual objects) and Pointing (observing actions and pointing to virtual objects). Before and after training, participants performed a Match-mismatch task while we recorded EEG (variation in the N400 response as a function of the match between observed actions and auditory verbs) and a Passive listening task while we measured motor activation (mu (8-13 Hz) and beta (13-30 Hz) band desynchronization during auditory verb processing). Contrary to our expectations, post-training results revealed neither semantic nor motor effects in either group when considered independently of learning success. Behavioral results showed that both groups learned the verbs, but also revealed a great deal of variability in learning success. When learning performance was taken into account, Low performance learners showed no semantic effect, whereas High performance learners exhibited an N400 effect for Mismatch vs Match trials post-training, independent of the type of learning. Taken as a whole, our results suggest that embodied processes can play an important role in L2 learning.