In a virtual reality public speaking training system, it is essential to control the audience's nonverbal behavior in order to simulate different attitudes. A virtual audience's social attitude is typically represented by a two-dimensional valence-arousal model describing the opinion and engagement of the virtual characters. In this article, we argue that the valence-arousal representation is not sufficient to describe a user's perception of a virtual character's social attitude. We propose a three-dimensional model that divides the valence axis into two dimensions representing the epistemic and affective stance of the virtual character, reflecting the character's agreement and emotional reaction, respectively. To assess how the virtual characters' nonverbal behavior is perceived on these two new dimensions, we conducted a perceptual study in virtual reality with 44 participants, who evaluated 50 animations combining multimodal nonverbal behavioral signals such as head movements, facial expressions, gaze direction, and body posture. The results of our experiment confirm that the valence axis should indeed be divided into two axes to account for the perception of the virtual character's epistemic and affective stance. Furthermore, the results show that a single behavioral signal is predominant in the evaluation of each dimension: head movements for the epistemic dimension and facial expressions for the affective dimension. These findings provide useful guidelines for designing the nonverbal behavior of a virtual audience to simulate social attitudes.