Using TrackIR to improve 3D headphone audio

thomase
Posts: 8
Joined: Thu Jun 16, 2005 5:00 am


Post by thomase »

A little background....
Many FPS games support 3D audio APIs such as OpenAL or DirectSound3D. With a supported soundcard and a pair of headphones, the game need only specify the position and orientation of each sound source AND the position and orientation of the listener. The soundcard applies filtering algorithms to the sound to create the ILLUSION of a 3D sound source (e.g. footsteps behind your back, a gunshot to the front right and above, or to the back left and below, etc.)

This works because the HRTF (head-related transfer function) filters try to "change" the sound as the virtual source moves around your head in the same way that your head, body, and the folds of your ears would "change" the sound in real life.

However, one major shortcoming of this approach is that the brain often depends on slight movements of the head to resolve ambiguities in the direction of a sound. For example, when we hear a faint sound that could be either in front of us or behind us, we slightly (and perhaps unconsciously) move our heads in order to figure it out. If we move our head left and the sound gets louder in the left ear, it must be behind us (louder in the right ear means it is in front). We can SIMULATE this in an FPS game by slightly adjusting our view with the mouse, but this is not the same as a subconscious movement of the head.
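To illustrate why the head turn disambiguates front from back, here is a toy sketch using a simplified spherical-head model (an assumption on my part: it takes the interaural time difference to be proportional to the sine of the source azimuth relative to the head, which is a standard first-order approximation, not what any particular soundcard does):

```python
import math

def itd_sign(source_azimuth_deg, head_yaw_deg):
    """Sign of the interaural time difference in a simplified
    spherical-head model: ITD is proportional to sin(azimuth
    relative to the head). Positive means the sound reaches the
    RIGHT ear first. Azimuths are measured clockwise from straight
    ahead; a positive head yaw is a turn to the right."""
    relative = math.radians(source_azimuth_deg - head_yaw_deg)
    itd = math.sin(relative)
    return 0 if abs(itd) < 1e-9 else (1 if itd > 0 else -1)

# With the head still, front (0 deg) and rear (180 deg) sources are
# indistinguishable: both give zero ITD.
print(itd_sign(0, 0), itd_sign(180, 0))      # 0 0

# Turn the head 5 degrees to the LEFT (negative yaw): the front
# source now leads in the RIGHT ear, the rear source in the LEFT.
print(itd_sign(0, -5), itd_sign(180, -5))    # 1 -1
```

The sign flip between the two cases is exactly the cue described above: after a small leftward head turn, "louder/earlier in the left ear" can only mean the source is behind you.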

I'm wondering if the TrackIR product might be able to address this problem. In theory, all you would need is a device that tracks the exact angle of your head relative to a straight line extending from your head to the monitor. Only minimal software support would be necessary in the game. Basically, under normal circumstances, the game interprets the listener orientation to correspond directly to the center of the player's first-person view. This is the orientation sent to the 3D audio API and the soundcard. With head tracking, all that is necessary is for the game to ADD an X- and Y-axis angle offset (i.e. yaw and pitch) to this orientation to compensate for the position of the listener's head. It is not necessary to translate head movement into view movement, as that would probably be more complicated (it would also move your crosshairs).
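The offset idea above can be sketched in a few lines. This is only an illustration under assumptions of my own: the function name and the angle conventions (yaw about +Y, pitch tilting upward) are hypothetical, and no real game or tracker SDK is being quoted. The two vectors it returns are the kind of data a 3D audio API consumes; OpenAL, for instance, takes the "at" and "up" vectors together as a six-float listener orientation.

```python
import math

def listener_orientation(view_yaw, view_pitch, head_yaw, head_pitch):
    """Combine the first-person view angles with the head-tracker
    offsets and return the listener's 'at' (forward) and 'up'
    vectors for a 3D audio API. Angles are in radians; yaw is a
    right-handed rotation about +Y (positive = turn left), pitch
    tilts the view upward. With all angles zero the listener faces
    down -Z with +Y up, matching OpenAL's default orientation."""
    yaw = view_yaw + head_yaw        # the only game-side change:
    pitch = view_pitch + head_pitch  # ADD the tracked head offsets
    at = (-math.cos(pitch) * math.sin(yaw),
          math.sin(pitch),
          -math.cos(pitch) * math.cos(yaw))
    up = (math.sin(pitch) * math.sin(yaw),
          math.cos(pitch),
          math.sin(pitch) * math.cos(yaw))
    return at, up

# Looking straight ahead with no head offset: the default orientation.
at, up = listener_orientation(0.0, 0.0, 0.0, 0.0)
print(at, up)  # (0, 0, -1) and (0, 1, 0), up to rounding

# Same view, but the tracker reports the head turned 90 deg left:
# the listener's 'at' vector swings to -X, while the view angles
# (and therefore the crosshair) stay untouched.
at, up = listener_orientation(0.0, 0.0, math.pi / 2, 0.0)
```

In OpenAL the two vectors would then be packed into one six-float array and passed to alListenerfv(AL_ORIENTATION, ...); nothing about the player's aim changes, which is the whole point.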

Any thoughts?