I can provide some feedback, since one of the uses of our systems is this exact application: providing indoor localization for small unmanned vehicles (both ground and aerial).
Unlike skeletons, where even for small volumes a large number of cameras can provide improvements, the capture volume for rigid bodies is generally constrained by the resolution of your cameras and the size of the markers. At some distance, large markers and even active markers end up with too small a footprint on the camera to be trackable. The basic recommendation is to create a square of about 20x20 feet, which gives you an effective capture footprint of about 10x10 feet. For us, that is nowhere near enough to provide an effective sandbox within which to reliably simulate indoor GPS. We have tried spacing the cameras farther apart, and you can gain some useful space, but much beyond 25 feet, tracking becomes erratic (keep in mind that we only have 12 cameras, and they are not the latest high-res ones).
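To make the resolution and marker-size constraint concrete, here is a rough sketch of how a marker's image shrinks with distance under a simple pinhole camera model. The camera resolution, field of view, and marker diameter are made-up illustrative values, not the specs of our cameras, and the function name is my own.

```python
import math

def marker_footprint_px(marker_diameter_m, distance_m, h_res_px, h_fov_deg):
    """Approximate width, in pixels, of a marker's image under a pinhole model."""
    # Focal length in pixels, derived from horizontal resolution and field of view.
    focal_px = (h_res_px / 2) / math.tan(math.radians(h_fov_deg) / 2)
    # Similar triangles: image size scales with marker size / distance.
    return focal_px * marker_diameter_m / distance_m

# Hypothetical setup: 640 px wide sensor, 46 degree horizontal FOV, 19 mm marker.
for feet in (10, 25, 40):
    px = marker_footprint_px(0.019, feet * 0.3048, 640, 46)
    print(f"{feet} ft: about {px:.1f} px across")
```

The footprint falls off as 1/distance, so doubling the camera spacing halves the number of pixels a marker covers. Higher resolution or larger markers buy the distance back in direct proportion.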
Another issue to consider for aerial vehicles is that the capture space is not a rectangular box. It is more like a squashed sphere, so the footprint narrows as you get higher.
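As a back-of-the-envelope illustration of that narrowing, the sketch below treats the capture volume as the upper half of a squashed sphere (an ellipsoid) sitting on the floor. That shape is only an assumed approximation, and the 8 ft ceiling is a hypothetical number. The 5 ft floor radius corresponds to the roughly 10x10 foot footprint mentioned above.

```python
import math

def footprint_radius(height, floor_radius, ceiling):
    """Horizontal radius of the usable volume at a given height.

    Assumes a half-ellipsoid: radius floor_radius at the floor,
    shrinking to zero at the top of the volume (ceiling).
    """
    if not 0 <= height <= ceiling:
        return 0.0  # outside the modeled volume
    return floor_radius * math.sqrt(1 - (height / ceiling) ** 2)

# Hypothetical volume: 5 ft radius at floor level, tracking lost above 8 ft.
for h in (0, 2, 4, 6, 7.5):
    print(f"height {h} ft: radius about {footprint_radius(h, 5, 8):.2f} ft")
```

With this shape the loss is gentle near the floor and steep near the top, which is why a flight vehicle has far less room to maneuver than a ground vehicle in the same setup.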
Now, I believe it is possible to use multiple systems, create overlapping spaces, and calibrate them relative to each other to get a wider space, but we have not tried that. Even then, you will not escape the fact that the effective footprint of your capture space narrows as you get higher. My understanding of the current system is that 24 is the maximum number of cameras, so unless you go with multiple systems, you would be limited.
Regarding the number of targets, we have tracked up to 7 ground vehicles with no problem, and I believe the system can support a much larger count. The trick is to create unique marker configurations and spread the markers as far apart as possible on your vehicle. Establishing a 'clean' origin is a bit of a pain, but you can eyeball it and make it work fine. And the accuracy is superb; it far exceeds any other localization method (GPS, Cricket, sonar, etc.).
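One way to sanity-check that two marker configurations are actually distinct is to compare their sorted inter-marker distances. This is just a sketch of the idea, not how the tracking software itself identifies bodies. The function names and marker coordinates (in meters) are invented for illustration.

```python
import itertools
import math

def distance_signature(markers):
    """Sorted list of all pairwise distances between a body's markers."""
    return sorted(math.dist(a, b) for a, b in itertools.combinations(markers, 2))

def signature_gap(markers_a, markers_b):
    """Largest difference between corresponding sorted distances.

    A small gap means the two bodies look alike and may be confused.
    """
    sig_a, sig_b = distance_signature(markers_a), distance_signature(markers_b)
    if len(sig_a) != len(sig_b):
        return math.inf  # different marker counts are trivially distinct
    return max(abs(x - y) for x, y in zip(sig_a, sig_b))

# Hypothetical three-marker layouts on three vehicles.
bot_a = [(0, 0, 0), (0.10, 0, 0), (0, 0.06, 0)]
bot_b = [(0, 0, 0), (0.10, 0, 0), (0, 0.065, 0)]     # nearly the same as bot_a
bot_c = [(0, 0, 0), (0.14, 0, 0), (0.03, 0.09, 0)]   # clearly different

print(f"a vs b: {signature_gap(bot_a, bot_b) * 1000:.1f} mm")
print(f"a vs c: {signature_gap(bot_a, bot_c) * 1000:.1f} mm")
```

Note that mirrored layouts share the same signature, so this check is conservative: it can flag look-alikes, but a large gap is the result you want before mounting the markers.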
Regarding frequency, the system broadcasts packets with all the data at a pretty high rate, I believe 100 Hz. We are currently investigating the latency in detail (we use a ground marker whose position the bot knows, and we compare the time the bot reaches that position with the time the message reporting that location actually arrives on the bot), but that work is still in progress. I do not think the lag would be more than a couple of frames plus UDP overhead, which is plenty fast for control loops on ground vehicles and possibly OK for flight vehicle control. In any serious application you would have to implement a Kalman filter anyway, so you may be able to get away with a bit of delay.
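To show how a filter can absorb a bit of delay, here is a minimal one-dimensional constant-velocity Kalman filter that extrapolates the latest (stale) estimate forward by the assumed latency. It is a sketch, not what we run on our bots. The 25 ms latency, the noise values, and the class and method names are all assumptions for illustration.

```python
class ConstantVelocityKF:
    """1D Kalman filter: state [position, velocity], position-only measurements."""

    def __init__(self, meas_var, accel_var):
        self.p, self.v = 0.0, 0.0
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.r = meas_var    # measurement noise variance (m^2)
        self.q = accel_var   # assumed acceleration noise variance

    def predict(self, dt):
        """Propagate the state by dt seconds: P <- F P F^T + Q."""
        self.p += self.v * dt
        (a, b), (c, d) = self.P
        q = self.q
        self.P = [
            [a + dt * (b + c) + dt * dt * d + q * dt**4 / 4, b + dt * d + q * dt**3 / 2],
            [c + dt * d + q * dt**3 / 2, d + q * dt**2],
        ]

    def update(self, z):
        """Fuse one position measurement z."""
        (a, b), (c, d) = self.P
        s = a + self.r                # innovation variance
        k0, k1 = a / s, c / s         # Kalman gain for position and velocity
        y = z - self.p                # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]

    def position_now(self, latency_s):
        """Extrapolate the stale estimate to the present."""
        return self.p + self.v * latency_s

FRAME_DT = 1 / 100   # 100 Hz, the rate quoted above
LATENCY_S = 0.025    # assumed: about two frames plus network overhead
SPEED = 1.0          # simulated vehicle speed in m/s

kf = ConstantVelocityKF(meas_var=1e-6, accel_var=1.0)
for k in range(1, 201):
    kf.predict(FRAME_DT)
    kf.update(SPEED * k * FRAME_DT)  # noise-free sample, already LATENCY_S old on arrival

print(f"velocity estimate: {kf.v:.3f} m/s")
print(f"stale position {kf.p:.3f} m, compensated {kf.position_now(LATENCY_S):.3f} m")
```

At 1 m/s, 25 ms of lag is only 2.5 cm of position error before compensation, which is why a ground vehicle tolerates it easily. A fast flight vehicle would care more about getting the latency figure right.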
My final recommendation is to purchase a system and try it out. A minimal system is so cheap (relative to others) that it is probably cheaper to buy one and try it than to spend too much time pondering. If possible, go for the highest-resolution cameras you can afford. And good luck!