The problem is that the events that trigger the "set sample position" and "play sample" actions do not run in the audio thread, which means there can be a small delay between when each action fires. If MMF2 were designed for this, sound-related actions wouldn't execute the moment the action runs; they would be remembered until the end of the current frame, so that the audio thread itself could trigger them all together with sample accuracy.
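To make the idea concrete, here is a rough Python sketch of that kind of deferred triggering. It is not MMF2 code and says nothing about how MMF2 works internally: the `Voice` and `DeferredAudioCommands` names and all of their methods are made up for illustration, and the lock is a simplification (a real audio callback would more likely use a lock-free queue).

```python
import threading
from collections import deque


class Voice:
    """Hypothetical playing sample; position is counted in sample frames."""

    def __init__(self, name):
        self.name = name
        self.position = 0
        self.playing = False


class DeferredAudioCommands:
    """Collects sound actions during a frame and applies them in the audio callback."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []          # filled by the game thread during the frame
        self._committed = deque()   # finished batches waiting for the audio thread

    # Game thread: actions are only remembered, nothing is applied yet.
    def set_position(self, voice, position):
        self._pending.append(("set_position", voice, position))

    def play(self, voice):
        self._pending.append(("play", voice, None))

    # Game thread: called once at the end of the frame to hand over the batch.
    def end_frame(self):
        with self._lock:
            if self._pending:
                self._committed.append(self._pending)
            self._pending = []

    # Audio thread: called once per audio buffer of num_frames sample frames.
    def audio_callback(self, voices, num_frames):
        with self._lock:
            batches = list(self._committed)
            self._committed.clear()
        # Every committed command is applied before any audio is rendered,
        # so all affected voices start from the same sample boundary.
        for batch in batches:
            for kind, voice, arg in batch:
                if kind == "set_position":
                    voice.position = arg
                elif kind == "play":
                    voice.playing = True
        # Stand-in for mixing: just advance each playing voice.
        for voice in voices:
            if voice.playing:
                voice.position += num_frames


if __name__ == "__main__":
    drums, bass = Voice("drums"), Voice("bass")
    commands = DeferredAudioCommands()
    for v in (drums, bass):
        commands.set_position(v, 0)
        commands.play(v)
    commands.end_frame()
    commands.audio_callback([drums, bass], 512)
    print(drums.position, bass.position)  # prints "512 512"
```

Because both voices are started inside the same callback, they stay at identical positions no matter how much time passed between the individual actions on the game thread.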
On the other hand, even without sample accuracy, perhaps it's close enough to just set the sample position to 0 once all the samples are cached. I'll run some tests.