I strip the audio from the video file and use that as the 'base'. NEVER CHANGE THE BASE, because it is, of course, synced perfectly with the video. I then take the new audio that I want to master and 'parallel' it up to the base. There are two things you need to correct to match the two files: a) differences in audio drift, and b) dropouts that may have occurred in the video.
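The post doesn't say which tool does the stripping. As one possibility, assuming ffmpeg is installed, here is a small Python sketch that builds the extraction command. The file names are placeholders and `extract_base_audio_cmd` is a hypothetical helper. Writing uncompressed PCM WAV gives you a base you can open in any editor without further re-encoding (use `pcm_s24le` instead if the source audio is 24-bit).

```python
import shlex


def extract_base_audio_cmd(video_path: str, wav_path: str) -> list:
    """Build an ffmpeg command that copies the video's first audio stream out as a WAV file."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vn",                # drop the video stream
        "-map", "0:a:0",      # take the first audio stream of input 0
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM (uncompressed WAV)
        wav_path,
    ]


cmd = extract_base_audio_cmd("concert.mp4", "base.wav")
print(shlex.join(cmd))
# → ffmpeg -i concert.mp4 -vn -map 0:a:0 -c:a pcm_s16le base.wav
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```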
You already know how to correct for differences in audio drift to address a) above.
Regarding b), hopefully there are no dropouts in the video, because then it's just a matter of linearly shrinking or stretching the new audio to match the base audio. However, if there are dropouts or other issues with the base audio, you have to break the job down into smaller and smaller chunks until the two audio tracks exactly parallel each other.
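For the no-dropout case, "linearly shrinking or stretching" can be sketched as resampling the new track to exactly the base's length with linear interpolation. This is a minimal pure-Python illustration on a list of samples (`linear_stretch` is a hypothetical helper). Like a varispeed change, it shifts pitch along with length, which is negligible for the tiny ratios involved in clock drift. For real, hours-long tracks you'd use the resampler in your audio editor or a tool such as SoX rather than this loop.

```python
def linear_stretch(samples: list, target_len: int) -> list:
    """Resample `samples` to exactly `target_len` samples by linear interpolation.

    The first and last source samples map onto the first and last output samples,
    so the stretched track starts and ends exactly where the base does.
    """
    n = len(samples)
    if target_len <= 0 or n == 0:
        return []
    if n == 1 or target_len == 1:
        return [samples[0]] * target_len
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # position in the source, in samples
        j = int(pos)
        frac = pos - j
        if j + 1 < n:
            out.append(samples[j] * (1 - frac) + samples[j + 1] * frac)
        else:
            out.append(samples[-1])  # exactly at the last source sample
    return out


# A 4-sample "new" track stretched to the base's 7 samples.
print(linear_stretch([0.0, 1.0, 2.0, 3.0], 7))
# → [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```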
I've had some cases where I had to build the new audio minute by minute... a real PITA that can take many hours. Usually that's not the case, though. More often there's a slight video dropout every ten or twenty minutes. In that case, I have to find where each dropout occurs, work out how long it lasted by comparing the two files, and then fix the new file so it exactly matches the base. OK, here's how to do this...
At the beginning of the recording, make sure both audio tracks start at exactly the same point. So you need to either add time to or remove time from the beginning of the new audio so it starts at the EXACT same instant (down to the sample) as the base. If I need to add time, I just add silence to the start of my new audio track. Then the two tracks are exactly parallel with each other at time = 0 sec.
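Lining up time = 0 comes down to converting the offset between one shared reference event (say, the first clear transient) into samples, then either padding the front of the new track with silence or trimming it. A minimal sketch, assuming both tracks share one sample rate and are held as lists of samples; the function names and the reference-event way of measuring the offset are illustrative, not a particular editor's feature.

```python
def seconds_to_samples(seconds: float, sample_rate: int) -> int:
    """Round a time offset to the nearest whole sample."""
    return round(seconds * sample_rate)


def align_start(new_samples: list, base_event_s: float, new_event_s: float,
                sample_rate: int) -> list:
    """Shift the new track so a shared event lands at the same time as in the base.

    base_event_s / new_event_s: when the same event occurs in each track.
    A positive offset means the new audio is early, so silence is prepended;
    a negative offset means it is late, so samples are trimmed off the front.
    """
    offset = seconds_to_samples(base_event_s - new_event_s, sample_rate)
    if offset >= 0:
        return [0.0] * offset + new_samples
    return new_samples[-offset:]


# Toy example at 10 samples/s: the event is at 0.5 s in the base but 0.2 s in the new track.
print(align_start([1.0, 1.0, 1.0], base_event_s=0.5, new_event_s=0.2, sample_rate=10))
# → [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```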
Next, you adjust the new audio for time drift... you said you know how to do this. Be careful here, though: if there are dropouts in the base audio, your ratio will be off. Make sure the drift ratio is based on an apples-to-apples piece of the recording... IOW, a stretch of audio where neither track has any dropouts.
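The drift ratio itself is elapsed base time divided by elapsed new time between two anchor events, both inside the dropout-free stretch. The sketch below (hypothetical names, made-up numbers) computes it, maps a new-track time onto the base timeline, and shows how a 0.2 s dropout hidden between the anchors would skew the result.

```python
def drift_ratio(base_t1: float, base_t2: float, new_t1: float, new_t2: float) -> float:
    """Seconds of base audio per second of new audio between two matched anchor events."""
    return (base_t2 - base_t1) / (new_t2 - new_t1)


def to_base_time(new_t: float, ratio: float, new_t1: float, base_t1: float) -> float:
    """Where a new-track time lands on the base timeline after drift correction."""
    return base_t1 + (new_t - new_t1) * ratio


# Clean span: the new recorder ran 0.36 s long over one hour of base audio.
clean = drift_ratio(0.0, 3600.0, 0.0, 3600.36)
# Same span, but the base lost 0.2 s to a video dropout between the anchors.
skewed = drift_ratio(0.0, 3599.8, 0.0, 3600.36)

print(f"clean ratio  {clean:.8f}")
print(f"skewed ratio {skewed:.8f}")
# To stretch the whole new track, its target length is about len(new_samples) * ratio.
```

The skewed ratio spreads the missing 0.2 s across the whole hour, so every point between the anchors ends up slightly wrong instead of one clean cut being needed.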
Next, you have to seek out any dropouts in the base audio and reproduce them in the new audio. Do this by comparing the timing of distinct events in the two tracks. Obviously, if a drum hit at 20.005 s in the base also happens at 20.005 s in the new recording (after drift adjustment), then there are no dropouts and you're OK through that point in the recording.
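Comparing the timing of matched events can be written as a small check: given the times of the same drum hits in both tracks (new times already drift-corrected), report the first one that disagrees by more than a tolerance. How you pick and time those hits (by eye in the waveform, or with a transient detector) is up to you; the 2 ms tolerance and the hit times below are assumptions for illustration.

```python
from typing import Optional


def first_mismatch(base_hits: list, new_hits: list, tolerance: float = 0.002) -> Optional[int]:
    """Index of the first matched hit whose times differ by more than `tolerance` seconds.

    base_hits[i] and new_hits[i] must be the same event in each track.
    Returns None if every pair agrees, i.e. no dropout up to the last hit checked.
    """
    for i, (b, n) in enumerate(zip(base_hits, new_hits)):
        if abs(n - b) > tolerance:
            return i
    return None


base_hits = [5.000, 12.500, 20.005, 31.000]
new_hits = [5.000, 12.501, 20.205, 31.200]
i = first_mismatch(base_hits, new_hits)
print(i, round(new_hits[i] - base_hits[i], 3))
# → 2 0.2
```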
However, if that same drum hit happens at 20.205 s in the new recording, then you've got at least one dropout (maybe more) with a total duration of 0.200 seconds. That dropout happens somewhere before 20.005 s in the base, and after the last point where the two tracks still matched. Your challenge is to find it and then cut that 0.200 s portion out of the new audio so the new audio parallels the base again. Do this for the entire length of the new audio and you're finished building your new audio file.
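Once you've narrowed down where the dropout is, cutting it is a matter of deleting that many samples at that point; in the post's example, the 0.200 s cut has to land before 20.205 s on the new track. A minimal sketch with hypothetical helper names; the 15.0 s cut point is a stand-in for wherever you actually locate the glitch. In practice you'd cut at the video glitch itself or in a quiet spot, and possibly add a very short crossfade to avoid a click.

```python
def cut_span(samples: list, start_s: float, duration_s: float, sample_rate: int) -> list:
    """Delete `duration_s` seconds of audio beginning at `start_s` (new-track time)."""
    a = round(start_s * sample_rate)
    d = round(duration_s * sample_rate)
    if a < 0 or d < 0 or a + d > len(samples):
        raise ValueError("cut falls outside the track")
    return samples[:a] + samples[a + d:]


def shift_hits(new_hits: list, start_s: float, duration_s: float) -> list:
    """New-track hit times after the cut: anything past the removed span moves earlier.

    Hits inside the removed span are dropped, since that audio no longer exists.
    """
    end_s = start_s + duration_s
    return [t if t < start_s else t - duration_s
            for t in new_hits if not start_s <= t < end_s]


sr = 1000                                  # toy sample rate
track = [0.0] * (25 * sr)                  # 25 s of placeholder audio
fixed = cut_span(track, start_s=15.0, duration_s=0.200, sample_rate=sr)
print(len(track) - len(fixed))
# → 200
print([round(t, 3) for t in shift_hits([12.501, 20.205], 15.0, 0.200)])
# → [12.501, 20.005]
```

After each cut, re-run the hit comparison from that point on; if later hits still disagree, there's another dropout further along.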
Finally, mux the corrected master audio with the video file. The two should match up perfectly, since your new master now exactly parallels the audio that was stripped from the video.
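If ffmpeg is the muxer (again an assumption; any tool that can swap an audio stream works), the final step might look like this sketch: take the video stream from the original file untouched, take the audio from the corrected master, and encode only the audio. File names and the AAC codec default are placeholders; pick whatever audio codec your container and player support.

```python
import shlex


def mux_cmd(video_path: str, master_wav: str, out_path: str, audio_codec: str = "aac") -> list:
    """Build an ffmpeg command that pairs the original video stream with the new master audio."""
    return [
        "ffmpeg",
        "-i", video_path,       # input 0: original video (its old audio is not mapped)
        "-i", master_wav,       # input 1: corrected master audio
        "-map", "0:v:0",        # video from input 0
        "-map", "1:a:0",        # audio from input 1
        "-c:v", "copy",         # pass the video through without re-encoding
        "-c:a", audio_codec,    # encode the master for the output container
        out_path,
    ]


print(shlex.join(mux_cmd("concert.mp4", "master.wav", "concert_remastered.mp4")))
# → ffmpeg -i concert.mp4 -i master.wav -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac concert_remastered.mp4
```

Copying the video stream matters here: re-encoding it isn't needed and would only cost quality and time, since the base audio was already in sync with it.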
Hope this helps.