OK, here's my deal. I had a video of a band (not the White Stripes) at a venue (not the Fillmore) that I also had a separate (not stealth) recording of. I wanted to strip the audio from the video file and replace it. What I ended up having to do was use the original camera audio as a time template, if you will.
For you, choose one of your sources to be the "master track". Overlay the other track on top and see where you are at. For me, the best solution was to split the "secondary" track into individual songs (or manageable chunks). I then compressed each song in time a little bit (LITTLE) and matched it up against the master (drum strikes work well, starts of verses, something like that). Then do the same for the next song. Make sure that any overlap or gap occurs during crowd noise, where it is easy to mask (e.g., take 1/4 second of noise before the gap, reverse it, and paste it over the gap; the levels then match perfectly on one side).
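For anyone who would rather script the gap trick than do it by hand in an editor, here is a rough sketch of the "reverse the noise before the gap" idea. It assumes the audio is already loaded as a plain list of samples; the function name and arguments are made up for illustration and do not come from any particular audio tool.

```python
def fill_gap_with_reversed_noise(samples, gap_start, gap_len):
    """Return a copy of samples with the gap overwritten by mirrored audio.

    samples   -- list of sample values (assumed mono, already decoded)
    gap_start -- index of the first sample of the gap
    gap_len   -- number of samples in the gap (e.g. 1/4 second's worth)
    """
    if gap_len <= 0:
        raise ValueError("gap_len must be positive")
    if gap_len > gap_start:
        raise ValueError("not enough audio before the gap to mirror")
    if gap_start + gap_len > len(samples):
        raise ValueError("gap runs past the end of the audio")

    # Take the audio immediately before the gap and reverse it.
    source = samples[gap_start - gap_len:gap_start]
    mirrored = source[::-1]

    # Paste the reversed audio over the gap. The first pasted sample equals
    # the last sample before the gap, so the level matches on that side.
    patched = list(samples)
    patched[gap_start:gap_start + gap_len] = mirrored
    return patched


# Toy example: zeros stand in for the gap.
audio = [1, 2, 3, 4, 0, 0, 0, 5, 6]
print(fill_gap_with_reversed_noise(audio, gap_start=4, gap_len=3))
# [1, 2, 3, 4, 4, 3, 2, 5, 6]
```

Only the leading edge is guaranteed to match; the trailing edge may still want a short crossfade, which this sketch does not attempt.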
This is the only way I could do it, as the result had to be exactly the same duration as the original, or it would not have worked. The result was good, and the only synch problems with the video were there to begin with.
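If it helps, the "how much is a LITTLE" question can be worked out from two matching landmarks per song (say, the first and last drum strike) found in both tracks. This is only a sketch of the arithmetic, not what any specific editor does; the function name and the timestamps in the example are hypothetical.

```python
def stretch_ratio(master_start, master_end, secondary_start, secondary_end):
    """Ratio to scale the secondary chunk's length by so it matches the master.

    All arguments are times in seconds of the same two landmarks, as located
    in the master track and in the secondary track. A result below 1.0 means
    the secondary chunk has to be compressed; above 1.0, stretched.
    """
    master_len = master_end - master_start
    secondary_len = secondary_end - secondary_start
    if master_len <= 0 or secondary_len <= 0:
        raise ValueError("end landmark must come after start landmark")
    return master_len / secondary_len


# Hypothetical song: 240.0 s between landmarks on the camera audio,
# 240.6 s between the same landmarks on the separate recording.
ratio = stretch_ratio(10.0, 250.0, 12.5, 253.1)
print(f"scale secondary length by {ratio:.5f} "
      f"({(ratio - 1) * 100:+.3f}% change)")
```

Doing this per song rather than once for the whole show keeps any drift from piling up, and whatever rounding is left over can be absorbed in the crowd noise between songs.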
YMMV,
-UJ