I do all this manually myself... never used any software assistance, but you should NOT be time stretching or shrinking the actual video, as that will be very noticeable... only the separate audio.
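For anyone who wants a number to type into the audio editor's stretch dialog, here is a rough sketch of the arithmetic, not anyone's actual workflow from this thread. It assumes you have found the same two events (say, an early and a late drum hit) in both the camera audio and the separate recording. The function name and the example times are made up for illustration.

```python
def stretch_ratio(cam_start, cam_end, ext_start, ext_end):
    """Ratio to apply to the external audio's length so that it spans
    the same time as the camera audio between two sync points.

    All arguments are times in seconds of the same two events,
    located once in the camera file and once in the external file.
    """
    cam_span = cam_end - cam_start
    ext_span = ext_end - ext_start
    if cam_span <= 0 or ext_span <= 0:
        raise ValueError("end point must come after start point")
    return cam_span / ext_span


# Hypothetical numbers: the two hits are 3600.00 s apart on the camera
# but 3600.18 s apart on the external recorder.
ratio = stretch_ratio(12.50, 3612.50, 4.00, 3604.18)
print(f"stretch external audio to {ratio * 100:.4f}% of its length")
```

A ratio below 1 means the separate audio runs long and needs shrinking; above 1 means it needs stretching. The video is left alone either way.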
I also do it manually ... mostly since that's the way I've always done it for the past 20+ years. But there's another reason for doing it manually that I don't see people mention much: the audio and video in the original video aren't always in sync. In fact, if you're any distance from the stage, they definitely aren't going to be in sync. So after you sync the two audios, you also have to sync the finished audio to the video. If it's off by just 1/10th of a second, you will hear the drum beat when the drummer's sticks are above his head. HINT: sync by watching the drummer's stick hitting the drum - do not try to sync just by watching people's lips (while singing) or hands (while strumming a guitar). To get the exact moment the drummer hits the drum, pause right before the drum hit, and use the F key to move forward one frame at a time (works in most video editing programs).
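To put the 1/10th of a second in terms of frame steps, here is a small sketch of the conversion. The frame rates are just common examples, not something stated above, and the function names are hypothetical.

```python
def offset_in_frames(offset_seconds, fps):
    """How many video frames a given audio/video offset covers."""
    return offset_seconds * fps


def frame_duration_ms(fps):
    """Length of one frame in milliseconds, i.e. the finest step
    you get when moving forward one frame at a time."""
    return 1000.0 / fps


# Example frame rates only; check what your camera actually recorded.
for fps in (24, 30, 60):
    frames = offset_in_frames(0.1, fps)
    step = frame_duration_ms(fps)
    print(f"{fps} fps: 0.1 s is about {frames:.1f} frames, "
          f"one frame step is about {step:.1f} ms")
```

So at 30 fps an error of 1/10th of a second is three full frame steps, which is why the drum hit makes it so easy to see.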
Having said all that ... now that everything is digital, syncing is a lot easier. With analog sources you have to deal with drift (both audio and video) that can fluctuate throughout the recording, etc.
Excellent point. For handheld video with a built-in mic, the further the camera is from the sound source, the longer the delay between picture and sound, but our brains make sense of it. If there is nearby crowd noise, I would be very hesitant to try to sync sound directly to picture as one might do with a broadcast quality source.
When picture leads sound, our brains interpret that as distance.
When sound leads picture it just looks WRONG.
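A rough sketch of how much delay distance adds, assuming sound travels about 343 m/s (dry air at around 20 °C; the real figure shifts a little with temperature). The distances are made-up examples and the function name is hypothetical.

```python
# Assumption: speed of sound in dry air at roughly 20 °C.
SPEED_OF_SOUND_M_PER_S = 343.0


def acoustic_delay_ms(distance_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Milliseconds by which sound lags the picture for a mic
    sitting distance_m metres from the sound source."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m / speed * 1000.0


# Example distances from the stage, in metres.
for d in (5, 15, 30, 60):
    print(f"{d} m from the source: sound arrives about "
          f"{acoustic_delay_ms(d):.0f} ms after the picture")
```

Under that assumption, a mic about 34 m back is already a full 1/10th of a second behind the picture.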
What I do is process the video with whatever captions and end fades I like, then import the audio from that into my workstation software to make a cut of the better audio to match. Then just replace the phone sound with the Neumanns or whatever mix I've been able to come up with.
Like this: iPhone sound replaced with KM140s dead center, slightly FoB; camera and mics were very near each other.
https://archive.org/details/KingGizzard2024-11-04-video