On point 1)-
Time synchronization issues manifest in two ways:
A) Time-offset difference- a difference in the time-alignment of the same events as represented across multiple sources. On playback, the time-offset will remain constant throughout the recording, unless there also happens to be a clock-rate difference between recorders (and unless the source or microphones are moved while recording). If recorded using a single multichannel recorder, or multiple devices which share the same clock data, the time-offset at the end of the recording will be exactly the same as the time-offset at the beginning. There is offset, but no "drift".
B) Clock-rate difference- a difference between the nominal sampling rate at which the recordings were made and the actual clock rates at which the two sources were sampled, when recording the same events to separate recorders that do not share the same clock data. For example, two recorders writing 48kHz audio files will sample the analog audio signals very close to, but not quite exactly, 48,000 times per second. In actuality, the clock of one recorder will run ever so slightly slower than the other. Like a time-offset, such a clock-rate difference should remain constant throughout the recording (hopefully! If not, it would be far more challenging to correct*). On playback, that slight difference between recording clocks manifests as a cumulative "additive" time difference when both sources are reconstructed at their nominal 48kHz rate using a single playback clock. If the two sources are time-aligned at the start and then played back using a single playback clock, the slight difference between their actual recording sample rates will cause them to slowly "drift" out of synchronization as the files play. The longer the recording, the further out of sync the two sources drift by the end.
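To put rough numbers on it, here's a quick sketch of how a tiny clock-rate difference accumulates. The clock rates below are invented for illustration (roughly a 50 ppm error, which is plausible for consumer gear, not a measured value):

```python
# Sketch (hypothetical numbers): how a small clock-rate difference
# accumulates into audible drift over a long recording.

NOMINAL_RATE = 48_000.0    # both files are labeled 48 kHz
ACTUAL_RATE_B = 48_002.4   # recorder B's clock runs ~50 ppm fast (assumed)

def drift_seconds(duration_s: float) -> float:
    """Cumulative drift of source B relative to a perfectly clocked
    source A, after duration_s of recording, once both files are
    played back at the nominal rate."""
    samples_written = duration_s * ACTUAL_RATE_B   # samples B actually wrote
    playback_s = samples_written / NOMINAL_RATE    # time those take to play
    return playback_s - duration_s

for minutes in (1, 30, 120):
    d = drift_seconds(minutes * 60)
    print(f"{minutes:4d} min -> {d * 1000:7.1f} ms of drift")
```

A one-minute soundcheck clip drifts only a few milliseconds, which is inaudible; a two-hour show drifts hundreds of milliseconds, an obvious echo.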
It's frequently the case that only a time-offset exists and needs correction. When a sample-rate difference also exists, there will be a time-offset as well.
For concert tapers, a time-offset between sources is most commonly caused by the difference between the speed of sound through air and that of an electrical signal through a wire. Mics placed further away, out in the audience, pick up a sound significantly later than mics on stage. But there are other sources of time-offset too. Even when recording simultaneously to several recorders which are "clock-linked" so as to operate from the same sample clock, the different recordings may not start and stop at exactly the same time; a recordist may need to push record and stop separately on each machine. The resulting sources will have been sampled at exactly the same clock rate, but they will need to be time-aligned later.

Time-offset is corrected by offsetting one source in time with respect to the other. That can be achieved by starting playback of each at exactly the right time, with a delay line, or, most commonly and most easily, by shifting one source along the playback timeline with respect to the other in an audio editing program. By ear, it's most easily done by listening to segments containing sparse speech (stage announcements and banter) or sparse, sharp transient sounds like a single clap, snare or drum hit, or the like. Zooming in on the waveforms, one may be able to see and align the peak of the transient. Either way, it's easiest when the event being used stands out clearly from the background sounds, and difficult to impossible during dense passages. Although there are exceptions, ordinarily the goal is to shift one source in time with respect to the other until a transient representing the same event occurs simultaneously in both, sounding clear and concise with no echo or blur.
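For what it's worth, the "align a sharp transient" trick can also be automated with cross-correlation. This is just a sketch using numpy (the function name and toy click signals are mine, not from any particular tool); for real multi-minute excerpts you'd want an FFT-based correlation for speed:

```python
# Sketch: estimating the time-offset between two sources by
# cross-correlating a short excerpt containing a sharp transient.
# Signals are mono float arrays at the same nominal sample rate.
import numpy as np

def estimate_offset(ref: np.ndarray, other: np.ndarray, rate: int) -> float:
    """Return the lag in seconds by which `other` trails `ref`."""
    corr = np.correlate(other, ref, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(ref) - 1)
    return lag_samples / rate

# Toy demo: a click at 0.10 s in one source, 0.35 s in the other.
rate = 8_000
a = np.zeros(rate)                 # one second of silence
b = np.zeros(rate)
a[int(0.10 * rate)] = 1.0          # on-stage mics hear the click first
b[int(0.35 * rate)] = 1.0          # audience mics hear it 250 ms later
print(f"offset: {estimate_offset(a, b, rate) * 1000:.1f} ms")
```

The estimated lag is then the amount to slide the audience source earlier on the timeline.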
A sample-rate difference occurs when recording to two or more separate, non-clock-linked recorders. In that case, each recorder digitally samples the signals it is converting based on its own local clock. The two clock rates may or may not be particularly close, but in either case, if the recording runs long enough there will be some measurable difference between them, in addition to the time-offset between the starting and stopping points of the recordings. This "drift" is sometimes so minimal it doesn't require correction. Other times it causes a significantly audible effect which grows more severe, until it becomes a blur and eventually an obviously discernible echo.

There are multiple ways of correcting for this. One older, less than optimal method was to divide both sources into single-song files, then individually align each song at its beginning, the hope being that the drift does not become audible before each song ends and the next begins. The more appropriate, modern way of addressing the problem is to modify the time-basis of one source with respect to the other. In effect, the length of one source is either "stretched" or "shrunk" to match that of the other. In some software that's done by entering a new time-length for the file, after doing some work with a calculator to figure out the correct value. In others it's done by entering a percentage or +/- value into a stretch/shrink function, by dragging the duration envelope of one source, or by manipulating a parameter value for the source object.
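As an example of that calculator work (all numbers invented for illustration): suppose that after aligning the starts, the same transient falls at 5400.00 s in source A but 5400.27 s in source B, so B runs "long" and must be shrunk slightly:

```python
# Sketch: deriving the stretch ratio for drift correction from one
# measurement near the end of the recording. Values are illustrative.
t_a = 5400.00   # time of a late transient in source A (the reference)
t_b = 5400.27   # time of the same transient in source B

ratio = t_a / t_b                  # factor to apply to B's duration
print(f"ratio  : {ratio:.8f}")
print(f"percent: {(ratio - 1) * 100:+.5f}%")

new_len_b = 6000.0 * ratio         # if B's full file is 6000 s long
print(f"a 6000 s file becomes {new_len_b:.3f} s")
```

The further apart the two measured transients are, the more accurate the ratio, which is why it's worth finding an alignment point as late in the recording as possible.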