I've just tried my luck with Audalign (https://github.com/benfmiller/audalign/). There's good news and bad news. The good news is that it works perfectly. The bad news: it is not trivial to install.
The installation instructions on the project page should be taken with a grain of salt. If you install the whole thing with a current Python version (e.g. under a current Anaconda on Windows or Python 3.12 or 3.13 on Linux), you will encounter errors because some versions of the libraries used are no longer compatible with each other.
The last Audalign update, released a year ago, targets Python 3.11.
The solution is to set up a separate Python 3.11 environment (I used pyenv) and install the modules there. I did this on Linux; it should work similarly on Windows:
pyenv install 3.11
/home/markus/.pyenv/versions/3.11.10/bin/python3.11 -m pip install --upgrade pip
/home/markus/.pyenv/versions/3.11.10/bin/pip3 install audalign
/home/markus/.pyenv/versions/3.11.10/bin/pip3 install audalign[visrecognize]
Do not install audalign[noisereduce]. It will pull in some *big* Nvidia-specific modules that would probably require an Nvidia graphics card and drivers.
This is a small sample Python script that aligns two files located in the folder "bak" and places the aligned files in a directory called "destination":
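Since everything above hinges on the interpreter version, a small guard at the top of your script can fail early with a readable message instead of an obscure dependency error. This is only a sketch: the names `REQUIRED`, `python_matches` and `require_python` are made up for illustration, and the assumption that only 3.11 works is taken from my experience described above, not from the Audalign documentation.

```python
import sys

# Assumption: 3.11 is the version that worked for me; adjust if that changes.
REQUIRED = (3, 11)


def python_matches(version_info=sys.version_info, required=REQUIRED):
    """Return True if (major, minor) of the interpreter equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


def require_python(version_info=sys.version_info, required=REQUIRED):
    """Raise RuntimeError with a readable message on a version mismatch."""
    if not python_matches(version_info, required):
        found = ".".join(str(part) for part in version_info[:2])
        wanted = ".".join(str(part) for part in required)
        raise RuntimeError(
            f"This script expects Python {wanted}, but is running under {found}."
        )
```

Calling `require_python()` before `import audalign` in your own script should be enough to catch the case where the wrong interpreter was picked up.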
import audalign as ad

# Create a recognizer (in this case, using the FingerprintRecognizer)
fingerprint_rec = ad.FingerprintRecognizer()

# Optionally, configure the recognizer
fingerprint_rec.config.set_accuracy(3)

# Align files in a folder ("target/folder/" is a placeholder; adjust or remove)
results = ad.align("target/folder/", recognizer=fingerprint_rec)

# Or align specific files
results = ad.align_files(
    "bak/20160117_CHVE_44.1Khz-16bit.wav",
    "bak/AUD.wav",
    destination_path="destination/",
    recognizer=fingerprint_rec,
)

# For fine-tuning the alignment
fine_results = ad.fine_align(
    results,
    recognizer=ad.CorrelationSpectrogramRecognizer(),
)
Call the script with the Python 3.11 interpreter set up above:
/home/markus/.pyenv/versions/3.11.10/bin/python ./align_files.py
The output should look similar to the following:
Directory contains 0 files or could not be found
No matches detected
0 out of 0 found and aligned
Fingerprinting 20160117_CHVE_44.1Khz-16bit.wav
Fingerprinting AUD.wav
Finished fingerprinting 20160117_CHVE_44.1Khz-16bit.wav
Finished fingerprinting AUD.wav
20160117_CHVE_44.1Khz-16bit.wav: Finding Matches... Aligning matches
AUD.wav: Finding Matches... Aligning matches
Writing destination/20160117_CHVE_44.1Khz-16bit.wav
Writing destination/AUD.wav
Writing destination/total.wav
2 out of 2 found and aligned
Total fingerprints: 10395127
Fine Aligning...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3480.75it/s]
Comparing AUD.wav against 20160117_CHVE_44.1Khz-16bit.wav...
Comparing 20160117_CHVE_44.1Khz-16bit.wav against AUD.wav...
Calculating correlation... Finding Local Maximums... done
Calculating correlation... Finding Local Maximums... done
2 out of 2 found and aligned
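The first three lines of the output above ("Directory contains 0 files ...") presumably come from the placeholder "target/folder/" call in the script, which points to a folder that does not exist on my machine. If you want to rule out path mistakes before the fingerprinting starts, something like the following sketch could help. The helper names `missing_inputs` and `align_pair` are mine, not part of Audalign; the only Audalign calls used are the ones from the script above. Creating the destination folder up front is a precaution; I have not checked whether Audalign does that by itself.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the given paths (as strings) that are not existing files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


def align_pair(file_a, file_b, destination="destination/"):
    """Check both inputs, then align them with the fingerprint recognizer."""
    missing = missing_inputs([file_a, file_b])
    if missing:
        raise FileNotFoundError("Input file(s) not found: " + ", ".join(missing))

    # Precaution (assumption): make sure the output folder exists.
    Path(destination).mkdir(parents=True, exist_ok=True)

    # Imported here so the path check above runs before the slow import.
    import audalign as ad

    recognizer = ad.FingerprintRecognizer()
    recognizer.config.set_accuracy(3)
    return ad.align_files(
        file_a,
        file_b,
        destination_path=destination,
        recognizer=recognizer,
    )
```

With this, `align_pair("bak/20160117_CHVE_44.1Khz-16bit.wav", "bak/AUD.wav")` would stop with a FileNotFoundError naming the missing file instead of reporting "0 out of 0 found and aligned".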
I've loaded the original files in Reaper. You can see that the original files (track 1 and 2) are unaligned.
The two tracks taken from the "destination" directory are aligned though. One of the files (track 2) has been padded with silence at the beginning.
Again, this is for "track align" purposes. For phase alignment, you might want to use one of the phase alignment plugins that have been discussed earlier in this thread.
Oh, and unless I've done something wrong, the output in destination appears to be in mono (which would be backed by the animated GIF / demo on the project website; I'm going to contact the author about it anyway). You might want to use mono stems as input. This could also be scripted:
ffmpeg -i input.wav -filter_complex "[0:a]channelsplit=channel_layout=stereo[left][right]" -map "[left]" left.wav -map "[right]" right.wav
To join two mono channels back to stereo:
ffmpeg -i left.wav -i right.wav -filter_complex "[0:a][1:a]amerge=inputs=2[a]" -map "[a]" output_stereo.wav
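If you would rather stay in Python than shell out to ffmpeg, the same split and join can be sketched with the standard library's `wave` module. Take this as a rough illustration and not a replacement for ffmpeg: it only handles plain uncompressed PCM WAV files (no float or compressed formats), reads the whole file into memory, and the function names `split_stereo` and `merge_mono` are my own.

```python
import wave


def split_stereo(src, left_path, right_path):
    """Split a 2-channel PCM WAV file into two mono WAV files."""
    with wave.open(src, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError(f"{src} is not a stereo file")
        width = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    # One stereo frame is a left sample followed by a right sample.
    step = 2 * width
    left = b"".join(frames[i:i + width] for i in range(0, len(frames), step))
    right = b"".join(frames[i + width:i + step] for i in range(0, len(frames), step))

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(data)


def merge_mono(left_path, right_path, dst):
    """Interleave two mono PCM WAV files into one stereo WAV file."""
    with wave.open(left_path, "rb") as lw, wave.open(right_path, "rb") as rw:
        if lw.getnchannels() != 1 or rw.getnchannels() != 1:
            raise ValueError("both inputs must be mono")
        if (lw.getsampwidth(), lw.getframerate()) != (rw.getsampwidth(), rw.getframerate()):
            raise ValueError("sample width and sample rate must match")
        width = lw.getsampwidth()
        rate = lw.getframerate()
        left = lw.readframes(lw.getnframes())
        right = rw.readframes(rw.getnframes())

    # Truncate to the shorter input so every frame has both channels.
    usable = min(len(left), len(right))
    usable -= usable % width
    frames = b"".join(
        left[i:i + width] + right[i:i + width] for i in range(0, usable, width)
    )

    with wave.open(dst, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(frames)
```

Usage would mirror the two ffmpeg commands: `split_stereo("input.wav", "left.wav", "right.wav")` before aligning, and `merge_mono("left.wav", "right.wav", "output_stereo.wav")` afterwards.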
Alternatively, there is an internal workaround with Audalign:
https://github.com/benfmiller/audalign/issues/67
I'll try to come up with a Python script tailored to "our" workflow when I have time.