It’s REALLY hard to get conversation out of a recording, as it’s not sharp and sudden and it sits in the same frequency range as the music.
When it’s mostly on one channel, the first thing I do is subtly swap channels via long fades. The image will collapse toward mono, which is noticeable on headphones but sometimes not noticeable at all on speakers.
Perhaps try some combination of the above: first reduce it on the one channel as much as possible, then swap the channels as little as possible to maintain a little bit of image. The slight channel swap also partially covers up the manual work you did to remove the talker, which is not without consequences for the rest of the recording.
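If you’d rather script the swap than draw the fades by hand, here is a rough sketch of what I mean, assuming the audio is already loaded as a float NumPy array of shape (samples, 2). The function name `patch_channel` and its parameters (`fade`, `bad`, `depth`) are made up for illustration, and the raised-cosine fade is just one reasonable choice, not what any particular editor does.

```python
import numpy as np

def patch_channel(stereo, start, end, fade, bad=0, depth=1.0):
    """Fade the good channel into the bad one between start and end.

    stereo: float array, shape (n_samples, 2)
    start, end: sample indices of the region to cover (end > start)
    fade: fade length in samples on each side (must be > 0)
    bad: index of the channel with the talking (0 = left, 1 = right)
    depth: 1.0 = full swap (mono in the region), lower keeps some image
    """
    n = stereo.shape[0]
    good = 1 - bad
    idx = np.arange(n)
    # Linear ramp: 0 before the fade-in, 1 inside the region, 0 after the fade-out.
    lin = np.interp(idx, [start - fade, start, end, end + fade], [0.0, 1.0, 1.0, 0.0])
    # Raised-cosine shape so the fade has no abrupt corners, scaled by depth.
    env = depth * (0.5 - 0.5 * np.cos(np.pi * lin))
    out = stereo.astype(float).copy()
    # Crossfade the bad channel toward the good one; the good channel is untouched.
    out[:, bad] = (1.0 - env) * stereo[:, bad] + env * stereo[:, good]
    return out
```

With `depth=1.0` the region goes fully mono; something lower keeps a bit of the image at the cost of leaving some of the talking in, so it works best after you’ve already reduced it by hand.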