Agree with Bri: who cares if the clapping is clipped? Set your levels based on the music.
I should clarify. While I don't care about reducing dynamics by compressing an uproarious audience between songs, I do care about clipping the audience. Two reasons:
<1> While clapping itself is often transient enough that it doesn't clip in a nasty, crunchy fashion, audience hoots and hollers definitely can. Once they clip, there's no going back, and IME there's typically no good way to recover. Sure, some clip restoration tools help, but in cases like these I find they only minimize the audible effects rather than remove the crunchiness. Even during audience sections, I find crunchy clipping unpleasant.
<2> IME, even fast transients that clip may pose problems during editing. For example, compressing a clipped audience section sometimes produces a distorted result. I haven't found this happens every time, but often enough that I try not to clip the audience at all.
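For anyone who wants to check a recording for this before editing, here's a rough sketch of how one might flag clipped stretches in 16-bit samples. The function name `find_clipped_runs`, the `min_run` threshold of 3, and the idea that a few consecutive full-scale samples means clipping are all my own assumptions for illustration, not how any particular editor or restoration tool does it.

```python
def find_clipped_runs(samples, full_scale=32767, min_run=3):
    """Return (start_index, length) for each run of at least `min_run`
    consecutive samples at or beyond +/- full_scale.

    Assumes `samples` is a sequence of signed 16-bit integers.
    A single full-scale sample is treated as a legitimate peak;
    several in a row are treated as a likely flat-topped (clipped) peak.
    """
    runs = []
    start = None  # index where the current full-scale run began
    for i, s in enumerate(samples):
        if abs(s) >= full_scale:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Handle a run that extends to the end of the data.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs


# Made-up sample data: one 4-sample clip, one 2-sample peak (ignored),
# and one 3-sample clip at the very end.
example = [0, 1000, 32767, 32767, 32767, 32767, 2000,
           -32768, -32768, 100, 32767, 32767, 32767]
print(find_clipped_runs(example))
```

Short clapping transients would tend to show up as brief runs, while a hollering audience would more likely produce longer ones, though where to draw that line is a judgment call.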
Of course, I ran into those two issues only after running my share of 16-bit recordings with levels set for the music, audience clipping be damned. Given the above, I eventually found the results too inconsistent, so I started setting levels for the audience and adjusting in post. That meant capturing the music at lower resolution than I would have had I set levels for the music (and clipped the audience), but at least I didn't risk nasty artifacts scattered throughout my recordings (if only during the audience sections).
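Recording music in situations where the audience produces higher SPLs than the music certainly poses its challenges at 16-bit. Fortunately, it's not really an issue at 24-bit, where there's enough extra dynamic range to leave headroom for the audience without giving up meaningful resolution on the music.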
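To put rough numbers on the 16-bit vs 24-bit tradeoff, here's a back-of-the-envelope sketch. It uses the standard rule of thumb that each bit is worth about 6.02 dB (20*log10(2)) of theoretical dynamic range. The 12 dB figure for how much louder the audience peaks than the music is purely a hypothetical for illustration, and real converters don't reach the theoretical 24-bit range anyway.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit


def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of linear PCM at the given bit depth."""
    return bit_depth * DB_PER_BIT


def effective_bits(bit_depth, headroom_db):
    """Bits left for the music when its peaks sit `headroom_db`
    below full scale to make room for a louder audience."""
    return bit_depth - headroom_db / DB_PER_BIT


AUDIENCE_OVER_MUSIC_DB = 12  # hypothetical: audience peaks 12 dB above the music

for depth in (16, 24):
    print(
        f"{depth}-bit: {dynamic_range_db(depth):.1f} dB theoretical range, "
        f"about {effective_bits(depth, AUDIENCE_OVER_MUSIC_DB):.1f} bits "
        f"left for the music"
    )
```

Under that assumption, leaving room for the audience costs about two bits either way. Two bits off 16 is noticeable; two bits off 24 still leaves more resolution than a 16-bit recording has to begin with.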