I record a lot of stuff like this. My preference is making natural-sounding recordings, which "teleport the listener to that time and place" and are as enjoyable for me (and by extension, others) to listen to as possible. Although a listener is welcome to listen to a single song or a handful of them in isolation if they prefer, my aim is to make the entire concert one seamless and enjoyable listening experience. My goal is not to record a bunch of individual songs with audience noise between them; I'm recording the entire event, and I feel strongly that the entire thing should be as enjoyable to listen to as I can make it.
If that's your goal as well, set your levels so that the loudest sounds recorded will not clip. It doesn't matter whether the loudest sounds come from the musicians or the audience. For events with this kind of SPL, I determine an appropriate recording level beforehand at home: I gear up, clap loudly myself, and set levels so that only my loudest clapping clips. I then don't have to worry about monitoring levels at all during the recording. Although I could engage a limiter on the recorder as a secondary safety, I don't, because I don't need it and because limiter circuits have a history of affecting the sound even when they are not active. That may or may not be true of digital limiters, but either way I've set levels conservatively enough that nothing clips except, very occasionally and only slightly, an overly loud burst of applause or an ear-piercing whistle from someone very close by.
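If you want to check a test recording of your own clapping against this rule, a minimal Python sketch like the following reports the peak level in dBFS and counts samples sitting at full scale. It assumes the audio has already been decoded into floating-point samples normalized to [-1.0, 1.0]; the function names are hypothetical, and counting samples pinned at full scale is only a rough proxy for clipping.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS for float samples normalized to [-1.0, 1.0]."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # pure silence has no finite peak level
    return 20 * math.log10(peak)

def count_clipped(samples, threshold=1.0):
    """Count samples at or beyond full scale (treated here as clipped)."""
    return sum(1 for s in samples if abs(s) >= threshold)
```

For example, a peak of 0.5 works out to roughly -6.02 dBFS, and any nonzero count from `count_clipped` on the clapping test means the level is set at, or past, the point where the loudest claps clip.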
I can think of two reasons someone might want to set levels higher than that, which would purposefully allow the applause to clip but not the music:
1) If the self-noise floor of the recording rig is higher than the noise floor of the venue, and boosting levels later raises that noise floor to a point where it becomes more subjectively objectionable than the hard clipping of the applause peaks.
2) If the recordist doesn't care to go through the effort of optimizing the playback level and dynamics of the recording later, preferring instead to try to get the average level of the music high enough by the initial recording level setting alone, even though there is not enough headroom to keep the applause from clipping (and sounding less natural, more bothersome and more fatiguing, regardless of dynamics manipulation which might be done to it later).
Those are both logical reasons for letting the applause clip, but neither applies to me. I optimize the recording level for the recording itself. I optimize the overall level of the file, and the dynamics within that range, after the recording has been made, when I have far more control over those things.
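The simplest form of that after-the-fact level adjustment can be sketched in Python. This is a minimal illustration only, assuming float samples in [-1.0, 1.0] and a hypothetical target peak of -1 dBFS; it applies a single fixed gain (peak normalization), not any particular dynamics processing. It also shows why reason 1 above matters: a fixed gain raises the noise floor by exactly the same number of dB as it raises the music.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS for float samples normalized to [-1.0, 1.0]."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)

def normalize_peak(samples, target_dbfs=-1.0):
    """Scale samples so the loudest peak lands at target_dbfs.

    Returns (new_samples, gain_db). Silence is returned unchanged.
    """
    current = peak_dbfs(samples)
    if current == float("-inf"):
        return list(samples), 0.0
    gain_db = target_dbfs - current
    factor = 10 ** (gain_db / 20)  # convert dB gain to a linear multiplier
    return [s * factor for s in samples], gain_db

def raised_noise_floor(noise_floor_dbfs, gain_db):
    """A fixed gain moves the noise floor up by the same number of dB."""
    return noise_floor_dbfs + gain_db
```

For instance, if a conservatively set recording peaks around -12 dBFS, normalizing it to -1 dBFS applies about 11 dB of gain, and a rig self-noise floor that sat at some level before is now 11 dB higher too. Whether that is more objectionable than clipped applause is exactly the trade-off described in reason 1.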