This is a somewhat misleading comparison.
Take an identical recording made with the X/Y mic attachment and convert that file from its native Left/Right format to Mid/Side. The resulting Mid signal will sound far clearer and more direct than the Side signal, which will sound distant and reverberant, just like a native M/S recording.
Nothing stops one from adjusting the Mid/Side balance of an L/R recording just as one does for an M/S recording. I encourage doing just that to achieve the best possible image width and reverberant balance from an X/Y recording. Consider it a necessary step with a native M/S recording and an optional step with an X/Y recording.
Remember that as coincident configurations, X/Y and M/S are mathematically identical and interchangeable:
Mid = sum of Left and Right channels
Side = difference of Left and Right channels
Left = sum of Mid and Side channels
Right = difference of Mid and Side channels
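As a rough illustration of the four relations above, here is a minimal Python sketch. The function names are made up for this example, and the 0.5 scaling on the encode side is an assumption on my part: it is one common convention, used here so that converting to M/S and back returns the original levels rather than doubling them.

```python
def lr_to_ms(left, right):
    """Encode Left/Right sample lists into Mid/Side.

    Assumption: the 0.5 factor is a scaling convention (not part of the
    relations as stated above) so that the round trip is unity gain.
    """
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid, side):
    """Decode Mid/Side sample lists back into Left/Right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A few hypothetical sample values, just to show the round trip.
left_in = [0.5, -0.25, 1.0]
right_in = [0.25, 0.75, -1.0]

mid, side = lr_to_ms(left_in, right_in)
left_out, right_out = ms_to_lr(mid, side)
```

Running the same pair of functions in either direction is all that "interchangeable" means here: no information is gained or lost by the conversion.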
For an optimized comparison based solely on the quality of the resulting audio, make two identical recordings, with the only difference between them being the mic attachment used. Then adjust the Mid/Side ratio of each recording to achieve the best balance in each case before making the judgement call. With well-behaved capsules in good implementations, the main difference at identical M/S ratios may be that the X/Y config is probably crossed cardioids with a 90 degree angle between them, while the Mid/Side attachment probably decodes to crossed supercardioids with the same angle. Preferring one over the other based on convenience and not needing to do any ratio adjustment is perfectly valid of course, but that is "better" in terms of ease of use rather than "better" in terms of audio quality.
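To make the ratio adjustment concrete, here is a small Python sketch of changing the Mid/Side balance of an ordinary L/R recording. The function name, the gain parameters, and the 0.5 encode scaling are all assumptions for illustration, not anything specific to either attachment or to any particular editor.

```python
def adjust_ms_balance(left, right, mid_gain=1.0, side_gain=1.0):
    """Rebalance Mid and Side in a Left/Right recording.

    Each sample pair is encoded to M/S, the two gains are applied,
    and the result is decoded back to L/R. The 0.5 factor is an
    assumed scaling convention so that gains of 1.0 change nothing.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r) * mid_gain
        side = 0.5 * (l - r) * side_gain
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right


# Hypothetical sample values.
left_in = [0.5, -0.25, 1.0]
right_in = [0.25, 0.75, -1.0]

unchanged = adjust_ms_balance(left_in, right_in)                # same as input
mono = adjust_ms_balance(left_in, right_in, side_gain=0.0)      # Side removed
wider = adjust_ms_balance(left_in, right_in, side_gain=2.0)     # more Side
```

Raising side_gain relative to mid_gain widens the image and brings up the reverberant content; lowering it narrows the image toward mono. The same function works whether the file started life as X/Y or as decoded M/S.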
Because the two configurations are so nearly interchangeable, the most meaningful differences may come down to the mic capsules used in each of these attachments, and their off-axis behavior, rather than to the differences between coincident configurations. For all I know, all the capsules in both attachments may be identical. I know nothing about these attachments, but I suspect the Side channel of the M/S attachment may use back-to-back cardioid capsules with one connected in reverse polarity, rather than a true figure-8 capsule. If I were Zoom that's how I'd consider designing it, so as to use identical inexpensive inventory items and reduce cost. How well they differentially sum to a virtual figure-8 is important and will affect the resulting audio quality. Likewise, the off-axis behavior of the capsules is important because with M/S the Mid is pointed directly at the primary source, while with X/Y both channels are slightly off-axis to the source. Which is better in terms of resulting audio quality most likely depends more on the specifics of these implementations than on the configurations themselves.
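To show why the quality of that differential sum matters, here is a Python sketch using idealized textbook cardioids. Everything in it is an assumption for illustration: the function names, the angle convention (0 is straight ahead at the source, positive angles are toward the left), and the capsules being perfectly coincident and frequency-independent. It says nothing about how the actual attachment is built.

```python
import math


def cardioid(theta, axis):
    """Ideal cardioid sensitivity at angle theta for a capsule aimed at axis."""
    return 0.5 * (1.0 + math.cos(theta - axis))


def virtual_side(theta, right_gain=1.0):
    """Left-facing cardioid minus right-facing cardioid.

    right_gain models a hypothetical sensitivity mismatch between the
    two capsules; 1.0 means perfectly matched.
    """
    left_facing = cardioid(theta, math.pi / 2)
    right_facing = right_gain * cardioid(theta, -math.pi / 2)
    return left_facing - right_facing


# Matched capsules: the difference follows sin(theta), a sideways figure-8
# with a null pointing straight at the source.
matched_front = virtual_side(0.0)
matched_left = virtual_side(math.pi / 2)

# Mismatched capsules: the front null fills in, so some direct sound
# leaks into the Side channel.
mismatched_front = virtual_side(0.0, right_gain=0.9)
```

With matched ideal capsules the two omni components cancel and only the figure-8 term is left. With a mismatch, part of the omni component survives, which is one way a less-than-perfect virtual figure-8 can end up affecting the decoded stereo image.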