I'm still learning myself, but I think you've got your figures a little mixed up.
Assuming your mic capsules are at the top and the sound source is at the top (not the bottom), ORTF, DIN and DIN(a) all involve spacing your capsules apart and angling the mics in a sorta 'toes pointed outward' manner.
<Sound Source>
\ /
ORTF has 110 degrees of angle between the mics and the capsules are 17 cm apart.
DIN has 90 degrees of angle between the mics and the capsules are 20 cm apart.
DIN(a) also has 90 degrees of angle between the mics, but the capsules are 17 cm apart.
Your right-hand example, or 'toes pointed in' configuration, is called the X-Y or coincident mic'ing technique.
<Sound Source>
/ \
In this technique, the mics may also be set at a 90 degree or 120 degree angle, but the capsules are stacked so that one sits directly above the other.
The goal for ORTF, DIN, and DIN(a) is for the sound from the L and R speakers to arrive at the two capsules at slightly different times...and this is similar to how our ears work...there is a slight delay between the times that sound from a single point source gets to each ear. Our brain understands and interprets that delay. Incidentally, if you haven't figured out the logic, 17 cm to 20 cm is roughly the distance between most people's ears.
The goal for X-Y is to have the sound waves from both speakers arrive at both capsules at the exact same time. In this situation, there is no time delay on our final recording between the two mics (unless of course the sound is coming from above or below the mics).
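To put rough numbers on that arrival-time difference, here's a little Python sketch. It's only a back-of-the-envelope model and the assumptions are mine, not part of any standard: the source is far enough away that the sound arrives as a plane wave, the speed of sound is 343 m/s, and only the capsule spacing matters (the mic angle and pickup pattern are ignored). The function name arrival_delay_ms is made up for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 C


def arrival_delay_ms(spacing_cm, source_angle_deg):
    """Far-field arrival-time difference between two capsules, in ms.

    source_angle_deg is measured from straight ahead (0 = centre stage).
    Simplified model: only the capsule spacing is considered.
    """
    spacing_m = spacing_cm / 100.0
    # Extra distance the wavefront travels to reach the far capsule.
    path_difference_m = spacing_m * math.sin(math.radians(source_angle_deg))
    return 1000.0 * path_difference_m / SPEED_OF_SOUND_M_S


# Capsule spacings quoted in this post; X-Y is treated as 0 cm (coincident).
SPACINGS_CM = {"ORTF": 17, "DIN": 20, "DIN(a)": 17, "X-Y": 0}

if __name__ == "__main__":
    for name, spacing in SPACINGS_CM.items():
        delay = arrival_delay_ms(spacing, 30)
        print(f"{name:7s} source 30 degrees off centre: {delay:.3f} ms")
```

Under those assumptions, a source 30 degrees off centre reaches the two capsules of a 17 cm pair about a quarter of a millisecond apart, a 20 cm pair a bit more than that, and a coincident pair at the same instant.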
Generally, people on this list prefer the first three methods to X-Y because they sound more natural and can usually provide a better stereo image with nice separation if the PA is stereo mixed...or if there's no PA and you're just close to the stage and want to get the feel for...say...guitar left, drums center, and bass right. Having said this, there are plenty of X-Y lovers on this list, so as always, the best advice is that you should experiment with your own gear and come to your own conclusions.
The rule of thumb is that ORTF is most popular outdoors, where there's not as much reverberant sound, while DIN and DIN(a) are used indoors. DIN and DIN(a) have the narrower 90 degree angle, which compared with ORTF's wider angle would tend to minimize the amount of sound that's coming from bounced sources instead of directly from the stage. ORTF is also popular when you know that the venue has great sound...and you'd like to simulate the overall sound in the venue as closely as possible.
Worthy of mention is another technique called NOS. In NOS, the mics are also angled 90 degrees apart, but the capsules are 30 cm apart. I've never tried NOS, but my understanding is that this technique attempts to further accentuate stereo imaging, with the downside that many recordings will end up with a 'hole' in the middle.