Only the brown ones are different.
What makes the second test much harder is that the first test measures perception, while the second measures retention. To succeed in the latter, you need colour memory, not just colour perception.
And this is the problem with ABX testing in audio. There is no reasonable way to present two or more audio streams simultaneously, so all ABX protocols rely on sequential presentation of recordings. They test aural retention instead of perception, and aural retention in humans is notoriously weak. After a pause of just a tenth of a second, we cannot tell similar sounds apart at all.