I think we notice the eyes rather than the other features because we ourselves are so focused on eyes.
It needs to be trained on something other than dogs. It's all dogs, dogs, dogs. Maybe dogs were only a small part of its training, but the way we have bred dogs into all sizes and shapes has made the dog-pattern too broad; combine that with our own ability to recognize human faces and the role of dogs in our history and society (dogs are people too!), and you end up with dog faces everywhere.
The most you could say is that it hasn't seen enough human faces.
It's dogs and eyes. So many eyes.
This is interesting to me because human visual processing also devotes a disproportionate amount of real estate to eyes.
As the people who wrote the thing in the first place noted in their paper, in addition to dog faces and eyes, the google imagenet model seems to have an obsession with tropical birds, pagodas, waterfalls, and gothic cathedral latticework. In fact, the architectural features are a little bit more prevalent; these images are all the result of a starting image that's just randomly colored pixels:
This started out as a picture of the sky:
But people tend to take pictures of people and animals. And, having correctly recognized organic shapes associated with animals, deepdream proceeds to overfit: it makes animal faces look more like its conception of an average animal face (which appears to be equal parts cat, dog, and human) and makes animal orifices look more like its conception of the average animal orifice (an eye).
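For anyone who wants to poke at this themselves, here's a rough sketch of the kind of loop deepdream runs, using torchvision's GoogLeNet as a stand-in for the original Caffe inception model. The layer choice, step size, and iteration count below are my own guesses, and the real thing adds jitter, normalization, and multi-scale "octaves"; the idea is just to start from a photo or random noise and repeatedly nudge the pixels so a chosen layer fires harder on whatever it already thinks it sees.

```python
# Rough DeepDream-style sketch -- NOT the original Caffe code.
# Assumptions: torchvision's GoogLeNet stands in for the "inception" model;
# the layer (inception4c), step size, and step count are arbitrary choices.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1).eval()
for p in model.parameters():
    p.requires_grad_(False)  # we only want gradients on the image itself

# Capture the activations of one mid-level layer with a forward hook.
activations = {}
model.inception4c.register_forward_hook(
    lambda module, inputs, output: activations.update(target=output)
)

def dream(img, steps=20, lr=0.05):
    """Gradient ascent on the pixels: make the chosen layer fire harder."""
    img = img.clone().requires_grad_(True)
    for _ in range(steps):
        model(img)
        loss = activations["target"].norm()  # "amplify whatever you already see"
        loss.backward()
        with torch.no_grad():
            img += lr * img.grad / (img.grad.abs().mean() + 1e-8)
            img.grad.zero_()
            img.clamp_(0, 1)
    return img.detach()

# Start from randomly colored pixels (or load a sky photo instead).
start = torch.rand(1, 3, 224, 224)
TF.to_pil_image(dream(start).squeeze(0)).save("dream.png")
```

Feed it a sky photo instead of noise and you get the clouds-turning-into-animals effect described above; feed it noise and you get the pagodas-and-birds collages from the paper.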