Distance between two distributions

Today's diary entry is more of a chain of thoughts to help with my research on artificial neural networks:

After today's lab meeting, my task has become clearer. Basically, I need to figure out a very general method for defining the distance (difference?) between two distributions. It sounds easy, but when I try to apply the idea to neural networks, it is not immediately obvious how to do it. I'm not even sure how I should define the PDF for a layer of the model. Should it be the frequencies (density) of the nodes in a layer?

I can think of the nodes of a layer in the network as a vector, say Y = [y1, y2, ..., yn], and try to find the PDF of the y's inside Y. Simply put, I guess the question to answer is: how can we define PDF(Y2) - PDF(Y1), i.e. the distance between the distributions of two different Y's? If this could be resolved, it could be applied to unsupervised learning in neural networks by defining the error in this sense.
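
To make the question concrete, here is a minimal sketch of one possible candidate, not a settled answer. It assumes the PDF of a layer is estimated by a histogram of its node values, and it assumes the Jensen-Shannon divergence as the notion of "distance". Both are my own assumptions for illustration, and the names histogram_pdf, js_divergence and layer_distance are hypothetical helpers, not part of any library.

```python
import math
import random


def histogram_pdf(values, lo, hi, bins=20):
    """Estimate a discrete PDF by counting how many values fall in each bin."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the index so v == hi (or anything out of range) lands in an edge bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; it is positive wherever p or q is positive.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def layer_distance(y1, y2, bins=20):
    """Assumed 'distance' between the value distributions of two layer vectors."""
    lo = min(min(y1), min(y2))
    hi = max(max(y1), max(y2))
    if hi == lo:
        # Every value is identical, so the two histograms coincide.
        return 0.0
    # Both histograms share the same bin edges so they are comparable.
    p = histogram_pdf(y1, lo, hi, bins)
    q = histogram_pdf(y2, lo, hi, bins)
    return js_divergence(p, q)


# Toy stand-ins for two layers' activations (random data, not real network output).
random.seed(0)
Y1 = [random.gauss(0.0, 1.0) for _ in range(1000)]
Y2 = [random.gauss(3.0, 1.0) for _ in range(1000)]
print(layer_distance(Y1, Y1))  # identical vectors
print(layer_distance(Y1, Y2))  # shifted distribution
```

With log base 2 the value lies between 0 (identical histograms) and 1 (histograms with no overlap), and it is symmetric in Y1 and Y2. Note that the Jensen-Shannon divergence itself is not a true metric (its square root is), and that the result depends on the number of bins, so this is only a starting point for thinking about PDF(Y2) - PDF(Y1).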