In the construction of a probabilistic network from observed data, the fundamental assumption that the variables starting from the same parent are conditionally independent can be met by introduction of hidden node. The conditional probability matrices for the hidden node, linking observed nodes, can be determined by the gradient descent method. In our previous work in the field of medical diagnosis based on computer vision, we demonstrated that this technique does yield improved results, however the success depends on the choice of initial values for the conditional probabilities. In this paper we introduce methods that provide a good initial estimation of the hidden variables and improve our learning algorithm. With these improvements, we have found that the first solution obtained by the training algorithm is a near optimal solution, and we have eliminated the need for multiple restarts.