Emerging Web standards promise a network of heterogeneous yet interoperable Web Services. Web Services would greatly simplify
the development of many kinds of data integration and knowledge management applications. Unfortunately, this vision requires
that services describe themselves with large amounts of semantic metadata “glue”. We explore a variety of machine learning
techniques to semi-automatically create such metadata.
We make three contributions. First, we describe a Bayesian learning and inference algorithm for classifying HTML forms into
semantic categories and for assigning semantic labels to each form’s fields. These techniques are important as legacy HTML
interfaces are migrated to Web Services. Second, we describe the application of the Naive Bayes and SVM algorithms to the
task of Web Service classification. We show that an ensemble approach that treats Web Services as structured objects is more
accurate than an unstructured approach. Finally, we describe a clustering algorithm that automatically discovers the semantic
categories of Web Services. All of our algorithms are evaluated using large collections of real HTML forms and Web Services.