Although grid computing has adopted Web services technology to deal with platform heterogeneity and to enhance service and
application interoperability, it remains a challenge to build Web service applications with the high reliability and availability
that grid communities require. This paper discusses the design of the Platform EGO Web service gateway (WSG) for high reliability. To
support a large user base and reduce response time, WSGs work in a cluster model and the load is dynamically balanced among
them. In addition, a lightweight notification mechanism is implemented to provide better interoperability between the WSG and WSCs.
Moreover, we designed a session-based asynchronous recovery algorithm to achieve WSG fault tolerance, which has a short freezing
time and is able to isolate the recovery process for each WSC. After a restart, this approach can rebuild the service sessions and the notification
mechanism, and it can handle notification failures, WSG failure reports, and similar cases.
Keywords: Grid, web service gateway, fault tolerance, session, load balance