We perform static analysis of Java programs to answer a simple question: which values may occur as results of string expressions?
The answers are summarized for each expression by a regular language that is guaranteed to contain all possible values. We
present several applications of this analysis, including statically checking the syntax of dynamically generated expressions,
such as SQL queries. Our analysis constructs flow graphs from class files and generates a context-free grammar with a nonterminal
for each string expression. The language of this grammar is then widened into a regular language through a variant of an algorithm
previously used for speech recognition. The collection of resulting regular languages is compactly represented as a special
kind of multi-level automaton from which individual answers may be extracted. If a program error is detected, examples of
invalid strings are automatically produced. We present extensive benchmarks demonstrating that the analysis is efficient and
produces results of useful precision.
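The widening step can be illustrated with a small sketch. The abstract does not name the speech-recognition algorithm. This sketch assumes it is the Mohri-Nederhof grammar transformation and shows its textbook form, not the paper's variant. For brevity it treats all nonterminals as one mutually recursive set. The grammar encoding and the names `approximate`, `accepts`, and `prime` are hypothetical. The sketch omits flow-graph construction and the multi-level automaton.

```python
from typing import Optional

# Hypothetical encoding: a grammar maps each nonterminal to its right-hand
# sides; any symbol that is not a key of the dict is a terminal (single
# characters in this sketch).
Grammar = dict[str, list[list[str]]]
Config = tuple[tuple[str, ...], Optional[str]]  # (terminals still to read, next nonterminal)


def prime(nt: str) -> str:
    """Name of the fresh nonterminal A' ("what may follow A")."""
    return nt + "'"


def approximate(grammar: Grammar) -> Grammar:
    """Mohri-Nederhof transformation with ALL nonterminals treated as one
    mutually recursive set (coarser than handling each strongly connected
    component separately). Each rule A -> a0 B1 a1 ... Bm am becomes
    A -> a0 B1, B1' -> a1 B2, ..., Bm' -> am A', plus A' -> epsilon.
    The result is right-linear and its language contains the original one."""
    nts = set(grammar)
    out: Grammar = {nt: [] for nt in grammar}
    out.update({prime(nt): [[]] for nt in grammar})  # A' -> epsilon
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            # Split the body into terminal runs a0..am around nonterminals B1..Bm.
            alphas: list[list[str]] = [[]]
            bs: list[str] = []
            for sym in rhs:
                if sym in nts:
                    bs.append(sym)
                    alphas.append([])
                else:
                    alphas[-1].append(sym)
            heads = [lhs] + [prime(b) for b in bs]
            tails = bs + [prime(lhs)]
            for head, alpha, tail in zip(heads, alphas, tails):
                out[head].append(alpha + [tail])
    return out


def accepts(rl: Grammar, start: str, word: str) -> bool:
    """Run a right-linear grammar as an NFA on `word`."""

    def closure(configs: set[Config]) -> set[Config]:
        # Expand configurations with no pending terminals by their rules:
        # X -> w Y yields (w, Y); X -> epsilon yields ((), None).
        seen, stack = set(configs), list(configs)
        while stack:
            pending, nt = stack.pop()
            if pending or nt is None:
                continue
            for rhs in rl[nt]:
                nxt: Config = (tuple(rhs[:-1]), rhs[-1]) if rhs else ((), None)
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    configs = closure({((), start)})
    for ch in word:
        configs = closure({(p[1:], nt) for p, nt in configs if p and p[0] == ch})
    return ((), None) in configs  # some derivation ended with X -> epsilon
```

Consider a Java loop such as `String s = "a"; while (c) s = s + "b";`. A hand-written grammar for it is X → "a" | X "b", and the transformation is exact here, yielding a b*. For the non-regular grammar S → "(" S ")" | "x", it widens the language to (* x )*. As a result, `accepts(approximate(g), "S", "(x))")` holds even though the string is unbalanced. This is the kind of sound over-approximation the abstract describes.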
Supported by the Carlsberg Foundation contract number ANS-1069/20.
BRICS: Basic Research in Computer Science (http://www.brics.dk), funded by the Danish National Research Foundation.