Two main results in the area of information hiding in natural language text are presented. A semantically-based scheme dramatically
improves the information-hiding capacity of any text through two techniques: (i) modifying the granularity of meaning of
individual sentences, whereas our own previous scheme kept the granularity fixed, and (ii) halving the number of sentences
affected by the watermark. No longer a “long text, short watermark” approach, the scheme now makes it possible to watermark short
texts, such as wire agency reports. Combining the above-mentioned semantic marking scheme with our previous syntactically based
method hides information in a way that reveals any non-trivial tampering with the text with probability $1 - 2^{-\beta(n+1)}$,
where $n$ is the number of sentences in the text and $\beta$ is a small positive integer that depends on the extent of
coreferencing. Re-formatting is not considered tampering; otherwise the problem would be solved trivially by hiding a hash of the text.
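To illustrate how quickly the detection guarantee approaches certainty, the short Python sketch below evaluates the stated bound $1 - 2^{-\beta(n+1)}$, where $n$ is the number of sentences and $\beta$ a small positive integer tied to the extent of coreferencing. It simply assumes the bound exactly as stated; the function name `tamper_detection_probability` is hypothetical and is not part of the authors' scheme or online demo.

```python
def tamper_detection_probability(n: int, beta: int) -> float:
    """Evaluate the stated detection bound 1 - 2^(-beta*(n+1)).

    Hypothetical helper for illustration only.
    n    -- number of sentences in the text (positive integer)
    beta -- small positive integer reflecting the extent of coreferencing
    """
    if n < 1 or beta < 1:
        raise ValueError("n and beta must be positive integers")
    # Probability that a non-trivial tampering goes undetected.
    miss = 2.0 ** (-beta * (n + 1))
    return 1.0 - miss


if __name__ == "__main__":
    # A 5-sentence text with beta = 1 has miss probability 2^-6.
    print(tamper_detection_probability(5, 1))  # prints 0.984375
    for n in (10, 20):
        for beta in (1, 2, 3):
            p = tamper_detection_probability(n, beta)
            print(f"n={n:2d} beta={beta}: {p:.12f}")
```

Under this bound the miss probability shrinks by a factor of $2^{\beta}$ with every added sentence, so even a text of only a few sentences already yields a small chance of undetected tampering.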
Portions of this work were supported by Grants EIA-9903545 and ISS-0219560 from the National Science Foundation, Contract
N00014-02-1-0364 from the Office of Naval Research, and by sponsors of the Center for Education and Research in Information
Assurance and Security. An online demo can be found at http://www.cerias.purdue.edu/homes/wmnlt/semdemo.html.