Today I ran into one of those strange problems that make you learn a few interesting things about the internals of the tools we use daily.
I’m working on a Java project with Maven and Eclipse. Since I’m not a masochist, every now and then I write a few unit tests. And so I did this time. Well, maybe I’m a bit of a masochist, because in one test I inserted a non-Latin string (“งานเลี้ยงอำลา”, which should mean “Farewell party” in Thai). So far so good. The JUnit test passed in Eclipse, and so did the same test when run by the Maven Surefire plugin.
Today I did some refactoring, and I moved the tested class, along with its test, to another Maven module. I ran mvn package, and the test failed unexpectedly. Hey, wait a minute, I didn’t change a single line of code, well, apart from the package name. I ran the JUnit test directly from Eclipse, and guess what, no error at all. So I ran mvn package again, and looked at the failure message:

expected:<...><footer><message>[?????????????]</message></footer...> but was:<...><footer><message>[?????????????]</message></footer...>

What are all those question marks? Where did my farewell go? Replacing the Thai string with one containing only Latin characters made the error disappear. A little cell in my brain was screaming “Character encoding, character encoding!”. After a bit of research, I added this to the Surefire plugin configuration in my pom.xml:

			<argLine>-Xms256m -Xmx512m -XX:MaxPermSize=128m -ea -Dfile.encoding=UTF-8</argLine>

The most important parameter is “-Dfile.encoding=UTF-8”, which tells the underlying JVM (the one Surefire forks to run the tests) which default character encoding to use. With this parameter, the failing test finally passed again.
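
I can’t say for sure that this is exactly what happened inside my test, but here is a minimal sketch of how a default encoding that cannot represent Thai characters produces that row of question marks. The class and method names (EncodingDemo, roundTrip) are made up for illustration, and US-ASCII only stands in for whatever non-UTF-8 default the forked JVM picked up. The Thai string is written with Unicode escapes so that the encoding of the source file itself does not get in the way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the project described in this post.
class EncodingDemo {

    // Encodes the text to bytes and decodes it back with the same charset.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The Thai "Farewell party" string from the test, as Unicode escapes.
        String thai = "\u0E07\u0E32\u0E19\u0E40\u0E25\u0E35\u0E49\u0E22"
                    + "\u0E07\u0E2D\u0E33\u0E25\u0E32";

        // Reflects -Dfile.encoding on the JVMs of that era; on Java 18+
        // the default is UTF-8 unless overridden.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(roundTrip(thai, StandardCharsets.UTF_8).equals(thai)); // prints true

        // US-ASCII cannot, so each of the 13 characters becomes '?'.
        System.out.println(roundTrip(thai, StandardCharsets.US_ASCII));
    }
}
```

Any code that converts between strings and bytes without naming a charset (String.getBytes(), new String(bytes), FileReader, and so on) goes through the default one, which is why a single JVM flag can make or break a test like mine.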

I would like to understand why I did not need that parameter before, but only after moving the code to the new module. I quickly compared the pom files, but I couldn’t find any relevant difference. Maybe I’ll investigate further and describe my findings in another post.