Wednesday, March 31, 2010

Junk Characters while reading Excel using Jxl API

In my project I have to read data from an Excel file and show it on a JSP. It is a multilingual application, so the data contains words in various European languages. I used the Jxl API to read the Excel file. An Ant task in our build file runs at build time, reads the Excel file, and generates a Java file that the JSP then uses to show the data. When we run this build on Windows, all the data from the Excel file comes through correctly in the Java file, but when we run the same build file on Unix, a few of the records come out with junk characters.

I investigated and found that the default locale on the Solaris machine was "en", which does not support all European characters. So first I changed the locale to en_US.UTF-8, which supports all European languages. (We had to install a separate package to get the en_US.UTF-8 locale.)

Even after this change, the junk characters remained. Then I checked the encoding option in Jxl's WorkbookSettings class. If we do not specify an encoding for the workbook, it takes the default encoding for that OS. I tried specifying UTF-8 explicitly, but that did not work even on Windows. Next I checked which encoding was being used on Windows by calling getEncoding() on a WorkbookSettings object, and found that Windows uses Cp1252. I then set this encoding explicitly by calling setEncoding("Cp1252") on the WorkbookSettings object. This solved the junk character issue for me.
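
To illustrate why the choice of encoding matters, here is a minimal sketch that uses only the standard Java library, not Jxl. It assumes the affected text was stored as single-byte Cp1252 data, which the post does not confirm; the class name EncodingMismatchDemo, the decode helper and the sample word are made up for the example. Decoding the same bytes as Cp1252 gives the expected accented letter, while decoding them as UTF-8 gives a replacement character, which is the kind of junk described above.

```java
import java.nio.charset.Charset;

public class EncodingMismatchDemo {

    // Hypothetical helper: decode raw bytes with the named charset,
    // the way a reader would once it is told which encoding to expect.
    public static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The word "cafe" with an acute accent on the e, as Cp1252 bytes.
        // The accented letter is the single byte 0xE9 (assumed sample data).
        byte[] raw = {0x63, 0x61, 0x66, (byte) 0xE9};

        String asCp1252 = decode(raw, "Cp1252");
        String asUtf8 = decode(raw, "UTF-8");

        // The default charset differs between machines, so the same build
        // can behave differently on Windows and on Unix.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("Cp1252 decode is correct: "
                + asCp1252.equals("caf\u00e9"));
        System.out.println("UTF-8 decode contains replacement char: "
                + asUtf8.contains("\uFFFD"));
    }
}
```

Naming the encoding explicitly, as with setEncoding("Cp1252"), removes the dependence on whatever default the build machine happens to have.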

3 comments:

  1. Man, THANK YOU for sharing this. I just had a nasty issue similar to yours. Using:
    WorkbookSettings ws = new WorkbookSettings();
    ws.setEncoding("Cp1252");
    Workbook workbook = Workbook.getWorkbook(new File(my_name), ws);
    worked.

  2. same here mate. thnx for this post.

  3. Same problem, the solution also worked for me! Thanks a lot for sharing this trick.
