Jul 19 2006

Encoding Help Needed.

Here’s the plan: There’s an XML file that contains plain text data.

[Attempt 1]: Use PHP and some simple parsing to convert the data to look pretty.
[Results]: Crazy characters in place of the “ó” in “Joan Miró” and the apostrophe in “peoples’”

[Attempt 2]: Use C# to parse the XML and output the file.
[Results]: The same crazy characters in “Joan Miró” and “peoples’”
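
For what it's worth, garbage like this usually shows up when UTF-8 bytes get decoded as a single-byte encoding. Here is a minimal C# sketch of that effect. It assumes, without being confirmed for this particular file, that the data is UTF-8 and is being read as ISO-8859-1; the class and method names are made up for the demo.

```csharp
using System;
using System.Text;

class MojibakeDemo
{
    // Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }

    static void Main()
    {
        // "\u00F3" is "ó". In UTF-8 it is two bytes (0xC3 0xB3), so the
        // single-byte decoder turns it into two characters: "Ã" and "³".
        string garbled = MisreadUtf8AsLatin1("Joan Mir\u00F3");
        Console.WriteLine(garbled == "Joan Mir\u00C3\u00B3"); // True
    }
}
```

The curly apostrophe is three bytes in UTF-8, so under the same assumption it would come back as three junk characters.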

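After a bunch of research, I found that a StreamWriter (which is a TextWriter) can be told which encoding to use when it writes the file. I tried each of these in turn:
TextWriter tw = new StreamWriter("fileName" + ".txt");                          // no encoding given: UTF-8 without a BOM
TextWriter tw = new StreamWriter("fileName" + ".txt", false, Encoding.Default); // the system's ANSI code page (on .NET Framework)
TextWriter tw = new StreamWriter("fileName" + ".txt", false, Encoding.ASCII);   // 7-bit only; anything else becomes "?"
TextWriter tw = new StreamWriter("fileName" + ".txt", false, Encoding.UTF8);    // UTF-8 with a BOM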
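
To make the difference between those encodings concrete, here is a small sketch that dumps the bytes each one produces for a single "ó". The `Hex` helper and class name are made up for the demo. `Encoding.Default` is printed without an expected value because it depends on the machine and the .NET version.

```csharp
using System;
using System.Text;

class EncodingBytesDemo
{
    // Hex dump (e.g. "C3-B3") of the bytes an encoding produces for a string.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        string sample = "\u00F3"; // "ó"

        Console.WriteLine("ASCII:   " + Hex(Encoding.ASCII, sample));   // 3F, which is "?"
        Console.WriteLine("UTF-8:   " + Hex(Encoding.UTF8, sample));    // C3-B3
        Console.WriteLine("UTF-16:  " + Hex(Encoding.Unicode, sample)); // F3-00

        // Machine-dependent, so no expected value is given here.
        Console.WriteLine("Default: " + Hex(Encoding.Default, sample));
    }
}
```

So ASCII can never keep the accent, and the other choices only look right if whatever opens the file afterwards assumes the same encoding.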

None of them worked. I’ve been using UltraEdit for a while and it can do multiple file conversions, so I gave it a try: ASCII to Unicode, UTF-8 to Unicode, UTF-8 to ASCII, Unicode to ASCII, DOS to UNIX, UNIX/MAC to DOS.

It all comes down to setting the C# encoding to Encoding.Default and then converting the file from UTF-8 to ASCII. There’s no other way. It sucks. Any suggestions?

UPDATE
This was what I originally had at the top of the XML file.
<?xml version="1.0" encoding="utf-8"?>

This is what I now have at the top of the XML file.
<?xml version="1.0" encoding="utf-16"?>

I also had to change the TextWriter initialization from:
TextWriter tw = new StreamWriter("fileName" + ".txt");

to:
TextWriter tw = new StreamWriter("fileName" + ".txt", false, Encoding.Default);
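
For comparison, here is a rough sketch of the same read-then-write pipeline with the encoding stated explicitly on both ends. It is not the fix described above, just one way the pieces could fit together. The file names, the `<artist>` element, and the `ConvertToText` helper are hypothetical, and it assumes the XML really is stored in the encoding its declaration names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class XmlToTextDemo
{
    // Load an XML file (XmlDocument picks up the encoding from the BOM or
    // the XML declaration) and write its text content out as UTF-8.
    public static void ConvertToText(string xmlPath, string textPath)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(xmlPath);

        using (TextWriter tw = new StreamWriter(textPath, false, Encoding.UTF8))
        {
            tw.Write(doc.DocumentElement.InnerText);
        }
    }

    static void Main()
    {
        // Hypothetical sample data, written to temp files for the demo.
        string xmlPath = Path.Combine(Path.GetTempPath(), "encoding-demo.xml");
        string textPath = Path.Combine(Path.GetTempPath(), "encoding-demo.txt");
        File.WriteAllText(
            xmlPath,
            "<?xml version=\"1.0\" encoding=\"utf-8\"?><artist>Joan Mir\u00F3</artist>",
            new UTF8Encoding(false));

        ConvertToText(xmlPath, textPath);

        // Reading the output back as UTF-8 gives the original text.
        string roundTripped = File.ReadAllText(textPath, Encoding.UTF8);
        Console.WriteLine(roundTripped == "Joan Mir\u00F3"); // True
    }
}
```

The catch is the same as before: the output only looks right if the program that opens the .txt file also treats it as UTF-8.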

Thanks, Abhi and kashif, for the input.

BTW: “PHP assumes your XML is in ISO-8859-1!”, even if you have it set as UTF-8. This is also why PHP isn’t going to work for these files. BOO. If there is a real PHP solution, let me know. We’re also trying another approach using a third-party PHP DB interface.