Wednesday, February 20, 2008

Writing XML with UTF-8 Encoding using XmlTextWriter and StringWriter







If you want to use XmlTextWriter to write XML into a StringBuilder you can create the XmlTextWriter like this:

StringBuilder builder = new StringBuilder();
XmlWriter writer = new XmlTextWriter(new StringWriter(builder));


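But this generates an XML declaration with an encoding of UTF-16. XmlTextWriter takes the declared encoding from the underlying TextWriter, and StringWriter always reports UTF-16 (the in-memory encoding of a .NET string). There doesn't seem to be a straightforward way of making the declaration say UTF-8 in this setup.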

You can, of course, use a MemoryStream instead of a StringWriter and then call Encoding.UTF8.GetString(...) to convert the bytes to a string, but when we did this the resulting string contained non-printable characters, which we don't want.
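The stray characters are most likely the UTF-8 byte order mark (U+FEFF): Encoding.UTF8 emits one at the start of a stream, and GetString does not strip it. The sketch below is only an illustration of that assumption, not the code from our project; the class name BomDemo and the method WriteToString are made up for the example. It compares Encoding.UTF8 with new UTF8Encoding(false), which writes no byte order mark.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomDemo
{
    // Hypothetical helper: writes a tiny document to a MemoryStream using the
    // given encoding, then decodes the raw bytes back into a string.
    public static string WriteToString(Encoding encoding)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            XmlTextWriter writer = new XmlTextWriter(stream, encoding);
            writer.WriteStartDocument();
            writer.WriteElementString("root", "value");
            writer.WriteEndDocument();
            writer.Flush(); // push buffered output into the stream

            return encoding.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        string withBom = WriteToString(Encoding.UTF8);
        string withoutBom = WriteToString(new UTF8Encoding(false));

        // Ordinal comparison, so an invisible U+FEFF is not ignored.
        Console.WriteLine(withBom.StartsWith("<?xml", StringComparison.Ordinal));
        Console.WriteLine(withoutBom.StartsWith("<?xml", StringComparison.Ordinal));
    }
}
```

If the byte order mark really is the culprit, passing new UTF8Encoding(false) to the writer is one way to keep the MemoryStream approach without the extra character.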

The solution is to subclass StringWriter and override the Encoding property. It sounds like overkill, but it works very well. Just create the following class (based on Jon Skeet's class):

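using System.IO;
using System.Text;

// A StringWriter that reports a caller-chosen encoding instead of UTF-16.
public class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(StringBuilder builder, Encoding encoding)
        : base(builder)
    {
        this.encoding = encoding;
    }

    // XmlTextWriter reads this property when it writes the XML declaration.
    public override Encoding Encoding
    {
        get { return encoding; }
    }
}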

Then use StringWriterWithEncoding instead of StringWriter in your XmlTextWriter.
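As a rough sketch of how the pieces fit together (the Utf8DeclarationDemo class, the BuildXml method and the element name are invented for the example), the following repeats the class so the listing compiles on its own and writes a small document through it:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(StringBuilder builder, Encoding encoding)
        : base(builder)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

public static class Utf8DeclarationDemo
{
    // Hypothetical helper: builds a one-element document in a StringBuilder.
    public static string BuildXml()
    {
        StringBuilder builder = new StringBuilder();
        using (XmlTextWriter writer = new XmlTextWriter(
            new StringWriterWithEncoding(builder, Encoding.UTF8)))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("greeting", "hello");
            writer.WriteEndDocument();
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // The declaration at the start of the output now names utf-8.
        Console.WriteLine(BuildXml());
    }
}
```

Bear in mind that only the declaration changes. The string in memory is still a normal .NET string, so whatever later turns it into bytes (a file, an HTTP response) has to encode it as UTF-8 for the declaration to be true.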

9 comments:

Anonymous said...

Try the following:

StringBuilder sb = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent=true;
settings.Encoding = Encoding.UTF8;
settings.CloseOutput = false;
settings.CheckCharacters = true;

XmlWriter w = XmlWriter.Create(sb, settings);

thoward37 said...

Here's a slightly more complete version (sorry no html formatting for the code listing):



public class StringWriterWithEncoding : StringWriter
{
    private Encoding _encoding;

    public StringWriterWithEncoding()
        : base() { }

    public StringWriterWithEncoding(IFormatProvider formatProvider)
        : base(formatProvider) { }

    public StringWriterWithEncoding(StringBuilder sb)
        : base(sb) { }

    public StringWriterWithEncoding(StringBuilder sb, IFormatProvider formatProvider)
        : base(sb, formatProvider) { }

    public StringWriterWithEncoding(Encoding encoding)
        : base()
    {
        _encoding = encoding;
    }

    public StringWriterWithEncoding(IFormatProvider formatProvider, Encoding encoding)
        : base(formatProvider)
    {
        _encoding = encoding;
    }

    public StringWriterWithEncoding(StringBuilder sb, Encoding encoding)
        : base(sb)
    {
        _encoding = encoding;
    }

    public StringWriterWithEncoding(StringBuilder sb, IFormatProvider formatProvider, Encoding encoding)
        : base(sb, formatProvider)
    {
        _encoding = encoding;
    }

    public override Encoding Encoding
    {
        get
        {
            return (null == _encoding) ? base.Encoding : _encoding;
        }
    }
}

Anonymous said...

To avoid the undesirable chars when using Encoding.UTF8.GetString on a memory stream, load the memory stream into a text reader and then read from that:

Stream s = new MemoryStream();
XmlWriter xw = new XmlTextWriter(s, Encoding.UTF8);
// Write XML to xw
xw.Flush(); // make sure everything written so far has reached the stream

// Now rewind and read back
s.Seek(0, SeekOrigin.Begin);
TextReader tr = new StreamReader(s);
string xml = tr.ReadToEnd();

Chris said...

You can cut the number of lines quite a bit: just use an XmlWriter (not XmlTextWriter), specify your tab settings, and use a MemoryStream with a StreamReader as Anonymous said, though it needs UTF8 forced for the byte order to be correct. I shoved two example snippets to write UTF8 XML here, feel free to take.

Unknown said...

I ran into the same issue trying to write XML with UTF-8 encoding to a StringWriter. I tried all the suggestions from this thread. Only the suggestion from thoward37 with a subclass of StringWriter worked.

Unknown said...

The hint to override the Encoding property was great!

Thank You!

UTHAM said...

Great solution.... really good..works well...
