Writing XML with UTF-8 Encoding using XmlTextWriter and StringWriter
If you want to use XmlTextWriter to write XML into a StringBuilder you can create the XmlTextWriter like this:
StringBuilder builder = new StringBuilder();
XmlWriter writer = new XmlTextWriter(new StringWriter(builder));
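But this generates a declaration on the resulting XML with an encoding of UTF-16 (the encoding of a .NET string), because XmlTextWriter takes the encoding name from the Encoding property of the StringWriter it writes to. There doesn't seem to be a straightforward way of making this declaration UTF-8 in this setup.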
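You can, of course, use a MemoryStream instead of a StringWriter, and then use Encoding.UTF8.GetString(...) to convert the bytes to a string, but doing this left non-printable characters in the resulting string, which we don't want. The likely culprit is the UTF-8 byte order mark that the writer puts at the start of the stream (or trailing null bytes, if the bytes come from GetBuffer() rather than ToArray()); GetString does not strip either.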
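Here is a minimal sketch that reproduces the stray character, assuming it is the byte order mark (U+FEFF). The class name BomDemo, the method name, and the "root" element are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomDemo
{
    // Writes a tiny document to a MemoryStream and decodes the raw bytes.
    public static string WriteViaMemoryStream()
    {
        MemoryStream stream = new MemoryStream();
        XmlTextWriter writer = new XmlTextWriter(stream, Encoding.UTF8);
        writer.WriteStartDocument();
        writer.WriteStartElement("root");
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();

        // ToArray() returns only the bytes written; GetBuffer() would also
        // return the unused tail of the internal buffer as null bytes.
        return Encoding.UTF8.GetString(stream.ToArray());
    }

    public static void Main()
    {
        string xml = WriteViaMemoryStream();
        // The first char is the decoded byte order mark, not '<'.
        Console.WriteLine(xml[0] == '\uFEFF');
        // Trimming it by hand is one workaround.
        Console.WriteLine(xml.TrimStart('\uFEFF'));
    }
}
```

Trimming the mark by hand works, but it is easy to forget, which is part of why the StringWriter subclass is attractive.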
The solution is to subclass StringWriter and override the Encoding property. It sounds like overkill, but it works very well. Just create the following class (based on Jon Skeet's class):
public class StringWriterWithEncoding : StringWriter
{
    Encoding encoding;

    public StringWriterWithEncoding(StringBuilder builder, Encoding encoding)
        : base(builder)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}
Then use StringWriterWithEncoding instead of StringWriter in your XmlTextWriter.
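A minimal usage sketch, in case it helps. The class is repeated so the listing compiles on its own; the Demo class, the BuildXml method, and the "root" element are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(StringBuilder builder, Encoding encoding)
        : base(builder)
    {
        this.encoding = encoding;
    }

    // XmlTextWriter reads this property to decide what the declaration says.
    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

public static class Demo
{
    public static string BuildXml()
    {
        StringBuilder builder = new StringBuilder();
        using (XmlTextWriter writer = new XmlTextWriter(
            new StringWriterWithEncoding(builder, Encoding.UTF8)))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("root");
            writer.WriteString("hello");
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // The declaration should now name utf-8 instead of utf-16.
        Console.WriteLine(BuildXml());
    }
}
```

Note that only the declaration changes. The string in memory is still a .NET string, so the bytes become UTF-8 only when you encode it as UTF-8 on the way to a file or socket.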
9 comments:
Try the following:
StringBuilder sb = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.Encoding = Encoding.UTF8;
settings.CloseOutput = false;
settings.CheckCharacters = true;
XmlWriter w = XmlWriter.Create(sb, settings);
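A caveat on the snippet above: as far as I know, XmlWriter.Create ignores settings.Encoding when the target is a StringBuilder (or any TextWriter), so the declaration still comes out as UTF-16. A small sketch to check this on your own runtime; the SettingsDemo class and the "root" element are made up for illustration.

```csharp
using System;
using System.Text;
using System.Xml;

public static class SettingsDemo
{
    public static string WriteToStringBuilder()
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.Encoding = Encoding.UTF8; // has no effect on a StringBuilder target

        using (XmlWriter w = XmlWriter.Create(sb, settings))
        {
            w.WriteStartElement("root");
            w.WriteEndElement();
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Inspect the declaration in the first line of the output.
        Console.WriteLine(WriteToStringBuilder());
    }
}
```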
Here's a slightly more complete version (sorry no html formatting for the code listing):
public class StringWriterWithEncoding : StringWriter
{
    private Encoding _encoding;

    public StringWriterWithEncoding()
        : base() { }

    public StringWriterWithEncoding(IFormatProvider formatProvider)
        : base(formatProvider) { }

    public StringWriterWithEncoding(StringBuilder sb)
        : base(sb) { }

    public StringWriterWithEncoding(StringBuilder sb, IFormatProvider formatProvider)
        : base(sb, formatProvider) { }

    public StringWriterWithEncoding(Encoding encoding)
        : base()
    {
        _encoding = encoding;
    }

    public StringWriterWithEncoding(IFormatProvider formatProvider, Encoding encoding)
        : base(formatProvider)
    {
        _encoding = encoding;
    }

    public StringWriterWithEncoding(StringBuilder sb, Encoding encoding)
        : base(sb)
    {
        _encoding = encoding;
    }

    public StringWriterWithEncoding(StringBuilder sb, IFormatProvider formatProvider, Encoding encoding)
        : base(sb, formatProvider)
    {
        _encoding = encoding;
    }

    // Falls back to StringWriter's own encoding (UTF-16) when none was supplied.
    public override Encoding Encoding
    {
        get
        {
            return (null == _encoding) ? base.Encoding : _encoding;
        }
    }
}
To avoid the undesirable characters when using Encoding.UTF8.GetString on a memory stream, load the memory stream into a text reader and then read from that:
Stream s = new MemoryStream();
XmlWriter xw = new XmlTextWriter(s, Encoding.UTF8);
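// Write XML to xw
xw.Flush(); // push buffered output into the stream before reading it back
// Now read back; StreamReader detects and skips the UTF-8 byte order mark
s.Seek(0, SeekOrigin.Begin);
TextReader tr = new StreamReader(s);
string xml = tr.ReadToEnd();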
You can cut the code down quite a bit: use an XmlWriter (not XmlTextWriter), specify your indentation settings, and use a MemoryStream with a StreamReader as Anonymous said, though it needs UTF-8 forced for the byte order to be correct. I put two example snippets for writing UTF-8 XML here; feel free to take them.
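For readers who can't reach those snippets, here is a rough sketch of the approach described, not the commenter's own code. The Utf8XmlDemo class, the method name, and the element names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class Utf8XmlDemo
{
    public static string WriteUtf8Xml()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.Encoding = Encoding.UTF8; // honoured here because the target is a Stream

        using (MemoryStream stream = new MemoryStream())
        {
            // CloseOutput defaults to false, so disposing the writer flushes
            // it but leaves the MemoryStream open for reading.
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartElement("root");
                writer.WriteElementString("item", "value");
                writer.WriteEndElement();
            }

            stream.Position = 0;

            // StreamReader skips the UTF-8 byte order mark while decoding.
            using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(WriteUtf8Xml());
    }
}
```

Compared with the StringWriter subclass, this goes through real UTF-8 bytes, so it costs an extra encode and decode but needs no custom class.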
I ran into the same issue trying to write XML with UTF-8 encoding to a StringWriter. I tried all the suggestions from this thread. Only the suggestion from thoward37 with a subclass of StringWriter worked.
The hint to override the Encoding property was great!
Thank You!
Great solution, really good. Works well.