Comments on Useless Factor: The most common Unicode-processing bug
(comment feed, last updated 2022-03-28T05:51:26.366-07:00)

Anonymous, 2008-02-01T02:36:00.000-08:00:
MS Outlook Express doesn't bother declaring the encoding it uses at all (it doesn't add a charset parameter to the Content-Type header). It certainly happens on newsgroups, and I wouldn't be surprised if it were equally crappy when sending regular e-mail too.

Anonymous (https://www.blogger.com/profile/00902922561603041049), 2008-01-31T14:19:00.000-08:00:
To the first anonymous comment: yes, I know that, but I don't think anyone uses ISO 8859-15, since it came out so recently, after Unicode had already become popular. To the second anonymous: you're probably right; I'll change the last sentence of the post to reflect this.

Anonymous, 2008-01-31T14:15:00.000-08:00:
Due to the nature of UTF-8, it is highly unlikely that a byte sequence that decodes as valid UTF-8 and has bytes with the upper bit "on" is anything other than UTF-8. In other words, UTF-8 is easy to recognize, and for texts that are not trivially short the chance of a false positive is low. So decoders should try to read input as UTF-8 and fall back to ISO-8859-x or windows-125x if it's not valid UTF-8.

Anonymous, 2008-01-31T14:13:00.000-08:00:
The € sign is included in ISO-8859-15, I think. They made new copies of the character sets to accommodate €.
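
The "try UTF-8 first, then fall back" suggestion from the comments can be sketched roughly as follows. This is only an illustration, not anything from the post itself: the function name `decode_with_fallback` and the choice of windows-1252 as the single fallback (with Latin-1 as a last resort that accepts every byte) are assumptions made for the example.

```python
def decode_with_fallback(data: bytes, fallbacks=("cp1252",)):
    """Return (text, encoding_name) for undeclared-encoding input.

    Hypothetical helper: tries strict UTF-8 first, then each legacy
    encoding in `fallbacks`, and finally Latin-1, which maps every
    byte to a code point and therefore never fails.
    """
    # Strict UTF-8 decoding rejects stray high-bit bytes, so a
    # successful decode is strong evidence the input really is UTF-8.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    # Legacy single-byte encodings; cp1252 leaves a few bytes
    # (such as 0x81) undefined, so it can still fail.
    for name in fallbacks:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue

    # Last resort: Latin-1 accepts any byte sequence.
    return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    print(decode_with_fallback("€".encode("utf-8")))  # valid UTF-8
    print(decode_with_fallback(b"\x80"))              # lone 0x80 is not valid UTF-8
```

Pure ASCII input is also valid UTF-8, so it is reported as UTF-8 here. The heuristic can still be fooled by very short inputs: the two windows-1252 bytes for "Ã©" (0xC3 0xA9) are also the valid UTF-8 encoding of "é", which is the kind of false positive the commenter describes as unlikely for longer texts.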