  #4  
Old 02-22-2010, 08:41 AM
kinook
Administrator
 
Join Date: 03-06-2001
Location: Colorado
Posts: 6,003
It appears that the web page text is UTF-8 encoded but has no BOM (byte order mark), while the page itself declares its content as iso-8859-1 (the encoding for Western European text), which is inconsistent with the actual encoding. UR imports the data correctly, but when displaying the page, the embedded IE browser goes by the declaration and treats the page as iso-8859-1 rather than UTF-8, so the accented characters display incorrectly. My guess is that ScrapBook converts everything to the current code page or to UTF-8 (adding a BOM), whereas UR doesn't do this. Even Firefox's Save Page As behaves much like UR here, except that it doesn't capture images.

http://en.wikipedia.org/wiki/UTF-8

http://en.wikipedia.org/wiki/Byte-order_mark

http://en.wikipedia.org/wiki/ISO/IEC_8859-1
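
To illustrate the mismatch described above, here is a minimal Python sketch. It is not UR's or ScrapBook's actual code; the sample text and the add_bom helper are made up for illustration. It shows what happens when UTF-8 bytes are read as iso-8859-1, and how a BOM could be prepended so the bytes identify themselves as UTF-8 (which is only my guess at what ScrapBook does).

```python
import codecs

# Hypothetical sample text with one accented character (e with acute accent).
text = "caf\u00e9"

# What the web server actually sends: UTF-8 bytes, no BOM.
raw = text.encode("utf-8")          # b'caf\xc3\xa9'

# What a browser shows if it believes the iso-8859-1 declaration:
# each byte of the two-byte UTF-8 sequence becomes its own character.
wrong = raw.decode("iso-8859-1")    # 'caf' + A-tilde + copyright sign

# What it shows if it decodes the same bytes as UTF-8.
right = raw.decode("utf-8")


def add_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM unless one is already present (illustrative helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


with_bom = add_bom(raw)

# The 'utf-8-sig' codec strips a leading BOM on decode, if there is one.
roundtrip = with_bom.decode("utf-8-sig")

print(repr(wrong))
print(repr(right))
print(with_bom[:3] == codecs.BOM_UTF8)
```

The single accented character turns into two unrelated characters when the wrong encoding is assumed, which matches the kind of garbling seen in the displayed page.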

One workaround is to select the page content (Ctrl+A) in the browser before importing into UR -- the HTML clipboard data has consistent encodings and is handled correctly.