Kinook Software Forums

Kinook Software Forums (http://www.kinook.com/Forum/index.php)
-   [UR] General Discussion (http://www.kinook.com/Forum/forumdisplay.php?f=23)
-   -   Strange strange behaviour with special French characters (http://www.kinook.com/Forum/showthread.php?t=4405)

hartmut 02-19-2010 05:00 AM

Strange strange behaviour with special French characters
 
Problem:
When I copy a French web page from Firefox or IE directly to UR, the French characters are not shown correctly.
When I save the same web page to scrapbook, export it to a folder, and then import it into UR via file import, the characters are displayed correctly.

Please see following example

direct copy:
Décoration et loisirs créatifs

via scrapbook:
Décoration et loisirs créatifs
I am using UR 4.1b, Firefox 3.6 and Windows XP.


Hartmut

kinook 02-19-2010 09:55 AM

1 Attachment(s)
That works ok in our tests. The first Google result for "Décoration et loisirs créatifs" was http://www.creamalice.com/. After importing that page from Firefox 3.6 into UR 4.1b using the UR Firefox extension 'Copy to Ultra Recall' button, the item text displays correctly in UR (see attached .urd file and screen shot).

hartmut 02-20-2010 02:07 AM

Thank you. I tried with the site you mention and it works fine.
I suppose it is a problem with the site where I downloaded this page, as all the pages of this site have the same problem.

The original page was

http://www.tourisme-hautemarne.com/t...810,1283.html?


Hartmut

kinook 02-22-2010 09:41 AM

It appears that what is happening is the web page text is UTF-8 encoded, but without a BOM (byte order mark), and within the web page itself, the content is declared to be encoded as iso-8859-1 (the encoding for Western European text), which is inconsistent with the actual encoding. UR imports the data correctly, but when displaying the page, the embedded IE browser assumes the page is encoded as iso-8859-1 rather than UTF-8, which results in the accented characters displaying incorrectly. My guess is that scrapbook converts everything to the current code page or UTF-8 (adding a BOM), but UR doesn't do this (and even Firefox's Save Page As does something similar to UR, except that it doesn't capture images).
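The mismatch described above can be reproduced in a few lines. This is only an illustrative sketch in Python, not code from UR or the browser: it encodes the sample phrase from the first post as UTF-8 and then decodes the same bytes under both encodings.

```python
# Sample phrase from the first post in this thread.
text = "Décoration et loisirs créatifs"

# What the server actually sends: UTF-8 bytes, with no BOM.
raw = text.encode("utf-8")

# What a viewer shows if it trusts the (wrong) iso-8859-1 declaration:
# each two-byte UTF-8 sequence for "é" is read as two separate characters.
misread = raw.decode("iso-8859-1")

# What a viewer shows if it decodes the same bytes as UTF-8.
correct = raw.decode("utf-8")

print(misread)  # DÃ©coration et loisirs crÃ©atifs
print(correct)  # Décoration et loisirs créatifs
```

The bytes are identical in both cases; only the encoding assumed at display time differs, which is why the imported data can be correct while the displayed text is not.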

http://en.wikipedia.org/wiki/UTF-8

http://en.wikipedia.org/wiki/Byte-order_mark

http://en.wikipedia.org/wiki/ISO/IEC_8859-1

One workaround is to select the page content (Ctrl+A) in the browser before importing into UR -- the HTML clipboard data has consistent encodings and is handled correctly.
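For a page that has already been saved to disk, the BOM idea mentioned earlier could be sketched as follows. This is a hypothetical helper, not something UR or scrapbook is known to do in this form. It assumes that the viewer gives a UTF-8 BOM priority over the charset declared inside the page. The name `add_utf8_bom` and the sample page bytes are made up for illustration.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend a UTF-8 BOM if data is valid UTF-8 and has none yet."""
    if data.startswith(codecs.BOM_UTF8):
        return data  # already marked as UTF-8
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return data  # not UTF-8 (e.g. real iso-8859-1); leave untouched
    # Note: pure ASCII is also valid UTF-8 and gets a BOM as well.
    return codecs.BOM_UTF8 + data


# Hypothetical page: declared as iso-8859-1 but actually UTF-8 encoded.
page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
    "<p>Décoration et loisirs créatifs</p>"
).encode("utf-8")

fixed = add_utf8_bom(page)
print(fixed[:3] == codecs.BOM_UTF8)
```

Bytes that are not valid UTF-8 are returned unchanged, so a page that really is iso-8859-1 and contains accented characters would not be mislabeled.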


All times are GMT -5.


Copyright © 1999-2019 Kinook Software, Inc.