Special characters not extracting properly

simonb
Posts: 6
Joined: Sun Jul 17, 2016 1:37 pm

Special characters not extracting properly

Post by simonb » Sun Jul 17, 2016 1:55 pm

Hello!

I found a post dealing with the same subject, but it doesn't look like a solution was discovered. http://forum.imacros.net/viewtopic.php?f=7&t=18015

I use iMacros for Firefox (Version 8.9.7). I am trying to extract data from a page with special characters into a CSV. When I extract a word like "Düsseldorf", the "ü" comes out as garbled characters in the csv.

Does anyone know a solution to get iMacros to extract the word as it is written on the page?

Also, I'm running Mac OS X Lion 10.7.5 if that matters.

Also, when I remove the "SET !EXTRACT_TEST_POPUP NO" line so that iMacros shows a prompt with the extracted text, the characters display properly in the prompt window, so it's only a problem in the csv file.
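A minimal Python sketch of the likely mechanism, assuming the CSV is written as UTF-8 and the viewer decodes it with a single-byte "Western European" code page. The codecs cp1252 and mac_roman are only examples (which one Excel for Mac 2011 actually applies is an assumption), and `misread` is a hypothetical helper. In UTF-8 the "ü" is stored as two bytes, and a single-byte decoder shows each byte as its own character:

```python
# Sketch: reproduce garbled text ("mojibake") by decoding UTF-8 bytes
# with the wrong codec. The codec names are illustrative assumptions.

def misread(text: str, wrong_codec: str) -> str:
    """Encode text as UTF-8, then decode those bytes with another codec."""
    raw = text.encode("utf-8")       # "ü" becomes the two bytes C3 BC
    return raw.decode(wrong_codec)   # each byte is shown as a separate character

word = "Düsseldorf"
print(word.encode("utf-8"))          # b'D\xc3\xbcsseldorf'
print(misread(word, "cp1252"))       # DÃ¼sseldorf  (Windows Western European)
print(misread(word, "mac_roman"))    # D√ºsseldorf  (Mac Roman)

# Decoding with the codec the file was actually written in restores the word.
print(word.encode("utf-8").decode("utf-8"))  # Düsseldorf
```

This fits the observation that the extraction popup shows the text correctly: the extracted data itself is intact, and only the program reading the file picks the wrong encoding.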
chivracq
Posts: 9004
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Special characters not extracting properly

Post by chivracq » Mon Jul 18, 2016 3:02 am

simonb wrote: Hello!

I found a post dealing with the same subject, but it doesn't look like a solution was discovered. http://forum.imacros.net/viewtopic.php?f=7&t=18015

I use iMacros for Firefox (Version 8.9.7). I am trying to extract data from a page with special characters into a CSV. When I extract a word like "Düsseldorf", the "ü" comes out as garbled characters in the csv.

Does anyone know a solution to get iMacros to extract the word as it is written on the page?
simonb wrote: Also, I'm running Mac OS X Lion 10.7.5 if that matters.

Also, when I remove the "SET !EXTRACT_TEST_POPUP NO" line so that iMacros shows a prompt with the extracted text, the characters display properly in the prompt window, so it's only a problem in the csv file.
Oh...!, yep, the OS matters...!, and so does the rest, which is still missing...! => Still FCIM...! :mrgreen:
=> Still missing: your FF version, your OS & FF language, in your case also your FF character encoding settings, plus the language & character encoding settings of the program (which program btw...!?) you are using to view your '.CSV' file...?

Hum, and you are lucky I checked your Post/Thread again. When you (silently) add some (required) Info to your Original Post, you should mention it in a separate Post. I normally only check Posts/Threads once and don't react if FCI is not mentioned from the start, check my Sig...

It would also be handy if you could post your Script with the URL of your Page. Otherwise I will be using the following Test-Post for my testing, so don't complain afterwards if "it" doesn't work for your "real" Site...
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
chivracq

Re: Special characters not extracting properly

Post by chivracq » Mon Jul 18, 2016 3:04 am

"Düsseldorf" /// "Düsseldorf"
simonb

Re: Special characters not extracting properly

Post by simonb » Thu Jul 21, 2016 6:04 pm

Thank you for your reply chivracq! Apologies for the delay. For some reason I didn't get an email notification that someone had replied.

FF Version - 47.0.1
FF Language - US English?
FF Text Encoding - Unicode

It appears as though the problem may be the program I am viewing it in. I am using Excel for Mac 2011 (Version 14.3.8). At the moment my character settings seem to be set to Western European, but I had changed that to UTF-8 when I was testing the file. I tried the csv file on a PC and it worked fine (Excel 2010, Version 14.0.4756.1000). Do you have any idea why it may not be displaying properly on a Mac? I've read a few posts about Excel for Mac not handling Unicode very well.
chivracq

Re: Special characters not extracting properly

Post by chivracq » Thu Jul 21, 2016 9:06 pm

simonb wrote:Thank you for your reply chivracq! Apologies for the delay. For some reason I didn't get an email notification that someone had replied.

FF Version - 47.0.1
FF Language - US English?
FF Text Encoding - Unicode

It appears as though the problem may be the program I am viewing it in. I am using Excel for Mac 2011 (Version 14.3.8). At the moment my character settings seem to be set to Western European, but I had changed that to UTF-8 when I was testing the file. I tried the csv file on a PC and it worked fine (Excel 2010, Version 14.0.4756.1000). Do you have any idea why it may not be displaying properly on a Mac? I've read a few posts about Excel for Mac not handling Unicode very well.
OK for FCI:

Code: Select all

iMacros for FF v8.9.7
FF47_US-Eng_Unicode-UTF8
Mac OS X Lion 10.7.5
Excel for Mac 2011 (Version 14.3.8)
The Language and Charset Encoding settings are normally not needed; they only matter in your case because you are dealing with character encoding...

Yep, the first step was indeed to open your '.CSV' in a plain-text program like Notepad on Win32/64 (or, as you did, in Excel on Win32/64) to identify whether the problem came from the browser / iMacros (extraction/save) or from the program you were using to open your CSV.
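The plain-text check described above can also be done directly on the file's raw bytes. A minimal Python sketch; the helper names `inspect_bytes` and `inspect_csv` are hypothetical. It reports whether the data decodes as UTF-8 and whether it begins with a UTF-8 byte order mark (BOM). If the file decodes cleanly as UTF-8 and the text looks right, the extraction is probably fine and the viewer's encoding setting is the more likely culprit:

```python
# Sketch: inspect a CSV's raw bytes without applying any encoding first.
import codecs
from pathlib import Path

def inspect_bytes(data: bytes) -> dict:
    """Report whether data starts with a UTF-8 BOM and decodes as UTF-8."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"utf8_bom": has_bom, "valid_utf8": valid_utf8}

def inspect_csv(path: str) -> dict:
    """Read the file in binary mode so no decoding happens on the way in."""
    return inspect_bytes(Path(path).read_bytes())

# Examples on in-memory data (no file needed):
print(inspect_bytes("Düsseldorf\n".encode("utf-8")))
# {'utf8_bom': False, 'valid_utf8': True}
print(inspect_bytes("Düsseldorf\n".encode("cp1252")))
# {'utf8_bom': False, 'valid_utf8': False}
```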

Pfff, I don't know much about Excel, especially on MacOS. I actually use OpenOffice (on Win7-x64/Win10-x64), and under Tools / Options / Language Settings / Languages there are some settings you can try playing with; I guess Excel has similar settings.

Another thing you can try is to open your '.CSV' in your MacOS Notepad equivalent and re-save it directly in different formats, UTF-8 with BOM being one that should work. (Same thing to try from your Excel on Win32/64.)
If that doesn't help, the next step (still from your Notepad equivalent) is to manually copy the whole content of the CSV and paste it into a new Excel doc/sheet. You should then get a dialog box where you can specify a few settings, among them the charset encoding I think... and/or the paste will work fine directly, because I guess if you manually paste or type your "Düsseldorf" into a new Excel sheet/cell, save it and reopen the file, the special chars will (still) display correctly...

And you could try using OpenOffice as well, if it exists for MacOS...
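The "re-save as UTF-8 with BOM" suggestion can also be scripted instead of done by hand in a text editor. A minimal Python sketch, assuming the CSV written by iMacros is already UTF-8; `add_utf8_bom` and the file names are hypothetical. It works on bytes, so line endings and content are left untouched, and it refuses (raises `UnicodeDecodeError`) if the file is not UTF-8. Whether a particular Excel version honors the BOM is not confirmed here:

```python
# Sketch: prepend a UTF-8 byte order mark (BOM) to a UTF-8 CSV file.
import codecs
import tempfile
from pathlib import Path

def add_utf8_bom(src: Path, dst: Path) -> None:
    data = src.read_bytes()
    data.decode("utf-8")  # raises UnicodeDecodeError if the file is not UTF-8
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data  # add the BOM only once
    dst.write_bytes(data)

# Usage example with throwaway files (the names are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "extract.csv"
    dst = Path(tmp) / "extract_bom.csv"
    src.write_bytes("city\nDüsseldorf\n".encode("utf-8"))
    add_utf8_bom(src, dst)
    print(dst.read_bytes().startswith(codecs.BOM_UTF8))  # True
```

Running it twice on the same file is harmless because the BOM is only added when it is missing.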
simonb

Re: Special characters not extracting properly

Post by simonb » Mon Dec 05, 2016 3:17 pm

Thanks! I used TextWrangler to open the csv file and then re-saved a copy as UTF-8. That fixed the problem.