Special characters not extracting properly

by simonb on Sun Jul 17, 2016 6:55 am

Hello!

I found a post dealing with the same subject, but it doesn't look like a solution was discovered. viewtopic.php?f=7&t=18015

I use iMacros for Firefox (Version 8.9.7). I am trying to extract data from a page with special characters into a CSV. When I extract a word like "Düsseldorf" I get "Düsseldorf" in the csv.

Does anyone know a solution to get iMacros to extract the word as it is written on the page?

Also, I'm running Mac OS X Lion 10.7.5 if that matters.

Also, when I remove the "SET !EXTRACT_TEST_POPUP NO" line so that iMacros shows a prompt with the extracted text, the characters display properly in the prompt window, so it's only a problem in the csv file.
simonb
 
Posts: 6
Joined: Sun Jul 17, 2016 6:37 am

Re: Special characters not extracting properly

by chivracq on Sun Jul 17, 2016 8:02 pm

simonb wrote:Hello!

I found a post dealing with the same subject, but it doesn't look like a solution was discovered. viewtopic.php?f=7&t=18015

I use iMacros for Firefox (Version 8.9.7). I am trying to extract data from a page with special characters into a CSV. When I extract a word like "Düsseldorf" I get "Düsseldorf" in the csv.

Does anyone know a solution to get iMacros to extract the word as it is written on the page?


simonb wrote:Also, I'm running Mac OS X Lion 10.7.5 if that matters.

Also, when I remove the "SET !EXTRACT_TEST_POPUP NO" line so that iMacros shows a prompt with the extracted text, the characters display properly in the prompt window, so it's only a problem in the csv file.

Oh, yep, the OS matters, and so does the rest, which is still missing... => Still FCIM!
=> FF version still missing, plus OS and FF language, plus the FF character encoding settings in your case, plus the language and character encoding settings in the program you are using to view your '.CSV' file (which program, by the way?).

Hum, and you are lucky I checked your post/thread again. When you (silently) add some required info to your original post, you should mention it in a separate post. I normally only check posts/threads once and don't react if FCI is not mentioned to begin with; check my sig...

It would also be handy if you could post your script along with the URL of your page. Otherwise I will use the following test post for testing, so don't complain afterwards if "it" doesn't work for your "real" site...
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 5201
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Special characters not extracting properly

by chivracq on Sun Jul 17, 2016 8:04 pm

"Düsseldorf" /// "Düsseldorf"

Re: Special characters not extracting properly

by simonb on Thu Jul 21, 2016 11:04 am

Thank you for your reply chivracq! Apologies for the delay. For some reason I didn't get an email notification that someone had replied.

FF Version - 47.0.1
FF Language - US English?
FF Text Encoding - Unicode

It appears as though the problem may be the program I am viewing it in. I am using Excel for Mac 2011 (Version 14.3.8). At the moment my character settings seem to be set to Western European, but I had changed that to UTF-8 when I was testing the file. I tried the csv file on a PC and it worked fine (Excel 2010, Version 14.0.4756.1000). Do you have any idea why it may not be displaying properly on a Mac? I've read a few posts about Excel for Mac not handling Unicode very well.
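
A minimal sketch of what may be happening here, assuming the CSV holds UTF-8 bytes and the viewer decodes them with a single-byte "Western European" charset. The thread does not confirm which charset Excel for Mac actually applied, so both the Windows (cp1252) and the classic Mac (mac_roman) variants are shown; the helper name `misread` is made up for illustration.

```python
# Sketch: the same UTF-8 bytes decoded with the wrong single-byte charset.
word = "D\u00fcsseldorf"            # "Düsseldorf", written with an escape for portability
utf8_bytes = word.encode("utf-8")   # the "ü" becomes the two bytes 0xC3 0xBC


def misread(data: bytes, wrong_encoding: str) -> str:
    """Decode UTF-8 bytes as if they were stored in another encoding."""
    return data.decode(wrong_encoding)


as_windows = misread(utf8_bytes, "cp1252")    # Windows "Western European"
as_mac = misread(utf8_bytes, "mac_roman")     # classic Mac "Western European"

# ascii() escapes non-ASCII characters so the output is readable on any terminal.
print(ascii(as_windows))
print(ascii(as_mac))
print(utf8_bytes.decode("utf-8") == word)     # decoding as UTF-8 restores the word
```

In both misreadings the single "ü" turns into two unrelated characters, which would fit a file that is correct on disk but displayed with the wrong charset.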

Re: Special characters not extracting properly

by chivracq on Thu Jul 21, 2016 2:06 pm

simonb wrote:Thank you for your reply chivracq! Apologies for the delay. For some reason I didn't get an email notification that someone had replied.

FF Version - 47.0.1
FF Language - US English?
FF Text Encoding - Unicode

It appears as though the problem may be the program I am viewing it in. I am using Excel for Mac 2011 (Version 14.3.8). At the moment my character settings seem to be set to Western European, but I had changed that to UTF-8 when I was testing the file. I tried the csv file on a PC and it worked fine (Excel 2010, Version 14.0.4756.1000). Do you have any idea why it may not be displaying properly on a Mac? I've read a few posts about Excel for Mac not handling Unicode very well.

OK for FCI:
Code:
iMacros for FF v8.9.7
FF47_US-Eng_Unicode-UTF8
Mac OS X Lion 10.7.5
Excel for Mac 2011 (Version 14.3.8)

The language and charset encoding settings are normally not needed; they only matter in your case because you are dealing with character encoding.

Yep, the first step was indeed to open your '.CSV' in some plain-text program like Notepad on Win32/64, or, like you did, in Excel on Win32/64, to identify whether the problem came from the browser / iMacros (extraction/save) or from the program you were using to open your CSV.
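
The same check can also be done on the raw bytes instead of by eye. The following is only a sketch, not part of iMacros: the function name `describe_csv_bytes` and the file name in the comment are hypothetical. It reports whether the data starts with a UTF-8 BOM, whether it decodes cleanly as UTF-8, and whether it contains any non-ASCII bytes at all.

```python
import codecs


def describe_csv_bytes(data: bytes) -> dict:
    """Report basic encoding facts about raw CSV bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)   # EF BB BF at the very start
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    non_ascii = any(b > 0x7F for b in data)      # any byte outside plain ASCII
    return {"has_bom": has_bom, "valid_utf8": valid_utf8, "non_ascii": non_ascii}


# With a real file (hypothetical path) one would read it in binary mode:
#     with open("extract.csv", "rb") as f:
#         print(describe_csv_bytes(f.read()))
sample = "D\u00fcsseldorf,Germany\r\n".encode("utf-8")
print(describe_csv_bytes(sample))
```

If the file is valid UTF-8 with non-ASCII bytes but no BOM, the extraction itself is probably fine and the viewing program is the more likely suspect.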

I don't know much about Excel, especially on MacOS. I actually use OpenOffice (on Win7-x64/Win10-x64), and under Tools / Options / Language Settings / Languages there are some settings you can try playing with. I guess Excel will have similar settings.

Another thing you can try is to open your '.CSV' in your MacOS Notepad equivalent and re-save it directly, trying different formats; UTF-8 with BOM is one that should work. (The same thing can be tried from your Excel on Win32/64.)
If that doesn't help, the next step, still from your Notepad equivalent, is to manually copy the whole content of the CSV and paste it into a new Excel doc/sheet. You should then get a dialog box where you can specify a few settings, charset encoding among them, I think. Or the paste may simply work directly: I guess that if you manually paste or type your "Düsseldorf" into a new Excel sheet/cell, save it and reopen the file, the special chars will (still) display correctly...
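
The "re-save as UTF-8 with BOM" step above can also be scripted. This is a rough sketch under the assumption that the extracted file is already UTF-8; the function name `resave_with_bom` and the file names are made up for the example, and whether a given Excel version honors the BOM is not guaranteed.

```python
import os
import tempfile


def resave_with_bom(src_path: str, dst_path: str, src_encoding: str = "utf-8-sig") -> None:
    """Copy a text file, writing the copy as UTF-8 with a leading BOM."""
    # Reading with "utf-8-sig" accepts UTF-8 with or without an existing BOM.
    # newline="" leaves the CSV's original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # Writing with "utf-8-sig" puts the BOM bytes EF BB BF before the content.
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(text)


# Demo on a throwaway file standing in for the iMacros extract.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "extract.csv")
dst = os.path.join(workdir, "extract_bom.csv")
with open(src, "w", encoding="utf-8", newline="") as f:
    f.write("D\u00fcsseldorf,Germany\r\n")

resave_with_bom(src, dst)

with open(dst, "rb") as f:
    raw = f.read()
print(raw.startswith(b"\xef\xbb\xbf"))   # True when the BOM was written
```

The content after the BOM is unchanged, so programs that already read the original file correctly should still read the copy.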

And you could try using OpenOffice as well, if it exists for MacOS...

Re: Special characters not extracting properly

by simonb on Mon Dec 05, 2016 8:17 am

Thanks! I used TextWrangler to open the csv file and then resaved the copy using UTF-8. Fixed the problem.