Info needed / Problem advice - strange chars in imacros extr

Discussions and Tech Support specific to the iMacros Firefox add-on.
StarFaller
Posts: 13
Joined: Sat Jan 20, 2007 9:03 am

Post by StarFaller » Fri May 25, 2007 6:33 am

Hi,

I've been using the extract function in iMacros to grab data from certain websites. The problem is that the extracted data contains strange characters that break integer and string conversions.

First of all, the file that holds the extracted variable contents starts with a strange character. To put it another way: when I use the FileSystemObject in VBS to open the file and read the first line, or the whole file, the strange characters appear in the output. They are also visible when I open the CSV in Excel.

Other characters also show up that ARE NOT PRESENT in the text of the page. I believe the nbsp character gets translated into "Â", but I don't know enough HTML to tell for sure.
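The symptoms described here are consistent with a UTF-8 file being read as a single-byte Windows code page, although the thread never confirms the encoding of the extract file. A minimal Python sketch of that assumption follows; the sample bytes and the helper name `decode_extract` are made up for illustration and are not taken from a real iMacros output file.

```python
import codecs


def decode_extract(raw: bytes) -> str:
    """Decode bytes as UTF-8 and drop a leading byte-order mark if present."""
    return raw.decode("utf-8-sig")


# Hypothetical file content: UTF-8 BOM, then "Age:", a no-break space, " 18".
sample = codecs.BOM_UTF8 + "Age:\u00a0 18".encode("utf-8")

# What a reader that assumes Windows-1252 would see.
misread = sample.decode("cp1252")

# What a BOM-aware UTF-8 reader sees.
proper = decode_extract(sample)

print(repr(misread))
print(repr(proper))
```

If the assumption holds, the misread version starts with three stray characters (the byte-order mark) and shows "Â" in front of every no-break space, while the properly decoded version contains only the no-break space itself.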
Tech Support
Posts: 4947
Joined: Tue Sep 20, 2005 7:25 pm

Post by Tech Support » Sat May 26, 2007 9:32 pm

Can you send us the URL of the web page and/or the macro that creates the problem? This would allow us to quickly re-create the issue on our test systems.
StarFaller
Posts: 13
Joined: Sat Jan 20, 2007 9:03 am

Post by StarFaller » Wed May 30, 2007 7:33 am

OK, I think if you paste this code into an .htm file you will be able to reproduce the effect:


<tr><td valign="top" width="227">
<b>Belongs to:&nbsp;</b> <a href="index.htm">Armata Ultras 100</a><br/>
<b>Wage:&nbsp;</b> 5.313 &euro;<br/>
<b>Age:&nbsp;</b> 18<br/>
<b>Height:&nbsp;</b> 186 cm<br/>					

<b>Deadline:&nbsp;</b> 30.05.07 11:32<br/>
</td>

<td valign="top">								
<table border="0" cellpadding="0" cellspacing="0">
<tr><td><b>text1:&nbsp;</b></td><td width="78">placeholder</td>
<td><b>&nbsp; &nbsp; text2:&nbsp;</b></td><td>placeholder</td></tr>
<tr><td><b>text3:&nbsp;</b></td><td>placeholder</td>

<td><b>&nbsp; &nbsp; text4:&nbsp;</b></td><td>placeholder</td></tr>
<tr><td><b>text5:&nbsp;</b></td><td>placeholder</td>
<td><b>&nbsp; &nbsp; text6:&nbsp;</b></td><td>placeholder</td></tr>
<tr><td><b>text7:&nbsp;</b></td><td>placeholder</td>
<td><b>&nbsp; &nbsp; text8:&nbsp;</b></td><td>very placeholder</td></tr>

<tr><td><b>text9:&nbsp;</b></td><td>poor</td>
<td><b>&nbsp; &nbsp; text10:&nbsp;</b></td><td>placeholder</td></tr>
</table>
</td>
</tr>
The macro I'm using to grab the data from the table is:


TAG POS=1 TYPE=TD ATTR=TXT:<BR>Belongs*<BR>* EXTRACT=TXT
TAG POS=1 TYPE=TABLE ATTR=TXT:<BR>text1:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\ FILE=test.csv
StarFaller
Posts: 13
Joined: Sat Jan 20, 2007 9:03 am

Post by StarFaller » Wed Jun 06, 2007 7:15 pm

Any progress on this?
blondu
Posts: 1
Joined: Fri Sep 18, 2009 6:54 pm

Post by blondu » Fri Sep 18, 2009 7:03 pm

From what I could figure out, the problem seems to be &nbsp;, which is represented in the extracted text as the Unicode character U+00A0 (no-break space). If you open the text file in a text editor, this character is displayed as an ordinary space. In a console, however, you will see the characters ┬á instead (for example, run type <filename> in the Windows Command Prompt).
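The ┬á pair can be reproduced by taking the UTF-8 bytes of U+00A0 and decoding them with an OEM console code page. The Python sketch below assumes the file is UTF-8 and the console uses cp437 or cp850; neither is stated in the thread, and the helper name `as_seen_in` is hypothetical.

```python
NBSP = "\u00a0"

# U+00A0 is stored as two bytes in UTF-8.
utf8_bytes = NBSP.encode("utf-8")


def as_seen_in(code_page: str) -> str:
    """Return how the UTF-8 bytes of a no-break space look in a code page."""
    return utf8_bytes.decode(code_page)


for code_page in ("cp437", "cp850", "cp1252"):
    print(code_page, repr(as_seen_in(code_page)))
```

Under that assumption, the same two bytes explain both reports in this thread: an OEM console shows ┬á, while a Windows-1252 reader such as Excel shows "Â" followed by a no-break space.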

I ran into this issue while trying to insert the extracted data into a MySQL database. To remove the characters you can use TRIM, for example, but you need to be able to write the character in the query. If you type the query in the console, you can copy and paste the characters ┬á. If you load the query from a text file, however, you need something different. I used (echo '┬á' >filename), which gave me the characters as they appear in a text file: ' '. I then copied and pasted that into my query and everything worked.
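An alternative to trimming inside the database is to clean the extracted text before it is converted or inserted. This is not the method used above, just a rough Python sketch of the same idea; `clean_extract` and `parse_labelled_int` are hypothetical helpers, and the sample string mimics the "Age:&nbsp; 18" cell from the HTML posted earlier.

```python
def clean_extract(text: str) -> str:
    """Drop a stray byte-order mark and collapse all whitespace runs.

    str.split() with no arguments treats U+00A0 as whitespace, so every
    no-break space ends up as part of a single ordinary space.
    """
    return " ".join(text.replace("\ufeff", "").split())


def parse_labelled_int(text: str) -> tuple[str, int]:
    """Split a cleaned 'Label: number' string into (label, integer)."""
    label, _, value = clean_extract(text).partition(":")
    return label, int(value)


# Hypothetical extracted cell: BOM, label, no-break space, space, value.
raw_cell = "\ufeffAge:\u00a0 18"

print(repr(clean_extract(raw_cell)))
print(parse_labelled_int(raw_cell))
```

Cleaning at this stage means no special characters ever need to be typed or pasted into the query.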