Hi,
I've been using the extract function in iMacros to grab data from certain websites. The problem is that the extracted data contains strange characters that mess up integer and string conversions.
First of all, the file that holds the extracted variable contents starts with a strange character. To put it another way: when I use the FileSystemObject in VBS to open the file and read the first line, or the whole file, the strange characters land in the output. They are also visible when opening the CSV in Excel.
Other characters also show up that ARE NOT PRESENT in the text of the page. I believe the nbsp character gets translated into "Â", but I don't know enough HTML to tell for sure.
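For anyone who wants to reproduce the conversion problem outside VBS, here is a minimal Python sketch. It assumes the extract file is UTF-8 with a leading byte order mark (BOM) and that the stray characters are no-break spaces (U+00A0); the thread does not confirm the file's encoding, so treat both as assumptions. The helper names read_extract and clean_field are made up for this illustration.

```python
import codecs

NBSP = "\u00a0"  # the character an HTML &nbsp; entity decodes to


def read_extract(raw: bytes) -> str:
    """Decode extracted bytes, dropping a leading UTF-8 BOM if present."""
    # "utf-8-sig" removes the BOM (EF BB BF) when it is there and
    # otherwise behaves like plain UTF-8.
    return raw.decode("utf-8-sig")


def clean_field(text: str) -> str:
    """Turn no-break spaces into ordinary spaces and trim both ends."""
    return text.replace(NBSP, " ").strip()


# Simulated file contents (assumption): a BOM, then a field padded with
# no-break spaces, similar to the "Age" value on the sample page.
raw = codecs.BOM_UTF8 + f"{NBSP}18{NBSP}".encode("utf-8")
age = int(clean_field(read_extract(raw)))
```

If the file turns out to use a different encoding, only the decode step in read_extract would need to change; the clean-up of the no-break spaces stays the same.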
Info needed / Problem advice - strange chars in imacros extr
Forum rules
iMacros EOL - Attention!
The renewal maintenance has officially ended for Progress iMacros effective November 20, 2023 and all versions of iMacros are now considered EOL (End-of-Life). The iMacros products will no longer be supported by Progress (aside from customer license issues), and these forums will also no longer be moderated from the Progress side.
Thank you again for your business and support.
Sincerely,
The Progress Team
Before asking a question or reporting an issue:
1. Please review the list of FAQ's.
2. Use the search box (at the top of each forum page) to see if a similar problem or question has already been addressed.
3. Try searching the iMacros Wiki - it contains the complete iMacros reference as well as plenty of samples and tutorials.
4. We can respond much faster to your posts if you include the following information: CLICK HERE FOR IMPORTANT INFORMATION TO INCLUDE IN YOUR POST
OK, I think if you paste the HTML below into an .htm file you will be able to replicate the effect. The macro I'm using to grab the data from the table follows the HTML:
<tr><td valign="top" width="227">
<b>Belongs to: </b> <a href="index.htm">Armata Ultras 100</a><br/>
<b>Wage: </b> 5.313 €<br/>
<b>Age: </b> 18<br/>
<b>Height: </b> 186 cm<br/>
<b>Deadline: </b> 30.05.07 11:32<br/>
</td>
<td valign="top">
<table border="0" cellpadding="0" cellspacing="0">
<tr><td><b>text1: </b></td><td width="78">placeholder</td>
<td><b> text2: </b></td><td>placeholder</td></tr>
<tr><td><b>text3: </b></td><td>placeholder</td>
<td><b> text4: </b></td><td>placeholder</td></tr>
<tr><td><b>text5: </b></td><td>placeholder</td>
<td><b> text6: </b></td><td>placeholder</td></tr>
<tr><td><b>text7: </b></td><td>placeholder</td>
<td><b> text8: </b></td><td>very placeholder</td></tr>
<tr><td><b>text9: </b></td><td>poor</td>
<td><b> text10: </b></td><td>placeholder</td></tr>
</table>
</td>
</tr>
The macro:
TAG POS=1 TYPE=TD ATTR=TXT:<BR>Belongs*<BR>* EXTRACT=TXT
TAG POS=1 TYPE=TABLE ATTR=TXT:<BR>text1:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\ FILE=test.csv
Re: Info needed / Problem advice - strange chars in imacros extr
From what I could figure out, the problem seems to be &nbsp;, which is represented in the extracted text as Unicode character U+00A0 (no-break space). If you open the text file in an editor, you will see this character displayed as a space. In a console, however, you will see the characters ┬á instead (for example, you can use type <filename> in the Windows Command Prompt).
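The different renderings can be checked with a short Python sketch. It assumes the extracted file stores U+00A0 as UTF-8 (the bytes C2 A0), which matches the symptoms described but is not confirmed in this thread. Decoding those two bytes with Windows-1252 gives "Â" followed by a no-break space, and decoding them with the console code page 437 gives "┬á".

```python
NBSP = "\u00a0"

# UTF-8 stores U+00A0 as two bytes: C2 A0.
utf8_bytes = NBSP.encode("utf-8")

# The same two bytes read with a single-byte Windows code page
# (what many editors and Excel may assume for a CSV file).
as_cp1252 = utf8_bytes.decode("cp1252")

# The same two bytes read with the classic Command Prompt code page.
as_cp437 = utf8_bytes.decode("cp437")

print(utf8_bytes.hex(), repr(as_cp1252), repr(as_cp437))
```

Code page 850, common on Western European Windows consoles, maps these two bytes to the same pair of glyphs as code page 437.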
I encountered this issue while trying to insert the extracted data into a MySQL database. To remove the characters you can use TRIM, for example, but you need a way to type the character itself. If you write the query in the console, you can copy and paste the characters ┬á. If you load the query from a text file, however, you need something different: I ran echo '┬á' >filename and got the text-file rendering of the characters, ' '. Then I copied and pasted that into my query and everything worked.
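As an alternative to the copy and paste step, the query file could be generated by a script so the character is written by its code point instead of being typed. The Python sketch below is only an illustration: the table name players, the column name wage, and the file name strip_nbsp.sql are hypothetical, and it assumes MySQL reads the file with a UTF-8 connection character set.

```python
from pathlib import Path
import tempfile

NBSP = "\u00a0"  # no-break space, written by code point instead of pasted


def build_trim_query(table: str, column: str) -> str:
    """Build an UPDATE that trims no-break spaces from both ends of a column.

    The table and column names are placeholders for this sketch.
    """
    return (
        f"UPDATE {table} SET {column} = "
        f"TRIM(BOTH '{NBSP}' FROM {column});\n"
    )


query = build_trim_query("players", "wage")

# Write the query as UTF-8 so the file contains the bytes C2 A0.
out = Path(tempfile.gettempdir()) / "strip_nbsp.sql"
out.write_bytes(query.encode("utf-8"))
```

TRIM only removes the character from the start and end of a value; no-break spaces in the middle of a field would need a different approach.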