Missing #NEXT# delimiters in .csv from web extraction

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
Forum rules
iMacros EOL - Attention!

The renewal maintenance has officially ended for Progress iMacros effective November 20, 2023 and all versions of iMacros are now considered EOL (End-of-Life). The iMacros products will no longer be supported by Progress (aside from customer license issues), and these forums will also no longer be moderated from the Progress side.

Thank you again for your business and support.

Sincerely,
The Progress Team

Before asking a question or reporting an issue:
1. Please review the list of FAQs.
2. Use the search box (at the top of each forum page) to see if a similar problem or question has already been addressed.
3. Try searching the iMacros Wiki - it contains the complete iMacros reference as well as plenty of samples and tutorials.
4. We can respond much faster to your posts if you include the following information: CLICK HERE FOR IMPORTANT INFORMATION TO INCLUDE IN YOUR POST
Jiabgwh
Posts: 7
Joined: Wed Feb 09, 2011 5:56 pm

Missing #NEXT# delimiters in .csv from web extraction

Post by Jiabgwh » Wed Feb 09, 2011 7:41 pm

Hi, new user here. I'm updating a web-scraping macro that we've run in version 5 for a while so that it works in version 7.

The command in version 5 was: EXTRACT POS=10 TYPE=TXT ATTR=<TABLE*
The text in the .csv file looks like this:

2010 - 09#NEXT# 08/31/2010#NEXT# 10/01/2010#NEXT# 7744#NEXT# 4#NEXT# 31#NEXT# 0#NEXT# Final#NEXT# #NEXT# - #NEXT##NEWLINE#
2010 - 08#NEXT# 08/02/2010#NEXT# 08/31/2010#NEXT# 7740#NEXT# 12#NEXT# 29#NEXT# 0#NEXT# Final#NEXT# #NEXT# - #NEXT##NEWLINE#

In version 7, I've converted this to the TAG command as follows: TAG POS=10 TYPE=TABLE ATTR=* EXTRACT=TXT
Now the text in the .csv file looks like this:

2010 - 0908/31/201010/01/201077444310Final-
2010 - 0808/02/201008/31/2010774012290Final-

The breaks between the fields are essential to how we use this macro. I read on the wiki that the #NEXT# and #NEWLINE# tags should still be present.
How can I get them back? Am I formatting my TAG command incorrectly? Help is much appreciated. Thanks!
Tom, Tech Support
Posts: 3834
Joined: Mon May 31, 2010 4:59 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Tom, Tech Support » Thu Feb 10, 2011 12:00 pm

Hello Jiabgwh,

The table extraction format has changed in version 7. When you save a table extraction with SAVEAS TYPE=EXTRACT, the data does not include the delimiters. But if you return the extracted data to a script via the scripting interface call iimGetLastExtract(), then you receive the data formatted with #NEXT# and #NEWLINE# for further processing.
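For reference, here is a minimal Python sketch (not part of iMacros) for turning the delimited text returned by iimGetLastExtract() into rows and CSV. It assumes, based on the V5 sample at the top of this thread, that every cell is followed by #NEXT# and every row by #NEWLINE#; the function names are hypothetical.

```python
import csv
import io

NEXT = "#NEXT#"
NEWLINE = "#NEWLINE#"

# Sample text copied from the V5 output quoted earlier in this thread.
SAMPLE = (
    "2010 - 09#NEXT# 08/31/2010#NEXT# 10/01/2010#NEXT# 7744#NEXT# 4#NEXT# 31#NEXT# 0#NEXT# Final#NEXT# #NEXT# - #NEXT##NEWLINE#\n"
    "2010 - 08#NEXT# 08/02/2010#NEXT# 08/31/2010#NEXT# 7740#NEXT# 12#NEXT# 29#NEXT# 0#NEXT# Final#NEXT# #NEXT# - #NEXT##NEWLINE#\n"
)


def parse_table_extract(raw: str) -> list[list[str]]:
    """Split #NEXT#/#NEWLINE# table text into rows of stripped cell strings."""
    rows = []
    for chunk in raw.split(NEWLINE):
        # Real line breaks may sit next to the markers (as in a saved .csv).
        chunk = chunk.strip("\r\n")
        if not chunk:
            continue  # e.g. the empty piece after the final #NEWLINE#
        # Assumption from the V5 sample: every cell, including the last, is
        # followed by #NEXT#, so drop that single trailing delimiter first.
        if chunk.endswith(NEXT):
            chunk = chunk[: -len(NEXT)]
        rows.append([cell.strip() for cell in chunk.split(NEXT)])
    return rows


def to_csv(rows: list[list[str]]) -> str:
    """Render rows as CSV text, letting the csv module handle quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    parsed = parse_table_extract(SAMPLE)
    for row in parsed:
        print(row)
    print(to_csv(parsed), end="")
```

Using the csv module rather than joining with commas keeps fields that contain commas (such as "Feb 4, 2010") intact.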

There is one exception: If the table you are attempting to extract also contains nested tables, then a normal text extraction is performed (without the #NEXT# and #NEWLINE# delimiters). The workaround is to extract only the innermost table that you need. [Update: 1/23/2012] Another workaround is to use the solution provided in this post for extracting nested table data with the same format as V6.
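Because this exception depends on whether the target table encloses another <table>, it can help to check the page source before picking a POS value. The Python sketch below (standard library only, not an iMacros feature) reports the 1-based, document-order positions of tables that contain a nested table. The names are hypothetical, and treating this numbering as equivalent to TAG POS=n TYPE=TABLE is an assumption, not something confirmed here.

```python
from html.parser import HTMLParser


class NestedTableFinder(HTMLParser):
    """Record, for each <table> in document order, whether it encloses another <table>."""

    def __init__(self):
        super().__init__()
        self.has_nested = []  # has_nested[i] is True if table i+1 encloses a table
        self._open = []       # indices of tables that are currently open

    def handle_starttag(self, tag, attrs):
        if tag != "table":  # html.parser lowercases tag names
            return
        # Every table that is still open now encloses this new one.
        for idx in self._open:
            self.has_nested[idx] = True
        self.has_nested.append(False)
        self._open.append(len(self.has_nested) - 1)

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()


def tables_with_nesting(html: str) -> list[int]:
    """Return 1-based positions of tables that contain at least one nested table."""
    finder = NestedTableFinder()
    finder.feed(html)
    finder.close()
    return [i + 1 for i, nested in enumerate(finder.has_nested) if nested]


if __name__ == "__main__":
    page = "<table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>"
    print(tables_with_nesting(page))
```

Any table listed in the result would, per the rule above, fall back to plain text; the innermost tables are the ones not listed.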

What edition of iMacros are you using (i.e. Scripting, PRO, Power Surfer)?
Regards,

Tom, iMacros Support
Jiabgwh
Posts: 7
Joined: Wed Feb 09, 2011 5:56 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Jiabgwh » Thu Feb 10, 2011 2:36 pm

I'm not using SAVEAS TYPE=EXTRACT. I'm calling iimGetLastExtract() to get the data back with the #NEXT# and #NEWLINE# delimiters, and they're not there. I'm using the Scripting Edition.
Tom, Tech Support
Posts: 3834
Joined: Mon May 31, 2010 4:59 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Tom, Tech Support » Thu Feb 10, 2011 2:46 pm

Can you run the following script and tell me whether it displays the delimiters?

ExtractTable.vbs:


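' Build an inline macro: "CODE:" tells iimPlay the string is macro text
Dim macroExtractTable
macroExtractTable = "CODE:"
macroExtractTable = macroExtractTable & "URL GOTO=http://www.iopus.com/imacros/demo/v6/extract.htm" & vbNewLine
macroExtractTable = macroExtractTable & "TAG POS=9 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT" & vbNewLine

' Start an iMacros instance through the Scripting Interface
Dim im
Set im = CreateObject("iMacros")
im.iimInit

' Run the macro
im.iimPlay(macroExtractTable)

' Show the extracted table text (it should contain #NEXT# and #NEWLINE#)
MsgBox im.iimGetLastExtract(1)

' Close the iMacros instance
im.iimExit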
Regards,

Tom, iMacros Support
Jiabgwh
Posts: 7
Joined: Wed Feb 09, 2011 5:56 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Jiabgwh » Thu Feb 10, 2011 9:20 pm

It is displaying the delimiters. The TAG command format is exactly the same as the one I'm using. What could explain the difference?
Tom, Tech Support
Posts: 3834
Joined: Mon May 31, 2010 4:59 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Tom, Tech Support » Thu Feb 10, 2011 9:22 pm

The only explanation I can think of is that the table you are extracting isn't a standard HTML <TABLE> element. Can you view the HTML source for the table?
Regards,

Tom, iMacros Support
Jiabgwh
Posts: 7
Joined: Wed Feb 09, 2011 5:56 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Jiabgwh » Thu Feb 10, 2011 10:09 pm

Yes, I'm able to view the HTML source. It seems to be standard HTML. I'd have thought that if I could get the tags in iMacros v5, I'd get them in v7.
Tom, Tech Support
Posts: 3834
Joined: Mon May 31, 2010 4:59 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Tom, Tech Support » Fri Feb 11, 2011 2:33 pm

Is there any way for me to try the extraction with your table myself?

If you don't want to post it in a public forum, please email the URL and/or macro to support AT iopus.com and mention this forum post.
Regards,

Tom, iMacros Support
Jiabgwh
Posts: 7
Joined: Wed Feb 09, 2011 5:56 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Jiabgwh » Fri Feb 11, 2011 3:38 pm

Hi Tom,

Unfortunately, I can't, since the site isn't public. Would it help if I copied chunks of the source code that might reveal the underlying issue? I can also send you a screenshot of the site, and more of my code, if you think that would help. Let me know.
Tom, Tech Support
Posts: 3834
Joined: Mon May 31, 2010 4:59 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Tom, Tech Support » Mon Feb 14, 2011 12:57 pm

Yes, both of those would be helpful.
Regards,

Tom, iMacros Support
gomyers
Posts: 2
Joined: Thu Mar 10, 2011 1:59 am

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by gomyers » Thu Mar 10, 2011 2:04 am

I too am having this problem with V7.
I get the table data as a series of lines, each ending with \r\n.

Here are my macro and the HTML copied from the source page.

' Find the Patient Name
' Go back from there to any previous table.
' Then select the next table ( with the Patient Name )

--CODE
TAG POS=1 TYPE=TD FORM=NAME:ScreenForm ATTR=TXT:DAVID<SP>PETERS*
TAG POS=R-1 TYPE=TABLE ATTR=TXT:*
TAG POS=R1 TYPE=TABLE ATTR=* EXTRACT=TXT

--HTML
<table border="0" cellspacing="0" cellpadding="2" width="100%" bgcolor="#FFFFFF">
<tr>
<td colspan="2" class="BGBlue" height="30">
<span class="ColumnLabel">
<font size="3">DAVID PETERS</font></span>&nbsp; <span class="nonHipaaText9PT">( Subscriber)</span></td>
<td colspan="3" class="BGBlue" align="right"><font color="ffffff">Date of Birth:&nbsp;</font>
<span class="ColumnLabel">Oct 14, 1953</span></td>
</tr>
<tr>
<td width="23%" class="BGLtBlue"><b>Claim Number</b></td>
<td width="30%" class="BGLtBlue"><b>Service Dates</b></td>
<td width="15%" class="BGLtBlue" align="right"><b>Total Charge</b></td>
<td width="4%" class="BGLtBlue">&nbsp;</td>
<td width="28%" class="BGLtBlue"><b>Status </b></td>
</tr>

<tr CLASS="">
<td valign="top">
<A href='javascript:submitClaimNumber("31106227082", "E143M64086")'>31106227082</A>
<td valign="top">Feb 4, 2010 - Feb 4, 2010</td>
<td align="right" valign="top">$239.09</td>
<td>&nbsp;</td>
<td valign="top">Pending</td>
</tr>

<tr CLASS="BGRowGray">
<td valign="top">
<A href='javascript:submitClaimNumber("31004650651", "E143M64086")'>31004650651</A>
<td valign="top">Feb 4, 2010 - Feb 4, 2010</td>
<td align="right" valign="top">$217.86</td>
<td>&nbsp;</td>
<td valign="top">Paid</td>
</tr>

<tr CLASS="">
<td valign="top">
<A href='javascript:submitClaimNumber("31004637108", "E143M64086")'>31004637108</A>
<td valign="top">Feb 4, 2010 - Feb 4, 2010</td>
<td align="right" valign="top">$25,450.23</td>
<td>&nbsp;</td>
<td valign="top">Paid</td>
</tr>

</table>
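If a particular table still comes back without delimiters, one possible fallback is to extract the table's markup instead (for example with EXTRACT=HTM on the same TAG command) and rebuild the delimited text outside iMacros. The Python sketch below is a hypothetical helper, not part of iMacros. It assumes a single, non-nested table, tolerates the unclosed claim-number <td> cells seen in the HTML above, and emits each cell followed by #NEXT# and each row followed by #NEWLINE#, approximating the V5 layout shown at the top of this thread.

```python
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect <td>/<th> cell text grouped by <tr> (single, non-nested table assumed)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being built
        self._cell = None  # text fragments of the cell being built

    def _close_cell(self):
        if self._cell is not None:
            # Collapse whitespace, including &nbsp; (decoded to U+00A0).
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # an unclosed previous row ends here
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # tolerate a missing </td>
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_row()  # flush if the fragment ended without </table>


def html_table_to_extract(html: str) -> str:
    """Rebuild V5-style text: each cell followed by #NEXT#, each row by #NEWLINE#."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    return "".join(
        "".join(cell + "#NEXT#" for cell in row) + "#NEWLINE#" for row in parser.rows
    )
```

The output can then be split back into fields on #NEXT# and #NEWLINE# like any V5-style extraction.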
jherington
Posts: 21
Joined: Mon Dec 04, 2006 10:08 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by jherington » Sun Mar 27, 2011 3:01 am

I am getting the same result with 7.20.1199. Everything works fine in version 6.x. I can send someone an HTM extract of the table. Please let me know. Thanks.
Jeff Herington
Tom, Tech Support
Posts: 3834
Joined: Mon May 31, 2010 4:59 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Tom, Tech Support » Thu Mar 31, 2011 7:30 am

gomyers wrote:I too am having this problem with V7.
I get the table data as a series of lines, each ending with \r\n.

Here are my macro and the HTML copied from the source page.

' Find the Patient Name
' Go back from there to any previous table.
' Then select the next table ( with the Patient Name )

--CODE
TAG POS=1 TYPE=TD FORM=NAME:ScreenForm ATTR=TXT:DAVID<SP>PETERS*
TAG POS=R-1 TYPE=TABLE ATTR=TXT:*
TAG POS=R1 TYPE=TABLE ATTR=* EXTRACT=TXT
Hello gomyers,

I see you are using relative positioning in your macro. Relative positioning is implemented differently in V7 than in V6 (it was changed to be compatible with the iMacros for Firefox implementation). Please see the following article for more information:

http://wiki.imacros.net/V7_Relative_positioning
jherington wrote:I am getting the same result with 7.20.1199. Everything works fine in version 6.x. I can send someone an HTM extract of the table. Please let me know. Thanks.
Yes, please do. If you don't want to post it here, please email the HTM and/or the URL and macro to support AT iopus.com and mention this forum post.
Regards,

Tom, iMacros Support
Mindy
Posts: 1
Joined: Mon Sep 26, 2011 10:17 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Mindy » Mon Sep 26, 2011 10:24 pm

Hello,

Does this mean that I cannot extract a table including the delimiters if I'm using the Firefox add-on? I'm not using the scripting interface. I'd like to avoid having to process the full HTML of the page if possible.

Thanks,
Mindy
Tom, Tech Support
Posts: 3834
Joined: Mon May 31, 2010 4:59 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

Post by Tom, Tech Support » Mon Oct 31, 2011 1:19 pm

Mindy wrote:Does this mean that I cannot extract a table including the delimiters if I'm using the Firefox add-on? I'm not using the scripting interface. I'd like to avoid having to process the full HTML of the page if possible.
Hi Mindy,

The discussion in this thread pertains to the iMacros Browser and iMacros for IE, not iMacros for Firefox.
Regards,

Tom, iMacros Support