Missing #NEXT# delimiters in .csv from web extraction


by Jiabgwh on Wed Feb 09, 2011 12:41 pm

Hi, new user here. I'm modifying code for a macro we've run in version 5 for a while to be used for version 7. It is a web scraping macro.

The command in version 5 was: EXTRACT POS=10 TYPE=TXT ATTR=<TABLE*
The text in the .csv file looks like this:

2010 - 09#NEXT# 08/31/2010#NEXT# 10/01/2010#NEXT# 7744#NEXT# 4#NEXT# 31#NEXT# 0#NEXT# Final#NEXT# #NEXT# - #NEXT##NEWLINE#
2010 - 08#NEXT# 08/02/2010#NEXT# 08/31/2010#NEXT# 7740#NEXT# 12#NEXT# 29#NEXT# 0#NEXT# Final#NEXT# #NEXT# - #NEXT##NEWLINE#

Now I'm converting this code to version 7 with the TAG command as follows: TAG POS=10 TYPE=TABLE ATTR=* EXTRACT=TXT
Now the text in the .csv file looks like this:

2010 - 0908/31/201010/01/201077444310Final-
2010 - 0808/02/201008/31/2010774012290Final-

Knowing where the fields break is essential for us to use this macro's output. I read on the wiki that these #NEXT# and #NEWLINE# tags should still be present.
How can I get them back? Am I formatting my TAG command incorrectly? Help is much appreciated. Thanks!
Jiabgwh
 
Posts: 7
Joined: Wed Feb 09, 2011 10:56 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by Tom, Tech Support on Thu Feb 10, 2011 5:00 am

Hello Jiabgwh,

The table extraction format has changed in version 7. When you save a table extraction with SAVEAS TYPE=EXTRACT, the data does not include the delimiters. But if you return the extracted data to a script via the scripting interface call iimGetLastExtract(), then you receive the data formatted with #NEXT# and #NEWLINE# for further processing.

There is one exception: If the table you are attempting to extract also contains nested tables, then a normal text extraction is performed (without the #NEXT# and #NEWLINE# delimiters). The workaround is to only extract the innermost table that you need. [Update: 1/23/2012] Another workaround is to use the solution provided on this post for extracting nested table data with the same format as V6.
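
As a rough illustration of the "further processing" step, the sketch below splits a string in the #NEXT#/#NEWLINE# format into rows and cells and renders them as CSV. It is written in Python purely for illustration and assumes the extract string has already been obtained from iimGetLastExtract(); the names `parse_extract` and `rows_to_csv` are hypothetical helpers, not part of iMacros. The sample string is the first data row from the original post.

```python
import csv
import io

NEXT = "#NEXT#"        # cell delimiter used in the extract string
NEWLINE = "#NEWLINE#"  # row delimiter used in the extract string


def parse_extract(extract):
    """Split an extract string into rows of whitespace-stripped cells."""
    rows = []
    for raw_row in extract.split(NEWLINE):
        if not raw_row.strip():
            continue  # skip the empty piece after the final #NEWLINE#
        rows.append([cell.strip() for cell in raw_row.split(NEXT)])
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text (hypothetical helper for this sketch)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sample = (
    "2010 - 09#NEXT# 08/31/2010#NEXT# 10/01/2010#NEXT# 7744#NEXT# 4#NEXT# 31"
    "#NEXT# 0#NEXT# Final#NEXT# #NEXT# - #NEXT##NEWLINE#"
)
rows = parse_extract(sample)
csv_text = rows_to_csv(rows)
```

Note that a row ending in #NEXT# immediately before #NEWLINE#, as in the sample, yields a trailing empty cell; whether to keep or drop it depends on how the downstream .csv is consumed.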

What edition of iMacros are you using (i.e. Scripting, PRO, Power Surfer)?
Regards,

Tom, iMacros Support
Tom, Tech Support
 
Posts: 3300
Joined: Mon May 31, 2010 9:59 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by Jiabgwh on Thu Feb 10, 2011 7:36 am

I'm not using the SAVEAS TYPE=EXTRACT command. I'm using the iimGetLastExtract() command in order to pull back the #NEXT# and #NEWLINE# tags. What I'm saying is that they're not there. I'm using the Scripting version.
Jiabgwh
 
Posts: 7
Joined: Wed Feb 09, 2011 10:56 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by Tom, Tech Support on Thu Feb 10, 2011 7:46 am

Can you run the following script and tell me if it is displaying the delimiters:

ExtractTable.vbs:
' Build the macro as an inline "CODE:" string, one command per line.
Dim macroExtractTable
macroExtractTable = "CODE:"
macroExtractTable = macroExtractTable & "URL GOTO=http://www.iopus.com/imacros/demo/v6/extract.htm" & vbNewLine
macroExtractTable = macroExtractTable & "TAG POS=9 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT" & vbNewLine

' Start iMacros through the scripting interface.
Dim im
Set im = CreateObject("iMacros")
im.iimInit

' Play the macro, then show the first extracted value.
im.iimPlay(macroExtractTable)

MsgBox im.iimGetLastExtract(1)
Regards,

Tom, iMacros Support
Tom, Tech Support
 
Posts: 3300
Joined: Mon May 31, 2010 9:59 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by Jiabgwh on Thu Feb 10, 2011 2:20 pm

It is displaying the delimiters. The TAG command format is exactly the same as the one I'm using. What would explain the differences?
Jiabgwh
 
Posts: 7
Joined: Wed Feb 09, 2011 10:56 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by Tom, Tech Support on Thu Feb 10, 2011 2:22 pm

The only thing I can think of is that the table you are extracting isn't a standard HTML <TABLE>. Can you view the HTML source for the table?
Regards,

Tom, iMacros Support
Tom, Tech Support
 
Posts: 3300
Joined: Mon May 31, 2010 9:59 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by Jiabgwh on Thu Feb 10, 2011 3:09 pm

Yes I'm able to view the HTML code. It seems to be standard HTML. I'd have thought if I could get the tags in iMacros v5 I'd get them in v7.
Jiabgwh
 
Posts: 7
Joined: Wed Feb 09, 2011 10:56 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by Tom, Tech Support on Fri Feb 11, 2011 7:33 am

Is there any way for me to try the extraction with your table myself?

If you don't want to post it in a public forum, please email the URL and/or macro to support AT iopus.com and mention this forum post.
Regards,

Tom, iMacros Support
Tom, Tech Support
 
Posts: 3300
Joined: Mon May 31, 2010 9:59 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by Jiabgwh on Fri Feb 11, 2011 8:38 am

Hi Tom,

Unfortunately I can't, since the site isn't public. Would it help if I copied chunks of the source code that might reveal the underlying issue? I can also send you a screenshot of the site and more of my code, if you think that would help. Let me know.
Jiabgwh
 
Posts: 7
Joined: Wed Feb 09, 2011 10:56 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by Tom, Tech Support on Mon Feb 14, 2011 5:57 am

Yes, both of those would be helpful.
Regards,

Tom, iMacros Support
Tom, Tech Support
 
Posts: 3300
Joined: Mon May 31, 2010 9:59 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by gomyers on Wed Mar 09, 2011 7:04 pm

I too am having this problem with V7.
I get the table data as a series of lines each ending with \r\n.

Here are my macro and the HTML copied from the source page.

' Find the Patient Name
' Go back from there to any previous table.
' Then select the next table ( with the Patient Name )

--CODE
TAG POS=1 TYPE=TD FORM=NAME:ScreenForm ATTR=TXT:DAVID<SP>PETERS*
TAG POS=R-1 TYPE=TABLE ATTR=TXT:*
TAG POS=R1 TYPE=TABLE ATTR=* EXTRACT=TXT

--HTML
<table border="0" cellspacing="0" cellpadding="2" width="100%" bgcolor="#FFFFFF">
<tr>
<td colspan="2" class="BGBlue" height="30">
<span class="ColumnLabel">
<font size="3">DAVID PETERS</font></span>&nbsp; <span class="nonHipaaText9PT">( Subscriber)</span></td>
<td colspan="3" class="BGBlue" align="right"><font color="ffffff">Date of Birth:&nbsp;</font>
<span class="ColumnLabel">Oct 14, 1953</span></td>
</tr>
<tr>
<td width="23%" class="BGLtBlue"><b>Claim Number</b></td>
<td width="30%" class="BGLtBlue"><b>Service Dates</b></td>
<td width="15%" class="BGLtBlue" align="right"><b>Total Charge</b></td>
<td width="4%" class="BGLtBlue">&nbsp;</td>
<td width="28%" class="BGLtBlue"><b>Status </b></td>
</tr>

<tr CLASS="">
<td valign="top">
<A href='javascript:submitClaimNumber("31106227082", "E143M64086")'>31106227082</A>
<td valign="top">Feb 4, 2010 - Feb 4, 2010</td>
<td align="right" valign="top">$239.09</td>
<td>&nbsp;</td>
<td valign="top">Pending</td>
</tr>

<tr CLASS="BGRowGray">
<td valign="top">
<A href='javascript:submitClaimNumber("31004650651", "E143M64086")'>31004650651</A>
<td valign="top">Feb 4, 2010 - Feb 4, 2010</td>
<td align="right" valign="top">$217.86</td>
<td>&nbsp;</td>
<td valign="top">Paid</td>
</tr>

<tr CLASS="">
<td valign="top">
<A href='javascript:submitClaimNumber("31004637108", "E143M64086")'>31004637108</A>
<td valign="top">Feb 4, 2010 - Feb 4, 2010</td>
<td align="right" valign="top">$25,450.23</td>
<td>&nbsp;</td>
<td valign="top">Paid</td>
</tr>

</table>
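
If the delimiters cannot be obtained from the extraction itself, one possible fallback is to rebuild them from the table's HTML. The following is a minimal Python sketch under stated assumptions: the table's HTML is already available as a string (how it is obtained is outside this sketch), the table has no nested tables, and the class `TableToDelimited` and function `table_html_to_delimited` are hypothetical names, not iMacros features. It tolerates missing `</td>` tags like those in the rows above, but it does not try to reproduce the exact spacing or trailing #NEXT# of the older output.

```python
from html.parser import HTMLParser


class TableToDelimited(HTMLParser):
    """Collect the cell text of a simple (non-nested) HTML table."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the open cell, or None

    def _close_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace (including &nbsp;) to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # a new cell implicitly closes an unclosed one
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_html_to_delimited(html):
    """Return the table as cell#NEXT#cell#NEWLINE# text."""
    parser = TableToDelimited()
    parser.feed(html)
    parser.close()
    return "".join("#NEXT#".join(row) + "#NEWLINE#" for row in parser.rows)


sample = """
<table>
<tr>
<td><b>Claim Number</b></td>
<td><b>Status </b></td>
</tr>
<tr CLASS="">
<td valign="top">
<A href='javascript:submitClaimNumber("31106227082", "E143M64086")'>31106227082</A>
<td valign="top">Pending</td>
</tr>
</table>
"""
delimited = table_html_to_delimited(sample)
```

The sample is a shortened version of the table above, keeping one header row and one data row with its unclosed first cell.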
gomyers
 
Posts: 2
Joined: Wed Mar 09, 2011 6:59 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

by jherington on Sat Mar 26, 2011 8:01 pm

I am getting the same result with 7.20.1199. Everything works fine in version 6.x. I can send someone an HTM extract of the table. Please let me know. Thanks.
Jeff Herington
jherington
 
Posts: 21
Joined: Mon Dec 04, 2006 3:08 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

by Tom, Tech Support on Thu Mar 31, 2011 12:30 am

gomyers wrote: I too am having this problem with V7.
I get the table data as a series of lines each ending with \r\n.

Here are my macro and the HTML copied from the source page.

' Find the Patient Name
' Go back from there to any previous table.
' Then select the next table ( with the Patient Name )

--CODE
TAG POS=1 TYPE=TD FORM=NAME:ScreenForm ATTR=TXT:DAVID<SP>PETERS*
TAG POS=R-1 TYPE=TABLE ATTR=TXT:*
TAG POS=R1 TYPE=TABLE ATTR=* EXTRACT=TXT
Hello gomyers,

I see you are using relative positioning in your macro. The way relative positioning is implemented in V7 is different from V6 (it was changed to be compatible with the iMacros for Firefox implementation). Please see the following article for more information:

http://wiki.imacros.net/V7_Relative_positioning

jherington wrote: I am getting the same result with 7.20.1199. Everything works fine in version 6.x. I can send someone an HTM extract of the table. Please let me know. Thanks.
Yes, please do. If you don't want to post it here, please email the HTM and/or the URL and macro to support AT iopus.com and mention this forum post.
Regards,

Tom, iMacros Support
Tom, Tech Support
 
Posts: 3300
Joined: Mon May 31, 2010 9:59 am

Re: Missing #NEXT# delimiters in .csv from web extraction

by Mindy on Mon Sep 26, 2011 3:24 pm

Hello,

Does this mean that I cannot extract a table including the delimiters if I'm using the Firefox add-on? I'm not using the scripting interface. I'd like to avoid having to process the full HTML of the page if possible.

Thanks,
Mindy

Mindy
 
Posts: 1
Joined: Mon Sep 26, 2011 3:17 pm

Re: Missing #NEXT# delimiters in .csv from web extraction

by Tom, Tech Support on Mon Oct 31, 2011 6:19 am

Mindy wrote: Does this mean that I cannot extract a table including the delimiters if I'm using the Firefox add-on? I'm not using the scripting interface. I'd like to avoid having to process the full HTML of the page if possible.
Hi Mindy,

The discussion in this thread pertains to the iMacros Browser and iMacros for IE, not iMacros for Firefox.
Regards,

Tom, iMacros Support
Tom, Tech Support
 
Posts: 3300
Joined: Mon May 31, 2010 9:59 am
