As many of you are aware, the table extraction format changed from iMacros 6 to iMacros 7. As a result, iMacros 7 no longer formats table data with the #NEXT# and #NEWLINE# delimiters when extracting a table containing nested tables - a normal text extraction is performed instead.
[Update: iMacros 8 now includes #NEXT# and #NEWLINE# delimiters when extracting nested tables.]
One workaround is to extract only the innermost table. However, there are many cases where you need to extract the entire table, including all sub-tables, while retaining the V6 format. This is especially important if you have many legacy macros and scripts that depend on this format.
The following VBScript routines will accomplish this for you!
This solution requires the iMacros Scripting Edition, and it also requires you to use EXTRACT=HTM when extracting your table instead of EXTRACT=TXT.
Once you have extracted the table HTML and retrieved it with a call to iimGetLastExtract, you simply pass it to ExtractTableV6 and it returns the data in a format similar to that used in V6.
Code:
Function ExtractTableV6(htmlTable)
    ' Extracts a table (including nested tables) from raw HTML extracted with iMacros
    ' using the TYPE=TABLE and EXTRACT=HTM parameters.
    Dim doc
    Set doc = CreateObject("HTMLFile")
    doc.write htmlTable
    Dim outermostTable, tableData
    Set outermostTable = doc.getElementsByTagName("TABLE")(0)
    ExtractTableV6 = ExtractChildNodesV6(outermostTable, tableData, False)
End Function

Function ExtractChildNodesV6(ByRef node, ByRef tableData, ByVal isNestedTable)
    ' Recursive function to extract all child nodes of the given outermost TABLE node.
    ' The returned data contains #NEXT# and #NEWLINE# delimiters for each table cell and
    ' row in a format similar to iMacros 6.
    Dim child, text
    For Each child In node.children
        If child.tagName = "P" Then
            tableData = tableData & vbNewLine
        End If
        If child.children.length = 0 Then
            If child.tagName = "BR" Then
                text = vbNewLine
            Else
                If Len(tableData) > 1 And Right(tableData, 2) <> vbNewLine Then
                    text = " "
                Else
                    text = ""
                End If
                text = text & child.innerText
            End If
            tableData = tableData & text
        ElseIf child.children.length > 0 Then
            Dim afterBegin : afterBegin = child.getAdjacentText("afterBegin")
            If Len(afterBegin) > 0 Then
                tableData = tableData & afterBegin
            End If
            If child.tagName = "TABLE" Then
                isNestedTable = True
            End If
            tableData = ExtractChildNodesV6(child, tableData, isNestedTable)
            tableData = tableData & child.getAdjacentText("beforeEnd")
        End If
        If child.tagName = "TD" Or child.tagName = "TH" Then
            tableData = tableData & "#NEXT#"
        ElseIf child.tagName = "TR" Then
            tableData = tableData & "#NEWLINE#"
            If isNestedTable Then
                tableData = tableData & vbNewLine
            End If
        End If
    Next
    ExtractChildNodesV6 = tableData
End Function
This code is provided AS-IS. Please feel free to update and modify it to suit your own specific needs!
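If your scripts need individual cells rather than the raw delimited string, the returned data can be split on the two delimiters. The following is only a minimal sketch: the sample string is made up for illustration, and it assumes (based on the routine above) that every cell is followed by #NEXT# and every row by #NEWLINE#. Rows from nested tables are treated like any other row here, so adjust as needed.

```vbscript
Option Explicit

' Made-up sample in the delimiter format produced by ExtractTableV6.
Dim v6Data
v6Data = "Name#NEXT#Price#NEXT##NEWLINE#Widget#NEXT#9.99#NEXT##NEWLINE#"

Dim rows, cells, r, c
rows = Split(v6Data, "#NEWLINE#")
For r = 0 To UBound(rows)
    ' Skip the empty piece left over after the final #NEWLINE#.
    If Len(Trim(rows(r))) > 0 Then
        cells = Split(rows(r), "#NEXT#")
        For c = 0 To UBound(cells)
            ' The trailing #NEXT# leaves one empty piece at the end of each row.
            If c < UBound(cells) Or Len(cells(c)) > 0 Then
                WScript.Echo "Row " & r & ", cell " & c & ": " & Trim(cells(c))
            End If
        Next
    End If
Next
```

Note that Trim only removes spaces, so the line breaks added after nested-table rows remain in the cell text unless you strip them yourself (for example with Replace and vbNewLine).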
Attached is a complete example script that replicates the behavior and output of iMacros 6 for the following macro:
Code:
URL GOTO=http://www.iopus.com/imacros/demo/v6/extract2/
TAG POS=1 TYPE=P ATTR=CLASS:heading EXTRACT=TXT
TAG POS=1 TYPE=BLOCKQUOTE ATTR=CLASS:bdytxt EXTRACT=TXT
TAG POS=1 TYPE=TABLE ATTR=TXT:This<SP>line<SP>is<SP>extracted* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=TestExtract_{{!NOW:hhmmss}}.csv
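For orientation, here is a rough sketch of how a script might drive the table extraction with EXTRACT=HTM and feed the result to ExtractTableV6. It is not the attached script. It assumes the two functions above are pasted into the same .vbs file, that the iMacros Scripting Edition is installed (so the "imacros" COM object is available), and it only covers the table TAG line; the variable names are my own.

```vbscript
Option Explicit

' Assumption: ExtractTableV6 and ExtractChildNodesV6 are defined in this file.
Dim iim, macro, rc, html, v6Data

Set iim = CreateObject("imacros")
rc = iim.iimInit()

' Same table TAG as in the macro above, but with EXTRACT=HTM instead of EXTRACT=TXT.
macro = "CODE:"
macro = macro & "URL GOTO=http://www.iopus.com/imacros/demo/v6/extract2/" & vbNewLine
macro = macro & "TAG POS=1 TYPE=TABLE ATTR=TXT:This<SP>line<SP>is<SP>extracted* EXTRACT=HTM"

rc = iim.iimPlay(macro)
If rc > 0 Then
    ' Retrieve the raw table HTML and convert it to the V6-style format.
    html = iim.iimGetLastExtract()
    v6Data = ExtractTableV6(html)
    WScript.Echo v6Data
Else
    WScript.Echo "Macro failed: " & iim.iimGetLastError()
End If

rc = iim.iimExit()
```

Writing the converted data to a file (the job SAVEAS does in the macro) is left out here; you could add it with the FileSystemObject if you need it.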