Here's what the site and the table look like:
Technique 1: By position only:
As HTML table fields are enclosed by <td> tags, we can use these tags for the extraction. Here's the code that extracts the first field ("Last Trade"), using the table's heading as an extraction anchor:
TAG POS=1 TYPE=B ATTR=TXT:ACME<SP>UNITED<SP>CP
TAG POS=R3 TYPE=TD ATTR=TXT:* EXTRACT=TXT
Here, POS=R3 selects the third <td> after the anchor. Likewise, POS=R5 extracts the first entry of the second row.
Thus, extracting the whole table simply means looping through the position values, starting with POS=R3. Odd values extract the left column's data, and even values extract the right column's data.
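The position arithmetic can be sketched in a few lines of Python. The helper names below are hypothetical and are not part of iMacros. The sketch assumes, as described above, that POS=R3 is the first cell, that cells alternate left/right row by row, and that every extraction is re-anchored on the heading (as one loop pass would do). The anchor line is copied from the macro above.

```python
# Anchor line copied from the macro above; each generated snippet repeats it
# so the relative offset is always counted from the table heading.
ANCHOR = "TAG POS=1 TYPE=B ATTR=TXT:ACME<SP>UNITED<SP>CP"


def cell_for_offset(offset: int, first_offset: int = 3) -> tuple[int, str]:
    """Map a relative POS value to (row, column); rows are counted from 1."""
    index = offset - first_offset
    if index < 0:
        raise ValueError("offset lies before the first table cell")
    row = index // 2 + 1
    column = "left" if index % 2 == 0 else "right"
    return row, column


def macro_for_cell(offset: int) -> str:
    """Anchor line plus one relative extraction, as a single loop pass would run."""
    return f"{ANCHOR}\nTAG POS=R{offset} TYPE=TD ATTR=TXT:* EXTRACT=TXT"


def offsets_for_table(num_rows: int, first_offset: int = 3) -> list[int]:
    """All relative POS values needed to cover a two-column table."""
    return list(range(first_offset, first_offset + 2 * num_rows))


if __name__ == "__main__":
    for off in offsets_for_table(2):
        print(off, cell_for_offset(off))
```

For a two-row table this lists offsets 3 to 6. Offset 5 maps to row 2, left column, which matches the POS=R5 case above.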
======
Technique 2: By special formatting
When using the Extraction Wizard on the first element of the first column ("Last Trade"), the extraction command looks like this:
TAG POS=1 TYPE=TD ATTR=CLASS:yfnc_tablehead1&&TXT:* EXTRACT=TXT
For the second column, it is:
TAG POS=1 TYPE=TD ATTR=CLASS:yfnc_tabledata1&&TXT:* EXTRACT=TXT
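Increasing POS (2, 3, ...) selects the next cell with the same class, i.e. it moves down the respective column. (Note that the extracted data might need to be post-processed in a script to yield valid CSV.)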
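As a sketch of such post-processing, the Python function below pairs the two separately extracted columns (labels from the yfnc_tablehead1 cells, values from the yfnc_tabledata1 cells) row by row and writes them out with the standard csv module. The function name and the "field,value" header are hypothetical. The sketch assumes you have already collected each column into a list. csv.writer quotes any field that contains a comma, a quote or a newline, which is one common way raw extractions end up as invalid CSV.

```python
import csv
import io


def columns_to_csv(labels: list[str], values: list[str]) -> str:
    """Pair two separately extracted columns row by row and return CSV text."""
    if len(labels) != len(values):
        raise ValueError("columns have different lengths")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["field", "value"])  # hypothetical header row
    for label, value in zip(labels, values):
        # strip() removes stray whitespace picked up from the HTML cells
        writer.writerow([label.strip(), value.strip()])
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder input, not real quote data
    print(columns_to_csv(["Last Trade:", "Example field:"], ["1.00", "1,000"]))
```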
=======
[Edit: updated this section on 2008/09/26]
Technique 3: By extracting whole rows or the whole table at once:
As HTML uses the <tr> tag to enclose a table's rows, we can also use this tag to extract the table's data. Using the Wizard (again with relative extraction), we click on the table's heading and then replace the TYPE with TR and the ATTR value with "TXT:*".
When saving this extraction, the macro looks like this:
TAG POS=1 TYPE=H1 ATTR=TXT:ACME<SP>UNITED<SP>CP
TAG POS=R1 TYPE=TR ATTR=TXT:* EXTRACT=TXT
The same can be done by using TABLE as the extraction tag's TYPE, which produces the following macro commands:
TAG POS=1 TYPE=H1 ATTR=TXT:ACME<SP>UNITED<SP>CP
TAG POS=1 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT
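The text returned by a TR or TABLE extraction still has to be split into cells. The Python sketch below assumes that the extracted text marks cell boundaries with #NEXT# and row boundaries with #NEWLINE#. That is an assumption: check what your iMacros version actually returns and adjust the two separator constants. The function names are hypothetical.

```python
import csv
import io

CELL_SEP = "#NEXT#"     # assumed cell separator in the extracted text
ROW_SEP = "#NEWLINE#"   # assumed row separator in the extracted text


def table_text_to_rows(text: str, cell_sep: str = CELL_SEP,
                       row_sep: str = ROW_SEP) -> list[list[str]]:
    """Split extracted table text into a list of rows, each a list of cells."""
    rows = []
    for raw_row in text.split(row_sep):
        if not raw_row.strip():
            continue  # skip empty rows, e.g. after a trailing row separator
        rows.append([cell.strip() for cell in raw_row.split(cell_sep)])
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV, quoting cells that contain commas or quotes."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder input, not real quote data
    sample = "Last Trade:#NEXT#1.00#NEWLINE#Example field:#NEXT#1,000#NEWLINE#"
    print(rows_to_csv(table_text_to_rows(sample)))
```

Under the same assumption, a single TR extraction is simply the one-row case and goes through the same functions.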