How to extract info from tables using CSS Selectors?

ad0lf0
Posts: 1
Joined: Tue May 04, 2021 2:32 am

How to extract info from tables using CSS Selectors?

Post by ad0lf0 » Tue May 04, 2021 2:52 am

Hi guys, I am new at this. Some info:
VERSION BUILD=1010 RECORDER=CR
Chrome Version 90.0.4430.93 64 bits
Windows 10


Also, I am sorry if I say something wrong; I will try to give as much information as possible.

1) I am trying to scrape a table. I could not get it to work with regular TAG POS, so I am now using CSS selectors, still with no success.

2) Attached is an example of the HTML page I am trying to scrape, as it appears after the captcha has been entered successfully.

3) The captcha solving works very well. The problem starts at line 27, when selecting the tags and extracting them.

Here is the complete macro:

Code: Select all

VERSION BUILD=1010 RECORDER=CR
TAB T=1
TAB CLOSEALLOTHERS
'SET !PLAYBACKDELAY 0.00
URL GOTO=https://servicos.receita.fazenda.gov.br/Servicos/cnpjreva/cnpjreva_solicitacao.asp

TAG POS=1 TYPE=INPUT:TEXT ATTR=NAME:cnpj CONTENT=00.360.305/0001-04

' Insert your Anti-Captcha API key here
SET antiCaptchaApiKey XXXXXX

' Put the Anti-Captcha API key into the TEXTAREA.g-recaptcha-response element
TAG POS=1 TYPE=TEXTAREA ATTR=CLASS:g-recaptcha-response CONTENT={{antiCaptchaApiKey}}
' Or you can place the API key in DIV#anticaptcha-imacros-account-key, it will also work
'URL GOTO=javascript:(function(){var<SP>d=document.getElementById("anticaptcha-imacros-account-key");d||(d=document.createElement("div"),d.innerHTML="{{antiCaptchaApiKey}}",d.style.display="none",d.id="anticaptcha-imacros-account-key",document.body.appendChild(d))})();
'
' Include the recaptcha.js file, which contains all the functionality
URL GOTO=javascript:(function(){var<SP>s=document.createElement("script");s.src="https://cdn.antcpt.com/imacros_inclusion/recaptcha.js?"+Math.random();document.body.appendChild(s);})();

' Most important part: wait up to 120 seconds until the Anti-Captcha indicator
' with class "antigate_solver" also gets the "solved" class
SET !TIMEOUT_STEP 120
TAG POS=1 TYPE=DIV ATTR=CLASS:"*antigate_solver*solved*"

TAG POS=1 TYPE=BUTTON:SUBMIT ATTR=TXT:Consultar

TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(2)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL1}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(2)>TBODY>TR>TD:nth-of-type(3)>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL2}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(3)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL3}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(4)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL4}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(4)>TBODY>TR>TD:nth-of-type(3)>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL5}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(5)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL6}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(6)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL7}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(7)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL8}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(8)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL9}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(8)>TBODY>TR>TD:nth-of-type(3)>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL10}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(8)>TBODY>TR>TD:nth-of-type(5)>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL11}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(9)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL12}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(9)>TBODY>TR>TD:nth-of-type(3)>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL13}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(9)>TBODY>TR>TD:nth-of-type(5)>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL14}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(9)>TBODY>TR>TD:nth-of-type(7)>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL15}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(10)>TBODY>TR>TD:nth-of-type(3)>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL16}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(11)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL17}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(12)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL18}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(12)>TBODY>TR>TD:nth-of-type(3)>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL19}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(13)>TBODY>TR>TD" EXTRACT=TXT !EXTRACT {{!COL20}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(14)>TBODY>TR>TD>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL21}}
TAG SELECTOR="HTML>BODY>DIV>DIV>DIV>DIV>DIV>DIV:nth-of-type(2)>DIV>DIV>TABLE>TBODY>TR>TD>TABLE:nth-of-type(14)>TBODY>TR>TD:nth-of-type(3)>FONT:nth-of-type(2)>B" EXTRACT=TXT !EXTRACT {{!COL22}}

SAVEAS TYPE=EXTRACT FOLDER=* FILE=Extract_{{!NOW:ddmmyy_hhnnss}}.csv

So I don't know how to select the correct tags: the "normal mode" returns the same tags for every selection in the table, and I am completely lost using CSS selectors.

How can you guys help me? I am all ears for a solution.
Attachments
EXAMPLE.zip
(4.82 KiB) Downloaded 168 times
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: How to extract info from tables using CSS Selectors?

Post by chivracq » Tue May 04, 2021 3:55 pm

Hum, Compliments on the Good Quality of your OP, and even providing an Example Page for Testing... Perfect...! :D
Only 'Free'/'PE' is missing from your FCI...?

Then OK, thanks to your Example, I was able to have a Look at this Page, and..., yep OK, I can understand why you want to use CSS Selectors and the 'SELECTOR' Mode, ... that I have never "really" used, as it is not implemented/supported in the Version(s) I use for myself, but I don't use the 'TAG XPATH' Mode either, which could also be a "way to go"...

BUT...!, the HTML Structure on this Page/Site is pretty straightforward (apart from re-nesting a 2nd mini Sub-Table in every single Cell in the "main" Table), so you can extract all Data Cell by Cell using the "Standard" 'TAG POS' Mode + 'Relative Positioning': each Cell ('TD' Element) contains a "Header"/"Label" (a 'FONT' Element, which will be the 'Anchor') and the Data you want to extract (a 'B' Element, which will be the 'R-POS' 'Target')...

And that will give for example, applied to a few Fields:

Code: Select all

TAG POS=1 TYPE=FONT ATTR=TXT:DATA<SP>DE<SP>ABERTURA
'TAG POS=1 TYPE=B ATTR=TXT:03/02/1971 // (Recorded)
TAG POS=R1 TYPE=B ATTR=TXT:* EXTRACT=TXT

TAG POS=1 TYPE=FONT ATTR=TXT:NOME<SP>EMPRESARIAL
'TAG POS=1 TYPE=B ATTR=TXT:CAIXA<SP>ECONOMICA<SP>FEDERAL // (Recorded)
TAG POS=R1 TYPE=B ATTR=TXT:* EXTRACT=TXT

TAG POS=1 TYPE=FONT ATTR=TXT:TELEFONE
'TAG POS=1 TYPE=B ATTR=TXT:(61)<SP>3521-8600 // (Recorded)
TAG POS=R1 TYPE=B ATTR=TXT:* EXTRACT=TXT

PROMPT {{!EXTRACT}}

... Which will give in the 'PROMPT':

Code: Select all

03/02/1971[EXTRACT]CAIXA ECONOMICA FEDERAL        [... some Spaces removed...]             [EXTRACT](61) 3521-8600

Hum..., not all Data seems to be completely "clean": the 2nd Field (the "NOME EMPRESARIAL") for example contains some extra Spaces, so you'll probably need to use 'EVAL()' if you want to "clean" the Data before saving it to your '.CSV'.
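
As a rough illustration of the kind of clean-up meant here, the following is a minimal JavaScript sketch (JavaScript being what 'EVAL()' evaluates). It splits a string on the '[EXTRACT]' separator seen in the 'PROMPT' and collapses the extra Spaces in each Field. The function names `splitExtract` and `cleanField` and the exact padding in the sample string are assumptions made for the example, not something taken from the Page or from iMacros itself; wiring the expression into an actual 'SET ... EVAL(...)' Line, with its own quoting and escaping, is not shown.

```javascript
// Hypothetical helpers, for illustration only.

// Split the combined extraction string on the '[EXTRACT]' separator.
function splitExtract(raw) {
  return raw.split('[EXTRACT]');
}

// Collapse every run of whitespace (spaces, tabs, line breaks,
// non-breaking spaces) into a single space and trim both ends.
function cleanField(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Sample input: the amount of padding after the company name is made up.
const raw =
  '03/02/1971[EXTRACT]CAIXA ECONOMICA FEDERAL            [EXTRACT](61) 3521-8600';

const fields = splitExtract(raw).map(cleanField);
console.log(fields.join(','));
```

Run outside iMacros (for example in the browser console), this prints the three Fields separated by commas, with the padding after the company Name gone.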

One possible Drawback/Difficulty is for example the "NÚMERO DE INSCRIÇÃO" Cell, which contains 2 'B' Elements to display the Data, or the "ENDEREÇO ELETRÔNICO" Cell, which is empty: 'POS=R1' will then "catch" and extract the "next" 'B' Element, ... which will be the one corresponding to the "TELEFONE" Cell, oops...!

It might then be more "reliable" to extract the Data at the 'TD'-Container Level, and use 'EVAL()' to retain the Data that you want to extract, => even an empty String if that Cell is empty...
But this will take a few Lines of Code for each single Cell you want to extract. You didn't mention in your FCI if you are using the 'Free' or 'PE' Version, but if using the 'Free' Version, you will quickly reach the "Max=50 Lines" Limit... And you'll have to constantly "juggle" between the 3 Vars allowed by the 'Free' Version...
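
A possible shape for that 'TD'-Level clean-up, as a minimal JavaScript sketch of the logic one might put in 'EVAL()': take the whole Cell text (Label + Data), strip the Label, and return what is left, which is an empty String for an empty Cell. `valueFromCell` is a hypothetical helper name, and the sample Cell texts below are made up for illustration; the real Cells may contain other whitespace or line breaks, and the quoting needed inside an actual 'EVAL(...)' Line is not shown.

```javascript
// Hypothetical helper, for illustration only.
// cellText: the full text extracted from one 'TD' (label followed by data).
// label:    the header text expected at the start of that cell.
function valueFromCell(cellText, label) {
  // Normalise whitespace first so padding and line breaks do not matter.
  const text = cellText.replace(/\s+/g, ' ').trim();
  // If the cell does not start with the expected label, return an empty
  // string rather than guessing (an assumption of this sketch).
  if (!text.startsWith(label)) {
    return '';
  }
  // Everything after the label is the data; empty cells yield ''.
  return text.slice(label.length).trim();
}

// Made-up cell texts modelled on the fields discussed above.
const phone = valueFromCell('TELEFONE    (61) 3521-8600 ', 'TELEFONE');
const email = valueFromCell('ENDEREÇO ELETRÔNICO   ', 'ENDEREÇO ELETRÔNICO');
const cnpj = valueFromCell(
  'NÚMERO DE INSCRIÇÃO  00.360.305/0001-04   MATRIZ',
  'NÚMERO DE INSCRIÇÃO'
);

console.log([phone, email, cnpj]);
```

With this approach the empty "ENDEREÇO ELETRÔNICO" Cell gives an empty String instead of "stealing" the "TELEFONE" Data, and a Cell with 2 'B' Elements comes back as one String containing both parts.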
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: How to extract info from tables using CSS Selectors?

Post by chivracq » Thu May 06, 2021 10:48 pm

And...?, any Update/Follow-up, 3 days later...!? :o

The Follow-up is a bit "less impressive" than the Quality on the OP, I would say... :(

>>>

EDIT:
"Misusing" this Post for some Testing...:

Code: Select all

12345678901234567890123456789012345678901234567-50-234567890123456789012345678901234567890123456-100-234567890123456
=> 116 Monospace Chars... => No H-Scrollbar...!

Code: Select all

12345678901234567890123456789012345678901234567-50-234567890123456789012345678901234567890123456-100-2345678901234567
=> 117 Monospace Chars... => H-Scrollbar triggered...!