using css selector for imacros with conditions

faizzzsheikh
Posts: 4
Joined: Wed Dec 02, 2020 1:15 pm

using css selector for imacros with conditions

Post by faizzzsheikh » Wed Dec 02, 2020 1:58 pm

iMacros: v12
Browser: Internet Explorer 11
OS: Windows 10

Hi, I want to know whether conditional CSS selectors work with iMacros. I want to run a web scraper. The selector below works with the Web Scrape Chrome extension, but it doesn't work with iMacros.

Selector:

Code: Select all

div.s-expand-height:has(span.a-price.a-text-price), .celwidget div.s-item-container:has(span.a-price.a-text-price), div.s-include-content-margin:has(span.a-price.a-text-price)
I tried this with iMacros in the formats below, but neither works.

Format 1

Code: Select all

TAG POS={{!LOOP}} TYPE=DIV ATTR=CLASS:"s-expand-height:has(span a-price.a-text-price), celwidget s-item-container:has(span.a-price.a-text-price), s-include-content-margin:has(span.a-price.a-text-price)" EXTRACT=TXT
Format 2

Code: Select all

TAG POS={{!LOOP}} TYPE=DIV ATTR=CLASS:"div.s-expand-height:has(span.a-price.a-text-price), .celwidget div.s-item-container:has(span.a-price.a-text-price), div.s-include-content-margin:has(span.a-price.a-text-price)" EXTRACT=TXT
My complete iMacros script looks like this.

Code: Select all

SET !DATASOURCE E:\imacros\urllist1.csv
SET !LOOP 2
SET !DATASOURCE_LINE {{!LOOP}}

URL GOTO={{!COL1}}
WAIT SECONDS={{!COL2}}

TAG POS={{!LOOP}} TYPE=DIV ATTR=CLASS:"s-expand-height:has(span.a-price.a-text-price), .celwidget s-item-container:has(span.a-price.a-text-price), s-include-content-margin:has(span.a-price.a-text-price)" EXTRACT=TXT
ADD !EXTRACT {{!URLCURRENT}}

'TAG POS={{!LOOP}} TYPE=DIV ATTR=CLASS:"s-expand-height s-include-content-margin s-border-bottom s-latency-cf-section" EXTRACT=TXT

SAVEAS TYPE=EXTRACT FOLDER=E:\imacros FILE=data.csv
Last edited by faizzzsheikh on Wed Dec 02, 2020 5:46 pm, edited 1 time in total.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: using css selector for imacros with conditions

Post by chivracq » Wed Dec 02, 2020 5:08 pm

faizzzsheikh wrote:
Wed Dec 02, 2020 1:58 pm

iMaros: v12
Browser: Internet Explorer 11
OS: Windows 10
[...]

Alright, good-good-good...!, so you ARE interested after all to open a parallel Thread on "our" Forum, like I had "suggested" in your (original) Thread on SOF, ah-ah...! :D :wink:
And yep, like I mentioned on SOF, I don't or very rarely answer Threads on that Forum because most Users for the 'iMacros' Tag usually delete their Qt/Thread once they've got their Answer, to avoid sharing with the "Competition", while I help Users for the "whole Community", so "here" you won't be able to delete your Thread, ah-ah...! (Reason I quote a bit "systematically"...)

Alright, FCI mentioned, hum..., correct Spelling is "iMacros" not the ugly malformed Word you "creatively" made for it...! :shock: , and hum, iMB v12 is a bit vague, there are 3 iMB v12.x Versions, + x2 with 'Full'/'Trial' = 6 Combinations..., even if the 2 "oldest" v12.x Versions would sound a bit "strange" in 'Trial' Mode, but we still have 4 Possibilities..., => can you mention the exact Version...?

Then OK, I "see" that the 'TAG SELECTOR' Mode I have already mentioned on SOF was implemented already in iMB v11.0, I "thought" it was more recent, ah-ah...! It is also implemented in v10.x for CR like I had mentioned, as you didn't mention your FCI on SOF and I "thought" I had understood you were on CR...
I've never used and "played" with that Mode actually, as I use for myself "older" Versions on FF (v8.8.2 + v8.9.7) where that Mode was not implemented yet...

Then well..., "next Step" would be if you can post at least one URL from your DataSource (or 2 or 3 would be even "better"), where you want to run your Script... Tja...!, I need "a bit logically" I would think, to be able to have a Look at the HTML Structure of the Page/Site if I want to "understand" a bit what you are trying to do, and to do any Testing myself, ah-ah...! :idea:

And hum..., could you also "translate" in "clear Wording" your "Requirements" about what all your "CLASS:"s-expand-height:has(span.a-price.a-text-price), .celwidget s-item-container:has(span.a-price.a-text-price), s-include-content-margin:has(span.a-price.a-text-price)"" etc are supposed to do...? Or is it "really" the 'CLASS' Attribute of that 'DIV' on the Page...? But hum..., looks a bit "strange" for a 'CLASS' ATTR to me...!? :?

+ Mention also "clearly", applied to at least the first URL you will mention, which HTML Elements you expect/want to tag and extract and what the expected Result should be...

(And today is my BDay, so I'll be "a bit busy", ah-ah...! :P )
Last edited by chivracq on Wed Dec 02, 2020 6:03 pm, edited 1 time in total.
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
faizzzsheikh
Posts: 4
Joined: Wed Dec 02, 2020 1:15 pm

Re: using css selector for imacros with conditions

Post by faizzzsheikh » Wed Dec 02, 2020 6:00 pm

Thanks for your response.. wish you a very happy birthday!!

My iMacros version is v12.5

The source is Amazon's category pages. They have 3 different layouts for category pages.

4 columns: https://www.amazon.com/s?k=Camping+Cool ... _1&page=26

1 column: https://www.amazon.com/s?k=audio+headph ... s-a-p_1_10

4 columns but different css selectors: https://www.amazon.com/s/ref=lp_289814_ ... lo=kitchen

The CSS selector I want to use is "div.s-expand-height:has(span.a-price.a-text-price), .celwidget div.s-item-container:has(span.a-price.a-text-price), div.s-include-content-margin:has(span.a-price.a-text-price)"

In the above selector I am using the comma as an OR condition: given the list of URLs, whichever layout a page uses, the two selectors that don't apply are simply skipped.
The "has" operator restricts the match to products that have a strike-through price, as in the example in the image below.
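What this selector is meant to express can be sketched in plain JavaScript. This is only an illustration over a hand-built, hypothetical element tree (not real Amazon markup and not iMacros syntax); the names `node`, `matches`, `hasDescendant`, `walk`, `selectDiscounted` and `page` are made up for the sketch:

```javascript
// Hypothetical mini "DOM": each node has a tag, a class list and children.
function node(tag, classes, children = []) {
  return { tag, classes, children };
}

// True if the node has the given tag and all of the required classes.
function matches(n, tag, classes) {
  return n.tag === tag && classes.every((c) => n.classes.includes(c));
}

// Emulates :has(...): true if any descendant matches tag + classes.
function hasDescendant(n, tag, classes) {
  return n.children.some(
    (child) => matches(child, tag, classes) || hasDescendant(child, tag, classes)
  );
}

// Collect every node in document order.
function walk(n, out = []) {
  out.push(n);
  n.children.forEach((child) => walk(child, out));
  return out;
}

// The comma in the selector is an OR over three container classes.
// Simplification: the ".celwidget" ancestor required by the second
// alternative is ignored here.
const CONTAINER_CLASSES = [
  "s-expand-height",
  "s-item-container",
  "s-include-content-margin",
];

function selectDiscounted(root) {
  return walk(root).filter(
    (n) =>
      n.tag === "div" &&
      CONTAINER_CLASSES.some((c) => n.classes.includes(c)) &&
      hasDescendant(n, "span", ["a-price", "a-text-price"])
  );
}

// Three product containers; only the last two hold a strike-through price.
const page = node("div", ["page"], [
  node("div", ["s-expand-height"], [node("span", ["a-price"])]),
  node("div", ["s-expand-height"], [
    node("div", ["a-row"], [node("span", ["a-price", "a-text-price"])]),
  ]),
  node("div", ["s-include-content-margin"], [
    node("span", ["a-price", "a-text-price"]),
  ]),
]);

console.log(selectDiscounted(page).length); // 2
```

The point of the sketch is that the `:has()` condition is the same for all three alternatives; only the container class changes.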

Image: https://i.imgur.com/oRz1LWX_d.webp?maxwidth=760&fidelity=grand
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: using css selector for imacros with conditions

Post by chivracq » Thu Dec 03, 2020 10:01 pm

OK for iMB v12.5.

Your Screenshot is very useful, but could you (also) upload it "directly" to the Forum...?, like explained in the Forum Rules... :!:

Alright, perfect with the 3 URL's, even if the 3 Pages "apparently" get displayed a bit differently in my Browser (Pale Moon v26.3.3), I never get the 4 Cols, I only get 1 Col / 2 Cols / 1 + 2 Cols.

But I was able to do some Testing on the Page corresponding to/containing your Screenshot, and you didn't "really" explain your Requirements nor what exactly you want to extract for each Article, but if for example you want to extract both Prices + the Article Description, and only for Articles with a Discount Price, then that Functionality can be achieved using the following Script for example, and mostly using 'Relative Positioning' that I find much more "straightforward" than your "Conditional CSS Selector(s)", ah-ah...!:

Code: Select all

VERSION BUILD=8820413 RECORDER=FX
SET !EXTRACT_TEST_POPUP NO
SET !TIMEOUT_STEP 2

'SET !LOOP 1

TAB T=1
'URL GOTO=https://www.amazon.com/s?rh=n%3A1055398%2Cn%3A%211063498%2Cn%3A284507%2Cn%3A289814&lo=image&qid=1606931446&ref=lp_289814_il_ti_kitchen
'TAG POS=2 TYPE=SPAN ATTR=TXT:$14.95
'TAG POS=2 TYPE=SPAN ATTR=TXT:$14.95 EXTRACT=HTM
'=> Extracted: "<span style="outline: 1px solid blue;" aria-hidden="true">$14.95</span>"

'BACK
'TAG POS=2 TYPE=SPAN ATTR=TXT:11.
'TAG POS=2 TYPE=SPAN ATTR=TXT:11. EXTRACT=HTM
'=> Extracted: "<span style="outline: 1px solid blue;" class="a-price-whole">11<span class="a-price-decimal">.</span></span>"

'BACK
'TAG POS=4 TYPE=DIV ATTR=TXT:$11.99$11.99$14.95$14.95
'TAG POS=4 TYPE=DIV ATTR=TXT:$11.99$11.99$14.95$14.95 EXTRACT=HTM
'=> Extracted:
'<div style="outline: 1px solid blue;" class="a-row">
'  <span class="a-price" data-a-size="base_plus" data-a-color="base">
'    <span class="a-offscreen">$11.99</span>
'    <span aria-hidden="true">
'      <span class="a-price-symbol">$</span>
'      <span style="outline: 1px solid blue;" class="a-price-whole">11<span class="a-price-decimal">.</span></span>
'      <span class="a-price-fraction">99</span>
'    </span>
'  </span>
'  <span class="a-letter-space"></span>
'  <span class="a-price a-text-price" data-a-size="mini" data-a-strike="true" data-a-color="secondary">
'    <span class="a-offscreen">$14.95</span>
'    <span style="outline: 1px solid blue;" aria-hidden="true">$14.95</span>
'  </span>
'</div>

'Extract 'Original Price' (Striked through):
'TAG POS=3 TYPE=SPAN ATTR=CLASS:"a-price a-text-price"&&DATA-A-STRIKE:"true"&&TXT:$*$* EXTRACT=HTM
SET !EXTRACT NULL
'TAG POS=3 TYPE=SPAN ATTR=CLASS:"a-price a-text-price"&&DATA-A-STRIKE:"true"&&TXT:$*$* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:"a-price a-text-price"&&DATA-A-STRIKE:"true"&&TXT:$* EXTRACT=TXT
'SET Orig_Price {{!EXTRACT}}
SET Orig_Price EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('$'); y=x[1]; z='$'+y; z;")

'Extract 'Discount Price':
SET !EXTRACT NULL
TAG POS=R-1 TYPE=SPAN ATTR=CLASS:"a-price"&&TXT:$*$* EXTRACT=TXT
'SET Discount_Price {{!EXTRACT}}
SET Discount_Price EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('$'); y=x[1]; z='$'+y; z;")

'Extract 'Description':
'TAG POS=1 TYPE=H2 ATTR=TXT:Genuine<SP>Instant<SP>Pot<SP>Tempered<SP>Glass<SP>Lid,<SP>9*
'TAG POS=1 TYPE=H2 ATTR=TXT:Genuine<SP>Instant<SP>Pot<SP>Tempered<SP>Glass<SP>Lid,<SP>9* EXTRACT=HTM
'=> Extracted:
'<h2 style="outline: 1px solid blue;" class="a-size-mini a-spacing-none a-color-base s-line-clamp-2">
'  <span class="a-size-small a-color-base a-text-normal" dir="auto">Genuine Instant Pot Tempered Glass Lid, 9 in. (23 cm), 6 Quart, Clear</span>
'</h2>
'>
SET !EXTRACT NULL
TAG POS=R-1 TYPE=H2 ATTR=TXT:* EXTRACT=TXT
'SET Descr {{!EXTRACT}}
SET Descr EVAL("var s='{{!EXTRACT}}'; var z=s.trim(); z;")

'PROMPT Descr:<SP>_{{Descr}}_<BR>Discount_Price:<SP>_{{Discount_Price}}_<BR>Original_Price:<SP>_{{Orig_Price}}_
PROMPT LOOP:<SP>_{{!LOOP}}_<BR><BR>Descr:<SP>_{{Descr}}_<BR>Discount_Price:<SP>_{{Discount_Price}}_<BR>Original_Price:<SP>_{{Orig_Price}}_

'PAUSE
' POS=1 TYPE=SPAN ATTR=TXT:$14.95
'TAG POS=2 TYPE=SPAN ATTR=TXT:11
'TAG POS=2 TYPE=DIV ATTR=TXT:$1199$14.95#listPriceLegalMessageText<SP>{margin-left:<SP>4p*
This is the "full" Script I used, with all Debugging for you to better understand what I did... :wink:
If you loop it 20 times for example on the Page with your "Genuine Instant Pot etc..." Article, it will extract (only) the 16 or 17 Articles with a Discounted Price... 8)

And here is a "shortened" Version of the Script, a bit cleaner, without all Debug Info...:

Code: Select all

VERSION BUILD=8820413 RECORDER=FX
SET !EXTRACT_TEST_POPUP NO
SET !TIMEOUT_STEP 2

'SET !LOOP 1

TAB T=1
'URL GOTO=https://www.amazon.com/s?rh=n%3A1055398%2Cn%3A%211063498%2Cn%3A284507%2Cn%3A289814&lo=image&qid=1606931446&ref=lp_289814_il_ti_kitchen

'Extract 'Original Price' (Striked through):
'TAG POS=2 TYPE=SPAN ATTR=TXT:$14.95
'TAG POS=3 TYPE=SPAN ATTR=CLASS:"a-price a-text-price"&&DATA-A-STRIKE:"true"&&TXT:$*$* EXTRACT=HTM
SET !EXTRACT NULL
'TAG POS=3 TYPE=SPAN ATTR=CLASS:"a-price a-text-price"&&DATA-A-STRIKE:"true"&&TXT:$*$* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:"a-price a-text-price"&&DATA-A-STRIKE:"true"&&TXT:$*$* EXTRACT=TXT
SET Orig_Price EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('$'); y=x[1]; z='$'+y; z;")

'Extract 'Discount Price':
SET !EXTRACT NULL
TAG POS=R-1 TYPE=SPAN ATTR=CLASS:"a-price"&&TXT:$*$* EXTRACT=TXT
SET Discount_Price EVAL("var s='{{!EXTRACT}}'; var x,y,z; x=s.split('$'); y=x[1]; z='$'+y; z;")

'Extract 'Description':
'TAG POS=1 TYPE=H2 ATTR=TXT:Genuine<SP>Instant<SP>Pot<SP>Tempered<SP>Glass<SP>Lid,<SP>9* EXTRACT=HTM
SET !EXTRACT NULL
TAG POS=R-1 TYPE=H2 ATTR=TXT:* EXTRACT=TXT
SET Descr EVAL("var s='{{!EXTRACT}}'; var z=s.trim(); z;")

PROMPT LOOP:<SP>_{{!LOOP}}_<BR><BR>Descr:<SP>_{{Descr}}_<BR>Discount_Price:<SP>_{{Discount_Price}}_<BR>Original_Price:<SP>_{{Orig_Price}}_
(Tested on iMacros for FF v8.8.2, PM v26.3.3, Win10_x64.)

And the same Script works on all 3 Pages... 8)
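For readers less familiar with the EVAL() parts of the Script above, here is the same clean-up logic written as standalone JavaScript. It is only a sketch: the function names `firstPrice` and `cleanDescription` are made up for illustration, and the sample strings assume the extracted text looks like the "$11.99$11.99$14.95$14.95" pattern shown in the debug comments.

```javascript
// Same idea as the Orig_Price / Discount_Price EVAL(): the extracted text
// repeats the amount (e.g. "$14.95$14.95"), so keep only the first one.
function firstPrice(extracted) {
  // "$14.95$14.95".split("$") gives ["", "14.95", "14.95"]
  const parts = extracted.split("$");
  return "$" + parts[1];
}

// Same idea as the Descr EVAL(): strip leading/trailing whitespace.
function cleanDescription(extracted) {
  return extracted.trim();
}

console.log(firstPrice("$14.95$14.95")); // $14.95
console.log(cleanDescription("  Genuine Instant Pot Tempered Glass Lid  "));
```

Note that if the extracted text contains no "$" at all (for example when the element was not found), `parts[1]` is undefined, so a real macro may want an extra check before using the result.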
faizzzsheikh
Posts: 4
Joined: Wed Dec 02, 2020 1:15 pm

Re: using css selector for imacros with conditions

Post by faizzzsheikh » Sat Dec 05, 2020 2:19 pm

This is too advanced for me. I am still unsure if or how I can use CSS selectors with conditions like the ones shown in my first post. I want to use conditions such as multiple classes and the HAS condition on a class.

Thanks for your support.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: using css selector for imacros with conditions

Post by chivracq » Sat Dec 05, 2020 4:28 pm

"Too advanced"...!? Hum..., it looks much simpler to me... :P

Using 'Relative Positioning', your:

Code: Select all

div.s-expand-height:has(span.a-price.a-text-price), .celwidget div.s-item-container:has(span.a-price.a-text-price), div.s-include-content-margin:has(span.a-price.a-text-price)
... can be "translated"/converted to simply:

Code: Select all

TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:"a-price a-text-price"
TAG POS=R-1 TYPE=DIV ATTR=*
'R-POS' then handles the Conditional Logic that you want, and you only need to specify that Condition once instead of 3 times in "your" Syntax...

... Well, I "think"... I didn't test again...
And in my "original" Script, I had included the '[data-a-strike="true"]' Attribute for the 'SPAN' Element as 'Anchor' which I "thought" was the main Condition for your Logic, but maybe it's not needed indeed... :|

>>>

But hum..., I will "ping" @TechSup if they want to have a Look at your Thread, maybe they will be able to give you some Answer using "exactly" the Functionality/Implementation that you want..., I've never used the 'TAG SELECTOR' Mode myself..., nor the 'TAG XPATH' Mode either...
faizzzsheikh
Posts: 4
Joined: Wed Dec 02, 2020 1:15 pm

Re: using css selector for imacros with conditions

Post by faizzzsheikh » Sat Dec 05, 2020 5:58 pm

Actually I want to save the TEXT of the entire individual product container in each cell, but only for those products that have a strike-through price (I don't just want to extract the pricing data). The class selectors I mentioned are for the product container, i.e. the main product div such as "s-expand-height". "a-price a-text-price" only extracts the pricing data, which is of no use on its own. Once the data is saved I will delimit it in Excel.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: using css selector for imacros with conditions

Post by chivracq » Sat Dec 05, 2020 8:40 pm

Yeah, well..., the Script I had posted was meant as an Example on how to locate/extract 3 precise Fields based on some Condition, by simply using 'TAG POS=Rn', + how to get the extracted Data cleaned from any Formatting/extra Data and/or Blanks directly from iMacros before saving it to your '.CSV'...

Hopefully that Script Example and the Techniques I used will "one day" be useful to some other User(s)... 8)
Tom, Tech Support
Posts: 3834
Joined: Mon May 31, 2010 4:59 pm

Re: using css selector for imacros with conditions

Post by Tom, Tech Support » Tue Dec 08, 2020 2:31 pm

Hi faizzzsheikh,

Here are some comments and observations:
  1. As already informed by chivracq, you can only use a CSS selector in iMacros with a TAG SELECTOR or EVENT[S] ... SELECTOR command. Attempting to specify a CSS selector as part of a CLASS attribute in a TAG command is not valid syntax.
  2. The CSS selector has to resolve to one and only one element (the same applies to using XPATH with TAG or EVENT[S]). This is because the version of the TAG command that supports SELECTOR and XPATH does not support the POS parameter for further qualifying the element to select, so it has to select a specific element. The same applies to the EVENT[S] command, which inherently has no POS parameter anyway.
  3. I am curious about your use of the :has pseudo-class in your selector. According to MDN, :has is not yet supported in any browser, so I don't understand how or why you are using it.
So, you need to take a different approach to extracting the desired information rather than using CSS selectors, as also suggested by chivracq. Here is a quick example to achieve what I think you want:

Code: Select all

TAG POS={{!LOOP}} TYPE=SPAN ATTR=CLASS:*a-text-price&&DATA-A-STRIKE:true EXTRACT=TXT
SET !EXTRACT NULL
TAG POS=R-1 TYPE=SPAN ATTR=CLASS:celwidget*widgetId=search-results EXTRACT=TXT

The first TAG command is 'selecting' a strike-through price. We don't want to actually click on the element because this will cause a navigation to the product page, so we extract the value instead and then immediately throw away the extracted value by setting !EXTRACT to NULL.

Next, we use reverse relative positioning (POS=R-1) to select the first SPAN element that appears prior to the strike-through price and has a class attribute beginning with celwidget and ending with widgetId=search-results, and we extract this element, which contains the complete product listing. This is consistent for two of the pages for which you provided links (the first and second link).
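The anchor-then-look-backwards idea described above can be modelled in a few lines of JavaScript. This is a rough sketch only: it treats the page as a flat list of elements in document order, the sample data is invented, and the helper names `findNth`, `findPrevious` and `discountedListing` are hypothetical. It does not claim to reproduce how iMacros resolves relative positions internally.

```javascript
// Invented sample: elements flattened into document order.
const elements = [
  { tag: "SPAN", cls: "celwidget listing", text: "Cooler A $20.00" },
  { tag: "SPAN", cls: "a-price", text: "$20.00" },
  { tag: "SPAN", cls: "celwidget listing", text: "Cooler B $11.99 $14.95" },
  { tag: "SPAN", cls: "a-price", text: "$11.99" },
  { tag: "SPAN", cls: "a-price a-text-price", text: "$14.95" },
];

// Like POS=n: index of the n-th element (1-based) satisfying the predicate.
function findNth(list, n, pred) {
  let seen = 0;
  for (let i = 0; i < list.length; i++) {
    if (pred(list[i]) && ++seen === n) return i;
  }
  return -1;
}

// Like POS=R-1: nearest element before the anchor satisfying the predicate.
function findPrevious(list, anchorIndex, pred) {
  for (let i = anchorIndex - 1; i >= 0; i--) {
    if (pred(list[i])) return i;
  }
  return -1;
}

// Anchor on the n-th strike-through price, then step back to its container.
function discountedListing(list, n) {
  const anchor = findNth(list, n, (e) => e.cls.includes("a-text-price"));
  if (anchor === -1) return null; // no n-th strike-through price on the page
  const container = findPrevious(list, anchor, (e) => e.cls.startsWith("celwidget"));
  return container === -1 ? null : list[container].text;
}

console.log(discountedListing(elements, 1)); // Cooler B $11.99 $14.95
console.log(discountedListing(elements, 2)); // null
```

Listings without a strike-through price ("Cooler A" here) are never reached, because the anchor only ever lands on a strike-through price.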

For the item listings on the third linked page containing results for kitchenware, the strike-through price contains different class values, and I don't think your original CSS selector is currently taking this into account (perhaps the site HTML has changed since your original post). Plus, the entire product listing is contained within a list item (LI) element on this page. So you'll need different macro code for extracting the items on this type of page, which I provide in the expanded example below.

There is yet one more piece to this puzzle. Notice that I am referencing the !LOOP variable in the first TAG command, so that this macro can be repeated/looped in order to extract all of the items on the page that have a strike-through price.

However, you also want to loop through rows in your input file, so what you really have here is a nested loop situation (a loop within a loop). If you also want to iterate over multiple pages of product listings, then you are talking about three separate loops.

iMacros can only inherently handle a single loop at a time. You either have to eliminate one of the loops, or you need to make use of the iMacros scripting interface and write some script code for managing both loops. The scripting interface is only available with the Enterprise Edition of iMacros.
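The control flow of such a two-level loop can be sketched in JavaScript as follows. This is a simulation, not a call into the iMacros scripting interface: `playSearch` and `playExtract` are hypothetical stubs that merely return the kind of codes a macro run would return, `pages` is invented data standing in for the input file, rows are 0-based for simplicity, and the numeric constants are the ones used in the VBScript in this post.

```javascript
const ERR_SUCCESS = 1;
const ERR_ENDOFFILE = -1240;
const ERR_EVALERROR = -1340;

// Invented stand-in for the input file: one entry per URL, with the
// number of discounted items assumed to be on that page.
const pages = [
  { url: "page-1", discounted: 2 },
  { url: "page-2", discounted: 0 },
  { url: "page-3", discounted: 1 },
];

// Stub for the "search" macro: fails with end-of-file past the last row.
function playSearch(row) {
  return row < pages.length ? ERR_SUCCESS : ERR_ENDOFFILE;
}

// Stub for the "extract" macro: raises an EVAL error once there is no
// pos-th discounted item left on the page.
function playExtract(row, pos) {
  return pos <= pages[row].discounted ? ERR_SUCCESS : ERR_EVALERROR;
}

function run() {
  const extracted = [];
  // Outer loop: rows of the input file, until end of file.
  for (let row = 0; ; row++) {
    const ret = playSearch(row);
    if (ret === ERR_ENDOFFILE) break;
    if (ret !== ERR_SUCCESS) throw new Error("search failed: " + ret);

    // Inner loop: discounted items on the page, until "not found".
    for (let pos = 1; ; pos++) {
      const r = playExtract(row, pos);
      if (r === ERR_EVALERROR) break; // assume all items were extracted
      if (r !== ERR_SUCCESS) throw new Error("extract failed: " + r);
      extracted.push(pages[row].url + "#" + pos);
    }
  }
  return extracted;
}

console.log(run()); // [ 'page-1#1', 'page-1#2', 'page-3#1' ]
```

Each loop ends on its own "expected" error code, and any other code is treated as a real failure.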

The expanded, scripted example follows. This example only handles two loops: looping over the input file, and looping over the items on the first page of results. I've also added an extra column to the input file such that the third column now provides the name of the macro to run to perform the extraction for that specific URL.

ExtractListings.vbs

Code: Select all

Option Explicit

' Define some error constants
Const ERR_SUCCESS			= 1
Const ERR_ENDOFFILE			= -1240
Const ERR_EVALERROR		 	= -1340

Dim im, ret
Set im = CreateObject("iMacros")

' Launch the iMacros browser (or attach to existing instance)
CheckErr(im.iimOpen(, False))

' Setup the main loop to iterate rows in the input file
Dim row : row = 2	' Initialize to 2 to skip header row in file
Dim moreRows : moreRows = True

Do While moreRows

	Call im.iimSet("row", row)
	ret = im.iimPlay("Do Search")

	If ret = ERR_SUCCESS Then

		' Get the name of the extraction macro to call
		Dim extractMacro : extractMacro = im.iimGetExtract(1)

		' Loop through desired items on the page
		Dim pos : pos = 1
		Dim moreItems : moreItems = True

		Do While moreItems

			Call im.iimSet("pos", pos)
			ret = im.iimPlay(extractMacro)

			If ret = ERR_SUCCESS Then
				pos = pos + 1
			ElseIf ret = ERR_EVALERROR Then
				' Assume we've extracted all desired items on the page
				moreItems = False
			Else
				' Some other unexpected error occurred
				CheckErr(ret)
			End If

		Loop

		row = row + 1

	ElseIf ret = ERR_ENDOFFILE Then
		moreRows = False
	Else
		CheckErr(ret)
	End If

Loop

' Generic error routine that checks the specified return code and
' if it is an iMacros error then it displays an error message and
' aborts the script.
Sub CheckErr(retCode)

	If retCode < 0 Then
		MsgBox im.iimGetLastError(), vbCritical, "Macro Error: " & retCode
		WScript.Quit()
	End If
	
End Sub
Do Search.iim

Code: Select all

SET !DATASOURCE E:\imacros\urllist1.csv
SET !DATASOURCE_LINE {{row}}

URL GOTO={{!COL1}}
WAIT SECONDS={{!COL2}}

' Add col3 to the extraction buffer so that the calling script can
' retrieve the name of the macro to call for this specific URL.
SET !EXTRACT {{!COL3}}
Extract Type 1.iim

Code: Select all

SET !TIMEOUT_STEP 0

TAG POS={{pos}} TYPE=SPAN ATTR=CLASS:*a-text-price&&DATA-A-STRIKE:true EXTRACT=TXT
' Raise an error if the element was not found
SET failIfNotFound EVAL("if (\"{{!EXTRACT}}\" == '#EANF#') MacroError('Element not found');")
SET !EXTRACT NULL
TAG POS=R-1 TYPE=SPAN ATTR=CLASS:celwidget*widgetId=search-results EXTRACT=TXT

ADD !EXTRACT {{!URLCURRENT}}

SAVEAS TYPE=EXTRACT FOLDER=E:\imacros FILE=data.csv
Extract Type 2.iim

Code: Select all

SET !TIMEOUT_STEP 0

TAG POS={{pos}} TYPE=SPAN ATTR=CLASS:*a-text-strike EXTRACT=TXT
' Raise an error if the element was not found
SET failIfNotFound EVAL("if (\"{{!EXTRACT}}\" == '#EANF#') MacroError('Element not found');")
SET !EXTRACT NULL
TAG POS=R-1 TYPE=LI ATTR=CLASS:s-result-item* EXTRACT=TXT

ADD !EXTRACT {{!URLCURRENT}}

SAVEAS TYPE=EXTRACT FOLDER=E:\imacros FILE=data.csv
urllist1.csv

Code: Select all

url,delay,extraction macro
https://www.amazon.com/s?k=Camping+Coolers&i=outdoor-recreation&rh=n%3A14335431&_encoding=UTF8&c=ts&qid=1606405322&ts_id=14335431&ref=sr_pg_1&page=26,2,Extract Type 1
https://www.amazon.com/s?k=audio+headphones&crid=1TTTKHX38TIX3&sprefix=audio+head%2Caps%2C350&ref=nb_sb_ss_ts-a-p_1_10,2,Extract Type 1
https://www.amazon.com/s/ref=lp_289814_il_ti_kitchen?rh=n%3A1055398%2Cn%3A%211063498%2Cn%3A284507%2Cn%3A289814&ie=UTF8&qid=1606931446&lo=kitchen,2,Extract Type 2
Regards,

Tom, iMacros Support