Extracting HREF link attribute on sub class tag

OniLinken
Posts: 6
Joined: Sun Nov 06, 2016 5:04 pm

Post by OniLinken » Sun Nov 06, 2016 10:44 pm

Win 10, 64bits, iMacros v11.1.149/Firefox extension.

Hi, first of all, thanks to everybody who replies here.
I'm new to iMacros; I've been trying to get the hang of it for a while, and I have a question.
I'm trying to get the href value of a lot of links on a site. All of them sit inside elements with a class, like this one:


<h2 class="clear nombre">
<a target="_blank" onclick="GATrackEvent("resultados_HOT", "nombre producto", "click ficha", 0, true);" class="product-name" href="/hoteles/853008-0_lodge-andes" title="Lodge Andes undefined"><span class="openFontSemiBold">Lodge Andes</span> </a>
</h2>

I want to extract the href value of all these links and save them to a CSV file. For the example above I should get "/hoteles/853008-0_lodge-andes".
I've spent some time on other link-extraction examples, but I've been having trouble with this one.

I thought this might do the trick, but no luck...
Can anybody help me?
Last edited by OniLinken on Mon Nov 07, 2016 2:36 am, edited 1 time in total.
chivracq
Posts: 9004
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Post by chivracq » Sun Nov 06, 2016 11:45 pm

OniLinken wrote:Hi, first of all, thanks to everybody who reply here.
I'm new to iMacros, i've been trying to get it for a while and I have a question.
I'm trying to get the href text of a lot of links in a site... all of them are in class tags like this one...

Code: Select all

<h2 class="clear nombre">
<a target="_blank" onclick="GATrackEvent("resultados_HOT", "nombre producto", "click ficha", 0, true);" class="product-name" href="/hoteles/853008-0_lodge-andes" title="Lodge Andes undefined"><span class="openFontSemiBold">Lodge Andes</span>  </a>
</h2>
I wanna save all the href text from these sites and save them in a CVS... I should get the "/hoteles/853008-0_lodge-andes"
I've tried some time with other examples, and extracting links but this one i've been having troubles...

I thought this might do the trick, but no luck...
Can anybody help me?
CIM for me to read, read my Sig...

Hum..., and I see a "wanna" somewhere in your Post; you'd better use proper English on the Forum instead of Chicago-like street Language, especially as your Site looks Spanish to me... :roll:
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...

Post by OniLinken » Mon Nov 07, 2016 2:37 am

CIM edit

Post by chivracq » Mon Nov 07, 2016 11:45 am

OniLinken wrote:CIM edit

Code: Select all

Win 10, 64bits, 
iMacros v11.1.149/Firefox extension.

Hum, nearly good, but the FF Version is still missing (=> FF47/49...?), and the 'iMacros for FF' Version is missing as well (=> v9.0.3...? It is buggy btw, you'd better use v8.9.7). But OK, good enough for me to read...

Oh..., and the "wanna" has disappeared, ah-ah...!, good...!

>>>

OK, if you want to use the Class Name on the 'H2' Element as Reference for POS=n, the "Difficulty" in your Case is that the Link you are after (the 'A' Element, with its Text in a 'SPAN') is inside the 'H2' Element itself.

The way to go is to use Relative Positioning (you could have mentioned btw what you tried before coming to the Forum...) with the 'H2' Element as Anchor. But if you use Relative Positioning with "POS=R1" for extracting the Link, you will not "catch" the Link corresponding to your current 'H2' but the one for the next 'H2' as iMacros starts looking "after" the 'H2' Element and cannot see "inside" it.

The "Trick" is then to use "Double Relative Positioning", a Technique I've explained several times already on the Forum. It consists of first getting out of your 'H2' Element, so that iMacros can then "see" inside it again, and you can tag your Link and extract whatever you want from it.
Search the Forum for my Posts using the Key-Words I've put in Italic (especially "Double Relative Positioning") for more Info and Examples, then post your final working Script, or post again if you get stuck, and I'll help you further...
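
For Readers who want a starting Point before searching, here is a minimal, untested Sketch of one way to read the "get out, then look in again" Idea described above. Assumptions: a 'DIV' sits somewhere before the 'H2' in the Page (hypothetical, the right Element to step back to depends on the real Page Layout), and the Link can be matched by the "product-name" Class from the posted HTML. It is not necessarily the exact Script from the Forum Posts referred to above.

```
' Step 1: anchor on the H2 (a plain POS=R1 would start looking after it)
TAG POS=1 TYPE=H2 ATTR=CLASS:"clear nombre"
' Step 2: step back to an Element before the H2 (hypothetical: a preceding DIV)
TAG POS=R-1 TYPE=DIV ATTR=*
' Step 3: look forward again; the first matching Link should now be the one inside the H2
TAG POS=R1 TYPE=A ATTR=CLASS:product-name EXTRACT=HREF
```

If Step 2 lands too far back, the Link caught in Step 3 may belong to a previous hotel, so the extracted Value should be compared against the Page by hand.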

Post by OniLinken » Mon Nov 14, 2016 5:27 am

Hi!
Sorry it took me a while to respond.
I had an idea a few days ago and finally managed to extract the info I needed.
I used this:

Code: Select all

TAG POS=1 TYPE=H2 ATTR=CLASS:"clear nombre"
TAG POS=R1 TYPE=A ATTR=TXT:* EXTRACT=HREF


You helped me a lot. Thanks!
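
Since the goal stated at the top of the thread is a CSV with all the links, the two lines above could be looped roughly like this. This is a minimal sketch, assuming the macro is run with "Play (Loop)" so that {{!LOOP}} counts 1, 2, 3..., and assuming every hotel sits in its own H2 with that class. The file name hotel_links.csv is made up for illustration.

```
' Suppress the extraction popup while looping
SET !EXTRACT_TEST_POPUP NO
' Anchor on the n-th H2, where n is the current loop counter
TAG POS={{!LOOP}} TYPE=H2 ATTR=CLASS:"clear nombre"
' Extract the href of the first link relative to that H2
TAG POS=R1 TYPE=A ATTR=TXT:* EXTRACT=HREF
' Append the extracted value as a new line of the CSV (hypothetical file name)
SAVEAS TYPE=EXTRACT FOLDER=* FILE=hotel_links.csv
```

Set the loop maximum to the number of hotels on the page. EXTRACT=HREF may return the full URL rather than the relative "/hoteles/..." path shown in the HTML, so it is worth checking the first few rows.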

Post by chivracq » Mon Nov 14, 2016 6:02 am

Hum, a bit surprised that your "direct" Relative Positioning works: your Link is inside the 'H2' HTML Definition. Are you sure you are not catching "the next Link"...?
Otherwise OK, maybe it's only for 'DIV's that you need to get out and in... :?

But if it works then OK, ah-ah...! :D
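
One way to check for "the next Link" is to extract the same Link a second time without any Anchor and compare the two Values. A minimal Sketch, assuming the first Link with Class "product-name" on the Page is the one inside the first 'H2' (the Class comes from the posted HTML; that it is only used on these Links is an Assumption):

```
' 1) Direct lookup, no Relative Positioning involved
TAG POS=1 TYPE=A ATTR=CLASS:product-name EXTRACT=HREF
' 2) Anchor on the first H2, then take the first Link relative to it
TAG POS=1 TYPE=H2 ATTR=CLASS:"clear nombre"
TAG POS=R1 TYPE=A ATTR=TXT:* EXTRACT=HREF
```

If both extracted Values are identical, "POS=R1" is picking up the Link inside the 'H2'; if the second Value belongs to the following hotel, it is catching the next one.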

Post by OniLinken » Mon Nov 14, 2016 12:17 pm

It was a surprise for me as well... but after trying it, it actually did what I needed.
Still, searching for your "Double Relative Positioning" posts is what gave me the idea.
So, yeah, thanks. Have a good week.

Post by chivracq » Mon Nov 14, 2016 4:16 pm

OK, then..., good to know, ah-ah...!

But once you understand the "Double Relative Positioning" Mechanism, it's easy to check that you are tagging/extracting the Element you really want, and to tell whether you need it or whether "Simple" Relative Positioning is already sufficient to "do the job"...

I still find it a bit "suspicious" that Relative Positioning works differently for different HTML Types; it could actually be a Bug. I find the Behaviour for <H2> more logical, and it took me a while to understand the Mechanism the first time I tried to R-tag some Element using a 'DIV' as Anchor. I wouldn't be surprised if we then got a "Mix" of both Behaviours with embedded 'DIV's, which would indicate that it is indeed a Bug, as I doubt the Developer really included a Check that counts all embedded 'DIV's and all Levels to make sure to get out of the Outer 'DIV' used as Anchor, ah-ah...!