Extracting HREF link attribute on sub class tag


by OniLinken on Sun Nov 06, 2016 3:44 pm

Win 10, 64bits, iMacros v11.1.149/Firefox extension.

Hi, first of all, thanks to everybody who replies here.
I'm new to iMacros; I've been trying to get the hang of it for a while, and I have a question.
I'm trying to get the href value of a lot of links on a site. All of them are inside class-tagged elements like this one:


<h2 class="clear nombre">
<a target="_blank" onclick="GATrackEvent(&quot;resultados_HOT&quot;, &quot;nombre producto&quot;, &quot;click ficha&quot;, 0, true);" class="product-name" href="/hoteles/853008-0_lodge-andes" title="Lodge Andes undefined"><span class="openFontSemiBold">Lodge Andes</span> </a>
</h2>

I want to extract the href values of all these links and save them in a CSV file. For the example above, I should get "/hoteles/853008-0_lodge-andes".
I've spent some time on other link-extraction examples, but I've been having trouble with this one...

I thought this might do the trick, but no luck...
Can anybody help me?
Last edited by OniLinken on Sun Nov 06, 2016 7:36 pm, edited 1 time in total.

by chivracq on Sun Nov 06, 2016 4:45 pm

OniLinken wrote:Hi, first of all, thanks to everybody who replies here.
I'm new to iMacros; I've been trying to get the hang of it for a while, and I have a question.
I'm trying to get the href value of a lot of links on a site. All of them are inside class-tagged elements like this one:
Code:
<h2 class="clear nombre">
<a target="_blank" onclick="GATrackEvent(&quot;resultados_HOT&quot;, &quot;nombre producto&quot;, &quot;click ficha&quot;, 0, true);" class="product-name" href="/hoteles/853008-0_lodge-andes" title="Lodge Andes undefined"><span class="openFontSemiBold">Lodge Andes</span>  </a>
</h2>


I wanna extract the href values of all these links and save them in a CSV file. For the example above, I should get "/hoteles/853008-0_lodge-andes".
I've spent some time on other link-extraction examples, but I've been having trouble with this one...

I thought this might do the trick, but no luck...
Can anybody help me?

CIM for me to read, read my Sig...

Hum..., and I see a "wanna" somewhere in your Post; you'd better use proper English on the Forum instead of Chicago-like street Language, especially as your Site looks Spanish to me...
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...

by OniLinken on Sun Nov 06, 2016 7:37 pm

CIM edit

by chivracq on Mon Nov 07, 2016 4:45 am

OniLinken wrote:CIM edit

Code:
Win 10, 64bits,
iMacros v11.1.149/Firefox extension.

Hum, nearly good, but the FF Version is still missing (=> FF47/49...?), and the iMacros for FF Version is missing as well (=> v9.0.3...? It's buggy btw, you'd better use v8.9.7.). But OK, good enough for me to read...

Oh..., and the "wanna" has disappeared, ah-ah...!, good...!

>>>

OK, if you want to use the Class Name on the 'H2' Element as Reference for POS=n, the "Difficulty" in your Case is that the Link you are after (the 'A' Element, with its Text in a 'SPAN') is inside the 'H2' Element itself.
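
If the 'H2' class is not needed as the reference, one alternative is to tag the link directly through its own class. This is a minimal sketch, assuming that `product-name` (taken from the HTML in the first post) appears only on the result links and that the results page is already loaded:

```
' Sketch only, not tested against the actual site.
' CLASS:product-name comes from the HTML posted at the top of the thread.
TAG POS=1 TYPE=A ATTR=CLASS:product-name EXTRACT=HREF
```

Raising POS (2, 3, ...) would step through the other results. Note that EXTRACT=HREF may return the full resolved URL rather than the relative "/hoteles/..." path written in the page source.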

The way to go is to use Relative Positioning (you could have mentioned btw what you tried before coming to the Forum...) with the 'H2' Element as Anchor. But if you use Relative Positioning with "POS=R1" for extracting the Link, you will not "catch" the Link corresponding to your current 'H2' but the one for the next 'H2' as iMacros starts looking "after" the 'H2' Element and cannot see "inside" it.

The "Trick" is then to use "Double Relative Positioning", a Technique I've explained several times already on the Forum, which consists of first getting out of your 'H2' Element so that iMacros can then "see" inside it again, tag your Link, and extract whatever you want from it.
Search the Forum for my Posts using the Key-Words I've put in Italic (especially "Double Relative Positioning") for more Info and Examples, then post your final working Script, or post again if you get stuck, and I'll help you further...

by OniLinken on Sun Nov 13, 2016 10:27 pm

Hi!
Sorry it took me a while to respond.
I had an idea a few days ago and finally managed to extract the info I needed.
I used this:

Code:
TAG POS=1 TYPE=H2 ATTR=CLASS:"clear nombre"
TAG POS=R1 TYPE=A ATTR=TXT:* EXTRACT=HREF


You helped me a lot. Thanks!
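
The two-line script above extracts a single result. To collect every link into a CSV file, as asked in the first post, a looped variant might look like the sketch below. It assumes the macro is run with the Loop button so that the built-in `{{!LOOP}}` counter advances on each pass; the file name `hoteles_links.csv` is a placeholder, not something from the thread.

```
' Sketch only: play with "Play (Loop)", Max set to the number of results.
' {{!LOOP}} is the built-in loop counter; hoteles_links.csv is a made-up name.
TAG POS={{!LOOP}} TYPE=H2 ATTR=CLASS:"clear nombre"
TAG POS=R1 TYPE=A ATTR=TXT:* EXTRACT=HREF
' FOLDER=* means the default iMacros folder; each pass should add one row.
SAVEAS TYPE=EXTRACT FOLDER=* FILE=hoteles_links.csv
```

The quoted class value is copied as posted. If it does not match in another iMacros version, writing the space as `<SP>` (`ATTR=CLASS:clear<SP>nombre`) is the usual alternative.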

by chivracq on Sun Nov 13, 2016 11:02 pm

Hum, I'm a bit surprised that your "direct" Relative Positioning works, as your Link is inside the 'H2' HTML Definition. Are you sure you are not catching "the next Link"...?
Otherwise OK, maybe it's only for 'DIV's that you need to get out and in...

But if it works then OK, ah-ah...!
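
One way to check whether the relative tag lands on the link of the current 'H2' or on the next one is to extract the visible text of both and compare them. This is a rough sketch reusing the selectors from the script in the previous post; nothing in it has been run against the actual site.

```
' Sketch only: extract the heading text first...
TAG POS=1 TYPE=H2 ATTR=CLASS:"clear nombre" EXTRACT=TXT
' ...then set the anchor again and extract the text of the relative link.
TAG POS=1 TYPE=H2 ATTR=CLASS:"clear nombre"
TAG POS=R1 TYPE=A ATTR=TXT:* EXTRACT=TXT
```

If both extracted values name the same hotel (for example "Lodge Andes"), the relative tag is picking up the link inside the anchored 'H2'. If the second value names a different hotel, it is catching the next link.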

by OniLinken on Mon Nov 14, 2016 5:17 am

It was a surprise for me as well... but when I tried it, it actually did what I needed.
Still, searching for your Double Relative Positioning posts is what gave me the idea.
So, yeah, thanks. Have a good week.

by chivracq on Mon Nov 14, 2016 9:16 am

OK, then..., good to know, ah-ah...!

But once you understand the "Double Relative Positioning" Mechanism, it's easy to make sure you are tagging/extracting the Element you really want, and to tell whether you need it or whether "Simple" Relative Positioning is already sufficient to "do the job"...

I still find it a bit "suspicious" that Relative Positioning works differently for different HTML Types; it could actually be a Bug. I find the Behaviour like for <H2> more logical, and it took me a while to understand the Mechanism the first time I was trying to R-tag some Element using a 'DIV' as Anchor. I wouldn't be surprised if we then got a "Mix" of both Behaviours in the case of embedded 'DIV's, which would indicate that it's indeed a Bug, as I would be surprised if the Developer really included a Check counting all embedded 'DIV's and all Levels to make sure to get out of the Outer 'DIV' used as Anchor, ah-ah...!