How to extract the link after a specific div

by cherubin13 on Sun Dec 27, 2015 4:45 pm

Hi,


I would like to create a list of links from a newspaper, so I built this macro:

Code: Select all
VERSION BUILD=8940826 RECORDER=FX
TAB T=1
SET !ERRORIGNORE YES
SET !LOOP 1
URL GOTO=http://www.lemonde.fr/recherche/?keywords=keyword&qt=recherche_globale&page_num={{!loop}}
TAG POS=1 TYPE=A ATTR=CLASS:grid_3*alpha*obf* EXTRACT=HREF
TAG POS=2 TYPE=A ATTR=CLASS:grid_3*alpha*obf* EXTRACT=HREF
TAG POS=3 TYPE=A ATTR=CLASS:grid_3*alpha*obf* EXTRACT=HREF
TAG POS=4 TYPE=A ATTR=CLASS:grid_3*alpha*obf* EXTRACT=HREF
TAG POS=5 TYPE=A ATTR=CLASS:grid_3*alpha*obf* EXTRACT=HREF
TAG POS=6 TYPE=A ATTR=CLASS:grid_3*alpha*obf* EXTRACT=HREF
TAG POS=7 TYPE=A ATTR=CLASS:grid_3*alpha*obf* EXTRACT=HREF
TAG POS=8 TYPE=A ATTR=CLASS:grid_3*alpha*obf* EXTRACT=HREF
TAG POS=9 TYPE=A ATTR=CLASS:grid_3*alpha*obf* EXTRACT=HREF
TAG POS=10 TYPE=A ATTR=CLASS:grid_3*alpha*obf* EXTRACT=HREF
SAVEAS TYPE=EXTRACT FOLDER=* FILE=links_lemonde.txt
WAIT SECONDS=3


Everything worked at the beginning, but then I discovered that the links are not always formatted with this attribute, so I searched for a common pattern:

Code: Select all
<article class="grid_12 alpha enrichi mgt8">
               <div class="grid_11 conteneur_fleuve alpha omega">
                  <a href="/culture/article/2012/04/02/selection-cd_1679176_3246.html?xtmc=immobilier&xtcr=3463" class="grid_3 alpha obf"><img width="147" height="97" data-item-type="article" data-lazyload="true" alt="Sélection CD" title="Sélection CD" class="lazy-retina" data-src="http://s2.lemde.fr/image/2007/03/10/147x97/881629_7_2fa4_les-journalistes-de-la-rubrique-musiques-du_72443f12f62ce8088d1db266472488c7.jpg" src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" onload="lmd.pic(this);" onerror="lmd.pic(this);"></a>
                  <div class="grid_8 omega resultat">
                                    <h3 class="txt4_120 "><a href="/culture/article/2012/04/02/selection-cd_1679176_3246.html?xtmc=immobilier&xtcr=3463">Sélection CD</a></h3>
                     <span class="txt1 signature">LE MONDE | 9 avril 2012</span>
                     <p class="txt3">Chutes de type Niagara glauque (Flowing Down Too Slow), mirages psychédéliques (Domeniche alla periferia dell'impero), corps fantomatiques (Nell'alto dei giorni immobili), trames effilochées (The Nameless City), tous ces moments de désagrégation se doublent paradoxalement d'une irrésistible force...</p>
                  </div>
               </div>
            </article>
            <article class="grid_12 alpha enrichi mgt8">
               <div class="grid_11 conteneur_fleuve alpha omega">
                                 <div class="grid_11 omega resultat">
                     <h3 class="txt4_120 marqueur_restreint"><a href="/a-la-une/article/2012/04/09/le-club-des-entrepreneurs-intouchables_1682627_3208.html?xtmc=immobilier&xtcr=3464">Le club des entrepreneurs intouchables</a></h3>
                     <span class="txt1 signature">LE MONDE | 9 avril 2012</span>
                     <p class="txt3">Ils appartiennent à la caste la plus méprisée d'Inde et pourtant, aujourd'hui, ils ont réussi dans les affaires. ...Ces quelques centaines de patrons tissent leur réseau et revendiquent leur place parmi l'élite...Ce n'est pas pour sa moquette rouge épaisse, ses moulures au plafond et son volumineux...</p>
                  </div>
               </div>
            </article>


I found this:

Code: Select all
<div class="grid_* omega resultat"><h3 class="txt4_120 ">

Where * = 8 or 11

Here is an example page: http://www.lemonde.fr/recherche/?keywor ... ge_num=424

So I tried this modification (and a few others):
Code: Select all
TAG POS=1 TYPE=DIV ATTR=CLASS:grid_*omega*resultat&&TXT:HREF*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF


But i don't succeed.

So, do you have an idea to help me with this macro?

Thank you
cherubin13
 
Posts: 17
Joined: Mon Apr 11, 2011 2:29 am

Re: How to extract the link after a specific div

by chivracq on Sun Dec 27, 2015 6:30 pm

FCIM...! :mrgreen:
=> iMacros for FF v8.9.4, FF42...?, OS...?

"But i don't succeed." is a bit vague, as you don't say why your Statement with Relative Positioning (which looks OK btw) doesn't work. Without having looked at your Site (yet), I guess it is actually picking up the Link of the next Article and not the one that you want, because the Link is located "inside" the DIV. That means you probably need to use "Double Relative Positioning" to first get out of the DIV, so that iMacros is able to see the Link inside it. Search my Posts on those Terms for more Info and some Examples if you don't understand what I mean...

But OK, I'm in a good mood and it's easy, so if I'm correct with my Assumption, this is what I mean...:
Code: Select all
TAG POS=1 TYPE=DIV ATTR=CLASS:grid_*omega*resultat&&TXT:HREF*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=TXT
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF
(Not tested on your Site...)

EDIT: Typo in the mini-Script: "ATTR:*" (not correct) => "ATTR=*" (correct).
I've corrected it...
Last edited by chivracq on Mon Dec 28, 2015 3:32 am, edited 1 time in total.
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6475
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: How to extract the link after a specific div

by cherubin13 on Mon Dec 28, 2015 12:45 am

Hi,

Sorry for my mistake, I had my head buried in the macro for a while when I posted my help request ;)

So I run iMacros 8.9.4 on Firefox 43.0.2

When I said it doesn't succeed, I meant that when the macro reaches these lines, it takes 6 s for each element of this part:

Code: Select all
TAG POS=1 TYPE=DIV ATTR=CLASS:grid_*omega*resultat&&TXT:HREF*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF


The result of the extract is the following:

#EANF#


I tried your code, but there is an error at this line:

Code: Select all
TAP POS=R1 TYPE=* ATTR:* EXTRACT=TXT


But I don't understand why. I thought it was the ATTR:* instead of ATTR=*, but no...

Re: How to extract the link after a specific div

by chivracq on Mon Dec 28, 2015 4:22 am

cherubin13 wrote:So I run iMacros 8.9.4 on Firefox 43.0.2

OK for FCI, even if OS is still missing... :roll: Though it won't play a role in your case, so never mind...
(But I usually don't react to Threads when FCI is not mentioned..., and the "F" stands for "Full"...)

cherubin13 wrote:When I said it doesn't succeed, I meant that when the macro reaches these lines, it takes 6 s for each element of this part:

Code: Select all
TAG POS=1 TYPE=DIV ATTR=CLASS:grid_*omega*resultat&&TXT:HREF*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF


The result of the extract is the following:

#EANF#

Ah, OK, that means that your Anchor is not found (and you would get some Runtime Error if you were not using '!ERRORIGNORE'...) and the HTML Element (= Link) you try to tag after that using Relative Positioning cannot be found either...

But hum, the "&&TXT:HREF*" part looks a bit strange to me...
Try removing this 'TXT' Attribute for tagging the Anchor, which will give, using your original Script with (Simple) Relative Positioning:
Code: Select all
TAG POS=1 TYPE=DIV ATTR=CLASS:grid_*omega*resultat
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF

or, using Double Relative Positioning:
Code: Select all
TAG POS=1 TYPE=DIV ATTR=CLASS:grid_*omega*resultat
TAG POS=R1 TYPE=* ATTR=* EXTRACT=TXT
SET !EXTRACT NULL
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF
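
A side note on the #EANF# result quoted above: #EANF# stands for "Extraction Anchor Not Found", and the roughly 6 s per line is probably the step timeout running out on each TAG that cannot be found (assumption: the default '!TIMEOUT_STEP' is in effect). A plausible reason why the original Anchor never matched is that, as far as I know, the 'TXT' Attribute is compared against the visible Text of an Element and not against its HTML Source, so "TXT:HREF*" would only match a DIV whose Text begins with "HREF". A minimal debugging sketch, not tested on the Site, that fails fast and shows the real Error instead of hiding it (the timeout value of 1 is an arbitrary choice):

```imacros
' Debugging sketch: stop on the first error instead of ignoring it
SET !ERRORIGNORE NO
' Wait at most 1 s per TAG before giving up (arbitrary value)
SET !TIMEOUT_STEP 1
' Anchor on the result DIV, without the TXT filter
TAG POS=1 TYPE=DIV ATTR=CLASS:grid_*omega*resultat
' Simple Relative Positioning from that Anchor
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
```

Once the Anchor is confirmed to be found, '!ERRORIGNORE' can be switched back to 'YES' for unattended runs.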


cherubin13 wrote:I tried your code, but there is an error at this line:

Code: Select all
TAP POS=R1 TYPE=* ATTR:* EXTRACT=TXT


But I don't understand why. I thought it was the ATTR:* instead of ATTR=*, but no...

Always mention the Error if you get one, "there is an error" is always vague... But OK, you're right, Typo indeed, I've corrected it in my previous Post... (I had not tested it, as I said...)

And hum, maybe not necessary, but I had added a fake "EXTRACT=TXT" to the first Relative Positioning in order to prevent following a Link or clicking on a Button, depending on what that first HTML Element would be. You could maybe restrict/filter it to some "TYPE=DIV" for example. That first 'EXTRACT' will actually "pollute" the Content of '!EXTRACT' by adding some extra Data you won't be interested in, so '!EXTRACT' needs to be reset to 'NULL' before doing your "real" Extract. Caveat is then that if you conduct several Extracts in your Script before doing the 'SAVEAS', you'll need to use a Temp Variable to store the Content of each Extract before putting it back into '!EXTRACT' when you want to do the 'SAVEAS'.
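
The Temp Variable pattern described above could look roughly like the sketch below. It is not tested on the Site; it assumes only two Results for brevity, uses the H3 Anchor that appears later in this Thread, and the File name "links.txt" is just a placeholder.

```imacros
' First result: Double Relative Positioning with a dummy extract
TAG POS=1 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=TXT
' Discard the dummy extract before the real one
SET !EXTRACT NULL
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF
' Park the link in a temp variable
SET !VAR1 {{!EXTRACT}}

' Second result: same steps, parked in another variable
SET !EXTRACT NULL
TAG POS=2 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=TXT
SET !EXTRACT NULL
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF
SET !VAR2 {{!EXTRACT}}

' Rebuild !EXTRACT from the parked values, then save once
SET !EXTRACT NULL
ADD !EXTRACT {{!VAR1}}
ADD !EXTRACT {{!VAR2}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=links.txt
```

With ten Results per Page this needs ten Variables ('!VAR0' to '!VAR9'), so saving after each single Extract may be the simpler Design if one Link per Line is acceptable.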

Re: How to extract the link after a specific div

by cherubin13 on Mon Dec 28, 2015 12:58 pm

Thank you for your quick feedback :)

I tried your 2 codes; unfortunately they don't work for now.

The first one extracts the link of the following DIV, so it skips the first one. The second one extracts the text of this second DIV, in <p class="txt3"> </p>.

Maybe I made a mistake, so here is a screenshot of the content and the generated source code:

Image

Normally, the extract should be the link behind "Un Madoff sous les tropiques".

Code: Select all
<article class="grid_12 alpha enrichi mgt8">
               <div class="grid_11 conteneur_fleuve alpha omega">
                  <a href="/la-crise-financiere/article/2009/02/23/un-madoff-sous-les-tropiques_1159125_1101386.html?xtmc=immobilier&xtcr=3531" class="grid_3 alpha obf"><img width="147" height="97" data-item-type="article" data-lazyload="true" alt="Un Madoff sous les tropiques" title="Un Madoff sous les tropiques" class="lazy-retina" data-src="http://s1.lemde.fr/image/2009/02/23/147x97/1158978_7_5d83_allen-stanford-etait-incarcere-depuis-2009_e4e90e60a5eeba77797d927f69a8baee.jpg" src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" onload="lmd.pic(this);" onerror="lmd.pic(this);"></a>
                  <div class="grid_8 omega resultat">
                                    <h3 class="txt4_120 "><a href="/la-crise-financiere/article/2009/02/23/un-madoff-sous-les-tropiques_1159125_1101386.html?xtmc=immobilier&xtcr=3531">Un Madoff sous les tropiques</a></h3>
                     <span class="txt1 signature">LE MONDE | 16 mars 2012</span>
                     <p class="txt3">Robert Allen Stanford promettait à ses clients de copieux rendements, via des paradis fiscaux : il est accusé d'une escroquerie de 9 milliards de dollars....A l'exemple des crises monétaires d'antan, la fraude présumée du milliardaire américain Robert Allen Stanford a profondément ébranlé l'Amérique...</p>
                  </div>
               </div>
            </article>
            <article class="grid_12 alpha enrichi mgt8">
               <div class="grid_11 conteneur_fleuve alpha omega">
                  <a href="/election-presidentielle-2012/article/2012/02/17/cadeaux-de-sarkozy-aux-plus-riches-les-calculs-genereux-du-ps_1645020_1471069.html?xtmc=immobilier&xtcr=3532" class="grid_3 alpha obf"><img width="147" height="97" data-item-type="article" data-lazyload="true" alt='"Cadeaux de Sarkozy aux plus riches" : les calculs généreux du PS' title='"Cadeaux de Sarkozy aux plus riches" : les calculs généreux du PS' class="lazy-retina" data-src="http://s2.lemde.fr/image/2011/05/12/147x97/1520791_7_ec73_de-janvier-a-fin-juillet-les-francais_4c5cea41f90d93c5197359747b427834.jpg" src="data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" onload="lmd.pic(this);" onerror="lmd.pic(this);"></a>
                  <div class="grid_8 omega resultat">
                                    <h3 class="txt4_120 "><a href="/election-presidentielle-2012/article/2012/02/17/cadeaux-de-sarkozy-aux-plus-riches-les-calculs-genereux-du-ps_1645020_1471069.html?xtmc=immobilier&xtcr=3532">"Cadeaux de Sarkozy aux plus riches" : les calculs généreux du PS</a></h3>
                     <span class="txt1 signature">Le Monde.fr | 16 mars 2012</span>
                     <p class="txt3">François Fillon a accusé François Hollande et les socialistes d'avoir menti pour dénoncer le bilan fiscal du président. ...Il n'a pas complètement tort....Le premier ministre, François Fillon, a accusé à son tour, vendredi 17 février, François Hollande de "mentir", comme l'avait fait la veille Nicolas...</p>
                  </div>
               </div>
            </article>


Cherubin
Last edited by cherubin13 on Thu Feb 23, 2017 9:25 am, edited 2 times in total.

Re: How to extract the link after a specific div

by cherubin13 on Mon Jan 04, 2016 4:41 am

Hello!

First, I wish you a happy new year! I hope you took some holidays and enjoyed Christmas time :)

I am still trying to find a solution. I'm close to succeeding, but I still have one problem.

With this macro:

Code: Select all
VERSION BUILD=8940826 RECORDER=FX
TAB T=1
SET !ERRORIGNORE YES
URL GOTO=http://www.lemonde.fr/recherche/?keywords=immobilier&qt=recherche_globale&page_num=354
TAG POS=1 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
TAG POS=2 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
TAG POS=3 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
TAG POS=4 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
TAG POS=5 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
TAG POS=6 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
TAG POS=7 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
TAG POS=8 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
TAG POS=9 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
TAG POS=10 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=A ATTR=HREF:* EXTRACT=HREF
SAVEAS TYPE=EXTRACT FOLDER=* FILE=links.txt
WAIT SECONDS=3


I correctly extract 9 links out of 10, but the first link is missing and, in its place, I get an irrelevant link at the end.

So, how do I start correctly with the first link and get all 10 links?

Thank you :)

Re: How to extract the link after a specific div

by cherubin13 on Mon Jan 04, 2016 9:32 am

I found the solution!

In case someone is interested:

Code: Select all
VERSION BUILD=8940826 RECORDER=FX
TAB T=1
SET !ERRORIGNORE YES
'PROMPT "Please enter your keyword:" !VAR1
SET !LOOP 1
URL GOTO=http://www.lemonde.fr/recherche/?keywords=KEYWORD&qt=recherche_globale&page_num={{!loop}}
' to be put in place of KEYWORD: {{!VAR1}}

TAG POS=1 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF

TAG POS=2 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF

TAG POS=3 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF

TAG POS=4 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF

TAG POS=5 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF

TAG POS=6 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF

TAG POS=7 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF
 
TAG POS=8 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF

TAG POS=9 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF

TAG POS=10 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF

SAVEAS TYPE=EXTRACT FOLDER=* FILE=links_KEYWORD_lemonde.txt
' to be put in place of KEYWORD: {{!VAR1}}
WAIT SECONDS=3



Now I will try to improve this macro by asking for the keyword with the PROMPT command, and by scripting a little with this: http://wiki.imacros.net/Loop_after_Query_or_Login
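
A rough sketch of the PROMPT idea, built from the commented-out lines in the macro above. It is not tested on the site; it assumes a single-word keyword (no URL encoding is done) and fetches only page 1, and only the first result block is written out.

```imacros
VERSION BUILD=8940826 RECORDER=FX
TAB T=1
SET !ERRORIGNORE YES
' Ask for the keyword and keep it in !VAR1
PROMPT "Please enter your keyword:" !VAR1
' Use the keyword in the search URL (page 1 only in this sketch)
URL GOTO=http://www.lemonde.fr/recherche/?keywords={{!VAR1}}&qt=recherche_globale&page_num=1
' Double Relative Positioning for the first result
TAG POS=1 TYPE=H3 ATTR=CLASS:txt4_120*
TAG POS=R1 TYPE=* ATTR=* EXTRACT=
TAG POS=R-1 TYPE=A ATTR=HREF:* EXTRACT=HREF
' ... repeat the three TAG lines for POS=2 to POS=10 ...
' Use the keyword in the output file name as well
SAVEAS TYPE=EXTRACT FOLDER=* FILE=links_{{!VAR1}}_lemonde.txt
WAIT SECONDS=3
```

When played with "Play (Loop)", the whole macro runs again on every iteration, so the PROMPT would presumably reappear for each page. That is the kind of problem the wiki page linked above is meant to solve.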

Re: How to extract the link after a specific div

by chivracq on Mon Jan 04, 2016 3:39 pm

... Which is using Double Relative Positioning, exactly what I told you in my first Reply... Good-good-good that you managed to get your Script to work...! :D
(And Thanks for sharing your Final Script, that's a neat finish of a Thread and can always be useful for other Users...)

Re: How to extract the link after a specific div

by cherubin13 on Mon Jan 04, 2016 4:02 pm

Yes chivracq, I succeeded thanks to you! I will continue to test and try, so excuse me in advance if you see me around here again ;)

Re: How to extract the link after a specific div

by chivracq on Mon Jan 04, 2016 5:46 pm

Well, nice to hear, and glad I could help.

And it's 20 times more useful for you that you managed to get it to work by yourself than if I had corrected your Script, which would have taken me a few minutes, I reckon...

And I only help Users who use the Forum a bit "correctly": not spamming the Forum with several Duplicates of their Question in different Sub-Forums, using a Descriptive Thread Title, mentioning their FCI, reading the Documentation and really getting stuck somewhere after they've tried their best (and finishing their Thread neatly by sharing their Solution). You did all of that perfectly, so I'll be happy to help you again in the future...

If you want some Fun Reading, here is a sinister Example of a User who does nearly everything possibly wrong... :roll: