Direct Link to SOF...
Parallel Thread on the iMacros Forum (Opened by me..., as I don't really trust this Site for "Continuity"...):
SOF: Save all HREF's inside LI's in a named UL to a '.txt' File
Grrr..., annoying, Links never work on this Buggy Site, => Direct Link...:
viewtopic.php?f=7&t=31672
[Answer is now more or less "finished", but I might still edit it "slightly"...]
[Time spent writing this Answer: About [10] Hours...]...
=> Forum Posting: ~2h approx, Writing Script(s) and Testing: ... the rest, ah-ah...!
(And first time ever I spend so much time on an Answer, annoying to have to "fight" against the "Design" of the Site...)
>>>
I'm addressing all your different Questions more or less in reverse order, and posting 2 (or 3 actually) different Solutions as "minimalistic" Scripts. I will mention several Concepts/Techniques that I won't explain (in depth), or I'd need to quote half of the Wiki and/or of the iMacros Forum...
(Terms I enclose between Single Quotes or Backticks are such Terms...)
>>>
Warning Popup about "Loop" and "Play":
Well, read the Msg on that Popup, it looks pretty clear and self-explanatory to me...
(Well, apart from the ugly Typo in it, of course...!)
>>>
Getting #EANF# in the EXTRACT and SAVEAS:
=> Yep, normal, that's because you are using the UL Element as 'Anchor' for 'Relative Positioning'. You would need to use 'Double Relative Positioning' in this Case, as the UL Element is actually the 'Container' for all LI + A Elements...
(More Explanation [on the Forum][1], where I've explained the Concept/Technique many times already...)
Hum, Linking doesn't seem to work..., here is the Link:
search.php?keywords=Double+Relative+Positioning
"The number of list items can be different so I cannot use a loop of a known number of iterations..."
(Emphasis mine...)
Hum, well..., this is not really true, a Loop would actually be the "easiest" Implementation in my Opinion... It has the Advantage that you can let SAVEAS take care of saving each Link on a separate/new Row for each Loop (otherwise you'll need to add/implement a Mechanism for that Func yourself), and you can simply let iMacros abort your Script "naturally" when an Element is "not found", i.e. when there is no new Link to extract...
... And I will use 2 different Mechanisms for that part...
(All Scripts written and tested in iMacros for FF v8.8.2, PM v26.3.3, Win10_PRO_x64.)
>>>
Implementation 1: Looping + Abort on Not Found:
And that will give something like:
VERSION BUILD=8820413 RECORDER=FX
TAB T=1
SET Search_Keyword "Estate Agents"
'Debug:
'SET !LOOP 15
'URL GOTO=https://www.home.co.uk/search/agents/?county=beds
'Extract Links using 'Relative Positioning':
'TAG POS=1 TYPE=H1 ATTR=TXT:Estate<SP>Agents<SP>in<SP>Bedfordshire // (Recorded)
TAG POS=1 TYPE=H1 ATTR=TXT:{{Search_Keyword}}<SP>in<SP>*
TAG POS=R{{!LOOP}} TYPE=A ATTR=TXT:{{Search_Keyword}}<SP>in<SP>* EXTRACT=HREF
'>
'Debug:
'PROMPT {{!EXTRACT}}
'Save Link to '.CSV':
SAVEAS TYPE=EXTRACT FOLDER=* FILE=c:\Development\towns.txt
'SAVEAS TYPE=EXTRACT FOLDER=* FILE=SOF_MSB.txt
'Abort Script if no more Link(s) to extract:
SET !TIMEOUT_STEP 1
TAG POS=R1 TYPE=LI ATTR=TXT:{{Search_Keyword}}<SP>in<SP>*
Yep, OK, this one works already...
There are 21 Links on the Page at the URL provided; I looped the Script 30 times, and it aborts by itself at the end of Loop=21...!
- Notice, I don't use !ERRORIGNORE, and the Abort Func actually relies on that...
- While extracting and looping on the "Links" (the A Elements), I "switched back" to an LI Element for the 2nd R-POS to abort the Script. Using the next Link (A Element) there as well would not work either way: with EXTRACT, the Command never aborts a Script (by Design) and simply returns #EANF# if the Element is not found; without EXTRACT, the Script would click on and follow the Link on every Loop before the last one.
- And !EXTRACT_TEST_POPUP can be omitted when looping a Script...
- Works "best" with the Page already loaded once "manually", as reloading the Page on every Loop would slow the Execution... If the Page "really" needs to be loaded from the Script, it's possible to add a Mechanism for a 'Conditional URL GOTO' (another "Concept/Technique" to search the iMacros Forum for, ah-ah...!) that loads the Page only for Loop=1...
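A rough Sketch of the Decision behind such a 'Conditional URL GOTO', in plain JavaScript (the Language used inside EVAL). The Function Name `conditionalUrl` is hypothetical, and using `javascript:void(0)` as a "do nothing" Target for all other Loops is an Assumption on my side, not something tested in the Scripts here:

```javascript
// Hypothetical helper (not part of the Scripts in this Answer):
// returns the real Page URL for the first Loop only.
// Assumption: 'javascript:void(0)' acts as a harmless "do nothing" Target.
function conditionalUrl(loop, pageUrl) {
  return Number(loop) === 1 ? pageUrl : 'javascript:void(0)';
}

var page = 'https://www.home.co.uk/search/agents/?county=beds';
console.log(conditionalUrl('1', page)); // Loop=1: the Page URL
console.log(conditionalUrl('2', page)); // any later Loop: the no-op Target
```

In a Script, the same Expression would sit inside a SET ... EVAL("...") Line with {{!LOOP}} as Input, and the resulting Variable would then be passed to URL GOTO.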
>>>
Implementation 2: Looping + Abort with MacroError() + Report:
Alright..., and this one would be my "Favorite"...!:
Same as Script_1, but the Abort Check can now be applied to an A Element, and using MacroError() allows displaying a mini-Report in the iMacros Side-Panel, for example:
VERSION BUILD=8820413 RECORDER=FX
TAB T=1
SET Search_Keyword "Estate Agents"
'Debug:
'SET !LOOP 15
'URL GOTO=https://www.home.co.uk/search/agents/?county=beds
'Extract Links using 'Relative Positioning':
'TAG POS=1 TYPE=H1 ATTR=TXT:Estate<SP>Agents<SP>in<SP>Bedfordshire // (Recorded)
TAG POS=1 TYPE=H1 ATTR=TXT:{{Search_Keyword}}<SP>in<SP>* EXTRACT=TXT
SET Title {{!EXTRACT}}
SET !EXTRACT NULL
TAG POS=R{{!LOOP}} TYPE=A ATTR=TXT:{{Search_Keyword}}<SP>in<SP>* EXTRACT=HREF
'>
'Debug:
'PROMPT {{!EXTRACT}}
'Save Link to '.CSV' (or '.TXT'):
'SAVEAS TYPE=EXTRACT FOLDER=* FILE=c:\Development\towns.txt
SAVEAS TYPE=EXTRACT FOLDER=* FILE=SOF_MSB.txt
'Abort Script if no more Link(s) to extract:
SET !TIMEOUT_STEP 1
SET !EXTRACT NULL
'TAG POS=R1 TYPE=LI ATTR=TXT:{{Search_Keyword}}<SP>in<SP>*
TAG POS=R1 TYPE=A ATTR=TXT:{{Search_Keyword}}<SP>in<SP>* EXTRACT=TXT
'Prepare mini-Report:
SET Report {{!LOOP}}<SP>Links<SP>extracted<SP>for:<BR>{{Title}}
SET Summary (No<SP>Error...!!)<SP>({{!NOW:yyyy-mm-dd<SP>hhhnn}})<BR><BR>{{Report}}<BR><BR>
SET !ERRORIGNORE NO
SET Abort_Report EVAL("var s='{{!EXTRACT}}'; if(s=='#EANF#'){MacroError(\"{{Summary}}\");}")
SET !ERRORIGNORE YES
Like Script_1, I looped it 30 or 50 times, and it will then display:
MacroError: (No Error...!!) (2021-07-22 15h57)
21 Links extracted for:
Estate Agents in Bedfordshire
, line 36 (Error code: -1340)
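For Reference, the Check done by the Abort_Report Line boils down to the following, shown here as plain JavaScript so it can be run outside iMacros. MacroError() only exists inside EVAL, so the Version below is just a Stand-in that throws, and the Function Name `abortIfNotFound` is mine, for Illustration only:

```javascript
// Stand-in for the iMacros MacroError() Function (only available inside EVAL).
function MacroError(msg) {
  throw new Error('MacroError: ' + msg);
}

// Same Test as in the Abort_Report Line: abort only if the last EXTRACT
// returned '#EANF#', i.e. there is no next Link to extract.
function abortIfNotFound(extracted, summary) {
  if (extracted === '#EANF#') {
    MacroError(summary);
  }
  return 'continue';
}

console.log(abortIfNotFound('Estate Agents in Bedford', 'mini-Report'));
try {
  abortIfNotFound('#EANF#', 'mini-Report');
} catch (e) {
  console.log(e.message);
}
```

As long as a next Link is found, nothing happens and the next Loop starts; only the '#EANF#' Case reaches MacroError() and stops the Script with the Report.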
>>>
Implementation 3: Extract all LI Elements with 1 EXTRACT from the Containing UL Element:
[... Work in Progress...]
This one is a "quick and dirty" Demo, as I find it a bit of a cumbersome Implementation, but here you go...:
VERSION BUILD=8820413 RECORDER=FX
TAB T=1
SET Search_Keyword "Estate Agents"
URL GOTO=https://www.home.co.uk/search/agents/?county=beds
'TAG POS=1 TYPE=LI ATTR=TXT:Estate<SP>Agents<SP>in<SP>Ampthill
'TAG POS=1 TYPE=LI ATTR=TXT:Estate<SP>Agents<SP>in<SP>Barton-Le-Clay
'TAG POS=1 TYPE=DIV ATTR=TXT:Estate<SP>agent<SP>listings<SP>are<SP>available<SP>for<SP>th* EXTRACT=HTM
'TAG POS=1 TYPE=P ATTR=TXT:Estate<SP>agent<SP>listings<SP>are<SP>available*
'TAG POS=R1 TYPE=UL ATTR=* EXTRACT=HTM
'Hum, can better use the 'H1' Element as Anchor...:
'TAG POS=1 TYPE=H1 ATTR=TXT:Estate<SP>Agents<SP>in<SP>Bedfordshire // (Recorded)
TAG POS=1 TYPE=H1 ATTR=TXT:{{Search_Keyword}}<SP>in<SP>*
TAG POS=R1 TYPE=UL ATTR=* EXTRACT=HTM
SET Results_HREF EVAL("var s='{{!EXTRACT}}'; var w,x,y,z; w=s.split('regular\">')[1]; x=w.split('\"'); y=x[1]+','+x[3]+','+x[5]; z=y.split(',').join('\\r\\n'); z;")
'>
'Debug:
PROMPT Results:<BR><BR>_{{Results_HREF}}_
'Not really finished... (Quick and dirty Demo...)
'Save Links to '.CSV' (or '.TXT'):
'SAVEAS TYPE=EXTRACT FOLDER=* FILE=c:\Development\towns.txt
'>>>
'Extracted:
'<ul style="outline: 1px solid blue;" class="bullet-list columns-2 columns--regular">
'<li style="outline: 1px solid blue;"><a href="/search/agents/results.htm?location=ampthill">Estate Agents in Ampthill</a></li>
'<li style="outline: 1px solid blue;"><a href="/search/agents/results.htm?location=barton_le_clay">Estate Agents in Barton-Le-Clay</a></li>
'<li><a href="/search/agents/results.htm?location=bedford">Estate Agents in Bedford</a></li>
'<li><a href="/search/agents/results.htm?location=biggleswade">Estate Agents in Biggleswade</a></li>
'<li><a href="/search/agents/results.htm?location=bromham">Estate Agents in Bromham</a></li>
'<li><a href="/search/agents/results.htm?location=clapham_beds">Estate Agents in Clapham</a></li> <li><a href="/search/agents/results.htm?location=dunstable">Estate Agents in Dunstable</a></li> <li><a href="/search/agents/results.htm?location=flitwick">Estate Agents in Flitwick</a></li> <li><a href="/search/agents/results.htm?location=harlington">Estate Agents in Harlington</a></li> <li><a href="/search/agents/results.htm?location=henlow">Estate Agents in Henlow</a></li> <li><a href="/search/agents/results.htm?location=houghton_regis">Estate Agents in Houghton Regis</a></li> <li><a href="/search/agents/results.htm?location=kempston">Estate Agents in Kempston</a></li> <li><a href="/search/agents/results.htm?location=langford">Estate Agents in Langford</a></li> <li><a href="/search/agents/results.htm?location=leighton_buzzard">Estate Agents in Leighton Buzzard</a></li> <li><a href="/search/agents/results.htm?location=linslade">Estate Agents in Linslade</a></li> <li><a href="/search/agents/results.htm?location=luton">Estate Agents in Luton</a></li> <li><a href="/search/agents/results.htm?location=potton">Estate Agents in Potton</a></li> <li><a href="/search/agents/results.htm?location=sandy">Estate Agents in Sandy</a></li> <li><a href="/search/agents/results.htm?location=shefford">Estate Agents in Shefford</a></li> <li><a href="/search/agents/results.htm?location=stotfold">Estate Agents in Stotfold</a></li>
'<li><a href="/search/agents/results.htm?location=toddington">Estate Agents in Toddington</a></li> </ul>
About the Script: it's a "quick and dirty" Solution, and the y part only demonstrates the first 3 Links...
Neater would be to use a for Loop over the x Array (Incr=2, as the HREF's sit at x[1], x[3], x[5], ...) with Array.push(); in this quick and dirty Demo, the y String/Array would otherwise need to be "hard-coded" for all 30 or 50 Links...
=> See Content of the Debug PROMPT...
(And the Script needs to be run only once, => with the 'Play' Button, not with the 'Loop' Button.)
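A minimal Sketch of the "neater" Loop + Array.push() Variant mentioned above, in plain JavaScript (the Language used inside EVAL). The Function Name `extractHrefs` and the shortened Sample are mine, for Illustration only. Note that I swapped the Technique: instead of counting on the HREF's sitting at every 2nd Index after splitting on the double Quote, it matches href="..." directly with a Regex, as extra Attributes like the style="outline: ..." ones visible in the extracted HTML would shift those Indexes:

```javascript
// Hypothetical helper (not part of the original Script): collects every
// href Value found in the HTML of the UL Element, one Link per Row.
function extractHrefs(html) {
  var links = [];
  var re = /href="([^"]*)"/g;
  var m;
  while ((m = re.exec(html)) !== null) {
    links.push(m[1]); // m[1] is the Text between the double Quotes
  }
  return links.join('\r\n');
}

// Shortened Sample based on the extracted HTML shown in the Script Comments.
var sample =
  '<ul class="bullet-list columns-2 columns--regular">' +
  '<li style="outline: 1px solid blue;"><a href="/search/agents/results.htm?location=ampthill">Estate Agents in Ampthill</a></li>' +
  '<li><a href="/search/agents/results.htm?location=bedford">Estate Agents in Bedford</a></li>' +
  '</ul>';

console.log(extractHrefs(sample));
```

Inside the Script, the Body of such a Function would need to be squeezed into the single EVAL("...") Line with the Quotes escaped, which I have not tested here.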
---
[1]: search.php?keywords=Double%20Relative%20Positioning