Search for a list of URLs in site and extract to file?

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.


Forum rules
Before asking a question or reporting an issue:
1. Please review the list of FAQ's.
2. Use the Google search box (at the top of each forum page) to see if a similar problem or question has already been addressed. This will search the entire contents of the forums as well as the iMacros Wiki.
3. We can respond much faster to your posts if you include the important information listed in the forum's posting guidelines (at minimum your iMacros, browser and OS versions).

Answering your own posts (e.g. attempting to "bump" your topic) drops your topic from the list of unanswered threads, so it may actually receive fewer views.


by Candita on Wed Nov 04, 2015 12:38 pm

I need to search my site for several old URLs so they can be updated. Then, for each URL that's found, I need to extract the page URL where it was found and the link text that's displayed. This all needs to be written to a CSV file for review. I'm fairly sure the URL will only appear once on a page and there won't be multiple URLs found on a page, but I can't rule that out. Ideally the macro could read the URLs I'm searching for from an Excel or CSV file (in which case I would need the macro to list which URL was found along with the other info mentioned above), but I'm willing to search for them individually. I've tried checking the example macros, forums and wiki, but I'm having trouble fitting it all together (haven't used iMacros in a loooong time). Help or a point in the right direction is appreciated!
Candita
 
Posts: 3
Joined: Wed Nov 04, 2015 12:22 pm

Re: Search for a list of URLs in site and extract to file?

by chivracq on Wed Nov 04, 2015 5:56 pm

Candita wrote:
I need to search my site for several old URLs so they can be updated. Then, for each URL that's found, I need to extract the page URL where it was found and the link text that's displayed. This all needs to be written to a CSV file for review. I'm fairly sure the URL will only appear once on a page and there won't be multiple URLs found on a page, but I can't rule that out. Ideally the macro could read the URLs I'm searching for from an Excel or CSV file (in which case I would need the macro to list which URL was found along with the other info mentioned above), but I'm willing to search for them individually. I've tried checking the example macros, forums and wiki, but I'm having trouble fitting it all together (haven't used iMacros in a loooong time). Help or a point in the right direction is appreciated!

Hum, Post/Thread approved because no Spam, but CIM...! :mrgreen: for me to read...

OK, I read 11h->5h, you use the right Terminology and you've already used iMacros... => CIM indeed...! :mrgreen:
(I only read and answer Threads where Users mention their FCI, like mentioned as Required Info in the Forum Rules...)

Then, URL is missing + Script + where you get stuck exactly and what you've tried...

Compliment: Thread Title is perfect...! :D
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6473
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Search for a list of URLs in site and extract to file?

by Candita on Thu Nov 05, 2015 7:36 am

Thanks, I guess it's obvious I'm really out of practice and not just with iMacros :oops:

I've managed to search a single page and save a CSV with:

VERSION BUILD=8940826 RECORDER=FX
TAB T=1
URL GOTO=http://www.####.html
'TAG POS=1 TYPE=A ATTR=TXT:Knowledge<SP>Center EXTRACT=HREF
' POS takes the loop counter: 1st match on loop 1, 2nd match on loop 2, ...
SET !LOOP 1
TAG POS={{!LOOP}} TYPE=A ATTR=HREF:*[URL pattern I'm searching for]* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=A ATTR=HREF:*[URL pattern I'm searching for]* EXTRACT=HREF
SET !EXTRACTADD {{!URLCURRENT}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=SUPPORT-EXTRACT

I'm not seeing how to search the entire site or how to search each page until the pattern is not found, then move to another page. This also reloads the page each time, which seems inefficient.

Re: Search for a list of URLs in site and extract to file?

by chivracq on Thu Nov 05, 2015 8:16 am

Candita wrote:
Thanks, I guess it's obvious I'm really out of practice and not just with iMacros :oops:

I've managed to search a single page and save a CSV with:

VERSION BUILD=8940826 RECORDER=FX
TAB T=1
URL GOTO=http://www.####.html
'TAG POS=1 TYPE=A ATTR=TXT:Knowledge<SP>Center EXTRACT=HREF
' POS takes the loop counter: 1st match on loop 1, 2nd match on loop 2, ...
SET !LOOP 1
TAG POS={{!LOOP}} TYPE=A ATTR=HREF:*[URL pattern I'm searching for]* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=A ATTR=HREF:*[URL pattern I'm searching for]* EXTRACT=HREF
SET !EXTRACTADD {{!URLCURRENT}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=SUPPORT-EXTRACT


I'm not seeing how to search the entire site or how to search each page until the pattern is not found, then move to another page. This also reloads the page each time, which seems inefficient.

OK, that's already a bit better, but CIM => FCIM...! :mrgreen: Read my Sig if you don't get that one... :idea:
=> iMacros for FF v8.9.4, FF...?, OS...?


If it's "your Site", then I don't understand why you obfuscate the URL... :?
It's difficult to have a clear idea of what you want if you don't provide all Info..., => a few Samples of your "[URL pattern I'm searching for]" would be needed as well...

For using a .CSV File as a DataSource, look at the 'Loop-CSV-2-Web.iim' Demo Macro... :idea:
And/or search the Forum for "!DATASOURCE" and you'll find many Examples...
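Below is a minimal sketch of the !DATASOURCE approach. It has not been tested on this exact setup (iMacros for FF 8.9.4) and rests on a few assumptions:

- There is a hypothetical file old-urls.csv in the iMacros Datasources folder, with one old URL per line and no header row.
- The page to check is already open in tab 1.
- The macro is started with "Play (Loop)", with the loop maximum set to the number of lines in the file.

Each loop checks one old URL. It appends a row with the link text, the old URL that was searched for, and the page URL, so the CSV also records which URL was found.

```
VERSION BUILD=8940826 RECORDER=FX
' Hypothetical input: old-urls.csv, one old URL per line in column 1, no header
SET !DATASOURCE old-urls.csv
' Loop 1 reads line 1, loop 2 reads line 2, ... (start with "Play (Loop)")
SET !DATASOURCE_LINE {{!LOOP}}
' Keep going when an old URL is not on the page (#EANF# is extracted instead)
SET !ERRORIGNORE YES
' Wait at most 1 second for each link
SET !TIMEOUT_STEP 1
SET !EXTRACT_TEST_POPUP NO
TAB T=1
' The page is assumed to be open already: no URL GOTO, no reload
' First link whose HREF contains the old URL from this CSV line
TAG POS=1 TYPE=A ATTR=HREF:*{{!COL1}}* EXTRACT=TXT
' Also record which old URL was searched for, and on which page
ADD !EXTRACT {{!COL1}}
ADD !EXTRACT {{!URLCURRENT}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=SUPPORT-EXTRACT.csv
```

With !ERRORIGNORE YES, an old URL that is not on the page should show up as #EANF# in the link-text column rather than stopping the run. Those rows are easy to filter out when reviewing the CSV.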

You do indeed come from some "ice age period" with iMacros...! :shock: I haven't seen '!EXTRACTADD' in years; it was deprecated several years ago. I think it still works (except maybe on CR) thanks to Backward Compatibility, but you should use the 'ADD' Command now.
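In the script above, the switch would presumably be a single line. This is a sketch: the rest of the macro stays as it was.

```
' Deprecated syntax (older iMacros versions):
'SET !EXTRACTADD {{!URLCURRENT}}
' Current equivalent: append the page URL to the extraction buffer
ADD !EXTRACT {{!URLCURRENT}}
```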

Reloading the Page each time with 'URL GOTO' is indeed not very efficient; you can just comment it out.
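Below is a sketch of the single-page macro with the 'URL GOTO' line commented out. It assumes the page is opened once by hand before starting "Play (Loop)". It also makes three further changes:

- *old-page.html* is a hypothetical stand-in for the real (obfuscated) pattern. The pattern must not contain spaces inside ATTR.
- The space in 'POS= {{!LOOP}}' is removed.
- '!EXTRACTADD' is replaced by 'ADD'.

With default error handling, the TAG error at the first POS that no longer matches is expected to stop the loop. That gives the "until the pattern is not found" behaviour, but it is worth verifying on your version.

```
VERSION BUILD=8940826 RECORDER=FX
TAB T=1
' Commented out: the page is loaded once by hand, not on every loop
'URL GOTO=http://www.####.html
SET !EXTRACT_TEST_POPUP NO
' Give up on a missing link after 1 second instead of the default wait
SET !TIMEOUT_STEP 1
SET !LOOP 1
' Loop 1 reads the 1st matching link, loop 2 the 2nd, ... (start with "Play (Loop)")
' *old-page.html* is a hypothetical stand-in for the real URL pattern
TAG POS={{!LOOP}} TYPE=A ATTR=HREF:*old-page.html* EXTRACT=TXT
TAG POS={{!LOOP}} TYPE=A ATTR=HREF:*old-page.html* EXTRACT=HREF
ADD !EXTRACT {{!URLCURRENT}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=SUPPORT-EXTRACT.csv
```

Each loop appends one row (link text, link HREF, page URL) to SUPPORT-EXTRACT.csv. If a page turns out to have no match at all, the run should stop on the first loop without writing a row for that page.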

Re: Search for a list of URLs in site and extract to file?

by Candita on Thu Nov 05, 2015 10:48 am

Sorry to have bothered you. Please delete my posts and account. I'll find another solution.

Re: Search for a list of URLs in site and extract to file?

by chivracq on Thu Nov 05, 2015 12:09 pm

Candita wrote:
Sorry to have bothered you. Please delete my posts and account. I'll find another solution.

You don't bother me or any other Advanced User(s) who'll be willing to help you. I'm just asking you to mention your FCI (Full Config Info), as stated in the Forum Rules as Required Info, and I/we need some more Info to be able to help you. From just looking at your Script we cannot deduce what your Page looks like, and iMacros depends heavily on the HTML Structure of a Page..., and I guess you know that as you've been using iMacros for a long time/in the past...

"Please delete my posts and account." is a bit of a childish reaction... (and you can do it yourself... if you want to go that far..., that's why I quoted all your Posts...)
This Forum is the best place to get help when you have some Problem/Question with/about iMacros and need to find a way to get your Script running, but you have to play by the "Rules" and mention all relevant Info needed for other (More Advanced) Users to be able to try to help you... It will be the same on any other (Technical) Forum... OK, good luck anyway... 8)

Re: Search for a list of URLs in site and extract to file?

by pursharthahuja on Mon Jan 25, 2016 1:14 am

I am facing the same problem. My config:
iMacros for FF v8.9.4, and the OS is Linux.
The rest of the details are the same as mentioned above.
Please provide an answer to this question ASAP.

thanks in advance.
pursharthahuja
 
Posts: 3
Joined: Mon Jan 25, 2016 12:17 am

