iMacros skips some pages while extracting webpages, what should be the solution ?

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
jyotirmaya
Posts: 45
Joined: Wed Jul 27, 2016 6:25 pm

iMacros skips some pages while extracting webpages, what should be the solution ?

Post by jyotirmaya » Tue Jun 22, 2021 10:33 am

I am using Browser Firefox 48.0
iMacros for Firefox 8.9.7
Windows 10 64-bit Operating system

I am using the below code to extract data from a website

Code: Select all

VERSION BUILD=8970419 RECORDER=FX
TAB T=1
SET !DATASOURCE kd.csv
SET !LOOP 1
'Increase the current position in the file with each loop
SET !DATASOURCE_LINE {{!LOOP}}
WAIT SECONDS=3
'Click the pager link whose text is in column 1 of kd.csv
TAG POS=1 TYPE=A ATTR=TXT:{{!COL1}}
WAIT SECONDS=3
'Extract the whole table as text and copy it to the clipboard
TAG POS=1 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT
SET !CLIPBOARD {{!EXTRACT}}
WAIT SECONDS=3

[Attachment: imacro test.png]

I am copying data from the website, but I have a problem: if the site has 160 pages, the macro randomly skips some of them while running. I thought this was caused by internet speed and slow loading, so I raised the waits to 3 seconds, but it still happens. I copy the data to the clipboard and store it using the Ditto software. When I click the Next button, the web page does not reload; it simply displays a loading image and the new content appears within a second. The website is built with JavaScript. What should I change in the code so that I can extract all the pages? Please help.
Last edited by jyotirmaya on Tue Jun 22, 2021 2:26 pm, edited 1 time in total.
chivracq
Posts: 9929
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: iMacros skips some pages while extracting webpages, what should be the solution ?

Post by chivracq » Tue Jun 22, 2021 1:29 pm

jyotirmaya wrote:
Tue Jun 22, 2021 10:33 am
https://1.bp.blogspot.com/-5vN0H-7Nul4/YNG7p8osH8I/AAAAAAAAG6I/S2EJzOvAjq4oGYPEZcSFIlTKZJz0S20uACLcBGAsYHQ/s16000/imacro%2Btest.png
Example image link.

Euh..., could you (also) upload your Screenshot directly to the Forum, like explained in the Forum Rules...? Thanks... :idea:
You can leave the Link to this external Site if you want but "we" also need a direct Upload to the Forum as those external Pix Hosting Sites all go dark or commercial "one day" and the Screenshots otherwise become unavailable... :(

EDIT: Done..., perfect..., and Thanks...! :D

>>>

Hum, I don't really understand the Looping Workflow of your Script as I don't see any clicking or 'URL GOTO' on/to a Page Nb or the 'Next' Button, and I'm not sure what the "TAG POS=1 TYPE=A ATTR=TXT:{{!COL1}}" is doing as I don't know what you have in '!COL1', and I don't see any "Pausing" to handle the Clipboard for the 'Paste' Part, but anyway...

"When clicking the next button the webpage don't load, it simply display a loading image and loads within a second."
=> Yep-yep, I know this Behaviour also myself, (and a few Users have reported it also in a few Threads), I see it "sometimes" with one Script that I run from time to time on one specific Site/Page that contains a few 100's of 'BUTTON' Elements with an 'onclick' Attribute calling different '.php' Scripts/Actions. When "it" happens, my Script will quickly "slide" on the Page without doing anything and quickly fake-finish the remaining Loops at "Super-Speed", ah-ah...! :?
The only Thing that helps is to force a manual Refresh of the Page when that happens...

I never "really" investigated it any further as I only use that Script from time to time and I run it "semi-manually" on some external Monitor and I can run (loop) it again once I check again if it didn't "finish the Job", or I can abort/pause it if I see it happening, + refresh the Page manually, and see if it resumes working correctly.

In your Case, if you want to automate that Part, you would need to force a Refresh/Reload/fresh Load of the/each Page, maybe Conditional, but the 'REFRESH' Command cannot ("really"/easily) be used conditionally, (at least in pure '.iim'), you need to use the 'URL GOTO' Command for that, if the Pages have their own URL's..., then you could add a Mechanism to check the current Page Nb, and after clicking on the 'Next' Button, if the Page Nb indeed increased by 1. But even after a Reload/Refresh, that will still not guarantee that the Page loaded "correctly" the 2nd time, and that the JS Buttons will work this time... Then you could add a Conditional 'PAUSE' or 'PROMPT' to alert you to take Action manually...
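
The "click, then check that the page really changed" idea can be sketched for a '.js' script roughly as follows. This is only a sketch under assumptions, not a tested solution for this site: instead of reading the page number (its element is not known here), it compares the extracted table text with the previous page's text. `play` and `getExtract` are hypothetical wrappers around iMacros' `iimPlay("CODE:...")` and `iimGetLastExtract()`; the function name, the number of polls and the 1-second wait are made up for illustration.

```javascript
// Sketch only. `play(macro)` and `getExtract()` are hypothetical wrappers, e.g.
//   function play(m) { return iimPlay("CODE:" + m); }
//   function getExtract() { return iimGetLastExtract(); }
// They are passed in so the logic does not depend on the iMacros runtime.

// Click a pager link once, then poll the table until its text differs from the
// previous page's text. Returns the new table text, or null if nothing changed.
function clickAndWaitForNewTable(play, getExtract, linkText, previousTable, maxPolls) {
  // Same TAG command as in the macro, with the link text filled in.
  // The click is sent only once so that a slow "NEXT" is never clicked twice.
  play("TAG POS=1 TYPE=A ATTR=TXT:" + linkText);

  var extractMacro =
    "SET !EXTRACT_TEST_POPUP NO\n" +
    "WAIT SECONDS=1\n" +
    "TAG POS=1 TYPE=TABLE ATTR=TXT:* EXTRACT=TXT";

  for (var poll = 0; poll < maxPolls; poll++) {
    play(extractMacro);
    var table = getExtract();
    // "#EANF#" is what iMacros returns when the element was not found.
    if (table && table !== "#EANF#" && table !== previousTable) {
      return table;
    }
  }
  return null; // the caller can alert or pause here for a manual refresh
}
```

A caller would keep the returned text as `previousTable` for the next page, and on `null` stop or show a message instead of silently moving on to the next loop.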

Oh yeah..., and my "Observations" were also made like you in [iMacros for FF v8.9.7 + FF55 + Win10], but also in [iMacros for FF v8.8.2 + PM26 + Win10], FCI that I use for most of my "Prod" Scripting...
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- I don't even read the Qt if that (required) Info is not mentioned...!
- Script & URL help a lot for more "educated" Help...
jyotirmaya
Posts: 45
Joined: Wed Jul 27, 2016 6:25 pm

Re: iMacros skips some pages while extracting webpages, what should be the solution ?

Post by jyotirmaya » Tue Jun 22, 2021 2:41 pm

chivracq wrote:
Euh..., could you (also) upload your Screenshot directly to the Forum, like explained in the Forum Rules...? Thanks... :idea:
You can leave the Link to this external Site if you want but "we" also need a direct Upload to the Forum as those external Pix Hosting Sites all go dark or commercial "one day" and the Screenshots otherwise become unavailable...

Sorry about the photo, I have fixed that.

chivracq wrote:
Hum, I don't really understand the Looping Workflow of your Script as I don't see any clicking or 'URL GOTO' on/to a Page Nb or the 'Next' Button, and I'm not sure what the "TAG POS=1 TYPE=A ATTR=TXT:{{!COL1}}" is doing as I don't know what you have in '!COL1', and I don't see any "Pausing" to handle the Clipboard for the 'Paste' Part, but anyway...

In column A of the data source I have the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, NEXT, 11, 12, and so on.

chivracq wrote:
"When clicking the next button the webpage don't load, it simply display a loading image and loads within a second."
=> Yep-yep, I know this Behaviour also myself, (and a few Users have reported it also in a few Threads), I see it "sometimes" with one Script that I run from time to time on one specific Site/Page that contains a few 100's of 'BUTTON' Elements with an 'onclick' Attribute calling different '.php' Scripts/Actions. When "it" happens, my Script will quickly "slide" on the Page without doing anything and quickly fake-finish the remaining Loops at "Super-Speed", ah-ah...! :?
The only Thing that helps is to force a manual Refresh of the Page when that happens...

OK, but what can I do in this case? The URL stays the same for every page, even if there are 200 pages.

chivracq wrote:
I never "really" investigated it any further as I only use that Script from time to time and I run it "semi-manually" on some external Monitor and I can run (loop) it again once I check again if it didn't "finish the Job", or I can abort/pause it if I see it happening, + refresh the Page manually, and see if it resumes working correctly.

In your Case, if you want to automate that Part, you would need to force a Refresh/Reload/fresh Load of the/each Page, maybe Conditional, but the 'REFRESH' Command cannot ("really"/easily) be used conditionally, (at least in pure '.iim'), you need to use the 'URL GOTO' Command for that, if the Pages have their own URL's..., then you could add a Mechanism to check the current Page Nb, and after clicking on the 'Next' Button, if the Page Nb indeed increased by 1. But even after a Reload/Refresh, that will still not guarantee that the Page loaded "correctly" the 2nd time, and that the JS Buttons will work this time... Then you could add a Conditional 'PAUSE' or 'PROMPT' to alert you to take Action manually...

When I tried extracting the data of 160 pages, the first run copied 155 pages, another run 158 pages, and so on; it skips random pages. So what is the solution? Do I need to use the refresh command before every page extraction? OK, then I will use URL GOTO, but all the pages have the same URL here. As for the NEXT button, I have added that in column A as well. You may find it funny :D but I tried this method and it worked, so I did not research further how to automate NEXT.


chivracq wrote:
Oh yeah..., and my "Observations" were also made like you in [iMacros for FF v8.9.7 + FF55 + Win10], but also in [iMacros for FF v8.8.2 + PM26 + Win10], FCI that I use for most of my "Prod" Scripting...

OK, I am upgrading FF to 55.
jyotirmaya
Posts: 45
Joined: Wed Jul 27, 2016 6:25 pm

Re: iMacros skips some pages while extracting webpages, what should be the solution ?

Post by jyotirmaya » Tue Jun 22, 2021 3:04 pm

Actually chivracq, the website shows 10 columns besides the photo column. With the site's "Export to Excel" function I can get all the data in Excel format except the photo details. I want to find out how many "Photo Not Available" entries there are across the pages, and I want that information added as column 11 of the existing Excel file.
Each page shows the details of 10 persons.

[Attachment: imacro test new 2.png]

When I clicked on the photos on the page, I got code like this:

Code: Select all

TAG POS=1 TYPE=IMG ATTR=ID:imgPhoto
TAG POS=2 TYPE=IMG ATTR=ID:imgPhoto
TAG POS=1 TYPE=SPAN ATTR=TXT:Photo<SP>Not<SP>Available
TAG POS=3 TYPE=IMG ATTR=ID:imgPhoto
TAG POS=4 TYPE=IMG ATTR=ID:imgPhoto
TAG POS=5 TYPE=IMG ATTR=ID:imgPhoto
TAG POS=6 TYPE=IMG ATTR=ID:imgPhoto 

For an entry where the photo is not available, I got this code:

Code: Select all

TAG POS=1 TYPE=SPAN ATTR=TXT:Photo<SP>Not<SP>Available

Is what I am trying to do possible with iMacros?

My exported Excel file is in C:\Users\jyotirmaya\Downloads, and the file name is 1.xls.

Kindly guide me.
chivracq
Posts: 9929
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: iMacros skips some pages while extracting webpages, what should be the solution ?

Post by chivracq » Tue Jun 22, 2021 3:52 pm

Yep, Thanks for the Screenshot, already seen that... :D

>>>

"Ok I am upgrading the FF to 55"
=> Boah..., no "real" Need, if your Environment works fine with v8.9.7 for FF in FF48, updating FF to FF55 (v55.0.3 is the "last" FF Version working "correctly" with v8.9.7 btw) won't really make a Difference... And FF48 actually has one "Advantage" on FF55 if you use the 'FILTER' Command, that Command got broken in/from FF51...
Or you can install FF55 Portable maybe, like that you could keep your FF48 Env. if you want to "compare" both FCI's..., but you'll need to manually import your Profile(s), and they will get "desynchronized" if you add some new Logins/Passwords to remember or some new Bookmarks...
Possible to run both FCI's at the same time, but you need to launch the Portable one first...

>>>

"but all the pages have same URL here."
=> Arrgghhh...!, Grrr...!, that's exactly why I mentioned "if the Pages have their own URL's...", then hum..., not good, ah-ah...! :(

Then pfff..., I don't know..., I would need to have a Look myself... :oops:
Maybe the Data/Tables are displayed in their own Frame, and the URL on the Frame, or within the Frame will include the Page Nb for the Navigation... :idea:
Or try finding a Switch/Argument to use in the URL to go "directly" to some specific Page..., it will "often" be with stg like "?start=40"... :idea:

If you are already on say, P_41 like on your Screenshot, and you manually hit the 'Refresh' Button from the Browser, does the Page then stay on P_41, or does it get "completely" reloaded and starts again at P_1...?, with clicking on '10' + '20' + '30' + '40' as only Possibility to navigate to P_41 again...?
(I do have a "Workaround" for a Conditional 'REFRESH' in pure '.iim', but it's "a bit" cumbersome, ah-ah...!, it might then be easier to switch to a '.js' Implementation...)

If the Site is "Public", post the URL and I could have a Look at how it "behaves" exactly... :idea:
Or you can "fake-report" one Post in this Thread if you don't want to post the URL on the Forum, and I'm the only one who can see your "Report" (+ the Forum Admin), same if you need to include some Login & (Temp) Password... :idea:

>>>

Oh..., just about to post this Reply, but I see you've posted another Post in the meantime, ah-ah...! OK, I'm still posting this Reply, but it doesn't include your last Post...
chivracq
Posts: 9929
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: iMacros skips some pages while extracting webpages, what should be the solution ?

Post by chivracq » Tue Jun 22, 2021 5:00 pm

Alright, I understand...
So you already have one "full Export" of the whole Data (= the 1574 Records or the 158 Pages) where the "Photo" Col is missing. That 'Excel' Export/File is generated by the Site itself, and afaik, it's not possible to "manipulate" its Content before/while downloading it.
And iMacros also won't be able to "manipulate" it, or add a Col on the fly, iMacros can only append New Rows to an existing ('.csv') File (by Design).

... Which is why you have to "re-extract" the Content of the 158 Pages with iMacros to get the Data about the Photos, which you do at the 'TYPE=TABLE' Level, and I guess you "later" re-add manually that 11th Col to the "official" '.xls' File.
Could be done Cell by Cell for the 10 Rows per Page in order to only get a "small(er)" '.csv' 'SAVEAS' with only 1 Col, but I guess it won't be quicker, and if any Page is "missing", you won't know which one(s), and you'll have to run the whole Procedure again, oops...!

Then well..., back to navigating to a specific Page, I see a "Go" Button on your last Screenshot, Right-Above the Table, isn't it to navigate "directly" to a specific Page...? :idea:
jyotirmaya
Posts: 45
Joined: Wed Jul 27, 2016 6:25 pm

Re: iMacros skips some pages while extracting webpages, what should be the solution ?

Post by jyotirmaya » Wed Jun 23, 2021 5:55 am

First, thank you for your quick response and help. :) :) :) :) :D
Thank you for your valuable time.

chivracq wrote:
"but all the pages have same URL here."
=> Arrgghhh...!, Grrr...!, that's exactly why I mentioned "if the Pages have their own URL's...", then hum..., not good, ah-ah...! :(

Hmmm :( :(

chivracq wrote:
Then pfff..., I don't know..., I would need to have a Look myself... :oops:
Maybe the Data/Tables are displayed in their own Frame, and the URL on the Frame, or within the Frame will include the Page Nb for the Navigation... :idea:
Or try finding a Switch/Argument to use in the URL to go "directly" to some specific Page..., it will "often" be with stg like "?start=40"... :idea:

No, there are no such page numbers in the URL :( :(

chivracq wrote:
If you are already on say, P_41 like on your Screenshot, and you manually hit the 'Refresh' Button from the Browser, does the Page then stay on P_41, or does it get "completely" reloaded and starts again at P_1...?,

Yes, you guessed right: when I am on any page and click refresh, it goes back to page 1.

chivracq wrote:
with clicking on '10' + '20' + '30' + '40' as only Possibility to navigate to P_41 again...?
(I do have a "Workaround" for a Conditional 'REFRESH' in pure '.iim', but it's "a bit" cumbersome, ah-ah...!, it might then be easier to switch to a '.js' Implementation...)

Yes, after a refresh I need to get to the page again by clicking NEXT repeatedly. For example, if I am on page 105, I need to click NEXT 10 times and then click on page 105.

chivracq wrote:
If the Site is "Public", post the URL and I could have a Look at how it "behaves" exactly... :idea:
Or you can "fake-report" one Post in this Thread if you don't want to post the URL on the Forum, and I'm the only one who can see your "Report" (+ the Forum Admin), same if you need to include some Login & (Temp) Password... :idea:

Sorry, but I can't find any option to turn a post into a fake report :( :( :( or should I include the text "FAKE REPORT"? I did not understand. The site is open, but it uses two-factor authentication: first a login, then an OTP. Can I send the page saved as an HTML file instead?


chivracq wrote:
Alright, I understand...
So you already have one "full Export" of the whole Data (= the 1574 Records or the 158 Pages) where the "Photo" Col is missing. That 'Excel' Export/File is generated by the Site itself, and afaik, it's not possible to "manipulate" its Content before/while downloading it.
And iMacros also won't be able to "manipulate" it, or add a Col on the fly, iMacros can only append New Rows to an existing ('.csv') File (by Design).

Yes, exactly, I have 1574 records on 158 pages.

chivracq wrote:
... Which is why you have to "re-extract" the Content of the 158 Pages with iMacros to get the Data about the Photos, which you do at the 'TYPE=TABLE' Level, and I guess you "later" re-add manually that 11th Col to the "official" '.xls' File.

Yes, that is exactly what I am trying to do, but right now I am extracting all the data rather than only the photo column. So I think I should extract only the photo column information, keyed on the column 1 data: for example, if the column 1 text is "AAA", extract the photo information for that row, whether a photo is available or not. I already have the column 1 data. If I can get the data based on column 1 with an IF function, then I don't need to extract all the data. I want to use column 1 because it has unique values for all rows.

Column 1 looks like this on a page with 10 rows:

Code: Select all

TAG POS=1 TYPE=TD ATTR=TXT:KDN1434851
TAG POS=1 TYPE=TD ATTR=TXT:ABE0125575
TAG POS=1 TYPE=TD ATTR=TXT:ABE1348994
TAG POS=1 TYPE=TD ATTR=TXT:ABE0329789
TAG POS=1 TYPE=TD ATTR=TXT:OR/11/075/092177
TAG POS=1 TYPE=TD ATTR=TXT:OR/11/075/092179
TAG POS=1 TYPE=TD ATTR=TXT:KDN1434554
TAG POS=1 TYPE=TD ATTR=TXT:ABE0723163
TAG POS=1 TYPE=TD ATTR=TXT:OR/11/075/092182
TAG POS=1 TYPE=TD ATTR=TXT:ABE0666875
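
Extracting only the photo information per unique column 1 value might be sketched like this for a '.js' script. The assumptions are significant and unverified: each person is taken to be one table row (TR) whose text contains both the ID and, where applicable, the text "Photo Not Available"; if the site nests tables, POS=1 could match an outer row and the TAG line would need adjusting against the real page. `play` and `getExtract` are hypothetical wrappers around iMacros' `iimPlay("CODE:...")` and `iimGetLastExtract()`, and all function names and status labels are illustrative.

```javascript
// Build the macro that extracts the text of the row containing a given ID.
function buildRowMacro(id) {
  // Spaces must be written as <SP> inside a TAG attribute.
  var escaped = id.replace(/ /g, "<SP>");
  return "SET !EXTRACT_TEST_POPUP NO\n" +
         "TAG POS=1 TYPE=TR ATTR=TXT:*" + escaped + "* EXTRACT=TXT";
}

// Classify the extracted row text ("#EANF#" means the row was not found).
function photoStatus(rowText) {
  if (!rowText || rowText === "#EANF#") return "ROW NOT FOUND";
  return rowText.indexOf("Photo Not Available") !== -1 ? "NO PHOTO" : "PHOTO";
}

// Returns CSV lines "id","status" for the IDs expected on the current page.
// `play` and `getExtract` are the hypothetical iMacros wrappers (assumption).
function photoLinesForPage(play, getExtract, ids) {
  return ids.map(function (id) {
    play(buildRowMacro(id));
    return '"' + id + '","' + photoStatus(getExtract()) + '"';
  });
}
```

Because every ID gets a line, including "ROW NOT FOUND", a page that failed to load shows up in the output instead of disappearing silently.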

chivracq wrote:
Could be done Cell by Cell for the 10 Rows per Page in order to only get a "small(er)" '.csv' 'SAVEAS' with only 1 Col, but I guess it won't be quicker, and if any Page is "missing", you won't know which one(s), and you'll have to run the whole Procedure again, oops...!

Yes, OK, let it take time. It is also OK if some pages are missing, since I am getting 99% of the information; 100% would be better, but if the site is designed like that, what can we do? chivracq, if I cannot get all the pages, then I would like column 1 and the photo column extracted into columns A and B of a CSV file. I can then use that in my Excel file with the VLOOKUP function to add the photo information column to the existing Excel file. With both columns I can also find the missing pages by comparing.
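
The comparison and the VLOOKUP-style merge could be sketched as below. This is a minimal sketch with assumptions: the column 1 values from the site's export are in the same order as on the site, with 10 rows per page, and the extracted results are already available as an ID-to-status mapping. The function names and data shapes are hypothetical.

```javascript
// exportIds: column 1 values from the site's export, in the site's order.
// extracted: object mapping ID -> photo status collected during extraction.
// rowsPerPage: 10 in this thread.
function findMissing(exportIds, extracted, rowsPerPage) {
  var missingIds = [];
  var missingPages = [];
  exportIds.forEach(function (id, index) {
    if (!Object.prototype.hasOwnProperty.call(extracted, id)) {
      missingIds.push(id);
      // Page number derived from the row position (assumes same ordering).
      var page = Math.floor(index / rowsPerPage) + 1;
      if (missingPages.indexOf(page) === -1) missingPages.push(page);
    }
  });
  return { missingIds: missingIds, missingPages: missingPages };
}

// VLOOKUP-style lookup: the status for each export row, "" when unknown.
function photoColumn(exportIds, extracted) {
  return exportIds.map(function (id) {
    return Object.prototype.hasOwnProperty.call(extracted, id) ? extracted[id] : "";
  });
}
```

With the list of missing pages, only those pages would need to be extracted again rather than the whole set.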


chivracq wrote:
Then well..., back to navigating to a specific Page, I see a "Go" Button on your last Screenshot, Right-Above the Table, isn't it to navigate "directly" to a specific Page...? :idea:

No, that button does not navigate between pages. The Go button belongs to the list boxes: after selecting values in the list boxes and clicking GO, the site shows these 1574 records. Please help me find a better way to extract the data. :)