Scroll Ajax and Extract data


by bw13 on Sun Apr 03, 2016 3:50 am

iMacros for Firefox 9.0.0b2
WIN 10 64
FF 45.0.1

Say I want to get sneaker prices from a website where the results load dynamically, so I have to scroll down to get more data. I have attached my code.

My Code:

TAB T=1
TAB CLOSEALLOTHERS
SET !ERRORIGNORE YES
URL GOTO=https://stockx.com/sneakers/most-popular?view=list
WAIT SECONDS=2
TAG POS={{!LOOP}} TYPE=A ATTR=TXT:*&&HREF:*&&DATA-REACTID:.4.0.1.0.1.2.1.1.1.0:*
WAIT SECONDS=2
TAG POS=1 TYPE=TITLE ATTR=* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=Test.csv


If you run the code, it goes to the website, clicks into each sneaker, gets the title, goes back, gets the next one, and so on. But there are more sneakers in the list, and one has to scroll down to see them.

My question is how should I add in the scroll function to make it work?

URL GOTO=javascript:window.scrollBy(0,20000)

^^^ This one doesn't really work: it only scrolls down maybe 20 more rows, which doesn't help in the big picture.

Thanks for all of your help, time and effort!
bw13
 
Posts: 2
Joined: Sun Apr 03, 2016 3:38 am

Re: Scroll Ajax and Extract data

by chivracq on Sun Apr 03, 2016 6:42 am

Hum..., interesting to see that you are using the v9 Beta 2 version... It is supposed to still be pretty buggy... TechSup/Dev is gathering some feedback...

Well, I would say: start your macro with the page already loaded, and don't reload it on each loop, otherwise you "lose" the previous scrolling. I guess you can also lower the '20000' value so that every loop scrolls a little bit further down..., just make sure the scrolling stays ahead of the row height for when it is needed...
Hum, and you may need to disable 'Scroll to Object when found' in the iMacros Options, otherwise iMacros will make the page jump up and down continuously, which may give some undesirable results.
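A minimal, untested sketch of that first suggestion, assuming the list page is already open in the current tab and that you only need data visible on the list itself (no clicking into each sneaker). The 500 px step is a made-up value to tune to the row height, EXTRACT=TXT on the link is just an example of what could be extracted, and the DATA-REACTID value is copied from the original macro (it is probably specific to the page as it was at the time). Run it with Play (Loop):

```
'Sketch only: start with the list page already loaded in the current tab
'(no URL GOTO here, so the scrolling done in previous loops is kept)
SET !ERRORIGNORE YES
SET !EXTRACT_TEST_POPUP NO
'Extract the text of the n-th sneaker link directly from the list
TAG POS={{!LOOP}} TYPE=A ATTR=TXT:*&&HREF:*&&DATA-REACTID:.4.0.1.0.1.2.1.1.1.0:* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=Test.csv
'Scroll a little further on every loop (500 is a guess, adjust it to the row height)
URL GOTO=javascript:window.scrollBy(0,500)
WAIT SECONDS=1
```

If 'Scroll to Object when found' is left on, the TAG line itself may move the page, so the fixed scroll step may not behave as expected.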

Another approach is to use 'EVAL()' (+ 'ceil()', I guess) to compute the scroll value dynamically: '0' for most loops and '20000' only every 10 or 15 loops.
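A rough, untested sketch of that EVAL() idea, meant for the end of a macro that does not reload the list page. It uses the remainder operator '%' rather than 'ceil()', simply because "every 10th loop" maps directly onto n%10==0; the 10 and 20000 values are only examples:

```
'Sketch: scroll 20000 px on every 10th loop and 0 px on all other loops
'(10 and 20000 are example values; % is used here instead of ceil())
SET !VAR1 EVAL("var n={{!LOOP}}; var s=0; if (n%10==0) s=20000; s;")
URL GOTO=javascript:window.scrollBy(0,{{!VAR1}})
WAIT SECONDS=1
```

On loops 1 to 9 the scroll is 0 px, on loop 10 it is 20000 px, then 0 px again until loop 20, and so on.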
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6473
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Scroll Ajax and Extract data

by bw13 on Sun Apr 03, 2016 12:05 pm

Thank you, chivracq, for your prompt response!

I adopted the v9B version because it seemed I had no choice. I was also using the same code to extract a price table (though I didn't attach that code in the OP). IE was extremely slow and unreliable; iMacros for FF was very fast and reliable, but the table it pulled was ugly, all swamped into 1 cell/1 line. Supposedly that was due to some delimiter issue. V9B is buggy, but it's fast and can pull the table in the desired format.


I have a follow-up question about your comment "and don't reload it again for each Loop otherwise you 'lose' the previous Scrollings": how should I adjust my code accordingly? I have to click into each page/link in the results to scrape the actual data for each shoe (in the real setting I am scraping far more columns than just the title), and I have to "reload" the page because that is essentially the "return to the list and click the next result" process.

Thank you in advance, and sorry in advance as well for my poor description.

bw13
 

Re: Scroll Ajax and Extract data

by chivracq on Sun Apr 03, 2016 1:03 pm

bw13 wrote:
Thank you, chivracq, for your prompt response!

I adopted the v9B version because it seemed I had no choice. I was also using the same code to extract a price table (though I didn't attach that code in the OP). IE was extremely slow and unreliable; iMacros for FF was very fast and reliable, but the table it pulled was ugly, all swamped into 1 cell/1 line. Supposedly that was due to some delimiter issue. V9B is buggy, but it's fast and can pull the table in the desired format.

Ah OK, interesting difference in table extraction to know...
You could still use v8.9.6 and its "ugly" 1-cell extraction: there is an easy workaround, which is to manually delete in Notepad the very first (leading) and very last (trailing) double quotes in your 'SAVEAS' file. I also gave another way in some other thread maybe 2 weeks ago, which is to play with the 'Import CSV' options in Excel/OpenOffice Calc so that the data gets displayed in different cells and not just one cell...

bw13 wrote:
I have a follow-up question about your comment "and don't reload it again for each Loop otherwise you 'lose' the previous Scrollings": how should I adjust my code accordingly? I have to click into each page/link in the results to scrape the actual data for each shoe (in the real setting I am scraping far more columns than just the title), and I have to "reload" the page because that is essentially the "return to the list and click the next result" process.

Thank you in advance, and sorry in advance as well for my poor description.


OK, I had only run your script to load the page, and I didn't realize you were extracting after clicking on a row, which loads another page, so you then need to come back to the list, thus losing any previous scrolling...

To avoid losing the scrolling on the list (= main page), an easy way would be to run your macro on 2 tabs, the first tab for the list and the 2nd tab for each row and all extractions:
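TAB T=1
TAB CLOSEALLOTHERS
SET !ERRORIGNORE YES

'Start the macro with the list page already loaded in Tab_1 (no reload, so the scrolling is kept)
'URL GOTO=https://stockx.com/sneakers/most-popular?view=list
'WAIT SECONDS=2
'Extract the URL of the n-th sneaker instead of clicking on it
TAG POS={{!LOOP}} TYPE=A ATTR=TXT:*&&HREF:*&&DATA-REACTID:.4.0.1.0.1.2.1.1.1.0:* EXTRACT=HREF

'Open the sneaker page in a 2nd tab and do the extractions there
TAB OPEN
TAB T=2
URL GOTO={{!EXTRACT}}
'WAIT SECONDS=2
'The HREF is still in the extract buffer, so each CSV line gets the URL + the title
TAG POS=1 TYPE=TITLE ATTR=* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=Test.csv
TAB CLOSE

'Back to the list, scroll a bit further down for the next loops
TAB T=1
URL GOTO=javascript:window.scrollBy(0,1500)
'WAIT SECONDS=1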
(Not tested..., I've only checked that 'EXTRACT=HREF' indeed extracts the correct URL...)

The 'TAB CLOSE' is not really needed because of your 'TAB CLOSEALLOTHERS'. More efficient would be to disable 'TAB CLOSEALLOTHERS' and just make sure you have a tab available as Tab_2 when you start the macro on Tab_1 after you've loaded the page; then you don't need the constant 'TAB OPEN' + 'TAB CLOSE'...
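A sketch of that more efficient variant, untested and assuming Tab_1 holds the already loaded list and Tab_2 is already open when you press Play (Loop). It also copies the URL into !VAR1 and empties the extract buffer with SET !EXTRACT NULL, so that only the data extracted on the sneaker page ends up in Test.csv (drop those two lines if you want the URL saved as well). The 1500 px scroll step is taken from the macro above and may need tuning:

```
'Sketch: Tab_1 = list page already loaded, Tab_2 = already open before starting
'No TAB CLOSEALLOTHERS, so Tab_2 (and the scrolling in Tab_1) survive between loops
TAB T=1
SET !ERRORIGNORE YES
SET !EXTRACT_TEST_POPUP NO
TAG POS={{!LOOP}} TYPE=A ATTR=TXT:*&&HREF:*&&DATA-REACTID:.4.0.1.0.1.2.1.1.1.0:* EXTRACT=HREF
'Keep the URL in !VAR1 and empty the extract buffer so only the page data is saved
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL
'Load the sneaker page in the existing 2nd tab and extract from it
TAB T=2
URL GOTO={{!VAR1}}
TAG POS=1 TYPE=TITLE ATTR=* EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=Test.csv
'Back to the list and scroll a bit further for the next loops
TAB T=1
URL GOTO=javascript:window.scrollBy(0,1500)
WAIT SECONDS=1
```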
chivracq
 

