Scrape elements which are generated by php code

wootshuska
Posts: 23
Joined: Mon Mar 07, 2016 12:04 am

Post by wootshuska » Fri Jan 06, 2017 12:02 pm

Hi,
I want to scrape content from a webpage, but the content is not visible in the page's source code. It's probably generated by a php script, so I can't find it by looking at the source code. Of course, I can click "inspect element" and see the HTML code of the elements I want to extract. I can also click on them.

Event clicks are very similar:

Code: Select all

EVENT TYPE=CLICK SELECTOR="HTML>BODY>DIV>DIV:nth-of-type(2)>DIV>DIV>DIV>DIV>DIV:nth-of-type(5)>DIV:nth-of-type(2)>DIV>DIV>DIV:nth-of-type(3)>DIV:nth-of-type(1)>DIV>TABLE>TBODY>TR>TD:nth-of-type(2)>DIV:nth-of-type(3)>DIV:nth-of-type(2)" BUTTON=0
EVENT TYPE=CLICK SELECTOR="HTML>BODY>DIV>DIV:nth-of-type(2)>DIV>DIV>DIV>DIV>DIV:nth-of-type(5)>DIV:nth-of-type(2)>DIV>DIV>DIV:nth-of-type(3)>DIV:nth-of-type(2)>DIV>TABLE>TBODY>TR>TD:nth-of-type(2)>DIV:nth-of-type(3)>DIV:nth-of-type(2)" BUTTON=0
EVENT TYPE=CLICK SELECTOR="HTML>BODY>DIV>DIV:nth-of-type(2)>DIV>DIV>DIV>DIV>DIV:nth-of-type(5)>DIV:nth-of-type(2)>DIV>DIV>DIV:nth-of-type(3)>DIV:nth-of-type(3)>DIV>TABLE>TBODY>TR>TD:nth-of-type(2)>DIV:nth-of-type(3)>DIV:nth-of-type(2)" BUTTON=0
EVENT TYPE=CLICK SELECTOR="HTML>BODY>DIV>DIV:nth-of-type(2)>DIV>DIV>DIV>DIV>DIV:nth-of-type(5)>DIV:nth-of-type(2)>DIV>DIV>DIV:nth-of-type(3)>DIV:nth-of-type(4)>DIV>TABLE>TBODY>TR>TD:nth-of-type(2)>DIV:nth-of-type(3)>DIV:nth-of-type(2)" BUTTON=0
As you can see, only one number changes between them. My first question: is it possible to extract the href from these selectors?
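Since only one index differs between the four recorded lines, the macro text can be generated rather than written out by hand. The following is a minimal sketch in JavaScript, the scripting language of iMacros for Firefox; the helper names `buildClickLine` and `buildClickMacro` are hypothetical, and the selector is simply the recorded one split around the changing number.

```javascript
// Selector recorded in the post, split around the single index that changes.
const SELECTOR_PREFIX = "HTML>BODY>DIV>DIV:nth-of-type(2)>DIV>DIV>DIV>DIV>DIV:nth-of-type(5)>DIV:nth-of-type(2)>DIV>DIV>DIV:nth-of-type(3)>DIV:nth-of-type(";
const SELECTOR_SUFFIX = ")>DIV>TABLE>TBODY>TR>TD:nth-of-type(2)>DIV:nth-of-type(3)>DIV:nth-of-type(2)";

// Hypothetical helper: returns the EVENT line for the n-th result (1-based).
function buildClickLine(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("result index must be a positive integer");
  }
  return 'EVENT TYPE=CLICK SELECTOR="' + SELECTOR_PREFIX + n + SELECTOR_SUFFIX + '" BUTTON=0';
}

// Hypothetical helper: returns macro text that clicks results 1..count, one line each.
function buildClickMacro(count) {
  const lines = [];
  for (let n = 1; n <= count; n++) {
    lines.push(buildClickLine(n));
  }
  return lines.join("\n");
}
```

In an iMacros `.js` script, the generated text could presumably be run with `iimPlay("CODE:" + buildClickMacro(4))`. Note that this only reproduces the clicks; it does not extract anything by itself.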

When I click "inspect element" and go to the element I want to extract, this is what I see:

Code: Select all

<div class="gs-per-result-labels" url="http://this-is-what-i-want-to-extract.html"></div>
I tried scraping by XPATH, but it does not seem to work (I think because iMacros can't find these elements in the source code). Do you have any tips for me?


MY OS:
OS: Windows 7 N service pack 1 64 bit
Intel Core i7-4700MQ
16gb ram
gt 755m

Firefox 50.1.0
iMacros for Firefox 9.0.3
VERSION BUILD=9030808 RECORDER=FX
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Scrape elements which are generated by php code

Post by chivracq » Fri Jan 06, 2017 1:37 pm

Hum, would be easier if you had provided the URL of your Page to have a look..., or an HTML 'Save As' of the Page uploaded to your Thread if it's behind Login & Password (zipped, max 256 KB), but OK...

If the 'EVENT' Mode is able to "see"/tag your Elements, I would expect the 'TAG' Mode to "see" them as well...
I have a (somewhat cumbersome!) way to "emulate" an 'EXTRACT=TXT' using the 'EVENT' Mode, but I'm not even sure that would work for your URL's...

What you can try:

Code: Select all

TAG POS=1 TYPE=DIV ATTR=CLASS:gs-per-result-labels EXTRACT=HREF
and/or:

Code: Select all

TAG POS=1 TYPE=DIV ATTR=CLASS:gs-per-result-labels EXTRACT=HTM
You can play with 'POS=n', but do those 2 Extract Statements already return something, i.e. not "#EANF#"? Especially for 'EXTRACT=HTM', I would expect your URL to be part of the extracted Data...
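If 'EXTRACT=HTM' does return the element's HTML including its attributes (an assumption, not verified against the Page in question), the URL still has to be cut out of that string. A minimal sketch in JavaScript follows; `extractUrlAttribute` is a hypothetical helper name, and the input is the DIV shown in the question.

```javascript
// Hypothetical helper: returns the value of the url="..." attribute found in an
// HTML fragment, or null when the fragment has no such attribute
// (for example when the extraction returned "#EANF#").
function extractUrlAttribute(html) {
  // Require whitespace before "url" so that names such as data-url do not match.
  const match = /\surl\s*=\s*"([^"]*)"/i.exec(html);
  return match ? match[1] : null;
}

// Example input: the element as shown by "inspect element" in the question.
const sampleHtml = '<div class="gs-per-result-labels" url="http://this-is-what-i-want-to-extract.html"></div>';
const extractedUrl = extractUrlAttribute(sampleHtml);
```

In an iMacros for Firefox `.js` script, the input string would presumably come from `iimGetLastExtract(1)` after playing the 'TAG ... EXTRACT=HTM' line; here it is hard-coded so the sketch runs on its own.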