extract DIV text with (maybe) relative positioning

florina
Posts: 8
Joined: Mon Jun 19, 2017 12:33 pm

extract DIV text with (maybe) relative positioning

Post by florina » Tue Nov 20, 2018 11:28 am

Hello

I have the following structure:

<div class="left">
  <h2>title</h2>
  <div class="cls">
    <a>link</a>
    some text
  </div>
  <div class="cls">
    <a>link</a>
    some text
  </div>
  <div class="cls">
    <a>link</a>
    some text
  </div>
  <div class="cls">
    <a>link</a>
    some text
  </div>
</div>
... about 30 times

Unfortunately, <div class="left"> can appear in various places on the page, so I cannot use XPath to get to it or to any of the divs inside it.
There's also no telling what comes after it. Lastly, <div class="cls"> can also be found in various other places on the page.
So is there a way to extract the text of the divs inside <div class="left">?
Ideally, is there a way to say: extract the text from body > div.content > div.left > div:nth-child(2), or (3), or (30)?

The whole structure would be something like:

body
  content
    div.title
    div.intro text
    div.short abstract
    div.gallery
    div.left
    div.right
    div.links
    div.bio
    div.more text
  /content
/body
The only ones that always appear on every page are div.title and div.left.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: extract DIV text with (maybe) relative positioning

Post by chivracq » Tue Nov 20, 2018 12:01 pm

CIM...! :mrgreen: (=> ... For me to have a look, read my Sig...)

Oh...!, but hum..., your previous Thread has been waiting for some Follow-up from your Side for about 1.5 years..., oops...! :shock:
You'll first need to follow up on and to "finish" your previous Thread "a bit correctly" for me to want to help you again..., and maybe you can bump this one in 1.5 years if you are still looking for a Solution then... :idea:
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
florina
Posts: 8
Joined: Mon Jun 19, 2017 12:33 pm

Re: extract DIV text with (maybe) relative positioning

Post by florina » Tue Nov 20, 2018 1:23 pm

I'll check the last thread, but maybe not today.
So the answer for this problem, as far as I can tell, is to switch to Python. I'd rather not, but if it works it works :mrgreen:
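
For reference, here is a minimal sketch of what the Python route might look like, using only the standard library's html.parser. It is not a tested solution for the real site: the class name LeftDivExtractor, the SAMPLE markup and the assumption that the page's tags are properly nested are all mine, made up for illustration. It collects the text of every div.cls whose direct parent is div.left, wherever div.left sits in the page, and ignores div.cls elements found elsewhere.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LeftDivExtractor(HTMLParser):
    """Collect the text of each div.cls that is a direct child of div.left.

    Hypothetical helper; assumes the markup is properly nested.
    """

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, class list) for every currently open element
        self.results = []   # one cleaned-up text string per matching div.cls
        self._buf = None    # text chunks of the div.cls being read, if any
        self._depth = None  # stack depth at which that div.cls was opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        parent = self.stack[-1] if self.stack else None
        self.stack.append((tag, classes))
        is_target = tag == "div" and "cls" in classes
        parent_is_left = (parent is not None and parent[0] == "div"
                          and "left" in parent[1])
        if self._buf is None and is_target and parent_is_left:
            self._buf = []
            self._depth = len(self.stack)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        # Closing the div.cls we are reading: store its whitespace-normalized text.
        if self._buf is not None and len(self.stack) == self._depth:
            self.results.append(" ".join(" ".join(self._buf).split()))
            self._buf = None
        self.stack.pop()

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


# Made-up sample page: one div.cls outside div.left, two inside it.
SAMPLE = """
<body><div class="content">
  <div class="title">page title</div>
  <div class="cls"><a>other</a> not wanted</div>
  <div class="left">
    <h2>title</h2>
    <div class="cls"><a>link</a> first text</div>
    <div class="cls"><a>link</a> second text</div>
  </div>
</div></body>
"""

parser = LeftDivExtractor()
parser.feed(SAMPLE)
print(parser.results)
```

Note that parser.results is indexed from zero and counts only the div.cls children, so parser.results[0] corresponds to div:nth-child(2) in the structure above, because the h2 is the first child of div.left.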
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: extract DIV text with (maybe) relative positioning

Post by chivracq » Tue Nov 20, 2018 1:31 pm

OK, I'll see your Update (in the previous Thread)...

For this current one, no idea, I'll only read it after you have handled the previous one, and mentioned your FCI in this one..., but I don't see any correlation between extracting a 'DIV' with iMacros (Standard and Core Functionality for iMacros) and switching to Python...! Sounds like "killing a fly with a bazooka" to me...! :shock: