extract DIV text with (maybe) relative positioning

florina

Post by florina » Tue Nov 20, 2018 11:28 am

Hello

I have the following structure:

<div class="left">
  <h2>title</h2>
  <div class="cls">
    <a>link</a>
    some text
  </div>
  <div class="cls">
    <a>link</a>
    some text
  </div>
  <div class="cls">
    <a>link</a>
    some text
  </div>
  <div class="cls">
    <a>link</a>
    some text
  </div>
</div>
(... the div.cls block repeats about 30 times inside div.left)

Unfortunately, <div class="left"> can appear in various places on the page, so I cannot use a fixed XPath to reach it or any of the divs inside it.
There's also no telling what comes after it. Lastly, <div class="cls"> also appears in various other places on the page.
So is there a way to extract the text of the divs inside div.left?
Ideally, is there a way to extract the text from body > div.content > div.left > div:nth-child(2), or (3), or (30)?
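A path does not have to start at body to be reliable. A relative XPath such as //div[@class='left']/div[@class='cls'] matches the div.cls elements that are direct children of div.left wherever div.left sits, and skips div.cls elements elsewhere on the page. The minimal sketch below illustrates that idea with Python's standard xml.etree.ElementTree on a made-up sample page. Assumptions: the sample markup and the function name left_cls_texts are hypothetical, the markup is well-formed XML (real pages often are not), and the class attributes are exactly "left" and "cls" (a multi-class value such as "left wide" would need contains(@class, ...) in full XPath).

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical sample page: div.left is not at a fixed position, and a
# div.cls also appears inside div.right (it must NOT be picked up).
PAGE = """
<body>
  <div class="content">
    <div class="title">Page title</div>
    <div class="right">
      <div class="cls"><a>other link</a> unrelated text</div>
    </div>
    <div class="left">
      <h2>title</h2>
      <div class="cls"><a>link 1</a> some text 1</div>
      <div class="cls"><a>link 2</a> some text 2</div>
      <div class="cls"><a>link 3</a> some text 3</div>
    </div>
  </div>
</body>
"""


def left_cls_texts(markup: str) -> list[str]:
    """Return the text of every div.cls that is a direct child of div.left."""
    root = ET.fromstring(markup.strip())
    texts = []
    # './/' searches at any depth, so wherever div.left sits does not matter;
    # the '/' step keeps only div.cls elements that are direct children of it.
    for div in root.findall(".//div[@class='left']/div[@class='cls']"):
        # itertext() yields the <a> text and the trailing "some text" in order;
        # split/join collapses the whitespace between them.
        texts.append(" ".join("".join(div.itertext()).split()))
    return texts


texts = left_cls_texts(PAGE)
print(texts)     # ['link 1 some text 1', 'link 2 some text 2', 'link 3 some text 3']
print(texts[1])  # link 2 some text 2
```

To get the 2nd, 3rd or 30th entry, index the returned list (texts[1], texts[2], texts[29]). Note that in CSS, div.left > div:nth-child(2) is the first div.cls, because the h2 counts as child 1.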

The whole page structure would be something like:

body
  content
    div.title
    div.intro text
    div.short abstract
    div.gallery
    div.left
    div.right
    div.links
    div.bio
    div.more text
  /content
/body
The only ones that always appear on every page are div.title and div.left.
chivracq

Re: extract DIV text with (maybe) relative positioning

Post by chivracq » Tue Nov 20, 2018 12:01 pm

CIM...! :mrgreen: (=> ... For me to have a look, read my Sig...)

Oh...!, but hum..., your previous Thread has been waiting for some Follow-up from your Side for about 1.5 years..., oops...! :shock:
You'll first need to follow up on and to "finish" your previous Thread "a bit correctly" for me to want to help you again..., and maybe you can bump this one in 1.5 years if you are still looking for a Solution then... :idea:
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
florina

Re: extract DIV text with (maybe) relative positioning

Post by florina » Tue Nov 20, 2018 1:23 pm

I'll check the last thread, but maybe not today.
So the answer to this problem, as far as I can tell, is to switch to Python. I'd rather not, but if it works it works :mrgreen:
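If the extraction does end up being done in Python, the minimal sketch below uses only the standard library's html.parser, which, unlike an XML parser, tolerates typical real-world HTML (unclosed void tags such as <br>, for example). It is an illustration under stated assumptions, not a tested scraper: the class LeftClsExtractor and the function extract_left_cls are hypothetical names, only <div> tags are tracked, every <div> is assumed to be closed, and a div.cls counts as "inside div.left" when div.left is its nearest enclosing <div>. Class values are split into tokens, so class="left main" still matches.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LeftClsExtractor(HTMLParser):
    """Collect the text of each div.cls whose nearest enclosing div is div.left."""

    def __init__(self) -> None:
        super().__init__()
        self.div_classes: list[set[str]] = []  # class tokens of each open <div>
        self.collect_depth: int | None = None  # stack depth of the div.cls being read
        self.buffer: list[str] = []
        self.results: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return  # only <div> tags go on the stack, so <a>, <h2>, <br> cannot unbalance it
        classes = set((dict(attrs).get("class") or "").split())
        parent = self.div_classes[-1] if self.div_classes else set()
        self.div_classes.append(classes)
        if self.collect_depth is None and "cls" in classes and "left" in parent:
            self.collect_depth = len(self.div_classes)
            self.buffer = []

    def handle_endtag(self, tag):
        if tag != "div" or not self.div_classes:
            return
        if self.collect_depth == len(self.div_classes):
            # Closing the div.cls being read: normalise whitespace and store it.
            self.results.append(" ".join("".join(self.buffer).split()))
            self.collect_depth = None
        self.div_classes.pop()

    def handle_data(self, data):
        if self.collect_depth is not None:
            self.buffer.append(data)


def extract_left_cls(html: str) -> list[str]:
    parser = LeftClsExtractor()
    parser.feed(html)
    parser.close()
    return parser.results


page = """
<div class="content">
  <div class="right"><div class="cls"><a>x</a> unrelated</div></div>
  <div class="left">
    <h2>title</h2>
    <div class="cls"><a>link</a> some text<br></div>
    <div class="cls"><a>link</a> more text</div>
  </div>
</div>
"""
print(extract_left_cls(page))  # ['link some text', 'link more text']
```

As in any list, the n-th entry is extract_left_cls(page)[n - 1]. Tracking only <div> tags keeps the bookkeeping simple; the trade-off is that missing </div> tags on a badly broken page would throw the depth count off.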
chivracq

Re: extract DIV text with (maybe) relative positioning

Post by chivracq » Tue Nov 20, 2018 1:31 pm

OK, I'll see your Update (in the previous Thread)...

For this current one, no idea, I'll only read it after you have handled the previous one and mentioned your FCI in this one..., but I don't see any correlation between extracting a 'DIV' with iMacros (Standard and Core Functionality for iMacros) and switching to Python...! Sounds like "killing a fly with a bazooka" to me...! :shock: