same elements, different TAG POS=x across webpages. extract?


by peppe1 on Fri Mar 04, 2016 12:37 pm

Question: How can I extract data with iMacros from a website where the TAG POS=x of the same element varies between pages?

Dear iMacros community,

I wish to extract data from a website that contains multiple pages, searching the site for each keyword in a list defined in a .csv datasource.

iMacros should open each page in turn, grab certain elements on it, and save the data to a CSV file. The elements to be extracted are the same on all pages.

My problem is that the TAG POS=x of a given element does not stay the same from page to page.

E.g. on one page an HTML element is matched by
Code: Select all
TAG POS=95 TYPE=SPAN ATTR=* EXTRACT=TXT
, while on another page the same element changes to
Code: Select all
TAG POS=96 TYPE=SPAN ATTR=* EXTRACT=TXT

The only possibility I can think of would be to pick the elements by their text content.

Question:
Does a TXT attribute like TXT:Manufacturer (or possibly TXT:Manufacturer*) allow selecting the element without knowing the exact TAG POS=?

Is there another way to do this kind of extraction with iMacros (variable tag position for the same HTML element across pages)?

Thank you.
peppe1
 
Posts: 11
Joined: Fri Mar 04, 2016 12:32 pm

Re: same elements ,different TAG POS=x across webpages. extr

by chivracq on Fri Mar 04, 2016 1:07 pm

CIM...! :mrgreen:

Yep, using POS=95 or 96 with all Attributes muted with Wildcards is not very reliable...
You'll indeed need to specify some unique Attribute(s) in order to lower POS to POS=1 or to some constant Number, possibly with the Addition of Relative Positioning.
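
To illustrate the advice above, here is a minimal sketch. It assumes the wanted SPAN contains the label text "Manufacturer" (the label comes from the question; the page structure is a guess, since no URL was posted). Giving ATTR a distinguishing value instead of a bare wildcard lets POS stay constant on every page, and when label and value sit in separate elements, the label can serve as an anchor for Relative Positioning:

```
' Variant 1 (assumption: the SPAN's own text starts with "Manufacturer")
TAG POS=1 TYPE=SPAN ATTR=TXT:Manufacturer* EXTRACT=TXT

' Variant 2 (assumption: the value is in the next SPAN after the label)
' A TAG without EXTRACT clicks the element, which here only sets the anchor
TAG POS=1 TYPE=SPAN ATTR=TXT:Manufacturer*
TAG POS=R1 TYPE=SPAN ATTR=TXT:* EXTRACT=TXT
```

Which variant fits depends on the actual HTML, so the TYPE and ATTR values would need to be adjusted to the real page.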
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6485
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: same elements ,different TAG POS=x across webpages. extr

by peppe1 on Sat Mar 05, 2016 12:28 am

Thanks a lot for the fast reply and for the smart piece of advice.

I anchored on the H1 tag, which is common to all pages, and used Relative Positioning from it to match the other elements across different pages.
It works.

have a great weekend.

George

Re: same elements ,different TAG POS=x across webpages. extr

by chivracq on Sat Mar 05, 2016 2:25 pm

OK, glad it works. And nice that you follow up, even if you could have posted your Final Script as an Example for other Users...

Glad that I could help but you didn't understand/comply (with) "CIM", meaning I won't answer your other (future) Thread(s) until you do so (for all Threads), read my Sig... :idea:

Re: same elements ,different TAG POS=x across webpages. extr

by peppe1 on Sat Mar 05, 2016 3:14 pm

CIM:

OS: Windows 8.1 64 bit
iMacros 10.0.2

my solution:
Code: Select all
URL GOTO=<<Address>>
TAG POS=1 TYPE=H1 ATTR=TXT:*
TAG POS=R1 TYPE=LI ATTR=TXT:Description* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:Color* EXTRACT=TXT
TAG POS=R2 TYPE=P ATTR=* EXTRACT=TXT
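
For readers who want to combine this solution with the CSV datasource and CSV output described in the first post, a rough sketch follows. The file names pages.csv and results.csv, and the assumption that column 1 of the datasource holds the page address, are hypothetical and not taken from the original script. The macro is meant to be run with Play (Loop), so that each iteration reads the next datasource line:

```
' Assumption: pages.csv (hypothetical name) has one page address per line in column 1
SET !DATASOURCE pages.csv
' Read the datasource line that matches the current loop iteration
SET !DATASOURCE_LINE {{!LOOP}}
URL GOTO={{!COL1}}
' Anchor on the H1, then extract relative to it, as in the solution above
TAG POS=1 TYPE=H1 ATTR=TXT:*
TAG POS=R1 TYPE=LI ATTR=TXT:Description* EXTRACT=TXT
TAG POS=R1 TYPE=LI ATTR=TXT:Color* EXTRACT=TXT
TAG POS=R2 TYPE=P ATTR=* EXTRACT=TXT
' Append the extracted values as one line to results.csv (hypothetical name)
SAVEAS TYPE=EXTRACT FOLDER=* FILE=results.csv
```

FOLDER=* writes to the default iMacros download folder; each loop iteration adds one row with the three extracted values.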

Re: same elements ,different TAG POS=x across webpages. extr

by chivracq on Sat Mar 05, 2016 4:16 pm

Well-well-well, thread looks (close to) perfect now...! :D
And you've even mentioned your FCI in your other Thread, perfect...

And your Script is a beautiful Example, with 3 Relative Positioning Statements in a row, nice Example... :twisted:
(I don't use the Term(s) "Double (or Triple) Relative Positioning" that I reserve for some Technique that I use when you have to use 'R1' and 'R-1' after each other in order to get out of an HTML Element (usually a DIV) if you want to be able to use Relative Positioning within that Element, search my Posts on those Terms if you want to learn more about that Technique, that could come in handy one day, if you ever need it...)

And I said "close to perfect", as you may have noticed that a Thread Title only accepts a Max Number of Chars. Subsequent Replies to your Thread get "Re: " + Original Thread Title as their Title, which then gets truncated, and the Replies won't be found by the Search Engine on the Forum if the truncated last Word of your Title was an "important" Keyword.
And it's nice as well once a Thread has been neatly finished with a working Solution, to add '[Solved]' to your original Thread Title, which takes another 9 Chars...!, as it makes the Thread interesting for other Users to read when they are searching the Forum with a similar Question... But it won't help in this case, so never mind...

