Stopping the word [EXTRACT] from appearing in extracted text


by jackofalltrades on Thu Feb 01, 2018 5:03 pm

Hi guys

Really simple question, hope it's equally simple to answer!

I am trying to extract a list of postcodes from a website as follows:

TAG POS=1 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
ADD !EXTRACT <BR><LF>
TAG POS=2 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
ADD !EXTRACT <BR><LF>
TAG POS=3 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT

But when I retrieve it, although I get the data with carriage returns, the word [EXTRACT] is included free of charge on every line!!

What do I need to do to fix it?

Thanks
Jack
jackofalltrades
 
Posts: 13
Joined: Wed Jan 24, 2018 9:37 am

Re: Stopping the word [EXTRACT] from appearing in extracted

by chivracq on Thu Feb 01, 2018 5:50 pm

Hum..., you should maybe first follow up on your previous Post from 1 week ago... :idea:
- (F)CIM = (Full) Config Info Missing: iMacros + Browser + OS with all 3 Versions...
- I usually don't even read the Question if that (required) Info is not mentioned...
- Script & URL usually help a lot for a more "educated" Help...
chivracq
 
Posts: 6967
Joined: Sat Apr 13, 2013 6:07 am
Location: Amsterdam (NL)

Re: Stopping the word [EXTRACT] from appearing in extracted

by jackofalltrades on Thu Feb 01, 2018 11:31 pm

Now replied to my previous thread :D

Re: Stopping the word [EXTRACT] from appearing in extracted

by chivracq on Fri Feb 02, 2018 6:31 am

jackofalltrades wrote:Now replied to my previous thread :D

Okay, good-good..., but before even checking it, you probably missed the "(F)CIM" part, which applies to this current Thread as well of course, and to all Threads you'll open or post for the first time in, actually...

Re: Stopping the word [EXTRACT] from appearing in extracted

by jackofalltrades on Sun Feb 04, 2018 5:03 am

Now fixed this by using VAR1, VAR2 etc.

TAG POS=14 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL

Now have a different problem - can't work out the correct syntax to ignore duplicates between a variable and the previous one (eg where VAR2 would be the same as VAR1).

So when I come to SET VAR2, I want to compare !EXTRACT with VAR1. If it's the same, I want to just set VAR2 as <BR><LF> so basically a carriage return. However where !EXTRACT is different from !VAR1, then I want to set !VAR2 to be the same as !EXTRACT.

The following doesn't work:

TAG POS=14 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL
TAG POS=15 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
SET !VAR2 EVAL("var s=\"{{!EXTRACT}}\";"var h=\"{{!VAR1}}\";var d=<BR><LF>;if((h=s) d; else h;)
SET !EXTRACT NULL

Any ideas please?
Last edited by jackofalltrades on Mon Feb 05, 2018 4:41 am, edited 1 time in total.

Re: Stopping the word [EXTRACT] from appearing in extracted

by chivracq on Sun Feb 04, 2018 2:54 pm

jackofalltrades wrote:Now fixed this by using VAR1, VAR2 etc.

Code:
TAG POS=14 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL


Now have a different problem - can't work out the correct syntax to ignore duplicates between a variable and the previous one (eg where VAR2 would be the same as VAR1).

So when I come to SET VAR2, I want to compare !EXTRACT with VAR1. If it's the same, I want to just set VAR2 as <BR><LF> so basically a carriage return. However where !EXTRACT is different from !VAR1, then I want to set !VAR2 to be the same as !EXTRACT.

The following doesn't work:

Code:
TAG POS=14 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL
TAG POS=15 TYPE=TD ATTR=CLASS:POTCODE&&TXT:* EXTRACT=TXT
SET !VAR2 EVAL("var s=\"{{!EXTRACT}}\";"var h=\"{{!VAR1}}\";var d=<BR><LF>;if((h=s) d; else h;)
SET !EXTRACT NULL


Any ideas please?

Hum, very good... Using a Temp Var is indeed one of the Solutions for your original Qt... :D

But sorry, for the rest, I won't help you until you mention your FCI, like I've already asked 3 or 4 times... This is the last time I react in one of your Threads, I guess, if mentioning 3 Versions about your Env. is so "complicated"... You could already have gotten your Answer(s) nearly 3 days ago, so I guess you are not in a hurry then... OK, fair enough... :shock:

Re: Stopping the word [EXTRACT] from appearing in extracted

by jackofalltrades on Sun Feb 04, 2018 5:10 pm

Hi Chivracq, sorry I missed the point here. I can't give details of the website as the relevant pages are behind passwords etc which I obviously can't give out.

However I think you mean the config details on my machine, as follows:

Windows 10
iMacros Browser V12.0.501.6698

Let me know if you need any other details :D

Re: Stopping the word [EXTRACT] from appearing in extracted

by chivracq on Sun Feb 04, 2018 7:21 pm

He-he...! Good, we finally have your FCI (Full Config Info)...!, so yep, that's what I meant... I don't answer Threads if that Required Info is not mentioned, many Commands are not supported in all Browsers/Versions, or got broken or only implemented from some specific Version...
I won't even react next time you open a Thread (or post for the first time in some existing Thread) if you don't include it in your OP...

So, OK, back to your Thread...
Concerning your first/original Qt: yep, like you've found yourself, using a "normal" Temp Var to handle the 'ADD' part is indeed a Solution. 'ADD' is supposed to be a "smart" Command that "behaves" differently depending on the Content of the Var(s) you use: it does a Mathematical Addition (or Subtraction) if it identifies the Content as a Number, and a String Concatenation if it identifies it as a String.
(The "-0" btw is buggy and is treated like a String...)

And when applied to the '!EXTRACT' Built-in Var, it indeed adds a '[EXTRACT]' as a kind of "Separator" that will serve to add an extra Cell when using the 'SAVEAS TYPE=EXTRACT' Mechanism...
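
To make the role of the '[EXTRACT]' Separator a bit more concrete, here is a minimal JavaScript sketch of how a multi-value '!EXTRACT' string could map to Cells in one saved line. It is only a model under assumptions, not iMacros' actual implementation: the function names `extractToCells` and `cellsToCsvLine`, the sample postcodes and the quoted, comma-separated output layout are all hypothetical.

```javascript
// Hypothetical model: '[EXTRACT]' separates the individual extracted values.
const EXTRACT_SEPARATOR = '[EXTRACT]';

// Split a multi-value extract string into one entry per extracted value.
function extractToCells(extractValue) {
  return extractValue.split(EXTRACT_SEPARATOR);
}

// Assumed output layout: each cell quoted, cells joined by commas.
function cellsToCsvLine(cells) {
  return cells.map(c => '"' + c.replace(/"/g, '""') + '"').join(',');
}

// Made-up sample values, standing in for three extracted postcodes.
const sampleExtract = 'AB1 2CD[EXTRACT]EF3 4GH[EXTRACT]IJ5 6KL';
const sampleCells = extractToCells(sampleExtract);
console.log(sampleCells);
console.log(cellsToCsvLine(sampleCells));
```

Seen this way, a 'ADD !EXTRACT <BR><LF>' does not just append a Line Break, it opens a new Cell, which is why the Separator shows up between the values.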

Another Method could have been to do a Global 'replace()' on the Full '!EXTRACT' at the end of all your Extracts, just before you want to reuse its Content. That could be done with just one 'EVAL()' Statement instead of the 3 or 4 Statements needed each time for each 'EXTRACT' in your Script by using a Temp Var.
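
A minimal sketch of that Global replace, written as plain JavaScript so the Logic can be checked outside a Macro. The function name `stripExtractSeparators` and the sample string are assumptions for illustration; the exact Layout of '!EXTRACT' after several 'TAG' + 'ADD' Statements, and the Escaping needed inside an 'EVAL()' String, would have to be verified in your own Env.

```javascript
// Remove every '[EXTRACT]' separator from a string (hypothetical helper).
// split/join is used instead of a RegExp so that no '[' or ']' escaping is needed.
function stripExtractSeparators(extractValue, replacement) {
  return extractValue.split('[EXTRACT]').join(replacement);
}

// Assumed layout: value, separator, line break, separator, next value.
const rawExtract = 'AB1 2CD[EXTRACT]\n[EXTRACT]EF3 4GH';
const cleanedExtract = stripExtractSeparators(rawExtract, '');
console.log(JSON.stringify(cleanedExtract));
```

The same Expression would presumably go in the body of one 'EVAL()' placed after the last Extract, with '{{!EXTRACT}}' as Input.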

OK, now concerning your 2nd Qt and your 'EVAL()' statement, I think I see a few Syntax Errors indeed and you use some sort of Syntax that I don't really know myself and therefore don't use personally... It might be correct, but before we go any further, can you confirm/demystify the Difference between the 'TAG POS=14' and 'TAG POS=15' Statement for the Class Name...?: :?
"CLASS:POSTCODE" <> "CLASS:POSCODE" (The 'T' is missing in the 2nd one...!)
Is there a Typo there or do you really have 2 different Class Names...?

And maybe that would help also if you mentioned the Error that you must be getting, even if I already have some "good idea" what might be wrong in your Syntax, or at least I know that "my" Syntax will work... :idea:

Re: Stopping the word [EXTRACT] from appearing in extracted

by jackofalltrades on Mon Feb 05, 2018 4:40 am

Great, many thanks for your help so far.

Sorry yes it’s a typo. Will correct it now. It’s only a typo on here not my actual code.
Last edited by jackofalltrades on Mon Feb 05, 2018 4:44 am, edited 1 time in total.

Re: Stopping the word [EXTRACT] from appearing in extracted

by jackofalltrades on Mon Feb 05, 2018 4:42 am

Just to add, ideally I also want a method to remove ANY duplicates, so eg if VAR4 is the same as VAR1

Re: Stopping the word [EXTRACT] from appearing in extracted

by chivracq on Mon Feb 05, 2018 9:18 am

jackofalltrades wrote:Great, many thanks for your help so far.

Sorry yes it’s a typo. Will correct it now. It’s only a typo on here not my actual code.

Hum, OK... Dunno how you managed to squeeze a Typo in the middle of an Attribute (especially the Class Name!) in a 'TAG' Statement, that's a place where you would only "work" with Copy&Paste I would think, especially as you already had the same Statement on 'POS=14' that was correct, and where the only Editing sometimes needed is to replace some parts with Wildcards, but you never need to (re)type manually any part normally... :?

I had asked if you could mention the RuntimeError you get, you seem to have a very "selective" Reading, you need to be a bit more "precise" on a Tech Forum, and a bit more "proactive" with Info... But OK, never mind...

jackofalltrades wrote:Just to add, ideally I also want a method to remove ANY duplicates, so eg if VAR4 is the same as VAR1

Ah...!, that's a bit of some "proactive" Info, good-good...! But..., that changes the "Scope" of your 2nd Qt, and you are now "surfing" a bit far away from your original Qt...! The Solution to this 3rd Qt would very probably be interesting for other Users, ... who very probably wouldn't be able to locate it here in the middle/end of this current Thread because the Thread Title is about something else...
=> I will still answer/follow up on Qt_2, but you can better open a new Thread for Qt_3, with a Descriptive Thread Title and all Required Info (=> FCI included...!) to make it a Standalone Thread... And I think that Qt btw has already been asked, I think I remember answering such a (similar) Thread already... (or doing the "Thinking" already, several Solutions come directly to my Mind...)
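
For reference, one possible approach to Qt_3 (it is not confirmed anywhere in this Thread) is to remember every value already seen and replace any later repeat. The sketch below is plain JavaScript; the function name `blankOutRepeats`, the use of a `Set` and the sample values are assumptions, and it only shows the Logic, not how to wire it into a Macro.

```javascript
// Replace any value that already appeared earlier in the list
// with a line break, keeping the first occurrence (hypothetical helper).
function blankOutRepeats(values, lineBreak) {
  const seen = new Set();
  return values.map(value => {
    if (seen.has(value)) {
      return lineBreak;
    }
    seen.add(value);
    return value;
  });
}

// Made-up values standing in for VAR1..VAR5.
const dedupedValues = blankOutRepeats(['A', 'B', 'A', 'C', 'B'], '\n');
console.log(dedupedValues);
```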

OK, concerning Qt_2 then, like I already said, you use a Syntax that I don't use myself (for the 'if/else'), maybe it's correct, but you didn't close the 'EVAL()' Statement correctly, so you can first try if this one works:
Code:
SET !VAR2 EVAL("var s=\"{{!EXTRACT}}\";"var h=\"{{!VAR1}}\";var d=<BR><LF>;if((h=s) d; else h;);")

Hum..., I'm afraid not...

OK, the Syntax I would use myself...:
Code:
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='<BR><LF>'; var z; if(s==h){z=d;} else{z=h;}; z;")

Not tested but the Syntax should be correct, I think...

The "<BR><LF>" part might prove to be a bit "problematic" because of its "special" Functionality that also applies inside an 'EVAL()' Statement I think, the '<'/'>' Chars might need to be escaped..., or it might work more easily if you first declare those Special Chars outside the 'EVAL()':
Code:
SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s==h){z=d;} else{z=h;}; z;")

... or:
Code:
SET BR_LF "\<BR\>\<LF\>"
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s==h){z=d;} else{z=h;}; z;")

I expect one of the last 2 to work, if the first one didn't work directly...

Well, good luck, and post the "Results"... :wink:

Re: Stopping the word [EXTRACT] from appearing in extracted

by jackofalltrades on Mon Feb 05, 2018 5:33 pm

Many thanks for your help, this one worked fine:

SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s==h){z=d;} else{z=h;}; z;")


Except I just had to change it to else{z=s;} instead of else{z=h;}
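
The corrected Logic (with else{z=s;}) can be written as a small standalone JavaScript function to check it outside the Macro. This is only a sketch: the name `pickValue` and the sample postcodes are made up, and `current`, `previous` and `lineBreak` stand in for '{{!EXTRACT}}', '{{!VAR1}}' and '{{BR_LF}}'.

```javascript
// Return a line break when the current value repeats the previous one,
// otherwise return the current value itself (the else{z=s;} branch).
function pickValue(current, previous, lineBreak) {
  if (current === previous) {
    return lineBreak;
  }
  return current;
}

console.log(JSON.stringify(pickValue('AB1 2CD', 'AB1 2CD', '\n')));
console.log(JSON.stringify(pickValue('EF3 4GH', 'AB1 2CD', '\n')));
```

With else{z=h;} the second call would have returned the previous value instead of the new one, which is why the change was needed.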

I will log a new post with the question about multiple recurrences of the same text.

I am also wondering how to replace #EANF# with a carriage return - again I will post that separately.

Thanks again. Really good.

Re: Stopping the word [EXTRACT] from appearing in extracted

by chivracq on Mon Feb 05, 2018 7:25 pm

OK, Thanks for the Feedback, and good to hear that one Solution at least works, Special Chars can sometimes be a bit tricky and often need some trial and error Attempts...

About the Change you had to make, yep..., I didn't change the "Logic" you had already in your 'if/else' Statement, I only converted "your" Syntax to a Syntax I use myself and that I know would work... and is easy to understand and to reuse/adapt for other Needs...

=> That very same 'EVAL()' can btw probably easily be modified a bit to include the extra '#EANF#' Condition with an extra 'if/else', something like:
Code:
SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s=='#EANF#'){z=d;} else if(s==h){z=d;} else{z=s;}; z;")

... or:
Code:
SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if((s=='#EANF#')||(s==h)){z=d;} else{z=s;}; z;")
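
The same combined Condition, applied to a whole run of extracted values instead of one pair, could look like the JavaScript sketch below. It is only an illustration under assumptions: the name `cleanSequence` is hypothetical, the sample values are made up, and each value is compared with the previous raw value (before any replacement), which may or may not match how the Vars are chained in the real Macro.

```javascript
// Marker used when an element is not found.
const NOT_FOUND_MARKER = '#EANF#';

// Replace '#EANF#' entries and immediate repeats with a line break.
function cleanSequence(rawValues, lineBreak) {
  const cleaned = [];
  let previousRaw = null; // raw value of the previous entry, if any
  for (const value of rawValues) {
    if (value === NOT_FOUND_MARKER || value === previousRaw) {
      cleaned.push(lineBreak);
    } else {
      cleaned.push(value);
    }
    previousRaw = value;
  }
  return cleaned;
}

const cleanedRun = cleanSequence(['A', 'A', '#EANF#', 'B', 'A'], '\n');
console.log(cleanedRun);
```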

Re: Stopping the word [EXTRACT] from appearing in extracted

by jackofalltrades on Tue Feb 06, 2018 2:50 pm

Excellent, that further code works perfectly as well, and removes the #EANF# entries.

SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s=='#EANF#'){z=d;} else if(s==h){z=d;} else{z=s;}; z;")

Many thanks again.

Re: Stopping the word [EXTRACT] from appearing in extracted

by chivracq on Fri Feb 09, 2018 8:08 pm

Yeah, was of course the Purpose...! :wink:

Still waiting for your (new) Thread about "avoiding" existing Elements in '!EXTRACT' btw... :(
Or did you find out by yourself how to handle that...? :D
The "Subject" is quite interesting btw, and even if you found a Solution by yourself, it would still be useful to open a Thread to share it, and there are several Solutions ah-ah...!, so I'm a bit "curious" which one you found, and how you implemented it of course...! :wink:

