Stopping the word [EXTRACT] from appearing in extracted text

Discussions and Tech Support related to website data extraction, screen scraping and data mining using iMacros.
Forum rules
iMacros EOL - Attention!

The renewal maintenance has officially ended for Progress iMacros effective November 20, 2023 and all versions of iMacros are now considered EOL (End-of-Life). The iMacros products will no longer be supported by Progress (aside from customer license issues), and these forums will also no longer be moderated from the Progress side.

Thank you again for your business and support.

Sincerely,
The Progress Team

Before asking a question or reporting an issue:
1. Please review the list of FAQs.
2. Use the search box (at the top of each forum page) to see if a similar problem or question has already been addressed.
3. Try searching the iMacros Wiki - it contains the complete iMacros reference as well as plenty of samples and tutorials.
4. We can respond much faster to your posts if you include the required information listed in the forum's posting guidelines.
jackofalltrades
Posts: 13
Joined: Wed Jan 24, 2018 4:37 pm

Stopping the word [EXTRACT] from appearing in extracted text

Post by jackofalltrades » Fri Feb 02, 2018 12:03 am

Hi guys

Really simple question, hope it's equally simple to answer!

I am trying to extract a list of postcodes from a website as follows:

TAG POS=1 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
ADD !EXTRACT <BR><LF>
TAG POS=2 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
ADD !EXTRACT <BR><LF>
TAG POS=3 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT

But when I retrieve it, although I get the data with carriage returns, the word [EXTRACT] is included free of charge on every line!!

What do I need to do to fix it?

Thanks
Jack
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Post by chivracq » Fri Feb 02, 2018 12:50 am

jackofalltrades wrote:
Hi guys

Really simple question, hope it's equally simple to answer!

I am trying to extract a list of postcodes from a website as follows:

TAG POS=1 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
ADD !EXTRACT <BR><LF>
TAG POS=2 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
ADD !EXTRACT <BR><LF>
TAG POS=3 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT

But when I retrieve it, although I get the data with carriage returns, the word [EXTRACT] is included free of charge on every line!!

What do I need to fix it?

Thanks
Jack

Hum..., you should maybe first follow up on your previous Post from 1 week ago... :idea:
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
jackofalltrades

Post by jackofalltrades » Fri Feb 02, 2018 6:31 am

Now replied to my previous thread :D
chivracq

Post by chivracq » Fri Feb 02, 2018 1:31 pm

jackofalltrades wrote:
Now replied to my previous thread :D

Okay, good-good..., but before even checking it: you probably missed the "(F)CIM" part, which of course applies to this current Thread as well, and actually to all Threads you'll open or post in for the first time...
jackofalltrades

Post by jackofalltrades » Sun Feb 04, 2018 12:03 pm

Now fixed this by using VAR1, VAR2 etc.

TAG POS=14 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL

Now have a different problem - can't work out the correct syntax to ignore duplicates between a variable and the previous one (e.g. where !VAR2 would be the same as !VAR1).

So when I come to SET !VAR2, I want to compare !EXTRACT with !VAR1. If it's the same, I want to just set !VAR2 to <BR><LF>, so basically a line break. However, where !EXTRACT is different from !VAR1, I want to set !VAR2 to be the same as !EXTRACT.
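The comparison described above can be sketched in plain JavaScript, the language that 'EVAL()' runs. This is a minimal sketch, not iMacros code: the function name `dedupeAgainstPrevious` and the sample postcode values are hypothetical, and the line break is passed in as a string instead of coming from a macro variable. Note the `===` comparison: the EVAL attempt in this post uses `h=s`, which assigns rather than compares.

```javascript
// Hypothetical helper mirroring the logic described in the post:
// if the new extract repeats the previous one, emit only a line break;
// otherwise keep the new extract.
function dedupeAgainstPrevious(current, previous, lineBreak) {
  // "===" compares; a single "=" (as in h=s) would assign instead.
  return current === previous ? lineBreak : current;
}

// Made-up sample values standing in for !EXTRACT and !VAR1.
console.log(dedupeAgainstPrevious("AB1 2CD", "AB1 2CD", "<BR><LF>")); // <BR><LF>
console.log(dedupeAgainstPrevious("EF3 4GH", "AB1 2CD", "<BR><LF>")); // EF3 4GH
```

The key point is the return value: the repeat case yields the line break, and the non-repeat case yields the new value (`current`), not the previous one.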

The following doesn't work:

TAG POS=14 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL
TAG POS=15 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
SET !VAR2 EVAL("var s=\"{{!EXTRACT}}\";"var h=\"{{!VAR1}}\";var d=<BR><LF>;if((h=s) d; else h;)
SET !EXTRACT NULL

Any ideas please?
Last edited by jackofalltrades on Mon Feb 05, 2018 11:41 am, edited 1 time in total.
chivracq

Post by chivracq » Sun Feb 04, 2018 9:54 pm

jackofalltrades wrote:
Now fixed this by using VAR1, VAR2 etc.

TAG POS=14 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL

Now have a different problem - can't work out the correct syntax to ignore duplicates between a variable and the previous one (eg where VAR2 would be the same as VAR1).

So when I come to SET VAR2, I want to compare !EXTRACT with VAR1. If it's the same, I want to just set VAR2 as <BR><LF> so basically a carriage return. However where !EXTRACT is different from !VAR1, then I want to set !VAR2 to be the same as !EXTRACT.

The following doesn't work:

TAG POS=14 TYPE=TD ATTR=CLASS:POSTCODE&&TXT:* EXTRACT=TXT
SET !VAR1 {{!EXTRACT}}
SET !EXTRACT NULL
TAG POS=15 TYPE=TD ATTR=CLASS:POTCODE&&TXT:* EXTRACT=TXT
SET !VAR2 EVAL("var s=\"{{!EXTRACT}}\";"var h=\"{{!VAR1}}\";var d=<BR><LF>;if((h=s) d; else h;)
SET !EXTRACT NULL

Any ideas please?

Hum, very good... Using a Temp Var is indeed one of the Solutions for your original Qt... :D

But sorry, for the rest, I won't help you until you mention your FCI, as I've already asked 3 or 4 times... This is the last time I react in one of your Threads, I guess, if mentioning 3 Versions about your Env. is so "complicated"... You could already have gotten your Answer(s) nearly 3 days ago..., so I guess you are not in a hurry then..., OK, fair enough... :shock:
jackofalltrades

Post by jackofalltrades » Mon Feb 05, 2018 12:10 am

Hi Chivracq, sorry I missed the point here. I can't give details of the website, as the relevant pages are behind passwords etc., which I obviously can't give out.

However I think you mean the config details on my machine, as follows:

Windows 10
iMacros Browser V12.0.501.6698

Let me know if you need any other details :D
chivracq

Post by chivracq » Mon Feb 05, 2018 2:21 am

jackofalltrades wrote:
Hi Chivracq, sorry I missed the point here. I can't give details of the website as the relevant pages are behind passwords etc which I obviously can't give out.

However I think you mean the config details on my machine, as follows:

Windows 10
iMacros Browser V12.0.501.6698

Let me know if you need any other details :D

He-he...! Good, we finally have your FCI (Full Config Info)...!, so yep, that's what I meant... I don't answer Threads if that Required Info is not mentioned, because many Commands are not supported in all Browsers/Versions, or got broken, or were only implemented from some specific Version...
I won't even react next time you open a Thread (or post for the first time in some existing Thread) if you don't include it in your OP...

So, OK, back to your Thread...
Concerning your first/original Qt: yep, as you've found yourself, using a Temp "normal" Var to handle the 'ADD' part is indeed a Solution, because this Command is supposed to be a "smart" Command that "behaves" differently depending on the "Type" of Var(s) you are using and their Content: if it identifies them as Numbers, it does a Mathematical Addition or Subtraction, and if it identifies them as Strings, it does a String Concatenation.
(The "-0" case btw is buggy and gets treated like a String...)

And when applied to the '!EXTRACT' Built-in Var, it indeed adds a '[EXTRACT]' as a kind of "Separator", which serves to start an extra Cell when using the 'SAVEAS TYPE=EXTRACT' Mechanism...
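To make the two rules above concrete, here is a deliberately simplified, hypothetical model in JavaScript. It is not taken from iMacros and ignores edge cases such as the "-0" bug; `modelAdd` and `isNumeric` are illustrative names only. It just shows why appending to '!EXTRACT' produces a '[EXTRACT]' marker while appending to a normal Var does not.

```javascript
// Hypothetical, simplified model of ADD as described in this thread.
// Not iMacros source code: it only illustrates the two rules.
function isNumeric(s) {
  // Treat non-empty strings that parse as a number as numeric.
  return s.trim() !== "" && !Number.isNaN(Number(s));
}

function modelAdd(varName, current, value) {
  if (varName === "!EXTRACT") {
    // Each new piece of !EXTRACT starts a new "cell",
    // so a "[EXTRACT]" separator is placed in front of it.
    return current === "" ? value : current + "[EXTRACT]" + value;
  }
  if (isNumeric(current) && isNumeric(value)) {
    // Numbers are added arithmetically.
    return String(Number(current) + Number(value));
  }
  // Anything else is concatenated as text.
  return current + value;
}

console.log(modelAdd("!VAR1", "2", "3"));                 // 5
console.log(modelAdd("!VAR1", "AB1", " 2CD"));            // AB1 2CD
console.log(modelAdd("!EXTRACT", "AB1 2CD", "<BR><LF>")); // AB1 2CD[EXTRACT]<BR><LF>
```

In this model, copying the extract into a normal Var first (the Temp Var workaround) and building the text there never goes through the '!EXTRACT' branch, so no separator appears.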

Another Method could have been to do a Global 'replace()' on the Full '!EXTRACT' at the end of all your Extracts, just before you want to reuse its Content. That could be done with just one 'EVAL()' Statement, instead of the 3 or 4 Statements the Temp Var approach needs for each 'EXTRACT' in your Script.
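The single Global 'replace()' idea can be sketched as plain JavaScript. `stripExtractSeparators` is a hypothetical name, and the sample string only imitates what '!EXTRACT' might hold; wrapping this logic in an iMacros 'EVAL()' around a quoted '{{!EXTRACT}}' is not tested here. Using split/join instead of a regular expression avoids escaping the square brackets, which may be simpler inside an 'EVAL()' string.

```javascript
// Remove every "[EXTRACT]" separator in one pass.
// split/join needs no regex, so the square brackets need no escaping;
// the regex equivalent would be s.replace(/\[EXTRACT\]/g, "").
function stripExtractSeparators(extracted) {
  return extracted.split("[EXTRACT]").join("");
}

// Made-up sample resembling two extracts, each followed by an ADDed line break.
const sample = "AB1 2CD[EXTRACT]<BR><LF>[EXTRACT]EF3 4GH[EXTRACT]<BR><LF>";
console.log(stripExtractSeparators(sample)); // AB1 2CD<BR><LF>EF3 4GH<BR><LF>
```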

OK, now concerning your 2nd Qt and your 'EVAL()' Statement: I think I see a few Syntax Errors indeed, and you use some sort of Syntax that I don't really know myself and therefore don't use personally... It might be correct, but before we go any further, can you confirm/demystify the Difference between the 'TAG POS=14' and 'TAG POS=15' Statements for the Class Name...?: :?
"CLASS:POSTCODE" <> "CLASS:POTCODE" (The 'S' is missing in the 2nd one...!)
Is there a Typo there or do you really have 2 different Class Names...?

And it would also help if you mentioned the Error that you must be getting, even if I already have some "good idea" of what might be wrong in your Syntax, or at least I know that "my" Syntax will work... :idea:
jackofalltrades

Post by jackofalltrades » Mon Feb 05, 2018 11:40 am

Great, many thanks for your help so far.

Sorry, yes, it’s a typo. I will correct it now. It’s only a typo on here, not in my actual code.
Last edited by jackofalltrades on Mon Feb 05, 2018 11:44 am, edited 1 time in total.
jackofalltrades

Post by jackofalltrades » Mon Feb 05, 2018 11:42 am

Just to add, ideally I also want a method to remove ANY duplicates, e.g. if !VAR4 is the same as !VAR1.
chivracq

Post by chivracq » Mon Feb 05, 2018 4:18 pm

jackofalltrades wrote:
Great, many thanks for your help so far.

Sorry yes it’s a typo. Will correct it now. It’s only a typo on here not my actual code.

Hum, OK... Dunno how you managed to squeeze a Typo into the middle of an Attribute (especially the Class Name!) in a 'TAG' Statement. That's a place where you would normally only "work" with Copy&Paste, I would think, especially as you already had the same Statement on 'POS=14' that was correct. The only Editing sometimes needed there is replacing some parts with Wildcards; you normally never need to (re)type any part manually... :?

I had asked if you could mention the RuntimeError you get. You seem to have a very "selective" Reading; you need to be a bit more "precise" on a Tech Forum, and a bit more "proactive" with Info... But OK, never mind...

jackofalltrades wrote:
Just to add, ideally I also want a method to remove ANY duplicates, so eg if VAR4 is the same as VAR1

Ah...!, that's a bit of "proactive" Info, good-good...! But..., that changes the "Scope" of your 2nd Qt, and you are now "surfing" a bit far away from your original Qt...! The Solution to this 3rd Qt would very probably be interesting for other Users, ... who very probably wouldn't be able to locate it here in the middle/end of this current Thread, because the Thread Title is about something else...
=> I will still answer/follow up on Qt_2, but you'd better open a new Thread for Qt_3, with a Descriptive Thread Title and all Required Info (=> FCI included...!) to make it a Standalone Thread... And I think that Qt btw has already been asked; I think I remember answering a similar Thread already... (or at least doing the "Thinking" already, several Solutions come directly to my Mind...)
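Several Solutions exist for removing ANY duplicates; one of them, sketched here in JavaScript with a `Set`, keeps only the first occurrence of each value. This is an illustrative sketch under assumptions, not the Solution from any follow-up Thread: `uniqueExtracts` is a hypothetical name, the input is assumed to be a '[EXTRACT]'-separated string like '!EXTRACT', and empty pieces are dropped.

```javascript
// Keep the first occurrence of each extracted value, in original order.
// Input is assumed to be "[EXTRACT]"-separated, as !EXTRACT is.
function uniqueExtracts(extracted) {
  const seen = new Set();
  const kept = [];
  for (const piece of extracted.split("[EXTRACT]")) {
    const value = piece.trim();
    // Skip empty pieces and anything already kept.
    if (value === "" || seen.has(value)) continue;
    seen.add(value);
    kept.push(value);
  }
  return kept;
}

// Made-up sample where the 4th value repeats the 1st (like !VAR4 == !VAR1).
const raw = "AB1 2CD[EXTRACT]EF3 4GH[EXTRACT]IJ5 6KL[EXTRACT]AB1 2CD";
console.log(uniqueExtracts(raw).join(", ")); // AB1 2CD, EF3 4GH, IJ5 6KL
```

Unlike the compare-with-previous approach, a `Set` catches a repeat no matter how far back the first occurrence was.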

OK, concerning Qt_2 then: as I already said, you use a Syntax that I don't use myself (for the 'if/else'). Maybe it's correct, but you didn't close the 'EVAL()' Statement correctly, so you can first try whether this one works:

SET !VAR2 EVAL("var s=\"{{!EXTRACT}}\";"var h=\"{{!VAR1}}\";var d=<BR><LF>;if((h=s) d; else h;);")

Hum..., I'm afraid not...

OK, the Syntax I would use myself...:

Code: Select all

SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='<BR><LF>'; var z; if(s==h){z=d;} else{z=h;}; z;")

Not tested but the Syntax should be correct, I think...

The "<BR><LF>" part might prove to be a bit "problematic" because of its "special" Functionality, which I think also applies inside an 'EVAL()' Statement. The '<'/'>' Chars might need to be escaped..., or it might work more easily if you first declare those Special Chars outside the 'EVAL()':

SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s==h){z=d;} else{z=h;}; z;")

... or:

SET BR_LF "\<BR\>\<LF\>"
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s==h){z=d;} else{z=h;}; z;")

I expect one of the last 2 to work, if the first one didn't work directly...

Well, good luck, and post the "Results"... :wink:
jackofalltrades

Post by jackofalltrades » Tue Feb 06, 2018 12:33 am

Many thanks for your help, this one worked fine:

SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s==h){z=d;} else{z=h;}; z;")


Except I just had to change it to else{z=s;} instead of else{z=h;}

I will log a new post with the question about multiple recurrences of the same text.

I am also wondering how to replace #EANF# with a carriage return - again I will post that separately.

Thanks again. Really good.
chivracq

Post by chivracq » Tue Feb 06, 2018 2:25 am

jackofalltrades wrote:
Many thanks for your help, this one worked fine:

SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s==h){z=d;} else{z=h;}; z;")

Except I just had to change it to else{z=s;} instead of else{z=h;}

I will log a new post with the question about multiple recurrences of the same text.

I am also wondering how to replace #EANF# with a carriage return - again I will post that separately.

Thanks again. Really good.

OK, Thanks for the Feedback, and good to hear that at least one Solution works. Special Chars can sometimes be a bit tricky and often need some trial-and-error Attempts...

About the Change you had to make, yep..., I didn't change the "Logic" you already had in your 'if/else' Statement (its 'else' branch returned 'h', the previous value, instead of 's', the new one); I only converted "your" Syntax to a Syntax I use myself, that I know would work, and that is easy to understand and to reuse/adapt for other Needs...

=> That very same 'EVAL()' can btw probably easily be modified a bit to include the extra '#EANF#' Condition with an extra 'if/else', something like:

SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s=='#EANF#'){z=d;} else if(s==h){z=d;} else{z=s;}; z;")

... or:

SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if((s=='#EANF#')||(s==h)){z=d;} else{z=s;}; z;")
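To see what the combined condition does across a whole run of extracts, here is a JavaScript sketch. It is hypothetical: `cleanSequence` and the sample values are illustrative, and it assumes each value is compared with the previous raw extract (the role '!VAR1' plays above). '#EANF#' is the placeholder iMacros returns when the element to extract is not found.

```javascript
// Apply the rule from the EVAL above to a list of extracts:
// "#EANF#" (element not found) or a repeat of the previous extract
// becomes a line break; anything else is kept unchanged.
function cleanSequence(values, lineBreak) {
  const out = [];
  let previous = null; // nothing to compare against for the first value
  for (const s of values) {
    out.push(s === "#EANF#" || s === previous ? lineBreak : s);
    previous = s; // compare with the raw previous extract, not the output
  }
  return out;
}

const run = ["AB1 2CD", "AB1 2CD", "#EANF#", "EF3 4GH"];
console.log(cleanSequence(run, "<BR><LF>").join(" | "));
// AB1 2CD | <BR><LF> | <BR><LF> | EF3 4GH
```

Both variants posted above (nested 'else if' or a single '||') express the same rule; the sketch uses the '||' form.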
jackofalltrades

Post by jackofalltrades » Tue Feb 06, 2018 9:50 pm

Excellent, that further code works perfectly as well, and removes the #EANF# entries.

SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s=='#EANF#'){z=d;} else if(s==h){z=d;} else{z=s;}; z;")

Many thanks again.
chivracq

Post by chivracq » Sat Feb 10, 2018 3:08 am

jackofalltrades wrote:
Excellent, that further code works perfectly as well, and removes the #EANF# entries.

SET BR_LF <BR><LF>
SET !VAR2 EVAL("var s='{{!EXTRACT}}', h='{{!VAR1}}', d='{{BR_LF}}'; var z; if(s=='#EANF#'){z=d;} else if(s==h){z=d;} else{z=s;}; z;")

Many thanks again.

Yeah, that was of course the Purpose...! :wink:

Still waiting for your (new) Thread about "avoiding" existing Elements in '!EXTRACT' btw... :(
Or did you find out by yourself how to handle that...? :D
The "Subject" is quite interesting btw, and even if you found a Solution by yourself, it would still be useful to open a Thread to share it, and there are several Solutions ah-ah...!, so I'm a bit "curious" which one you found, and how you implemented it of course...! :wink: