Extracting complicated text

Eilip999
Posts: 6
Joined: Fri Nov 30, 2018 4:44 pm

Extracting complicated text

Post by Eilip999 » Fri Nov 30, 2018 5:51 pm

iMacros Browser v8.3 (trial)
Windows 10
Hey, I am trying to write a simple script that solves an online test for me automatically.
My script works well with simple text, but sometimes the text is more complicated and the match fails, even though the text on both sites looks the same.
Here is my script:

Code: Select all

VERSION BUILD=8032216
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.odpowiedzi.c0.pl/algorytmika_programowanie/jak_wnioskuja_maszyny.html
TAB OPEN
TAG POS=1 TYPE=H4 ATTR=* EXTRACT=TXT
SET VAR1 {{!extract}}
SET !EXTRACT NULL
TAG POS=188 TYPE=P ATTR=* EXTRACT=TXT
SET VAR2 {{!extract}}
TAB T=2
URL GOTO=https://it-szkola.edu.pl/kkurs,kurs,59,test
TAG POS=1 TYPE=TH ATTR=TXT:*{{VAR1}}*
TAG POS=R1 TYPE=LABEL FORM=ID:testForm ATTR=TXT:{{VAR2}}
It copies the question and the correct answer, then opens the test site, searches for the question and selects the correct answer.

HTML of the answers site:

Code: Select all

        <h4>Question?</h4>
        <p>answer</p>
        <br>

Site with test:

Code: Select all

<div class="testPyt  ">
                                <table>
                                    <tr>
                                        <th colspan="2">
                                            <span class="testPytLiczFull"><span class="testPytLicz">3</span>. </span>
                                             
                                            Question?<br />                                     </th>
                                    </tr>
                                                                            <tr>
                                            <td class="testCheck"><input type="checkbox" name="idodp[830845]" id="idodp_828949_830845" value="1" /></td>
                                            <td class="testQuest"><label for="idodp_828949_830845">ANSWER1</label></td>
                                        </tr>
                                                                                <tr>
                                            <td class="testCheck"><input type="checkbox" name="idodp[830846]" id="idodp_828949_830846" value="1" /></td>
                                            <td class="testQuest"><label for="idodp_828949_830846">ANSWER2</label></td>
                                        </tr>
                                                                                <tr>
                                            <td class="testCheck"><input type="checkbox" name="idodp[830844]" id="idodp_828949_830844" value="1" /></td>
                                            <td class="testQuest"><label for="idodp_828949_830844">ANSWER3</label></td>
                                        </tr>
                                        
                                </table>

                                <br /><br />
                            </div>
When the question is a single sentence it works perfectly, but sometimes it looks like this:

Site with answers:

Code: Select all

<h4>Question about script below:<br>

<script type="text/javascript"><br>
  z = 1;<br>
  function a()<br>
  {<br>
    var z = 2;<br>
  }<br>
  a();<br>
  alert(z)<br>
</script></h4>
        <p>Answer</p>
Test site code:

Code: Select all

<th colspan="2">
                                            <span class="testPytLiczFull"><span class="testPytLicz">10</span>. </span>
                                             
                                            Co będzie wynikiem wykonania następującego skryptu:<br /><br />
  <script type="text/javascript"><br />
  &nbsp;&nbsp;z = 1;<br />
  &nbsp;&nbsp;function a()<br />
  &nbsp;&nbsp;{<br />
  &nbsp;&nbsp;&nbsp;&nbsp;var z = 2;<br />
  &nbsp;&nbsp;}<br />
  &nbsp;&nbsp;a();<br />
  &nbsp;&nbsp;alert(z)<br />
  </script>                                        </th>
When I run the macro, I get an error saying that it can't find a matching element:

Code: Select all

Error -1300: Cannot find HTML element of type "TH:" with attribute(s) "TXT:*Co będzie wynikiem wykonania następującego skryptu: <script type="text/javascript">z = Array();z.push("ABC");z.push("D");z.push("E");alert(z[1])</script>*".. Line 13: TAG POS=1 TYPE=TH ATTR=TXT:*{{VAR1}}*

The text on both sites looks the same.
Is there a universal solution that works with complicated text, without difficult JavaScript?
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Fri Nov 30, 2018 5:55 pm

OK, was waiting for you to post your Post to approve it, long one, have to go, will have a look when I'm back...

Hum, iMB v8.3 => 'Trial'...!, hum, sure about that...?, that Version must be 8 years old, I'm not even sure it works on Win10...
- (F)CI(M) = (Full) Config Info (Missing): iMacros + Browser + OS (+ all 3 Versions + 'Free'/'PE'/'Trial').
- FCI not mentioned: I don't even read the Qt...! (or only to catch Spam!)
- Script & URL help a lot for more "educated" Help...
Eilip999
Posts: 6
Joined: Fri Nov 30, 2018 4:44 pm

Re: Extracting complicated text

Post by Eilip999 » Fri Nov 30, 2018 6:01 pm

I use an old version of the iMacros Browser because new versions of the add-on for Mozilla Firefox are restricted to 50 lines of code. The 8.9.7 version doesn't restrict, so I use older versions of the iMacros Browser to code :D I can download the latest trial version, but I want it to be compatible with the 8.9.7 Firefox add-on.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Sat Dec 01, 2018 12:49 am

OK, I'm back, I still haven't read your Post to be honest, because your Thread Title is a bit (completely) vague, this complete-whole Sub-Forum is about "Extracting", when you extract stg with iMacros, it's always about "text", and "complicated" is a completely subjective Term which doesn't mean much, see for example this recent Thread about "complicated-complicated-very-very-hard-question"...

(What is "complicated" for you will probably take me a few Seconds to solve, once I will have found the "Motivation" to read your Post..., which is nearly perfect btw, except the Thread Title that I find "vague" as I don't have any Idea from the Title about what your Pb might be about, but you've put a lot of nice Effort in it with several Blocks of Info...)

Your FCI is still a bit of "a mystery" to me, yeah the last Versions for the 3 Add-ons for IE/CR/FF are now limited to 50 Lines, but iMB is not, even Trial, then why use v8.3 and not v12.5...? If you are using a Trial Version, then use the last one, I would think...
This Sentence doesn't make sense: "8.9.7 version doesn't restrict so i use older versions of imacros browser to code..."
=> Yeah well, then simply use v8.9.7 which is a Version for FF (until FF56), and has nothing to do with iMB, unless you are also using the Scripting Interface, which doesn't seem to be the case or you should have opened your Thread in the 'Scripting Interface' Sub-Forum...

But OK, maybe I'm acting a bit like a "Jerk", I post this one and I read your Post, ah-ah...! :wink:

But hum, I might reply once, but you'll need to "improve" your Thread Title to make it a bit more Descriptive anyway... 8)
Last edited by chivracq on Sat Dec 01, 2018 11:57 am, edited 2 times in total.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Sat Dec 01, 2018 1:06 am

Hum, OK, read your OP..., but hum, it's a bit about "cheating" on some Qt & Answer Game/Competition Site, then, oops, sorry, but I don't help for that... :shock:
Nice to want to automate your "Playing" and Scoring to try to be more "clever" than your Competitors, but you need to do that on your own, or everybody will soon use the same Script, and that Game/Competition will only become a Bot vs Bot Competition which is probably not the Intention of the Site Owner... :cry:
And if it is, can you ask if they can make some English Version...?, and I'll join in the Game...! :twisted:

I actually acquired 90% of my Knowledge of iMacros from also playing that kind of Games, but tja..., I never asked anybody to help me with any of my Scripts, or the "Competition" could/would have used "my" Scripts... :shock: , and you see, a few years later, I'm now able to answer maybe 90% of all Threads posted on the Forum..., so you are on the right "Path", ah-ah...! 8)

I'm the most "active" Helper on the Forum, if you are a bit "lucky" some other Advanced User(s) might still want to help you, I would be a bit "surprised" though to be honest, you'll see... Your OP is actually "perfect" for me, I prefer "too much" Info (that I can filter easily, I'm used to working with very complex Systems), but most (Advanced) Users prefer some "simple" Qt they can understand within less than 30 sec...
You could try SOF maybe (Stackoverflow), if you don't get any further Replies here, but hum..., try to "simplify" and to summarize your "real" Qt a lot for that Forum... :idea:
But good luck anyway... :wink:
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Sat Dec 01, 2018 1:14 pm

Hum, I had a look at your Site(s) anyway, I can only access the Question Site, the other one is behind L&P...

Well, you could add a little bit of "Intelligence" to your Script by shortening a bit the Extract on the Qt, stg like:

Code: Select all

VERSION BUILD=8032216
TAB T=1
TAB CLOSEALLOTHERS
URL GOTO=http://www.odpowiedzi.c0.pl/algorytmika_programowanie/jak_wnioskuja_maszyny.html
SET KW1 " skryptu:"
SET KW2 "Keyword(s) Nb2..."
SET KW3 "... etc..."

TAG POS=1 TYPE=H4 ATTR=* EXTRACT=TXT
'SET VAR1 {{!extract}}
SET !VAR1 EVAL("var s='{{!EXTRACT}}', kw1='{{KW1}}', kw2='{{KW2}}', kw3='{{KW3}}'; var z=s.split(kw1)[0].split(kw2)[0].split(kw3)[0]; z;")
PROMPT EXTRACT:<BR>_{{!EXTRACT}}_<BR><BR>Short<SP>EXTRACT:<BR>_{{!VAR1}}_
'>
SET !EXTRACT NULL
'TAG POS=188 TYPE=P ATTR=* EXTRACT=TXT
TAG POS=R1 TYPE=P ATTR=* EXTRACT=TXT
SET !VAR2 {{!extract}}

TAB OPEN
TAB T=2
URL GOTO=https://it-szkola.edu.pl/kkurs,kurs,59,test
TAG POS=1 TYPE=TH ATTR=TXT:*{{!VAR1}}*
TAG POS=R1 TYPE=LABEL FORM=ID:testForm ATTR=TXT:{{!VAR2}}
(Not tested...)

I declared the Keyword(s) outside of the 'EVAL()' at the beginning of your Script for "Easy Access" and in case you want to add some extra Keywords...
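
For anyone reading along, the shortening that the 'EVAL()' line is meant to do can be sketched in plain JavaScript outside iMacros. This is only an illustration under assumptions: `shortenQuestion` and the sample question string are made up for the example, and the keyword list just mirrors the KW1..KW3 placeholders from the Script. Unlike the one-liner, the sketch also skips empty keywords.

```javascript
// Sketch of the keyword-based shortening, as plain JavaScript.
// shortenQuestion and the sample data are hypothetical, for illustration only.
function shortenQuestion(extract, keywords) {
  let result = extract;
  for (const kw of keywords) {
    // An empty keyword would split the text into single characters, so skip it.
    if (!kw) continue;
    // split(kw)[0] keeps everything before the first occurrence of kw;
    // if kw does not occur, the text is left unchanged.
    result = result.split(kw)[0];
  }
  return result;
}

const sampleQuestion =
  'Co będzie wynikiem wykonania następującego skryptu: <script type="text/javascript">z = 1;</script>';
const shortQuestion = shortenQuestion(sampleQuestion, [
  " skryptu:",
  "Keyword(s) Nb2...",
  "... etc...",
]);
console.log(shortQuestion);
```

Note that `split(kw)[0]` drops the keyword itself, so with KW1 set to " skryptu:" the search text ends at "następującego". Inside the macro, an Extract containing a single quote or a line break would presumably break the quoted string in 'EVAL()', so that case may need escaping first.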

And you could also add a Conditional 'PROMPT' or Sound or Pause (+ Conditional 'WAIT' to have the time to manually pause the Script, if combined with 'PROMPT' or Sound) by first checking if the 'TH' Element is (not) found by iMacros on the Answer Site...

As the 'split()' shortens your Extract from the Answer Site, you may run into the Case that there are 2 different Qt's both starting with "Co będzie wynikiem wykonania następującego skryptu:", but the Script after that will be different, => you might also add on a Check if by any chance there is not some:

Code: Select all

TAG POS=2 TYPE=TH ATTR=TXT:*{{!VAR1}}*
And all the "Conditional + whatever" I have mentioned are covered by other Threads on the Forum if you search it a bit... :idea:
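
The duplicate-prefix case described above can be checked before trusting the shortened text. A rough sketch in plain JavaScript, assuming the question texts of the test page are already available as an array of strings (how they get collected is left open); `countPrefixMatches` and the sample data are hypothetical, and a simple substring test stands in for the `TXT:*...*` wildcard.

```javascript
// Hypothetical helper: counts how many question texts contain the shortened text.
function countPrefixMatches(questionTexts, shortText) {
  return questionTexts.filter((text) => text.includes(shortText)).length;
}

// Made-up sample data: two questions share the same opening sentence.
const sampleQuestions = [
  "Co będzie wynikiem wykonania następującego skryptu: z = 1; alert(z)",
  "Co będzie wynikiem wykonania następującego skryptu: z = Array(); alert(z[1])",
  "Question?",
];
const sharedPrefix = "Co będzie wynikiem wykonania następującego";
const hits = countPrefixMatches(sampleQuestions, sharedPrefix);

if (hits > 1) {
  console.log("Ambiguous: " + hits + " questions share this prefix, check manually.");
} else if (hits === 1) {
  console.log("Unique match.");
} else {
  console.log("No match.");
}
```

With more than one hit, the shortened text alone is not enough to pick the right Question, which is the situation the `POS=2` Check is meant to detect.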

But hum, you should be able to find all those "Tricks" and Techniques yourself if you are studying Maths and Programming from what I understand from this Answer Site, I have the "Feeling" I'm helping you a bit to "cheat the System", but you put a lot of Effort in your OP, so... OK, fair enough..., and enjoy...! :wink:
Eilip999
Posts: 6
Joined: Fri Nov 30, 2018 4:44 pm

Re: Extracting complicated text

Post by Eilip999 » Tue Dec 04, 2018 12:32 pm

Thx for help. I will try your code out. My IT teatcher requires to copy and paste these answers to rise my Scholl in ranking. I just want to make scrpit that will solve it for me.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Thu Dec 06, 2018 10:37 pm

OK, and...?, any Results/Feedback/Follow-up...? :?:
(You don't need 36h to "try" my Code, done in a few minutes I would think..., already took 2 or 3 days for the first Follow-up, you need to "speed up the Process" a little bit or I'll lose interest in your Thread, I was not very-very motivated to answer it in the first place... :shock: )

>>>

Your last Post is a bit full of Typos and grammatical Mistakes btw, Communication & Documentation Language about Programming is English, your Teacher could "better" translate those 2 Sites into English and make sure his/her Students master English "a bit properly", that would raise/increase the Ranking of your School a bit automatically and probably much quicker than some C&P from one Site to another one... :idea:
Most Prog Languages don't like Typos at all, all you get is then some "Syntax Error"...! :idea:
["to rise" is Intransitive btw, "stg can rise", but "you cannot rise stg"..., but you can raise/increase/augment/improve stg..., those Verbs are Transitive... (And English is not my Native Language either...) :wink: ]
Eilip999
Posts: 6
Joined: Fri Nov 30, 2018 4:44 pm

Re: Extracting complicated text

Post by Eilip999 » Tue Jan 08, 2019 12:08 am

I haven't tried it yet, I totally forgot about this script. I am am going to test in this week. I AM going also to make script for online chats od it is possible. I learned imacros by creatin' automatic Bitcoin claimer, but i miss a lot od JavaScript. I am native english speaker. 😀 I will post results if it works. It will be harder for me as long as i don't have trial anymore. It thre any free version od imacros browser? I have only free Firefox plugin.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Tue Jan 08, 2019 12:58 am

:|
Eilip999
Posts: 6
Joined: Fri Nov 30, 2018 4:44 pm

Re: Extracting complicated text

Post by Eilip999 » Wed Mar 06, 2019 7:06 pm

I found better software for this script: Browser Automation Studio. Making a universal script is almost impossible with iMacros. I struggled with comparing text in iMacros for 2 weeks, then I gave up. In Browser Automation Studio I created, in less than 3 hours, a universal script that solves every test, even ones with strange syntax.
chivracq
Posts: 10301
Joined: Sat Apr 13, 2013 1:07 pm
Location: Amsterdam (NL)

Re: Extracting complicated text

Post by chivracq » Wed Mar 06, 2019 10:26 pm

OK, fair enough, and Thanks for the Heads up, 2 months later, ah-ah...! :wink:

I had never heard of this BAS 'BrowserAutomationStudio', I'll have to check it and give it a try..., I myself don't always find iMacros best suited for all Web-Automation Tasks, all Tools have their "strong points" for different Scenarios, I guess... I try to "stick" to iMacros because I have been using it for about 10 years, so I always find a way to implement what I need/want with it, but I can understand that other Users might prefer some other Tool(s)... :|

But hum, very first impression is not very "good", first Content I see on their Web-Site is in approx English, they can't even spell "Standard" correctly, so that always already makes me a bit "suspicious" about the Quality of the Software, oops... :shock:

Hum, OK, Scripts need to be compiled as '.EXE' Files, hum OK..., not very easy to distribute Scripts then I would think, or if need to run many different Scripts during the day, or even at the same time, I prefer Scripts accessible from within my Browser... But that could be handy for Users who want to start their Script from their Task Scheduler though..., this is not easy to do anymore with iMacros v10.0.x for CR & FF now...
Not clear to me if you can use those BAS Scripts with any Browser like with iMacros and the Add-ons for IE/FF/CR or if they use their own Browser like iMB (the iMacros Browser)...
Oh hum, and the Download for the BAS Setup is 170 MB large, fouff...!, that's a bit "heavy", compared to the 1 MB for the iMacros Add-on for FF/CR that can be installed in no time...

But OK, I'll give it a try anyway... Thanks for sharing... 8)